MCP breaks when auth and schema versioning are afterthoughts.

I copied a stdio MCP tutorial into staging on a Thursday. It worked on the first try, which should have scared me more than it did. The server spoke the protocol, the agent called the tools, the demo was clean. What I had actually shipped was a god-mode bearer token: any client that held it could call any tool, against any tenant's data, with no log that tied a call back to a human. By Friday afternoon a support agent had used the "read invoice" tool to read a different customer's invoices, and I was on the phone explaining why we had no audit trail. The model did nothing wrong. The seams did.

Figure: a cyanotype blueprint of an MCP server rack with three stacked bays labeled AUTH, GATEWAY and TOOLS. The rack I wish the tutorials drew: an MCP server is three bays, not one. *Auth* at the bottom, a *gateway* in the middle, *tools* on top. The stdio quickstart ships you the top bay and pretends the other two are someone else's problem.

So this is the piece I needed that Thursday. Not a "hello world" server, but the production checklist: what breaks first, the four topologies worth knowing, an auth stack that survives an audit, and the gateway that keeps a growing estate from becoming a liability. I have wired more than fifteen webhook integrations and I am now doing the same job for MCP servers, and the lesson rhymes every time. The protocol is the easy part. The contracts and the governance are the work.

What breaks first, and it is never the model

When an MCP deployment falls over in production, the post-mortem almost never lands on the model. It lands on three seams, in roughly this order. One: token passthrough, where the server reuses whatever credential the client handed it instead of obtaining its own scoped token, so a leaked or over-broad token becomes total access. Two: missing schema versioning, where a tool definition changes shape under a running agent and every caller silently starts sending malformed arguments. Three: a missing audit triple, where you cannot answer "which user, through which agent, called which tool" after the fact. None of these is specific to MCP. They are the same integration hygiene we have owed every API for twenty years. MCP just makes them trivially easy to skip, because the quickstart does.

The audit triple deserves a beat of its own, because it is the one that turns a bad day into a bad week. In a classic API you usually have a user identity and a route, two of the three. With agents you gain a middle layer: the agent acted on the user's behalf, and the tool ran on the agent's behalf. If your log only records "the server got a call," you can reconstruct neither intent nor accountability when someone in compliance asks. Log all three, on every call, or accept that your incident reviews will end in a shrug.
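Here is a minimal sketch of what "log all three" can look like. It assumes a JSON-lines audit log and uses hypothetical field names (`user_id`, `agent_id`, `tool`). The design choice that matters is that a record cannot be built at all if one leg of the triple is missing.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class AuditRecord:
    """One line per tool call: which user, through which agent, called which tool.

    Field names are hypothetical; map them onto your own log schema.
    """
    user_id: str   # the human the agent is acting for
    agent_id: str  # the agent (MCP client) that made the call
    tool: str      # the tool that ran
    ts: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        # Refuse to create a record with any leg of the triple missing.
        for name in ("user_id", "agent_id", "tool"):
            if not getattr(self, name):
                raise ValueError(f"audit record missing {name}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def audit(record: AuditRecord, sink: list[str]) -> None:
    # In production the sink is an append-only store, not an in-memory list.
    sink.append(record.to_json())
```

A call like `audit(AuditRecord("u-42", "support-agent", "read_invoice"), log)` is the whole habit: the user, the agent and the tool go into every line.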

Step 1 · Pick a topology before you pick a transport

The first real decision is not stdio versus HTTP. It is what shape your estate takes, because that determines everything downstream about isolation and auth. Digital Applied's enterprise patterns writeup sorts production deployments into four topologies, and naming yours up front saves you a migration later.

Figure 1 · Deployment topologies

Four shapes an MCP estate actually takes

Pick the topology, then the transport follows. stdio is the leftmost card, and it is a dev artifact, full stop. Everything you ship to remote traffic is one of the other three (row-isolated multi-tenant, federated gateway, or edge-cached read-only), and each one moves the auth and audit boundary to a different place. Adapted from the enterprise deployment patterns in Digital Applied.

My rule of thumb: most SaaS-shaped products land on row-isolated multi-tenant, and the moment you have more than two or three servers you graduate to a federated gateway. Edge-cached read-only is the underused one. If a tool only reads, caching it at the edge gives you speed and caps the blast radius at what the tool can read, because there is nothing to corrupt.

The cost of guessing wrong here is a migration, not a config flag. Row isolation that you bolt on after launch means re-keying every write you already shipped, and a gateway you add late means rewriting auth in every server at once. Spend the thirty minutes to name the topology before the first commit. It is the cheapest decision in this whole piece and the most expensive one to defer.
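Here is a minimal sketch of what row isolation means in practice, and why bolting it on later means re-keying every write. The tenant key comes from the verified caller, never from tool arguments, and every row is stored and looked up under it. `InvoiceStore` and `Caller` are hypothetical in-memory stand-ins for your database and your decoded token.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Caller:
    # Populated from a *verified* token, never from tool arguments.
    user_id: str
    tenant_id: str


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    tenant_id: str
    amount_cents: int


class InvoiceStore:
    """Hypothetical row-isolated store: every read and write is keyed by tenant."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], Invoice] = {}

    def put(self, caller: Caller, invoice_id: str, amount_cents: int) -> Invoice:
        # The tenant is stamped from the caller, so a write cannot land in
        # another tenant's rows whatever the arguments say.
        inv = Invoice(invoice_id, caller.tenant_id, amount_cents)
        self._rows[(caller.tenant_id, invoice_id)] = inv
        return inv

    def get(self, caller: Caller, invoice_id: str) -> Optional[Invoice]:
        # Lookup is by (tenant, id): another tenant's invoice is simply not found.
        return self._rows.get((caller.tenant_id, invoice_id))
```

With this shape, my Friday incident reduces to a `None`: a caller from one tenant asking for another tenant's invoice ID gets nothing back.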

Step 2 · Authorization

Auth that survives an audit

Here is the part the tutorials skip entirely. Per Digital Applied's reading of the MCP spec revision dated 2025-11-25, auth is not optional for remote servers: remote HTTP servers must use OAuth 2.1 with PKCE, using the S256 challenge method. That is the floor, not the ceiling. A production auth stack has three more pieces stacked on it.
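The floor is small enough to show whole. Under RFC 7636, the client generates a high-entropy `code_verifier`. Its `code_challenge` is the unpadded base64url SHA-256 of that verifier, sent with `code_challenge_method=S256`. Your OAuth client library should do this for you. This sketch is here only so the mechanism is not a black box.

```python
import base64
import hashlib
import secrets


def b64url(data: bytes) -> str:
    # base64url with the '=' padding stripped, as RFC 7636 requires.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for code_challenge_method=S256."""
    # 32 random bytes encode to a 43-character verifier (RFC 7636 allows 43-128).
    verifier = b64url(secrets.token_bytes(32))
    challenge = b64url(hashlib.sha256(verifier.encode("ascii")).digest())
    return verifier, challenge


def verify_pkce(verifier: str, challenge: str) -> bool:
    """The check the authorization server runs at the token endpoint."""
    expected = b64url(hashlib.sha256(verifier.encode("ascii")).digest())
    return secrets.compare_digest(expected, challenge)
```

The challenge goes on the authorization request and the verifier on the token request. An attacker who intercepts only the authorization code therefore cannot redeem it.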

First, publish Protected Resource Metadata (RFC 9728) so clients can discover where to get a token and what scopes exist. Second, use Resource Indicators (RFC 8707) on token requests so a token minted for your server cannot be replayed against a different one. Third, and this is the one teams fight me on, support the Enterprise-Managed Authorization extension so an IdP like Okta or Azure AD sits between client and server and an admin can revoke access from the same console they use for everything else.
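Here is a sketch of the first two pieces. It assumes a server whose canonical URI is `https://mcp.example.com` and an authorization server at `https://auth.example.com`; both are hypothetical, as are the scope names. Under RFC 9728, the resource serves a JSON document at `/.well-known/oauth-protected-resource`. Under RFC 8707, the client names the target resource in a `resource` parameter when it requests a token.

```python
import json
from urllib.parse import urlencode

RESOURCE = "https://mcp.example.com"      # hypothetical canonical server URI
AUTH_SERVER = "https://auth.example.com"  # hypothetical authorization server


def protected_resource_metadata() -> str:
    """Body served at /.well-known/oauth-protected-resource (RFC 9728)."""
    return json.dumps({
        "resource": RESOURCE,
        "authorization_servers": [AUTH_SERVER],
        "scopes_supported": ["invoices:read", "invoices:write"],  # hypothetical
        "bearer_methods_supported": ["header"],
    })


def token_request_body(code: str, verifier: str, redirect_uri: str, client_id: str) -> str:
    """Authorization-code token request that binds the token to RESOURCE (RFC 8707)."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "code_verifier": verifier,  # the PKCE verifier
        "resource": RESOURCE,       # the resource indicator
    })
```

The enterprise-managed piece depends on your IdP, so I have left it out rather than guess at its wire format.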

Figure 2 · The token path

Client to gateway to server, with a scoped token at each hop

Every arrow is a token with a job. The cyan tokens are scoped and resource-bound: a token good for the gateway is not good for the server, and a token good for one server is useless against another. That single property is the difference between a leaked credential being an incident and being an apocalypse.
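The property in that caption, a token good for one hop and useless at the next, comes down to an audience check on every hop. Here is a minimal sketch. It assumes the token's signature, expiry and issuer have already been verified, and that its claims were decoded into a dict with a JWT-style `aud` claim (a string or a list). The URIs are hypothetical.

```python
from typing import Any


class TokenRejected(Exception):
    pass


def check_audience(claims: dict[str, Any], expected_resource: str) -> None:
    """Reject a token whose audience does not name this exact resource.

    Assumes signature, expiry and issuer were verified before this point.
    """
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_resource not in audiences:
        raise TokenRejected(
            f"token audience {audiences!r} does not include {expected_resource!r}"
        )


GATEWAY = "https://gateway.example.com"    # hypothetical
INVOICES = "https://invoices.example.com"  # hypothetical

gateway_token = {"aud": GATEWAY, "scope": "invoices:read"}
check_audience(gateway_token, GATEWAY)       # accepted at the gateway
try:
    check_audience(gateway_token, INVOICES)  # replayed at the server: rejected
except TokenRejected:
    pass
```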

It helps to see auth, observability, and governance as named modules rather than a vague "we'll secure it later." This is the platform view I sketch for stakeholders who think MCP is just a library import.

Figure: a platform grid of MCP production modules. Authorization column: OAuth 2.1, PKCE, Protected Resource Metadata. Observability column: audit log, tracing, rate limits. Governance column: policy, schema registry, blast radius. The estate as a module matrix reads like a slide because it should: this is the deck you take to whoever owns risk.
Source: adapted from Digital Applied, MCP Server Patterns for Enterprise AI Agents (2026).

Why is token passthrough a fireable mistake?

I want to be blunt about the single most common gap, because it is the one that paged me. Token passthrough is when your MCP server takes the credential the client presented and reuses it to call upstream systems. It feels efficient. It is forbidden. Digital Applied states it flatly:

"Token passthrough is explicitly forbidden." (Digital Applied, MCP Server Patterns 2026)

The reason is blast radius. A passed-through token carries the client's full authority, so the instant the server is compromised or simply buggy, the attacker inherits everything that token could do, which in my Friday incident was every tenant's invoices. The fix is mechanical: the server obtains its own scoped credentials and calls upstream systems with those, never with the client's token. The security writeup at Systems Hardening frames why this gap is so widespread:
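One mechanical way to "obtain its own scoped credentials" is OAuth 2.0 Token Exchange (RFC 8693). The sources above do not prescribe it, so treat it as one option among several. This sketch only builds the request body. The server would POST it to its authorization server's token endpoint, authenticated with its own client credentials. The upstream URI and scope are hypothetical.

```python
from urllib.parse import urlencode

ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_exchange_body(client_token: str, upstream: str, scope: str) -> str:
    """RFC 8693 request: trade the inbound token for a narrow, upstream-bound one.

    The inbound token is only the *subject* (who we act for). The token that
    comes back is limited to `scope` and bound to `upstream`, so a compromised
    server holds a narrow credential instead of the client's full authority.
    """
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": client_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "resource": upstream,
        "scope": scope,
    })
```

The forbidden alternative is one line, forwarding `Authorization: Bearer <client_token>` upstream, which is exactly why it is so tempting.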

"Most production deployments treat auth as 'the bearer token grants everything.' The specific gaps..." (Systems Hardening, MCP Authentication Patterns)

If you take one thing from this section: a bearer token should authorize a specific tool at a specific scope, never the whole server. Per-tool authorization is the cheap insurance the quickstart never sells you.
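Here is a sketch of per-tool authorization. Each tool declares the one scope it needs, and a call goes through only if the token carries that scope. A tool with no declared scope is denied by default rather than allowed. The tool names and scopes are hypothetical, and the space-delimited `scope` string follows the usual OAuth convention.

```python
TOOL_SCOPES: dict[str, str] = {
    "read_invoice": "invoices:read",  # hypothetical tool -> required scope
    "send_email": "email:send",
}


def authorize_tool(tool: str, token_scope: str) -> bool:
    """Allow a call only if the token carries the scope this tool requires."""
    required = TOOL_SCOPES.get(tool)
    if required is None:
        return False  # unknown or unscoped tool: deny by default
    return required in token_scope.split()
```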

Once you have more than a couple of servers, the auth and audit logic stops wanting to live in each one. It wants a single front door.

The gateway earns its keep at scale

A gateway is just the federated topology from Figure 1 with teeth. It centralizes the three things you do not want to reimplement per server: audit, rate limits, and policy. WorkOS calls the gateway pattern a 2026 roadmap item for exactly this reason: as estates grow, per-server governance does not scale, and admins want one console. As they put it:

"IT administrators should be able to manage MCP server access from the same identity provider console where they manage everything else." (WorkOS, Everything your team needs to know about MCP in 2026)

The mental model that made this click for me: the gateway sees every call, so it is the only place where "who called what" is a complete record rather than a fragment scattered across servers. Cross-app token exchange patterns like WorkOS Cross App Access ride on the same idea: one mediated boundary, not N ad hoc ones.
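Here is a toy version of that front door. It shows the ordering, not a product: policy check, rate limit, audit, then forward. Every name is hypothetical, and a real gateway does far more. The detail worth copying is that denied calls get an audit line too, because the gateway is the only place that sees them.

```python
import time
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], dict]


class Gateway:
    """Hypothetical single choke point: policy, rate limit, audit, forward."""

    def __init__(self, servers: dict[str, Handler], allowed: set[tuple[str, str]],
                 max_calls_per_minute: int) -> None:
        self.servers = servers    # tool name -> backend handler
        self.allowed = allowed    # (agent_id, tool) pairs that policy permits
        self.limit = max_calls_per_minute
        self.calls: dict[tuple[str, str], list[float]] = defaultdict(list)
        self.audit_log: list[dict] = []

    def call(self, user_id: str, agent_id: str, tool: str, args: dict) -> dict:
        now = time.monotonic()
        key = (agent_id, tool)
        # Sliding window: keep only this agent/tool's calls from the last 60 s.
        recent = [t for t in self.calls[key] if now - t < 60]
        self.calls[key] = recent
        if key not in self.allowed:
            outcome = "denied:policy"
        elif len(recent) >= self.limit:
            outcome = "denied:rate_limit"
        else:
            outcome = "forwarded"
            recent.append(now)
        # Audit every call, allowed or not, with the full user/agent/tool triple.
        self.audit_log.append({"user": user_id, "agent": agent_id,
                               "tool": tool, "outcome": outcome})
        if outcome != "forwarded":
            return {"error": outcome}
        return self.servers[tool](args)
```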

A fair counterpoint before you over-build: the fully enterprise-managed auth extensions are still maturing (pre-RFC, by WorkOS's own account), so do not wait for a finished standard to ship something safe. The pragmatic move today is to put an AuthKit or IdP bridge in front of the gateway, with a clean seam so you can swap in the standardized extension when it lands. Boring, incremental, and it keeps you out of the Friday pager rotation while the spec settles.
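The "clean seam" can be as plain as an interface the gateway depends on, with today's IdP bridge as one implementation of it. `Authorizer`, `IdpBridgeAuthorizer` and the `introspect` callable are hypothetical names for illustration, not AuthKit's API. When the standardized extension lands, you write a second implementation and change only the wiring.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Principal:
    user_id: str
    scopes: frozenset[str]


class Authorizer(Protocol):
    """The seam: gateway code calls only this, never a vendor SDK directly."""

    def authenticate(self, bearer_token: str) -> Principal: ...


class IdpBridgeAuthorizer:
    """Today's implementation. `introspect` stands in for whatever your IdP
    bridge exposes (hypothetical); it must return already-verified claims."""

    def __init__(self, introspect: Callable[[str], dict]) -> None:
        self._introspect = introspect

    def authenticate(self, bearer_token: str) -> Principal:
        claims = self._introspect(bearer_token)
        return Principal(claims["sub"], frozenset(claims.get("scope", "").split()))


def gateway_allows(auth: Authorizer, bearer_token: str, required_scope: str) -> bool:
    # Depends only on the Protocol, so swapping implementations is a wiring change.
    return required_scope in auth.authenticate(bearer_token).scopes
```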

Figure 3 · The choke point you want

Every call passes the gateway, so every call is logged

The gateway is a choke point on purpose. Reference architectures like Obot's gateway lean on the same shape: funnel every call through one mediated boundary so audit and rate limiting are complete rather than best-effort. Per-server logging always leaves a gap; the gateway closes it.

Blast radius

Version your tool definitions like an API

The last seam is the quiet one, and it is the one I see skipped even by teams that nailed auth. Tool definitions are an API contract, and they change. When you rename an argument or tighten a type without versioning, every running agent that cached the old schema starts sending calls that are subtly wrong, and because the model will gamely keep trying, you get garbage rather than a clean failure. Treat tool definitions like any other public API: version them, and let callers pin a version.
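Here is a minimal sketch of pinning. It assumes a hypothetical in-process registry keyed by tool name and integer version. I am not relying on any version field in the MCP spec; a name suffix or a column in your gateway's routing table would do the same job. Old versions stay registered, and a caller that asks for something it cannot get receives a typed, version-stamped error. It never gets a silently different schema. That is also the "boring failure mode" argued for below.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolError:
    """Typed failure the calling agent can act on instead of looping on garbage."""
    code: str        # machine-readable, e.g. "unknown_version", "invalid_args"
    tool: str
    version: int     # the version the caller pinned: the version stamp
    detail: str = ""
    retryable: bool = False


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[tuple[str, int], tuple[set[str], Callable[..., Any]]] = {}

    def register(self, name: str, version: int, required_args: set[str],
                 fn: Callable[..., Any]) -> None:
        # Registering v2 never replaces v1, so pinned callers keep working.
        self._tools[(name, version)] = (required_args, fn)

    def call(self, name: str, version: int, args: dict[str, Any]) -> Any:
        entry = self._tools.get((name, version))
        if entry is None:
            known = sorted(v for (n, v) in self._tools if n == name)
            return ToolError("unknown_version", name, version, f"available: {known}")
        required, fn = entry
        missing = required - set(args)
        if missing:
            # Fail loud at the boundary rather than let the model guess.
            return ToolError("invalid_args", name, version, f"missing: {sorted(missing)}")
        return fn(**args)
```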

Pair versioning with per-tool blast-radius limits. A "send email" tool should have a rate ceiling and a scope that cannot touch the billing system, independent of whatever the caller's token theoretically allows. The principle is the same one from retries and backoff: assume the call will misbehave, and bound what it can do when it does. A tool that can fire a thousand times a second because nobody set a ceiling is not a tool, it is an outage waiting for a trigger.
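Here is a sketch of per-tool blast-radius limits. A token bucket enforces the rate ceiling, and an explicit allow-list names the downstream systems each tool may touch. Both live on the tool, not on the caller's token, so a generous token cannot widen them. The numbers and system names are hypothetical. A token bucket is one common limiter, not the only reasonable one.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToolLimits:
    rate_per_sec: float              # sustained ceiling
    burst: int                       # short bursts allowed above the rate
    allowed_systems: frozenset[str]  # where this tool may reach, full stop
    _tokens: float = field(init=False)
    _last: float = field(init=False)

    def __post_init__(self) -> None:
        self._tokens = float(self.burst)
        self._last = time.monotonic()

    def permit(self, target_system: str) -> bool:
        """True if this call may proceed, independent of the caller's token."""
        if target_system not in self.allowed_systems:
            return False  # e.g. send_email can never reach billing
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate_per_sec)
        self._last = now
        if self._tokens < 1:
            return False
        self._tokens -= 1
        return True


send_email = ToolLimits(rate_per_sec=2, burst=5, allowed_systems=frozenset({"smtp"}))
```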

One more habit from years of webhook work: make the failure mode boring. Return a typed error with a version stamp, not a stack trace or a half-written payload, so the calling agent can back off and retry instead of looping on garbage. Versioned schemas and bounded tools are what let a flaky dependency degrade quietly rather than cascade. Here is the production checklist I keep taped to the laptop, drawn as a ladder because each rung depends on the one below it.

Figure 4 · Ship checklist

Five rungs between a tutorial and production

The ladder is the answer to "are we ready?" If any rung is missing, you are running a tutorial in production, not a service. Rungs one through three are auth and audit; four and five are the schema and blast-radius work that almost everyone defers and then regrets.

Do this Monday

Open your live MCP server and check exactly two things. First, does any tool authorize on the bearer token alone, with no per-tool scope? If so, that is your token-passthrough hole; fix it before anything else. Second, can you produce the user, agent, and tool for a call that happened last week? If you cannot, you have no audit triple, and the next incident will be unexplainable. Both are an afternoon of work, and they are the two that page you.
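Both Monday checks can be scripted against what you already have. This sketch assumes a hypothetical tool manifest (a dict from tool name to config, with a `required_scope` key) and audit records stored as dicts. Rename the keys to match your own.

```python
from typing import Any


def unscoped_tools(manifest: dict[str, dict[str, Any]]) -> list[str]:
    """Check 1: tools that would authorize on the bearer token alone."""
    return sorted(name for name, cfg in manifest.items() if not cfg.get("required_scope"))


def incomplete_audit(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Check 2: audit records that cannot answer user, agent, and tool."""
    return [r for r in records if not all(r.get(k) for k in ("user", "agent", "tool"))]
```

If either list comes back non-empty, that is your afternoon.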

None of this is exotic, and that is the point. It is the boring glue: scoped tokens, a gateway that logs, versioned contracts, bounded tools. The stdio quickstart is genuinely great for local development, so keep using it there. Just do not let it walk into production wearing a tie. For where MCP servers sit inside an agent's package manifest alongside skills and hooks, the breakdown of hooks and skills is worth a look. And if you are still placing MCP in the bigger picture, read how the components of an agentic system fit together; MCP is the tools and integration layer there.

The protocol takes an afternoon. The contracts and the governance take the rest of the quarter, and they are the actual job.

Wire one server properly this week, climb all five rungs, and the second one is a template. That is the whole trick to an estate that survives traffic: do the unglamorous half first, so the demo that dazzles on Thursday does not page you on Friday.

Comments (4)


Omar Singh · 12/9/2025

Schema versioning as an afterthought is how you get a 3am page. We shipped an MCP server, changed a tool output shape without a version bump, and every agent depending on it started failing silently because the model just hallucinated around the missing field. Version your tool contracts like APIs, because that is what they are.

Uma Yamamoto · 12/9/2025

This. And validate the response server side before the model ever sees it. Half my MCP incidents were the server returning a slightly wrong shape and the LLM gamely pretending it was fine. Fail loud at the boundary, do not let the model paper over it.

Grace Kim · 12/10/2025

The auth boundary section is the one I would make mandatory reading. In a regulated org the question is never "can the agent call this tool?" It is "which identity is it calling as, and who approved that?" Most MCP demos run as a god token, and that is a non-starter the second an audit shows up.

Carlos Fernandez · 12/11/2025

Solid. One open source caution: a lot of the popular MCP servers vendor their auth assumptions in ways you cannot easily override. Read the source before you put one on the blast radius, do not trust the readme.