An agent isn't a chat loop with tools bolted on.

The demo always works. That is the problem. I have watched the same movie at more than one company: a sharp prototype lands in a Friday review, everyone claps, and by week two in front of real traffic it is quietly returning nonsense to a fraction of tenants and nobody can say why. The model did not get dumber over the weekend. What broke was an interface nobody had bothered to specify, between the planner and the thing that runs its steps, or between memory and the model that reads it.

[Image: a cyanotype-style isometric blueprint of an agent system drawn as a labeled machine in a control room, with modules labeled MODEL, MEMORY, PLANNER, ORCHESTRATOR, TOOLS and EVAL, plus dimension lines and a faint measurement grid.]

The whole system as one schematic: six modules on a measured grid. The labels are the easy part. This piece is about the lines *between* them.

So I have stopped letting teams describe their agent as "an LLM with some tools." That framing hides the failure surface. After enough of these post-mortems I draw the same picture every time, and it has six components, not because six is magic, but because that is where the seams are. The components are the easy part. The contracts between them are the actual engineering, and they are what this piece is about.

Figure 1 · The decomposition

Six components, and the signals that pass between them

Read the arrows, not the boxes. Every solid line is a contract: a shape of data one component promises to hand the next. The dashed gold path is the eval loop, the difference between a system that recovers and one that confidently ships garbage.

One framing note before we walk them. Vendor blueprints (Kore.ai, Exabeam, Akka and friends) tend to draw this as tidy "tiers" with the rough edges sanded off, because they are selling the platform that supposedly handles them for you. The decomposition is genuinely useful and I lean on Kore.ai's blueprint myself. Just remember that everything past the boxes, the multi-tenant isolation, cost ceilings, audit, and the governance that keeps you out of a compliance review, is custom, and it is the part the diagram quietly omits.

Component 1 · The model is a part, not the system

The reasoning core gets the headlines, so let me get its role out of the way fast: it is an interchangeable part. Problem: teams couple their business logic to a specific model's quirks, its exact JSON habits, its tolerance for a giant system prompt, and then cannot move when a cheaper or better one ships. Constraint: in a regulated, multi-tenant shop you will change models, sometimes mid-quarter, sometimes because procurement said so. Recommendation: treat the model as a node behind a contract, with typed inputs, typed outputs, and evals that do not care which weights are underneath. If swapping the model means rewriting the orchestrator, the model was never really a component. It was the foundation, and you poured it wrong.
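A minimal sketch of what "a node behind a contract" could look like in Python. Every name here (`ModelRequest`, `ModelResponse`, `Model`, `EchoModel`, `summarize_ticket`) is hypothetical and invented for illustration; `EchoModel` is a stand-in where a real adapter would call a vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ModelRequest:
    system: str
    user: str


@dataclass(frozen=True)
class ModelResponse:
    text: str
    model_id: str  # recorded for audit; business logic should never branch on it


class Model(Protocol):
    """The contract: any model adapter must accept and return these types."""

    def complete(self, request: ModelRequest) -> ModelResponse: ...


class EchoModel:
    """Hypothetical stand-in; a real adapter would wrap a vendor client here."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def complete(self, request: ModelRequest) -> ModelResponse:
        return ModelResponse(text=request.user.upper(), model_id=self.model_id)


def summarize_ticket(model: Model, ticket: str) -> str:
    # Business logic depends only on the contract, not on which weights sit behind it.
    response = model.complete(ModelRequest(system="Summarize.", user=ticket))
    return response.text
```

Under this assumption, swapping models means writing a new adapter that satisfies `Model`; `summarize_ticket` and the evals that call it stay untouched.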

Component 2 · Memory: two stores that get mistaken for one

Here is where I see the most expensive confusion. People say "the agent has memory" as if that is one thing. It is two, with completely different contracts.

Figure 2 · Component 2, split

Short-term state and long-term store are different machines

Short-term is a checkpointer; long-term is a store. Conflate them and you either lose state mid-run or, worse in a multi-tenant system, write one customer's facts into a shared index. The patterns for getting long-term right are a whole post of their own. Treat this as the contract, not the implementation.

Problem: short-term state (what happened in this run) and long-term memory (what is true across runs) want opposite things. One is fast and disposable, the other is durable and governed. Constraint: the moment you are multi-tenant, every long-term write is a potential cross-account leak. Recommendation: a session checkpointer for short-term, a deliberately separate vector or graph store for long-term, and every long-term write scoped to a tenant key and reviewed like the liability it is. The deeper patterns, what to remember, when to forget, how to summarize, I have split into a dedicated piece, because doing them justice here would double the length.
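To make the split concrete, here is a rough in-memory sketch of the two contracts. The class names `SessionCheckpointer` and `TenantScopedStore` are hypothetical, and plain dictionaries stand in for whatever checkpoint backend and vector or graph store you actually run. The point is only the shape of the interfaces: the short-term side is keyed by run and disposable, and the long-term side refuses any write without a tenant key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionCheckpointer:
    """Short-term: per-run state, cheap to write, safe to throw away."""

    _runs: Dict[str, List[dict]] = field(default_factory=dict)

    def save(self, run_id: str, state: dict) -> None:
        self._runs.setdefault(run_id, []).append(dict(state))

    def latest(self, run_id: str) -> Optional[dict]:
        history = self._runs.get(run_id)
        return dict(history[-1]) if history else None

    def discard(self, run_id: str) -> None:
        self._runs.pop(run_id, None)


@dataclass
class TenantScopedStore:
    """Long-term: durable facts; every read and write carries a tenant key."""

    _facts: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def write(self, tenant_id: str, key: str, value: str) -> None:
        if not tenant_id:
            # Fail loud: an unscoped write is a potential cross-account leak.
            raise ValueError("long-term writes must carry a tenant key")
        self._facts.setdefault(tenant_id, {})[key] = value

    def read(self, tenant_id: str, key: str) -> Optional[str]:
        return self._facts.get(tenant_id, {}).get(key)
```

Because the two classes share no methods, it is hard to pass one where the other is expected, which is the property you want from the real implementations too.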

Component 3 · The planner turns a goal into steps you can audit

The planner is the component that takes a business objective and decomposes it into an ordered, explainable sequence. Kore.ai's blueprint puts it plainly: the planner's job is to break the goal into steps a system can actually execute and a human can actually read back. That second half, explainable, is the part enterprise teams underweight.

Problem: an opaque planner produces a result you cannot defend in an incident review ("why did the agent issue that refund?"). Constraint: in regulated workflows, "the model decided" is not an acceptable answer. Recommendation: make the plan a first-class, inspectable artifact, a list of steps with reasons, logged before execution, not reconstructed after. If you cannot print the plan, you cannot audit the agent, and if you cannot audit it, legal will not let it near a customer.
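One way to read "a list of steps with reasons, logged before execution" is sketched below. The `Plan` and `PlanStep` types, the `execute` helper, and the `agent.plan` logger name are all assumptions made for illustration, not part of any particular framework.

```python
import json
import logging
from dataclasses import asdict, dataclass
from typing import Callable, List, Tuple

logger = logging.getLogger("agent.plan")  # hypothetical logger name


@dataclass(frozen=True)
class PlanStep:
    action: str
    reason: str  # the "why" you will need in the incident review


@dataclass(frozen=True)
class Plan:
    goal: str
    steps: Tuple[PlanStep, ...]

    def to_json(self) -> str:
        # A printable, diffable form of the plan.
        return json.dumps(asdict(self), sort_keys=True)


def execute(plan: Plan, run_step: Callable[[PlanStep], str]) -> List[str]:
    # Log the whole plan before any step runs, so the audit trail
    # does not depend on reconstructing intent after the fact.
    logger.info("plan=%s", plan.to_json())
    return [run_step(step) for step in plan.steps]
```

The plan is frozen on purpose in this sketch: if execution needs a different plan, that should be a new logged artifact rather than a silent mutation of the old one.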

Component 4 · The orchestrator is the manager; treat it like one

If the planner decides what, the orchestrator decides who and in what order. This is the component that earns the "agentic" label, and it is the one I spend the most review time on.

"The orchestrator is the manager of an agentic system. It decides which agents should take on a given task, in what order, and how results should be merged into a clean result." (Kore.ai, Agentic architecture blueprint)

That "clean result" is doing a lot of work. Renney and colleagues, writing on production multi-agent systems, formalize three patterns the orchestrator can run, and choosing among them is a real architectural decision, not a default.

Figure 3 · Component 4, three shapes

Supervisor, planner-executor, and swarm. Pick on purpose

Three patterns from the production multi-agent literature. My bias is loud here: start with supervisor. It is the easiest to reason about, log, and explain in an incident review. Swarm is elegant in a paper and a nightmare in an audit. Reach for it only when the work genuinely has no central order.

Problem: teams pick the fanciest orchestration pattern because it is interesting, then cannot trace why the system did what it did. Constraint: token cost and auditability both scale with how distributed the control is. A swarm is expensive and opaque. Recommendation: default to a supervisor, give it hard recursion and termination limits, and only graduate to looser patterns when a concrete requirement forces it. The orchestrator is also where your cost ceiling lives. If nobody owns "max steps per request," nobody owns the bill.
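A minimal sketch of a supervisor with a hard step ceiling, assuming a hypothetical `route` callback that names the next worker or returns `None` when the task is done. `Supervisor`, `StepBudgetExceeded`, and `max_steps` are illustrative names; the idea is just that the limit lives in the orchestrator and fails loudly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class StepBudgetExceeded(RuntimeError):
    """Raised when a run hits its hard step ceiling."""


@dataclass
class Supervisor:
    workers: Dict[str, Callable[[str], str]]
    max_steps: int = 8  # the "max steps per request" somebody has to own

    def run(self, task: str, route: Callable[[str], Optional[str]]) -> str:
        state = task
        steps_taken = 0
        while True:
            worker_name = route(state)
            if worker_name is None:  # termination condition: router says done
                return state
            if steps_taken >= self.max_steps:
                raise StepBudgetExceeded(
                    f"no result after {self.max_steps} worker calls"
                )
            state = self.workers[worker_name](state)
            steps_taken += 1
```

A router that never returns `None` gets stopped after `max_steps` worker calls instead of running up the bill, and the exception gives the incident review a concrete event to point at.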

Component 5 · Tools and sub-agents, the under-invested layer

This is the component everyone treats as plumbing, and it is the one that takes systems down. The Clarion AI team, writing on multi-agent systems in production, put it bluntly, and it matches every post-mortem I have sat through:

"This layer is where most teams under-invest, and where most production failures originate." (Clarion AI, on the tools and memory layer)

Problem: a tool is an external system with its own latency, error modes, rate limits, and auth, but it gets wired in as if it were a pure function. Constraint: tools fail partially, slowly, and at the worst time, and a sub-agent is just a tool that can also be wrong in fluent prose. Recommendation: give every tool a real contract, with typed inputs and outputs, explicit timeouts, and defined failure behavior, and wire the tool layer last, after planner and memory contracts exist. I will go deeper on tool contracts and the MCP interface in companion pieces. For now, the architectural point is that this layer deserves the most engineering and usually gets the least.
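Here is a rough sketch of a tool wrapper that returns either a validated result or a typed error, with an explicit timeout. `ToolResult`, `ToolError`, `call_tool`, and the demo `slow_lookup` tool are hypothetical names. The timeout uses the standard library's `concurrent.futures`, and one caveat applies: Python cannot kill the worker thread, so a timed-out call keeps running in the background. A production tool should also set a timeout on the underlying client.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Callable, Set, Union


@dataclass(frozen=True)
class ToolResult:
    value: dict


@dataclass(frozen=True)
class ToolError:
    kind: str  # one of: "timeout", "exception", "invalid_output"
    detail: str


def call_tool(
    fn: Callable[[dict], dict],
    args: dict,
    required_keys: Set[str],
    timeout_s: float,
) -> Union[ToolResult, ToolError]:
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, args)
    try:
        output = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The caller stops waiting here; the worker thread itself is not killed.
        return ToolError("timeout", f"no response within {timeout_s}s")
    except Exception as exc:
        return ToolError("exception", repr(exc))
    finally:
        pool.shutdown(wait=False)
    # Validate the shape before anything downstream sees it.
    if not isinstance(output, dict) or not required_keys.issubset(output):
        return ToolError("invalid_output", f"expected keys {sorted(required_keys)}")
    return ToolResult(output)


def slow_lookup(args: dict) -> dict:
    """Hypothetical tool that responds too slowly for a tight timeout."""
    time.sleep(0.3)
    return {"status": "ok"}
```

With this shape, the orchestrator branches on `isinstance(result, ToolError)` and on `kind`, so "what happens when the API times out" has a written answer instead of being left to the model.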

Component 6 · Eval and guardrails are structural, not a final QA step

The sixth component is not a thing you add at the end. It is the ring that wraps the whole loop. Problem: without structural limits, an agent loop can recurse forever, burn your budget, or, the quiet killer, return output that looks perfect and is functionally wrong. Constraint: probabilistic systems do not fail with a stack trace; they fail with confidence. Recommendation: put the controls at the orchestration layer where they can see the whole run, with recursion limits, termination conditions, and functional checks on output rather than vibe checks. Running that ring well is also a skills question, and the agent programmer's stack is an interesting read on it.
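As an illustration of a functional check rather than a vibe check, the sketch below validates a refund decision against facts the system already holds. The `RefundDecision` type and the specific rules are invented for this example, echoing the refund scenario from the planner section; real checks would come from your own domain.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RefundDecision:
    order_total_cents: int
    refund_cents: int
    currency: str


def functional_check(decision: RefundDecision, expected_currency: str) -> List[str]:
    """Return a list of failures; an empty list means the output passed."""
    failures = []
    if decision.refund_cents < 0:
        failures.append("refund is negative")
    if decision.refund_cents > decision.order_total_cents:
        failures.append("refund exceeds order total")
    if decision.currency != expected_currency:
        failures.append("currency does not match the order")
    return failures
```

None of these rules care how fluent the model's explanation was. They test whether the output is usable, which is the property that fails silently.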

The real work · The contracts between the components

Now the actual thesis. You can get all six boxes "right" in isolation and still ship the week-two collapse, because the failures live in the seams. Every arrow in Figure 1 is a promise about the shape of data moving between two components, and an unspecified promise is a latent outage.

Figure 4 · The thesis, made concrete

Name the contract on every seam

This is the diagram I actually care about. The boxes are interchangeable; the labeled seams are the system. "Validated result OR typed error" is not a nice-to-have. It is the line that decides whether a flaky tool degrades gracefully or poisons the whole run.

Assembly · Build the contracts before you wire the tools

Order matters, and most teams build this backwards. They wire the exciting tools first, demo it, and discover the foundations are missing when traffic arrives. Here is the sequence I push for.

Figure 5 · Build order

The sequence that survives week two

Foundations first, the fun part last. If you build top-down, live tools before contracts, you get the demo that dazzles on Friday and pages you on Tuesday. Bottom-up is slower to a first demo and dramatically faster to something you can actually run.

Verify · The whiteboard test

Here is how I check a design before it gets near a sprint. Draw the six boxes. Then, for every arrow between them, write the contract out loud: what shape of data crosses this line, what happens when it is malformed, and how do I know it worked? If you can answer that for all of them, you have an architecture. If you trail off on even one, and it is usually the tool-to-model return path, that is your week-two collapse, located in advance, for free.
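The whiteboard test can also be written down as data. In this sketch each seam carries the three answers (shape, malformed behavior, success signal), and a helper lists the seams where you trailed off. The `Seam` type, its field names, and the example seams are hypothetical, chosen only to mirror the three questions above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Seam:
    source: str
    target: str
    data_shape: Optional[str] = None      # what shape of data crosses this line?
    on_malformed: Optional[str] = None    # what happens when it is malformed?
    success_signal: Optional[str] = None  # how do I know it worked?

    def missing(self) -> List[str]:
        answers = {
            "data_shape": self.data_shape,
            "on_malformed": self.on_malformed,
            "success_signal": self.success_signal,
        }
        return [name for name, value in answers.items() if not value]


def unnamed_seams(seams: List[Seam]) -> Dict[str, List[str]]:
    """Map 'source -> target' to the questions nobody has answered yet."""
    return {
        f"{s.source} -> {s.target}": s.missing() for s in seams if s.missing()
    }


seams = [
    Seam("planner", "orchestrator", "list of steps with reasons",
         "reject the plan and log it", "plan logged before execution"),
    Seam("tools", "orchestrator", "validated result or typed error"),
]
gaps = unnamed_seams(seams)
```

In this example `gaps` flags only the tool return path, with `on_malformed` and `success_signal` unanswered, which is the seam the test most often catches.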

Do this Monday

Take your current agent and label every seam in Figure 1 with its real contract. The first one you cannot name is your next bug. In my experience it is almost always the tool return path ("what does the orchestrator do when the API times out?"), and "the model figures it out" is not an answer.

None of this is exotic. It is the same discipline we apply to any distributed system: name the interfaces, fail loud, test the seams. Agents feel new because the reasoning core is probabilistic, but the architecture around it is sober, boring engineering, and that is exactly why it holds up. Once the boxes exist, maturity is the next variable. To get a handle on that I'd check out the levels of agentic engineering. For which human skills map onto these six components, I'd check out the agent programmer's stack.

Six components is the easy half of the sentence. The contracts between them are the job.

Draw the boxes for your own system this week. Then go name the seams, because the agent that survives production is not the one with the cleverest orchestrator. It is the one whose contracts were written down before the demo.

Comments (5)

Ibrahim Rivera · Awakened · 4/8/2025

The contracts between components are the part people skip, and it's why their demo dies in week two. Most tutorials show the six boxes but never the interfaces between them, so everyone builds the boxes and wires them with hope. Good to see the contracts get top billing here.

Quinn Ueda · Unproven · 4/9/2025

That was the whole reason I wrote it this way. The components are easy to name and hard to connect. If the planner and the orchestrator do not agree on what a finished step looks like, you do not have an architecture, you have six things in a trench coat.

Beatriz Eze · Awakened · 4/9/2025

Saving this, the diagram helped a lot. One bit I got stuck on: what's the actual difference between the planner and the orchestrator? In my head they sound like the same thing doing scheduling. Sorry if that's obvious.

Not obvious at all, people ship them merged all the time. Planner decides what should happen and in what order. Orchestrator actually runs it, handles retries, failures, and state between steps. Planner is the strategy, orchestrator is the thing that survives a tool timing out at 2am. When you merge them you usually lose the recovery half.

Carlos Fernandez · Awakened · 4/10/2025

Good piece. I will only push on the eval component: if it is not reproducible by someone outside your team, it is not really a component, it is a vibe check you run yourself. Pin the dataset and the seeds or it rots the moment you change models.