JW · Josh Weir

Agent loops in production: where they break and how to catch them

An agent loop is the most powerful and most dangerous architectural pattern in modern AI. The pattern is simple: a model is given a goal, a set of tools, and the autonomy to decide which tool to call next based on what it has learned so far. The model executes a step, observes the result, plans the next step, and continues until it believes the goal is achieved. When it works, it replaces hours of human attention with a few minutes of model reasoning. When it fails, it can fail spectacularly, expensively, and quietly.

This piece is the catalogue of failure modes I have observed across roughly fifty agent-loop workflows in production over the last eighteen months, and the guardrails that catch each one before it becomes a real problem. The guardrails are unglamorous and simple, and building them costs much less than one bad failure that escapes them.

Failure mode one: the unbounded loop

The first and most expensive failure is the loop that does not terminate. The model is convinced it is making progress and that the goal is one more step away, so it keeps calling tools, accumulating context, and reasoning until either the context window fills up or a human intervenes. Often nobody does until the run is several hundred steps deep and the bill is already large.

The guardrail is mechanical: every agent loop has a hard step budget. Twenty steps for routine work, fifty for the most complex tasks we expose to the pattern. When the budget is hit, the loop terminates, the partial state is returned, and a human reviews. The budget is not optional. It is a circuit breaker.
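A minimal sketch of that circuit breaker, under stated assumptions: `step_fn` stands in for one model-plus-tool step and returns `None` when the model believes the goal is met. The names `LoopResult` and `run_with_budget` are illustrative, not taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopResult:
    finished: bool   # True only if the agent itself declared the goal met
    steps_used: int
    state: list = field(default_factory=list)  # partial trajectory for human review


def run_with_budget(
    step_fn: Callable[[list], Optional[str]],  # hypothetical: one agent step
    max_steps: int = 20,                        # 20 for routine work, 50 at most
) -> LoopResult:
    """Run step_fn until it returns None (done) or the step budget is spent."""
    state: list = []
    for step in range(1, max_steps + 1):
        observation = step_fn(state)
        if observation is None:
            return LoopResult(finished=True, steps_used=step, state=state)
        state.append(observation)
    # Budget exhausted: stop unconditionally and hand the partial state to a human.
    return LoopResult(finished=False, steps_used=max_steps, state=state)
```

The budget lives in the loop driver, not in the prompt, so the model cannot argue its way past it.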

The second guardrail is a wall-clock timeout. Even within the step budget, an individual tool call that takes longer than its expected duration is killed. Slow tools are the second most common cause of runaway runs.
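One way to sketch the per-call timeout in Python, with a caveat: Python threads cannot be forcibly killed, so this version abandons the slow call rather than killing it. A real kill would mean running the tool in a separate process. `ToolTimeout` and `call_with_timeout` are hypothetical names.

```python
import threading
from typing import Any, Callable


class ToolTimeout(Exception):
    """Raised when a tool call exceeds its wall-clock allowance."""


def call_with_timeout(
    tool: Callable[..., Any], timeout_s: float, *args: Any, **kwargs: Any
) -> Any:
    """Run tool(*args, **kwargs); raise ToolTimeout if it runs past timeout_s."""
    outcome: dict = {}

    def target() -> None:
        try:
            outcome["value"] = tool(*args, **kwargs)
        except Exception as exc:  # pass tool errors back to the caller
            outcome["error"] = exc

    # Daemon thread: an abandoned call will not keep the process alive at exit.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise ToolTimeout(f"tool exceeded {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The loop driver would catch `ToolTimeout`, record it as the observation for that step, and let the step budget decide whether to continue.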

Failure mode two: the silent dead end

The model decides early in the loop that a particular path is correct. The path is wrong, and every subsequent step reasons from the wrong premise. The failure is costly and plausible-looking, and it often produces an output that passes a casual review and fails only on close inspection.

The guardrails are layered. First, a structured plan up front: the model is asked to produce a plan before executing, the plan is logged, and the actual trajectory is compared against the plan at the end. Second, a critic step at the end that reads the trajectory and the output, and is asked specifically whether the output answers the original question. Third, the most under-used technique in agent loops: force the model to consider the alternative interpretation. Asking explicitly “what is the strongest reason this answer might be wrong?” meaningfully reduces silent dead ends.
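The plan comparison and the critic prompt can be sketched in a few lines. This assumes plan and trajectory are both logged as lists of step names, which is a simplification; `plan_drift` and `build_critic_prompt` are illustrative names, and the call to the critic model is left out.

```python
def plan_drift(planned: list[str], actual: list[str]) -> dict[str, list[str]]:
    """Report planned steps that never ran and executed steps that were never planned."""
    return {
        "skipped": [step for step in planned if step not in actual],
        "unplanned": [step for step in actual if step not in planned],
    }


def build_critic_prompt(question: str, output: str, trajectory: list[str]) -> str:
    """Assemble the end-of-run critic prompt from the logged run."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(trajectory, start=1))
    return (
        f"Original question:\n{question}\n\n"
        f"Steps taken:\n{steps}\n\n"
        f"Final output:\n{output}\n\n"
        "Does the output answer the original question? "
        "What is the strongest reason this answer might be wrong?"
    )
```

A non-empty `skipped` or `unplanned` list is not proof of a dead end, but it is a cheap signal for deciding which runs get a closer look.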

Failure mode three: the tool-use confusion

The model has access to several tools. It is supposed to use the right one for the job. Sometimes it does not. The most common failure pattern is using a search tool for a task that should be a database query, or calling a write tool when a read would have sufficed.

The guardrails are about the tool definitions, not the model. Every tool description is written for the model the way you would write it for a junior engineer. The description includes when to use it, when not to use it, and an example. Where two tools could plausibly answer the same question, the descriptions explicitly disambiguate. The single highest-leverage thing you can do for agent reliability is invest in the quality of the tool descriptions.
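To make the shape concrete, here is one possible structure for a tool description. The `ToolSpec` class and the `query_orders` and `search_web` tools are invented for illustration, and the rendered format is an assumption, not a required schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str
    use_when: str
    do_not_use_when: str  # where a sibling tool overlaps, name it here
    example: str

    def render(self) -> str:
        """Produce the description text shown to the model."""
        return (
            f"{self.name}: {self.purpose}\n"
            f"Use when: {self.use_when}\n"
            f"Do not use when: {self.do_not_use_when}\n"
            f"Example: {self.example}"
        )


# Hypothetical tool, written the way you would brief a junior engineer.
QUERY_ORDERS = ToolSpec(
    name="query_orders",
    purpose="Run a read-only query against the orders database.",
    use_when="the answer lives in structured order records (counts, totals, statuses).",
    do_not_use_when="the question is about public or unstructured information; "
                    "use search_web instead.",
    example='query_orders(customer_id="C-1042", status="open")',
)
```

Making `do_not_use_when` a required field forces whoever adds a tool to write the disambiguation rather than leave it implied.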

The second layer is permissions. Destructive tools — anything that writes, deletes, sends, or pays — are gated behind explicit confirmation. The model can call them, but the confirmation step is enforced outside the model. The agent does not have implicit permission to do irreversible things just because the model decided to call the tool.
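A sketch of a gate enforced outside the model, assuming tool calls arrive as a name plus keyword arguments. The tool names and the `confirm` callback are hypothetical; in practice `confirm` would route to a human or a policy service.

```python
from typing import Any, Callable

# Assumed set of destructive tools: anything that writes, deletes, sends, or pays.
DESTRUCTIVE_TOOLS = {"write_file", "delete_record", "send_email", "make_payment"}


def execute_tool_call(
    name: str,
    args: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
    confirm: Callable[[str, dict[str, Any]], bool],
) -> dict[str, Any]:
    """Dispatch a model-requested tool call, gating destructive tools on confirm()."""
    if name not in tools:
        return {"status": "error", "reason": f"unknown tool: {name}"}
    # The check runs in the dispatcher, so no prompt can talk its way around it.
    if name in DESTRUCTIVE_TOOLS and not confirm(name, args):
        return {"status": "blocked", "reason": "confirmation denied"}
    return {"status": "ok", "result": tools[name](**args)}
```

A blocked call is returned to the model as an ordinary observation, so the loop can continue on a non-destructive path or stop.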

Failure mode four: the context-window collapse

Long agent loops accumulate context. Every tool call adds tokens. Every observation adds tokens. The model's reasoning over the trajectory eats more tokens. By the tenth or twentieth step, the context window is filling up, and the model's behaviour starts to degrade — earlier observations get truncated, the plan drifts, the output quality drops.

The guardrails are around context management. We summarise the trajectory periodically and replace the verbose history with the summary. We never include the raw output of a tool that returned a large blob; we extract the relevant fields and discard the rest. We measure the context-window utilisation per step and alert if it crosses eighty percent.
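These three habits can be sketched as small helpers. The function names are illustrative, `summarise` is a placeholder for whatever summariser you use (typically another model call), and token counts are assumed to come from your own tokenizer or provider.

```python
from typing import Callable


def extract_fields(raw: dict, keep: tuple[str, ...]) -> dict:
    """Keep only the fields the agent needs from a large tool response."""
    return {key: raw[key] for key in keep if key in raw}


def over_threshold(tokens_used: int, window: int, threshold: float = 0.80) -> bool:
    """True once context utilisation reaches the alert threshold."""
    return tokens_used / window >= threshold


def compact(
    history: list[str],
    summarise: Callable[[list[str]], str],  # hypothetical summariser
    keep_last: int = 3,
) -> list[str]:
    """Replace all but the most recent steps with a single summary entry."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(history) <= keep_last:
        return list(history)
    return [summarise(history[:-keep_last])] + history[-keep_last:]
```

Running `over_threshold` on every step turns context growth into a metric you can alert on, rather than something discovered after quality has already dropped.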

The deeper architectural fix is to design agent loops as short by construction. A twenty-step loop that completes in five thousand tokens of context is reliable. A twenty-step loop that bloats to a hundred thousand tokens is not, no matter how large the window technically is.

Failure mode five: the unrecoverable side effect

The model called a tool that did something irreversible. The downstream consequence is real. The agent cannot now undo it, and neither can the operator without manual intervention.

The guardrails are policy, not technology. Tools are classified as reversible or irreversible. Reversible tools (read, search, classify, summarise) are available to all agent loops by default. Irreversible tools (send, write, delete, transact) are available only to specifically-tagged loops with an explicit human-in-the-loop step. The default for any new tool is irreversible until proven otherwise.
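Although the guardrail is policy, it is easy to encode so that the default cannot be forgotten. In this sketch the tool names mirror the examples above, while the `human_in_the_loop` tag and the function names are assumptions for illustration.

```python
from enum import Enum


class Effect(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


# Explicit classifications; anything absent from this table is treated as irreversible.
CLASSIFIED = {
    "read": Effect.REVERSIBLE,
    "search": Effect.REVERSIBLE,
    "classify": Effect.REVERSIBLE,
    "summarise": Effect.REVERSIBLE,
    "send": Effect.IRREVERSIBLE,
    "write": Effect.IRREVERSIBLE,
    "delete": Effect.IRREVERSIBLE,
    "transact": Effect.IRREVERSIBLE,
}


def effect_of(tool: str) -> Effect:
    """New or unclassified tools are irreversible until proven otherwise."""
    return CLASSIFIED.get(tool, Effect.IRREVERSIBLE)


def is_allowed(tool: str, loop_tags: frozenset[str]) -> bool:
    """Reversible tools are open to every loop; the rest need the tagged, human-gated loop."""
    if effect_of(tool) is Effect.REVERSIBLE:
        return True
    return "human_in_the_loop" in loop_tags
```

Because the lookup falls back to `IRREVERSIBLE`, adding a tool without classifying it fails closed.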

The architectural implication is that the most powerful agent loops are read-mostly. They observe the world, build a recommendation, present it to a human, and the human commits. Loops that genuinely act autonomously are reserved for narrow, well-bounded tasks where the recoverability of any side effect is well-understood.

The takeaway

Agent loops are not magic. They are a particular kind of distributed system, and the failure modes are the failure modes of distributed systems with stochastic components. Step budgets, wall-clock timeouts, structured plans, critic checks, tool-description discipline, permission gating, context management, and a default-to-irreversible policy on side effects — these are the guardrails that make agent loops tractable in production.

The best agent-loop architecture you can ship today is a conservative one. Bounded steps, narrow tools, explicit permissions, layered review. The capability frontier will widen in the next twelve months and you will be able to relax the bounds. The architecture you build now should support that relaxation. It should not require it.
