Agent teams aren't there yet

Claude Code shipped a Teams feature. Give a group of specialised agents a task and let them tackle it together. An orchestrator coordinates, and sub-agents execute. On paper, this is the future of autonomous software development. In practice, it breaks down fast.

I built a development pipeline with it. The intended flow: an engineer agent implements a feature, a reviewer checks the PR, the engineer addresses the review comments, and a QA agent runs tests. What actually happens: the orchestrator finishes the implementation, reports back with the PR, and stops. Not because of a bug in my setup. Because what I'd call its comprehension window is exhausted. The context window still has room. The model just can't use it. The review never starts.

The reason is structural. By the time the orchestrator finishes the creative work, its context is dense with codebase exploration, implementation decisions, and commit history. There is plenty of context space left, but the model can no longer make reliable use of everything in it, and when it tries, quality degrades. Research shows accuracy drops by more than 30% when critical information lands in the middle of a long context.

So you make the orchestrator a pure coordinator. It never touches code, just dispatches and collects results. That sidesteps the comprehension problem, but only for a few steps. Each sub-agent returns summaries, review comments, and test reports. The context still accumulates. By step five or six, the coordinator loses track. In one benchmark, a five-agent team cost seven times the tokens but produced only three times the output.
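To put that benchmark figure in per-unit terms, here is the arithmetic as a tiny sketch. The function name is mine, and the only inputs are the two multipliers quoted above.

```python
def tokens_per_unit_output(token_multiplier: float, output_multiplier: float) -> float:
    """Relative token cost of one unit of output, compared with the baseline."""
    return token_multiplier / output_multiplier


# Figures from the benchmark mentioned above: 7x the tokens, 3x the output.
ratio = tokens_per_unit_output(7, 3)
print(f"{ratio:.2f}x tokens per unit of output")
```

In other words, each unit of work from the team cost a bit more than twice what it would have cost otherwise.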

The pattern that actually works today is boring. A deterministic orchestrator outside the LLM. A workflow engine that does not have a context window and does not degrade at step twenty. Each agent gets a fresh context for its part of the work. Shared state lives in GitHub, not in a prompt.

Specialised agents collaborating through a team lead is how human teams work, and the vision is the right one. The gap is that today's agents lack the working memory to coordinate as humans do. That will change: checkpointing, external memory, better comprehension. The pattern will work. As of today, it is just not there yet.
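For the curious, the boring pattern can be sketched in a few lines. This is a minimal illustration under assumptions, not my actual pipeline: `SharedState` is a hypothetical stand-in for what would live in GitHub (the PR, review comments, check results), and the four agent functions are placeholders for calls that would each start a new LLM session.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedState:
    """Hypothetical stand-in for state kept in GitHub rather than in a prompt."""
    records: dict[str, str] = field(default_factory=dict)


# A step runs one agent with a fresh context. It sees only the shared
# state, never the transcript of an earlier agent.
Step = Callable[[SharedState], str]


def run_pipeline(steps: list[tuple[str, Step]], state: SharedState) -> list[str]:
    """Run the steps in a fixed order. Plain code decides what comes next."""
    completed = []
    for name, step in steps:
        state.records[name] = step(state)  # persist the result externally
        completed.append(name)
    return completed


# Placeholder agents (assumptions for illustration only).
def engineer(state: SharedState) -> str:
    return "PR opened"


def reviewer(state: SharedState) -> str:
    return f"reviewed: {state.records['engineer']}"


def fixer(state: SharedState) -> str:
    return f"addressed: {state.records['reviewer']}"


def qa(state: SharedState) -> str:
    return f"tested: {state.records['fixer']}"


PIPELINE = [("engineer", engineer), ("reviewer", reviewer), ("fixer", fixer), ("qa", qa)]

if __name__ == "__main__":
    state = SharedState()
    print(run_pipeline(PIPELINE, state))
    print(state.records["qa"])
```

The loop has no context window to fill, so step twenty runs exactly like step one. Each agent reads what it needs from the shared state and nothing else.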