I spend a lot of time thinking about systems. Not vibes, not hype — actual systems. How things compose. Where they break. What the failure mode looks like at 3 AM when you're debugging something that should have been deterministic but clearly isn't.
So when multi-agent AI systems went from "interesting research demo" to "shipped production feature" somewhere between late 2025 and now, I got obsessive about understanding them properly. Not the marketing. The architecture.
This post is my attempt at a deep dive into that architecture.
First, the obvious question: why can't one agent just do everything?
The naive mental model of an AI agent is a very smart intern. You give it a task, it does the task, you move on. And for a while, that worked fine — for small tasks, clean scopes, bounded contexts.
But here's the thing about context windows: they're not infinite, even when vendors market them that way. More importantly, a single agent handling everything is a monolith. And we already know what happens to monoliths at scale — they become god classes, they accrete complexity, they become impossible to reason about.
Anthropic's analysis of its multi-agent research system on the BrowseComp benchmark found that token usage alone explained 80% of performance variance. When a single agent's context fills up, performance degrades hard. The swarm architecture exists precisely to route around this constraint.
The industry figured this out fast. According to Gartner, inquiries about multi-agent systems surged 1,445% from Q1 2024 to Q2 2025. That's not a trend. That's a phase transition. The field had its "microservices moment" — the same realization that monolithic LLM scripts should give way to distributed, orchestrated agents the same way monolithic apps gave way to distributed services.
What a multi-agent system actually is
Strip away the buzz: a multi-agent system is just multiple LLM instances, each with their own context, tools, and instructions, that coordinate to complete a shared task.
The coordination is the hard part. And there are exactly five patterns that matter in 2026 production systems:
1. Fan-Out (Scatter-Gather)
A coordinator dispatches the same task — or N specialized subtasks — to multiple agents simultaneously, then aggregates results when all branches return. Wall-clock latency is bounded by the slowest branch, not the sum.
Best for: parallel code review across files, multi-source research, chunked document summarization.
The failure mode nobody talks about: partial-failure aggregation. If one branch errors, what do you do? Return partial results? Retry? Fail the whole request? Most implementations don't decide this at design time, and then they silently return incomplete answers that look complete. That's worse than an error.
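# Claude Agent SDK: fan-out
import asyncio
from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

async def review(src: str) -> str | None:
    # query() streams messages; the final ResultMessage carries the answer text
    async for message in query(
        prompt=f"Review this module for issues: {src}",
        options=ClaudeAgentOptions(max_turns=5),
    ):
        if isinstance(message, ResultMessage):
            return message.result
    return None

async def fan_out(sources: list[str]) -> list[str | None]:
    results = await asyncio.gather(*(review(s) for s in sources), return_exceptions=True)
    # Handle partial failures explicitly: a failed branch becomes None, so the
    # caller can see the gap instead of getting a list that looks complete
    return [None if isinstance(r, Exception) else r for r in results]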
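One way to force the partial-failure decision at design time is to make the policy a required argument, so nobody can call the fan-out without choosing. This is a minimal stdlib-only sketch; `FailurePolicy`, `FanOutResult`, `gather_with_policy`, and the stub workers are hypothetical names I made up for illustration, not part of any SDK.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class FailurePolicy(Enum):
    FAIL_ALL = "fail_all"   # any branch error fails the whole request
    PARTIAL = "partial"     # return what succeeded, but report the gaps


@dataclass
class FanOutResult:
    outputs: dict[str, str]   # branch name -> output
    failed: dict[str, str]    # branch name -> error description

    @property
    def complete(self) -> bool:
        return not self.failed


async def gather_with_policy(branches, policy: FailurePolicy) -> FanOutResult:
    """Run all branches concurrently and apply an explicit failure policy."""
    names = list(branches)
    results = await asyncio.gather(
        *(branches[n]() for n in names), return_exceptions=True
    )
    outputs, failed = {}, {}
    for name, res in zip(names, results):
        if isinstance(res, Exception):
            failed[name] = repr(res)
        else:
            outputs[name] = res
    if failed and policy is FailurePolicy.FAIL_ALL:
        raise RuntimeError(f"fan-out failed in branches: {sorted(failed)}")
    return FanOutResult(outputs, failed)


# Stub workers standing in for real agent calls (assumptions for the demo).
async def ok_branch() -> str:
    return "no issues found"


async def bad_branch() -> str:
    raise TimeoutError("agent timed out")
```

With `PARTIAL`, the caller gets a result whose `complete` flag is false and whose `failed` map names the missing branches, so an incomplete answer can't pass as a complete one.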
2. Pipeline (Sequential Chain)
Stage A → Stage B → Stage C. Each agent's output is the next agent's input. This is the oldest pattern and still the most abused one.
Best for: generate → critique → refine loops, ETL-style document processing, multi-step code transformation.
The failure mode: cascade contamination. A bad output from stage 2 poisons everything after it. Unlike fan-out, you can't recover from a partial failure — by the time stage 4 surfaces the error, it's already propagated through two more stages. Checkpointing matters here. LangGraph v1.0 ships Checkpointers for time-travel debugging exactly because of this.
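To make the checkpointing idea concrete, here is a small sketch of a pipeline that validates each stage's output before handing it on, and keeps the last good outputs around when a stage fails. It is plain Python, not LangGraph's Checkpointer API; `run_pipeline`, `StageError`, and the validator callables are names I'm assuming for the example.

```python
from typing import Callable

Stage = Callable[[str], str]
Validator = Callable[[str], bool]


class StageError(Exception):
    """Raised when a stage's output fails validation."""

    def __init__(self, stage: str, checkpoints: list[tuple[str, str]]):
        super().__init__(f"stage {stage!r} produced invalid output")
        self.stage = stage
        self.checkpoints = checkpoints  # validated outputs before the failure


def run_pipeline(
    text: str, stages: list[tuple[str, Stage, Validator]]
) -> list[tuple[str, str]]:
    """Run stages in order, checkpointing each validated output.

    A stage whose output fails its validator stops the run immediately,
    so bad output never reaches the next stage.
    """
    checkpoints: list[tuple[str, str]] = []
    current = text
    for name, stage, is_valid in stages:
        current = stage(current)
        if not is_valid(current):
            raise StageError(name, checkpoints)
        checkpoints.append((name, current))
    return checkpoints
```

The point of carrying `checkpoints` on the exception is that a retry can resume from the last validated output rather than from the beginning.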
3. Supervisor (Hierarchical Delegation)
A coordinator routes non-overlapping subtasks to specialist subagents, then synthesizes independent results. This is the 2026 production default — Claude Agent SDK's subagent model, LangGraph Supervisor, and OpenAI Agents SDK handoffs all converge on this topology.
For something like building Sastram (my forum app with a moderation module, AI workers, and a thread feed), a supervisor pattern makes sense: one agent handles the BullMQ job queue logic, another handles the Prisma schema migration, another reviews the WebSocket deduplication code. They don't see each other's intermediate work. The supervisor synthesizes.
// Pseudocode: supervisor pattern. `ClaudeAgent` and the assignTo* tools are
// illustrative names, not the Claude Agent SDK's actual API.
const supervisor = new ClaudeAgent({
  role: "orchestrator",
  tools: [assignToModeratorAgent, assignToWorkerAgent, assignToReviewAgent],
});
const result = await supervisor.run(`
  Refactor the god-class moderation module into three focused components:
  content filtering, user flagging, and escalation routing.
  Assign each to the appropriate specialist.
`);
4. Debate (Multi-Perspective Critique)
Same question → multiple agents → disagreement → judge model arbitrates. This is NOT supervisor. The agents are not doing independent tasks; they're arguing about the same task.
Microsoft's Copilot Council runs this pattern with GPT-5.4 and Claude in parallel. Cost: ~2.5× single-model baseline before you add the judge. A two-stage variant (generate then critique) adds ~20%.
Use debate when the stakes justify it. Don't use it as a default quality booster — you're paying for disagreement surface area, and most tasks don't benefit from that.
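The shape of the pattern is small enough to sketch with stub callables in place of real models. Everything here (`debate`, the `Agent` and `Judge` type aliases) is a hypothetical illustration, and skipping the judge on unanimous answers is my own design choice for the sketch, not something any framework prescribes.

```python
from typing import Callable

Agent = Callable[[str], str]
Judge = Callable[[str, list[str]], str]


def debate(question: str, debaters: list[Agent], judge: Judge) -> dict:
    """Ask every debater the same question; call the judge only on disagreement."""
    answers = [agent(question) for agent in debaters]
    if len(set(answers)) == 1:
        # Unanimous: skip the judge call. Agreement is not proof of correctness.
        return {"answer": answers[0], "judged": False, "answers": answers}
    return {"answer": judge(question, answers), "judged": True, "answers": answers}
```

Note that every debater is always called, which is where the cost multiplier comes from; the judge is the only call this sketch can skip.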
5. Swarm (Dynamic Peer Agents)
This is the frontier. Not a supervisor with three subagents — genuinely dynamic, peer-coordinated agents that spawn and coordinate without a static hierarchy.
Kimi K2.5 can coordinate up to 100 specialized sub-agents executing 1,500 tool calls in parallel via Parallel-Agent Reinforcement Learning. K2.6 (April 2026) pushed this to 300-agent swarms with 12-hour autonomous coding sessions. Nothing else ships swarm as a first-class primitive at that scale.
The coordination overhead is the real challenge. Expect diminishing returns beyond roughly 8 parallel workers, because the reconciliation step grows in complexity faster than parallelism saves time.
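A toy model shows why a sweet spot exists at all. Assume the parallel work shrinks as 1/n while reconciliation cost grows linearly with the number of workers. The constants below are made up so that the minimum lands at 8; they are not measurements, and real reconciliation cost may well grow faster than linearly.

```python
def wall_clock(
    workers: int, work: float = 64.0, reconcile_per_worker: float = 1.0
) -> float:
    """Toy model: parallel work shrinks as 1/n, reconciliation grows linearly in n."""
    return work / workers + reconcile_per_worker * workers


def best_worker_count(max_workers: int = 32, **model_params: float) -> int:
    """Worker count in 1..max_workers that minimizes the modeled wall-clock time."""
    return min(
        range(1, max_workers + 1), key=lambda n: wall_clock(n, **model_params)
    )
```

Under this model the optimum sits at the square root of `work / reconcile_per_worker`, so doubling the per-worker reconciliation cost pulls the best worker count down rather than up.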
Claude Code's Swarm Mode: what's actually under the hood
Claude Code released Swarm Mode in early 2026, and it's the clearest example of the supervisor pattern implemented at the tooling level.
The core insight: instead of one Claude instance managing a massive codebase and exhausting its context, multiple specialized agents share the workload. You're no longer talking to a lone AI programmer. You're talking to a Team Lead.
The TeammateTool architecture exposes 13 core operations — spawnTeam, assignTask, syncMessage, discoverTeams, and others. The most interesting piece is the isolation mechanism: each agent gets its own Git Worktree. Frontend agent works in worktree/frontend, backend agent in worktree/backend. They can't overwrite each other. Merge happens automatically after tests pass.
This is elegant because it maps the problem of agent state isolation directly onto a primitive that Git already solves well. No custom shared memory management. No complex locking. Just worktrees.
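For a sense of how little machinery this needs, here is a sketch that builds the `git worktree add` command for one agent. The `worktree/<agent>` layout follows the description above; the `agent/<name>` branch naming and both function names are my assumptions, not Claude Code's actual implementation.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, agent: str, base: str = "HEAD") -> list[str]:
    """Build the `git worktree add` command giving `agent` its own checkout and branch."""
    path = repo / "worktree" / agent
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", f"agent/{agent}",   # new branch for this agent (hypothetical naming)
        str(path),
        base,
    ]


def create_worktree(repo: Path, agent: str) -> None:
    # Runs git; raises CalledProcessError if the branch or path already exists.
    subprocess.run(worktree_add_command(repo, agent), check=True)
```

Each agent then edits files only under its own path, and integration is an ordinary branch merge.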
The context distribution insight: By distributing work across agents with independent context windows, you break through the single-agent context ceiling. Anthropic's BrowseComp data validates this — token usage is the binding constraint, and swarm architecture routes around it by design.
Orchestration patterns: picking the right one
The choice of pattern determines your cost structure, your failure surface, and which framework supports what you actually need.
| Pattern | Latency | Cost | Failure Mode | Use When |
|---|---|---|---|---|
| Fan-Out | Bounded by slowest branch | Low-medium | Partial-failure aggregation | Independent parallel subtasks |
| Pipeline | Additive (sum of stages) | Low | Cascade contamination | Sequential transforms |
| Supervisor | Coordination overhead | Medium | Single-point orchestrator | Cross-domain specialist delegation |
| Debate | Multiplied (2× minimum) | ~2.5× | Spurious agreement | High-stakes decisions |
| Swarm | Near-parallel | High (at scale) | Coordination collapse | Massively parallelizable work |
Start with supervisor. It's the default for a reason. Fan-out for genuinely parallel work. Pipeline when you have a clear sequential transform. Debate only when you can justify 2.5× cost. Swarm when you're building something that operates at a scale that justifies the coordination overhead.
The framework landscape
Six production-grade frameworks exist right now. Each has a fundamentally different philosophy:
LangGraph v1.0 — graph-based, spans all five patterns natively. Most broadly capable. Ships Checkpointers for time-travel debugging (critical for pipeline failures). The create_agent API replaced create_react_agent.
Claude Agent SDK (@anthropic-ai/claude-agent-sdk, renamed from @anthropic-ai/claude-code September 2025) — excels at supervisor and fan-out. The 2026 default for Claude Code deployments. Requires custom code for debate and swarm.
OpenAI Agents SDK (March 2025, replaced archived Swarm) — minimal approach: an agent is a model, tools, and a loop. Native handoffs. MCP support for standardized tool discovery.
Google ADK (April 2025) uses a completely different communication topology from Claude Code's decentralized, OS-centric approach. Where Claude Code treats agents as independent UNIX microservices communicating via file system and stdio, ADK centralizes state.
CrewAI — role-based, swarm-native at small scale. Good for document summarization pipelines, less good for dynamic task graphs.
AutoGen (Microsoft) — conversation-based multi-agent coordination. Power move is the nested conversation pattern — agents can spawn sub-conversations.
According to Langfuse's framework comparison, LangGraph leads in monthly searches with 27,100, followed by CrewAI at 14,800. Search volume ≠ production readiness, but it's a signal about which tooling ecosystems are getting real production investment.
The plumbing underneath: A2A and MCP
Two protocols are quietly becoming the substrate everything else builds on. They solve different problems, and understanding the difference matters.
MCP (Model Context Protocol) is agent-to-tool communication. Think of it as the USB-C for AI tools — a standardized way for an agent to discover and call external tools, APIs, and data sources. Anthropic donated it to the Linux Foundation in December 2025. By mid-2026, there are ~17,000 publicly listed MCP servers, 97 million monthly SDK downloads, and native MCP support in Claude, Cursor, Windsurf, VS Code with Copilot, ChatGPT, and Gemini.
The adoption is real. But so are the problems. More on that below.
A2A (Agent-to-Agent Protocol) is agent-to-agent communication. Originally developed by Google, now donated to the Linux Foundation. Where MCP connects an agent to its tools, A2A lets agents built on different frameworks, by different vendors, running on different servers, discover and collaborate with each other. At its one-year mark (April 2026), A2A has 150+ supporting organizations, deep integration across Google, Microsoft, and AWS, and production deployments in supply chain, financial services, and IT operations.
The mental model: MCP is how an agent talks to its tools. A2A is how agents talk to each other. They're complementary, not competing.
MCP: Agent → Tool (database, API, file system, etc.)
A2A: Agent → Agent (different frameworks, vendors, environments)
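On the MCP side, the wire format is JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. A minimal sketch of building one; the helper name `mcp_tool_call` and the `query_db` tool in the usage note are hypothetical.

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request (JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })
```

For example, `mcp_tool_call(1, "query_db", {"sql": "SELECT 1"})` produces the request an agent would send to a server exposing a tool by that name.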
The MCP reality check
Here's where the industry hype meets production reality.
The Zuplo MCP Report (2026) surveyed ~100 technical professionals. Security is by far the biggest obstacle — 50% of MCP server builders cite security and access control complexity as their primary challenge. 24% of MCP servers use no authentication at all. Authentication practices are uneven: 40% rely on API keys, while only 32% use more robust mechanisms like OAuth, JWT, or SSO.
A PolicyLayer audit of 2,031 MCP servers classified 31,000 tools. The findings are sobering:
- 1 in 4 MCP servers can delete or destroy data.
- 1 in 4 can execute arbitrary commands on its host.
- The average install hands the agent 15.5 tools, often more than 30.
- 96.1% of tools give no warning about what they do. The MCP specification ships with no built-in authorization, no rate limits, no spend caps, and no audit trail.
- Official, semi-official, and community registries show no meaningful risk gap. Seed-listed servers (hand-curated to bootstrap the ecosystem) are actually the highest-risk cohort.
The fix isn't banning destructive tools. It's enforcement at the transport layer — every tool call evaluated against a deterministic policy before it reaches the server. That layer doesn't exist yet in most deployments.
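What such a layer could look like, in miniature: a gate that sits between the agent and the server and checks every tool name against a fixed policy before forwarding. This is a sketch under my own assumptions (glob-style allow and deny lists, a per-tool call cap); `Policy` and `PolicyGate` are hypothetical names, not an existing product or part of the MCP spec.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    allow: tuple[str, ...] = ()    # glob patterns for permitted tool names
    deny: tuple[str, ...] = ()     # deny wins over allow
    max_calls_per_tool: int = 20   # crude per-session rate limit


@dataclass
class PolicyGate:
    policy: Policy
    counts: dict[str, int] = field(default_factory=dict)

    def check(self, tool: str) -> tuple[bool, str]:
        """Decide deterministically whether a tool call may be forwarded."""
        if any(fnmatchcase(tool, p) for p in self.policy.deny):
            return False, "denied by policy"
        if not any(fnmatchcase(tool, p) for p in self.policy.allow):
            return False, "not on allowlist"
        used = self.counts.get(tool, 0)
        if used >= self.policy.max_calls_per_tool:
            return False, "rate limit exceeded"
        self.counts[tool] = used + 1
        return True, "ok"
```

The decision depends only on the policy and the call history, never on model output, which is what makes it auditable.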
The honest take: where this breaks
Multi-agent systems fail in the same ways distributed systems always fail, just with extra indirection that makes debugging harder. But there are also failure modes that are genuinely new — ones that don't have analogues in traditional distributed systems.
The familiar failures
1. Silent partial failures: one branch fails, returns nothing, and the supervisor synthesizes from incomplete data without knowing it's incomplete.
2. Context bleed: agents are supposed to be isolated but share tool state or external side effects in ways that create subtle coupling.
3. Coordination collapse at scale: beyond ~8 parallel workers, reconciliation overhead grows faster than parallelism saves time. The 300-agent Kimi K2.6 swarms work because the tasks are designed for it. Don't cargo-cult that number.
4. Runaway loops: autonomous agents in a loop without proper exit conditions will just keep going. The max_turns parameter exists for a reason. Set it deliberately, not as an afterthought.
5. Cost explosions: debate pattern at 2.5× cost isn't an estimate, it's a floor. At scale, multi-agent systems burn tokens fast. Build cost monitoring before you build anything else.
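The runaway-loop and cost items share a fix: a hard budget that the loop cannot talk its way around. Here is a minimal sketch, assuming each step reports whether it is done and how many tokens it used; `RunBudget`, `BudgetExceeded`, and `run_loop` are hypothetical names, not SDK classes.

```python
class BudgetExceeded(Exception):
    """Raised when a run hits its turn or token cap."""


class RunBudget:
    """Hard caps on turns and tokens for one agent run."""

    def __init__(self, max_turns: int, max_tokens: int):
        self.max_turns, self.max_tokens = max_turns, max_tokens
        self.turns = 0
        self.tokens = 0

    def start_turn(self) -> None:
        # Checked before the step runs, so a capped run never starts an extra turn.
        if self.turns >= self.max_turns:
            raise BudgetExceeded(f"turn cap {self.max_turns} reached")
        self.turns += 1

    def add_tokens(self, used: int) -> None:
        self.tokens += used
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token cap {self.max_tokens} exceeded")


def run_loop(step, budget: RunBudget) -> int:
    """Call step() until it reports done; the budget guarantees termination."""
    while True:
        budget.start_turn()
        done, tokens_used = step()
        budget.add_tokens(tokens_used)
        if done:
            return budget.turns
```

The token cap is only checked after a step reports its usage, so a single step can still overshoot; the cap bounds the run, not the individual call.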
The new failures
1. Hallucination cascading: this one caught the industry off guard. When agents iteratively refine each other's outputs, hallucinations don't just persist; they propagate and mutate. A 2026 study (500 cascade experiments across 10 domains) found that while deeper cascades reduce the raw hallucination score, factual accuracy also drops from 0.789 to 0.769. The agents are "correcting" each other's style while quietly degrading the facts. This is especially dangerous in debate patterns, where the judge model inherits poisoned context from both sides and has no independent way to verify claims.
2. Consensus inertia: related to hallucination cascading, but a distinct problem. After enough rounds of inter-agent discussion, systems develop a false consensus. Minor errors from early rounds get cited, reinforced, and eventually treated as established facts by later rounds. The error isn't in any single agent; it's in the interaction pattern. Injecting a single atomic error seed into a multi-agent pipeline can lead to widespread failure across the system. The propagation follows the communication topology, not the content.
3. Emergent misalignment: in swarms especially, agents can develop behaviors that no individual agent was programmed to exhibit. This isn't "AI becoming sentient"; it's simpler and more boring. Agents optimizing for local subgoals can produce system-level behaviors that diverge from the original objective. It's the same principal-agent problem that makes microservices hard, except the "agents" can reason about their own incentives.
4. Debugging non-deterministic flows: traditional debugging assumes reproducibility. Multi-agent systems are inherently non-deterministic. The same input can produce different execution paths because LLM sampling is stochastic, tool responses vary, and timing affects coordination. AgentTrace (2026) proposes causal graph tracing from execution logs as a partial solution, but the fundamental problem remains: you're debugging a system where the "code" is natural language reasoning that changes between runs.
5. Context fragmentation: when you split work across agents, you split context too. The supervisor has a view of the whole, but each specialist only sees its slice. Edge cases that span multiple agents' responsibilities fall through the gaps. This is the distributed systems equivalent of a race condition, except the "shared state" is semantic meaning, and there's no lock for that.
The security picture nobody wants to talk about
This is the section I kept delaying writing, because the implications are uncomfortable. But here's the reality: multi-agent systems have a fundamentally larger attack surface than single-agent systems, and the security tooling hasn't caught up.
Prompt injection chains
In a single-agent system, prompt injection is a one-hop attack. In a multi-agent system, it's a chain reaction.
The Prompt Infection attack (2024) demonstrated that malicious prompts can self-replicate across interconnected agents, behaving like a computer virus. A single compromised agent spreads the infection to other agents, coordinating them to exchange data and issue instructions to agents with specific tools.
The MASPI evaluation framework (2026) tested 28 prompt injection attacks across three attack surfaces — external inputs, agent profiles, and inter-agent messages — with three objectives: instruction hijacking, task disruption, and information exfiltration. The findings are uncomfortable:
- Increasing topological complexity doesn't guarantee security. The risks are distributed across agents, with the most harmful agent varying depending on the attack objective.
- Defenses designed for single-agent prompt injection don't reliably transfer to multi-agent systems. Narrowly scoped defenses may even increase vulnerabilities to other attack types.
The conjunctive prompt attack is particularly insidious: a trigger key in the user query and a hidden adversarial template in one compromised remote agent each appear benign alone, but activate harmful behavior when routing brings them together. No single component appears malicious in isolation. Existing defenses — PromptGuard, Llama-Guard, tool restrictions — don't reliably stop it because they inspect components individually.
Tool poisoning via MCP
This is where the MCP security problem gets structural.
A tool's description is text the server controls, and it lands in the model's context as if it were a trusted instruction. A malicious server can hide a directive in that description (for example, telling the model to read the user's SSH key and pass it as a parameter) and an instruction-following model will comply. The user never sees the directive, because tool descriptions aren't shown in the interface.
In January and February 2026 alone, more than 30 MCP-related CVEs were filed, including a remote-code-execution flaw rated 9.6 on the CVSS scale. The underlying weakness is architectural: a tool's description is reviewed once when the agent first connects, but the tool's responses flow into the model afterward without any equivalent check.
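One partial mitigation for the "reviewed once" problem is to pin a hash of each tool definition at review time and refuse anything that no longer matches. The sketch below assumes tool definitions shaped like MCP's `name` / `description` / `inputSchema` fields; `fingerprint` and `changed_tools` are hypothetical helpers. It does nothing about poisoned tool responses; it only ensures the description that was reviewed is the one still being served.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Stable SHA-256 over the tool fields the model will see."""
    canonical = json.dumps(
        {
            "name": tool["name"],
            "description": tool["description"],
            "inputSchema": tool.get("inputSchema", {}),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(pinned: dict[str, str], current: list[dict]) -> list[str]:
    """Names of tools that are new or whose definition differs from the pinned hash."""
    return [t["name"] for t in current if pinned.get(t["name"]) != fingerprint(t)]
```

Any name returned by `changed_tools` should go back through human review before the agent is allowed to see it.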
Anthropic has characterized the STDIO execution behavior as "expected behavior" and declined to modify the protocol architecture, stating that sanitization is the responsibility of developers. This means each of the thousands of teams building MCP-based tooling must independently discover, understand, and remediate a class of vulnerability that the protocol's designers have declined to address at the root. As of mid-2026, at least 14 CVEs have been assigned to individual MCP-dependent projects, with patches available for some but not all.
Supply chain attacks through the tool ecosystem
The Agent Commander framework (March 2026) demonstrated that multiple AI agents from different vendors can be simultaneously compromised via prompt injection and enrolled into a unified command-and-control network. The operator issues natural language tasking through a centralized dashboard. This isn't theoretical — it's a working demonstration of promptware (prompt injection payloads complex enough to function as malware) operating as a multi-agent C2 platform.
The Promptware Kill Chain (January 2026, endorsed by Bruce Schneier) formalized this into a seven-phase framework: Initial Access, Privilege Escalation, Reconnaissance, Persistence, Command-and-Control, Lateral Movement, and Actions on Objective. Persistence appeared in a substantial proportion of documented attacks. Lateral movement grew from zero documented cases in 2023 to a significant category. The C2 stage — where behavior shifts from static payload execution to dynamic, attacker-modifiable direction — transforms a one-shot injection into a controllable trojan.
The OWASP Top 10 for Agentic Applications (2026) now includes:
- ASI01 (Agent Goal Hijack) — redirecting an agent's objective from user-directed tasks to attacker-directed participation.
- ASI06 (Memory Poisoning) — persistence mechanisms that survive across sessions.
- ASI07 (Insecure Inter-Agent Communication) — compromised agents passing instructions to other agents through shared communication channels.
Data exfiltration through agent coordination
When agents coordinate, they create implicit data flows that are hard to audit. A compromised agent in a supervisor pattern doesn't need to exfiltrate data directly — it can route sensitive information through a chain of seemingly legitimate inter-agent messages, where each hop looks like normal coordination. By the time the data leaves the system, it's been transformed enough that no single message looks suspicious.
This is the multi-agent equivalent of a side-channel attack, and it's especially hard to detect because the "communication" is natural language, not network packets. There's no packet sniffer for "agent A told agent B a summary that happens to contain a password."
Where things are headed — mid-2026 reality
The trajectory is clear: the industry is moving from single-agent autocomplete tools toward multi-agent pipelines that operate autonomously over hours, not seconds. Gartner projects 40% of enterprise applications will include task-specific agents by 2026. The multi-agent market is growing at 48.5% CAGR through 2030.
But the mid-2026 picture is more nuanced than the trajectory suggests.
The framework landscape is stratifying
The framework market has sorted itself into three tiers:
Tier 1 — Production-proven: LangGraph leads with 34.5 million monthly ecosystem downloads, deployed by ~400 firms including Klarna ($60M savings), Uber, and JP Morgan. Its durable execution model with checkpointing and time-travel debugging is the feature that tips the decision for teams running regulated or high-stakes workflows. When an agent makes a wrong decision at step 7 of a 12-step pipeline, LangGraph lets you replay from step 6 with modified state. Nobody else has that.
Tier 2 — Fastest to production: OpenAI Agents SDK (20,700+ GitHub stars) is winning teams that want speed-to-deployment and minimal abstraction overhead. Its April 2026 update introduced sandbox execution and a harness system — the same scaffolding that powers Codex. Claude Agent SDK is the most operationally capable single-provider framework, shipping the same architecture that powers Claude Code, but it's locked to Anthropic models and lacks native observability, durable execution, and state persistence. You build the platform infrastructure yourself.
Tier 3 — Specialized: CrewAI for fast prototyping (~35 lines to a working team), AG2 (the community successor to Microsoft's AutoGen) for conversation-based coordination, DSPy for compile-time optimized pipelines, Google ADK for deep GCP integration, Strands for AWS-native deployments.
The pragmatic advice: prototype with the framework closest to your existing model provider and infrastructure. The switching cost between frameworks is real but manageable — the business logic and prompt engineering transfer even if the orchestration code doesn't.
The protocol layer is consolidating
A2A has hit its stride. With 150+ organizations, Linux Foundation governance, and v1.0 shipping multi-protocol support, enterprise-grade multi-tenancy, and Signed Agent Cards for cryptographic identity verification, it's no longer a spec — it's infrastructure. The real signal is that Microsoft, AWS, and Google have all integrated it into their platforms. When the three major cloud providers agree on a standard, the standard wins.
MCP's adoption trajectory is even steeper — 97 million monthly SDK downloads, 17,000+ public servers, native support across every major AI client. But the security debt is accumulating faster than the ecosystem can service it. The March 2026 roadmap made enterprise readiness its top priority, but the responses so far have been extensions and best practices rather than protocol-level fixes. The structural vulnerabilities — tool descriptions as trusted instructions, no built-in authorization, no audit trail — remain.
What's actually missing
Three things the ecosystem doesn't have yet, and needs:
1. Evaluation frameworks for multi-agent systems. We have benchmarks for individual models. We don't have reliable ways to evaluate whether a multi-agent system is doing what it claims to be doing, especially for subjective or creative tasks. The hallucination cascading research shows that agent-to-agent refinement can degrade factual accuracy while appearing to improve quality. Without evaluation, you're shipping on vibes.
2. Observability at the orchestration layer. LangSmith exists, but it's a single-vendor solution. There's no OpenTelemetry for multi-agent systems: no standard way to trace a request across 5 agents, 12 tool calls, and 3 decision points, regardless of which frameworks and models are involved. This is table stakes for production, and it doesn't exist yet.
3. Security tooling that reasons about composition. Current security tools inspect individual agents or individual tool calls. The conjunctive prompt attack research shows that the most dangerous vulnerabilities emerge from the composition of individually benign components. You need security that reasons over the routing topology and cross-agent interactions, not just individual messages.
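On the observability gap, the minimum useful data shape is small: one trace ID per request, and a parent pointer on every agent step and tool call. This is a sketch of that shape only, not a standard; `Tracer` and its field names are my own assumptions, and it is not safe for concurrent use as written.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Tracer:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[dict] = field(default_factory=list)
    _stack: list[str] = field(default_factory=list)  # currently open span ids

    @contextmanager
    def span(self, kind: str, name: str):
        """Record one agent step or tool call, linked to its parent span."""
        span_id = uuid.uuid4().hex[:16]
        record = {
            "trace_id": self.trace_id,
            "span_id": span_id,
            "parent_id": self._stack[-1] if self._stack else None,
            "kind": kind,
            "name": name,
            "status": "ok",
        }
        self.spans.append(record)
        self._stack.append(span_id)
        try:
            yield record
        except Exception:
            record["status"] = "error"
            raise
        finally:
            self._stack.pop()
```

Wrapping the supervisor in one span and each delegated call in a nested span is enough to reconstruct the call tree afterward, whichever framework produced it.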
What this actually means for how I think about building things
I'm a final-year CS student who's been building full-stack systems for a while now. Sastram has BullMQ, Redis, WebSockets, AI workers, Prisma — it's not a toy. And the architectural challenges I've run into (god-class moderation module, monolithic AI worker, cross-module dependencies) are exactly the problems multi-agent architectures are designed to solve at scale.
The parallel I keep coming back to: multi-agent systems are to single-agent AI what microservices are to monoliths. The same tradeoffs. The same failure modes. The same "you probably don't need this at your current scale, but you need to understand it to know when you do."
Here's what I think the mental model should be for any developer working in this space right now:
An agent is just a service. It has inputs, outputs, tools (APIs), and state (context window). The orchestration layer is just your service mesh. The patterns (fan-out, supervisor, pipeline) map directly to distributed systems patterns you already know.
The newness is that the "service" can now reason about what to do with ambiguous inputs. The intelligence is distributed. And the coordination — the hard part — is the same distributed systems problem it always was.
But the security model is fundamentally different. In traditional distributed systems, you can reason about trust boundaries by looking at network topology and authentication mechanisms. In multi-agent systems, the "communication channel" is natural language, and the "authentication mechanism" is... prompt engineering. The OWASP Top 10 for Agentic Applications exists because the old threat models don't apply. Tool poisoning, memory poisoning, agent goal hijacking — these are new attack classes that require new defenses.
The architect's job is shifting from writing logic to designing coordination and securing it. That's a different skill set than what CS curricula teach. It's a distributed systems problem wrapped in an LLM API, with a security model that looks more like social engineering than network security.
I find that genuinely interesting. The hard part isn't the AI. The hard part is the plumbing — and making sure nobody can poison it.
If you build something with multi-agent orchestration and it breaks in an interesting way, I'd genuinely like to hear about it. The failure modes are where the real learning is.
— Pulkit