Anthropic's Managed Agents: Decoupling the Brain from the Hands

Anthropic decoupled Claude from execution environments and session state. The result: 60% faster p50 latency, crash-resistant harnesses, and interfaces designed to outlast any specific implementation as models improve.

TL;DR

  • Anthropic built Managed Agents as a hosted service that separates Claude (the brain) from execution environments (the hands) and session state
  • The architecture treats harnesses and containers as replaceable "cattle" instead of hand-tended "pets" — if anything crashes, it can be swapped without losing work
  • Decoupling dropped p50 time-to-first-token by 60% and p95 by over 90% by eliminating unnecessary container provisioning
  • This matters if you're building long-running agents: the interfaces are designed to outlast any specific implementation as models improve

The Big Picture

Anthropic has been writing about agent harnesses for months. The pattern is always the same: build a harness that compensates for what Claude can't do, then watch that harness become dead weight as the next model ships.

Case in point: Claude Sonnet 4.5 suffered from "context anxiety" — it would wrap up tasks prematurely when sensing its context limit approaching. Anthropic's solution was context resets in the harness. Then Claude Opus 4.5 dropped, and the behavior vanished. The resets became unnecessary overhead.

This is the core problem Managed Agents solves. It's a hosted service on the Claude Platform that runs long-horizon agents through interfaces designed to outlast any particular implementation — including the ones Anthropic runs today. The architecture borrows from operating systems: virtualize the components so the abstractions stay stable while implementations change freely underneath.

The read() syscall doesn't care if it's hitting a 1970s disk pack or a modern NVMe drive. Managed Agents applies the same principle to agent infrastructure. The session (event log), harness (orchestration loop), and sandbox (execution environment) are all virtualized. Swap any component without disturbing the others.

How It Works

Anthropic started with everything in one container: session, harness, and sandbox sharing an environment. File edits were direct syscalls. No service boundaries to design. Simple.

But coupling everything created a "pet" — a named, hand-tended server you can't afford to lose. If the container failed, the session died. If it hung, engineers had to nurse it back to health. Debugging meant opening a shell inside the container, which often held user data. Anthropic essentially lacked the ability to debug production issues.

The second problem: the harness assumed everything Claude worked on lived in the same container. When customers wanted Claude to access resources in their VPC, they had to peer their network with Anthropic's or run the harness themselves. A baked-in assumption became a deployment blocker.

The fix: decouple the "brain" (Claude and its harness) from the "hands" (sandboxes and tools) and the session (event log). Each became an interface with minimal assumptions about the others. Each could fail or be replaced independently.

The harness no longer lives inside the container. It calls the container like any other tool: execute(name, input) → string. The container becomes cattle: if it dies, the harness catches the failure as a tool-call error and passes it to Claude. If Claude retries, a new container spins up with provision({resources}). No more nursing failed containers.
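
This cattle pattern can be sketched in a few lines. The function names `execute` and `provision` follow the interfaces named in the article, but the types, error handling, and in-memory sandbox here are illustrative assumptions, not Anthropic's implementation:

```typescript
type Sandbox = { id: string; alive: boolean };

// Hypothetical stand-in: in the real service this boots a fresh container.
function provision(resources: { image: string }): Sandbox {
  return { id: `sbx-${Math.random().toString(36).slice(2, 8)}`, alive: true };
}

// Run a tool inside the sandbox; a dead sandbox throws instead of
// silently hanging the session.
function execute(sandbox: Sandbox, name: string, input: string): string {
  if (!sandbox.alive) throw new Error(`sandbox ${sandbox.id} is gone`);
  return `${name} ok: ${input}`;
}

// The harness wraps execute so a container crash surfaces to Claude as a
// tool-call error string, and a retry gets a freshly provisioned sandbox.
function callTool(
  sandbox: Sandbox,
  name: string,
  input: string
): { result: string; sandbox: Sandbox } {
  try {
    return { result: execute(sandbox, name, input), sandbox };
  } catch (e) {
    const fresh = provision({ image: "default" });
    return {
      result: `tool_error: ${(e as Error).message}; retrying on ${fresh.id}`,
      sandbox: fresh,
    };
  }
}
```

The key design choice is that the failure becomes data in the event stream rather than a process-level crash: Claude sees the error and decides whether to retry.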

The harness also became cattle. Because the session log sits outside the harness, nothing in the harness needs to survive a crash. When a harness fails, a new one boots with wake(sessionId), calls getSession(id) to retrieve the event log, and resumes from the last event. During the agent loop, the harness writes to the session with emitEvent(id, event) to maintain a durable record.
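
Crash-safe resumption follows directly from keeping the log outside the harness. A minimal sketch, assuming an in-memory store as a stand-in for durable session storage (the function names mirror the article's interfaces; the shapes are guesses):

```typescript
type SessionEvent = { seq: number; kind: string; data: string };

// Stand-in for the durable session store; the harness holds no state of its own.
const sessionStore = new Map<string, SessionEvent[]>();

function getSession(id: string): SessionEvent[] {
  return sessionStore.get(id) ?? [];
}

// Append-only writes with monotonically increasing sequence numbers.
function emitEvent(id: string, event: Omit<SessionEvent, "seq">): void {
  const log = sessionStore.get(id) ?? [];
  log.push({ seq: log.length, ...event });
  sessionStore.set(id, log);
}

// wake(sessionId): boot a brand-new harness and resume from the last event.
// Nothing from the dead harness is needed; the log is the only truth.
function wake(sessionId: string): { cursor: number } {
  const log = getSession(sessionId);
  return { cursor: log.length };
}
```

Because `wake` reconstructs everything from `getSession`, killing a harness mid-task costs nothing but the in-flight turn.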

Security boundaries matter. In the coupled design, untrusted code Claude generated ran in the same container as credentials. A prompt injection only had to convince Claude to read its own environment variables. Once an attacker has those tokens, they can spawn fresh sessions and delegate work.

The structural fix: tokens are never reachable from the sandbox where Claude's code runs. For Git, Anthropic uses each repository's access token to clone the repo during sandbox initialization and wires it into the local git remote. Git push and pull work from inside the sandbox without the agent ever handling the token. For custom tools, OAuth tokens live in a secure vault. Claude calls MCP tools via a proxy that fetches credentials from the vault and makes the external call. The harness never sees credentials.
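
The proxy boundary can be illustrated with a toy version. The vault contents and tool names below are hypothetical; the point is only that credential lookup happens on the proxy's side, so no token is ever in scope inside the sandbox:

```typescript
// Stand-in for the secure vault; lives on the proxy side of the boundary.
const vault = new Map<string, string>([["github", "ghp_secret_token"]]);

// The proxy resolves the credential and makes the external call itself.
function proxyCall(tool: string, input: string): string {
  const token = vault.get(tool);
  if (!token) return `tool_error: no credential for ${tool}`;
  // External call would happen here, authenticated with `token`.
  return `called ${tool} with input "${input}" (token stayed in the proxy)`;
}

// What the sandbox sees: a callable surface with no token anywhere in scope.
const sandboxApi = {
  call: (tool: string, input: string) => proxyCall(tool, input),
};
```

A prompt injection that dumps the sandbox's environment finds nothing, because the structural boundary, not Claude's judgment, is what keeps the token out of reach.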

What This Changes For Developers

The session is not Claude's context window. Long-horizon tasks often exceed context limits, and the standard fixes — compaction, selective trimming, memory tools — all involve irreversible decisions about what to keep. If you guess wrong about which tokens future turns will need, you're stuck.

Managed Agents treats the session as a context object that lives outside Claude's context window. The interface getEvents() lets the brain interrogate context by selecting positional slices of the event stream. The brain can pick up where it last stopped reading, rewind a few events before a specific moment, or reread context before an action. Fetched events can be transformed in the harness before being passed to Claude — context organization for prompt cache hit rates, context engineering, whatever the harness encodes.
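
Positional slicing is easy to picture as array slices over the append-only stream. The article only names getEvents(); the slice parameters and event shape below are assumptions for illustration:

```typescript
type Ev = { seq: number; text: string };

// Positional slice of the durable event stream: nothing is ever discarded,
// so any window can be re-read later.
function getEvents(log: Ev[], from: number, to?: number): Ev[] {
  return log.slice(from, to);
}

const log: Ev[] = Array.from({ length: 10 }, (_, i) => ({
  seq: i,
  text: `event ${i}`,
}));

// Pick up where the brain last stopped reading...
const fresh = getEvents(log, 7);

// ...or rewind to re-read a few events before a specific moment.
const rewound = getEvents(log, 4, 7);
```

Because the log is never truncated, "which tokens will future turns need?" stops being an irreversible decision: the harness can always fetch, transform, and re-present any slice.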

Decoupling the brain from the hands also fixed a performance problem. When the brain lived in a container, every session paid the full container setup cost up front — even sessions that would never touch the sandbox. Time-to-first-token (TTFT) included cloning repos, booting processes, fetching events.

Now containers are provisioned via a tool call only if needed. Inference starts as soon as the orchestration layer pulls pending events from the session log. Anthropic's p50 TTFT dropped roughly 60%, and p95 by over 90%.
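
The latency win comes from reordering the two costs. A minimal sketch of the lazy-provisioning idea, with hypothetical names and a stubbed "first token":

```typescript
type MaybeSandbox = { id: string } | null;

// Start producing output as soon as pending events are pulled from the log;
// only pay the container cost if this turn actually needs tools.
function runTurn(
  pendingEvents: string[],
  needsTools: boolean
): { firstToken: string; sandbox: MaybeSandbox } {
  const firstToken = `thinking about: ${pendingEvents[pendingEvents.length - 1]}`;
  const sandbox: MaybeSandbox = needsTools ? { id: "sbx-1" } : null;
  return { firstToken, sandbox };
}
```

Sessions that never touch the sandbox, which in the coupled design still paid for repo clones and process boot, now skip that cost entirely, which is where the p95 improvement concentrates.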

The architecture also enables multi-environment reasoning. Each hand is a tool: execute(name, input) → string. That interface supports custom tools, MCP servers, and Anthropic's own tools. The harness doesn't know whether the sandbox is a container, a phone, or a Pokémon emulator. Because no hand is coupled to any brain, brains can pass hands to one another.

Try It Yourself

Anthropic's Managed Agents documentation covers setup and API usage. The core interfaces are:

// Session management
getSession(id: string) → Session
emitEvent(id: string, event: Event) → void
getEvents() → Event[]

// Harness lifecycle
wake(sessionId: string) → Harness
provision(resources: Resources) → Sandbox

// Tool execution
execute(name: string, input: string) → string

The session log is append-only and durable. The harness is stateless and recoverable. The sandbox is provisioned on-demand and disposable. If you're building agents that need to survive crashes, handle long-running tasks, or connect to multiple execution environments, these interfaces give you the primitives to do it without coupling your implementation to Anthropic's.

The Bottom Line

Use Managed Agents if you're building production agents that need to survive infrastructure failures, operate across multiple environments, or run tasks longer than a single context window. The decoupled architecture means you're not locked into Anthropic's current harness implementation — the interfaces are designed to accommodate future harnesses as models improve.

Skip it if you're prototyping or running short-lived tasks where a single container failure means starting over anyway. The abstraction overhead isn't worth it for simple use cases.

The real opportunity here is that Anthropic is treating harness design as a moving target. Earlier work on context engineering and MCP integration showed that assumptions about what Claude can't do go stale fast. Managed Agents bets that the interfaces around Claude — session state, execution environments, orchestration — will outlast any specific implementation. If you're building agents that need to evolve with model capabilities, that's the right bet.

Source: Anthropic