Code Execution with MCP: How Anthropic Cuts Agent Costs by 98%
Anthropic shows how code execution with MCP cuts agent token usage by 98.7%. Instead of loading thousands of tool definitions upfront, agents discover tools on-demand and process data locally.
TL;DR
- Loading thousands of MCP tool definitions upfront burns tokens and slows agents down
- Code execution lets agents load tools on-demand and process data locally, cutting token usage by 98.7%
- Agents write code to interact with MCP servers instead of making direct tool calls
- This approach enables progressive tool discovery, context-efficient filtering, and privacy-preserving operations
The Big Picture
The Model Context Protocol launched in November 2024 and became the de facto standard for connecting AI agents to external systems. Developers have built thousands of MCP servers. Agents now routinely connect to hundreds or thousands of tools across dozens of servers.
But there's a problem. As agents scale up, they hit a wall. Loading all tool definitions upfront consumes massive amounts of context. Passing intermediate results through the model for every operation burns tokens and adds latency. An agent connected to 50 MCP servers might process 150,000 tokens of tool definitions before it even reads your request.
Anthropic's solution: stop making agents call tools directly. Instead, let them write code that interacts with MCP servers. The agent discovers tools by exploring a filesystem, loads only what it needs, and processes data in a secure execution environment. The result? A 98.7% reduction in token usage for common workflows.
This isn't just an optimization. It's a fundamental shift in how agents interact with external systems. Code execution transforms MCP from a direct tool-calling protocol into a programmable API layer that agents can compose, filter, and orchestrate using familiar software engineering patterns.
How It Works
Traditional MCP clients load every tool definition into the model's context window upfront. Each definition includes the tool name, description, parameters, and return types. For a Google Drive integration, that might look like:
```
gdrive.getDocument
Description: Retrieves a document from Google Drive
Parameters:
  documentId (required, string): The ID of the document to retrieve
  fields (optional, string): Specific fields to return
Returns: Document object with title, body content, metadata, permissions, etc.
```

Multiply this across dozens of servers with hundreds of tools each, and you're looking at hundreds of thousands of tokens before the agent does any actual work.
The second problem hits when agents chain tool calls together. Say you ask an agent to "Download my meeting transcript from Google Drive and attach it to the Salesforce lead." The model makes two calls:
```
TOOL CALL: gdrive.getDocument(documentId: "abc123")
→ returns full transcript text (loaded into context)

TOOL CALL: salesforce.updateRecord(
  objectType: "SalesMeeting",
  recordId: "00Q5f000001abcXYZ",
  data: { "Notes": "[full transcript text written out again]" }
)
```

The transcript flows through the model twice. For a 2-hour sales meeting, that's an additional 50,000 tokens. The model has to read it, hold it in context, then write it back out in the next tool call. This is inefficient and error-prone.
Code execution solves both problems by presenting MCP servers as code APIs instead of direct tool calls. The implementation generates a file tree of available tools:
```
servers
├── google-drive
│   ├── getDocument.ts
│   └── index.ts
├── salesforce
│   ├── updateRecord.ts
│   └── index.ts
└── ... (other servers)
```

Each tool corresponds to a TypeScript file with a typed function interface:
```typescript
// ./servers/google-drive/getDocument.ts
import { callMCPTool } from "../../../client.js";

interface GetDocumentInput {
  documentId: string;
}

interface GetDocumentResponse {
  content: string;
}

export async function getDocument(input: GetDocumentInput): Promise<GetDocumentResponse> {
  return callMCPTool<GetDocumentResponse>('google_drive__get_document', input);
}
```

Now the agent discovers tools by exploring the filesystem. It lists the ./servers/ directory to find available servers, then reads only the specific tool files it needs. The Google Drive to Salesforce workflow becomes:
```typescript
import * as gdrive from './servers/google-drive';
import * as salesforce from './servers/salesforce';

const transcript = (await gdrive.getDocument({ documentId: 'abc123' })).content;
await salesforce.updateRecord({
  objectType: 'SalesMeeting',
  recordId: '00Q5f000001abcXYZ',
  data: { Notes: transcript }
});
```

The agent loads two tool definitions instead of thousands. The transcript stays in the execution environment—it never flows through the model's context window. Token usage drops from 150,000 to 2,000. That's a 98.7% reduction.
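The wrapper files themselves are mechanical to produce from a server's tool list. Here is a minimal sketch of such a generator; the `ToolMeta` shape and the `toolToModuleSource` helper are illustrative assumptions, not part of any published reference implementation:

```typescript
// Sketch: emit a typed wrapper module for one MCP tool.
// ToolMeta and toolToModuleSource are hypothetical names for illustration.
interface ToolMeta {
  serverName: string;                    // e.g. "google-drive"
  toolName: string;                      // e.g. "get_document"
  inputFields: Record<string, string>;   // field name -> TypeScript type
  outputFields: Record<string, string>;
}

// "get_document" -> "getDocument"
function toCamel(name: string): string {
  return name.replace(/[_-](\w)/g, (_, c) => c.toUpperCase());
}

// "get_document" -> "GetDocument"
function toPascal(name: string): string {
  const camel = toCamel(name);
  return camel.charAt(0).toUpperCase() + camel.slice(1);
}

// Produce the source text for ./servers/<server>/<tool>.ts
function toolToModuleSource(meta: ToolMeta): string {
  const fn = toCamel(meta.toolName);
  const typeBase = toPascal(meta.toolName);
  const fields = (obj: Record<string, string>) =>
    Object.entries(obj).map(([k, t]) => `  ${k}: ${t};`).join("\n");
  const mcpName = `${meta.serverName.replace(/-/g, "_")}__${meta.toolName}`;
  return [
    `import { callMCPTool } from "../../../client.js";`,
    ``,
    `interface ${typeBase}Input {`,
    fields(meta.inputFields),
    `}`,
    ``,
    `interface ${typeBase}Response {`,
    fields(meta.outputFields),
    `}`,
    ``,
    `export async function ${fn}(input: ${typeBase}Input): Promise<${typeBase}Response> {`,
    `  return callMCPTool<${typeBase}Response>('${mcpName}', input);`,
    `}`,
  ].join("\n");
}

const src = toolToModuleSource({
  serverName: "google-drive",
  toolName: "get_document",
  inputFields: { documentId: "string" },
  outputFields: { content: "string" },
});
console.log(src); // a module equivalent to the getDocument.ts example above
```

Running this once per tool at connection time yields the file tree the agent explores.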
Cloudflare published similar findings in their "Code Mode" research. The core insight is the same: LLMs are excellent at writing code. Developers should leverage this strength to build agents that interact with MCP servers more efficiently.
What This Changes For Developers
Code execution with MCP enables several patterns that weren't practical with direct tool calling.
Progressive tool discovery. Models are good at navigating filesystems. Presenting tools as code files lets agents load definitions on-demand. An agent working with a Salesforce integration can search for "salesforce" tools and load only what it needs for the current task. You can also implement a search_tools function with detail levels—name only, name and description, or full schema—so agents conserve context while finding tools efficiently.
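The search_tools idea can be sketched as a single helper with three detail levels. The registry contents and return shapes below are illustrative assumptions, not a published MCP API:

```typescript
// Sketch of a search_tools helper with graduated detail levels, so the agent
// spends only as much context as it needs to find the right tool.
type Detail = "name" | "description" | "schema";

type ToolEntry = {
  name: string;
  description: string;
  schema: Record<string, string>; // parameter name -> type, simplified
};

// Hypothetical registry standing in for the connected servers' tool lists.
const registry: ToolEntry[] = [
  { name: "salesforce.updateRecord", description: "Update a Salesforce record",
    schema: { objectType: "string", recordId: "string", data: "object" } },
  { name: "salesforce.query", description: "Run a SOQL query",
    schema: { soql: "string" } },
  { name: "google-drive.getDocument", description: "Fetch a Drive document",
    schema: { documentId: "string" } },
];

// Match on name or description; trim each result to the requested detail level.
function searchTools(query: string, detail: Detail = "name"): Array<Record<string, unknown>> {
  return registry
    .filter(t => t.name.includes(query) || t.description.toLowerCase().includes(query))
    .map(t => {
      if (detail === "name") return { name: t.name };
      if (detail === "description") return { name: t.name, description: t.description };
      return t; // full schema
    });
}

console.log(searchTools("salesforce"));           // names only: minimal context
console.log(searchTools("salesforce", "schema")); // full entries when needed
```

The agent starts at the cheapest level and escalates only for the tools it actually intends to call.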
Context-efficient filtering. When working with large datasets, agents can filter and transform results in code before returning them. Consider fetching a 10,000-row spreadsheet:
```
// Without code execution - all rows flow through context
TOOL CALL: gdrive.getSheet(sheetId: 'abc123')
→ returns 10,000 rows to filter manually

// With code execution - filter in the execution environment
const allRows = await gdrive.getSheet({ sheetId: 'abc123' });
const pendingOrders = allRows.filter(row => row["Status"] === 'pending');
console.log(`Found ${pendingOrders.length} pending orders`);
console.log(pendingOrders.slice(0, 5)); // Only log first 5
```

The agent sees five rows instead of 10,000. Similar patterns work for aggregations, joins across data sources, or extracting specific fields—all without bloating the context window.
Control flow in code. Loops, conditionals, and error handling use familiar programming patterns instead of chaining individual tool calls. If you need to wait for a deployment notification in Slack, the agent writes:
```typescript
let found = false;
while (!found) {
  const messages = await slack.getChannelHistory({ channel: 'C123456' });
  found = messages.some(m => m.text.includes('deployment complete'));
  if (!found) await new Promise(r => setTimeout(r, 5000));
}
console.log('Deployment notification received');
```

This is more efficient than alternating between MCP tool calls and sleep commands through the agent loop. It also reduces "time to first token" latency—the code execution environment evaluates conditionals instead of waiting for the model to process each branch.
Privacy-preserving operations. Intermediate results stay in the execution environment by default. The agent only sees what you explicitly log or return. Data you don't want to share with the model can flow through your workflow without entering the model's context.
For sensitive workloads, the agent harness can tokenize data automatically. Imagine importing customer contact details from a spreadsheet into Salesforce. The agent writes code to read the sheet and update records. The MCP client intercepts the data and tokenizes PII before it reaches the model. Real email addresses, phone numbers, and names flow from Google Sheets to Salesforce, but never through the model. This prevents the agent from accidentally logging or processing sensitive data.
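A minimal sketch of that tokenization layer, assuming the client intercepts strings before they reach the model (the regex, token format, and helper names are illustrative only):

```typescript
// Sketch of PII tokenization in the MCP client layer. The model only ever
// sees opaque tokens; the client keeps the real values and restores them on
// the way out to the target system.
const vault = new Map<string, string>(); // token -> real value
let counter = 0;

// Simplified email matcher; a real harness would cover phones, names, etc.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;

// Replace each email with a stable token before data can reach the model.
function tokenize(text: string): string {
  return text.replace(EMAIL, (match) => {
    const token = `<EMAIL_${++counter}>`;
    vault.set(token, match);
    return token;
  });
}

// Swap tokens back for real values before data leaves for the target system.
function detokenize(text: string): string {
  let out = text;
  for (const [token, real] of vault) out = out.split(token).join(real);
  return out;
}

const row = "Jane Doe, jane@example.com, pending";
const safe = tokenize(row);        // what the model may see in logs
const restored = detokenize(safe); // what Salesforce actually receives
console.log(safe);                 // "Jane Doe, <EMAIL_1>, pending"
console.log(restored === row);     // true
```

The round trip happens entirely in the client, so even an agent that logs intermediate values cannot leak the real data into context.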
State persistence and skills. Code execution with filesystem access lets agents maintain state across operations. Agents can write intermediate results to files, enabling them to resume work and track progress. They can also persist their own code as reusable functions. Once an agent develops working code for a task, it saves that implementation for future use. This ties into the concept of Skills—folders of reusable instructions, scripts, and resources that improve performance on specialized tasks. Over time, agents build a toolbox of higher-level capabilities.
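A sketch of what that looks like inside the execution environment; the file layout and names here are assumptions for illustration:

```typescript
// Sketch: checkpoint intermediate results to disk and save working code as a
// reusable "skill". Paths and file layout are illustrative assumptions.
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

const workDir = fs.mkdtempSync(path.join(os.tmpdir(), "agent-"));

// Checkpoint: persist intermediate results so a later run can resume here
// instead of refetching and refiltering the source data.
const pending = [{ id: 1, status: "pending" }, { id: 2, status: "pending" }];
fs.writeFileSync(path.join(workDir, "pending-orders.json"), JSON.stringify(pending));

// Skill: once code works, save it as a function for future sessions.
const skill = `export function pendingOnly(rows: { status: string }[]) {
  return rows.filter(r => r.status === "pending");
}`;
fs.mkdirSync(path.join(workDir, "skills"), { recursive: true });
fs.writeFileSync(path.join(workDir, "skills", "pending-only.ts"), skill);

// A later run resumes from the checkpoint and reuses the saved skill.
const resumed = JSON.parse(
  fs.readFileSync(path.join(workDir, "pending-orders.json"), "utf8"),
);
console.log(`resumed ${resumed.length} rows; skills: ${fs.readdirSync(path.join(workDir, "skills")).join(", ")}`);
```

Each successful task leaves behind both data checkpoints and callable code, which is how the toolbox of higher-level capabilities accumulates.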
Try It Yourself
The article includes a reference implementation in TypeScript. The core pattern is straightforward: generate a file tree of MCP tools, expose them as typed functions, and let the agent explore the filesystem to discover what it needs.
Here's a minimal example of the tool file structure:
```typescript
// ./servers/google-drive/getDocument.ts
import { callMCPTool } from "../../../client.js";

interface GetDocumentInput {
  documentId: string;
}

interface GetDocumentResponse {
  content: string;
}

export async function getDocument(input: GetDocumentInput): Promise<GetDocumentResponse> {
  return callMCPTool<GetDocumentResponse>('google_drive__get_document', input);
}
```

The agent imports and calls this like any other TypeScript function. The callMCPTool function handles the actual MCP protocol communication.
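The bridge itself isn't shown here; a stubbed sketch of what callMCPTool might look like, with an in-memory handler map standing in for a real MCP connection (a real client would speak the protocol over stdio or HTTP via an SDK):

```typescript
// Stubbed sketch of the callMCPTool bridge. The transport map is a stand-in
// for a live MCP connection, keyed by flattened "<server>__<tool>" names.
type ToolHandler = (args: any) => Promise<unknown>;

const transport = new Map<string, ToolHandler>([
  // Stand-in for the google-drive server's get_document tool.
  ["google_drive__get_document", async (args) => ({
    content: `contents of ${args.documentId}`,
  })],
]);

// Every generated wrapper file funnels through this one function.
async function callMCPTool<T>(name: string, args: unknown): Promise<T> {
  const handler = transport.get(name);
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return (await handler(args)) as T;
}

// Usage, mirroring the generated getDocument wrapper:
async function main(): Promise<string> {
  const doc = await callMCPTool<{ content: string }>(
    "google_drive__get_document",
    { documentId: "abc123" },
  );
  return doc.content;
}

main().then(content => console.log(content)); // prints "contents of abc123"
```

Because every wrapper shares this single entry point, sandboxing, logging, and the tokenization described earlier can all live in one place.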
You'll need a secure execution environment with appropriate sandboxing, resource limits, and monitoring. These infrastructure requirements add operational overhead that direct tool calls avoid. The benefits—reduced token costs, lower latency, improved tool composition—should be weighed against implementation complexity.
For more details on the implementation, see the Model Context Protocol documentation and join the MCP community to share your findings.
The Bottom Line
Use code execution with MCP if you're connecting agents to more than a dozen tools or working with large datasets that need filtering before they reach the model. The 98.7% token reduction is real, and the ability to compose complex workflows in code instead of chaining tool calls is a significant developer experience improvement.
Skip it if you're building simple agents with a handful of tools and small payloads. The infrastructure overhead—secure sandboxing, resource limits, monitoring—isn't worth it for straightforward use cases where direct tool calling works fine.
The real opportunity here is that code execution transforms MCP from a tool-calling protocol into a programmable API layer. Agents can now use established software engineering patterns—progressive disclosure, data transformation pipelines, state management—to interact with external systems. This is how Claude handles 1,000+ tools without blowing its context window. As MCP adoption scales, code execution will become the default pattern for building efficient, composable agents.
Source: Anthropic