Code Execution with MCP: Why Your AI Agent Is Wasting Tokens

Loading thousands of MCP tool definitions upfront burns 150,000+ tokens before your agent starts working. Code execution cuts that by 98.7% by loading tools on-demand and processing data locally. Here's how it works.

TL;DR

  • Loading thousands of MCP tool definitions upfront can burn 150,000+ tokens before your agent even starts working
  • Code execution lets agents load tools on-demand and process data locally, cutting token usage by 98.7% in real-world tests
  • Presenting MCP servers as code APIs instead of direct tool calls solves both context bloat and intermediate result overhead
  • This matters if you're connecting agents to dozens of MCP servers — the traditional approach doesn't scale

The Big Picture

The Model Context Protocol launched in November 2024 and became the de facto standard for connecting AI agents to external systems. Developers have built thousands of MCP servers. Agents now routinely connect to hundreds or thousands of tools across dozens of servers.

But there's a problem. The standard approach — loading all tool definitions upfront and passing every intermediate result through the model's context window — breaks down at scale. You end up burning 150,000 tokens just to load tool definitions before your agent reads a single user request. A simple "copy this Google Doc to Salesforce" task can process a 50,000-token transcript twice, once to read it and once to write it.

Anthropic's engineering team identified two specific bottlenecks: tool definitions overload the context window, and intermediate tool results consume additional tokens unnecessarily. Their solution is straightforward: present MCP servers as code APIs instead of direct tool calls. Let the agent write code that runs in an execution environment. Load only the tools you need. Process data locally before passing results back to the model.

This isn't theoretical. Cloudflare published similar findings and called it "Code Mode." The core insight is the same: LLMs are good at writing code. Use that strength.

How It Works

Most MCP clients load all tool definitions directly into context. Each tool gets a description, parameters, and return schema. A Google Drive tool might look like this:

gdrive.getDocument
Description: Retrieves a document from Google Drive
Parameters:
  documentId (required, string): The ID of the document to retrieve
  fields (optional, string): Specific fields to return
Returns: Document object with title, body content, metadata, permissions, etc.

Multiply that by a thousand tools and you're processing hundreds of thousands of tokens before the agent does anything useful. The model then orchestrates a message loop where each tool call and result passes through its context window.

The code execution approach generates a file tree of available tools from connected MCP servers instead:

servers
├── google-drive
│   ├── getDocument.ts
│   ├── ... (other tools)
│   └── index.ts
├── salesforce
│   ├── updateRecord.ts
│   ├── ... (other tools)
│   └── index.ts
└── ... (other servers)

Each tool corresponds to a file. The Google Drive document retrieval tool becomes:

// ./servers/google-drive/getDocument.ts
import { callMCPTool } from "../../../client.js";

interface GetDocumentInput {
  documentId: string;
}

interface GetDocumentResponse {
  content: string;
}

/* Read a document from Google Drive */
export async function getDocument(input: GetDocumentInput): Promise<GetDocumentResponse> {
  return callMCPTool<GetDocumentResponse>('google_drive__get_document', input);
}

The agent discovers tools by exploring the filesystem. It lists the ./servers/ directory to find available servers, then reads specific tool files it needs. This is progressive disclosure — load only what you need, when you need it.
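As a sketch of that discovery step (the `discoverServers`/`discoverTools` helper names are illustrative, not part of any MCP SDK; only the `./servers/` layout follows the example tree above), an agent harness might enumerate servers and tool files like this:

```typescript
// Sketch of progressive disclosure: list server directories, then read
// only the tool files the agent actually needs. Helper names are
// hypothetical; the directory layout mirrors the example tree above.
import * as fs from "node:fs/promises";
import * as path from "node:path";

// List server directories under the generated ./servers/ tree.
export async function discoverServers(root: string): Promise<string[]> {
  const entries = await fs.readdir(root, { withFileTypes: true });
  return entries.filter((e) => e.isDirectory()).map((e) => e.name).sort();
}

// List the tool files inside one server directory (one file per tool).
export async function discoverTools(root: string, server: string): Promise<string[]> {
  const entries = await fs.readdir(path.join(root, server));
  return entries.filter((f) => f.endsWith(".ts") && f !== "index.ts").sort();
}

// Read a single tool definition only when the agent asks for it.
export async function readToolSource(root: string, server: string, tool: string): Promise<string> {
  return fs.readFile(path.join(root, server, tool), "utf-8");
}
```

The agent pays context tokens only for the files it actually reads, rather than for every schema up front.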

A task like "Download my meeting transcript from Google Drive and attach it to the Salesforce lead" becomes:

// Read transcript from Google Docs and add to Salesforce prospect
import * as gdrive from './servers/google-drive';
import * as salesforce from './servers/salesforce';

const transcript = (await gdrive.getDocument({ documentId: 'abc123' })).content;
await salesforce.updateRecord({
  objectType: 'SalesMeeting',
  recordId: '00Q5f000001abcXYZ',
  data: { Notes: transcript }
});

The transcript flows from Google Drive to Salesforce through the execution environment. It never passes through the model's context window. For a 2-hour sales meeting with a 50,000-token transcript, this saves 100,000 tokens — you don't read it into context, then write it back out.

Anthropic's tests showed token usage dropping from 150,000 to 2,000 tokens for agents connected to many tools. That's a 98.7% reduction. Cloudflare reported similar numbers.

The execution environment also handles filtering and transformation. Fetching a 10,000-row spreadsheet and filtering for pending orders looks like this:

// Filter a large sheet inside the execution environment
import * as gdrive from './servers/google-drive';

const allRows = await gdrive.getSheet({ sheetId: 'abc123' });
const pendingOrders = allRows.filter(row => row["Status"] === 'pending');
console.log(`Found ${pendingOrders.length} pending orders`);
console.log(pendingOrders.slice(0, 5)); // Only log first 5 for review

The agent sees five rows instead of 10,000. The same pattern works for aggregations, joins across multiple data sources, or extracting specific fields.
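To illustrate that in-environment transformation with a join (the row and record shapes here are invented for the sketch), combining a sheet export with CRM records might look like:

```typescript
// Sketch: join spreadsheet rows against CRM records inside the execution
// environment, so the model only ever sees the small joined result.
// The row/record shapes are hypothetical.
interface SheetRow { orderId: string; status: string; }
interface CrmRecord { orderId: string; owner: string; }

export function joinPendingOrders(rows: SheetRow[], records: CrmRecord[]) {
  const ownerByOrder = new Map(records.map((r) => [r.orderId, r.owner]));
  return rows
    .filter((row) => row.status === "pending")
    .map((row) => ({
      orderId: row.orderId,
      owner: ownerByOrder.get(row.orderId) ?? "unassigned",
    }));
}
```

Ten thousand input rows can collapse to a handful of joined records before anything reaches the model's context.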

Control flow becomes more efficient too. Loops, conditionals, and error handling use familiar code patterns instead of chaining individual tool calls. Waiting for a deployment notification in Slack becomes:

// Poll Slack until the deployment notification arrives
import * as slack from './servers/slack';

let found = false;
while (!found) {
  const messages = await slack.getChannelHistory({ channel: 'C123456' });
  found = messages.some(m => m.text.includes('deployment complete'));
  if (!found) await new Promise(r => setTimeout(r, 5000));
}
console.log('Deployment notification received');

This is more efficient than alternating between MCP tool calls and sleep commands through the agent loop. It also reduces time-to-first-token latency — the execution environment evaluates conditionals instead of waiting for the model.

What This Changes For Developers

Code execution with MCP makes context management a solved problem. You stop worrying about how many tools to expose or whether intermediate results will blow up your token budget. The agent loads tools on-demand and processes data locally.

Privacy gets easier too. Intermediate results stay in the execution environment by default. The agent only sees what you explicitly log or return. Data flows through your workflow without entering the model's context. For sensitive workloads, the agent harness can tokenize data automatically. Importing customer contact details from a spreadsheet into Salesforce looks like this:

// Copy contact details from a sheet into Salesforce leads
import * as gdrive from './servers/google-drive';
import * as salesforce from './servers/salesforce';

const sheet = await gdrive.getSheet({ sheetId: 'abc123' });
for (const row of sheet.rows) {
  await salesforce.updateRecord({
    objectType: 'Lead',
    recordId: row.salesforceId,
    data: {
      Email: row.email,
      Phone: row.phone,
      Name: row.name
    }
  });
}
console.log(`Updated ${sheet.rows.length} leads`);

The MCP client intercepts the data and tokenizes PII before it reaches the model. The agent sees [EMAIL_1], [PHONE_1], [NAME_1] instead of real values. When the data flows onward to Salesforce, it's detokenized via a lookup in the MCP client. Real email addresses and phone numbers move from Google Sheets to Salesforce without ever passing through the model.
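A minimal sketch of that interception layer (the `tokenize`/`detokenize` names and the in-memory map are assumptions; a real harness would do this inside the MCP client, with persistent storage):

```typescript
// Sketch of PII tokenization in the harness: real values are swapped for
// placeholders before results reach the model, then swapped back on the
// way out to downstream tools. In-memory map for illustration only.
const lookup = new Map<string, string>();
let counter = 0;

export function tokenize(value: string, kind: "EMAIL" | "PHONE" | "NAME"): string {
  counter += 1;
  const token = `[${kind}_${counter}]`;
  lookup.set(token, value); // remember the real value for the return trip
  return token;
}

export function detokenize(text: string): string {
  let out = text;
  for (const [token, value] of lookup) out = out.split(token).join(value);
  return out;
}
```

The model only ever reasons over placeholders; the lookup table stays on the harness side.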

State persistence becomes possible. Code execution with filesystem access lets agents write intermediate results to files. They can resume work and track progress:

// Export a query result to the workspace for a later run
import * as fs from 'node:fs/promises';
import * as salesforce from './servers/salesforce';

const leads = await salesforce.query({
  query: 'SELECT Id, Email FROM Lead LIMIT 1000'
});
const csvData = leads.map(l => `${l.Id},${l.Email}`).join('\n');
await fs.writeFile('./workspace/leads.csv', csvData);

// Later execution picks up where it left off
const saved = await fs.readFile('./workspace/leads.csv', 'utf-8');

Agents can also persist their own code as reusable functions. Once an agent develops working code for a task, it saves that implementation for future use. This ties into the concept of Skills — folders of reusable instructions, scripts, and resources that improve performance on specialized tasks. Over time, your agent builds a toolbox of higher-level capabilities.
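One way to sketch that persistence (the `./skills/` directory and the `saveSkill`/`listSkills` helpers are illustrative, not a defined part of MCP or Skills):

```typescript
// Sketch: persist working agent code as a reusable "skill" file that a
// later execution can discover and import. Directory name and helpers
// are hypothetical.
import * as fs from "node:fs/promises";
import * as path from "node:path";

export async function saveSkill(dir: string, name: string, source: string): Promise<string> {
  await fs.mkdir(dir, { recursive: true });
  const file = path.join(dir, `${name}.ts`);
  await fs.writeFile(file, source, "utf-8");
  return file;
}

export async function listSkills(dir: string): Promise<string[]> {
  try {
    return (await fs.readdir(dir)).filter((f) => f.endsWith(".ts")).sort();
  } catch {
    return []; // no skills saved yet
  }
}
```

On later runs the agent lists the skills directory the same way it lists `./servers/`, and imports what it finds instead of rewriting it.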

The tradeoff is infrastructure complexity. Running agent-generated code requires a secure execution environment with appropriate sandboxing, resource limits, and monitoring. These requirements add operational overhead and security considerations that direct tool calls avoid. Anthropic has written about their sandboxing approach before. The benefits — reduced token costs, lower latency, improved tool composition — need to be weighed against implementation costs.

Try It Yourself

The implementation details depend on your MCP client and execution environment. Anthropic's example uses TypeScript with a file tree structure. The core pattern is:

  1. Generate a filesystem representation of your MCP servers and tools
  2. Let the agent explore that filesystem to discover available tools
  3. Provide a callMCPTool function that bridges code execution to actual MCP tool calls
  4. Run the agent's code in a sandboxed environment with filesystem access
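Step 3 can be sketched like this (the handler registry is a stand-in for a real MCP client session; `registerTool` is a hypothetical harness helper):

```typescript
// Sketch of the callMCPTool bridge: generated tool files call this, and
// the harness routes the tool name to the real MCP client. A plain
// handler map stands in for an actual MCP session here.
type ToolHandler = (input: unknown) => Promise<unknown>;
const handlers = new Map<string, ToolHandler>();

// In a real harness, registration happens when MCP servers connect.
export function registerTool(name: string, handler: ToolHandler): void {
  handlers.set(name, handler);
}

export async function callMCPTool<T>(name: string, input: unknown): Promise<T> {
  const handler = handlers.get(name);
  if (!handler) throw new Error(`Unknown MCP tool: ${name}`);
  return (await handler(input)) as T;
}
```

The generated `getDocument.ts` shown earlier needs nothing more than this one entry point.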

You can also add a search_tools function to help agents find relevant tools without listing directories. Include a detail level parameter so the agent can request just names, names and descriptions, or full definitions with schemas.
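Such a `search_tools` helper might be shaped like this (the in-memory registry and the three detail levels are assumptions about one possible implementation):

```typescript
// Sketch of a search_tools helper with a detail-level knob, so the agent
// can request just names, names plus descriptions, or full schemas.
// The registry is illustrative, not a real MCP client API.
type Detail = "name" | "description" | "full";

interface ToolEntry { name: string; description: string; schema: object; }

const registry: ToolEntry[] = [];

export function addTool(entry: ToolEntry): void {
  registry.push(entry);
}

export function searchTools(query: string, detail: Detail = "name") {
  const q = query.toLowerCase();
  return registry
    .filter((t) => t.name.toLowerCase().includes(q) || t.description.toLowerCase().includes(q))
    .map((t) => {
      if (detail === "name") return { name: t.name };
      if (detail === "description") return { name: t.name, description: t.description };
      return t; // full: include the schema
    });
}
```

Cheap name-only results keep routine lookups small; the agent escalates to full schemas only when it is about to call a tool.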

The MCP community is actively exploring this pattern. If you implement it, share your findings on the MCP community channels.

The Bottom Line

Use code execution with MCP if you're connecting agents to more than a dozen MCP servers or if your workflows involve large intermediate results like documents, spreadsheets, or API responses. The token savings are real — 98.7% in Anthropic's tests — and the latency improvements matter for production workloads.

Skip it if you're building simple agents with a handful of tools and small data payloads. Direct tool calls are simpler to implement and debug. The infrastructure overhead of secure code execution isn't worth it for basic use cases.

The real opportunity here is that code execution turns context management from a scaling problem into a solved problem. You stop making tradeoffs between how many tools to expose and how much context budget you have left. The real risk is underestimating the security and operational complexity of running agent-generated code in production. Sandboxing, resource limits, and monitoring aren't optional — they're table stakes.

Source: Anthropic