How GitHub Cut Agentic Workflow Costs by 62% With Token Optimization
GitHub reduced token usage in production agentic workflows by up to 62% through systematic optimization. Here's how they instrumented token consumption, built self-optimizing workflows, and cut costs without sacrificing quality.
TL;DR
- GitHub reduced token usage in production agentic workflows by 43-62% through systematic optimization
- Most efficiency gains came from eliminating unused MCP tools and replacing LLM calls with deterministic CLI commands
- An API proxy logs all token usage across agent frameworks, feeding daily auditor and optimizer workflows
- Developers running agentic CI workflows can use the same techniques to cut costs without sacrificing quality
The Big Picture
GitHub runs hundreds of agentic workflows in production — automated agents that triage issues, audit security, maintain documentation, and clean up repos. These workflows run as GitHub Actions against real API rate limits, and because they trigger automatically on every issue, pull request, or commit, costs accumulate fast.
In April 2026, GitHub's team began systematically optimizing token usage across their most expensive workflows. The results are striking: Auto-Triage Issues dropped 62% in token consumption. Security Guard fell 43%. Smoke Claude, an integration test workflow, dropped 59%. All without changing what the workflows actually do.
This isn't theoretical optimization. GitHub is eating its own dog food here, running these workflows in production repos like gh-aw and gh-aw-firewall. The techniques they used — API-level observability, automated auditing, MCP tool pruning, and CLI substitution — are available today in the GitHub Agentic Workflows framework. If you're running agentic CI and wondering whether you're burning tokens unnecessarily, this is the playbook.
How It Works
The first problem was visibility. Each agent framework (Claude CLI, Copilot CLI, Codex CLI) logs differently, and historical usage data was incomplete. GitHub solved this by instrumenting their existing API proxy — the security layer that prevents agents from directly accessing auth credentials. Every workflow now outputs a token-usage.jsonl artifact with one record per API call: input tokens, output tokens, cache-read tokens, cache-write tokens, model, provider, timestamps.
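As a rough illustration of what such an artifact makes possible, the sketch below totals token counts per model from JSONL records. The key names (`input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`, `model`) are assumptions: the article lists what each record contains, not the exact schema.

```python
import json
from collections import defaultdict

# Hypothetical key names: the article lists the record contents,
# not the exact keys used in token-usage.jsonl.
TOKEN_FIELDS = ("input_tokens", "output_tokens",
                "cache_read_tokens", "cache_write_tokens")


def summarize_usage(lines):
    """Sum token counts per model from an iterable of JSONL lines."""
    totals = defaultdict(lambda: dict.fromkeys(TOKEN_FIELDS, 0))
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        bucket = totals[record.get("model", "unknown")]
        for field in TOKEN_FIELDS:
            bucket[field] += int(record.get(field, 0))
    return {model: dict(counts) for model, counts in totals.items()}


# Two made-up records standing in for one workflow run.
sample = [
    '{"model": "sonnet", "input_tokens": 1200, "output_tokens": 300, '
    '"cache_read_tokens": 800, "cache_write_tokens": 0}',
    '{"model": "sonnet", "input_tokens": 400, "output_tokens": 100, '
    '"cache_read_tokens": 1600, "cache_write_tokens": 50}',
]
usage = summarize_usage(sample)
print(usage)
```

With a real artifact, the same function could be fed an open file handle, since iterating a file yields one line at a time.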
With token data in hand, GitHub built two daily optimization workflows. A Daily Token Usage Auditor reads recent artifacts, aggregates consumption by workflow, and flags anomalies — workflows that suddenly spike in usage, or runs that take 18 LLM turns when they normally take four. When the Auditor flags a workflow, a Daily Token Optimizer analyzes the workflow's source and logs, then creates a GitHub issue describing concrete inefficiencies and proposing specific fixes.
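The kind of check the Auditor performs can be sketched as a comparison of the latest run against a historical baseline. This is only an illustration, not GitHub's implementation: the function name, the median baseline, and the 3x threshold are all assumptions.

```python
from statistics import median


def flag_anomalies(turns_by_workflow, factor=3.0):
    """Return (workflow, latest, baseline) for runs far above their baseline.

    turns_by_workflow maps a workflow name to a list of LLM turn counts,
    oldest first. The baseline is the median of every run before the latest.
    The default factor of 3 is an arbitrary illustrative threshold.
    """
    flagged = []
    for name, turns in turns_by_workflow.items():
        if len(turns) < 2:
            continue  # no history to compare against
        *history, latest = turns
        baseline = median(history)
        if latest > factor * baseline:
            flagged.append((name, latest, baseline))
    return flagged


# Made-up turn counts: one workflow spikes to 18 turns, the other is steady.
turn_history = {
    "auto-triage": [4, 4, 5, 4, 18],
    "security-guard": [6, 7, 6, 6, 7],
}
flagged = flag_anomalies(turn_history)
print(flagged)
```

A median baseline is used here because a single earlier outlier does not distort it the way a mean would.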
The Auditor and Optimizer are themselves agentic workflows. Their token usage appears in daily reports, creating a small virtuous cycle of self-optimization.
The most common inefficiency the Optimizer found: unused MCP tools. Because LLM APIs are stateless, agent runtimes include the full MCP tool manifest — function names and JSON schemas — with every request. For a GitHub MCP server with 40 tools, that's 10-15 KB of schema per turn. If the agent only uses two tools, the other 38 are pure overhead. Workflow authors naturally start with a full toolset since it's the path of least resistance, but over time most workflows stabilize around a narrow set of tools. The Optimizer cross-references tool manifests against actual tool calls and recommends pruning unused tools. In smoke-test workflows, this saved 8-12 KB per call — several thousand tokens per run with no behavior change.
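The cross-referencing step amounts to a set difference between the tools a workflow declares and the tools it actually calls. The sketch below shows that idea under stated assumptions: the manifest shape (`name` plus `inputSchema`) and the tool names other than `search_repositories` are illustrative, and serialized JSON size is used as a crude stand-in for per-request overhead.

```python
import json


def find_unused_tools(manifest, called_names):
    """Return the names of tools never called and their serialized size in bytes."""
    called = set(called_names)
    unused = [tool for tool in manifest if tool["name"] not in called]
    overhead_bytes = sum(len(json.dumps(tool).encode("utf-8")) for tool in unused)
    return [tool["name"] for tool in unused], overhead_bytes


# Illustrative manifest; real schemas are much larger than these stubs.
manifest = [
    {"name": "search_repositories",
     "inputSchema": {"type": "object",
                     "properties": {"query": {"type": "string"}}}},
    {"name": "get_issue",
     "inputSchema": {"type": "object",
                     "properties": {"number": {"type": "integer"}}}},
    {"name": "list_labels",
     "inputSchema": {"type": "object", "properties": {}}},
]
# Tool calls observed in a run's logs (made-up data).
observed_calls = ["get_issue", "get_issue", "list_labels"]

unused_names, overhead_bytes = find_unused_tools(manifest, observed_calls)
print(unused_names, overhead_bytes)
```

Because the manifest is resent with every request, the per-run cost of an unused tool is roughly its schema size multiplied by the number of turns.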
The bigger structural win: replacing GitHub MCP calls with GitHub CLI commands for data-fetching operations. An MCP tool call is a reasoning step. The agent must decide to call the tool, formulate arguments, and receive output as part of context. That's a full LLM round-trip. Calling gh pr diff is a deterministic HTTP request with no LLM involvement.
GitHub used two strategies. For data the agent always needs — pull request diffs, changed files — they added pre-agentic setup steps that run gh commands before the agent starts and write results to workspace files. The agent reads those files instead of making MCP calls. For data the agent determines at runtime, they use a lightweight HTTP proxy that routes CLI traffic to GitHub's API without exposing auth tokens. The agent runs gh pr view --json and gets structured data back, just like a terminal user would. This moves the majority of GitHub data-fetching out of the LLM reasoning loop.
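A pre-agentic setup step of the first kind might look like the sketch below, which runs `gh pr diff` and `gh pr view --json files` before the agent starts and writes the output to workspace files. The function names, the `pr-context` directory, and the file names are hypothetical; in a real workflow this would more likely be a shell step in the Actions job.

```python
import subprocess
from pathlib import Path


def run_gh(args):
    """Run a gh command and return its stdout; raises if gh exits non-zero."""
    result = subprocess.run(["gh", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def prefetch_pr_context(pr_number, workspace, run=run_gh):
    """Fetch PR data deterministically and write it where the agent can read it.

    `run` is injectable so the step can be exercised without network access.
    """
    out_dir = Path(workspace) / "pr-context"  # hypothetical directory name
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {
        "diff.patch": run(["pr", "diff", str(pr_number)]),
        "files.json": run(["pr", "view", str(pr_number), "--json", "files"]),
    }
    for filename, content in outputs.items():
        (out_dir / filename).write_text(content)
    return sorted(outputs)
```

Calling `prefetch_pr_context(123, ".")` inside a checked-out repository with an authenticated `gh` would leave both files under `./pr-context/`, and the agent prompt can then point at those paths instead of exposing the corresponding MCP tools.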
What This Changes For Developers
The optimization results are uneven but instructive. Auto-Triage Issues shows a sustained 62% reduction across 109 post-fix runs. It fires on every new issue — averaging 6.8 runs per day, maxing at 15 — so per-run savings compound quickly. Over the observation period, this optimization saved roughly 7.8 million Effective Tokens in aggregate. Security Guard and Smoke Claude run even more frequently and show 43% and 59% improvements respectively.
Not every optimization translates to measurable savings, especially over short windows on a live repo where workload varies. Contribution Check saw a 5% increase in Effective Tokens, but that reflects a workload shift rather than a regression: the post-optimization period coincided with a burst of development activity, and large pull requests made up 65% of what the workflow processed, versus 39% before. Output tokens rose 14% as the agent reviewed bigger diffs. The optimization likely improved per-turn efficiency, but the heavier workload masks that gain.
GitHub uses an Effective Tokens (ET) metric to normalize consumption across model tiers: ET = m × (1.0 × I + 0.1 × C + 4.0 × O), where m is a model cost multiplier (Haiku = 0.25×, Sonnet = 1.0×, Opus = 5.0×), I is input tokens, C is cache-read tokens, and O is output tokens. Output tokens carry 4× weight because they're the most expensive. Cache-read tokens carry 0.1× weight because they're served from cache at a fraction of the cost. A 10% ET reduction means a genuine 10% cost reduction regardless of model.
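The formula is simple enough to compute directly. The sketch below applies it with the multipliers quoted above; the function name and the lowercase dictionary keys are illustrative choices, and the token counts are made up.

```python
# Model cost multipliers as quoted in the article; the keys are illustrative.
MODEL_MULTIPLIER = {"haiku": 0.25, "sonnet": 1.0, "opus": 5.0}


def effective_tokens(model, input_tokens, cache_read_tokens, output_tokens):
    """ET = m * (1.0 * I + 0.1 * C + 4.0 * O)."""
    m = MODEL_MULTIPLIER[model]
    return m * (1.0 * input_tokens
                + 0.1 * cache_read_tokens
                + 4.0 * output_tokens)


# Same made-up call on two model tiers:
# 10,000 + 0.1 * 50,000 + 4 * 2,000 = 23,000 before the multiplier.
et_sonnet = effective_tokens("sonnet", 10_000, 50_000, 2_000)
et_opus = effective_tokens("opus", 10_000, 50_000, 2_000)
print(round(et_sonnet), round(et_opus))
```

In this example the 2,000 output tokens contribute 8,000 ET, more than the 50,000 cache-read tokens (5,000 ET), which shows why trimming output and avoiding uncached input matter more than raw token totals suggest.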
The pattern that emerges: many agent turns are deterministic data-gathering. Auto-Triage's 62% improvement came from eliminating structural inefficiency, namely agent turns spent fetching issue metadata and scanning labels, operations that require no inference. Moving those reads into pre-agentic CLI steps removed them from the LLM reasoning loop entirely. Security Guard's 43% reduction came from a relevance gate that skips the LLM entirely for pull requests that don't touch security-sensitive files. The cheapest LLM call is the one you don't make.
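A relevance gate of this kind can be a few lines of deterministic code that runs before the agent is invoked. The sketch below is one possible shape, not Security Guard's actual logic: the path patterns are hypothetical, since the article does not say which files count as security-sensitive.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; the article does not list the real ones.
SENSITIVE_PATTERNS = (
    ".github/workflows/*",
    "*/auth/*",
    "*.pem",
    "*secrets*",
)


def needs_security_review(changed_files, patterns=SENSITIVE_PATTERNS):
    """Return True if any changed path matches a security-sensitive pattern."""
    return any(fnmatchcase(path, pattern)
               for path in changed_files
               for pattern in patterns)


docs_only = ["docs/readme.md", "CHANGELOG.md"]
touches_auth = ["docs/readme.md", "src/auth/login.py"]

print(needs_security_review(docs_only))     # gate closed: skip the LLM
print(needs_security_review(touches_auth))  # gate open: run the agent
```

The changed-file list can come from a pre-agentic `gh` step, so pull requests that fail the gate cost no tokens at all.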
Unneeded tools are usually expensive to carry, but not always. Glossary Maintainer called a single tool, search_repositories, 342 times in one run, accounting for 58% of all tool calls, even though the tool is completely unnecessary for a workflow that only scans local file changes. The Optimizer recommended removing it. By contrast, Daily Community Attribution was configured with eight GitHub MCP tools and made zero calls to any of them, yet removing them didn't reduce ET, because tool manifests were a small fraction of this workflow's overall context.
A single misconfigured rule can cause runaway loops. Daily Syntax Error Quality was the highest-ET workflow before optimization. The root cause: a one-line misconfiguration. The workflow copied test files to /tmp/ then called gh aw compile *, but the sandbox's bash allowlist only permitted relative-path glob patterns. Every compile attempt was blocked. Unable to use the tool it needed, the agent fell into a 64-turn fallback loop, manually reading source code to reconstruct what the compiler would have told it. One fix to the allowed bash patterns eliminated the loop.
Try It Yourself
The tools GitHub used are available today in the GitHub Agentic Workflows framework. If you're running agentic workflows in CI, you can add the auditor and optimizer to your repo:
gh extensions install github/gh-aw
gh aw add githubnext/agentic-ops/copilot-token-audit githubnext/agentic-ops/copilot-token-optimizer
Running them alongside your existing CI gives immediate visibility into usage and helps continuously optimize workflows over time. The first step is the same as GitHub's: add the API proxy, turn on logging, and let the data tell you where to look.
For more on validating agentic workflows when correctness isn't repeatable, see "How GitHub Validates AI Agents When Correctness Isn't Repeatable." If you're reviewing agent-generated pull requests, "Agent Pull Requests Are Everywhere. Here's How to Review Them" covers the practical workflow.
The Bottom Line
Use this if you're running agentic workflows in CI and costs are accumulating out of view. The proxy-level observability and optimizer workflows change how you develop and deploy automations — you add token monitoring from day one rather than retrofitting it later. Skip this if your workflows run infrequently or you're not yet at scale where token costs matter. The real opportunity is moving from workflow-level optimization to system-level optimization: understanding which episodes cause costly runs, which workflows duplicate reads, and where shared intermediate artifacts should be cached instead of rediscovered by each run. That requires richer lineage data than most systems collect today, but it's the direction that matters. GitHub is building the plane as they fly it. You can use the same techniques to avoid burning jet fuel unnecessarily.
Source: GitHub Blog