Inside Anthropic's Three-Week Infrastructure Nightmare Three overlapping infrastructure bugs degraded Claude's responses in August 2025, affecting up to 16% of Sonnet 4 requests at the worst point. Anthropic's detailed postmortem reveals why detection took weeks and what they're changing.
Context Engineering: Why Your AI Agent Needs Less, Not More Anthropic's guide to context engineering reveals why LLMs degrade with bloated context windows — and how to build agents that use compaction, just-in-time retrieval, and structured memory to stay focused across long-horizon tasks.
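The compaction idea above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: the helper names and the 4-characters-per-token heuristic are invented, and the placeholder summary stands in for what would really be a model-generated recap of older turns.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption, not a tokenizer).
    return max(1, len(text) // 4)

def compact(history: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    """Collapse older turns into a summary stub once history exceeds the budget."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # In a real agent this summary would come from a model call; here a
    # one-line placeholder notes what was dropped.
    summary = f"[summary of {len(old)} earlier turns]"
    return [summary] + recent

history = [f"turn {i}: " + "x" * 400 for i in range(20)]
compacted = compact(history, budget=500)
print(len(compacted))  # 5: one summary plus the 4 most recent turns
```

The point is the shape of the technique: recent turns stay verbatim, everything older collapses to a cheap stand-in, and the agent keeps working inside a bounded context.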
Claude Code Gets Sandboxing: 84% Fewer Permission Prompts Anthropic shipped sandboxing for Claude Code that cuts permission prompts by 84%. OS-level isolation creates hard boundaries for filesystem and network access, letting Claude run autonomously within defined limits while containing the damage a prompt injection attack could do.
Code Execution with MCP: How Anthropic Cuts Agent Costs by 98% Anthropic shows how code execution with MCP cuts agent token usage by 98.7%. Instead of loading thousands of tool definitions upfront, agents discover tools on-demand and process data locally.
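A toy sketch of the pattern described above, with an invented registry and tool name rather than a real MCP server: the agent's code searches for tools on demand instead of carrying every schema in its prompt, and filters a large result locally so only the small final answer would flow back into model context.

```python
# Invented stand-in for an MCP tool registry (not a real MCP API).
TOOL_REGISTRY = {
    "gdrive.list_files": lambda: [
        {"name": f"doc{i}.txt", "size": i * 100} for i in range(1000)
    ],
}

def discover(query: str) -> list[str]:
    # On-demand discovery: return matching tool names, not full schemas.
    return [name for name in TOOL_REGISTRY if query in name]

def run(name: str):
    return TOOL_REGISTRY[name]()

# "Agent-generated" code: fetch 1,000 rows, filter locally, keep 3 rows.
tools = discover("gdrive")
files = run(tools[0])
large = [f["name"] for f in files if f["size"] > 99_600]
print(large)  # only this small result would re-enter model context
```

The 1,000-row intermediate result never touches the model; that locality is where the token savings come from.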
Claude Can Now Search 1,000+ Tools Without Blowing Its Context Anthropic shipped Tool Search Tool, Programmatic Tool Calling, and Tool Use Examples. Tool Search cuts token usage 85% by discovering tools on-demand. Programmatic Calling orchestrates workflows in code, keeping intermediate results out of the model's context. Examples teach correct usage patterns that schemas can't express.
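The deferred-loading idea behind tool search can be made concrete with a sketch. The catalog, tool names, and 500-token cost per definition are all invented for illustration; the mechanism shown is the real one: definitions stay out of context until a search surfaces the few that are relevant.

```python
# Hypothetical catalog: 100 Jira tools plus one Slack tool, each with a
# definition that would cost ~500 tokens if loaded into context.
CATALOG = {f"jira.tool_{i}": {"tokens": 500} for i in range(100)}
CATALOG["slack.post_message"] = {"tokens": 500}

def search_tools(query: str, limit: int = 3) -> dict:
    # Return only the definitions that match, up to a small limit.
    hits = {name: d for name, d in CATALOG.items() if query in name}
    return dict(list(hits.items())[:limit])

# Loading everything upfront vs. loading only what a search surfaces.
upfront = sum(d["tokens"] for d in CATALOG.values())
on_demand = sum(d["tokens"] for d in search_tools("slack").values())
print(f"{upfront} -> {on_demand} tokens")
```

With one relevant tool out of 101, context cost drops by two orders of magnitude in this toy setup; the reported 85% figure comes from real workloads where several tools typically match.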
How Anthropic Solved the Long-Running Agent Problem Anthropic cracked the long-running agent problem with a two-part harness: an initializer that scaffolds 200+ features, and a coding agent that works incrementally with git-based progress tracking. Here's how it works.
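A minimal sketch of the incremental-progress idea in that harness. The structure is invented: a plain list stands in for real `git commit` / `git log` calls, and the feature names are hypothetical. What it shows is the resumability property: progress is recorded durably after each feature, so a fresh session can recover state and pick up where the last one stopped.

```python
def load_done(log: list[str]) -> set[str]:
    # Progress recovery: parse completed features back out of the log,
    # the way a real harness would read commit messages from `git log`.
    return {line.split(": ", 1)[1] for line in log if line.startswith("feat: ")}

def run_session(features: list[str], log: list[str], max_per_session: int = 2):
    """One bounded agent session: implement a few unfinished features, then stop."""
    done = load_done(log)
    for feature in features:
        if feature in done:
            continue
        if max_per_session == 0:
            break
        # ... the coding agent would implement and test the feature here ...
        log.append(f"feat: {feature}")  # stand-in for a git commit
        max_per_session -= 1

features = ["login", "search", "checkout"]
log: list[str] = []
run_session(features, log)  # session 1 completes two features
run_session(features, log)  # session 2 resumes and finishes the rest
print(log)
```

Because state lives in the log rather than in the agent's context, a crash or context-window reset between sessions loses nothing.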
How Anthropic Actually Builds Evals for AI Agents That Ship Anthropic's playbook for building AI agent evaluations that actually work. Start with 20-50 real failures, combine deterministic and model-based graders, and read the transcripts. The teams that invest early ship faster.
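The deterministic-plus-model-based grader combination can be sketched as follows. This is an illustration with invented grader logic and weights: the model grader here is a keyword stub standing in for an LLM-judge call that would score a transcript against a rubric.

```python
def deterministic_grader(transcript: str, expected: str) -> float:
    # Objective check: did the agent's output contain the required value?
    return 1.0 if expected in transcript else 0.0

def model_grader(transcript: str, rubric: str) -> float:
    # Stand-in for an LLM judge scoring the transcript against the rubric;
    # a real grader would prompt a model and parse its returned score.
    return 1.0 if "refund issued" in transcript else 0.0

transcript = "Agent: I have processed this. Refund issued for order #1234."
score = 0.5 * deterministic_grader(transcript.lower(), "#1234") + \
        0.5 * model_grader(transcript.lower(), "did the agent resolve the issue?")
print(score)
```

Deterministic checks catch objective failures cheaply and reproducibly; the model-graded half covers subjective quality that regexes can't, and reading the underlying transcripts is what keeps both honest.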