Inside Anthropic's Three-Week Infrastructure Nightmare Three overlapping infrastructure bugs degraded Claude's responses in August 2025, affecting up to 16% of Sonnet 4 requests at the worst point. Anthropic's detailed postmortem reveals why detection took weeks and what they're changing.
Context Engineering: Why Your AI Agent Needs Less, Not More Anthropic's guide to context engineering reveals why LLMs degrade with bloated context windows — and how to build agents that use compaction, just-in-time retrieval, and structured memory to stay focused across long-horizon tasks.
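The compaction idea above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: the helper names and the 4-characters-per-token heuristic are invented, and the placeholder summary stands in for what would really be a model-generated recap of older turns.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption, not a tokenizer).
    return max(1, len(text) // 4)

def compact(history: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    """Collapse older turns into a summary stub once history exceeds the budget."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # In a real agent this summary would come from a model call; here a
    # one-line placeholder notes what was dropped.
    summary = f"[summary of {len(old)} earlier turns]"
    return [summary] + recent

history = [f"turn {i}: " + "x" * 400 for i in range(20)]
compacted = compact(history, budget=500)
print(len(compacted))  # 5: one summary plus the 4 most recent turns
```

The point is the shape of the technique: recent turns stay verbatim, everything older collapses to a cheap stand-in, and the agent keeps working inside a bounded context.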
Claude Code Gets Sandboxing: 84% Fewer Permission Prompts Anthropic shipped sandboxing for Claude Code that cuts permission prompts by 84%. OS-level isolation creates hard boundaries for filesystem and network access, letting Claude run autonomously within defined limits while containing the damage a prompt injection attack could do.
Code Execution with MCP: How Anthropic Cuts Agent Costs by 98% Anthropic shows how code execution with MCP cuts agent token usage by 98.7%. Instead of loading thousands of tool definitions upfront, agents discover tools on-demand and process data locally.
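A toy sketch of the pattern described above, with an invented registry and tool name rather than a real MCP server: the agent's code searches for tools on demand instead of carrying every schema in its prompt, and filters a large result locally so only the small final answer would flow back into model context.

```python
# Invented stand-in for an MCP tool registry (not a real MCP API).
TOOL_REGISTRY = {
    "gdrive.list_files": lambda: [
        {"name": f"doc{i}.txt", "size": i * 100} for i in range(1000)
    ],
}

def discover(query: str) -> list[str]:
    # On-demand discovery: return matching tool names, not full schemas.
    return [name for name in TOOL_REGISTRY if query in name]

def run(name: str):
    return TOOL_REGISTRY[name]()

# "Agent-generated" code: fetch 1,000 rows, filter locally, keep 3 rows.
tools = discover("gdrive")
files = run(tools[0])
large = [f["name"] for f in files if f["size"] > 99_600]
print(large)  # only this small result would re-enter model context
```

The 1,000-row intermediate result never touches the model; that locality is where the token savings come from.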
Claude Can Now Search 1,000+ Tools Without Blowing Its Context Anthropic shipped Tool Search Tool, Programmatic Tool Calling, and Tool Use Examples. Tool Search cuts token usage 85% by discovering tools on-demand. Programmatic Calling orchestrates workflows in code, keeping intermediate results out of the model's context. Examples teach correct usage patterns that schemas can't express.
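The deferred-loading idea behind tool search can be made concrete with a sketch. The catalog, tool names, and 500-token cost per definition are all invented for illustration; the mechanism shown is the real one: definitions stay out of context until a search surfaces the few that are relevant.

```python
# Hypothetical catalog: 100 Jira tools plus one Slack tool, each with a
# definition that would cost ~500 tokens if loaded into context.
CATALOG = {f"jira.tool_{i}": {"tokens": 500} for i in range(100)}
CATALOG["slack.post_message"] = {"tokens": 500}

def search_tools(query: str, limit: int = 3) -> dict:
    # Return only the definitions that match, up to a small limit.
    hits = {name: d for name, d in CATALOG.items() if query in name}
    return dict(list(hits.items())[:limit])

# Loading everything upfront vs. loading only what a search surfaces.
upfront = sum(d["tokens"] for d in CATALOG.values())
on_demand = sum(d["tokens"] for d in search_tools("slack").values())
print(f"{upfront} -> {on_demand} tokens")
```

With one relevant tool out of 101, context cost drops by two orders of magnitude in this toy setup; the reported 85% figure comes from real workloads where several tools typically match.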
How Anthropic Solved the Long-Running Agent Problem Anthropic cracked the long-running agent problem with a two-part harness: an initializer that scaffolds 200+ features, and a coding agent that works incrementally with git-based progress tracking. Here's how it works.
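A minimal sketch of the incremental-progress idea in that harness. The structure is invented: a plain list stands in for real `git commit` / `git log` calls, and the feature names are hypothetical. What it shows is the resumability property: progress is recorded durably after each feature, so a fresh session can recover state and pick up where the last one stopped.

```python
def load_done(log: list[str]) -> set[str]:
    # Progress recovery: parse completed features back out of the log,
    # the way a real harness would read commit messages from `git log`.
    return {line.split(": ", 1)[1] for line in log if line.startswith("feat: ")}

def run_session(features: list[str], log: list[str], max_per_session: int = 2):
    """One bounded agent session: implement a few unfinished features, then stop."""
    done = load_done(log)
    for feature in features:
        if feature in done:
            continue
        if max_per_session == 0:
            break
        # ... the coding agent would implement and test the feature here ...
        log.append(f"feat: {feature}")  # stand-in for a git commit
        max_per_session -= 1

features = ["login", "search", "checkout"]
log: list[str] = []
run_session(features, log)  # session 1 completes two features
run_session(features, log)  # session 2 resumes and finishes the rest
print(log)
```

Because state lives in the log rather than in the agent's context, a crash or context-window reset between sessions loses nothing.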
How Anthropic Actually Builds Evals for AI Agents That Ship Anthropic's playbook for building AI agent evaluations that actually work. Start with 20-50 real failures, combine deterministic and model-based graders, and read the transcripts. The teams that invest early ship faster.
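The deterministic-plus-model-based grader combination can be sketched as follows. This is an illustration with invented grader logic and weights: the model grader here is a keyword stub standing in for an LLM-judge call that would score a transcript against a rubric.

```python
def deterministic_grader(transcript: str, expected: str) -> float:
    # Objective check: did the agent's output contain the required value?
    return 1.0 if expected in transcript else 0.0

def model_grader(transcript: str, rubric: str) -> float:
    # Stand-in for an LLM judge scoring the transcript against the rubric;
    # a real grader would prompt a model and parse its returned score.
    return 1.0 if "refund issued" in transcript else 0.0

transcript = "Agent: I have processed this. Refund issued for order #1234."
score = 0.5 * deterministic_grader(transcript.lower(), "#1234") + \
        0.5 * model_grader(transcript.lower(), "did the agent resolve the issue?")
print(score)
```

Deterministic checks catch objective failures cheaply and reproducibly; the model-graded half covers subjective quality that regexes can't, and reading the underlying transcripts is what keeps both honest.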