Claude 3.5 Sonnet Hits 49% on SWE-Bench: How Anthropic Built It
Claude 3.5 Sonnet hit 49% on SWE-bench Verified with a minimal agent scaffold: two tools, one prompt, maximum model control. Here's the exact architecture Anthropic used and why tool design matters as much as model capability.