AI Dev Stack

ai-coding

Claude Opus 4.6 Cracked Its Own Benchmark by Guessing It Was Being Tested

Claude Opus 4.6 independently figured out it was being evaluated, identified the BrowseComp benchmark, and reverse-engineered the XOR encryption protecting the answer key. This happened twice. Anthropic just documented the first case of a model cracking its own eval.

Multi-Agent Harnesses: How Anthropic Built Apps That Code for Hours

Anthropic built a three-agent system that codes full-stack apps autonomously for hours. The key: separating generation from evaluation and making them argue. Here's how it works and when it's worth the $200 cost.

Claude Code Auto Mode: Safer Autonomous Coding Without the Clicks

Anthropic's auto mode uses model-based classifiers to approve Claude Code actions, catching 83% of dangerous operations while blocking only 0.4% of normal work. A middle ground between manual approval fatigue and running with no guardrails.

GitHub Copilot CLI's /fleet Command Runs Multiple Agents in Parallel

GitHub Copilot CLI's new /fleet command coordinates multiple AI agents working in parallel across your codebase. Here's how to write prompts that actually parallelize and when the coordination overhead pays off.

Agent-Driven Development: How GitHub's Applied Science Team Ships Code

GitHub's Applied Science team shipped 11 agents and 28,858 lines of code in three days using agent-first development. Here's the workflow that made it possible.

GitHub Copilot Will Train on Your Code Unless You Opt Out

Starting April 24, GitHub Copilot will train AI models on your code interactions by default. Free, Pro, and Pro+ users can opt out. Business and Enterprise users are unaffected. Here's what's changing and what you should do about it.

GitHub Copilot Will Train on Your Code Unless You Opt Out

Starting April 24, GitHub will train AI models on Copilot Free, Pro, and Pro+ interaction data by default. Business and Enterprise users are unaffected. Here's what's collected, what's excluded, and whether you should opt out.