GitHub Copilot Code Review: 60M Reviews and the Agentic Shift
GitHub's Copilot code review hit 60M reviews by rebuilding with an agentic architecture. It now handles 1 in 5 GitHub code reviews, stays silent 29% of the time, and helped WEX ship 30% more code. Here's how the new system works.
TL;DR
- Copilot code review usage grew 10X since April 2024, now handling one in five GitHub code reviews
- GitHub rebuilt it with an agentic architecture that retrieves repo context and reasons across changes, driving an 8.1% increase in positive feedback
- The system prioritizes high-signal comments over volume — 71% of reviews surface actionable feedback, 29% stay silent
- 12,000+ organizations now run it automatically on every pull request, with WEX shipping 30% more code after adoption
The Big Picture
GitHub just hit 60 million Copilot code reviews. That's not a vanity metric — it represents a fundamental shift in how teams handle pull requests at scale. One in five code reviews on GitHub now runs through Copilot's AI agent, and the growth curve is steep.
The real story isn't the volume. It's that GitHub completely rebuilt the system's architecture in the past year, moving from a basic thoroughness model to an agentic design that retrieves context, maintains memory across reviews, and reasons about code changes. This isn't incremental improvement — it's a different approach to automated code review.
The results show in production data: 71% of reviews now surface actionable feedback, while the remaining 29% stay silent rather than add noise. That restraint matters. Most AI code review tools fail because they comment on everything. GitHub's agent learned to shut up when it has nothing useful to say.
For teams already using GitHub's agentic execution framework, this code review agent fits into a broader pattern: AI that doesn't just generate code, but understands repository context and makes judgment calls about what matters.
How It Works
The agentic architecture is the key technical shift. Previous versions of Copilot code review analyzed diffs linearly, then generated comments at the end. That approach had a memory problem — early discoveries got lost by the time the agent finished reading a long pull request.
The new system works differently. It builds an explicit review plan before diving into code, mapping out what to check and in what order. As it reads, it flags issues immediately rather than waiting until the end. It can retrieve context from linked issues, previous pull requests, and other parts of the repository to understand whether code matches project requirements.
Most importantly, it maintains memory across reviews. If it identifies a pattern in one pull request — say, a common error in how the team handles async operations — it can reference that context in future reviews. This isn't just pattern matching. The agent reasons about code changes in relation to the broader codebase.
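The loop described above — plan first, flag issues as you read, carry findings forward into later reviews — can be sketched in a few lines. This is a hypothetical illustration of the pattern, not GitHub's implementation; the Finding shape, the diff format, and the memory store are all invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    file: str
    lines: tuple   # (start, end) range the comment attaches to
    rule: str      # pattern identifier, e.g. "unawaited-coroutine"
    message: str

@dataclass
class ReviewAgent:
    # Memory persists across reviews: pattern ids seen in earlier PRs.
    memory: list = field(default_factory=list)

    def plan(self, diff):
        # Build an explicit review plan before reading any code:
        # here, simply an ordered list of changed files to check.
        return sorted(diff.keys())

    def review(self, diff):
        findings = []
        for path in self.plan(diff):
            for start, end, rule, msg in diff[path]:
                # Flag issues immediately while reading, instead of
                # deferring all comments to the end of a long pass.
                note = msg
                if rule in self.memory:
                    note += " (recurring pattern seen in earlier reviews)"
                findings.append(Finding(path, (start, end), rule, note))
        # Remember the patterns for future reviews of other PRs.
        self.memory.extend(f.rule for f in findings)
        return findings
```

Running two reviews through the same agent shows the memory effect: the second occurrence of a pattern gets annotated as recurring, even though it appears in a different pull request.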
GitHub measures performance through two production signals: developer feedback (thumbs up/down on comments) and whether flagged issues get resolved before merging. When they switched to a more advanced reasoning model, positive feedback rates jumped 6%, even though review latency increased 16%. They kept the slower model. That trade-off tells you what they're optimizing for — signal over speed.
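The two signals are simple rates over comment records, which makes the trade-off concrete: a model change is a win if it moves these numbers, even at the cost of latency. A minimal sketch, with field names invented for the example:

```python
def review_signals(comments):
    """Compute the two production signals from a list of comment records.

    Each record is a dict with hypothetical fields:
      'feedback': 'up', 'down', or None (no rating given)
      'resolved_before_merge': bool
    Returns (positive_feedback_rate, resolution_rate).
    """
    rated = [c for c in comments if c["feedback"] is not None]
    positive_rate = sum(c["feedback"] == "up" for c in rated) / len(rated)
    resolution_rate = (
        sum(c["resolved_before_merge"] for c in comments) / len(comments)
    )
    return positive_rate, resolution_rate
```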
The system now averages 5.1 comments per review. That number stayed stable even as the agent got better at identifying issues, because GitHub tuned it to filter out low-signal observations. A comment only appears if it meets a quality threshold: it needs to explain both the problem and the fix, and it needs to matter for logic or maintainability.
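That threshold logic — problem plus fix, and only for categories that matter — can be expressed as a filter over draft comments. The schema here is illustrative, not GitHub's; the point is that an empty result (staying silent) is a valid outcome:

```python
def passes_quality_bar(comment):
    """Decide whether a draft review comment is worth surfacing.

    Mirrors the threshold described above: a comment must explain both
    the problem and the fix, and must concern logic or maintainability.
    Field names are invented for this sketch.
    """
    has_problem = bool(comment.get("problem"))
    has_fix = bool(comment.get("suggested_fix"))
    high_signal = comment.get("category") in {"logic", "maintainability"}
    return has_problem and has_fix and high_signal

def filter_review(draft_comments):
    # Silence is a valid outcome: an empty list means nothing is posted.
    return [c for c in draft_comments if passes_quality_bar(c)]
```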
Multi-line comments replaced single-line pins. Instead of flagging one line and forcing developers to guess the scope, the agent attaches feedback to logical code ranges. When it finds the same pattern error in multiple places, it clusters them into one comment rather than spamming the pull request timeline.
Batch autofixes let developers apply entire classes of fixes at once. If the agent identifies ten instances of the same logic bug, you can resolve all of them in one action instead of context-switching through individual suggestions.
What This Changes For Developers
The workflow impact is straightforward: you open a pull request, and Copilot code review runs automatically. It catches bugs, flags maintainability issues, and suggests fixes before a human reviewer looks at the code. That first pass happens fast enough to matter — you're not waiting hours for feedback.
For teams, the shift is bigger. WEX made Copilot code review a default across every repository and saw 30% more code shipped. Two-thirds of their developers now use Copilot, including the most active contributors. That adoption pattern matters — if your best engineers trust the tool, the rest of the team follows.
General Motors uses it to handle pull request reviews and summaries, freeing teams to focus on complex architectural decisions rather than catching missing dependencies or retry loop bugs. The value isn't replacing human review — it's handling the mechanical first pass so humans can focus on judgment calls.
The agentic architecture changes what kinds of issues get caught. Previous versions missed subtle gaps where code looked reasonable in isolation but didn't match project requirements. The new system reads linked issues and pull requests to understand intent, not just syntax. It catches logic errors that only make sense with full repository context.
For developers who already use GitHub's agentic workflows, this fits into a consistent pattern: AI agents that understand your codebase and make contextual decisions. The security architecture is the same, the execution model is the same, and the trust model is the same.
Try It Yourself
Copilot code review is available with Copilot Pro, Pro+, Business, and Enterprise plans. GitHub also offers a limited version without a full Copilot license for teams that want to test it.
To enable automatic reviews on every pull request:
- Go to your repository or organization settings
- Navigate to Copilot settings
- Enable automatic code review for pull requests
- Configure review triggers (all PRs, or specific branches/labels)
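The trigger configuration in the last step boils down to a simple predicate: review everything, or only pull requests matching certain branches or labels. A sketch of that decision logic, with dict shapes invented for the example (the actual settings live in the GitHub UI, not in code you write):

```python
def should_review(pr, triggers):
    """Decide whether to auto-run a review for a pull request.

    `triggers` models the configuration choices above: run on all PRs,
    or only on specific base branches / labels.
    """
    if triggers.get("all_prs"):
        return True
    if pr["base_branch"] in triggers.get("branches", []):
        return True
    if set(pr["labels"]) & set(triggers.get("labels", [])):
        return True
    return False
```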
The agent runs on the next pull request you open. You'll see comments appear in the pull request timeline, attached to specific code ranges. Each comment includes a suggested fix you can apply directly or modify.
If you want to test it on an existing pull request, add a comment with @copilot review to trigger a manual review.
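If you'd rather script that trigger than click through the UI, posting the comment is one call to the GitHub REST API's issue-comments endpoint (pull request comments go through it). A standard-library sketch that builds the request; sending it with urlopen and a real token is left to you:

```python
import json
from urllib.request import Request

def build_review_trigger(owner, repo, pr_number, token):
    """Build the HTTP request that posts an '@copilot review' comment
    on a pull request via the GitHub REST API.

    Returns a urllib Request; pass it to urllib.request.urlopen to send.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/issues/{pr_number}/comments"
    )
    payload = json.dumps({"body": "@copilot review"}).encode()
    return Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```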
Full setup documentation: Configure automatic Copilot code reviews
The Bottom Line
Use this if you're shipping code fast enough that pull request review is a bottleneck. The agentic architecture makes it genuinely useful for catching logic bugs and maintainability issues, not just style nits. Teams with high pull request volume — dozens per day — will see the biggest impact.
Skip it if your team is small (under five engineers) or if you're working on a codebase where context is everything and automated review can't possibly understand the domain. The agent is good, but it's not magic. It won't catch architectural problems or business logic errors that require deep product knowledge.
The real opportunity is in the agentic shift. GitHub rebuilt this system to reason about code changes in context, maintain memory across reviews, and make judgment calls about what matters. That architecture pattern — not just the code review feature — is what changes the game. If you're already using GitHub's agentic workflows or SDK, this is the natural next step. If you're not, this is a low-risk way to test whether AI agents can actually improve your team's velocity without sacrificing code quality.
Source: GitHub Blog