GitHub's Accessibility Agent: 3,535 PRs Reviewed, 68% Auto-Fixed

GitHub's experimental accessibility agent has reviewed 3,535 pull requests with a 68% auto-fix rate. Here's how they built it using sub-agents, linear execution, and years of manually cataloged accessibility issues.

TL;DR

GitHub built an experimental accessibility agent that's reviewed 3,535 pull requests with a 68% resolution rate
Uses a two-tier sub-agent architecture: one passive reviewer, one active implementer — no parallel execution
Agent refuses to touch high-risk patterns (drag-and-drop, tree views, data grids) and routes complex cases to humans
Trained on GitHub's manually cataloged accessibility issues — years of structured remediation data beats generic LLM training
If you haven't invested in manual accessibility work yet, you're behind: the European Accessibility Act is in effect, and ADA Title II compliance becomes binding in April 2027

The Big Picture

GitHub is piloting an accessibility agent that does two things: answers accessibility questions in Copilot CLI and VS Code, and auto-remediates simple accessibility issues before they ship. The agent has processed over 3,500 pull requests, automatically fixing problems like missing ARIA labels, broken focus order, and DOM structure mismatches.

This isn't a "solve accessibility with AI" play. It's augmentation. The agent catches objective, deterministic issues — the kind that slow down engineers who don't specialize in accessibility. It leaves the hard stuff (complex interactive patterns, contextual judgment calls) to humans.

The timing matters. The European Accessibility Act is in effect. ADA Title II compliance becomes legally binding in April 2027. Organizations that haven't built up a corpus of manually identified and remediated accessibility issues are at a disadvantage — not just for compliance, but for training agents like this one.

How It Works

Most agent guides recommend spinning up a fleet of specialized sub-agents that run in parallel. GitHub tried that. It didn't work. Token costs exploded, output quality tanked, and the agent hallucinated fixes.

Instead, they use a two-tier architecture with sequential execution. The parent agent orchestrates. Two sandboxed sub-agents do the work:

Reviewer sub-agent — Read-only. Audits code, researches WCAG criteria, detects escalation triggers, outputs structured findings.
Implementer sub-agent — Read-write. Takes the reviewer's findings, generates fixes or guidance docs, validates changes.

The sub-agents can't talk to each other. Each one returns JSON that conforms to a templated schema to the parent agent, which validates the output and routes the next step. This creates an audit trail and keeps each sub-agent inside its assigned role.
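The parent-side validation and routing described above can be sketched roughly as follows. This is a minimal illustration, not GitHub's implementation: the field names (`escalate`, `findings`, `wcag_criterion`, and so on), the severity labels, and the route names are all assumptions made up for the example.

```python
import json

# Hypothetical schema: these keys and labels are illustrative assumptions,
# not the fields GitHub's agent actually uses.
REQUIRED_FINDING_KEYS = {"wcag_criterion", "severity", "file", "description"}
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def validate_reviewer_output(raw: str) -> dict:
    """Parse the reviewer's JSON and reject anything that is off-template."""
    payload = json.loads(raw)  # malformed JSON raises ValueError
    if not isinstance(payload, dict):
        raise ValueError("reviewer output must be a JSON object")
    if not isinstance(payload.get("escalate"), bool):
        raise ValueError("'escalate' must be a boolean")
    findings = payload.get("findings")
    if not isinstance(findings, list):
        raise ValueError("'findings' must be a list")
    for finding in findings:
        if not isinstance(finding, dict):
            raise ValueError("each finding must be an object")
        missing = REQUIRED_FINDING_KEYS - set(finding)
        if missing:
            raise ValueError(f"finding is missing keys: {sorted(missing)}")
        if finding["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError(f"unknown severity: {finding['severity']!r}")
    return payload


def route(payload: dict) -> str:
    """The parent picks the next step; sub-agents never call each other."""
    if payload["escalate"]:
        return "escalate_to_human"
    if not payload["findings"]:
        return "done"
    return "run_implementer"


def handle_reviewer_result(raw: str, audit_log: list) -> str:
    """Validate, route, and record the decision so every hop is auditable."""
    payload = validate_reviewer_output(raw)
    next_step = route(payload)
    audit_log.append({"from": "reviewer", "next": next_step, "payload": payload})
    return next_step
```

Because every message passes through the parent, a sub-agent that returns free-form text or an unexpected field is stopped at validation rather than influencing the next step.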

Each sub-agent executes instructions in a fixed linear order. The reviewer runs three phases: research (WCAG criteria, GitHub's internal interpretations, assistive tech support), code audit (read source files, validate against decision tables), structured output (findings report with severity scoring). The implementer follows a similar pattern.
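A fixed linear order like the reviewer's three phases could look something like the sketch below. The phase names follow the description above; the phase bodies, the context keys, and the toy missing-alt check are placeholders invented for illustration.

```python
from typing import Callable

Phase = Callable[[dict], dict]


def research(ctx: dict) -> dict:
    # Placeholder: a real phase would look up WCAG criteria and internal guidance.
    return {**ctx, "criteria": ["1.1.1", "4.1.2"]}


def code_audit(ctx: dict) -> dict:
    # Depends on the previous phase, which is why the order is fixed.
    if "criteria" not in ctx:
        raise RuntimeError("code_audit requires research to have run first")
    findings = []
    # Toy check standing in for a real audit: an <img> with no alt attribute.
    if "<img" in ctx["source"] and "alt=" not in ctx["source"]:
        findings.append({"criterion": "1.1.1", "severity": "high"})
    return {**ctx, "findings": findings}


def structured_output(ctx: dict) -> dict:
    return {**ctx, "report": {"findings": ctx["findings"]}}


# The order is data, not something the model decides at run time.
REVIEWER_PHASES = (
    ("research", research),
    ("code_audit", code_audit),
    ("structured_output", structured_output),
)


def run_linear(phases, ctx: dict) -> dict:
    """Run each phase once, in order, feeding each result into the next."""
    trace = []
    for name, phase in phases:
        ctx = phase(ctx)
        trace.append(name)
    return {**ctx, "trace": trace}
```

Calling `run_linear(REVIEWER_PHASES, {"source": "<img src='x.png'>"})` walks the phases in sequence and records the order in `trace`.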

Why linear instead of parallel? Accessibility work is contextual and detail-oriented. Speed doesn't matter if the output is wrong. Sequential execution mirrors how a human accessibility engineer would approach the problem.

The agent also runs a shell script to score code complexity. If the score exceeds a threshold, the agent switches to guidance-only mode and tells the engineer to consult the accessibility team. Same logic applies to high-risk patterns: drag-and-drop, toasts, rich text editors, tree views, data grids. The agent won't touch them. It escalates instead.
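The escalation rule above amounts to a small decision function. In this sketch the complexity score is taken as an input (the article says it comes from a shell script, whose scoring method is not described), and the threshold value, pattern identifiers, and mode names are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Pattern list taken from the article; the string identifiers are hypothetical.
HIGH_RISK_PATTERNS = frozenset(
    {"drag-and-drop", "toast", "rich-text-editor", "tree-view", "data-grid"}
)
COMPLEXITY_THRESHOLD = 50  # assumed value; the real threshold is not published


@dataclass(frozen=True)
class Decision:
    mode: str    # "auto_fix" or "guidance_only"
    reason: str


def decide_mode(complexity_score: int, detected_patterns: set) -> Decision:
    """Fall back to guidance-only mode for risky or overly complex code."""
    risky = sorted(HIGH_RISK_PATTERNS & set(detected_patterns))
    if risky:
        return Decision("guidance_only", "high-risk pattern: " + ", ".join(risky))
    if complexity_score > COMPLEXITY_THRESHOLD:
        return Decision("guidance_only", "complexity score above threshold")
    return Decision("auto_fix", "within automated scope")
```

Keeping this check in plain code rather than in the prompt means the decision to escalate does not depend on the model choosing to follow an instruction.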

What This Changes For Developers

The top five issues the agent catches are unclear structure for assistive tech, missing names for interactive controls, no status message announcements, missing alt text, and broken focus order. These are friction points that would otherwise require manual review and back-and-forth with accessibility specialists.

The 68% resolution rate means the agent is fixing roughly two-thirds of the issues it identifies without human intervention. The remaining third gets escalated for one of three reasons: the code is too complex, the pattern is high-risk, or multiple high-severity failures are present.

For engineers, this means fewer accessibility bugs make it to production. For accessibility teams, it means they can focus on the hard problems — design-level decisions, complex interactive patterns, edge cases that require human judgment.

The agent also answers accessibility questions in Copilot CLI and VS Code. This is just-in-time learning. Instead of digging through WCAG documentation or waiting for a Slack response, engineers get answers inline.

The Training Data Advantage

LLMs are trained on decades of inaccessible code. Tell an LLM to "use accessibility best practices" and it will often reproduce the antipatterns it learned from that code. GitHub's agent works because it's trained on something better: years of manually cataloged accessibility issues and their corresponding pull request fixes.

GitHub has a structured system for logging accessibility issues. Each issue includes steps to reproduce, severity level, service area, applicable WCAG success criteria, crosslinks to the PR that fixed it, and acceptance criteria. All centralized in a single repository.
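As a rough picture of what one entry in such a catalog might hold, here is a record type with one field per item listed above. The field names, types, and the `is_remediated` helper are assumptions for illustration; the article does not show GitHub's actual issue template.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AccessibilityIssue:
    """One cataloged issue. Field names are hypothetical."""
    title: str
    steps_to_reproduce: list
    severity: str                 # e.g. "low", "medium", "high" (assumed scale)
    service_area: str
    wcag_criteria: list           # e.g. ["4.1.2"]
    acceptance_criteria: list
    fix_pr_url: Optional[str] = None  # crosslink to the PR that fixed it

    @property
    def is_remediated(self) -> bool:
        # Only issues with a linked fix are useful as worked examples.
        return self.fix_pr_url is not None

    def to_record(self) -> dict:
        """Flatten to a plain dict for storage or retrieval."""
        return asdict(self)
```

The point of the structure is that each issue pairs a problem description with a known-good fix, which is what makes the corpus usable as reference material.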

This corpus of structured, remediated issues is the agent's strongest asset. The LLM can fuzzy-match against real examples written in GitHub's conventions, using GitHub's patterns. This is why the agent works. Generic training data won't cut it.
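The article does not say how the matching against past issues is done. As a stand-in, the sketch below ranks cataloged issues by simple word overlap (Jaccard similarity) with a new finding; the function names and the `description` key are hypothetical, and a production system would likely use something more sophisticated.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word tokens; deliberately crude."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Shared tokens divided by total distinct tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_issues(query: str, corpus: list, k: int = 2) -> list:
    """Return the k cataloged issues whose descriptions best overlap the query."""
    q = tokens(query)
    ranked = sorted(
        corpus,
        key=lambda issue: jaccard(q, tokens(issue["description"])),
        reverse=True,
    )
    return ranked[:k]
```

The retrieved issues, together with their linked fixes, could then be supplied to the model as examples written in the team's own conventions.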

If you're building an accessibility agent, invest in manual auditing first. Catalog issues. Document fixes. Build the corpus. Then train the agent. Skipping this step means your agent will generate the same inaccessible patterns the LLM was trained on.

What It Won't Do

Only 64% of WCAG level A and AA success criteria can be detected automatically. The remaining 36% require manual evaluation. The agent helps with the 64%, but it doesn't close the gap on the 36%.

This is why GitHub still invests in manual accessibility work during design and prototyping. Most accessibility issues originate in design. Catching them early prevents costly downstream redesigns. The agent's escalation logic reflects this — it routes engineers to the accessibility team for pairing on design decisions.

The agent also won't generate code for high-risk patterns. Drag-and-drop, tree views, data grids — these require focused attention and detail that LLMs can't reliably produce. The agent knows this. It escalates instead of guessing.

GitHub also built anti-gaming instructions to prevent the LLM from sneaking around its own intervention rules. LLMs want to generate code. You have to explicitly tell them not to when human expertise is required.

The Bottom Line

Use this approach if you have a mature accessibility practice with structured issue tracking and remediation history. Skip it if you're starting from zero: build the manual process first, then automate. The real opportunity here is token efficiency through sub-agent architecture and linear execution. The real risk is deploying an agent without training data and letting it generate inaccessible code at scale. GitHub plans to open source this agent eventually. Until then, the lesson is clear: manual accessibility work isn't only a prerequisite for compliance; it's also a prerequisite for effective automation.

Source: GitHub Blog