Continuous AI: Automate Judgment-Heavy Dev Work with Agentic CI

TL;DR

  • GitHub Next's Continuous AI pattern uses AI agents to automate judgment-heavy tasks CI can't handle—like syncing docs with code, detecting dependency drift, and generating project reports
  • Agents run as GitHub Actions with read-only defaults and explicit permissions for outputs like PRs or issues—developers stay in control
  • Real examples: 1,400+ tests written over 45 days for ~$80, automated translation updates, performance regression detection
  • If you spend time on repetitive work that requires interpretation rather than rules, this pattern is worth testing

The Big Picture

CI solved automation for deterministic work. Tests pass or fail. Builds succeed or break. Linters catch rule violations. That's the easy stuff.

The hard stuff? Reviewing whether a docstring still matches the implementation. Tracking whether a dependency silently changed behavior. Keeping translations current. Detecting performance regressions that only show up under specific conditions. Generating weekly reports that synthesize activity across issues, PRs, and commits.

These tasks require judgment, not just rules. They resist heuristics. And they eat up engineering time precisely because they can't be reduced to a flowchart.

"Any time something can't be expressed as a rule or a flow chart is a place where AI becomes incredibly helpful," says Idan Gazit, head of GitHub Next.

This is the gap Continuous AI fills. Not replacing CI, but handling a different class of automation—work that depends on understanding intent rather than validating against static rules. GitHub Next has been prototyping this pattern: background agents that operate in your repository like CI jobs, but only for tasks requiring reasoning instead of deterministic validation.

The shift mirrors the original CI movement. CI didn't replace developers. It changed when certain work happened—from "when someone remembers" to "every commit." Continuous AI does the same for judgment-heavy chores that were previously manual.

How It Works

Continuous AI is a pattern, not a product. The formula: natural-language rules + agentic reasoning, executed continuously inside your repository.

In practice, you write plain-language instructions expressing what should be true about your code, especially expectations that can't be reduced to heuristics. An agent evaluates the repository and produces artifacts you can review—suggested patches, issues, discussions, or insights.

Example instructions:

  • "Check whether documented behavior matches implementation, explain any mismatches, and propose a concrete fix."
  • "Generate a weekly report summarizing project activity, emerging bug trends, and areas of increased churn."
  • "Flag performance regressions in critical paths."
  • "Detect semantic regressions in user flows."

These workflows combine intent, constraints, and permitted outputs. They're not one-sentence prompts. Developers iterate with the agent to refine expectations, add guardrails, and define acceptable outputs.
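As a concrete sketch, a docstring-drift check in the gh-aw workflow format might look like the following. This is illustrative rather than taken from the prototype's documentation: the `on: push` trigger mirrors standard GitHub Actions events, and the `create-pull-request` safe output is assumed by analogy with the `create-issue` output the prototype documents.

```markdown
---
on: push
permissions: read
safe-outputs:
  create-pull-request:    # assumed output type; permits only PR creation
    title-prefix: "[docs] "
---
For each function changed in this push, compare its docstring to the
implementation. If documented behavior no longer matches the code,
explain the mismatch and open a pull request with a suggested fix.
Edit documentation only; do not change runtime behavior.
```

Note how the body carries the intent and constraints in plain language, while the frontmatter pins down the permitted outputs.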

The GitHub Next prototype (gh-aw) uses a deliberately simple implementation:

  1. Write an agentic workflow in Markdown
  2. Compile it into a GitHub Action
  3. Push it to your repository
  4. Let the agent run on any GitHub Actions trigger—pull requests, pushes, issues, comments, or schedules

Nothing is hidden. The generated YAML is visible and auditable. The agent operates within the same infrastructure developers already use.

Guardrails are built in. By default, agents have read-only access. They can't create issues, open PRs, or modify content unless explicitly permitted. GitHub Next calls this Safe Outputs—a deterministic contract for what an agent may produce.

When defining a workflow, you specify exactly which artifacts an agent can create (open a PR, file an issue) and under what constraints. Anything outside those boundaries is forbidden. Outputs are sanitized, permissions are explicit, and all activity is logged.

The design assumes agents can fail or behave unexpectedly, so the blast radius stays deterministic. This isn't "AI taking over development." It's AI operating within guardrails you define.

Agentic workflows don't make autonomous commits. They create the same artifacts developers would—pull requests, issues, comments, discussions—depending on what the workflow permits. Pull requests remain the most common output because they align with how developers already review changes.

"The PR is the existing noun where developers expect to review work," Gazit says. "It's the checkpoint everyone rallies around."

Developer judgment remains the final authority. Continuous AI scales that judgment across a codebase.

What This Changes For Developers

The shift is mental, not technical. CI already runs in your repository. Continuous AI uses the same infrastructure. The difference is what you can automate.

Consider documentation drift. A function's docstring says one thing. The implementation does another. CI can't catch this because it requires understanding semantics. An agentic workflow can read the docstring, compare it to the code, detect mismatches, and open a PR with suggested fixes.

"You don't want to worry every time you ship code if the documentation is still right," Gazit says. "That wasn't possible to automate before AI."

Or dependency drift. Dependencies often change behavior without bumping major versions. New flags appear. Defaults shift. In one GitHub Next demo, an agent installed dependencies, inspected CLI help text, diffed it against previous days, found an undocumented flag, and filed an issue before maintainers noticed. This requires semantic interpretation, not just text diffs.
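A hypothetical gh-aw encoding of that dependency-drift demo, reusing only frontmatter fields that appear in the prototype's documented example (`on: daily`, `permissions: read`, `create-issue`, `title-prefix`), might read:

```markdown
---
on: daily
permissions: read
safe-outputs:
  create-issue:
    title-prefix: "[deps] "
---
Install the project's dependencies and capture the --help output of
each CLI tool they provide. Diff it against the previous day's capture.
If a flag appeared, disappeared, or changed meaning, file an issue
that quotes the diff and explains the likely impact.
```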

Or test coverage. In one experiment, test coverage went from ~5% to near 100%. The agent wrote 1,400+ tests across 45 days for about $80 worth of tokens. Because it produced small PRs daily, developers reviewed changes incrementally rather than facing a massive batch at the end.

Or translations. Content changes in English. Translations fall behind. Teams batch the work late in the cycle. An agent can detect when English text changes, regenerate translations for all languages, and open a single PR with updates. The workflow becomes continuous, not episodic. Machine translations aren't perfect, but having a draft ready for review makes it easier to engage professional translators or community contributors.

Or performance regressions. Linters don't catch inefficiencies that depend on understanding intent—like compiling a regex inside a loop so it recompiles on every invocation. An agent can recognize the pattern, rewrite the code to pre-compile the regex, and open a PR with an explanation.
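The regex case is easy to reproduce. Here is a minimal Python sketch of the before/after rewrite an agent might propose (function names are hypothetical):

```python
import re

# Before: the pattern is re-parsed on every call. Python's re module
# does cache compiled patterns internally, but the cache lookup still
# runs on every invocation, and the cache can be evicted.
def count_words_slow(lines):
    total = 0
    for line in lines:
        total += len(re.findall(r"\w+", line))
    return total

# After (the agent's suggested rewrite): compile once at module load,
# then reuse the compiled pattern object inside the loop.
WORD = re.compile(r"\w+")

def count_words_fast(lines):
    return sum(len(WORD.findall(line)) for line in lines)
```

Both functions return the same counts; the rewrite only moves compilation out of the hot path, which is exactly the kind of intent-level change a linter rule struggles to express.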

These aren't theoretical. GitHub Next has tested these patterns in real repositories. The value isn't replacing developers. It's shifting when judgment-heavy chores happen—from "when someone has time" to "continuously."

This mirrors concerns around AI-generated contribution spam, but inverted. Instead of external agents flooding maintainers with low-quality PRs, Continuous AI lets maintainers deploy their own agents to handle internal chores on their terms.

Try It Yourself

You don't need new infrastructure. The GitHub Next prototype compiles agentic workflows into standard GitHub Actions.

Here's a real example from the prototype documentation—a daily status report agent:

```markdown
---
on: daily
permissions: read
safe-outputs:
  create-issue:
    title-prefix: "[news] "
---
Analyze the recent activity in the repository and:
- create an upbeat daily status report about the activity
- provide an agentic task description to improve the project based on the activity.
Create an issue with the report.
```

Compile it into an action:

```shell
gh aw compile daily-team-status
```

This generates a GitHub Actions workflow. Review the YAML. Nothing is hidden. Push it to your repository. The agent runs on the schedule you defined, just like any other action.

The agent creates an issue with the report. You review it. If it's useful, you keep the workflow. If not, you iterate or disable it.

Start small. Pick one recurring judgment-heavy task:

  • Translate strings when English content changes
  • Add missing tests for uncovered code paths
  • Check for docstring drift
  • Detect dependency changes
  • Flag subtle performance issues

Each of these is something agents can meaningfully assist with today. Identify the chores that quietly drain attention and make them continuous instead of episodic.

Explore more examples in the Continuous AI Actions and frameworks repository.

The Bottom Line

Use Continuous AI if you're spending time on repetitive judgment calls that resist deterministic rules—syncing docs with code, tracking dependency changes, generating project reports, maintaining test coverage, or detecting performance regressions. These are real time sinks that agents can handle continuously instead of episodically.

Skip it if your automation needs are already covered by traditional CI. Rules-based validation (linting, testing, building) doesn't need AI. YAML and heuristics remain the right tools for deterministic work.

The real risk isn't agents doing too much—it's developers not recognizing which judgment-heavy chores can shift from manual to continuous. If CI automated rule-based work over the past decade, Continuous AI may do the same for select categories of judgment-based work, when applied deliberately and safely. The pattern is early, but the prototype is functional. Start with one small workflow and see if it saves you time. If it does, add another.

Source: GitHub Blog