What Is an AI Coding Agent? Architecture and Implementation

AI coding agents aren't magic. They're four-stage systems that perceive context, reason with LLMs, execute actions, and evaluate results. Here's how they actually work.

TL;DR

  • AI coding agents are multi-stage systems that perceive context, reason with LLMs, execute actions, and evaluate results
  • They differ from raw LLMs by adding autonomous task execution, tool integration, and iterative error correction
  • JetBrains built Junie to demonstrate how coding agents work in production IDEs
  • Developers who understand the agent architecture can write better prompts and debug failures faster

The Big Picture

AI coding agents are everywhere now. Students use them for homework. Lawyers use them for case research. Developers feel pressured to adopt them. But most people treat them like magic boxes.

They're not. An AI coding agent is a structured system with four distinct stages: perception, reasoning, action, and evaluation. Understanding this architecture matters because it changes how you prompt, how you debug, and how you decide whether an agent is worth using at all.

JetBrains built Junie as their coding agent for IntelliJ IDEA, PyCharm, WebStorm, GoLand, PhpStorm, RubyMine, and RustRover. They're transparent about how it works. This article breaks down the architecture using their explanations and a working example you can run locally.

The core insight: LLMs are the engine, but agents are the car. An LLM generates text. An agent perceives your codebase, plans actions, executes them, and fixes errors. That difference is everything.

How It Works

An AI coding agent operates in four stages. Each stage has a specific job.

Stage 1: Perceiving the Relevant Information

The agent scans your project. It reads your code, supporting files, and your prompt. Then it packages that data and sends it to the LLM. This isn't just copy-paste. The agent decides what context matters. If you ask it to refactor a function, it needs to know the function's dependencies, the file structure, and any tests that reference it.

Poor perception means the LLM works with incomplete information. That's why agents that integrate deeply with IDEs perform better. They have access to the full project graph.
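The perception stage can be sketched in a few lines. This is a simplified illustration, not Junie's implementation: the `perceive` function and its relevance heuristic (gather small source files under the project root) are assumptions for demonstration; a real agent would rank files by dependency graph and test references.

```python
from pathlib import Path

def perceive(project_root: str, prompt: str, extensions=(".py",)) -> dict:
    """Collect a minimal context package to send to the LLM.

    Sketch only: a production agent ranks files by relevance
    (imports, call graph, referencing tests) instead of a size cap.
    """
    context = {"prompt": prompt, "files": {}}
    for path in sorted(Path(project_root).rglob("*")):
        if path.is_file() and path.suffix in extensions and path.stat().st_size < 50_000:
            # Store paths relative to the root so the LLM sees project structure
            context["files"][str(path.relative_to(project_root))] = path.read_text()
    return context
```

The output of this stage is the context package that Stage 2 sends to the model.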

Stage 2: Reasoning with the LLM

The agent sends the context to the LLM using a specific protocol. This protocol defines the format for prompts and responses. It's not a free-form conversation. The agent might ask the LLM to return an action plan in JSON format, or to include a "thought process" section that explains its reasoning.

This is where the LLM does its work. It processes the code, the prompt, and the available tools. Then it generates a plan. That plan might include steps like "create a new file," "list the directory," or "run the test suite."
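A protocol like the one described above might look like this. The JSON schema here (a "thought" field plus a list of steps) is a hypothetical example of the kind of format an agent could request, not the format any specific product uses.

```python
import json

# Hypothetical instruction appended to the prompt so the LLM replies
# in a machine-readable format instead of free-form prose.
PLAN_INSTRUCTION = (
    'Respond with JSON only: {"thought": "<your reasoning>", '
    '"steps": [{"action": "<tool name>", "args": {}}]}'
)

def parse_plan(llm_response: str) -> list:
    """Extract the list of executable steps from the LLM's JSON reply."""
    plan = json.loads(llm_response)
    return plan["steps"]
```

Requesting JSON rather than prose is what makes Stage 3 possible: the agent can act on structured steps, but it cannot reliably act on a paragraph.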

Stage 3: Putting the Plan Into Action

The agent parses the LLM's response. If the LLM returned JSON, the agent converts it into executable commands. Then it runs those commands. This might mean writing files, running shell commands, or invoking IDE functions.

The agent logs the result of each step. Did the file get created? Did the test pass? These results feed into the next stage.
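A minimal executor for that stage could look like the sketch below. The tool-registry shape (`tools` mapping action names to callables) is an assumption for illustration; the key point is that every step's outcome, success or failure, is logged for Stage 4.

```python
def run_plan(steps: list, tools: dict) -> list:
    """Execute each planned step via a registered tool and log the outcome.

    Unknown actions and exceptions are recorded as failures rather than
    raised, so the evaluation stage can decide how to react.
    """
    log = []
    for step in steps:
        tool = tools.get(step["action"])
        if tool is None:
            log.append({"step": step, "ok": False, "error": "unknown action"})
            continue
        try:
            result = tool(**step.get("args", {}))
            log.append({"step": step, "ok": True, "result": result})
        except Exception as exc:
            log.append({"step": step, "ok": False, "error": str(exc)})
    return log
```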

Stage 4: Evaluation and Feedback

The agent evaluates the outcome. If something failed, it can ask the LLM to generate a follow-up plan. This creates a feedback loop. The agent tries, fails, adjusts, and tries again.

There's a catch: you need a maximum number of attempts. Otherwise, the agent can loop forever. Good agents set timeouts and retry limits.

This four-stage loop is what separates agents from raw LLMs. An LLM generates a response and stops. An agent keeps going until the task is done or it hits a limit.
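The whole loop, including the retry limit, fits in one function. The three hook names (`plan_fn`, `execute_fn`, `check_fn`) are hypothetical stand-ins for Stages 2, 3, and 4:

```python
def agent_loop(task, plan_fn, execute_fn, check_fn, max_attempts=3):
    """Plan, execute, evaluate, and retry with feedback until success
    or until the attempt limit stops a runaway loop.

    plan_fn(task, feedback) -> plan; execute_fn(plan) -> result;
    check_fn(result) -> (ok, feedback).
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        plan = plan_fn(task, feedback)      # Stage 2: reason, using prior feedback
        result = execute_fn(plan)           # Stage 3: act
        ok, feedback = check_fn(result)     # Stage 4: evaluate
        if ok:
            return {"ok": True, "attempts": attempt, "result": result}
    return {"ok": False, "attempts": max_attempts, "feedback": feedback}
```

Without `max_attempts`, a check that never passes would loop forever; the limit is what turns "keeps going" into "keeps going until it hits a limit."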

What This Changes For Developers

Understanding the architecture changes how you use agents. If you know the agent is working in stages, you can diagnose failures faster.

Prompt failures usually happen in Stage 1 or Stage 2. If the agent doesn't have the right context, it can't generate a useful plan. If you're asking it to refactor code but it doesn't see the test files, it might break your tests. The fix: be explicit about what files matter.

Execution failures happen in Stage 3. The plan was good, but the tools failed. Maybe the agent tried to create a file in a directory that doesn't exist. Or it ran a command that requires permissions. The fix: check the logs. Most agents show you what commands they ran.

Evaluation failures happen in Stage 4. The agent ran the plan, but it didn't realize something went wrong. Maybe it created a file but didn't check if the syntax was valid. The fix: add explicit checks to your prompt. Ask the agent to run tests or validate output.

This also changes how you think about spec-driven development. If you give the agent a clear spec upfront, it has better context in Stage 1. That means better plans in Stage 2 and fewer retries in Stage 4.

Agents also change your workflow. You're not just writing code anymore. You're writing prompts, reviewing generated code, and debugging agent failures. That's a different skill set. Some developers love it. Others find it frustrating.

Try It Yourself

JetBrains provides a working example using the DeepSeek-R1 model and llamafile. This runs entirely on your local machine. No API keys, no cloud services.

First, download the llamafile from Hugging Face. On macOS, Linux, or BSD, make it executable:

chmod +x DeepSeek-R1-Distill-Qwen-7B-Q6_K.llamafile

Launch the model as a local server. The -ngl 999 flag offloads all model layers to your GPU, if one is available:

./DeepSeek-R1-Distill-Qwen-7B-Q6_K.llamafile -ngl 999

Now test it with Python. The llamafile server exposes an OpenAI-compatible API, so the standard openai client works against it:

#!/usr/bin/env python3

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",  # the local llamafile server
    api_key="sk-no-key-required"          # llamafile doesn't check the key
)

completion = client.chat.completions.create(
    model="DeepSeek-R1-Distill-Qwen-7B-Q6_K",
    messages=[
        {"role": "system", "content": "You are an AI coding agent. Your top priority is to perform the coding task requested by the user."},
        {"role": "user", "content": "Create a Python function that returns the square of a number."},
    ]
)

content = completion.choices[0].message.content
print(content)

The response includes the LLM's "thought process" wrapped in <think> tags, followed by the actual code. This is Stage 2 in action. The LLM is reasoning through the problem before generating the solution.

To turn this into a real agent, you'd add Stage 3 (parse the response and execute the code) and Stage 4 (run tests to verify the function works). That's the architecture in practice.
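Those two stages could start as the sketch below. It assumes DeepSeek-R1-style output (reasoning in <think> tags followed by a fenced code block) and a known function name to check; `extract_code` and `verify` are illustrative helpers, and the hard-coded `square(4) == 16` check stands in for a real test suite.

```python
import re

def extract_code(llm_response: str) -> str:
    """Stage 3, step one: strip the <think> reasoning and pull out
    the first fenced code block from the model's reply."""
    visible = re.sub(r"<think>.*?</think>", "", llm_response, flags=re.DOTALL)
    match = re.search(r"```(?:python)?\n(.*?)```", visible, flags=re.DOTALL)
    return match.group(1) if match else visible.strip()

def verify(code: str) -> bool:
    """Stages 3 and 4 in miniature: execute the generated code, then
    check it against an expected result. Assumes the function is
    named `square`; a real agent would run the project's tests instead."""
    namespace = {}
    exec(code, namespace)                  # Stage 3: execute
    return namespace["square"](4) == 16    # Stage 4: evaluate
```

If `verify` returns False, the agent would feed the failure back to the LLM and ask for a revised plan, closing the loop.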

The Bottom Line

Use an AI coding agent if you're working on repetitive tasks, boilerplate generation, or exploratory refactoring. Agents excel when the task is well-defined and the codebase is structured. They struggle with ambiguous requirements and poorly organized projects.

Skip agents if you're doing deep architectural work, performance optimization, or security-critical code. Agents don't understand trade-offs. They generate code that works, not code that's optimal.

The real risk is treating agents like autopilot. They're not. They're tools that require supervision. If you don't understand the four-stage architecture, you'll waste time debugging failures you can't diagnose. If you do understand it, you'll write better prompts, catch errors faster, and ship code that actually works.

Junie is free to try in JetBrains IDEs. If you want to see how a production agent handles real codebases, start there.

Source: Junie