How Coding Agents Work

How Coding Agents Work

Think of an AI Coding Agent as a project manager that sits between the developer, the code, and a powerful AI “brain”, a so-called Large Language Model (LLM). The Coding Agent runs locally on your Mac or PC, while the LLM lives in the cloud. Here’s how this all works together.

1. The Developer Gives a Command

A developer types a command in plain English, like this:

Fix the bug described in ticket ENG-280670, create a pull request for it,and create an announcement for the Teams channel.

This request is sent to a local AI coding Coding Agent (like Claude Code or GitHub Copilot).

2. The Coding Agent Gathers Information

Before the cloud LLM can begin thinking, the Coding Agent must collect all relevant information. This information is stored in the LLM’s short-term memory, known as the context window. Just like the LLM, the context window lives in the cloud, and the local Coding Agent feeds data to it.

The Coding Agent uses its built-in tools to:

  • Read source code,
  • Search the codebase for related code,
  • Run command-line tools (like git status to see file changes or ./gradlew test to run tests) and see the result,
  • Browse the web for documentation or solutions to errors.

Developers can give special instructions for the LLM (e.g., CLAUDE.md for Claude Code or AGENT.md for other tools). That file is always in the LLM’s short-term memory.

3. The Coding Agent Reads the Jira Ticket

The Coding Agent can “talk to the world” with the so-called Model Context Protocol (MCP). MCP is a “language” for the Coding Agent to talk to Jira, GitHub, Dynatrace, local browsers, and other systems and software. Here, the Coding Agent connects to Jira with MCP to pull the description for ticket ENG-280670. This text is added to its short-term memory, providing the LLM with a comprehensive understanding of the bug.

4. The LLM Thinks and Plans

The Coding Agent sends all the collected information—the developer’s request, the Jira ticket, code snippets, and search results—to the LLM. The LLM analyzes everything and decides the best next step. That step could be suggesting a code change directly or deciding to use another tool (e.g., “I need to read another file first”).

5. A Cycle of Coding and Testing

The LLM doesn’t solve the whole problem at once. It works in a loop:

  • Think: The LLM decides on a small action.
  • Act: The Coding Agent uses a tool to perform the action.
  • Observe: The result of that action is added to the Coding Agent’s memory.
  • Repeat: The LLM thinks about the next step with this new information.

This cycle continues until the task is complete.

6. Applying Code Fixes

The Coding Agent changes the code, often one small piece at a time. For example, it might modify a single function, compile that part of the code, and then rerun tests for that function to see if the change works before proceeding.

7. Compiling & Testing the Whole Project

The Coding Agent runs the build command (like ./gradlew compileKotlin or npm build) to compile all classes and another command (./gradlew test or npm test) to run all tests. If the compilation or any tests fail, the Coding Agent returns to its problem-solving loop to diagnose and fixes the issues.

8. Creating New Tests

To prove the bug is truly fixed, the Coding Agent writes a new automated test that specifically checks for the original bug.

9. Preparing a Pull Request

Once the code is working and all tests pass, the Coding Agent writes a summary of the problem from the Jira ticket and the solution it created. It then connects to GitHub to create a pull request (PR), putting this summary in the description for human reviewers.

10. Create Announcement for Teams

Finally, the Coding Agent creates a shortened version of the summary from the previous step and displays it. It’s ready for copy & paste into Teams!

Under the Hood

How LLMs Work

LLMs are the brains. But under the hood, they are only supercharged autocomplete engines that read most text on the Internet - books, articles, code, websites, discussion forums, and so on. That’s the “world knowledge”.

But LLMs didn’t “learn” facts – they learned patterns and probabilities. So when you ask an LLM, “What is Paris?”, it does not look up “Paris” in a database for an answer. Instead, it starts with your words and calculates the most statistically probable response, which is “The capital city of France.” Other possible answers, like “The prince of Troy in Greek mythology,” are much less likely.

This is world knowledge – but not a world model. The LLM has no idea of what “city” is or what “France” means.

This is why LLMs can sound so fluent and knowledgeable, but also why they can “hallucinate” - confidently state something that is statistically plausible but factually wrong. Technically, LLMs can’t lie, as they have no concept of truth. It’s all just words and probabilities!

LLMs don’t think logically as humans do, but they simulate reasoning. When you ask a multi-step question (“If Tom is taller than Anna, and Anna is taller than Ben, who is the tallest?”), the model copies patterns it saw during training — step-by-step thinking, logic puzzles, code fixes, etc. This works surprisingly well, especially when the model is asked to “think out loud” (called chain-of-thought reasoning). But it can still make mistakes if the steps are too long, ambiguous, or unfamiliar.

Some models have a “thinking” mode (slow, but high-quality) and an “instant” mode (fast, lower quality). “Thinking” mode is like a person pausing to think before answering, while “instant” is blurting out the first response that sounds right. Under the hood, “thinking” often searches multiple answers in parallel and uses a ranking Coding Agent or model to pick the best one.

LLMs Are Not Deterministic

A computer program always delivers the same output for the same input. That’s called deterministic.

LLMs are not deterministic: Ask the LLM the same question twice, and you’ll get different answers. That’s like asking a creative person to solve the same problem multiple times. This isn’t a bug – it’s how the LLM generates diverse solutions.

This may be the hardest thing for developers to understand: No matter how often an LLM has been in the same situation, there’s always the chance it will do something different this time. That’s especially true because LLMs always forget and never learn from you (see below). But even feeding detailed documentation to the LLM cannot force it to take the same path every time.

LLMs Are The Smartest Junior Developers in the World

The Oxford English Dictionary has more than 170,000 English words which can be turned into trillions of possible sentences. The programming language Java has less than 80 reserved words, plus names for types, methods, and variables. And there’s a specification that defines how these words are put together. For instance, an if is always followed by (…) then. No wonder that LLMs are good at writing code!

Still, predicting what should come next is easier for some languages than for others. With popular languages like Java, JavaScript, or Python, the LLM has seen millions of examples, so it’s like that well-prepared student, usually accurate and confident.

But with rare languages like Haskell or Elm, much less training data means more “educated guessing” and a higher chance of inventing functions or syntax that don’t exist. And for your specific codebase, the LLM has never seen your project before, so it might suggest using variables or functions that don’t exist in your code.

This is why it’s crucial to the LLM as much as possible about your application and your development environment. The more context and constraints you provide, the less room there is for creative misinterpretation, even when starting fresh.

LLMs Always Forget

An LLM’s “context window” is its short-term memory, and its size involves a critical trade-off.

A small context window is like a tidy desk: the LLM can work quickly and stay focused on the files right in front of it. However, it might miss the bigger picture and create solutions that don’t fit with the rest of the project.

Conversely, a large context window is like a whole room full of documents: While all the necessary information is likely in there somewhere, asking the LLM to find the one relevant piece is slow, expensive, and risks confusion—a classic “needle in a haystack” problem.

Claude Code has a 200,000-token context window. A token is about 3-4 characters, so 200,000 tokens are roughly the same as a 50,000-word book. This is often not enough to hold the entire program in memory, especially as the context window holds everything: your entire conversation, the so-called system prompt by the Coding Agent’s vendor (telling the Coding Agent what to always do and what never to do), tools, files, and more.

When the context gets cleared (because it’s full or you start a new session), the LLM completely forgets your current project. It’s not like forgetting where your keys are - it’s forgetting your whole life: “Is that angry woman over there my wife? And who are these short people staring at their phones? Do I have kids!?” The LLM still knows how to code in general from its world knowledge, but your specific project, its architecture, naming conventions, and recent changes? Gone completely!

So an LLM is a developer who forgets everything you ever told it and everything it’s ever done twice an hour!

LLMs Never Learn From You

Think of the LLM like a brilliant graduate who studied intensively until graduation day, then never reads another book. Their foundational knowledge is incredible, but it’s frozen in time.

The LLM core programming world knowledge doesn’t change during its lifetime. It won’t learn that your team prefers certain coding patterns, remember solutions to similar bugs you’ve encountered, or adapt to your project’s evolution over time.

Occasionally, AI companies add small factual updates (like “Donald Trump is president again”) or make minor behavioral adjustments. But these are tiny tweaks, not learning.

Every interaction with your codebase starts from scratch. The LLM never builds up institutional knowledge about your project like a human team member would. It can’t remember “Last month we had a similar bug in the payment module” or “This team always uses specific naming conventions”.

This is fundamentally different from human learning. A human developer gets better at working with your specific codebase over time. The LLM remains super-smart but equally unfamiliar with your project on day one and day 1,000.

Major improvements only come with entirely new model versions, released every 6-12 months. It’s like getting an entirely new team member who’s generally more skilled but still has to learn your project from scratch, every day. LLMs are incredibly capable but work more like consulting experts who parachute in for each task rather than teammates who grow with your project over time.

LLMs Make Mistakes

LLMs make many mistaktes thaz are relevant to developers. Here are some of them.

First are the outright hallucinations: LLMs invent types, methods, variables, or libraries that do not exist. These are easy to spot - the code does not compile.

Then we have dead ends: The LLM tries a couple of solutions, none of which work. After a while, it tries the previously-failing solutions again and never solves the problem. Because the LLM always forgets and never learns from you, it’s often not aware that these solutions failed before. This mistake is obvious. 😁

Next is cheating: A test just returns “Works!” instead of doing an actual test, or a business method does not do anything, except for maybe having a “TODO later” comment. Here the code compiles and passes tests, so only a manual inspection can reveal these.

Finally, there’s the mistake of doing the right thing the wrong way: The code and tests work and do what they’re supposed to. But the “how” is wrong: The code uses the wrong pattern, or the wrong library, or the wrong approach. Again, developers must review the code to find these issues.