CodeHarness

Most coding-agent failures are not model failures. They are repo-context failures. Better prompts help. But real repositories need instructions, architecture context, validation rules, and decision history.
Coding Agents Need Better Repositories, Not Just Better Prompts published

Coding Agents Need Better Repositories, Not Just Better Prompts

Most coding-agent failures are not model failures. They are repo-context failures. Better prompts help. But real repositories need instructions, architecture context, validation rules, and decision history.

Coding Agents Need Better Repositories, Not Just Better Prompts

The most common way teams try to improve coding agents is with better prompts.

More detailed instructions. Better system messages. Longer context windows. More examples.

Sometimes this helps. Often it does not — and nobody can explain why.

Here is the real problem: coding agents fail most when the repository does not tell them what they need to know.


A different way to think about it

A coding agent has two sources of context:

Prompt context — what you send at the start of a session: task description, constraints, which tool to use, what “good” means.

Repo context — what the repository itself contains: architecture notes, test commands, boundary definitions, decision records, conventions, validation rules.

Most teams invest entirely in prompt context. They send increasingly elaborate instructions. They build better prompting libraries. They try prompts that are 3,000 words long.

Meanwhile the repository stays exactly as it was — a pile of code and a README.

This is the wrong split.

Prompts are temporary. They apply to one task. They do not persist across sessions, and they cannot encode the growing body of project decisions that make a codebase coherent.

Repo context is durable. It accumulates. Every session starts with it. It makes the same instructions unnecessary to repeat.


What goes wrong

When repo context is missing, agents fill in the blanks. Sometimes they guess correctly. Often they do not.

Here are the failure modes that show up repeatedly in coding-agent sessions:

Architectural guessing. The agent sees a monorepo with 12 modules and starts editing files without understanding which boundaries matter. It touches an internal package that other packages depend on. The changes compile. Tests pass in isolation. Then 40 minutes later a human discovers that 6 behaviors broke silently.

Validation blind spots. The agent ships a feature and describes it as “done.” The human reviewer finds 3 missing test cases. The agent did not run the integration suite because it did not know it existed. The repo had no explicit instruction to distinguish unit tests from integration tests.

Reversal of settled decisions. The agent “improves” a piece of code by introducing a pattern the team explicitly rejected 8 months ago. Nobody documented the decision. The agent cannot see that it is reopening a closed issue.

Task boundary confusion. The agent goes way beyond the requested scope. Or it stops too early because it had no clear definition of done. Both happen when tasks arrive without a structured format.

Unsafe discovery. The agent creates an API endpoint and writes no docs because the repo never said “docs are required for public interfaces.” That context was in someone’s head. Now it is in nobody’s head.

None of these are model intelligence failures. They are information architecture failures. The repo is not communicating what it needs to communicate.


The prompt-first trap

Teams enter the prompt-first trap in a predictable sequence:

  1. Agent makes a bad change
  2. Team improves the prompt
  3. Agent makes a different bad change
  4. Team adds more instructions to the prompt
  5. The prompt becomes a 2,000-word document that nobody reads and the agent ignores anyway
  6. The team declares coding agents unreliable

Step 6 is wrong. The agent is unreliable because the repo gave it no durable context to work from.

Prompts are horizontal — they apply across different repos. Repo context is vertical — it is specific to one project. Vertical context is always more valuable than horizontal context for project-specific work.


What better repositories provide

A good repository for coding agents gives agents the same context a good onboarding gives a human developer:

What this project is — not just the README title, but what it does and who it is for.

Which files matter — a map of the directory structure and why things are where they are.

What rules are non-negotiable — coding conventions, architectural boundaries, safety constraints.

What checks prove correctness — exact commands, not just “run tests.”

Which decisions are already made — what the team rejected and why, so agents do not reopen it.

How to structure tasks — how a feature request becomes a change, not just “do the thing.”

This is not a giant document. A useful version fits on one screen.

The harness-experimental project is one attempt at encoding this for real repositories. It provides templates, checklists, and structural conventions that agents can read at the start of every session.


Why repository context compounds

Prompt improvements apply to one session. Repository improvements apply to every session, forever, for every agent and every developer who touches the repo.

If you improve your AGENTS.md file today, every future coding-agent session is better — without changing any prompts.

This is leverage that prompting cannot match.

It also means the repository itself becomes an asset. Better structure = better agents = better output = higher confidence in AI-assisted development.


The practical starting point

You do not need to rebuild your repository. You need to add three things:

1. An AGENTS.md file — at the root of the repo. 500 words maximum. Describes the project, where to read first, what not to touch, and which commands prove correctness.

2. A validation matrix — a small table that maps change types to exact commands. Frontend changes: run these tests. Documentation: run this lint. API changes: run this integration suite.

3. A decision log — three lines about each architectural decision that would otherwise be invisible. “We chose PostgreSQL over MongoDB because —.” “We do not use class inheritance in domain models.” These become boundaries agents learn from.

That is enough to change what coding agents do in the repo. The rest is iteration.


FAQ

Are better prompts still useful?

Yes. Prompts describe the current task. Repo context provides durable operating environment. Both are necessary. The mistake is confusing the two — using prompts to carry context that should live in the repo.

What repo context matters most?

Exact validation commands. Most agent failures happen because the agent did not know which checks to run. If nothing else, make sure every repository makes clear exactly how to validate a change before handing it back.

How does this apply to small repos?

Even more so. Small repos often have less documented architecture, which means agents infer more and guess more. An AGENTS.md file costs 30 minutes and prevents a lot of wasted iteration.

Is this only for autonomous agents?

No. Human-guided agents benefit just as much. Humans still decide what to build, but agents that have good repo context make better suggestions, ask better questions, and produce more reviewable changes.


What to do next

Audit your current repository. Open the repo as if you were a coding agent seeing it for the first time. What do you not know that you would need to know to make a useful change without guessing?

That gap is what repo context should fill.

Try harness-experimental as a starting point for adding this structure to your own projects. Or open an issue with a failure case — sharing what went wrong in a coding-agent session helps the community build better patterns.


See also: What Is an Agent-Ready Repository? — a practical checklist and reference for making any repository more legible to AI coding agents.