Harness Engineering for Coding Agents: The Operational Workflow
Harness engineering applies the context engineering framework to the actual coding-agent workflow — feature intake, story packets, agent execution, validation, review, and decision recording.
Harness Engineering for Coding Agents: The Operational Workflow
Context engineering gives coding agents durable repository context. Harness engineering applies that same principle to the operational workflow — the steps between receiving a task and handing back a change.
Where context engineering asks “what does the repo know?”, harness engineering asks “what process makes an agent’s output reliable and reviewable?”
This distinction matters because knowing the repo is not enough. Agents need a structured path through a task.
The problem with unstructured task delivery
Most coding-agent sessions start with a prompt like:
“Add a user profile page with avatar upload.”
That is task context at layer 1 — useful, but not enough for a reliable outcome.
The agent has to figure out:
- What files to create or modify
- What the acceptance criteria actually are
- What “done” means for this specific change
- Which validation commands apply
- What not to touch while building
Without a structured task format, the agent infers all of this. Sometimes it infers correctly. Often it does not — and the human reviewer discovers the gap only after the agent has handed back work.
Harness engineering makes that process explicit.
The harness workflow, step by step
Step 1 — Feature intake
Before the agent receives anything, the task is captured in a structured format.
## Feature: User avatar upload
**Problem being solved:**
Users currently have no profile picture option. This creates friction
in community features that depend on visual identification.
**User outcome:**
A user can upload a JPEG or PNG avatar (max 2MB) from their profile
settings page. The image is resized to 200×200 and stored in S3.
**Relevant files:**
- `src/pages/profile.tsx` — existing profile page
- `src/components/Avatar.tsx` — existing avatar component
- `src/api/avatar_upload.py` — new endpoint to create
- `infra/s3.tf` — S3 bucket configuration
**Constraints:**
- Do not modify the auth layer.
- Do not change the existing Avatar component interface.
- Avatar upload must work without page reload.
**Acceptance criteria:**
- [ ] Upload succeeds for valid JPEG/PNG under 2MB
- [ ] Upload fails cleanly with descriptive error for files > 2MB
- [ ] Upload fails cleanly for non-image file types
- [ ] New avatar displays immediately after upload without page reload
- [ ] `npm test -- --testPathPattern=avatar` passes
- [ ] `npm run typecheck` passes
- [ ] No new console errors in browser
**Validation commands:**
`npm test -- --testPathPattern=avatar && npm run typecheck && npm run build`
This is not a long document. It is a shared agreement about what the agent should produce.
The agent reads this instead of inferring scope. The reviewer evaluates against acceptance criteria instead of guessing what “done” means.
Step 2 — Story packet activation
The feature intake becomes a story packet — a focused, bounded unit of work the agent can reason about in a single session.
Story packets share a common structure:
## Story packet
**What to build:** [one sentence]
**Why it matters:** [one sentence]
**Where to work:** [specific files and directories]
**What to validate:** [exact commands]
**When to stop:** [acceptance criteria checklist]
**What not to touch:** [specific boundaries]
The agent works from the story packet, not the raw prompt. This is the harness’s most important function: it replaces a vague ask with a reviewable specification.
Step 3 — Agent execution
The agent reads the repository context (AGENTS.md, architecture notes, decision records), then executes the story packet.
During execution, the agent is expected to:
- Check relevant files before modifying them
- Run validation commands before reporting completion
- Flag anything that blocks progress before changing direction
- Ask a human before touching a safety boundary
The agent does not need to be told these things every time — the AGENTS.md and story packet encode them.
Step 4 — Validation gate
Before handing back work, the agent runs the exact validation commands from the story packet.
If checks fail, the agent fixes the failure and re-runs. No human review until the validation gate passes.
This shifts review burden from “did the agent produce something plausible?” to “does the validated output meet the acceptance criteria?” — a much faster human review.
Step 5 — Human review and decision recording
The human reviews the validated output against the acceptance criteria.
If something is wrong that the harness should have prevented, the gap becomes a harness improvement:
- The agent missed a validation step → add it to the story packet template
- The agent touched a file it should not have → add it to the safety boundaries in AGENTS.md
- The agent reopened a settled decision → add a decision record so future agents can see it
If the output is correct, the change merges. The harness does not need to change.
Step 6 — Decision recording
When a non-obvious choice was made during implementation, it gets recorded:
# Decision: Chose client-side avatar resize before upload
**Date:** 2026-05-28
**Status:** accepted
**Context:** Avatar uploads were triggering server-side timeout for large
files. Resizing on the client before upload avoids the timeout.
**Decision:** Resize in-browser using canvas before uploading to S3.
**Why:** Reduces server load, eliminates timeout edge cases, provides
instant feedback in the browser.
**What would reopen this:** A demonstrated need for server-side thumbnail
generation with multiple resolution variants.
Decision records are the Layer 4 context that makes future agents smarter. Without them, every new agent potentially repeats the same exploration that led to the original decision.
How this connects to the content cluster
This workflow depends on the agent-ready repository foundation:
- Repo context (AGENTS.md, architecture notes) → gives the agent the map to work within
- Decision records → prevents agents from reopening settled questions
- Story packets → replaces vague prompts with reviewable specifications
- Validation matrix → gives agents exact commands for each change type
- Harness workflow → structures the task delivery and review process
Together these form a complete system: the repo knows what it is, the harness knows how work flows through it.
When to use this workflow
Not every task needs a full story packet. The harness scales:
| Task size | Story packet | Decision records | Validation gate |
|---|---|---|---|
| Docs fix | Lightweight | Not needed | Lint only |
| Small component change | Standard | Not needed | Unit tests |
| Feature with new API | Full story packet | Record non-obvious choices | Full suite |
| Architecture change | Full story packet + review step | Required | Full suite + human sign-off |
The investment in process matches the risk and scope of the change.
Start with one story packet
You do not need a complete harness before seeing results.
Pick one task. Write one story packet with:
- What to build (one sentence)
- Where to work (specific files)
- What “done” looks like (acceptance criteria)
- Exact validation commands
Run it with a coding agent. Watch whether the output is more focused, more reviewable, and closer to what you expected.
If yes, write the next story packet with slightly more structure. The harness grows from real tasks, not from templates applied in advance.
See also: Context Engineering for Coding Agents — the foundational framework. What Is an Agent-Ready Repository? — the repo-level checklist. How to Write AGENTS.md That Actually Works — the repo instruction template.