"Harness Engineering Series Guide — Where Should You Start If You Want to Design and Run AI Agents Properly?"
Harness Engineering Master Map — 7 Series, 29 Parts
A step-by-step map to read the agent harness series from foundations to appendix.
1. Foundations
① Harness Engineering (6 parts) — Context, memory, tools, routing, evaluation — everything around an agent
② Harness Engineering Basics (4 parts) — The work environment and agent loop that matter more than the model
2. Implementation
③ OpenAI and Claude Harnesses (3 parts) — Reading Responses API and CLAUDE.md as operating surfaces
3. Operations
④ Evaluation, Ops, and Memory (4 parts) — Handoff, guardrails, and memory ownership for long-running agents
⑤ AI Operations Economics (4 parts) — Cost, routing, caching, and context decisions
4. Strategy
⑥ Patterns, Strategy, and Cases (4 parts) — Repeatable structures, design decisions, ACI, and case analysis
5. Appendix
⑦ Companion Appendix (4 parts) — Glossary, source verification, workbooks, and multi-agent Q&A
This series moves one step beyond the expectation that "a good model automatically makes a good agent." It explains how the working environment around the model (instruction files, tool surfaces, permissions, verification, handoffs, and memory) is what really separates strong agents from weak ones. It spans 19 posts in five tracks, so readers can enter at the point that fits their needs, whether they are beginners or responsible for operational design.
Korean version: Harness Engineering Series Guide (Korean edition)
Who this series is for
This series is designed for readers who fall into one of the following groups.
- People who are just starting to adopt AI agents but want to understand the structure beyond prompts
- People who want to compare OpenAI and Claude not by "model quality" but by "operational surface"
- People trying to design a practical harness that includes long-running work, permission control, evaluation loops, and memory ownership
- People who want to organize concepts like AGENTS.md, CLAUDE.md, MCP, handoff, and subagent into one coherent framework
If you are only looking for a simple model introduction or API basics, be aware that this series focuses on operational design rather than those topics.
What you will get from this series
This series revolves around three core ideas.
- Agent quality diverges more because of harness design than because of the model itself.
- A good harness is not about adding more features. It is about creating clearer work boundaries and stronger verification loops.
- Long-running operation and multi-agent design depend less on a "smart model" and more on operating structures such as artifact, handoff, permission, and memory ownership.
So instead of giving only broad summaries, this series is structured to answer questions like these.
- What does it really mean to design a work environment rather than just choose a model?
- Where do OpenAI and Claude place the harness, and how do they split it differently?
- At which layer should evaluation, permissions, sandboxing, and audit logs be designed?
- Why can you not truly own an agent if you do not own its memory?
- When is a subagent enough, and when do you actually need an agent team?
Recommended reading order
The safest default path is A -> B -> C -> D -> E.
- A builds the shared language and conceptual map.
- B helps you compare how OpenAI and Claude expose their implementation surfaces.
- C focuses on the evaluation, handoff, safety, and memory issues that show up in real operations.
- D broadens the view through patterns, strategy, and public case studies.
- E works like a companion asset set with a glossary, verification methods, worksheets, and Q&A.
That said, not everyone needs to read the series strictly from the beginning. You can also start from the track that matches your situation.
- Beginner: A1 -> A2 -> A3 -> A4
- Need the OpenAI/Claude comparison first: B1 -> B2 -> B3
- Responsible for operational design: C1 -> C2 -> C3 -> C4
- Want patterns and structural comparisons: D1 -> D2 -> D3 -> D4
- Need working notes and worksheets: E1 -> E2 -> E3 -> E4
Track guide
A. Foundations Series
This track explains what a harness actually is. It starts by aligning the basic language first, covering the difference between prompts and harnesses, the agent loop, context design, MCP, and the tool surface.
- Harness Engineering Basics (1/4) — Why the Harness Matters More Than the Model in AI Agents
- Harness Engineering Basics (2/4) — How AI Agents Actually Work: Context, Tool Calls, and the Agent Loop
- Harness Engineering Basics (3/4) — Why Context Design and Instruction Files Matter More Than Prompts
- Harness Engineering Basics (4/4) — MCP and Tool Engineering: Design the Tool Surface for AI Agents
B. OpenAI and Claude Implementation Series
This track reads the two ecosystems not as a "which one is better" argument, but through the question of where the harness lives. It compares what operational philosophy is revealed by things like the Responses API, the Agents SDK, CLAUDE.md, skills, hooks, and permissions.
- Building an OpenAI Harness (1/3) — Understanding the Responses API, Tools, and Agents SDK from a Practical Perspective
- Building a Claude Harness (2/3) — How Should You Split CLAUDE.md, Skills, Hooks, and Permissions?
- OpenAI vs Claude (3/3) — What Is Actually Different About Their AI Agent Harness Design?
C. Evaluation, Operations, and Memory Series
This track sits closest to real-world operations. It deals with how to make an agent system hold up in practice through evaluation loops, handoffs, permissions, sandboxing, audit logs, and memory ownership.
- Agent Evaluation Harness (1/4) — How to Validate AI Outputs with Tests, Rubrics, and Regression Evaluation
- Long-Running Agent Operations (2/4) — Designing Handoffs That Keep Work Moving Even When Context Breaks
- AI Operational Safeguards (3/4) — Designing Permission Control, Approval Loops, Sandboxing, and Audit Logs
- Memory Ownership (4/4) — Why You Need to Own Your AI Agent's Memory Directly
D. Patterns, Strategy, and Case Studies Series
This track is useful when you want to look at the whole structure instead of isolated features. Through a pattern language, architectural decisions, the ACI viewpoint, and public case comparisons, it places agent systems inside a larger map.
- 12 Harness Patterns (1/4) — Twelve Structures That Make Good AI Agents Repeatable
- Seven Design Decisions in Harness Engineering (2/4) — Single Agent vs Multi-Agent, Thin Harness vs Thick Harness
- The Harness Is Everything (3/4) — Why ACI and Agent-First Engineering Matter
- Learning Harness Engineering Through Real Cases (4/4) — How OpenAI, Anthropic, Vercel, GitHub, and Cursor Designed Their Systems
E. Appendix Companion Series
This track provides practical assets that support the main series. It revisits confusing terms, separates source verification methods, offers worksheets for turning real work into a harness, and clarifies how to think about multi-agent boundaries.
- Harness Appendix E1 — Glossary and Cheat Sheet: From AGENTS.md to handoff in One Pass
- Harness Appendix E2 — Source Map and Verification Method: How Should You Separate Primary Sources, Working Notes, and Case Citations?
- Harness Appendix E3 — My Work Harness Worksheet: Design Cards for Turning Repetitive Work into an AI Agent Harness
- Harness Appendix E4 — Subagent vs Agent Teams: When Should You Delegate, and When Should You Build a Team?
If you only have limited time
If you want the core ideas in the shortest path, these six posts are enough.
- A1: Why the harness matters more than the model
- A4: MCP and tool engineering
- B3: OpenAI vs Claude comparison
- C2: Long-running agents and handoffs
- C4: Memory ownership
- D4: Public case study comparison
Even with just these six, you will grasp the central message of the series: strong agents are shaped more by operating structure than by the model itself.
The one point this series keeps emphasizing
This series repeats the same idea from different angles.
The quality of an AI agent depends more on how you design the working environment around the model than on the model itself.
That is why harness engineering is less a prompt-writing technique than a form of operational design: it makes failure harder to cause and faster to see when it does happen.
Attaching a good model is not enough. You also have to design what it can read, which tools it can see, where it must stop, who approves the next step, and what trace it leaves behind. This series breaks that problem down across 19 posts.
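As a rough illustration of those five design questions, the sketch below models them as one small policy object. It is not taken from any post in the series or from any vendor SDK: the class name HarnessPolicy, its fields, the tool names, and the "release-manager" approver are all hypothetical, chosen only to show how read scope, tool visibility, stop points, approval, and traces can live in one place.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HarnessPolicy:
    """Hypothetical harness policy; every name here is an illustrative assumption."""
    readable_paths: set[str]      # what the agent can read (path prefixes)
    visible_tools: set[str]       # which tools it can see
    approval_required: set[str]   # tools where it must stop and wait
    approver: str                 # who approves the next step
    audit_log: list[dict] = field(default_factory=list)  # the trace it leaves behind

    def can_read(self, path: str) -> bool:
        """Allow a read only if the path starts with an allowed prefix."""
        return any(path.startswith(prefix) for prefix in self.readable_paths)

    def request_tool(self, tool: str, approved_by: Optional[str] = None) -> str:
        """Return 'denied', 'needs_approval', or 'allowed', and record the decision."""
        if tool not in self.visible_tools:
            decision = "denied"
        elif tool in self.approval_required and approved_by != self.approver:
            decision = "needs_approval"
        else:
            decision = "allowed"
        self.audit_log.append(
            {"tool": tool, "approved_by": approved_by, "decision": decision}
        )
        return decision


policy = HarnessPolicy(
    readable_paths={"docs/", "src/"},
    visible_tools={"read_file", "run_tests", "deploy"},
    approval_required={"deploy"},
    approver="release-manager",
)

print(policy.request_tool("run_tests"))                             # allowed
print(policy.request_tool("deploy"))                                # needs_approval
print(policy.request_tool("deploy", approved_by="release-manager")) # allowed
print(policy.request_tool("delete_repo"))                           # denied
print(len(policy.audit_log))                                        # 4
```

Every request is logged, including the denied and paused ones, which is the point of the sketch: a refusal that leaves no trace is hard to review later.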
What to read next
If you are starting the series from the beginning, the most natural next step is Harness Engineering Basics (1/4) — Why the Harness Matters More Than the Model in AI Agents. If you are already working with OpenAI or Claude, you can jump straight to OpenAI vs Claude (3/3) — What Is Actually Different About Their AI Agent Harness Design?
If you want the Korean reading path, use the Korean landing post.