"Harness Engineering Series Guide — Where Should You Start If You Want to Design and Run AI Agents Properly?"

📚 Harness Engineering Master Map — 7 Series, 29 Parts

A step-by-step map to read the agent harness series from foundations to appendix.

1. Foundations

Harness Engineering (6 parts) — Context, memory, tools, routing, evaluation — everything around an agent
Harness Engineering Basics (4 parts) — The work environment and agent loop that matter more than the model

2. Implementation

OpenAI and Claude Harnesses (3 parts) — Reading Responses API and CLAUDE.md as operating surfaces

3. Operations

Evaluation, Ops, and Memory (4 parts) — Handoff, guardrails, and memory ownership for long-running agents
AI Operations Economics (4 parts) — Cost, routing, caching, and context decisions

4. Strategy

Patterns, Strategy, and Cases (4 parts) — Repeatable structures, design decisions, ACI, and case analysis

5. Appendix

Companion Appendix (4 parts) — Glossary, source verification, workbooks, and multi-agent Q&A


This series moves one step beyond the expectation that "a good model automatically makes a good agent." It explains how the working environment around the model (instruction files, tool surfaces, permissions, verification, handoffs, and memory) is what really separates strong agents from weak ones. The five tracks described below, A through E, span 19 posts in total, so readers can enter at the point that fits their needs, whether they are beginners or responsible for operational design.

Korean version: 하네스 엔지니어링 시리즈 안내 — AI 에이전트를 제대로 설계·운영하려면 무엇부터 읽어야 하나


Who this series is for

This series is designed for readers who fall into one of the following groups.

  • People who are just starting to adopt AI agents but want to understand the structure beyond prompts
  • People who want to compare OpenAI and Claude not by "model quality" but by "operational surface"
  • People trying to design a practical harness that includes long-running work, permission control, evaluation loops, and memory ownership
  • People who want to organize concepts like AGENTS.md, CLAUDE.md, MCP, handoff, and subagent into one coherent framework

If you are only looking for a simple model introduction or API basics, be aware that this series leans toward operational design rather than either of those.

What you will get from this series

This series revolves around three core ideas.

  • Agent quality diverges more because of harness design than because of the model itself.
  • A good harness is not about adding more features. It is about creating clearer work boundaries and stronger verification loops.
  • Long-running operation and multi-agent design depend less on a "smart model" and more on operating structures such as artifacts, handoffs, permissions, and memory ownership.

So instead of giving only broad summaries, this series is structured to answer questions like these.

  • What does it really mean to design a work environment rather than just choose a model?
  • Where do OpenAI and Claude place the harness, and how do they split it differently?
  • At which layer should evaluation, permissions, sandboxing, and audit logs be designed?
  • Why can you not truly own an agent if you do not own its memory?
  • When is a subagent enough, and when do you actually need an agent team?

Recommended reading order

The safest default path is A -> B -> C -> D -> E.

  • A builds the shared language and conceptual map.
  • B helps you compare how OpenAI and Claude expose their implementation surfaces.
  • C focuses on the evaluation, handoff, safety, and memory issues that show up in real operations.
  • D broadens the view through patterns, strategy, and public case studies.
  • E works like a companion asset set with a glossary, verification methods, worksheets, and Q&A.

That said, not everyone needs to read the series strictly from the beginning. You can also start from the track that matches your situation.

  • Beginner: A1 -> A2 -> A3 -> A4
  • Need the OpenAI/Claude comparison first: B1 -> B2 -> B3
  • Responsible for operational design: C1 -> C2 -> C3 -> C4
  • Want patterns and structural comparisons: D1 -> D2 -> D3 -> D4
  • Need working notes and worksheets: E1 -> E2 -> E3 -> E4

Track guide

A. Foundations Series

This track explains what a harness actually is. It starts by aligning the basic language: the difference between prompts and harnesses, the agent loop, context design, MCP, and the tool surface.

  1. Harness Engineering Basics (1/4) — Why the Harness Matters More Than the Model in AI Agents
  2. Harness Engineering Basics (2/4) — How AI Agents Actually Work: Context, Tool Calls, and the Agent Loop
  3. Harness Engineering Basics (3/4) — Why Context Design and Instruction Files Matter More Than Prompts
  4. Harness Engineering Basics (4/4) — MCP and Tool Engineering: Design the Tool Surface for AI Agents
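
As a rough illustration of the agent loop and tool surface this track introduces, here is a minimal sketch. It is not taken from the series posts: `stub_model`, `TOOLS`, `Action`, and `run_agent` are hypothetical names, and the stub stands in for a real model call. The point is only that the harness, not the model, owns the context, the list of callable tools, and the stop condition.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool surface: the only functions the agent is allowed to call.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


@dataclass
class Action:
    kind: str            # "tool" to request a tool call, "final" to stop with an answer
    name: str = ""       # tool name when kind == "tool"
    argument: str = ""   # tool input, or the final answer text


def stub_model(context: list[str]) -> Action:
    """Stand-in for a real model: request one tool result, then finish."""
    if not any(line.startswith("tool_result:") for line in context):
        return Action(kind="tool", name="word_count", argument=context[0])
    return Action(kind="final", argument=context[-1].removeprefix("tool_result:"))


def run_agent(task: str, model=stub_model, max_steps: int = 5) -> str:
    context = [task]                      # the harness owns the context
    for _ in range(max_steps):            # hard step limit so the loop always ends
        action = model(context)
        if action.kind == "final":
            return action.argument
        tool = TOOLS.get(action.name)
        if tool is None:                  # not on the tool surface: record it, do not execute
            context.append(f"tool_error:unknown tool {action.name}")
            continue
        context.append(f"tool_result:{tool(action.argument)}")
    return "stopped: step limit reached"
```

Calling `run_agent("design the tool surface")` walks the loop twice: one tool call, then a final answer built from the tool result. A model that keeps requesting a tool outside `TOOLS` never gets it executed and is cut off by the step limit.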

B. OpenAI and Claude Implementation Series

This track reads the two ecosystems not as a "which one is better" argument, but through the question of where the harness lives. It compares the operational philosophy revealed by the Responses API, the Agents SDK, CLAUDE.md, skills, hooks, and permissions.

  1. Building an OpenAI Harness (1/3) — Understanding the Responses API, Tools, and Agents SDK from a Practical Perspective
  2. Building a Claude Harness (2/3) — How Should You Split CLAUDE.md, Skills, Hooks, and Permissions?
  3. OpenAI vs Claude (3/3) — What Is Actually Different About Their AI Agent Harness Design?

C. Evaluation, Operations, and Memory Series

This track sits closest to real-world operations. It deals with how to make an agent system hold up in practice through evaluation loops, handoffs, permissions, sandboxing, audit logs, and memory ownership.

  1. Agent Evaluation Harness (1/4) — How to Validate AI Outputs with Tests, Rubrics, and Regression Evaluation
  2. Long-Running Agent Operations (2/4) — Designing Handoffs That Keep Work Moving Even When Context Breaks
  3. AI Operational Safeguards (3/4) — Designing Permission Control, Approval Loops, Sandboxing, and Audit Logs
  4. Memory Ownership (4/4) — Why You Need to Own Your AI Agent's Memory Directly

D. Patterns, Strategy, and Case Studies Series

This track is useful when you want to look at the whole structure instead of isolated features. Through a pattern language, architectural decisions, the ACI (agent-computer interface) viewpoint, and public case comparisons, it places agent systems inside a larger map.

  1. 12 Harness Patterns (1/4) — Twelve Structures That Make Good AI Agents Repeatable
  2. Seven Design Decisions in Harness Engineering (2/4) — Single Agent vs Multi-Agent, Thin Harness vs Thick Harness
  3. The Harness Is Everything (3/4) — Why ACI and Agent-First Engineering Matter
  4. Learning Harness Engineering Through Real Cases (4/4) — How OpenAI, Anthropic, Vercel, GitHub, and Cursor Designed Their Systems

E. Appendix Companion Series

This track provides practical assets that support the main series. It revisits confusing terms, explains how to separate and verify sources, offers worksheets for turning real work into a harness, and clarifies how to think about multi-agent boundaries.

  1. Harness Appendix E1 — Glossary and Cheat Sheet: From AGENTS.md to handoff in One Pass
  2. Harness Appendix E2 — Source Map and Verification Method: How Should You Separate Primary Sources, Working Notes, and Case Citations?
  3. Harness Appendix E3 — My Work Harness Worksheet: Design Cards for Turning Repetitive Work into an AI Agent Harness
  4. Harness Appendix E4 — Subagent vs Agent Teams: When Should You Delegate, and When Should You Build a Team?

If you only have limited time

If you want the core ideas in the shortest path, these six posts are enough.

  1. A1 Why the harness matters more than the model
  2. A4 MCP and tool engineering
  3. B3 OpenAI vs Claude comparison
  4. C2 Long-running agents and handoffs
  5. C4 Memory ownership
  6. D4 Public case study comparison

Even with just these six, you will grasp the central message of the series: strong agents are shaped more by operating structure than by the model itself.

The one point this series keeps emphasizing

This series repeats the same idea from different angles.

The quality of an AI agent depends more on how you design the working environment around the model than on the model itself.

That is why harness engineering is less a prompt-writing technique than a form of operational design: one that makes failure harder to cause and faster to surface when it does happen.

Attaching a good model is not enough. You also have to design what it can read, which tools it can see, where it must stop, who approves the next step, and what trace it leaves behind. This series breaks that problem down across 19 posts.
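
The five design questions above (what the agent can read, which tools it can see, where it must stop, who approves, and what trace it leaves) can be written down as one explicit policy object. The sketch below is only one possible shape, not the series' implementation: `HarnessPolicy`, its field names, and the example tool names are all hypothetical, and the prefix check is deliberately naive.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HarnessPolicy:
    readable_prefixes: set[str]   # what the agent can read (path prefixes)
    visible_tools: set[str]       # which tools it can see
    stop_before: set[str]         # tools where it must stop and wait
    approver: str                 # who approves the next step
    trace: list[str] = field(default_factory=list)  # what it leaves behind

    def check(self, tool: str, path: str, approved_by: Optional[str] = None) -> bool:
        """Return True if the call may proceed; every decision is traced."""
        if tool not in self.visible_tools:
            decision = "denied: tool not visible"
        elif not any(path.startswith(p) for p in self.readable_prefixes):
            decision = "denied: path not readable"
        elif tool in self.stop_before and approved_by != self.approver:
            decision = "paused: approval required"
        else:
            decision = "allowed"
        self.trace.append(f"{tool} {path} -> {decision}")
        return decision == "allowed"


# Example values are illustrative only.
policy = HarnessPolicy(
    readable_prefixes={"docs/"},
    visible_tools={"read_file", "delete_file"},
    stop_before={"delete_file"},
    approver="ops-lead",
)
```

With this policy, `policy.check("read_file", "docs/a.md")` proceeds, while `policy.check("delete_file", "docs/a.md")` pauses until it is called again with `approved_by="ops-lead"`. Denied and paused calls land in `policy.trace` alongside allowed ones, so failures stay visible.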

What to read next

If you are starting the series from the beginning, the most natural next step is Harness Engineering Basics (1/4) — Why the Work Environment Matters More Than the Model. If you are already working with OpenAI or Claude, you can jump straight to OpenAI vs Claude Harnesses (3/3) — The Difference in Operating Philosophy.

If you want the Korean reading path, use the Korean landing post.
