"Harness Engineering Series Guide — Where Should You Start If You Want to Design and Run AI Agents Properly?"

May 18, 2026

📚 Harness Engineering Master Map — 7 Series, 29 Parts

A step-by-step map to read the agent harness series from foundations to appendix.

1. Foundations

① Harness Engineering (6 parts) — Context, memory, tools, routing, evaluation — everything around an agent
② Harness Engineering Basics (4 parts) — The work environment and agent loop that matter more than the model

2. Implementation

③ OpenAI and Claude Harnesses (3 parts) — Reading Responses API and CLAUDE.md as operating surfaces

3. Operations

④ Evaluation, Ops, and Memory (4 parts) — Handoff, guardrails, and memory ownership for long-running agents
⑤ AI Operations Economics (4 parts) — Cost, routing, caching, and context decisions

4. Strategy

⑥ Patterns, Strategy, and Cases (4 parts) — Repeatable structures, design decisions, ACI, and case analysis

5. Appendix

⑦ Companion Appendix (4 parts) — Glossary, source verification, workbooks, and multi-agent Q&A

This series goes one step beyond the expectation that "a good model automatically makes a good agent." It explains how the working environment around the model (instruction files, tool surfaces, permissions, verification, handoffs, and memory) is what really separates strong agents from weak ones. It spans 19 posts in five tracks, so readers can enter at the point that fits their needs, whether they are beginners or responsible for operational design.

Korean version: 하네스 엔지니어링 시리즈 안내 — AI 에이전트를 제대로 설계·운영하려면 무엇부터 읽어야 하나

Who this series is for

This series is designed for readers who fall into one of the following groups.

People who are just starting to adopt AI agents but want to understand the structure beyond prompts
People who want to compare OpenAI and Claude not by "model quality" but by "operational surface"
People trying to design a practical harness that includes long-running work, permission control, evaluation loops, and memory ownership
People who want to organize concepts like AGENTS.md, CLAUDE.md, MCP, handoff, and subagent into one coherent framework

If you are only looking for a simple model introduction or API basics, note that this series focuses on operational design rather than on those topics.

What you will get from this series

This series revolves around three core ideas.

Agent quality diverges more because of harness design than because of the model itself.
A good harness is not about adding more features. It is about creating clearer work boundaries and stronger verification loops.
Long-running operation and multi-agent design depend less on a "smart model" and more on operating structures such as artifact, handoff, permission, and memory ownership.

So instead of giving only broad summaries, this series is structured to answer questions like these.

What does it really mean to design a work environment rather than just choose a model?
Where do OpenAI and Claude place the harness, and how do they split it differently?
At which layer should evaluation, permissions, sandboxing, and audit logs be designed?
Why can you not truly own an agent if you do not own its memory?
When is a subagent enough, and when do you actually need an agent team?

Track guide

A. Foundations Series

This track explains what a harness actually is. It begins by aligning the basic vocabulary: the difference between prompts and harnesses, the agent loop, context design, MCP, and the tool surface.

B. OpenAI and Claude Implementation Series

This track reads the two ecosystems not as a "which one is better" argument but through the question of where the harness lives. It compares the operational philosophies revealed by the Responses API, the Agents SDK, CLAUDE.md, skills, hooks, and permissions.

C. Evaluation, Operations, and Memory Series

This track sits closest to real-world operations. It deals with how to make an agent system hold up in practice through evaluation loops, handoffs, permissions, sandboxing, audit logs, and memory ownership.
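To make the idea of a handoff with explicit memory ownership a little more concrete, here is a minimal Python sketch. It is not taken from the series posts: the `HandoffNote` class, its fields, and the `write_handoff` helper are hypothetical names chosen for illustration. It only shows one possible shape, a structured note that refuses to be written when key fields are missing.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class HandoffNote:
    """Hypothetical handoff record passed from one agent run to the next."""

    task: str
    done: list[str]
    remaining: list[str]
    artifacts: list[str]   # paths or IDs of outputs the next run should read
    memory_owner: str      # who is responsible for the stored memory (assumed field)

    def problems(self) -> list[str]:
        """Return human-readable reasons why this note is incomplete."""
        issues = []
        if not self.task.strip():
            issues.append("task is empty")
        if not self.done and not self.remaining:
            issues.append("no progress or remaining work recorded")
        if not self.memory_owner.strip():
            issues.append("memory_owner is not set")
        return issues


def write_handoff(note: HandoffNote) -> str:
    """Serialize the note to JSON, failing loudly if it is incomplete."""
    issues = note.problems()
    if issues:
        raise ValueError("incomplete handoff: " + "; ".join(issues))
    return json.dumps(asdict(note), indent=2, sort_keys=True)
```

The design choice being illustrated is that an incomplete handoff raises an error instead of being written silently, so the failure is visible at the moment of handoff rather than in the next run.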

D. Patterns, Strategy, and Case Studies Series

This track is useful when you want to look at the whole structure instead of isolated features. Through a pattern language, architectural decisions, the ACI viewpoint, and public case comparisons, it places agent systems inside a larger map.

E. Appendix Companion Series

This track provides practical assets that support the main series. It revisits confusing terms, separates source verification methods, offers worksheets for turning real work into a harness, and clarifies how to think about multi-agent boundaries.

If you only have limited time

If you want the core ideas by the shortest path, these six posts are enough.

A1 Why the harness matters more than the model
A4 MCP and tool engineering
B3 OpenAI vs Claude comparison
C2 Long-running agents and handoffs
C4 Memory ownership
D4 Public case study comparison

Even with just these six, you will grasp the central message of the series: strong agents are shaped more by operating structure than by the model itself.

The one point this series keeps emphasizing

This series repeats the same idea from different angles.

The quality of an AI agent depends more on how you design the working environment around the model than on the model itself.

That is why harness engineering is less a prompt-writing technique than a form of operational design: it makes failure harder to cause and, when failure does happen, faster to see.

Attaching a good model is not enough. You also have to design what it can read, which tools it can see, where it must stop, who approves the next step, and what trace it leaves behind. This series breaks that problem down across 19 posts.
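The five design questions above (what the agent can read, which tools it can see, where it must stop, who approves the next step, and what trace it leaves) can be sketched as a small policy object. This is an illustrative Python sketch under stated assumptions, not an API from OpenAI, Claude, or the series itself: `HarnessPolicy`, its field names, and the decision strings are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Optional


@dataclass
class HarnessPolicy:
    """Hypothetical harness policy; every name here is an assumption."""

    readable_globs: list[str]        # what the agent can read
    visible_tools: set[str]          # which tools it can see
    approval_required: set[str]      # tools that need a human approver
    max_steps: int                   # where it must stop
    trace: list[dict] = field(default_factory=list)  # what trace it leaves

    def can_read(self, path: str) -> bool:
        """True if the path matches at least one readable glob pattern."""
        return any(fnmatch(path, pattern) for pattern in self.readable_globs)

    def check_tool_call(
        self, step: int, tool: str, approved_by: Optional[str] = None
    ) -> str:
        """Decide on a tool call and append the decision to the trace."""
        if step >= self.max_steps:
            decision = "stop"
        elif tool not in self.visible_tools:
            decision = "deny"
        elif tool in self.approval_required and approved_by is None:
            decision = "needs_approval"
        else:
            decision = "allow"
        self.trace.append(
            {"step": step, "tool": tool, "approved_by": approved_by, "decision": decision}
        )
        return decision


# Usage: a policy that exposes two tools and gates one behind approval.
policy = HarnessPolicy(
    readable_globs=["docs/*"],
    visible_tools={"search", "deploy"},
    approval_required={"deploy"},
    max_steps=3,
)
print(policy.check_tool_call(0, "deploy"))                     # needs_approval
print(policy.check_tool_call(1, "deploy", approved_by="ops"))  # allow
```

Every decision, including denials and stops, is appended to `trace`, which is the point of the sketch: the boundary checks and the audit record live in the harness, outside the model.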

What to read next

If you are starting the series from the beginning, the most natural next step is Harness Engineering Basics (1/4) — Why the Work Environment Matters More Than the Model. If you are already working with OpenAI or Claude, you can jump straight to OpenAI vs Claude Harnesses (3/3) — The Difference in Operating Philosophy.

If you want the Korean reading path, use the Korean landing post.

MaJu Tech Notes