"Harness Is Everything (3/4) — Why ACI and Agent-First Engineering Matter"

May 18, 2026

When people want better agent performance, they often start with prompts or model specs. But the center of gravity is shifting. The more useful question is no longer just "How smart is the model?" but "What kind of workbench is this agent operating on?" ACI, or Agent-Computer Interface, is one of the clearest ways to describe that shift. The harness is not a side accessory. It is the way the agent experiences the world.

Key Takeaways

ACI means designing an interface for agents, not just reusing the same environment humans tolerate. It is about giving the agent a work surface that is easy to interpret and act on.
Agent-first engineering is less about swapping in a stronger model and more about shaping file layout, tool surfaces, feedback loops, handoff artifacts, and evaluation boundaries.
The same model can behave very differently depending on the interface and harness surrounding it. That is the practical meaning of "harness is everything."
Good ACI does not mainly add more capability. It reduces decision cost and blast radius.
The purpose of this post is not generic prompting advice. It is to explain why ACI and agent-first engineering matter at the strategy level.

1. Why the workbench matters more as agents become more capable

Prompting still matters. But once an agent reads files, calls tools, and continues work across steps, the center of the problem changes.

The more important questions become:

Can it find the right information quickly?
Which actions are allowed or forbidden?
How fast can failures be detected?
Where should the next session resume?
Are the tools narrow and legible?

None of those are solved by one clever instruction block alone. They are all harness questions.

That is why the progression often looks like this:

write better prompts
design better context structure
design better work environments and feedback loops

ACI is really the language of the third step: designing work environments and feedback loops.

2. ACI means an agent-oriented interface

An interface that works for humans is not automatically good for agents. Humans can skim, infer hidden context, and tolerate messy layouts. Agents are far more sensitive to structure.

What helps them more is usually:

clear names
narrow choices
structured inputs
short outputs with source context
state representations that make the next action obvious

So ACI is not mainly about visual polish. It is about what kind of surface the agent is allowed to see and operate through.

For example:

a purpose-driven directory structure instead of a giant undifferentiated file tree
separate search, read, edit, and execute surfaces instead of one universal tool
short rules plus external artifacts instead of one massive instruction wall
handoff files and progress artifacts instead of relying on conversational memory alone

All of that is ACI.
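
As a minimal sketch of the "separate surfaces" idea, the snippet below registers four narrow tools and tags each with a risk level, so a harness can expose only the surfaces a given task needs. The tool names, the `Risk` levels, and the `tools_up_to` helper are all assumptions made for illustration, not any particular framework's API.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    """Hypothetical risk tiers, ordered from least to most dangerous."""
    READ_ONLY = 1
    WRITES_FILES = 2
    RUNS_COMMANDS = 3


@dataclass(frozen=True)
class Tool:
    name: str
    risk: Risk
    description: str


# Hypothetical registry: four narrow surfaces instead of one universal tool.
TOOLS = [
    Tool("search_files", Risk.READ_ONLY, "Find files matching a query."),
    Tool("read_file", Risk.READ_ONLY, "Return the contents of one file."),
    Tool("edit_file", Risk.WRITES_FILES, "Change one file in an allowed directory."),
    Tool("run_command", Risk.RUNS_COMMANDS, "Run one command from an allowlist."),
]


def tools_up_to(max_risk: Risk) -> list[str]:
    """Return the names of tools whose risk is at or below max_risk."""
    return [t.name for t in TOOLS if t.risk.value <= max_risk.value]
```

A research-only task could then be given `tools_up_to(Risk.READ_ONLY)`, which leaves editing and execution out of the agent's view entirely rather than relying on an instruction not to use them.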

3. Why "harness is everything" is not an exaggeration

At first glance it sounds overstated. Models matter. Data matters. Prompting matters. But in tool-using agent systems, the harness heavily determines how much value those other pieces can actually deliver.

The same model can produce different outcomes depending on whether:

tool names are clear or vague
edit scope is explicit or open-ended
verification exists or not
session handoff artifacts exist or not
failures are fed back into evaluation or simply forgotten

So the harness is not decorative support. It shapes what world the model can perceive and what choices it can make inside that world.

That is the practical meaning of the phrase.

4. Agent-first engineering re-questions human-centered defaults

Traditional software structures are often optimized for human developers and operators. Filenames, scripts, docs, and workflows are allowed to stay somewhat implicit as long as humans can recover intent.

Once agents become real workers in the loop, that assumption weakens.

Agent-first engineering asks questions like:

Is this directory structure easy for an agent to interpret?
Is this document an always-on rule or an on-demand procedure?
Is this command narrow in role or overly universal?
Are logs and results easy to turn into the next decision?
If something fails, can the agent itself find the recovery path?

That is not just "using AI." It is redesigning systems so that agents can work inside them as first-class operators.

5. Good ACI gives better choices, not just more power

ACI is easy to misunderstand as "give the agent more tools and more permissions." In practice, stronger ACI often means the opposite.

Good ACI usually has these traits:

Trait	Meaning
clear signposts	it is obvious what to read first and where work belongs
narrow tool surface	overlapping tools are reduced and risks are separated
structured state	progress, handoff, and task artifacts support the next step
fast feedback	tests, lint, and rule checks expose failure early
short paths	fewer judgment hops are needed to reach the goal

So good ACI does not mainly make the agent freer. It makes the agent less confused.

6. In repositories like ours, ACI is part of quality

In a mixed workspace of content, docs, scripts, and publication boundaries, humans can already get confused. Agents will struggle even more unless the structure is explicit.

That means things like these are all part of ACI:

read-first rules in AGENTS.md or CLAUDE.md
handoff artifacts such as tasks/plan.md, tasks/handoffs/, and tasks/sessions/
maps like docs/memory-map.md
clear role separation between sources/, drafts/, docs/, and scripts/
hard boundaries such as no-publish and no-credential-edit rules

Without those, the agent repeatedly spends effort re-orienting itself. With them, more of its effort can go toward the actual task.
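
Directory roles and hard boundaries become much more useful when the harness can check them mechanically. The sketch below assumes a policy in which only `drafts/` and `tasks/` are editable and credential-like filenames are never editable. That policy, the filenames in `FORBIDDEN_NAMES`, and the `can_edit` helper are illustrative assumptions, not a description of the actual repository rules.

```python
from pathlib import PurePosixPath

# Assumed policy: top-level directories an agent may edit.
# Everything else (for example sources/, docs/, scripts/) is treated as read-only.
EDITABLE_DIRS = {"drafts", "tasks"}

# Assumed hard boundary: credential-like files are never editable.
FORBIDDEN_NAMES = {".env", "credentials.json"}


def can_edit(path: str) -> bool:
    """Return True only for relative paths inside an editable directory."""
    p = PurePosixPath(path)
    parts = p.parts
    # Reject empty paths, absolute paths, and attempts to climb out with "..".
    if not parts or p.is_absolute() or ".." in parts:
        return False
    # Reject credential-like files wherever they live.
    if parts[-1] in FORBIDDEN_NAMES:
        return False
    return parts[0] in EDITABLE_DIRS
```

With a check like this in front of the edit tool, `can_edit("drafts/blog/post.md")` passes while `can_edit("sources/note.md")` and `can_edit("drafts/../scripts/run.sh")` are refused before any file is touched.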

7. ACI matters even more in long-running work

In short one-shot tasks, interface quality can be easy to underestimate. In long-running work, it becomes much more visible.

That is because longer tasks reliably create:

context-window pressure
session breaks
loss of intermediate state
missed rules
scope drift

Longer prompts do not solve those problems well. They are handled better by:

progress artifacts
clear ownership
narrow edit boundaries
stepwise verification
handoff files

Long-running agent work therefore magnifies the importance of ACI.
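
A handoff file can be very small and still remove most of the re-orientation cost after a session break. The sketch below stores one as JSON. The `Handoff` fields and the helper names are assumptions chosen for illustration. Any format the next session can parse reliably would serve the same purpose.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Handoff:
    """Hypothetical handoff record: what was done and where to resume."""
    task: str
    done: list[str] = field(default_factory=list)
    next_step: str = ""
    open_questions: list[str] = field(default_factory=list)


def save_handoff(handoff: Handoff, path: Path) -> None:
    """Write the handoff as JSON, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(asdict(handoff), indent=2, ensure_ascii=False)
    path.write_text(text, encoding="utf-8")


def load_handoff(path: Path) -> Handoff:
    """Read a handoff file written by save_handoff."""
    return Handoff(**json.loads(path.read_text(encoding="utf-8")))
```

The `next_step` field matters most: it is the state representation that makes the next action obvious, so a new session does not have to reconstruct it from conversation history.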

8. Common failures when designing ACI

Designing only for humans

Filenames, directories, and commands may make sense to insiders while staying ambiguous for agents.

Preferring universal tools and universal rules

One giant instruction file, one do-everything script, or one broad permission profile increases decision cost.

Leaving state only inside the conversation

As soon as sessions break, work continuity gets expensive.

Detecting failure too late

If verification sits too far downstream, the agent can go wrong for a long time before anyone notices.
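
One way to keep verification close to the work is to attach a check to every step and stop at the first failure. The sketch below is a hypothetical loop, not a real framework: `run_with_early_checks`, the step names, and the simulated mistake in `write_body` are all invented for illustration.

```python
from typing import Callable

State = dict
Step = Callable[[State], None]
Check = Callable[[State], bool]


def run_with_early_checks(
    state: State, steps: list[tuple[str, Step, Check]]
) -> tuple[bool, str]:
    """Run steps in order and verify after each one.

    Returns (True, "") if every check passes, otherwise (False, name) for the
    first step whose check failed. Later steps are not run after a failure.
    """
    for name, step, check in steps:
        step(state)
        if not check(state):
            return False, name
    return True, ""


def write_title(state: State) -> None:
    state["title"] = "Harness notes"


def write_body(state: State) -> None:
    state["body"] = ""  # simulated mistake: the body is left empty


state: State = {}
steps = [
    ("title", write_title, lambda s: bool(s.get("title"))),
    ("body", write_body, lambda s: bool(s.get("body"))),
    ("summary", lambda s: s.update(summary="done"), lambda s: True),
]

ok, failed_at = run_with_early_checks(state, steps)
```

Here the run stops at the `body` step, so the `summary` step never builds on a broken intermediate result.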

9. Practical starting point for agent-first engineering

You do not need a total redesign on day one. A practical sequence is usually:

separate always-on rules from procedures loaded only when needed
make directory roles and edit boundaries explicit
split search, read, edit, and execution surfaces by risk
create artifacts the next session can resume from
move failure detection earlier in the loop

Even that is enough to reveal that many "the agent is weak" complaints were really ACI problems.
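
The first item in that sequence can be sketched in a few lines. The example below keeps a short list of always-on rules and loads a procedure only when the task calls for it. The rule wording, the procedure texts, and the `build_context` helper are assumptions for illustration only.

```python
# Always-on rules: short, and present in every session.
# They echo the no-publish and no-credential-edit boundaries; the wording is illustrative.
ALWAYS_ON_RULES = [
    "Do not publish.",
    "Do not edit credentials.",
]

# Hypothetical on-demand procedures, keyed by the kind of task that needs them.
PROCEDURES = {
    "draft": "Procedure: write new posts under drafts/ and record progress in tasks/plan.md.",
    "review": "Procedure: check a draft against its source notes before handing off.",
}


def build_context(task_kind: str) -> list[str]:
    """Return the always-on rules plus only the procedure this task needs."""
    context = list(ALWAYS_ON_RULES)
    procedure = PROCEDURES.get(task_kind)
    if procedure is not None:
        context.append(procedure)
    return context
```

The rules stay constant across sessions, while the procedural text only occupies context when it is relevant, which keeps the instruction surface short.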

10. Conclusion: the durable advantage is in workspace design

Harness engineering becomes strategic for a simple reason: many teams can now access similarly capable models. In that world, the larger difference often comes not from who can call the model, but from what kind of workbench the model sees and acts through.

ACI and agent-first engineering explain that difference.

teams that only improve prompts may get local gains
teams that improve the work environment often get more durable agent performance

So the conclusion of Part 3 is this:

If you want a smarter agent, start by designing a smarter workbench for it.

Part 4 will take that lens into real cases and compare how teams like OpenAI, Anthropic, Vercel, and GitHub express harness ideas in practice.

References

docs/blog_series_하네스엔지니어링_총괄_design.md
sources/260518_하네스엔지니어링_15장_블로그활용노트.md
drafts/blog/260519_하네스시리즈A03_컨텍스트설계와지시파일_블로그.md
drafts/blog/260519_하네스시리즈A04_MCP와도구엔지니어링_블로그.md
drafts/blog/260519_하네스시리즈C02_장시간에이전트운영_블로그.md
WikiDocs chapter 13 and chapter 15 usage notes from 하네스 엔지니어링 백과사전

This post is Part 3 of 4 in the Patterns, Strategy, and Cases series. Next reading: Harness Engineering by Real Cases.

Series overview: Harness Engineering Series Guide

MaJu Tech Notes