"Harness Engineering Basics (2/4) — How AI Agents Actually Work: Context, Tool Calls, and the Agent Loop"

May 18, 2026

AI agents do not directly touch your computer. The model proposes the next action in text, the harness turns that proposal into a real tool call, and the result comes back as new context. Once you understand that loop, tool misuse, context blow-up, and long-task failures become much easier to explain.

Key Takeaways

The default agent pattern is a repeated loop of observe → plan → act → verify → record.
The model does not directly operate the filesystem or shell. It emits a tool-call instruction, and the harness executes it.
System rules, developer rules, user requests, and tool results are not all the same kind of prompt. They are different input layers.
The context window is not infinite storage. It is a shared budget, so placement matters as much as size.
In practice, many agent quality problems come less from model intelligence than from loop design and input structure.

1. An agent is not a single chatbot answer but a repeated loop

A normal chatbot interaction is simple: the user asks, the model answers, and the turn ends. A work-oriented agent usually does more. It reads, searches, executes, checks, and then decides what to do next.

The pattern usually looks like this:

read the goal and constraints
decide what information is missing
propose the next tool call
read the tool result
continue the loop until the completion condition is met

Compressed into one line:

Agent = model reasoning + harness execution + result reinjection

So an agent is not just a model with a personality. It is a loop that the model and harness run together.
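The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a real framework. `Proposal`, `run_agent`, `scripted_model`, and the fake `read_file` tool are hypothetical names. The "model" is a stub that returns canned decisions, so the three parts of the formula (reasoning, execution, reinjection) are easy to spot.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Proposal:
    """What the (hypothetical) model emits each turn: a tool call or a final answer."""
    kind: str          # "tool" or "final"
    tool: str = ""
    arg: str = ""
    text: str = ""

def run_agent(goal: str,
              propose: Callable[[list[str]], Proposal],
              tools: dict[str, Callable[[str], str]],
              max_turns: int = 10) -> str:
    """Repeat propose -> execute -> reinject until a final answer or the turn limit."""
    context = [f"GOAL: {goal}"]
    for _ in range(max_turns):
        proposal = propose(context)                            # model reasoning (a stub here)
        if proposal.kind == "final":                           # completion condition met
            return proposal.text
        result = tools[proposal.tool](proposal.arg)            # harness execution
        context.append(f"RESULT[{proposal.tool}]: {result}")   # result reinjection
    return "stopped: turn limit reached"

def scripted_model(context: list[str]) -> Proposal:
    """Stand-in for a real model: read one file, then finish."""
    if not any(line.startswith("RESULT[") for line in context):
        return Proposal(kind="tool", tool="read_file", arg="README.md")
    return Proposal(kind="final", text=f"done after {len(context) - 1} tool call(s)")

fake_tools = {"read_file": lambda path: f"<contents of {path}>"}

print(run_agent("summarize the README", scripted_model, fake_tools))
# → done after 1 tool call(s)
```

The `max_turns` cap is worth keeping even in a toy harness. Without it, a model that never emits a final answer would loop forever.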

2. The model does not execute directly. The harness does

This is the first misconception to remove. When an agent reads a file or runs a shell command, the model is not directly manipulating the operating system.

What actually happens is simpler:

the model outputs a structured request such as "read this file" or "run this command"
the harness checks whether that call is allowed
if allowed, the harness runs the tool
the result is returned to the model as new context
the model decides the next step

That is why tool calling is not just a feature. It is part of the operating loop.

If you miss this structure, you also miss:

why permissions and sandboxing matter
why the length and format of tool results affect the next answer
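Here is a harness-side sketch of those five steps. It assumes the model emits its request as a small JSON object. The tool names, the allowlist, and the protected `config/` prefix are hypothetical policy choices, and the tools are fakes, so nothing touches the real filesystem. Refusals and parse errors are also returned as text, because they become the model's next input just like a successful result.

```python
import json

# Hypothetical policy: tools the harness will run and paths it refuses to touch.
ALLOWED_TOOLS = {"read_file", "list_dir"}
PROTECTED_PREFIXES = ("config/",)

# Fake tool implementations so the sketch runs without touching the real filesystem.
TOOL_IMPLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "list_dir": lambda path: f"<listing of {path}>",
}

def handle_model_output(raw: str) -> str:
    """Turn the model's text proposal into a tool call, or refuse it.
    Every outcome, including refusals, is returned as text for the next turn."""
    try:
        request = json.loads(raw)
        tool, path = request["tool"], request["path"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "ERROR: malformed tool call"
    if not isinstance(tool, str) or not isinstance(path, str):
        return "ERROR: tool and path must be strings"
    if tool not in ALLOWED_TOOLS:                      # permission check
        return f"DENIED: tool {tool!r} is not allowed"
    if path.startswith(PROTECTED_PREFIXES):            # sandbox boundary
        return f"DENIED: {path!r} is protected"
    return TOOL_IMPLS[tool](path)                      # execute, result goes back as context

print(handle_model_output('{"tool": "read_file", "path": "docs/notes.md"}'))
# → <contents of docs/notes.md>
print(handle_model_output('{"tool": "run_shell", "path": "build.sh"}'))
# → DENIED: tool 'run_shell' is not allowed
print(handle_model_output('{"tool": "read_file", "path": "config/secrets.env"}'))
# → DENIED: 'config/secrets.env' is protected
```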

3. Agent input is layered, not flat

Calling all agent input a "prompt" hides important differences. In practice, several layers are combined into what the model sees.

Input layer	Role
System instructions	highest-level operating and safety rules
Developer or project instructions	repository-specific workflow and policy
User request	the current task and outcome target
Tool results	facts and execution results from the outside world
Conversation history	the recent reasoning trail and intermediate state

These layers should not be treated the same because their priorities and lifetimes differ.

System rules should be stable.
User requests can redirect the task.
Tool results are useful but often short-lived.
Conversation history is valuable but becomes a compression target.
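One way to make layer priority and lifetime explicit is to tag every context item with its layer and the turn it arrived. The priority order and the expiry ages below are illustrative assumptions, not recommended values. `Item` and `assemble` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    layer: str       # "system", "developer", "user", "tool", or "history"
    text: str
    added_turn: int

# Hypothetical policy values: lower number = higher priority; None = never expires.
PRIORITY = {"system": 0, "developer": 1, "user": 2, "tool": 3, "history": 4}
MAX_AGE: dict[str, Optional[int]] = {
    "system": None, "developer": None, "user": None, "tool": 2, "history": 6,
}

def assemble(items: list[Item], current_turn: int) -> list[str]:
    """Drop items older than their layer allows, then order the rest by layer priority."""
    live = [
        it for it in items
        if MAX_AGE[it.layer] is None or current_turn - it.added_turn <= MAX_AGE[it.layer]
    ]
    live.sort(key=lambda it: (PRIORITY[it.layer], it.added_turn))
    return [f"[{it.layer}] {it.text}" for it in live]

items = [
    Item("tool", "grep: 3 matches in utils.py", added_turn=1),
    Item("system", "Never push directly to main.", added_turn=0),
    Item("user", "Fix the failing date test.", added_turn=0),
    Item("tool", "pytest: 1 failure in test_dates.py", added_turn=4),
]
for line in assemble(items, current_turn=5):
    print(line)
# → [system] Never push directly to main.
# → [user] Fix the failing date test.
# → [tool] pytest: 1 failure in test_dates.py
```

A real harness would more likely summarize expiring history than delete it outright. Plain deletion keeps the sketch short.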

If you ignore this layering, you often end up with bloated system prompts that try to do everything at once.

4. The context window is a shared budget

The series design document and the reading notes make the same point: with context, bigger is not automatically better, because multiple elements compete for the same space:

instruction files
the current user task
reference documents
tool results
the model's own recent plans and outputs

So context behaves less like a giant archive and more like a workbench. Only the relevant material should be on it.
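The workbench idea can be sketched as greedy packing under a fixed budget. The most important items go on first, and anything that would overflow is left out. Token counts here use a word-count proxy, which is an assumption, since real tokenizers count differently. The item names and sizes are made up.

```python
def approx_tokens(text: str) -> int:
    # Crude proxy: one token per whitespace-separated word. Real tokenizers differ.
    return len(text.split())

def pack(candidates: list[tuple[int, str, str]], budget: int) -> tuple[list[str], list[str]]:
    """Greedily place (priority, name, text) items on the workbench, most important first.
    Lower priority numbers are more important. Returns (placed, left_out) names."""
    placed, left_out, used = [], [], 0
    for _, name, text in sorted(candidates, key=lambda c: c[0]):
        cost = approx_tokens(text)
        if used + cost <= budget:
            placed.append(name)
            used += cost
        else:
            left_out.append(name)
    return placed, left_out

candidates = [
    (0, "instruction_file", "Run tests before committing. " * 10),   # 40 words
    (1, "user_task", "Fix the date parser bug in utils"),              # 7 words
    (2, "raw_tool_log", "ERROR line " * 500),                          # 1000 words
    (3, "reference_doc", "Parser design notes " * 50),                 # 150 words
]
print(pack(candidates, budget=300))
# → (['instruction_file', 'user_task', 'reference_doc'], ['raw_tool_log'])
```

The large raw log is the item that gets dropped, while a smaller, lower-priority document still fits. That is the trade the workbench metaphor describes.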

This matters for three reasons.

4.1 More material can reduce quality

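Long inputs bury important rules. Models also tend to use information in the middle of a long context less reliably than information near its beginning or end, so material placed mid-context is easier to lose.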

4.2 Tool results are expensive

Search results, long logs, and file dumps can consume a large part of the token budget for several subsequent turns.
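The cost compounds because a result that stays in context is sent again on every later turn. The back-of-the-envelope sketch below assumes the harness resends the whole context each turn. Its numbers are illustrative: a 20,000-token log, a 300-token summary, and a hypothetical 128,000-token window.

```python
def carried_cost(result_tokens: int, turns_in_context: int) -> int:
    # Assumption: the harness resends the whole context every turn, so a result
    # is paid for again on each turn it stays in context.
    return result_tokens * turns_in_context

def window_share(result_tokens: int, window_tokens: int) -> float:
    # Fraction of the context window the result occupies on each of those turns.
    return result_tokens / window_tokens

# Illustrative numbers only: raw log vs. summary, each kept for 5 turns.
print(carried_cost(20_000, 5), window_share(20_000, 128_000))   # → 100000 0.15625
print(carried_cost(300, 5), window_share(300, 128_000))         # → 1500 0.00234375
```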

4.3 Re-reading on demand is often stronger

Keeping file paths and fetch tools available is usually better than carrying every file inline at all times.
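The sketch below shows the pointer-first pattern. Context carries only a short `FILE:` line, and a fetch tool returns a bounded line range when the model actually needs it. `file_pointer` and `read_lines` are hypothetical tool names, and the line-numbered output format is an assumption.

```python
import tempfile
from pathlib import Path

def file_pointer(path: str) -> str:
    """What stays in context: a path and a size hint, not the contents."""
    n = len(Path(path).read_text(encoding="utf-8").splitlines())
    return f"FILE: {path} ({n} lines)"

def read_lines(path: str, start: int, end: int) -> str:
    """Hypothetical fetch tool: return only lines start..end (1-indexed, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    selected = lines[start - 1:end]
    return "\n".join(f"{n}: {line}" for n, line in enumerate(selected, start=start))

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "notes.txt"
    p.write_text("\n".join(f"line {i}" for i in range(1, 101)), encoding="utf-8")
    print(file_pointer(str(p)))        # the temp path varies; the size hint reads (100 lines)
    print(read_lines(str(p), 40, 42))  # fetched only when needed
    # → 40: line 40
    # → 41: line 41
    # → 42: line 42
```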

Good agents are not strong because they hold everything in memory. They are strong because they decide what stays outside and what comes in now.

5. Tool results are part of context too

If tool calls are only seen as external execution, the picture remains incomplete. Tool results come back as model input, so tool design immediately becomes context design.

Compare these two search tools:

one returns title, date, source, and a short summary
one dumps thousands of lines of raw text

Both are "search," but the second one can quickly damage the quality of the next few turns.

That is why good harnesses do more than expose tools.

they cap output size
they store large results externally
they give the model summaries and pointers first
they require a second fetch when details are truly needed

This is also where the A4 article on tool engineering begins.
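The four habits above can be sketched together. Small results pass through unchanged, and large ones are stored outside the context. The model receives a preview plus a pointer it can use for a bounded second fetch. The in-memory `RESULT_STORE`, the size limits, and the `fetch_result` name are hypothetical. The preview is just the head of the output, standing in for the summary a real harness might produce with a model or a structured extractor.

```python
import hashlib

# Hypothetical in-memory store standing in for files or object storage.
RESULT_STORE: dict[str, str] = {}
MAX_INLINE_CHARS = 500
PREVIEW_CHARS = 200

def reinject(raw: str) -> str:
    """Return what the model sees: small results inline, large ones as preview + pointer."""
    if len(raw) <= MAX_INLINE_CHARS:                       # cap output size
        return raw
    key = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
    RESULT_STORE[key] = raw                                # store the full result externally
    return (f"{raw[:PREVIEW_CHARS]}\n"
            f"[truncated: {len(raw)} chars total; "
            f"call fetch_result('{key}', offset, length) for more]")

def fetch_result(key: str, offset: int = 0, length: int = MAX_INLINE_CHARS) -> str:
    """The second fetch: one bounded slice of a stored result."""
    return RESULT_STORE[key][offset:offset + min(length, MAX_INLINE_CHARS)]

log = "\n".join(f"INFO step {i} ok" for i in range(1000))
view = reinject(log)
print(len(log), len(view) < MAX_INLINE_CHARS)     # → 16889 True
key = view.split("fetch_result('")[1].split("'")[0]
print(fetch_result(key, 0, 14))                   # → INFO step 0 ok
```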

6. The simplest useful agent loop

Real systems vary, but for beginners the following model is enough.

Step	Question	Harness role
Observe	what do we know and what is missing	assemble the right inputs
Plan	what should happen next	keep goals and constraints visible
Act	which tool should be called	check permissions and execute
Verify	did the result improve the state	provide tests, reviews, retry rules
Record	what must survive the turn or session	store logs, handoffs, state files

This is useful because it makes failures diagnosable. Instead of saying "the model was bad," you can ask which stage of the loop was weak.

For example:

observe failure: it never read the needed file
plan failure: it ignored constraints and jumped into action
act failure: it chose the wrong tool
verify failure: it accepted a bad result
record failure: the next session could not resume
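If the harness logs which loop stage each event belongs to, the diagnosis above can be partly automated. This sketch assumes a hypothetical event log of `(stage, ok)` pairs. Deciding what counts as `ok` for each stage is the hard part, and it is left out here.

```python
STAGES = ("observe", "plan", "act", "verify", "record")

def diagnose(events: list[tuple[str, bool]]) -> str:
    """events: (stage, ok) pairs logged by a hypothetical harness during one task.
    Returns the earliest stage, in loop order, that never ran or reported a failure."""
    outcomes: dict[str, list[bool]] = {}
    for stage, ok in events:
        outcomes.setdefault(stage, []).append(ok)
    for stage in STAGES:
        results = outcomes.get(stage)
        if not results:
            return f"{stage} failure: stage never ran"
        if not all(results):
            return f"{stage} failure: stage reported a bad result"
    return "no weak stage found"

# The system read and executed, but nothing checked the outcome.
events = [("observe", True), ("plan", True), ("act", True), ("act", True), ("record", True)]
print(diagnose(events))   # → verify failure: stage never ran
```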

7. This maps directly to repository-native structures

In this repository, the agent loop already has familiar anchors.

Loop element	Repository example
High-level instructions	`AGENTS.md`, `CLAUDE.md`
Scope reduction	`tasks/plan.md`, `docs/memory-map.md`
Execution boundary	publish restrictions, protected `config/`
State preservation	`tasks/handoffs/`, `tasks/sessions/`
Verification surface	quality gates, path checks, review steps

Understanding the loop is therefore less about learning new jargon and more about recognizing which document or mechanism owns which part of agent behavior.
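Because these anchors are plain paths, a path check can confirm they exist. The minimal sketch below copies its path list from the table above, and `missing_anchors` is a hypothetical helper. `config/` is omitted because it is a protection policy rather than something that must exist.

```python
from pathlib import Path

# Anchors from the repository table; adjust the lists to your own layout.
ANCHORS = {
    "high-level instructions": ["AGENTS.md", "CLAUDE.md"],
    "scope reduction": ["tasks/plan.md", "docs/memory-map.md"],
    "state preservation": ["tasks/handoffs", "tasks/sessions"],
}

def missing_anchors(root: str = ".") -> dict[str, list[str]]:
    """Path check: for each loop element, list anchor paths that do not exist under root."""
    base = Path(root)
    report = {}
    for element, paths in ANCHORS.items():
        missing = [p for p in paths if not (base / p).exists()]
        if missing:
            report[element] = missing
    return report
```

Run from the repository root, `missing_anchors(".")` returns an empty dict when every anchor is present. A quality gate could therefore fail whenever the result is non-empty.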

8. Common failure signals

Across the original notes and earlier drafts, the same failure signals appear repeatedly when people do not understand how agents operate.

8.1 Everything is stuffed into one instruction file

Without layer separation, system rules and task-specific rules start to conflict.

8.2 Tool results are reinjected without limits

Large logs and raw search dumps pollute context and weaken later decisions.

8.3 The loop can act but cannot verify

The system reads and executes, but no test or review loop catches bad outcomes.
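Here is a sketch of the missing verification step. An action only counts once an external check exits successfully, and the tail of a failed check's output becomes feedback for the next attempt. `verify`, `act_and_verify`, the retry limit, and the escalation message are hypothetical. The demo check is a tiny Python command standing in for a real test suite.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable

def verify(check_cmd: list[str], timeout_s: float = 60.0) -> tuple[bool, str]:
    """Run a check (tests, linter, path check); a non-zero exit code means failure."""
    proc = subprocess.run(check_cmd, capture_output=True, text=True, timeout=timeout_s)
    tail = (proc.stdout + proc.stderr)[-2000:]     # keep only the tail for the next turn
    return proc.returncode == 0, tail

def act_and_verify(act: Callable[[str], None], check_cmd: list[str],
                   max_attempts: int = 3) -> str:
    """An action only counts once the check passes; failures feed the next attempt."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        act(feedback)
        ok, feedback = verify(check_cmd)
        if ok:
            return f"verified on attempt {attempt}"
    return f"not verified after {max_attempts} attempts: escalate"

# Demo: the check passes only once a marker file exists; the fake action creates it
# on its second attempt, standing in for a fix that needs one round of feedback.
marker = Path(tempfile.mkdtemp()) / "fixed.txt"
calls: list[str] = []

def fake_act(feedback: str) -> None:
    calls.append(feedback)
    if len(calls) == 2:
        marker.write_text("ok", encoding="utf-8")

check = [sys.executable, "-c",
         f"import pathlib, sys; sys.exit(0 if pathlib.Path({str(marker)!r}).exists() else 1)"]
print(act_and_verify(fake_act, check))   # → verified on attempt 2
```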

8.4 Session continuity depends only on conversation history

Long work gets compacted or interrupted. Without handoff notes or state files, every new session has to reconstruct the state of the work from scratch.
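The minimal handoff sketch below assumes notes are stored as numbered JSON files under `tasks/handoffs/`. The file naming and fields are hypothetical. What matters is that the record step writes state to disk, so the next session can start by reading it instead of reconstructing it from chat history.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

def write_handoff(handoff_dir: str, goal: str, done: list[str], next_steps: list[str]) -> Path:
    """Record step: persist what the next session needs, independent of chat history."""
    d = Path(handoff_dir)
    d.mkdir(parents=True, exist_ok=True)
    seq = len(list(d.glob("handoff_*.json"))) + 1      # zero-padded so names sort in order
    path = d / f"handoff_{seq:04d}.json"
    note = {
        "written_at": datetime.now(timezone.utc).isoformat(),
        "goal": goal,
        "done": done,
        "next": next_steps,
    }
    path.write_text(json.dumps(note, indent=2), encoding="utf-8")
    return path

def latest_handoff(handoff_dir: str) -> Optional[dict]:
    """Observe step of a new session: resume from the newest note, if any."""
    files = sorted(Path(handoff_dir).glob("handoff_*.json"))
    return json.loads(files[-1].read_text(encoding="utf-8")) if files else None

with tempfile.TemporaryDirectory() as root:
    handoffs = str(Path(root) / "tasks" / "handoffs")
    write_handoff(handoffs, "fix date parser", ["reproduced bug"], ["add regression test"])
    write_handoff(handoffs, "fix date parser", ["added regression test"], ["patch parser"])
    print(latest_handoff(handoffs)["next"])   # → ['patch parser']
```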

9. Minimum practical principles

You do not need to memorize a complicated architecture chart. The four principles below are enough to reason better about most agents.

The model does not execute directly. The harness executes and returns results.
Agent input is layered. Different layers have different roles and lifetimes.
Context is a shared budget. Reducing and staging inputs often matters more than expanding them.
An agent is a loop. A good loop includes verification and recording, not just action.

Once that is clear, the next topics naturally follow.

A3 will focus on what belongs in instruction files versus external context.
A4 will focus on how tools should be designed so the model can act more accurately.

References

docs/blog_series_하네스엔지니어링_총괄_design.md
sources/260518_하네스엔지니어링_15장_블로그활용노트.md
drafts/blog/260429_harness_series_02_context_engineering_en.md
WikiDocs, Chapter 2 notes from 하네스 엔지니어링 백과사전 (Harness Engineering Encyclopedia)

This is Part 2 of the Harness Engineering Basics series. Next: Why instruction structure and context design matter more than longer prompts.

Series overview: Harness Engineering Series Guide


MaJu Tech Notes