"Harness Engineering Basics (3/4) — Why Instruction Structure and Context Design Matter More Than Longer Prompts"

Many teams treat agent quality as a prompt-writing problem: should the prompt be longer, more detailed, more explicit? In practice, the more important problem is often structural. Which rules stay always visible, and which information is fetched only when needed? Good instruction files behave less like encyclopedias and more like signposts.


Key Takeaways

  • Prompt engineering improves one input message. Context design determines what information enters the work loop, when, and at what level.
  • AGENTS.md, CLAUDE.md, skills, and handoff notes should not all do the same job. Separate shared rules, runtime rules, reusable procedures, and changing state.
  • Context is a scarce resource, so the better design is usually "show less by default, fetch more on demand."
  • Long instruction files often feel safer, but they can bury the critical rules and raise the cost of every session start.
  • In practice, instruction quality is often less about wording and more about placement.

1. Why structure matters more than longer prompts

As Part 2 showed, agents do not live in a one-shot question-answer pattern. They operate across multiple turns, reading material, calling tools, and revising plans. That makes the assembly of inputs more important than any single sentence.

So context design is not just prompt engineering with a bigger budget. It is a higher-level operating design that includes prompts inside it.

| Question | Long-prompt mindset | Context-design mindset |
| --- | --- | --- |
| Goal | explain everything at once | stage the right information by layer |
| Assumption | more detail helps | too much detail can bury the signal |
| Failure mode | bloated, conflicting instructions | needed material arrives too late or not at all |
| Fix | add more wording | split structure, reference, load on demand |

Strong agents are therefore less like "agents with better prompts" and more like agents with better information routing.

2. Instruction files are different layers, not interchangeable documents

The reading notes and series design document make the same point: instruction, context, and memory should not all be piled into one file. They have different roles.

In this repository, a four-layer split is a natural minimum.

| Layer | Example | Role |
| --- | --- | --- |
| Shared rules | AGENTS.md | project boundaries, hard constraints, role policy |
| Runtime rules | CLAUDE.md family | workflow habits, priorities, execution style |
| Reusable procedures | skills, checklists, templates | encapsulated repeatable processes |
| Changing state | tasks/plan.md, tasks/handoffs/, tasks/sessions/ | current progress and re-entry state |

This is not document taxonomy for its own sake. It affects performance directly.

  • Shared rules should be stable.
  • Runtime rules should guide behavior without becoming huge.
  • Reusable procedures can be loaded only when relevant.
  • Changing state should persist, but not always be injected.

If all of these are mixed together, the agent struggles to distinguish "must always know" from "only matters right now."
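One way to make the split concrete is a small registry that records where each layer lives and whether it is injected on every turn. This is a minimal sketch, not a format any agent runtime requires: the `Layer` class, the `always_injected` flag, and the `skills/` path are hypothetical names chosen for illustration, and the on/off policy per layer is an assumption based on the four bullets above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    paths: Tuple[str, ...]
    always_injected: bool  # assumption: a simple on/off policy per layer


# Hypothetical registry mirroring the four-layer table.
LAYERS = (
    Layer("shared rules", ("AGENTS.md",), always_injected=True),
    Layer("runtime rules", ("CLAUDE.md",), always_injected=True),
    Layer("reusable procedures", ("skills/",), always_injected=False),
    Layer(
        "changing state",
        ("tasks/plan.md", "tasks/handoffs/", "tasks/sessions/"),
        always_injected=False,
    ),
)


def always_on_paths(layers):
    """Return the paths that would be loaded at every session start."""
    return [path for layer in layers if layer.always_injected for path in layer.paths]


print(always_on_paths(LAYERS))
```

Writing the policy down like this makes the "must always know" set explicit and small, which is the point of the split.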

3. Good instruction files are maps, not encyclopedias

One of the most reusable ideas from the earlier drafts and notes is simple: AGENTS.md and CLAUDE.md should function more like maps than encyclopedias.

Why?

3.1 Long always-on files raise the cost of every session

If a document is loaded every time, its size becomes recurring overhead.
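A rough back-of-the-envelope calculation shows why this adds up. The token counts and session count below are invented for illustration, and `always_on_overhead` is a hypothetical helper, so treat the output as an order-of-magnitude picture rather than a measurement.

```python
def always_on_overhead(tokens_per_load: int, sessions: int) -> int:
    """Total tokens spent just re-reading an always-loaded file."""
    return tokens_per_load * sessions


# Assumed numbers: a 500-token map versus a 6,000-token encyclopedia,
# each loaded at the start of 200 sessions.
short_file = always_on_overhead(tokens_per_load=500, sessions=200)
long_file = always_on_overhead(tokens_per_load=6000, sessions=200)

print(short_file)              # 100000
print(long_file)               # 1200000
print(long_file - short_file)  # 1100000
```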

3.2 Critical rules get buried

The one rule that truly matters disappears inside twenty paragraphs of secondary explanation.

3.3 Changing material makes the file stale quickly

Once temporary state, exceptions, and one-off details get mixed in, the document stops being trustworthy.

Good instruction files usually share these traits:

  • short
  • clearly scoped
  • explicit about priorities and prohibitions
  • detailed procedures moved into separate docs or skills
  • changing state kept in handoff or plan artifacts

Their purpose is not to explain everything. It is to keep the agent from getting lost.

4. Context should be split into "always visible" and "load when needed"

Good context design is not about maximum volume. It is about placement. The simplest useful split is:

| Type | What belongs there | Design rule |
| --- | --- | --- |
| Always-visible context | core role, hard constraints, output rules, current objective | keep it short and stable |
| Load-when-needed context | detailed docs, prior session notes, large references, bulky outputs | expose by reference and fetch on demand |

This matters because agents do not use everything equally well at once. In many cases, the stronger design is to put only the essential material front and center, while leaving the rest accessible by path or tool.

The repository structure here already follows that pattern.

  • current active work in tasks/plan.md
  • memory navigation in docs/memory-map.md
  • prior session state in tasks/sessions/
  • durable boundaries in AGENTS.md

That separation makes new-session re-entry much easier.
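The always-visible versus load-when-needed split can be sketched as a context builder that inlines the first group and only lists the second group by path. The function name `build_context`, the bracketed section headers, and the sample rule text are assumptions made for this example; a real harness would assemble its context in whatever way its runtime supports.

```python
from typing import Dict, List


def build_context(always_visible: Dict[str, str], on_demand: List[str]) -> str:
    """Inline the always-visible files; expose everything else by reference."""
    parts = []
    for path, text in always_visible.items():
        parts.append(f"[{path}]\n{text.strip()}")
    if on_demand:
        refs = "\n".join(f"- {path}" for path in on_demand)
        parts.append("[available on request: read only when relevant]\n" + refs)
    return "\n\n".join(parts)


context = build_context(
    # Hypothetical rule text, used only to show the shape of the output.
    {"AGENTS.md": "Never push directly to main."},
    ["tasks/plan.md", "docs/memory-map.md", "tasks/sessions/"],
)
print(context)
```

The agent sees the rule itself but only the paths of the bulkier material, so it can fetch a file when the task calls for it instead of carrying it on every turn.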

5. What belongs where

The most common practical question is straightforward: "Which file should hold this information?"

In practice, the following split is simple and durable.

Put this in AGENTS.md

  • project-wide prohibitions
  • role policy
  • language rules
  • hard boundaries that should never be crossed

Put this in the CLAUDE.md layer

  • workflow order
  • first documents to read
  • editing and verification habits
  • default tool-usage rules

Put this in skills or separate docs

  • long procedures
  • infrequent workflows
  • tasks that need many examples
  • operational knowledge that should load only when relevant

Put this in tasks/ artifacts

  • current progress state
  • re-entry points for the next session
  • open risks and unresolved questions

The cleaner this separation is, the easier it becomes for the agent to stay focused on the current turn.
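The placement rules above can be read as a short decision procedure. The sketch below is one possible encoding: the function `choose_home`, its three yes/no questions, and the order in which they are checked are assumptions for illustration, not a rule the series prescribes.

```python
def choose_home(*, changes_between_sessions: bool, long_or_infrequent: bool, hard_boundary: bool) -> str:
    """Suggest which layer should hold a piece of guidance."""
    if changes_between_sessions:
        return "tasks/ artifacts"         # progress, re-entry points, open risks
    if long_or_infrequent:
        return "skills or separate docs"  # loaded only when relevant
    if hard_boundary:
        return "AGENTS.md"                # project-wide prohibitions and policy
    return "CLAUDE.md layer"              # workflow order, habits, tool defaults


# A project-wide prohibition:
print(choose_home(changes_between_sessions=False, long_or_infrequent=False, hard_boundary=True))
# A rarely used, many-step procedure:
print(choose_home(changes_between_sessions=False, long_or_infrequent=True, hard_boundary=False))
```

Checking for changing state first reflects the idea that temporary material does the most damage when it leaks into a stable file.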

6. Symptoms of bad context design

When context design goes wrong, the model may look inconsistent, but the root cause is often structural. Common symptoms include:

6.1 The agent keeps missing rules

The key rule is likely buried or duplicated across conflicting files.
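A quick way to check for duplication is to normalize each line and see which ones appear in more than one instruction file. This sketch only catches exact duplicates after normalization; rules that conflict in wording still need a human read. The function name `find_duplicate_rules` and the sample file contents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_rules(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each normalized line to the files it appears in, keeping only repeats."""
    seen = defaultdict(list)
    for path, text in files.items():
        for line in text.splitlines():
            # Drop bullet markers, lowercase, and collapse whitespace.
            rule = " ".join(line.strip().lstrip("-*• ").lower().split())
            if rule and path not in seen[rule]:
                seen[rule].append(path)
    return {rule: paths for rule, paths in seen.items() if len(paths) > 1}


sample = {
    "AGENTS.md": "- Never edit generated files\n- Write in English",
    "CLAUDE.md": "* never edit  generated files\n- Run tests before committing",
}
print(find_duplicate_rules(sample))
```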

6.2 The output becomes unnecessarily long

If the input is long and its priorities are unclear, the model has little basis for deciding what to compress, so the output grows too.

6.3 Every session feels slow to warm up

Your always-loaded instruction layer may be too large, or it may include material that is not needed right now.
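A simple size check can flag this before it becomes a habit. The sketch below counts words as a crude stand-in for tokens, and the 400-word budget is an arbitrary assumption; `check_always_on_budget` is a hypothetical helper, so pick a limit that fits your own runtime.

```python
from typing import Dict, List, Tuple


def check_always_on_budget(files: Dict[str, str], max_words: int = 400) -> Tuple[bool, int, List[Tuple[str, int]]]:
    """Return (within_budget, total_words, files sorted largest first)."""
    sizes = {path: len(text.split()) for path, text in files.items()}
    total = sum(sizes.values())
    largest_first = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return total <= max_words, total, largest_first


# Synthetic contents, used only to exercise the check.
ok, total, ranked = check_always_on_budget(
    {"AGENTS.md": "rule " * 100, "CLAUDE.md": "habit " * 350}
)
print(ok, total, ranked[0])
```

Sorting largest first points at the file to trim, which is usually where one-off details have accumulated.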

6.4 Prior work gets re-derived again and again

If you rely on conversation history instead of handoff artifacts, re-entry becomes expensive and unreliable.
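A handoff artifact can be as small as one structured note per session. The sketch below writes and reads such notes as JSON; the field names, the JSON format, and the helper names `write_handoff` and `latest_handoff` are assumptions for illustration (the repository described here may well use Markdown notes instead), and it assumes session IDs sort chronologically, for example by date prefix.

```python
import json
from pathlib import Path
from typing import List, Optional


def write_handoff(directory: Path, session_id: str, done: List[str],
                  next_steps: List[str], open_risks: List[str]) -> Path:
    """Persist the state the next session needs in order to resume."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{session_id}.json"
    note = {
        "session": session_id,
        "done": done,
        "next_steps": next_steps,
        "open_risks": open_risks,
    }
    path.write_text(json.dumps(note, indent=2), encoding="utf-8")
    return path


def latest_handoff(directory: Path) -> Optional[dict]:
    """Load the most recent note, assuming file names sort chronologically."""
    notes = sorted(directory.glob("*.json"))
    if not notes:
        return None
    return json.loads(notes[-1].read_text(encoding="utf-8"))
```

A new session would call `latest_handoff(Path("tasks/handoffs"))` and start from `next_steps` rather than replaying the previous conversation.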

7. Practical checklist for instruction structure

At the beginner level, a few structural questions are more useful than any framework.

  1. Does this rule need to be always visible, or only loaded when needed?
  2. Is this project-wide guidance, or temporary task state?
  3. Should this be repeated as a long instruction, or separated into a reusable skill?
  4. Is this document helping because it is long, or hiding the important rule because it is long?
  5. Could the next session resume from this structure without re-deriving everything?

Those five questions alone often clean up a large part of an agent environment.

8. Placement often matters more than wording

This does not mean prompt engineering is useless. It means prompt quality is not enough in an agent environment. Even strong wording loses force inside a bad structure.

The reverse is also true. With a good structure, short instructions often become stronger.

  • keep top-level rules short and fixed
  • separate changing state into ledgers or handoffs
  • move long procedures into skills or reference docs
  • design for on-demand loading

That is the shift from "prompt engineering" toward "harness engineering."

References

  • docs/blog_series_하네스엔지니어링_총괄_design.md
  • sources/260518_하네스엔지니어링_15장_블로그활용노트.md
  • drafts/blog/260429_harness_series_02_context_engineering_en.md
  • WikiDocs, Chapter 3 notes from 하네스 엔지니어링 백과사전 (Harness Engineering Encyclopedia)

This is Part 3 of the Harness Engineering Basics series. Next: MCP, tool engineering, and how to design the tool surface deliberately.

Series overview: Harness Engineering Series Guide
