Ontology and Memory Systems (5/13) — 3-Layer Memory Architecture: Domain, Behavioral, Session


Don't load everything — selective loading via keyword matching


Summary

  • Agent memory is split into three layers: domain knowledge (permanent), behavioral rules (promotion-based), session restore (volatile)
  • Memory is never loaded all at once. A keyword index selects only the files relevant to the current task
  • This structure reduces token waste while preserving the agent's access to accumulated experience

Background

The fundamental limitation of LLM-based agents is that memory disappears when a session ends. Every session starts fresh. CLAUDE.md preserves identity, but "what was researched last time," "patterns that previously failed," and "decisions made two sessions ago" are all lost without an explicit structure.

You could store everything in files. The problem is that once you have more than 50 files, "which file should I read?" becomes the bottleneck. Loading all 50 files at session start fills half the context window with memory before any work begins.

A selective load-on-demand structure was necessary. The 3-layer memory architecture is the answer.


Architecture

1. Layer 1: Project Knowledge (docs/) — Permanent Memory

Long-term memory stored under docs/. Once written, files are never deleted (no decay).

docs/
├── domain/          # Topic-specific domain knowledge
├── references/      # Frequently cited sources and institutions
├── patterns/        # Recurring patterns
└── risks/           # Risk records

domain/ — Topic knowledge accumulated through research. When writing repeatedly about AI agents, prior research results persist here instead of repeating the same background investigation every time.

references/ — Metadata on frequently cited sources, institutions, and data providers. Includes trust assessments ("is this source reliable?") and update cadences ("how often does this source refresh?").

patterns/ — Empirical heuristics. Examples: "this type of claim tends to have weak sourcing," "check this data point first in this domain."

risks/ — Historical verification failures, data types that require extra scrutiny, common error patterns.

Each file is capped at 50 lines. If a file exceeds 50 lines, split it. The reason is straightforward: a long file loaded into context consumes proportionally more tokens. Multiple short files support selective loading far better than a single long one.
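The 50-line cap is easy to check mechanically. The following is a minimal sketch, assuming memory files are Markdown files under a docs/ directory; the function name `files_over_cap` is hypothetical and not part of the original system.

```python
from pathlib import Path

MAX_LINES = 50  # the cap stated in the text


def files_over_cap(root: Path, max_lines: int = MAX_LINES) -> list[tuple[Path, int]]:
    """Return (path, line_count) for every .md file under root that exceeds the cap."""
    oversized = []
    for path in sorted(root.rglob("*.md")):
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        if line_count > max_lines:
            oversized.append((path, line_count))
    return oversized
```

Running `files_over_cap(Path("docs"))` before a commit would list the files that are due for a split. How to split them is still a judgment call about topic boundaries.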

2. Layer 2: Behavioral Rules (lessons.md) — Promoted from Experience

tasks/lessons.md is the agent's behavioral rule store. If CLAUDE.md rules represent "design-time judgments," lessons.md rules represent "judgments extracted from operational experience."

Promotion mechanism:

Pattern observed once → recorded in docs/patterns/ (observation)
    ↓
Same pattern observed a second time → promoted to lessons.md as a rule
    ↓
Same pattern across 2+ projects → candidate for global memory promotion

The key invariant: two occurrences trigger promotion. Once may be coincidence. When the same mistake or the same success appears twice, it qualifies as a rule.
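The two-occurrence threshold can be sketched as a small counter. This is an illustration only: the `PatternTracker` class and its in-memory `rules` list are hypothetical stand-ins for docs/patterns/ and lessons.md, and the cross-project promotion step is left out.

```python
from collections import Counter

PROMOTION_THRESHOLD = 2  # two occurrences trigger promotion, per the text


class PatternTracker:
    """Counts pattern occurrences and promotes a pattern once it hits the threshold."""

    def __init__(self, threshold: int = PROMOTION_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.rules: list[str] = []  # stands in for lessons.md

    def observe(self, pattern: str) -> str:
        """Record one occurrence; return 'observation', 'promoted', or 'already_rule'."""
        self.counts[pattern] += 1
        if pattern in self.rules:
            return "already_rule"
        if self.counts[pattern] >= self.threshold:
            self.rules.append(pattern)
            return "promoted"
        return "observation"
```

The first call to `observe` for a pattern only records it. The second call promotes it, and later calls leave the rule list unchanged.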

Example rules in lessons.md:

- Numerical data derived from secondary summaries frequently distorts the original source; trace back to primary
- During fact-checking, never accept "this seems right intuitively" as a verdict
- Each tweet carries exactly one message

lessons.md is loaded automatically at every session start — unlike docs/ files, which are loaded selectively. Behavioral rules must apply regardless of task type.


3. Layer 3: Session Restore (sessions/) — Volatile Memory

Session snapshots stored under tasks/sessions/. Only the most recent snapshot is valid.

When a session grows long, /compact compresses the context. At that point, the current session state is written as a snapshot:

## Session Snapshot
- Task: [current task]
- Progress: [completed / remaining]
- Key Decisions: [decisions made this session]
- Open Questions: [unresolved items]
- Next: [next action]

The next session reads this snapshot and resumes from it. After that restore, the snapshot is treated as expired and is never carried into a later session.

Why keep only the most recent snapshot? A session snapshot is context for the work currently in progress. Older snapshots are meaningless. If the knowledge gained in an older session was valuable, it should already have been promoted to Layer 1 (docs/) or Layer 2 (lessons.md).
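One way to enforce the single-snapshot rule is to delete older snapshots whenever a new one is written. The sketch below assumes that approach; the function names, the file naming scheme, and the deletion step are assumptions, since the text only states that the most recent snapshot is the valid one.

```python
from pathlib import Path
from typing import Optional

SNAPSHOT_TEMPLATE = """## Session Snapshot
- Task: {task}
- Progress: {progress}
- Key Decisions: {decisions}
- Open Questions: {questions}
- Next: {next_action}
"""


def write_snapshot(sessions_dir: Path, name: str, **fields: str) -> Path:
    """Write a new snapshot and remove older ones so exactly one file remains."""
    sessions_dir.mkdir(parents=True, exist_ok=True)
    for old in sessions_dir.glob("*.md"):
        old.unlink()  # assumption: expired snapshots are deleted, not archived
    path = sessions_dir / f"{name}.md"
    path.write_text(SNAPSHOT_TEMPLATE.format(**fields), encoding="utf-8")
    return path


def load_latest_snapshot(sessions_dir: Path) -> Optional[str]:
    """Return the text of the remaining snapshot, or None if there is none."""
    if not sessions_dir.is_dir():
        return None
    snapshots = sorted(sessions_dir.glob("*.md"))
    return snapshots[-1].read_text(encoding="utf-8") if snapshots else None
```

With this layout, session start calls `load_latest_snapshot(Path("tasks/sessions"))` and skips the restore step when it returns `None`.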


4. Memory Loading Protocol — Never Read Everything

The critical component of this architecture is the loading strategy. What you read matters more than what you store.

Auto-loaded at every session start:

- CLAUDE.md (global + project)
- tasks/lessons.md (behavioral rules)
- Most recent snapshot from tasks/sessions/ (if present)

Selective load at task start:

1. Scan keyword table in memory-map.md
2. Match keywords against the current task
3. Load only matched files
4. No match → skip loading (token savings)

The single governing rule: never load an entire category at once.

Writing about AI agents does not trigger loading all files under docs/domain/. Only the specific files mapped to "AI agent" keywords in memory-map.md are read.
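The four selective-load steps can be sketched as a single function. This is a minimal illustration, assuming the keyword table has already been read into a dict and that matching is a case-insensitive substring test; `select_files` and `KEYWORD_INDEX` are hypothetical names, and the original system may match differently.

```python
# Hypothetical in-memory form of the keyword table in memory-map.md.
KEYWORD_INDEX = {
    "AI agent": "docs/domain/ai-agent-landscape.md",
    "LLM": "docs/domain/ai-agent-landscape.md",
    "fact-check": "docs/patterns/fact-check-failures.md",
    "verification": "docs/patterns/fact-check-failures.md",
    "Blogger": "docs/references/blog-agent-config.md",
    "API": "docs/references/blog-agent-config.md",
    "publish": "docs/references/blog-agent-config.md",
}


def select_files(task: str, keyword_index: dict[str, str]) -> list[str]:
    """Return the files whose keywords appear in the task description, without duplicates."""
    task_lower = task.lower()
    matched: list[str] = []
    for keyword, path in keyword_index.items():
        if keyword.lower() in task_lower and path not in matched:
            matched.append(path)
    return matched  # an empty list means: load nothing
```

Substring matching is the simplest option but can over-match (the keyword "API" also matches inside "rapid"). Matching on word boundaries would be stricter at the cost of a little more code.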

5. memory-map.md — Keyword Index

memory-map.md is the index for the memory system. It records which keywords map to which files.

## Keyword → File Mapping
| Keyword | Category | File |
|---|---|---|
| AI agent, LLM | domain | docs/domain/ai-agent-landscape.md |
| fact-check, verification | patterns | docs/patterns/fact-check-failures.md |
| Blogger, API, publish | references | docs/references/blog-agent-config.md |

Search protocol:

1. Find relevant keywords in the keyword table
2. Read only the matched files
3. If no match, skip memory loading entirely

Write protocol:

1. Save new memory files under docs/{category}/
2. Frontmatter must include: title, keywords, created, last_used
3. Always add a row to memory-map.md. Memory not in the index does not exist

Without this index, the agent must scan the entire docs/ directory on every session. With 10 files that is tolerable; with 50+, the scan itself consumes meaningful tokens.
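Both protocols depend on the table being machine-readable. The sketch below parses a table in the format shown above into a keyword lookup and formats the row that the write protocol requires. The function names are hypothetical, and the parser assumes exactly three pipe-delimited columns.

```python
def parse_memory_map(markdown: str) -> dict[str, str]:
    """Build a lowercase keyword -> file dict from the rows of the mapping table."""
    index: dict[str, str] = {}
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # headings, blank lines, prose
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Keyword" or set(cells[0]) <= {"-"}:
            continue  # header row, separator row, or malformed row
        keywords, _category, path = cells
        for keyword in keywords.split(","):
            index[keyword.strip().lower()] = path
    return index


def index_row(keywords: list[str], category: str, filename: str) -> str:
    """Format the memory-map.md row for a new file saved under docs/{category}/."""
    return f"| {', '.join(keywords)} | {category} | docs/{category}/{filename} |"
```

Appending the output of `index_row` to memory-map.md in the same step that writes the file keeps the index and the directory from drifting apart.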

6. Token Efficiency — Why This Structure Exists

The LLM context window is finite. Tokens spent on memory are tokens unavailable for actual work.

Token consumption comparison:

| Strategy | Loaded at session start | Context consumed |
|---|---|---|
| Full load | All docs/ + lessons.md + sessions/ | 30–50% of context consumed by memory |
| Selective load | lessons.md + 1–3 matched docs/ files | 5–10% of context is sufficient |

The difference is concrete. Full load: 50 memory files × 40 lines average = 2,000 lines loaded at session start. Selective load: lessons.md (~30 lines) + 2 matched files (~80 lines) = ~110 lines total.
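The line counts above can be reproduced with a few lines of arithmetic. The sketch uses only the figures quoted in the text (50 files, 40 lines average, about 30 lines for lessons.md); the function name is hypothetical, and line counts are only a rough proxy for tokens.

```python
def session_start_lines(always_loaded: int, matched_files: int, avg_lines: int) -> int:
    """Lines read at session start: always-loaded rules plus the loaded memory files."""
    return always_loaded + matched_files * avg_lines


# Full load, as counted in the text: 50 memory files at 40 lines each.
full_load = session_start_lines(always_loaded=0, matched_files=50, avg_lines=40)

# Selective load: lessons.md (~30 lines) plus 2 matched files at ~40 lines each.
selective_load = session_start_lines(always_loaded=30, matched_files=2, avg_lines=40)

# If memory doubles to 100 files, only the full-load figure changes.
full_load_100 = session_start_lines(always_loaded=0, matched_files=100, avg_lines=40)

ratio = round(full_load / selective_load, 1)
```

Here `full_load` is 2,000 lines, `selective_load` is 110 lines, and `full_load_100` is 4,000 lines, so under these figures the full load reads roughly 18 times as many lines at 50 files. The gap widens as files are added because the selective figure does not depend on the total file count.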

Additional token efficiency rules:

- Never re-read a file already read in the current session
- If tool output exceeds 50 lines, store a summary only
- Memory files are capped at 50 lines
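The no-re-read rule can be enforced with a per-session set of already-read paths. The following is a minimal sketch, assuming all file reads go through one wrapper; the `SessionReader` class is hypothetical.

```python
from pathlib import Path
from typing import Optional


class SessionReader:
    """Reads each file at most once per session."""

    def __init__(self) -> None:
        self._seen: set[Path] = set()

    def read(self, path: Path) -> Optional[str]:
        """Return the file text on first read, or None if it was already read."""
        resolved = path.resolve()
        if resolved in self._seen:
            return None  # content is already in context; do not spend tokens again
        self._seen.add(resolved)
        return path.read_text(encoding="utf-8")
```

Creating one `SessionReader` at session start and discarding it at session end gives the rule the same lifetime as the context it protects.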


Design Considerations

  • Behavioral rules must live in a separate layer. If rules and knowledge are mixed in docs/, it becomes unclear whether a given file should always be loaded or loaded selectively. Behavioral rules always load; domain knowledge loads selectively. Different properties require different layers.

  • A stale memory-map.md creates orphaned memory. If a file exists but is absent from the index, the agent cannot find it. Writing memory must atomically trigger an index update — enforce this as a rule.

  • Keeping multiple session snapshots is inefficient. "Wouldn't keeping the last 3 snapshots provide richer context?" — In practice, session restore consumes three times the tokens for nearly no gain in useful information. Anything worth keeping has already been promoted to Layer 1 or Layer 2.

  • Two occurrences is the current promotion threshold, chosen as a balance. Promoting after one occurrence risks turning transient observations into rules; waiting for three occurrences delays useful rules from becoming active.

  • Token explosion case study: During stable operation on OpenClaw, a transition to Hermes — a full redesign of the memory architecture — was attempted. The result was a token usage explosion. After rolling back to OpenClaw and diagnosing the root cause, Hermes is currently under re-validation (harness git PR#12497). This case demonstrates that memory architecture directly determines token efficiency.
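The orphaned-memory risk noted in the considerations above can be checked mechanically by comparing the files on disk with the paths listed in the index. This sketch assumes index paths are written relative to the project root (as in the "docs/domain/..." rows of the keyword table) and skips a file named memory-map.md in case the index lives under docs/; `find_orphans` is a hypothetical name.

```python
from pathlib import Path


def find_orphans(docs_root: Path, indexed_paths: set[str]) -> list[str]:
    """Return memory files that exist under docs_root but are missing from the index."""
    project_root = docs_root.parent
    on_disk = {
        path.relative_to(project_root).as_posix()
        for path in docs_root.rglob("*.md")
        if path.name != "memory-map.md"  # the index itself is not a memory file
    }
    return sorted(on_disk - indexed_paths)
```

A non-empty result means some memory is invisible to the agent. Running the check after each write would catch a missed index update early.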


Conclusion

The 3-layer memory architecture is defined by layer separation and selective loading.

  • Layer 1 (docs/) — Permanent knowledge. Keyword search; load only what matches.
  • Layer 2 (lessons.md) — Behavioral rules. Auto-loaded every session. Promoted from experience.
  • Layer 3 (sessions/) — Session restore. Only the most recent snapshot is valid. Volatile.

As long as the principle of not loading everything at once is maintained, token consumption at session start stays nearly constant even as memory grows to 100 files. Agents can leverage accumulated experience without wasting context.

Memory is not about quantity — it is about accessibility. Retrieving the right memory quickly matters more than storing more of it.
