Agent Self-Improvement Harness (8/12) — Governance Tier Gate: Preventing Memory Runaway

workspace-shared Tier 0/1/2 and the plan → apply → verify → audit → fix pipeline


Key Summary

  • In an automated memory system, when the LLM re-decides where to write on every call, policy does not hold. The Tier 0/1/2 gate moves write-location decisions into the policy layer, narrowing the LLM's decision scope.
  • Promotion is split into a 4-step pipeline (plan → apply → verify → audit → fix), with the final safety net at the pre-commit hook, which blocks direct commits to bank/_tier2/.
  • Result: write paths are locked by policy, and governance cost converges to small-context verification calls.

What You Will Take Away

  • The structural root cause of automated memory system failures, and how the 3-Tier model moves write-location authority into the policy layer.
  • Which failure modes are eliminated when Tier 2 → Tier 1 promotion is split into 4 independent steps instead of a single LLM call.
  • The scope and limits of a single pre-commit hook check that subjects both humans and LLMs to the same gate.
  • The portability gained by decoupling memory backend from governance policy — why the policy layer survives a backend swap.

Problem Definition — Where Automated Memory Systems Break

Failures in automated memory systems almost always reduce to one point: the LLM re-decides where to write on every call. Even at 95% single-call accuracy, 100 accumulated calls leave about 5 entries in the wrong location on average, which is enough to skew the entire memory structure. All subsequent retrieval, reference, and promotion decisions then run on context contaminated by those entries.
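The arithmetic behind that estimate can be checked directly. The sketch below assumes every call misplaces a write independently and with the same probability, which is a simplification; the function names are illustrative and not part of the harness.

```python
def expected_misplaced(accuracy: float, calls: int) -> float:
    """Expected number of writes that land in the wrong location."""
    return (1.0 - accuracy) * calls


def prob_at_least_one_misplaced(accuracy: float, calls: int) -> float:
    """Probability that at least one of `calls` writes is misplaced."""
    return 1.0 - accuracy ** calls


if __name__ == "__main__":
    # 95% per-call accuracy over 100 calls, as in the paragraph above.
    print(round(expected_misplaced(0.95, 100), 2))           # 5.0
    print(round(prob_at_least_one_misplaced(0.95, 100), 3))  # 0.994
```

Under these assumptions the chance of a completely clean run over 100 calls is below 1%, so contamination is the expected outcome, not an edge case.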

The direction of the fix is straightforward. Remove write-location decisions from per-call LLM judgment and lift them into the policy layer. The LLM then decides only within already-permitted slots. The 3-Tier model below implements this direction as structure.

3-Tier Model — Write Authority as Policy

Tier   | Meaning                                               | Write Authority
Tier 0 | Permanent identity (MEMORY.md, bank/identity/)        | Human only, PR review required
Tier 1 | Curated knowledge (bank/world/, bank/patterns/, etc.) | LLM allowed after passing 4-step promotion
Tier 2 | Experimental / temporary (bank/_staging/)             | LLM unrestricted; promoting to Tier 1 requires a 4-step pass

The core principle: 4-step promotion is mandatory on every Tier 2 → Tier 1 path. The LLM writes freely to the staging area (Tier 2), but for that content to become curated knowledge (Tier 1) it must pass through the gate. Tier 0 is human-only — identity and contract-level files are excluded from LLM automation.
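One way to picture "write authority as policy" is a small lookup that maps a path to its tier and then answers whether a given actor may write there. This is a minimal sketch, not the harness's actual code: the path prefixes come from the table, while `classify`, `may_write`, the `actor` strings, the deny-by-default rule for unknown paths, and the treatment of every other bank/ path as Tier 1 are assumptions made for illustration.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    IDENTITY = 0  # Tier 0: human only
    CURATED = 1   # Tier 1: writable only through the promotion pipeline
    STAGING = 2   # Tier 2: unrestricted


# Prefixes from the tier table; the first matching rule wins.
TIER_RULES = [
    ("MEMORY.md", Tier.IDENTITY),
    ("bank/identity/", Tier.IDENTITY),
    ("bank/_staging/", Tier.STAGING),
    ("bank/", Tier.CURATED),  # assumption: any other bank/ path is Tier 1
]


def classify(path: str) -> Optional[Tier]:
    """Return the tier for a path, or None if no rule matches."""
    for prefix, tier in TIER_RULES:
        if path == prefix or (prefix.endswith("/") and path.startswith(prefix)):
            return tier
    return None


def may_write(path: str, actor: str, via_promotion: bool = False) -> bool:
    """Decide write permission from the tier alone, not from per-call judgment."""
    tier = classify(path)
    if tier is None:
        return False  # assumption: unclassified paths are denied
    if tier is Tier.IDENTITY:
        return actor == "human"
    if tier is Tier.CURATED:
        return via_promotion  # same gate for humans and LLMs (assumption)
    return True
```

With this shape, the LLM never chooses a tier. It proposes a path, and the rule table decides whether the write is permitted.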

The intent is simple: things that change frequently and things that must not change do not share the same write path.

4-Step Promotion Pipeline — Why Steps Are Separated

plan → apply → verify → audit → fix
  1. plan — Produce candidates: what to write and where. No file changes. Read-only.
  2. apply — Write candidates to a temporary branch. File writes occur only in this step.
  3. verify — Mechanically check schema, Retain tags, topic links, and duplicates.
  4. audit → fix — Generate result report. On failure, revert apply and enqueue in the fix queue.

Each step separates read / write / verify. The effect of this separation: no partial results survive. If verify fails, state rolls back to before apply. If apply halts mid-run, the plan output is discardable without harm. When an LLM plans, writes, and verifies inside a single call, mid-call failures accumulate; splitting into steps confines each failure to its own step.
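The rollback property can be sketched with an in-memory store standing in for the temporary branch. Everything here is hypothetical scaffolding: `Store`, `Candidate`, the bank/world/ target, and the trivial non-empty check in `verify` are placeholders chosen to keep the example short, not the harness's real interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    path: str     # proposed Tier 1 location
    content: str


@dataclass
class Store:
    """In-memory stand-in for the temporary branch (hypothetical)."""
    files: dict = field(default_factory=dict)


def plan(staged: dict) -> list:
    # Read-only: propose a Tier 1 path for each staging entry.
    return [
        Candidate(path.replace("bank/_staging/", "bank/world/", 1), content)
        for path, content in staged.items()
    ]


def apply(store: Store, candidates: list) -> dict:
    # The only step that writes. Returns a snapshot used for rollback.
    snapshot = dict(store.files)
    for c in candidates:
        store.files[c.path] = c.content
    return snapshot


def verify(candidates: list) -> list:
    # Placeholder check (non-empty content); returns the failing paths.
    return [c.path for c in candidates if not c.content.strip()]


def promote(store: Store, staged: dict, fix_queue: list) -> bool:
    candidates = plan(staged)
    snapshot = apply(store, candidates)
    failures = verify(candidates)
    if failures:
        store.files = snapshot      # audit -> fix: revert the whole apply
        fix_queue.extend(failures)  # and queue the failing entries for repair
        return False
    return True
```

Note that a single failing candidate reverts the whole batch, so no partially promoted state is left behind. Whether to revert per batch or per entry is a design choice this sketch does not settle.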

Pre-Commit Blocking — Same Gate for Humans and LLMs

The final safety net of the policy layer is a git hook.

#!/bin/sh
# pre-commit hook: reject the commit if any staged path is under bank/_tier2/.
if git diff --cached --name-only | grep -q '^bank/_tier2/'; then
  echo "ERROR: Direct commit to Tier 2 path is blocked. Use the 4-step promotion pipeline." >&2
  exit 1
fi

This single check means that humans and LLMs pass through the same gate. An operator who manually moves a file and commits is blocked; an LLM that attempts to bypass the promotion pipeline and write directly is also blocked. A policy that applies only to the LLM collapses the moment a human punches through it, and the pre-commit hook closes that bypass.

The limitation is clear: this hook fires only at commit time. It cannot detect files temporarily written to and deleted from the filesystem, and as a client-side hook it is skipped when a commit is made with git commit --no-verify. Pre-commit therefore occupies the position of last safety net, not first line of defense.
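The same path rule can also be expressed in Python, which makes it easy to unit-test separately from git. This is a sketch of an equivalent check, not a replacement the source prescribes; `blocked_paths`, `staged_files`, and `main` are hypothetical names, and the prefix is copied from the shell hook above.

```python
import subprocess
import sys

BLOCKED_PREFIX = "bank/_tier2/"  # same prefix the shell hook greps for


def blocked_paths(staged_paths):
    """Return the staged paths the hook would reject."""
    return [p for p in staged_paths if p.startswith(BLOCKED_PREFIX)]


def staged_files():
    """List staged paths via `git diff --cached --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def main():
    hits = blocked_paths(staged_files())
    if hits:
        print("ERROR: Direct commit to Tier 2 path is blocked: "
              + ", ".join(hits), file=sys.stderr)
        return 1
    return 0

# When installed as a hook script, the file would end with: sys.exit(main())
```

Keeping `blocked_paths` as a pure function means the rule itself can be tested without a repository, while `staged_files` remains the only part that touches git.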

Observed Behavior and Governance Cost

Before the gate was introduced, unintended files appeared intermittently in the bank/ directory. After introduction, events in that category dropped to unobservable levels.

On cost: the 4-step pipeline does not generate additional LLM calls. It restructures the original single call (plan + write + verify) into discrete steps; the net incremental cost converges to the small-context checks in the verify step. Schema, Retain, and duplicate checks are largely deterministic logic, so incremental LLM token use is bounded.
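To illustrate why these checks need no LLM call, here is a minimal sketch of deterministic verification. The entry layout (`topic`, `retain`, `body`), the `[[topic]]` link syntax, and hash-based duplicate detection are all assumptions made for the example; the source does not specify the actual schema or what a Retain tag looks like.

```python
import hashlib
import re

REQUIRED_FIELDS = ("topic", "retain")         # hypothetical schema
TOPIC_LINK = re.compile(r"\[\[[^\[\]]+\]\]")  # hypothetical [[topic]] syntax


def content_hash(body: str) -> str:
    """Hash of the body with whitespace and case normalized."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def verify_entry(entry: dict, known_hashes: set) -> list:
    """Return a list of rule violations; an empty list means the entry passes."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not entry.get(name):
            errors.append(f"missing field: {name}")
    body = entry.get("body", "")
    if not TOPIC_LINK.search(body):
        errors.append("no topic link")
    if content_hash(body) in known_hashes:
        errors.append("duplicate content")
    return errors
```

Checks of this kind run in constant context, so any LLM involvement in verify can be limited to judgments that rules cannot express.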

Limitations and Open Questions

  • LLM dependency in promotion criteria. The "topic link" judgment in the verify step still involves LLM evaluation. There is room to push this further toward deterministic rules.
  • Operational fatigue at tier boundaries. Early on, content may accumulate in Tier 2 without moving up to Tier 1. The design of the promotion trigger (batch? event-driven?) is a separate problem.
  • Multi-repository scaling. The current gate assumes a single workspace. When multiple projects share memory, the Tier policy will require namespacing.

Portability — Policy That Does Not Follow the Backend

This gate structure is decoupled from the memory backend (filesystem, database, vector store). The plan/apply/verify steps map to hook points such as MemoryProvider.on_memory_write; the pre-commit block remains in the git hook unchanged. When the backend changes, the policy layer's contract holds.
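A rough sketch of that decoupling: the policy is a plain function, and the backend is anything with a `write` method. Only the hook name `on_memory_write` comes from the text above. Its signature here, along with `MemoryBackend`, `DictBackend`, `GovernedMemory`, and `staging_only`, is assumed for illustration and may not match the real `MemoryProvider` interface.

```python
from typing import Callable, Protocol


class MemoryBackend(Protocol):
    """Anything that can persist a write (filesystem, database, vector store)."""
    def write(self, path: str, content: str) -> None: ...


class DictBackend:
    """Toy in-memory backend used only for this example."""
    def __init__(self) -> None:
        self.data = {}

    def write(self, path: str, content: str) -> None:
        self.data[path] = content


class GovernedMemory:
    """Applies a policy before delegating to whichever backend is plugged in."""
    def __init__(self, backend: MemoryBackend,
                 policy: Callable[[str, str], bool]) -> None:
        self._backend = backend
        self._policy = policy

    def on_memory_write(self, path: str, content: str) -> bool:
        # The policy decides; the backend only stores.
        if not self._policy(path, content):
            return False
        self._backend.write(path, content)
        return True


def staging_only(path: str, content: str) -> bool:
    """Example policy: direct writes are allowed only under the staging area."""
    return path.startswith("bank/_staging/")
```

Swapping `DictBackend` for another class with the same `write` method leaves `staging_only` and `GovernedMemory` untouched, which is the sense in which the policy contract survives a backend change.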

Applicability

The Tier gate model is portable to any system that satisfies the following conditions:

  • An automated write process exists in a memory, document, or configuration store (LLM, scheduler, or external process — all qualify).
  • Write paths can be pre-classified by path rules or schema.
  • Changes can be tracked at commit granularity via VCS integration.

If any one of the three conditions is absent, the Tier model cannot provide a complete safety net. In environments where the third condition is missing, a write-barrier of equivalent role must be designed separately instead of relying on pre-commit blocking.

Summary — Lock with Policy, Give the LLM the Locked Slot

The way to prevent automation runaway is not to ask the LLM to re-decide on every call. Lock what policy can lock, and give the LLM decision authority only within already-locked slots. The Tier gate is the consistent application of this principle across the entire write path. Splitting into steps, blocking bypass routes with hooks, and decoupling backend from policy: when these three axes align, governance cost converges to the small-context checks of the verify step.

Series overview: Series index
