Agent Self-Improvement Harness (9/12) — memoryFlush: Fitting a Session Core Into 500 Tokens

How a 6-field fixed format and 500-token compression resolved the bloat problem of free-form memory


Key Takeaways

  • Free-form agent memory bloats as sessions accumulate, wastes tokens, and makes retrieval impractical
  • A fixed 6-field schema — Goal / Progress / Decisions / Changed Files / Blockers / Next Steps — delivers both consistency and searchability
  • 300 tokens drops the "why"; 700 tokens introduces redundancy — 500 tokens is the equilibrium

Background

State persistence across sessions is a core challenge in multi-agent systems. When an agent terminates, its working context is lost. This is why memoryFlush exists: a mechanism that compresses and stores critical information at session end. The problem was format.

Free-form memory has three structural defects.

Bloat. The length an agent writes does not correlate with actual importance. Minor bug fixes get long entries; architectural decisions get one line. This inversion is frequent.

Token waste. Resuming a session requires loading prior memory into context. LLM context windows are finite. Bloated memory directly consumes tokens available for real work.

Poor retrievability. Without structure, recovering the reasoning behind a past decision requires a full linear scan of memory.

To address all three, the memoryFlush format was redesigned at the schema level.


Body

1. 6-Field Fixed Schema — Defining What Gets Stored

The root problem with free-form memory is that every flush relies on the agent's ad hoc judgment about what to store. No judgment criteria means no consistency.

The fix is to fix the fields. When memoryFlush triggers, the agent must populate exactly these six fields:

Goal:          What this session set out to accomplish
Progress:      What was actually completed
Decisions:     Decisions made and their rationale
Changed Files: List of modified files
Blockers:      What was stuck or unresolved
Next Steps:    What must happen in the next session

The fixed schema produces three effects.

Consistency. Every session's memory shares an identical schema, regardless of session number.

Searchability. "Why did we change this structure?" → read only the Decisions field. "What's next?" → read only Next Steps. Fields act as an index.

Forced reflection. Empty fields are immediately visible. "Blockers is empty — was there truly nothing blocking?" The agent self-audits for omissions.
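A minimal sketch of how the six fields and the empty-field check might look in code. The class name `FlushRecord`, the method `empty_fields`, and the sample values are illustrative assumptions, not the harness's actual implementation.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FlushRecord:
    """One memoryFlush entry; the six attributes mirror the fixed schema."""
    goal: str = ""
    progress: str = ""
    decisions: list[str] = field(default_factory=list)
    changed_files: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def empty_fields(self) -> list[str]:
        """Names of fields left blank, so the agent can re-check them before saving."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Sample values are made up for illustration.
record = FlushRecord(
    goal="Migrate the session store to SQLite",
    progress="Schema created; migration steps 1-2 applied",
    decisions=["Switched to SQLite: single-file deployment, no server to run"],
    changed_files=["db/schema.sql"],
    next_steps=["Fix the column name mismatch in migration step 3"],
)

print(record.decisions)       # "why did we change this?" reads one field only
print(record.empty_fields())  # ['blockers']
```

Because every record has the same attributes, a lookup such as "what's next?" becomes a single attribute read, and a blank field surfaces as a named item to audit rather than a silent omission.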

2. Trigger Conditions — When Flush Executes

memoryFlush triggers automatically under two conditions.

On session termination. When the agent receives an explicit exit signal, it runs flush immediately. All active working state in context is compressed into the 6 fields.

On context threshold. When context window utilization reaches a configured threshold (default: 80%), a mid-session flush executes. Memory is written, then context is reset, and work continues. Raw context prior to the flush is discarded — only the stored 6-field struct persists.

What is flushed. A compressed session summary structured as the 6 fields. Raw tool output, intermediate reasoning, and conversation history are not stored.

What persists. The 6-field struct itself is written to persistent storage (file or DB). Loading this struct in the next session restores the agent's working context.
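The two trigger conditions can be sketched as a single check. The names `ContextState` and `flush_reason` are hypothetical, and how the harness actually measures context utilization is not described in this post; only the 80% default comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextState:
    used_tokens: int          # tokens currently held in context
    window_tokens: int        # total context window size
    exit_requested: bool = False


def flush_reason(state: ContextState, threshold: float = 0.80) -> Optional[str]:
    """Return why a flush should run now, or None if no trigger applies."""
    if state.exit_requested:
        return "session_end"
    if state.used_tokens / state.window_tokens >= threshold:
        return "context_threshold"
    return None


print(flush_reason(ContextState(used_tokens=85_000, window_tokens=100_000)))
print(flush_reason(ContextState(used_tokens=50_000, window_tokens=100_000)))
print(flush_reason(ContextState(used_tokens=10_000, window_tokens=100_000, exit_requested=True)))
```

Checking the exit signal first means a session that ends while already over the threshold is recorded as a normal termination rather than a mid-session flush.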

3. Tool Output Summarization Rule — Store the Summary, Not the Raw Output

Another source of memory bloat was storing raw tool output: full file diffs, full terminal output, full API responses flowing directly into memory.

One rule was added: tool output exceeding 50 lines is stored as a summary only.

Example: instead of the 100 lines of raw error log, memory stores only this summary:

Error: SQLite migration failed at step 3/7.
Root cause: column name mismatch (expected 'user_id', got 'userId').
Fix: ALTER TABLE + re-run migration.

If the raw output is needed, read the file. Memory retains only what happened. Raw recovery is the file system's responsibility; context preservation is memory's.
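A rough sketch of the 50-line rule follows. In practice the summary would be written by the agent itself; `first_line_summary` is only a stand-in so the example runs, and storing shorter output unchanged is an assumption the post does not spell out.

```python
from typing import Callable

MAX_RAW_LINES = 50  # threshold from the rule above


def memory_entry_for(output: str, summarize: Callable[[str], str]) -> str:
    """Return what goes into memory for one piece of tool output."""
    if len(output.splitlines()) > MAX_RAW_LINES:
        return summarize(output)
    return output  # assumption: short output may be kept as-is


def first_line_summary(output: str) -> str:
    """Stand-in summarizer: keeps the first line and counts the rest."""
    lines = output.splitlines()
    return f"{lines[0]} (+{len(lines) - 1} more lines omitted)"


long_log = "\n".join(f"line {i}" for i in range(100))
short_log = "\n".join(f"line {i}" for i in range(50))

print(memory_entry_for(long_log, first_line_summary))
print(memory_entry_for(short_log, first_line_summary) == short_log)
```

Passing the summarizer in as a parameter keeps the line-count rule separate from whatever actually produces the summary.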

4. 500-Token Compression Target — 300 Is Too Few, 700 Is Excess

After establishing the 6-field format and summarization rule, the next question was: how short?

First attempt: 300 tokens. Aggressively compressed. Goal, Progress, Changed Files, and Next Steps fit without issue. But the Decisions field was effectively empty — entries read "switched to SQLite" with no record of why.

This was critical. Follow-on sessions could not reconstruct the reasoning behind past decisions. The core value of memory is preserving decision rationale; 300 tokens did not provide enough room for it.

Second attempt: 700 tokens. Decisions could include full rationale. But redundancy emerged: "tried X, failed because Y, switched to Z" appeared in both Progress and Decisions; Next Steps repeated context already in Progress.

Final: 500 tokens. This is the point at which each Decisions entry fits "what + why" in a single line and all 6 fields can be filled without duplication. Sufficient context; no filler.

500 tokens is the default. Sessions with high-stakes architectural decisions may extend to 600–650. Simple bug fixes can compress to 300. The principle: anchor the default at 500, set an upper bound, and adjust by content.
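The budget rule can be sketched as a lookup plus a size check. The session kinds, the 650 upper bound (taken from the top of the 600-650 range), and the four-characters-per-token estimate are assumptions for illustration; a real harness would count tokens with the model's own tokenizer.

```python
# Token budgets by session kind; names and the exact upper bound are assumed.
BUDGETS = {
    "simple_fix": 300,
    "default": 500,
    "architectural": 650,
}


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return len(text) // 4


def within_budget(entry: str, kind: str = "default") -> bool:
    """True if the flush entry fits the budget for this kind of session."""
    return estimate_tokens(entry) <= BUDGETS[kind]


entry = "x" * 2_200  # about 550 estimated tokens
print(within_budget(entry))                   # over the 500 default
print(within_budget(entry, "architectural"))  # fits the extended bound
```

An entry that fails the check is a signal to recompress, ideally by trimming duplication between Progress and Decisions before touching the rationale.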


Lessons Learned

"Just store everything." Storage is unlimited but the LLM bottleneck is the context window. Loading memory consumes tokens. Ten memory entries at 1,000 tokens each consume 10K tokens on load alone — directly reducing space available for actual work.

"Let the AI summarize freely." Issuing a bare "save memory" instruction without format constraints produced inconsistent results across agents: full code blocks included, key decisions omitted. The 6-field schema and token cap must be explicitly specified for flush output to be consistent.

The 300-token lesson. "Shorter is better" is the wrong intuition. The highest-value information in memory is not what but why. Changed files are recoverable from git log; decision rationale disappears if not recorded. The compression floor is the point at which "why" survives.
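To make the second lesson concrete, here is one way the flush instruction could state the schema and token cap explicitly instead of a bare "save memory". The wording of the prompt and the function name `build_flush_prompt` are hypothetical; the post does not show the actual instruction text.

```python
FIELDS = ("Goal", "Progress", "Decisions", "Changed Files", "Blockers", "Next Steps")


def build_flush_prompt(token_cap: int = 500, max_raw_lines: int = 50) -> str:
    """Assemble a flush instruction that names every field and the size limit."""
    lines = [
        "Write a session memory entry using exactly these fields, in this order:",
        *[f"{name}:" for name in FIELDS],
        f"Keep the whole entry under {token_cap} tokens.",
        "In Decisions, record each decision together with its rationale.",
        f"Summarize any tool output longer than {max_raw_lines} lines; "
        "do not paste raw output or code blocks.",
        "If a field has nothing to report, write 'none' instead of omitting it.",
    ]
    return "\n".join(lines)


print(build_flush_prompt())
```

Generating the instruction from the same field list used to validate the output keeps the prompt and the schema from drifting apart.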


Closing

Three principles define structured memoryFlush.

  1. Fix the fields. Free-form has no consistency. Six fixed fields give every session's memory the same schema.
  2. Store summaries, not raw output. Tool output over 50 lines: summary only. Raw recovery is delegated to the file system.
  3. Default to 500 tokens. The minimum at which "why" survives; the boundary before redundancy begins.

Session memory is the only state an agent can restore in the next session. If that memory is unstructured, restoration quality varies by session. The 6-field format combined with a 500-token default is the simplest mechanism for ensuring flush produces consistent, recoverable output every time.

Series overview: Series index
