Agent Self-Improvement Harness (6/12) — Evolution of the Reflect Pipeline
Optimizing an AI agent's long-term memory management, from LLM-only processing to regex-based scripts
Key Summary
- The Reflect pipeline evolved through three stages: v0.3 (LLM-only) to v0.5 (inverted architecture) to the current script-augmented design
- Once data solidifies into structured formats, replacing LLM calls with regex scripts is the rational move
- The script transition improved testability, predictability, and debugging efficiency across the board
Background
The Reflect pipeline is the long-term memory management system for an AI agent. Every night, it processes the day's temporary memory and merges relevant entries into permanent storage. This post covers the optimization lessons learned from rearchitecting this pipeline three times.
The Evolution
v0.3 (Initial): LLM Handles Everything
Each of the four runners made its own LLM call. A failure in any one runner could halt the entire pipeline, which made the design fragile.
v0.5 (Inverted Architecture): Separate Judgment from Execution
The Manager now processes only a summary report (2k-5k tokens), while downstream Runners handle first-pass processing of raw files. This reduced the LLM's workload while preserving its judgment capabilities.
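To make the inversion concrete, here is a minimal sketch of the shape, not the actual pipeline code. The names `RunnerReport`, `run_first_pass`, `build_summary`, and `manager_decide` are hypothetical, the "first pass" is reduced to counting non-empty lines, and the LLM call is replaced by a `judge` callable passed in by the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunnerReport:
    """Compact digest a runner hands upward instead of the raw file."""
    runner: str
    items_found: int
    highlights: list[str]


def run_first_pass(name: str, raw_text: str, max_highlights: int = 3) -> RunnerReport:
    # Hypothetical runner: reads the raw text and keeps only a small digest.
    lines = [ln.strip() for ln in raw_text.splitlines() if ln.strip()]
    return RunnerReport(runner=name, items_found=len(lines),
                        highlights=lines[:max_highlights])


def build_summary(reports: list[RunnerReport]) -> str:
    # Flatten all runner digests into one short report for the manager.
    parts: list[str] = []
    for report in reports:
        parts.append(f"[{report.runner}] {report.items_found} items")
        parts.extend(f"  - {h}" for h in report.highlights)
    return "\n".join(parts)


def manager_decide(summary: str, judge: Callable[[str], str]) -> str:
    # The manager only ever receives the summary, never the raw files.
    # `judge` stands in for the LLM call (an assumption of this sketch).
    return judge(summary)
```

The point of the shape is that the expensive judgment step has a bounded input size no matter how large the raw files grow.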
Current: Script-Augmented Stage
The turning point was the "extract Retain tags from conversation logs" task. The actual data had fully stabilized into structured formats (- W:, - O(c=0.90):, - S[entity]:), so I replaced the LLM runner with a Python regex script.
The division of labor is now clear: structured data is extracted by regex, while the LLM focuses exclusively on semantic judgment and natural language generation.
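As a rough illustration of the extraction step, the sketch below parses the three line shapes mentioned above. It is not the production script: the `RetainTag` class, the `extract_retain_tags` function, and the field names are assumptions made for this example, and the pattern is deliberately loose (it would also accept a line carrying both a confidence and an entity).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the three line shapes named in the post:
#   - W: text
#   - O(c=0.90): text
#   - S[entity]: text
TAG_RE = re.compile(
    r"^- (?P<kind>[WOS])"                 # tag letter
    r"(?:\(c=(?P<conf>\d(?:\.\d+)?)\))?"  # optional confidence, e.g. (c=0.90)
    r"(?:\[(?P<entity>[^\]]+)\])?"        # optional entity, e.g. [entity]
    r":\s*(?P<text>.+)$"                  # tag body
)


@dataclass
class RetainTag:
    kind: str
    text: str
    confidence: Optional[float] = None
    entity: Optional[str] = None


def extract_retain_tags(log: str) -> list[RetainTag]:
    """Return every Retain tag found in a conversation log, in order."""
    tags: list[RetainTag] = []
    for line in log.splitlines():
        match = TAG_RE.match(line.strip())
        if match is None:
            continue  # ordinary conversation line, not a tag
        conf = match.group("conf")
        tags.append(RetainTag(
            kind=match.group("kind"),
            text=match.group("text").strip(),
            confidence=float(conf) if conf is not None else None,
            entity=match.group("entity"),
        ))
    return tags
```

Because the function is pure (string in, list out), the same input always yields the same tags, which is exactly the property the LLM runner could not guarantee.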
Pitfalls and Caveats
LLMs are flexible but non-deterministic — they don't guarantee the same output for the same input. Continuing to use an LLM after the data format has been finalized means carrying unnecessary cost and instability. Conversely, switching to scripts while the data format is still in flux means rewriting code with every format change.
The timing heuristic: Switch to scripts when the data format has been stable for at least two weeks without changes.
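The heuristic is simple enough to write down as a check. The sketch below assumes you keep a list of dates on which the data format changed; the function name `ready_for_script` and the choice to treat an empty history as "not ready" are assumptions of this example, not part of the original pipeline.

```python
from datetime import date, timedelta

# "Two weeks" from the heuristic above.
STABILITY_WINDOW = timedelta(days=14)


def ready_for_script(format_change_dates: list[date], today: date) -> bool:
    """True if the format has gone at least two weeks without a change."""
    if not format_change_dates:
        # Assumption: with no recorded history there is no evidence of stability.
        return False
    last_change = max(format_change_dates)
    return today - last_change >= STABILITY_WINDOW
```

For example, with the last format change on 2024-03-10, the check is false on 2024-03-20 (10 days) and true from 2024-03-24 onward (14 days).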
Takeaway
During the stabilization phase, anything that can be structured should be moved to scripts. Script-based processing is testable, predictable, and easy to debug through failure isolation. Use the LLM only where an LLM is genuinely needed — that's the key to long-term operational sustainability.
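One way to picture failure isolation is the sketch below: each scripted step runs independently, and a failure is recorded under the step's name instead of stopping the rest. The `run_isolated` helper and the step names in the usage example are hypothetical, not taken from the actual pipeline.

```python
from typing import Callable


def run_isolated(
    steps: dict[str, Callable[[], object]],
) -> tuple[dict[str, object], dict[str, str]]:
    """Run every step; collect results and failures separately."""
    results: dict[str, object] = {}
    failures: dict[str, str] = {}
    for name, step in steps.items():
        try:
            results[name] = step()
        except Exception as exc:
            # Record which step failed and why, then keep going.
            failures[name] = f"{type(exc).__name__}: {exc}"
    return results, failures


def broken_merge() -> None:
    raise ValueError("malformed line")


results, failures = run_isolated({
    "extract": lambda: 3,
    "merge": broken_merge,
    "report": lambda: "ok",
})
print(results)   # {'extract': 3, 'report': 'ok'}
print(failures)  # {'merge': 'ValueError: malformed line'}
```

The failure map points directly at the step to debug, while the remaining steps still complete.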