"Memory Systems — Preserving Information Outside the Context Window (Harness Series 3/6)"
Part 2 argued for small context windows. So where does the older information go? Into external memory systems. By 2026, memory itself has become its own infrastructure category.
This article covers agent memory — short-term vs long-term, three types, vector vs graph, solutions like Mem0 and Zep, and where SQLite+FTS5 ends and Pinecone begins.
Series Roadmap (6 parts)
- What Is Harness Engineering?
- Context Engineering
- Memory Systems ← this article
- Tools & Sandboxing
- Multi-Provider Routing
- Evaluation & Ops
1. Memory ≠ Extended Context Window
A common confusion: "context is memory."
The accurate split:
- Context = the model input on the current turn (volatile, costs tokens)
- Memory = external storage that persists across sessions (durable, queryable)
Context fills fresh every turn. Memory is queried every turn — only relevant fragments get injected into context.
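As a rough illustration of that per-turn loop, the sketch below rebuilds the prompt from the query plus a few retrieved fragments. The function names (`retrieve`, `build_context`) and the word-overlap scoring are assumptions made for the example, not any particular library's API.

```python
def retrieve(memory: list[str], query: str, k: int = 2) -> list[str]:
    """Toy relevance: rank stored memories by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(m.lower().split())), m) for m in memory]
    # Keep only memories with at least one shared word, best score first.
    hits = [m for score, m in sorted(scored, key=lambda s: -s[0]) if score > 0]
    return hits[:k]


def build_context(memory: list[str], query: str, k: int = 2) -> str:
    """Rebuild the prompt for this turn from the query plus retrieved fragments."""
    lines = ["Relevant memory:"]
    lines += [f"- {fragment}" for fragment in retrieve(memory, query, k)]
    lines += ["", f"User: {query}"]
    return "\n".join(lines)


memory = [
    "Project uses PostgreSQL",
    "User is a Python developer",
    "Last session refactored the auth module",
]
prompt = build_context(memory, "project database")
```

Only the PostgreSQL fragment reaches the prompt here; the other two memories stay in storage and cost no tokens on this turn.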
2026 consensus: "Memory is a first-class architectural component." It now has dedicated benchmarks (LoCoMo, LongMemEval), its own research literature, measurable performance gaps, and a tool ecosystem.
2. Three Types of Memory
2-1. Episodic
What happened. Time-ordered events.
- "User created the X function yesterday"
- "Last session refactored the auth module"
2-2. Semantic
What is known. Time-invariant facts.
- "User is a Python developer"
- "Project uses PostgreSQL"
2-3. Procedural — added in 2026
How to do things. Repeatable procedures.
- "PR procedure for this project: lint → test → review → merge"
- "This user wants conventional-commit format checked before commit"
Mem0's v1.0.0 API explicitly added this type in 2026. Earlier APIs covered only episodic + semantic.
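One way to keep the three types apart in storage is to tag each record with its type. This is a minimal sketch with hypothetical names (`MemoryType`, `MemoryRecord`, `by_type`); it is not Mem0's schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"      # what happened
    SEMANTIC = "semantic"      # what is known
    PROCEDURAL = "procedural"  # how to do things


@dataclass
class MemoryRecord:
    type: MemoryType
    text: str
    # Episodic records rely on this timestamp for ordering.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


records = [
    MemoryRecord(MemoryType.EPISODIC, "Last session refactored the auth module"),
    MemoryRecord(MemoryType.SEMANTIC, "Project uses PostgreSQL"),
    MemoryRecord(MemoryType.PROCEDURAL, "PR procedure: lint, test, review, merge"),
]


def by_type(records: list[MemoryRecord], wanted: MemoryType) -> list[str]:
    """Return the text of every record of the requested type."""
    return [r.text for r in records if r.type is wanted]
```

Tagging at write time lets retrieval treat the types differently, for example always injecting procedural rules while searching episodic events on demand.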
3. Vector vs Graph vs SQL/FTS5
Vector Memory (semantic search)
- Embed the query → retrieve top-N most similar memories
- Strength: semantic proximity, fuzzy match
- Weakness: weak at exact relationships
Example: "User mentioned Python" — retrieves relevant memories well.
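The retrieval step can be sketched with cosine similarity over stored vectors. The 3-dimensional vectors below are hand-made stand-ins for real embeddings (an assumption for illustration); a real system would get them from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(query_vec: list[float], store: list[tuple[str, list[float]]], n: int = 2) -> list[str]:
    """Return the texts of the n stored memories most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:n]]


# Toy store: (memory text, made-up embedding).
store = [
    ("User mentioned Python", [0.9, 0.1, 0.0]),
    ("Project uses PostgreSQL", [0.1, 0.9, 0.0]),
    ("User prefers dark mode", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "what language does the user write?"
nearest = top_n(query_vec, store, n=1)
```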
Graph Memory (relational reasoning)
- Stores events, entities, and relationships as a graph
- Strength: multi-hop reasoning ("X is part of Y, which is at Z")
- Weakness: build/maintenance cost ↑
Example: "User is a Python developer building data pipelines with pandas, at a company that uses dbt and is migrating off Spark" — connected facts retrievable.
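The example above can be modeled as a small adjacency map and walked breadth-first. The edge names (`works_at`, `migrating_off`, and so on) are invented for this sketch; a production graph store would have its own schema.

```python
from collections import deque

# node -> list of (relation, target)
EDGES = {
    "user": [("is_a", "python_developer"), ("builds", "data_pipelines"), ("works_at", "company")],
    "data_pipelines": [("uses", "pandas")],
    "company": [("uses", "dbt"), ("migrating_off", "spark")],
}


def connected_facts(start: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
    """Collect (source, relation, target) facts within max_hops of start."""
    facts = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in EDGES.get(node, []):
            facts.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return facts


two_hop = connected_facts("user", max_hops=2)
```

With one hop, only the three facts directly about the user come back; the Spark migration needs the second hop through `company`, which similarity search alone would not guarantee.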
SQL + FTS5 (keyword + structured)
- SQLite's FTS5 module is a fast, lightweight keyword search
- Strength: zero infrastructure (single file), 4,300-memory query in <1ms
- Weakness: weak fuzzy match, no embedding semantics
Pinecone p95 latency is 25–50 ms; SQLite+FTS5 at the same scale answers in under 1 ms. For pure keyword lookups, SQL wins.
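A minimal FTS5-backed store fits in a few lines of standard-library Python. This sketch assumes the SQLite bundled with your Python build has FTS5 compiled in (most do); the table name `memories` and the `search` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a durable store
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(text)")
conn.executemany(
    "INSERT INTO memories(text) VALUES (?)",
    [
        ("Project uses PostgreSQL",),
        ("User is a Python developer",),
        ("Last session refactored the auth module",),
    ],
)


def search(query: str, limit: int = 5) -> list[str]:
    """Keyword search; FTS5's built-in rank orders the best matches first."""
    rows = conn.execute(
        "SELECT text FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


exact = search("postgresql")     # token match, case-insensitive
partial = search("postgres")     # no match: "postgres" is not a stored token
prefixed = search("postgres*")   # explicit prefix query does match
```

The `partial` case shows the weakness noted above: without an explicit prefix operator or embeddings, near-miss wording finds nothing.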
Hybrid (RRF — Reciprocal Rank Fusion)
- Vector top-N + FTS5 top-N → fused via RRF
- Best of both
- OpenClaw memcore and wikycore implement this pattern
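Reciprocal Rank Fusion scores each item as the sum of 1 / (k + rank) over the lists it appears in, so it needs only ranks, not comparable scores. The sketch below uses k = 60, the constant commonly used with RRF; the memory IDs are placeholders, and this is not memcore's or wikycore's actual code.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs into one list via Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["m1", "m2", "m3"]  # top-N from vector search
fts_hits = ["m2", "m4", "m1"]     # top-N from FTS5
fused = rrf([vector_hits, fts_hits])
```

Items found by both retrievers (`m2`, `m1`) rise above items found by only one, which is the "best of both" effect.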
4. Solution Comparison (April 2026)
Mem0 (managed)
- Position: User personalization + cost efficiency
- Core tech: Memory Compression Engine — compresses chat history into dense representations
- Effect: claims 80% prompt-token reduction
- Pricing: Managed SaaS. Self-host also possible (open source).
Zep (graph)
- Position: Time-aware knowledge graph
- Core tech: Temporal Knowledge Graph — stores time validity of facts
- Strength: Time-dependent queries like "User used to use Python, now uses Rust"
- Latency: Sub-second
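The idea of storing time validity can be illustrated with facts that carry a validity interval. This is a hypothetical sketch, not Zep's data model or API; the `TemporalFact` class and the dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact still holds


facts = [
    TemporalFact("user", "uses", "Python", date(2023, 1, 1), date(2025, 6, 1)),
    TemporalFact("user", "uses", "Rust", date(2025, 6, 1)),
]


def facts_at(facts: list[TemporalFact], when: date) -> list[str]:
    """Return the objects of facts that were valid on the given date."""
    return [
        f.obj
        for f in facts
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)
    ]


then = facts_at(facts, date(2024, 1, 1))
now = facts_at(facts, date(2026, 1, 1))
```

Closing the old interval instead of overwriting the fact is what lets a query distinguish "used to use" from "uses now".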
Memanto (academic)
- arXiv 2604.22085
- "Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents"
- Information-theoretic retrieval — ranks by uncertainty reduction
MemMachine (academic)
- arXiv 2604.04853
- "Ground-Truth-Preserving Memory System for Personalized AI Agents"
- Compression policy that prioritizes preservation of original information
memcore (OpenClaw)
- SQLite + vector + RRF + 3-tier memory layers
- Plan / Lessons / Sessions pattern
- v0.5 introduced MemPalace techniques (location-based recall)
5. Production Patterns
5-1. 3-Layer Memory (CLAUDE.md standard)
Layer 1: Project Knowledge (docs/) — permanent, keyword-indexed
Layer 2: Behavioral Rules (lessons.md) — what to do / not do
Layer 3: Session Restore (sessions/) — volatile snapshots
CLAUDE.md recommends this 3-layer approach. Claude Code, OpenClaw, and Hermes all use this skeleton.
5-2. Auto-Harvesting
At the end of a session, new learnings are stored automatically in the right layer:
- Error resolution → lessons.md
- New domain knowledge → docs/{category}/
- Progress state → sessions/
CLAUDE.md's "Automatic Memory Harvesting (Instinct Pattern)" section formalizes this.
5-3. Dedup + Cascade Update
- Same pattern repeated 2× → promote to rule (increment lessons.md count)
- New info conflicts with existing memory → flag conflict + ask user
- One fact changes → check memory-map.md for connected files
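The promotion rule in the first bullet can be sketched as a counter over normalized patterns. `LessonTracker` is a hypothetical helper, and whitespace/case normalization is a simplifying assumption; real dedup would likely need fuzzier matching.

```python
from collections import Counter


class LessonTracker:
    """Counts repeated patterns and promotes them to rules at a threshold."""

    def __init__(self, promote_at: int = 2) -> None:
        self.promote_at = promote_at
        self.counts = Counter()
        self.rules = set()

    def observe(self, pattern: str) -> bool:
        """Record one occurrence; return True once the pattern is a rule."""
        key = " ".join(pattern.lower().split())  # dedup on case and spacing
        self.counts[key] += 1
        if self.counts[key] >= self.promote_at:
            self.rules.add(key)
        return key in self.rules


tracker = LessonTracker()
first = tracker.observe("Run lint before commit")
second = tracker.observe("run  lint before commit")  # same pattern, different spacing
```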
5-4. Eviction Policy
Unbounded accumulation = noise. Example policy:
- Time-based: archive after 90 days unused
- Use-based: remove if count < 2 and unused > 30 days
- LRU: under context pressure, drop oldest first
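The example policy translates fairly directly into code. In this sketch the `Memory` fields and function names are hypothetical, and checking the use-based rule before the time-based one is an assumption about precedence that the policy itself does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:
    text: str
    use_count: int
    last_used: datetime


def decide(memory: Memory, now: datetime) -> str:
    """Apply the use-based rule, then the time-based rule."""
    idle = now - memory.last_used
    if memory.use_count < 2 and idle > timedelta(days=30):
        return "remove"
    if idle > timedelta(days=90):
        return "archive"
    return "keep"


def keep_most_recent(memories: list[Memory], keep: int) -> list[Memory]:
    """LRU under context pressure: keep the most recently used, drop the rest."""
    return sorted(memories, key=lambda m: m.last_used, reverse=True)[:keep]


now = datetime(2026, 4, 1)
a = Memory("rarely used", 1, now - timedelta(days=45))
b = Memory("old but popular", 5, now - timedelta(days=100))
c = Memory("fresh", 5, now - timedelta(days=5))
decisions = [decide(m, now) for m in (a, b, c)]
```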
6. SQL vs Vector — Where to Start?
Starting (personal project, 1 user)
SQL + FTS5 only
- Single SQLite file
- Plenty for the 100–10,000-memory range
- Zero infrastructure, backup = file copy
- OpenClaw memcore's early version started here
Mid-scale (multi-user, 10K+ memories)
SQL + FTS5 + vector (hybrid)
- Add Chroma or Qdrant
- bge-m3 embeddings (free locally on oMLX)
- Combine results via RRF
Large (millions of memories)
Mem0 / Zep / custom infrastructure
- Mem0: automated user-persona extraction
- Zep: temporal graph + complex relationships
- Custom: ElastiCache + Neptune Analytics on AWS
7. Anti-Patterns
7-1. "Skipping vector DB is amateur"
At 4,300 memories, SQLite+FTS5 answers in under 1 ms while Pinecone's p95 is 25–50 ms. SQLite also has zero infrastructure cost. Start with SQL.
7-2. "Just dump everything; we can search later"
Unbounded accumulation → noise. Without an eviction policy, retrieval accuracy decays after a year.
7-3. "Write memory once and forget it"
Memory is a living system. Without update / conflict resolution / archive flows, stale memory leads to wrong answers.
CLAUDE.md: "memory records can become stale over time. Verify before using."
7-4. "Pick an embedding model and never change it"
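Switching embedding models invalidates all existing vectors: vectors produced by different models are not comparable, so every stored memory has to be re-embedded. Migration cost is huge. Pick a proven one (bge-m3, voyage-3) up front.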
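One inexpensive safeguard is to record the embedding model next to every vector, so a later switch can at least find what needs re-embedding. The row layout and the `old-model` name below are hypothetical.

```python
CURRENT_MODEL = "bge-m3"

# Each stored vector carries the name of the model that produced it.
rows = [
    {"id": 1, "model": "bge-m3", "vector": [0.1, 0.2]},
    {"id": 2, "model": "old-model", "vector": [0.3, 0.4]},
    {"id": 3, "model": "bge-m3", "vector": [0.5, 0.6]},
]


def needs_reembedding(rows: list[dict], current_model: str) -> list[int]:
    """IDs whose vectors came from a different model than the current one."""
    return [row["id"] for row in rows if row["model"] != current_model]


def searchable(rows: list[dict], current_model: str) -> list[dict]:
    """Only vectors from the current model may be compared with new queries."""
    return [row for row in rows if row["model"] == current_model]


stale = needs_reembedding(rows, CURRENT_MODEL)
```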
8. Knowledge Graphs in Detail
Graph memory is rising in 2026 because:
- LLMs are weak at multi-hop reasoning (vector retrieval limit)
- Relations between facts don't reduce to similarity
- Time dimension: when a fact was true
KG Memory Build Flow
- Input (conversation/document) → entity extraction
- Relations between entities → LLM-driven or NER + relation classifier
- Store in graph (Neo4j, Neptune, custom SQLite)
- Query starts from a node → graph traversal → connected facts
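The flow above can be sketched on the "custom SQLite" option with an edge table and a recursive query. The extraction step is stubbed: `extract_triples` expects pre-formatted `subject|relation|object` lines in place of a real LLM or NER pipeline, and all names here are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT)")


def extract_triples(text: str) -> list[tuple[str, ...]]:
    """Stand-in for entity/relation extraction: parses 'a|rel|b' lines."""
    return [
        tuple(part.strip() for part in line.split("|"))
        for line in text.splitlines()
        if line.count("|") == 2
    ]


def ingest(text: str) -> None:
    """Store extracted triples as graph edges."""
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", extract_triples(text))


def reachable(start: str, max_hops: int = 3) -> list[str]:
    """Traverse outward from a node, up to max_hops, via a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.depth + 1
            FROM edges e JOIN walk w ON e.src = w.node
            WHERE w.depth < ?
        )
        SELECT DISTINCT node FROM walk WHERE depth > 0 ORDER BY node
        """,
        (start, max_hops),
    ).fetchall()
    return [row[0] for row in rows]


ingest("user|works_at|company\ncompany|uses|dbt\ncompany|migrating_off|spark")
```

Here `reachable("user")` returns the company and, through it, dbt and spark; the depth bound also keeps the walk finite if the graph contains cycles.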
Cost vs Value
- Build: LLM call per input → cost ↑
- Search: traversal is fast
- Value: qualitative gain on multi-hop queries
Recommendation: For per-user personal assistants, KG memory (Zep) is worth it. For single-task agents (e.g., a coding agent), hybrid (SQL+vector) is sufficient.
Bottom Line
| Memory decision | Recommendation |
|---|---|
| Starting point | SQLite + FTS5 |
| Add semantic search | + vector (Chroma/Qdrant) + bge-m3 |
| Need relational reasoning | KG (Zep) |
| User personalization | Mem0 |
| Academic frontier | Memanto / MemMachine |
The single takeaway: "Memory is not an extension of context — it is a separate system."
Part 4 (next) covers what the agent actually does — Tools & Sandboxing.
First-Party Sources
- Mem0 State of AI Agent Memory 2026: mem0.ai/blog/state-of-ai-agent-memory-2026
- Memanto: arxiv.org/abs/2604.22085
- MemMachine: arxiv.org/abs/2604.04853
- AWS Mem0 + Neptune: aws.amazon.com/blogs/database/build-persistent-memory-for-agentic-ai
- Zep Temporal Knowledge Graph: getzep.com