"Memory Systems — Preserving Information Outside the Context Window (Harness Series 3/6)"

Part 2 argued for small context windows. So where does the older information go? Into external memory systems. By 2026, memory itself has become its own infrastructure category.

This article covers agent memory — short-term vs long-term, three types, vector vs graph, solutions like Mem0 and Zep, and where SQLite+FTS5 ends and Pinecone begins.

Series Roadmap (6 parts)

  1. What Is Harness Engineering?
  2. Context Engineering
  3. Memory Systems ← this article
  4. Tools & Sandboxing
  5. Multi-Provider Routing
  6. Evaluation & Ops

1. Memory ≠ Extended Context Window

A common confusion: "context is memory."

The accurate split:

  • Context = the model input on the current turn (volatile, costs tokens)
  • Memory = external storage that persists across sessions (durable, queryable)

Context is rebuilt from scratch every turn. Memory is queried every turn, and only the relevant fragments get injected into context.
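The split can be sketched in a few lines. This is a toy illustration, not any product's API: `Memory`, `retrieve`, and `build_context` are hypothetical names, and the word-overlap scoring merely stands in for a real retriever.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str


def retrieve(store: list[Memory], query: str, limit: int = 2) -> list[Memory]:
    """Toy relevance score: number of query words that appear in the memory."""
    words = set(query.lower().split())
    scored = [(len(words & set(m.text.lower().split())), m) for m in store]
    scored = [(score, m) for score, m in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:limit]]


def build_context(store: list[Memory], user_message: str) -> str:
    """Rebuild the context for this turn: relevant memories plus the new message."""
    relevant = retrieve(store, user_message)
    lines = ["Relevant memory:"] + [f"- {m.text}" for m in relevant]
    lines += ["", f"User: {user_message}"]
    return "\n".join(lines)


store = [
    Memory("Project uses PostgreSQL"),
    Memory("User prefers tabs"),
    Memory("Auth module was refactored last session"),
]
context = build_context(store, "which database does the project use")
```

The store keeps everything, while the context only receives the fragments that matched the current message.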

2026 consensus: "Memory is a first-class architectural component." It now has dedicated benchmarks (LoCoMo, LongMemEval), its own research literature, measurable performance gaps, and a tool ecosystem.


2. Three Types of Memory

2-1. Episodic

What happened. Time-ordered events.

  • "User created the X function yesterday"
  • "Last session refactored the auth module"

2-2. Semantic

What is known. Time-invariant facts.

  • "User is a Python developer"
  • "Project uses PostgreSQL"

2-3. Procedural — added in 2026

How to do things. Repeatable procedures.

  • "PR procedure for this project: lint → test → review → merge"
  • "This user wants conventional-commit format checked before commit"

Mem0's v1.0.0 API explicitly added this type in 2026. Earlier APIs covered only episodic + semantic.
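As a data structure, the three types might be modeled roughly as below. This is a minimal sketch and not Mem0's API: `MemoryType`, `MemoryRecord`, and `by_type` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"      # what happened
    SEMANTIC = "semantic"      # what is known
    PROCEDURAL = "procedural"  # how to do things


@dataclass
class MemoryRecord:
    type: MemoryType
    text: str
    # Timestamp matters most for episodic records, but is stored for all.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


records = [
    MemoryRecord(MemoryType.EPISODIC, "Last session refactored the auth module"),
    MemoryRecord(MemoryType.SEMANTIC, "Project uses PostgreSQL"),
    MemoryRecord(MemoryType.PROCEDURAL, "PR procedure: lint -> test -> review -> merge"),
]


def by_type(items: list[MemoryRecord], wanted: MemoryType) -> list[MemoryRecord]:
    """Filter records so each type can be retrieved and injected separately."""
    return [r for r in items if r.type is wanted]
```

Tagging the type at write time lets the harness treat procedures as standing rules while treating episodes as time-ordered history.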


3. Vector vs Graph vs SQL/FTS5

Vector Memory (semantic search)

  • Embed the query → retrieve top-N most similar memories
  • Strength: semantic proximity, fuzzy match
  • Weakness: weak at exact relationships

Example: "User mentioned Python" — retrieves relevant memories well.
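The retrieval step can be sketched with cosine similarity over hand-written vectors. The three-dimensional vectors and the memory texts below are made up for illustration; a real system would get its vectors from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query_vec: list[float], memories: list[tuple[str, list[float]]], n: int = 2) -> list[str]:
    """Return the texts of the n memories most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:n]]


# Toy vectors (assumed, not real embeddings).
memories = [
    ("User mentioned Python", [0.9, 0.1, 0.0]),
    ("User writes pandas pipelines", [0.7, 0.3, 0.1]),
    ("Office is in Berlin", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.2, 0.0]
hits = top_n(query, memories)
```

The unrelated memory scores near zero and drops out, which is the fuzzy-match strength described above.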

Graph Memory (relational reasoning)

  • Stores events, entities, and relationships as a graph
  • Strength: multi-hop reasoning ("X is part of Y, which is at Z")
  • Weakness: build/maintenance cost ↑

Example: "User is a Python developer building data pipelines with pandas, at a company that uses dbt and is migrating off Spark" — connected facts retrievable.
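That example can be written as a tiny in-memory graph with a bounded breadth-first walk. This is a sketch under assumptions: the relation names and the `connected_facts` helper are hypothetical, and a real deployment would use a graph store rather than a dict.

```python
from collections import deque

# Adjacency list: node -> list of (relation, target) edges.
edges = {
    "user": [("writes", "Python"), ("builds", "data pipelines"), ("works_at", "company")],
    "data pipelines": [("uses", "pandas")],
    "company": [("uses", "dbt"), ("migrating_off", "Spark")],
}


def connected_facts(start: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
    """Collect (source, relation, target) facts reachable within max_hops."""
    facts = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for relation, target in edges.get(node, []):
            facts.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return facts
```

With one hop only the user's direct facts come back; with two hops the walk also reaches pandas, dbt, and Spark, which similarity search alone would not connect to the user.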

SQL + FTS5 (keyword + structured)

  • SQLite's FTS5 module is a fast, lightweight keyword search
  • Strength: zero infrastructure (single file), 4,300-memory query in <1ms
  • Weakness: weak fuzzy match, no embedding semantics

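Pinecone p95 latency: 25 to 50 ms. SQLite+FTS5 at the same scale: under 1 ms. Part of that gap is simply a local file versus a network round trip, but the conclusion holds: for pure keyword lookup, SQLite wins.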
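A minimal FTS5 store looks roughly like this. It assumes the SQLite library bundled with your Python was compiled with FTS5 (common, but not guaranteed); the `memories` table and `search` helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent store
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(text)")
conn.executemany(
    "INSERT INTO memories(text) VALUES (?)",
    [
        ("Project uses PostgreSQL for storage",),
        ("User prefers conventional commit format",),
        ("Auth module was refactored last session",),
    ],
)


def search(query: str, limit: int = 5) -> list[str]:
    """Keyword search; 'rank' orders rows by FTS5's built-in relevance score."""
    rows = conn.execute(
        "SELECT text FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

Searching for `postgresql` finds the first memory, while searching for `database` finds nothing, since no stored text contains that word. That is the "no embedding semantics" weakness in practice.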

Hybrid (RRF — Reciprocal Rank Fusion)

  • Vector top-N + FTS5 top-N → fused via RRF
  • Best of both
  • OpenClaw memcore and wikycore implement this pattern
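The fusion step itself is small. Below is a sketch of standard Reciprocal Rank Fusion, where each list contributes 1 / (k + rank) per item; it is not taken from memcore or wikycore, the memory IDs are made up, and k = 60 is the commonly used default, assumed here.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists; higher fused score means better."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["m3", "m1", "m7"]   # top-N from vector search
keyword_hits = ["m1", "m9", "m3"]  # top-N from FTS5
fused = rrf([vector_hits, keyword_hits])
```

Items that appear in both lists (`m1`, `m3`) rise to the top. RRF only needs ranks, so the vector and keyword scores never have to be put on a common scale.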

4. Solution Comparison (April 2026)

Mem0 (managed)

  • Position: User personalization + cost efficiency
  • Core tech: Memory Compression Engine — compresses chat history into dense representations
  • Effect: claims 80% prompt-token reduction
  • Pricing: Managed SaaS. Self-host also possible (open source).

Zep (graph)

  • Position: Time-aware knowledge graph
  • Core tech: Temporal Knowledge Graph — stores time validity of facts
  • Strength: Time-dependent queries like "User used to use Python, now uses Rust"
  • Latency: Sub-second
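The idea of time validity can be sketched with a validity interval on each fact. This is not Zep's API or data model: `Fact`, `facts_at`, and the dates are hypothetical and only illustrate the concept.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still true


# Made-up dates for the "used Python, now uses Rust" example.
facts = [
    Fact("user", "uses", "Python", date(2023, 1, 1), date(2025, 6, 1)),
    Fact("user", "uses", "Rust", date(2025, 6, 1)),
]


def facts_at(when: date) -> list[Fact]:
    """Return the facts whose validity interval contains the given date."""
    return [
        f for f in facts
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)
    ]
```

Closing the old fact's interval instead of deleting it keeps both "what is true now" and "what used to be true" answerable.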

Memanto (academic)

  • arXiv 2604.22085
  • "Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents"
  • Information-theoretic retrieval — ranks by uncertainty reduction

MemMachine (academic)

  • arXiv 2604.04853
  • "Ground-Truth-Preserving Memory System for Personalized AI Agents"
  • Compression policy that prioritizes preservation of original information

memcore (OpenClaw)

  • SQLite + vector + RRF + 3-tier memory layers
  • Plan / Lessons / Sessions pattern
  • v0.5 introduced MemPalace techniques (location-based recall)

5. Production Patterns

5-1. 3-Layer Memory (CLAUDE.md standard)

Layer 1: Project Knowledge (docs/) — permanent, keyword-indexed
Layer 2: Behavioral Rules (lessons.md) — what to do / not do
Layer 3: Session Restore (sessions/) — volatile snapshots

CLAUDE.md recommends this 3-layer approach. Claude Code, OpenClaw, and Hermes all use this skeleton.

5-2. Auto-Harvesting

At the end of a session, new learnings are stored automatically in the right layer:

  • Error resolution → lessons.md
  • New domain knowledge → docs/{category}/
  • Progress state → sessions/

CLAUDE.md's "Automatic Memory Harvesting (Instinct Pattern)" section formalizes this.
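The routing rule could be sketched as below. This is an assumption-heavy illustration, not what CLAUDE.md prescribes: the kind labels, the file names `notes.md` and `latest.md`, and both helpers are hypothetical.

```python
from pathlib import Path


def target_path(kind: str, category: str = "general") -> Path:
    """Map a learning's kind to the layer (file) it belongs in."""
    routes = {
        "error_resolution": Path("lessons.md"),
        "domain_knowledge": Path("docs") / category / "notes.md",
        "progress": Path("sessions") / "latest.md",
    }
    return routes[kind]


def harvest(root: str, learnings: list[tuple[str, str, str]]) -> None:
    """Append each (kind, category, text) learning to its layer under root."""
    for kind, category, text in learnings:
        path = Path(root) / target_path(kind, category)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as handle:
            handle.write(f"- {text}\n")
```

Calling `harvest` once at the end of a session, with the learnings the agent collected, is enough to keep the three layers separate.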

5-3. Dedup + Cascade Update

  • Same pattern repeated 2× → promote to rule (increment lessons.md count)
  • New info conflicts with existing memory → flag conflict + ask user
  • One fact changes → check memory-map.md for connected files
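The first rule (promote after two occurrences) can be sketched with a counter. `LessonTracker` and `PROMOTE_AT` are hypothetical names, and the normalization here is deliberately crude; conflict detection and cascade updates are not covered.

```python
from collections import Counter

PROMOTE_AT = 2  # occurrences needed before a pattern becomes a rule


class LessonTracker:
    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.rules: set[str] = set()

    def observe(self, pattern: str) -> bool:
        """Record one occurrence; return True once the pattern is a rule."""
        # Crude dedup: lowercase and collapse whitespace before counting.
        key = " ".join(pattern.lower().split())
        self.counts[key] += 1
        if self.counts[key] >= PROMOTE_AT:
            self.rules.add(key)
        return key in self.rules
```

In practice the dedup key matters more than the counter, since two phrasings of the same lesson only merge if they normalize to the same key.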

5-4. Eviction Policy

Unbounded accumulation = noise. Example policy:

  • Time-based: archive after 90 days unused
  • Use-based: remove if count < 2 and unused > 30 days
  • LRU: under context pressure, drop oldest first
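The example policy translates into a short function. This is a sketch: `Entry`, `eviction_action`, and `drop_oldest` are hypothetical names, and checking the use-based rule before the time-based one is an assumption, since the policy does not say which wins when both apply.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    text: str
    use_count: int
    last_used: datetime


def eviction_action(entry: Entry, now: datetime) -> str:
    """Apply the use-based rule first, then the time-based rule."""
    idle = now - entry.last_used
    if entry.use_count < 2 and idle > timedelta(days=30):
        return "remove"
    if idle > timedelta(days=90):
        return "archive"
    return "keep"


def drop_oldest(entries: list[Entry], keep: int) -> list[Entry]:
    """LRU under context pressure: keep only the most recently used entries."""
    return sorted(entries, key=lambda e: e.last_used, reverse=True)[:keep]
```

Running `eviction_action` over the whole store on a schedule, and `drop_oldest` only when the context budget is tight, keeps the two concerns separate.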


6. SQL vs Vector — Where to Start?

Starting (personal project, 1 user)

SQL + FTS5 only

  • Single SQLite file
  • Plenty for roughly 100 to 10,000 memories
  • Zero infrastructure, backup = file copy
  • OpenClaw memcore's early version started here

Mid-scale (multi-user, 10K+ memories)

SQL + FTS5 + vector (hybrid)

  • Add Chroma or Qdrant
  • bge-m3 embeddings (free locally on oMLX)
  • Combine results via RRF

Large (millions of memories)

Mem0 / Zep / custom infrastructure

  • Mem0: automated user-persona extraction
  • Zep: temporal graph + complex relationships
  • Custom: ElastiCache + Neptune Analytics on AWS


7. Anti-Patterns

7-1. "Skipping vector DB is amateur"

At 4,300 memories: SQLite+FTS5 = under 1 ms, Pinecone p95 = 25 to 50 ms. SQLite also has zero infrastructure cost. Start with SQL.

7-2. "Just dump everything; we can search later"

Unbounded accumulation → noise. Without an eviction policy, retrieval accuracy decays after a year.

7-3. "Write memory once and forget it"

Memory is a living system. Without update / conflict resolution / archive flows, stale memory leads to wrong answers.

CLAUDE.md: "memory records can become stale over time. Verify before using."

7-4. "Pick an embedding model and never change it"

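Switching embedding models invalidates all existing vectors: vectors from different models live in different spaces and cannot be compared, so every stored memory has to be re-embedded. That migration cost is huge. Pick a proven model (bge-m3, voyage-3) up front.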


8. Knowledge Graphs in Detail

Graph memory is rising in 2026 because:

  • LLMs are weak at multi-hop reasoning (vector retrieval limit)
  • Relations between facts don't reduce to similarity
  • Time dimension: when a fact was true

KG Memory Build Flow

  1. Input (conversation/document) → entity extraction
  2. Relations between entities → LLM-driven or NER + relation classifier
  3. Store in graph (Neo4j, Neptune, custom SQLite)
  4. Query starts from a node → graph traversal → connected facts
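Steps 3 and 4 can be sketched on the "custom SQLite" option with an edge table and a recursive query. Steps 1 and 2 (extraction) are skipped here, so the triples are hand-written stand-ins for what an LLM or NER pipeline would produce; the `edges` schema and the `store` and `traverse` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, relation TEXT, dst TEXT)")


def store(triples: list[tuple[str, str, str]]) -> None:
    """Step 3: persist (entity, relation, entity) triples as graph edges."""
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", triples)


def traverse(start: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
    """Step 4: walk outward from a node, up to max_hops edges away."""
    sql = """
    WITH RECURSIVE walk(src, relation, dst, depth) AS (
        SELECT src, relation, dst, 1 FROM edges WHERE src = ?
        UNION
        SELECT e.src, e.relation, e.dst, w.depth + 1
        FROM edges e JOIN walk w ON e.src = w.dst
        WHERE w.depth < ?
    )
    SELECT src, relation, dst FROM walk ORDER BY depth, src, dst
    """
    return conn.execute(sql, (start, max_hops)).fetchall()


# Stand-in for extracted triples: "X is part of Y, which is at Z".
store([("X", "part_of", "Y"), ("Y", "located_at", "Z")])
```

The depth bound keeps the recursive query finite even if the graph contains cycles.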

Cost vs Value

  • Build: LLM call per input → cost ↑
  • Search: traversal is fast
  • Value: qualitative gain on multi-hop queries

Recommendation: For per-user personal assistants, KG memory (Zep) is worth it. For single-task agents (e.g., a coding agent), hybrid (SQL+vector) is sufficient.


Bottom Line

Memory decision | Recommendation
Starting point | SQLite + FTS5
Add semantic search | + vector (Chroma/Qdrant) + bge-m3
Need relational reasoning | KG (Zep)
User personalization | Mem0
Academic frontier | Memanto / MemMachine

The single takeaway: "Memory is not an extension of context — it is a separate system."

Part 4 (next) covers what the agent actually does: Tools & Sandboxing.


First-Party Sources

  • Mem0 State of AI Agent Memory 2026: mem0.ai/blog/state-of-ai-agent-memory-2026
  • Memanto: arxiv.org/abs/2604.22085
  • MemMachine: arxiv.org/abs/2604.04853
  • AWS Mem0 + Neptune: aws.amazon.com/blogs/database/build-persistent-memory-for-agentic-ai
  • Zep Temporal Knowledge Graph: getzep.com
