"Memory Systems — Preserving Information Outside the Context Window (Harness Series 3/6)"

April 29, 2026

Part 2 argued for small context windows. So where does the older information go? Into external memory systems. By 2026, memory itself has become its own infrastructure category.

This article covers agent memory — short-term vs long-term, three types, vector vs graph, solutions like Mem0 and Zep, and where SQLite+FTS5 ends and Pinecone begins.

Series Roadmap (6 parts)

What Is Harness Engineering?
Context Engineering
Memory Systems ← this article
Tools & Sandboxing
Multi-Provider Routing
Evaluation & Ops

1. Memory ≠ Extended Context Window

A common confusion: "context is memory."

The accurate split:
- Context = the model input on the current turn (volatile, costs tokens)
- Memory = external storage that persists across sessions (durable, queryable)

Context is rebuilt from scratch every turn. Memory is queried every turn, and only the relevant fragments are injected into context.
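The per-turn flow can be sketched as follows. This is a minimal illustration, not any product's API: `MemoryStore`, `search`, and `build_context` are hypothetical names, and the word-overlap scoring is a stand-in for a real retriever.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical durable store; a list stands in for a real backend."""
    records: list = field(default_factory=list)

    def search(self, query: str, limit: int = 3) -> list:
        # Toy relevance score: number of words shared with the query.
        query_words = set(query.lower().split())
        scored = []
        for record in self.records:
            overlap = len(query_words & set(record.lower().split()))
            if overlap:
                scored.append((overlap, record))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [record for _, record in scored[:limit]]


def build_context(store: MemoryStore, user_message: str, limit: int = 3) -> str:
    """Rebuild the context for one turn: retrieved memory plus the new message."""
    fragments = store.search(user_message, limit)
    memory_block = "\n".join(f"- {fragment}" for fragment in fragments)
    return f"Relevant memory:\n{memory_block}\n\nUser: {user_message}"


store = MemoryStore([
    "Project uses PostgreSQL",
    "User is a Python developer",
    "Last session refactored the auth module",
])
context = build_context(store, "Which PostgreSQL version does this project need")
print(context)
```

The store keeps all three records, but only the one that matches the current message reaches the prompt.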

2026 consensus: "Memory is a first-class architectural component." It now has dedicated benchmarks (LoCoMo, LongMemEval), its own research literature, measurable performance gaps, and a tool ecosystem.

2. Three Types of Memory

2-1. Episodic

What happened. Time-ordered events.
- "User created the X function yesterday"
- "Last session refactored the auth module"

2-2. Semantic

What is known. Time-invariant facts.
- "User is a Python developer"
- "Project uses PostgreSQL"

2-3. Procedural — added in 2026

How to do things. Repeatable procedures.
- "PR procedure for this project: lint → test → review → merge"
- "This user wants conventional-commit format checked before commit"

Mem0's v1.0.0 API explicitly added this type in 2026. Earlier APIs covered only episodic + semantic.
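One way to make the three types explicit in your own store is a tagged record. The sketch below is an assumption for illustration only; `MemoryType` and `MemoryRecord` are hypothetical names and do not reflect Mem0's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    EPISODIC = "episodic"      # what happened (time-ordered)
    SEMANTIC = "semantic"      # what is known (time-invariant)
    PROCEDURAL = "procedural"  # how to do things (repeatable)


@dataclass(frozen=True)
class MemoryRecord:
    kind: MemoryType
    text: str
    occurred_at: Optional[datetime] = None  # only meaningful for episodic records

    def __post_init__(self):
        # Episodic memories are events, so they must carry a timestamp.
        if self.kind is MemoryType.EPISODIC and self.occurred_at is None:
            raise ValueError("episodic memories need occurred_at")


records = [
    MemoryRecord(MemoryType.EPISODIC, "Last session refactored the auth module",
                 occurred_at=datetime(2026, 4, 28, tzinfo=timezone.utc)),
    MemoryRecord(MemoryType.SEMANTIC, "Project uses PostgreSQL"),
    MemoryRecord(MemoryType.PROCEDURAL, "PR procedure: lint, test, review, merge"),
]
```

Tagging the type at write time lets retrieval treat the three differently, for example sorting episodic records by time while always loading procedural ones.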

3. Vector vs Graph vs SQL/FTS5

Vector Memory (semantic search)

Embed the query → retrieve top-N most similar memories
Strength: semantic proximity, fuzzy match
Weakness: weak at exact relationships

Example: "User mentioned Python" — retrieves relevant memories well.
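A minimal sketch of the retrieval step, assuming embeddings already exist. The three-dimensional vectors below are hand-written toy values, not the output of a real embedding model, and `top_n` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query_vec, memories, n=2):
    """Return the texts of the n memories most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:n]]


# (text, toy embedding) pairs; a real system would call an embedding model here.
memories = [
    ("User mentioned Python", [0.9, 0.1, 0.0]),
    ("Project uses PostgreSQL", [0.1, 0.9, 0.1]),
    ("User prefers dark mode", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.2, 0.0]  # toy embedding for a Python-related question
print(top_n(query, memories, n=2))
```

At larger scale the linear scan is replaced by an approximate nearest-neighbor index, but the ranking idea is the same.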

Graph Memory (relational reasoning)

Stores events, entities, and relationships as a graph
Strength: multi-hop reasoning ("X is part of Y, which is at Z")
Weakness: build/maintenance cost ↑

Example: "User is a Python developer building data pipelines with pandas, at a company that uses dbt and is migrating off Spark" — connected facts retrievable.
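The multi-hop idea can be illustrated with the example above stored as (subject, relation, object) triples. This is a toy in-memory sketch; the relation names and the `connected_facts` helper are hypothetical, and a real deployment would use a graph store.

```python
from collections import deque

TRIPLES = [
    ("user", "is_a", "Python developer"),
    ("user", "builds", "data pipelines"),
    ("user", "works_at", "company"),
    ("data pipelines", "built_with", "pandas"),
    ("company", "uses", "dbt"),
    ("company", "migrating_off", "Spark"),
]


def connected_facts(start, triples, max_hops=2):
    """Breadth-first walk that collects every fact within max_hops of start."""
    outgoing = {}
    for subject, relation, obj in triples:
        outgoing.setdefault(subject, []).append((relation, obj))

    facts, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, obj in outgoing.get(node, []):
            facts.append((node, relation, obj))
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return facts


for fact in connected_facts("user", TRIPLES, max_hops=2):
    print(fact)
```

With `max_hops=1` only the three facts directly about the user come back; the Spark migration appears only at two hops, which is the kind of link similarity search tends to miss.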

SQL + FTS5 (keyword + structured)

SQLite's FTS5 module is a fast, lightweight keyword search
Strength: zero infrastructure (single file), 4,300-memory query in <1ms
Weakness: weak fuzzy match, no embedding semantics

Pinecone p95 latency: 25 to 50 ms. SQLite+FTS5 at the same scale: under 1 ms. For pure keyword lookup at this size, SQLite wins.
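A minimal sketch of a keyword memory on SQLite FTS5, using Python's standard `sqlite3` module. It assumes the bundled SQLite was compiled with FTS5, which is common but not guaranteed; the table name `memories` and the helper functions are hypothetical.

```python
import sqlite3


def open_memory_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # FTS5 virtual table with a single indexed text column.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(text)")
    return conn


def remember(conn, text):
    conn.execute("INSERT INTO memories(text) VALUES (?)", (text,))


def search(conn, query, limit=5):
    # Quote each term so punctuation is not parsed as FTS5 query syntax.
    terms = ['"' + term.replace('"', '""') + '"' for term in query.split()]
    if not terms:
        return []
    rows = conn.execute(
        "SELECT text FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (" ".join(terms), limit),
    ).fetchall()
    return [row[0] for row in rows]


conn = open_memory_db()
for note in [
    "Last session refactored the auth module",
    "Project uses PostgreSQL",
    "User is a Python developer",
]:
    remember(conn, note)

print(search(conn, "auth module"))
print(search(conn, "database"))  # keyword search has no notion of synonyms
```

The second query shows the weakness listed above: "database" finds nothing even though a PostgreSQL note exists, because FTS5 matches tokens rather than meaning.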

Hybrid (RRF — Reciprocal Rank Fusion)

Vector top-N + FTS5 top-N → fused via RRF
Best of both
OpenClaw memcore and wikycore implement this pattern
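Reciprocal Rank Fusion scores each item as the sum of 1 / (k + rank) over the result lists it appears in. The sketch below is a generic version and is not taken from memcore or wikycore; `rrf_fuse` is a hypothetical name, the memory ids are made up, and k = 60 is a commonly used default.

```python
def rrf_fuse(rankings, k=60, top_n=5):
    """Fuse several ranked lists of ids into one list using RRF."""
    scores = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))[:top_n]


vector_hits = ["m3", "m1", "m7"]  # top-N from vector search (hypothetical ids)
keyword_hits = ["m1", "m9", "m3"]  # top-N from FTS5 (hypothetical ids)
print(rrf_fuse([vector_hits, keyword_hits]))
```

RRF only needs ranks, not raw scores, so cosine similarities and BM25 values never have to be put on a common scale.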

4. Solution Comparison (April 2026)

Mem0 (managed)

Position: User personalization + cost efficiency
Core tech: Memory Compression Engine — compresses chat history into dense representations
Effect: claims 80% prompt-token reduction
Pricing: Managed SaaS. Self-host also possible (open source).

Zep (graph)

Position: Time-aware knowledge graph
Core tech: Temporal Knowledge Graph — stores time validity of facts
Strength: Time-dependent queries like "User used to use Python, now uses Rust"
Latency: Sub-second
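The idea of storing the time validity of facts can be sketched with a validity interval per fact. This is an illustration of the concept only and does not reflect Zep's data model or API; `TemporalFact`, `facts_at`, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still true


def facts_at(facts, when):
    """Return the facts that were valid on the given day."""
    return [
        f for f in facts
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)
    ]


FACTS = [
    TemporalFact("user", "uses", "Python", date(2023, 1, 1), date(2025, 6, 1)),
    TemporalFact("user", "uses", "Rust", date(2025, 6, 1)),
]

print([f.obj for f in facts_at(FACTS, date(2024, 1, 1))])
print([f.obj for f in facts_at(FACTS, date(2026, 1, 1))])
```

Closing the old fact instead of overwriting it is what lets the store answer both "what does the user use now" and "what did the user use before".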

Memanto (academic)

arXiv 2604.22085
"Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents"
Information-theoretic retrieval — ranks by uncertainty reduction

MemMachine (academic)

arXiv 2604.04853
"Ground-Truth-Preserving Memory System for Personalized AI Agents"
Compression policy that prioritizes preservation of original information

memcore (OpenClaw)

SQLite + vector + RRF + 3-tier memory layers
Plan / Lessons / Sessions pattern
v0.5 introduced MemPalace techniques (location-based recall)

5. Production Patterns

5-1. 3-Layer Memory (CLAUDE.md standard)

Layer 1: Project Knowledge (docs/) — permanent, keyword-indexed
Layer 2: Behavioral Rules (lessons.md) — what to do / not do
Layer 3: Session Restore (sessions/) — volatile snapshots

CLAUDE.md recommends this 3-layer approach. Claude Code, OpenClaw, and Hermes all use this skeleton.

5-2. Auto-Harvesting

At the end of a session, new learnings are stored automatically in the right layer:
- Error resolution → lessons.md
- New domain knowledge → docs/{category}/
- Progress state → sessions/

CLAUDE.md's "Automatic Memory Harvesting (Instinct Pattern)" section formalizes this.
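The routing rule can be expressed as a small lookup. This is a sketch under assumptions: the `kind` labels and the `route_learning` function are hypothetical, and how a harvester classifies a learning in the first place is left out.

```python
from pathlib import PurePosixPath


def route_learning(kind, category="general"):
    """Map a harvested learning to the memory layer that should store it."""
    if kind == "error_resolution":
        return PurePosixPath("lessons.md")       # Layer 2: behavioral rules
    if kind == "domain_knowledge":
        return PurePosixPath("docs") / category  # Layer 1: project knowledge
    if kind == "progress_state":
        return PurePosixPath("sessions")         # Layer 3: session restore
    raise ValueError(f"unknown learning kind: {kind!r}")


harvest = [
    ("error_resolution", None),
    ("domain_knowledge", "postgres"),
    ("progress_state", None),
]
for kind, category in harvest:
    print(kind, "->", route_learning(kind, category or "general"))
```

Failing loudly on an unknown kind keeps unclassified notes from silently landing in the wrong layer.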

5-3. Dedup + Cascade Update

Same pattern repeated 2× → promote to rule (increment lessons.md count)
New info conflicts with existing memory → flag conflict + ask user
One fact changes → check memory-map.md for connected files
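The first two rules can be sketched in a few lines. `LessonBook` and its methods are hypothetical names, the promotion threshold of 2 comes from the rule above, and the cascade check against memory-map.md is not covered here.

```python
from collections import Counter


class LessonBook:
    """Toy tracker for repeated patterns and conflicting facts."""

    def __init__(self, promote_at=2):
        self.promote_at = promote_at
        self.counts = Counter()
        self.facts = {}

    def observe(self, pattern):
        """Count a pattern; return True once it has repeated enough to be a rule."""
        key = " ".join(pattern.lower().split())  # normalize case and whitespace
        self.counts[key] += 1
        return self.counts[key] >= self.promote_at

    def record_fact(self, name, value):
        """Store a fact, or flag a conflict and leave the old value untouched."""
        old = self.facts.get(name)
        if old is not None and old != value:
            return f"conflict: {name} was {old!r}, new value {value!r}; ask user"
        self.facts[name] = value
        return "stored"


book = LessonBook()
print(book.observe("Run lint before commit"))    # first sighting
print(book.observe("run lint  before commit"))   # repeat, promoted
print(book.record_fact("database", "PostgreSQL"))
print(book.record_fact("database", "MySQL"))
```

Leaving the old value in place on conflict matches the rule of asking the user rather than letting the newest write win.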

5-4. Eviction Policy

Unbounded accumulation = noise. Example policy:
- Time-based: archive after 90 days unused
- Use-based: remove if count < 2 and unused > 30 days
- LRU: under context pressure, drop oldest first
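The example policy can be sketched as a single decision function plus an LRU trim. The names `MemoryStats`, `eviction_action`, and `lru_trim` are hypothetical, and checking the use-based rule before the time-based rule is an assumption, since the policy does not state a precedence.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MemoryStats:
    name: str
    use_count: int
    last_used: date


def eviction_action(memory, today):
    """Return 'remove', 'archive', or 'keep' for one memory."""
    idle_days = (today - memory.last_used).days
    if memory.use_count < 2 and idle_days > 30:
        return "remove"   # use-based rule
    if idle_days > 90:
        return "archive"  # time-based rule
    return "keep"


def lru_trim(memories, keep):
    """Under context pressure, keep only the most recently used memories."""
    newest_first = sorted(memories, key=lambda m: m.last_used, reverse=True)
    return newest_first[:keep]


today = date(2026, 4, 29)
pool = [
    MemoryStats("rarely used", 1, today - timedelta(days=40)),
    MemoryStats("old but popular", 5, today - timedelta(days=100)),
    MemoryStats("fresh", 5, today - timedelta(days=10)),
]
for m in pool:
    print(m.name, "->", eviction_action(m, today))
print([m.name for m in lru_trim(pool, keep=1)])
```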

6. SQL vs Vector — Where to Start?

Starting (personal project, 1 user)

SQL + FTS5 only
- Single SQLite file
- 100 to 10,000-memory range is plenty
- Zero infrastructure, backup = file copy
- OpenClaw memcore's early version started here

Mid-scale (multi-user, 10K+ memories)

SQL + FTS5 + vector (hybrid)
- Add Chroma or Qdrant
- bge-m3 embeddings (free locally on oMLX)
- Combine results via RRF

Large (millions of memories)

Mem0 / Zep / custom infrastructure
- Mem0: automated user-persona extraction
- Zep: temporal graph + complex relationships
- Custom: ElastiCache + Neptune Analytics on AWS

7. Anti-Patterns

7-1. "Skipping vector DB is amateur"

At 4,300 memories, SQLite+FTS5 answers in under 1 ms while Pinecone's p95 is 25 to 50 ms, and SQLite has zero infrastructure cost. Start with SQL.

7-2. "Just dump everything; we can search later"

Unbounded accumulation → noise. Without an eviction policy, stale and duplicate entries compete with relevant ones at retrieval time, and accuracy decays as the store grows.

7-3. "Write memory once and forget it"

Memory is a living system. Without update / conflict resolution / archive flows, stale memory leads to wrong answers.

CLAUDE.md: "memory records can become stale over time. Verify before using."

7-4. "Pick an embedding model and never change it"

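Switching embedding models invalidates all existing vectors: embeddings from different models are not comparable, so every stored memory has to be re-embedded. That migration cost is huge. Pick a proven model (bge-m3, voyage-3) up front, and assume you may still have to migrate one day.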
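One inexpensive safeguard, offered here as a suggestion rather than an established practice from the sources, is to record which model produced each vector so a later migration can find what needs re-embedding. `StoredVector` and `stale_ids` are hypothetical names and the vectors are toy values.

```python
from dataclasses import dataclass


@dataclass
class StoredVector:
    memory_id: str
    model: str    # name of the embedding model that produced the vector
    vector: list


def stale_ids(vectors, current_model):
    """Ids whose vectors came from a different model and must be re-embedded."""
    return [v.memory_id for v in vectors if v.model != current_model]


index = [
    StoredVector("m1", "bge-m3", [0.1, 0.2]),
    StoredVector("m2", "bge-m3", [0.3, 0.4]),
    StoredVector("m3", "voyage-3", [0.5, 0.6]),
]
print(stale_ids(index, current_model="voyage-3"))
```

Mixing vectors from two models in one similarity search gives meaningless scores, so the stale ids should be excluded from vector retrieval until they are re-embedded.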

8. Knowledge Graphs in Detail

Graph memory is rising in 2026 because:
- LLMs are weak at multi-hop reasoning (vector retrieval limit)
- Relations between facts don't reduce to similarity
- Time dimension: when a fact was true

KG Memory Build Flow

Input (conversation/document) → entity extraction
Relations between entities → LLM-driven or NER + relation classifier
Store in graph (Neo4j, Neptune, custom SQLite)
Query starts from a node → graph traversal → connected facts
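The storage and query steps of this flow can be sketched on the "custom SQLite" option with a recursive common table expression. Entity and relation extraction is assumed to have happened already (in practice an LLM call or an NER pipeline); the `edges` table, the helper names, and the sample triples are hypothetical.

```python
import sqlite3


def store_triples(conn, triples):
    """Persist (subject, relation, object) triples, ignoring exact duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edges ("
        "subject TEXT, relation TEXT, object TEXT, "
        "UNIQUE (subject, relation, object))"
    )
    conn.executemany("INSERT OR IGNORE INTO edges VALUES (?, ?, ?)", triples)


def traverse(conn, start, max_hops=2):
    """Return every edge reachable from start within max_hops."""
    return conn.execute(
        """
        WITH RECURSIVE reach(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.object, reach.depth + 1
            FROM edges AS e JOIN reach ON e.subject = reach.node
            WHERE reach.depth < ?
        )
        SELECT DISTINCT e.subject, e.relation, e.object
        FROM edges AS e JOIN reach ON e.subject = reach.node
        WHERE reach.depth < ?
        """,
        (start, max_hops, max_hops),
    ).fetchall()


graph = sqlite3.connect(":memory:")
# Output of the extraction step, written by hand for this sketch.
store_triples(graph, [
    ("auth module", "part_of", "backend"),
    ("backend", "uses", "PostgreSQL"),
    ("backend", "deployed_on", "AWS"),
])
for edge in traverse(graph, "auth module", max_hops=2):
    print(edge)
```

The depth limit bounds the walk even when the graph contains cycles, and the same file can also hold the FTS5 table if you want keyword and graph memory side by side.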

Cost vs Value

Build: LLM call per input → cost ↑
Search: traversal is fast
Value: qualitative gain on multi-hop queries

Recommendation: For per-user personal assistants, KG memory (Zep) is worth it. For single-task agents (e.g., a coding agent), hybrid (SQL+vector) is sufficient.

Bottom Line

Memory decision	Recommendation
Starting point	SQLite + FTS5
Add semantic search	+ vector (Chroma/Qdrant) + bge-m3
Need relational reasoning	KG (Zep)
User personalization	Mem0
Academic frontier	Memanto / MemMachine

The single takeaway: "Memory is not an extension of context — it is a separate system."

Part 4 (next) covers what the agent actually does — Tools & Sandboxing.

First-Party Sources

Mem0 State of AI Agent Memory 2026: mem0.ai/blog/state-of-ai-agent-memory-2026
Memanto: arxiv.org/abs/2604.22085
MemMachine: arxiv.org/abs/2604.04853
AWS Mem0 + Neptune: aws.amazon.com/blogs/database/build-persistent-memory-for-agentic-ai
Zep Temporal Knowledge Graph: getzep.com