Agent Memory Engine (2/10) — Building an AI Agent Memory System with SQLite Alone

A memory engine that runs without daemons, without dependencies, anywhere


ํ•ต์‹ฌ ์š”์•ฝ

This post covers how to design an AI agent memory engine (memcore) on a single-file SQLite backend. - Structure: 25 modules, ~6,464 lines, 22 DB tables - Principles: no daemon, single-file portability, no required external dependencies, host-agnostic - Core techniques: 4-layer hybrid search (topic · vector · FTS5 · LIKE), U-tag dialectic-based confidence evolution, ingestion-layer bias rejection rules


What This Post Covers

One approach to building AI agent memory without a server or cloud vector DB. Specifically: (1) the limitations of text-file-based memory, (2) vector DB alternatives compared and selection criteria, (3) how to layer search tiers, (4) how to evolve confidence scores over time, and (5) how to block biased memory accumulation at the ingestion stage.


Design Principles: Why Single-File SQLite

Text-file-based memory incurs increasing context injection cost as sessions grow longer, and search is constrained to LIKE pattern matching. Low-relevance records bleed in, degrading response quality.

Vector DBs (Pinecone, Weaviate, Chroma, etc.) are the common alternative, but they require external servers or introduce cloud dependencies. In an environment where a standalone agent runs on a single host, an external service failure propagates as a total agent failure.

From these constraints, four principles emerge:

  1. No daemon — operates as a library only; no separate process required
  2. Single-file portability — entire memory state moves as one .db file
  3. No required external dependencies — core functionality handled by standard library
  4. Host-agnostic — runs on macOS, Linux, any environment without constraint

SQLite satisfies all four conditions. Extensions (sqlite-vec, FTS5) integrate within the file itself, enabling vector and full-text search without additional infrastructure.


Schema: 22 Tables, Core/Extension Separated

The DB schema consists of 22 tables total: 11 core + 9 extension + 2 subsequently added.

Core tables hold memory entities, U-tag state, embedding vectors, FTS5 index, and topic links. Extension tables are separated by functional unit — session history, research triggers, monthly summaries, etc. This separation ensures schema migrations can target extensions without touching core.


4-Layer Hybrid Search

Search blends four layers by weight:

1. Topic matching  (weight 2.0)  →  fastest; memory classified by topic
2. Vector similarity (weight 1.5) →  semantic; bge-m3-mlx embeddings
3. FTS5 BM25      (default)      →  keyword-based full-text search
4. LIKE fallback  (weight 5.0)  →  fires only when the above three return nothing

Higher weight pushes that source's results toward the top. Topic matching uses human-assigned classifications, so accuracy is high and priority follows. LIKE is last resort — but a LIKE hit is an exact string match, which implies high reliability, hence weight 5.0.

The four layers operate independently. FTS5 absent: the other three still work. Vector absent: topic + FTS5 + LIKE cover basic search. The progressive-enhancement structure means disabling any single layer does not break the system.


Vector Engine Selection: sqlite-vec

Five candidates were evaluated on portability, speed, installation overhead, and dependency profile.

Library Score (35 max) Key Notes
sqlite-vec 28 SQLite extension; best portability
hnswlib 28 Fast; requires separate index file
faiss 22 Most features; heavy installation
Chroma 19 Requires server
Pinecone 15 Cloud-dependent

sqlite-vec and hnswlib tied on score; sqlite-vec won on portability. Loading as a SQLite extension means the vector index lives inside the .db file — no separate index file to manage.

The embedding model is bge-m3-mlx-fp16. Runs locally via oMLX, 1024 dimensions, simultaneous Korean/English support, zero additional cost. In environments with mixed-language text, a single bilingual model maintains retrieval consistency better than a model-switching architecture.


U-tag Dialectic: How Confidence Evolves

When a record says "the user likes coffee," distinguishing between a one-time observation and a repeatedly verified fact is essential to maintaining response quality. A simple accumulation counter is insufficient — a staged confidence evolution model is required.

U-tag (User Tag) manages confidence through a 4-stage structure:

observation  →  hypothesis  →  verified  →  user_md_review
(1x observed)   (2x observed)   (3x, different days)   (5x or 3x + 14 days)

Per-stage confidence calculation:

confidence = 0.1 * count  # max ~0.1–0.3

confidence = min(0.3 + 0.1 * count, 0.6)

confidence = min(0.5 + 0.1 * count, 0.95)

The critical condition is "different days." Three mentions in the same session and context carry approximately the same information as one observation. Only observations on different dates are treated as independent evidence, preventing confidence distortion.

Memories of type opinion decay over time:

confidence -= 0.02 * days_since_last_update

Human opinions change. Without decay, fixed confidence from past opinions continuously interferes with current judgment.


Bias Rejection Rules — Block at the Ingestion Stage

If the memory engine accumulates biased records, agent judgment becomes contaminated at its source. Rules hardcoded into the ingestion layer:

  • No collection of emotional statements — transient emotional states such as "I'm annoyed" or "not feeling it today" are not stored in memory
  • No joke-based interpretation — jokes and hyperbole are not registered as facts
  • No speculation-based interpretation — psychological states are not inferred from behavior. "Seems to like this" is an observation; "likes this" is not
  • No sensitive-information inference — health, financial, or relationship status is not inferred from observed behavior
  • No personality labeling — AI does not assign labels such as "introverted" or "perfectionist"; only definitions the user provides directly are registered

Placing rules solely in a prompt (CLAUDE.md) allows them to be bypassed or ignored depending on context. Hardcoding into the ingestion layer code rejects records at the point of entry, maintaining consistency regardless of context.


Migration Results

Results of migrating 40 existing text-file-based memories into the memcore schema:

Item Count
Source files 40
Curated memories 236
Entities 4
Topics 15
Topic links 514
Migration errors 0

236 memory records were extracted from free-form text. Deduplication and noise filtering accounted for the majority of the migration effort. Once structured into 4 entities, 15 topics, and 514 topic links, the topic-matching search layer (weight 2.0) becomes effective.


Current Scale

Item Value
Modules 25
Total lines of code ~6,464
Initial version (18 modules) ~3,300
DB tables 22
Tests 31 passing, 0 failing
Test files 6 files, ~450 lines

Growth from 18 modules / 3,300 lines to 25 modules / 6,464 lines. The majority of the line count increase came from stabilization and edge case handling, not new features.

Tests are especially critical in a memory engine. Confidence calculation errors of 0.01 produce no immediate symptom — judgment quality degrades gradually over time. Without test coverage, refactoring becomes impossible.


Limitations and Scope

Where this applies - Standalone agent memory running on a single host - Offline or air-gapped environments that must eliminate external service dependencies - Workloads requiring backup and portability at the .db file level - Korean/English mixed-language text requiring semantic search

Limitations - Weak under high-concurrency writes. SQLite uses file-level locking and is unsuitable for multi-writer workloads - The vector index uses full-scan; once record counts reach millions, a dedicated HNSW-based index becomes preferable - The U-tag dialectic relies on "different day" counts as the trust basis — susceptible to distortion from data sources with manipulated timestamps - Bias rejection rules filter at the ingestion layer, but legacy data already recorded requires a separate cleanup pass


Open Questions

  • The U-tag decay coefficient (0.02/day) is empirically set. If domain-specific evidence accumulates for varying decay rates, table-driven parameterization will be needed.
  • When topic-matching weight (2.0) and LIKE fallback weight (5.0) conflict in the same query, the final ranking rule is currently a simple sum. Whether per-query-type weight adjustment is needed depends on more query log data.
  • sqlite-vec is lightweight at current scale, but determining when to switch to approximate nearest neighbor (ANN) as vector count grows requires additional benchmarking.

Series overview: Series index

๋Œ“๊ธ€

์ด ๋ธ”๋กœ๊ทธ์˜ ์ธ๊ธฐ ๊ฒŒ์‹œ๋ฌผ

"ML Foundations (9/9) — PyTorch vs TensorFlow, and the Road to Local LLMs"

"RAG Core Study (14/26) — Evaluation Sets with RAGAS & DeepEval"

"ML Foundations (8/9) — Deep Learning Architectures: CNN, RNN, Attention"

"ML Foundations (7/9) — Deep Learning Training: Optimizers, Regularization, Initialization"

OpenClaw to Hermes Migration (2/13) — What to Preserve, Partially Port, or Discard

AI Agents I Built (5/7) — Building an Automated Blogger API Publishing System