Agent Memory Engine (3/10) — memcore Completed: 18 Modules, 3,300 Lines

April 09, 2026

20 Tables, 8 CLI Commands, 295 Curated Rows, and Graceful Degradation

Key Summary

memcore is a library that rebuilds a file-based memory pipeline on top of a single SQLite file. It comprises 18 modules and approximately 3,300 lines.
8 CLI commands cover the full operational lifecycle, each schedulable with a single cron entry: migrate / lint / warn / decay / wiki-lint / stats / backfill-vectors / ontology-sync.
The core design principle is graceful degradation — when a dependency is absent, functionality degrades rather than the system halting.

What You Will Take Away

Module partitioning and table design for a SQLite single-file agent memory engine
Hybrid prefetch strategy that falls back to FTS5 when vector search is unavailable
Pattern of delegating memory lifecycle operations (decay, validation, cleanup) to CLI commands and cron
How to decouple memory implementation from the host runtime using the MemoryProvider interface

Scope — Differences from the Initial Build

The initial version of memcore included 9 tables, 51 tests, and a migration path from the file-based bank/ structure. The "completed form" covered in this article expands the schema to 20 tables and adds a full CLI command set, a vector search module, and ontology synchronization. In other words, the target is a version where the operational lifecycle and the fallback strategies are embedded inside the engine itself, not just the storage layer.

Module Structure (18 Modules)

#	Module	Responsibility
1	`core`	SQLite connection, WAL mode, schema initialization
2	`ingest`	retain-extract / retain-merge unification
3	`prefetch`	FTS5 hybrid search (topic → FTS5 → LIKE)
4	`dialectic`	U-tag 3-phase (observe → hypothesize → verify)
5	`decay`	opinions confidence decay (0.02/day, remove < 0.30)
6	`lint`	Retain tag format validation
7	`entities`	Entity staleness detection (30 days)
8	`topics`	Topic registry + M2M relationships
9	`bank_migrate`	bank/ → SQLite conversion (READ-ONLY, --incremental)
10	`decisions`	LLM decision queue (TOPIC_CLASSIFY, CONFLICT_RESOLVE, CHANGELOG_SUMMARIZE)
11	`archive`	memory/ → archived/ relocation
12	`housekeeping`	recall TTL, session cleanup, stale orphan removal
13	`vectors`	sqlite-vec + bge-m3 embeddings (optional)
14	`wiki`	Karpathy LLM Wiki pattern (topic page CRUD)
15	`wiki_lint`	7 wiki checks (contradiction / stale / orphan / gap / size / citation / frontmatter)
16	`ontology`	CLAUDE.md → DB one-way cache (agent / relation / persona layers)
17	`promotion`	Cascade promotion (local → lessons → global)
18	`stats`	Statistics / health check

Responsibilities are partitioned in the following order: storage (core) → write (ingest/promotion) → read (prefetch) → cleanup (decay/housekeeping/archive) → validation (lint/wiki_lint) → auxiliary (vectors/wiki/ontology) → observability (stats). Each module has no knowledge of another module's internals; it shares only the connection object and public schema exposed by core.
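
As an illustration of the shared connection object that core exposes, here is a minimal sketch of a connection helper that enables WAL mode and initializes the meta table. The function name `connect`, the `key`/`value` column layout of `meta`, and the version string are assumptions made for this example, not memcore's actual code.

```python
import sqlite3

SCHEMA_VERSION = "1"  # hypothetical value, for illustration only


def connect(path: str) -> sqlite3.Connection:
    """Open the single SQLite file, enable WAL, and ensure the meta table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    # WAL lets readers proceed while one writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # Assumed layout: meta stores key/value pairs such as the schema version.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO meta (key, value) VALUES ('schema_version', ?)",
        (SCHEMA_VERSION,),
    )
    conn.commit()
    return conn
```

Both statements are idempotent, so every module can call the helper on the same file without coordinating with the others.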

20 Tables

Group	Table	Purpose
Meta	meta	Schema version, configuration
Knowledge	curated, curated_fts	Core knowledge + FTS5 index
Topics	topics, topic_curated	Topic registry + M2M mapping
Entities	entities	Per-project state
Episodes	episodes	Daily log
Identity	identity	MEMORY.md cache
U-tag	u_patterns, u_observations, u_hypotheses, u_verifications	Dialectic 3-phase
Decisions	decisions	LLM decision queue
Wiki	wiki_pages, wiki_sources, wiki_log	Karpathy wiki
Vectors	vec_curated	sqlite-vec embeddings (optional)
Ontology	ont_agents, ont_relations, ont_layers	CLAUDE.md cache
Promotions	promotions	local → lessons → global history

The tables split into three conceptual layers: a fact-recording layer (curated, episodes, entities, identity), a reasoning-process layer (the u_patterns family, decisions, promotions), and an auxiliary index layer (curated_fts, vec_curated, topic_curated). The schema ensures the fact-recording layer remains independently queryable even when the auxiliary index layer is empty.
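
The independence of the fact-recording layer from the index layer can be sketched as follows. The column names (`kind`, `body`) and both function names are hypothetical; only the table names curated and curated_fts come from the schema above. The index is treated as derived data that can be rebuilt from the facts at any time.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> bool:
    """Create the fact table, then try to create the FTS5 index.

    Returns True if the FTS5 index could be created, False otherwise.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS curated ("
        "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, body TEXT NOT NULL)"
    )
    try:
        conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS curated_fts USING fts5(body)")
        return True
    except sqlite3.OperationalError:
        # This SQLite build has no FTS5 module; the fact layer still works.
        return False


def rebuild_index(conn: sqlite3.Connection) -> int:
    """Repopulate curated_fts from curated and return the indexed row count."""
    conn.execute("DELETE FROM curated_fts")
    conn.execute("INSERT INTO curated_fts (rowid, body) SELECT id, body FROM curated")
    return conn.execute("SELECT count(*) FROM curated_fts").fetchone()[0]
```

A plain `SELECT ... FROM curated` keeps working whether or not `rebuild_index` has ever run, which is the property the layering is meant to guarantee.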

8 CLI Commands

memcore migrate          # bank/ → SQLite conversion
memcore lint             # Retain tag + data integrity check
memcore warn             # Memory warning report
memcore decay            # Run opinions confidence decay
memcore stats            # Statistics / health check
memcore wiki-lint        # 7 wiki checks
memcore backfill-vectors # Bulk sqlite-vec embedding generation
memcore ontology-sync    # CLAUDE.md → DB synchronization

The CLI moves maintenance tasks that do not depend on the runtime out of the runtime. Operations like decay, lint, and stats must run periodically but do not need to run inside the conversation loop. Delegating them to cron/launchd reduces the runtime's responsibility to prefetch and ingest only. The system gains a self-maintenance cycle without an orchestration framework.
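
A command dispatcher of this shape can be sketched with argparse. The subcommand names are the eight listed above; the `--db` option, the function names, and the cron schedule in the comment are assumptions for illustration, and the handlers are stubbed out.

```python
import argparse

COMMANDS = [
    "migrate", "lint", "warn", "decay", "stats",
    "wiki-lint", "backfill-vectors", "ontology-sync",
]

# Illustrative crontab entries (the times are arbitrary):
#   0 3 * * *   memcore decay
#   30 3 * * *  memcore lint
#   0 4 * * 0   memcore wiki-lint


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="memcore")
    # Hypothetical option: path to the single SQLite file.
    parser.add_argument("--db", default="memcore.db")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        sub.add_parser(name)
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # A real handler would open args.db and run the selected task.
    print(f"would run {args.command!r} against {args.db}")
    return 0  # exit code 0 lets cron treat the run as successful
```

Because each task is a separate process with its own exit code, a failed lint run cannot take the conversation loop down with it.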

Current Data State

Item	Count
curated rows	295
topics	15
entities	4
topic_curated links	514
Distribution	knowledge 150 / pattern 28 / daily 17 / world 14 / identity 13 / experience 10 / opinion 4

The 514 topic_curated links are about 1.74 times the 295 curated rows (514 / 295 ≈ 1.74), so each item is linked to between one and two topics on average. Multi-topic linking, not single tagging, is the default form. Knowledge-type entries account for just over half of all rows (150 of 295, about 51%). The low opinion count is consistent with decay removing low-confidence entries as intended, although the count alone does not confirm it.
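
The decay rule from the module table (0.02 per day, removal below 0.30) can be sketched as follows. The source gives only the rate and the threshold; treating the rate as a linear subtraction per elapsed day, and the function and parameter names, are assumptions of this example.

```python
from datetime import date

DECAY_PER_DAY = 0.02   # rate from the module table
REMOVE_BELOW = 0.30    # removal threshold from the module table


def decayed_confidence(confidence: float, last_confirmed: date, today: date) -> float:
    """Subtract a fixed amount per elapsed day (assumed linear), floored at 0."""
    days = max((today - last_confirmed).days, 0)
    return max(confidence - DECAY_PER_DAY * days, 0.0)


def should_remove(confidence: float) -> bool:
    """An opinion is dropped once its confidence falls below the threshold."""
    return confidence < REMOVE_BELOW
```

Under this linear reading, an opinion that starts at 0.80 and is never reconfirmed drops to roughly 0.60 after 10 days and falls below the threshold a little over three weeks in, which would keep the opinion count small.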

Core Design Principle — Graceful Degradation

memcore is designed so that code paths remain valid when a dependency is absent.

No sqlite-vec: falls back to FTS5 text search. Semantic search degrades to keyword search, but the path remains.
No bge-m3: vector build step is skipped. prefetch operates on FTS5 alone.
No CLAUDE.md: the ontology cache is empty; no other module is affected.
No wiki pages: wiki_lint returns "0 pages checked" and exits normally.

The practical advantage of this approach is tolerance for installation profile variance. The full-spec configuration is Apple Silicon + oMLX + sqlite-vec; the minimum configuration requires only Python + SQLite. Because the same codebase is deployed to every environment, per-environment branches in install scripts give way to feature detection at runtime.
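
The fallback behavior can be sketched for the FTS5 → LIKE portion of the chain. The vector stage is left out here to avoid guessing at the sqlite-vec integration. The function name `search`, the `body` column, and the returned strategy label are assumptions for illustration.

```python
import sqlite3
from typing import List, Tuple


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> Tuple[str, List[tuple]]:
    """Try FTS5 first; fall back to LIKE if the index is missing, unusable, or empty-handed.

    Returns the name of the strategy that produced the rows, plus the rows.
    """
    try:
        rows = conn.execute(
            "SELECT rowid, body FROM curated_fts "
            "WHERE curated_fts MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
        if rows:
            return "fts5", rows
    except sqlite3.OperationalError:
        # No FTS5 module, no curated_fts table, or a query FTS5 cannot parse.
        pass
    # Last resort: substring match on the fact table itself.
    rows = conn.execute(
        "SELECT id, body FROM curated WHERE body LIKE ? LIMIT ?",
        (f"%{query}%", limit),
    ).fetchall()
    return "like", rows
```

The caller never checks which dependencies are installed; it receives rows either way, and the strategy label is only there so that a stats command could report how often the degraded path is taken.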

Interface with the Host System — MemoryProvider

memcore couples to the host runtime exclusively through the MemoryProvider abstract interface. The implementation is MemcoreProvider(MemoryProvider).

prefetch → memcore.prefetch.search
on_session_end → memcore.ingest
on_memory_write → memcore.promotion (passes Tier gate)
system_prompt_block → injects top-N curated entries into context

The host runtime has no knowledge of memcore's existence. It only needs to know the method signatures of the interface. This boundary ensures the host code does not change when the storage layer is replaced or upgraded. The same principle applies to vector backend replacement (sqlite-vec → Qdrant/FAISS) or memory engine experimentation (running two implementations in parallel as an A/B comparison).
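
The boundary can be sketched as an abstract base class. The four method names come from the mapping above; the parameter lists, the return types, the `InMemoryProvider` stand-in, and the `build_context` host function are assumptions made for this example, not memcore's actual signatures.

```python
from abc import ABC, abstractmethod
from typing import List


class MemoryProvider(ABC):
    """The only surface the host runtime depends on (signatures assumed)."""

    @abstractmethod
    def prefetch(self, query: str) -> List[str]: ...

    @abstractmethod
    def on_session_end(self, transcript: str) -> None: ...

    @abstractmethod
    def on_memory_write(self, entry: str) -> None: ...

    @abstractmethod
    def system_prompt_block(self, top_n: int = 5) -> str: ...


class InMemoryProvider(MemoryProvider):
    """Hypothetical stand-in showing that any backend can satisfy the contract."""

    def __init__(self) -> None:
        self._entries: List[str] = []

    def prefetch(self, query: str) -> List[str]:
        q = query.lower()
        return [e for e in self._entries if q in e.lower()]

    def on_session_end(self, transcript: str) -> None:
        self._entries.append(transcript)

    def on_memory_write(self, entry: str) -> None:
        self._entries.append(entry)

    def system_prompt_block(self, top_n: int = 5) -> str:
        return "\n".join(self._entries[:top_n])


def build_context(provider: MemoryProvider, user_message: str) -> str:
    """Host-side code: it sees only the interface, never the storage engine."""
    recalled = provider.prefetch(user_message)
    return provider.system_prompt_block() + "\n---\n" + "\n".join(recalled)
```

Swapping `InMemoryProvider` for a SQLite-backed implementation would leave `build_context` unchanged, which is also what makes running two implementations side by side practical.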

Limitations and Applicability

Single-file SQLite: appropriate for agent scenarios with low concurrent write volume. Not suited for multi-writer workloads.
Embedding cost: backfill-vectors is optimized for batch processing. Streaming or real-time embedding requires a separate path.
One-way ontology cache: reverse synchronization from DB back to CLAUDE.md is out of scope. Markdown is the source of truth.
CLI-driven operations: environments without cron/launchd (e.g., pure serverless) require an external scheduler.

The applicable scope can be summarized as: "local execution, single writer, agent environments with high dependency variance that require an observable and stable memory store."

Open Questions

How should the consistency model be defined when multiple agents reference the same memcore file concurrently?
Should the decay and promotion policy constants (0.02/day, removal threshold < 0.30, etc.) adapt based on corpus size?
If the storage layer is replaced with a non-SQLite backend (e.g., key-value store + external FTS) while preserving the MemoryProvider contract, which features degrade?

3,300 lines represents the scale of "one year's accumulated memory judgment logic redistributed onto single-file storage." The substance of the file-based → SQLite transition is not a format change. It is the ability to reduce each operational lifecycle step to a single CLI invocation and delegate it to an external scheduler.

Series overview: Series index

MaJu Tech Notes