Agent Memory Engine (5/10) — Hermes Persistent Memory FTS5: Two-Layer Design

MEMORY.md / USER.md caps, state.db FTS5, session_search, Korean tokenization pitfalls, 8 external providers


Key Summary

  • Audience: Hermes is installed (#12) and you want to understand and tune how self-learning and long-term memory actually work.
  • What you'll get: Per /docs/user-guide/features/memory — the two-layer model (active ~1,300 tokens + SQLite FTS5 effectively unlimited), exact caps of MEMORY.md / USER.md, how session search works (Gemini Flash summarization), Korean FTS5 tokenizer pitfalls, comparison of 8 external providers, and the relationship between injection timing and prefix cache.
  • Prerequisite: hermes working, basic familiarity with ~/.hermes/.

1. Key insight — memory is two layers

A common misconception: "Hermes remembers everything." In reality, Hermes memory is two separate systems glued together.

| Layer | What | Size | Loaded |
|---|---|---|---|
| Active Memory | ~/.hermes/memories/MEMORY.md + USER.md | ~1,300-token cap | Injected into the system prompt at session start |
| Session Search (FTS5) | ~/.hermes/state.db (SQLite FTS5 index) | Practically unlimited | The agent calls session_search on demand |

Frequently referenced facts belong in Active memory; long-term archives live in FTS5. Miss this distinction and you either bloat the prompt or lean entirely on FTS5 search and lose consistency.


2. Active memory — MEMORY.md + USER.md

2.1 Exact limits (official)

| File | Role | Approx. tokens | Char cap |
|---|---|---|---|
| MEMORY.md | Agent's personal notes (environment facts, conventions, learnings) | ~800 | 2,200 |
| USER.md | User profile (preferences, communication style, expectations) | ~500 | 1,375 |

Total ~1,300 tokens / 3,575 chars. Anything larger belongs to FTS5.
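Because the caps are stated in characters, checking how close you are takes only a few lines. The minimal Python sketch below (the helper name `check_caps` is hypothetical, not part of Hermes) counts characters in both files and compares them with the documented caps. It assumes the cap is measured in Unicode characters rather than bytes, which matters for Korean text, where each Hangul syllable takes three bytes in UTF-8.

```python
from pathlib import Path
from typing import Dict, Tuple

# Character caps from the table above.
CAPS = {"MEMORY.md": 2200, "USER.md": 1375}

def check_caps(memories_dir: Path) -> Dict[str, Tuple[int, int]]:
    """Return {filename: (chars_used, cap)} for each active-memory file.

    Assumption: the cap counts Unicode characters (Python len()), not bytes.
    A missing file is treated as empty.
    """
    report = {}
    for name, cap in CAPS.items():
        path = memories_dir / name
        text = path.read_text(encoding="utf-8") if path.exists() else ""
        report[name] = (len(text), cap)
    return report

if __name__ == "__main__":
    for name, (used, cap) in check_caps(Path.home() / ".hermes" / "memories").items():
        status = "OVER CAP" if used > cap else "ok"
        print(f"{name}: {used}/{cap} chars ({used / cap:.0%}) {status}")
```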

2.2 Injection timing — "prompt cache strategy"

Per the docs:

"The system prompt injection is captured once at session start and never changes mid-session."

Edits to MEMORY.md made during a session do not affect that session's system prompt; they take effect in the next session. The reason is cost: an unchanged system prompt keeps the prefix cache valid, so the cached prompt tokens can be reused on every turn.

(Claude Code uses a similar strategy — see token & cache breakdown.)
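A minimal Python model of the snapshot rule, assuming the system prompt is simply a base prompt followed by the two files' contents. The `Session` class, the section headers, and the layout are hypothetical, not Hermes internals. It shows why a mid-session edit is invisible to the running session but picked up by the next one.

```python
import hashlib
import tempfile
from pathlib import Path

class Session:
    """Hypothetical session: memory files are read once, at session start."""

    def __init__(self, memories_dir: Path, base_prompt: str) -> None:
        parts = [base_prompt]
        for name in ("MEMORY.md", "USER.md"):
            path = memories_dir / name
            if path.exists():
                parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
        # Built once and never rebuilt: later file edits cannot change it.
        self.system_prompt = "\n\n".join(parts)

    def prefix_hash(self) -> str:
        # A byte-identical prefix is what lets provider-side prompt caching hit.
        return hashlib.sha256(self.system_prompt.encode("utf-8")).hexdigest()[:12]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mem = Path(d)
        (mem / "MEMORY.md").write_text("Use pnpm; avoid npm\n", encoding="utf-8")
        current = Session(mem, "You are a helpful agent.")
        before = current.prefix_hash()
        (mem / "MEMORY.md").write_text("Use pnpm; avoid npm\nTests run with make test\n", encoding="utf-8")
        print(current.prefix_hash() == before)                                   # True: same session
        print(Session(mem, "You are a helpful agent.").prefix_hash() == before)  # False: next session
```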

2.3 Agent's manipulation API — 3 actions

The agent uses a memory tool with three actions.

| Action | Effect |
|---|---|
| add | Add a new entry |
| replace | Update an entry via substring match |
| remove | Delete an entry via substring match |

There is no read action: the memory contents are already in the system prompt, so the agent never needs to read them explicitly.
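To make the substring-match semantics concrete, here is a hypothetical in-memory model of the three actions. It is not Hermes's implementation: the entry format, the rule that a substring must match exactly one entry, and the way the character cap is enforced are all assumptions chosen for illustration.

```python
from typing import List

class MemoryStore:
    """Hypothetical model of the add / replace / remove memory actions.

    Assumptions: entries are plain strings, a substring must match exactly
    one entry, and the cap applies to the entries joined with newlines.
    """

    def __init__(self, char_cap: int = 2200) -> None:
        self.char_cap = char_cap
        self.entries: List[str] = []

    def _find(self, substring: str) -> int:
        hits = [i for i, entry in enumerate(self.entries) if substring in entry]
        if len(hits) != 1:
            raise ValueError(f"{substring!r} matched {len(hits)} entries; need exactly 1")
        return hits[0]

    def _commit(self, candidate: List[str]) -> None:
        if len("\n".join(candidate)) > self.char_cap:
            raise ValueError("over the character cap; replace or remove an entry first")
        self.entries = candidate

    def add(self, text: str) -> None:
        self._commit(self.entries + [text])

    def replace(self, substring: str, new_text: str) -> None:
        i = self._find(substring)
        self._commit(self.entries[:i] + [new_text] + self.entries[i + 1:])

    def remove(self, substring: str) -> None:
        i = self._find(substring)
        self.entries = self.entries[:i] + self.entries[i + 1:]

    # Deliberately no read(): the entries reach the model via the system prompt.
```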

2.4 Direct editing is also fine

vim ~/.hermes/memories/MEMORY.md
vim ~/.hermes/memories/USER.md

The agent may later add or replace entries that overlap with your edits. Hand edits written in the same style as the agent's own entries tend to persist most reliably.

2.5 Toggle in config

~/.hermes/config.yaml:

memories:
  memory:
    enabled: true
    # "customizable character limits" per docs
  user:
    enabled: true

Disable both → only FTS5 is used.


3. Session Search — SQLite FTS5 under the hood

3.1 Location & structure

  • DB file: ~/.hermes/state.db
  • Engine: SQLite + FTS5 full-text index
  • Stores: all message histories, lineage, metadata, FTS5 index
  • Capacity: up to your disk (practically unlimited)

3.2 session_search tool — agent-invoked

The agent calls session_search to query past sessions. Per the docs:

"The agent can search its past conversations using the session_search tool, which returns relevant past conversations with Gemini Flash summarization."

Flow:

  1. Agent formulates a query ("how did I fix the login bug last week")
  2. FTS5 returns matching session candidates
  3. Gemini Flash summarizes relevant portions
  4. Only the summary lands in the current context

Rather than dumping ten old sessions into the context, the agent gets a 150–300-token summary. That is how Hermes recalls far more than the 1,300-token active cap would allow, without bloating the prompt.
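The flow above can be sketched with Python's built-in sqlite3 module. This is a toy that assumes a one-table schema (`msgs`), which is not Hermes's real state.db layout, and it replaces the Gemini Flash call with a hypothetical `summarize` stub that only trims FTS5 snippets. What it does show is the shape of the pipeline: FTS5 ranks candidates (`ORDER BY rank`), `snippet()` extracts the relevant portion, and only the condensed result goes back to the caller.

```python
import sqlite3
from typing import List

def build_index(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the real state.db schema is undocumented.
    conn.execute("CREATE VIRTUAL TABLE msgs USING fts5(session_id UNINDEXED, body)")

def summarize(snippets: List[str], max_chars: int = 200) -> str:
    # Stand-in for the LLM summarization step: join and trim the snippets.
    return " | ".join(snippets)[:max_chars]

def session_search(conn: sqlite3.Connection, query: str, limit: int = 3) -> str:
    # Steps 2-4: FTS5 candidates -> summary -> only the summary is returned.
    rows = conn.execute(
        "SELECT session_id, snippet(msgs, 1, '[', ']', '...', 8) "
        "FROM msgs WHERE msgs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return summarize([f"{sid}: {snip}" for sid, snip in rows])
```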

3.3 User-side exploration

hermes sessions browse   # Interactive search UI
hermes sessions list     # Recent
hermes sessions export   # JSONL for external analysis

hermes sessions browse is the primary entry point.


4. FTS5 tokenizer for Korean (and other non-space-delimited languages)

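SQLite FTS5's default tokenizer, unicode61, splits text on whitespace and punctuation. Korean separates words with spaces, but particles and endings attach directly to the word; Japanese and Chinese put no spaces between words at all. Either way, the resulting tokens often don't line up with the terms you actually search for.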

4.1 Default behavior gotchas

  • "사용자 인증 흐름" (user authentication flow) → split into 3 tokens: 사용자, 인증, 흐름 (space-based).
  • "사용자인증흐름" (no spaces) → becomes one token.
  • A search for "인증" (authentication) hits the former but misses the latter.
  • Agglutinative suffixes mean "인증을", "인증이", "인증한" are distinct tokens, so none of them matches a search for "인증" either.
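These gotchas are easy to reproduce with Python's sqlite3 module, assuming your SQLite build includes FTS5 (most do). The table name `notes` is arbitrary and the tokenizer is FTS5's default. The prefix query (`인증*`) shows one mitigation plain FTS5 offers: it catches suffixed forms but still cannot split an unspaced compound. Whether Hermes's session_search passes such syntax through is undocumented.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
# No tokenize= option, so FTS5 uses its default unicode61 tokenizer.
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
conn.executemany("INSERT INTO notes(rowid, body) VALUES (?, ?)", [
    (1, "사용자 인증 흐름"),  # spaced: tokens 사용자 / 인증 / 흐름
    (2, "사용자인증흐름"),    # unspaced: a single token
    (3, "인증을 완료했다"),   # particle attached: token 인증을
    (4, "인증이 실패했다"),   # token 인증이
])

def search(conn: sqlite3.Connection, query: str) -> List[int]:
    return [row[0] for row in conn.execute(
        "SELECT rowid FROM notes WHERE notes MATCH ? ORDER BY rowid", (query,))]

print(search(conn, "인증"))   # [1]        exact token only
print(search(conn, "인증*"))  # [1, 3, 4]  prefix query catches suffixed forms
print(search(conn, "흐름"))   # [1]        the compound in row 2 is never split
```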

4.2 What the Hermes docs say

The official docs do not address Korean tokenizer behavior. Observations:

  • Short keyword searches usually work.
  • Long-phrase searches can miss depending on spacing.
  • The Flash summarization step smooths over imperfect matches among the sessions FTS5 does return, but it cannot recover sessions FTS5 never matched.

4.3 Practical tips for non-English

  1. Write MEMORY.md with spaces. "API ํ‚ค ๊ด€๋ฆฌ" hits more than "APIํ‚ค๊ด€๋ฆฌ".
  2. Include English originals for key entities: "OpenClaw (์˜คํ”ˆํด๋กœ)".
  3. Minimize suffix variation. Prefer "์ธ์ฆ ํ๋ฆ„: ..." over "์ธ์ฆ ํ๋ฆ„์€ ...".
  4. Dates and numbers are well-tokenized. Include them verbatim.

4.4 English-heavy memory

Defaults are fine for English. Whether Hermes exposes advanced FTS5 query syntax (OR, AND, NEAR(), prefix queries with *) through session_search isn't documented, so test empirically.


5. Context Files — a separate axis from memory

Memory (self-learning) and context files (project instructions) are easy to confuse. Worth separating.

5.1 Five official files

| File | Role | Discovery |
|---|---|---|
| .hermes.md / HERMES.md | Project instructions (highest priority) | Walks up to the git root |
| AGENTS.md | Project conventions / architecture | CWD + subdirs |
| CLAUDE.md | Claude Code context compatibility | CWD + subdirs |
| SOUL.md | Global personality | HERMES_HOME only |
| .cursorrules | Cursor IDE conventions | CWD only |

5.2 Priority — "pick exactly one"

Per the docs:

"Only one project context type is loaded per session (first match wins): .hermes.md → AGENTS.md → CLAUDE.md → .cursorrules."

  • If .hermes.md exists, the others are ignored.
  • SOUL.md is unaffected — it's always loaded separately (covers persona).
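The "first match wins" rule, combined with the Discovery column of the table in 5.1, can be sketched as follows. Assumptions to flag: `.hermes.md` is checked before `HERMES.md`; AGENTS.md, CLAUDE.md, and .cursorrules are looked up only in the CWD at session start (subdirectories load later); and outside a git repository the upward walk is limited to the CWD. SOUL.md is deliberately absent because it loads separately. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

# First match wins, in this order (docs quote above).
# Assumption: .hermes.md is checked before HERMES.md.
PRIORITY = [".hermes.md", "HERMES.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"]
WALKS_TO_GIT_ROOT = {".hermes.md", "HERMES.md"}

def search_dirs(name: str, cwd: Path) -> List[Path]:
    """Directories checked for `name` at session start."""
    if name not in WALKS_TO_GIT_ROOT:
        return [cwd]  # assumption: CWD only at start; subdirs load progressively
    walked: List[Path] = []
    for d in [cwd, *cwd.parents]:
        walked.append(d)
        if (d / ".git").exists():
            return walked  # stop at the repository root
    return [cwd]  # assumption: outside a git repo, don't walk up

def pick_project_context(cwd: Path) -> Optional[Path]:
    """Return the single project context file to load, or None."""
    for name in PRIORITY:
        for d in search_dirs(name, cwd):
            candidate = d / name
            if candidate.is_file():
                return candidate
    return None
```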

5.3 Size & truncation

  • Max 20,000 chars (~7,000 tokens) per file.
  • When exceeded: 70% head + 20% tail + 10% marker — the middle disappears.
  • Subdir AGENTS.md capped at 8,000 chars.
  • Truncation banner: [...truncated AGENTS.md: kept 14000+4000 of 25000 chars...]
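The numbers in the banner follow from the rule: 70% of the 20,000-char cap is 14,000 chars of head and 20% is 4,000 chars of tail, regardless of the file's actual 25,000 chars. A minimal sketch of that arithmetic, assuming the banner sits between head and tail (the function name is hypothetical; integer arithmetic avoids float rounding):

```python
def truncate_context(name: str, text: str, cap: int = 20_000) -> str:
    """Keep 70% of the cap from the head and 20% from the tail; drop the middle."""
    if len(text) <= cap:
        return text
    head = cap * 7 // 10  # 70% of the cap
    tail = cap * 2 // 10  # 20% of the cap; the remaining ~10% is headroom for the marker
    banner = f"[...truncated {name}: kept {head}+{tail} of {len(text)} chars...]"
    return f"{text[:head]}\n{banner}\n{text[-tail:]}"
```

With the subdirectory cap, `truncate_context("AGENTS.md", text, cap=8_000)` keeps 5,600 + 1,600 chars.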

5.4 Progressive discovery

As the agent navigates subdirectories, the local AGENTS.md loads dynamically. Monorepo patterns with per-subteam AGENTS.md work out of the box.
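A hypothetical tracker illustrating progressive discovery: the first time the agent works in a directory, that directory's AGENTS.md is loaded once and capped at 8,000 chars. The class name is invented, and the hard cut at the cap is a simplification; the real loader presumably applies the head/tail truncation from 5.3.

```python
from pathlib import Path
from typing import Dict, Optional

SUBDIR_CAP = 8_000  # cap for subdirectory AGENTS.md, from 5.3

class ProgressiveContext:
    """Hypothetical: load each directory's AGENTS.md at most once."""

    def __init__(self) -> None:
        self.loaded: Dict[Path, str] = {}  # directory -> text already injected

    def on_enter(self, directory: Path) -> Optional[str]:
        """Return AGENTS.md text on the first visit to a directory, else None."""
        directory = directory.resolve()
        if directory in self.loaded:
            return None
        path = directory / "AGENTS.md"
        if not path.is_file():
            return None
        # Simplification: hard cut at the cap instead of head/tail truncation.
        text = path.read_text(encoding="utf-8")[:SUBDIR_CAP]
        self.loaded[directory] = text
        return text
```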

5.5 Security — prompt injection blocking

Every context file is scanned before injection for patterns like:

  • "ignore previous instructions"
  • Hidden divs
  • Credential exfiltration attempts
  • Zero-width characters

Blocked output:

[BLOCKED: AGENTS.md contained potential prompt injection...]

That guard matters when executing Hermes against a freshly cloned repo.
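A toy scanner for the four pattern families listed above. The regular expressions are illustrative assumptions, not Hermes's actual rules, and the `(label)` suffix in the banner is this sketch's own addition. A real guard needs much broader coverage (alternate encodings, homoglyphs, paraphrased instructions).

```python
import re

# Illustrative patterns only; not Hermes's rule set.
PATTERNS = {
    "instruction override": re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),
    "hidden div": re.compile(r"<div[^>]*(display\s*:\s*none|hidden)[^>]*>", re.I),
    "credential exfiltration": re.compile(
        r"(curl|wget)\b[^\n]*(\$\w*(TOKEN|KEY|SECRET)\w*|\.env\b)", re.I),
    "zero-width character": re.compile("[\u200b\u200c\u200d\u2060\ufeff]"),
}

def scan_context_file(name: str, text: str) -> str:
    """Return the text unchanged, or a BLOCKED banner naming the first pattern hit."""
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            return f"[BLOCKED: {name} contained potential prompt injection ({label})]"
    return text
```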


6. Eight external memory providers

Officially supported:

| Provider | Characteristics |
|---|---|
| honcho | Dialectic user modeling (observation → hypothesis → verified). Hermes's featured choice. |
| openviking | (see official docs) |
| mem0 | Popular open-source memory layer |
| hindsight | (see official docs) |
| holographic | (see official docs) |
| retaindb | (see official docs) |
| byterover | Vector + tree hybrid (also referenced in OpenClaw community) |
| supermemory | SaaS memory integrated with Notion / GitHub / Gmail |

6.1 Configure

hermes memory setup    # Interactive selection
hermes memory status   # Current config
hermes memory off      # Disable external

6.2 Honcho deep-dive

Honcho has a dedicated subcommand family.

hermes honcho status           # Connection status
hermes honcho peers            # Cross-profile peer identities
hermes honcho sessions         # Honcho session mappings
hermes honcho map              # Map directory → session
hermes honcho peer             # Show / set peer names
hermes honcho mode             # Show / set recall mode
hermes honcho tokens           # Show / set token budgets
hermes honcho identity         # Seed the AI peer representation
hermes honcho enable           # Enable for profile
hermes honcho disable          # Disable
hermes honcho sync             # Sync config across profiles
hermes honcho migrate          # Migration from openclaw-honcho

Honcho character: models users as "peers," and stages reasoning dialectically (observation 1× → hypothesis 2×+ → verified 3 days+). Designed to resist confirmation bias.
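One way to read the staging rule, written as a hypothetical Python function. The thresholds come from the line above, but interpreting "verified 3 days+" as repeated observations spanning at least three days is an assumption, not Honcho's documented algorithm. What it illustrates is why such a scheme resists confirmation bias: a single remark never becomes a verified fact.

```python
from datetime import datetime, timedelta
from typing import List

def stage(observed_at: List[datetime]) -> str:
    """Hypothetical staging: 1 sighting -> observation, 2+ -> hypothesis,
    2+ sightings spread over 3+ days -> verified."""
    if not observed_at:
        return "none"
    if len(observed_at) >= 2:
        span = max(observed_at) - min(observed_at)
        return "verified" if span >= timedelta(days=3) else "hypothesis"
    return "observation"
```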

6.3 Selection guide

  • Local, solo use → Active + state.db is enough. Skip external.
  • Multi-device / long retention → SaaS options like supermemory / mem0.
  • High-quality vector search → byterover / mem0.
  • Deep user modeling → honcho.

7. Practical patterns — what goes where

7.1 Active memory (MEMORY.md / USER.md)

Should include:
  • "User's name / preferred address form" (USER.md)
  • "Preferred language: Korean/English mix OK" (USER.md)
  • "This developer prefers Rust over Node" (USER.md)
  • "Use pnpm; avoid npm" (MEMORY.md)
  • "Tests run with make test" (MEMORY.md)

Should NOT include:
  • Last week's bug fix log → FTS5 captures it naturally
  • A specific function's implementation history → already in sessions
  • Long config dumps → use AGENTS.md or actual files

7.2 Session search (FTS5)

The agent handles it. Your only concern: name long sessions meaningfully at the end.

/title rate-limit-migration-2026-04

Later, session_search can match on that title.

7.3 Context files

  • Per-project rules → AGENTS.md (codex / Claude Code compatible)
  • Personal persona → SOUL.md
  • Team-internal docs → markdown in-repo (agent reads via its read tool when relevant)

7.4 External providers

Don't start with them. Run the default 2-layer setup. Only add externals when you truly need cross-machine sync / massive session counts / team-shared memory.


8. Audit & backup

8.1 Inspect

cat ~/.hermes/memories/MEMORY.md
cat ~/.hermes/memories/USER.md

Open any time. If the agent "remembered" something wrong, edit or delete by hand.

8.2 SQLite query

sqlite3 ~/.hermes/state.db
.schema

SELECT id, title, created_at FROM sessions ORDER BY created_at DESC LIMIT 10;

Schema details aren't in the official docs; use .schema to confirm.
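Since the schema is undocumented, it is safer to discover it than to assume it. This sketch opens state.db read-only (SQLite's `mode=ro` URI parameter), lists each table's columns, and runs the recent-sessions query above using only those of the `id`, `title`, and `created_at` columns that actually exist. Those column names come from the query above and may not match your version; the helper names are hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Dict, List, Tuple

def open_readonly(path: Path) -> sqlite3.Connection:
    # mode=ro: exploring can never modify Hermes's data.
    return sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)

def describe(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map every table to its column names (roughly what .schema shows)."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: [c[1] for c in conn.execute(f'PRAGMA table_info("{t}")')] for t in tables}

def recent_sessions(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple]:
    # Use only the assumed columns that really exist in this database.
    cols = describe(conn).get("sessions", [])
    wanted = [c for c in ("id", "title", "created_at") if c in cols]
    if not wanted:
        return []
    order = " ORDER BY created_at DESC" if "created_at" in cols else ""
    sql = f"SELECT {', '.join(wanted)} FROM sessions{order} LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()
```

Typical use: `recent_sessions(open_readonly(Path.home() / ".hermes" / "state.db"))`.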

8.3 Backup

hermes backup -o ~/hermes-$(date +%Y%m%d).zip

Zips ~/.hermes/ — memory, sessions, config.
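If you want a copy of just the session store while Hermes is running, SQLite's online backup API (exposed in Python as `Connection.backup`) produces a consistent snapshot, whereas a plain file copy taken mid-write, or in WAL mode without its `-wal` file, may not be. This complements `hermes backup` and does not describe how that command works; the function name is hypothetical.

```python
import sqlite3
from pathlib import Path

def snapshot_state_db(src: Path, dest: Path) -> None:
    """Write a transactionally consistent copy of a live SQLite database."""
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        source.backup(target)  # copies all pages under SQLite's locking rules
    finally:
        target.close()
        source.close()
```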

8.4 Privacy lens

  • Local-only by default — without external providers, memory never leaves your machine.
  • The bigger privacy surface is LLM provider request logs, not Hermes memory.

9. Counter-scenarios — when to disable memory

  • Sensitive one-off tasks → use --ignore-rules to skip memory/rules auto-injection.
  • Non-interactive CI → session persistence is unnecessary. Use a dedicated profile.
  • Shared server with multiple users → hermes profile for isolation.
  • Suspected memory pollution → open MEMORY.md / USER.md, delete offending entries. Next session is clean.

10. 5-minute memory tuning checklist

  • [ ] cat ~/.hermes/memories/MEMORY.md — is it what you expected?
  • [ ] cat ~/.hermes/memories/USER.md — is the profile accurate?
  • [ ] Remove unwanted items (vim edit or tell agent via /memory)
  • [ ] hermes sessions stats — session store size
  • [ ] Prune old sessions: hermes sessions prune --days 90
  • [ ] (Optional) hermes memory status — external provider config
  • [ ] Backup: hermes backup -o ~/hermes-$(date +%Y%m%d).zip

11. What's next

With memory under control:

  1. OpenClaw → Hermes migration checklist (coming soon) — actual memory / skill transfer.
  2. Install + SOUL.md — foundation.
  3. Command cheatsheet — full hermes memory / hermes honcho surface.



This is post 14/15 in the "AI Coding CLI Entry Guide" series.
last verified: 2026-04-25 (per hermes-agent.nousresearch.com/docs/user-guide/features/memory + context-files).

Series overview: Series index
