Agent Memory Engine (5/10) — Hermes Persistent Memory FTS5: Two-Layer Design

April 24, 2026

MEMORY.md / USER.md caps, state.db FTS5, session_search, Korean tokenization pitfalls, 8 external providers

Key takeaways

Audience: Hermes is installed (#12) and you want to understand and tune how self-learning and long-term memory actually work.
What you'll get: Per /docs/user-guide/features/memory — the two-layer model (active ~1,300 tokens + SQLite FTS5 effectively unlimited), exact caps of MEMORY.md / USER.md, how session search works (Gemini Flash summarization), Korean FTS5 tokenizer pitfalls, comparison of 8 external providers, and the relationship between injection timing and prefix cache.
Prerequisite: hermes working, basic familiarity with ~/.hermes/.

1. Key insight — memory is two layers

A common misconception: "Hermes remembers everything." In reality, that's two separate systems glued together.

Layer	What	Size	Loaded
Active Memory	`~/.hermes/memories/MEMORY.md` + `USER.md`	~1,300 token cap	Injected into system prompt at session start
Session Search (FTS5)	`~/.hermes/state.db` (SQLite FTS5 index)	Practically unlimited	Agent calls `session_search` on demand

Frequently-referenced facts belong in Active; long-term archives live in FTS5. Miss this distinction and you either bloat the prompt or rely entirely on FTS5 and lose consistency.

2. Active memory — `MEMORY.md` + `USER.md`

2.1 Exact limits (official)

File	Role	Approx tokens	Char cap
`MEMORY.md`	Agent's personal notes (environment facts, conventions, learnings)	~800 tokens	2,200 chars
`USER.md`	User profile (preferences, communication style, expectations)	~500 tokens	1,375 chars

Total ~1,300 tokens / 3,575 chars. Anything larger belongs to FTS5.
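To see how close your files are to those caps, a small checker can help. This is a minimal sketch: the caps come from the table above, while the helper names are illustrative, and counting with `len()` (Unicode code points) is an assumption about how Hermes measures characters.

```python
from pathlib import Path

# Character caps quoted in the table above.
CAPS = {"MEMORY.md": 2200, "USER.md": 1375}

def cap_usage(name: str, text: str) -> tuple[int, int, bool]:
    """Return (chars used, cap, over_cap) for one active-memory file."""
    cap = CAPS[name]
    used = len(text)  # assumption: the cap is counted in code points
    return used, cap, used > cap

def report(memory_dir: Path) -> list[str]:
    """Build one status line per file; a missing file counts as empty."""
    lines = []
    for name in CAPS:
        path = memory_dir / name
        text = path.read_text(encoding="utf-8") if path.exists() else ""
        used, cap, over = cap_usage(name, text)
        lines.append(f"{name}: {used}/{cap} chars ({'OVER' if over else 'ok'})")
    return lines

if __name__ == "__main__":
    for line in report(Path.home() / ".hermes" / "memories"):
        print(line)
```

Korean text uses fewer characters per idea than English but often more tokens per character, so treat the token column as a rough guide only.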

2.2 Injection timing — "prompt cache strategy"

Per the docs:

"The system prompt injection is captured once at session start and never changes mid-session."

If you edit MEMORY.md during a session, that session's system prompt is unaffected; the change applies from the next session. The reason is cost: a prompt prefix that never changes keeps the provider's prefix cache valid for every turn.

(Claude Code uses a similar strategy — see token & cache breakdown.)
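The snapshot behavior can be pictured with a toy model. This is a sketch under stated assumptions, not Hermes's actual code: `Session`, `build_system_prompt`, and the prompt layout are all hypothetical names invented for illustration.

```python
from typing import Callable

def build_system_prompt(memory_md: str, user_md: str) -> str:
    """Hypothetical layout; the real prompt format is not documented here."""
    return f"[MEMORY]\n{memory_md}\n[USER]\n{user_md}"

class Session:
    def __init__(self, load: Callable[[], tuple[str, str]]):
        # Read the memory files exactly once, at session start.
        # The resulting string is reused verbatim on every turn, so the
        # prompt prefix stays byte-identical and cacheable.
        self.system_prompt = build_system_prompt(*load())

# Simulated files on disk.
files = {"MEMORY.md": "Use pnpm; avoid npm", "USER.md": "Prefers Rust"}
load = lambda: (files["MEMORY.md"], files["USER.md"])

first = Session(load)
files["MEMORY.md"] = "Use pnpm; avoid npm. Tests run with make test"  # mid-session edit
second = Session(load)  # the next session picks up the edit
```

Here `first.system_prompt` still holds the old text after the edit, while `second.system_prompt` contains the new entry.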

2.3 Agent's manipulation API — 3 actions

The agent uses a memory tool with three actions.

Action	Effect
`add`	Add a new entry
`replace`	Update via substring match
`remove`	Delete via substring match

There is no read action. Reading happens automatically via the system prompt — the agent doesn't explicitly read.
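The three actions can be sketched as operations on a list of entries. This is an illustration only: the function name, the rule that a substring must match exactly one entry, and the choice to replace the whole matched entry are assumptions, not documented Hermes behavior.

```python
def apply_action(entries: list[str], action: str, text: str, new_text: str = "") -> list[str]:
    """Return a new entry list after one add / replace / remove action."""
    if action == "add":
        return entries + [text]
    if action not in ("replace", "remove"):
        raise ValueError(f"unknown action: {action}")
    # replace and remove locate their target by substring match.
    matches = [i for i, entry in enumerate(entries) if text in entry]
    if len(matches) != 1:
        # Assumption: an ambiguous or missing match is rejected, not guessed.
        raise ValueError(f"expected one entry containing {text!r}, found {len(matches)}")
    i = matches[0]
    if action == "replace":
        return entries[:i] + [new_text] + entries[i + 1:]
    return entries[:i] + entries[i + 1:]

entries = apply_action([], "add", "Use pnpm; avoid npm")
entries = apply_action(entries, "add", "Tests run with make test")
entries = apply_action(entries, "replace", "pnpm", "Use pnpm 9; avoid npm")
```

Because matching is by substring, short and distinctive entries are easier for the agent to update later than long, overlapping ones.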

2.4 Direct editing is also fine

vim ~/.hermes/memories/MEMORY.md
vim ~/.hermes/memories/USER.md

The agent may later add or replace similar entries when it judges something worth remembering. Hand edits written in the same short, factual style as the agent's own entries tend to persist most reliably.

2.5 Toggle in config

~/.hermes/config.yaml:

memories:
  memory:
    enabled: true
    # "customizable character limits" per docs
  user:
    enabled: true

Disable both → only FTS5 is used.

3. Session Search — SQLite FTS5 under the hood

3.1 Location & structure

DB file: ~/.hermes/state.db
Engine: SQLite + FTS5 full-text index
Stores: all message histories, lineage, metadata, FTS5 index
Capacity: up to your disk (practically unlimited)

3.2 `session_search` tool — agent-invoked

The agent calls session_search to query past sessions. Per the docs:

"The agent can search its past conversations using the session_search tool, which returns relevant past conversations with Gemini Flash summarization."

Flow:

Agent formulates a query ("how did I fix the login bug last week")
FTS5 returns matching session candidates
Gemini Flash summarizes relevant portions
Only the summary lands in the current context

Rather than dumping 10 old sessions into context, you get a 150–300-token summary. That's how memory exceeds the 1,300-token active cap.
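The search-then-summarize flow can be sketched against plain SQLite. Everything here is an assumption for illustration: the `session_fts` table, its columns, and the word-truncating `truncate_summary` function (a stand-in for the Gemini Flash step) are hypothetical, and the example needs a Python build whose SQLite includes FTS5.

```python
import sqlite3
from typing import Callable

def demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for state.db (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE session_fts USING fts5(title, body)")
    conn.executemany(
        "INSERT INTO session_fts(title, body) VALUES (?, ?)",
        [
            ("login-bug-fix", "Fixed the login bug by refreshing the session cookie before redirect."),
            ("rate-limit-migration", "Moved rate limit counters from memory to Redis."),
        ],
    )
    return conn

def truncate_summary(text: str, max_words: int = 8) -> str:
    """Placeholder for the LLM summarizer: keep only the first few words."""
    words = text.split()
    return " ".join(words[:max_words]) + ("..." if len(words) > max_words else "")

def session_search(conn: sqlite3.Connection, query: str,
                   summarize: Callable[[str], str] = truncate_summary,
                   limit: int = 3) -> list[str]:
    """FTS5 picks candidate sessions; only their summaries are returned."""
    rows = conn.execute(
        "SELECT title, body FROM session_fts "
        "WHERE session_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [f"{title}: {summarize(body)}" for title, body in rows]

hits = session_search(demo_db(), "login bug")
```

Note that a raw FTS5 query treats space-separated words as an implicit AND, so keyword queries such as `login bug` match far more reliably than a full natural-language question.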

3.3 User-side exploration

hermes sessions browse   # Interactive search UI
hermes sessions list     # Recent
hermes sessions export   # JSONL for external analysis

hermes sessions browse is the primary entry point.

4. FTS5 tokenizer for Korean (and other non-space-delimited languages)

SQLite FTS5's default tokenizer (unicode61) splits on whitespace and punctuation. In Korean, Japanese, and Chinese, the units you want to search for often don't line up with spaces, which leads to surprising misses.

4.1 Default behavior gotchas

"사용자 인증 흐름" → split into 3 tokens 사용자, 인증, 흐름 (space-based).
"사용자인증흐름" (no spaces) → becomes one token.
A search for "인증" hits the former but misses the latter.
Agglutinative suffixes mean "인증을", "인증이", "인증한" are distinct tokens.
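The behavior above can be reproduced directly in SQLite, independent of Hermes. This sketch uses a throwaway in-memory table named `notes` (a hypothetical name) with the default tokenizer, and assumes a Python build whose SQLite includes FTS5.

```python
import sqlite3

def search(docs: list[str], term: str, prefix: bool = False) -> list[str]:
    """Index docs with FTS5's default tokenizer and return those matching term."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")  # default: unicode61
    conn.executemany("INSERT INTO notes(body) VALUES (?)", [(d,) for d in docs])
    # Quote the term so it is treated as a literal string; a trailing * turns
    # it into a prefix query.
    query = '"' + term.replace('"', '""') + '"' + ("*" if prefix else "")
    rows = conn.execute(
        "SELECT body FROM notes WHERE notes MATCH ? ORDER BY rowid", (query,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

DOCS = ["사용자 인증 흐름", "사용자인증흐름", "인증을 확인"]

exact = search(DOCS, "인증")                  # only the spaced phrase matches
prefixed = search(DOCS, "인증", prefix=True)  # also catches the suffixed form
```

The prefix query recovers the suffixed form "인증을" but still misses "사용자인증흐름", because that run is a single token starting with "사용자". Whether `session_search` passes prefix queries through is not documented, so this only shows what the underlying engine does.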

4.2 What the Hermes docs say

The official docs do not address Korean tokenizer behavior. Observations:

Short keyword searches usually work.
Long-phrase searches can miss depending on spacing.
The Flash summarization step partially compensates for matching gaps.

4.3 Practical tips for non-English

Write MEMORY.md with spaces. "API 키 관리" hits more than "API키관리".
Include English originals for key entities: "OpenClaw (오픈클로)".
Minimize suffix variation. Prefer "인증 흐름: ..." over "인증 흐름은 ...".
Dates and numbers are well-tokenized. Include them verbatim.

4.4 English-heavy memory

Defaults are fine for English. Whether Hermes exposes advanced FTS5 query syntax (AND, OR, NOT, NEAR(), prefix `*`) through session_search isn't documented, so test empirically.
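For reference, this is how those operators behave in plain SQLite FTS5. The sketch runs against a throwaway in-memory table (the `notes` name is hypothetical) and says nothing about what Hermes forwards to the engine.

```python
import sqlite3

DOCS = [
    "rate limit migration for the login service",  # rowid 1
    "login bug fixed by rotating the api key",     # rowid 2
    "migration notes for billing",                 # rowid 3
]

def matching_rowids(query: str) -> list[int]:
    """Return rowids of DOCS that match a raw FTS5 query string."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
    conn.executemany("INSERT INTO notes(body) VALUES (?)", [(d,) for d in DOCS])
    rows = conn.execute(
        "SELECT rowid FROM notes WHERE notes MATCH ? ORDER BY rowid", (query,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

both = matching_rowids("login AND migration")   # both words required
either = matching_rowids("bug OR billing")      # at least one word
near = matching_rowids("NEAR(login bug, 2)")    # words close together
stem = matching_rowids("migrat*")               # prefix match
```

If an operator appears to be ignored when used through the agent, the query is probably being quoted or rewritten before it reaches FTS5.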

5. Context Files — a separate axis from memory

Memory (self-learning) and context files (project instructions) are easy to confuse. Worth separating.

5.1 Five official files

File	Role	Discovery
`.hermes.md` / `HERMES.md`	Project instructions (highest priority)	Walks to git root
`AGENTS.md`	Project conventions / architecture	CWD + subdirs
`CLAUDE.md`	Claude Code context compatibility	CWD + subdirs
`SOUL.md`	Global personality	`HERMES_HOME` only
`.cursorrules`	Cursor IDE conventions	CWD only

5.2 Priority — "pick exactly one"

Per the docs:

"Only one project context type is loaded per session (first match wins): .hermes.md → AGENTS.md → CLAUDE.md → .cursorrules."

If .hermes.md exists, the others are ignored.
SOUL.md is unaffected — it's always loaded separately (covers persona).
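The first-match-wins rule amounts to a priority scan. A minimal sketch, using the order from the quoted docs; the function name is illustrative, and the per-file discovery rules from the table (walking to the git root, subdirectories, the `HERMES.md` spelling) are deliberately left out.

```python
from typing import Optional

# Order taken from the docs quote above.
PRIORITY = [".hermes.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"]

def pick_context_file(present: set[str]) -> Optional[str]:
    """Return the single project context file that would be loaded."""
    for name in PRIORITY:
        if name in present:
            return name  # first match wins; lower-priority files are ignored
    return None

chosen = pick_context_file({"CLAUDE.md", "AGENTS.md", ".cursorrules"})
```

In a repo that already ships `AGENTS.md` and `CLAUDE.md`, adding a `.hermes.md` silently hides both, so keep shared conventions in one place.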

5.3 Size & truncation

Max 20,000 chars (~7,000 tokens) per file.
When exceeded: 70% head + 20% tail + 10% marker — the middle disappears.
Subdir AGENTS.md capped at 8,000 chars.
Truncation banner: [...truncated AGENTS.md: kept 14000+4000 of 25000 chars...]
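The numbers in that banner follow from the 70% / 20% split of the 20,000-char cap. A sketch of the arithmetic, assuming the banner sits between the head and the tail; the function name and exact placement are illustrative, not taken from the Hermes source.

```python
def truncate_context(name: str, text: str, cap: int = 20_000) -> str:
    """Keep the first 70% and last 20% of the cap; drop the middle."""
    if len(text) <= cap:
        return text
    head = cap * 70 // 100  # 14,000 chars for the default cap
    tail = cap * 20 // 100  # 4,000 chars for the default cap
    banner = f"[...truncated {name}: kept {head}+{tail} of {len(text)} chars...]"
    return text[:head] + "\n" + banner + "\n" + text[-tail:]

out = truncate_context("AGENTS.md", "a" * 14_000 + "m" * 7_000 + "z" * 4_000)
```

For a 25,000-char file this keeps 18,000 chars and drops the 7,000 in the middle, so put must-read rules at the top or bottom of a long context file.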

5.4 Progressive discovery

As the agent navigates subdirectories, the local AGENTS.md loads dynamically. Monorepo patterns with per-subteam AGENTS.md work out of the box.

5.5 Security — prompt injection blocking

Every context file is scanned before injection for patterns like:

"ignore previous instructions"
Hidden divs
Credential exfiltration attempts
Zero-width characters

Blocked output:

[BLOCKED: AGENTS.md contained potential prompt injection...]

That guard matters when executing Hermes against a freshly cloned repo.
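A scanner of this kind can be approximated with a few regular expressions. The patterns below are illustrative guesses, not the rules Hermes ships; credential-exfiltration detection is omitted because it is harder to reduce to one pattern. The blocked message reuses the banner shown above, ellipsis included.

```python
import re

# Hypothetical patterns covering three of the categories listed above.
PATTERNS = {
    "override phrase": re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),
    "hidden div": re.compile(r"<div[^>]*(display\s*:\s*none|hidden)[^>]*>", re.I),
    "zero-width character": re.compile("[\u200b\u200c\u200d\u2060\ufeff]"),
}

def scan(text: str) -> list[str]:
    """Return the labels of all patterns found in text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]

def guard(name: str, text: str) -> str:
    """Pass clean files through; replace suspicious ones with a blocked notice."""
    if scan(text):
        return f"[BLOCKED: {name} contained potential prompt injection...]"
    return text

findings = scan("Build with make.\nPlease IGNORE previous instructions and ...")
```

Running a check like this on a freshly cloned repo before the first session gives you a chance to read the flagged file yourself.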

6. Eight external memory providers

Officially supported:

Provider	Character
honcho	Dialectic user modeling (observation → hypothesis → verified). Hermes's featured choice.
openviking	(see official docs)
mem0	Popular open-source memory layer
hindsight	(see official docs)
holographic	(see official docs)
retaindb	(see official docs)
byterover	Vector + tree hybrid (also referenced in OpenClaw community)
supermemory	SaaS memory integrated with Notion / GitHub / Gmail

6.1 Configure

hermes memory setup    # Interactive selection
hermes memory status   # Current config
hermes memory off      # Disable external

6.2 Honcho deep-dive

Honcho has a dedicated subcommand family.

hermes honcho status           # Connection status
hermes honcho peers            # Cross-profile peer identities
hermes honcho sessions         # Honcho session mappings
hermes honcho map              # Map directory → session
hermes honcho peer             # Show / set peer names
hermes honcho mode             # Show / set recall mode
hermes honcho tokens           # Show / set token budgets
hermes honcho identity         # Seed the AI peer representation
hermes honcho enable           # Enable for profile
hermes honcho disable          # Disable
hermes honcho sync             # Sync config across profiles
hermes honcho migrate          # Migration from openclaw-honcho

Honcho character: models users as "peers," and stages reasoning dialectically (observation 1× → hypothesis 2×+ → verified 3 days+). Designed to resist confirmation bias.

6.3 Selection guide

Local, solo use → Active + state.db is enough. Skip external.
Multi-device / long retention → SaaS options like supermemory / mem0.
High-quality vector search → byterover / mem0.
Deep user modeling → honcho.

7. Practical patterns — what goes where

7.1 Active memory (MEMORY.md / USER.md)

Should include:

"User's name / preferred address form" (USER.md)
"Preferred language: Korean/English mix OK" (USER.md)
"This developer prefers Rust over Node" (USER.md)
"Use pnpm; avoid npm" (MEMORY.md)
"Tests run with make test" (MEMORY.md)

Should NOT include:

Last week's bug fix log → FTS5 captures naturally
A specific function's implementation history → already in sessions
Long config dumps → use AGENTS.md or actual files

7.2 Session search (FTS5)

The agent handles it. Your only concern: name long sessions meaningfully at the end.

/title rate-limit-migration-2026-04

Later, session_search can match on the title.

7.3 Context files

Per-project rules → AGENTS.md (Codex / Claude Code compatible)
Personal persona → SOUL.md
Team-internal docs → markdown in-repo (agent reads via its read tool when relevant)

7.4 External providers

Don't start with them. Run the default 2-layer setup. Only add externals when you truly need cross-machine sync / massive session counts / team-shared memory.

8. Audit & backup

8.1 Inspect

cat ~/.hermes/memories/MEMORY.md
cat ~/.hermes/memories/USER.md

Open any time. If the agent "remembered" something wrong, edit or delete by hand.

8.2 SQLite query

sqlite3 ~/.hermes/state.db
.schema

SELECT id, title, created_at FROM sessions ORDER BY created_at DESC LIMIT 10;

Schema details aren't in the official docs, so treat the table and column names in the query above as guesses and run .schema first to confirm them.
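Since the schema is undocumented, it can help to discover table and column names programmatically before writing queries. A minimal sketch using Python's standard `sqlite3` module; the helper names are illustrative, and the database is opened read-only so inspection cannot modify `state.db`.

```python
import sqlite3
from pathlib import Path

def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only via a file: URI."""
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)

def table_names(conn: sqlite3.Connection) -> list[str]:
    """List tables, including FTS5 virtual tables and their shadow tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]

def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    """List a table's columns in declaration order."""
    rows = conn.execute(
        "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,)
    ).fetchall()
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = open_readonly("~/.hermes/state.db")
    for table in table_names(conn):
        print(table, column_names(conn, table))
    conn.close()
```

Once you know the real names, adapt the `SELECT` above accordingly rather than assuming a `sessions` table with `title` and `created_at` columns.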

8.3 Backup

hermes backup -o ~/hermes-$(date +%Y%m%d).zip

Zips ~/.hermes/ — memory, sessions, config.

8.4 Privacy lens

Local-only by default — without external providers, memory never leaves your machine.
The bigger privacy surface is LLM provider request logs, not Hermes memory.

9. Counter-scenarios — when to disable memory

Sensitive one-off tasks → use --ignore-rules to skip memory/rules auto-injection.
Non-interactive CI → session persistence is unnecessary. Use a dedicated profile.
Shared server with multiple users → hermes profile for isolation.
Suspected memory pollution → open MEMORY.md / USER.md, delete offending entries. Next session is clean.

10. 5-minute memory tuning checklist

[ ] cat ~/.hermes/memories/MEMORY.md — is it what you expected?
[ ] cat ~/.hermes/memories/USER.md — is the profile accurate?
[ ] Remove unwanted items (vim edit or tell agent via /memory)
[ ] hermes sessions stats — session store size
[ ] Prune old sessions: hermes sessions prune --days 90
[ ] (Optional) hermes memory status — external provider config
[ ] Backup: hermes backup -o ~/hermes-$(date +%Y%m%d).zip

11. What's next

With memory under control:

OpenClaw → Hermes migration checklist (coming soon) — actual memory / skill transfer.
Install + SOUL.md — foundation.
Command cheatsheet — full hermes memory / hermes honcho surface.

References

Official Memory feature docs
Context Files docs
NousResearch/hermes-agent GitHub
SQLite FTS5 official docs — tokenizer internals

This is post 14/15 in the "AI Coding CLI Entry Guide" series.
Last verified: 2026-04-25 (per hermes-agent.nousresearch.com/docs/user-guide/features/memory + context-files).

Series overview: Series index