Agent Memory Engine (5/10) — Hermes Persistent Memory FTS5: Two-Layer Design
MEMORY.md / USER.md caps, state.db FTS5, session_search, Korean tokenization pitfalls, 8 external providers
Key summary
- Audience: Hermes is installed (#12) and you want to understand and tune how self-learning and long-term memory actually work.
- What you'll get (per /docs/user-guide/features/memory): the two-layer model (active ~1,300 tokens + SQLite FTS5 effectively unlimited), exact caps of MEMORY.md / USER.md, how session search works (Gemini Flash summarization), Korean FTS5 tokenizer pitfalls, comparison of 8 external providers, and the relationship between injection timing and prefix cache.
- Prerequisite: hermes working, basic familiarity with ~/.hermes/.
1. Key insight — memory is two layers
A common misconception: "Hermes remembers everything." In reality, that's two separate systems glued together.
| Layer | What | Size | Loaded |
|---|---|---|---|
| Active Memory | ~/.hermes/memories/MEMORY.md + USER.md | ~1,300 token cap | Injected into system prompt at session start |
| Session Search (FTS5) | ~/.hermes/state.db (SQLite FTS5 index) | Practically unlimited | Agent calls session_search on demand |
Frequently-referenced facts belong in Active; long-term archives live in FTS5. Miss this distinction and you either bloat the prompt or rely entirely on FTS5 and lose consistency.
2. Active memory — MEMORY.md + USER.md
2.1 Exact limits (official)
| File | Role | Approx tokens | Char cap |
|---|---|---|---|
| MEMORY.md | Agent's personal notes (environment facts, conventions, learnings) | ~800 tokens | 2,200 chars |
| USER.md | User profile (preferences, communication style, expectations) | ~500 tokens | 1,375 chars |
Total ~1,300 tokens / 3,575 chars. Anything larger belongs to FTS5.
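To see how close your files are to those caps, a quick check like the one below can help. It is a minimal sketch: the cap values come from the table above, but the function name is made up for illustration, and it assumes the caps are counted in characters the way Python's len() counts them, which the docs don't spell out.

```python
from pathlib import Path

# Character caps quoted in the table above.
CAPS = {"MEMORY.md": 2200, "USER.md": 1375}


def cap_report(name: str, text: str) -> dict:
    """Report how much of a memory file's character cap is used.

    Assumption: the cap counts Python str characters (code points).
    """
    cap = CAPS[name]
    used = len(text)
    return {"file": name, "used": used, "cap": cap, "over_by": max(0, used - cap)}


if __name__ == "__main__":
    base = Path.home() / ".hermes" / "memories"
    for name in CAPS:
        path = base / name
        if path.exists():
            print(cap_report(name, path.read_text(encoding="utf-8")))
```

Anything reported in over_by is a candidate to move out of active memory and leave to session search.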
2.2 Injection timing — "prompt cache strategy"
Per the docs:
"The system prompt injection is captured once at session start and never changes mid-session."
If you edit MEMORY.md during a session, that session's system prompt is not affected; the change applies from the next session. The reason: an unchanged system prompt keeps the prefix cache valid, which keeps costs down.
(Claude Code uses a similar strategy — see token & cache breakdown.)
2.3 Agent's manipulation API — 3 actions
The agent uses a memory tool with three actions.
| Action | Effect |
|---|---|
| add | Add a new entry |
| replace | Update via substring match |
| remove | Delete via substring match |
There is no read action. Reading happens automatically via the system prompt — the agent doesn't explicitly read.
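A toy model makes the substring-match semantics concrete. This is a sketch, not Hermes's implementation: the class name is hypothetical, and the rule that a substring must match exactly one entry is an assumption added here so that ambiguous edits fail loudly.

```python
class MemoryEntries:
    """Toy model of the add / replace / remove actions (illustrative only)."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def add(self, text: str) -> None:
        """Append a new entry."""
        self.entries.append(text)

    def _find(self, needle: str) -> int:
        """Return the index of the single entry containing `needle`."""
        hits = [i for i, entry in enumerate(self.entries) if needle in entry]
        if len(hits) != 1:
            # Assumption: zero or multiple matches are treated as an error.
            raise ValueError(f"expected exactly one match for {needle!r}, got {len(hits)}")
        return hits[0]

    def replace(self, needle: str, new_text: str) -> None:
        """Overwrite the entry that contains `needle`."""
        self.entries[self._find(needle)] = new_text

    def remove(self, needle: str) -> None:
        """Delete the entry that contains `needle`."""
        del self.entries[self._find(needle)]
```

Note that there is deliberately no read method, mirroring the point above: entries reach the agent through the system prompt, not through a tool call.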
2.4 Direct editing is also fine
vim ~/.hermes/memories/MEMORY.md
vim ~/.hermes/memories/USER.md
The agent may still add or replace similar entries later if it judges something worth remembering. Hand edits written in the same style as the agent's own entries tend to persist most reliably.
2.5 Toggle in config
~/.hermes/config.yaml:
memories:
  memory:
    enabled: true
    # "customizable character limits" per docs
  user:
    enabled: true
Disable both → only FTS5 is used.
3. Session Search — SQLite FTS5 under the hood
3.1 Location & structure
- DB file: ~/.hermes/state.db
- Engine: SQLite + FTS5 full-text index
- Stores: all message histories, lineage, metadata, FTS5 index
- Capacity: up to your disk (practically unlimited)
3.2 session_search tool — agent-invoked
The agent calls session_search to query past sessions. Per the docs:
"The agent can search its past conversations using the session_search tool, which returns relevant past conversations with Gemini Flash summarization."
Flow:
- Agent formulates a query ("how did I fix the login bug last week")
- FTS5 returns matching session candidates
- Gemini Flash summarizes relevant portions
- Only the summary lands in the current context
Rather than dumping 10 old sessions into context, you get a 150–300-token summary. That's how memory exceeds the 1,300-token active cap.
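The retrieve-then-summarize flow can be sketched with Python's built-in sqlite3 module. Everything named here is an assumption for illustration: the sessions_fts table and its columns are invented (the real state.db schema isn't documented), and summarize() simply truncates where Hermes would call Gemini Flash.

```python
import sqlite3


def build_index(sessions):
    """Create an in-memory FTS5 index from (title, body) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE sessions_fts USING fts5(title, body)")
    conn.executemany("INSERT INTO sessions_fts(title, body) VALUES (?, ?)", sessions)
    return conn


def summarize(text, max_chars=200):
    """Stand-in for the LLM summarizer: truncates instead of summarizing."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + "..."


def session_search(conn, query, limit=3):
    """Return short summaries of the best-ranked sessions matching `query`."""
    rows = conn.execute(
        "SELECT title, body FROM sessions_fts "
        "WHERE sessions_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [{"title": title, "summary": summarize(body)} for title, body in rows]
```

The design point is that only the short summaries are returned to the caller; the full session bodies stay in the index.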
3.3 User-side exploration
hermes sessions browse # Interactive search UI
hermes sessions list # Recent
hermes sessions export # JSONL for external analysis
hermes sessions browse is the primary entry point.
4. FTS5 tokenizer for Korean (and other non-space-delimited languages)
SQLite FTS5's default tokenizer (unicode61) splits on spaces and punctuation. In Korean, Japanese, and Chinese, word boundaries don't reliably align with spaces, which causes surprises.
4.1 Default behavior gotchas
- "사용자 인증 흐름" ("user authentication flow") → split into 3 tokens 사용자, 인증, 흐름 (space-based).
- "사용자인증흐름" (no spaces) → becomes one token.
- A search for "인증" ("authentication") hits the former but misses the latter.
- Agglutinative suffixes mean "인증을", "인증이", "인증한" are distinct tokens.
4.2 What the Hermes docs say
The official docs do not address Korean tokenizer behavior. Observations:
- Short keyword searches usually work.
- Long-phrase searches can miss depending on spacing.
- The Flash summarization step partially compensates for matching gaps.
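Since the docs are silent here, the default-tokenizer behavior from 4.1 is easy to check yourself against plain SQLite. The sketch below uses Python's sqlite3 module with an in-memory FTS5 table; the notes table is made up, and it assumes Hermes leaves the tokenizer at the FTS5 default (unicode61), which is not confirmed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# unicode61 is FTS5's default tokenizer; it is named explicitly for clarity.
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(body, tokenize='unicode61')")
docs = [
    "사용자 인증 흐름",  # rowid 1: spaced, three tokens
    "사용자인증흐름",    # rowid 2: unspaced, one long token
    "인증을 확인",       # rowid 3: particle attached to the noun
]
conn.executemany("INSERT INTO notes(body) VALUES (?)", [(d,) for d in docs])


def search(query):
    """Return the rowids of notes matching an FTS5 query string."""
    rows = conn.execute(
        "SELECT rowid FROM notes WHERE notes MATCH ? ORDER BY rowid", (query,)
    ).fetchall()
    return [row[0] for row in rows]


exact = search("인증")    # whole-token match only
prefix = search("인증*")  # prefix match: tokens that start with 인증
print("exact:", exact, "prefix:", prefix)
```

On a stock FTS5 build the exact query should match only the spaced row, while the prefix query should also pick up the suffixed form. The unspaced row is missed by both, which is why the spacing advice below matters.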
4.3 Practical tips for non-English
- Write MEMORY.md with spaces.
"API ํค ๊ด๋ฆฌ"hits more than"APIํค๊ด๋ฆฌ". - Include English originals for key entities:
"OpenClaw (์คํํด๋ก)". - Minimize suffix variation. Prefer
"์ธ์ฆ ํ๋ฆ: ..."over"์ธ์ฆ ํ๋ฆ์ ...". - Dates and numbers are well-tokenized. Include them verbatim.
4.4 English-heavy memory
Defaults are fine for English. Whether Hermes exposes advanced FTS5 syntax (OR, AND, NEAR(), MATCH) isn't documented, so test empirically.
5. Context Files — a separate axis from memory
Memory (self-learning) and context files (project instructions) are easy to confuse. Worth separating.
5.1 Five official files
| File | Role | Discovery |
|---|---|---|
| .hermes.md / HERMES.md | Project instructions (highest priority) | Walks to git root |
| AGENTS.md | Project conventions / architecture | CWD + subdirs |
| CLAUDE.md | Claude Code context compatibility | CWD + subdirs |
| SOUL.md | Global personality | HERMES_HOME only |
| .cursorrules | Cursor IDE conventions | CWD only |
5.2 Priority — "pick exactly one"
Per the docs:
"Only one project context type is loaded per session (first match wins): .hermes.md → AGENTS.md → CLAUDE.md → .cursorrules."
- If .hermes.md exists, the others are ignored.
- SOUL.md is unaffected: it's always loaded separately (covers persona).
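The first-match-wins rule boils down to a short lookup. This is an illustrative sketch with a hypothetical function name; it follows the quoted order literally, so the HERMES.md spelling from the table is left out, and it ignores the directory-walking differences between the files.

```python
# Order taken from the quoted docs sentence; first match wins.
PRIORITY = [".hermes.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"]


def pick_context_file(present):
    """Return the highest-priority project context file among `present`.

    `present` is an iterable of file names found for the project.
    Returns None when no project context file exists.
    """
    names = set(present)
    for name in PRIORITY:
        if name in names:
            return name
    return None
```

SOUL.md never appears in the result because, as noted above, it is loaded on a separate path.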
5.3 Size & truncation
- Max 20,000 chars (~7,000 tokens) per file.
- When exceeded: 70% head + 20% tail + 10% marker — the middle disappears.
- Subdir AGENTS.md capped at 8,000 chars.
- Truncation banner: [...truncated AGENTS.md: kept 14000+4000 of 25000 chars...]
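The head-plus-tail arithmetic behind that banner can be reproduced in a few lines. This is a sketch under assumptions: the function name and the newline placement around the marker are invented, and the remaining 10% of the budget is simply left unused here instead of being reserved for the marker.

```python
def truncate_context(text, name, limit=20_000, head_pct=70, tail_pct=20):
    """Keep the head and tail of an oversized context file, dropping the middle."""
    if len(text) <= limit:
        return text
    # Integer arithmetic avoids floating-point rounding in the kept sizes.
    head = limit * head_pct // 100
    tail = limit * tail_pct // 100
    marker = f"[...truncated {name}: kept {head}+{tail} of {len(text)} chars...]"
    return text[:head] + "\n" + marker + "\n" + text[-tail:]
```

For a 25,000-character AGENTS.md this keeps the first 14,000 and last 4,000 characters, matching the numbers in the banner. Whatever lives in the middle of the file is gone, so put the rules that matter most at the top or bottom.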
5.4 Progressive discovery
As the agent navigates subdirectories, the local AGENTS.md loads dynamically. Monorepo patterns with per-subteam AGENTS.md work out of the box.
5.5 Security — prompt injection blocking
Every context file is scanned before injection for patterns like:
- "ignore previous instructions"
- Hidden divs
- Credential exfiltration attempts
- Zero-width characters
Blocked output:
[BLOCKED: AGENTS.md contained potential prompt injection...]
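To get a feel for what such a scan involves, here is a rough sketch covering three of the listed pattern families. The regular expressions and names are guesses for illustration only; Hermes's actual rules aren't published in the docs cited here, and credential-exfiltration detection is left out entirely.

```python
import re

# Hypothetical patterns; the real scanner's rules are not documented.
PATTERNS = {
    "override_phrase": re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),
    "hidden_div": re.compile(
        r'''<div[^>]*style\s*=\s*["'][^"']*display\s*:\s*none''', re.I
    ),
    "zero_width": re.compile("[\u200b\u200c\u200d\u2060\ufeff]"),
}


def scan_context(text):
    """Return the names of suspicious patterns found in `text` (empty list if none)."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```

A caller would refuse to inject any file for which scan_context returns a non-empty list.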
That guard matters when executing Hermes against a freshly cloned repo.
6. Eight external memory providers
Officially supported:
| Provider | Character |
|---|---|
| honcho | Dialectic user modeling (observation → hypothesis → verified). Hermes's featured choice. |
| openviking | (see official docs) |
| mem0 | Popular open-source memory layer |
| hindsight | (see official docs) |
| holographic | (see official docs) |
| retaindb | (see official docs) |
| byterover | Vector + tree hybrid (also referenced in OpenClaw community) |
| supermemory | SaaS memory integrated with Notion / GitHub / Gmail |
6.1 Configure
hermes memory setup # Interactive selection
hermes memory status # Current config
hermes memory off # Disable external
6.2 Honcho deep-dive
Honcho has a dedicated subcommand family.
hermes honcho status # Connection status
hermes honcho peers # Cross-profile peer identities
hermes honcho sessions # Honcho session mappings
hermes honcho map # Map directory → session
hermes honcho peer # Show / set peer names
hermes honcho mode # Show / set recall mode
hermes honcho tokens # Show / set token budgets
hermes honcho identity # Seed the AI peer representation
hermes honcho enable # Enable for profile
hermes honcho disable # Disable
hermes honcho sync # Sync config across profiles
hermes honcho migrate # Migration from openclaw-honcho
Honcho character: models users as "peers," and stages reasoning dialectically (observation 1× → hypothesis 2×+ → verified 3 days+). Designed to resist confirmation bias.
6.3 Selection guide
- Local, solo use → Active + state.db is enough. Skip external.
- Multi-device / long retention → SaaS options like supermemory / mem0.
- High-quality vector search → byterover / mem0.
- Deep user modeling → honcho.
7. Practical patterns — what goes where
7.1 Active memory (MEMORY.md / USER.md)
Should include:
- "User's name / preferred address form" (USER.md)
- "Preferred language: Korean/English mix OK" (USER.md)
- "This developer prefers Rust over Node" (USER.md)
- "Use pnpm; avoid npm" (MEMORY.md)
- "Tests run with make test" (MEMORY.md)
Should NOT include:
- Last week's bug fix log → FTS5 captures naturally
- A specific function's implementation history → already in sessions
- Long config dumps → use AGENTS.md or actual files
7.2 Session search (FTS5)
The agent handles it. Your only concern: name long sessions meaningfully at the end.
/title rate-limit-migration-2026-04
Later session_search hits on the title.
7.3 Context files
- Per-project rules → AGENTS.md (Codex / Claude Code compatible)
- Personal persona → SOUL.md
- Team-internal docs → markdown in-repo (agent reads via its read tool when relevant)
7.4 External providers
Don't start with them. Run the default 2-layer setup. Only add externals when you truly need cross-machine sync / massive session counts / team-shared memory.
8. Audit & backup
8.1 Inspect
cat ~/.hermes/memories/MEMORY.md
cat ~/.hermes/memories/USER.md
Open any time. If the agent "remembered" something wrong, edit or delete by hand.
8.2 SQLite query
sqlite3 ~/.hermes/state.db
.schema
SELECT id, title, created_at FROM sessions ORDER BY created_at DESC LIMIT 10;
Schema details aren't in the official docs; use .schema to confirm.
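Because the schema is undocumented, it is safer to discover it than to assume it. The sketch below does the same job as .schema from Python and opens the file read-only; the helper names are made up, and nothing about the actual table or column names in state.db is assumed.

```python
import sqlite3


def open_readonly(path):
    """Open a SQLite file read-only so inspection cannot modify it."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def list_tables(conn):
    """Return the names of all tables and views in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def columns(conn, table):
    """Return the column names of `table`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
```

Pass open_readonly an expanded path (for example via os.path.expanduser) and check list_tables before relying on a query like the sessions example above.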
8.3 Backup
hermes backup -o ~/hermes-$(date +%Y%m%d).zip
Zips ~/.hermes/ — memory, sessions, config.
8.4 Privacy lens
- Local-only by default — without external providers, memory never leaves your machine.
- The bigger privacy surface is LLM provider request logs, not Hermes memory.
9. Counter-scenarios — when to disable memory
- Sensitive one-off tasks → use --ignore-rules to skip memory/rules auto-injection.
- Non-interactive CI → session persistence is unnecessary. Use a dedicated profile.
- Shared server with multiple users → hermes profile for isolation.
- Suspected memory pollution → open MEMORY.md / USER.md, delete offending entries. Next session is clean.
10. 5-minute memory tuning checklist
- [ ] cat ~/.hermes/memories/MEMORY.md: is it what you expected?
- [ ] cat ~/.hermes/memories/USER.md: is the profile accurate?
- [ ] Remove unwanted items (vim edit or tell agent via /memory)
- [ ] hermes sessions stats: session store size
- [ ] Prune old sessions: hermes sessions prune --days 90
- [ ] (Optional) hermes memory status: external provider config
- [ ] Backup: hermes backup -o ~/hermes-$(date +%Y%m%d).zip
11. What's next
With memory under control:
- OpenClaw → Hermes migration checklist (coming soon): actual memory / skill transfer.
- Install + SOUL.md: foundation.
- Command cheatsheet: full hermes memory / hermes honcho surface.
References
- Official Memory feature docs
- Context Files docs
- NousResearch/hermes-agent GitHub
- SQLite FTS5 official docs — tokenizer internals
This is post 14/15 in the "AI Coding CLI Entry Guide" series.
last verified: 2026-04-25 (per hermes-agent.nousresearch.com/docs/user-guide/features/memory + context-files).
Series overview: Series index