OpenClaw to Hermes Migration (1/13) — Current Structure Inventory: Snapshot Before Migration

April 20, 2026

Single-persona interface + backend role separation, 4-tier memory, cron / LaunchAgent / script layering — an anatomy of what a solo AI system is actually made of.

What This Post Covers

The design pattern of separating concerns into a single conversational persona and multiple backend agents
The criteria for dividing workloads across cron, macOS LaunchAgent, and the script layer
A 4-tier memory structure (Identity / Curated / Project / Episode) that manages token budget and recall quality independently
The actual configuration values for oMLX-based hybrid search (embedding + keyword + MMR + temporal decay)
From a migration perspective, the criteria that distinguish "reusable assets" from "platform-coupled assets"

OpenClaw started as a solo AI system built on Claude Code and has since grown to include multiple agents, schedulers and daemons, operational scripts, skills, and a 4-tier memory architecture. Ahead of the decision to migrate to a Hermes base, this inventory gives a complete overview of the current system. It is documented so that anyone building or studying a similar system can see exactly which role belongs to which layer.

1. Agent Layer — Single Persona + Role Separation

Multiple agents exist behind the scenes, but the user-facing interface is fixed to a single agent: mir. All others operate as backend roles. This design allows (a) a single conversational context to be maintained while (b) heavy workloads are delegated to dedicated agents, each with independently tuned models and prompts.

Agent	Model	Role
mir	gpt-5.4 (high)	Sole conversational interface, routing
monitor	omlx gemma-4	System status monitoring
researcher	gemini-3.1-flash-lite	Web research, AutoSearch Deep, Scrapling
communicator	gpt-5.4 (medium)	Writing
orchestrator	gpt-5.3-codex	Coding + CLI, sub-agent/skill factory
google	gemini-3.1-flash-lite	Google Workspace API
reviewer	gemini-3.1-flash-lite	Monthly memory review
memory-runner	omlx gemma-4	bank/ file I/O
memory-manager	gemini-2.5-flash	Reflect semantic judgment
telegram-ops	gemini-3.1-flash-lite	Telegram delivery
planner	mir-inherited (subagent)	PRD / planning

Model Selection Criteria

Heavy judgment / routing: GPT-5.4 high
Coding and CLI: GPT-5.3 codex
Lightweight tasks (research, delivery, review): Gemini 3.1 Flash Lite
Security-sensitive work (data never sent over the network): local oMLX gemma-4 26b

Model assignment is determined by two axes only: inference cost and security boundary. Fixing the conversational interface to a single agent ensures that model heterogeneity is never exposed to the user experience.
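The two-axis rule can be sketched as a small routing function. This is a hypothetical illustration, not OpenClaw's actual router: the `Task` type and its `kind` labels are invented here, and only the four selection criteria above are encoded (the table's other assignments, such as gpt-5.4 medium for writing or gemini-2.5-flash for Reflect, are left out). The security boundary is checked first so that sensitive data can never fall through to a cheaper remote model.

```python
from dataclasses import dataclass

# Model identifiers are taken from the agent table; the task labels are hypothetical.
HEAVY = "gpt-5.4 (high)"
CODING = "gpt-5.3-codex"
LIGHT = "gemini-3.1-flash-lite"
LOCAL = "omlx gemma-4 26b"


@dataclass(frozen=True)
class Task:
    kind: str        # e.g. "routing", "coding", "research" (illustrative labels)
    sensitive: bool  # True if the data must never leave the machine


def pick_model(task: Task) -> str:
    """Choose a model on two axes: security boundary first, then inference cost."""
    # Security boundary wins: sensitive work only ever goes to the local oMLX model.
    if task.sensitive:
        return LOCAL
    # Otherwise use the cheapest tier that fits the workload.
    if task.kind in {"routing", "judgment"}:
        return HEAVY
    if task.kind in {"coding", "cli"}:
        return CODING
    return LIGHT


print(pick_model(Task("coding", sensitive=True)))  # omlx gemma-4 26b
```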

2. Automation Layer — Cron

Scheduled batch work runs on cron. Apart from the research scans, the jobs fall into two main groups: the memory cycle and self-review.

name	schedule	agent
ai-research-scan	09:00	researcher
reddit-scan	every 15 min / 6 hr	researcher
reflect-orchestrate	03:00	memory-manager
memory-micro-cycle	:07 and :37 every hour	mir
daily-summary	22:40	mir
self-review-scan	22:05	mir
self-review-apply	22:12	mir
self-review-fix	every 10 min / 22–23h	mir
monthly-memory-review	1st of month, 09:00	reviewer

How It Works

Memory cycle: Reflect runs at 03:00 (semantic merge and consolidation), and the micro-cycle runs at :07 and :37 every hour (short-interval housekeeping). Together they keep bank/ and memory/ aligned, with the heavy consolidation placed in hours when the user is not active.
Self-review: 22:05 scan → 22:12 apply → a fix loop every 10 minutes from 22:00 to 23:00. Rule violations and drift are self-corrected at the end of each day.

The cron layer is where the principle of "the system consolidates and corrects itself while the user is asleep" is implemented.
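To make the schedule concrete, here is a minimal sketch that encodes the table as crontab-style "minute hour day-of-month" expressions and asks which jobs fire at a given time. The expressions are my transcription, not the actual crontab: reddit-scan is left out because "every 15 min / 6 hr" is ambiguous, and `*/10 22-23` is only one reading of "every 10 min / 22–23h" (in standard cron it would keep firing until 23:50). The parser handles just the field forms used here.

```python
# Crontab-style "minute hour day-of-month" expressions transcribed from the table.
# reddit-scan is omitted (ambiguous cadence); self-review-fix is one reading of "22–23h".
JOBS = {
    "ai-research-scan": "0 9 *",
    "reflect-orchestrate": "0 3 *",
    "memory-micro-cycle": "7,37 * *",
    "daily-summary": "40 22 *",
    "self-review-scan": "5 22 *",
    "self-review-apply": "12 22 *",
    "self-review-fix": "*/10 22-23 *",
    "monthly-memory-review": "0 9 1",
}


def parse_field(field: str, lo: int, hi: int) -> set:
    """Expand one cron field ("*", "*/n", "a-b", "a,b", "n") into its matching values."""
    values = set()
    for part in field.split(","):
        has_step = "/" in part
        step = 1
        if has_step:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            a, b = part.split("-")
            start, end = int(a), int(b)
        else:
            start = int(part)
            end = hi if has_step else start  # "n/step" runs from n to the field maximum
        values.update(range(start, end + 1, step))
    return values


def due_jobs(minute: int, hour: int, day: int) -> list:
    """Return the names of the jobs that fire at the given minute, hour, and day of month."""
    due = []
    for name, expr in JOBS.items():
        m, h, d = expr.split()
        if (minute in parse_field(m, 0, 59)
                and hour in parse_field(h, 0, 23)
                and day in parse_field(d, 1, 31)):
            due.append(name)
    return due


print(due_jobs(40, 22, 10))  # ['daily-summary', 'self-review-fix']
```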

3. Daemon Layer — macOS LaunchAgent

Where cron is time-based, LaunchAgent is process-based. Anything that must remain alive goes here. Representative daemons:

ai.openclaw.gateway — Gateway daemon (port 18789)
ai.openclaw.update-check — Automatic updates
com.openclaw.omlx — oMLX server (gemma-4 26b)
com.openclaw.omlx-proxy — oMLX proxy
com.openclaw.daily-audit — 05:00 daily audit (Phase A–E)
com.claude-agent.watchdog — Session watchdog
com.claude-agent.autostart — Auto-start

The oMLX server and proxy form the local embedding and inference infrastructure. Data that must not leave the machine is routed exclusively through these two endpoints. The minimum unit for attaching local inference is: server + proxy + routing configuration pointing to the proxy.
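The difference between a process-based daemon and a calendar job shows up directly in the LaunchAgent property list. The sketch below builds both kinds with Python's standard `plistlib`, using the documented launchd keys `Label`, `ProgramArguments`, `RunAtLoad`, `KeepAlive`, and `StartCalendarInterval`. The labels come from the list above; the script paths are hypothetical placeholders, and the real OpenClaw plists may set more keys (logging paths, environment, and so on).

```python
import plistlib
from typing import List, Optional


def launch_agent(label: str, program: List[str], *, keep_alive: bool = False,
                 hour: Optional[int] = None, minute: int = 0) -> bytes:
    """Build a LaunchAgent plist as bytes, ready to write into ~/Library/LaunchAgents/."""
    plist = {"Label": label, "ProgramArguments": program}
    if keep_alive:
        # Long-lived daemon: start at login and let launchd restart it whenever it exits.
        plist["RunAtLoad"] = True
        plist["KeepAlive"] = True
    if hour is not None:
        # Calendar trigger: launchd starts the job once a day at hour:minute.
        plist["StartCalendarInterval"] = {"Hour": hour, "Minute": minute}
    return plistlib.dumps(plist)


# Labels are from the post; the script paths are hypothetical placeholders.
omlx = launch_agent("com.openclaw.omlx",
                    ["/bin/zsh", "/Users/me/openclaw/scripts/omlx-serve.sh"],
                    keep_alive=True)
audit = launch_agent("com.openclaw.daily-audit",
                     ["/bin/zsh", "/Users/me/openclaw/scripts/daily-audit.sh"],
                     hour=5)
```

Each file would be saved as `~/Library/LaunchAgents/<label>.plist` and loaded with `launchctl`. Note that the 05:00 audit, although time-based, lives here rather than in cron, which a calendar-triggered LaunchAgent makes possible.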

4. Execution Layer — Operational Scripts

This is the largest layer by file count, with clearly defined functional categories.

Memory Pipeline

retain-extract.py / retain-merge.py — Retain tag extraction and merging
recall-tree.py / recall-match.py — Topic-cued recall
recall-cleanup.py — TTL cleanup
confidence-decay.py — Confidence decay for opinions
topics-validate.py / topics-expand.py — Topic consistency
session-cleanup.py / session-archive.py — Session lifecycle
user-pattern-stage.py — U-tag dialectic (observation → hypothesis → verification)
bank-lint.py / bank-size-watch.py — bank/ validation
entity-audit.py — entities/ staleness detection
memory-micro-cycle.py — 30-minute cycle orchestrator
decision-prepare.py / decision-apply.py — LLM decision queue
memory-warning-report.py / memory-optimize.py
memory-archive.sh — memory/ → archived/
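The post only says that confidence-decay.py applies "confidence decay for opinions." A common way to implement that idea is an exponential half-life, sketched below under explicit assumptions: the 30-day half-life and the 0.1 floor are illustrative values chosen here, not settings taken from the script, and `decayed_confidence` is a hypothetical name.

```python
from datetime import datetime, timedelta


def decayed_confidence(confidence: float, last_confirmed: datetime, now: datetime,
                       half_life_days: float = 30.0, floor: float = 0.1) -> float:
    """Halve an opinion's confidence for every `half_life_days` it goes unconfirmed."""
    age_days = max((now - last_confirmed).total_seconds() / 86400, 0.0)
    decayed = confidence * 0.5 ** (age_days / half_life_days)
    # Keep a floor so an old opinion fades without being silently zeroed out.
    return max(decayed, floor)


now = datetime(2026, 4, 20)
print(decayed_confidence(0.8, now - timedelta(days=30), now))  # 0.4
```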

Heartbeat

heartbeat-router.sh / heartbeat-update.sh / heartbeat-tick.py / heartbeat-health.py
proactive-exception-alerts.py
check-quotas.sh
gws-check.sh — Gmail/Calendar quick check

Self-Review

self-review-prescan.py / self-review-apply.py / self-review-fix.py
skill-candidate-stage.py

Daily Audit

daily-audit.sh / docs-snapshot.sh / release-check.sh / docs-update-ai.sh
agent-linter.sh / config-drift.sh / cron-analytics.sh / session-audit.sh / reflect-trace.py

External Integration

scrape.py — Scrapling + playwright + curl_cffi (Cloudflare bypass)
gws-wrapper.sh — Google Workspace CLI
omlx-serve.sh — oMLX daemon

Hooks (.claude/settings)

PostToolUse: JSON validation, AgentLinter, DOC_SYNC, debug warnings
PreToolUse (git commit): DOC_SYNC gate, AgentLinter
SessionStart / SessionEnd / PreCompact
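As an illustration of the "JSON validation" PostToolUse hook, here is a minimal sketch. It assumes Claude Code's hook convention: the hook receives the tool call as JSON on stdin, file-writing tools put the target path in `tool_input.file_path`, and exit code 2 reports the stderr message back to the agent. Check these details against the hooks documentation for your Claude Code version. The function names are hypothetical, and OpenClaw's actual hook may validate more than syntax.

```python
import json
import sys
from pathlib import Path


def check_payload(payload: dict) -> tuple:
    """Return (exit_code, message) for a PostToolUse payload."""
    path = payload.get("tool_input", {}).get("file_path", "")
    if not path.endswith(".json"):
        return 0, ""  # Not a JSON file: nothing to validate.
    try:
        json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        # Exit code 2 is assumed to surface stderr back to the agent.
        return 2, f"Invalid JSON in {path}: {exc}"
    return 0, ""


def main() -> int:
    """Hook entry point: read the payload from stdin, report problems, return the exit code."""
    code, message = check_payload(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

The hook command in the settings file would run a script that calls `sys.exit(main())`. Keeping the check in a plain function makes it testable without going through the harness, which matches the one-task-per-file principle below.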

Scripts follow a one-task-per-file principle. Cron, LaunchAgent, and hooks call these scripts as entry points. The upper layers decide "when and why"; scripts handle only "what."

5. Interface Layer — Skills

Slash-command skills are the entry points the user invokes directly.

General: brainstorming, writing-plans, verification, deep-interview, code-review, testing, git-commit, project-doctor, self-audit
OpenClaw-specific: oc-config, oc-agent, oc-channel, oc-skill, oc-memory, oc-doctor, oc-deploy, oc-backup

Classification from a Migration Perspective

General skills are platform-independent — portable to any other harness as-is.
oc-* skills are coupled to OpenClaw's internal structure (agent composition, channel routing, bank/ schema) and must be redesigned.

This binary classification is the first-pass filter for migration scope.
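Because the coupled skills share the `oc-` prefix, the first-pass filter reduces to a name check. The sketch below applies it to the skill lists above. The bucket names are illustrative, and the prefix is only a proxy: a general skill that quietly reads bank/ would still need a manual look.

```python
GENERAL = ["brainstorming", "writing-plans", "verification", "deep-interview",
           "code-review", "testing", "git-commit", "project-doctor", "self-audit"]
OPENCLAW = ["oc-config", "oc-agent", "oc-channel", "oc-skill",
            "oc-memory", "oc-doctor", "oc-deploy", "oc-backup"]


def migration_bucket(skill: str) -> str:
    """First-pass filter: the oc- prefix marks coupling to OpenClaw internals."""
    return "redesign" if skill.startswith("oc-") else "portable"


def partition(skills: list) -> dict:
    """Group skill names into the two migration buckets, preserving order."""
    buckets = {"portable": [], "redesign": []}
    for name in skills:
        buckets[migration_bucket(name)].append(name)
    return buckets


plan = partition(GENERAL + OPENCLAW)
print(len(plan["portable"]), len(plan["redesign"]))  # 9 8
```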

6. Memory Architecture — v0.6 4-Tier

This is the core design and the most carefully managed asset in the system. Four tiers separate concerns: token budget management and recall quality are handled independently.

Identity: MEMORY.md (600 tokens) — Identity file, always injected at the front of the context window
Curated: bank/ (world / experience / opinions / patterns / _changelog / _conflict_log / _map / index / topics)
Project: bank/entities/
Episode: memory/ (daily logs)

Search Stack

Embedding: oMLX bge-m3-mlx-fp16
Hybrid weighting: vector 0.7 / keyword 0.3
Reranking: MMR
Temporal correction: temporalDecay 30 days
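The search stack can be sketched end to end in plain Python. The 0.7 / 0.3 weights are from the configuration above. Everything else is an assumption made for illustration: the keyword score is a simple term-overlap stand-in for a real keyword ranker such as BM25, `temporalDecay 30 days` is read as a 30-day half-life multiplier, the MMR trade-off `lam = 0.5` is a hypothetical default, and the toy vectors stand in for bge-m3 embeddings.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    text: str
    vec: List[float]  # embedding vector (bge-m3 in the real stack, toy values here)
    age_days: float   # days since the entry was written or last updated


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    # Stand-in for a real keyword ranker such as BM25: share of query terms present.
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split())) / len(terms) if terms else 0.0


def hybrid_score(query: str, qvec: List[float], doc: Doc,
                 w_vec: float = 0.7, w_kw: float = 0.3, half_life: float = 30.0) -> float:
    base = w_vec * cosine(qvec, doc.vec) + w_kw * keyword_score(query, doc.text)
    # Temporal decay, read here as a 30-day half-life on the relevance score.
    return base * 0.5 ** (doc.age_days / half_life)


def mmr(query: str, qvec: List[float], docs: List[Doc],
        k: int = 3, lam: float = 0.5) -> List[Doc]:
    """Greedy MMR: trade relevance against similarity to what is already selected."""
    pool = [(doc, hybrid_score(query, qvec, doc)) for doc in docs]
    selected: List[Doc] = []

    def value(item) -> float:
        doc, score = item
        redundancy = max((cosine(doc.vec, s.vec) for s in selected), default=0.0)
        return lam * score - (1 - lam) * redundancy

    while pool and len(selected) < k:
        best = max(range(len(pool)), key=lambda i: value(pool[i]))
        selected.append(pool.pop(best)[0])
    return selected
```

With a near-duplicate in the candidate set, MMR skips the copy in favor of a slightly less relevant but more distinct entry. That is the "diversity" role the post assigns to reranking, while the decay term lets a fresh entry outrank an equally relevant stale one.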

Reflect

Multi-phase pipeline
U-tag dialectic (observation → hypothesis → verification)
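The U-tag dialectic reads naturally as a small state machine. The sketch below is a hypothetical model of it, not the logic in user-pattern-stage.py. The promotion threshold of three observations and the rule that a rejected hypothesis falls back to observation with its evidence cleared are both assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserPattern:
    claim: str
    stage: str = "observation"  # observation -> hypothesis -> verification
    evidence: List[str] = field(default_factory=list)

    def observe(self, note: str, promote_at: int = 3) -> None:
        """Record supporting evidence; enough of it promotes an observation to a hypothesis."""
        self.evidence.append(note)
        if self.stage == "observation" and len(self.evidence) >= promote_at:
            self.stage = "hypothesis"

    def verify(self, confirmed: bool) -> None:
        """Only a hypothesis can be verified; a rejected one restarts as an observation."""
        if self.stage != "hypothesis":
            raise ValueError(f"cannot verify from stage {self.stage!r}")
        if confirmed:
            self.stage = "verification"
        else:
            self.stage = "observation"
            self.evidence.clear()
```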

Design Rationale

Identity is "always injected" → hard-capped at 600 tokens.
Curated is "retrieved based on query context" → vector + keyword + MMR ensures diversity; temporal decay weights recency.
Episode is "raw log" → rarely read directly; Reflect promotes entries to Curated.
Project is entity-scoped isolation → per-entity staleness detection (entity-audit) is possible.

The 4-tier design physically separates four distinct roles: what goes into context / what is retrieved / what is a promotion candidate / what is the source of record. When these roles are conflated, tokens are wasted or recall is contaminated with noise.
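One way to see the tiers at work is to sketch the context assembly step. Under these assumptions, Identity is always placed first and must fit the 600-token cap, ranked Curated hits fill the remaining budget in order, and Episode logs are never read directly. The 4-characters-per-token estimate is a rough heuristic standing in for a real tokenizer, and the function names are hypothetical.

```python
IDENTITY_CAP = 600  # tokens; the hard cap on MEMORY.md


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); a real tokenizer would replace this.
    return max(1, len(text) // 4)


def assemble_context(identity: str, curated_hits: list, budget: int) -> list:
    """Identity always goes first; ranked Curated hits fill the remaining budget in order."""
    used = estimate_tokens(identity)
    if used > IDENTITY_CAP:
        raise ValueError(f"identity is {used} tokens, over the {IDENTITY_CAP}-token cap")
    parts = [identity]
    for hit in curated_hits:  # assumed already ranked by the hybrid search
        cost = estimate_tokens(hit)
        if used + cost > budget:
            break
        parts.append(hit)
        used += cost
    return parts
```

Because the Identity cap is enforced before anything else is added, a growing MEMORY.md fails loudly instead of silently crowding out retrieved context.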

7. Channel Routing

Notifications are split by purpose.

Discord main channel (1487837031420792832): all reflect and self-review events
Telegram Forum topics: General / Server / Bot / Research / Memory / Heartbeat

When all events converge on a single channel, they become noise and notifications get ignored. Splitting by topic means the choice of which channel to monitor is itself the decision about "what must not be missed." The routing concept is not platform-coupled and is likely to carry over intact after migration.
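The routing concept fits in a small lookup, which is part of why it should survive the migration. In this sketch the Discord channel ID and the Telegram topic names are from the post. The `event_type` values and the fallback to the General topic for unknown topics are assumptions.

```python
from typing import Optional, Tuple

DISCORD_MAIN = "1487837031420792832"  # Discord main channel ID
TELEGRAM_TOPICS = {"General", "Server", "Bot", "Research", "Memory", "Heartbeat"}


def route(event_type: str, topic: Optional[str] = None) -> Tuple[str, str]:
    """Map an event to (platform, destination); the event_type values are hypothetical."""
    if event_type in {"reflect", "self-review"}:
        return ("discord", DISCORD_MAIN)
    if topic in TELEGRAM_TOPICS:
        return ("telegram", topic)
    # Assumed fallback: unknown topics go to General instead of being dropped.
    return ("telegram", "General")
```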

Limitations and Scope

Layer count is a complexity cost: the six-layer structure (agent / cron / LaunchAgent / script / hook / skill) is near the upper bound of what a single person can maintain. Beyond this scale, layers must be consolidated.
oMLX local inference assumes macOS Apple Silicon: Implementing the same structure on a different OS requires replacing the local inference stack.
LaunchAgent ties the scheduling stack to macOS: moving to Linux allows a 1:1 substitution of the cron jobs and LaunchAgents with systemd timers and units, but when the scheduler changes, the failure-recovery behavior of the self-review loop must be re-verified as well.
Reusable vs. platform-coupled: Role separation principle, 4-tier memory, channel routing, and general skills are portable. oc-* skills, the Gateway daemon, and hook configuration require redesign per harness.

Open Questions

Can the 4-tier memory be reduced further — what is lost by collapsing to a 2-tier Identity + Curated structure?
In the single-persona + backend-separation model, how does routing overhead scale when the persona count increases from one to N?
For the Hermes migration, which assets carry over unchanged, which require redesign, and which are discarded — is the classification criterion "platform coupling" or "design debt"?

Series overview: Series index


MaJu Tech Notes