OpenClaw Build and Operations (5/5) — OpenClaw vs Hermes: 18-Category Scoring

April 09, 2026

OC 63/90 vs Hermes 75/90: Decision Criteria Beyond the Score

Key Summary

A framework for comparing two agent platforms across 18 categories on a 0–5 scale. Result: Hermes 75/90 (83%), OpenClaw 63/90 (70%).
Distribution across the 16 categories itemized below: OpenClaw leads in 7 (operations-focused), Hermes leads in 7 (autonomy and extensibility), and 2 are ties.
Key insight: Platform scores are not a ranking tool — they are a tool for asking "is the winning capability transferable?" Portable strengths and structural strengths must be treated differently.

What This Post Covers

A single aggregate score is a weak basis for platform selection decisions. This post covers: (1) the 18-category scorecard structure, (2) a two-axis interpretation method that separates "portability" from "architectural lock-in," and (3) criteria for determining which strengths are preserved and which weaknesses disappear during a platform migration.

Purpose of the Scoring Framework

"A is better" is not sufficient justification for a migration decision. Decomposing into 18 categories and assigning 0–5 per category makes win/loss positions explicit in numeric form. The role of the score is not to provide an answer — it is to generate questions: "Is losing in this category acceptable?"

18-Category Scorecard

OpenClaw Leads (7 categories)

Category	OC	Hermes	Delta	Notes
Channel / Gateway	5	4	+1	Discord + Telegram Forum with 6-topic granular routing
Agent Management	5	4	+1	8-layer binding (tool / model / permission / channel / isolation)
Memory	5	3	+2	4-tier + U-tag + Reflect 6-phase + semantic search
Scheduling	4	3	+1	9 cron jobs + 7 LaunchAgents + micro-cycle
Plugins	4	3	+1	oMLX integration, Scrapling, custom hooks
Config Management	4	3	+1	config-drift detection, agent-linter
Context Management	4	3	+1	safeguard compaction, softThreshold 6000

Hermes Leads (7 categories)

Category	OC	Hermes	Delta	Notes
Skill Auto-Generation	1	5	-4	Automatic skill generation from usage patterns
Learning Loop	2	5	-3	DSPy + GEPA self-evolution
Execution Backend	2	5	-3	6 backends: Docker / E2B / Modal / Fly.io / Lambda / local
Voice	0	4	-4	Native voice I/O
Sub-agents	2	4	-2	Native profile spawning + delegation
ML / Research	1	4	-3	Trajectory analysis, experiment framework
IDE Integration	1	4	-3	Native VS Code / JetBrains

Ties (2 categories)

Category	OC	Hermes
Model Routing	4	4
Tool System	4	4

Totals

OpenClaw: 63/90 (70%)
Hermes: 75/90 (83%)
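
As a cross-check, the itemized rows can be tallied directly. The following is a minimal Python sketch: the `SCORES` mapping transcribes the rows from the three tables above, and the names `SCORES`, `tally`, and `leads` are illustrative, not part of either platform. Only 16 of the 18 categories are itemized in this post, so the sums cover a maximum of 80 points and are not expected to reproduce the 18-category totals of 63/90 and 75/90.

```python
# Scores transcribed from the three tables above: category -> (OpenClaw, Hermes).
SCORES = {
    "Channel / Gateway": (5, 4),
    "Agent Management": (5, 4),
    "Memory": (5, 3),
    "Scheduling": (4, 3),
    "Plugins": (4, 3),
    "Config Management": (4, 3),
    "Context Management": (4, 3),
    "Skill Auto-Generation": (1, 5),
    "Learning Loop": (2, 5),
    "Execution Backend": (2, 5),
    "Voice": (0, 4),
    "Sub-agents": (2, 4),
    "ML / Research": (1, 4),
    "IDE Integration": (1, 4),
    "Model Routing": (4, 4),
    "Tool System": (4, 4),
}

MAX_PER_CATEGORY = 5


def tally(scores):
    """Return (openclaw_total, hermes_total, max_total) for the given rows."""
    oc_total = sum(oc for oc, _ in scores.values())
    hermes_total = sum(hermes for _, hermes in scores.values())
    return oc_total, hermes_total, MAX_PER_CATEGORY * len(scores)


def leads(scores):
    """Count categories led by OpenClaw, led by Hermes, and tied."""
    oc_leads = sum(1 for oc, hermes in scores.values() if oc > hermes)
    hermes_leads = sum(1 for oc, hermes in scores.values() if oc < hermes)
    return oc_leads, hermes_leads, len(scores) - oc_leads - hermes_leads


print(tally(SCORES))  # (48, 62, 80)
print(leads(SCORES))  # (7, 7, 2)
```

Keeping the per-category rows in a structure like this makes it easy to re-run the tally whenever a score is revised, and to see which rows account for a gap between the itemized sum and a published total.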

Score Interpretation: Portability vs. Architectural Lock-in

1. Separating Platform-Native Strengths from Custom Layers

OpenClaw's memory score of 5 against Hermes' 3 does not mean the platform itself is better designed. The score comes from the custom layer built on top of OpenClaw: memcore, U-tag, and Reflect 6-phase. These are not fixed attributes of the platform; they sit on top of it and are therefore portable to another platform.

Applied pattern: label each category score as either "platform baseline" or "custom accumulated during operation." This separation makes it explicit what is lost vs. what is carried over during migration.

2. Architecturally Fixed Strengths Are Not Portable

Among Hermes' leading categories, the learning loop (DSPy + GEPA), 6-backend execution, voice, and trajectory analysis cannot be addressed by investing more time into OpenClaw. These are determined at the framework architecture level. Replicating them externally incurs a cost equivalent to rebuilding the platform.

Decision rule: classify whether a capability is "attachable as a layer" or "embedded in the architecture" before interpreting the score.
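
The decision rule can be applied mechanically once each lead carries a label. The sketch below is illustrative: the `Portability` enum, the `Lead` record, and the `fixed_gap` helper are hypothetical names, and the labels encode only the classifications stated in this post (Hermes' learning loop, execution backends, voice, and trajectory analysis as architectural; OpenClaw's memory as a layer). Mapping trajectory analysis to the ML / Research row is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Portability(Enum):
    LAYER = "attachable as a layer"
    ARCHITECTURE = "embedded in the architecture"


@dataclass(frozen=True)
class Lead:
    category: str
    leader: str
    delta: int  # absolute score gap taken from the scorecard
    portability: Portability


# Only the leads that this post explicitly classifies are listed here.
LEADS = [
    Lead("Memory", "OpenClaw", 2, Portability.LAYER),
    Lead("Learning Loop", "Hermes", 3, Portability.ARCHITECTURE),
    Lead("Execution Backend", "Hermes", 3, Portability.ARCHITECTURE),
    Lead("Voice", "Hermes", 4, Portability.ARCHITECTURE),
    Lead("ML / Research", "Hermes", 3, Portability.ARCHITECTURE),
]


def fixed_gap(leads, leader):
    """Sum the score gap a leader holds through architecture-level capabilities."""
    return sum(
        lead.delta
        for lead in leads
        if lead.leader == leader and lead.portability is Portability.ARCHITECTURE
    )


print(fixed_gap(LEADS, "Hermes"))    # 13
print(fixed_gap(LEADS, "OpenClaw"))  # 0
```

Under these labels, 13 points of Hermes' lead sit in capabilities described above as unreachable through further OpenClaw investment, while the one OpenClaw lead labeled here (Memory) is layer-based.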

3. Weaknesses Offset by Portable Layers

Hermes' memory score of 3 stems from a 2,200-character MEMORY.md limit. If memcore — built on OpenClaw to overcome the same constraint — can fill that gap as a layer, then the Hermes + memcore combination converges on a memory score of 5 after migration.

Principle: a weakness that was resolved via a custom layer can be resolved again by applying the same layer to a new platform. Therefore, the weakness column in the scorecard should also flag "whether a portable solution already exists."
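
One way to record this extra flag is to store, next to each weakness, the layer that addresses it and the score that layer is expected to restore. The sketch below is illustrative only: `Weakness`, `portable_fix`, `score_with_fix`, and `projected_score` are hypothetical names, and the value of 5 for Hermes + memcore is the expectation stated above, not a measured result.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SCORE = 5


@dataclass(frozen=True)
class Weakness:
    category: str
    platform: str
    score: int
    cause: str
    portable_fix: Optional[str] = None    # name of an existing layer, if any
    score_with_fix: Optional[int] = None  # expected score once the layer is applied


def projected_score(weakness: Weakness) -> int:
    """Return the expected post-migration score, or the current score if no fix exists."""
    if weakness.portable_fix is None or weakness.score_with_fix is None:
        return weakness.score
    # A fix never lowers the score, and the scale tops out at MAX_SCORE.
    return min(MAX_SCORE, max(weakness.score, weakness.score_with_fix))


hermes_memory = Weakness(
    category="Memory",
    platform="Hermes",
    score=3,
    cause="2,200-character MEMORY.md limit",
    portable_fix="memcore",
    score_with_fix=5,
)
openclaw_voice = Weakness(
    category="Voice",
    platform="OpenClaw",
    score=0,
    cause="architecture-level capability",
)

print(projected_score(hermes_memory))   # 5
print(projected_score(openclaw_voice))  # 0
```

Rows where `portable_fix` is empty are the ones that deserve the most attention during a migration decision, because their scores are the ones that will not move.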

Limitations and Scope

A 0–5 scale imposes a subjective number on qualitative indicators. A different team applying the same framework can arrive at scores that differ by ±1.
The 18 categories are domain-specific to agent platforms. A different domain (e.g., general PaaS comparison) requires a redesigned category set.
The boundary between "portable" and "architecturally fixed" is not always clear. Intermediate cases — features built on top of a plugin system — require individual assessment.
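
The ±1 subjectivity noted above suggests a simple robustness check: a category lead is only safe if it survives both scores moving one point in the unfavorable direction. The following is a minimal sketch, assuming the ±1 shift applies independently to each individual score and that scores stay within 0–5; `robust_leader` is a hypothetical helper, and the sample rows are taken from the scorecard.

```python
def clamp(value, low=0, high=5):
    """Keep a score inside the 0-5 scale."""
    return max(low, min(high, value))


def robust_leader(oc, hermes, shift=1):
    """Return the leader if the lead survives a worst-case shift, else None."""
    if oc > hermes:
        return "OpenClaw" if clamp(oc - shift) > clamp(hermes + shift) else None
    if hermes > oc:
        return "Hermes" if clamp(hermes - shift) > clamp(oc + shift) else None
    return None  # a tie has no leader to protect


SAMPLE = {
    "Channel / Gateway": (5, 4),
    "Memory": (5, 3),
    "Learning Loop": (2, 5),
    "Voice": (0, 4),
    "Sub-agents": (2, 4),
}

for category, (oc, hermes) in SAMPLE.items():
    print(category, robust_leader(oc, hermes))
# Channel / Gateway None
# Memory None
# Learning Loop Hermes
# Voice Hermes
# Sub-agents None
```

Under this assumption only gaps of three points or more survive the worst case, which is another reason to read narrow leads as questions rather than verdicts.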

Open Questions

When platform scores are near-equal, what is the next axis that breaks the tie — maintenance cost, community, upstream change velocity?
When portable custom layers grow thick enough, at what point should the layer itself be extracted as a "platform-independent component"?

Further entries continue in the same measurement format in separate records.

MaJu Tech Notes