OpenClaw Build and Operations (5/5) — OpenClaw vs Hermes: 18-Category Scoring
OC 63/90 vs Hermes 75/90: Decision Criteria Beyond the Score
Key Summary
- A framework for comparing two agent platforms across 18 categories on a 0–5 scale. Result: Hermes 75/90 (83%), OpenClaw 63/90 (70%).
- Distribution: OpenClaw leads in 7 categories (operations-focused), Hermes leads in 7 (autonomy and extensibility), 2 ties.
- Key insight: platform scores are not a ranking tool. They are a tool for asking "is the winning capability transferable?" Portable strengths and structural strengths must be treated differently.
What This Post Covers
A single aggregate score is a weak basis for platform selection decisions. This post covers: (1) the 18-category scorecard structure, (2) a two-axis interpretation method that separates "portability" from "architectural lock-in," and (3) criteria for determining which strengths are preserved and which weaknesses disappear during a platform migration.
Purpose of the Scoring Framework
"A is better" is not sufficient justification for a migration decision. Decomposing the comparison into 18 categories and scoring each from 0 to 5 makes win/loss positions explicit in numeric form. The score's role is not to provide an answer but to generate questions such as "Is losing in this category acceptable?"
18-Category Scorecard
OpenClaw Leads (7 categories)
| Category | OC | Hermes | Delta (OC − Hermes) | Notes |
|---|---|---|---|---|
| Channel / Gateway | 5 | 4 | +1 | Discord + Telegram Forum with 6-topic granular routing |
| Agent Management | 5 | 4 | +1 | 8-layer binding (tool / model / permission / channel / isolation) |
| Memory | 5 | 3 | +2 | 4-tier + U-tag + Reflect 6-phase + semantic search |
| Scheduling | 4 | 3 | +1 | 9 cron jobs + 7 LaunchAgents + micro-cycle |
| Plugins | 4 | 3 | +1 | oMLX integration, Scrapling, custom hooks |
| Config Management | 4 | 3 | +1 | config-drift detection, agent-linter |
| Context Management | 4 | 3 | +1 | safeguard compaction, softThreshold 6000 |
Hermes Leads (7 categories)
| Category | OC | Hermes | Delta (OC − Hermes) | Notes |
|---|---|---|---|---|
| Skill Auto-Generation | 1 | 5 | -4 | Automatic skill generation from usage patterns |
| Learning Loop | 2 | 5 | -3 | DSPy + GEPA self-evolution |
| Execution Backend | 2 | 5 | -3 | 6 backends: Docker / E2B / Modal / Fly.io / Lambda / local |
| Voice | 0 | 4 | -4 | Native voice I/O |
| Sub-agents | 2 | 4 | -2 | Native profile spawning + delegation |
| ML / Research | 1 | 4 | -3 | Trajectory analysis, experiment framework |
| IDE Integration | 1 | 4 | -3 | Native VS Code / JetBrains |
Ties (2 categories)
| Category | OC | Hermes |
|---|---|---|
| Model Routing | 4 | 4 |
| Tool System | 4 | 4 |
Totals
- OpenClaw: 63/90 (70%)
- Hermes: 75/90 (83%)
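The tally can be checked mechanically. The following is a minimal Python sketch with the rows transcribed from the three tables above; the names `ROWS`, `tally`, and `percent` are illustrative and not part of either platform. The tables itemize 16 rows, so the subtotals it computes (48 for OpenClaw and 62 for Hermes, out of 80) cover only those rows and are not the published 18-category totals.

```python
# Rows transcribed from the tables above: (category, OpenClaw score, Hermes score).
ROWS = [
    ("Channel / Gateway", 5, 4),
    ("Agent Management", 5, 4),
    ("Memory", 5, 3),
    ("Scheduling", 4, 3),
    ("Plugins", 4, 3),
    ("Config Management", 4, 3),
    ("Context Management", 4, 3),
    ("Skill Auto-Generation", 1, 5),
    ("Learning Loop", 2, 5),
    ("Execution Backend", 2, 5),
    ("Voice", 0, 4),
    ("Sub-agents", 2, 4),
    ("ML / Research", 1, 4),
    ("IDE Integration", 1, 4),
    ("Model Routing", 4, 4),
    ("Tool System", 4, 4),
]
MAX_SCORE = 5  # each category is scored 0-5


def tally(rows):
    """Return lead counts and subtotals for the itemized rows."""
    return {
        "oc_leads": sum(1 for _name, oc, hermes in rows if oc > hermes),
        "hermes_leads": sum(1 for _name, oc, hermes in rows if hermes > oc),
        "ties": sum(1 for _name, oc, hermes in rows if oc == hermes),
        "oc_subtotal": sum(oc for _name, oc, _hermes in rows),
        "hermes_subtotal": sum(hermes for _name, _oc, hermes in rows),
    }


def percent(score, n_categories, max_score=MAX_SCORE):
    """Express a total as a rounded percentage of the maximum possible score."""
    return round(100 * score / (n_categories * max_score))


summary = tally(ROWS)
print(summary)
# Published totals expressed as percentages of 18 categories x 5 points.
print(percent(63, 18), percent(75, 18))
```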
Score Interpretation: Portability vs. Architectural Lock-in
1. Separating Platform-Native Strengths from Custom Layers
OpenClaw's memory score of 5 against Hermes' 3 does not reflect superior design in the platform itself. The score is a product of the custom layers built on top of OpenClaw: memcore, U-tag, and Reflect 6-phase. These are not fixed attributes of the platform, so they can be ported to another platform.
Applied pattern: label each category score as either "platform baseline" or "custom accumulated during operation." This separation makes it explicit what is lost vs. what is carried over during migration.
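One way to record this label is as an extra field on each scorecard entry. The sketch below is illustrative only: `Origin`, `ScoreEntry`, and `migration_split` are hypothetical names, and only the Memory label comes from the text. The Channel / Gateway label is an assumption added to show the other branch.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    PLATFORM_BASELINE = "platform baseline"
    CUSTOM_LAYER = "custom accumulated during operation"


@dataclass(frozen=True)
class ScoreEntry:
    category: str
    score: int
    origin: Origin


def migration_split(entries):
    """Split categories into those carried over (custom layers) and those lost (platform baseline)."""
    carried = [e.category for e in entries if e.origin is Origin.CUSTOM_LAYER]
    lost = [e.category for e in entries if e.origin is Origin.PLATFORM_BASELINE]
    return carried, lost


entries = [
    # Labeled per the text: the 5 comes from memcore, U-tag, and Reflect 6-phase.
    ScoreEntry("Memory", 5, Origin.CUSTOM_LAYER),
    # ASSUMPTION: illustrative label only; the post does not classify this category.
    ScoreEntry("Channel / Gateway", 5, Origin.PLATFORM_BASELINE),
]

carried, lost = migration_split(entries)
print("carried over:", carried)
print("lost on migration:", lost)
```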
2. Architecturally Fixed Strengths Are Not Portable
Among Hermes' leading categories, the learning loop (DSPy + GEPA), 6-backend execution, voice, and trajectory analysis cannot be matched by investing more time in OpenClaw. They are determined at the framework architecture level, and replicating them externally would cost about as much as rebuilding the platform.
Decision rule: classify whether a capability is "attachable as a layer" or "embedded in the architecture" before interpreting the score.
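Applied to the Hermes-leading rows, the rule splits OpenClaw's deficit into a part that is fixed by architecture and a part that is still unclassified. This is a rough sketch: the `EMBEDDED` set follows the four capabilities named above, and mapping "trajectory analysis" to the ML / Research row is an assumption. The text does not classify the remaining three categories.

```python
# Hermes-leading rows from the scorecard: category -> (OpenClaw score, Hermes score).
HERMES_LEADS = {
    "Skill Auto-Generation": (1, 5),
    "Learning Loop": (2, 5),
    "Execution Backend": (2, 5),
    "Voice": (0, 4),
    "Sub-agents": (2, 4),
    "ML / Research": (1, 4),
    "IDE Integration": (1, 4),
}

# Capabilities the text describes as fixed at the architecture level.
# ASSUMPTION: "trajectory analysis" corresponds to the ML / Research row.
EMBEDDED = {"Learning Loop", "Execution Backend", "Voice", "ML / Research"}


def deficit(categories):
    """Sum of (Hermes - OpenClaw) over the given categories."""
    return sum(HERMES_LEADS[c][1] - HERMES_LEADS[c][0] for c in categories)


embedded_deficit = deficit(EMBEDDED)
unclassified_deficit = deficit(set(HERMES_LEADS) - EMBEDDED)

print("points not recoverable by layering:", embedded_deficit)
print("points still to be classified:", unclassified_deficit)
```

Only the second figure is open to being closed by work on OpenClaw, and only if those categories turn out to be attachable as layers.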
3. Weaknesses Offset by Portable Layers
Hermes' memory score of 3 stems from a 2,200-character MEMORY.md limit. If memcore, which was built on OpenClaw to overcome the same constraint, can fill that gap as a layer, then the Hermes + memcore combination converges on a memory score of 5 after migration.
Principle: a weakness that was resolved via a custom layer can be resolved again by applying the same layer to a new platform. Therefore, the weakness column in the scorecard should also flag "whether a portable solution already exists."
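As a what-if calculation, the principle amounts to raising the target platform's score in every category where a portable layer already exists. The function name `project_after_migration` and the `portable` set are hypothetical, and the projection is only as good as the assumption that the layer ports cleanly.

```python
def project_after_migration(target_scores, source_scores, portable):
    """Raise the target's score wherever a portable layer from the source closes the gap.

    Categories not listed in `portable` keep the target platform's own score.
    """
    projected = dict(target_scores)
    for category in portable:
        projected[category] = max(target_scores[category], source_scores[category])
    return projected


# Two rows from the scorecard; Voice is included to show an untouched category.
hermes = {"Memory": 3, "Voice": 4}
openclaw = {"Memory": 5, "Voice": 0}

# ASSUMPTION: memcore ports to Hermes without loss, as the text hypothesizes.
projected = project_after_migration(hermes, openclaw, portable={"Memory"})
gain = sum(projected.values()) - sum(hermes.values())

print(projected)
print("projected gain for Hermes:", gain)
```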
Limitations and Scope
- Scoring qualitative indicators on a 0–5 scale is subjective. A different team applying the same framework could shift individual scores by ±1.
- The 18 categories are domain-specific to agent platforms. A different domain (e.g., general PaaS comparison) requires a redesigned category set.
- The boundary between "portable" and "architecturally fixed" is not always clear. Intermediate cases — features built on top of a plugin system — require individual assessment.
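The ±1 sensitivity noted in the first limitation can be set against the 12-point gap in the published totals. The sketch below assumes that within one category the two platforms' scores can move by one point in opposite directions, and it ignores the 0–5 bounds. The function name is hypothetical.

```python
import math


def min_categories_to_erase_gap(gap, per_score_shift=1):
    """Fewest categories that must be rescored to close the gap.

    ASSUMPTION: in each rescored category one platform moves up and the other
    moves down by `per_score_shift`, so the gap shrinks by twice that amount.
    """
    swing_per_category = 2 * per_score_shift
    return math.ceil(gap / swing_per_category)


gap = 75 - 63  # published Hermes total minus published OpenClaw total
needed = min_categories_to_erase_gap(gap)
print(f"gap of {gap} points closes if {needed} of 18 categories are rescored")
```

Under these assumptions, a disagreement in a third of the categories is enough to erase the lead, which is one more reason to read the totals as prompts for questions and not as a ranking.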
Open Questions
- When platform scores are near-equal, what is the next axis that breaks the tie — maintenance cost, community, upstream change velocity?
- When portable custom layers grow thick enough, at what point should the layer itself be extracted as a "platform-independent component"?
Further entries continue in the same measurement format in separate records.
Series overview: Series index