Model Routing Strategy — Why haiku/sonnet/opus Are Assigned by Task Type


Why using the most powerful model for every task is not optimal, and how to distribute them in practice



Summary

  • Model routing is the strategy of assigning haiku/sonnet/opus based on task type. Using opus for mechanical tasks wastes cost; using haiku for complex judgment risks quality.
  • Models are fixed per agent role: orchestrator=opus (judgment), executor=sonnet (implementation), quality=sonnet (verification). Sub-agents follow an isolation protocol.
  • The answer to "can't we just use opus for everything?": it's not only a cost issue — over-reasoning on simple tasks causes unintended side effects.

Background

AI models come in tiers. Within Claude: haiku (lightweight), sonnet (mid-tier), opus (top-tier). Both cost and capability rise with each tier.

When designing an agent system for the first time, the intuitive choice is "run everything on opus." It's the best model, so it should produce the best results. In practice, two problems emerge.

First: cost. Calling opus for a single file search or keyword lookup causes token costs to accumulate quickly. In automated workflows, this difference is significant.

Second: over-reasoning. Assign opus to a simple file formatting task and the model begins evaluating whether the current structure is optimal. It suggests unsolicited improvements and over-interprets context. A simple task becomes unnecessarily complex.

Model routing solves this. It assigns each model to tasks that match its capability profile.



Body

1. Three-Tier Model Routing Table

Task Type | Model | Rationale
File search, keyword lookup, simple formatting | haiku | Mechanical. No judgment required.
Content writing, research compilation, draft editing | sonnet | Implementation capability required.
Strategy design, review synthesis, fact-check arbitration | opus | Complex judgment + context understanding.

What haiku handles: "Find all .md files in this folder", "Convert this text to a markdown table", "List files containing this keyword" — tasks with a single correct answer requiring no judgment. Fast, inexpensive, accurate.

What sonnet handles: "Compile these materials into a source file", "Write a blog draft on this topic", "Structure these research results" — tasks that combine multiple inputs and generate prose. Judgment is limited; implementation capability is required.

What opus handles: "Determine whether this request is a simple inquiry or a deep analysis", "Three sources make conflicting claims — evaluate which is more credible", "Check the logical flow of this article for gaps" — tasks requiring comprehension of complex context and simultaneous application of multiple evaluation criteria.
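The table above can be sketched as a plain lookup. This is a minimal illustration, not the system's actual code: the task-type keys (`file_search`, `draft_editing`, and so on) are hypothetical labels derived from the table rows, and raising on an unknown task type is an assumption of this sketch.

```python
from enum import Enum


class Model(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Hypothetical task-type labels that mirror the rows of the routing table.
ROUTING_TABLE = {
    "file_search": Model.HAIKU,
    "keyword_lookup": Model.HAIKU,
    "simple_formatting": Model.HAIKU,
    "content_writing": Model.SONNET,
    "research_compilation": Model.SONNET,
    "draft_editing": Model.SONNET,
    "strategy_design": Model.OPUS,
    "review_synthesis": Model.OPUS,
    "fact_check_arbitration": Model.OPUS,
}


def route(task_type: str) -> Model:
    """Return the model tier assigned to a task type.

    Unknown task types raise instead of silently picking a tier
    (an assumption of this sketch; the article does not specify a fallback).
    """
    try:
        return ROUTING_TABLE[task_type]
    except KeyError:
        raise ValueError(f"no route defined for task type: {task_type!r}") from None


print(route("file_search").value)             # haiku
print(route("fact_check_arbitration").value)  # opus
```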


2. Model Assignment by Agent Role

Routing at the individual task level requires deciding "which model fits this task?" for every call. That decision itself carries cost. Fixing a model per agent role eliminates the need for that decision.

Agent | Model | Role | Output
main-orchestrator | opus | Conversation, judgment, classification, simple research/writing, publish confirmation | Source + Draft (simple)
executor-agent | sonnet | Deep research, source file authoring, blog/twitter writing | Source + Draft (complex)
quality-agent | sonnet | Fact-check, verification. Read-only. | None

Why orchestrator uses opus: This agent receives user requests and classifies them. "Is this request simple or complex?" "Which pipeline preset applies?" "What is the user's intent?" These decisions require deep contextual understanding. Misclassification degrades the quality of the entire pipeline downstream.

Why executor uses sonnet: The execution agent operates within decisions already made by the orchestrator. Direction is fixed; only implementation remains. Organizing research output, authoring source files, and writing blog posts are tasks sonnet handles sufficiently well.

Why quality uses sonnet: The verification agent checks "does this article contain unsourced figures?" and "do claims align with their evidence?" This is checklist-based verification — opus-level complex reasoning is unnecessary. When conflicting sources are detected, the agent escalates to the orchestrator.
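One way to express the per-role assignment is a fixed configuration that is looked up once per agent rather than decided per call. The sketch below is illustrative only: `AgentConfig`, `model_for`, and `next_handler` are hypothetical names, and the escalation rule simply encodes the statement that the quality agent hands conflicting sources back to the orchestrator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    name: str
    model: str
    read_only: bool


# Agent names and models come from the table; the structure is an assumption.
AGENTS = {
    "main-orchestrator": AgentConfig("main-orchestrator", "opus", read_only=False),
    "executor-agent": AgentConfig("executor-agent", "sonnet", read_only=False),
    "quality-agent": AgentConfig("quality-agent", "sonnet", read_only=True),
}


def model_for(agent_name: str) -> str:
    """The model is fixed by role, so no per-call routing decision is needed."""
    return AGENTS[agent_name].model


def next_handler(current_agent: str, conflicting_sources: bool) -> str:
    """Return which agent should handle the next step.

    The quality agent escalates to the orchestrator when sources conflict;
    otherwise the current agent keeps the task.
    """
    if current_agent == "quality-agent" and conflicting_sources:
        return "main-orchestrator"
    return current_agent


print(model_for("executor-agent"))                              # sonnet
print(next_handler("quality-agent", conflicting_sources=True))  # main-orchestrator
```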


3. "Can't We Just Use Opus for Everything?"

This question will always come up. Three perspectives:

Cost. Opus costs many times more per token than haiku. Agent systems call models dozens to hundreds of times during automated workflows. Ten file searches on haiku cost a negligible amount; the same ten on opus add up to meaningful spend.

Over-reasoning. Issue the command "find the most recent file in the sources/ folder" to opus, and the model may read the file contents and decide "the structure of this file could be improved." That is not what was asked. More capable models tend to act beyond the given scope, and this is sometimes an obstacle rather than a benefit.

Latency. Larger models take longer to respond. For mechanical tasks, speed matters. Haiku responds faster than opus, and this difference affects total pipeline throughput.

Conclusion: reserving opus for tasks that require genuine judgment is more efficient for the system as a whole.
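The cost perspective can be made concrete with a little arithmetic. The prices below are placeholders in relative units, chosen only to show the calculation; they are not real prices, and the token counts per call are likewise assumed.

```python
# Relative price per 1,000 tokens. Placeholder values for illustration only;
# these are NOT real prices.
HYPOTHETICAL_PRICE_PER_1K = {"haiku": 1.0, "sonnet": 4.0, "opus": 20.0}


def workflow_cost(calls: list[tuple[str, int]]) -> float:
    """Sum the cost of (model, tokens) calls in relative units."""
    return sum(HYPOTHETICAL_PRICE_PER_1K[model] * tokens / 1000 for model, tokens in calls)


# Ten file searches (assumed 2,000 tokens each) plus one judgment call (assumed 5,000 tokens).
all_opus = [("opus", 2000)] * 10 + [("opus", 5000)]
routed = [("haiku", 2000)] * 10 + [("opus", 5000)]

print(workflow_cost(all_opus))  # 500.0
print(workflow_cost(routed))    # 120.0
```

Under these assumed numbers, the judgment call costs the same in both setups; the whole difference comes from the mechanical calls.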


4. Sub-Agent Isolation Protocol

Sub-agent isolation is as important as model routing. Four rules apply when one agent invokes another.

1. Do not pass session history. When the orchestrator calls the executor, it does not hand over the full conversation history. Irrelevant context degrades sub-agent performance.

2. Extract only the necessary context from the handoff document. Pass only the information required for the task. "Write a blog post on this topic. The reference source is this. Follow this structure." That is sufficient.

3. One sub-agent = one task = minimum context. Do not assign multiple tasks to a single sub-agent. Three tasks require three sub-agent invocations.

4. After sub-agent completion: receive results only. Discard internal process. When the executor completes a blog post, only the finished post is received. Intermediate references, iterations, and internal decisions made during drafting are not retrieved.

The purpose of this isolation protocol is context contamination prevention. If a sub-agent's internal process bleeds into the parent agent's context, it becomes irrelevant noise for subsequent decisions.
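The four rules above can be sketched as follows. Everything here is hypothetical scaffolding: `Handoff`, `SubAgentRun`, `build_context`, `delegate`, the file paths, and the `fake_executor` stand-in (which replaces a real model call) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Handoff:
    """Hypothetical handoff document prepared by the orchestrator."""
    task: str
    source_ref: str
    structure: str
    session_history: list[str] = field(default_factory=list)


@dataclass
class SubAgentRun:
    """Everything a sub-agent produced, including its internal process."""
    result: str
    internal_notes: list[str] = field(default_factory=list)


def build_context(handoff: Handoff) -> dict[str, str]:
    # Rules 1 and 2: pass only the fields the task needs, never session history.
    return {
        "task": handoff.task,
        "source_ref": handoff.source_ref,
        "structure": handoff.structure,
    }


def delegate(handoff: Handoff, runner: Callable[[dict[str, str]], SubAgentRun]) -> str:
    # Rule 3: one invocation handles exactly one task.
    run = runner(build_context(handoff))
    # Rule 4: keep the result, drop the internal notes.
    return run.result


def fake_executor(context: dict[str, str]) -> SubAgentRun:
    # Stand-in for a real sub-agent call.
    return SubAgentRun(
        result=f"Draft for: {context['task']}",
        internal_notes=["outline v1", "outline v2"],
    )


# Two tasks require two separate invocations (paths are hypothetical).
handoffs = [
    Handoff("blog post on model routing", "sources/routing.md", "summary, body, closing",
            session_history=["earlier chat turn"]),
    Handoff("blog post on isolation", "sources/isolation.md", "summary, body, closing"),
]
results = [delegate(h, fake_executor) for h in handoffs]
print(results)
```

The parent only ever holds the returned strings, so the sub-agent's outlines and iterations never enter its context.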


Design Validation: Token Runaway

The most challenging practical problem in model routing design is token runaway. Even with correct model assignments, improper context propagation between agents makes token usage compound: each call carries the leftovers of earlier calls, so usage grows much faster than the number of calls.

This pattern was confirmed during actual validation. When migrating from a stable agent architecture (OpenClaw) to a new one (Hermes), token usage spiked sharply under specific conditions. The cause was not model assignment — it was context accumulation: agents retained intermediate state from completed tasks and passed it as context in subsequent calls.
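A toy model shows why context accumulation drives usage up even when model assignments are unchanged. The numbers are assumed (20 calls, 1,000 tokens of base context, 500 tokens of output per call), and the model is deliberately simplified: with accumulation, every call also carries the outputs of all earlier calls.

```python
def total_tokens(num_calls: int, base_context: int, output_per_call: int, accumulate: bool) -> int:
    """Total tokens processed across a sequence of agent calls.

    With accumulate=True, each call's output is retained and passed as
    extra context to every later call. With accumulate=False, each call
    starts from the base context only.
    """
    total = 0
    carried = 0  # leftover state from completed tasks
    for _ in range(num_calls):
        total += base_context + carried + output_per_call
        if accumulate:
            carried += output_per_call
    return total


print(total_tokens(20, 1000, 500, accumulate=False))  # 30000
print(total_tokens(20, 1000, 500, accumulate=True))   # 125000
```

In this simplified model the isolated total grows linearly with the number of calls, while the accumulating total grows roughly with its square.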

Two countermeasures address this.

First, enforce the sub-agent isolation protocol (§4 above) strictly. Discard sub-agent internal state immediately upon task completion.

Second, design an agent rollback path. If the new architecture becomes unstable, the system must be able to revert to the prior stable structure immediately. Maintain the existing architecture in parallel until migration validation is complete.

This pattern applies not only to agent migration but to any model routing change. Token usage trends must be monitored after every routing modification.
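A minimal sketch of such a check, tied to the rollback path: compare recent per-run token usage against a baseline and fall back to the stable architecture if usage has clearly regressed. The function names, the 1.5x tolerance, and the sample numbers are assumptions, not values from the actual validation.

```python
from statistics import mean


def token_usage_regressed(baseline: list[int], recent: list[int], tolerance: float = 1.5) -> bool:
    """Return True if mean recent usage exceeds the baseline mean by the tolerance factor."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")
    return mean(recent) > tolerance * mean(baseline)


def choose_architecture(baseline: list[int], recent: list[int],
                        stable: str = "OpenClaw", candidate: str = "Hermes") -> str:
    """Keep the candidate only while token usage stays within tolerance."""
    return stable if token_usage_regressed(baseline, recent) else candidate


# Assumed per-run token counts before and after a routing change.
baseline_runs = [30_000, 32_000, 29_000]
healthy_runs = [31_000, 33_000, 30_000]
runaway_runs = [90_000, 120_000, 150_000]

print(choose_architecture(baseline_runs, healthy_runs))  # Hermes
print(choose_architecture(baseline_runs, runaway_runs))  # OpenClaw
```

Because the stable architecture is kept running in parallel, switching back is a configuration decision rather than a redeployment.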


Closing

The core of model routing is recognizing that "more expensive model ≠ better results." The right model produces the right results.

haiku for mechanical tasks, sonnet for implementation, opus for judgment. This distribution reduces cost while maintaining quality. Add sub-agent isolation, and each agent operates exclusively within its designated role.

The next post covers how to verify this entire structure is functioning correctly — the self-audit skill.
