Multi-Agent Routing Design — Classifier Strategy, Provider Abstraction, Cost Lifecycle

The most expensive mistake in routing layer design is deferring model selection until runtime.


Key Takeaways

  • The request classifier strategy is the primary determinant of routing layer quality.
  • Multi-provider abstraction is not about unifying interfaces — the core is fallback chain design.
  • Without integrating cost lifecycle into the model selection loop, a routing layer adds complexity without cost optimization.

Design Background

These are structural lessons from designing a multi-provider LLM routing layer (Router_Control) in Node.js + TypeScript. Target providers: OpenAI, Claude, Ollama, oMLX. The goal was automatic distribution of requests to the appropriate model based on request type.

This post focuses not on a success story, but on which design decisions determined outcomes.


Body

1. Classifier Strategy — The Real Quality Determinant of Routing

The first design decision in a routing layer: how to classify requests.

The initial design was keyword-based classification. If the request text contained keywords like "code," "translate," or "summarize," it routed to the corresponding model. Fast and simple to implement.

The problem: keywords do not represent intent. "What do you think of this code?" asks for a review of existing code, not for code generation, yet both kinds of request contain the keyword "code." Keyword-based classification cannot distinguish between the two.
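A minimal sketch of that failure mode. The keyword table, the `TaskLabel` type, and the `keywordClassify` name are hypothetical; the post does not show the original implementation.

```typescript
// Hypothetical labels and keyword table, for illustration only.
type TaskLabel = 'code' | 'translate' | 'summarize' | 'general';

const KEYWORDS: Array<[string, TaskLabel]> = [
  ['code', 'code'],
  ['translate', 'translate'],
  ['summarize', 'summarize'],
];

// Returns the label of the first keyword found in the text (case-insensitive).
function keywordClassify(text: string): TaskLabel {
  const lower = text.toLowerCase();
  for (const [keyword, label] of KEYWORDS) {
    if (lower.includes(keyword)) return label;
  }
  return 'general';
}

// Both requests contain "code", so both receive the same label,
// although one asks for generation and the other for an opinion.
const generate = keywordClassify('Write code that parses a CSV file');
const review = keywordClassify('What do you think of this code?');
console.log(generate, review); // code code
```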

The approach that worked:

interface ClassifierResult {
  taskType: 'code-gen' | 'code-review' | 'translation' | 'embedding' | 'general';
  complexity: 'low' | 'medium' | 'high';
  latencyRequirement: 'realtime' | 'batch';
  costBudget: 'local-only' | 'cloud-ok' | 'premium-ok';
}

Represent the classification result as a multi-dimensional struct rather than a single label. Combining task type, complexity, latency requirements, and cost budget is what makes routing decisions meaningfully precise.
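To illustrate why the extra dimensions matter, here is a sketch of a selection function over that struct. The routing rules and the `selectProvider` name are assumptions made for the example, and the `ProviderId` union is inferred from the provider list in this post; the real routing table is not published here.

```typescript
interface ClassifierResult {
  taskType: 'code-gen' | 'code-review' | 'translation' | 'embedding' | 'general';
  complexity: 'low' | 'medium' | 'high';
  latencyRequirement: 'realtime' | 'batch';
  costBudget: 'local-only' | 'cloud-ok' | 'premium-ok';
}

// Assumed identifiers for the four target providers.
type ProviderId = 'openai' | 'claude' | 'ollama' | 'omlx';

// Illustrative rules only: which provider serves which case is an assumption.
function selectProvider(r: ClassifierResult): ProviderId {
  // The cost budget acts as a hard constraint, so it is checked first.
  if (r.costBudget === 'local-only') {
    return r.taskType === 'embedding' ? 'omlx' : 'ollama';
  }
  // High-complexity work goes to a premium provider only when the budget allows it.
  if (r.complexity === 'high' && r.costBudget === 'premium-ok') return 'claude';
  // Simple batch work has no latency pressure, so a local model is sufficient.
  if (r.latencyRequirement === 'batch' && r.complexity === 'low') return 'ollama';
  return 'openai';
}
```

Two requests with the same `taskType` can now land on different providers, which a single-label classifier cannot express.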

The classifier itself can be delegated to an LLM, but that incurs classifier call costs. A two-tier structure — rule-based for simple requests, LLM classifier only for complex ones — was the cost-accuracy balance point.
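The two-tier structure can be sketched as follows. The rule shown, the `LlmClassifier` signature, and the function names are hypothetical; a real second tier would call an LLM, which is passed in here as a function so the sketch stays self-contained.

```typescript
interface ClassifierResult {
  taskType: 'code-gen' | 'code-review' | 'translation' | 'embedding' | 'general';
  complexity: 'low' | 'medium' | 'high';
  latencyRequirement: 'realtime' | 'batch';
  costBudget: 'local-only' | 'cloud-ok' | 'premium-ok';
}

// Hypothetical signature for the second tier.
type LlmClassifier = (text: string) => Promise<ClassifierResult>;

// Tier 1: cheap rules. Returns null when no rule matches,
// which is the signal to escalate. The single rule here is illustrative.
function ruleBasedClassify(text: string): ClassifierResult | null {
  const trimmed = text.trim();
  if (/^translate\b/i.test(trimmed) && trimmed.length < 200) {
    return {
      taskType: 'translation',
      complexity: 'low',
      latencyRequirement: 'realtime',
      costBudget: 'local-only',
    };
  }
  return null;
}

// Tier 2 is consulted only when tier 1 declines, so requests handled
// by a rule never pay the classifier call cost.
async function classify(text: string, llm: LlmClassifier): Promise<ClassifierResult> {
  return ruleBasedClassify(text) ?? (await llm(text));
}
```

Returning `null` from the rule tier, rather than a default label, keeps "no rule matched" distinct from "classified as general."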


2. Multi-Provider Abstraction — The Interface Unification Trap

The standard approach for multi-provider routing is to define a common interface:

interface LLMProvider {
  complete(prompt: string, options: CompletionOptions): Promise<CompletionResult>;
}

This abstraction is clean at the code level. In production, it has a structural problem.

Each provider fails differently. OpenAI returns HTTP 429 when a rate limit is hit. Ollama times out when the model isn't loaded. oMLX terminates the process on memory exhaustion. Hiding these differences behind a common interface means fallback logic cannot distinguish failure causes.

A structurally sounder approach:

interface ProviderError {
  type: 'rate-limit' | 'model-unavailable' | 'resource-exhausted' | 'network';
  retryable: boolean;
  fallbackSuggestion: ProviderId | null;
}

Type the failures per provider, and branch the fallback chain by failure type: a rate limit retries against another endpoint on the same provider, an unloaded model passes immediately to the next provider, and resource exhaustion downgrades to a lighter model.
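The branching described above can be sketched as a pure decision function. The `FallbackAction` vocabulary, the function names, and the `ProviderId` union are assumptions made for the example. The post also does not say how `network` errors are handled, so that branch is a guess.

```typescript
// Assumed identifiers for the four target providers.
type ProviderId = 'openai' | 'claude' | 'ollama' | 'omlx';

interface ProviderError {
  type: 'rate-limit' | 'model-unavailable' | 'resource-exhausted' | 'network';
  retryable: boolean;
  fallbackSuggestion: ProviderId | null;
}

// Hypothetical set of actions the router can take after a failure.
type FallbackAction =
  | { kind: 'retry-same-provider' } // e.g. another endpoint of the same provider
  | { kind: 'next-provider'; provider: ProviderId }
  | { kind: 'downgrade-model' }
  | { kind: 'give-up' };

// Uses the error's suggested provider if present, otherwise the entry
// after `current` in the chain; gives up when the chain is exhausted.
function nextProvider(error: ProviderError, current: ProviderId, chain: ProviderId[]): FallbackAction {
  const next = error.fallbackSuggestion ?? chain[chain.indexOf(current) + 1];
  return next ? { kind: 'next-provider', provider: next } : { kind: 'give-up' };
}

function decideFallback(error: ProviderError, current: ProviderId, chain: ProviderId[]): FallbackAction {
  switch (error.type) {
    case 'rate-limit':
      // Stay on the same provider while a retry is still worthwhile.
      return error.retryable ? { kind: 'retry-same-provider' } : nextProvider(error, current, chain);
    case 'model-unavailable':
      // Waiting will not load the model, so move on immediately.
      return nextProvider(error, current, chain);
    case 'resource-exhausted':
      return { kind: 'downgrade-model' };
    case 'network':
      // Assumption: treated like a rate limit (retry if possible, else move on).
      return error.retryable ? { kind: 'retry-same-provider' } : nextProvider(error, current, chain);
  }
}
```

Keeping the decision separate from the code that executes it makes each failure type's response testable without calling any provider.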

The objective is not interface unification — fallback chain design by failure type is what matters.


3. Cost Lifecycle — Meaningless Without Integration Into the Selection Loop

If a routing layer aims for cost optimization, cost information must exist inside the model selection loop.

The problem with the initial design: cost tracking was logging. Token consumption was recorded and displayed on a dashboard, but that information did not influence model selection for subsequent requests. Monitoring and control were decoupled.

Integrating cost lifecycle into the routing loop:

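// Same union as ClassifierResult.costBudget: 'local-only' | 'cloud-ok' | 'premium-ok'
type CostBudget = ClassifierResult['costBudget'];

interface CostBudgetTracker {
  sessionBudget: number;             // total budget for the session
  consumed: number;                  // amount consumed so far
  remainingRatio: number;            // fraction of the budget still remaining (0 to 1)
  nextModelConstraint(): CostBudget; // model constraint derived from the remaining budget
}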

When the remaining budget is above 50%, premium models are permitted; below 20%, local-only is enforced. This constraint combines with the classifier result to determine the final model.
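A minimal sketch of such a tracker and of the combination step. The class name, `record`, and `effectiveBudget` are hypothetical. The post gives thresholds only for above 50% and below 20%; treating the band in between as `cloud-ok`, and resolving conflicts by taking the stricter of the two constraints, are both assumptions.

```typescript
type CostBudget = 'local-only' | 'cloud-ok' | 'premium-ok';

class SessionCostTracker {
  readonly sessionBudget: number;
  consumed = 0;

  constructor(sessionBudget: number) {
    this.sessionBudget = sessionBudget;
  }

  // Called after each completed request with that request's cost.
  record(cost: number): void {
    this.consumed += cost;
  }

  // Fraction of the session budget still available, clamped at 0.
  get remainingRatio(): number {
    if (this.sessionBudget <= 0) return 0;
    return Math.max(0, 1 - this.consumed / this.sessionBudget);
  }

  nextModelConstraint(): CostBudget {
    const ratio = this.remainingRatio;
    if (ratio > 0.5) return 'premium-ok';
    if (ratio < 0.2) return 'local-only';
    return 'cloud-ok'; // assumption: the 20-50% band is not specified in the post
  }
}

// Ordered from most to least restrictive.
const STRICTNESS: CostBudget[] = ['local-only', 'cloud-ok', 'premium-ok'];

// Combines the classifier's requested budget with the tracker's constraint
// by taking whichever of the two is stricter.
function effectiveBudget(requested: CostBudget, allowed: CostBudget): CostBudget {
  return STRICTNESS.indexOf(requested) <= STRICTNESS.indexOf(allowed) ? requested : allowed;
}
```

Because `record` feeds the same object that `nextModelConstraint` reads, spending on earlier requests directly narrows the models available to later ones.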

Without this structure, the routing layer does not optimize cost — it only adds routing complexity.


4. OpenClaw–Hermes Integration History — Factual Record

The routing layer was designed for integration with the OpenClaw (OC) multi-agent platform. The structural transitions that occurred in this context are recorded as-is.

Sequence: OC stable operation → Hermes (HM) first migration attempt → token explosion → OC regression → OC redesign → HM retry (currently in validation).

Router_Control entered Suspended status because priorities were readjusted during the OC→HM transition, not because of a technical failure. The token explosion issue was on the HM side; the routing layer itself operated at prototype level.

Once HM retry validation completes, resuming the routing layer based on the design patterns covered here — multi-dimensional classifier, typed fallback chains, cost loop integration — is under consideration.

Related PR: harness git PR#12497.


5. The Design Value of Suspended Status

Marking a component as "Suspended" in an ontology serves a purpose beyond simple status indication.

  • Context preservation: The design rationale for why it started, what it achieved, and why it stopped is preserved.
  • Reduced resumption cost: Resumption does not require redesigning from scratch — it can restart from the previous decision point.
  • Decision reference: When the same problem is encountered again, prior decision context is immediately retrievable.
  • Ontology honesty: An ontology with only Active and Completed hides suspended components. Explicit Suspended status exposes the system's actual state.

In multi-agent systems, each component carries a maintenance cost. Continuously evaluating value against cost — and honestly transitioning low-value components to Suspended — is how overall system efficiency is maintained.


Lessons Learned

Designing the classifier with a single label: The initial design routed on taskType alone. Routing accuracy improved meaningfully when complexity and cost budget were added as separate dimensions.

Fallback logic that did not distinguish failure causes: Any failure was simply passed to the next provider. Only after typing failure modes did the responses to rate limits and resource exhaustion diverge.

Cost tracking that stayed at logging: Costs were recorded but never fed back into the model selection loop. Separating monitoring from control produces visibility, not optimization.


Conclusion

The real quality of a multi-provider routing layer depends not on the cleanliness of interface abstraction, but on three design decisions: multi-dimensional classifier, typed fallback chain by failure mode, and cost loop integration. Without these three inside the selection loop, a routing layer is middleware that only adds complexity.

Router_Control is currently Suspended, but the design patterns remain valid. Resumption on this foundation is planned after HM validation completes.
