"Why I Ditched One All-Purpose AI for Nine Specialists — A Multi-Agent Design Story"

Multi-Agent Ontology in Practice: Message Hub, Context Isolation, and Memory Design, Part 2

A role-based multi-agent architecture that solves context pollution and security vulnerabilities


Key Takeaways

  • A single-agent setup failed within three days due to context pollution and security gaps
  • Tasks were split into four categories (coordination, execution, research, writing), and dedicated memory and monitoring roles brought the system to nine agents
  • Model assignment strategy, fallback chains, and least-privilege access controls tackle both cost and security

Background

When I first set up my AI agent platform, I assigned everything to a single agent. Email checks, web research, code generation, memory maintenance — one agent handled it all.

It failed in three days. Two critical problems surfaced:

  1. Context Pollution — Massive web-crawl results lingered in memory while the agent drafted email replies, causing the model to confuse contexts and waste tokens.
  2. No Permission Boundaries — File-system access for memory management and external web searches require fundamentally different permissions. Giving one agent blanket tool access was a security time bomb.

The Design

Classification: Four Categories

A clear taxonomy was needed to divide agents. All tasks map to one of four categories:

Category     | Characteristics                     | Required Tools
-------------|-------------------------------------|----------------------------------
Coordination | Request analysis, agent delegation  | sessions_spawn, exec, read, write
Execution    | Code generation, CLI, automation    | read, write, edit, exec
Research     | Web search, data collection         | web_search, web_fetch, read
Writing      | Documentation, email drafts         | read, write, edit

Adding dedicated roles for memory management and monitoring brought the total to nine agents.
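
To make the taxonomy concrete, here is a minimal Python sketch of the category table as data. The tool names come from the table; the `Category` enum, the `REQUIRED_TOOLS` mapping, and both helper functions are hypothetical names introduced for illustration, not part of any platform API. It shows how much surplus access a single all-purpose agent would carry compared with one category.

```python
from enum import Enum


class Category(Enum):
    COORDINATION = "coordination"
    EXECUTION = "execution"
    RESEARCH = "research"
    WRITING = "writing"


# Tool names are copied from the category table; the mapping layout is an assumption.
REQUIRED_TOOLS: dict[Category, frozenset[str]] = {
    Category.COORDINATION: frozenset({"sessions_spawn", "exec", "read", "write"}),
    Category.EXECUTION: frozenset({"read", "write", "edit", "exec"}),
    Category.RESEARCH: frozenset({"web_search", "web_fetch", "read"}),
    Category.WRITING: frozenset({"read", "write", "edit"}),
}


def blanket_toolset() -> frozenset[str]:
    """Tools a single all-purpose agent would need: the union of every category."""
    return frozenset().union(*REQUIRED_TOOLS.values())


def excess_tools(category: Category) -> frozenset[str]:
    """Tools an all-purpose agent holds that this category never needs."""
    return blanket_toolset() - REQUIRED_TOOLS[category]
```

Calling `excess_tools(Category.WRITING)` lists the tools a writing-only agent can safely lose, which is the gap the role split is meant to close.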

Nine Agents at a Glance

Agent ID       | Role                                                    | Model Tier
---------------|---------------------------------------------------------|-------------
mir            | Coordinator, user interface                             | Premium
monitor        | System health checks, pattern detection                 | Cheap/Free
researcher     | Web research, trend scanning                            | Balanced
communicator   | Content creation, email/document drafting               | Premium
orchestrator   | Coding, CLI automation, skill building                  | Balanced
google         | Google Workspace API execution (Gmail, Calendar, Drive) | Cheap/Free
memory-manager | Reflect pipeline orchestration                          | Balanced
memory-runner  | Reflect execution (spawned by manager)                  | Free (Local)
reviewer       | Monthly memory review and summarization                 | Cheap

Model Assignment Strategy

Different tasks get different models:

  • mir (Premium) — Needs top-tier models for accurate intent parsing
  • researcher (Balanced) — Requires fast throughput for large-volume text summarization, so speed-optimized models take priority
  • memory-runner (Free) — Simple repetitive file I/O runs entirely on a local model
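
One way to express this assignment is a small registry that resolves each agent to a model through its tier. The sketch below is illustrative only: the agent IDs and tier labels come from the table above, while `AgentSpec`, `TIER_MODELS`, `model_for`, and every model identifier are hypothetical placeholders, since the post does not name specific models.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    agent_id: str
    tier: str


# Agent IDs and tiers as listed in the "Nine Agents at a Glance" table.
AGENTS = [
    AgentSpec("mir", "Premium"),
    AgentSpec("monitor", "Cheap/Free"),
    AgentSpec("researcher", "Balanced"),
    AgentSpec("communicator", "Premium"),
    AgentSpec("orchestrator", "Balanced"),
    AgentSpec("google", "Cheap/Free"),
    AgentSpec("memory-manager", "Balanced"),
    AgentSpec("memory-runner", "Free (Local)"),
    AgentSpec("reviewer", "Cheap"),
]

# Placeholder model identifiers (assumptions, not real model names).
TIER_MODELS = {
    "Premium": "placeholder-premium",
    "Balanced": "placeholder-balanced",
    "Cheap/Free": "placeholder-cheap-or-free",
    "Cheap": "placeholder-cheap",
    "Free (Local)": "placeholder-local",
}


def model_for(agent_id: str) -> str:
    """Resolve an agent to its model via its tier; unknown agents raise KeyError."""
    tiers = {spec.agent_id: spec.tier for spec in AGENTS}
    return TIER_MODELS[tiers[agent_id]]


def agents_per_tier() -> Counter:
    """Count how many agents sit in each tier, a rough view of where cost goes."""
    return Counter(spec.tier for spec in AGENTS)
```

Keeping the tier as the only link between agent and model means a model swap touches one entry in `TIER_MODELS` rather than nine agent definitions.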

Fallback Chains and Security

Three principles guard against cloud API failures:

  1. Provider Diversification — Alternate across multiple providers so a single outage never takes down the system
  2. Performance-Based Degradation — Start with the highest-performance model and step down to cheaper alternatives on failure
  3. Local Safety Net — A local model serves as the last line of defense, ensuring minimum functionality even during internet outages
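
The three principles above can be sketched as an ordered chain that is tried from top to bottom. This is a minimal illustration under stated assumptions: `ModelUnavailable`, `call_with_fallback`, the stub clients, and the chain entry names are all hypothetical, and a real client would wrap an actual provider SDK call.

```python
from collections.abc import Callable, Sequence


class ModelUnavailable(Exception):
    """Raised by a model client when its provider cannot serve the request."""


# A model client is any callable that takes a prompt and returns text.
ModelCall = Callable[[str], str]


def call_with_fallback(
    prompt: str, chain: Sequence[tuple[str, ModelCall]]
) -> tuple[str, str]:
    """Try each (name, client) pair in order; return the first success."""
    failures = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ModelUnavailable as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("every model in the chain failed: " + "; ".join(failures))


def cloud_outage(prompt: str) -> str:
    """Stub standing in for a cloud client during an outage."""
    raise ModelUnavailable("simulated provider outage")


def local_model(prompt: str) -> str:
    """Stub standing in for a locally hosted model."""
    return f"[local] {prompt}"


# Ordered by descending capability, alternating providers, ending in a local model.
CHAIN = [
    ("provider-a/premium", cloud_outage),
    ("provider-b/balanced", cloud_outage),
    ("local/small", local_model),
]
```

With both cloud stubs failing, `call_with_fallback("ping", CHAIN)` is served by the final local entry. Only `ModelUnavailable` triggers a step down, so programming errors still surface instead of being silently absorbed by the chain.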

Security follows least-privilege: the monitor agent gets only read and exec permissions; the researcher gets web_search and web_fetch but no file-system access. Agent spawning is controlled through explicit allowlists.
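
A deny-by-default check is one way to enforce these rules. In the sketch below, the tool grants for `monitor` and `researcher` follow the paragraph above and the `memory-manager` to `memory-runner` spawn rule follows the agent table; the spawn list for `mir` is an assumption made for illustration, and `authorize_tool` and `can_spawn` are hypothetical helper names.

```python
# Per-agent tool grants. Agents missing from this map get no tools at all.
TOOL_ALLOWLIST: dict[str, frozenset[str]] = {
    "monitor": frozenset({"read", "exec"}),
    "researcher": frozenset({"web_search", "web_fetch"}),
}

# Which agents each parent may spawn.
SPAWN_ALLOWLIST: dict[str, frozenset[str]] = {
    # From the agent table: memory-runner is spawned by the manager.
    "memory-manager": frozenset({"memory-runner"}),
    # Assumption for illustration: the coordinator delegates to these three.
    "mir": frozenset({"researcher", "communicator", "orchestrator"}),
}


def authorize_tool(agent_id: str, tool: str) -> None:
    """Raise PermissionError unless the agent was explicitly granted the tool."""
    granted = TOOL_ALLOWLIST.get(agent_id, frozenset())
    if tool not in granted:
        raise PermissionError(f"{agent_id} may not use {tool}")


def can_spawn(parent_id: str, child_id: str) -> bool:
    """Return True only when the parent's allowlist names the child."""
    return child_id in SPAWN_ALLOWLIST.get(parent_id, frozenset())
```

Because both lookups fall back to an empty set, a newly added agent starts with no tools and no spawn rights until someone grants them explicitly.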

Lessons Learned

Too few agents and you get context pollution plus security holes. Too many — splitting "email read-only" from "email write-only," for instance — and management overhead explodes. The sweet spot is dividing by role-sized units: research, memory management, monitoring, and so on.

Takeaway

A nine-agent system organized around coarse-grained roles keeps contexts clean while delivering both security and cost efficiency. Giving up the convenience of a single agent bought system-wide stability and scalability.
