Agent Operations Retrospective (5/7) — When Security Policy Breaks Delegation

Key Summary: What This Post Covers

  • Intersection rule: A sub-agent's effective toolset is the intersection of its own toolset and the parent's. This security constraint prevents privilege escalation, but shrinking the parent silently disables the sub.
  • Delegation prerequisite: The parent must hold any tool it intends to delegate. If it doesn't, the sub's effective tools drop to zero and the model synthesizes (hallucinates) plausible answers over an empty schema.
  • L0 vs L1: The trade-off between structural enforcement (L0, toolset removal) and behavioral discipline (L1, system prompt). As long as sub-delegation exists, L0 toolset reduction is not a safe optimization target.
  • Correct token optimization layer: In a delegation graph, the right levers are call-frequency control, schema caching, and behavioral discipline — not parent toolset reduction.

Background — A Token Optimization Attempt That Failed

Hermes had a token runaway problem. The main agent was loading a heavy schema wholesale on every invocation. OpenClaw had already stabilized its security and self-improvement loop; this issue was isolated to Hermes.

The initial fix was straightforward: reduce platform_toolsets.discord from 17 tools to 10. Keep only what the main agent uses directly — discord, skill, delegate_task — and strip the heavy browser/terminal/web toolsets. The reasoning: heavy work is delegated to sub-agents, so only the sub needs those tools. This design did not work. The following explains why, and distills the pattern so it doesn't repeat in systems with the same structure.

Symptom — Hallucinations Appeared Immediately After Toolset Reduction

Two anomaly signals appeared simultaneously in the GitHub PR lookup path right after the reduction:

  1. The sub-agent returned PR #110 and PR #109 as "query results." The actual recent PR numbers in that repository were in the 12000s.
  2. The sub-agent's tool call log was empty — PR data was generated with no tool invocations recorded.

The second signal was decisive. A result without any tool call means the model synthesized values from its training distribution. Because the anomaly appeared immediately after the toolset reduction, the reduction was the prime suspect.

How It Works — The Intersection Rule

The root cause is tools/delegate_tool.py:311-313:

if toolsets:
    # Intersect with parent — subagent must not gain tools the parent lacks
    child_toolsets = _strip_blocked_tools([t for t in toolsets if t in parent_toolsets])

The comment describes the behavior exactly: a sub-agent cannot acquire a toolset the parent does not hold. parent_toolsets is derived from the parent's active toolset earlier in the same file.
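The effect of the rule can be reproduced in a few lines. This is a minimal sketch, not the Hermes implementation: the `effective_toolsets` helper and the exact toolset lists are assumptions for illustration, and the filtering done by `_strip_blocked_tools` is left out.

```python
def effective_toolsets(requested, parent_toolsets):
    """Return the toolsets a sub-agent actually receives.

    Mirrors only the intersection idea: anything the parent lacks is dropped.
    """
    parent = set(parent_toolsets)
    return [t for t in requested if t in parent]


# Hypothetical toolset lists, for illustration only.
full_parent = ["discord", "skill", "delegate_task", "browser", "terminal", "web"]
reduced_parent = ["discord", "skill", "delegate_task"]
sub_request = ["browser", "terminal"]

print(effective_toolsets(sub_request, full_parent))     # ['browser', 'terminal']
print(effective_toolsets(sub_request, reduced_parent))  # []
```

With the reduced parent, the sub's request survives as an empty list, and nothing in the return value says why.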

The design intent is clear — prevent privilege escalation. If a sub could hold broader permissions than its parent, the concept of constraining the main agent would be meaningless.

The Conflict — Two Policies That Cannot Coexist

Policy | Requirement
Intersection rule (security) | sub ⊆ parent. The sub can hold only a subset of what the parent holds.
Operational intent (token optimization) | The parent is light; heavy work goes to the sub. Implies sub ⊈ parent: the sub must hold tools the parent lacks.

These two policies are mathematically incompatible. The moment the parent's toolset was reduced from 17 to 10, the browser/terminal toolsets the sub requested fell outside the parent's set and were excluded by intersection. Sub effective tools = 0.

Hallucination Mechanism

When effective tools = 0, the LLM has two possible responses:

  1. Explicitly refuse with "no tools available."
  2. Synthesize a result from training priors over the empty schema.

The critical point: the intersection rule silently empties the sub's toolset — it does not send an explicit "tools revoked" signal to the sub. From the sub's perspective, receiving an empty schema is indistinguishable from having no tools assigned in the first place. Choosing path 1 (refusal) requires separate behavioral discipline at the system prompt level. Without that discipline, the model took path 2. The result was PR #110 and #109.

In short: the security rule worked exactly as designed, and the hallucination was a side effect of that correct operation.
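To make the "silent" part concrete, the sketch below shows what an explicit signal could look like if a delegation layer recorded the dropped toolsets instead of discarding them. Everything here is hypothetical: `Resolution`, `resolve`, and the notice wording are illustrative names, not part of Hermes, and this is not the fix that was adopted.

```python
from dataclasses import dataclass, field


@dataclass
class Resolution:
    granted: list = field(default_factory=list)
    dropped: list = field(default_factory=list)

    def notice(self):
        """Text a delegation layer could pass to the sub-agent (hypothetical)."""
        if not self.dropped:
            return ""
        if not self.granted:
            return ("No tools are available for this task. "
                    "Reply that the task cannot be completed; do not invent results.")
        return "Unavailable toolsets: " + ", ".join(self.dropped)


def resolve(requested, parent_toolsets):
    """Split the requested toolsets into granted and dropped by intersection."""
    parent = set(parent_toolsets)
    res = Resolution()
    for name in requested:
        (res.granted if name in parent else res.dropped).append(name)
    return res


res = resolve(["browser", "terminal"], ["discord", "skill", "delegate_task"])
print(res.dropped)   # ['browser', 'terminal']
print(res.notice())
```

The security property is unchanged, since the granted set is still the intersection. The only difference is that the sub can tell "tools were revoked" apart from "no tools were ever assigned".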

Response — Abandon L0, Switch to L1

Two options were available:

  • A. Patch around the intersection rule: Remove or relax the intersection logic so a sub can hold tools beyond what the parent holds. This collapses the security model.
  • B. Abandon L0 structural enforcement, switch to L1: Restore the parent toolset and insert behavioral discipline into the system prompt.

Option B was adopted. The rationale: privilege escalation prevention ranks above token savings.

L0 vs L1 Comparison

Layer | Mechanism | Enforcement | Cost
L0 | Remove tools from the system entirely | LLM cannot violate (structural) | Incompatible with delegation
L1 | Prepend behavioral discipline to system prompt | LLM can violate (probabilistic) | Delegation remains viable

Redesign Applied

  1. Restored all 17 tools in platform_toolsets.discord.
  2. Rewrote the behavioral discipline section in SOUL.md. The new section:
     • Explicitly states that structural constraints have been lifted.
     • Embeds a self-check step the main agent must run before accepting sub-agent responses.
     • Declares that the model itself is the last line of defense against hallucination.
  3. Preserved the previous configuration as a backup at the redesign checkpoint (rollback path secured).
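The self-check in SOUL.md is prose that the model follows, and its exact steps are not reproduced here. As an illustration of the kind of check it can encode, the sketch below turns the decisive symptom from this incident (an answer with an empty tool call log) into a programmatic gate. The `result` shape and the `accept_sub_result` name are assumptions, not Hermes APIs.

```python
def accept_sub_result(result):
    """Return (accepted, reason) for a sub-agent response.

    `result` is a hypothetical dict with "answer" and "tool_calls" keys.
    """
    calls = result.get("tool_calls") or []
    if not calls:
        return False, "no tool calls recorded; treat the answer as unverified"
    if not result.get("answer"):
        return False, "empty answer"
    return True, "backed by %d tool call(s)" % len(calls)


# An answer that arrives with no tool activity is rejected.
print(accept_sub_result({"answer": "PR #110, PR #109", "tool_calls": []}))
# An answer backed by at least one recorded call passes this check.
print(accept_sub_result({"answer": "PR #12497", "tool_calls": [{"name": "browser"}]}))
```

A gate like this only confirms that tools were used; it does not confirm the answer is correct, so it complements the prompt-level discipline and does not replace it.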

Verification — Re-running the Same Request

After restoration, the same PR lookup request was re-executed.

  • Main agent calls: delegate_task ×1, skill_view ×2.
  • Sub-agent effective tools: restored to double-digit count; browser toolset utilized normally.
  • Returned PR numbers: #12497, #12495, #12494 (against the repository presumed to be NousResearch/hermes-agent).
  • Data matched actuals from that point in time. PR #110 / #109 hallucination eliminated.

Extractable Patterns

Reusable principles from this case:

  1. In a delegation graph, the parent's toolset is the ceiling. Reducing the parent lowers the sub's capability ceiling by the same amount: every tool removed to save tokens is also removed from every descendant's reach.
  2. Compatibility between security rules and operational intent must be verified at design time. Both can be individually reasonable yet non-viable in combination. Intersection rule × parent lightening is that case.
  3. L0 structural enforcement is only safely applicable to single-agent configurations. Once a delegation graph is introduced, the system must drop to L1.
  4. Correct token optimization layers: (a) call-frequency control (e.g., Token Guard real-time monitoring), (b) schema caching, (c) blocking unnecessary calls via behavioral discipline. Parent toolset reduction does not belong on this list.
  5. Empty-schema hallucination prevention requires explicit refusal discipline. The behavioral rule that causes a model facing an empty toolset to choose refusal must be designed separately at L1.
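The design-time verification in pattern 2 can be sketched as a configuration check that runs before a toolset change ships. This is an assumption-laden example: `find_unreachable`, the `pr_lookup` sub-agent name, and the toolset lists are hypothetical, and a real check would read them from the actual configuration.

```python
def find_unreachable(parent_toolsets, sub_requests):
    """Map each sub-agent name to the toolsets it requests but can never receive."""
    parent = set(parent_toolsets)
    problems = {}
    for sub_name, requested in sub_requests.items():
        missing = sorted(set(requested) - parent)
        if missing:
            problems[sub_name] = missing
    return problems


# Hypothetical configuration, for illustration only.
sub_requests = {"pr_lookup": ["browser", "terminal"]}
reduced_parent = ["discord", "skill", "delegate_task"]

problems = find_unreachable(reduced_parent, sub_requests)
if problems:
    print("Toolset change would disable sub-agents:", problems)
```

Run against the reduced parent, the check reports that `pr_lookup` would lose both of its toolsets, which is the failure this post describes.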

Limitations and Open Questions

Switching to L1 moves what the structure once guaranteed into what the model must now uphold. Even with a 4-step self-check embedded in the system prompt, the probability that the model skips the procedure on some calls is not zero. Questions remaining open at the current observation stage:

  • How frequently does the same hallucination pattern recur?
  • Does the sub-agent reliably generate refusal responses when it encounters empty results?
  • What fraction of actual calls trigger the main agent's self-check?

These questions cannot be answered until sufficient log samples accumulate. The current state is not "resolved" — it is "conditionally operational."
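Once logs accumulate, the first question could be estimated with a simple rate over delegation records. The sketch below assumes a hypothetical log format (one dict per request with `delegated`, `answer`, and `tool_calls` keys); the sample records are made up to exercise the function and are not measurements.

```python
def empty_toolcall_rate(records):
    """Fraction of delegations that returned an answer with no tool calls.

    Returns None when there are no delegated records to measure.
    """
    delegated = [r for r in records if r.get("delegated")]
    if not delegated:
        return None
    suspicious = [r for r in delegated if r.get("answer") and not r.get("tool_calls")]
    return len(suspicious) / len(delegated)


# Made-up records in the assumed format.
sample = [
    {"delegated": True, "answer": "a", "tool_calls": ["browser"]},
    {"delegated": True, "answer": "b", "tool_calls": []},
    {"delegated": True, "answer": "c", "tool_calls": ["terminal"]},
    {"delegated": True, "answer": "d", "tool_calls": ["browser"]},
    {"delegated": False, "answer": "e", "tool_calls": []},
]
print(empty_toolcall_rate(sample))  # 0.25
```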

Scope of Applicability

  • Applicable: Multi-agent systems using Claude Code or equivalent frameworks with a custom delegation layer and an intersection-style security policy.
  • Not applicable: Single-agent configurations already using L0 toolset reduction. Without a delegation graph, the intersection does not produce this failure.
  • Generalizable principle: In any multi-agent system with a permission boundary rule (intersection, sandboxing, capability scoping, etc.), reducing the parent's capability directly reduces the ceiling on every descendant in the delegation graph.

Series overview: Series index
