Agent Operations Retrospective (2/7) — 10 Million Tokens Wasted Daily in an AI Cron Scheduler

A structural defect that loaded full LLM context for tasks that only needed bash


Key Summary

  • Structural defect pattern: the [SILENT] tag was interpreted only as "suppress output"; it never triggered a "skip LLM" execution path
  • Measurement and diagnosis of 6 bash-only jobs consuming ~196 ticks/day and ~10.1M tokens/day unnecessarily
  • After migrating to macOS launchd: ticks/day dropped 196→24, tokens/day dropped ~10.5M→0.4M — approximately 96% reduction
  • Core principle: LLM scheduler design must start with per-job LLM necessity classification → execution path branching

What This Post Covers

This post documents a common token leak pattern that emerges when attaching a cron-style scheduler to an LLM agent. It covers: how to surface the problem through measurement when a tag-based filter fails to branch correctly; what module structure and plist templates to use when migrating bash-only jobs to an OS scheduler; and which missing upstream API parameter allows this leak to become structural.

The tone is diagnostic — measurable defects and remediation techniques, not a narrative account.


The Problem Pattern: "Intent Tagged, Path Ignored"

A job named memory-session-scan was running every 10 minutes in the AI scheduler. It was a pure bash task — scanning session files and writing records to SQLite — yet token costs were being logged on every execution.

This job involves no non-deterministic judgment. Token charges on a pure I/O task are a signal that the tag interpretation layer is not connected to actual execution branching.


Root Cause: Context Loading Hardcoded Into the Execution Path

The defect is concentrated at scheduler.py:742.

def _execute_job(self, job: CronJob) -> None:
    ...
    result = self.agent.run_conversation(
        message=job.command,
        context_files=["SOUL.md", "USER.md", "CLAUDE.md"]
    )

Every call to run_conversation() loads SOUL.md, USER.md, and CLAUDE.md as context — 4K+ tokens combined. The critical point: this path is applied unconditionally to every job. Bash-only jobs tagged [SILENT] travel the same path.

The [SILENT] tag was intended as a marker for "execute without LLM," but in the actual implementation it only suppressed log output after execution. No branching logic on the tag existed inside _execute_job(). The tag is present; it creates no branch in the execution path. This is the canonical form of the defect.
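The missing branch can be sketched as follows. This is a minimal illustration, not the project's actual fix: the CronJob shape, the tags field, and the needs_llm() and execute_job() helpers are assumptions made for the example. Only the run_conversation() call and its context file list come from the code above.

```python
import shlex
import subprocess
from dataclasses import dataclass, field


@dataclass
class CronJob:
    # Hypothetical shape; the real CronJob in scheduler.py may differ.
    name: str
    command: str
    tags: set = field(default_factory=set)  # e.g. {"SILENT"} for [SILENT]


def needs_llm(job: CronJob) -> bool:
    """Read SILENT as 'run without the LLM', which was the tag's stated intent."""
    return "SILENT" not in job.tags


def execute_job(job: CronJob, run_conversation) -> str:
    """Decide the execution path before any context file is loaded."""
    if not needs_llm(job):
        # Bash-only path: no agent call, so no context files and no tokens.
        done = subprocess.run(
            shlex.split(job.command), capture_output=True, text=True, check=False
        )
        return done.stdout
    # LLM path: same call as in the original _execute_job().
    return run_conversation(
        message=job.command,
        context_files=["SOUL.md", "USER.md", "CLAUDE.md"],
    )
```

The point of the sketch is ordering: the tag is consulted before run_conversation() is reached, so a tagged job never enters the context-loading path.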


Measurement: Quantifying the Waste

Token waste calculated per job across the 6 offending jobs:

Job                  Frequency     Ticks/day  Est. tokens/tick  Daily waste
memory-session-scan  every 10 min  144        ~38,000           ~5.5M
memory-micro-cycle   every 30 min  48         ~94,000           ~4.5M
4 others             various       4          various           ~0.1M
Total                              196                          ~10.1M

10.1M tokens per day. The fraction of those tokens where LLM judgment was actually required: zero. All six jobs consist entirely of deterministic operations — file scanning, log rotation, checksum comparison.

Even without translating to dollar cost, these tokens consume context window capacity and add API response latency. The waste structure is identical on local models.
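The per-job figures follow from ticks/day multiplied by estimated tokens/tick. The short sketch below reproduces that arithmetic for the two named jobs, using only the numbers from the table above. The function names are illustrative.

```python
# (interval in minutes, estimated tokens per tick), copied from the table.
JOBS = {
    "memory-session-scan": (10, 38_000),
    "memory-micro-cycle": (30, 94_000),
}


def ticks_per_day(interval_minutes: int) -> int:
    """Number of runs in 24 hours for a fixed interval."""
    return 24 * 60 // interval_minutes


def daily_waste(interval_minutes: int, tokens_per_tick: int) -> int:
    """Tokens spent per day on a job that needs no LLM at all."""
    return ticks_per_day(interval_minutes) * tokens_per_tick


for name, (interval, per_tick) in JOBS.items():
    waste = daily_waste(interval, per_tick)
    print(f"{name}: {ticks_per_day(interval)} ticks/day, {waste / 1e6:.1f}M tokens/day")
```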


Remediation: Migrate Bash-Only Jobs to the OS Scheduler

Jobs that require no LLM judgment belong at the OS scheduler layer. On macOS, that is launchd.

Create a new modules/cron-launchd/ module with a simple structure:

modules/cron-launchd/
├── __init__.py
├── manager.py          # plist generation, install, uninstall
└── plists/             # generated plist files

For each bash-only job, generate a .plist and install it under ~/Library/LaunchAgents/. Integrate the CLI as a memcore subcommand:

python -m memcore launchd-install

python -m memcore launchd-status

python -m memcore launchd-uninstall
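One way to wire these three subcommands is with argparse subparsers. The sketch below is an assumption about how memcore's CLI could be laid out. The help strings and the build_parser() name are illustrative, and the handlers that would call into manager.py are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the three launchd subcommands named above."""
    parser = argparse.ArgumentParser(prog="memcore")
    sub = parser.add_subparsers(dest="command", required=True)
    # Help texts are assumptions about what each subcommand does.
    for name, help_text in (
        ("launchd-install", "generate plists and install them"),
        ("launchd-status", "show the state of installed jobs"),
        ("launchd-uninstall", "remove installed plists"),
    ):
        sub.add_parser(name, help=help_text)
    return parser


args = build_parser().parse_args(["launchd-status"])
print(args.command)
```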

plist template for memory-session-scan:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.hermes.memory-session-scan</string>
    <key>ProgramArguments</key>
    <array>
        <string>/path/to/python</string>
        <string>-m</string>
        <string>memcore</string>
        <string>session-scan</string>
    </array>
    <key>StartInterval</key>
    <integer>600</integer>
    <key>RunAtLoad</key>
    <false/>
</dict>
</plist>
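A plist like the one above can be generated rather than hand-written. The sketch below uses Python's standard plistlib. The build_plist() function is a hypothetical helper, not the actual contents of manager.py, and /path/to/python is kept as the same placeholder used in the template.

```python
import plistlib


def build_plist(label: str, program_args: list, interval_seconds: int) -> bytes:
    """Serialize a launchd job definition to XML plist bytes."""
    payload = {
        "Label": label,
        "ProgramArguments": list(program_args),
        "StartInterval": interval_seconds,  # seconds between runs
        "RunAtLoad": False,
    }
    return plistlib.dumps(payload, fmt=plistlib.FMT_XML)


xml_bytes = build_plist(
    "com.hermes.memory-session-scan",
    ["/path/to/python", "-m", "memcore", "session-scan"],
    600,
)
print(xml_bytes.decode("utf-8"))
```

plistlib sorts keys by default, so the key order in the output differs from the hand-written template. launchd does not depend on key order. Writing the bytes into ~/Library/LaunchAgents/ and loading the job are the remaining install steps and are omitted here.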

Classification: Which Jobs Stay in LLM Cron vs. Move to OS Scheduler

The 5 jobs that remain in the AI scheduler after migration are those where LLM usage is justified:

Job             Reason
heartbeat-tick  Agent state assessment + response generation
daily-brief     Daily summary generation (LLM required)
reflect         Self-reflection cycle (LLM required)
research-scan   Research trigger evaluation
monthly-memory  Long-term memory compression (LLM required)

These cannot be replaced by bash and involve content generation or non-deterministic judgment. The branching criterion is simple: "Can this job's output be described as a deterministic function?" If yes → OS scheduler. If no → LLM cron.
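The criterion can be applied when jobs are registered, before anything runs. In the sketch below, the deterministic flag is a hypothetical annotation supplied by the developer, and the scheduler labels are illustrative. The job names are the ones mentioned in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    name: str
    deterministic: bool  # hypothetical developer-supplied annotation


def target_scheduler(job: JobSpec) -> str:
    """Deterministic output goes to the OS scheduler; everything else stays in LLM cron."""
    return "os-scheduler" if job.deterministic else "llm-cron"


JOBS = [
    JobSpec("memory-session-scan", deterministic=True),
    JobSpec("memory-micro-cycle", deterministic=True),
    JobSpec("heartbeat-tick", deterministic=False),
    JobSpec("daily-brief", deterministic=False),
    JobSpec("reflect", deterministic=False),
    JobSpec("research-scan", deterministic=False),
    JobSpec("monthly-memory", deterministic=False),
]

routing = {job.name: target_scheduler(job) for job in JOBS}
for name, target in routing.items():
    print(f"{name} -> {target}")
```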


Results

Metric                   Before  After  Change
AI cron jobs             11      5      -55%
Daily ticks              ~196    ~24    -88%
Daily token consumption  ~10.5M  ~0.4M  -96%

6 of 11 jobs moved to launchd. Token consumption dropped from 10.5M to ~0.4M per day.
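The Change column is plain percentage change, (after - before) / before. A quick check against the before and after values reported above:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


ROWS = {
    "AI cron jobs": (11, 5),
    "Daily ticks": (196, 24),
    "Daily token consumption (M)": (10.5, 0.4),
}

for metric, (before, after) in ROWS.items():
    print(f"{metric}: {pct_change(before, after):.0f}%")
```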


Limitations and Upstream Issue

This fix addresses the symptom. The root cause is a design gap: run_conversation() has no skip_context_files parameter. Even when the caller signals that a job is bash-only, there is no path in the API layer to skip context file loading.

This gap is tracked as GitHub #7876. When skip_context_files=True is added, the launchd migration becomes unnecessary as a workaround — the AI cron can handle it internally. That said, even after the parameter lands, the separation principle of delegating deterministic jobs to the OS scheduler retains value. Keeping LLM schedulers responsible only for LLM-requiring work improves failure isolation and debuggability.
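For illustration only, the requested behavior might look like the sketch below. build_prompt() is a hypothetical stand-in for whatever assembles the prompt inside run_conversation(), and read_file is an injected callable. Neither reflects the real upstream code, and the parameter's final shape depends on how the upstream issue is resolved.

```python
from typing import Callable, Sequence


def build_prompt(
    message: str,
    context_files: Sequence[str],
    read_file: Callable[[str], str],
    skip_context_files: bool = False,  # hypothetical parameter, not yet upstream
) -> str:
    """Assemble the prompt, optionally without touching any context file."""
    if skip_context_files:
        # Nothing is read, so the context files contribute no tokens.
        return message
    parts = [read_file(name) for name in context_files]
    parts.append(message)
    return "\n\n".join(parts)
```

Defaulting the flag to False would keep existing callers unchanged. Only jobs that explicitly opt out would skip the context load.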


Applicability

This pattern is not specific to Hermes. It recurs structurally wherever periodic jobs are attached to LLM agents. If any of the following conditions apply, the same leak is likely present:

  • The scheduler forces context file loading per job unconditionally
  • Tags such as "silent", "quiet", or "no-llm" exist but create no branch in the execution path
  • Pure I/O jobs run periodically and token costs are being logged

Diagnosis is three steps: (1) log per-tick token consumption per job; (2) classify each job's output as deterministic or not; (3) migrate deterministic jobs to the OS scheduler, or add a context-loading skip path.
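Steps (1) and (2) can be combined in a small ledger that records tokens per tick and flags deterministic jobs that still incur token cost. The TokenLedger class below is a hypothetical sketch. How token counts are obtained from the agent is outside its scope.

```python
from collections import defaultdict


class TokenLedger:
    """Accumulates ticks and tokens per job name."""

    def __init__(self) -> None:
        self._ticks = defaultdict(int)
        self._tokens = defaultdict(int)

    def record(self, job_name: str, tokens: int) -> None:
        """Log one tick of a job and the tokens it consumed."""
        self._ticks[job_name] += 1
        self._tokens[job_name] += tokens

    def report(self) -> dict:
        """Map each job name to (ticks, total tokens)."""
        return {name: (self._ticks[name], self._tokens[name]) for name in self._ticks}

    def suspects(self, deterministic_jobs) -> list:
        """Deterministic jobs that consumed tokens anyway: candidates for migration."""
        return sorted(
            name for name in deterministic_jobs if self._tokens.get(name, 0) > 0
        )
```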


Open Questions

  • How to enforce tag semantics onto execution paths at design time (runtime validation? type system?)
  • On Linux/Windows, how far to abstract a systemd/Task Scheduler adapter to maintain equivalent separation
  • Can a scheduler auto-classify "does this job require LLM judgment" without developer annotation (static analysis heuristics)?
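For the first question, one possible shape of a runtime check is a registry that refuses to start when a declared tag has no execution-path handler. This is a sketch under stated assumptions, not a settled answer. TagRegistry and its methods are hypothetical names.

```python
from typing import Callable, Dict, Iterable


class TagRegistry:
    """Maps each tag to the handler that implements its execution path."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}

    def handles(self, tag: str):
        """Decorator that binds a tag to an execution-path handler."""
        def register(fn: Callable[[str], str]) -> Callable[[str], str]:
            self._handlers[tag] = fn
            return fn
        return register

    def validate(self, declared_tags: Iterable[str]) -> None:
        """Fail fast when a tag is declared but bound to no handler."""
        missing = sorted(set(declared_tags) - set(self._handlers))
        if missing:
            raise ValueError(f"tags without an execution path: {missing}")


registry = TagRegistry()


@registry.handles("SILENT")
def run_without_llm(command: str) -> str:
    # Placeholder body; a real handler would run the command without the agent.
    return f"bash-only: {command}"


registry.validate(["SILENT"])  # passes: the tag is bound to a handler
```

Calling validate() at scheduler startup with every tag found in the job definitions would turn a silently ignored tag into a startup error.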

Tagging intent is insufficient. The structural lesson from this case: verify at design time that the tag actually creates a branch in the execution path.

Series overview: Series index
