Agent Operations Retrospective (2/7) — 10 Million Tokens Wasted Daily in an AI Cron Scheduler
A structural defect that loaded full LLM context for tasks that only needed bash
Key Summary
- Structural defect pattern: the [SILENT] tag was interpreted as "suppress output" only; it never triggered a "skip LLM" execution path
- Measurement and diagnosis of 6 bash-only jobs consuming ~196 ticks/day and ~10.1M tokens/day unnecessarily
- After migrating to macOS launchd: ticks/day dropped 196→24 and tokens/day dropped ~10.5M→0.4M, a reduction of approximately 96%
- Core principle: LLM scheduler design must start with per-job LLM necessity classification → execution path branching
What This Post Covers
This post documents a common token leak pattern that emerges when attaching a cron-style scheduler to an LLM agent. It covers: how to surface the problem through measurement when a tag-based filter fails to branch correctly; what module structure and plist templates to use when migrating bash-only jobs to an OS scheduler; and which missing upstream API parameter allows this leak to become structural.
The tone is diagnostic — measurable defects and remediation techniques, not a narrative account.
The Problem Pattern: "Intent Tagged, Path Ignored"
A job named memory-session-scan was running every 10 minutes in the AI scheduler. It was a pure bash task — scanning session files and writing records to SQLite — yet token costs were being logged on every execution.
This job involves no non-deterministic judgment. Token charges on a pure I/O task are a signal that the tag interpretation layer is not connected to actual execution branching.
Root Cause: Context Loading Hardcoded Into the Execution Path
The defect is concentrated at scheduler.py:742.
def _execute_job(self, job: CronJob) -> None:
    ...
    result = self.agent.run_conversation(
        message=job.command,
        context_files=["SOUL.md", "USER.md", "CLAUDE.md"]
    )
Every call to run_conversation() loads SOUL.md, USER.md, and CLAUDE.md as context — 4K+ tokens combined. The critical point: this path is applied unconditionally to every job. Bash-only jobs tagged [SILENT] travel the same path.
The [SILENT] tag was intended as a marker for "execute without LLM," but in the actual implementation it only suppressed log output after execution. No branching logic on the tag existed inside _execute_job(). The tag is present; it creates no branch in the execution path. This is the canonical form of the defect.
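A minimal sketch of what a tag that actually branches could look like. This is not the project's code: `CronJob`'s fields, the `_run_shell` helper, and the `CountingAgent` fake are hypothetical names introduced for illustration, and treating `[SILENT]` as "skip the LLM" is the assumed intent described above.

```python
import subprocess
from dataclasses import dataclass

SILENT_TAG = "[SILENT]"  # tag from the post; "skip LLM" semantics are the assumed fix


@dataclass
class CronJob:
    """Hypothetical stand-in for the scheduler's job record."""
    name: str
    command: str
    tags: frozenset = frozenset()


class CountingAgent:
    """Fake agent that only counts LLM calls, for demonstration."""

    def __init__(self) -> None:
        self.calls = 0

    def run_conversation(self, message: str, context_files: list) -> str:
        self.calls += 1
        return f"llm-handled: {message}"


class Scheduler:
    def __init__(self, agent) -> None:
        self.agent = agent

    def _execute_job(self, job: CronJob) -> str:
        # Branch on the tag before any context file is loaded.
        if SILENT_TAG in job.tags:
            return self._run_shell(job.command)
        # Only jobs that need LLM judgment reach the agent path.
        return self.agent.run_conversation(
            message=job.command,
            context_files=["SOUL.md", "USER.md", "CLAUDE.md"],
        )

    @staticmethod
    def _run_shell(command: str) -> str:
        # Plain subprocess call: no agent, no context files, no tokens.
        done = subprocess.run(
            command, shell=True, capture_output=True, text=True, check=True
        )
        return done.stdout


agent = CountingAgent()
scheduler = Scheduler(agent)
silent_out = scheduler._execute_job(
    CronJob("memory-session-scan", "echo scanned", frozenset({SILENT_TAG}))
)
llm_out = scheduler._execute_job(CronJob("daily-brief", "Summarize today"))
```

In this sketch the silent job never touches the agent, so the call counter only moves for the job that needs the LLM.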
Measurement: Quantifying the Waste
Token waste calculated per job across the 6 offending jobs:
| Job | Frequency | Ticks/day | Est. tokens/tick | Daily waste |
|---|---|---|---|---|
| memory-session-scan | every 10 min | 144 | ~38,000 | ~5.5M |
| memory-micro-cycle | every 30 min | 48 | ~94,000 | ~4.5M |
| 4 others | various | 4 | various | ~0.1M |
| Total | | 196 | | ~10.1M |
10.1M tokens per day. The fraction of those tokens where LLM judgment was actually required: zero. All six jobs consist entirely of deterministic operations — file scanning, log rotation, checksum comparison.
Even without translating to dollar cost, these tokens consume context window capacity and add API response latency. The waste structure is identical on local models.
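The table's arithmetic can be reproduced in a few lines. The intervals and per-tick estimates are taken from the table; the function and variable names are illustrative only, and the four remaining jobs are left out because their per-tick figures are not given.

```python
MINUTES_PER_DAY = 24 * 60


def ticks_per_day(interval_minutes: int) -> int:
    """Number of executions per day for a fixed-interval job."""
    return MINUTES_PER_DAY // interval_minutes


# (job name, interval in minutes, estimated tokens per tick) from the table
jobs = [
    ("memory-session-scan", 10, 38_000),
    ("memory-micro-cycle", 30, 94_000),
]

daily_waste = {
    name: ticks_per_day(interval) * tokens_per_tick
    for name, interval, tokens_per_tick in jobs
}
two_job_total = sum(daily_waste.values())

for name, tokens in daily_waste.items():
    print(f"{name}: {tokens / 1e6:.2f}M tokens/day")
print(f"two largest jobs: {two_job_total / 1e6:.2f}M tokens/day")
```

The two largest jobs alone come to roughly 10.0M tokens/day; adding the ~0.1M from the four remaining jobs gives the ~10.1M total.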
Remediation: Migrate Bash-Only Jobs to the OS Scheduler
Jobs that require no LLM judgment belong at the OS scheduler layer. On macOS, that is launchd.
Create a new modules/cron-launchd/ module with a simple structure:
modules/cron-launchd/
├── __init__.py
├── manager.py # plist generation, install, uninstall
└── plists/ # generated plist files
For each bash-only job, generate a .plist and install it under ~/Library/LaunchAgents/. Integrate the CLI as a memcore subcommand:
python -m memcore launchd-install
python -m memcore launchd-status
python -m memcore launchd-uninstall
plist template for memory-session-scan:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.hermes.memory-session-scan</string>
    <key>ProgramArguments</key>
    <array>
        <string>/path/to/python</string>
        <string>-m</string>
        <string>memcore</string>
        <string>session-scan</string>
    </array>
    <key>StartInterval</key>
    <integer>600</integer>
    <key>RunAtLoad</key>
    <false/>
</dict>
</plist>
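The plist generation step in manager.py could be sketched with the standard-library `plistlib`, as below. The function name `build_plist` is hypothetical, `/path/to/python` is kept as the placeholder from the template, and the install step (writing the file under ~/Library/LaunchAgents/ and loading it) is deliberately left out.

```python
import plistlib


def build_plist(label: str, program_args: list, interval_seconds: int) -> bytes:
    """Render a launchd job definition as XML plist bytes."""
    payload = {
        "Label": label,
        "ProgramArguments": list(program_args),
        "StartInterval": interval_seconds,  # seconds between runs
        "RunAtLoad": False,                 # do not fire immediately on load
    }
    # sort_keys=False keeps the key order used in the template above.
    return plistlib.dumps(payload, fmt=plistlib.FMT_XML, sort_keys=False)


plist_bytes = build_plist(
    label="com.hermes.memory-session-scan",
    program_args=["/path/to/python", "-m", "memcore", "session-scan"],
    interval_seconds=600,
)
print(plist_bytes.decode("utf-8"))
```

Generating the file from a dict rather than a string template means escaping and the XML header are handled by the library, and the result can be parsed back with `plistlib.loads` for a quick sanity check.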
Classification: Which Jobs Stay in LLM Cron vs. Move to OS Scheduler
The 5 jobs that remain in the AI scheduler after migration are those where LLM usage is justified:
| Job | Reason |
|---|---|
| heartbeat-tick | Agent state assessment + response generation |
| daily-brief | Daily summary generation (LLM required) |
| reflect | Self-reflection cycle (LLM required) |
| research-scan | Research trigger evaluation |
| monthly-memory | Long-term memory compression (LLM required) |
These cannot be replaced by bash and involve content generation or non-deterministic judgment. The branching criterion is simple: "Can this job's output be described as a deterministic function?" If yes → OS scheduler. If no → LLM cron.
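The branching criterion can be written down as a routing function. This is a sketch under one assumption: each job carries a developer-supplied `deterministic` flag. `JobSpec` and `Route` are hypothetical names, not part of the scheduler described here.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    OS_SCHEDULER = "os-scheduler"
    LLM_CRON = "llm-cron"


@dataclass(frozen=True)
class JobSpec:
    name: str
    deterministic: bool  # hypothetical annotation: output is a deterministic function


def route(job: JobSpec) -> Route:
    """Deterministic jobs go to the OS scheduler; everything else stays in LLM cron."""
    return Route.OS_SCHEDULER if job.deterministic else Route.LLM_CRON


specs = [
    JobSpec("memory-session-scan", deterministic=True),
    JobSpec("memory-micro-cycle", deterministic=True),
    JobSpec("heartbeat-tick", deterministic=False),
    JobSpec("daily-brief", deterministic=False),
]
routing = {spec.name: route(spec) for spec in specs}

for name, target in routing.items():
    print(f"{name} -> {target.value}")
```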
Results
| Metric | Before | After | Change |
|---|---|---|---|
| AI cron jobs | 11 | 5 | -55% |
| Daily ticks | ~196 | ~24 | -88% |
| Daily token consumption | ~10.5M | ~0.4M | -96% |
Six of the 11 jobs moved to launchd, and token consumption dropped from ~10.5M to ~0.4M per day.
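As a quick check, the Change column follows from the before and after values. The helper name `pct_change` is illustrative.

```python
def pct_change(before: float, after: float) -> int:
    """Relative change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


changes = {
    "AI cron jobs": pct_change(11, 5),
    "Daily ticks": pct_change(196, 24),
    "Daily token consumption (M)": pct_change(10.5, 0.4),
}

for metric, change in changes.items():
    print(f"{metric}: {change}%")
```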
Limitations and Upstream Issue
This fix addresses the symptom. The root cause is a design gap: run_conversation() has no skip_context_files parameter. Even when the caller signals that a job is bash-only, there is no path in the API layer to skip context file loading.
This gap is tracked as GitHub #7876. When skip_context_files=True is added, the launchd migration becomes unnecessary as a workaround — the AI cron can handle it internally. That said, even after the parameter lands, the separation principle of delegating deterministic jobs to the OS scheduler retains value. Keeping LLM schedulers responsible only for LLM-requiring work improves failure isolation and debuggability.
Applicability
This pattern is not specific to Hermes. It recurs structurally wherever periodic jobs are attached to LLM agents. If any of the following conditions apply, the same leak is likely present:
- The scheduler forces context file loading per job unconditionally
- Tags such as "silent", "quiet", or "no-llm" exist but create no branch in the execution path
- Pure I/O jobs run periodically and token costs are being logged
Diagnosis is three steps: (1) log per-tick token consumption per job; (2) classify each job's output as deterministic or not; (3) migrate deterministic jobs to the OS scheduler, or add a context-loading skip path.
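Steps (1) and (2) could be sketched as a small in-memory ledger that records tokens per tick and flags deterministic jobs that still consume tokens. `TokenLedger` and its methods are hypothetical; a real scheduler would feed it from whatever usage figures its LLM client reports.

```python
from collections import defaultdict


class TokenLedger:
    """Accumulates per-job tick counts and token totals."""

    def __init__(self) -> None:
        self._ticks = defaultdict(int)
        self._tokens = defaultdict(int)

    def record(self, job_name: str, tokens: int) -> None:
        # Call once per tick, with the tokens that tick consumed (0 if none).
        self._ticks[job_name] += 1
        self._tokens[job_name] += tokens

    def totals(self) -> dict:
        return {name: (self._ticks[name], self._tokens[name]) for name in self._ticks}

    def leaks(self, deterministic_jobs: set) -> list:
        # A deterministic job with a non-zero token total is a leak candidate.
        return sorted(
            name for name in deterministic_jobs if self._tokens.get(name, 0) > 0
        )


ledger = TokenLedger()
ledger.record("memory-session-scan", 38_000)
ledger.record("memory-session-scan", 38_000)
ledger.record("daily-brief", 12_000)  # illustrative figure, not from the post

suspects = ledger.leaks({"memory-session-scan", "log-rotate"})
print(suspects)
```

Any name that shows up in `suspects` is a candidate for step (3): move it to the OS scheduler or give it a path that skips context loading.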
Open Questions
- How to enforce tag semantics onto execution paths at design time (runtime validation? type system?)
- On Linux/Windows, how far to abstract a systemd/Task Scheduler adapter to maintain equivalent separation
- Can a scheduler auto-classify "does this job require LLM judgment" without developer annotation (static analysis heuristics)?
Tagging intent is insufficient. The structural lesson from this case: verify at design time that the tag actually creates a branch in the execution path.