Agent Self-Improvement Harness (7/12) — Heartbeat v2: Multi-Mode State Machine and Escalation
Designing a notification system as a state machine
Key Summary
What this post covers:
- Designing Heartbeat as a state machine: decomposing "alive" into a set of modes and state transitions.
- Locking escalation to explicit, enumerated reasons: eliminating ambiguous alerts and treating "unknown" as a first-class citizen.
- Reducing false-positives with a single parameter change: raising the cold-start timeout from 60s to 120s eliminates model-loading timeouts.
- Proactive Preferences feedback loop: feeding user response and non-response into an EMA to dynamically adjust escalation thresholds.
v1 Limitations and v2 Design Goals
The v1 heartbeat operates on a binary state: alive or dead. Its structural limitation is that it cannot represent "alive but abnormal". As a result, alerts accumulate in which an anomaly is detected but the root cause is unclear.
v2 has three design goals:
1. Decompose state into the product of state × context
2. Fix alert conditions to explicit enums
3. Incorporate user response as a learning signal
Technique 1: Multi-Mode State Machine
Most v2 modes follow the `{state}.{context}` naming convention; the escalation modes and the `unknown` fallback are single-token names.
| State Cluster | Modes |
|---|---|
| idle | idle.normal, idle.degraded, idle.silent |
| working | working.normal, working.slow, working.stuck |
| recovering | recovering.from-crash, recovering.from-quota, recovering.from-network |
| escalation | escalating, escalated, cooling-down |
| maintenance | maintenance.scheduled, maintenance.unplanned |
| fallback | unknown |
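As a rough illustration of the state × context decomposition, the mode names in the table can be treated as data and split into their two parts. This is a minimal sketch, not the author's implementation: `MODES`, `Mode`, and `parse_mode` are hypothetical names, and mapping any unlisted name to `unknown` is an assumption about how the fallback row is meant to work.

```python
from typing import NamedTuple, Optional

# Mode names copied from the table; everything else here is illustrative.
MODES = frozenset({
    "idle.normal", "idle.degraded", "idle.silent",
    "working.normal", "working.slow", "working.stuck",
    "recovering.from-crash", "recovering.from-quota", "recovering.from-network",
    "escalating", "escalated", "cooling-down",
    "maintenance.scheduled", "maintenance.unplanned",
    "unknown",
})


class Mode(NamedTuple):
    state: str
    context: Optional[str]  # None for single-token modes such as "escalating"


def parse_mode(name: str) -> Mode:
    """Split a mode name into (state, context).

    Assumption: any name outside MODES is classified as the fallback mode.
    """
    if name not in MODES:
        return Mode("unknown", None)
    state, _, context = name.partition(".")
    return Mode(state, context or None)
```

Keeping the mode set closed means a typo or an unanticipated condition surfaces as `unknown` rather than as a silently accepted new mode.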
How It Works
What matters is not the number of modes but the state transition graph. Example: working.normal → working.slow → working.stuck → escalating. Alerts are emitted only at state transition points, not at arbitrary intervals. This constraint structurally prevents repeated alerts for the same persisted state.
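The transition-only alerting rule can be sketched as follows. Only the `working.normal → working.slow → working.stuck → escalating` chain comes from the post; the remaining edges, the `Heartbeat` class, and the choice to route undeclared transitions to `unknown` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical transition graph. Only the working.normal -> ... -> escalating
# chain is taken from the post; the other edges are illustrative.
TRANSITIONS: Dict[str, Set[str]] = {
    "working.normal": {"working.slow"},
    "working.slow": {"working.normal", "working.stuck"},
    "working.stuck": {"working.normal", "escalating"},
    "escalating": {"escalated"},
}


class Heartbeat:
    def __init__(self, mode: str = "working.normal") -> None:
        self.mode = mode
        self.alerts: List[str] = []

    def observe(self, new_mode: str) -> Optional[str]:
        """Record an observation; return an alert only when the mode changes."""
        allowed = TRANSITIONS.get(self.mode, set())
        if new_mode != self.mode and new_mode not in allowed:
            new_mode = "unknown"  # assumption: undeclared edges fall back
        if new_mode == self.mode:
            return None  # persisted state: no repeated alert
        alert = f"{self.mode} -> {new_mode}"
        self.mode = new_mode
        self.alerts.append(alert)
        return alert
```

Because `observe` returns `None` whenever the mode is unchanged, polling the same degraded state every minute produces one alert, not one per poll.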
Technique 2: Multi-Stage Escalation Reason Enum
Alert conditions are locked to a set of explicit, enumerated reasons:
- Quota exceeds 80%
- Quota exceeds 100%
- Same error repeated 3 times
- Response time p95 threshold exceeded
- Cron job fails on 2 consecutive runs
- Memory directory size runaway
- Embedding server unresponsive
- External API returning 5xx consecutively
- Self-review blocking pattern detected
- Retain-tag validation failure rate spike
- Zero user response over extended period
- New agent self-diagnosis failure
- Entry into `unknown` mode
The Role of Reason 13
Reason 13, entry into `unknown` mode, is the most critical entry by design. It is the mechanism that promotes indeterminate state to a first-class citizen. A system that stays silent when classification fails carries greater latent risk than a system that fails on classifiable conditions. Designating `unknown` entry as an escalation reason establishes a path for the system to report "I don't know what I don't know."
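One way to lock the reasons down is a closed enum, with `unknown` handled before any other check so that a classification failure can never be silent. The member names, the `reason_for` helper, and its two-argument signature are hypothetical; only the thirteen conditions and their order come from the list above, and only the quota and `unknown` checks are wired up here.

```python
from enum import IntEnum
from typing import Optional


class EscalationReason(IntEnum):
    # Values follow the order of the list in the post; names are illustrative.
    QUOTA_OVER_80 = 1
    QUOTA_OVER_100 = 2
    SAME_ERROR_3X = 3
    P95_LATENCY_EXCEEDED = 4
    CRON_FAILED_2X = 5
    MEMORY_DIR_RUNAWAY = 6
    EMBEDDING_SERVER_UNRESPONSIVE = 7
    EXTERNAL_API_5XX = 8
    SELF_REVIEW_BLOCKING = 9
    RETAIN_TAG_FAILURE_SPIKE = 10
    NO_USER_RESPONSE = 11
    SELF_DIAGNOSIS_FAILED = 12
    UNKNOWN_MODE = 13


def reason_for(mode: str, quota_used: float) -> Optional[EscalationReason]:
    """Return the reason to escalate, or None to stay silent.

    quota_used is a fraction (0.85 means 85%). Only two of the thirteen
    checks are sketched; the rest would follow the same pattern.
    """
    if mode == "unknown":
        return EscalationReason.UNKNOWN_MODE  # never stay silent on unknown
    if quota_used > 1.0:
        return EscalationReason.QUOTA_OVER_100
    if quota_used > 0.8:
        return EscalationReason.QUOTA_OVER_80
    return None
```

Returning `None` is the only way to stay silent, so every alert that does fire carries exactly one enumerated reason.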
Technique 3: Cold-Start Timeout Tuning
The dominant source of false-positive alerts was not complex logic but a single timeout value.
- Symptom: First heartbeat call times out → classified as `no response` → escalation
- Root cause: Model cold-start loading time (initial weight loading + warmup) exceeds the default 60s timeout
- Fix: Separate cold-start-specific timeout, raised from `60s → 120s`
- Result: False-positives on this path eliminated
Generalizable Pattern
The principle that generalizes from this finding: cold and warm path timeouts must be separated into distinct constants. Handling both paths with a single timeout value means optimizing one path degrades the other. A full analysis of 60 operational failures is covered in Part 16.
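A minimal sketch of separating the two constants, assuming the warm path keeps the former 60s default (the post only states the cold-start value of 120s). `TimeoutPolicy` and its method names are hypothetical.

```python
COLD_START_TIMEOUT_S = 120.0  # from the post: covers weight loading + warmup
WARM_TIMEOUT_S = 60.0         # assumption: former default kept for warm calls


class TimeoutPolicy:
    """Pick the timeout for the next heartbeat call based on warm-up state."""

    def __init__(self) -> None:
        self._warmed_up = False

    def next_timeout(self) -> float:
        return WARM_TIMEOUT_S if self._warmed_up else COLD_START_TIMEOUT_S

    def record_success(self) -> None:
        # The model has answered once, so later calls take the warm path.
        self._warmed_up = True

    def record_restart(self) -> None:
        # After a process or model restart the next call is cold again.
        self._warmed_up = False
```

With two constants, tightening the warm timeout to catch stalls sooner no longer risks reintroducing the cold-start false positive.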
Technique 4: Proactive Preferences Feedback Loop
Rather than keeping escalation thresholds static, user response is used as a signal to adjust them dynamically.
Input Signals
- No response / "quiet" signal: raise the escalation threshold for that mode
- "Why wasn't I notified" signal: lower the escalation threshold for that mode
Learning Parameters
- Exponential Moving Average (EMA) + 14-day window
- Learning rate too fast → alert misses (false-negatives)
- Learning rate too slow → repeated user correction requests
- A 14-day window balances convergence and responsiveness
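The feedback loop might look like the sketch below. The post specifies only "EMA + 14-day window"; the span-to-alpha formula `2 / (N + 1)`, the encoding of signals as +1 and -1, the `gain` parameter, and the function names are all assumptions chosen for illustration.

```python
WINDOW_DAYS = 14
# Assumption: the common span-to-alpha convention, alpha = 2 / (N + 1).
ALPHA = 2.0 / (WINDOW_DAYS + 1)

# Hypothetical signal encoding, one value per feedback event.
QUIET = 1.0    # no response / "quiet": the user wants fewer alerts
MISSED = -1.0  # "why wasn't I notified": the user wants more alerts


def update_ema(ema: float, signal: float, alpha: float = ALPHA) -> float:
    """Blend the newest signal into the running average."""
    return alpha * signal + (1.0 - alpha) * ema


def adjusted_threshold(base: float, ema: float, gain: float = 0.5) -> float:
    """Raise the threshold when ema > 0, lower it when ema < 0.

    With signals in [-1, 1] the result stays within base * (1 ± gain).
    """
    return base * (1.0 + gain * ema)
```

A larger `alpha` reacts faster to a single "quiet" signal and so risks missed alerts, while a smaller one forces the user to repeat the same correction, which is the trade-off the bullets above describe.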
Measured Outcome
Compared with static thresholds, EMA-based convergence structurally eliminates repeated threshold-tuning requests. However, the first two weeks of the learning period require significant user feedback; that upfront cost is the price of a quieter system afterward.
Limitations and Porting Direction
Current Limitations
- Potential misbehavior during the initial 2-week learning period
- High frequency of reason-13 (`unknown`) entries increases alert fatigue; sub-classification of `unknown` is required
- EMA window length (14 days) is an empirical constant; re-tuning required per domain
Hermes Port
- Heartbeat trigger: cron + `on_turn_start` hook
- State machine / escalation enum: ported as-is
- Proactive loop: `MemoryProvider.on_memory_write` records user response patterns to memory → escalation threshold calculation references this memory
Applicability and Open Questions
Where This Design Applies
- Systems where alert frequency directly impacts user satisfaction
- Operational environments where state classification is feasible (observability in place)
- Channels where user feedback can be collected
Open Questions
- How far can the `unknown` mode entry rate be reduced?
- Is there a better learning curve than EMA (e.g., Kalman filter)?
- In multi-user environments, how should individual thresholds be separated from shared thresholds?
The essence of a good notification system reduces to: can you explicitly specify the conditions under which you will not alert? The mode set and escalation enum are that specification. Cold-start tuning and the EMA feedback loop are the mechanisms that protect that specification from misfiring.