Agent Self-Improvement Harness (7/12) — Heartbeat v2: Multi-Mode State Machine and Escalation

April 08, 2026

Designing a notification system as a state machine

Key Summary

What this post covers:
- Designing Heartbeat as a state machine: decomposing "alive" into a set of modes and state transitions.
- Locking escalation to explicit, enumerated reasons: eliminating ambiguous alerts and treating "unknown" as a first-class citizen.
- Reducing false-positives with a single parameter change: raising the cold-start timeout from 60s to 120s eliminates model-loading timeouts.
- Proactive Preferences feedback loop: feeding user response and non-response into an EMA to dynamically adjust escalation thresholds.

v1 Limitations and v2 Design Goals

The v1 heartbeat operates on a binary state: alive / dead. The structural limitation is that it cannot represent alive but abnormal. This results in an accumulation of alerts where anomalies are detected but root cause is unclear.

v2 has three design goals:
1. Decompose state into the product of state × context
2. Fix alert conditions to explicit enums
3. Incorporate user response as a learning signal

Technique 1: Multi-Mode State Machine

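v2 modes follow the {state}.{context} naming convention. The escalation-cluster modes and the `unknown` fallback are single-word exceptions, as the table shows.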

State Cluster	Modes
idle	`idle.normal`, `idle.degraded`, `idle.silent`
working	`working.normal`, `working.slow`, `working.stuck`
recovering	`recovering.from-crash`, `recovering.from-quota`, `recovering.from-network`
escalation	`escalating`, `escalated`, `cooling-down`
maintenance	`maintenance.scheduled`, `maintenance.unplanned`
fallback	`unknown`

How It Works

What matters is not the number of modes but the state transition graph. Example: working.normal → working.slow → working.stuck → escalating. Alerts are emitted only at state transition points, not at arbitrary intervals. This constraint structurally prevents repeated alerts for the same persisted state.
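The transition-only alerting rule can be sketched as follows. This is a minimal illustration, not the harness's actual code: the `Heartbeat` class, the `ALLOWED` graph, and the choice to route unmodelled transitions to `unknown` are assumptions made for the example, and only a subset of the modes from the table is included.

```python
from enum import Enum


class Mode(str, Enum):
    # Subset of the v2 modes, enough to walk the example path.
    IDLE_NORMAL = "idle.normal"
    WORKING_NORMAL = "working.normal"
    WORKING_SLOW = "working.slow"
    WORKING_STUCK = "working.stuck"
    ESCALATING = "escalating"
    UNKNOWN = "unknown"


# Hypothetical transition graph: the example path plus a few assumed
# recovery edges. The real graph is not given in the post.
ALLOWED = {
    Mode.IDLE_NORMAL: {Mode.WORKING_NORMAL},
    Mode.WORKING_NORMAL: {Mode.WORKING_SLOW, Mode.IDLE_NORMAL},
    Mode.WORKING_SLOW: {Mode.WORKING_STUCK, Mode.WORKING_NORMAL},
    Mode.WORKING_STUCK: {Mode.ESCALATING, Mode.WORKING_NORMAL},
    Mode.ESCALATING: {Mode.IDLE_NORMAL, Mode.WORKING_NORMAL},
    Mode.UNKNOWN: {Mode.IDLE_NORMAL},
}


class Heartbeat:
    def __init__(self, initial: Mode = Mode.IDLE_NORMAL) -> None:
        self.mode = initial
        self.alerts: list[tuple[Mode, Mode]] = []

    def observe(self, observed: Mode) -> bool:
        """Record an observation; return True only if an alert was emitted."""
        target = observed
        # Assumption: a transition the graph does not model lands in `unknown`.
        if target != self.mode and target not in ALLOWED[self.mode]:
            target = Mode.UNKNOWN
        # Same persisted state: no transition, therefore no alert.
        if target == self.mode:
            return False
        self.alerts.append((self.mode, target))
        self.mode = target
        return True
```

Because `observe` returns early when the mode has not changed, polling the same `working.slow` state repeatedly produces one alert, at the moment of entry, rather than one per poll.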

Technique 2: Multi-Stage Escalation Reason Enum

Alert conditions are locked to a set of explicit, enumerated reasons:

1. Quota exceeds 80%
2. Quota exceeds 100%
3. Same error repeated 3 times
4. Response time p95 threshold exceeded
5. Cron job fails on 2 consecutive runs
6. Memory directory size runaway
7. Embedding server unresponsive
8. External API returning 5xx consecutively
9. Self-review blocking pattern detected
10. Retain-tag validation failure rate spike
11. Zero user response over extended period
12. New agent self-diagnosis failure
13. Entry into unknown mode
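One way to lock alerts to this list is a closed enum that the alerting function refuses to bypass. The sketch below is illustrative only: the member names, the `escalate` helper, and its return shape are hypothetical, and the numeric values simply follow the order of the list above (which matches the post's reference to reason 13).

```python
from enum import Enum, unique


@unique
class EscalationReason(Enum):
    # Values follow list order; member names are invented for this sketch.
    QUOTA_OVER_80 = 1
    QUOTA_OVER_100 = 2
    SAME_ERROR_3X = 3
    P95_LATENCY_EXCEEDED = 4
    CRON_FAILED_2X = 5
    MEMORY_DIR_RUNAWAY = 6
    EMBEDDING_SERVER_DOWN = 7
    EXTERNAL_API_5XX = 8
    SELF_REVIEW_BLOCKING = 9
    RETAIN_TAG_FAILURE_SPIKE = 10
    ZERO_USER_RESPONSE = 11
    SELF_DIAGNOSIS_FAILURE = 12
    UNKNOWN_MODE_ENTRY = 13


def escalate(reason: EscalationReason) -> dict:
    """Build an alert payload; free-text reasons are rejected by design."""
    if not isinstance(reason, EscalationReason):
        raise TypeError("escalation requires an EscalationReason member")
    return {"code": reason.value, "reason": reason.name}
```

Rejecting anything that is not an enum member is what keeps ad hoc alert strings from creeping back in: adding a new alert condition requires adding a new member, which makes the change visible in review.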

The Role of Reason 13

Reason 13 is the most critical entry by design. It is the mechanism that promotes indeterminate state to a first-class citizen. A system that stays silent when classification fails carries greater latent risk than a system that fails on classifiable conditions. Designating unknown entry as an escalation reason establishes a path for the system to report "I don't know what I don't know."
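A minimal sketch of that path, under assumed inputs: the signal names (`busy`, `latency_s`), the `SLOW_LATENCY_S` cutoff, and the classification rules are hypothetical. The point is only the final `return "unknown"` branch and the fact that it escalates instead of staying silent.

```python
SLOW_LATENCY_S = 5.0  # hypothetical cutoff between working.normal and working.slow


def classify(signals: dict) -> str:
    """Map raw signals to a mode name; anything unclassifiable is 'unknown'."""
    busy = signals.get("busy")
    latency = signals.get("latency_s")
    if busy is False:
        return "idle.normal"
    if busy is True and isinstance(latency, (int, float)):
        return "working.slow" if latency > SLOW_LATENCY_S else "working.normal"
    # No rule matched: report the gap rather than guessing a mode.
    return "unknown"


def check(signals: dict) -> tuple[str, str]:
    """Return ('ok', mode) or ('escalate', reason)."""
    mode = classify(signals)
    if mode == "unknown":
        return ("escalate", "reason-13: entry into unknown mode")
    return ("ok", mode)
```

With these rules, a payload that is missing the `busy` field, or that reports `busy` without a latency, is escalated rather than silently mapped to the nearest plausible mode.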

Technique 3: Cold-Start Timeout Tuning

The dominant source of false-positive alerts was not complex logic but a single timeout value.

Symptom: First heartbeat call times out → classified as no response → escalation
Root cause: Model cold-start loading time (initial weight loading + warmup) exceeds the default 60s timeout
Fix: Separate cold-start-specific timeout, raised from 60s → 120s
Result: False-positives on this path eliminated

Generalizable Pattern

The principle that generalizes from this finding: cold and warm path timeouts must be separated into distinct constants. Handling both paths with a single timeout value means optimizing one path degrades the other. A full analysis of 60 operational failures is covered in Part 16.
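The separation can be sketched with two named constants and a client that tracks whether it has warmed up. The 60s and 120s values come from the fix described above; the `HeartbeatClient` class and the `probe` callable are hypothetical stand-ins for whatever actually pings the model, and keeping the warm path at the 60s default is an assumption.

```python
from typing import Callable

COLD_START_TIMEOUT_S = 120.0  # first call: weight loading + warmup
WARM_TIMEOUT_S = 60.0         # assumed: steady-state calls keep the 60s default


def heartbeat_timeout(is_cold: bool) -> float:
    """Pick the timeout for the current path; the two never share a constant."""
    return COLD_START_TIMEOUT_S if is_cold else WARM_TIMEOUT_S


class HeartbeatClient:
    def __init__(self, probe: Callable[[float], bool]) -> None:
        # probe(timeout_s) returns True if the target answered in time.
        self._probe = probe
        self._warmed_up = False

    def beat(self) -> bool:
        timeout = heartbeat_timeout(is_cold=not self._warmed_up)
        ok = self._probe(timeout)
        if ok:
            # Only a successful response proves the model is loaded.
            self._warmed_up = True
        return ok
```

Marking the client as warmed up only after a successful probe means a failed first call is retried with the cold-start budget, not the warm one.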

Technique 4: Proactive Preferences Feedback Loop

Rather than keeping escalation thresholds static, user response is used as a signal to adjust them dynamically.

Input Signals

No response / "quiet" signal: raise the escalation threshold for that mode
"Why wasn't I notified" signal: lower the escalation threshold for that mode

Learning Parameters

Exponential Moving Average (EMA) + 14-day window
Learning rate too fast → alert misses (false-negatives)
Learning rate too slow → repeated user correction requests
A 14-day window balances convergence and responsiveness
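The update rule might look like the sketch below. Several pieces are assumptions not stated in the post: mapping the 14-day window to a smoothing factor with the common span convention alpha = 2 / (N + 1), one feedback sample per day, encoding feedback as +1 or -1, and the `gain` that scales the EMA into a threshold. The `ThresholdLearner` class itself is hypothetical.

```python
WINDOW_DAYS = 14
ALPHA = 2 / (WINDOW_DAYS + 1)  # assumed span convention for a 14-day EMA

# +1 raises the threshold (fewer alerts), -1 lowers it (more alerts).
FEEDBACK = {
    "quiet": 1.0,        # explicit "quiet" signal
    "no_response": 1.0,  # the user ignored the alert
    "missed": -1.0,      # "why wasn't I notified"
}


class ThresholdLearner:
    """Per-mode escalation threshold adjusted by an EMA of user feedback."""

    def __init__(self, base_threshold: float, gain: float = 0.5) -> None:
        self.base = base_threshold
        self.gain = gain  # hypothetical: maximum relative shift of the threshold
        self.ema = 0.0

    def record(self, feedback: str) -> float:
        """Fold one feedback sample into the EMA and return the new threshold."""
        signal = FEEDBACK[feedback]
        self.ema = ALPHA * signal + (1 - ALPHA) * self.ema
        return self.threshold

    @property
    def threshold(self) -> float:
        # The EMA stays within [-1, 1], so the threshold stays within
        # base * (1 - gain) and base * (1 + gain).
        return self.base * (1 + self.gain * self.ema)
```

A larger `ALPHA` reacts faster but lets a few ignored alerts silence a mode (the false-negative risk named above), while a smaller one makes the user repeat the same correction many times before the threshold moves.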

Measured Outcome

Once the EMA converges, repeated threshold-tuning requests stop, which static thresholds cannot achieve structurally. The cost is front-loaded: the first two weeks of the learning period require significant user feedback, and that is the price for a quieter system afterward.

Limitations and Porting Direction

Current Limitations

Potential misbehavior during the initial 2-week learning period
High frequency of reason-13 (unknown) entries increases alert fatigue — sub-classification of unknown required
EMA window length (14 days) is an empirical constant; re-tuning required per domain

Hermes Port

Heartbeat trigger: cron + on_turn_start hook
State machine / escalation enum: ported as-is
Proactive loop: MemoryProvider.on_memory_write records user response patterns to memory → escalation threshold calculation references this memory

Applicability and Open Questions

Where This Design Applies

Systems where alert frequency directly impacts user satisfaction
Operational environments where state classification is feasible (observability in place)
Channels where user feedback can be collected

Open Questions

How far can the unknown mode entry rate be reduced?
Is there a better learning curve than EMA (e.g., Kalman filter)?
In multi-user environments, how should individual thresholds be separated from shared thresholds?

The essence of a good notification system reduces to: can you explicitly specify the conditions under which you will not alert? The mode set and escalation enum are that specification. Cold-start tuning and the EMA feedback loop are the mechanisms that protect that specification from misfiring.

Series overview: Series index


MaJu Tech Notes