"Claude Code --continue/--resume Prompt Cache Invalidation — Reproduced on v2.1.116"
Cache Destruction by Session-Resume Events, Isolated via TTL-matched Control Pairs
Key Summary
- Symptom: Invoking claude --continue or --resume invalidates the prompt cache even within the one-hour TTL. deferred_tools_delta, introduced in v2.1.69, is identified as the root cause. The behavior reproduces on v2.1.116 despite the patches released in v2.1.90–92.
- Measurement method: A TTL-matched control-pair methodology compares cache hit rates between consecutive intra-session turns (baseline) and the first turn immediately after a session resume (test) at identical idle gaps, which eliminates TTL expiry as a confounding variable. Results: −56 pp at a ~4-minute idle gap; −99 pp at a ~28-minute idle gap.
- Response: GitHub issue #51764 is tracking the problem. It requests confirmation of the patch scope for the custom-agent + MCP + skill combination path and proposes a --no-deferred-tools-delta opt-out flag.
Background — Problem Statement
Claude Code relies heavily on prompt caching in long-running agent environments. When a context exceeding 100k tokens is retransmitted on every turn, consistent cache hits keep most of the cost in cache_read and minimize cache_creation (the write premium). Per Anthropic's official documentation, one-hour cache writes are billed at 2× the standard input token price and cache reads at 0.1×. As long as sessions sustain the 95–99% hit rates seen in normal operation, most of the input token cost is avoided at equivalent context sizes.
This structure breaks down under --continue/--resume. Since deferred_tools_delta was introduced in v2.1.69, session resume has been observed to reorder tool-result blocks and attachment blocks during rollout replay, which alters the cache key on the server side. GitHub issue #42338 was the original report of this problem; it was closed following patches in v2.1.90/91/92. Extensive cache analysis by community contributor ArkNill (Reference 5), a breaking-attachment-set identification gist by simpolism (Reference 6), and an environment reproduction report by taekim34 formed the primary evidence base for that issue.
Nevertheless, the same behavior is reproduced on v2.1.116. The distinction from prior reports is that the TTL-matched control pair methodology eliminates TTL expiry as a causal factor and isolates the session-resume event itself as the direct cause of cache destruction. This methodology advances beyond mere observation of "cache broke" by treating the resume event as an independent variable through controlled comparison.
Methodology — TTL-matched Control Pairs
Prior reports (#42338) used a single-case token-count approach ("470–512k cache_creation observed on 2–12-second exit→resume"). This approach does not easily rule out the counterargument: "perhaps the 5-minute TTL happened to expire." Even acknowledging the fact (Observation 1) that Claude Code internally uses a one-hour TTL rather than a five-minute TTL, the counterargument "the one-hour TTL may have happened to expire" remains available.
To eliminate the TTL factor entirely, per-turn usage data was collected from 14 jsonl session files. Duplicate counts were removed using requestId-based deduplication, after which two comparison sets were constructed:
- Baseline: consecutive turns within the same jsonl, idle gaps of 3–28 minutes
- Test: end of one jsonl → start of the next (session resume), at identical idle gaps
If a hit-rate difference is observed between the two sets, TTL cannot explain it, because the idle gaps are identical. The only variable that differs between the two cases is whether a session boundary was crossed, so any difference is attributable to the session-resume event itself. This design supports a causal claim through controlled comparison rather than simple observation.
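The pairing logic described above can be sketched as follows. This is a minimal illustration, not the script used for the measurements: the Turn record and its field names are hypothetical, and the idle gap is assumed to be the timestamp difference between consecutive deduplicated turns.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Turn:
    """One API turn. Field names are hypothetical, chosen for this sketch."""
    session: str          # which jsonl file the turn came from
    request_id: str       # used for deduplication
    ts: datetime          # time the turn was recorded
    cache_read: int
    cache_creation: int


def dedupe(turns):
    """Keep the first record per request_id, ordered by timestamp."""
    first_seen = {}
    for turn in turns:
        first_seen.setdefault(turn.request_id, turn)
    return sorted(first_seen.values(), key=lambda t: t.ts)


def classify_pairs(turns):
    """Return (kind, idle_minutes, next_turn) for each consecutive pair.

    kind is "baseline" when both turns share a session and "test" when
    the pair crosses a session boundary (a resume).
    """
    ordered = dedupe(turns)
    pairs = []
    for prev, nxt in zip(ordered, ordered[1:]):
        idle_minutes = (nxt.ts - prev.ts).total_seconds() / 60
        kind = "baseline" if prev.session == nxt.session else "test"
        pairs.append((kind, idle_minutes, nxt))
    return pairs


# Synthetic example data (not taken from the measured sessions).
_t0 = datetime(2025, 1, 1, 12, 0, 0)
EXAMPLE_TURNS = [
    Turn("a", "r1", _t0, 0, 100),
    Turn("a", "r1", _t0, 0, 100),                       # duplicate requestId
    Turn("a", "r2", _t0 + timedelta(minutes=4), 90, 10),
    Turn("b", "r3", _t0 + timedelta(minutes=8), 0, 100),
]
```

Baseline and test pairs with similar idle_minutes values can then be compared directly, which is the TTL-matched comparison reported in the observations.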
Observations
Observation 1 — Claude Code's Internal Cache TTL
ephemeral_5m_input_tokens is zero across all intervals. All cache_creation flows into ephemeral_1h_input_tokens. Claude Code internally uses a one-hour TTL cache. Accordingly, intra-session idle periods within ~60 minutes should sustain the cache.
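A check of this kind can be sketched in a few lines. The sketch assumes each usage object carries a nested cache_creation object holding the ephemeral_5m_input_tokens and ephemeral_1h_input_tokens counters; whether every jsonl record has exactly this shape is an assumption, and the helper names are hypothetical.

```python
def ttl_buckets(usage):
    """Split one usage object's cache writes by TTL bucket.

    Assumes a nested "cache_creation" object; missing fields count as 0.
    """
    creation = usage.get("cache_creation") or {}
    return {
        "5m": creation.get("ephemeral_5m_input_tokens", 0),
        "1h": creation.get("ephemeral_1h_input_tokens", 0),
    }


def uses_only_1h(usages):
    """True when all cache writes across the usages land in the 1h bucket."""
    total_5m = sum(ttl_buckets(u)["5m"] for u in usages)
    total_1h = sum(ttl_buckets(u)["1h"] for u in usages)
    return total_5m == 0 and total_1h > 0
```

If uses_only_1h returns True over all collected turns, the five-minute TTL can be excluded as an explanation for misses at idle gaps under an hour.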
Observation 2 — Baseline (mid-session, same jsonl)
| Idle gap | cache_creation | cache_read | Hit rate |
|---|---|---|---|
| 3.3 min | 389 | 155,492 | 99.8% |
| 3.3 min | 376 | 80,990 | 99.5% |
| 3.7 min | 2,123 | 159,700 | 98.7% |
| 4.0 min | 3,209 | 107,456 | 97.1% |
| 4.0 min | 5,112 | 173,721 | 97.1% |
| 4.3 min | 2,015 | 220,492 | 99.1% |
| 4.8 min | 840 | 78,457 | 98.9% |
| 5.7 min | 264 | 82,612 | 99.7% |
| 5.8 min | 3,302 | 85,549 | 96.3% |
| 5.9 min | 1,283 | 240,818 | 99.5% |
| 6.0 min | 2,385 | 161,823 | 98.5% |
| 7.6 min | 4,945 | 114,763 | 95.9% |
| 27.4 min | 1,559 | 150,593 | 99.0% |
The average hit rate is approximately 98%, where hit rate is cache_read / (cache_read + cache_creation). The one-hour cache operates normally while the session process remains alive.
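The hit rates in the table can be recomputed from the two token columns. The sketch below assumes hit rate means cache_read / (cache_read + cache_creation), which matches the tabulated percentages; the function and constant names are illustrative.

```python
def hit_rate(cache_creation, cache_read):
    """Fraction of cached-prefix tokens served from cache (0.0 if no tokens)."""
    total = cache_creation + cache_read
    return cache_read / total if total else 0.0


# (idle_minutes, cache_creation, cache_read), copied from the baseline table.
BASELINE_ROWS = [
    (3.3, 389, 155492), (3.3, 376, 80990), (3.7, 2123, 159700),
    (4.0, 3209, 107456), (4.0, 5112, 173721), (4.3, 2015, 220492),
    (4.8, 840, 78457), (5.7, 264, 82612), (5.8, 3302, 85549),
    (5.9, 1283, 240818), (6.0, 2385, 161823), (7.6, 4945, 114763),
    (27.4, 1559, 150593),
]


def mean_hit_rate(rows):
    """Unweighted mean of per-row hit rates."""
    rates = [hit_rate(cc, cr) for _gap, cc, cr in rows]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    for gap, cc, cr in BASELINE_ROWS:
        print(f"{gap:5.1f} min  {hit_rate(cc, cr):.1%}")
    print(f"mean: {mean_hit_rate(BASELINE_ROWS):.1%}")
```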
Observation 3 — Cross-session Transitions (same agent, new jsonl file)
| Transition | Idle gap | Next cc (cache_creation) | Next cr (cache_read) | Hit rate | Verdict |
|---|---|---|---|---|---|
| f3f5c819 → e17536e7 | 28.2 min | 40,260 | 0 | 0.0% | Unexpected miss |
| e17536e7 → 096d96fa | 4.1 min | 23,849 | 16,702 | 41.2% | Unexpected miss |
| c0eb34ad → b6e07fac | 2423.6 min (>1h TTL) | 20,061 | 0 | 0.0% | TTL expiry, expected |
| b6e07fac → f3f5c819 | 2059.3 min (>1h TTL) | 41,766 | 0 | 0.0% | TTL expiry, expected |
TTL-matched Pairs — Direct Delta
| Idle gap | Intra-session (baseline) | Session resume (test) | Delta |
|---|---|---|---|
| ~4 min | 97–99% hit (cr > 100k, cc ≈ 3k) | 41% hit (cr 16.7k, cc 23.8k) | −56 pp |
| ~28 min | 99% hit (cr 150k, cc 1.5k) | 0% hit (cr 0, cc 40.3k) | −99 pp |
Identical idle gaps, identical agent workload, identical codebase state. The only difference is whether a session boundary was crossed. This delta supports the conclusion that the session-resume event itself destroys the cache.
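The deltas can be reproduced from the tabulated token counts. In this sketch, matching each resume to the baseline row with the nearest idle gap is an assumption about how the pairs were formed; the names are illustrative.

```python
def hit_rate(cache_creation, cache_read):
    """cache_read / (cache_read + cache_creation), or 0.0 when both are zero."""
    total = cache_creation + cache_read
    return cache_read / total if total else 0.0


# (idle_minutes, cache_creation, cache_read): baseline rows near the two
# resume gaps, copied from the baseline table.
NEAR_BASELINE = [
    (3.7, 2123, 159700),
    (4.0, 3209, 107456),
    (4.0, 5112, 173721),
    (4.3, 2015, 220492),
    (27.4, 1559, 150593),
]

# In-TTL resumes from the cross-session table.
RESUMES = [(4.1, 23849, 16702), (28.2, 40260, 0)]


def matched_delta_pp(resume, baseline=NEAR_BASELINE):
    """Hit-rate delta in percentage points against the nearest-gap baseline row."""
    gap, cc, cr = resume
    nearest = min(baseline, key=lambda row: abs(row[0] - gap))
    return 100 * (hit_rate(cc, cr) - hit_rate(nearest[1], nearest[2]))


if __name__ == "__main__":
    for resume in RESUMES:
        print(f"{resume[0]:5.1f} min resume: {matched_delta_pp(resume):+.0f} pp")
```

Matching against the other ~4-minute baseline rows (up to 99% hit rate) makes the first delta slightly larger in magnitude, so −56 pp is the conservative end of the range.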
Root-Cause Alignment — deferred_tools_delta Reordering Hypothesis
Issue #42338 identified deferred_tools_delta, introduced in v2.1.69, as the proximate cause. During rollout replay on session resume, this feature reorders tool-result blocks and attachment blocks. When block ordering changes, the byte prefix of the request changes; because the Anthropic server looks up cache keys by prefix match, a key miss results.
The observations are consistent with this hypothesis. The 4.1-minute resume case exhibits partial prefix matching (16,702 tokens cache_read + 23,849 tokens cache_creation). This matches the pattern of "prefix matching through block N, then the cache key breaking at the reordered point." Pure TTL expiry would produce a 0% hit with no partial match; the existence of partial matching is itself internal evidence that refutes the TTL hypothesis.
The 28.2-minute case shows a 0% hit rate, suggesting that reordering occurred at an earlier block in the prefix. This aligns with the positional-shift pattern of skill_listing, todo_reminders, and nested_memory injections catalogued in simpolism's gist (Reference 6). Whether a given resume produces "partial matching (41% hit rate)" or "total miss (0%)" is determined by which block is reordered first on resume.
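The difference between a partial match and a total miss can be illustrated with a toy longest-common-prefix model. This is not Anthropic's cache implementation: the block layout and token counts are hypothetical, and only the injection names (skill_listing, todo_reminders, nested_memory) come from the discussion above.

```python
def split_by_prefix(cached_blocks, request_blocks):
    """Return (cache_read, cache_creation) token counts.

    Blocks are (name, tokens) tuples. Tokens are counted as read while the
    request matches the cached sequence block by block; everything from the
    first mismatch onward is counted as a new cache write.
    """
    matched = 0
    for old, new in zip(cached_blocks, request_blocks):
        if old != new:
            break
        matched += 1
    cache_read = sum(tokens for _name, tokens in request_blocks[:matched])
    cache_creation = sum(tokens for _name, tokens in request_blocks[matched:])
    return cache_read, cache_creation


# Hypothetical layout of a cached request prefix.
CACHED = [
    ("system_prompt", 10_000),
    ("skill_listing", 5_000),
    ("todo_reminders", 3_000),
    ("nested_memory", 2_000),
]

# Reordering late in the prefix: partial match.
LATE_REORDER = [CACHED[0], CACHED[1], CACHED[3], CACHED[2]]

# Reordering at the first block: total miss.
EARLY_REORDER = [CACHED[1], CACHED[0], CACHED[2], CACHED[3]]
```

Under this model, LATE_REORDER keeps the first two blocks cached while EARLY_REORDER rewrites the whole prefix, mirroring the 41% and 0% cases.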
Whether the v2.1.90–92 patches addressed this entire code path remains unclear. In particular, the block-composition patterns arising from combinations of custom agents, multiple MCP servers, skills, and hooks may fall outside the patch scope validated against simpler environments.
Impact — Measured Cost
Usage aggregated from a single 11-hour 52-minute session (197 unique turns):
| Item | Value |
|---|---|
| cache_read total | 60,869,087 tokens |
| cache_creation total | 2,017,110 tokens (all one-hour, 2× write premium) |
| input_tokens total | 9,923 tokens |
| output_tokens total | 446,687 tokens |
| cc / output ratio | 4.52× |
The cc/output ratio in a normally-cached session is typically observed at 1–2×. This session's 4.52× is anomalously high. The session includes two legitimate expiry transitions (>1h) where TTL was genuinely exceeded; those are expected costs. However, the f3f5c819 → e17536e7 transition (28.2 minutes, 40,260 cc) produced a total cache miss despite being a resume within TTL. Those 40,260 tokens of cache_creation would not have been consumed had the resume preserved the prefix.
The cache_creation for the one-hour cache carries a 2× cost relative to standard input. When unnecessary writes of the ~40,260-token magnitude are repeated on every session resume, accumulated costs reach a perceptible level in agent operations that include frequent automatic restarts.
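As a rough worked example, the overhead of one in-TTL miss can be expressed in standard-input-token equivalents. The sketch assumes the 2× write and 0.1× read multipliers cited above, and that the same tokens would have been billed as cache reads had the prefix been preserved; the restart count is hypothetical.

```python
WRITE_1H_MULTIPLIER = 2.0   # one-hour cache write, relative to standard input
READ_MULTIPLIER = 0.1       # cache read, relative to standard input


def resume_overhead(tokens, write_mult=WRITE_1H_MULTIPLIER, read_mult=READ_MULTIPLIER):
    """Extra cost of writing `tokens` to cache instead of reading them from it."""
    return tokens * (write_mult - read_mult)


def cumulative_overhead(tokens_per_resume, resumes):
    """Overhead accumulated over a number of in-TTL resumes."""
    return resume_overhead(tokens_per_resume) * resumes


if __name__ == "__main__":
    # 40,260 is the cache_creation of the 28.2-minute resume; 10 restarts is hypothetical.
    print(f"per resume:  {resume_overhead(40_260):,.0f} input-token equivalents")
    print(f"10 restarts: {cumulative_overhead(40_260, 10):,.0f} input-token equivalents")
```

For the 28.2-minute resume this comes to roughly 76,500 input-token equivalents per occurrence.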
For subscription-plan users, this cost is drawn directly from the rolling five-hour quota. cache_creation exhausts quota faster than cache_read. In overnight agent auto-restart patterns, resume-induced cache_creation can occupy a substantial portion of the quota, reducing the quota available for actual task processing.
Recommended Responses
Confirm and track issue status: GitHub issue #42338 was closed following the v2.1.90–92 patches, but reproduction is confirmed on v2.1.116. A new report has been submitted as issue #51764. Users experiencing the same behavior are advised to add their reproduction environment and version to that issue. Accumulated reproduction cases can influence patch prioritization.
Confirm patch scope: The official release notes for the v2.1.90–92 patches do not specify the scope of the fix in detail. The block-composition patterns arising in environments that combine custom agents + MCP servers + skills + hooks may have been omitted from the patch scope validated against simpler environments. Users operating that combination are advised either to verify via direct reproduction testing or to add their environment details to the issue as a request for confirmation from Anthropic.
Request --no-deferred-tools-delta opt-out flag: Given that the deferred_tools_delta feature was introduced as a performance improvement, providing an opt-out flag is more practical than removing the feature entirely. For harness users or long-running agent operators, cache stability may take precedence over full feature availability. A request for a --no-deferred-tools-delta or --strict-prefix flag is included in the Asks section of issue #51764.
Reproduction Steps
- Start a Claude Code session and accumulate context to ≥100k tokens, including CLAUDE.md, tool definitions, and prior conversation history.
- Exit the session with /exit and resume within 1–5 minutes using claude --continue. The resume interval must be kept within the one-hour TTL.
- On the first assistant turn of the resumed session, open ~/.claude/projects/<slug>/<sid>.jsonl and check the cache_read_input_tokens and cache_creation_input_tokens fields in the usage object for the corresponding requestId. If a monitoring proxy is available, per-turn usage queries are also feasible.
- Collect the control: within the same session, without using /exit, execute the next turn after an identical idle gap. This must be a consecutive turn within the same jsonl file. The baseline should show ≥95% cache_read.
- Compare hit rates between the two cases. If the post-resume hit rate is substantially lower (particularly in the 0–41% range), the bug is considered reproduced. Adding the reproduction environment (version, list of active features) to issue #51764 benefits the community.
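The jsonl inspection in the steps above can be scripted. This sketch assumes each line is a JSON object with a top-level requestId and the usage object nested under message; that record layout is an assumption, so adjust the lookups if the files differ. The function names are hypothetical.

```python
import json


def iter_usage(lines):
    """Yield (requestId, usage) pairs, deduplicated by requestId.

    Assumed layout: {"requestId": ..., "message": {"usage": {...}}}.
    Records without a usage object or requestId are skipped.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        request_id = record.get("requestId")
        if not usage or request_id is None or request_id in seen:
            continue
        seen.add(request_id)
        yield request_id, usage


def first_turn_hit_rate(lines):
    """Hit rate of the first turn with usage data, or None if there is none."""
    for _request_id, usage in iter_usage(lines):
        cache_read = usage.get("cache_read_input_tokens", 0)
        cache_creation = usage.get("cache_creation_input_tokens", 0)
        total = cache_read + cache_creation
        return cache_read / total if total else 0.0
    return None
```

Since an open file yields lines, calling first_turn_hit_rate on the file object of the resumed session's jsonl gives the test value, and the same function applied to the lines after a mid-session idle gap gives the baseline.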
Conclusion
The deferred_tools_delta-based cache-destruction bug is reproduced on v2.1.116. The TTL-matched control-pair methodology structurally supports the conclusion that the session-resume event itself is the cause of cache destruction, by removing TTL expiry as a variable. The hit-rate deltas of −56 pp at ~4 minutes and −99 pp at ~28 minutes represent functional cache invalidation, not mere performance degradation.
The patterns most materially affected by this issue are as follows. First, long-running agents where session restarts due to /compact or errors are frequent. Second, monitoring agents with overnight auto-restart structures. Third, complex harness environments maintaining contexts of 100k tokens or more.
The fact that Anthropic attempted a patch in v2.1.90–92 confirms the issue is real. Reproduction on v2.1.116 suggests either that the fix was incomplete or that a code path that re-triggers the behavior under certain environment combinations remains. It is expected that issue #51764 will be maintained in a trackable state until this problem is resolved.
References
1. GitHub Issue #42338, original report (closed, locked): https://github.com/anthropics/claude-code/issues/42338
2. GitHub Issue #34629, --print --resume regression since v2.1.69: https://github.com/anthropics/claude-code/issues/34629
3. GitHub Issue #46829, silent 1h→5m default TTL regression: https://github.com/anthropics/claude-code/issues/46829
4. GitHub Issue #51764, this report (open): https://github.com/anthropics/claude-code/issues/51764
5. ArkNill, Claude Code cache analysis: https://github.com/ArkNill/claude-code-cache-analysis
6. simpolism, breaking attachment set analysis gist: https://gist.github.com/simpolism/302621e661f462f3e78684d96bf307ba
7. Anthropic prompt caching official documentation: https://platform.claude.com/docs/en/build-with-claude/prompt-caching