Claude Code --continue/--resume Prompt Cache Invalidation: Reproduced on v2.1.116

April 21, 2026

Cache Destruction by Session-Resume Events, Isolated via TTL-matched Control Pairs

Key Summary

Symptom: Invoking claude --continue or --resume invalidates the prompt cache even within the one-hour TTL. deferred_tools_delta, introduced in v2.1.69, is the suspected root cause. The behavior reproduces on v2.1.116 despite the patches released in v2.1.90–92.
Measurement method: A TTL-matched control-pair methodology compares cache hit rates between consecutive intra-session turns (baseline) and the first turn immediately after a session resume (test) at matching idle gaps, which removes TTL expiry as a confounding variable. Results: −56 pp at a ~4-minute idle gap and −99 pp at a ~28-minute idle gap.
Response: The behavior is tracked in GitHub issue #51764, which requests confirmation of the patch scope for the custom-agent + MCP + skill combination path and proposes a --no-deferred-tools-delta opt-out flag.

Background — Problem Statement

Claude Code relies heavily on Prompt Caching in long-running agent environments. When a context exceeding 100k tokens is retransmitted on every turn, consistent cache hits confine most of the cost to cache_read and keep cache_creation (the write premium) small. Per Anthropic's official documentation, one-hour cache writes cost 2× the standard input token price and cache reads cost 0.1×. As long as sessions sustain the 95–99% hit rates seen in normal operation, nearly all of the context is billed at the read rate, roughly one tenth of the uncached input price.
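As a rough illustration of this pricing arithmetic, the sketch below expresses one turn's input-side cost in units of standard input tokens, using the 2× one-hour write and 0.1× read multipliers quoted from the documentation. The function name and the 100k-token example are illustrative assumptions, not measured values.

```python
# Multipliers relative to the standard input-token price, as quoted above.
WRITE_1H = 2.0   # one-hour cache write (cache_creation)
READ = 0.1       # cache read (cache_read)


def input_cost_units(cache_read: int, cache_creation: int, uncached: int = 0) -> float:
    """Input-side cost of one turn, in standard-input-token equivalents."""
    return cache_read * READ + cache_creation * WRITE_1H + uncached


# Hypothetical 100k-token context resent on a single turn.
hit = input_cost_units(cache_read=99_000, cache_creation=1_000)   # 99% hit
miss = input_cost_units(cache_read=0, cache_creation=100_000)     # full rewrite

print(f"hit: {hit:.0f} units, miss: {miss:.0f} units, ratio: {miss / hit:.1f}x")
```

Under these assumptions a full miss on the same context costs more than an order of magnitude more than a 99% hit, which is why a single invalidated resume is visible in the totals.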

This structure breaks down under --continue/--resume. Since deferred_tools_delta was introduced in v2.1.69, session resume has been observed to reorder tool-result blocks and attachment blocks during rollout replay, which alters the cache key on the server side. GitHub issue #42338 was the original report of this problem; it was closed following patches in v2.1.90/91/92. Extensive cache analysis by community contributor ArkNill (Reference 5), a breaking-attachment-set identification gist by simpolism (Reference 6), and an environment reproduction report by taekim34 formed the primary evidence base for that issue.

Nevertheless, the same behavior is reproduced on v2.1.116. The distinction from prior reports is that the TTL-matched control pair methodology eliminates TTL expiry as a causal factor and isolates the session-resume event itself as the direct cause of cache destruction. This methodology advances beyond mere observation of "cache broke" by treating the resume event as an independent variable through controlled comparison.

Methodology — TTL-matched Control Pairs

Prior reports (#42338) used a single-case token-count approach ("470–512k cache_creation observed on 2–12-second exit→resume"). This approach does not easily rule out the counterargument: "perhaps the 5-minute TTL happened to expire." Even acknowledging the fact (Observation 1) that Claude Code internally uses a one-hour TTL rather than a five-minute TTL, the counterargument "the one-hour TTL may have happened to expire" remains available.

To eliminate the TTL factor entirely, per-turn usage data was collected from 14 jsonl session files. Duplicate counts were removed using requestId-based deduplication, after which two comparison sets were constructed:

Baseline: consecutive turns within the same jsonl, idle gaps of 3–28 minutes
Test: end of one jsonl → start of the next (session resume), at identical idle gaps

If the two sets differ in hit rate, TTL cannot explain the difference, because the idle gaps are matched. The only variable that differs between the two cases is whether a session boundary was crossed, so any difference is attributable to the session-resume event itself. This design supports a causal claim through controlled comparison rather than simple observation.
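The collection step can be sketched roughly as follows. The field names requestId, cache_creation_input_tokens, and cache_read_input_tokens come from this report; the nesting under "message" -> "usage" and the ISO 8601 "timestamp" field are assumptions about the jsonl layout and may need adjusting. The gap between consecutive turn timestamps is used here as an approximation of the idle gap.

```python
import json
from datetime import datetime
from pathlib import Path


def load_turns(path):
    """Return one usage record per requestId, in file order.

    Assumed layout (not confirmed by the report): each line is a JSON object
    with "requestId", an ISO 8601 "timestamp", and "message" -> "usage".
    """
    turns, seen = [], set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        msg = rec.get("message")
        usage = msg.get("usage") if isinstance(msg, dict) else None
        rid = rec.get("requestId")
        if not usage or not rid or rid in seen:
            continue  # skip lines without usage and duplicate requestIds
        seen.add(rid)
        turns.append({
            "ts": datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00")),
            "cc": usage.get("cache_creation_input_tokens", 0),
            "cr": usage.get("cache_read_input_tokens", 0),
        })
    return turns


def hit_rate(turn):
    """cache_read share of all cache tokens for one turn (0.0 to 1.0)."""
    total = turn["cr"] + turn["cc"]
    return turn["cr"] / total if total else 0.0


def baseline_pairs(turns, min_gap_min=3.0):
    """(gap in minutes, hit rate) for consecutive turns with a long enough gap."""
    out = []
    for prev, cur in zip(turns, turns[1:]):
        gap = (cur["ts"] - prev["ts"]).total_seconds() / 60
        if gap >= min_gap_min:
            out.append((round(gap, 1), hit_rate(cur)))
    return out
```

The test set can be built the same way by pairing the last turn of one file with the first turn of the next, so that both sets go through identical deduplication and hit-rate logic.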

Observations

Observation 1 — Claude Code's Internal Cache TTL

ephemeral_5m_input_tokens is zero across all intervals. All cache_creation flows into ephemeral_1h_input_tokens. Claude Code internally uses a one-hour TTL cache. Accordingly, intra-session idle periods within ~60 minutes should sustain the cache.
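A check of this kind could look like the sketch below. The two ephemeral_* field names are taken from the observation above; placing them under a nested "cache_creation" object is an assumption about where the breakdown sits in the usage record, and the sample values are illustrative.

```python
def ttl_bucket(usage: dict) -> str:
    """Classify which cache TTL a turn's cache writes went to."""
    # Assumption: the per-TTL breakdown sits under usage["cache_creation"].
    breakdown = usage.get("cache_creation") or {}
    wrote_5m = breakdown.get("ephemeral_5m_input_tokens", 0)
    wrote_1h = breakdown.get("ephemeral_1h_input_tokens", 0)
    if wrote_5m and wrote_1h:
        return "mixed"
    if wrote_1h:
        return "1h"
    if wrote_5m:
        return "5m"
    return "none"


# Illustrative record shaped like the pattern described above.
sample = {
    "cache_creation_input_tokens": 389,
    "cache_creation": {
        "ephemeral_5m_input_tokens": 0,
        "ephemeral_1h_input_tokens": 389,
    },
}
print(ttl_bucket(sample))
```

If every turn in a session classifies as "1h" or "none", the five-minute TTL can be ruled out for that session.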

Observation 2 — Baseline (mid-session, same jsonl)

Idle gap	cache_creation	cache_read	Hit rate
3.3 min	389	155,492	99.8%
3.3 min	376	80,990	99.5%
3.7 min	2,123	159,700	98.7%
4.0 min	3,209	107,456	97.1%
4.0 min	5,112	173,721	97.1%
4.3 min	2,015	220,492	99.1%
4.8 min	840	78,457	98.9%
5.7 min	264	82,612	99.7%
5.8 min	3,302	85,549	96.3%
5.9 min	1,283	240,818	99.5%
6.0 min	2,385	161,823	98.5%
7.6 min	4,945	114,763	95.9%
27.4 min	1,559	150,593	99.0%

The average hit rate is approximately 98%. The one-hour cache operates normally while the session process remains alive.
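The hit-rate column can be recomputed from the two token columns. The sketch below assumes hit rate is defined as cache_read / (cache_read + cache_creation), which is consistent with the tabulated percentages; the rows are copied from the table above.

```python
# (idle_gap_min, cache_creation, cache_read), copied from the baseline table.
BASELINE = [
    (3.3, 389, 155_492), (3.3, 376, 80_990), (3.7, 2_123, 159_700),
    (4.0, 3_209, 107_456), (4.0, 5_112, 173_721), (4.3, 2_015, 220_492),
    (4.8, 840, 78_457), (5.7, 264, 82_612), (5.8, 3_302, 85_549),
    (5.9, 1_283, 240_818), (6.0, 2_385, 161_823), (7.6, 4_945, 114_763),
    (27.4, 1_559, 150_593),
]


def hit_rate_pct(cache_creation: int, cache_read: int) -> float:
    """Assumed definition: read share of all cache tokens, in percent."""
    return 100.0 * cache_read / (cache_read + cache_creation)


rates = [hit_rate_pct(cc, cr) for _, cc, cr in BASELINE]
mean_rate = sum(rates) / len(rates)

print(f"mean {mean_rate:.1f}%, min {min(rates):.1f}%, max {max(rates):.1f}%")
```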

Observation 3 — Cross-session Transitions (same agent, new jsonl file)

Transition	Idle gap	Next-turn cache_creation (cc)	Next-turn cache_read (cr)	Hit rate	Verdict
f3f5c819 → e17536e7	28.2 min	40,260	0	0.0%	Unexpected miss
e17536e7 → 096d96fa	4.1 min	23,849	16,702	41.2%	Unexpected miss
c0eb34ad → b6e07fac	2423.6 min (>1h TTL)	20,061	0	0.0%	TTL expiry, expected
b6e07fac → f3f5c819	2059.3 min (>1h TTL)	41,766	0	0.0%	TTL expiry, expected

TTL-matched Pairs — Direct Delta

Idle gap	Intra-session (baseline)	Session resume (test)	Delta
~4 min	97–99% hit (cr > 100k, cc ≈ 3k)	41% hit (cr 16.7k, cc 23.8k)	−56 pp
~28 min	99% hit (cr 150k, cc 1.5k)	0% hit (cr 0, cc 40.3k)	−99 pp

Each pair shares a near-identical idle gap, the same agent workload, and the same codebase state. The only difference is whether a session boundary was crossed. This delta supports the conclusion that the session-resume event itself destroys the cache.
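The pairing and the delta can be reproduced with a short sketch. The hit rates are copied from the tables above; the 1-minute matching tolerance and the choice to compare against the lowest matched baseline (the most conservative delta) are assumptions made for this illustration.

```python
# (idle_gap_min, hit_rate_pct) from the baseline table.
BASELINE = [
    (3.3, 99.8), (3.3, 99.5), (3.7, 98.7), (4.0, 97.1), (4.0, 97.1),
    (4.3, 99.1), (4.8, 98.9), (5.7, 99.7), (5.8, 96.3), (5.9, 99.5),
    (6.0, 98.5), (7.6, 95.9), (27.4, 99.0),
]
# Within-TTL session resumes from the cross-session table.
RESUME = [(4.1, 41.2), (28.2, 0.0)]


def matched_deltas(resume, baseline, tolerance_min=1.0):
    """Delta in percentage points against the lowest baseline at a similar gap."""
    deltas = []
    for gap, rate in resume:
        nearby = [b for g, b in baseline if abs(g - gap) <= tolerance_min]
        if nearby:
            deltas.append((gap, round(rate - min(nearby))))
    return deltas


for gap, delta in matched_deltas(RESUME, BASELINE):
    print(f"{gap} min: {delta} pp")
```

Comparing against the lowest matched baseline means any looser pairing rule would only make the reported drop larger.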

Root-Cause Alignment — `deferred_tools_delta` Reordering Hypothesis

Issue #42338 identified deferred_tools_delta, introduced in v2.1.69, as the proximate cause. During rollout replay on session resume, this feature reorders tool-result blocks and attachment blocks. When block ordering changes, the byte prefix of the request changes; because the Anthropic server looks up cache keys by prefix match, a key miss results.

The observations are consistent with this hypothesis. The 4.1-minute resume case exhibits partial prefix matching (16,702 tokens cache_read + 23,849 tokens cache_creation). This matches the pattern of "prefix matching through block N, then the cache key breaking at the reordered point." Pure TTL expiry would produce a 0% hit with no partial match; the existence of partial matching is itself internal evidence that refutes the TTL hypothesis.

The 28.2-minute case shows a 0% hit rate, suggesting that reordering occurred at an earlier block in the prefix. This aligns with the positional-shift pattern of skill_listing, todo_reminders, and nested_memory injections catalogued in simpolism's gist (Reference 6). Under this hypothesis, whether a given resume produces partial matching (41% hit rate) or a total miss (0%) depends on how early in the prefix the first reordered block sits.
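The difference between a partial match and a total miss can be illustrated with a toy model of prefix matching. This is a simplification for intuition only, not a description of the server's actual cache lookup; apart from skill_listing, the block names and all token sizes are hypothetical.

```python
def shared_prefix_tokens(old_blocks, new_blocks):
    """Tokens covered by the longest run of identical leading blocks."""
    shared = 0
    for (old_name, size), (new_name, _) in zip(old_blocks, new_blocks):
        if old_name != new_name:
            break  # everything after the first mismatch must be rewritten
        shared += size
    return shared


# Hypothetical request layout before the session exit: (block name, tokens).
before = [("system", 5_000), ("tools", 10_000),
          ("skill_listing", 2_000), ("history", 23_000)]

# Reordering late in the prefix keeps the leading blocks readable.
late_swap = [before[0], before[1], before[3], before[2]]
# Reordering at the first block leaves nothing to read.
early_swap = [before[1], before[0], before[2], before[3]]

total = sum(size for _, size in before)
for label, blocks in [("late", late_swap), ("early", early_swap)]:
    shared = shared_prefix_tokens(before, blocks)
    print(f"{label} reorder: {shared} of {total} tokens reusable")
```

In this model the same set of blocks yields either a partial hit or a 0% hit depending only on where the first reordered block falls, which mirrors the 4.1-minute and 28.2-minute cases.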

Whether the v2.1.90–92 patches addressed this entire code path remains unclear. In particular, the block-composition patterns arising from combinations of custom agents, multiple MCP servers, skills, and hooks may fall outside the patch scope validated against simpler environments.

Impact — Measured Cost

Usage aggregated from a single 11-hour 52-minute session (197 unique turns):

Item	Value
cache_read total	60,869,087 tokens
cache_creation total	2,017,110 tokens (all one-hour, 2× write premium)
input_tokens total	9,923 tokens
output_tokens total	446,687 tokens
cc / output ratio	4.52×

The cc/output ratio in a normally-cached session is typically observed at 1–2×. This session's 4.52× is anomalously high. The session includes two legitimate expiry transitions (>1h) where TTL was genuinely exceeded; those are expected costs. However, the f3f5c819 → e17536e7 transition (28.2 minutes, 40,260 cc) produced a total cache miss despite being a resume within TTL. Those 40,260 tokens of cache_creation would not have been consumed had the resume preserved the prefix.

One-hour cache writes cost 2× the standard input price, so rewriting 40,260 tokens costs about 80,520 input-token equivalents, versus about 4,026 had the same tokens been read at 0.1×. When a write of this size recurs on every session resume, the cost accumulates quickly in agent operations that include frequent automatic restarts.
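The figures in this section can be checked with a few lines of arithmetic. The sketch below assumes that all 40,260 rewritten tokens would otherwise have been served as cache reads; the count of 20 resumes is a hypothetical example, not a measurement.

```python
WRITE_1H = 2.0   # one-hour cache write multiplier
READ = 0.1       # cache read multiplier


def resume_overhead_units(rewritten_tokens: int, resumes: int = 1) -> float:
    """Extra input-token equivalents paid when tokens are rewritten, not read."""
    return resumes * rewritten_tokens * (WRITE_1H - READ)


# Session totals from the table above.
cache_creation_total = 2_017_110
output_total = 446_687
cc_output_ratio = cache_creation_total / output_total

print(f"cc/output ratio: {cc_output_ratio:.2f}x")
print(f"one resume: {resume_overhead_units(40_260):.0f} units")
print(f"20 resumes (hypothetical): {resume_overhead_units(40_260, resumes=20):.0f} units")
```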

For subscription-plan users, this cost is drawn directly from the rolling five-hour quota. cache_creation exhausts quota faster than cache_read. In overnight agent auto-restart patterns, resume-induced cache_creation can occupy a substantial portion of the quota, reducing the quota available for actual task processing.

Recommended Responses

Confirm and track issue status: GitHub issue #42338 was closed following the v2.1.90–92 patches, but reproduction is confirmed on v2.1.116. A new report has been submitted as issue #51764. Users experiencing the same behavior are advised to add their reproduction environment and version to that issue. Accumulated reproduction cases can influence patch prioritization.

Confirm patch scope: The official release notes for the v2.1.90–92 patches do not specify the scope of the fix in detail. The block-composition patterns arising in environments that combine custom agents + MCP servers + skills + hooks may have been omitted from the patch scope validated against simpler environments. Users operating that combination are advised either to verify via direct reproduction testing or to add their environment details to the issue as a request for confirmation from Anthropic.

Request --no-deferred-tools-delta opt-out flag: Given that the deferred_tools_delta feature was introduced as a performance improvement, providing an opt-out flag is more practical than removing the feature entirely. For harness users or long-running agent operators, cache stability may take precedence over full feature availability. A request for a --no-deferred-tools-delta or --strict-prefix flag is included in the Asks section of issue #51764.

Reproduction Steps

1. Start a Claude Code session and accumulate context to ≥100k tokens, including CLAUDE.md, tool definitions, and prior conversation history.
2. Exit the session with /exit and resume within 1–5 minutes using claude --continue. The resume interval must be kept within the one-hour TTL.
3. On the first assistant turn of the resumed session, open ~/.claude/projects/<slug>/<sid>.jsonl and check the cache_read_input_tokens and cache_creation_input_tokens fields in the usage object for the corresponding requestId. If a monitoring proxy is available, per-turn usage queries are also feasible.
4. Collect the control: within the same session, without using /exit, execute the next turn after an identical idle gap. This must be a consecutive turn within the same jsonl file. The baseline should show ≥95% cache_read.
5. Compare hit rates between the two cases. If the post-resume hit rate is substantially lower (particularly in the 0–41% range), the bug is considered reproduced. Adding the reproduction environment (version, list of active features) to issue #51764 benefits the community.
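The final comparison step can be expressed as a small helper. The usage field names are the ones named in the steps above; the 95% baseline floor follows the control step, while the 50 pp minimum drop is an assumed threshold chosen for illustration and can be tightened or loosened.

```python
def hit_rate_pct(usage: dict) -> float:
    """cache_read share of all cache tokens for one turn, in percent."""
    cr = usage.get("cache_read_input_tokens", 0)
    cc = usage.get("cache_creation_input_tokens", 0)
    return 100.0 * cr / (cr + cc) if (cr + cc) else 0.0


def is_reproduced(baseline_usage, resume_usage,
                  min_baseline=95.0, min_drop_pp=50.0):
    """True when a healthy baseline is followed by a large post-resume drop."""
    base = hit_rate_pct(baseline_usage)
    test = hit_rate_pct(resume_usage)
    return base >= min_baseline and (base - test) >= min_drop_pp


# Values from this report: a 4.0-minute baseline turn and the 4.1-minute resume.
baseline = {"cache_creation_input_tokens": 3_209, "cache_read_input_tokens": 107_456}
resumed = {"cache_creation_input_tokens": 23_849, "cache_read_input_tokens": 16_702}

print(is_reproduced(baseline, resumed))
```

Requiring a healthy baseline first guards against flagging sessions where caching was already degraded for an unrelated reason.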

Conclusion

The deferred_tools_delta-based cache-destruction bug is reproduced on v2.1.116. The TTL-matched control-pair methodology structurally supports the conclusion that the session-resume event itself is the cause of cache destruction, by removing TTL expiry as a variable. The hit-rate deltas of −56 pp at ~4 minutes and −99 pp at ~28 minutes represent functional cache invalidation, not mere performance degradation.

The patterns most materially affected by this issue are as follows. First, long-running agents where session restarts due to /compact or errors are frequent. Second, monitoring agents with overnight auto-restart structures. Third, complex harness environments maintaining contexts of 100k tokens or more.

The fact that Anthropic attempted a patch in v2.1.90–92 confirms the issue is real. Reproduction on v2.1.116 suggests either that the fix was incomplete or that a code path that re-triggers the behavior under certain environment combinations remains. It is expected that issue #51764 will be maintained in a trackable state until this problem is resolved.

References

1. GitHub Issue #42338, original report (closed, locked): https://github.com/anthropics/claude-code/issues/42338
2. GitHub Issue #34629, --print --resume regression since v2.1.69: https://github.com/anthropics/claude-code/issues/34629
3. GitHub Issue #46829, silent 1h→5m default TTL regression: https://github.com/anthropics/claude-code/issues/46829
4. GitHub Issue #51764, this report (open): https://github.com/anthropics/claude-code/issues/51764
5. ArkNill, Claude Code cache analysis: https://github.com/ArkNill/claude-code-cache-analysis
6. simpolism, breaking attachment set analysis gist: https://gist.github.com/simpolism/302621e661f462f3e78684d96bf307ba
7. Anthropic Prompt Caching official documentation: https://platform.claude.com/docs/en/build-with-claude/prompt-caching

MaJu Tech Notes