Claude and Codex Session Resume: Who Retransmits What?

April 21, 2026

Is Session Resume Really Free?

When resuming a previous session with --continue or --resume, it is tempting to assume the server holds the conversation in memory and picks up where it left off at low cost. This assumption applies equally to Claude Code and OpenAI Codex CLI. The UX of a session "continuing" feels like server-side state persistence, but the underlying mechanics are different.

The short answer: neither system is server-stateful. Both tools replay locally stored conversation history and retransmit it to the server on each resume. The difference lies in the prompt cache policy operating on top of that replay, and in how stably the prefix is reconstructed at replay time. In Claude Code v2.1.116, a bug has been observed in which the prefix is not reassembled stably on resume.

Anthropic Prompt Cache Official Spec

Anthropic's prompt caching is opt-in. The default TTL is 5 minutes; a 1-hour TTL option is available at additional cost.

The pricing structure is as follows:

cache_write (5 min): 1.25× baseline input token cost
cache_write (1 hour): 2× baseline input token cost
cache_read: 0.1× baseline input token cost (90% discount)

High hit rates make this very economical, but the write cost is the key variable. cache_read is cheap, but cache_write at the 1-hour TTL costs twice the baseline input rate. For cache_write to be offset by subsequent cache_read savings, repeated requests must arrive with the same prefix within the TTL window.
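The break-even point follows directly from the multipliers listed above. The sketch below is illustrative arithmetic only: the function name is hypothetical, costs are expressed relative to the baseline input price, and it assumes one cache write followed by reads of the identical prefix that all land inside the TTL window.

```python
import math


def break_even_reads(write_multiplier: float, read_multiplier: float = 0.1) -> int:
    """Smallest number of cache reads after which caching beats no caching.

    With caching, 1 write + n reads costs: write_multiplier + n * read_multiplier
    Without caching, the same 1 + n requests cost: 1 + n
    (both in units of "one uncached transmission of the prefix").
    """
    premium = write_multiplier - 1.0          # extra cost paid on the write
    saving_per_read = 1.0 - read_multiplier   # saved on each later hit
    # Caching wins once n * saving_per_read strictly exceeds the premium.
    return math.floor(premium / saving_per_read) + 1


if __name__ == "__main__":
    print("5-minute TTL:", break_even_reads(1.25), "read(s)")
    print("1-hour TTL:  ", break_even_reads(2.0), "read(s)")
```

Under these assumptions, a 5-minute write pays for itself on the first hit, while a 1-hour write needs two hits: after a single hit the cached path has cost 2.1× against 2.0× uncached.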

Cache invalidation is hierarchical: tools → system → messages in order; if an upper block changes, everything below it is invalidated. The lookback window is 20 blocks, and when a cache entry is actually used, its TTL is automatically refreshed at no additional cost.

Source: Anthropic Prompt Caching Official Documentation

OpenAI Automatic Prompt Cache Spec

OpenAI's cache requires no opt-in. It is applied automatically to requests of 1,024 tokens or more, with a 50% discount on cache hits. The standard TTL is 5–10 minutes, and during off-peak hours it may be retained for up to 1 hour. No manual cache-point marking is required; the system operates via automatic prefix matching.

There are two key differences from Anthropic. First, there is no cache_write premium. No additional cost is incurred when a cache entry is constructed, so a low hit rate does not result in a loss. Second, the discount rate is 50%, lower than Anthropic's 90% (on cache_read). In summary, OpenAI offers a conservative discount with no risk, while Anthropic offers a larger discount but requires paying the write cost upfront.
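The trade-off can be made concrete with a small model of the expected cost per prefix token as a function of hit rate. This is a simplified sketch, not either vendor's billing logic: it assumes every request either fully hits or fully misses, that an Anthropic miss pays the cache_write price, and that results are relative to each provider's own baseline input price (absolute prices differ and are not compared here). All names are hypothetical.

```python
def effective_multiplier(hit_rate: float, read_mult: float, miss_mult: float) -> float:
    """Expected cost multiplier per prefix token, relative to baseline input price."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * read_mult + (1.0 - hit_rate) * miss_mult


# Multipliers taken from the figures quoted in this post.
ANTHROPIC_5M = {"read_mult": 0.1, "miss_mult": 1.25}
ANTHROPIC_1H = {"read_mult": 0.1, "miss_mult": 2.0}
OPENAI = {"read_mult": 0.5, "miss_mult": 1.0}  # no write premium on a miss

if __name__ == "__main__":
    print(f"{'hit rate':>8}  {'Anthropic 5m':>12}  {'Anthropic 1h':>12}  {'OpenAI':>6}")
    for pct in (0, 25, 50, 75, 100):
        h = pct / 100
        print(
            f"{pct:>7}%  "
            f"{effective_multiplier(h, **ANTHROPIC_5M):>12.3f}  "
            f"{effective_multiplier(h, **ANTHROPIC_1H):>12.3f}  "
            f"{effective_multiplier(h, **OPENAI):>6.3f}"
        )
```

In this model the OpenAI multiplier never exceeds 1.0, whereas the Anthropic 1-hour multiplier stays above 1.0 until the hit rate passes roughly 53% (solving 2 − 1.9h < 1).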

Codex CLI Resume Behavior — Correcting a Common Misconception

A common belief holds that "Codex resumes because the server remembers the conversation." This misconception originates from the fact that OpenAI's Responses API supports server-side conversation state via store: true + previous_response_id.

However, direct inspection of the codex-rs open-source code reveals different behavior:

protocol.rs:2525 — The ResumedHistory { conversation_id, history: Vec<RolloutItem>, rollout_path } struct replays the entire local rollout and spawns it as history. There is no previous_response_id field.
thread_manager.rs:531 — The resume_thread_from_rollout function calls RolloutRecorder::get_rollout_history(path) to read a local jsonl file and reconstruct the full conversation.
client.rs:998 — previous_response_id is reused only within a WebSocket session inside the same process. When the process is restarted, the connection is severed.

In other words, Codex CLI does not leverage Responses API server state when resuming after a process restart. It operates on a local rollout replay + server prefix caching structure, fundamentally identical to Claude Code.

This structure has an important implication. The number of tokens transmitted on resume grows in proportion to the length of the prior conversation, and if the prefix misses the server cache, the entire retransmission is billed at the uncached rate. The belief that "resume is free because the server remembers" does not hold for either system.

Evidence — TTL-matched Measurement (Claude Code v2.1.116)

A comparative measurement was conducted to assess how --resume affects the cache hit rate in Claude Code. Same-session consecutive turns (baseline) and cross-session resumed turns (test) were compared at the same idle gap, so that TTL expiry is held constant and cannot explain the difference.

Sample: 14 jsonl files, 197 turns, 11 hours 52 minutes of session time.

Condition	4-min gap	28-min gap
Same session (baseline)	97–99% hit	99% hit
Cross session (resume)	41.2% hit	0% hit
Δ	−56 pp	−99 pp

Under identical TTL conditions, the cache hit rate dropped dramatically when a session was resumed. Inspecting the cache types used internally by Claude Code reveals that only ephemeral_1h (1-hour cache) is in use; ephemeral_5m registers at 0. The fact that hit rates are low despite ample TTL headroom indicates this is a prefix reassembly issue, not a TTL issue.

The identified cause is the deferred_tools_delta and attachment reordering introduced in v2.1.69. On resume, the injection order of skill_listing, todo_reminders, and nested_memory changes, breaking the prefix hash. This analysis was confirmed via simpolism's gist.
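A toy model shows why a changed injection order is enough to lose the cache. It is not Anthropic's actual cache-key scheme, which is not described here. The SHA-256 chaining, the function names, and the block contents are assumptions; only the block labels come from the analysis above.

```python
import hashlib


def prefix_hashes(blocks: list[str]) -> list[str]:
    """Return one cumulative hash per block; entry i covers blocks[0..i]."""
    running = hashlib.sha256()
    hashes = []
    for block in blocks:
        data = block.encode("utf-8")
        # Length-prefix each block so block boundaries stay unambiguous.
        running.update(len(data).to_bytes(8, "big"))
        running.update(data)
        hashes.append(running.copy().hexdigest())
    return hashes


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading blocks whose cumulative hashes still match."""
    count = 0
    for x, y in zip(prefix_hashes(a), prefix_hashes(b)):
        if x != y:
            break
        count += 1
    return count


# Hypothetical block order before and after a resume.
original = ["system", "skill_listing", "todo_reminders", "nested_memory", "user: hello"]
resumed = ["system", "todo_reminders", "skill_listing", "nested_memory", "user: hello"]

if __name__ == "__main__":
    print("reusable blocks:", shared_prefix_len(original, resumed), "of", len(original))
```

In this model, swapping two early blocks leaves only the first block reusable, even though every block's content is unchanged; everything after the first mismatch would have to be written to the cache again.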

Impact — Actual Quota Consumption

Figures observed in the session above (197 turns, 11 hours 52 minutes):

cache_creation_input_tokens / output_tokens ratio (cc/output ratio): 4.52× (normal range: 1–2×)
Within resume segments falling inside the 1-hour TTL window: 40,260 tokens of cache_creation occurred. This is cost that would have been avoided had the prefix been preserved.
In the observed cases, resume-triggered cache regeneration consumed approximately 20% of a Claude Pro/Max subscriber's 5-hour rolling quota.

cache_creation at the 1-hour TTL costs 2× the baseline input rate. Compared to cache_read at 0.1×, this is a 20× more expensive path. When the cache fails to hit and must be reconstructed on every turn, caching — which was expected to reduce costs — ends up increasing them.
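As a worked example, the 40,260 tokens of cache_creation observed inside the TTL window can be converted into baseline-input-token equivalents. This is a rough sketch that assumes all of those tokens would otherwise have been billed as cache_read; the constant and function names are hypothetical.

```python
from decimal import Decimal

WRITE_1H = Decimal("2.0")  # cache_write multiplier at the 1-hour TTL
READ = Decimal("0.1")      # cache_read multiplier


def avoidable_cost(tokens: int) -> Decimal:
    """Extra cost, in baseline-input-token equivalents, of writing instead of reading."""
    return tokens * (WRITE_1H - READ)


if __name__ == "__main__":
    tokens = 40_260
    print("billed as cache_write:", tokens * WRITE_1H)
    print("billed as cache_read: ", tokens * READ)
    print("avoidable difference: ", avoidable_cost(tokens))
    print("write/read price ratio:", WRITE_1H / READ)
```

Under that assumption, the write path costs 80,520 token-equivalents against 4,026 for the read path, a difference of 76,494.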

Response

GitHub Issue #51764 (reproduced on v2.1.116) has been filed to continue the tracking thread from #42338, which was closed. The patches in v2.1.90–92 may have fixed some code paths, but the problem still reproduces in v2.1.116 on the code path that combines custom agents + MCP + skills + hooks.

How to verify directly: Aggregate cache_read_input_tokens and cache_creation_input_tokens per turn from ~/.claude/projects/<slug>/*.jsonl. If the cc/output ratio begins to exceed 2×, there is a high probability that prefix reassembly is being repeated. ArkNill's claude-code-cache-analysis script makes per-session aggregation straightforward. taekim34 documented a detailed reproduction path in the issue thread via an environment reproduction comment, covering the combination of a custom CLAUDE.md with a multi-stage hook setup.
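The aggregation can be sketched in a few lines. This is not ArkNill's script. It assumes that each jsonl line is a JSON object and that usage counters sit under message.usage, which should be checked against your own session files. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Optional

FIELDS = ("cache_read_input_tokens", "cache_creation_input_tokens", "output_tokens")


def aggregate_usage(lines: Iterable[str]) -> dict[str, int]:
    """Sum the usage counters over all lines that carry a message.usage object."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # assumed layout: entries without usage are not model turns
        for field in FIELDS:
            value = usage.get(field)
            if isinstance(value, int):
                totals[field] += value
    return totals


def cc_output_ratio(totals: dict[str, int]) -> Optional[float]:
    """cache_creation_input_tokens / output_tokens, or None if there is no output."""
    output = totals["output_tokens"]
    return totals["cache_creation_input_tokens"] / output if output else None


def report(project_dir: Path) -> None:
    """Print per-session totals for every jsonl file in one project directory."""
    for path in sorted(project_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            totals = aggregate_usage(handle)
        print(path.name, totals, "cc/output:", cc_output_ratio(totals))
```

Calling report(Path.home() / ".claude" / "projects" / "<slug>") with your own project slug prints one line per session file; a ratio that stays well above 2× is the signal described above.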

Practical recommendation: In auto-restart watchdog patterns, consider switching from automatic claude --continue resume to a manual session start after stop. This pays only the CLAUDE.md reload cost and builds the prefix stably from scratch. A predictable cost structure is preferable to repeatedly replaying an unstable prefix.

Summary

Summarizing the resume structure of the two systems:

Claude Code: local jsonl replay + server prefix caching (opt-in, hierarchical invalidation)
Codex CLI: local rollout replay + server prefix caching (automatic, no write premium)

In neither system does the server maintain state and resume "for free." In both, resume carries the cost of retransmitting the entire conversation, and only a cache hit on the prefix reduces it.

Cost structure difference: Anthropic offers a large discount at cache_read 0.1×, but charges a write premium of up to 2×. OpenAI offers a 50% discount with no write premium. Which is cheaper depends on the workload and on how often requests hit the cache within the TTL, so measure actual hit rates before relying on a comparison of list prices.

The resume bug in Claude Code v2.1.116 is a real problem in which cache_write cost is incurred repeatedly for content that should already be cached. Even when the TTL has not expired, prefix reassembly failure causes cache_write to repeat instead of cache_read. The issue is currently tracked under Issue #51764, and the direct measurement method described above can be used to verify reproduction in your own environment.

References

Anthropic Prompt Caching documentation: https://docs.claude.com/en/docs/build-with-claude/prompt-caching
Codex source: https://github.com/openai/codex
Issue #51764: https://github.com/anthropics/claude-code/issues/51764
Issue #42338 (closed/locked): https://github.com/anthropics/claude-code/issues/42338
Issue #34629, #46829 (related)
ArkNill cache analysis: https://github.com/ArkNill/claude-code-cache-analysis
simpolism gist: https://gist.github.com/simpolism/302621e661f462f3e78684d96bf307ba
taekim34 environment reproduction comment: https://github.com/anthropics/claude-code/issues/42338#issuecomment-4174599756

MaJu Tech Notes