"Claude Code Token & Cache Cost Breakdown 2026 — The 1.25× vs 2× Multiplier Trap"
Official pricing, cache minimums, /usage tracking, and 10 ways to cut tokens
핵심 요약
- Audience: Claude Code users who were surprised by a monthly bill, or teams estimating budget before rollout.
- What you'll get: Current (April 2026) per-model pricing, cache multipliers (1.25× / 2× / 0.1×), per-model minimum cacheable tokens,
/usagefor live tracking, the official 10-item reduction playbook, and the hidden costs (extended thinking, agent teams, background). - Prerequisite: Claude Code installed (install guide). Pro/Max subscribers pay a flat rate — the numbers below don't apply to your bill, only to usage-bar planning.
1. Two billing axes — tokens and cache
Claude Code charges based on Anthropic API token consumption. Cost comes from two axes.
- Input / output tokens — every turn bills separately for the prompt (input) and the response (output).
- Prompt cache — caching the system prompt, CLAUDE.md, or long context makes retransmissions of the same content cost 0.1×. But cache writes are more expensive than standard input (1.25× or 2×).
Without grasping these two axes, you can't explain "why did the second response cost less than the first?"
Pro/Max/Team/Enterprise subscribers pay a flat monthly rate — per-token costs don't map to billing. What matters is the usage-bar limit per plan. The table below is for Anthropic Console pay-as-you-go.
2. Official pricing (April 2026, per million tokens)
Per Anthropic's pricing page.
| Model | Input | 5-min write | 1-hour write | Cache read | Output |
|---|---|---|---|---|---|
| Opus 4.7 | $5 | $6.25 | $10 | $0.50 | $25 |
| Opus 4.6 | $5 | $6.25 | $10 | $0.50 | $25 |
| Opus 4.5 | $5 | $6.25 | $10 | $0.50 | $25 |
| Sonnet 4.6 | $3 | $3.75 | $6 | $0.30 | $15 |
| Sonnet 4.5 | $3 | $3.75 | $6 | $0.30 | $15 |
| Haiku 4.5 | $1 | $1.25 | $2 | $0.10 | $5 |
| Haiku 3.5 | $0.80 | $1.00 | $1.60 | $0.08 | $4 |
Observations: - Output is 5× input. Long responses drive cost fastest. - Cache writes are a one-time premium. Reads that follow are 0.1×. - Opus is 5× Sonnet. Model choice is a major lever.
3. Cache multiplier structure — 1.25× vs 2× vs 0.1×
Official rules:
| Operation | Multiplier | Meaning |
|---|---|---|
| 5-min TTL write (default ephemeral) | 1.25× input | Wins if reused within 5 minutes |
| 1-hour TTL write (extended) | 2× input | Wins for 5-min to 1-hour gaps |
| Cache read (hit/refresh) | 0.1× input | 90% discount on retransmitting same content |
Example — Sonnet 4.6, a 50,000-token context reused 10 times in one session:
- No cache: 50,000 × 10 × $3/MTok = $1.50 (input only)
- 5-min cache: first write 50,000 × $3.75/MTok + reads 50,000 × 9 × $0.30/MTok = $0.19 + $0.135 = $0.325 (~78% savings)
- 1-hour cache: first write 50,000 × $6/MTok + reads 50,000 × 9 × $0.30/MTok = $0.30 + $0.135 = $0.435 (~71% savings)
If you reuse within 5 minutes, 5-min TTL always wins. But idle over 5 minutes invalidates the cache and you pay full write cost again — 1-hour wins in that regime.
3.1 What TTL does Claude Code use?
Caveat: The official
/costspage doesn't explicitly specify Claude Code's default TTL. Observationally, Claude Code targets cross-session resume use, behaving as a 1-hour TTL-only client. Budget with the 2× multiplier to be safe.
So Claude Code's upside is "cache hits survive long sessions". The downside is "cache write premium is 2×, not 1.25×".
4. Per-model minimum cacheable tokens
If the cacheable portion of a request is below the minimum, caching silently doesn't apply (no error, just bypassed).
| Model | Minimum cacheable tokens |
|---|---|
| Opus 4.7 / 4.6 / 4.5 | 4,096 |
| Sonnet 4.6 | 2,048 |
| Sonnet 4.5 | 1,024 |
| Haiku 4.5 | 4,096 |
| Haiku 3.5 | 2,048 |
Using Opus on a project with a short CLAUDE.md often misses the 4K threshold. In that regime Sonnet is more economical.
4.1 Cache breakpoint limit
Max 4 cache_control breakpoints per request. You decide where to draw cache boundaries (system prompt / tool definitions / long files). As a user you don't configure this directly — Claude Code handles it internally.
5. Track live cost with /usage
Run /usage (aliases /cost, /stats) in-session to see tokens, cost, and code change counts.
Official example:
Total cost: $0.55
Total duration (API): 6m 19.7s
Total duration (wall): 6h 33m 10.2s
Total code changes: 0 lines added, 0 lines removed
The dollar figure is locally estimated from token counts. Authoritative billing is on the Anthropic Console Usage page.
5.1 Show on status line
/statusline can keep context/tokens/cost in the terminal bar continuously — recommended.
/statusline
Interactive setup; can auto-configure from your shell prompt.
5.2 Enterprise averages (reference)
Published numbers: - Average ~$13/developer/active day - Monthly $150–250/developer - 90% of users stay under $30/active day
These are enterprise API deployments. Individual Pro/Max experience varies significantly.
6. The 10-item official reduction playbook
6.1 /clear between tasks
When switching to unrelated work, /clear. Stale context wastes tokens on every subsequent message.
6.2 /compact with focus instructions
/compact Focus on code samples and API usage
Telling Claude what to preserve avoids losing key context.
Or set a default in CLAUDE.md:
When you are using compact, please focus on test output and code changes
6.3 Model choice — Sonnet default, Opus when complex
Sonnet handles most coding well and costs 5× less than Opus. Reserve Opus for architectural decisions or multi-step reasoning.
6.4 Cut MCP server overhead
MCP tool definitions are deferred by default — only tool names enter context until Claude invokes a specific tool. Use /context to see what's consuming space. Disable unused servers with /mcp.
Prefer CLI tools (gh, aws, gcloud, sentry-cli) — they add no per-tool listing.
6.5 Code intelligence plugins
For typed languages, install a code intelligence plugin. A "go to definition" call replaces grep + multiple file reads.
6.6 Preprocess with hooks
A hook can grep ERROR from a 10,000-line log before Claude ever sees it. Official example — PreToolUse hook filtering test output to failures only:
.claude/settings.json:
{
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [{"type": "command", "command": "~/.claude/hooks/filter-test-output.sh"}]
}
]
}
}
filter-test-output.sh:
#!/bin/bash
input=$(cat)
cmd=$(echo "$input" | jq -r '.tool_input.command')
if [[ "$cmd" =~ ^(npm test|pytest|go test) ]]; then
filtered_cmd="$cmd 2>&1 | grep -A 5 -E '(FAIL|ERROR|error:)' | head -100"
echo "{\"hookSpecificOutput\":{\"hookEventName\":\"PreToolUse\",\"permissionDecision\":\"allow\",\"updatedInput\":{\"command\":\"$filtered_cmd\"}}}"
else
echo "{}"
fi
Tens of thousands of tokens become hundreds. Full hook mechanics: hooks complete guide.
6.7 Move CLAUDE.md into skills
CLAUDE.md is loaded every session. Move detailed workflow instructions (PR reviews, DB migrations) to skills — they load on demand. Keep CLAUDE.md under 200 lines.
6.8 Tune extended thinking
Extended thinking is on by default. Thinking tokens bill as output tokens — default budget can be tens of thousands per request. For simple tasks, this is waste.
/effort low
MAX_THINKING_TOKENS=8000 claude
6.9 Delegate verbose work to subagents
Test runs, documentation fetches, and log parsing consume significant context. Delegate these to subagents — verbose output stays in the subagent's context, only a summary returns.
6.10 Write specific prompts
"Improve this codebase" triggers broad scanning. "Add input validation to the login function in auth.ts" finishes with minimal file reads.
7. Three hidden costs
7.1 Background
Claude Code uses some tokens while idle:
- Conversation summarization jobs (for claude --resume)
- Status-checking commands
Official: under $0.04 per session. Small, but multiplies across many open sessions.
7.2 Agent teams
A feature-flag capability (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1) that spawns multiple instances. Each teammate runs its own context window, so plan-mode costs are ~7× a standard session.
Mitigation: - Use Sonnet for teammates (never Opus) - Keep teams small - Keep spawn prompts focused - Clean up when done
7.3 Default extended thinking on Opus
Complex work on Opus 4.7 can push thinking to hundreds of thousands of tokens. Make checking /usage a habit.
8. Team rollout — TPM/RPM recommendations
Official per-user recommendations by org size (Anthropic API plans):
| Team size | TPM/user | RPM/user |
|---|---|---|
| 1-5 | 200k-300k | 5-7 |
| 5-20 | 100k-150k | 2.5-3.5 |
| 20-50 | 50k-75k | 1.25-1.75 |
| 50-100 | 25k-35k | 0.62-0.87 |
| 100-500 | 15k-20k | 0.37-0.47 |
| 500+ | 10k-15k | 0.25-0.35 |
Larger teams get lower per-user allocations because concurrency drops. Limits apply at the org level, so individuals can temporarily exceed their share.
9. PostToolUse hook for cost logging
/usage shows the current session only. For long-term tracking, append to a file with a hook.
.claude/hooks/log-cost.sh:
#!/bin/bash
INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name')
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id')
TS=$(date +%s)
echo "$TS | $SESSION_ID | $TOOL_NAME" >> "$CLAUDE_PROJECT_DIR/tasks/log/tool-usage.tsv"
exit 0
.claude/settings.local.json:
{
"hooks": {
"PostToolUse": [
{
"matcher": "*",
"hooks": [
{"type": "command", "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/log-cost.sh"}
]
}
]
}
}
The example only counts tool calls, but parsing /usage output in a PostToolBatch or Stop hook yields precise monthly stats.
10. Counter-scenarios — when this optimization doesn't apply
- Individual Pro/Max subscribers → flat-rate billing; per-token cost is irrelevant. Only the usage-bar matters. Pro hits a monthly cap → wait/upgrade.
- Short sessions only → below the cache minimum (Opus 4K, Sonnet 2K), caching doesn't kick in. No effect from optimization.
- Prototyping / exploration → speed beats cost. Overusing
/compactor model switching disrupts thinking flow. - Enterprise API deployment → team-wide tracking is better handled by a gateway (LiteLLM per official docs) than per-user hooks.
11. What's next
With cost structure in hand:
- Hooks complete guide — the foundation for preprocessing and cost-logging hooks above.
- CLAUDE.md authoring guide — keep it under 200 lines; move details to skills.
- Slash commands complete reference —
/usage,/context,/compact,/clear,/model,/effortall documented.
References
- Official Costs docs
- Official Pricing page
- Anthropic Prompt Caching reference
- Anthropic Console Usage
This is post 5/15 in the "AI Coding CLI Entry Guide" series. last verified: 2026-04-25 (per Anthropic official pricing and Claude Code Costs docs).
댓글
댓글 쓰기