"LLM Reasoning Modes (2/6) — Claude's Thinking: From Fixed Budget to Adaptive"

Part 1 established that thinking costs. Part 2 looks at how Claude controls that thinking. The pivot is the move from a fixed token budget (budget_tokens) to adaptive thinking, where the model decides for itself.

Claude's reasoning control has two layers: Thinking (whether to think, and how deep) and effort (the token budget for the whole response). They work together but are separate knobs. Part 2 covers the first — Thinking — and effort is Part 3.

In One Paragraph

Old Claude gave thinking a fixed token budget via thinking: {type: "enabled", budget_tokens: N}. Current models dropped that for adaptive thinking (thinking: {type: "adaptive"}) — the model decides when and how much to think per request. budget_tokens is deprecated (still works) on Opus 4.6 / Sonnet 4.6, and removed (400 error) on Opus 4.7, 4.8, and Fable 5. Thinking depth is now controlled by effort. The raw chain of thought is never returned; display only toggles a summary.


1. Extended Thinking: The Fixed-Budget Era

Claude's first way of exposing thinking was an explicit token budget.

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[...],
)

The rules were simple.

  • budget_tokens must be less than max_tokens.
  • Minimum 1024.
  • The developer pins "this call may spend up to N tokens on thinking."

The problem is that this is rigid. Reserving a budget on an easy question wastes it; too small a budget on a hard question truncates the thinking. The developer had to estimate difficulty and set a number per call.

2. Why the Fixed Budget Was Dropped — Adaptive Thinking

Current Claude uses adaptive thinking.

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[...],
)

The key difference is who decides. A fixed budget lets the developer pin thinking volume; adaptive lets the model look at the request and decide when and how much to think. It thinks briefly or skips on easy questions and thinks longer on hard ones.

There's a side effect too. Adaptive thinking automatically enables interleaved thinking (thinking between tool calls) — no beta header required. In an agent loop, the "look at the tool result, plan the next step" reasoning slots in naturally.

So what controls thinking depth? Effort (Part 3). Adaptive thinking leaves "whether and when" to the model; effort sets the tone for "how deep." Their combination replaces the single fixed budget.

3. Per-Model Status — What Works and What's Blocked

This transition is staged differently per model. Here's the table you'll hit most often when migrating.

Model budget_tokens (fixed budget) Adaptive thinking Notes
Opus 4.5 Used (manual thinking) effort works alongside the thinking budget
Opus 4.6 Deprecated (still allowed) Recommended transitional escape hatch only
Sonnet 4.6 Deprecated (still allowed) Recommended defaults to high effort
Opus 4.7 Removed → 400 Used thinking is off if not specified
Opus 4.8 Removed → 400 Used same request surface as 4.7
Fable 5 Removed → 400 Always on omit thinking entirely; disabled also 400s

How to read it:

  • Opus 4.7 / 4.8: you must explicitly set thinking: {type: "adaptive"} for thinking to turn on. Omitting the field runs without thinking. Sending budget_tokens returns 400.
  • Fable 5: thinking is always on. Omit the thinking field entirely; {type: "disabled"} returns 400.
  • Opus 4.5: the one model that uses manual thinking (budget_tokens) together with effort.

For new code the answer is singular — don't use budget_tokens; go with thinking: {type: "adaptive"} + effort. The very concept of a "fixed thinking budget" is on its way out.

4. Thinking Blocks and display — Visible vs Hidden

Thinking happens and is billed, but the raw text is never returned. The response stream contains a thinking-type block, and what it carries is decided by thinking.display.

  • display: "summarized" → a readable summary is included.
  • display: "omitted" → the thinking block arrives but its text is an empty string.

The trap: the default is omitted (on Fable 5, Mythos 5, Opus 4.8, and 4.7). This is a silent change from Opus 4.6, where it was summarized. As a result, a UI that meant to show progress can suddenly look like "a long pause, then the answer." To surface a summary, set it explicitly.

thinking={"type": "adaptive", "display": "summarized"}

display controls visibility only. Under any setting, thinking happens and is billed the same. And under no setting is the raw chain of thought exposed — a summary is the ceiling.

5. Multi-Turn Rule — Echo Thinking Blocks Back Unchanged

In agents and multi-turn there's one rule to keep. When continuing on the same model, send the thinking blocks back exactly as received, unmodified. Even an empty-text block goes back as-is — the API rejects modified blocks, not blocks you've read.

Conversely, when handing off to a different model, those thinking blocks are dropped from the prompt (usually silently, not an error). The drop happens before billing, so it costs nothing and there's nothing to strip out. Ordinary thinking blocks replay freely across models; treat this drop as normal behavior.

6. Thinking and effort (Next Part Preview)

To summarize, current Claude's reasoning control reads like this.

  • adaptive thinking = the model decides "whether and when to think."
  • effort = the tone for "how deep to think, and how many tokens to spend across the whole response."

At high, xhigh, and max effort the model almost always thinks deeply; at lower effort it may skip thinking on easy problems. So effort dials adaptive thinking's eagerness up or down.

Part 3 takes effort head-on — what low / medium / high / xhigh / max each change, not only in thinking but in tool calls and preamble.


Parameter behavior and per-model support are grounded in Anthropic's primary documentation (extended thinking, adaptive thinking, effort) and the model migration guide.

Series overview: Series index

๋Œ“๊ธ€

์ด ๋ธ”๋กœ๊ทธ์˜ ์ธ๊ธฐ ๊ฒŒ์‹œ๋ฌผ

"ML Foundations (9/9) — PyTorch vs TensorFlow, and the Road to Local LLMs"

Agent Memory Engine (2/10) — Building an AI Agent Memory System with SQLite Alone

"ML Foundations (8/9) — Deep Learning Architectures: CNN, RNN, Attention"

"RAG Core Study (14/26) — Evaluation Sets with RAGAS & DeepEval"

"ML Foundations (7/9) — Deep Learning Training: Optimizers, Regularization, Initialization"

AI Agents I Built (5/7) — Building an Automated Blogger API Publishing System

"ML Foundations (6/9) — Neural Networks: From Perceptron to MLP"