"LLM Reasoning Modes (3/6) — Claude effort in Full: low·medium·high·xhigh·max"
Part 2 covered Claude's Thinking (whether to think, and when). Part 3 takes on the dial that controls the depth of that thinking and of the whole response: `effort`. We dissect what `low` / `medium` / `high` / `xhigh` / `max` each change, not only in thinking but also in tool calls and preamble.
Claude's reasoning control has two layers — Thinking (whether and when to think) and effort (how deep, and how many tokens across the whole response). Part 2 looked at the first; Part 3 decomposes effort to the end. What makes effort unusual is that a single dial controls not just thinking depth but every token in the response.
In One Paragraph
Claude's `effort` is a GA parameter set via `output_config: {effort: "low"|"medium"|"high"|"xhigh"|"max"}` (no beta header). It first appeared on Opus 4.5 (then behind the beta header `effort-2025-11-24`) and went GA from 4.6. The default is `high` on both the API and Claude Code, and setting `high` is identical to omitting the parameter. effort's defining trait: it controls all tokens in the response (text and explanations, tool calls and function arguments, thinking depth), not just thinking. It is a behavioral signal, not a strict token cap: at low effort Claude still thinks on hard problems, just less than it would at higher effort for the same problem.
1. The Parameter: What and How
effort lives inside output_config on the message request.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},  # one of: low, medium, high, xhigh, max
    messages=[{"role": "user", "content": "..."}],
)
```
The facts, pinned down:
- Parameter: `output_config: {effort: "low"|"medium"|"high"|"xhigh"|"max"}`. GA, no beta header required.
- First introduced on Opus 4.5 (originally behind the beta header `effort-2025-11-24`). GA from 4.6 onward.
- Default = `high`. Setting `high` is exactly the same as omitting the parameter. This default applies on both the API and Claude Code.
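Because `high` is the default, a request builder can simply leave the parameter out when the caller asks for `high`. The following is a minimal sketch, not part of the SDK: `build_request_kwargs` is a hypothetical helper that only assembles the keyword arguments used in the example above.

```python
VALID_EFFORTS = ("low", "medium", "high", "xhigh", "max")
DEFAULT_EFFORT = "high"  # per the article: omitting effort behaves as "high"


def build_request_kwargs(model, messages, effort=None, max_tokens=16000):
    """Assemble keyword arguments for client.messages.create (hypothetical helper)."""
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"unknown effort level: {effort!r}")
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "messages": messages,
    }
    # Only send output_config when the level differs from the default.
    if effort is not None and effort != DEFAULT_EFFORT:
        kwargs["output_config"] = {"effort": effort}
    return kwargs
```

The result can be passed on as `client.messages.create(**kwargs)`; whether to send `high` explicitly or omit it is a style choice, since the article states the two are equivalent.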
2. What effort Controls — Every Token in the Response
This is why effort is more than a "thinking knob." effort controls all tokens in the response together.
- (a) text and explanations: how detailed the answer body is, how much it spells out.
- (b) tool calls and function arguments: how often and how it invokes tools.
- (c) extended/adaptive thinking depth: how deeply it thinks.
Bundling these three under a single dial is what separates effort from a pure thinking control. It doesn't only change thinking depth — the same knob also moves the verbosity of explanations and the tool-use pattern at the same time.
And effort is a behavioral signal, not a strict token budget. Setting low doesn't forbid thinking on hard problems — Claude still thinks on them, just less than it would at higher effort for the same problem. effort tells the model "what posture of effort to bring to this work"; it is not a hard ceiling that cuts the token counter off.
3. effort and Tool Use — The Same Dial Shifts Agent Behavior
That effort also governs tool calls is felt most in agentic workflows. Low and high effort diverge in behavior even on the same task.
| | Lower effort | Higher effort |
|---|---|---|
| Tool calls | Combine ops into fewer calls, fewer calls total | More calls, more granular |
| How it starts | Acts without preamble | Explains the plan before acting |
| Completion report | Terse confirmation | Detailed change summary |
| Code comments | Minimal | More comments |
So lower effort is "say less, batch it, move fast," and higher effort is "lay out the plan, be thorough." Because the same effort dial pulls thinking depth and tool behavior together, lowering effort makes the response shorter while also consolidating and reducing tool calls.
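One way to observe this divergence on your own workload is to count tool calls and output tokens per response at each effort level. A minimal sketch, assuming response objects shaped like the Python SDK's `Message` (a `content` list whose blocks carry a `type` field, plus `usage.output_tokens`); `summarize_run` is a hypothetical helper name.

```python
def summarize_run(response):
    """Count tool-use blocks and output tokens in one Messages API response."""
    tool_calls = sum(1 for block in response.content if block.type == "tool_use")
    return {
        "tool_calls": tool_calls,
        "output_tokens": response.usage.output_tokens,
    }
```

Running the same task at `low` and at `high` and comparing the two summaries gives a rough, workload-specific picture of how much the dial moves tool behavior and verbosity.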
4. Per-Level Guidance — Anthropic's effort Doc
The recommended use for each level, taken straight from Anthropic's effort documentation:
| Level | Description | Use case |
|---|---|---|
| `max` | Absolute maximum capability, no constraint on token spend, deepest reasoning | Reserve for genuinely frontier problems. On most workloads it adds significant cost for small gains, and on structured-output / less-intelligence-sensitive tasks it can lead to overthinking |
| `xhigh` | Extended capability for long-horizon work | Agentic/coding runs over ~30 min, token budgets in the millions. The recommended starting point for coding/agentic on Opus 4.7/4.8 |
| `high` | High capability (= the default) | Complex reasoning, hard coding, agentic. Often the sweet spot for quality vs token efficiency |
| `medium` | Balanced, moderate token savings | The drop-in for average workflows |
| `low` | Most efficient, significant savings, some capability loss | Simple tasks, subagents, high-volume, latency-sensitive work |
The principle is the one repeated since Part 1 — not "always maximum" but match the dial to task difficulty. Even max is not a universal win; on a narrow output space it can actually reduce quality.
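The table can be turned into a rough selection rule. This is a sketch under stated assumptions: the boolean flags and the function name `suggest_effort` are hypothetical, and collapsing each row to a single flag is a deliberate simplification of the guidance, not Anthropic's own decision procedure.

```python
def suggest_effort(*, frontier=False, long_horizon_agentic=False,
                   complex_reasoning=False, simple_or_high_volume=False):
    """Map coarse task traits to an effort level, checking the table top to bottom."""
    if frontier:
        return "max"      # genuinely frontier problems only
    if long_horizon_agentic:
        return "xhigh"    # long agentic/coding runs
    if complex_reasoning:
        return "high"     # also the API default
    if simple_or_high_volume:
        return "low"      # subagents, latency-sensitive work
    return "medium"       # balanced fallback for average workflows
```

For example, `suggest_effort(simple_or_high_volume=True)` yields `"low"`, while a call with no flags falls through to `"medium"`.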
5. 4.7/4.8 Respect effort More Strictly
Across generations, the interpretation of effort changed too. Opus 4.7/4.8 respect effort more strictly than 4.6 — especially at low/medium, they scope work to exactly what's asked. They don't expand into work you didn't request.
Two practical rules follow.
- If reasoning is shallow on a hard task, raise effort rather than prompting around it. Bumping effort a level is more direct and behaves as intended, versus contorting the prompt with "think harder."
- At `xhigh`/`max`, set a large `max_tokens` (start ~64k). The model needs room to think and then act across tool calls and subagents. If you dial effort up but leave `max_tokens` tight, the response gets cut off before the model can finish.
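The second rule is easy to enforce in code. A minimal sketch, assuming a hypothetical helper `adjust_max_tokens` and treating the "start ~64k" guidance as a floor of 64,000 (the exact number is a rule of thumb, not an API requirement):

```python
HIGH_EFFORT_LEVELS = {"xhigh", "max"}
SUGGESTED_MIN_MAX_TOKENS = 64_000  # the article's "start ~64k" rule of thumb


def adjust_max_tokens(effort, max_tokens):
    """Raise max_tokens to the suggested floor when effort is xhigh or max."""
    if effort in HIGH_EFFORT_LEVELS:
        return max(max_tokens, SUGGESTED_MIN_MAX_TOKENS)
    return max_tokens
```

Lower levels are passed through untouched, so the helper only intervenes where a tight ceiling would truncate a deep run.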
6. effort × thinking Interaction — Per Model
effort and Thinking (Part 2) are separate knobs, but how they mesh differs per model.
| Model | Thinking mode | Relation to effort |
|---|---|---|
| Fable 5 / Mythos 5 | adaptive, always on | effort controls depth; {type:"disabled"} returns 400 |
| Opus 4.8 / 4.7 | adaptive thinking | effort is the recommended depth control; manual budget_tokens returns 400 |
| Opus 4.6 | adaptive thinking | effort recommended; budget_tokens deprecated but still accepted |
| Sonnet 4.6 | adaptive thinking | effort controls thinking depth |
| Opus 4.5 | manual thinking (budget_tokens) | effort works alongside the thinking budget |
The read is simple. On current models (4.6-4.8, Fable 5), the official knob to raise or lower thinking depth is effort. Pinning thinking volume directly with budget_tokens is fully supported only on Opus 4.5: Opus 4.6 still accepts it but marks it deprecated, and Opus 4.7/4.8 reject it with a 400 (see Part 2).
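The Opus rows of the table can be encoded as a small guard that runs before a request is sent. This is a sketch, not SDK behavior: the family labels are illustrative keys rather than real model IDs, `thinking_config` is a hypothetical helper, and the statuses are copied from the table above.

```python
import warnings

# Illustrative family labels (not real model IDs) mapped to how the table
# above describes budget_tokens handling.
BUDGET_TOKENS_STATUS = {
    "opus-4.5": "supported",   # manual thinking
    "opus-4.6": "deprecated",  # still accepted
    "opus-4.7": "rejected",    # returns 400
    "opus-4.8": "rejected",    # returns 400
}


def thinking_config(family, budget_tokens=None):
    """Return a thinking parameter consistent with the table (hypothetical helper)."""
    status = BUDGET_TOKENS_STATUS.get(family)
    if status is None:
        raise ValueError(f"no rule recorded for {family!r}")
    if status == "supported":
        if budget_tokens is None:
            raise ValueError("manual thinking needs an explicit budget_tokens")
        return {"type": "enabled", "budget_tokens": budget_tokens}
    if budget_tokens is None:
        return {"type": "adaptive"}  # depth is then steered by effort
    if status == "rejected":
        raise ValueError(f"{family} rejects budget_tokens; use effort instead")
    warnings.warn("budget_tokens is deprecated on this family", DeprecationWarning)
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

Failing locally with a clear message is usually cheaper than discovering the 400 from the API, which is the only reason this guard exists.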
7. Model Support Matrix
effort levels have different support per model. Here's the table to check before sending.
| effort level | Supported models |
|---|---|
| `low` / `medium` / `high` | Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8, Sonnet 4.6, Fable 5 |
| `max` | Opus 4.6 and later, Sonnet 4.6, Fable 5. Not Opus 4.5, not Haiku, not older Sonnets |
| `xhigh` | Added on Opus 4.7, also Opus 4.8, Fable 5 (sits between `high` and `max`) |
Additional notes:
- Sending effort to Sonnet 4.5 / Haiku 4.5 returns an error.
- `xhigh` is the intermediate step, deeper than `high` and lighter than `max`, introduced on Opus 4.7.
- `max` is not available on Opus 4.5 (from 4.6 onward).
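The matrix above translates directly into a lookup that can be checked before sending. A minimal sketch: the family labels are illustrative keys, not real model IDs, and `is_effort_supported` is a hypothetical helper. Families missing from the mapping (such as Sonnet 4.5 or Haiku 4.5) are treated as not supporting effort at all.

```python
BASE = {"low", "medium", "high"}

# Transcribed from the support table; keys are illustrative labels only.
EFFORT_SUPPORT = {
    "opus-4.5": BASE,
    "opus-4.6": BASE | {"max"},
    "sonnet-4.6": BASE | {"max"},
    "opus-4.7": BASE | {"xhigh", "max"},
    "opus-4.8": BASE | {"xhigh", "max"},
    "fable-5": BASE | {"xhigh", "max"},
}


def is_effort_supported(family, effort):
    """Return True if the table lists this effort level for the model family."""
    return effort in EFFORT_SUPPORT.get(family, set())
```

A caller might use it to downgrade gracefully, for instance falling back from `xhigh` to `high` on Opus 4.6 instead of letting the request fail.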
8. Task Budgets — A Separate, Complementary Control
If effort controls per-turn depth, there's a separate knob for the cumulative token spend across a whole loop — Task Budgets.
- Parameter: `output_config: {task_budget: {type: "tokens", total: N}}`. Beta header `task-budgets-2026-03-13`, minimum 20,000.
- Behavior: tells the model how many tokens it has for a full agentic loop. The model sees a running countdown and self-moderates against it.
- Difference from `max_tokens`: `max_tokens` is an enforced per-response ceiling the model is not aware of. A Task Budget is a loop-level budget the model is aware of and self-moderates against.
In short — effort = per-turn depth, Task Budget = cumulative loop spend. The two don't compete; they complement. Set depth with effort, and cap the total of a long agentic run by letting the model self-moderate against a Task Budget.
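Putting the two controls side by side, a request needs the beta header plus an `output_config` carrying both settings. The sketch below only builds those two pieces as plain dictionaries. `task_budget_request` is a hypothetical helper, and placing `effort` and `task_budget` in the same `output_config` object is an assumption drawn from the parameter shapes quoted in this article, not something verified against a live endpoint.

```python
TASK_BUDGET_BETA = "task-budgets-2026-03-13"  # beta header value from the article
MIN_TASK_BUDGET = 20_000                      # documented minimum


def task_budget_request(effort, total_tokens):
    """Build the beta header and output_config for a budgeted agentic loop."""
    if total_tokens < MIN_TASK_BUDGET:
        raise ValueError(f"task budget must be at least {MIN_TASK_BUDGET} tokens")
    headers = {"anthropic-beta": TASK_BUDGET_BETA}
    output_config = {
        "effort": effort,  # per-turn depth
        "task_budget": {"type": "tokens", "total": total_tokens},  # loop-wide spend
    }
    return headers, output_config
```

How the header is attached depends on the client in use; with the Python SDK, the per-request `extra_headers` option is one plausible route.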
What Comes Next
That's effort as seen from the API surface. Part 4 looks at the same effort in Claude Code in practice — the five CLI levels, /effort and the CLAUDE_CODE_EFFORT_LEVEL env var, session persistence, and what ultracode actually is (it shows up in the menu but is not an API-level effort).
This article's parameter values, defaults, and per-model support are grounded in Anthropic's primary effort documentation and the model migration guide.
Series overview: Series index