"LLM Reasoning Modes (3/6) — Claude effort in Full: low·medium·high·xhigh·max"

6월 15, 2026

Part 2 covered Claude's Thinking (whether to think, and when). Part 3 takes the dial that controls the depth of that thinking and the whole response head-on — effort. We dissect what low / medium / high / xhigh / max each change, not only in thinking but in tool calls and preamble.

Claude's reasoning control has two layers — Thinking (whether and when to think) and effort (how deep, and how many tokens across the whole response). Part 2 looked at the first; Part 3 decomposes effort to the end. What makes effort unusual is that a single dial controls not just thinking depth but every token in the response.

In One Paragraph

Claude's effort is a GA parameter set via output_config: {effort: "low"|"medium"|"high"|"xhigh"|"max"} (no beta header). It first appeared on Opus 4.5 (then behind the beta header effort-2025-11-24) and went GA from 4.6. The default is high — and setting high is identical to omitting the parameter — on both the API and Claude Code. effort's defining trait: it controls all tokens in the response (text and explanations, tool calls and function arguments, thinking depth), not just thinking. It is a behavioral signal, not a strict token cap — at low effort Claude still thinks on hard problems, just less than it would at higher effort for the same problem.

1. The Parameter: What and How

effort lives inside output_config on the message request.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "..."}],
)

The facts, pinned down:

Parameter: output_config: {effort: "low"|"medium"|"high"|"xhigh"|"max"}. GA, no beta header required.
First introduced on Opus 4.5 (originally behind the beta header effort-2025-11-24). GA from 4.6 onward.
Default = high. Setting high is exactly the same as omitting the parameter. This default applies on both the API and Claude Code.

2. What effort Controls — Every Token in the Response

This is why effort is more than a "thinking knob." effort controls all tokens in the response together.

(a) text and explanations: how detailed the answer body is, how much it spells out.
(b) tool calls and function arguments: how often and how it invokes tools.
(c) extended/adaptive thinking depth: how deeply it thinks.

Bundling these three under a single dial is what separates effort from a pure thinking control. It doesn't only change thinking depth — the same knob also moves the verbosity of explanations and the tool-use pattern at the same time.

And effort is a behavioral signal, not a strict token budget. Setting low doesn't forbid thinking on hard problems — Claude still thinks on them, just less than it would at higher effort for the same problem. effort tells the model "what posture of effort to bring to this work"; it is not a hard ceiling that cuts the token counter off.

3. effort and Tool Use — The Same Dial Shifts Agent Behavior

That effort also governs tool calls is felt most in agentic workflows. Low and high effort diverge in behavior even on the same task.

	Lower effort	Higher effort
Tool calls	Combine ops into fewer calls, fewer calls total	More calls, more granular
How it starts	Acts without preamble	Explains the plan before acting
Completion report	Terse confirmation	Detailed change summary
Code comments	Minimal	More comments

So lower effort is "say less, batch it, move fast," and higher effort is "lay out the plan, be thorough." Because the same effort dial pulls thinking depth and tool behavior together, lowering effort makes the response shorter while also consolidating and reducing tool calls.

4. Per-Level Guidance — Anthropic's effort Doc

The recommended use for each level, taken straight from Anthropic's effort documentation:

Level	Description	Use case
`max`	Absolute maximum capability, no constraint on token spend, deepest reasoning	Reserve for genuinely frontier problems. On most workloads it adds significant cost for small gains, and on structured-output / less-intelligence-sensitive tasks it can lead to overthinking
`xhigh`	Extended capability for long-horizon work	Agentic/coding runs over ~30 min, token budgets in the millions. The recommended starting point for coding/agentic on Opus 4.7/4.8
`high`	High capability (= the default)	Complex reasoning, hard coding, agentic. Often the sweet spot for quality vs token efficiency
`medium`	Balanced, moderate token savings	The drop-in for average workflows
`low`	Most efficient, significant savings, some capability loss	Simple tasks, subagents, high-volume, latency-sensitive work

The principle is the one repeated since Part 1 — not "always maximum" but match the dial to task difficulty. Even max is not a universal win; on a narrow output space it can actually reduce quality.

5. 4.7/4.8 Respect effort More Strictly

Across generations, the interpretation of effort changed too. Opus 4.7/4.8 respect effort more strictly than 4.6 — especially at low/medium, they scope work to exactly what's asked. They don't expand into work you didn't request.

Two practical rules follow.

If reasoning is shallow on a hard task, raise effort rather than prompting around it. Bumping effort a level is more direct and behaves as intended, versus contorting the prompt with "think harder."
At xhigh/max, set a large max_tokens (start ~64k). The model needs room to think and then act across tool calls and subagents. Dial effort up but leave max_tokens tight, and the response gets cut off before the model can fully unfold.

6. effort × thinking Interaction — Per Model

effort and Thinking (Part 2) are separate knobs, but how they mesh differs per model.

Model	Thinking mode	Relation to effort
Fable 5 / Mythos 5	adaptive, always on	effort controls depth; `{type:"disabled"}` returns 400
Opus 4.8 / 4.7	adaptive thinking	effort is the recommended depth control; manual `budget_tokens` returns 400
Opus 4.6	adaptive thinking	effort recommended; `budget_tokens` deprecated but still accepted
Sonnet 4.6	adaptive thinking	effort controls thinking depth
Opus 4.5	manual thinking (`budget_tokens`)	effort works alongside the thinking budget

The read is simple. On current models (4.6-4.8, Fable 5), the official knob to raise or lower thinking depth is effort. Pinning thinking volume directly with budget_tokens survives only on Opus 4.5; the newer generations removed it (see Part 2).

7. Model Support Matrix

effort levels have different support per model. Here's the table to check before sending.

effort level	Supported models
`low` / `medium` / `high`	Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8, Sonnet 4.6, Fable 5
`max`	Opus 4.6 and later, Sonnet 4.6, Fable 5 — not Opus 4.5, not Haiku, not older Sonnets
`xhigh`	Added on Opus 4.7, also Opus 4.8, Fable 5 (sits between `high` and `max`)

Additional notes:

Sending effort to Sonnet 4.5 / Haiku 4.5 returns an error.
xhigh is the intermediate step — deeper than high, lighter than max — introduced on Opus 4.7.
max is not available on Opus 4.5 (from 4.6 onward).

8. Task Budgets — A Separate, Complementary Control

If effort controls per-turn depth, there's a separate knob for the cumulative token spend across a whole loop — Task Budgets.

Parameter: output_config: {task_budget: {type: "tokens", total: N}}. Beta header task-budgets-2026-03-13, minimum 20,000.
Behavior: tells the model how many tokens it has for a full agentic loop. The model sees a running countdown and self-moderates against it.
Difference from max_tokens: max_tokens is an enforced per-response ceiling the model is not aware of. A Task Budget is a loop-level budget the model is aware of and self-moderates against.

In short — effort = per-turn depth, Task Budget = cumulative loop spend. The two don't compete; they complement. Set depth with effort, and cap the total of a long agentic run by letting the model self-moderate against a Task Budget.

What Comes Next

That's effort as seen from the API surface. Part 4 looks at the same effort in Claude Code in practice — the five CLI levels, /effort and the CLAUDE_CODE_EFFORT_LEVEL env var, session persistence, and what ultracode actually is (it shows up in the menu but is not an API-level effort).

This article's parameter values, defaults, and per-model support are grounded in Anthropic's primary effort documentation and the model migration guide.

Series overview: Series index

이 블로그 검색

MaJu Tech Notes