"LLM Reasoning Modes (3/6) — Claude effort in Full: low·medium·high·xhigh·max"

Part 2 covered Claude's Thinking (whether to think, and when). Part 3 takes the dial that controls the depth of that thinking and the whole response head-on — effort. We dissect what low / medium / high / xhigh / max each change, not only in thinking but in tool calls and preamble.

Claude's reasoning control has two layers: Thinking (whether and when to think) and effort (how deep, and how many tokens across the whole response). Part 2 looked at the first; Part 3 takes effort apart in full. What makes effort unusual is that a single dial controls not just thinking depth but every token in the response.

In One Paragraph

Claude's effort is a GA parameter set via output_config: {effort: "low"|"medium"|"high"|"xhigh"|"max"} (no beta header). It first appeared on Opus 4.5 (then behind the beta header effort-2025-11-24) and went GA from 4.6. The default is high — and setting high is identical to omitting the parameter — on both the API and Claude Code. effort's defining trait: it controls all tokens in the response (text and explanations, tool calls and function arguments, thinking depth), not just thinking. It is a behavioral signal, not a strict token cap — at low effort Claude still thinks on hard problems, just less than it would at higher effort for the same problem.


1. The Parameter: What and How

effort lives inside output_config on the message request.

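import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    # effort sits inside output_config, not at the top level of the request.
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "..."}],
)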

The facts, pinned down:

  • Parameter: output_config: {effort: "low"|"medium"|"high"|"xhigh"|"max"}. GA, no beta header required.
  • First introduced on Opus 4.5 (originally behind the beta header effort-2025-11-24). GA from 4.6 onward.
  • Default = high. Setting high is exactly the same as omitting the parameter. This default applies on both the API and Claude Code.
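The facts above can be sketched as a small request-building helper. This is only an illustration, not part of the SDK: `EFFORT_LEVELS` and `effort_kwargs` are hypothetical names introduced here, and only the `output_config` shape comes from the text above.

```python
# Hypothetical helper (not an SDK function): builds the kwargs that carry effort.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def effort_kwargs(effort=None):
    """Return keyword arguments for messages.create; empty when effort is unset."""
    if effort is None:
        # Omitting the parameter behaves like "high", per the defaults above.
        return {}
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {"output_config": {"effort": effort}}
```

It would be spread into a call such as `client.messages.create(model=..., max_tokens=..., messages=..., **effort_kwargs("medium"))`. Rejecting unknown strings locally is a design choice of this sketch, so that typos fail before a request is sent.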

2. What effort Controls — Every Token in the Response

This is why effort is more than a "thinking knob." effort controls all tokens in the response together.

  • (a) text and explanations: how detailed the answer body is, how much it spells out.
  • (b) tool calls and function arguments: how often and how it invokes tools.
  • (c) extended/adaptive thinking depth: how deeply it thinks.

Bundling these three under a single dial is what separates effort from a pure thinking control. It doesn't only change thinking depth — the same knob also moves the verbosity of explanations and the tool-use pattern at the same time.

And effort is a behavioral signal, not a strict token budget. Setting low doesn't forbid thinking on hard problems — Claude still thinks on them, just less than it would at higher effort for the same problem. effort tells the model "what posture of effort to bring to this work"; it is not a hard ceiling that cuts the token counter off.
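One way to see that effort is a signal rather than a cap is to send the same prompt at several levels and compare the reported output token counts. The sketch below assumes a client object shaped like `anthropic.Anthropic()`; `output_tokens_by_effort` is a hypothetical name, and the numbers it returns will vary from run to run.

```python
def output_tokens_by_effort(client, model, prompt, levels=("low", "high"),
                            max_tokens=16000, **extra):
    """Send one prompt at each effort level and record the output token usage."""
    results = {}
    for level in levels:
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            output_config={"effort": level},
            messages=[{"role": "user", "content": prompt}],
            **extra,  # e.g. thinking={"type": "adaptive"} on models that accept it
        )
        # usage.output_tokens is the output token count reported for this response.
        results[level] = response.usage.output_tokens
    return results
```

On a hard prompt the low-effort count is typically smaller but not zero, which matches the "thinks less, still thinks" description above. Treat any single comparison as anecdotal.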

3. effort and Tool Use — The Same Dial Shifts Agent Behavior

The fact that effort also governs tool calls matters most in agentic workflows. On the same task, low and high effort produce visibly different behavior.

| Behavior | Lower effort | Higher effort |
| --- | --- | --- |
| Tool calls | Combines operations into fewer calls; fewer calls total | More calls, more granular |
| How it starts | Acts without preamble | Explains the plan before acting |
| Completion report | Terse confirmation | Detailed change summary |
| Code comments | Minimal | More comments |

So lower effort is "say less, batch it, move fast," and higher effort is "lay out the plan, be thorough." Because the same effort dial pulls thinking depth and tool behavior together, lowering effort makes the response shorter while also consolidating and reducing tool calls.
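The difference described above can be observed per turn by counting the tool_use blocks and the visible text in a response. A minimal sketch, assuming a response object shaped like the Messages API result (content blocks with `type`, plus `name` on tool_use blocks and `text` on text blocks); `summarize_turn` is a hypothetical helper name.

```python
def summarize_turn(response):
    """Count tool calls and visible text characters in one response."""
    tools = [block.name for block in response.content if block.type == "tool_use"]
    text_chars = sum(len(block.text) for block in response.content
                     if block.type == "text")
    return {"tool_calls": len(tools), "tools": tools, "text_chars": text_chars}
```

Logging this summary for the same task at two effort levels gives a rough view of the "fewer, batched calls" versus "more, granular calls" pattern, although results depend heavily on the task and tools involved.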

4. Per-Level Guidance — Anthropic's effort Doc

The recommended use for each level, taken straight from Anthropic's effort documentation:

| Level | Description | Use case |
| --- | --- | --- |
| max | Absolute maximum capability, no constraint on token spend, deepest reasoning | Reserve for genuinely frontier problems. On most workloads it adds significant cost for small gains, and on structured-output / less-intelligence-sensitive tasks it can lead to overthinking |
| xhigh | Extended capability for long-horizon work | Agentic/coding runs over ~30 min, token budgets in the millions. The recommended starting point for coding/agentic on Opus 4.7/4.8 |
| high | High capability (= the default) | Complex reasoning, hard coding, agentic. Often the sweet spot for quality vs token efficiency |
| medium | Balanced, moderate token savings | The drop-in for average workflows |
| low | Most efficient, significant savings, some capability loss | Simple tasks, subagents, high-volume, latency-sensitive work |

The principle is the one repeated since Part 1 — not "always maximum" but match the dial to task difficulty. Even max is not a universal win; on a narrow output space it can actually reduce quality.
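"Match the dial to task difficulty" could be encoded as a lookup from a coarse task category to a level. The categories below are hypothetical labels invented for this sketch; only the level names and their rough use cases come from the guidance above.

```python
# Hypothetical task categories mapped to the levels in the guidance above.
EFFORT_BY_TASK = {
    "frontier_problem": "max",
    "long_agentic_coding": "xhigh",
    "complex_reasoning": "high",
    "average_workflow": "medium",
    "simple_or_high_volume": "low",
}


def pick_effort(task_kind, default="high"):
    """Return an effort level for a task category, falling back to the default."""
    return EFFORT_BY_TASK.get(task_kind, default)
```

Falling back to "high" mirrors the API default. A real router would likely also look at latency targets and cost limits, which this sketch ignores.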

5. 4.7/4.8 Respect effort More Strictly

Across generations, the interpretation of effort changed too. Opus 4.7/4.8 respect effort more strictly than 4.6 — especially at low/medium, they scope work to exactly what's asked. They don't expand into work you didn't request.

Two practical rules follow.

  • If reasoning is shallow on a hard task, raise effort rather than prompting around it. Bumping effort a level is more direct and behaves as intended, versus contorting the prompt with "think harder."
  • At xhigh/max, set a large max_tokens (start around 64k). The model needs room to think and then act across tool calls and subagents. If you raise effort but leave max_tokens tight, the response is truncated before the model finishes its work.
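The second rule can be sketched as a guard that raises max_tokens to a floor whenever effort is xhigh or max. The 64,000 floor is the starting point suggested above, not a documented limit, and `resolve_max_tokens` is a hypothetical helper.

```python
HIGH_EFFORT_MIN_MAX_TOKENS = 64_000  # starting point from the rule above


def resolve_max_tokens(effort, requested):
    """Raise max_tokens to the floor when effort is xhigh or max."""
    if effort in ("xhigh", "max"):
        return max(requested, HIGH_EFFORT_MIN_MAX_TOKENS)
    return requested
```

For example, `resolve_max_tokens("xhigh", 16000)` yields 64000, while lower levels keep the requested value. The caller still has to confirm that the chosen model allows an output limit that large.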

6. effort × thinking Interaction — Per Model

effort and Thinking (Part 2) are separate knobs, but how they mesh differs per model.

| Model | Thinking mode | Relation to effort |
| --- | --- | --- |
| Fable 5 / Mythos 5 | adaptive, always on | effort controls depth; {type:"disabled"} returns 400 |
| Opus 4.8 / 4.7 | adaptive thinking | effort is the recommended depth control; manual budget_tokens returns 400 |
| Opus 4.6 | adaptive thinking | effort recommended; budget_tokens deprecated but still accepted |
| Sonnet 4.6 | adaptive thinking | effort controls thinking depth |
| Opus 4.5 | manual thinking (budget_tokens) | effort works alongside the thinking budget |

The read is simple. On current models (4.6-4.8, Fable 5), the official knob to raise or lower thinking depth is effort. Pinning thinking volume directly with budget_tokens is the primary mechanism only on Opus 4.5; Opus 4.6 still accepts it as a deprecated option, and Opus 4.7/4.8 reject it with a 400 (see Part 2).
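The per-model table could be turned into a helper that picks the thinking configuration. The family labels below are hypothetical keys for this sketch, not API model IDs, and the 10,000-token default budget is an arbitrary illustration.

```python
# Hypothetical family labels; these are not API model identifiers.
MANUAL_BUDGET_FAMILIES = {"opus-4.5"}
ADAPTIVE_FAMILIES = {"opus-4.6", "opus-4.7", "opus-4.8", "sonnet-4.6",
                     "fable-5", "mythos-5"}


def thinking_config(family, budget_tokens=10_000):
    """Return the thinking parameter suited to a model family."""
    if family in MANUAL_BUDGET_FAMILIES:
        # Manual thinking: depth is pinned with an explicit token budget.
        return {"type": "enabled", "budget_tokens": budget_tokens}
    if family in ADAPTIVE_FAMILIES:
        # Adaptive thinking: depth is steered by effort instead.
        return {"type": "adaptive"}
    raise ValueError(f"no thinking guidance for family: {family!r}")
```

This sketch deliberately never emits budget_tokens for Opus 4.6, even though the table says it is still accepted there, so that code written today does not lean on a deprecated option.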

7. Model Support Matrix

effort levels have different support per model. Here's the table to check before sending.

| effort level | Supported models |
| --- | --- |
| low / medium / high | Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8, Sonnet 4.6, Fable 5 |
| max | Opus 4.6 and later, Sonnet 4.6, Fable 5 (not Opus 4.5, not Haiku, not older Sonnets) |
| xhigh | Added on Opus 4.7, also Opus 4.8, Fable 5 (sits between high and max) |

Additional notes:

  • Sending effort to Sonnet 4.5 / Haiku 4.5 returns an error.
  • xhigh is the intermediate step — deeper than high, lighter than max — introduced on Opus 4.7.
  • max is not available on Opus 4.5 (from 4.6 onward).
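The matrix and notes above can be captured as a pre-flight check that runs before a request is sent. The family labels are hypothetical keys for this sketch (not API model IDs), and the sets simply transcribe the table as written here.

```python
_BASE = {"low", "medium", "high"}

# Transcribed from the support matrix above; keys are hypothetical labels.
EFFORT_SUPPORT = {
    "opus-4.5": _BASE,
    "opus-4.6": _BASE | {"max"},
    "sonnet-4.6": _BASE | {"max"},
    "opus-4.7": _BASE | {"xhigh", "max"},
    "opus-4.8": _BASE | {"xhigh", "max"},
    "fable-5": _BASE | {"xhigh", "max"},
}


def check_effort(family, effort):
    """Return effort if the family supports it, otherwise raise ValueError."""
    # Families missing from the table (e.g. sonnet-4.5, haiku-4.5) support none.
    supported = EFFORT_SUPPORT.get(family, set())
    if effort not in supported:
        raise ValueError(f"{family!r} does not support effort={effort!r}")
    return effort
```

Failing locally with a clear message is a convenience choice. The API response remains the source of truth, so this table would need updating as models change.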

8. Task Budgets — A Separate, Complementary Control

If effort controls per-turn depth, there's a separate knob for the cumulative token spend across a whole loop: Task Budgets.

  • Parameter: output_config: {task_budget: {type: "tokens", total: N}}. Beta header task-budgets-2026-03-13, minimum 20,000.
  • Behavior: tells the model how many tokens it has for a full agentic loop. The model sees a running countdown and self-moderates against it.
  • Difference from max_tokens: max_tokens is an enforced per-response ceiling the model is not aware of. A Task Budget is a loop-level budget the model is aware of and self-moderates against.

In short — effort = per-turn depth, Task Budget = cumulative loop spend. The two don't compete; they complement. Set depth with effort, and cap the total of a long agentic run by letting the model self-moderate against a Task Budget.
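Combining the two controls might look like the sketch below, which builds the request kwargs for a Task Budget alongside effort. `task_budget_kwargs` is a hypothetical helper, and sending the beta flag through the `anthropic-beta` header via `extra_headers` is an assumption of this sketch; the SDK's beta namespace may be the documented route.

```python
TASK_BUDGET_BETA = "task-budgets-2026-03-13"  # beta header named above
MIN_TASK_BUDGET = 20_000                      # minimum stated above


def task_budget_kwargs(total_tokens, effort=None):
    """Build kwargs that set a loop-level Task Budget and, optionally, effort."""
    if total_tokens < MIN_TASK_BUDGET:
        raise ValueError(f"task budget must be at least {MIN_TASK_BUDGET} tokens")
    output_config = {"task_budget": {"type": "tokens", "total": total_tokens}}
    if effort is not None:
        output_config["effort"] = effort
    return {
        "output_config": output_config,
        # Assumption: the beta flag is passed as a raw request header.
        "extra_headers": {"anthropic-beta": TASK_BUDGET_BETA},
    }
```

Both settings live in the same output_config object, which reflects the point above: effort sets per-turn depth, and the budget bounds the whole loop.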

What Comes Next

That's effort as seen from the API surface. Part 4 looks at the same effort in Claude Code in practice — the five CLI levels, /effort and the CLAUDE_CODE_EFFORT_LEVEL env var, session persistence, and what ultracode actually is (it shows up in the menu but is not an API-level effort).


This article's parameter values, defaults, and per-model support are grounded in Anthropic's primary effort documentation and the model migration guide.

Series overview: Series index
