"LLM Reasoning Modes (5/6) — OpenAI and Codex reasoning_effort: from minimal to xhigh"
Parts 3-4 showed that Claude's single
effortknob tunes thinking, text, and tool calls all at once. Part 5 looks at how OpenAI handles the same trade-off. The key difference: OpenAI splits the knob in two —reasoning_effortfor thinking depth,verbosityfor output length.
OpenAI also exposes a knob to tune quality, latency, and cost per call without switching models. It's called reasoning_effort. But it differs from Claude in two decisive ways: the knob is split into two, and the set of values plus the default change by model generation. Part 5 takes both head-on.
In One Paragraph
OpenAI controls thinking depth with
reasoning_effort(the Responses/Chat API; in the Codex CLI config it'smodel_reasoning_effort). The values differ by model generation — GPT-5 isminimal·low·medium·high, GPT-5.5 isnone·low·medium·high·xhigh, and the gpt-5-codex family islow·medium·high·xhigh(nominimal; configuring it normalizes tolow). The default ismedium, the recommended balanced starting point. And OpenAI splits output length into a separate parameter,verbosity— in contrast to Claude's singleeffort, which moves thinking depth and output expansiveness together. Reasoning tokens are not returned verbatim (summaries only) and are billed as output tokens.
1. The Name of the Knob — reasoning_effort and model_reasoning_effort
In OpenAI, the parameter that tunes thinking depth is reasoning_effort. It surfaces in two places.
- API (Responses / Chat):
reasoning_effort. - Codex CLI config:
model_reasoning_effort.
It's the same conceptual knob; just remember the key name differs between calling the API directly in code and setting it in the Codex CLI config file. The meaning is identical — it sets the tone for how many reasoning (thinking) tokens the model spends before the visible answer.
Higher effort means more reasoning tokens, longer inference, and better accuracy on hard tasks. Lower effort is the reverse. It's the same kind of trade-off knob as Claude's effort, but as the next section shows, the set of values differs by model generation.
2. Values by Model Generation — Same Knob, Different Notches
The first trap you hit: the values reasoning_effort accepts differ by model. A value that exists on one model is absent on another.
| Model generation | Accepted values | Default | Notes |
|---|---|---|---|
| GPT-5 | minimal, low, medium, high |
medium |
supports minimal |
| GPT-5.5 | none, low, medium, high, xhigh |
medium |
bottom is none, adds xhigh at the top |
| GPT-5.2-Codex / gpt-5-codex | low, medium, high, xhigh |
medium |
no minimal — configuring it normalizes to low |
How to read it:
- GPT-5 puts
minimalat the bottom. - GPT-5.5 shifts the bottom to
noneand addsxhighat the top. - The gpt-5-codex family (including GPT-5.2-Codex) is tuned for coding and agentic work and does not support
minimal. If you configureminimal, it normalizes tolow. So on a Codex model, the lightest "reasoning-nearly-off" step islow.
Because the notches differ per generation, when moving code or config to a different model you must stay within the values that model accepts.
3. minimal — The Latency-Shaving Step
minimal runs with few or no reasoning tokens to minimize latency — especially time-to-first-token.
When to use it:
- Deterministic, lightweight tasks with a narrow output space — extraction, formatting, short rewrites, simple classification.
- Work where deep thinking barely improves accuracy and only adds latency.
When to avoid it:
- Multi-step planning or tool-heavy workflows. Shaving reasoning on these hurts quality.
minimal exists on GPT-5 and is absent on the gpt-5-codex family (configuring it normalizes to low). On GPT-5.5 the bottom rung is none.
4. The Default Is medium — and When to Go Higher
The default, per the table above, is medium. It's the recommended starting point that balances quality, reliability, latency, and cost. Absent a specific reason, start here and move toward whichever side you need.
- If answers come out shallow on a hard task, raise effort (
high, andxhighon models that support it). More reasoning tokens, longer inference, higher accuracy on hard tasks. - For lightweight, deterministic tasks, lower it (
low, GPT-5'sminimal, GPT-5.5'snone). Less latency and cost.
Note that where Claude defaults to high, OpenAI defaults to medium — a common point of confusion when moving between the two platforms, since "leave it at the default" means a different amount of thinking on each.
5. reasoning_effort vs verbosity — Splitting the Knob in Two
Here is the core design difference in OpenAI. Separate from reasoning_effort, OpenAI has a parameter called verbosity.
reasoning_effort= the depth of thinking. How many reasoning tokens to spend before the visible answer.verbosity(low/medium/high) = the length/expansiveness of the output. How far to unfold the answer.
The two are orthogonal. verbosity lets you tune answer length without rewriting the prompt — e.g., think deeply (high reasoning_effort) but answer briefly (low verbosity), or the reverse.
Let's make the contrast with Claude explicit.
| Thinking depth | Output length/expansiveness | |
|---|---|---|
| Claude | effort controls both |
(controlled by the same effort) |
| OpenAI | reasoning_effort |
verbosity (separate) |
So Claude's single effort knob moves thinking depth and output/tool verbosity together. OpenAI splits this into two knobs, turning thinking depth (reasoning_effort) and output length (verbosity) independently. Two design philosophies for the same trade-off.
6. How Reasoning Tokens Are Returned and Billed
OpenAI's reasoning tokens are not returned verbatim — they're hidden, with optional summaries only. And reasoning tokens are billed as output tokens (general OpenAI platform behavior and pricing). So raising reasoning_effort increases the hidden reasoning tokens, and your output-token cost rises accordingly.
This shares the same broad frame as Claude — on both platforms, thinking/reasoning tokens are billed at the expensive output-token tier, and the raw chain of thought is not returned (a summary is the ceiling). That's why the effort knob is not only a quality lever but a direct cost lever.
7. Codex CLI — config.toml and the -c Override
In the Codex CLI, the same knob surfaces in both the config file and on the command line.
Write the model and reasoning effort into ~/.codex/config.toml.
model = "gpt-5.2-codex"
model_reasoning_effort = "high" # low | medium | high | xhigh
To use a different effort for a single run, override the setting with -c.
codex -m gpt-5.2-codex -c model_reasoning_effort="xhigh" "your prompt"
The full enum the config accepts is none | minimal | low | medium | high | xhigh. But whether a value actually applies is subject to per-model support. For instance, the gpt-5-codex family has no minimal (configuring it normalizes to low), so on a default Codex model the lightest step is low.
8. Which Value to Pick — Summary
- Default to
medium— the balance of quality, reliability, latency, and cost. Start here. - Lightweight, deterministic tasks (extraction, formatting, short rewrites, simple classification) →
minimal(GPT-5) /none(GPT-5.5) /low(Codex family). Shaves latency. - Hard, multi-step, tool-heavy tasks →
high, andxhighon supporting models. Spend more reasoning tokens and time to gain accuracy. - If answer length is the problem, move
verbosity, not effort. Thinking depth and output length are separate knobs.
What Comes Next
We've now dissected Claude's effort (Parts 3-4) and OpenAI's reasoning_effort plus verbosity (Part 5) separately. The final Part 6 puts them side by side — comparing the cost, latency, and quality trade-offs with benchmark figures, and laying out how to match the knob to task difficulty. That's where the answer to "does more thinking always help?" lives.
Parameter names, per-model values, defaults, and Codex configuration are grounded in OpenAI's GPT-5 new-parameters docs, the GPT-5.5 and GPT-5.2-Codex model docs, and Codex CLI configuration sources. Reasoning-token billing (at the output-token tier) is stated as general OpenAI platform behavior.
Series overview: Series index
๋๊ธ
๋๊ธ ์ฐ๊ธฐ