Note that not all reasoning models return their reasoning tokens: while most models and providers make reasoning tokens available in the response, some (like the OpenAI o-series) do not.
Controlling Reasoning Tokens
You can control reasoning tokens in your requests using the `reasoning` parameter:
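As a minimal sketch (assuming an OpenAI-compatible chat completions request body; the prompt and chosen values are illustrative), the unified `reasoning` object sits at the top level of the request, next to `model` and `messages`:

```python
import json

request_body = {
    "model": "blackboxai/openai/gpt-5.2",  # any reasoning-capable model
    "messages": [
        {"role": "user", "content": "How many prime numbers are below 100?"}
    ],
    "reasoning": {
        # Use EITHER `effort` OR `max_tokens`, depending on model support:
        "effort": "high",      # "none" | "minimal" | "low" | "medium" | "high" | "xhigh"
        # "max_tokens": 2000,  # direct token budget (Anthropic, Gemini, some Qwen)
        "exclude": False,      # set True to reason internally but omit it from the response
    },
}

# Serialize as you would for any JSON POST body:
payload = json.dumps(request_body)
```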
Max Tokens for Reasoning
Supported models

Currently supported by:

- Gemini thinking models
- Anthropic reasoning models (by using the `reasoning.max_tokens` parameter)
- Some Alibaba Qwen thinking models (mapped to `thinking_budget`)

For Alibaba, support varies by model; please check the individual model descriptions to confirm whether `reasoning.max_tokens` (via `thinking_budget`) is available.

- `"max_tokens": 2000` - Directly specifies the maximum number of tokens to use for reasoning
Reasoning Effort Level
Supported models

Currently supported by OpenAI reasoning models (o1 series, o3 series, GPT-5 series) and Grok models.

- `"effort": "xhigh"` - Allocates the largest portion of tokens for reasoning (approximately 95% of max_tokens)
- `"effort": "high"` - Allocates a large portion of tokens for reasoning (approximately 80% of max_tokens)
- `"effort": "medium"` - Allocates a moderate portion of tokens (approximately 50% of max_tokens)
- `"effort": "low"` - Allocates a smaller portion of tokens (approximately 20% of max_tokens)
- `"effort": "minimal"` - Allocates an even smaller portion of tokens (approximately 10% of max_tokens)
- `"effort": "none"` - Disables reasoning entirely
Excluding Reasoning Tokens
If you want the model to use reasoning internally but not include it in the response:

- `"exclude": true` - The model will still use reasoning, but it won't be returned in the response
Enable Reasoning with Default Config
To enable reasoning with the default parameters:

- `"enabled": true` - Enables reasoning at the "medium" effort level with no exclusions
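A sketch of both variants side by side (model ID and prompts illustrative):

```python
# Reason internally but omit the reasoning from the response:
exclude_body = {
    "model": "blackboxai/openai/gpt-5.2",
    "messages": [{"role": "user", "content": "Summarize this contract."}],
    "reasoning": {"exclude": True},
}

# Enable reasoning with the defaults ("medium" effort, nothing excluded):
default_body = {
    "model": "blackboxai/openai/gpt-5.2",
    "messages": [{"role": "user", "content": "Summarize this contract."}],
    "reasoning": {"enabled": True},
}
```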
Examples
Basic Usage with Reasoning Tokens
Using Max Tokens for Reasoning
For models that support direct token allocation (like Anthropic models), you can specify the exact number of tokens to use for reasoning.

Excluding Reasoning Tokens from Response
If you want the model to use reasoning internally but not include it in the response, use the `exclude` flag described above.

Advanced Usage: Reasoning Chain-of-Thought
This example shows how to use reasoning tokens in a more complex workflow: injecting one model's reasoning into another model to improve its response quality.

Preserving Reasoning
To preserve reasoning context across multiple turns, you can pass it back to the API in one of two ways:

- `message.reasoning` (string): Pass the plaintext reasoning as a string field on the assistant message
- `message.reasoning_details` (array): Pass the full `reasoning_details` block
Use `reasoning_details` when working with models that return special reasoning types (such as encrypted or summarized); this preserves the full structure needed for those models.
For models that only return raw reasoning strings, you can use the simpler `reasoning` field. You can also use `reasoning_content` as an alias; it functions identically to `reasoning`.
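A sketch of the pass-back shape (message field names as documented above; the reasoning content and detail values are invented for illustration):

```python
# First-turn assistant output, as returned by the API (illustrative values):
assistant_turn = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning_details": [
        {
            "id": None,
            "format": "anthropic-claude-v1",
            "type": "reasoning.text",
            "text": "Work through the puzzle step by step.",
            "index": 0,
        }
    ],
}

# To preserve reasoning, echo the assistant message back verbatim,
# including reasoning_details, before the next user turn:
messages = [
    {"role": "user", "content": "Solve the puzzle."},
    assistant_turn,
    {"role": "user", "content": "Now explain your answer."},
]
```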
Model Support
Preserving reasoning is currently supported by these proprietary models:

- All OpenAI reasoning models (o1 series, o3 series, GPT-5 series and newer)
- All Anthropic reasoning models (Claude 3.7 series and newer)
- All Gemini Reasoning models
- All xAI reasoning models
- MiniMax M2 / M2.1
- Kimi K2 Thinking / K2.5
- INTELLECT-3
- Nemotron 3 Nano
- MiMo-V2-Flash
- All Z.ai reasoning models (GLM 4.5 series and newer)
Note: Z.ai models support standard interleaved thinking only; their preserved-thinking feature is currently not supported.
The `reasoning_details` functionality works identically across all supported reasoning models. You can easily switch between OpenAI reasoning models (like `blackboxai/openai/gpt-5.2`) and Anthropic reasoning models (like `blackboxai/anthropic/claude-sonnet-4.5`) without changing your code structure.
Preserving reasoning blocks is especially useful for tool calling. When a model like Claude invokes a tool, it pauses construction of its response to await external information. When tool results are returned, the model continues building that existing response. This makes preserving reasoning blocks during tool use necessary, for two reasons:
- Reasoning continuity: The reasoning blocks capture the model’s step-by-step reasoning that led to tool requests. When you post tool results, including the original reasoning ensures the model can continue its reasoning from where it left off.
- Context maintenance: While tool results appear as user messages in the API structure, they’re part of a continuous reasoning flow. Preserving reasoning blocks maintains this conceptual flow across multiple API calls.
Example: Preserving Reasoning Blocks with Tool Calls
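A hedged sketch of the message flow (the tool name, arguments, and reasoning text are invented for illustration; the `tool` role and `tool_call_id` field assume an OpenAI-compatible tool-calling shape). The key point is that the assistant message carrying the tool call is resent unchanged, reasoning_details included, when the tool result is posted back:

```python
# Assistant turn that paused to call a tool (illustrative values):
assistant_tool_call = {
    "role": "assistant",
    "content": None,
    "reasoning_details": [
        {"format": "anthropic-claude-v1", "type": "reasoning.text",
         "text": "I need current weather data before answering.", "index": 0}
    ],
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
    ],
}

# When posting the tool result, include the assistant message verbatim so
# the model can resume the same chain of reasoning:
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    assistant_tool_call,
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
]
```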
Reasoning Details API Shape
When reasoning models generate responses, the reasoning information is structured in a standardized format through the `reasoning_details` array. This section documents the API response structure for reasoning details in both streaming and non-streaming responses.
reasoning_details Array Structure
The `reasoning_details` field contains an array of reasoning detail objects. Each object in the array represents a specific piece of reasoning information and follows one of three possible types. The location of this array differs between streaming and non-streaming responses.
- Non-streaming responses: `reasoning_details` appears in `choices[].message.reasoning_details`
- Streaming responses: `reasoning_details` appears in `choices[].delta.reasoning_details` for each chunk
Common Fields
All reasoning detail objects share these common fields:

- `id` (string | null): Unique identifier for the reasoning detail
- `format` (string): The format of the reasoning detail, with possible values:
  - `"unknown"` - Format is not specified
  - `"openai-responses-v1"` - OpenAI responses format version 1
  - `"xai-responses-v1"` - xAI responses format version 1
  - `"anthropic-claude-v1"` - Anthropic Claude format version 1 (default)
- `index` (number, optional): Sequential index of the reasoning detail
Reasoning Detail Types
1. Summary Type (reasoning.summary)
Contains a high-level summary of the reasoning process.

2. Encrypted Type (reasoning.encrypted)

Contains encrypted reasoning data that may be redacted or protected.

3. Text Type (reasoning.text)

Contains raw text reasoning with optional signature verification.

Response Examples
Non-Streaming Response
In non-streaming responses, `reasoning_details` appears in the message:
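A hand-constructed illustration following the field descriptions above (the content and reasoning text are invented):

```python
non_streaming_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Final answer: 25.",
                "reasoning_details": [
                    {
                        "id": None,
                        "format": "anthropic-claude-v1",
                        "type": "reasoning.text",
                        "text": "List the primes below 100 and count them.",
                        "index": 0,
                    }
                ],
            }
        }
    ]
}

# The array lives on the message object of each choice:
details = non_streaming_response["choices"][0]["message"]["reasoning_details"]
```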
Streaming Response
In streaming responses, `reasoning_details` appears in delta chunks as the reasoning is generated:
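A hand-constructed sketch of the chunk layout and of rebuilding the reasoning text by concatenating chunks in order (chunk contents invented):

```python
# Delta chunks shaped like the streaming layout described here:
chunks = [
    {"choices": [{"delta": {"reasoning_details": [
        {"type": "reasoning.text", "text": "First, restate ", "index": 0}]}}]},
    {"choices": [{"delta": {"reasoning_details": [
        {"type": "reasoning.text", "text": "the problem.", "index": 0}]}}]},
    {"choices": [{"delta": {"content": "Answer: 42."}}]},  # visible output
]

# Rebuild the complete reasoning sequence by concatenating in order:
reasoning = "".join(
    d.get("text", "")
    for chunk in chunks
    for d in chunk["choices"][0]["delta"].get("reasoning_details", [])
)
```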
Streaming Behavior Notes:

- Each reasoning detail chunk is sent as it becomes available
- The `reasoning_details` array in each chunk may contain one or more reasoning objects
- For encrypted reasoning, the content may appear as `[REDACTED]` in streaming responses
- The complete reasoning sequence is built by concatenating all chunks in order
Legacy Parameters
For backward compatibility, BLACKBOX AI still supports the following legacy parameters:

- `include_reasoning: true` - Equivalent to `reasoning: {}`
- `include_reasoning: false` - Equivalent to `reasoning: { exclude: true }`

However, we recommend using the unified `reasoning` parameter for better control and future compatibility.
Provider-Specific Reasoning Implementation
Anthropic Models with Reasoning Tokens
The latest Claude models, such as `blackboxai/anthropic/claude-3.7-sonnet`, support working with and returning reasoning tokens.
You can enable reasoning on Anthropic models only by using the unified `reasoning` parameter, with either `effort` or `max_tokens`.
Reasoning Max Tokens for Anthropic Models
When using Anthropic models with reasoning:

- When using the `reasoning.max_tokens` parameter, that value is used directly, with a minimum of 1024 tokens.
- When using the `reasoning.effort` parameter, the `budget_tokens` are calculated based on the `max_tokens` value:

budget_tokens = max(min(max_tokens * effort_ratio, 128000), 1024)

The `effort_ratio` is 0.95 for `xhigh` effort, 0.8 for `high` effort, 0.5 for `medium` effort, 0.2 for `low` effort, and 0.1 for `minimal` effort.
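The formula above can be sketched directly in code:

```python
# Effort ratios as documented for Anthropic models:
EFFORT_RATIO = {"xhigh": 0.95, "high": 0.8, "medium": 0.5, "low": 0.2, "minimal": 0.1}

def anthropic_budget_tokens(max_tokens: int, effort: str) -> int:
    """budget_tokens = max(min(max_tokens * effort_ratio, 128000), 1024)"""
    return int(max(min(max_tokens * EFFORT_RATIO[effort], 128_000), 1024))

print(anthropic_budget_tokens(10_000, "high"))    # 8000
print(anthropic_budget_tokens(2_000, "minimal"))  # clamped up to the 1024 minimum
```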
Example: Streaming with Anthropic Reasoning Tokens
Google Gemini 3 Models with Thinking Levels
Gemini 3 models (such as `blackboxai/google/gemini-3-pro-preview` and `blackboxai/google/gemini-3-flash-preview`) use Google's `thinkingLevel` API instead of the older `thinkingBudget` API used by Gemini 2.5 models.

BLACKBOX AI maps the `reasoning.effort` parameter directly to Google's `thinkingLevel` values:
| BLACKBOX AI reasoning.effort | Google thinkingLevel |
|---|---|
| "minimal" | "minimal" |
| "low" | "low" |
| "medium" | "medium" |
| "high" | "high" |
| "xhigh" | "high" (mapped down) |
Token Consumption is Determined by Google
When using `thinkingLevel`, the actual number of reasoning tokens consumed is determined internally by Google. There are no publicly documented token limit breakpoints for each level. For example, setting `effort: "low"` might result in several hundred reasoning tokens depending on the complexity of the task. This is expected behavior and reflects how Google implements thinking levels internally.
If a model doesn’t support a specific effort level (for example, if a model only supports low and high), BLACKBOX AI will map your requested effort to the nearest supported level.
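The table and fallback behavior above can be sketched as follows. The direct mapping comes from the table; the nearest-supported heuristic is illustrative of the described behavior, not BLACKBOX AI's exact implementation:

```python
# Direct mapping from the table; "xhigh" is mapped down to "high".
EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "high",
}

def thinking_level(effort: str, supported: list[str]) -> str:
    """Map effort to a thinkingLevel, falling back to the nearest supported level."""
    order = ["minimal", "low", "medium", "high"]
    level = EFFORT_TO_THINKING_LEVEL[effort]
    if level in supported:
        return level
    # Illustrative heuristic: pick the supported level closest in the ordering.
    idx = order.index(level)
    return min(supported, key=lambda s: abs(order.index(s) - idx))

print(thinking_level("xhigh", ["low", "high"]))  # "high"
```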
Using max_tokens with Gemini 3
If you specify `reasoning.max_tokens` explicitly, BLACKBOX AI will pass it through as `thinkingBudget` to Google's API. However, for Gemini 3 models, Google internally maps this budget value to a `thinkingLevel`, so you will not get precise token control. The actual token consumption is still determined by Google's `thinkingLevel` implementation, not by the specific budget value you provide.
Example: Using Thinking Levels with Gemini 3
Reasoning with the Responses API
Reasoning models like `blackboxai/openai/gpt-5.3-codex` are LLMs trained with reinforcement learning to perform reasoning. They think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel at complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows.
Get Started with Reasoning
Call the Responses API and specify your reasoning model and reasoning effort. The `reasoning.effort` parameter guides the model on how many reasoning tokens to generate before creating a response. The default value is `medium`.
| Value | Description |
|---|---|
| "none" | Disables reasoning entirely - no reasoning tokens are generated |
| "low" | Favors speed and economical token usage |
| "medium" | Balanced between speed and reasoning accuracy (default) |
| "high" | Favors more complete reasoning for complex tasks |
| "xhigh" | Maximum reasoning depth - allocates the largest portion of tokens for thinking |
How Reasoning Works
Reasoning models introduce reasoning tokens in addition to input and output tokens. The models use these reasoning tokens to "think," breaking down the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens and discards the reasoning tokens from its context.

Managing the Context Window
It's important to ensure there's enough space in the context window for reasoning tokens when creating responses. Depending on the problem's complexity, the models may generate anywhere from a few hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used is visible in the `usage` object of the response, under `output_tokens_details`:
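A hand-shaped sample of the usage fields named above (the counts are invented):

```python
usage = {
    "input_tokens": 75,
    "output_tokens": 1186,
    "output_tokens_details": {"reasoning_tokens": 1024},
    "total_tokens": 1261,
}

# Reasoning tokens are billed as output tokens but are not visible text:
reasoning_tokens = usage["output_tokens_details"]["reasoning_tokens"]
visible_tokens = usage["output_tokens"] - reasoning_tokens
```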
Allocating Space for Reasoning
If the generated tokens reach the context window limit or the `max_output_tokens` value you've set, you'll receive a response with a status of `incomplete` and `incomplete_details` with `reason` set to `max_output_tokens`. This might occur before any visible output tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.
To prevent this, ensure there’s sufficient space in the context window or adjust the max_output_tokens value to a higher number. We recommend reserving at least 25,000 tokens for reasoning and outputs when you start experimenting with these models.
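The truncation case above can be detected with a small check (the sample response dict is hand-written after the fields named above):

```python
def hit_reasoning_limit(response: dict) -> bool:
    """True when generation stopped because max_output_tokens was reached."""
    return (
        response.get("status") == "incomplete"
        and response.get("incomplete_details", {}).get("reason") == "max_output_tokens"
    )

truncated = {"status": "incomplete",
             "incomplete_details": {"reason": "max_output_tokens"}}
```

On a hit, retry with a larger `max_output_tokens` or trim the input so more of the context window is left for reasoning and output.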
Reasoning Summaries
You can view a summary of the model's reasoning using the `summary` parameter inside the `reasoning` object. Different models support different reasoning summary settings.
To access the most detailed summarizer available for a model, set the value of this parameter to `auto`. `auto` will be equivalent to `detailed` for most reasoning models today, but there may be more granular settings in the future.
| Value | Description |
|---|---|
"auto" | Uses the most detailed summarizer available for the model (recommended) |
"detailed" | Full step-by-step reasoning summary |
"concise" | A shorter, high-level summary of the reasoning process |
Reasoning summaries are returned in the `summary` array in the reasoning output item. This output will not be included unless you explicitly opt in by setting the `summary` field.

The response then contains an `output` array with both a reasoning summary and the assistant message:
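A hand-written sample of such an `output` array (item shapes and text invented for illustration), with the summary text pulled out of the reasoning item:

```python
# Sample `output` array: a reasoning item carrying its summary, followed by
# the assistant message.
output = [
    {
        "type": "reasoning",
        "summary": [
            {"type": "summary_text",
             "text": "Considered two approaches; chose iteration."}
        ],
    },
    {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "Here is the function."}],
    },
]

# Collect every summary text from the reasoning items:
summaries = [
    s["text"]
    for item in output
    if item["type"] == "reasoning"
    for s in item.get("summary", [])
]
```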
- Tool Calling: Use tools with reasoning models
- Best Practices: Preserve reasoning signatures across turns and avoid common errors
- Interleaved Thinking: Thinking between tool calls on the Messages API