
# Reasoning and Interleaved Thinking

> Enable advanced reasoning capabilities and interleaved thinking for sophisticated AI responses with transparent reasoning steps.

For models that support it, the BLACKBOX AI API can return Reasoning Tokens, also known as thinking tokens. BLACKBOX AI normalizes the different ways of customizing the amount of reasoning tokens that the model will use, providing a unified interface across different providers.

Reasoning tokens provide a transparent look into the reasoning steps taken by a model. Reasoning tokens are considered output tokens and charged accordingly.

Reasoning tokens are included in the response by default if the model decides to output them. Reasoning tokens will appear in the reasoning field of each message, unless you decide to exclude them.

<Note>
  Some reasoning models do not return their reasoning tokens. While most models and providers make reasoning tokens available in the response, some (like the OpenAI o-series) do not.
</Note>

<Warning>
  **Important**: Interleaved thinking increases token usage and response latency. Consider your budget and performance requirements when enabling this feature.
</Warning>

## Controlling Reasoning Tokens

You can control reasoning tokens in your requests using the `reasoning` parameter:

```json theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
{
  "model": "your-model",
  "messages": [],
  "reasoning": {
    // One of the following (not both):
    "effort": "high", // Can be "xhigh", "high", "medium", "low", "minimal" or "none" (OpenAI-style)
    "max_tokens": 2000, // Specific token limit (Anthropic-style)
    // Optional: Default is false. All models support this.
    "exclude": false, // Set to true to exclude reasoning tokens from response
    // Or enable reasoning with the default parameters:
    "enabled": true // Default: inferred from `effort` or `max_tokens`
  }
}
```

The `reasoning` config object consolidates the settings that control reasoning strength across different models. The supported-models list under each option below shows which models accept that option and how other models behave.
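
As a minimal sketch of the rules in the config above, the helper below builds a `reasoning` object and rejects `effort` and `max_tokens` being set together. The function name `build_reasoning_config` and its validation behavior are illustrative assumptions, not part of the BLACKBOX AI API; the API itself may validate differently.

```python
from typing import Optional

# Effort levels listed in the config comment above.
VALID_EFFORTS = {"xhigh", "high", "medium", "low", "minimal", "none"}


def build_reasoning_config(
    effort: Optional[str] = None,
    max_tokens: Optional[int] = None,
    exclude: bool = False,
) -> dict:
    """Hypothetical helper: build a `reasoning` object for a request body."""
    if effort is not None and max_tokens is not None:
        raise ValueError("Set either effort or max_tokens, not both")
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"Unknown effort level: {effort!r}")
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")

    config: dict = {}
    if effort is not None:
        config["effort"] = effort
    elif max_tokens is not None:
        config["max_tokens"] = max_tokens
    else:
        # Neither option given: ask for reasoning with the default parameters.
        config["enabled"] = True
    if exclude:
        config["exclude"] = True
    return config


print(build_reasoning_config(effort="high"))
print(build_reasoning_config(max_tokens=2000, exclude=True))
```

With the OpenAI Python SDK, the resulting dictionary would be passed as `extra_body={"reasoning": build_reasoning_config(...)}`.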

## Max Tokens for Reasoning

**Supported models**

Currently supported by:

* Gemini thinking models
* Anthropic reasoning models (by using the reasoning.max\_tokens parameter)
* Some Alibaba Qwen thinking models (mapped to thinking\_budget)

<Note>
  For Alibaba, support varies by model — please check the individual model descriptions to confirm whether reasoning.max\_tokens (via thinking\_budget) is available.
</Note>

For models that support reasoning token allocation, you can control it like this:

* `"max_tokens": 2000` - Directly specifies the maximum number of tokens to use for reasoning

For models that only support reasoning.effort (see below), the max\_tokens value will be used to determine the effort level.

## Reasoning Effort Level

**Supported models**

Currently supported by OpenAI reasoning models (o1 series, o3 series, GPT-5 series) and Grok models.

* `"effort": "xhigh"` - Allocates the largest portion of tokens for reasoning (approximately 95% of max\_tokens)
* `"effort": "high"` - Allocates a large portion of tokens for reasoning (approximately 80% of max\_tokens)
* `"effort": "medium"` - Allocates a moderate portion of tokens (approximately 50% of max\_tokens)
* `"effort": "low"` - Allocates a smaller portion of tokens (approximately 20% of max\_tokens)
* `"effort": "minimal"` - Allocates an even smaller portion of tokens (approximately 10% of max\_tokens)
* `"effort": "none"` - Disables reasoning entirely

For models that only support reasoning.max\_tokens, the effort level will be set based on the percentages above.

## Excluding Reasoning Tokens

If you want the model to use reasoning internally but not include it in the response:

* `"exclude": true` - The model will still use reasoning, but it won't be returned in the response

When `exclude` is left at its default of `false`, reasoning tokens appear in the `reasoning` field of each message.

## Enable Reasoning with Default Config

To enable reasoning with the default parameters:

* `"enabled": true` - Enables reasoning at the "medium" effort level with no exclusions.

## Examples

### Basic Usage with Reasoning Tokens

<CodeGroup>
  ```python Python (OpenAI SDK) theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  from openai import OpenAI
  client = OpenAI(
      # Public api users: use https://api.blackbox.ai/chat/completions
      base_url="https://enterprise.blackbox.ai/chat/completions",
      api_key="<BLACKBOX_API_KEY>",
  )
  response = client.chat.completions.create(
      model="blackboxai/openai/o3-mini",
      messages=[
          {"role": "user", "content": "How would you build the world's tallest skyscraper?"}
      ],
      extra_body={
          "reasoning": {
              "effort": "high"
          }
      },
  )
  msg = response.choices[0].message
  print(getattr(msg, "reasoning", None))
  ```

  ```typescript TypeScript (OpenAI SDK) theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    // Public api users: use https://api.blackbox.ai/chat/completions
    baseURL: 'https://enterprise.blackbox.ai/chat/completions',
    apiKey: process.env.BLACKBOX_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: 'blackboxai/openai/o3-mini',
    messages: [
      { role: 'user', content: 'How would you build the world\'s tallest skyscraper?' }
    ],
    // The TypeScript SDK has no `extra_body` option (that is Python-only);
    // undocumented fields go at the top level and are sent as-is.
    // @ts-ignore
    reasoning: {
      effort: 'high'
    }
  });

  const msg = response.choices[0].message;
  // `reasoning` is not in the SDK's message type, so read it through a cast.
  console.log((msg as any).reasoning);
  ```
</CodeGroup>

### Using Max Tokens for Reasoning

For models that support direct token allocation (like Anthropic models), you can specify the exact number of tokens to use for reasoning:

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  from openai import OpenAI
  client = OpenAI(
      # Public api users: use https://api.blackbox.ai/chat/completions
      base_url="https://enterprise.blackbox.ai/chat/completions",
      api_key="<BLACKBOX_API_KEY>",
  )
  response = client.chat.completions.create(
      model="blackboxai/anthropic/claude-sonnet-4.5",
      messages=[
          {"role": "user", "content": "What's the most efficient algorithm for sorting a large dataset?"}
      ],
      extra_body={
          "reasoning": {
              "max_tokens": 2000
          }
      },
  )
  msg = response.choices[0].message
  print(getattr(msg, "reasoning", None))
  print(getattr(msg, "content", None))
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    // Public api users: use https://api.blackbox.ai/chat/completions
    baseURL: 'https://enterprise.blackbox.ai/chat/completions',
    apiKey: process.env.BLACKBOX_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: 'blackboxai/anthropic/claude-sonnet-4.5',
    messages: [
      { role: 'user', content: 'What\'s the most efficient algorithm for sorting a large dataset?' }
    ],
    // @ts-ignore
    reasoning: {
      max_tokens: 2000
    }
  });

  const msg = response.choices[0].message;
  // `reasoning` is not in the SDK's message type, so read it through a cast.
  console.log((msg as any).reasoning);
  console.log(msg.content);
  ```
</CodeGroup>

### Excluding Reasoning Tokens from Response

If you want the model to use reasoning internally but not include it in the response:

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  from openai import OpenAI
  client = OpenAI(
      # Public api users: use https://api.blackbox.ai/chat/completions
      base_url="https://enterprise.blackbox.ai/chat/completions",
      api_key="<BLACKBOX_API_KEY>",
  )
  response = client.chat.completions.create(
      model="blackboxai/deepseek/deepseek-r1",
      messages=[
          {"role": "user", "content": "Explain quantum computing in simple terms."}
      ],
      extra_body={
          "reasoning": {
              "effort": "high",
              "exclude": True
          }
      },
  )
  msg = response.choices[0].message
  print(getattr(msg, "content", None))
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    // Public api users: use https://api.blackbox.ai/chat/completions
    baseURL: 'https://enterprise.blackbox.ai/chat/completions',
    apiKey: process.env.BLACKBOX_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: 'blackboxai/deepseek/deepseek-r1',
    messages: [
      { role: 'user', content: 'Explain quantum computing in simple terms.' }
    ],
    // @ts-ignore
    reasoning: {
      effort: 'high',
      exclude: true
    }
  });

  const msg = response.choices[0].message;
  console.log(msg.content);
  ```
</CodeGroup>

### Advanced Usage: Reasoning Chain-of-Thought

This example shows how to use reasoning tokens in a more complex workflow. It injects one model's reasoning into another model to improve its response quality:

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  from openai import OpenAI
  client = OpenAI(
      # Public api users: use https://api.blackbox.ai/chat/completions
      base_url="https://enterprise.blackbox.ai/chat/completions",
      api_key="<BLACKBOX_API_KEY>",
  )

  question = "Which is bigger: 9.11 or 9.9?"

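  def do_req(model: str, content: str, reasoning_config: dict | None = None):
      payload = {
          "model": model,
          "messages": [{"role": "user", "content": content}],
          "stop": "</think>",
      }
      if reasoning_config:
          # The OpenAI SDK rejects unknown keyword arguments, so send the
          # `reasoning` object through `extra_body` instead of merging it in.
          payload["extra_body"] = {"reasoning": reasoning_config}
      return client.chat.completions.create(**payload)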

  # Get reasoning from a capable model
  content = f"{question} Please think this through, but don't output an answer"
  reasoning_response = do_req("blackboxai/deepseek/deepseek-r1", content)
  # Fall back to an empty string when the field is missing or None
  reasoning = getattr(reasoning_response.choices[0].message, "reasoning", None) or ""

  # Let's test! Here's the naive response:
  simple_response = do_req("blackboxai/openai/gpt-4o-mini", question)
  print(getattr(simple_response.choices[0].message, "content", None))

  # Here's the response with the reasoning token injected:
  content = f"{question}. Here is some context to help you: {reasoning}"
  smart_response = do_req("blackboxai/openai/gpt-4o-mini", content)
  print(getattr(smart_response.choices[0].message, "content", None))
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    // Public api users: use https://api.blackbox.ai/chat/completions
    baseURL: 'https://enterprise.blackbox.ai/chat/completions',
    apiKey: process.env.BLACKBOX_API_KEY,
  });

  const question = "Which is bigger: 9.11 or 9.9?";

  async function doReq(model: string, content: string, reasoningConfig?: any) {
    const payload: any = {
      model,
      messages: [{ role: 'user', content }],
      stop: '</think>',
    };

    if (reasoningConfig) {
      Object.assign(payload, reasoningConfig);
    }

    return client.chat.completions.create(payload);
  }

  // Get reasoning from a capable model
  const content = `${question} Please think this through, but don't output an answer`;
  const reasoningResponse = await doReq('blackboxai/deepseek/deepseek-r1', content);
  // `reasoning` is not in the SDK's message type, so read it through a cast.
  const reasoning = (reasoningResponse.choices[0].message as any).reasoning || '';

  // Let's test! Here's the naive response:
  const simpleResponse = await doReq('blackboxai/openai/gpt-4o-mini', question);
  console.log(simpleResponse.choices[0].message.content);

  // Here's the response with the reasoning token injected:
  const enhancedContent = `${question}. Here is some context to help you: ${reasoning}`;
  const smartResponse = await doReq('blackboxai/openai/gpt-4o-mini', enhancedContent);
  console.log(smartResponse.choices[0].message.content);
  ```
</CodeGroup>

## Preserving Reasoning

<Tip>
  See [API Best Practices](/api-reference/best-practices#preserving-reasoning-blocks-in-multi-turn-requests) for how to correctly pass reasoning blocks back across multi-turn tool calling conversations without signature errors.
</Tip>

To preserve reasoning context across multiple turns, you can pass it back to the API in one of two ways:

* `message.reasoning` (string): Pass the plaintext reasoning as a string field on the assistant message
* `message.reasoning_details` (array): Pass the full reasoning\_details block

Use `reasoning_details` when working with models that return special reasoning types (such as encrypted or summarized) - this preserves the full structure needed for those models.

For models that only return raw reasoning strings, you can use the simpler `reasoning` field. You can also use `reasoning_content` as an alias - it functions identically to `reasoning`.
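
The choice between the two fields can be sketched as a small helper that builds the assistant message for the next turn. The function name `assistant_turn_with_reasoning` is hypothetical, and preferring `reasoning_details` whenever it is present is an assumption drawn from the guidance above rather than documented API behavior.

```python
from typing import Any, Optional


def assistant_turn_with_reasoning(
    content: Optional[str],
    reasoning: Optional[str] = None,
    reasoning_details: Optional[list] = None,
    tool_calls: Optional[list] = None,
) -> dict:
    """Hypothetical helper: build the assistant message to send back."""
    message: dict[str, Any] = {"role": "assistant", "content": content}
    if tool_calls:
        message["tool_calls"] = tool_calls
    if reasoning_details:
        # Structured blocks (encrypted, summarized, signed) go back unmodified.
        message["reasoning_details"] = reasoning_details
    elif reasoning:
        # Plain-string reasoning is enough for models that only return raw text.
        message["reasoning"] = reasoning
    return message


# Model that returned only a raw reasoning string.
print(assistant_turn_with_reasoning("4", reasoning="2 + 2 = 4"))

# Model that returned structured reasoning blocks.
blocks = [{"type": "reasoning.text", "text": "2 + 2 = 4", "index": 0}]
print(assistant_turn_with_reasoning("4", reasoning="2 + 2 = 4", reasoning_details=blocks))
```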

### Model Support

Preserving reasoning is currently supported by these proprietary models:

* All OpenAI reasoning models (o1 series, o3 series, GPT-5 series and newer)
* All Anthropic reasoning models (Claude 3.7 series and newer)
* All Gemini Reasoning models
* All xAI reasoning models

And these open source models:

* MiniMax M2 / M2.1
* Kimi K2 Thinking / K2.5
* INTELLECT-3
* Nemotron 3 Nano
* MiMo-V2-Flash
* All Z.ai reasoning models (GLM 4.5 series and newer)

<Note>
  For Z.ai models, only standard interleaved thinking is supported. The preserved thinking feature is not currently supported.
</Note>

The `reasoning_details` functionality works identically across all supported reasoning models. You can easily switch between OpenAI reasoning models (like `blackboxai/openai/gpt-5.2`) and Anthropic reasoning models (like `blackboxai/anthropic/claude-sonnet-4.5`) without changing your code structure.

Preserving reasoning blocks is especially useful for tool calling. When a model like Claude invokes a tool, it pauses the construction of its response to await external information. When the tool results are returned, the model continues building that existing response. This makes it necessary to preserve reasoning blocks during tool use, for two reasons:

* **Reasoning continuity**: The reasoning blocks capture the model's step-by-step reasoning that led to tool requests. When you post tool results, including the original reasoning ensures the model can continue its reasoning from where it left off.

* **Context maintenance**: While tool results appear as user messages in the API structure, they're part of a continuous reasoning flow. Preserving reasoning blocks maintains this conceptual flow across multiple API calls.

<Warning>
  **Important for Reasoning Models**: When providing reasoning\_details blocks, the entire sequence of consecutive reasoning blocks must match the outputs generated by the model during the original request; you cannot rearrange or modify the sequence of these blocks.
</Warning>

### Example: Preserving Reasoning Blocks with Tool Calls

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  from openai import OpenAI
  client = OpenAI(
      # Public api users: use https://api.blackbox.ai/chat/completions
      base_url="https://enterprise.blackbox.ai/chat/completions",
      api_key="<BLACKBOX_API_KEY>",
  )

  # Define tools once and reuse
  tools = [{
      "type": "function",
      "function": {
          "name": "get_weather",
          "description": "Get current weather",
          "parameters": {
              "type": "object",
              "properties": {
                  "location": {"type": "string"}
              },
              "required": ["location"]
          }
      }
  }]

  # First API call with tools
  # Note: You can use 'blackboxai/openai/gpt-5.2' instead of 'blackboxai/anthropic/claude-sonnet-4.5' - they're completely interchangeable
  response = client.chat.completions.create(
      model="blackboxai/anthropic/claude-sonnet-4.5",
      messages=[
          {"role": "user", "content": "What's the weather like in Boston? Then recommend what to wear."}
      ],
      tools=tools,
      extra_body={"reasoning": {"max_tokens": 2000}}
  )

  # Extract the assistant message with reasoning_details
  message = response.choices[0].message

  # Preserve the complete reasoning_details when passing back
  messages = [
      {"role": "user", "content": "What's the weather like in Boston? Then recommend what to wear."},
      {
          "role": "assistant",
          "content": message.content,
          "tool_calls": message.tool_calls,
          "reasoning_details": message.reasoning_details  # Pass back unmodified
      },
      {
          "role": "tool",
          "tool_call_id": message.tool_calls[0].id,
          "content": '{"temperature": 45, "condition": "rainy", "humidity": 85}'
      }
  ]

  # Second API call - Claude continues reasoning from where it left off
  response2 = client.chat.completions.create(
      model="blackboxai/anthropic/claude-sonnet-4.5",
      messages=messages,  # Includes preserved thinking blocks
      tools=tools
  )
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    // Public api users: use https://api.blackbox.ai/chat/completions
    baseURL: 'https://enterprise.blackbox.ai/chat/completions',
    apiKey: process.env.BLACKBOX_API_KEY,
  });

  // Define tools once and reuse
  const tools = [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        },
        required: ['location']
      }
    }
  }];

  // First API call with tools
  const response = await client.chat.completions.create({
    model: 'blackboxai/anthropic/claude-sonnet-4.5',
    messages: [
      { role: 'user', content: 'What\'s the weather like in Boston? Then recommend what to wear.' }
    ],
    tools,
    // @ts-ignore
    reasoning: { max_tokens: 2000 }
  });

  // Extract the assistant message with reasoning_details
  const message = response.choices[0].message;

  // Preserve the complete reasoning_details when passing back
  const messages = [
    { role: 'user', content: 'What\'s the weather like in Boston? Then recommend what to wear.' },
    {
      role: 'assistant',
      content: message.content,
      tool_calls: message.tool_calls,
      reasoning_details: message.reasoning_details  // Pass back unmodified
    },
    {
      role: 'tool',
      tool_call_id: message.tool_calls[0].id,
      content: '{"temperature": 45, "condition": "rainy", "humidity": 85}'
    }
  ];

  // Second API call - Claude continues reasoning from where it left off
  const response2 = await client.chat.completions.create({
    model: 'blackboxai/anthropic/claude-sonnet-4.5',
    messages,  // Includes preserved thinking blocks
    tools
  });
  ```
</CodeGroup>

For more detailed information about thinking encryption, redacted blocks, and advanced use cases, see [Anthropic's documentation on extended thinking](https://docs.anthropic.com/en/docs/build-with-claude/tool-use#extended-thinking).

For more information about OpenAI reasoning models, see [OpenAI's reasoning documentation](https://platform.openai.com/docs/guides/reasoning).

## Reasoning Details API Shape

When reasoning models generate responses, the reasoning information is structured in a standardized format through the `reasoning_details` array. This section documents the API response structure for reasoning details in both streaming and non-streaming responses.

### reasoning\_details Array Structure

The `reasoning_details` field contains an array of reasoning detail objects. Each object in the array represents a specific piece of reasoning information and follows one of three possible types. The location of this array differs between streaming and non-streaming responses.

* **Non-streaming responses**: `reasoning_details` appears in `choices[].message.reasoning_details`
* **Streaming responses**: `reasoning_details` appears in `choices[].delta.reasoning_details` for each chunk

### Common Fields

All reasoning detail objects share these common fields:

* `id` (string | null): Unique identifier for the reasoning detail
* `format` (string): The format of the reasoning detail, with possible values:
  * `"unknown"` - Format is not specified
  * `"openai-responses-v1"` - OpenAI responses format version 1
  * `"xai-responses-v1"` - xAI responses format version 1
  * `"anthropic-claude-v1"` - Anthropic Claude format version 1 (default)
* `index` (number, optional): Sequential index of the reasoning detail

### Reasoning Detail Types

#### 1. Summary Type (reasoning.summary)

Contains a high-level summary of the reasoning process:

```json theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
{
  "type": "reasoning.summary",
  "summary": "The model analyzed the problem by first identifying key constraints, then evaluating possible solutions...",
  "id": "reasoning-summary-1",
  "format": "anthropic-claude-v1",
  "index": 0
}
```

#### 2. Encrypted Type (reasoning.encrypted)

Contains encrypted reasoning data that may be redacted or protected:

```json theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
{
  "type": "reasoning.encrypted",
  "data": "eyJlbmNyeXB0ZWQiOiJ0cnVlIiwiY29udGVudCI6IltSRURBQ1RFRF0ifQ==",
  "id": "reasoning-encrypted-1",
  "format": "anthropic-claude-v1",
  "index": 1
}
```

#### 3. Text Type (reasoning.text)

Contains raw text reasoning with optional signature verification:

```json theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
{
  "type": "reasoning.text",
  "text": "Let me think through this step by step:\n1. First, I need to understand the user's question...",
  "signature": "sha256:abc123def456...",
  "id": "reasoning-text-1",
  "format": "anthropic-claude-v1",
  "index": 2
}
```
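
As a rough sketch of how a client might consume the three types described above, the function below groups a `reasoning_details` array by type and orders entries by `index`. The function name and the shape of the returned dictionary are illustrative assumptions; only the `type`, `summary`, `text`, and `index` fields come from the structures shown in this section.

```python
def group_reasoning_details(details: list) -> dict:
    """Hypothetical helper: split reasoning_details entries by type."""
    # `index` is optional, so treat a missing or null index as 0.
    ordered = sorted(details, key=lambda d: d.get("index") or 0)
    grouped = {"summaries": [], "texts": [], "encrypted_count": 0}
    for detail in ordered:
        kind = detail.get("type")
        if kind == "reasoning.summary":
            grouped["summaries"].append(detail.get("summary", ""))
        elif kind == "reasoning.text":
            grouped["texts"].append(detail.get("text", ""))
        elif kind == "reasoning.encrypted":
            # Encrypted data is opaque to the client; just count the blocks.
            grouped["encrypted_count"] += 1
    return grouped


sample = [
    {"type": "reasoning.text", "text": "Step 1...", "signature": None, "index": 2},
    {"type": "reasoning.summary", "summary": "Identified key constraints", "index": 0},
    {"type": "reasoning.encrypted", "data": "opaque-blob", "index": 1},
]
print(group_reasoning_details(sample))
```

This helper is only for reading or logging. When passing reasoning back to the API, send the original array unmodified.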

### Response Examples

#### Non-Streaming Response

In non-streaming responses, `reasoning_details` appears in the message:

```json theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on my analysis, I recommend the following approach...",
        "reasoning_details": [
          {
            "type": "reasoning.summary",
            "summary": "Analyzed the problem by breaking it into components",
            "id": "reasoning-summary-1",
            "format": "anthropic-claude-v1",
            "index": 0
          },
          {
            "type": "reasoning.text",
            "text": "Let me work through this systematically:\n1. First consideration...\n2. Second consideration...",
            "signature": null,
            "id": "reasoning-text-1",
            "format": "anthropic-claude-v1",
            "index": 1
          }
        ]
      }
    }
  ]
}
```

#### Streaming Response

In streaming responses, `reasoning_details` appears in delta chunks as the reasoning is generated:

```json theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
{
  "choices": [
    {
      "delta": {
        "reasoning_details": [
          {
            "type": "reasoning.text",
            "text": "Let me think about this step by step...",
            "signature": null,
            "id": "reasoning-text-1",
            "format": "anthropic-claude-v1",
            "index": 0
          }
        ]
      }
    }
  ]
}
```

#### Streaming Behavior Notes

* Each reasoning detail chunk is sent as it becomes available
* The `reasoning_details` array in each chunk may contain one or more reasoning objects
* For encrypted reasoning, the content may appear as `[REDACTED]` in streaming responses
* The complete reasoning sequence is built by concatenating all chunks in order
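
The concatenation described in the last point can be sketched as follows. This assumes each chunk has already been parsed into a dictionary with the streaming shape shown above, and that `reasoning.text` fragments arrive in order; the function name `collect_reasoning_text` is hypothetical.

```python
from typing import Iterable


def collect_reasoning_text(chunks: Iterable[dict]) -> str:
    """Hypothetical helper: join streamed reasoning.text fragments in order."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # A chunk may carry zero, one, or several reasoning objects.
            for detail in delta.get("reasoning_details") or []:
                if detail.get("type") == "reasoning.text" and detail.get("text"):
                    parts.append(detail["text"])
    return "".join(parts)


stream = [
    {"choices": [{"delta": {"reasoning_details": [
        {"type": "reasoning.text", "text": "Let me think ", "index": 0}]}}]},
    {"choices": [{"delta": {"reasoning_details": [
        {"type": "reasoning.text", "text": "step by step.", "index": 0}]}}]},
    {"choices": [{"delta": {"content": "9.9 is bigger."}}]},
]
print(collect_reasoning_text(stream))  # prints "Let me think step by step."
```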

## Legacy Parameters

For backward compatibility, BLACKBOX AI still supports the following legacy parameters:

* `include_reasoning: true` - Equivalent to `reasoning: {}`
* `include_reasoning: false` - Equivalent to `reasoning: { exclude: true }`

However, we recommend using the new unified `reasoning` parameter for better control and future compatibility.
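
For codebases that still send the legacy flag, the equivalence above can be applied mechanically. The sketch below rewrites a request dictionary; the function name `migrate_legacy_reasoning` is hypothetical, and letting an explicit `reasoning` object take precedence over the legacy flag is an assumption, not documented behavior.

```python
def migrate_legacy_reasoning(params: dict) -> dict:
    """Hypothetical helper: replace include_reasoning with reasoning."""
    migrated = dict(params)  # leave the caller's dictionary untouched
    if "include_reasoning" not in migrated:
        return migrated
    include = migrated.pop("include_reasoning")
    # Assumption: an explicit `reasoning` object wins over the legacy flag.
    if "reasoning" not in migrated:
        migrated["reasoning"] = {} if include else {"exclude": True}
    return migrated


print(migrate_legacy_reasoning({"model": "your-model", "include_reasoning": True}))
print(migrate_legacy_reasoning({"model": "your-model", "include_reasoning": False}))
```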

## Provider-Specific Reasoning Implementation

### Anthropic Models with Reasoning Tokens

The latest Claude models, such as `blackboxai/anthropic/claude-3.7-sonnet`, support working with and returning reasoning tokens.

On Anthropic models, reasoning can only be enabled through the unified `reasoning` parameter, using either `effort` or `max_tokens`.


#### Reasoning Max Tokens for Anthropic Models

When using Anthropic models with reasoning:

* When using the `reasoning.max_tokens` parameter, that value is used directly with a minimum of 1024 tokens.
* When using the `reasoning.effort` parameter, the budget\_tokens are calculated based on the `max_tokens` value.

The reasoning token allocation is capped at 128,000 tokens maximum and 1024 tokens minimum. The formula for calculating the budget\_tokens is: `budget_tokens = max(min(max_tokens * {effort_ratio}, 128000), 1024)`

`effort_ratio` is 0.95 for xhigh effort, 0.8 for high effort, 0.5 for medium effort, 0.2 for low effort, and 0.1 for minimal effort.

<Warning>
  **Important**: max\_tokens must be strictly higher than the reasoning budget to ensure there are tokens available for the final response after thinking.
</Warning>
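
The budget formula and the constraint in the warning above can be worked through in a few lines. This is a sketch: the function names are hypothetical, and truncating the product to an integer is an assumption, since the rounding rule is not stated.

```python
# Ratios and limits as given in the formula above.
EFFORT_RATIOS = {"xhigh": 0.95, "high": 0.8, "medium": 0.5, "low": 0.2, "minimal": 0.1}
MIN_BUDGET = 1024
MAX_BUDGET = 128_000


def anthropic_budget_tokens(max_tokens: int, effort: str) -> int:
    """budget_tokens = max(min(max_tokens * effort_ratio, 128000), 1024)"""
    ratio = EFFORT_RATIOS[effort]
    # Assumption: fractional results are truncated to an integer.
    return max(min(int(max_tokens * ratio), MAX_BUDGET), MIN_BUDGET)


def leaves_room_for_answer(max_tokens: int, budget_tokens: int) -> bool:
    """max_tokens must be strictly higher than the reasoning budget."""
    return max_tokens > budget_tokens


print(anthropic_budget_tokens(10_000, "high"))      # 8000
print(anthropic_budget_tokens(10_000, "minimal"))   # 1024 (raised to the minimum)
print(anthropic_budget_tokens(200_000, "high"))     # 128000 (capped at the maximum)
print(leaves_room_for_answer(1_000, anthropic_budget_tokens(1_000, "minimal")))  # False
```

The last line shows why small `max_tokens` values are a problem: the 1024-token minimum already exceeds a `max_tokens` of 1000, leaving nothing for the final response.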

#### Example: Streaming with Anthropic Reasoning Tokens

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  from openai import OpenAI
  client = OpenAI(
      # Public api users: use https://api.blackbox.ai/chat/completions
      base_url="https://enterprise.blackbox.ai/chat/completions",
      api_key="<BLACKBOX_API_KEY>",
  )

  def chat_completion_with_reasoning(messages):
      response = client.chat.completions.create(
          model="blackboxai/anthropic/claude-3.7-sonnet",
          messages=messages,
          max_tokens=10000,
          extra_body={
              "reasoning": {
                  "max_tokens": 8000
              }
          },
          stream=True
      )
      return response

  for chunk in chat_completion_with_reasoning([
      {"role": "user", "content": "What's bigger, 9.9 or 9.11?"}
  ]):
      if hasattr(chunk.choices[0].delta, 'reasoning_details') and chunk.choices[0].delta.reasoning_details:
          print(f"REASONING_DETAILS: {chunk.choices[0].delta.reasoning_details}")
      elif getattr(chunk.choices[0].delta, 'content', None):
          print(f"CONTENT: {chunk.choices[0].delta.content}")
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    // Public api users: use https://api.blackbox.ai/chat/completions
    baseURL: 'https://enterprise.blackbox.ai/chat/completions',
    apiKey: process.env.BLACKBOX_API_KEY,
  });

  function chatCompletionWithReasoning(messages: any[]) {
    return client.chat.completions.create({
      model: 'blackboxai/anthropic/claude-3.7-sonnet',
      messages,
      max_tokens: 10000,
      // @ts-ignore
      reasoning: {
        max_tokens: 8000
      },
      stream: true
    });
  }

  const stream = await chatCompletionWithReasoning([
    { role: 'user', content: 'What\'s bigger, 9.9 or 9.11?' }
  ]);

  for await (const chunk of stream) {
    if (chunk.choices[0].delta.reasoning_details) {
      console.log(`REASONING_DETAILS: ${JSON.stringify(chunk.choices[0].delta.reasoning_details)}`);
    } else if (chunk.choices[0].delta.content) {
      console.log(`CONTENT: ${chunk.choices[0].delta.content}`);
    }
  }
  ```
</CodeGroup>

### Google Gemini 3 Models with Thinking Levels

Gemini 3 models (such as `blackboxai/google/gemini-3-pro-preview` and `blackboxai/google/gemini-3-flash-preview`) use Google's `thinkingLevel` API instead of the older `thinkingBudget` API used by Gemini 2.5 models.

BLACKBOX AI maps the `reasoning.effort` parameter directly to Google's `thinkingLevel` values:

| BLACKBOX AI reasoning.effort | Google thinkingLevel |
| ---------------------------- | -------------------- |
| "minimal"                    | "minimal"            |
| "low"                        | "low"                |
| "medium"                     | "medium"             |
| "high"                       | "high"               |
| "xhigh"                      | "high" (mapped down) |
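
The table above amounts to a simple lookup. The sketch below mirrors it for client-side logging or testing; the actual mapping happens server-side, and the names `THINKING_LEVEL_BY_EFFORT` and `to_thinking_level` are hypothetical. Effort values not listed in the table (such as `"none"`) are rejected here because the table gives no mapping for them.

```python
# Mirrors the reasoning.effort -> thinkingLevel table above.
THINKING_LEVEL_BY_EFFORT = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "high",  # mapped down
}


def to_thinking_level(effort: str) -> str:
    """Hypothetical helper: look up the thinkingLevel for an effort value."""
    try:
        return THINKING_LEVEL_BY_EFFORT[effort]
    except KeyError:
        raise ValueError(f"No thinkingLevel listed for effort {effort!r}") from None


for effort in ("minimal", "low", "medium", "high", "xhigh"):
    print(effort, "->", to_thinking_level(effort))
```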

#### Token Consumption is Determined by Google

When using `thinkingLevel`, the actual number of reasoning tokens consumed is determined internally by Google. There are no publicly documented token limit breakpoints for each level. For example, setting `effort: "low"` might result in several hundred reasoning tokens depending on the complexity of the task. This is expected behavior and reflects how Google implements thinking levels internally.

If a model doesn't support a specific effort level (for example, if a model only supports low and high), BLACKBOX AI will map your requested effort to the nearest supported level.

#### Using max\_tokens with Gemini 3

If you specify `reasoning.max_tokens` explicitly, BLACKBOX AI will pass it through as `thinkingBudget` to Google's API. However, for Gemini 3 models, Google internally maps this budget value to a `thinkingLevel`, so you will not get precise token control. The actual token consumption is still determined by Google's `thinkingLevel` implementation, not by the specific budget value you provide.

#### Example: Using Thinking Levels with Gemini 3

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  from openai import OpenAI
  client = OpenAI(
      # Public api users: use https://api.blackbox.ai/chat/completions
      base_url="https://enterprise.blackbox.ai/chat/completions",
      api_key="<BLACKBOX_API_KEY>",
  )

  response = client.chat.completions.create(
      model="blackboxai/google/gemini-3-pro-preview",
      messages=[
          {"role": "user", "content": "Explain the implications of quantum entanglement."}
      ],
      extra_body={
          "reasoning": {
              "effort": "low"  # Maps to thinkingLevel: "low"
          }
      },
  )

  msg = response.choices[0].message
  print(getattr(msg, "reasoning", None))
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    // Public api users: use https://api.blackbox.ai/chat/completions
    baseURL: 'https://enterprise.blackbox.ai/chat/completions',
    apiKey: process.env.BLACKBOX_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: 'blackboxai/google/gemini-3-pro-preview',
    messages: [
      { role: 'user', content: 'Explain the implications of quantum entanglement.' }
    ],
    // @ts-ignore
    reasoning: {
      effort: 'low'  // Maps to thinkingLevel: "low"
    }
  });

  const msg = response.choices[0].message;
  // `reasoning` is not part of the SDK's message type, so read it through a cast.
  console.log((msg as any).reasoning);
  ```
</CodeGroup>

## Reasoning with the Responses API

**Reasoning models** like `blackboxai/openai/gpt-5.3-codex` are LLMs trained with reinforcement learning to perform reasoning. They think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel at complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows.

### Get Started with Reasoning

Call the Responses API and specify your reasoning model and reasoning effort:

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  # Public API users: use https://api.blackbox.ai/v1/responses
  curl --location 'https://enterprise.blackbox.ai/v1/responses' \
    --header 'Authorization: Bearer YOUR_BLACKBOX_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{
      "model": "blackboxai/openai/gpt-5.3-codex",
      "store": false,
      "stream": false,
      "reasoning": {
        "effort": "medium"
      },
      "input": [{
        "role": "user",
        "content": "Write a bash script that takes a matrix represented as a string with format [1,2],[3,4],[5,6] and prints the transpose in the same format."
      }]
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import requests

  # Public API users: use https://api.blackbox.ai/v1/responses
  response = requests.post(
      'https://enterprise.blackbox.ai/v1/responses',
      headers={
          'Authorization': 'Bearer YOUR_BLACKBOX_API_KEY',
          'Content-Type': 'application/json',
      },
      json={
          'model': 'blackboxai/openai/gpt-5.3-codex',
          'store': False,
          'stream': False,
          'reasoning': {
              'effort': 'medium'
          },
          'input': [{
              'role': 'user',
              'content': 'Write a bash script that takes a matrix represented as a string with format [1,2],[3,4],[5,6] and prints the transpose in the same format.'
          }]
      }
  )
  print(response.json())
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  // Public API users: use https://api.blackbox.ai/v1/responses
  const response = await fetch('https://enterprise.blackbox.ai/v1/responses', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_BLACKBOX_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'blackboxai/openai/gpt-5.3-codex',
      store: false,
      stream: false,
      reasoning: {
        effort: 'medium',
      },
      input: [{
        role: 'user',
        content: 'Write a bash script that takes a matrix represented as a string with format [1,2],[3,4],[5,6] and prints the transpose in the same format.',
      }],
    }),
  });
  const data = await response.json();
  console.log(data);
  ```
</CodeGroup>

The `reasoning.effort` parameter guides the model on how many reasoning tokens to generate before creating a response. The default value is `medium`.

| Value      | Description                                                                    |
| ---------- | ------------------------------------------------------------------------------ |
| `"none"`   | Disables reasoning entirely — no reasoning tokens are generated                |
| `"low"`    | Favors speed and economical token usage                                        |
| `"medium"` | Balanced between speed and reasoning accuracy (default)                        |
| `"high"`   | Favors more complete reasoning for complex tasks                               |
| `"xhigh"`  | Maximum reasoning depth — allocates the largest portion of tokens for thinking |

### How Reasoning Works

Reasoning models introduce **reasoning tokens** in addition to input and output tokens. The models use these reasoning tokens to "think," breaking down the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens and discards the reasoning tokens from its context.

<Warning>
  While raw reasoning tokens are not returned by the Responses API (only the optional summaries described under Reasoning Summaries are), they still occupy space in the model's context window and are billed as output tokens.
</Warning>

#### Managing the Context Window

It's important to ensure there's enough space in the context window for reasoning tokens when creating responses. Depending on the problem's complexity, the models may generate anywhere from a few hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used is visible in the `usage` object of the response, under `output_tokens_details`:

```json theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
{
  "usage": {
    "input_tokens": 75,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 1186,
    "output_tokens_details": {
      "reasoning_tokens": 1024
    },
    "total_tokens": 1261
  }
}
```
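
To make the arithmetic concrete, the sketch below splits `output_tokens` from the sample above into reasoning tokens and visible output tokens. The `split_output_tokens` helper is hypothetical; the field names are the ones shown in the `usage` object.

```python
usage = {
    "input_tokens": 75,
    "input_tokens_details": {"cached_tokens": 0},
    "output_tokens": 1186,
    "output_tokens_details": {"reasoning_tokens": 1024},
    "total_tokens": 1261,
}


def split_output_tokens(usage: dict) -> tuple[int, int]:
    """Return (reasoning_tokens, visible_output_tokens) from a usage object."""
    details = usage.get("output_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    # Reasoning tokens are counted inside output_tokens, so subtract them
    # to get the tokens that made it into the visible answer.
    return reasoning, usage["output_tokens"] - reasoning


reasoning, visible = split_output_tokens(usage)
print(reasoning, visible)                          # 1024 162
print(f"{reasoning / usage['output_tokens']:.0%}")  # 86%
```

In this sample, most of the billed output went to reasoning, which is why a tight `max_output_tokens` can be exhausted before any visible text appears.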

#### Allocating Space for Reasoning

If the generated tokens reach the context window limit or the `max_output_tokens` value you've set, you'll receive a response with a `status` of `incomplete` and `incomplete_details` with `reason` set to `max_output_tokens`. This might occur before any visible output tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.

To prevent this, ensure there's sufficient space in the context window or adjust the `max_output_tokens` value to a higher number. We recommend reserving at least **25,000 tokens** for reasoning and outputs when you start experimenting with these models.

<CodeGroup>
  ```bash cURL — Handling Incomplete Responses theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  # Public API users: use https://api.blackbox.ai/v1/responses
  curl --location 'https://enterprise.blackbox.ai/v1/responses' \
    --header 'Authorization: Bearer YOUR_BLACKBOX_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{
      "model": "blackboxai/openai/gpt-5.3-codex",
      "store": false,
      "stream": false,
      "reasoning": {
        "effort": "medium"
      },
      "max_output_tokens": 300,
      "input": [{
        "role": "user",
        "content": "Write a bash script that takes a matrix represented as a string with format [1,2],[3,4],[5,6] and prints the transpose in the same format."
      }]
    }'
  ```

  ```python Python — Handling Incomplete Responses theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import requests

  # Public API users: use https://api.blackbox.ai/v1/responses
  response = requests.post(
      'https://enterprise.blackbox.ai/v1/responses',
      headers={
          'Authorization': 'Bearer YOUR_BLACKBOX_API_KEY',
          'Content-Type': 'application/json',
      },
      json={
          'model': 'blackboxai/openai/gpt-5.3-codex',
          'store': False,
          'stream': False,
          'reasoning': {
              'effort': 'medium'
          },
          'max_output_tokens': 300,
          'input': [{
              'role': 'user',
              'content': 'Write a bash script that takes a matrix represented as a string with format [1,2],[3,4],[5,6] and prints the transpose in the same format.'
          }]
      }
  )

  data = response.json()
  if data.get('status') == 'incomplete' and data.get('incomplete_details', {}).get('reason') == 'max_output_tokens':
      print('Ran out of tokens')
      msg = next((item for item in data.get('output', []) if item.get('type') == 'message'), None)
      text = next((part['text'] for part in (msg or {}).get('content', []) if part.get('type') == 'output_text'), None)
      if text:
          print('Partial output:', text)
      else:
          print('Ran out of tokens during reasoning')
  ```

  ```typescript TypeScript — Handling Incomplete Responses theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  // Public API users: use https://api.blackbox.ai/v1/responses
  const response = await fetch('https://enterprise.blackbox.ai/v1/responses', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_BLACKBOX_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'blackboxai/openai/gpt-5.3-codex',
      store: false,
      stream: false,
      reasoning: {
        effort: 'medium',
      },
      max_output_tokens: 300,
      input: [{
        role: 'user',
        content: 'Write a bash script that takes a matrix represented as a string with format [1,2],[3,4],[5,6] and prints the transpose in the same format.',
      }],
    }),
  });

  const data = await response.json();
  if (data.status === 'incomplete' && data.incomplete_details?.reason === 'max_output_tokens') {
    console.log('Ran out of tokens');
    const msg = data.output?.find((item: any) => item.type === 'message');
    const text = msg?.content?.find((part: any) => part.type === 'output_text')?.text;
    if (text) {
      console.log('Partial output:', text);
    } else {
      console.log('Ran out of tokens during reasoning');
    }
  }
  ```
</CodeGroup>
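
One way to act on an incomplete response is to retry with a larger `max_output_tokens`. The sketch below is a minimal illustration under stated assumptions: `create_with_retry` and `is_token_limited` are hypothetical helpers, `send` stands for any function that posts a body to the Responses API and returns the parsed JSON, and the doubling strategy and 25,000-token ceiling are illustrative choices rather than documented behavior.

```python
from typing import Callable


def is_token_limited(data: dict) -> bool:
    """True when the response stopped because it hit max_output_tokens."""
    details = data.get("incomplete_details") or {}
    return data.get("status") == "incomplete" and details.get("reason") == "max_output_tokens"


def create_with_retry(
    send: Callable[[dict], dict],
    body: dict,
    start: int = 300,
    ceiling: int = 25_000,
) -> dict:
    """Resend `body` with a doubled max_output_tokens until it completes or reaches `ceiling`."""
    budget = start
    while True:
        data = send({**body, "max_output_tokens": budget})
        if not is_token_limited(data) or budget >= ceiling:
            return data
        budget = min(budget * 2, ceiling)


# Stand-in for a real HTTP call so the sketch runs offline:
# it pretends the model needs at least 1,000 output tokens.
def fake_send(body: dict) -> dict:
    if body["max_output_tokens"] < 1000:
        return {"status": "incomplete", "incomplete_details": {"reason": "max_output_tokens"}, "output": []}
    return {"status": "completed", "incomplete_details": None, "output": []}


result = create_with_retry(fake_send, {"model": "blackboxai/openai/gpt-5.3-codex", "input": []})
print(result["status"])  # completed
```

Each retry is a new request, so input and reasoning tokens are billed again. Starting with a generous budget is usually cheaper than several failed attempts.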

### Reasoning Summaries

You can view a summary of the model's reasoning using the `summary` parameter inside the `reasoning` object. Different models support different reasoning summary settings.

To access the most detailed summarizer available for a model, set the value of this parameter to `auto`. `auto` will be equivalent to `detailed` for most reasoning models today, but there may be more granular settings in the future.

| Value        | Description                                                             |
| ------------ | ----------------------------------------------------------------------- |
| `"auto"`     | Uses the most detailed summarizer available for the model (recommended) |
| `"detailed"` | Full step-by-step reasoning summary                                     |
| `"concise"`  | A shorter, high-level summary of the reasoning process                  |

Reasoning summary output is part of the `summary` array in the `reasoning` output item. This output will **not** be included unless you explicitly opt in by setting the `summary` field.

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  # Public API users: use https://api.blackbox.ai/v1/responses
  curl --location 'https://enterprise.blackbox.ai/v1/responses' \
    --header 'Authorization: Bearer YOUR_BLACKBOX_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{
      "model": "blackboxai/openai/gpt-5.3-codex",
      "store": false,
      "stream": false,
      "reasoning": {
        "effort": "low",
        "summary": "auto"
      },
      "input": [{
        "role": "user",
        "content": "What is the capital of France?"
      }]
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  import requests

  # Public API users: use https://api.blackbox.ai/v1/responses
  response = requests.post(
      'https://enterprise.blackbox.ai/v1/responses',
      headers={
          'Authorization': 'Bearer YOUR_BLACKBOX_API_KEY',
          'Content-Type': 'application/json',
      },
      json={
          'model': 'blackboxai/openai/gpt-5.3-codex',
          'store': False,
          'stream': False,
          'reasoning': {
              'effort': 'low',
              'summary': 'auto'
          },
          'input': [{
              'role': 'user',
              'content': 'What is the capital of France?'
          }]
      }
  )
  print(response.json())
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
  // Public API users: use https://api.blackbox.ai/v1/responses
  const response = await fetch('https://enterprise.blackbox.ai/v1/responses', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_BLACKBOX_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'blackboxai/openai/gpt-5.3-codex',
      store: false,
      stream: false,
      reasoning: {
        effort: 'low',
        summary: 'auto',
      },
      input: [{
        role: 'user',
        content: 'What is the capital of France?',
      }],
    }),
  });
  const data = await response.json();
  console.log(data);
  ```
</CodeGroup>

This API request will return an `output` array with both a reasoning summary and the assistant message:

```json theme={"theme":{"light":"github-light-default","dark":"github-dark-default"}}
[
  {
    "id": "rs_abc123",
    "type": "reasoning",
    "summary": [
      {
        "type": "summary_text",
        "text": "**Answering a simple question**\n\nThe capital of France is Paris — a well-known fact. I'll keep the answer brief and direct."
      }
    ]
  },
  {
    "id": "msg_abc456",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "The capital of France is Paris.",
        "annotations": []
      }
    ]
  }
]
```
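
A short sketch of reading that array: the hypothetical `extract_reasoning_and_answer` helper below collects `summary_text` parts from `reasoning` items and `output_text` parts from `message` items, assuming the item shapes shown above. The sample data is abbreviated.

```python
def extract_reasoning_and_answer(output: list[dict]) -> tuple[list[str], str]:
    """Return (reasoning summaries, answer text) from a Responses API `output` array."""
    summaries: list[str] = []
    answer_parts: list[str] = []
    for item in output:
        if item.get("type") == "reasoning":
            # The summary array is empty or absent unless `reasoning.summary` was set.
            summaries.extend(
                part["text"] for part in item.get("summary", []) if part.get("type") == "summary_text"
            )
        elif item.get("type") == "message":
            answer_parts.extend(
                part["text"] for part in item.get("content", []) if part.get("type") == "output_text"
            )
    return summaries, "".join(answer_parts)


sample_output = [
    {
        "id": "rs_abc123",
        "type": "reasoning",
        "summary": [{"type": "summary_text", "text": "**Answering a simple question**"}],
    },
    {
        "id": "msg_abc456",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "The capital of France is Paris.", "annotations": []}],
    },
]

summaries, answer = extract_reasoning_and_answer(sample_output)
print(summaries)  # ['**Answering a simple question**']
print(answer)     # The capital of France is Paris.
```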

<CardGroup cols={3}>
  <Card title="Tool Calling" icon="wrench" href="/api-reference/tool-calling">
    Use tools with reasoning models
  </Card>

  <Card title="Best Practices" icon="shield-check" href="/api-reference/best-practices#preserving-reasoning-blocks-in-multi-turn-requests">
    Preserve reasoning signatures across turns and avoid common errors
  </Card>

  <Card title="Interleaved Thinking" icon="brain" href="/api-reference/messages/interleaved-thinking">
    Thinking between tool calls on the Messages API
  </Card>
</CardGroup>
