> ## Documentation Index
> Fetch the complete documentation index at: https://docs.blackbox.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Parameters

> Learn about all available parameters for BLACKBOX AI API requests. Configure temperature, max tokens, top_p, and other model-specific settings.

Sampling parameters shape the token generation process of the model. You may send any parameters from the following list, as well as others, to BLACKBOX AI.

BLACKBOX AI will default to the values listed below if certain parameters are absent from your request (for example, `temperature` to 1.0). We will also transmit some provider-specific parameters, such as `safe_prompt` for Mistral or `raw_mode` for Hyperbolic directly to the respective providers if specified.

## Temperature

* Key: `temperature`

* Optional, **float**, 0.0 to 2.0

* Default: 1.0

This setting influences the variety in the model's responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses. At 0, the model always gives the same response for a given input.

## Top P

* Key: `top_p`

* Optional, **float**, 0.0 to 1.0

* Default: 1.0

This setting limits the model's choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. A lower value makes the model's responses more predictable, while the default setting allows for a full range of token choices. Think of it like a dynamic Top-K.

## Top K

* Key: `top_k`

* Optional, **integer**, 0 or above

* Default: 0

This limits the model's choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default this setting is disabled, making the model to consider all choices.

## Frequency Penalty

* Key: `frequency_penalty`

* Optional, **float**, -2.0 to 2.0

* Default: 0.0

This setting aims to control the repetition of tokens based on how often they appear in the input. It tries to use less frequently those tokens that appear more in the input, proportional to how frequently they occur. Token penalty scales with the number of occurrences. Negative values will encourage token reuse.

## Presence Penalty

* Key: `presence_penalty`

* Optional, **float**, -2.0 to 2.0

* Default: 0.0

Adjusts how often the model repeats specific tokens already used in the input. Higher values make such repetition less likely, while negative values do the opposite. Token penalty does not scale with the number of occurrences. Negative values will encourage token reuse.

## Repetition Penalty

* Key: `repetition_penalty`

* Optional, **float**, 0.0 to 2.0

* Default: 1.0

Helps to reduce the repetition of tokens from the input. A higher value makes the model less likely to repeat tokens, but too high a value can make the output less coherent (often with run-on sentences that lack small words). Token penalty scales based on original token's probability.

## Min P

* Key: `min_p`

* Optional, **float**, 0.0 to 1.0

* Default: 0.0

Represents the minimum probability for a token to be
considered, relative to the probability of the most likely token. (The value changes depending on the confidence level of the most probable token.) If your Min-P is set to 0.1, that means it will only allow for tokens that are at least 1/10th as probable as the best possible option.

## Top A

* Key: `top_a`

* Optional, **float**, 0.0 to 1.0

* Default: 0.0

Consider only the top tokens with "sufficiently high" probabilities based on the probability of the most likely token. Think of it like a dynamic Top-P. A lower Top-A value focuses the choices based on the highest probability token but with a narrower scope. A higher Top-A value does not necessarily affect the creativity of the output, but rather refines the filtering process based on the maximum probability.

## Seed

* Key: `seed`

* Optional, **integer**

If specified, the inferencing will sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed for some models.

## Max Tokens

* Key: `max_tokens`

* Optional, **integer**, 1 or above

This sets the upper limit for the number of tokens the model can generate in response. It won't produce more than this limit. The maximum value is the context length minus the prompt length.

## Logit Bias

* Key: `logit_bias`

* Optional, **map**

Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

## Logprobs

* Key: `logprobs`

* Optional, **boolean**

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.

## Top Logprobs

* Key: `top_logprobs`

* Optional, **integer**

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

## Response Format

* Key: `response_format`

* Optional, **map**

Forces the model to produce specific output format. Setting to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON.

**Note**: when using JSON mode, you should also instruct the model to produce JSON yourself via a system or user message.

## Structured Outputs

* Key: `structured_outputs`

* Optional, **boolean**

If the model can return structured outputs using response\_format json\_schema.

## Stop

* Key: `stop`

* Optional, **array**

Stop generation immediately if the model encounter any token specified in the stop array.

## Tools

* Key: `tools`

* Optional, **array**

Tool calling parameter, following OpenAI's tool calling request shape. For non-OpenAI providers, it will be transformed accordingly. To learn more about tool calling, see the [Tool & Function calling](/api-reference/tool-calling)

## Tool Choice

* Key: `tool_choice`

* Optional, **array**

Controls which (if any) tool is called by the model. 'none' means the model will not call any tool and instead generates a message. 'auto' means the model can pick between generating a message or calling one or more tools. 'required' means the model must call one or more tools. Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool.

## Parallel Tool Calls

* Key: `parallel_tool_calls`

* Optional, **boolean**

* Default: **true**

Whether to enable parallel function calling during tool use. If true, the model can call multiple functions simultaneously. If false, functions will be called sequentially. Only applies when tools are provided.

## Reasoning

* Key: `reasoning`

* Optional, **object**

* Default: **disabled**

Controls reasoning token generation and behavior for models that support it. Reasoning tokens provide transparent insight into the model's thinking process and are charged as output tokens.

The reasoning object supports the following fields:

* `effort` (string): Controls reasoning intensity. Options: `"xhigh"`, `"high"`, `"medium"`, `"low"`, `"minimal"`, `"none"`. Supported by OpenAI reasoning models and Grok models.
* `max_tokens` (integer): Directly specifies maximum reasoning tokens. Supported by Anthropic, Gemini, and some Alibaba Qwen models.
* `exclude` (boolean): When `true`, reasoning is used internally but not returned in response. Default: `false`.
* `enabled` (boolean): Enables reasoning with default "medium" effort level.

For detailed information and examples, see the [Reasoning and Interleaved Thinking](/api-reference/reasoning) documentation.

## Verbosity

* Key: `verbosity`

* Optional, **enum** (low, medium, high)

* Default: **medium**

Controls the verbosity and length of the model response. Lower values produce more concise responses, while higher values produce more detailed and comprehensive responses.

## Reasoning

* Key: `reasoning`

* Optional, **object**

For models that support reasoning tokens (such as OpenAI o1, o3 series, GPT-5 series, and Anthropic Claude 3.7+), you can control the reasoning effort level to balance between response quality and token usage.

The `reasoning` object accepts the following property:

| Property | Type   | Description                                                                                            |
| -------- | ------ | ------------------------------------------------------------------------------------------------------ |
| `effort` | string | The reasoning effort level. One of: `"xhigh"`, `"high"`, `"medium"`, `"low"`, `"minimal"`, or `"none"` |

### Effort Levels

* `"xhigh"` - Maximum reasoning effort (approximately 95% of max\_tokens allocated to reasoning)
* `"high"` - High reasoning effort (approximately 80% of max\_tokens allocated to reasoning)
* `"medium"` - Moderate reasoning effort (approximately 50% of max\_tokens allocated to reasoning)
* `"low"` - Low reasoning effort (approximately 20% of max\_tokens allocated to reasoning)
* `"minimal"` - Minimal reasoning effort (approximately 10% of max\_tokens allocated to reasoning)
* `"none"` - Disable reasoning entirely

<Note>
  Higher effort levels will consume more tokens but may produce more thorough and accurate reasoning. The actual token allocation depends on your `max_tokens` setting. For models that don't support reasoning, this parameter will be ignored.
</Note>
