BLACKBOX AI routes requests to the best available providers for your model. By default, requests are load balanced across top providers to maximize uptime. You can customize routing behavior using the provider object in your request body.

Available Providers

Common provider slugs you can use with order, only, and ignore:
| Provider | Slug |
|---|---|
| Anthropic | anthropic |
| OpenAI | openai |
| Azure | azure |
| Google | google |
| Together AI | together |
| DeepInfra | deepinfra |
| Fireworks AI | fireworks |
| Groq | groq |
| AWS Bedrock | bedrock |
| Mistral | mistral |
Provider availability varies by model. Not all providers host all models. If you specify a provider that doesn’t host your requested model, it will be skipped.

Provider Object

The provider object can contain the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
| sort | string \| object | - | Sort providers by "price", "throughput", or "latency" |
| order | string[] | - | List of provider slugs to try in order (e.g., ["anthropic", "openai"]) |
| only | string[] | - | List of provider slugs to allow for this request |
| ignore | string[] | - | List of provider slugs to skip for this request |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable |
| require_parameters | boolean | false | Only use providers that support all parameters in your request |
| data_collection | "allow" \| "deny" | "allow" | Control whether to use providers that may store data |
| quantizations | string[] | - | List of quantization levels to filter by (e.g., ["int4", "int8"]) |
| preferred_min_throughput | number \| object | - | Preferred minimum throughput (tokens/sec) |
| preferred_max_latency | number \| object | - | Preferred maximum latency (seconds) |
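If you are calling the API from code rather than curl, the provider object is just a nested field in the request body. A minimal sketch in Python (the helper name is illustrative, not part of any SDK):

```python
import json

# Hypothetical helper: build a chat-completions payload with a
# provider routing object, using the fields documented above.
def build_payload(model, messages, **provider_options):
    payload = {"model": model, "messages": messages}
    if provider_options:
        payload["provider"] = provider_options
    return payload

payload = build_payload(
    "blackboxai/meta-llama/llama-3.3-70b-instruct",
    [{"role": "user", "content": "Hello"}],
    sort="price",
    ignore=["azure"],
    allow_fallbacks=True,
)

print(json.dumps(payload, indent=2))
```

Any combination of the fields in the table can be passed the same way.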

Provider Sorting

Control how providers are prioritized for your request. By default, BLACKBOX AI load balances based on price while accounting for uptime.

Sort by Price

Route to the lowest-cost provider:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": "price"
    }
  }'

Sort by Throughput

Route to the highest-throughput provider for faster token generation:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": "throughput"
    }
  }'

Sort by Latency

Route to the lowest-latency provider for faster time-to-first-token:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": "latency"
    }
  }'

Ordering Specific Providers

Use the order field to specify which providers to try first, in order of preference:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "order": ["together", "deepinfra", "fireworks"]
    }
  }'
The router will try providers in the specified order. If none are available, it will fall back to other providers unless fallbacks are disabled.
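This ordered-with-fallback behavior can be sketched as a selection function (illustrative only; the real router also weighs price and uptime when falling back):

```python
def select_provider(order, available, allow_fallbacks=True):
    """Pick the first preferred provider that is available;
    optionally fall back to any other available provider."""
    for slug in order:
        if slug in available:
            return slug
    if allow_fallbacks and available:
        # Fall back to some remaining provider (real routing
        # would load balance by price and uptime here).
        return sorted(available)[0]
    return None  # no eligible provider -> the request errors

available = {"fireworks", "groq"}
print(select_provider(["together", "deepinfra", "fireworks"], available))
# -> fireworks (first preferred slug that is actually available)
```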

Allowing Only Specific Providers

Use the only field to restrict requests to specific providers:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "only": ["anthropic"]
    }
  }'
Restricting to specific providers may reduce fallback options and limit request recovery if the specified provider is unavailable.

Ignoring Providers

Use the ignore field to exclude specific providers from routing:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "ignore": ["azure", "aws"]
    }
  }'

Disabling Fallbacks

By default, if your preferred provider fails, BLACKBOX AI will try other available providers. To disable this behavior:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "order": ["together"],
      "allow_fallbacks": false
    }
  }'
With allow_fallbacks: false, if the specified provider fails, the request will return an error instead of trying other providers.
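With fallbacks disabled, client code should be prepared for the request to fail outright. A hedged sketch using Python's standard library (the exact error shape returned by the API is an assumption; check the actual error responses):

```python
import json
import urllib.error
import urllib.request

def pinned_completion(api_key, payload):
    """Send a request pinned to specific providers; with
    allow_fallbacks disabled, a provider outage surfaces as an
    HTTP error instead of being retried elsewhere."""
    req = urllib.request.Request(
        "https://api.blackbox.ai/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # No fallback was attempted; decide here whether to retry
        # with allow_fallbacks re-enabled or report the failure.
        raise RuntimeError(f"provider unavailable: {err.code}") from err

payload = {
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"order": ["together"], "allow_fallbacks": False},
}
```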

Data Collection Policy

Control whether to use providers that may store or train on your data:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "data_collection": "deny"
    }
  }'
  • "allow" (default): Allow providers that may store data non-transiently
  • "deny": Only use providers that do not collect user data

Requiring Parameter Support

Ensure your request only goes to providers that support all specified parameters:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "response_format": {"type": "json_object"},
    "provider": {
      "require_parameters": true
    }
  }'
This is useful when using features like JSON mode or specific sampling parameters that not all providers support.

Performance Thresholds

Set minimum throughput or maximum latency preferences to filter providers based on performance:

Minimum Throughput

Prefer providers with at least a certain throughput (tokens per second):
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "preferred_min_throughput": {
        "p90": 50
      }
    }
  }'

Maximum Latency

Prefer providers with latency below a certain threshold (in seconds):
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "preferred_max_latency": {
        "p90": 3
      }
    }
  }'

Percentile Options

Performance thresholds support the following percentile cutoffs:
  • p50 - Median performance (50% of requests perform better)
  • p75 - 75th percentile
  • p90 - 90th percentile (recommended for most use cases)
  • p99 - 99th percentile (strictest)
Performance thresholds are preferences, not hard requirements. Providers that don’t meet the threshold are deprioritized but not excluded entirely.
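Because thresholds are soft preferences, a reasonable mental model is a stable partition that moves non-qualifying providers to the back of the candidate list rather than removing them (illustrative; the real ranking also factors in price and uptime):

```python
def rank_by_threshold(providers, min_p90_throughput):
    """Deprioritize, but keep, providers below the preferred p90
    throughput, preserving relative order within each group."""
    meets = [p for p in providers if p["p90_tps"] >= min_p90_throughput]
    below = [p for p in providers if p["p90_tps"] < min_p90_throughput]
    return meets + below

# Made-up throughput figures for illustration only.
providers = [
    {"slug": "deepinfra", "p90_tps": 35},
    {"slug": "together", "p90_tps": 80},
    {"slug": "groq", "p90_tps": 250},
]
ranked = rank_by_threshold(providers, min_p90_throughput=50)
print([p["slug"] for p in ranked])
# -> ['together', 'groq', 'deepinfra']
```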

Quantization Filtering

Filter providers by model quantization level:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "quantizations": ["fp8", "fp16"]
    }
  }'

Available Quantization Levels

  • int4 - Integer 4-bit
  • int8 - Integer 8-bit
  • fp4 - Floating point 4-bit
  • fp6 - Floating point 6-bit
  • fp8 - Floating point 8-bit
  • fp16 - Floating point 16-bit
  • bf16 - Brain floating point 16-bit
  • fp32 - Floating point 32-bit
Quantized models may exhibit degraded performance for certain prompts. Lower quantization levels reduce memory requirements but may affect output quality.

Combining Options

You can combine multiple provider options for fine-grained control:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": "price",
      "ignore": ["azure"],
      "data_collection": "deny",
      "preferred_min_throughput": {"p90": 30}
    }
  }'
This example:
  1. Sorts providers by price (lowest first)
  2. Excludes Azure from consideration
  3. Only uses providers that don’t collect data
  4. Prefers providers with at least 30 tokens/sec throughput at p90
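The combined request above can be read as a filter-then-sort pipeline: hard filters first (ignore, data policy), then the soft throughput preference, then price. A hedged sketch (candidate data is made up; real routing uses live pricing and telemetry):

```python
def route(candidates, ignore=(), data_collection="allow", min_p90=0):
    """Apply hard filters, then sort by (meets-throughput-preference,
    price): qualifying providers come first, cheapest within each group."""
    pool = [c for c in candidates
            if c["slug"] not in ignore
            and (data_collection == "allow" or not c["stores_data"])]
    pool.sort(key=lambda c: (c["p90_tps"] < min_p90, c["price_per_mtok"]))
    return [c["slug"] for c in pool]

# Illustrative candidate stats only.
candidates = [
    {"slug": "azure", "stores_data": False, "p90_tps": 40, "price_per_mtok": 0.9},
    {"slug": "together", "stores_data": False, "p90_tps": 60, "price_per_mtok": 0.6},
    {"slug": "deepinfra", "stores_data": True, "p90_tps": 70, "price_per_mtok": 0.4},
    {"slug": "fireworks", "stores_data": False, "p90_tps": 20, "price_per_mtok": 0.3},
]
print(route(candidates, ignore={"azure"}, data_collection="deny", min_p90=30))
# -> ['together', 'fireworks']: azure is ignored, deepinfra stores data,
#    and fireworks is cheap but misses the throughput preference.
```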

Response Provider Field

The response includes a provider field indicating which provider served the request:
{
  "id": "gen-...",
  "model": "meta-llama/llama-3.3-70b-instruct",
  "choices": [...],
  "usage": {...},
  "provider": "Together"
}
This helps you track which provider was used, especially useful when using load balancing or fallbacks.
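A short sketch of reading the field from a response body (the sample payload mirrors the one above):

```python
import json

# Sample response body, as shown above.
raw = """{
  "id": "gen-...",
  "model": "meta-llama/llama-3.3-70b-instruct",
  "choices": [],
  "usage": {},
  "provider": "Together"
}"""

response = json.loads(raw)
# Log which provider actually served the request, e.g. to audit
# load balancing or fallback behavior over time.
print(f"served by: {response['provider']}")
```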