BLACKBOX AI routes requests to the best available providers for your model. By default, requests are load balanced across top providers to maximize uptime. You can customize routing behavior using the provider object in your request body.

Available Providers

Common provider slugs you can use with order, only, and ignore:
| Provider | Slug |
|---|---|
| Anthropic | anthropic |
| OpenAI | openai |
| Azure | azure |
| Google | google |
| Together AI | together |
| DeepInfra | deepinfra |
| Fireworks AI | fireworks |
| Groq | groq |
| AWS Bedrock | bedrock |
| Mistral | mistral |
Provider availability varies by model. Not all providers host all models. If you specify a provider that doesn’t host your requested model, it will be skipped.

Provider Object

The provider object can contain the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
| sort | string \| object | - | Sort providers by "price", "throughput", or "latency" |
| order | string[] | - | List of provider slugs to try in order (e.g., ["anthropic", "openai"]) |
| only | string[] | - | List of provider slugs to allow for this request |
| ignore | string[] | - | List of provider slugs to skip for this request |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable |
| require_parameters | boolean | false | Only use providers that support all parameters in your request |
| data_collection | "allow" \| "deny" | "allow" | Control whether to use providers that may store data |
| quantizations | string[] | - | List of quantization levels to filter by (e.g., ["int4", "int8"]) |
| preferred_min_throughput | number \| object | - | Preferred minimum throughput (tokens/sec) |
| preferred_max_latency | number \| object | - | Preferred maximum latency (seconds) |
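If you are calling the API from code rather than curl, the provider object is just a nested field in the request body. A minimal sketch in Python (the helper name is illustrative, not part of any SDK):

```python
import json

# Hypothetical helper: build a chat-completions payload with a
# provider routing object, using the fields documented above.
def build_payload(model, messages, **provider_options):
    payload = {"model": model, "messages": messages}
    if provider_options:
        payload["provider"] = provider_options
    return payload

payload = build_payload(
    "blackboxai/meta-llama/llama-3.3-70b-instruct",
    [{"role": "user", "content": "Hello"}],
    sort="price",
    ignore=["azure"],
    allow_fallbacks=True,
)

print(json.dumps(payload, indent=2))
```

Any combination of the fields in the table can be passed the same way.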

Provider Sorting

Control how providers are prioritized for your request. By default, BLACKBOX AI load balances based on price while accounting for uptime.

Sort by Price

Route to the lowest-cost provider:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": "price"
    }
  }'

Sort by Throughput

Route to the highest-throughput provider for faster token generation:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": "throughput"
    }
  }'

Sort by Latency

Route to the lowest-latency provider for faster time-to-first-token:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": "latency"
    }
  }'

Ordering Specific Providers

Use the order field to specify which providers to try first, in order of preference:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "order": ["together", "deepinfra", "fireworks"]
    }
  }'
The router will try providers in the specified order. If none are available, it will fall back to other providers unless fallbacks are disabled.
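This ordered-with-fallback behavior can be sketched as a selection function (illustrative only; the real router also weighs price and uptime when falling back):

```python
def select_provider(order, available, allow_fallbacks=True):
    """Pick the first preferred provider that is available;
    optionally fall back to any other available provider."""
    for slug in order:
        if slug in available:
            return slug
    if allow_fallbacks and available:
        # Fall back to some remaining provider (real routing
        # would load balance by price and uptime here).
        return sorted(available)[0]
    return None  # no eligible provider -> the request errors

available = {"fireworks", "groq"}
print(select_provider(["together", "deepinfra", "fireworks"], available))
# -> fireworks (first preferred slug that is actually available)
```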

Allowing Only Specific Providers

Use the only field to restrict requests to specific providers:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "only": ["anthropic"]
    }
  }'
Restricting to specific providers may reduce fallback options and limit request recovery if the specified provider is unavailable.

Ignoring Providers

Use the ignore field to exclude specific providers from routing:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "ignore": ["azure", "aws"]
    }
  }'

Disabling Fallbacks

By default, if your preferred provider fails, BLACKBOX AI will try other available providers. To disable this behavior:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "order": ["together"],
      "allow_fallbacks": false
    }
  }'
With allow_fallbacks: false, if the specified provider fails, the request will return an error instead of trying other providers.
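With fallbacks disabled, client code should be prepared for the request to fail outright. A hedged sketch using Python's standard library (the exact error shape returned by the API is an assumption; check the actual error responses):

```python
import json
import urllib.error
import urllib.request

def pinned_completion(api_key, payload):
    """Send a request pinned to specific providers; with
    allow_fallbacks disabled, a provider outage surfaces as an
    HTTP error instead of being retried elsewhere."""
    req = urllib.request.Request(
        "https://api.blackbox.ai/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # No fallback was attempted; decide here whether to retry
        # with allow_fallbacks re-enabled or report the failure.
        raise RuntimeError(f"provider unavailable: {err.code}") from err

payload = {
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"order": ["together"], "allow_fallbacks": False},
}
```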

Data Collection Policy

Control whether to use providers that may store or train on your data:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "data_collection": "deny"
    }
  }'
  • "allow" (default): Allow providers that may store data non-transiently
  • "deny": Only use providers that do not collect user data

Requiring Parameter Support

Ensure your request only goes to providers that support all specified parameters:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "response_format": {"type": "json_object"},
    "provider": {
      "require_parameters": true
    }
  }'
This is useful when using features like JSON mode or specific sampling parameters that not all providers support.

Performance Thresholds

Set minimum throughput or maximum latency preferences to filter providers based on performance:

Minimum Throughput

Prefer providers with at least a certain throughput (tokens per second):
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "preferred_min_throughput": {
        "p90": 50
      }
    }
  }'

Maximum Latency

Prefer providers with latency below a certain threshold (in seconds):
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "preferred_max_latency": {
        "p90": 3
      }
    }
  }'

Percentile Options

Performance thresholds support the following percentile cutoffs:
  • p50 - Median performance (50% of requests perform better)
  • p75 - 75th percentile
  • p90 - 90th percentile (recommended for most use cases)
  • p99 - 99th percentile (strictest)
Performance thresholds are preferences, not hard requirements. Providers that don’t meet the threshold are deprioritized but not excluded entirely.
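Because thresholds are soft preferences, a reasonable mental model is a stable partition that moves non-qualifying providers to the back of the candidate list rather than removing them (illustrative; the real ranking also factors in price and uptime):

```python
def rank_by_threshold(providers, min_p90_throughput):
    """Deprioritize, but keep, providers below the preferred p90
    throughput, preserving relative order within each group."""
    meets = [p for p in providers if p["p90_tps"] >= min_p90_throughput]
    below = [p for p in providers if p["p90_tps"] < min_p90_throughput]
    return meets + below

# Made-up throughput figures for illustration only.
providers = [
    {"slug": "deepinfra", "p90_tps": 35},
    {"slug": "together", "p90_tps": 80},
    {"slug": "groq", "p90_tps": 250},
]
ranked = rank_by_threshold(providers, min_p90_throughput=50)
print([p["slug"] for p in ranked])
# -> ['together', 'groq', 'deepinfra']
```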

Quantization Filtering

Filter providers by model quantization level:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "quantizations": ["fp8", "fp16"]
    }
  }'

Available Quantization Levels

  • int4 - Integer 4-bit
  • int8 - Integer 8-bit
  • fp4 - Floating point 4-bit
  • fp6 - Floating point 6-bit
  • fp8 - Floating point 8-bit
  • fp16 - Floating point 16-bit
  • bf16 - Brain floating point 16-bit
  • fp32 - Floating point 32-bit
Quantized models may exhibit degraded performance for certain prompts. Lower quantization levels reduce memory requirements but may affect output quality.

Combining Options

You can combine multiple provider options for fine-grained control:
curl -X POST https://api.blackbox.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "blackboxai/meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": "price",
      "ignore": ["azure"],
      "data_collection": "deny",
      "preferred_min_throughput": {"p90": 30}
    }
  }'
This example:
  1. Sorts providers by price (lowest first)
  2. Excludes Azure from consideration
  3. Only uses providers that don’t collect data
  4. Prefers providers with at least 30 tokens/sec throughput at p90
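The combined request above can be read as a filter-then-sort pipeline: hard filters first (ignore, data policy), then the soft throughput preference, then price. A hedged sketch (candidate data is made up; real routing uses live pricing and telemetry):

```python
def route(candidates, ignore=(), data_collection="allow", min_p90=0):
    """Apply hard filters, then sort by (meets-throughput-preference,
    price): qualifying providers come first, cheapest within each group."""
    pool = [c for c in candidates
            if c["slug"] not in ignore
            and (data_collection == "allow" or not c["stores_data"])]
    pool.sort(key=lambda c: (c["p90_tps"] < min_p90, c["price_per_mtok"]))
    return [c["slug"] for c in pool]

# Illustrative candidate stats only.
candidates = [
    {"slug": "azure", "stores_data": False, "p90_tps": 40, "price_per_mtok": 0.9},
    {"slug": "together", "stores_data": False, "p90_tps": 60, "price_per_mtok": 0.6},
    {"slug": "deepinfra", "stores_data": True, "p90_tps": 70, "price_per_mtok": 0.4},
    {"slug": "fireworks", "stores_data": False, "p90_tps": 20, "price_per_mtok": 0.3},
]
print(route(candidates, ignore={"azure"}, data_collection="deny", min_p90=30))
# -> ['together', 'fireworks']: azure is ignored, deepinfra stores data,
#    and fireworks is cheap but misses the throughput preference.
```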

Response Provider Field

The response includes a provider field indicating which provider served the request:
{
  "id": "gen-...",
  "model": "meta-llama/llama-3.3-70b-instruct",
  "choices": [...],
  "usage": {...},
  "provider": "Together"
}
This helps you track which provider was used, especially useful when using load balancing or fallbacks.
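A short sketch of reading the field from a response body (the sample payload mirrors the one above):

```python
import json

# Sample response body, as shown above.
raw = """{
  "id": "gen-...",
  "model": "meta-llama/llama-3.3-70b-instruct",
  "choices": [],
  "usage": {},
  "provider": "Together"
}"""

response = json.loads(raw)
# Log which provider actually served the request, e.g. to audit
# load balancing or fallback behavior over time.
print(f"served by: {response['provider']}")
```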