DeepInfra pricing · auto-updated daily

DeepInfra API pricing — June 12, 2026

All 86 DeepInfra text models, with token prices ($ per 1M tokens), context window and max output, refreshed daily from the official pricing source. Bring your own DeepInfra key to flo2 and you pay exactly these prices with zero markup, plus fallback, racing and per-request cost accounting on top.

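Cheapest output: $0.03/M (meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo)
Cheapest input: $0.02/M (meta-llama/Meta-Llama-3.1-8B-Instruct; mistralai/Mistral-Nemo-Instruct-2407 and meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo tie at the same price)
Biggest context: 1M tokens (XiaomiMiMo/MiMo-V2.5-Pro; 12 models in the table list a 1M window)
Models tracked daily: 86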
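The summary figures are plain min/max lookups over the price table, and several of them are ties. A minimal Python sketch, assuming the rows have already been parsed into the hypothetical `PriceRow` record below (only five rows are copied in, not all 86), shows how to compute them without hiding ties, and how to pick the cheapest model that meets a context-window requirement. The 500K threshold is an arbitrary example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass(frozen=True)
class PriceRow:
    """Hypothetical record for one table row; prices are USD per 1M tokens."""
    model: str
    context: str                 # label as shown on the page, e.g. "131K" or "1M"
    in_price: float              # input price
    cached_in: Optional[float]   # cached-input price, None when not listed
    out_price: float             # output price

def context_tokens(label: str) -> int:
    """Convert a rounded page label like "262K" or "1M" to an approximate token count."""
    units = {"K": 1_000, "M": 1_000_000}
    return int(float(label[:-1]) * units[label[-1]])

def best(rows: Iterable[PriceRow],
         key: Callable[[PriceRow], float],
         pick: Callable[[Iterable[float]], float] = min):
    """Return (best value, every model that reaches it) so ties stay visible."""
    rows = list(rows)
    target = pick(key(r) for r in rows)
    return target, [r.model for r in rows if key(r) == target]

# A few rows copied from the June 12, 2026 table (not the full list).
ROWS = [
    PriceRow("meta-llama/Meta-Llama-3.1-8B-Instruct", "131K", 0.02, None, 0.05),
    PriceRow("mistralai/Mistral-Nemo-Instruct-2407", "131K", 0.02, None, 0.04),
    PriceRow("meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", "131K", 0.02, None, 0.03),
    PriceRow("XiaomiMiMo/MiMo-V2.5-Pro", "1M", 1.0, 0.2, 3.0),
    PriceRow("deepseek-ai/DeepSeek-V4-Flash", "1M", 0.1, 0.02, 0.2),
]

cheapest_out = best(ROWS, lambda r: r.out_price)
cheapest_in = best(ROWS, lambda r: r.in_price)
biggest_ctx = best(ROWS, lambda r: context_tokens(r.context), pick=max)

# Cheapest output price among models whose window is at least 500K tokens.
long_ctx = [r for r in ROWS if context_tokens(r.context) >= 500_000]
cheapest_long = best(long_ctx, lambda r: r.out_price)

print(cheapest_out)   # (0.03, ['meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo'])
print(biggest_ctx)    # (1000000, ['XiaomiMiMo/MiMo-V2.5-Pro', 'deepseek-ai/DeepSeek-V4-Flash'])
print(cheapest_long)  # (0.2, ['deepseek-ai/DeepSeek-V4-Flash'])
```

Returning every model at the winning value, rather than the first match, is the design choice that matters here: with prices this close, a single "cheapest" name can misrepresent the table.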
| Model | Context | Max out | In $/M | Cached in $/M | Out $/M | Reasoning |
| --- | --- | --- | --- | --- | --- | --- |
| anthropic/claude-opus-4-7 | 1M | | $5 | | $25 | |
| anthropic/claude-opus-4-8 | 1M | | $5 | | $25 | |
| anthropic/claude-sonnet-4-6 | 1M | | $3 | | $15 | |
| google/gemini-3.1-pro | 1M | | $2 | | $12 | |
| google/gemini-2.5-pro | 1M | | $1.25 | | $10 | |
| google/gemini-3.5-flash | 1M | | $1.5 | | $9 | |
| Qwen/Qwen3.7-Max | 256K | | $2.5 | $0.5 | $7.5 | |
| Qwen/Qwen3-Max | 256K | | $1.2 | $0.24 | $6 | |
| Qwen/Qwen3-Max-Thinking | 256K | | $1.2 | $0.24 | $6 | |
| anthropic/claude-haiku-4-5 | 200K | | $1 | | $5 | |
| nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 | 262K | | $1 | $0.3 | $5 | |
| moonshotai/Kimi-K2.6 | 262K | | $0.75 | $0.15 | $3.5 | |
| zai-org/GLM-5.1 | 203K | | $1.05 | $0.205 | $3.5 | |
| Qwen/Qwen3.6-27B | 262K | | $0.32 | | $3.2 | |
| ByteDance/Seed-2.0-code | 256K | | $0.5 | $0.1 | $3 | |
| ByteDance/Seed-2.0-pro | 256K | | $0.5 | $0.1 | $3 | |
| Qwen/Qwen3.5-397B-A17B | 262K | | $0.45 | $0.22 | $3 | |
| XiaomiMiMo/MiMo-V2.5-Pro | 1M | | $1 | $0.2 | $3 | |
| deepseek-ai/DeepSeek-V4-Pro | 1M | | $1.3 | $0.1 | $2.6 | |
| Qwen/Qwen3.5-27B | 262K | | $0.26 | | $2.6 | |
| google/gemini-2.5-flash | 1M | | $0.3 | | $2.5 | |
| nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B | 262K | | $0.5 | $0.15 | $2.5 | |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | 262K | | $0.23 | $0.2 | $2.3 | |
| moonshotai/Kimi-K2.5 | 262K | | $0.45 | $0.07 | $2.25 | |
| deepseek-ai/DeepSeek-R1-0528 | 164K | | $0.5 | $0.35 | $2.15 | |
| zai-org/GLM-5 | 203K | | $0.6 | $0.12 | $2.08 | |
| ByteDance/Seed-1.8 | 256K | | $0.25 | $0.05 | $2 | |
| XiaomiMiMo/MiMo-V2.5 | 262K | | $0.4 | $0.08 | $2 | |
| zai-org/GLM-4.7 | 203K | | $0.4 | $0.08 | $1.75 | |
| zai-org/GLM-4.6 | 203K | | $0.43 | $0.08 | $1.74 | |
| google/gemini-3.1-flash-lite | 1M | | $0.25 | | $1.5 | |
| MiniMaxAI/MiniMax-M2.5 | 197K | | $0.15 | $0.03 | $1.15 | |
| stepfun-ai/Step-3.7-Flash | 262K | | $0.2 | $0.04 | $1.15 | |
| Qwen/Qwen3-Next-80B-A3B-Instruct | 262K | | $0.09 | | $1.1 | |
| MiniMaxAI/MiniMax-M2.7 | 197K | | $0.25 | $0.05 | $1 | |
| NousResearch/Hermes-3-Llama-3.1-405B | 131K | | $1 | | $1 | |
| Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo | 262K | | $0.3 | $0.1 | $1 | |
| Qwen/Qwen3.5-35B-A3B | 262K | | $0.14 | $0.05 | $1 | |
| deepseek-ai/DeepSeek-V3.1-Terminus | 164K | | $0.27 | $0.13 | $0.95 | |
| Qwen/Qwen3.6-35B-A3B | 262K | | $0.15 | | $0.95 | |
| deepseek-ai/DeepSeek-V3 | 164K | | $0.32 | | $0.89 | |
| Qwen/Qwen3-VL-235B-A22B-Instruct | 262K | | $0.2 | $0.11 | $0.88 | |
| Sao10K/L3.1-70B-Euryale-v2.2 | 131K | | $0.85 | | $0.85 | |
| nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning | 262K | | $0.2 | | $0.8 | |
| deepseek-ai/DeepSeek-V3.1 | 164K | | $0.21 | $0.13 | $0.79 | |
| deepseek-ai/DeepSeek-V3-0324 | 33K | | $0.2 | $0.135 | $0.77 | |
| NousResearch/Hermes-3-Llama-3.1-70B | 131K | | $0.7 | | $0.7 | |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1M | | $0.15 | | $0.6 | |
| openai/gpt-oss-120b-Turbo | 131K | | $0.15 | | $0.6 | |
| Qwen/Qwen3-VL-30B-A3B-Instruct | 262K | | $0.15 | | $0.6 | |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B | 262K | | $0.1 | | $0.5 | |
| Qwen/Qwen3-30B-A3B | 41K | | $0.12 | | $0.5 | |
| ByteDance/Seed-2.0-mini | 256K | | $0.1 | $0.02 | $0.4 | |
| Gryphe/MythoMax-L2-13b | 4K | | $0.4 | | $0.4 | |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 131K | | $0.4 | | $0.4 | |
| nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 | 131K | | $0.4 | | $0.4 | |
| Qwen/Qwen2.5-72B-Instruct | 33K | | $0.36 | | $0.4 | |
| zai-org/GLM-4.7-Flash | 203K | | $0.06 | $0.01 | $0.4 | |
| deepseek-ai/DeepSeek-V3.2 | 164K | | $0.26 | $0.13 | $0.38 | |
| google/gemma-4-31B-it | 262K | | $0.13 | | $0.38 | |
| google/gemma-4-31B-it-turbo | 262K | | $0.12 | | $0.37 | |
| meta-llama/Llama-3.2-11B-Vision-Instruct | 131K | | $0.345 | | $0.345 | |
| google/gemma-4-26B-A4B-it | 262K | | $0.07 | | $0.34 | |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 131K | | $0.1 | | $0.32 | |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | 328K | | $0.1 | | $0.3 | |
| stepfun-ai/Step-3.5-Flash | 262K | | $0.09 | $0.02 | $0.3 | |
| Qwen/Qwen3-32B | 41K | | $0.08 | | $0.28 | |
| Qwen/Qwen3-14B | 41K | | $0.12 | | $0.24 | |
| deepseek-ai/DeepSeek-V4-Flash | 1M | | $0.1 | $0.02 | $0.2 | |
| mistralai/Mistral-Small-3.2-24B-Instruct-2506 | 128K | | $0.075 | | $0.2 | |
| nvidia/Nemotron-3-Nano-30B-A3B | 262K | | $0.05 | | $0.2 | |
| nvidia/Nemotron-Content-Safety-3.5 | 131K | | $0.2 | | $0.2 | |
| openai/gpt-oss-120b | 131K | | $0.039 | | $0.19 | |
| meta-llama/Llama-Guard-4-12B | 164K | | $0.18 | | $0.18 | |
| google/gemma-3-27b-it | 131K | | $0.08 | | $0.16 | |
| google/gemma-3-12b-it | 131K | | $0.05 | | $0.15 | |
| Qwen/Qwen3.5-9B | 262K | | $0.1 | | $0.15 | |
| microsoft/phi-4 | 16K | | $0.07 | | $0.14 | |
| openai/gpt-oss-20b | 131K | | $0.03 | | $0.14 | |
| google/gemma-3-4b-it | 131K | | $0.05 | | $0.1 | |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | 262K | | $0.09 | | $0.1 | |
| mistralai/Mistral-Small-24B-Instruct-2501 | 33K | | $0.05 | | $0.08 | |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 131K | | $0.02 | | $0.05 | |
| Sao10K/L3-8B-Lunaris-v1-Turbo | 8K | | $0.04 | | $0.05 | |
| mistralai/Mistral-Nemo-Instruct-2407 | 131K | | $0.02 | | $0.04 | |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131K | | $0.02 | | $0.03 | |

Prices in USD per 1,000,000 tokens, fetched 2026-06-12 from the official DeepInfra pricing source. Rows are sorted by output price, highest first. Verify prices before making large commitments.
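Because every price is quoted per 1,000,000 tokens, the cost of one request is tokens × price ÷ 1,000,000 for each token class, summed. A minimal Python sketch, assuming that cached input tokens are a subset of the prompt billed at the "Cached in" rate in place of the regular input rate (check DeepInfra's own billing documentation for how caching is actually applied). `request_cost_usd` and its parameters are hypothetical names, not part of any DeepInfra or flo2 API.

```python
from typing import Optional

def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_per_m: float,
    out_per_m: float,
    cached_tokens: int = 0,
    cached_per_m: Optional[float] = None,
) -> float:
    """Estimate one request's cost in USD from $/1M-token prices.

    Assumption: cached_tokens is the part of input_tokens served from the
    prompt cache and is billed at cached_per_m instead of in_per_m.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if cached_tokens and cached_per_m is None:
        raise ValueError("this model lists no cached-input price")
    fresh = input_tokens - cached_tokens          # input billed at the normal rate
    cost = fresh * in_per_m + output_tokens * out_per_m
    if cached_tokens:
        cost += cached_tokens * cached_per_m      # cheaper cached-input rate
    return cost / 1_000_000                       # prices are per 1M tokens

# deepseek-ai/DeepSeek-V4-Pro: $1.3 in, $0.1 cached in, $2.6 out per 1M tokens.
# 100,000-token prompt (80,000 of it cached) and a 2,000-token reply.
cost = request_cost_usd(100_000, 2_000, 1.3, 2.6,
                        cached_tokens=80_000, cached_per_m=0.1)
print(f"${cost:.4f}")  # $0.0392
```

In this example the 80,000 cached tokens cost $0.008 instead of the $0.104 they would cost at the full input rate, which is why the "Cached in" column matters for long, repeated prompts.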

More providers: OpenAI · Anthropic · Google Gemini · xAI Grok · Groq · Cerebras · Mistral · OpenRouter · NVIDIA NIM · or the full cross-provider comparison.

Use DeepInfra through one key — zero markup.
flo2 routes every call to the cheapest, fastest model that clears your bar, with fallback, racing and true cost accounting. Free during Beta.
© 2026 flo2.com, the zero-markup LLM gateway and router.