DeepInfra pricing · auto-updated daily

DeepInfra API pricing — June 12, 2026

Name: DeepInfra API pricing (June 2026)
Creator: flo2
License: https://creativecommons.org/licenses/by/4.0/

This page lists all 86 DeepInfra text models with token prices (USD per 1M tokens), context window, and max output where published, refreshed daily from the official pricing source. Bring your DeepInfra key to flo2 and you pay these exact prices, with zero markup, plus fallback, racing, and per-request cost accounting on top.

Cheapest output: $0.03/M (meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo)
Cheapest input: $0.02/M (meta-llama/Meta-Llama-3.1-8B-Instruct)
Biggest context: 1M tokens (XiaomiMiMo/MiMo-V2.5-Pro)
Models: 86, tracked daily
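The summary figures above can be derived from the table rows below. The following is a minimal sketch, assuming the rows are available as tab-separated text in the same column order as the table; the helper names (`parse_price`, `parse_context`, `load_rows`) are hypothetical, and reading `K` as 1,000 and `M` as 1,000,000 is an assumption. Only three rows are copied in to keep the example short.

```python
import csv
import io

# Three rows copied from the table below (tab-separated, header omitted).
# Column order: Model, Context, Max out, In, Cached in, Out, Reasoning.
SAMPLE = (
    "XiaomiMiMo/MiMo-V2.5-Pro\t1M\t—\t$1\t$0.2\t$3\t✓\n"
    "meta-llama/Meta-Llama-3.1-8B-Instruct\t131K\t—\t$0.02\t—\t$0.05\t—\n"
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\t131K\t—\t$0.02\t—\t$0.03\t—\n"
)

def parse_price(cell):
    """Turn '$0.03' into 0.03; any cell without a '$' (the placeholder) becomes None."""
    return float(cell[1:]) if cell.startswith("$") else None

def parse_context(cell):
    """Turn '131K' or '1M' into a token count (assumes K = 1,000 and M = 1,000,000)."""
    scale = {"K": 1_000, "M": 1_000_000}[cell[-1]]
    return int(float(cell[:-1]) * scale)

def load_rows(text):
    """Parse tab-separated rows into dictionaries with numeric fields."""
    rows = []
    for model, context, _max_out, in_p, cached, out_p, _reasoning in csv.reader(
        io.StringIO(text), delimiter="\t"
    ):
        rows.append({
            "model": model,
            "context": parse_context(context),
            "in": parse_price(in_p),
            "cached_in": parse_price(cached),
            "out": parse_price(out_p),
        })
    return rows

rows = load_rows(SAMPLE)
cheapest_output = min(rows, key=lambda r: r["out"])
cheapest_input = min(rows, key=lambda r: r["in"])       # min() keeps the first row on a tie
biggest_context = max(rows, key=lambda r: r["context"])  # max() also keeps the first on a tie

print(cheapest_output["model"], cheapest_output["out"])
print(cheapest_input["model"], cheapest_input["in"])
print(biggest_context["model"], biggest_context["context"])
```

Several models in the table share the lowest input price and the 1M context window, so on the full table the model named for those two figures depends on how ties are broken. This sketch simply keeps the first matching row.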

Model	Context	Max out	In $/M	Cached in $/M	Out $/M	Reasoning
anthropic/claude-opus-4-7	1M	—	$5	—	$25	✓
anthropic/claude-opus-4-8	1M	—	$5	—	$25	✓
anthropic/claude-sonnet-4-6	1M	—	$3	—	$15	✓
google/gemini-3.1-pro	1M	—	$2	—	$12	✓
google/gemini-2.5-pro	1M	—	$1.25	—	$10	✓
google/gemini-3.5-flash	1M	—	$1.5	—	$9	✓
Qwen/Qwen3.7-Max	256K	—	$2.5	$0.5	$7.5	—
Qwen/Qwen3-Max	256K	—	$1.2	$0.24	$6	—
Qwen/Qwen3-Max-Thinking	256K	—	$1.2	$0.24	$6	—
anthropic/claude-haiku-4-5	200K	—	$1	—	$5	✓
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16	262K	—	$1	$0.3	$5	✓
moonshotai/Kimi-K2.6	262K	—	$0.75	$0.15	$3.5	✓
zai-org/GLM-5.1	203K	—	$1.05	$0.205	$3.5	✓
Qwen/Qwen3.6-27B	262K	—	$0.32	—	$3.2	✓
ByteDance/Seed-2.0-code	256K	—	$0.5	$0.1	$3	✓
ByteDance/Seed-2.0-pro	256K	—	$0.5	$0.1	$3	✓
Qwen/Qwen3.5-397B-A17B	262K	—	$0.45	$0.22	$3	✓
XiaomiMiMo/MiMo-V2.5-Pro	1M	—	$1	$0.2	$3	✓
deepseek-ai/DeepSeek-V4-Pro	1M	—	$1.3	$0.1	$2.6	—
Qwen/Qwen3.5-27B	262K	—	$0.26	—	$2.6	✓
google/gemini-2.5-flash	1M	—	$0.3	—	$2.5	✓
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B	262K	—	$0.5	$0.15	$2.5	✓
Qwen/Qwen3-235B-A22B-Thinking-2507	262K	—	$0.23	$0.2	$2.3	✓
moonshotai/Kimi-K2.5	262K	—	$0.45	$0.07	$2.25	✓
deepseek-ai/DeepSeek-R1-0528	164K	—	$0.5	$0.35	$2.15	✓
zai-org/GLM-5	203K	—	$0.6	$0.12	$2.08	✓
ByteDance/Seed-1.8	256K	—	$0.25	$0.05	$2	✓
XiaomiMiMo/MiMo-V2.5	262K	—	$0.4	$0.08	$2	✓
zai-org/GLM-4.7	203K	—	$0.4	$0.08	$1.75	✓
zai-org/GLM-4.6	203K	—	$0.43	$0.08	$1.74	✓
google/gemini-3.1-flash-lite	1M	—	$0.25	—	$1.5	✓
MiniMaxAI/MiniMax-M2.5	197K	—	$0.15	$0.03	$1.15	✓
stepfun-ai/Step-3.7-Flash	262K	—	$0.2	$0.04	$1.15	—
Qwen/Qwen3-Next-80B-A3B-Instruct	262K	—	$0.09	—	$1.1	—
MiniMaxAI/MiniMax-M2.7	197K	—	$0.25	$0.05	$1	✓
NousResearch/Hermes-3-Llama-3.1-405B	131K	—	$1	—	$1	—
Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo	262K	—	$0.3	$0.1	$1	—
Qwen/Qwen3.5-35B-A3B	262K	—	$0.14	$0.05	$1	✓
deepseek-ai/DeepSeek-V3.1-Terminus	164K	—	$0.27	$0.13	$0.95	✓
Qwen/Qwen3.6-35B-A3B	262K	—	$0.15	—	$0.95	✓
deepseek-ai/DeepSeek-V3	164K	—	$0.32	—	$0.89	—
Qwen/Qwen3-VL-235B-A22B-Instruct	262K	—	$0.2	$0.11	$0.88	—
Sao10K/L3.1-70B-Euryale-v2.2	131K	—	$0.85	—	$0.85	—
nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning	262K	—	$0.2	—	$0.8	✓
deepseek-ai/DeepSeek-V3.1	164K	—	$0.21	$0.13	$0.79	✓
deepseek-ai/DeepSeek-V3-0324	33K	—	$0.2	$0.135	$0.77	—
NousResearch/Hermes-3-Llama-3.1-70B	131K	—	$0.7	—	$0.7	—
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8	1M	—	$0.15	—	$0.6	—
openai/gpt-oss-120b-Turbo	131K	—	$0.15	—	$0.6	✓
Qwen/Qwen3-VL-30B-A3B-Instruct	262K	—	$0.15	—	$0.6	—
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B	262K	—	$0.1	—	$0.5	✓
Qwen/Qwen3-30B-A3B	41K	—	$0.12	—	$0.5	✓
ByteDance/Seed-2.0-mini	256K	—	$0.1	$0.02	$0.4	✓
Gryphe/MythoMax-L2-13b	4K	—	$0.4	—	$0.4	—
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo	131K	—	$0.4	—	$0.4	—
nvidia/Llama-3.3-Nemotron-Super-49B-v1.5	131K	—	$0.4	—	$0.4	✓
Qwen/Qwen2.5-72B-Instruct	33K	—	$0.36	—	$0.4	—
zai-org/GLM-4.7-Flash	203K	—	$0.06	$0.01	$0.4	✓
deepseek-ai/DeepSeek-V3.2	164K	—	$0.26	$0.13	$0.38	—
google/gemma-4-31B-it	262K	—	$0.13	—	$0.38	✓
google/gemma-4-31B-it-turbo	262K	—	$0.12	—	$0.37	✓
meta-llama/Llama-3.2-11B-Vision-Instruct	131K	—	$0.345	—	$0.345	—
google/gemma-4-26B-A4B-it	262K	—	$0.07	—	$0.34	✓
meta-llama/Llama-3.3-70B-Instruct-Turbo	131K	—	$0.1	—	$0.32	—
meta-llama/Llama-4-Scout-17B-16E-Instruct	328K	—	$0.1	—	$0.3	—
stepfun-ai/Step-3.5-Flash	262K	—	$0.09	$0.02	$0.3	—
Qwen/Qwen3-32B	41K	—	$0.08	—	$0.28	✓
Qwen/Qwen3-14B	41K	—	$0.12	—	$0.24	✓
deepseek-ai/DeepSeek-V4-Flash	1M	—	$0.1	$0.02	$0.2	✓
mistralai/Mistral-Small-3.2-24B-Instruct-2506	128K	—	$0.075	—	$0.2	—
nvidia/Nemotron-3-Nano-30B-A3B	262K	—	$0.05	—	$0.2	✓
nvidia/Nemotron-Content-Safety-3.5	131K	—	$0.2	—	$0.2	—
openai/gpt-oss-120b	131K	—	$0.039	—	$0.19	✓
meta-llama/Llama-Guard-4-12B	164K	—	$0.18	—	$0.18	—
google/gemma-3-27b-it	131K	—	$0.08	—	$0.16	—
google/gemma-3-12b-it	131K	—	$0.05	—	$0.15	—
Qwen/Qwen3.5-9B	262K	—	$0.1	—	$0.15	—
microsoft/phi-4	16K	—	$0.07	—	$0.14	—
openai/gpt-oss-20b	131K	—	$0.03	—	$0.14	✓
google/gemma-3-4b-it	131K	—	$0.05	—	$0.1	—
Qwen/Qwen3-235B-A22B-Instruct-2507	262K	—	$0.09	—	$0.1	—
mistralai/Mistral-Small-24B-Instruct-2501	33K	—	$0.05	—	$0.08	—
meta-llama/Meta-Llama-3.1-8B-Instruct	131K	—	$0.02	—	$0.05	—
Sao10K/L3-8B-Lunaris-v1-Turbo	8K	—	$0.04	—	$0.05	—
mistralai/Mistral-Nemo-Instruct-2407	131K	—	$0.02	—	$0.04	—
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo	131K	—	$0.02	—	$0.03	—

Prices in USD per 1,000,000 tokens, fetched 2026-06-12 from the official DeepInfra pricing source. Verify before large commitments.
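Because every price is quoted per 1,000,000 tokens, the cost of a single request is each token count times its price, divided by 1,000,000. The sketch below shows that arithmetic. The function name `request_cost` is hypothetical, and the handling of cached tokens (billed at the cached rate instead of the input rate) is an assumption about how the "Cached in" column applies, not something this page states.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1,000,000 tokens

def request_cost(input_tokens, output_tokens, in_price, out_price,
                 cached_tokens=0, cached_price=None):
    """Estimate the USD cost of one request from per-1M-token prices.

    Assumption (not stated on the page): cached_tokens is the part of
    input_tokens served from cache, billed at cached_price instead of
    in_price. Without a cached price, all input is billed at in_price.
    Prices are passed as strings so Decimal keeps them exact.
    """
    if cached_price is None:
        cached_tokens = 0
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh = Decimal(input_tokens - cached_tokens) * Decimal(in_price)
    cached = Decimal(cached_tokens) * Decimal(cached_price or "0")
    output = Decimal(output_tokens) * Decimal(out_price)
    return (fresh + cached + output) / TOKENS_PER_UNIT

# moonshotai/Kimi-K2.6: $0.75 in, $0.15 cached in, $3.5 out.
# 200,000 input tokens (50,000 of them cached) and 10,000 output tokens.
kimi = request_cost(200_000, 10_000, "0.75", "3.5",
                    cached_tokens=50_000, cached_price="0.15")

# meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo: $0.02 in, $0.03 out, no cached price.
llama = request_cost(1_000_000, 1_000_000, "0.02", "0.03")

print(f"Kimi-K2.6 request: ${kimi}")
print(f"Llama 3.1 8B Turbo request: ${llama}")
```

In the first example the three parts come to $0.1125 for uncached input, $0.0075 for cached input, and $0.035 for output, for a total of $0.155. The second comes to $0.05.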


Use DeepInfra through one key — zero markup.

flo2 routes every call to the cheapest, fastest model that clears your bar, with fallback, racing and true cost accounting. Free during Beta.
