LLM pricing · updated 2026-06-03

LLM API pricing, 2026

Token prices (USD per 1M tokens), context window, max output and speed for models from nine providers. Bring your own keys to flo2 and you pay these prices directly, with zero markup, and flo2 routes each call to the cheapest model that clears your quality bar.
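As a rough illustration of "cheapest model that clears your quality bar", here is a minimal Python sketch. It is not flo2's actual routing logic, which this page does not describe. The `Model` class, the `cheapest_above_bar` function and especially the `quality` scores are hypothetical placeholders invented for the example; only the prices are taken from the table on this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    provider: str
    name: str
    in_price: float   # USD per 1M input tokens (from the table)
    out_price: float  # USD per 1M output tokens (from the table)
    quality: float    # HYPOTHETICAL placeholder score in [0, 1], not real data


def blended_cost(m: Model, in_tokens: int, out_tokens: int) -> float:
    """Cost in USD of one call with the given token counts."""
    return (in_tokens * m.in_price + out_tokens * m.out_price) / 1_000_000


def cheapest_above_bar(
    models: Sequence[Model], min_quality: float, in_tokens: int, out_tokens: int
) -> Optional[Model]:
    """Return the cheapest model whose quality score meets the bar, or None."""
    eligible = [m for m in models if m.quality >= min_quality]
    if not eligible:
        return None
    return min(eligible, key=lambda m: blended_cost(m, in_tokens, out_tokens))


# Prices come from the table; the quality values are made up for illustration.
CATALOG = [
    Model("Groq", "llama-3.1-8b-instant", 0.05, 0.08, quality=0.40),
    Model("OpenAI", "gpt-5.4-mini", 0.75, 4.5, quality=0.70),
    Model("Anthropic", "claude-sonnet-5", 3.0, 15.0, quality=0.90),
]

if __name__ == "__main__":
    choice = cheapest_above_bar(CATALOG, min_quality=0.6, in_tokens=2_000, out_tokens=500)
    print(choice.name if choice else "no model clears the bar")
```

Because cost depends on the input/output mix of the request, the cheapest eligible model can differ between a long-prompt call and a long-answer call, which is why the sketch prices each call rather than ranking models once.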

Cheapest output: $0.08/M (llama-3.1-8b-instant · Groq)
Cheapest input: $0.05/M (llama-3.1-8b-instant · Groq)
Fastest output: 3,000 tok/s (gpt-oss-120b · Cerebras)
Biggest context: 10M tok (llama-4-scout-17b-16e · Groq)
Provider | Model | Context | Max out | In $/M | Cached $/M | Out $/M | Out tok/s | Prefill tok/s
OpenAI | gpt-5.5 | 1M | 16K | $5 | $0.5 | $30 | 70 | 10,000
OpenAI | gpt-5.5-pro | 1M | 16K | $30 | - | $180 | 50 | 5,000
OpenAI | gpt-5.4 | 1M | 16K | $2.5 | $0.25 | $15 | 90 | 12,000
OpenAI | gpt-5.4-mini | 1M | 16K | $0.75 | $0.075 | $4.5 | 135 | 20,000
OpenAI | gpt-5.4-nano | 1M | 16K | $0.2 | $0.02 | $1.25 | 200 | 30,000
OpenAI | gpt-4.1 | 1M | 16K | $2 | $0.2 | $8 | 80 | 15,000
OpenAI | gpt-4.1-nano | 1M | 16K | $0.1 | $0.01 | $0.4 | 250 | 35,000
OpenAI | o4-mini | 200K | 66K | $1.1 | $0.11 | $4.4 | 40 | 10,000
OpenAI | gpt-5 | 128K | 8K | $1.25 | $0.125 | $10 | 50 | 8,000
Anthropic | claude-opus-4.8 | 1M | 128K | $5 | - | $25 | 30 | 5,000
Anthropic | claude-opus-4.7 | 1M | 128K | $5 | $0.5 | $25 | 40 | 6,000
Anthropic | claude-sonnet-5 | 1M | 128K | $3 | $0.3 | $15 | 70 | 12,000
Anthropic | claude-sonnet-4.6 | 1M | 128K | $3 | $0.3 | $15 | 80 | 15,000
Anthropic | claude-haiku-4.5 | 200K | 64K | $1 | $0.1 | $5 | 150 | 25,000
Anthropic | claude-opus-4.1 | 200K | 4K | $15 | $1.5 | $75 | 15 | 3,000
Google | gemini-3.1-pro | 2M | 8K | $4 | $1 | $18 | 45 | 8,000
Google | gemini-3.1-flash-lite | 1M | 8K | $0.25 | $0.0625 | $1.5 | 180 | 25,000
Google | gemini-2.5-pro | 2M | 8K | $1.25 | $0.3125 | $5 | 60 | 10,000
Google | gemini-2.5-flash | 1M | 8K | $0.075 | $0.0187 | $0.3 | 120 | 18,000
xAI | grok-4.20 | 2M | 33K | $1.25 | $0.125 | $2.5 | 150 | 18,000
xAI | grok-4 | 128K | 8K | $3 | $0.3 | $15 | 60 | 10,000
xAI | grok-4.1-fast | 2M | 16K | $0.2 | $0.02 | $0.5 | 180 | 25,000
Groq | llama-3.3-70b-versatile | 128K | 8K | $0.59 | $0.295 | $0.79 | 330 | 35,000
Groq | llama-3.1-8b-instant | 128K | 8K | $0.05 | $0.025 | $0.08 | 1,200 | 80,000
Groq | llama-4-scout-17b-16e | 10M | 16K | $0.15 | $0.075 | $0.4 | 800 | 60,000
Groq | deepseek-r1-distill-llama-70b | 128K | 8K | $0.75 | $0.375 | $0.99 | 330 | 35,000
Groq | gemma-2-9b | 8K | 8K | $0.2 | $0.1 | $0.2 | 800 | 70,000
Cerebras | gpt-oss-120b | 131K | 16K | $0.35 | - | $0.75 | 3,000 | 100,000
Cerebras | llama-3.3-70b | 128K | 16K | $0.85 | - | $1.2 | 2,100 | 100,000
Cerebras | llama-3.1-8b | 128K | 8K | $0.1 | - | $0.1 | 2,200 | 120,000
Cerebras | glm-4.7 | 200K | 8K | $2.25 | - | $2.75 | 1,000 | 80,000
Mistral | mistral-large-3 | 128K | 8K | $0.5 | $0.25 | $1.5 | 60 | 8,000
Mistral | mistral-small-4 | 32K | 8K | $0.15 | $0.075 | $0.6 | 160 | 20,000
Mistral | mistral-medium-3 | 32K | 8K | $0.4 | $0.2 | $2 | 75 | 10,000
Mistral | devstral-2 | 128K | 16K | $0.4 | $0.2 | $2 | 80 | 12,000
Mistral | ministral-3-8b | 128K | 8K | $0.15 | $0.075 | $0.15 | 180 | 22,000
Mistral | pixtral-large | 128K | 8K | $2 | $1 | $6 | 50 | 8,000
DeepInfra | deepseek-r1-0528 | 160K | 16K | $0.5 | $0.35 | $2.15 | 30 | 1,500
DeepInfra | deepseek-v3-0324-turbo | 164K | 8K | $0.2 | $0.11 | $0.77 | 70 | 3,000
DeepInfra | llama-3.3-70b-instruct | 128K | 16K | $0.1 | - | $0.32 | 55 | 2,500
DeepInfra | llama-4-maverick-17b | 128K | 16K | $0.15 | - | $0.6 | 120 | 8,000
DeepInfra | qwen3-235b-a22b | 128K | 16K | $0.07 | - | $0.1 | 40 | 1,200
DeepInfra | kimi-k2-instruct | 128K | 16K | $4 | $1.5 | $20 | 45 | 2,000
DeepInfra | deepseek-r1-dist-qwen-32b | 128K | 16K | $0.7 | - | $0.8 | 110 | 6,000
DeepInfra | qwen2.5-coder-32b-instruct | 128K | 16K | $0.08 | - | $0.28 | 120 | 8,000
OpenRouter | deepseek-v4-pro | 1M | 16K | $0.435 | $0.367 | $0.87 | 45 | 2,000
OpenRouter | llama-3.3-70b-instruct | 131K | 16K | $0.1 | - | $0.32 | 55 | 2,500
OpenRouter | deepseek-r1 | 164K | 16K | $0.7 | - | $2.5 | 25 | 1,500
OpenRouter | openrouter/free | 66K | 8K | Free | Free | Free | 50 | 2,000
OpenRouter | deepseek-r1-dist-llama-70b | 131K | 16K | $0.22 | - | $0.22 | 60 | 2,500

Prices are USD per 1,000,000 tokens. Speeds are in tokens/sec (output generation and prompt prefill), measured under normal load. All figures are indicative and compiled from public provider pricing; models and prices change constantly, so always confirm on the provider's own pricing page before you commit. flo2 does not resell tokens: with your own keys you pay these prices directly.
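To turn the per-million prices and tokens/sec figures into numbers for a single request, a minimal Python sketch follows. The function names are illustrative, and it rests on two assumptions that are not stated on this page: cached tokens are a subset of the input tokens and are billed at the Cached rate instead of the In rate, and prefill and generation run back to back with no network or queueing overhead. Check each provider's billing rules before relying on it.

```python
from typing import Optional


def request_cost(
    in_tokens: int,
    out_tokens: int,
    in_price: float,
    out_price: float,
    cached_tokens: int = 0,
    cached_price: Optional[float] = None,
) -> float:
    """Estimated USD cost of one request; all prices are USD per 1M tokens."""
    if not 0 <= cached_tokens <= in_tokens:
        raise ValueError("cached_tokens must be between 0 and in_tokens")
    if cached_price is None:
        # No cached rate listed for this model: bill cached tokens as normal input.
        cached_price = in_price
    fresh_tokens = in_tokens - cached_tokens
    total = (
        fresh_tokens * in_price
        + cached_tokens * cached_price
        + out_tokens * out_price
    )
    return total / 1_000_000


def request_latency(
    in_tokens: int, out_tokens: int, prefill_tps: float, out_tps: float
) -> float:
    """Estimated seconds: prompt prefill time plus output generation time."""
    return in_tokens / prefill_tps + out_tokens / out_tps


if __name__ == "__main__":
    # claude-sonnet-4.6 row: In $3, Cached $0.3, Out $15, 80 tok/s out, 15,000 tok/s prefill.
    cold = request_cost(10_000, 1_000, in_price=3.0, out_price=15.0)
    warm = request_cost(10_000, 1_000, in_price=3.0, out_price=15.0,
                        cached_tokens=8_000, cached_price=0.3)
    secs = request_latency(10_000, 1_000, prefill_tps=15_000, out_tps=80)
    print(f"cold: ${cold:.4f}  warm: ${warm:.4f}  latency: {secs:.1f}s")
```

For this 10,000-token prompt and 1,000-token answer, output generation dominates the latency estimate, so for long answers the Out tok/s column matters far more than the prefill speed.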

Official pricing pages: OpenAI · Anthropic · Google · xAI · Groq · Cerebras · Mistral · DeepInfra · OpenRouter.

One key in front of all of them — zero markup.
flo2 routes to the cheapest, fastest model with fallback, racing and true cost accounting. Free during Beta.
© 2026 flo2.com, the zero-markup LLM gateway & router.