LLM pricing · updated 2026-06-03

LLM API pricing, 2026

Token prices (USD per 1M tokens), context window, max output and speed across every major provider. Bring your own keys to flo2 and you pay these prices directly, with zero markup. flo2 routes each call to the cheapest model that clears your quality bar.

Cheapest output: $0.08/M · llama-3.1-8b-instant · Groq
Cheapest input: $0.05/M · llama-3.1-8b-instant · Groq
Fastest output: 3,000 tok/s · gpt-oss-120b · Cerebras
Biggest context: 10M tok · llama-4-scout-17b-16e · Groq

Provider	Model	Context (tok)	Max out (tok)	In $/M	Cached in $/M	Out $/M	Out tok/s	Prefill tok/s
OpenAI	gpt-5.5	1M	16K	$5	$0.5	$30	70	10,000
OpenAI	gpt-5.5-pro	1M	16K	$30	—	$180	50	5,000
OpenAI	gpt-5.4	1M	16K	$2.5	$0.25	$15	90	12,000
OpenAI	gpt-5.4-mini	1M	16K	$0.75	$0.075	$4.5	135	20,000
OpenAI	gpt-5.4-nano	1M	16K	$0.2	$0.02	$1.25	200	30,000
OpenAI	gpt-4.1	1M	16K	$2	$0.2	$8	80	15,000
OpenAI	gpt-4.1-nano	1M	16K	$0.1	$0.01	$0.4	250	35,000
OpenAI	o4-mini	200K	66K	$1.1	$0.11	$4.4	40	10,000
OpenAI	gpt-5	128K	8K	$1.25	$0.125	$10	50	8,000
Anthropic	claude-opus-4.8	1M	128K	$5	—	$25	30	5,000
Anthropic	claude-opus-4.7	1M	128K	$5	$0.5	$25	40	6,000
Anthropic	claude-sonnet-5	1M	128K	$3	$0.3	$15	70	12,000
Anthropic	claude-sonnet-4.6	1M	128K	$3	$0.3	$15	80	15,000
Anthropic	claude-haiku-4.5	200K	64K	$1	$0.1	$5	150	25,000
Anthropic	claude-opus-4.1	200K	4K	$15	$1.5	$75	15	3,000
Google	gemini-3.1-pro	2M	8K	$4	$1	$18	45	8,000
Google	gemini-3.1-flash-lite	1M	8K	$0.25	$0.0625	$1.5	180	25,000
Google	gemini-2.5-pro	2M	8K	$1.25	$0.3125	$5	60	10,000
Google	gemini-2.5-flash	1M	8K	$0.075	$0.0187	$0.3	120	18,000
xAI	grok-4.20	2M	33K	$1.25	$0.125	$2.5	150	18,000
xAI	grok-4	128K	8K	$3	$0.3	$15	60	10,000
xAI	grok-4.1-fast	2M	16K	$0.2	$0.02	$0.5	180	25,000
Groq	llama-3.3-70b-versatile	128K	8K	$0.59	$0.295	$0.79	330	35,000
Groq	llama-3.1-8b-instant	128K	8K	$0.05	$0.025	$0.08	1,200	80,000
Groq	llama-4-scout-17b-16e	10M	16K	$0.15	$0.075	$0.4	800	60,000
Groq	deepseek-r1-distill-llama-70b	128K	8K	$0.75	$0.375	$0.99	330	35,000
Groq	gemma-2-9b	8K	8K	$0.2	$0.1	$0.2	800	70,000
Cerebras	gpt-oss-120b	131K	16K	$0.35	—	$0.75	3,000	100,000
Cerebras	llama-3.3-70b	128K	16K	$0.85	—	$1.2	2,100	100,000
Cerebras	llama-3.1-8b	128K	8K	$0.1	—	$0.1	2,200	120,000
Cerebras	glm-4.7	200K	8K	$2.25	—	$2.75	1,000	80,000
Mistral	mistral-large-3	128K	8K	$0.5	$0.25	$1.5	60	8,000
Mistral	mistral-small-4	32K	8K	$0.15	$0.075	$0.6	160	20,000
Mistral	mistral-medium-3	32K	8K	$0.4	$0.2	$2	75	10,000
Mistral	devstral-2	128K	16K	$0.4	$0.2	$2	80	12,000
Mistral	ministral-3-8b	128K	8K	$0.15	$0.075	$0.15	180	22,000
Mistral	pixtral-large	128K	8K	$2	$1	$6	50	8,000
DeepInfra	deepseek-r1-0528	160K	16K	$0.5	$0.35	$2.15	30	1,500
DeepInfra	deepseek-v3-0324-turbo	164K	8K	$0.2	$0.11	$0.77	70	3,000
DeepInfra	llama-3.3-70b-instruct	128K	16K	$0.1	—	$0.32	55	2,500
DeepInfra	llama-4-maverick-17b	128K	16K	$0.15	—	$0.6	120	8,000
DeepInfra	qwen3-235b-a22b	128K	16K	$0.07	—	$0.1	40	1,200
DeepInfra	kimi-k2-instruct	128K	16K	$4	$1.5	$20	45	2,000
DeepInfra	deepseek-r1-dist-qwen-32b	128K	16K	$0.7	—	$0.8	110	6,000
DeepInfra	qwen2.5-coder-32b-instruct	128K	16K	$0.08	—	$0.28	120	8,000
OpenRouter	deepseek-v4-pro	1M	16K	$0.435	$0.367	$0.87	45	2,000
OpenRouter	llama-3.3-70b-instruct	131K	16K	$0.1	—	$0.32	55	2,500
OpenRouter	deepseek-r1	164K	16K	$0.7	—	$2.5	25	1,500
OpenRouter	openrouter/free	66K	8K	Free	Free	Free	50	2,000
OpenRouter	deepseek-r1-dist-llama-70b	131K	16K	$0.22	—	$0.22	60	2,500

Prices are in USD per 1,000,000 tokens. Speeds are in tokens/sec (output generation and prompt prefill), measured under normal load. All figures are indicative and compiled from public provider pricing; models and prices change constantly, so always confirm on the provider's own pricing page before you commit. flo2 does not resell tokens: with your own keys you pay these prices directly.
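To make the per-million pricing concrete, here is a minimal sketch of how a single request's cost could be estimated from the table. The `Price` class and `request_cost` function are hypothetical names for illustration. The sketch assumes cached input tokens are billed at the "Cached" rate instead of the input rate, and at the full input rate where the table shows "—"; providers may bill caching differently, so treat this as an approximation.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens, copied from rows of the table above."""
    input_per_m: float
    output_per_m: float
    cached_per_m: Optional[float] = None  # None where the table shows "—"


# A few rows from the table; extend as needed.
PRICES = {
    ("OpenAI", "gpt-5.4-mini"): Price(0.75, 4.5, 0.075),
    ("Groq", "llama-3.1-8b-instant"): Price(0.05, 0.08, 0.025),
    ("Cerebras", "gpt-oss-120b"): Price(0.35, 0.75, None),
}


def request_cost(price: Price, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from a
    prompt cache and is billed at the cached rate when one is listed.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    cached_rate = (price.cached_per_m
                   if price.cached_per_m is not None
                   else price.input_per_m)
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * price.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * price.output_per_m)
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    p = PRICES[("OpenAI", "gpt-5.4-mini")]
    # 10,000 input tokens (2,000 of them cached) and 1,000 output tokens.
    print(f"${request_cost(p, 10_000, 1_000, cached_tokens=2_000):.5f}")
```

For the gpt-5.4-mini row, the example works out as 8,000 × $0.75/M + 2,000 × $0.075/M + 1,000 × $4.5/M, which is about $0.01065.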

Per-provider pages (auto-updated daily): OpenAI · Anthropic · Google Gemini · xAI Grok · Groq · Cerebras · Mistral · DeepInfra · OpenRouter · NVIDIA NIM.

Official pricing pages: OpenAI · Anthropic · Google · xAI · Groq · Cerebras · Mistral · DeepInfra · OpenRouter.

One key in front of all of them — zero markup.

flo2 routes to the cheapest, fastest model with fallback, racing and true cost accounting. Free during Beta.
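As an illustration of cheapest-first routing with fallback, the sketch below filters a catalog by hard constraints and sorts the survivors by estimated cost. It is not flo2's actual algorithm: the `Model` class and the `route` function are hypothetical, and context, max output and output speed stand in for a quality bar because quality scores are not part of the table. It also assumes "128K" means 128,000 tokens and that the context window must hold input plus output.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    provider: str
    name: str
    context_tokens: int
    max_out_tokens: int
    in_per_m: float   # USD per 1M input tokens
    out_per_m: float  # USD per 1M output tokens
    out_tok_s: int    # indicative output speed


# A few rows copied from the table above.
CATALOG = [
    Model("OpenAI", "gpt-4.1-nano", 1_000_000, 16_000, 0.10, 0.40, 250),
    Model("Groq", "llama-3.1-8b-instant", 128_000, 8_000, 0.05, 0.08, 1_200),
    Model("Groq", "llama-4-scout-17b-16e", 10_000_000, 16_000, 0.15, 0.40, 800),
    Model("Cerebras", "gpt-oss-120b", 131_000, 16_000, 0.35, 0.75, 3_000),
]


def estimated_cost(m: Model, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at list price, ignoring caching."""
    return (input_tokens * m.in_per_m
            + output_tokens * m.out_per_m) / 1_000_000


def route(catalog: List[Model], input_tokens: int, output_tokens: int,
          min_out_tok_s: int = 0) -> List[Model]:
    """Return eligible models, cheapest first.

    The first entry is the primary choice; the rest form a fallback
    order to try if the primary call fails.
    """
    eligible = [
        m for m in catalog
        if m.context_tokens >= input_tokens + output_tokens
        and m.max_out_tokens >= output_tokens
        and m.out_tok_s >= min_out_tok_s
    ]
    return sorted(eligible,
                  key=lambda m: estimated_cost(m, input_tokens, output_tokens))


if __name__ == "__main__":
    # A 200K-token prompt rules out the 128K and 131K context models.
    for m in route(CATALOG, input_tokens=200_000, output_tokens=1_000):
        print(m.provider, m.name)
```

Returning the full ordered list rather than a single model keeps the fallback order explicit: a caller can walk the list until one provider responds.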
