One key,
every AI model.
flo2 routes your app to OpenAI, Anthropic, Groq, Cerebras & more — and auto-picks the cheapest, fastest one for every call.
One key. Every model. Fully under your control.
flo2 is the developer-first LLM gateway, router and proxy. It doesn't resell tokens — you bring your own provider keys, and flo2 routes one OpenAI- & Anthropic-compatible API key to the cheapest, fastest models across OpenAI, Anthropic, Groq, Cerebras, DeepInfra and more, with fallback, racing and real cost accounting. The OpenRouter alternative that never marks up your tokens.
Smart LLM routing
Point one flo2 key at any mix of providers and models. Pin a default, restrict to a few, or open it to your whole key collection — a true unified LLM API.
Fallback chains
If the primary model errors after N retries, flo2 slides to the next fallback automatically. Drag to reorder priority. No more single-provider outages.
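For a feel of what a fallback chain does, here is a minimal sketch. It is not flo2's implementation: `call_with_fallback`, `call_model` and `max_retries` are hypothetical names, and a real gateway would only retry errors that are actually retryable.

```python
from typing import Callable, List, Sequence, Tuple


class AllModelsFailed(Exception):
    """Raised when every model in the chain has used up its retries."""


def call_with_fallback(
    chain: Sequence[str],
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> answer
    prompt: str,
    max_retries: int = 2,
) -> Tuple[str, str]:
    """Try models in priority order; return (model, answer) from the first success."""
    errors: List[str] = []
    for model in chain:
        # One initial attempt plus max_retries retries before sliding to the next model.
        for attempt in range(max_retries + 1):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:  # assumption: every error counts as retryable
                errors.append(f"{model} attempt {attempt + 1}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Reordering `chain` is the code equivalent of dragging models to change their priority.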
AI racing
Fire free or unstable models in parallel with a head start. The first model to answer wins; until then, the rest keep racing in case they finish sooner.
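A racing setup can be sketched with `asyncio`. This is an illustration under assumptions, not flo2's code: we assume the head start means the free or unstable models launch first and the other models join after `head_start_s` seconds. `race`, `call_model` and `head_start_s` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple


async def race(
    head_start_models: Sequence[str],
    other_models: Sequence[str],
    call_model: Callable[[str, str], Awaitable[str]],  # hypothetical async provider call
    prompt: str,
    head_start_s: float = 0.2,
) -> Tuple[str, str]:
    """Return (model, answer) from the first model that answers successfully."""

    async def run(model: str, delay: float) -> Tuple[str, str]:
        await asyncio.sleep(delay)  # models without a head start wait before firing
        return model, await call_model(model, prompt)

    pending = {asyncio.create_task(run(m, 0.0)) for m in head_start_models}
    pending |= {asyncio.create_task(run(m, head_start_s)) for m in other_models}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()  # first successful answer wins
        raise RuntimeError("every model in the race failed")
    finally:
        for task in pending:
            task.cancel()  # stop the losers once a winner is served
```

A failed model simply drops out of the race, so an unstable provider cannot block the answer as long as one other model succeeds.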
A/B testing
Shadow new models against your live setup, capture both answers, and let a judge model score which is better — so you ship the model that actually wins.
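In code, shadow testing with a judge might look roughly like this. It is a sketch only: `shadow_compare`, `live`, `shadow` and `judge` are hypothetical callables, and we assume the judge returns the string `"live"` or `"shadow"`.

```python
from typing import Callable, Dict


def shadow_compare(
    prompt: str,
    live: Callable[[str], str],    # hypothetical: the model serving production traffic
    shadow: Callable[[str], str],  # hypothetical: the candidate model under test
    judge: Callable[[str, str, str], str],  # assumed to return "live" or "shadow"
) -> Dict[str, str]:
    """Serve the live answer, capture the shadow answer, and record the judge's pick."""
    live_answer = live(prompt)
    shadow_answer = shadow(prompt)
    winner = judge(prompt, live_answer, shadow_answer)
    if winner not in ("live", "shadow"):
        raise ValueError(f"unexpected judge verdict: {winner!r}")
    # The user still receives the live answer; the shadow result is only logged.
    return {"served": live_answer, "shadow": shadow_answer, "winner": winner}
```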
Cost transparency
Every call logs tokens, throughput and computed cost across providers — full LLM cost observability, so your spend stays clear and easy to optimize.
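The arithmetic behind a computed cost is simple once you have per-million-token prices. A minimal sketch, assuming separate input and output prices (the `Price` class and function names are hypothetical, not flo2's schema):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical price record: USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def call_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call in USD, from token counts and per-million prices."""
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


def throughput(output_tokens: int, seconds: float) -> float:
    """Output tokens generated per second for one call."""
    return output_tokens / seconds
```

For example, 2,000 input tokens at $0.50 per million plus 1,000 output tokens at $1.50 per million comes to $0.0025.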
Drop-in compatible
Speak OpenAI Chat Completions, Responses, legacy Completions or Anthropic Messages — streaming included. Just change the base URL.
Auto-collect datasets
Capture your prompts and the winning answers on demand to build clean training sets — ready to fine-tune your own models on your own hardware.
Response caching
Repeated prompts come straight back from cache — cutting latency and cost. Opt-in per flo2 key, with a TTL you control.
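A TTL cache for repeated prompts can be sketched in a few lines. This is an illustration, not how flo2 stores responses: `TTLCache` is a hypothetical class, and keying on a hash of the canonical JSON request body is our assumption.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory response cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(request: dict) -> str:
        # Canonical JSON so the same request hashes the same regardless of key order.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request: dict) -> Optional[Any]:
        key = self.key_for(request)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, request: dict, value: Any) -> None:
        self._store[self.key_for(request)] = (self.clock() + self.ttl, value)
```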
Optimize your token spend, not someone else's margin.
flo2 charges no markup on tokens — ever. You always pay your providers directly. We just make your keys flow to the cheapest, fastest place and prove the numbers.
How it works
Three steps from sign-up to your first routed, accounted, streaming completion.
Add your provider keys
Paste keys for OpenAI, Anthropic, Groq, Cerebras, DeepInfra… and set their per-million-token prices.
Wire up a flo2 key
Choose which models it can reach and their roles: default, fallback, racing or A/B.
Point your app at flo2
curl https://flo2.com/api/v1/chat/completions \
  -H "Authorization: Bearer flo_…" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","stream":true, "messages":[{"role":"user", "content":"hi"}]}'
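The same request can be built from Python with only the standard library. A sketch, with assumptions: `build_chat_request` is a hypothetical helper, the `Content-Type` header is our addition for a JSON body, and you supply your own flo2 key.

```python
import json
import urllib.request


def build_chat_request(
    api_key: str,
    content: str,
    base_url: str = "https://flo2.com/api/v1",
) -> urllib.request.Request:
    """Build (but do not send) a streaming chat completion request."""
    body = {
        "model": "auto",  # same model value as the curl example
        "stream": True,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON body is expected
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Because the request sets `"stream": true`, the response would arrive incrementally rather than as one JSON document.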
LLM gateway questions, answered
What is an LLM gateway?
An LLM gateway is a single API endpoint in front of multiple model providers — OpenAI, Anthropic, Groq, Cerebras, DeepInfra and more. flo2 is a developer-first LLM gateway and router: one key, every model, with smart routing, fallback, racing and cost accounting.
What is the cheapest LLM API?
The cheapest LLM API depends on your task. flo2 lets you attach every provider key you already have, set per-million-token prices, and route each request to the cheapest model that meets your quality bar — with no token markup, since you pay providers directly.
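Picking the cheapest model that clears a quality bar is a filter followed by a minimum. A sketch under assumptions: `Candidate`, its single blended price, and the 0-to-1 `quality` score are all hypothetical, since how you measure quality is up to you.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str                 # placeholder model name
    price_per_million: float  # assumption: one blended USD price per million tokens
    quality: float            # assumption: your own score between 0 and 1


def cheapest_meeting_bar(
    candidates: Iterable[Candidate], quality_bar: float
) -> Optional[Candidate]:
    """Return the lowest-priced candidate whose quality is at least quality_bar."""
    eligible = [c for c in candidates if c.quality >= quality_bar]
    # default=None covers the case where no model is good enough.
    return min(eligible, key=lambda c: c.price_per_million, default=None)
```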
Is flo2 a good OpenRouter alternative?
Yes. Unlike OpenRouter, flo2 doesn't resell tokens or credits. You bring your own provider keys and flo2 only routes between them — a zero-markup OpenRouter alternative with fallback, racing and a real cost-audit dashboard.
Which is the fastest LLM, and can flo2 pick it automatically?
flo2's racing mode fires several models in parallel and serves whichever responds first, so each request gets the fastest answer available at that moment without hard-coding one provider.
How does flo2 optimize AI tokenomics and reduce LLM costs?
flo2 logs tokens, throughput and computed cost for every attempt, so you can reconcile against the provider invoice, route to cheaper models via fallback, and cut LLM spend.
Does flo2 support the OpenAI and Anthropic APIs?
Yes. flo2 speaks OpenAI Chat Completions, Responses and legacy Completions, plus the Anthropic Messages API — streaming included. Just change the base URL and use your flo2 key.
Point one key at every model.
Bring the provider keys you already have, wire up a flo2 key, and ship. No token markup — you always pay your providers directly. We just make your keys flow to the right model.