One key,
every AI model.
flo2 routes your app to OpenAI, Anthropic, Groq, Cerebras & more — and auto-picks the cheapest, fastest one for every call.
Zero markup · bring your own keys · private by default · free during Beta
Real products, really running on flo2.
Teams already push production traffic through flo2 — research tools, global aviation data and high-volume feed pipelines. Real usage, growing every week.
“Nodum turns scattered notes, videos and posts into structured knowledge — that's a lot of LLM calls. flo2 routes each step to the cheapest model that's good enough and proves the cost per request. We swap models without touching app code; the token accounting alone paid for itself.”
“We serve real-time aviation data to thousands of developers, so predictable cost and uptime matter. flo2 sits in front of every provider with one key — no token resale, our own prices, real numbers per call. Adding a new model is a dropdown, not a migration.”
“FetchRSS processes a huge volume of feeds, so AI summarization had to be cheap to scale. With flo2 we point one key at the cheapest fast model, fall back automatically, and watch the exact spend per feed. No markup, no lock-in — exactly what an infra-heavy product needs.”
One key. Every model. Fully under your control.
flo2 is the developer-first LLM gateway, router and proxy. It doesn't resell tokens — you bring your own provider keys, and flo2 routes one OpenAI- & Anthropic-compatible API key to the cheapest, fastest models across OpenAI, Anthropic, Groq, Cerebras, DeepInfra and more, with fallback, racing and real cost accounting. The OpenRouter alternative that never marks up your tokens.
Smart LLM routing
Point one flo2 key at any mix of providers and models. Pin a default, restrict to a few, or open it to your whole key collection — a true unified LLM API.
Fallback chains
If the primary model errors after N retries, flo2 slides to the next fallback automatically. Drag to reorder priority. No more single-provider outages.
AI racing
Fire free or unstable models in parallel, with a head start. Every model stays in the race until one answers, and the first answer wins.
A/B testing
Shadow new models against your live setup, capture both answers, and let a judge model score which is better — so you reach model–task fit: the model that actually wins each task, proven by data.
Cost transparency
Every call logs tokens, throughput and computed cost across providers — full LLM cost observability, so your spend stays clear and easy to optimize.
Drop-in compatible
Speak OpenAI Chat Completions, Responses, legacy Completions or Anthropic Messages — streaming included. Just change the base URL.
Prompt Insights (PRO · SOON)
After an A/B run, flo2 reads the winning vs losing answers and your prompt, then suggests concrete edits to lift accuracy — your test data, turned into a sharper prompt.
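To make the fallback chains feature above concrete, here is a minimal sketch of the idea: try each model in priority order, give each a fixed number of attempts, then move on. This is an illustration only, not flo2's implementation. The names `call_with_fallback`, `call_model`, `max_attempts` and the model names are hypothetical.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when every model in the chain has used up its attempts."""


def call_with_fallback(
    chain: Sequence[str],
    call_model: Callable[[str, str], str],
    prompt: str,
    max_attempts: int = 2,
) -> tuple[str, str]:
    """Try models in priority order; return (model, answer) from the first success."""
    errors: list[str] = []
    for model in chain:
        for attempt in range(1, max_attempts + 1):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:  # a real gateway would catch provider errors only
                errors.append(f"{model} attempt {attempt}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Usage with a stand-in provider call (hypothetical model names).
def fake_call(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise RuntimeError("provider outage")
    return f"{model} says hello"


model, answer = call_with_fallback(["primary-model", "backup-model"], fake_call, "hi")
```

Reordering the `chain` list plays the same role as dragging fallbacks to change priority.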
Tokens are your new unit of spend — and costs rise faster than value.
Every prompt, API call and automated workflow burns tokens. The teams that win optimise early. flo2 is the layer that does it for you — route to the cheapest model that clears the bar, cache repeats, fall back on outages, and prove every number. No markup, ever: you always pay your providers directly.
How much are you overpaying on LLMs?
Smart tokenomics in practice — and the flo2 lever for each:
Right-size models
The right model for the job — not a frontier model for everything. Smart routing + A/B judge picks the cheapest model that clears your bar.
↓ up to 80%
Cache & batch
Stop paying twice for the same answer. Response caching (opt-in, your TTL) returns repeats instantly and free.
↓ tokens
Build vs buy
Fine-tuned small vs general large — decide with data, not vibes. A/B + cost accounting compares them on your real traffic.
right choice
Prompt discipline
You can't cut what you can't see. The true cost dashboard shows tokens & cost per call, so you tighten the expensive ones.
token budget
“I built this flow by hand to cut the LLM bill on my own product (mapa.ua): right-sizing models, caching, and a bullet-proof fallback path so an outage never took the feature down. It worked. flo2 is that exact playbook, turned into one key anyone can flip on.”
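The "Cache & batch" lever above, opt-in response caching with your own TTL, can be sketched as a small in-memory cache keyed on the model and messages. This is an assumption-laden illustration, not flo2's cache: the `ResponseCache` class, its key scheme and the injectable clock are all hypothetical.

```python
import hashlib
import json
import time
from typing import Callable, Optional


class ResponseCache:
    """In-memory response cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(model: str, messages: list) -> str:
        # Identical model + messages produce the same key, so repeats hit the cache.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: list) -> Optional[str]:
        k = self.key(model, messages)
        entry = self._store.get(k)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._store[k]  # expired: drop it and report a miss
            return None
        return response

    def put(self, model: str, messages: list, response: str) -> None:
        self._store[self.key(model, messages)] = (self.clock() + self.ttl, response)
```

A caller would check `get` before contacting a provider and call `put` after a successful response, so a repeated request inside the TTL costs no tokens.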
Every call — routed, raced and priced.
Live usage, the true cost per call, and a per-model breakdown across every provider you route to. No markup — just your numbers.
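As a rough illustration of what a "true cost per call" and a per-model breakdown involve, the sketch below multiplies logged token counts by per-million-token prices. It assumes separate input and output prices per model, which the page does not specify. The `Price`, `Call`, `call_cost` and `cost_by_model` names, the model names and the prices are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Prices per one million tokens, in whatever currency you entered them."""
    input_per_million: float
    output_per_million: float


@dataclass(frozen=True)
class Call:
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(price: Price, call: Call) -> float:
    """Cost of one call: tokens times the per-million price, scaled down."""
    return (
        call.input_tokens * price.input_per_million
        + call.output_tokens * price.output_per_million
    ) / 1_000_000


def cost_by_model(prices: dict, calls: list) -> dict:
    """Sum the cost of every logged call, grouped by model."""
    totals = defaultdict(float)
    for call in calls:
        totals[call.model] += call_cost(prices[call.model], call)
    return dict(totals)


# Made-up prices and traffic, for illustration only.
prices = {"model-a": Price(0.5, 1.5), "model-b": Price(3.0, 15.0)}
calls = [Call("model-a", 2000, 500), Call("model-b", 1000, 100), Call("model-a", 1000, 0)]
breakdown = cost_by_model(prices, calls)
```

Because the prices are the ones you pay your providers, totals like these are what you would reconcile against a provider invoice.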
How it works
Three steps from sign-up to your first routed, accounted, streaming completion.
Add your provider keys
Paste keys for OpenAI, Anthropic, Groq, Cerebras, DeepInfra… and set their per-million-token prices.
Wire up a flo2 key
Choose which models it can reach and their roles: default, fallback, racing or A/B.
Point your app at flo2
curl https://flo2.com/api/v1/chat/completions \
  -H "Authorization: Bearer flo_…" \
  -d '{"model":"auto","stream":true,
       "messages":[{"role":"user","content":"hi"}]}'
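For apps that call the API from code rather than curl, the same request can be built with the Python standard library. This is a sketch under assumptions: the URL and the `"auto"` model come from the curl example above, while `build_chat_request` is a hypothetical helper, the explicit `Content-Type` header is an addition, and streaming is turned off here to keep the example short.

```python
import json
import urllib.request

FLO2_BASE_URL = "https://flo2.com/api/v1"  # taken from the curl example above


def build_chat_request(api_key: str, content: str, model: str = "auto") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {
        "model": model,
        "stream": False,  # the curl example streams; disabled here for simplicity
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{FLO2_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Since flo2 describes itself as OpenAI-compatible, an existing OpenAI client pointed at this base URL should work the same way, though that is not verified here.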
Your prompts are yours. We don't hoard them.
Built for teams that can't hand their data to a black box — flo2's default is to know as little as possible.
No content logging by default
flo2 records only metadata — tokens, latency, computed cost. Your prompts and responses are never stored unless you turn it on.
Captured only when you opt in
Content is kept only for the A/B tests and Prompt Insights you explicitly enable — and auto-erased after a short window once you have your result.
Your keys, your data path
BYOK means every request goes straight to the providers you chose. No token resale, no shadow copies — we'd rather earn trust than hoard data.
LLM gateway questions, answered
What is an LLM gateway?
An LLM gateway is a single API endpoint in front of multiple model providers — OpenAI, Anthropic, Groq, Cerebras, DeepInfra and more. flo2 is a developer-first LLM gateway and router: one key, every model, with smart routing, fallback, racing and cost accounting.
What is the cheapest LLM API?
The cheapest LLM API depends on your task. flo2 lets you attach every provider key you already have, set per-million-token prices, and route each request to the cheapest model that meets your quality bar — with no token markup, since you pay providers directly.
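The rule in this answer, "the cheapest model that meets your quality bar", reduces to a filter followed by a minimum. The sketch below is illustrative only. It assumes each model has a single blended price and a quality score between 0 and 1 (for example from an A/B judge run); `Candidate`, `cheapest_above_bar`, the model names and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    model: str
    price_per_million: float  # price you entered for this model
    quality: float            # hypothetical score in [0, 1] for your task


def cheapest_above_bar(candidates: Sequence[Candidate], quality_bar: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose quality clears the bar, or None if none does."""
    eligible = [c for c in candidates if c.quality >= quality_bar]
    return min(eligible, key=lambda c: c.price_per_million, default=None)


# Made-up candidates, for illustration only.
candidates = [
    Candidate("small-model", 0.2, 0.71),
    Candidate("mid-model", 1.0, 0.86),
    Candidate("frontier-model", 10.0, 0.93),
]
choice = cheapest_above_bar(candidates, quality_bar=0.8)
```

With a bar of 0.8 the small model is excluded and the mid-priced one is chosen. Raising the bar above every score returns `None`, which a caller would need to handle.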
Is flo2 a good OpenRouter alternative?
Yes. Unlike OpenRouter, flo2 doesn't resell tokens or credits. You bring your own provider keys and flo2 only routes between them — a zero-markup OpenRouter alternative with fallback, racing and a real cost-audit dashboard.
Which is the fastest LLM, and can flo2 pick it automatically?
flo2's racing mode fires several models in parallel and serves whichever responds fastest, so you always get the fastest LLM for that moment without hard-coding one provider.
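Racing as described in this answer can be sketched with `asyncio`: start every model at once, optionally delay some so the others get a head start, and return the first successful answer. This is a minimal illustration, not flo2's implementation. The `race` function and its parameters are hypothetical, and cancelling the losers once a winner returns is a design choice of the sketch.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def race(
    models: Sequence[str],
    call_model: Callable[[str, str], Awaitable[str]],
    prompt: str,
    head_start: float = 0.0,
    head_start_models: Sequence[str] = (),
) -> tuple[str, str]:
    """Run all models concurrently and return (model, answer) from the first success."""

    async def run(model: str) -> tuple[str, str]:
        if model not in head_start_models:
            await asyncio.sleep(head_start)  # everyone else waits out the head start
        return model, await call_model(model, prompt)

    tasks = [asyncio.create_task(run(m)) for m in models]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # a model that errors simply drops out of the race
        raise RuntimeError("every model in the race failed")
    finally:
        for task in tasks:
            task.cancel()  # no-op for finished tasks; stops the losers


# Usage with stand-in latencies (hypothetical model names).
async def fake_call(model: str, prompt: str) -> str:
    delays = {"slow-model": 0.2, "fast-model": 0.01}
    await asyncio.sleep(delays[model])
    return f"{model}: ok"


winner, answer = asyncio.run(race(["slow-model", "fast-model"], fake_call, "hi"))
```

Here `fast-model` answers first and wins. Which model wins can change from call to call, which is why racing avoids hard-coding one provider.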
How does flo2 optimize AI tokenomics and reduce LLM costs?
flo2 logs tokens, throughput and computed cost for every attempt, so you can reconcile against your provider invoices, see which calls cost the most, route traffic to cheaper models and cut LLM spend.
Does flo2 support the OpenAI and Anthropic APIs?
Yes. flo2 speaks OpenAI Chat Completions, Responses and legacy Completions, plus the Anthropic Messages API — streaming included. Just change the base URL and use your flo2 key.
Point one key at every model.
Bring the provider keys you already have, wire up a flo2 key, and ship. No token markup — you always pay your providers directly. We just make your keys flow to the right model.