Tokens routed to date: 238,724,699,592

One key,
every AI model.

flo2 routes your app to OpenAI, Anthropic, Groq, Cerebras & more — and auto-picks the cheapest, fastest one for every call.

Zero markup · bring your own keys · private by default · free during Beta


◇ OpenAI & Anthropic compatible ◇ Zero token markup ◇ Live in 2 minutes
Trusted in production

Real products, really running on flo2.

Teams already push production traffic through flo2 — research tools, global aviation data and high-volume feed pipelines. Real usage, growing every week.

★★★★★
“Nodum turns scattered notes, videos and posts into structured knowledge — that's a lot of LLM calls. flo2 routes each step to the cheapest model that's good enough and proves the cost per request. We swap models without touching app code; the token accounting alone paid for itself.”
Roman, Creator of nodum.space (research & synthesis workspaces)
★★★★★
“We serve real-time aviation data to thousands of developers, so predictable cost and uptime matter. flo2 sits in front of every provider with one key — no token resale, our own prices, real numbers per call. Adding a new model is a dropdown, not a migration.”
Serhii, Founder at AirLabs.co (global aviation data API)
★★★★★
“FetchRSS processes a huge volume of feeds, so AI summarization had to be cheap to scale. With flo2 we point one key at the cheapest fast model, fall back automatically, and watch the exact spend per feed. No markup, no lock-in — exactly what an infra-heavy product needs.”
Dmitry, Founder of fetchrss.com (web & social to RSS)
Unified LLM API

One key. Every model. Fully under your control.

flo2 is the developer-first LLM gateway, router and proxy. It doesn't resell tokens — you bring your own provider keys, and flo2 routes one OpenAI- & Anthropic-compatible API key to the cheapest, fastest models across OpenAI, Anthropic, Groq, Cerebras, DeepInfra and more, with fallback, racing and real cost accounting. The OpenRouter alternative that never marks up your tokens.

🔀

Smart LLM routing

Point one flo2 key at any mix of providers and models. Pin a default, restrict to a few, or open it to your whole key collection — a true unified LLM API.

🛟

Fallback chains

If the primary model still errors after N retries, flo2 moves to the next model in the fallback chain automatically. Drag to reorder priority. A single provider's outage no longer takes your app down.
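The fallback behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not flo2's implementation: `call_with_fallback`, `max_retries` and the plain callables standing in for provider calls are all hypothetical names.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for a provider call: takes a prompt, returns text,
# and raises on failure. Not part of any flo2 API.
ModelCall = Callable[[str], str]


def call_with_fallback(
    chain: Sequence[Tuple[str, ModelCall]],
    prompt: str,
    max_retries: int = 2,
) -> Tuple[str, str]:
    """Try each model in priority order and return (model_name, answer).

    Each model gets one initial attempt plus `max_retries` retries.
    Only when all of those fail does the next model in the chain run.
    """
    last_error: Optional[Exception] = None
    for name, call in chain:
        for _ in range(1 + max_retries):
            try:
                return name, call(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
    raise RuntimeError("every model in the fallback chain failed") from last_error


def primary(prompt: str) -> str:
    raise TimeoutError("simulated provider outage")


def backup(prompt: str) -> str:
    return "ok: " + prompt


winner, answer = call_with_fallback([("primary", primary), ("backup", backup)], "hi")
```

Reordering the list passed as `chain` plays the role of the drag-to-reorder priority.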

🏁

AI racing

Fire free or unstable models in parallel and give them a head start. The first model to answer wins; models that started earlier keep running in case they still finish before the later ones.
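One way to picture racing with a head start is the asyncio sketch below. It is an assumption about the general technique, not flo2's code: `race`, `racers`, `primary` and `head_start` are hypothetical names, and the fake models only sleep.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# Hypothetical async provider call: prompt in, text out, raises on failure.
ModelCall = Callable[[str], Awaitable[str]]


async def race(
    prompt: str,
    racers: Dict[str, ModelCall],
    primary: Tuple[str, ModelCall],
    head_start: float,
) -> Tuple[str, str]:
    """Start the racers now, start the primary after `head_start` seconds,
    and return (model_name, answer) from whichever succeeds first."""
    loop = asyncio.get_running_loop()
    running = {asyncio.create_task(call(prompt)): name for name, call in racers.items()}
    primary_name, primary_call = primary
    primary_started = False
    start_primary_at = loop.time() + head_start
    try:
        while True:
            # The primary joins once the head start is over or no racer is left.
            if not primary_started and (not running or loop.time() >= start_primary_at):
                running[asyncio.create_task(primary_call(prompt))] = primary_name
                primary_started = True
            if not running:
                raise RuntimeError("every model in the race failed")
            timeout = None if primary_started else max(0.0, start_primary_at - loop.time())
            done, _ = await asyncio.wait(
                list(running), timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                name = running.pop(task)
                if task.exception() is None:
                    return name, task.result()
    finally:
        for task in running:  # stop the models that lost the race
            task.cancel()


async def slow_free(prompt: str) -> str:
    await asyncio.sleep(0.2)
    return "free answer"


async def fast_paid(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return "paid answer"


result = asyncio.run(race("hi", {"free": slow_free}, ("paid", fast_paid), head_start=0.05))
```

Here the free model keeps running after the paid one starts, so it would still win if it finished first.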

🧪

A/B testing

Shadow new models against your live setup, capture both answers, and let a judge model score which is better — so you reach model–task fit: the model that actually wins each task, proven by data.

📊

Cost transparency

Every call logs tokens, throughput and computed cost across providers — full LLM cost observability, so your spend stays clear and easy to optimize.
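As a rough illustration of how a per-call cost can be computed from token counts and per-million-token prices, consider the sketch below. The `Price` type, the function names and the example prices are hypothetical; flo2's own accounting may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens, as entered for a provider key (example values only)."""
    input_per_million: float
    output_per_million: float


def call_cost(price: Price, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one call: tokens times the per-million price, divided by 1,000,000."""
    total = (
        prompt_tokens * price.input_per_million
        + completion_tokens * price.output_per_million
    )
    return total / 1_000_000


def throughput(completion_tokens: int, seconds: float) -> float:
    """Generated tokens per second for one call."""
    return completion_tokens / seconds


# Hypothetical prices: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
example_price = Price(input_per_million=0.50, output_per_million=1.50)
cost = call_cost(example_price, prompt_tokens=1200, completion_tokens=300)
speed = throughput(completion_tokens=300, seconds=2.0)
```

Summing `call_cost` over a billing period gives a number that can be checked against the provider invoice.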

🔁

Drop-in compatible

Speak OpenAI Chat Completions, Responses, legacy Completions or Anthropic Messages — streaming included. Just change the base URL.

💡

Prompt Insights PRO · SOON

After an A/B run, flo2 reads the winning vs losing answers and your prompt, then suggests concrete edits to lift accuracy — your test data, turned into a sharper prompt.

AI tokenomics

Tokens are your new unit of spend — and costs rise faster than value.

Every prompt, API call and automated workflow burns tokens. The teams that win optimize early. flo2 is the layer that does it for you: route to the cheapest model that clears the bar, cache repeats, fall back on outages, and prove every number. No markup, ever: you always pay your providers directly.

30-second tokenomics check

How much are you overpaying on LLMs?

Estimated flo2 savings: ~$852/mo (≈ 43%)
Right-sizing simple tasks ~$600 · caching repeats ~$252 · plus fallback, so outages never cost you sales.
Indicative, from your inputs. See exact per-model prices, then your real spend in the cost dashboard.
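The headline estimate is simple arithmetic, sketched below. The two savings figures come from the text above. The monthly spend of $2,000 is an assumption: the calculator inputs are not shown here, and $2,000 is just a round number consistent with $852 being about 43%.

```python
from typing import Tuple


def savings_summary(monthly_spend: float, right_sizing: float, caching: float) -> Tuple[float, int]:
    """Return (total monthly savings in USD, savings as a whole percent of spend)."""
    total = right_sizing + caching
    percent = round(100 * total / monthly_spend)
    return total, percent


# ASSUMPTION: 2000.0 is a hypothetical monthly spend, not a figure from the page.
total, percent = savings_summary(monthly_spend=2000.0, right_sizing=600.0, caching=252.0)
```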

Smart tokenomics in practice — and the flo2 lever for each:

🎚️

Right-size models

The right model for the job — not a frontier model for everything. Smart routing + A/B judge picks the cheapest model that clears your bar.

↓ up to 80%
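Picking "the cheapest model that clears your bar" reduces to a filter followed by a minimum. The sketch below assumes each candidate already has a quality score from an A/B judge run; the `Candidate` fields, model names and numbers are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str
    cost_per_million: float  # USD per 1M tokens (example values only)
    judge_score: float       # 0.0 to 1.0, e.g. share of A/B comparisons won


def cheapest_clearing_bar(candidates: Sequence[Candidate], bar: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose score meets the bar, or None if none does."""
    eligible = [c for c in candidates if c.judge_score >= bar]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.cost_per_million)


# Hypothetical candidates for one task.
pool = [
    Candidate("frontier-large", cost_per_million=10.0, judge_score=0.95),
    Candidate("mid-size", cost_per_million=1.0, judge_score=0.90),
    Candidate("tiny", cost_per_million=0.1, judge_score=0.60),
]
choice = cheapest_clearing_bar(pool, bar=0.85)
```

With a bar of 0.85 the mid-size model is chosen: the tiny one is cheaper but fails the bar, and the frontier one clears it at ten times the price.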

Cache & batch

Stop paying twice for the same answer. Response caching (opt-in, your TTL) returns repeats instantly and free.

↓ tokens
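Response caching with a TTL can be sketched as a small in-memory map keyed by the request. This is an assumption about the general technique only: the `ResponseCache` class, its key scheme and the injectable clock are hypothetical and say nothing about how flo2 stores or keys cached responses.

```python
import hashlib
import json
import time
from typing import Callable, Dict, List, Optional, Tuple


class ResponseCache:
    """In-memory cache: identical (model, messages) requests reuse a stored answer until the TTL expires."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(model: str, messages: List[dict]) -> str:
        # Canonical JSON so that equal requests hash to the same key.
        raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: List[dict]) -> Optional[str]:
        key = self._key(model, messages)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return response

    def put(self, model: str, messages: List[dict], response: str) -> None:
        self._entries[self._key(model, messages)] = (self._clock() + self._ttl, response)


cache = ResponseCache(ttl_seconds=60)
question = [{"role": "user", "content": "hi"}]
cache.put("auto", question, "hello!")
hit = cache.get("auto", question)
```

A hit skips the provider call entirely, which is where the token saving comes from.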
🧩

Build vs buy

Fine-tuned small vs general large — decide with data, not vibes. A/B + cost accounting compares them on your real traffic.

right choice
📊

Prompt discipline

You can't cut what you can't see. The true cost dashboard shows tokens & cost per call, so you tighten the expensive ones.

token budget
“I built this flow by hand to cut the LLM bill on my own product (mapa.ua) — right-sizing models, caching, and a bullet-proof fallback path so an outage never took the feature down. It worked. flo2 is that exact playbook, turned into one key anyone can flip on.”
— the flo2 founder, from a real production stack
Your dashboard

Every call — routed, raced and priced.

Live usage, the true cost per call, and a per-model breakdown across every provider you route to. No markup — just your numbers.

Get started

How it works

Three steps from sign-up to your first routed, accounted, streaming completion.

STEP 1

Add your provider keys

Paste keys for OpenAI, Anthropic, Groq, Cerebras, DeepInfra… and set their per-million-token prices.

STEP 2

Wire up a flo2 key

Choose which models it can reach and their roles: default, fallback, racing or A/B.

STEP 3

Point your app at flo2

curl https://flo2.com/api/v1/chat/completions \
  -H "Authorization: Bearer flo_…" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","stream":true,
       "messages":[{"role":"user",
       "content":"hi"}]}'
Privacy by default

Your prompts are yours. We don't hoard them.

Built for teams that can't hand their data to a black box — flo2's default is to know as little as possible.

🙈

No content logging by default

flo2 records only metadata: tokens, latency, computed cost. Your prompts and responses are never stored unless you explicitly turn content capture on.
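A metadata-only log record with opt-in content capture might look like the sketch below. The `CallRecord` fields and the `capture_content` flag are hypothetical names chosen for illustration; they are not flo2's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    """What gets logged for one call. Content fields stay None unless capture is on."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    cost_usd: float
    prompt: Optional[str] = None
    response: Optional[str] = None


def make_record(
    *,
    model: str,
    prompt: str,
    response: str,
    prompt_tokens: int,
    completion_tokens: int,
    latency_ms: float,
    cost_usd: float,
    capture_content: bool = False,
) -> CallRecord:
    """Build the log record; prompt and response are kept only on explicit opt-in."""
    record = CallRecord(model, prompt_tokens, completion_tokens, latency_ms, cost_usd)
    if capture_content:
        record.prompt = prompt
        record.response = response
    return record


default_record = make_record(
    model="auto",
    prompt="hi",
    response="hello!",
    prompt_tokens=1,
    completion_tokens=2,
    latency_ms=120.0,
    cost_usd=0.000004,
)
```

Making the default `False` means a caller has to ask for content capture; forgetting the flag can never leak a prompt into the log.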

🧪

Captured only when you opt in

Content is kept only for the A/B tests and Prompt Insights you explicitly enable — and auto-erased after a short window once you have your result.

🔑

Your keys, your data path

BYOK (bring your own keys) means every request goes straight to the providers you chose. No token resale, no shadow copies: we'd rather earn trust than hoard data.

FAQ

LLM gateway questions, answered

What is an LLM gateway?

An LLM gateway is a single API endpoint in front of multiple model providers — OpenAI, Anthropic, Groq, Cerebras, DeepInfra and more. flo2 is a developer-first LLM gateway and router: one key, every model, with smart routing, fallback, racing and cost accounting.

What is the cheapest LLM API?

The cheapest LLM API depends on your task. flo2 lets you attach every provider key you already have, set per-million-token prices, and route each request to the cheapest model that meets your quality bar — with no token markup, since you pay providers directly.

Is flo2 a good OpenRouter alternative?

Yes. Unlike OpenRouter, flo2 doesn't resell tokens or credits. You bring your own provider keys and flo2 only routes between them — a zero-markup OpenRouter alternative with fallback, racing and a real cost-audit dashboard.

Which is the fastest LLM, and can flo2 pick it automatically?

flo2's racing mode fires several models in parallel and serves whichever responds fastest, so you always get the fastest LLM for that moment without hard-coding one provider.

How does flo2 optimize AI tokenomics and reduce LLM costs?

flo2 logs tokens, throughput and computed cost for every attempt, so you can reconcile against the provider invoice, route to cheaper models via fallback, and cut LLM spend.

Does flo2 support the OpenAI and Anthropic APIs?

Yes. flo2 speaks OpenAI Chat Completions, Responses and legacy Completions, plus the Anthropic Messages API — streaming included. Just change the base URL and use your flo2 key.
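For a Python app, the curl call from Step 3 might translate to the standard-library sketch below. The URL and the `"model": "auto"` body are taken from that curl example. The helper names are hypothetical, and the stream parser assumes the response uses the OpenAI Chat Completions streaming format (`data: {...}` lines ending with `data: [DONE]`), which this page implies but does not spell out.

```python
import json
import urllib.request
from typing import Iterable, Iterator

FLO2_CHAT_URL = "https://flo2.com/api/v1/chat/completions"  # from the Step 3 curl example


def build_chat_request(api_key: str, prompt: str, stream: bool = True) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "model": "auto",
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        FLO2_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text pieces from OpenAI-style server-sent event lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Sending it (not run here; needs a real flo2 key and network access):
#   with urllib.request.urlopen(build_chat_request(my_key, "hi")) as resp:
#       for piece in iter_stream_text(resp):
#           print(piece, end="", flush=True)
```

Apps that already use an OpenAI-compatible client would normally keep that client and only point its base URL at flo2, as the answer above says.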

Free during Beta

Point one key at every model.

Bring the provider keys you already have, wire up a flo2 key, and ship. No token markup — you always pay your providers directly. We just make your keys flow to the right model.