flo2 — LLM Gateway & Router | Cheapest, Fastest Models, One API Key

Trusted in production

Real products, really running on flo2.

Teams already push production traffic through flo2: research tools, global aviation data and high-volume feed pipelines. Real usage, growing every week.

★★★★★

“Nodum turns scattered notes, videos and posts into structured knowledge, and that's a lot of LLM calls. flo2 routes each step to the cheapest model that's good enough and proves the cost per request. We swap models without touching app code; the token accounting alone paid for itself.”

Roman, Creator of nodum.space (research & synthesis workspaces)

★★★★★

“We serve real-time aviation data to thousands of developers, so predictable cost and uptime matter. flo2 sits in front of every provider with one key, with no token resale, our own prices, real numbers per call. Adding a new model is a dropdown, not a migration.”

Serhii, Founder at AirLabs.co (global aviation data API)

★★★★★

“FetchRSS processes a huge volume of feeds, so AI summarization had to be cheap to scale. With flo2 we point one key at the cheapest fast model, fall back automatically, and watch the exact spend per feed. No markup, no lock-in. Exactly what an infra-heavy product needs.”

Dmitry, Founder of fetchrss.com (web & social to RSS)

Unified LLM API

One key. Every model. Fully under your control.

flo2 is the developer-first LLM gateway, router and proxy. It doesn't resell tokens. You bring your own provider keys, and flo2 routes one OpenAI- & Anthropic-compatible API key to the cheapest, fastest models across OpenAI, Anthropic, Groq, Cerebras, DeepInfra and more, with fallback, racing and real cost accounting. The OpenRouter alternative that never marks up your tokens.

Smart LLM routing

Point one flo2 key at any mix of providers and models. Pin a default, restrict to a few, or open it to your whole key collection. One unified LLM API.

Fallback chains

If the primary model still fails after N retries, flo2 moves to the next fallback automatically. Drag to reorder priority. One provider's outage no longer takes your feature down.
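To make the fallback idea concrete, here is a minimal sketch of a retry-then-fallback loop. It is not flo2's implementation: the function names, the `retries` parameter and the two stand-in providers are all hypothetical, and a real gateway would only retry errors that are actually retryable.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when every model in the chain has exhausted its retries."""


def call_with_fallback(
    chain: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
    retries: int = 2,
) -> tuple[str, str]:
    """Try each model in priority order and return (model_name, answer)."""
    errors: list[str] = []
    for name, call in chain:
        # One initial attempt plus `retries` retries per model.
        for attempt in range(retries + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # assumption: every error counts as retryable
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Stand-in providers, for illustration only.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("provider unavailable")


def steady_fallback(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [("primary", flaky_primary), ("fallback", steady_fallback)]
result = call_with_fallback(chain, "hi")
print(result)
```

The order of `chain` plays the role of the priority order you would set by dragging.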

AI racing

Fire free or unstable models in parallel with a head start. The fastest LLM to answer wins, and the rest keep racing in case they finish sooner.
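Racing can be sketched with `asyncio`. The version below is an illustration under assumptions, not flo2's code: each racer gets a hypothetical start delay (a delay of 0 is the head start), the first successful answer wins, and the remaining racers are cancelled. The three model functions are stand-ins.

```python
import asyncio
from typing import Awaitable, Callable

ModelCall = Callable[[str], Awaitable[str]]


async def race(
    racers: list[tuple[str, ModelCall, float]],
    prompt: str,
) -> tuple[str, str]:
    """Start each racer after its delay; return the first successful (name, answer)."""

    async def run(name: str, call: ModelCall, delay: float) -> tuple[str, str]:
        await asyncio.sleep(delay)  # a delay of 0 gives this racer the head start
        return name, await call(prompt)

    tasks = [asyncio.create_task(run(*racer)) for racer in racers]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # this racer failed; wait for the next one to finish
        raise RuntimeError("every racer failed")
    finally:
        for task in tasks:
            task.cancel()  # stop the losers once a winner is chosen


# Stand-in models, for illustration only.
async def slow_model(prompt: str) -> str:
    await asyncio.sleep(0.2)
    return "slow answer"


async def fast_model(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return "fast answer"


async def broken_model(prompt: str) -> str:
    raise ConnectionError("provider down")


racers = [
    ("broken", broken_model, 0.0),
    ("slow", slow_model, 0.0),
    ("fast", fast_model, 0.05),
]
winner = asyncio.run(race(racers, "hi"))
print(winner)
```

Here the late starter still wins because it answers before the head-start model finishes, and the failing model is simply skipped.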

A/B testing

Shadow new models against your live setup, capture both answers, and let a judge model score which is better. You reach model–task fit: the model that actually wins each task, proven by data.
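The shadow-and-judge loop could look roughly like this. Everything here is hypothetical: the function names are invented for illustration, and the "judge" is a toy length comparison standing in for a real judge model.

```python
from collections import Counter
from typing import Callable, Iterable

Model = Callable[[str], str]
# (prompt, live_answer, shadow_answer) -> "live" or "shadow"
Judge = Callable[[str, str, str], str]


def shadow_ab_test(
    prompts: Iterable[str], live: Model, shadow: Model, judge: Judge
) -> Counter:
    """Run both models on every prompt and count the judge's verdicts."""
    wins: Counter = Counter()
    for prompt in prompts:
        live_answer = live(prompt)      # what the caller would actually receive
        shadow_answer = shadow(prompt)  # captured for comparison only
        wins[judge(prompt, live_answer, shadow_answer)] += 1
    return wins


# Stand-ins, for illustration only.
def live_model(prompt: str) -> str:
    return f"Certainly! Here is a long answer to: {prompt}"


def shadow_model(prompt: str) -> str:
    return prompt.upper()


def length_judge(prompt: str, live_answer: str, shadow_answer: str) -> str:
    # Toy rule: prefer the shorter answer. A real judge would be a model call.
    return "shadow" if len(shadow_answer) < len(live_answer) else "live"


wins = shadow_ab_test(["a", "b", "c"], live_model, shadow_model, length_judge)
print(wins)
```

Tallying verdicts per task type, rather than overall, is what would surface the model that wins each task.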

Cost transparency

Every call logs tokens, throughput and computed cost across providers. Full LLM cost observability, so your spend stays clear and easy to optimize.
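The cost figure is straightforward arithmetic once per-million-token prices are known. A minimal sketch, assuming separate input and output prices; the `Price` type, the function names and the prices themselves are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Prices in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def call_cost(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Computed cost of one call in USD."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


def tokens_per_second(output_tokens: int, generation_seconds: float) -> float:
    """Generation throughput for one call."""
    return output_tokens / generation_seconds


price = Price(input_per_million=0.50, output_per_million=1.50)  # hypothetical prices
cost = call_cost(input_tokens=1_200, output_tokens=400, price=price)
print(f"${cost:.6f}")  # $0.001200
print(tokens_per_second(400, 2.0))  # 200.0
```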

Drop-in compatible

Speak OpenAI Chat Completions, Responses, legacy Completions or Anthropic Messages, streaming included. Just change the base URL.

Prompt Insights (PRO, coming soon)

After an A/B run, flo2 reads the winning and losing answers next to your prompt, then suggests concrete edits to lift accuracy. Your test data, turned into a sharper prompt.

AI tokenomics

Tokens are your new unit of spend, and costs rise faster than value.

Every prompt, API call and automated workflow burns tokens. The teams that win optimize early. flo2 is the layer that does it for you: route to the cheapest model that clears the bar, cache repeats, fall back on outages, and prove every number. No markup, ever: you always pay your providers directly.

30-second tokenomics check

How much are you overpaying on LLMs?

Inputs: your monthly LLM spend (in $), share of simple tasks (50%), share of repeated requests (20%).

Estimated flo2 savings: ~$852/mo (≈ 43%)

Right-sizing simple tasks: ~$600. Caching repeats: ~$252. Plus fallback, so outages never cost you sales.

Indicative, based on your inputs. See the exact per-model prices, then your real spend in the cost dashboard.

Smart tokenomics in practice, and the flo2 lever for each:

Right-size models

The right model for the job, not a frontier model for everything. Smart routing plus the A/B judge picks the cheapest model that clears your bar.

↓ up to 80%

Cache & batch

Stop paying twice for the same answer. Response caching (opt-in, your TTL) returns repeats instantly and free.
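An opt-in response cache with a TTL can be sketched as below. This is an assumption about how such a cache might work, not a description of flo2 internals: the class, the hashing of model plus messages into a key, and the injectable clock are all illustrative choices.

```python
import hashlib
import json
import time
from typing import Callable


class ResponseCache:
    """Cache answers keyed by (model, messages) for a fixed time to live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(model: str, messages: list[dict]) -> str:
        # Identical requests serialize to identical JSON, hence the same hash.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, messages: list[dict], call) -> tuple[str, bool]:
        """Return (answer, was_cache_hit); call the provider only on a miss."""
        k = self.key(model, messages)
        now = self.clock()
        hit = self._store.get(k)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1], True
        answer = call(model, messages)
        self._store[k] = (now, answer)
        return answer, False


calls = []


def fake_provider(model: str, messages: list[dict]) -> str:
    calls.append(model)  # count how often the "provider" is really hit
    return "42"


now = [0.0]  # fake clock so the example does not need to sleep
cache = ResponseCache(ttl_seconds=60, clock=lambda: now[0])
msgs = [{"role": "user", "content": "hi"}]
first = cache.get_or_call("some-model", msgs, fake_provider)   # miss
second = cache.get_or_call("some-model", msgs, fake_provider)  # hit, no provider call
now[0] = 61.0
third = cache.get_or_call("some-model", msgs, fake_provider)   # expired, miss again
print(first, second, third, len(calls))
```

Only exact repeats hit this cache; a request that differs by a single character is a new key.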

↓ tokens

Build vs buy

Fine-tuned small versus general large: decide with data, not vibes. A/B plus cost accounting compares them on your real traffic.

right choice

Prompt discipline

You can't cut what you can't see. The true cost dashboard shows tokens and cost per call, so you tighten the expensive ones.

token budget

“I built this flow by hand to cut the LLM bill on my own product (mapa.ua): right-sizing models, caching, and a solid fallback path so an outage never took the feature down. It worked. flo2 is that exact playbook, turned into one key anyone can flip on.”

— the flo2 founder, from a real production stack

Your dashboard

Every call, routed, raced and priced.

Live usage, the true cost per call, and a per-model breakdown across every provider you route to. No markup, just your numbers.

flo2.com · Overview

Usage at a glance

Requests: 46,100

Input tokens: 12.6M

Output tokens: 5.2M

Computed cost: $66.32

Success rate: 99.8%

By model

Provider	Model	Tries	OK	Out tokens	Gen TPS	Cost
anthropic	claude-mythos	240	240	96,000	42.0	$48.00
groq	llama-3.3-70b	24,100	24,050	3,360,000	2291.9	$5.84
openai	gpt-4o-mini	14,200	14,170	1,424,000	96.4	$11.86
cerebras	llama-3.1-8b	8,500	8,486	320,000	1180.0	$0.62
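The headline numbers can be cross-checked against the per-model table. The sketch below sums the rows shown above, assuming that success rate means OK divided by Tries; that definition is an assumption, though it does reproduce the 99.8% shown.

```python
rows = [
    # provider, model, tries, ok, out_tokens, cost_usd
    ("anthropic", "claude-mythos", 240, 240, 96_000, 48.00),
    ("groq", "llama-3.3-70b", 24_100, 24_050, 3_360_000, 5.84),
    ("openai", "gpt-4o-mini", 14_200, 14_170, 1_424_000, 11.86),
    ("cerebras", "llama-3.1-8b", 8_500, 8_486, 320_000, 0.62),
]

tries = sum(row[2] for row in rows)
ok = sum(row[3] for row in rows)
out_tokens = sum(row[4] for row in rows)
cost = sum(row[5] for row in rows)

print(f"output tokens: {out_tokens / 1e6:.1f}M")  # output tokens: 5.2M
print(f"computed cost: ${cost:.2f}")              # computed cost: $66.32
print(f"success rate: {ok / tries:.1%}")          # success rate: 99.8%
```

Total tries (47,040) exceed the 46,100 requests, which would be consistent with a single request producing more than one attempt through retries, fallback or racing.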

Get started

How it works

Three steps from sign-up to your first routed, accounted, streaming completion.

STEP 1

Add your provider keys

Paste keys for OpenAI, Anthropic, Groq, Cerebras, DeepInfra… and set their per-million-token prices.

STEP 2

Wire up a flo2 key

Choose which models it can reach and their roles: default, fallback, racing or A/B.

STEP 3

Point your app at flo2

curl https://flo2.com/api/v1/chat/completions \
  -H "Authorization: Bearer flo_…" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","stream":true,
       "messages":[{"role":"user",
       "content":"hi"}]}'

Privacy by default

Your prompts are yours. We don't hoard them.

Built for teams that can't hand their data to a black box. flo2's default is to know as little as possible.

No content logging by default

flo2 records only metadata: tokens, latency, computed cost. Your prompts and responses are never stored unless you turn content capture on.

Captured only when you opt in

Content is kept only for the A/B tests and Prompt Insights you explicitly enable, then auto-erased after a short window once you have your result.

Your keys, your data path

BYOK (bring your own keys) means every request goes straight to the providers you chose. No token resale and no shadow copies. We'd rather earn trust than hoard data.

FAQ

LLM gateway questions, answered

What is an LLM gateway?

An LLM gateway is a single API endpoint in front of multiple model providers: OpenAI, Anthropic, Groq, Cerebras, DeepInfra and more. flo2 is a developer-first LLM gateway and router: one key, every model, with smart routing, fallback, racing and cost accounting.

What is the cheapest LLM API?

The cheapest LLM API depends on your task. flo2 lets you attach every provider key you already have, set per-million-token prices, and route each request to the cheapest model that meets your quality bar, with no token markup, since you pay providers directly.

Is flo2 a good OpenRouter alternative?

Yes. Unlike OpenRouter, flo2 doesn't resell tokens or credits. You bring your own provider keys and flo2 only routes between them: a zero-markup OpenRouter alternative with fallback, racing and a real cost-audit dashboard.

Which is the fastest LLM, and can flo2 pick it automatically?

flo2's racing mode fires several models in parallel and serves whichever responds fastest, so you always get the fastest LLM for that moment without hard-coding one provider.

How does flo2 optimize AI tokenomics and reduce LLM costs?

flo2 logs tokens, throughput and computed cost for every attempt, so you can reconcile against the provider invoice, route to cheaper models via fallback, and cut LLM spend.

Does flo2 support the OpenAI and Anthropic APIs?

Yes. flo2 speaks OpenAI Chat Completions, Responses and legacy Completions, plus the Anthropic Messages API, streaming included. Just change the base URL and use your flo2 key.
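As an illustration of "just change the base URL", here is the Step 3 curl request rebuilt with Python's standard library. The endpoint and body come from that curl example; the `FLO2_API_KEY` environment variable and the `build_request` helper are hypothetical names. The request is only constructed here, and the commented lines show how it could be sent.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example in Step 3.
FLO2_URL = "https://flo2.com/api/v1/chat/completions"


def build_request(api_key: str, content: str) -> urllib.request.Request:
    """Build an OpenAI-style Chat Completions request aimed at flo2."""
    body = {
        "model": "auto",
        "stream": True,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        FLO2_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# FLO2_API_KEY is a hypothetical variable name; the fallback is a placeholder.
request = build_request(os.environ.get("FLO2_API_KEY", "YOUR_FLO2_KEY"), "hi")
print(request.get_method(), request.full_url)

# To send it and read the streamed response line by line:
# with urllib.request.urlopen(request) as response:
#     for line in response:
#         print(line.decode("utf-8"), end="")
```

An existing OpenAI or Anthropic client should need even less: point its base URL at flo2 and swap in the flo2 key.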

Free during Beta

Point one key at every model.

Bring the provider keys you already have, wire up a flo2 key, and ship. No token markup: you always pay your providers directly. We just make your keys flow to the right model.

One key, every AI model.
