2026-06-03 · flo2 blog

Zero-Markup LLM Gateway: Pay Providers Directly with BYOK

When you route your application's LLM calls through a gateway, you give up some control in exchange for convenience. The question is what exactly you give up. A zero markup LLM gateway takes routing, fallback, caching, and cost tracking off your plate — but it never touches your token economics. You bring your own provider keys, your calls hit OpenAI or Anthropic or Google directly, and you pay those providers at their published list price. The gateway earns nothing on each token. That single constraint changes a lot about what you can trust, audit, and negotiate.

What "Zero Markup" Actually Means

A token markup is the spread between what a provider charges for a token and what you pay to the layer sitting in front of it. Even a small markup — a few percent — compounds fast at scale. Ten million tokens a day at a 5% spread is real money every month, and the spread is often invisible in a single blended invoice.

Zero markup means the gateway never resells tokens at all. There is no spread because the gateway is not in the purchase chain. Here is the mechanics:

This arrangement is sometimes called BYOK — bring your own key. The routing and orchestration are the product; the token sale is not. For a deeper look at BYOK as a pattern, see BYOK explained.

How Credit-Based Aggregators Work (and Why Both Models Exist)

The dominant alternative is the credit or aggregator model. You deposit funds with the platform, receive a single API key, and calls are debited from your balance at the platform's rates. The platform buys inference wholesale from providers and resells it to you.

This model is genuinely convenient:

The trade-off is structural. An aggregator that resells tokens has to price them above cost — that margin funds the service. The exact economics are often invisible: a blended rate, a deposit fee, or a spread on routed traffic. None of that is dishonest, but it does mean the price you pay is not always the provider's published price, and reconciling your bill against provider list prices is genuinely hard. See our OpenRouter alternative article for a concrete comparison of the aggregator trade-offs.

Side-by-Side: Zero-Markup BYOK vs Credit Resale

Dimension Zero-Markup BYOK Gateway Credit / Aggregator Model
Who pays the provider You — directly, via your own key The platform — it bills you separately
Token price Provider's published list price Platform rate (may include markup)
Cost transparency Per-call cost reconciles against provider invoices Blended rate; full reconciliation is hard
Provider discounts You keep them — committed-use or volume deals accrue to you Accrues to the platform, not you
Provider relationship Direct — data terms, enterprise agreements, rate limits are yours Mediated — provider relationship belongs to the platform
Model catalog Whatever providers you have keys for Potentially broader — providers you have no account with
Setup friction Higher — you manage provider accounts Lower — one signup, one balance
Lock-in Low — keys are yours, provider relationships are yours Moderate — credits tied to platform
Gateway revenue model Subscription or platform fee Token margin (and/or platform fee)

Why Zero Markup Matters at Scale

Cost transparency you can actually audit

With a BYOK gateway, your gateway's cost log and your provider invoices should agree to the token. Every call has a known model, a known token count, and a known per-token price — the provider's published price, which you can look up. This makes per-feature, per-team, or per-customer cost accounting defensible rather than approximate. When a product manager asks why the AI line item doubled last month, you can show them exactly which model, which feature, and how many tokens drove the change.

With a reseller, the math is harder. You see what left your credit balance, but reconciling that against what the model actually cost the provider — and understanding the spread — requires work the platform does not always make easy.

You keep your volume discounts and committed-use rates

OpenAI, Anthropic, and Google all offer meaningful discounts at volume — committed-use agreements, batch pricing, tier discounts for high-spend accounts. If you buy through a reseller, those discounts accrue to the reseller. If you hold the provider relationship directly, they accrue to you. For teams spending tens or hundreds of thousands of dollars per month on inference, this is not academic — it is the difference between the gateway saving you money and costing you money relative to going direct.

Data and compliance terms stay direct

Enterprise data-processing agreements, zero-data-retention commitments, and HIPAA or SOC 2 terms are between you and the provider. With a BYOK gateway, those agreements are yours and apply to every call the gateway routes on your behalf. When a reseller holds the provider relationship, you are subject to the reseller's terms with the provider — which may not match what you negotiated or what your compliance posture requires.

No pricing opacity, no lock-in

No token markup and no lock-in tend to go together. If the gateway doubles its platform fee or shuts down, your provider keys still work and your provider accounts are intact. You can migrate to a different routing layer — or go direct — without any provider-side disruption. With a credit balance on an aggregator, a price change on routed models can move your economics without notice, and the credits you have prepaid are a switching cost.

How a Zero-Markup Gateway Makes Money

The obvious question: if the gateway does not earn a token margin, what funds it? The honest answer is a platform fee — subscription, seat-based, or usage-based on the gateway service itself rather than on tokens. This aligns the gateway's incentives with yours: it has no reason to route traffic through more expensive models to widen a spread, because it earns nothing on the token spend. Its only incentive is to make the routing, caching, and reliability valuable enough that you keep paying the platform fee.

Some BYOK gateways also run free during early access or beta phases, which is worth taking advantage of — the same routing infrastructure that costs nothing now will compound savings as your volume grows.

flo2: A Zero-Markup BYOK LLM Gateway

flo2 is built on the LLM gateway no resale model. You bring your own keys for OpenAI, Anthropic, Google, and other providers; flo2 routes your calls using those keys and never touches your token economics. The gateway issues you one OpenAI- and Anthropic-compatible key, so existing SDK code works without changes.

Beyond the zero-markup model, flo2 adds the routing features that make a gateway worth running:

flo2 is free during beta. If you are spending real money on inference and want cost transparency, provider discounts, and no token markup, it is worth a look at flo2.

One key, every model — zero markup.
Bring your own provider keys. flo2 routes to the cheapest, fastest model with fallback, racing and true cost accounting — free during Beta.
Get your flo2 key →
© 2026 flo2.com — the zero-markup LLM gateway & router. flow → to