Zero-Markup LLM Gateway: Pay Providers Directly with BYOK
When you route your application's LLM calls through a gateway, you give up some control in exchange for convenience. The question is what exactly you give up. A zero markup LLM gateway takes routing, fallback, caching, and cost tracking off your plate — but it never touches your token economics. You bring your own provider keys, your calls hit OpenAI or Anthropic or Google directly, and you pay those providers at their published list price. The gateway earns nothing on each token. That single constraint changes a lot about what you can trust, audit, and negotiate.
What "Zero Markup" Actually Means
A token markup is the spread between what a provider charges for a token and what you pay to the layer sitting in front of it. Even a small markup — a few percent — compounds fast at scale. Ten million tokens a day at a 5% spread is real money every month, and the spread is often invisible in a single blended invoice.
Zero markup means the gateway never resells tokens at all. There is no spread because the gateway is not in the purchase chain. Here is the mechanics:
- You create API keys in each provider's dashboard — OpenAI, Anthropic, Google, Groq, Mistral, whatever you use.
- You add those keys to the gateway, encrypted and scoped to your account.
- Your application calls one gateway endpoint with a single issued key.
- The gateway routes the call to the right provider using your key. The provider charges your account at its list price.
- The gateway sends you only the bill for its own service — not for tokens.
This arrangement is sometimes called BYOK — bring your own key. The routing and orchestration are the product; the token sale is not. For a deeper look at BYOK as a pattern, see BYOK explained.
How Credit-Based Aggregators Work (and Why Both Models Exist)
The dominant alternative is the credit or aggregator model. You deposit funds with the platform, receive a single API key, and calls are debited from your balance at the platform's rates. The platform buys inference wholesale from providers and resells it to you.
This model is genuinely convenient:
- One balance, many models. Access hundreds of models across providers without managing separate accounts.
- Instant start. Top up with a card and call frontier models in minutes, even for providers where you have no account.
- Consolidated billing. One invoice instead of five or six provider statements.
The trade-off is structural. An aggregator that resells tokens has to price them above cost — that margin funds the service. The exact economics are often invisible: a blended rate, a deposit fee, or a spread on routed traffic. None of that is dishonest, but it does mean the price you pay is not always the provider's published price, and reconciling your bill against provider list prices is genuinely hard. See our OpenRouter alternative article for a concrete comparison of the aggregator trade-offs.
Side-by-Side: Zero-Markup BYOK vs Credit Resale
| Dimension | Zero-Markup BYOK Gateway | Credit / Aggregator Model |
|---|---|---|
| Who pays the provider | You — directly, via your own key | The platform — it bills you separately |
| Token price | Provider's published list price | Platform rate (may include markup) |
| Cost transparency | Per-call cost reconciles against provider invoices | Blended rate; full reconciliation is hard |
| Provider discounts | You keep them — committed-use or volume deals accrue to you | Accrues to the platform, not you |
| Provider relationship | Direct — data terms, enterprise agreements, rate limits are yours | Mediated — provider relationship belongs to the platform |
| Model catalog | Whatever providers you have keys for | Potentially broader — providers you have no account with |
| Setup friction | Higher — you manage provider accounts | Lower — one signup, one balance |
| Lock-in | Low — keys are yours, provider relationships are yours | Moderate — credits tied to platform |
| Gateway revenue model | Subscription or platform fee | Token margin (and/or platform fee) |
Why Zero Markup Matters at Scale
Cost transparency you can actually audit
With a BYOK gateway, your gateway's cost log and your provider invoices should agree to the token. Every call has a known model, a known token count, and a known per-token price — the provider's published price, which you can look up. This makes per-feature, per-team, or per-customer cost accounting defensible rather than approximate. When a product manager asks why the AI line item doubled last month, you can show them exactly which model, which feature, and how many tokens drove the change.
With a reseller, the math is harder. You see what left your credit balance, but reconciling that against what the model actually cost the provider — and understanding the spread — requires work the platform does not always make easy.
You keep your volume discounts and committed-use rates
OpenAI, Anthropic, and Google all offer meaningful discounts at volume — committed-use agreements, batch pricing, tier discounts for high-spend accounts. If you buy through a reseller, those discounts accrue to the reseller. If you hold the provider relationship directly, they accrue to you. For teams spending tens or hundreds of thousands of dollars per month on inference, this is not academic — it is the difference between the gateway saving you money and costing you money relative to going direct.
Data and compliance terms stay direct
Enterprise data-processing agreements, zero-data-retention commitments, and HIPAA or SOC 2 terms are between you and the provider. With a BYOK gateway, those agreements are yours and apply to every call the gateway routes on your behalf. When a reseller holds the provider relationship, you are subject to the reseller's terms with the provider — which may not match what you negotiated or what your compliance posture requires.
No pricing opacity, no lock-in
No token markup and no lock-in tend to go together. If the gateway doubles its platform fee or shuts down, your provider keys still work and your provider accounts are intact. You can migrate to a different routing layer — or go direct — without any provider-side disruption. With a credit balance on an aggregator, a price change on routed models can move your economics without notice, and the credits you have prepaid are a switching cost.
How a Zero-Markup Gateway Makes Money
The obvious question: if the gateway does not earn a token margin, what funds it? The honest answer is a platform fee — subscription, seat-based, or usage-based on the gateway service itself rather than on tokens. This aligns the gateway's incentives with yours: it has no reason to route traffic through more expensive models to widen a spread, because it earns nothing on the token spend. Its only incentive is to make the routing, caching, and reliability valuable enough that you keep paying the platform fee.
Some BYOK gateways also run free during early access or beta phases, which is worth taking advantage of — the same routing infrastructure that costs nothing now will compound savings as your volume grows.
flo2: A Zero-Markup BYOK LLM Gateway
flo2 is built on the LLM gateway no resale model. You bring your own keys for OpenAI, Anthropic, Google, and other providers; flo2 routes your calls using those keys and never touches your token economics. The gateway issues you one OpenAI- and Anthropic-compatible key, so existing SDK code works without changes.
Beyond the zero-markup model, flo2 adds the routing features that make a gateway worth running:
- Smart routing. Route each call to the cheapest or fastest model that meets your latency and quality requirements, automatically.
- Fallback and racing. If a provider returns an error or is slow, the gateway retries on a backup provider or races multiple providers and returns the first good response.
- A/B testing with a judge. Split traffic across models and score responses automatically to find which model performs best for your use case.
- Caching. Identical or semantically similar prompts return cached responses at zero token cost.
- True per-call cost accounting. Every call has an exact cost tied to the provider's published price — no estimates, no blended rates.
flo2 is free during beta. If you are spending real money on inference and want cost transparency, provider discounts, and no token markup, it is worth a look at flo2.