BYOK vs Credits: Two Ways to Pay for LLM Access
When you sign up for an LLM service, you quickly face a fork: pay into a shared credit balance, or bring your own key and pay providers directly. The BYOK vs credits question is not just about price. It shapes billing transparency, compliance posture, vendor risk, and how your costs grow with scale. This article breaks down both models, shows where each one wins, and helps you choose the right approach for where your project is today.
How Prepaid LLM Credits Work
The credit model is the most common entry point for developers experimenting with language models. You deposit funds into a platform balance — think OpenRouter, a hosted gateway, or any aggregator — and that balance is debited as you make calls. The platform holds accounts with the underlying providers, buys inference at negotiated or wholesale rates, and resells tokens to you at its own pricing.
This model is genuinely convenient, especially early on:
- One balance, many models. A single top-up gives you access to dozens or hundreds of models across OpenAI, Anthropic, Google, Mistral, Groq, and more — without opening separate accounts.
- Instant access. Deposit with a card and you can call a frontier model in minutes, even for providers that have waitlists or require business verification.
- Consolidated invoicing. One statement instead of five or six provider bills, which matters when a finance team needs a simple record.
- No key management. You receive a single API key from the aggregator and never touch provider dashboards.
The trade-off is that you are buying resold inference. The platform sets its own prices, and the spread between what it pays providers and what it charges you is how it earns revenue. That spread is not always obvious — per-token rates may look close to list price, but deposit fees, minimum balances, or blended routing across cheaper infrastructure can obscure the real cost. You also inherit the aggregator's rate limits, uptime, terms of service, and data-handling practices, because your traffic runs through its infrastructure using its provider accounts.
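To see how a deposit fee and a per-token spread combine, the following minimal sketch computes an effective markup over list price. The `CreditPurchase` fields and the rates used with it are hypothetical, chosen for illustration; they do not describe any particular aggregator's pricing.

```python
from dataclasses import dataclass


@dataclass
class CreditPurchase:
    amount_paid: float       # what leaves your card, in USD
    deposit_fee_rate: float  # hypothetical fee taken at top-up, e.g. 0.05 for 5%
    per_token_markup: float  # hypothetical spread over list price, e.g. 0.05 for 5%


def effective_markup(p: CreditPurchase) -> float:
    """Fraction by which total cost exceeds buying the same tokens at list price."""
    # Balance left to spend after the deposit fee is taken.
    usable_balance = p.amount_paid * (1 - p.deposit_fee_rate)
    # List-price value of the tokens that balance buys at the marked-up rate.
    list_value = usable_balance / (1 + p.per_token_markup)
    return p.amount_paid / list_value - 1


if __name__ == "__main__":
    purchase = CreditPurchase(amount_paid=100.0, deposit_fee_rate=0.05, per_token_markup=0.0)
    print(f"effective markup: {effective_markup(purchase):.2%}")
```

Under these assumptions, a 5% deposit fee with no per-token spread works out to roughly a 5.3% effective markup, because the fee is paid on money that never buys tokens.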
How BYOK (Bring Your Own Key) Works
The BYOK pattern decouples the convenience layer from the billing relationship. You create API keys directly in each provider's dashboard (OpenAI, Anthropic, Google, Groq, Mistral) and register those keys with a gateway. When your application calls the gateway, it routes the request to the right provider using your key. The provider charges your account at its published list price. The gateway handles the request but never sits in the token billing transaction.
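The routing step can be sketched as a lookup from a model identifier to the key you registered for that provider. This is an illustrative sketch only: `KeyRegistry`, `Route`, and the `provider/model` naming scheme are assumptions made for the example, not the interface of any specific gateway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    api_key: str
    model: str


class KeyRegistry:
    """Hypothetical in-memory store of the provider keys you registered."""

    def __init__(self) -> None:
        self._keys: dict[str, str] = {}

    def register(self, provider: str, api_key: str) -> None:
        self._keys[provider] = api_key

    def resolve(self, model_id: str) -> Route:
        # Assumes identifiers of the form "provider/model".
        provider, sep, model = model_id.partition("/")
        if not sep or not model:
            raise ValueError(f"expected 'provider/model', got {model_id!r}")
        if provider not in self._keys:
            raise LookupError(f"no key registered for provider {provider!r}")
        # The request goes out under *your* key, so the provider bills you directly.
        return Route(provider=provider, api_key=self._keys[provider], model=model)


if __name__ == "__main__":
    registry = KeyRegistry()
    registry.register("anthropic", "sk-placeholder")  # placeholder, not a real key
    print(registry.resolve("anthropic/example-model").provider)
```

A real gateway would also store keys encrypted and layer fallback logic on top; the point here is only that the key selected for each request is one you own.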
What you gain:
- Zero markup on tokens. You pay exactly what the provider publishes. No spread, no hidden margin. A zero-markup LLM gateway earns revenue from its service — not from reselling your inference.
- Your volume discounts carry over. If you have negotiated a committed-use discount with Anthropic or an enterprise rate with OpenAI, those savings flow through automatically because your keys are being used.
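- Direct data agreements. Your traffic reaches the provider under your account and your terms, not under an intermediary's provider accounts. Requests still pass through the gateway, so its data handling belongs in your review too, but the provider-side agreement is yours. This matters for SOC 2, HIPAA, GDPR, and similar compliance contexts where the data path is subject to audit.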
- No aggregator dependency. If the aggregator changes pricing, suspends your account, or shuts down, your relationship with providers is unaffected. You can swap gateways without re-establishing provider access.
- Transparent cost accounting. Gateway-reported costs reconcile directly against provider invoices, making per-team or per-feature attribution precise rather than estimated.
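As a rough illustration of the reconciliation described in the last point, the sketch below sums gateway-reported costs per provider and flags any provider whose invoice differs by more than a tolerance. The record fields (`provider`, `cost_usd`) and the one-cent tolerance are assumptions for the example, not a real gateway's log format.

```python
from collections import defaultdict


def reconcile(gateway_log, invoices, tolerance=0.01):
    """Return {provider: invoice_total - gateway_total} for mismatched providers."""
    totals = defaultdict(float)
    for record in gateway_log:
        totals[record["provider"]] += record["cost_usd"]

    mismatches = {}
    # Check every provider that appears on either side.
    for provider in set(totals) | set(invoices):
        diff = round(invoices.get(provider, 0.0) - totals.get(provider, 0.0), 2)
        if abs(diff) > tolerance:
            mismatches[provider] = diff
    return mismatches


if __name__ == "__main__":
    log = [
        {"provider": "openai", "cost_usd": 1.25},
        {"provider": "openai", "cost_usd": 0.75},
        {"provider": "anthropic", "cost_usd": 3.00},
    ]
    print(reconcile(log, {"openai": 2.00, "anthropic": 3.50}))
```

An empty result means the gateway's numbers agree with the invoices within the tolerance; adding a team or feature field to each record would extend the same idea to internal attribution.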
The real costs of BYOK are operational, not financial. You manage multiple provider accounts, multiple invoices, and multiple API keys. Onboarding a new provider means creating an account and going through any verification process they require. For very early-stage projects, this overhead can slow you down when you just want to prototype.
Comparing the Two Models
| Dimension | Prepaid credits / aggregator | BYOK (direct provider keys) |
|---|---|---|
| Token pricing | Aggregator's rate; markup possible | Provider list price; no markup |
| Volume discounts | Aggregator's negotiated rates (may or may not pass through) | Your own discounts apply automatically |
| Billing | One balance, one invoice | Separate invoice per provider |
| Model access | Instant; aggregator handles accounts | You must open accounts per provider |
| Data path | Through aggregator infrastructure | Direct to provider under your account |
| Compliance | Inherits aggregator's terms and DPA | Your DPA with each provider; auditable |
| Vendor lock-in | Balance tied to aggregator | Gateway-agnostic; keys are yours |
| Setup effort | Minutes | Hours (one-time, per provider) |
| Cost transparency | Can be opaque across providers | Fully transparent; reconciles to invoice |
When Credits Make Sense
Prepaid credits are a reasonable choice in specific situations:
- Prototyping and early experimentation. If you are trying five different models to find the right one, an aggregator gets you running in minutes without account overhead.
- Access to providers you cannot easily onboard. Some providers have waitlists or require business verification that takes time. An aggregator can bridge the gap.
- Minimal operational capacity. Solo developers or very small teams who do not want to manage multiple billing accounts may prefer the simplicity.
- Low and unpredictable volumes. At small scale, the markup on credits is a small absolute cost, and the convenience may justify it.
Credits are a practical starting point. The issue is that many teams stay on them longer than makes sense, even as volume grows and the markup grows with it.
When BYOK Makes Sense
The balance tilts toward BYOK as soon as any of the following apply:
- Meaningful token volume. A 5–10% markup is easy to ignore on a small bill, but at a million or more tokens a day on higher-priced models it becomes a real line item. At scale, the one-time overhead of opening provider accounts pays for itself quickly.
- Existing negotiated rates. If you have committed-use contracts or enterprise pricing with a provider, BYOK is the only way to actually use those rates through a gateway.
- Compliance or data residency requirements. Security reviews, SOC 2 audits, and HIPAA BAAs require you to know where data travels. BYOK gives you a clear, direct answer.
- Cost attribution by team or feature. When you need to allocate LLM spend across products or business units, per-provider invoices and gateway-level logging give you numbers that reconcile.
- Reducing vendor dependency. Keeping your provider relationships independent means you can change gateways or add routing logic without renegotiating access.
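To put a number on the volume argument, a back-of-the-envelope calculation is enough. The sketch below assumes a single blended list price per million tokens and a flat markup rate; both are placeholder inputs to replace with your own figures.

```python
def monthly_markup_cost(tokens_per_day: float,
                        list_price_per_million: float,
                        markup_rate: float,
                        days: int = 30) -> float:
    """Extra USD paid per month because of the markup, relative to list price."""
    list_spend = tokens_per_day / 1_000_000 * list_price_per_million * days
    return list_spend * markup_rate


if __name__ == "__main__":
    # Placeholder inputs: $10 per million tokens, 5% markup.
    for tokens_per_day in (1_000_000, 100_000_000):
        cost = monthly_markup_cost(tokens_per_day, 10.0, 0.05)
        print(f"{tokens_per_day:>11,} tokens/day -> ${cost:,.2f}/month in markup")
```

With these placeholder numbers the markup is about $15 a month at one million tokens a day and about $1,500 a month at a hundred million, which shows how the same percentage moves from negligible to material as volume rises.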
A note on the transition
Teams often start on credits and move to BYOK when their volume justifies it. The transition is not painful if your gateway code is already abstracted behind a single endpoint — you update key configuration, not application code. The main work is opening provider accounts, which is a one-time task.
flo2: A BYOK Gateway With No Token Markup
flo2 is built specifically for the BYOK model. You register your provider API keys, and flo2 uses them to route requests — applying fallback, model selection, and cost tracking — while charging nothing on top of what providers charge you. There is no token resale, no credit balance to manage, and no markup to account for. During the current beta, the gateway service itself is free.
For teams running meaningful LLM workloads who want routing and observability without giving up price transparency, flo2 is worth evaluating alongside the aggregator options you may already be using. If you are already on a credit model and the math is starting to matter, the comparison is straightforward: add your provider keys, point your application at the flo2 endpoint, and compare what you see on provider invoices against what you were paying before.
Both models serve real needs. Credits are the right starting point for many projects. BYOK is usually the right destination once volume, compliance, or cost control becomes a serious concern — and a zero-markup LLM gateway like flo2 is how you get the routing benefits of an aggregator without the token margin.