2026-06-02 · flo2 blog

Best LLM Gateway in 2026: flo2 vs Vercel AI Gateway vs Cloudflare vs Portkey

Choosing an LLM gateway in 2026 is no longer a niche infrastructure decision—it is one of the highest-leverage choices a team building with AI can make. The gateway sits between your application and every model provider you call, so it quietly shapes your bill, your latency, your reliability, and how much you can actually see about what your models are doing. This guide explains what to evaluate an AI gateway on, fairly maps the main categories of options (including Vercel AI Gateway, Cloudflare AI Gateway, and Portkey), and shows where flo2 fits. Specifics about competitors change often, so treat this as a framework and verify current details on each vendor's site before you commit.

What an LLM gateway actually does

An LLM gateway (also called an AI gateway, router, or proxy) gives you one endpoint that fronts many model providers—OpenAI, Anthropic, Google, Mistral, open-weights models, and more. Instead of wiring each provider's SDK and credentials into your app, you call the gateway once and it handles provider selection, authentication, retries, and logging. The good ones add routing logic, fallback chains, caching, and cost visibility on top. The category matters because model prices, availability, and quality shift constantly; a gateway is the layer that lets you adapt without rewriting application code.
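
To make the moving parts concrete, here is a minimal in-process sketch of the core loop described above: pick a provider, retry on failure, fall back to the next one, and log each attempt. Everything in it is hypothetical (the ToyGateway class, the ProviderError exception, and the two stub providers stand in for real network calls), and it is not how flo2 or any other gateway is implemented.

```python
import logging
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("toy_gateway")

# A provider is any callable that takes a prompt and returns text.
# Real providers would be HTTP clients; these are stand-ins.
Provider = Callable[[str], str]


class ProviderError(Exception):
    """Raised by a provider stub to simulate an outage or rate limit."""


class ToyGateway:
    """Hypothetical gateway: one entry point in front of several providers."""

    def __init__(self, providers: Dict[str, Provider], max_retries: int = 1):
        self.providers = providers
        self.max_retries = max_retries

    def complete(self, prompt: str, chain: List[str]) -> str:
        """Try each provider in `chain` in order, retrying before falling back."""
        for name in chain:
            for attempt in range(1 + self.max_retries):
                try:
                    reply = self.providers[name](prompt)
                    log.info("provider=%s attempt=%d ok", name, attempt + 1)
                    return reply
                except ProviderError as exc:
                    log.warning("provider=%s attempt=%d failed: %s",
                                name, attempt + 1, exc)
        raise ProviderError("all providers in the chain failed")


def flaky_provider(prompt: str) -> str:
    # Always fails, to exercise the retry and fallback path.
    raise ProviderError("simulated rate limit")


def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


gateway = ToyGateway({"primary": flaky_provider, "backup": healthy_provider})
result = gateway.complete("hello", chain=["primary", "backup"])
print(result)  # echo: hello
```

The application only ever calls complete(); which provider answered is an operational detail, which is the property that lets you change providers without touching application code.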

How to evaluate an AI gateway: eight criteria

Before comparing names, decide what you are optimizing for. These are the dimensions that separate gateways in practice:

Pricing model (markup vs. BYOK). The single biggest cost question. Does the gateway resell tokens with a margin, charge a platform/subscription fee, or let you bring your own provider keys (BYOK) and pay providers directly at their list price? A percentage markup is easy to overlook at low volume, but it grows in step with your token spend, so it becomes a real line item at scale.
Unified API and provider coverage. One consistent API across many providers is the core value. Check whether it is OpenAI-compatible, Anthropic-compatible, or both, and how many providers and models are supported.
Routing, fallback, and racing. Can you route by cost, latency, or task; fail over automatically when a provider errors or rate-limits; and optionally race several models to take the fastest response?
Caching. Opt-in response caching can cut both latency and spend dramatically for repeated or similar prompts. Confirm it is controllable, not forced.
Observability and cost accounting. You want per-call logs and true per-call cost accounting—real dollars per request, per model, per user—not just aggregate token counts.
Data and privacy path. Where do your prompts and completions flow, what is logged, and for how long? This matters for compliance and for sensitive workloads.
Self-host option. Some teams need to run the gateway in their own infrastructure for data-residency or control reasons. Others prefer a managed service. Know which you need.
Ease of adoption. Drop-in compatibility with the OpenAI or Anthropic SDKs means you change a base URL and a key—not your whole codebase.
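
The pricing criterion is easiest to reason about with numbers. The sketch below compares three pricing shapes at different monthly spend levels: pure BYOK, a percentage markup, and a flat platform fee. The 5% markup and the $49 fee are assumptions chosen only for illustration, not any vendor's actual pricing.

```python
def monthly_cost(provider_spend: float,
                 markup_rate: float = 0.0,
                 platform_fee: float = 0.0) -> float:
    """Total monthly cost: provider spend, plus a percentage markup, plus a flat fee."""
    return provider_spend * (1.0 + markup_rate) + platform_fee


def break_even_spend(platform_fee: float, markup_rate: float) -> float:
    """Monthly provider spend at which a flat fee costs the same as a markup."""
    return platform_fee / markup_rate


# Assumed numbers for illustration only.
HYPOTHETICAL_MARKUP = 0.05   # 5% margin on token spend
HYPOTHETICAL_FEE = 49.0      # flat monthly platform fee, in dollars

for spend in (1_000, 10_000, 100_000):
    byok = monthly_cost(spend)
    with_markup = monthly_cost(spend, markup_rate=HYPOTHETICAL_MARKUP)
    with_fee = monthly_cost(spend, platform_fee=HYPOTHETICAL_FEE)
    print(f"spend ${spend:>7,}: markup adds ${with_markup - byok:,.2f}, "
          f"flat fee adds ${with_fee - byok:,.2f}")

crossover = break_even_spend(HYPOTHETICAL_FEE, HYPOTHETICAL_MARKUP)
print(f"flat fee is cheaper than the markup above ${crossover:,.2f}/month")
```

Under these assumed numbers the markup overhead scales with spend while the flat fee stays constant, so the two cross over at a fairly low monthly spend. Plug in the real figures from each vendor's pricing page before drawing conclusions.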

The main categories of LLM gateways

Most products on the market fall into a handful of categories. Naming the category first tells you more about a tool's economics and trade-offs than any single feature does.

Cloud-vendor gateways

Vercel AI Gateway, Cloudflare AI Gateway, and platform offerings like Kong, Azure, and Databricks AI gateways are tightly integrated with their parent ecosystems. If you already deploy on Vercel, Cloudflare, Azure, or Databricks, the convenience is real—unified billing, native dashboards, and minimal new vendors. The trade-offs to verify: these gateways are often most attractive when you stay inside their platform, and some add a margin on token spend or bundle the gateway into broader platform pricing. Check each vendor's current pricing page to see whether you pay provider rates directly or through their meter.

Observability-first proxies

Portkey and Helicone built their reputations on strong logging, tracing, analytics, and caching. If your top priority is seeing and debugging what your LLM calls do—rich dashboards, request inspection, cache hit rates—these are well-regarded in the category. They typically support routing and fallback as well. Pricing usually centers on a subscription or usage tier for the observability layer rather than a token markup, but confirm the current model and whether you bring your own provider keys.

Open-source self-host

LiteLLM is the best-known open-source option for teams that want to run the gateway themselves. You get a unified API across many providers and full control over the data path, at the cost of operating and scaling the infrastructure yourself. This is the natural pick when self-hosting is a hard requirement and you have the engineering capacity to maintain it.

Token resellers

OpenRouter popularized buying access to many models through a single resold-credit balance. It is genuinely convenient for breadth and for quick experimentation across models you do not have direct accounts for. The structural trade-off is that you are buying resold credits rather than paying providers directly, so the economics differ from a pure BYOK approach—worth weighing if cost transparency at scale is a priority.

Where flo2 fits

flo2 is a developer-first gateway built around a specific stance: zero markup with bring-your-own-keys (BYOK). You connect your own provider accounts, you pay each provider directly at their price, and flo2 does not add a per-token margin on top. With one OpenAI- and Anthropic-compatible key, you get the routing layer most teams otherwise assemble by hand:

Smart routing to send each request to the right model by your rules.
Fallback chains so a provider outage or rate limit transparently fails over to the next option.
AI racing to fire several models in parallel and return the fastest response.
A/B testing with a judge that scores "model–task fit," so you can pick models on evidence rather than vibes.
Opt-in response caching to cut latency and spend where it is safe to do so.
True per-call cost accounting—real dollars per request, not just token tallies.
Drop-in OpenAI/Anthropic compatibility, so adoption is usually a base-URL and key change.
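
Racing is the least familiar item in the list above, so here is a rough sketch of the general technique using Python's asyncio. It is not flo2's implementation: the make_stub helper, the race function, and the model names are hypothetical, and the stubs sleep instead of calling real providers.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

ModelCall = Callable[[str], Awaitable[str]]


def make_stub(name: str, delay_s: float) -> ModelCall:
    """Build a fake model call that answers after `delay_s` seconds."""
    async def call(prompt: str) -> str:
        await asyncio.sleep(delay_s)
        return f"{name} answered: {prompt}"
    return call


async def race(prompt: str, models: Dict[str, ModelCall]) -> Tuple[str, str]:
    """Start every call at once; return (winner, reply) for the first success."""
    tasks = {asyncio.create_task(call(prompt)): name
             for name, call in models.items()}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                # Skip calls that failed; keep waiting for the others.
                if task.exception() is None:
                    return tasks[task], task.result()
        raise RuntimeError("every raced model failed")
    finally:
        # Cancel the losers and wait for them to wind down.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


models = {
    "fast-model": make_stub("fast-model", 0.01),
    "slow-model": make_stub("slow-model", 0.2),
}
winner, reply = asyncio.run(race("ping", models))
print(winner)  # fast-model
```

One design point worth checking with any gateway that offers racing: every call that starts may still be billed by its provider even if it loses, so racing trades extra spend for lower latency.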

flo2 is free during its Beta. The niche is clear: teams that want experimentation features (racing, A/B, fallback) and honest cost visibility, without paying a markup on every token to get them.

Comparison at a glance

The table below compares categories, not exact current prices—vendor terms change, so the honest cells say "varies / check vendor." Use it to shortlist, then confirm specifics on each site.

Criterion	flo2	Cloud-vendor (Vercel, Cloudflare, Azure, Kong, Databricks)	Observability proxies (Portkey, Helicone)	Open-source self-host (LiteLLM)	Token resellers (OpenRouter)
Pricing model	BYOK; free during Beta	Platform pricing / metered — varies	Subscription or usage tier — varies	Free software; you run the infra	Resold credits — check vendor
Token markup	Zero	Sometimes a margin / platform meter — check vendor	Typically none on tokens — check vendor	None (you pay providers directly)	Built into resold credit pricing
Pay providers directly (BYOK)	Yes	Varies — check vendor	Often yes — check vendor	Yes	No (resold)
Routing / fallback / racing	Yes (incl. racing + A/B judge)	Routing/fallback common; racing varies	Routing/fallback common; racing varies	Routing/fallback yes; racing varies	Routing across models; fallback varies
Cost accounting	True per-call dollars	Dashboards vary by platform	Strong analytics and logging	Self-instrumented	Credit-based usage view
Self-host	Managed (BYOK)	Managed (their cloud)	Managed; some self-host — check vendor	Yes (core strength)	Managed
Drop-in OpenAI/Anthropic API	Both	Often OpenAI-compatible — check vendor	Often both — check vendor	OpenAI-compatible	OpenAI-compatible

How to pick the right gateway

There is no single best LLM gateway—only the best fit for your constraints. A few decision shortcuts:

If you live inside one cloud and value unified billing over cost optimization, a cloud-vendor gateway is the path of least resistance—just confirm whether it adds a token margin.
If observability is your pain—you need deep logs, traces, and cache analytics—an observability-first proxy like Portkey or Helicone is a strong starting point.
If self-hosting is non-negotiable and you have the ops capacity, LiteLLM gives you full control of the data path.
If you want maximum model breadth fast without opening provider accounts, a reseller like OpenRouter is convenient—accepting that you buy resold credits.
If you want zero markup, your own keys, and routing/racing/fallback with honest per-call costs, that is exactly the gap flo2 is built for.

Whatever you shortlist, test it against your real traffic: measure latency under fallback, verify the cost numbers against your provider invoices, and confirm the data path meets your compliance needs. If zero-markup BYOK plus smart routing, racing, A/B testing, and true cost accounting match your priorities, flo2 is free to try during Beta—point your existing OpenAI or Anthropic SDK at it and compare for yourself.
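
Verifying cost numbers against provider invoices can be as simple as summing the gateway's per-call costs by provider and flagging gaps. The sketch below assumes a hypothetical log shape (a provider name and a dollar cost per call) and a 1% tolerance; real gateway exports and invoice formats will differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class CallLog(NamedTuple):
    """Hypothetical per-call record exported from a gateway."""
    provider: str
    cost_usd: float


def reconcile(calls: Iterable[CallLog],
              invoices: Dict[str, float],
              tolerance: float = 0.01) -> Dict[str, float]:
    """Return {provider: logged_total - invoiced_total} for every provider
    whose gap exceeds `tolerance` as a fraction of the invoice (0.01 = 1%)."""
    logged: Dict[str, float] = defaultdict(float)
    for call in calls:
        logged[call.provider] += call.cost_usd

    mismatches: Dict[str, float] = {}
    for provider in set(logged) | set(invoices):
        billed = invoices.get(provider, 0.0)
        gap = logged.get(provider, 0.0) - billed
        if abs(gap) > tolerance * max(billed, 1e-9):
            mismatches[provider] = round(gap, 4)
    return mismatches


# Made-up numbers for illustration.
calls = [
    CallLog("provider_a", 0.50),
    CallLog("provider_a", 0.25),
    CallLog("provider_b", 0.30),
]
invoices = {"provider_a": 0.75, "provider_b": 0.40}

mismatches = reconcile(calls, invoices)
print(mismatches)  # {'provider_b': -0.1}
```

A negative gap means the gateway logged less than the provider billed, which usually points at traffic that bypassed the gateway or at calls priced with stale rates.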
