2026-06-02 · flo2 blog

Best LLM Gateway in 2026: flo2 vs Vercel AI Gateway vs Cloudflare vs Portkey

Choosing an LLM gateway in 2026 is no longer a niche infrastructure decision—it is one of the highest-leverage choices a team building with AI can make. The gateway sits between your application and every model provider you call, so it quietly shapes your bill, your latency, your reliability, and how much you can actually see about what your models are doing. This guide explains what to evaluate an AI gateway on, fairly maps the main categories of options (including Vercel AI Gateway, Cloudflare AI Gateway, and Portkey), and shows where flo2 fits. Specifics about competitors change often, so treat this as a framework and verify current details on each vendor's site before you commit.

What an LLM gateway actually does

An LLM gateway (also called an AI gateway, router, or proxy) gives you one endpoint that fronts many model providers—OpenAI, Anthropic, Google, Mistral, open-weights models, and more. Instead of wiring each provider's SDK and credentials into your app, you call the gateway once and it handles provider selection, authentication, retries, and logging. The good ones add routing logic, fallback chains, caching, and cost visibility on top. The category matters because model prices, availability, and quality shift constantly; a gateway is the layer that lets you adapt without rewriting application code.

How to evaluate an AI gateway: eight criteria

Before comparing names, decide what you are optimizing for. These are the dimensions that separate gateways in practice:

The main categories of LLM gateways

Most products on the market fall into a handful of categories. Naming the category first tells you more about a tool's economics and trade-offs than any single feature does.

Cloud-vendor gateways

Vercel AI Gateway, Cloudflare AI Gateway, and platform offerings like Kong, Azure, and Databricks AI gateways are tightly integrated with their parent ecosystems. If you already deploy on Vercel, Cloudflare, Azure, or Databricks, the convenience is real—unified billing, native dashboards, and minimal new vendors. The trade-offs to verify: these gateways are often most attractive when you stay inside their platform, and some add a margin on token spend or bundle the gateway into broader platform pricing. Check each vendor's current pricing page to see whether you pay provider rates directly or through their meter.

Observability-first proxies

Portkey and Helicone built their reputations on strong logging, tracing, analytics, and caching. If your top priority is seeing and debugging what your LLM calls do—rich dashboards, request inspection, cache hit rates—these are well-regarded in the category. They typically support routing and fallback as well. Pricing usually centers on a subscription or usage tier for the observability layer rather than a token markup, but confirm the current model and whether you bring your own provider keys.

Open-source self-host

LiteLLM is the best-known open-source option for teams that want to run the gateway themselves. You get a unified API across many providers and full control over the data path, at the cost of operating and scaling the infrastructure yourself. This is the natural pick when self-hosting is a hard requirement and you have the engineering capacity to maintain it.

Token resellers

OpenRouter popularized buying access to many models through a single resold-credit balance. It is genuinely convenient for breadth and for quick experimentation across models you do not have direct accounts for. The structural trade-off is that you are buying resold credits rather than paying providers directly, so the economics differ from a pure BYOK approach—worth weighing if cost transparency at scale is a priority.

Where flo2 fits

flo2 is a developer-first gateway built around a specific stance: zero markup with bring-your-own-keys. You connect your own provider accounts, you pay each provider directly at their price, and flo2 does not add a per-token margin on top. In exchange for one OpenAI- and Anthropic-compatible key, you get the routing layer most teams assemble by hand:

flo2 is free during its Beta. The niche is clear: teams that want experimentation features (racing, A/B, fallback) and honest cost visibility, without paying a markup on every token to get them.

Comparison at a glance

The table below compares categories, not exact current prices—vendor terms change, so the honest cells say "varies / check vendor." Use it to shortlist, then confirm specifics on each site.

Criterionflo2Cloud-vendor (Vercel, Cloudflare, Azure, Kong, Databricks)Observability proxies (Portkey, Helicone)Open-source self-host (LiteLLM)Token resellers (OpenRouter)
Pricing modelBYOK; free during BetaPlatform pricing / metered — variesSubscription or usage tier — variesFree software; you run the infraResold credits — check vendor
Token markupZeroSometimes a margin / platform meter — check vendorTypically none on tokens — check vendorNone (you pay providers directly)Built into resold credit pricing
Pay providers directly (BYOK)YesVaries — check vendorOften yes — check vendorYesNo (resold)
Routing / fallback / racingYes (incl. racing + A/B judge)Routing/fallback common; racing variesRouting/fallback common; racing variesRouting/fallback yes; racing variesRouting across models; fallback varies
Cost accountingTrue per-call dollarsDashboards vary by platformStrong analytics and loggingSelf-instrumentedCredit-based usage view
Self-hostManaged (BYOK)Managed (their cloud)Managed; some self-host — check vendorYes (core strength)Managed
Drop-in OpenAI/Anthropic APIBothOften OpenAI-compatible — check vendorOften both — check vendorOpenAI-compatibleOpenAI-compatible

How to pick the right gateway

There is no single best LLM gateway—only the best fit for your constraints. A few decision shortcuts:

Whatever you shortlist, test it against your real traffic: measure latency under fallback, verify the cost numbers against your provider invoices, and confirm the data path meets your compliance needs. If zero-markup BYOK plus smart routing, racing, A/B testing, and true cost accounting match your priorities, flo2 is free to try during Beta—point your existing OpenAI or Anthropic SDK at it and compare for yourself.

One key, every model — zero markup.
Bring your own provider keys. flo2 routes to the cheapest, fastest model with fallback, racing and true cost accounting — free during Beta.
Get your flo2 key →
© 2026 flo2.com — the zero-markup LLM gateway & router. flow → to