Portkey AI Gateway: Overview, Features & Alternatives
If you are evaluating a control layer to sit in front of every model you call, the Portkey AI gateway will almost certainly land on your shortlist. Portkey is an AI gateway that gives you unified API access to many providers behind one interface, then layers on routing, fallbacks and retries, load balancing, caching, observability, guardrails, and prompt management. It ships in two shapes: an open-source gateway you can self-host, and a hosted platform built around it. This overview explains what Portkey is, what it does well, what to confirm before you commit, and how a zero-markup, bring-your-own-key alternative compares. For the wider category, start with the explainer "What is an LLM gateway?"
What is Portkey AI?
At its core, Portkey is a unified API and control plane for LLM traffic. Instead of wiring up a separate SDK for OpenAI, Anthropic, Google, and every open-weight host you want to reach, you send requests in one format through Portkey, which translates them for the upstream provider and forwards them. That framing (one façade over many providers, plus a layer of policy and visibility on top) is the mental model that explains everything else the product does.
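To make the façade idea concrete, here is a minimal sketch of how one request shape might be translated into provider-specific payloads. It is not Portkey's implementation or API: `ChatRequest`, `normalize`, the adapter functions, and the `provider/model-name` convention are all hypothetical, and the payload shapes are simplified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ChatRequest:
    """One provider-neutral request shape (hypothetical, for illustration)."""
    model: str                      # assumed convention: "provider/model-name"
    messages: List[Dict[str, str]]  # [{"role": ..., "content": ...}, ...]
    max_tokens: int = 256


def to_openai_style(req: ChatRequest, model_name: str) -> dict:
    # OpenAI-style chat payloads keep system messages inside the messages list.
    return {"model": model_name, "messages": req.messages,
            "max_tokens": req.max_tokens}


def to_anthropic_style(req: ChatRequest, model_name: str) -> dict:
    # Anthropic-style payloads carry the system prompt as a top-level field.
    system_parts = [m["content"] for m in req.messages if m["role"] == "system"]
    payload = {
        "model": model_name,
        "max_tokens": req.max_tokens,
        "messages": [m for m in req.messages if m["role"] != "system"],
    }
    if system_parts:
        payload["system"] = "\n".join(system_parts)
    return payload


ADAPTERS: Dict[str, Callable[[ChatRequest, str], dict]] = {
    "openai": to_openai_style,
    "anthropic": to_anthropic_style,
}


def normalize(req: ChatRequest) -> Tuple[str, dict]:
    """Split "provider/model-name" and build that provider's payload."""
    provider, _, model_name = req.model.partition("/")
    if provider not in ADAPTERS or not model_name:
        raise ValueError(f"unsupported model identifier: {req.model!r}")
    return provider, ADAPTERS[provider](req, model_name)
```

With a layer like this, switching providers means changing the `model` string rather than the calling code, which is the property the paragraph above describes.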
Two things make Portkey notable. First, the open-source gateway: the routing engine is available to run yourself, which is reassuring if you want to inspect the request path or keep traffic inside your own infrastructure. Second, the hosted platform, which wraps that engine in a dashboard, logging, analytics, prompt tooling, and team features so you do not have to operate it. Many teams start on the hosted side to move quickly and keep the self-host option in their back pocket.
Portkey AI gateway features
Portkey's calling card is breadth. Where some tools pick one job and do it cleanly, Portkey aims to cover most of what a production LLM stack needs in a single place:
- Unified provider access. One API surface in front of a large catalog of providers and models, so switching models is a configuration change rather than a rewrite.
- Routing, fallbacks and retries. Define strategies that fail over to another provider or model when one errors or rate-limits, with automatic retries on transient failures.
- Load balancing. Spread traffic across keys or upstreams to smooth out rate limits and distribute load.
- Caching. Reuse prior responses to cut latency and repeat spend on matching requests.
- Observability and tracing. Per-request logs, metrics, and traces that turn otherwise opaque provider calls into something you can measure and debug.
- Guardrails. Checks and policies on inputs and outputs—useful for safety, compliance, and catching malformed responses before they reach users.
- Prompt management. Store, version, and iterate on prompts outside your application code.
Taken together, that is a lot of surface area. For a platform team that wants governance, prompt versioning, and observability standardized across many applications, having those concerns in one product—rather than stitched together from several—is a genuine advantage.
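The "routing, fallbacks and retries" item in the list above is the easiest one to picture in code. The following is a rough sketch of the general technique, not Portkey's configuration format or behavior: `TransientError`, `call_with_fallback`, and the target tuples are hypothetical names, and the exponential backoff schedule is an assumption.

```python
import time
from typing import Callable, List, Sequence, Tuple


class TransientError(Exception):
    """Stand-in for a retryable failure such as a timeout or a rate limit."""


# A target is a (name, callable) pair; the callable takes a prompt and
# returns the model's text. Both are placeholders for real provider calls.
Target = Tuple[str, Callable[[str], str]]


def call_with_fallback(
    targets: Sequence[Target],
    prompt: str,
    max_retries: int = 2,
    base_delay: float = 0.5,
) -> Tuple[str, str]:
    """Try each target in order, retrying transient failures with backoff.

    Returns (target_name, response) from the first call that succeeds.
    """
    errors: List[str] = []
    for name, call in targets:
        for attempt in range(max_retries + 1):
            try:
                return name, call(prompt)
            except TransientError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if attempt < max_retries:
                    # Exponential backoff before retrying the same target.
                    time.sleep(base_delay * (2 ** attempt))
        # Retries exhausted for this target: fall through to the next one.
    raise RuntimeError("all targets failed: " + "; ".join(errors))
```

Only errors marked as transient trigger a retry or a failover here; anything else propagates immediately, which keeps genuine bugs (for example a malformed request) from being masked by the fallback chain.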
Observability and guardrails
Two features tend to be the reason teams choose Portkey specifically. The observability stack—logs, traces, latency and token metrics, cost views—gives you the kind of insight that providers rarely expose on their own, which matters enormously once LLM calls are load-bearing in your product. And guardrails let you enforce rules on what goes in and what comes back, which is increasingly a requirement rather than a nice-to-have in regulated or customer-facing settings. If those two capabilities are central to your problem, Portkey is a strong fit.
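As a rough illustration of what "rules on what goes in and what comes back" can mean, the sketch below runs one input check and one output check. These are not Portkey's guardrails; `check_input`, `check_output`, and the specific rules (an email pattern, required JSON keys) are assumptions chosen only to show the shape of such checks.

```python
import json
import re
from typing import List

# Deliberately simple email pattern; real PII detection needs much more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_input(prompt: str) -> List[str]:
    """Return a list of violations found in the prompt (empty means pass)."""
    violations = []
    if EMAIL.search(prompt):
        violations.append("input contains an email address")
    return violations


def check_output(text: str, required_keys: List[str]) -> List[str]:
    """Verify the model returned a JSON object containing the expected keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    return [f"missing key: {key}" for key in required_keys if key not in data]
```

A gateway would typically run the input check before forwarding a request and the output check before returning the response, then block, retry, or log depending on policy.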
Portkey strengths
It is a capable product, and it is easy to see why it has a following:
- Broad feature set. Routing, caching, observability, guardrails, and prompt management under one roof reduce the number of moving parts you assemble yourself.
- Open-source core. The self-hostable gateway means you can run the data path yourself and audit it, which helps with privacy, latency, and trust.
- Strong observability. Logs, traces, and analytics make production LLM traffic legible.
- Guardrails built in. Input/output policies are a first-class concern rather than an afterthought.
- Team and platform features. Prompt management and governance scale well across many apps and engineers.
Considerations: what to confirm before you commit
None of the following is a knock on Portkey—they are the normal questions to ask of any platform with this much scope.
The first is feature breadth versus simplicity. A broad platform is powerful, but it is also more surface area to learn, configure, and maintain. If you need exactly one capability—say, intelligent routing by cost, or honest per-call accounting—a tool with a narrower focus can be simpler to adopt and reason about. Map the features you will actually use against the ones you will merely carry, and weigh that against the value of consolidation.
The second is pricing and limits. AI tooling pricing changes often, and the boundary between the open-source self-host path and the hosted platform's tiers, quotas, and metered features evolves over time. Rather than trust any number you read in a blog—including this one—check the current pricing, tiers, and limits on Portkey's own site and price your expected traffic against their published terms. The split between what is free to self-host and what the hosted platform charges for is exactly the kind of detail worth verifying at the source.
flo2 as a Portkey alternative
flo2 is a developer-first LLM gateway that deliberately narrows the focus to routing economics. Like Portkey, it is BYOK: you bring your own keys for OpenAI, Anthropic, Gemini, Groq, Cerebras, DeepInfra, Mistral, xAI, and OpenRouter, and you pay each provider directly. Its defining trait is zero token markup: flo2 never sits in the money path, so it does not resell tokens or add a per-token margin. Through a single flo2 key that works with both OpenAI- and Anthropic-compatible clients, you get the routing layer most teams otherwise build by hand:
- Smart routing that sends each request to the cheapest or fastest model that fits the task.
- Fallback chains so an outage or rate limit transparently fails over to the next option.
- Racing that fires several models in parallel and returns the fastest acceptable response.
- A/B testing with a judge that scores model–task fit, so you pick models on evidence rather than vibes.
- Opt-in response caching to cut latency and spend where it is safe.
- True per-call cost accounting—real dollars per request and per model, not just aggregate token tallies.
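Per-call cost accounting, the last item above, comes down to simple arithmetic once you have token counts and a price table. The sketch below is an assumption about how such accounting could work, not flo2's implementation: the model names and per-million-token prices are invented placeholders, and `call_cost` and `totals_by_model` are hypothetical helpers.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Price:
    """USD per one million tokens, split by direction."""
    input_per_mtok: Decimal
    output_per_mtok: Decimal


# Placeholder prices for illustration only; use your providers' published rates.
PRICES: Dict[str, Price] = {
    "model-a": Price(Decimal("3.00"), Decimal("15.00")),
    "model-b": Price(Decimal("0.50"), Decimal("1.50")),
}

MILLION = Decimal(1_000_000)


def call_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Dollar cost of one call: tokens times price, per direction."""
    price = PRICES[model]
    return (price.input_per_mtok * input_tokens
            + price.output_per_mtok * output_tokens) / MILLION


def totals_by_model(calls: Iterable[Tuple[str, int, int]]) -> Dict[str, Decimal]:
    """Sum per-call costs for each model from (model, in, out) records."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for model, input_tokens, output_tokens in calls:
        totals[model] += call_cost(model, input_tokens, output_tokens)
    return dict(totals)
```

With these placeholder prices, a call to `model-a` with 1,200 input tokens and 300 output tokens costs (3.00 × 1,200 + 15.00 × 300) / 1,000,000 = $0.0081. `Decimal` is used instead of floats so that sums across many calls can be compared exactly against an invoice.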
The two tools optimize for different jobs. Portkey is a broad platform whose center of gravity is observability, guardrails, and prompt management across an organization. flo2 is a focused router whose center of gravity is choosing the right model per request and proving it was the right call in real dollars. flo2 is free during its Beta, so you can point an existing SDK at it and compare against your current setup directly.
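Racing, one of the routing features listed above, can be sketched with `asyncio`: start several calls at once, take the first result that passes an acceptance check, and cancel the rest. This is a generic illustration rather than flo2's implementation; `race`, `is_acceptable`, and the zero-argument call factories are hypothetical.

```python
import asyncio
from typing import Any, Callable, Coroutine, Sequence


async def race(
    calls: Sequence[Callable[[], Coroutine[Any, Any, str]]],
    is_acceptable: Callable[[str], bool],
) -> str:
    """Run all calls concurrently and return the first acceptable response."""
    tasks = [asyncio.create_task(call()) for call in calls]
    try:
        # as_completed yields results in finishing order, fastest first.
        for finished in asyncio.as_completed(tasks):
            try:
                result = await finished
            except Exception:
                continue  # a failed model simply drops out of the race
            if is_acceptable(result):
                return result
        raise RuntimeError("no model returned an acceptable response")
    finally:
        # Stop any calls still in flight so they do not run to completion.
        for task in tasks:
            task.cancel()
```

The trade-off is cost for latency: every racer that gets far enough to produce tokens may still be billed by its provider, so racing is usually worth reserving for latency-sensitive requests.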
Portkey vs flo2: a side-by-side
| Dimension | Portkey AI gateway | flo2 (zero-markup BYOK) |
|---|---|---|
| Primary focus | Broad platform: observability, guardrails, prompt management, routing | Routing-first: smart routing, racing, A/B, cost accounting |
| Deployment | Open-source self-host and hosted platform | Hosted gateway, drop-in endpoint |
| Key model | BYOK; bring your own provider keys | BYOK; bring your own provider keys |
| Token markup | See Portkey's site for current pricing | Zero markup; pay providers directly |
| API compatibility | Unified API across many providers | OpenAI- and Anthropic-compatible |
| Guardrails | Built-in input/output guardrails | Not the focus |
| Prompt management | Yes, versioning and storage | Not the focus |
| Racing & A/B + judge | Routing/fallback; confirm specifics on Portkey's site | Racing, A/B testing with a judge for model–task fit |
| Pricing | Check Portkey's current tiers and limits | Free during Beta |
Treat the table as a starting point, not gospel: feature scope and pricing on both sides shift, so verify the specifics—especially Portkey's tiers and any current routing capabilities—against the source before you decide.
How to choose
There is no universally right answer here, only fit. A few shortcuts:
- If you need a broad platform—organization-wide observability, guardrails, and prompt management alongside routing—Portkey is a natural pick, especially if the open-source gateway matters for your data path. Just confirm current pricing and limits on Portkey's site.
- If your main pain is choosing the right model per request—routing by cost and latency, racing, A/B with a judge, and seeing true dollar cost per call with no markup—a router-first gateway like flo2 targets exactly that gap.
- If you want to weigh the whole field across open-source proxies, cloud gateways, resellers, and BYOK routers, the "best LLM gateway" comparison walks through the categories and trade-offs.
Whatever you shortlist, test it against real traffic: measure latency, reconcile cost numbers against your provider invoices, and confirm the data path and retention meet your requirements. If zero-markup BYOK plus smart routing, racing, A/B testing, and true cost accounting match your priorities, flo2 is free to try during Beta—and Portkey remains a strong choice when broad observability, guardrails, and prompt management are the job to be done.