2026-06-03 · flo2 blog

Self-Hosted vs Hosted LLM Gateway: Which Should You Run?

Every team building on large language models eventually hits the same infrastructure question: should you run your own proxy, or hand that layer off to a managed service? The self-host vs hosted LLM gateway decision looks simple on the surface — open-source software versus a vendor — but the real trade-off is about where operational work lands, not which software is "better." This guide walks through both sides honestly, with the factors that should actually drive your decision.

What the Comparison Is Really About

A self-hosted LLM gateway means you take an open-source project — something like LiteLLM, Portkey's open-source variant, or a similar proxy — and run it yourself. You control the binary, the config, and the infrastructure underneath it. A hosted gateway means a third party runs that layer for you, and you connect to it over the internet. In both cases you bring your own provider keys and pay providers directly. The gateway is just middleware.

What differs is where the operational burden goes. That burden is real, and underestimating it is the most common mistake teams make when they first spin up a self-hosted LLM proxy.

Self-Hosted LLM Gateway: What You Get and What It Costs

The genuine advantages

Full control. You own the config, the version, and every parameter. There is no vendor making decisions about your routing logic or caching behavior without your knowledge.

Data stays in your infrastructure. If your compliance requirements mandate that requests never leave your network — HIPAA, SOC 2 with on-prem scope, EU data-residency, financial-services policies — a self-hosted proxy can satisfy them. Requests from your app go to your proxy, then directly to the provider. A third-party hosted gateway sits in that path.

No vendor lock-in at the routing layer. You are trusting LLM providers with your model calls regardless, but the gateway itself is a dependency you fully control. You can fork it, patch it, or replace it on your own schedule.

What self-hosting actually requires

The open-source gateway software is free. The time to operate it is not:

Deployment and scaling. Containers, orchestration, autoscaling, and a high-availability setup so the proxy does not become a single point of failure in front of every LLM call. One instance is easy; a reliable production setup is not.
Security and key management. Provider API keys live somewhere in your config or secrets manager. You own rotation schedules, access control, and network policies. A breach of the proxy layer can expose all your provider credentials at once.
Upgrades and patches. Provider APIs change, models get added, bugs get fixed. Tracking upstream releases, testing updates, and applying them without disrupting production is ongoing engineering time — not a one-time task.
Monitoring and on-call. If the gateway goes down at 2 AM, your LLM features go with it. You need uptime monitoring, alerting, and someone who will wake up and fix it. That someone is on your team.
Observability and dashboards. Logs and metrics exist, but turning them into a cost-per-model view, a per-app breakdown, or an alert on spend anomalies is work you build and maintain.

For a small team without dedicated platform engineers, this list is heavy. For a larger team with an existing infra practice, it may be fully manageable — especially if you already run similar middleware.

Hosted (Managed) LLM Gateway: What You Get and What It Costs

The genuine advantages

Zero operational overhead. Deployment, scaling, patching, and on-call are the vendor's problem. You configure routing rules, add your provider keys, and point your app at a new base URL. Time-to-first-call is measured in minutes.

Instant dashboard and updates. Cost accounting, per-model analytics, caching stats, and routing visibility are available from day one. New provider support, routing features, and bug fixes appear without any action on your part.

Scales invisibly. A hosted gateway is already running at scale. You do not have to plan capacity or handle traffic spikes with infra changes; the vendor absorbs that.

The real trade-offs

A third party sits in your request path. Requests from your app pass through the vendor's infrastructure before reaching your LLM provider. For most teams this is an acceptable trust boundary — the routing layer does not store the content of completions by default — but it is a trust decision you have to make consciously.

Less control over internals. You configure the gateway through whatever interface the vendor exposes. If you need a routing behavior the product does not support, you are waiting on the roadmap or working around it.

Vendor continuity risk. If the hosted service shuts down or changes pricing, you need to migrate. This is a real but manageable risk — most integrations are behind a base URL change, so migration is relatively low-friction.

Decision Factors: How to Choose

Factor	Lean self-hosted	Lean hosted
Team size & ops capacity	Dedicated platform/infra team available	Small team; engineers focused on product
Compliance / data residency	Strict: requests must stay in your network	Standard: provider data handling is sufficient
Time to value	Can absorb a multi-day setup and config cycle	Need routing and observability this week
Scale requirements	Very high, highly specific, or unusual traffic patterns	Normal growth curve; vendor scale is sufficient
Control requirements	Need to fork, patch, or deeply customize the gateway	Standard routing, fallback, and caching features
Engineering time cost	Low opportunity cost for infra work	Every infra hour is an hour not on the product
Token markup concern	BYOK self-hosted is zero-markup by definition	Choose a BYOK hosted gateway — also zero markup

A Note on Token Markup — It Applies to Both

A common misconception is that self-hosting is the only way to avoid paying a markup on tokens. That is not true. The markup question is about whether the gateway resells tokens at a spread on top of provider pricing. A self-hosted gateway using your own provider keys is inherently zero-markup — you pay the provider directly. But a hosted gateway that also supports bring-your-own-key (BYOK) works the same way: your keys, provider invoiced directly, no token spread. The hosted vs self-hosted distinction and the markup distinction are separate axes. Evaluate them separately.

When comparing options, see the best LLM gateway comparison for a fuller breakdown of how products differ on this and other dimensions.

Where the Teams That Choose Hosted End Up

The teams that find the most value in a hosted gateway tend to share a profile: they are moving fast on product, they do not have spare platform-engineering capacity, and they recognize that the gateway is a means to an end — not the thing they are building. They want model routing, fallback, caching, and cost visibility without adding another service to their on-call rotation.

That does not mean self-hosting is wrong. Teams in regulated industries, teams at very large scale with specialized requirements, and teams that genuinely value owning every layer are well-served by running their own proxy. The open-source options are capable. The question is whether the operational ownership fits your situation.

Where flo2 Fits

flo2 is a hosted, developer-first LLM gateway. You bring your own provider keys — OpenAI, Anthropic, Google, Groq, DeepInfra, and others — and pay providers directly. There is no token markup. flo2 handles routing, fallback, racing, A/B testing, caching, and cost accounting, with a dashboard that gives you visibility from day one. There is no infrastructure to deploy or operate on your end.

If you have concluded that a managed gateway fits your situation — team size, compliance requirements, and time-to-value all point that way — flo2 is free during its beta period. You get the full feature set while it is free to try, with no token markup regardless of plan.

If you are still evaluating whether you need a gateway at all, or want to understand the landscape before committing to any approach, the best LLM gateway comparison covers the leading options side by side.

One key, every model — zero markup.

Bring your own provider keys. flo2 routes to the cheapest, fastest model with fallback, racing and true cost accounting — free during Beta.

Get your flo2 key →