2026-06-03 · flo2 blog

LiteLLM vs flo2: Self-Hosted Proxy vs Hosted Zero-Markup Gateway

The LiteLLM vs flo2 question comes up when you have already decided you want a unified LLM gateway and are now figuring out the right operational model for your team. Both give you one OpenAI-compatible interface in front of OpenAI, Anthropic, and more, and neither charges a token markup on top of what providers charge. But the architecture is entirely different: LiteLLM is open-source software you download and run yourself, while flo2 is a hosted, zero-infrastructure gateway you connect to with your own provider keys. That distinction ripples through setup, ops, cost, observability, and who has the best fit — and this guide works through each dimension without pulling punches in either direction.

If you want the background first, our explainer on what is LiteLLM covers the SDK and proxy in detail. This article assumes you know roughly what both products do and are deciding which to run.

LiteLLM vs flo2: the core difference

Strip the feature lists away and the two products differ on exactly one axis: who operates the gateway.

LiteLLM is open-source software. At its simplest it is a Python SDK you import; at its most powerful it is a proxy server you deploy — in your own infrastructure, under your own management — that exposes an OpenAI-compatible HTTP endpoint in front of 100+ providers. The proxy makes requests under your provider keys, so you pay OpenAI, Anthropic, and the rest directly at list price. Nothing sits between you and the model vendor in the money or data path. The trade-off is that running the proxy — scaling it, securing it, monitoring it, patching it — is your engineering team's job.

flo2 is a hosted gateway that works on the same BYOK (bring-your-own-keys) principle: you configure your OpenAI, Anthropic, or other provider credentials once, and flo2 routes requests through them without ever reselling tokens or adding a markup. What flo2 handles for you is everything operational: uptime, scaling, key storage, routing logic, caching, cost accounting, and a dashboard — all managed, nothing to run. You get one flo2 key that is compatible with both the OpenAI and Anthropic SDKs, and your app is a base-URL change away from being connected.

So the flo2 vs LiteLLM choice is not a quality comparison — it is a build-vs-buy decision for the infrastructure layer underneath your model calls.

Setup and day-two operations

This is the dimension that bites hardest in practice and is often under-weighted on day one.

LiteLLM: you run it

Getting the LiteLLM SDK running in a script takes minutes. Getting the proxy running reliably in production takes longer. Once it is load-bearing — sitting between your application and every model call — you are responsible for:

Deployment and scaling. Containers, autoscaling, and a high-availability setup so the proxy does not become a single point of failure. A crashed proxy means all your LLM features go dark.
Security. Storing provider keys safely, managing auth for callers, rotating secrets, applying security patches on a responsible timeline.
Monitoring and alerting. Wiring latency, error rate, and uptime into your existing observability stack. Nobody at LiteLLM pages you at 2am — you have to page yourself.
Maintenance. Provider API changes, library updates, and configuration drift are ongoing work, not a one-time cost.

For teams that want full control and have the headcount to support it, this is exactly the point of the project and a reasonable trade-off. For teams that want model access without operating a new service in their critical path, it is where friction accumulates.

flo2: someone else runs it

Setup is a sign-up, a key paste, and a base-URL change. There is no proxy to deploy, no Docker image to configure, no autoscaling group to size. Availability, patching, and capacity are flo2's problem. Your engineering time goes to the thing you are actually building. The trade-off is that flo2 sits in the network path to your model calls, which is a meaningful point of control to hand to a third party — something your security and compliance posture should weigh.

Pricing model: same goal, different shape

Both LiteLLM and flo2 share the same fundamental pricing philosophy: no token markup. You pay providers what providers charge, not a marked-up rate. That is genuinely the common ground, and it matters — particularly relative to hosted aggregators that resell inference at a spread.

The difference is where the cost lands. With LiteLLM, the marginal cost of routing is zero, but the fixed cost is the engineering time and infrastructure to run the proxy. Cloud compute, engineers' hours, monitoring tooling, and incident response add up — they are real costs, just not per-token ones. With flo2, the current answer is simpler: it is free during the public beta, meaning there is no platform fee on top of provider costs. Post-beta pricing is not yet published, so if long-term cost certainty matters, that is a question worth putting to the flo2 team before committing.

If you already have negotiated pricing or committed spend with a provider, both approaches let you use it — LiteLLM because you are calling under your keys directly, flo2 because it routes through your keys rather than its own.

Features: routing, fallback, caching, and observability

LiteLLM

The LiteLLM proxy is feature-rich. It supports load balancing across model deployments, fallback chains, retry logic, per-user and per-key spend limits, and callback hooks into logging and observability tools like Langfuse, Helicone, Prometheus, and OpenTelemetry. Because it is open source, you can configure it in detail and extend it if you need to. The flip side: wiring up a complete observability picture — spend per model, per app, per user, visualized — requires you to compose these integrations yourself rather than finding a working dashboard out of the box.

flo2

flo2 includes smart routing, provider fallback, request racing (send to multiple providers, use the first response), A/B testing with a judge model to evaluate output quality, and semantic response caching. Cost accounting is per-call — not an estimate, not a heuristic — and it surfaces in a built-in dashboard rather than a log file you parse. Both OpenAI SDK and Anthropic SDK compatibility are first-class, so you do not have to pick one call style for your whole codebase. Because flo2 is hosted, the feature surface is whatever the platform ships; you configure within it, you do not extend it.

Control and data residency

LiteLLM running in your own infrastructure means requests never leave your network on the way to the provider. For regulated industries, data-residency requirements, or security postures that prohibit a third-party gateway in the request path, that is not just a preference — it may be a requirement. flo2 is the wrong answer for those situations, and a responsible comparison has to say so.

For teams without those constraints, the control question is usually about configurability rather than network topology. Both products let you express routing preferences and fallbacks; LiteLLM does so in more depth through its open-source configuration surface, while flo2 does so through a hosted interface that is less flexible but faster to operate.

Side-by-side comparison

Dimension	LiteLLM	flo2
Hosting model	Self-hosted (you run and operate it)	Hosted SaaS (zero infra to manage)
Setup time	Minutes for SDK; hours to days for production proxy	Minutes (sign up, paste key, change base URL)
Token markup	None — BYOK, direct provider billing	None — BYOK, direct provider billing
Platform fee	None (OSS), but infra and ops cost	Free during beta; post-beta pricing TBD
OpenAI compatibility	Yes	Yes
Anthropic SDK compatibility	Partial (via proxy shim)	First-class native support
Smart routing / fallback	Yes — configurable, flexible	Yes — routing, fallback, racing
A/B testing with judge	Requires custom setup	Built in
Response caching	Yes (configurable backends)	Yes (managed, semantic caching)
Per-call cost accounting	Yes — data emitted, dashboard you build	Yes — built-in dashboard, per-call true cost
Observability	Rich integrations (Langfuse, Prometheus, etc.)	Built-in dashboard, no wiring required
Data residency / on-prem	Full control — runs in your network	Not suitable for on-prem requirements
Provider coverage	100+ providers	Major providers (OpenAI, Anthropic, others)
License / source	Open source (MIT)	Commercial hosted service

Who should choose LiteLLM

LiteLLM is the right pick when one or more of the following is true for your team:

You need requests to stay inside your own network — data residency, compliance, or a security policy that rules out a third-party gateway.
You have the engineering capacity to operate and monitor a production service and prefer that control over offloading it.
You need access to the long tail of 100+ providers, including ones flo2 does not yet support.
You want to customize or extend the proxy in ways that a hosted platform would not allow — custom middleware, forking, or deep integration with internal tooling.
You are already running significant infrastructure and adding one more self-hosted service is not a meaningful burden.

Who should choose flo2

flo2 is the better fit when the operational overhead of running a proxy is a cost you would rather not absorb:

You want to spend zero time on proxy deployment, scaling, and incident response — model access, not infrastructure management, is the goal.
You use both the OpenAI and Anthropic SDKs and want native compatibility with both through one gateway key.
You want built-in routing, fallback, racing, A/B testing, caching, and a cost dashboard without assembling them from components.
You are in beta or early-stage — flo2 is free during its public beta, which removes the platform cost entirely for now.
Your security posture does not require an on-prem or private-network gateway.

The bottom line

The self-host vs hosted LLM proxy trade-off is not a quality call — it is an ops call. LiteLLM is a mature, capable open-source project that gives you complete ownership of your gateway. For teams that want that, it earns its reputation. For teams that want the same BYOK, zero-markup philosophy with none of the operational surface, a hosted gateway like flo2 closes the gap. The right question is not which product is better; it is which operational model fits your team today.

If you are evaluating the hosted path, flo2 is free during its public beta — worth running alongside your existing setup to see how the routing, caching, and cost accounting land in practice. And if you are still researching the landscape, our guide to LiteLLM alternative options covers a broader set of tools worth considering before committing.

One key, every model — zero markup.

Bring your own provider keys. flo2 routes to the cheapest, fastest model with fallback, racing and true cost accounting — free during Beta.

Get your flo2 key →