LiteLLM vs flo2: Self-Hosted Proxy vs Hosted Zero-Markup Gateway
The LiteLLM vs flo2 question comes up when you have already decided you want a unified LLM gateway and are now figuring out the right operational model for your team. Both give you one OpenAI-compatible interface in front of OpenAI, Anthropic, and more, and neither charges a token markup on top of what providers charge. But the architecture is entirely different: LiteLLM is open-source software you download and run yourself, while flo2 is a hosted, zero-infrastructure gateway you connect to with your own provider keys. That distinction ripples through setup, ops, cost, observability, and who has the best fit — and this guide works through each dimension without pulling punches in either direction.
If you want the background first, our explainer on what is LiteLLM covers the SDK and proxy in detail. This article assumes you know roughly what both products do and are deciding which to run.
LiteLLM vs flo2: the core difference
Strip the feature lists away and the two products differ on exactly one axis: who operates the gateway.
LiteLLM is open-source software. At its simplest it is a Python SDK you import; at its most powerful it is a proxy server you deploy — in your own infrastructure, under your own management — that exposes an OpenAI-compatible HTTP endpoint in front of 100+ providers. The proxy makes requests under your provider keys, so you pay OpenAI, Anthropic, and the rest directly at list price. Nothing sits between you and the model vendor in the money or data path. The trade-off is that running the proxy — scaling it, securing it, monitoring it, patching it — is your engineering team's job.
flo2 is a hosted gateway that works on the same BYOK (bring-your-own-keys) principle: you configure your OpenAI, Anthropic, or other provider credentials once, and flo2 routes requests through them without ever reselling tokens or adding a markup. What flo2 handles for you is everything operational: uptime, scaling, key storage, routing logic, caching, cost accounting, and a dashboard — all managed, nothing to run. You get one flo2 key that is compatible with both the OpenAI and Anthropic SDKs, and your app is a base-URL change away from being connected.
So the flo2 vs LiteLLM choice is not a quality comparison — it is a build-vs-buy decision for the infrastructure layer underneath your model calls.
Setup and day-two operations
This is the dimension that bites hardest in practice and is often under-weighted on day one.
LiteLLM: you run it
Getting the LiteLLM SDK running in a script takes minutes. Getting the proxy running reliably in production takes longer. Once it is load-bearing — sitting between your application and every model call — you are responsible for:
- Deployment and scaling. Containers, autoscaling, and a high-availability setup so the proxy does not become a single point of failure. A crashed proxy means all your LLM features go dark.
- Security. Storing provider keys safely, managing auth for callers, rotating secrets, applying security patches on a responsible timeline.
- Monitoring and alerting. Wiring latency, error rate, and uptime into your existing observability stack. Nobody at LiteLLM pages you at 2am — you have to page yourself.
- Maintenance. Provider API changes, library updates, and configuration drift are ongoing work, not a one-time cost.
For teams that want full control and have the headcount to support it, this is exactly the point of the project and a reasonable trade-off. For teams that want model access without operating a new service in their critical path, it is where friction accumulates.
flo2: someone else runs it
Setup is a sign-up, a key paste, and a base-URL change. There is no proxy to deploy, no Docker image to configure, no autoscaling group to size. Availability, patching, and capacity are flo2's problem. Your engineering time goes to the thing you are actually building. The trade-off is that flo2 sits in the network path to your model calls, which is a meaningful point of control to hand to a third party — something your security and compliance posture should weigh.
Pricing model: same goal, different shape
Both LiteLLM and flo2 share the same fundamental pricing philosophy: no token markup. You pay providers what providers charge, not a marked-up rate. That is genuinely the common ground, and it matters — particularly relative to hosted aggregators that resell inference at a spread.
The difference is where the cost lands. With LiteLLM, the marginal cost of routing is zero, but the fixed cost is the engineering time and infrastructure to run the proxy. Cloud compute, engineers' hours, monitoring tooling, and incident response add up — they are real costs, just not per-token ones. With flo2, the current answer is simpler: it is free during the public beta, meaning there is no platform fee on top of provider costs. Post-beta pricing is not yet published, so if long-term cost certainty matters, that is a question worth putting to the flo2 team before committing.
If you already have negotiated pricing or committed spend with a provider, both approaches let you use it — LiteLLM because you are calling under your keys directly, flo2 because it routes through your keys rather than its own.
Features: routing, fallback, caching, and observability
LiteLLM
The LiteLLM proxy is feature-rich. It supports load balancing across model deployments, fallback chains, retry logic, per-user and per-key spend limits, and callback hooks into logging and observability tools like Langfuse, Helicone, Prometheus, and OpenTelemetry. Because it is open source, you can configure it in detail and extend it if you need to. The flip side: wiring up a complete observability picture — spend per model, per app, per user, visualized — requires you to compose these integrations yourself rather than finding a working dashboard out of the box.
flo2
flo2 includes smart routing, provider fallback, request racing (send to multiple providers, use the first response), A/B testing with a judge model to evaluate output quality, and semantic response caching. Cost accounting is per-call — not an estimate, not a heuristic — and it surfaces in a built-in dashboard rather than a log file you parse. Both OpenAI SDK and Anthropic SDK compatibility are first-class, so you do not have to pick one call style for your whole codebase. Because flo2 is hosted, the feature surface is whatever the platform ships; you configure within it, you do not extend it.
Control and data residency
LiteLLM running in your own infrastructure means requests never leave your network on the way to the provider. For regulated industries, data-residency requirements, or security postures that prohibit a third-party gateway in the request path, that is not just a preference — it may be a requirement. flo2 is the wrong answer for those situations, and a responsible comparison has to say so.
For teams without those constraints, the control question is usually about configurability rather than network topology. Both products let you express routing preferences and fallbacks; LiteLLM does so in more depth through its open-source configuration surface, while flo2 does so through a hosted interface that is less flexible but faster to operate.
Side-by-side comparison
| Dimension | LiteLLM | flo2 |
|---|---|---|
| Hosting model | Self-hosted (you run and operate it) | Hosted SaaS (zero infra to manage) |
| Setup time | Minutes for SDK; hours to days for production proxy | Minutes (sign up, paste key, change base URL) |
| Token markup | None — BYOK, direct provider billing | None — BYOK, direct provider billing |
| Platform fee | None (OSS), but infra and ops cost | Free during beta; post-beta pricing TBD |
| OpenAI compatibility | Yes | Yes |
| Anthropic SDK compatibility | Partial (via proxy shim) | First-class native support |
| Smart routing / fallback | Yes — configurable, flexible | Yes — routing, fallback, racing |
| A/B testing with judge | Requires custom setup | Built in |
| Response caching | Yes (configurable backends) | Yes (managed, semantic caching) |
| Per-call cost accounting | Yes — data emitted, dashboard you build | Yes — built-in dashboard, per-call true cost |
| Observability | Rich integrations (Langfuse, Prometheus, etc.) | Built-in dashboard, no wiring required |
| Data residency / on-prem | Full control — runs in your network | Not suitable for on-prem requirements |
| Provider coverage | 100+ providers | Major providers (OpenAI, Anthropic, others) |
| License / source | Open source (MIT) | Commercial hosted service |
Who should choose LiteLLM
LiteLLM is the right pick when one or more of the following is true for your team:
- You need requests to stay inside your own network — data residency, compliance, or a security policy that rules out a third-party gateway.
- You have the engineering capacity to operate and monitor a production service and prefer that control over offloading it.
- You need access to the long tail of 100+ providers, including ones flo2 does not yet support.
- You want to customize or extend the proxy in ways that a hosted platform would not allow — custom middleware, forking, or deep integration with internal tooling.
- You are already running significant infrastructure and adding one more self-hosted service is not a meaningful burden.
Who should choose flo2
flo2 is the better fit when the operational overhead of running a proxy is a cost you would rather not absorb:
- You want to spend zero time on proxy deployment, scaling, and incident response — model access, not infrastructure management, is the goal.
- You use both the OpenAI and Anthropic SDKs and want native compatibility with both through one gateway key.
- You want built-in routing, fallback, racing, A/B testing, caching, and a cost dashboard without assembling them from components.
- You are in beta or early-stage — flo2 is free during its public beta, which removes the platform cost entirely for now.
- Your security posture does not require an on-prem or private-network gateway.
The bottom line
The self-host vs hosted LLM proxy trade-off is not a quality call — it is an ops call. LiteLLM is a mature, capable open-source project that gives you complete ownership of your gateway. For teams that want that, it earns its reputation. For teams that want the same BYOK, zero-markup philosophy with none of the operational surface, a hosted gateway like flo2 closes the gap. The right question is not which product is better; it is which operational model fits your team today.
If you are evaluating the hosted path, flo2 is free during its public beta — worth running alongside your existing setup to see how the routing, caching, and cost accounting land in practice. And if you are still researching the landscape, our guide to LiteLLM alternative options covers a broader set of tools worth considering before committing.