2026-06-03 · flo2 blog

What Is LiteLLM? The Open-Source LLM Proxy, Explained

If you've shipped anything on top of large language models, you've probably asked some version of "What is LiteLLM?" and wondered whether it belongs in your stack. The short answer: LiteLLM is a popular open-source project that gives you one OpenAI-style interface for 100+ model providers, available both as a Python SDK and as a self-hostable proxy server. This guide explains LiteLLM in plain terms for developers: what you get, the tradeoffs of running it yourself, and when a managed gateway is the simpler call.

What is LiteLLM, exactly?

LiteLLM is an open-source library and proxy that standardizes how you call language models. Instead of learning a different request and response shape for every vendor, you write OpenAI-format calls and LiteLLM translates them to whichever provider you target — OpenAI, Anthropic, Google Gemini, Azure, Bedrock, Mistral, and many more. It comes in two main flavors:

LiteLLM Python SDK — a library you import into your app. You call litellm.completion(...) with an OpenAI-shaped payload and a model string, and it handles the provider-specific plumbing.
LiteLLM Proxy — a server you run that exposes an OpenAI-compatible HTTP endpoint. Any app or SDK that already speaks OpenAI can point its base URL at the proxy and reach every configured model through it, in any language.
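
To make "point its base URL at the proxy" concrete, here is a minimal sketch that builds (but does not send) an OpenAI-format chat request using only Python's standard library. The address http://localhost:4000/v1, the key sk-example, and the model alias claude-sonnet are placeholders assumed for illustration; substitute whatever your own proxy is configured with.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages):
    """Build an OpenAI-format chat completion request aimed at a proxy.

    Nothing is sent here; pass the result to urllib.request.urlopen to call it.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: assumed for illustration, not taken from a real deployment.
request = build_chat_request(
    base_url="http://localhost:4000/v1",
    api_key="sk-example",
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Summarize this in one line."}],
)
print(request.full_url)
```

Because the request shape is the plain OpenAI one, the same payload could come from any language or HTTP client; only the base URL and key identify the proxy.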

The core idea is unification: one interface, many backends. That alone removes a lot of glue code, and the proxy adds the operational features teams usually want once more than one service starts calling models.

LiteLLM in Python: the SDK

The SDK is the fastest way to understand the project. A call looks like a normal OpenAI request, just with a provider-prefixed model name:

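from litellm import completion

# Assumes ANTHROPIC_API_KEY is set in the environment.
resp = completion(
    model="anthropic/claude-sonnet-4",  # "<provider>/<model>" selects the backend
    messages=[{"role": "user", "content": "Summarize this in one line."}],
)
# The response follows the OpenAI shape regardless of provider.
print(resp.choices[0].message.content)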

Swap the model string to gpt-4o or gemini/gemini-1.5-pro and the rest of your code stays the same. That portability is the headline feature: you can switch providers, run experiments, or add a fallback without rewriting request and response handling.
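
As a rough illustration of that portability, the sketch below wraps the call in a helper where only the model string changes between providers. The helper one_line_summary and the stand-in fake_completion are hypothetical names introduced here so the example runs offline; with LiteLLM installed and provider keys configured, omitting completion_fn would use litellm.completion instead.

```python
from types import SimpleNamespace


def one_line_summary(text, model, completion_fn=None):
    """Ask `model` for a one-line summary; only the model string varies per provider."""
    if completion_fn is None:
        # Imported lazily so the sketch can be exercised without litellm installed.
        from litellm import completion as completion_fn
    resp = completion_fn(
        model=model,
        messages=[{"role": "user", "content": f"Summarize this in one line: {text}"}],
    )
    return resp.choices[0].message.content


def fake_completion(model, messages):
    """Offline stand-in (hypothetical) that mimics the OpenAI-style response shape."""
    message = SimpleNamespace(content=f"[{model}] summary")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# Same helper, three providers: nothing but the model string differs.
for model in ["anthropic/claude-sonnet-4", "gpt-4o", "gemini/gemini-1.5-pro"]:
    print(one_line_summary("LiteLLM unifies model APIs.", model, completion_fn=fake_completion))
```

Injecting the completion function is a design choice for this sketch only; it keeps the request and response handling identical while making the provider call easy to swap or stub in tests.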

What you get with the LiteLLM proxy

Used as an SDK, LiteLLM mostly gives you a unified call surface. Run it as a proxy and it becomes a small gateway that sits between your apps and the providers. The commonly used capabilities include:

Unified OpenAI-compatible endpoint — every model reachable through one base URL and request format.
Routing — distribute traffic across deployments or model groups, with strategies like least-busy or latency-based selection.
Fallbacks and retries — when a model errors or rate-limits, automatically retry against another configured model so requests don't simply fail.
Virtual keys, budgets, and rate limits — issue per-team or per-app keys, set spend caps, and throttle usage centrally.
Logging and observability — emit request, token, and cost data to your logging or monitoring stack for cost tracking and debugging.
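
To show how least-busy routing and fallbacks fit together conceptually, here is a small in-memory sketch. It is not LiteLLM's implementation or configuration format; the deployment names, the in_flight counter, and the helper functions are all hypothetical, and a real gateway would track load and classify errors far more carefully.

```python
class AllDeploymentsFailed(Exception):
    """Raised when every candidate deployment has errored."""


def least_busy_order(in_flight):
    """Return deployment names sorted by current in-flight request count."""
    return sorted(in_flight, key=lambda name: in_flight[name])


def call_with_fallback(in_flight, send, retries_per_deployment=1):
    """Try deployments from least to most busy, moving on when one keeps failing.

    `in_flight` maps deployment name -> number of requests currently running.
    `send` is any callable that takes a deployment name and returns a response
    or raises an exception (for example on a rate limit).
    """
    errors = {}
    for name in least_busy_order(in_flight):
        for _ in range(1 + retries_per_deployment):
            in_flight[name] += 1
            try:
                return name, send(name)
            except Exception as exc:  # a real gateway would filter error types
                errors[name] = exc
            finally:
                in_flight[name] -= 1
    raise AllDeploymentsFailed(errors)


# Hypothetical deployments and a fake sender for an offline demonstration.
load = {"azure-gpt-4o": 3, "openai-gpt-4o": 1, "claude-backup": 2}
attempts = []


def flaky_send(name):
    attempts.append(name)
    if name == "openai-gpt-4o":
        raise RuntimeError("429 rate limited")
    return f"ok from {name}"


served_by, reply = call_with_fallback(load, flaky_send)
print(served_by, reply)
```

In this run the least-busy deployment is tried first, retried once after the simulated rate limit, and then the request falls back to the next least-busy deployment, so the caller still gets a response.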

That feature set is why LiteLLM shows up so often in "litellm vs" discussions: it covers the practical needs of a shared LLM access layer in a single, well-maintained open-source package.

Self-hosting LiteLLM: the upside

Running LiteLLM yourself is appealing for good reasons, and for many teams it's the right choice:

Full control — it runs in your infrastructure, inside your network and security boundary, on your terms.
Open source — you can read the code, extend it, pin versions, and avoid lock-in. There's no third party sitting in the request path you can't inspect.
No external middleman per call — traffic goes from your proxy straight to the providers; you aren't routing requests through someone else's hosted service.
Customizable — config-driven model groups, custom callbacks, and hooks let you shape behavior to your needs.

If you value owning the deployment and already operate services confidently, self-hosting LiteLLM is a strong, mature option.

Self-hosting LiteLLM: the cost

The flip side of control is operations. When you run the proxy, it's yours to keep healthy:

You deploy and scale it — containers, autoscaling, and capacity planning are on you, and the proxy becomes a dependency in your critical path to every model.
You secure it — key storage, network policy, auth, and upgrades are your responsibility.
You monitor it — uptime, latency, and alerting need wiring into your stack; if the proxy goes down, your LLM calls go with it.
You maintain it — tracking releases, applying updates, and handling provider changes is ongoing work.

None of this is unusual for infrastructure, but it's real engineering time. For a small team that just wants reliable, cheap model access, standing up and babysitting a proxy can be more than the problem warrants.

Self-host vs hosted: the tradeoff

Here's the balanced version of the decision:

Self-host (LiteLLM) — maximum control, open source, no external service in the path. Cost: you own deploy, scale, security, and monitoring.
Hosted gateway — no infrastructure to run, a dashboard out of the box, and the routing layer maintained for you. Cost: you trust a managed service in your request path.

It's genuinely about preference and team capacity. If running a proxy is no burden — or you specifically want everything in your own environment — LiteLLM is excellent. If you'd rather not operate one more service, a hosted gateway gets you the same unified interface and routing without the ops overhead.

When a managed gateway is the simpler choice

This is where a hosted option earns its keep. flo2 is a developer-first, hosted LLM gateway with zero token markup: you bring your own provider keys (OpenAI, Anthropic, Gemini, Groq, Cerebras, DeepInfra, Mistral, xAI, OpenRouter) and pay the providers directly. One key — OpenAI- and Anthropic-compatible — routes to the cheapest or fastest model that fits the request, with no proxy for you to deploy, scale, secure, or monitor.

Beyond unified access, flo2 adds smart routing, fallback, and racing, plus A/B testing with an LLM judge for "model–task fit," response caching, and true cost accounting — all behind a managed dashboard. It's free during Beta. If LiteLLM answers "how do I unify my model calls," flo2 answers "how do I get that without running the infrastructure myself."

The honest framing: LiteLLM is a great open-source choice when you want to self-host and own the deployment. flo2 is the hosted, zero-markup alternative when you'd rather skip the proxy entirely. Many developers will try both before deciding which fits.

Want to go deeper? Read "what is an LLM gateway" for the broader concept, or see our "best LLM gateway" comparison to weigh the options side by side. When you're ready to route without running anything, give flo2 a try.
