2026-06-03 · flo2 blog

Unified LLM API: One Interface for Every Model

A unified LLM API is a single interface that sits in front of many model providers — OpenAI, Anthropic, Gemini, Groq, Mistral, and the rest — and exposes them all through one consistent request and response shape. Instead of wiring up a separate SDK, auth scheme, and payload format for each vendor, you call one endpoint and select the model with a string. It's the difference between "integrate seven providers" and "change model and ship." If you've ever wanted one API for all LLMs, this is the pattern that delivers it.

This post explains the problem a unified API solves, what the abstraction actually standardizes, the concrete benefits for a production codebase, how a gateway delivers it under the hood, and a small before/after example so you can see the code shrink.

The problem: every provider is its own integration

Each LLM vendor ships its own SDK, and underneath each SDK is a different contract. Adopting a second or third model means re-learning four things that have nothing to do with your actual product:

Auth and endpoints: OpenAI expects an Authorization: Bearer header at api.openai.com/v1; Anthropic uses an x-api-key header plus a separate anthropic-version header at its own host; Gemini historically accepted the key as a query parameter. That means different base URLs, different headers, and different key-management code.
Request shape: OpenAI's Chat Completions takes a flat messages array. The Anthropic Messages API splits the system prompt into a top-level system field and structures content as typed blocks. The same logical prompt has to be rebuilt per provider.
Response shape: you read choices[0].message.content in one place and content[0].text in another. Token usage lives under usage.prompt_tokens here and usage.input_tokens there. Your parsing and cost-tracking code forks.
Streaming format: OpenAI and Anthropic both stream over server-sent events, but the event names and delta structures differ, so your streaming handler is provider-specific too.
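
To make the auth and request-shape differences concrete, here is a minimal sketch that builds the same logical prompt for both providers without sending anything. The helper names and placeholder keys are illustrative assumptions; the endpoints, header names, and body fields follow each vendor's public HTTP API as described in the list above.

```python
import json


def openai_request(api_key: str, model: str, system: str, user: str) -> dict:
    """Chat Completions: bearer auth, system prompt lives inside `messages`."""
    return {
        "url": "https://api.openai.com/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
        },
    }


def anthropic_request(api_key: str, model: str, system: str, user: str,
                      max_tokens: int = 1024) -> dict:
    """Messages API: x-api-key auth, version header, top-level `system`."""
    return {
        "url": "https://api.anthropic.com/v1/messages",
        "headers": {
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        "body": {
            "model": model,
            "max_tokens": max_tokens,  # required by the Messages API
            "system": system,          # not a message, a top-level field
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": user}]},
            ],
        },
    }


# Same prompt, two different wire formats (keys are placeholders).
oa = openai_request("sk-placeholder", "gpt-4o-mini", "Be brief.", "Hi")
an = anthropic_request("sk-ant-placeholder", "claude-3-5-sonnet-latest",
                       "Be brief.", "Hi")
print(json.dumps(oa["body"], indent=2))
print(json.dumps(an["body"], indent=2))
```

Nothing here is product logic, yet all of it has to exist, and stay in sync with each vendor, before a second provider can answer a single prompt.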

Multiply that by every model you want to evaluate and the cost is real. Worse, it's a quiet form of lock-in: once your application is written against one vendor's exact payload and SDK, "let's try a cheaper model" turns into a refactor instead of a config change. That friction is what stops teams from shopping around — and the model landscape moves fast enough that being unable to switch is a genuine liability.

The solution: one interface, change the model string

A unified LLM API removes the per-provider integration by standardizing the contract. In practice the de facto standard is the OpenAI Chat Completions shape (messages in, choices[].message.content out, bearer-token auth, SSE streaming), because most of the tooling ecosystem already targets it: the official OpenAI SDKs, LangChain, LlamaIndex, the Vercel AI SDK, and instructor. A single API for many models can ride on top of that one schema.

With that abstraction in place, the only thing that varies between models is the model field:

One client object, one base URL, one key in your code.
Switching from GPT to Claude to a Llama model on Groq is a one-line edit to a string, not a rewrite.
The request body, the response parsing, and the streaming handler stay identical across vendors.

This is exactly the principle behind an OpenAI-compatible API: agree on the wire format once, and every provider that speaks it becomes interchangeable. A multi-model API is just that idea applied across many vendors at the same time.
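
To see what agreeing on the wire format saves on the streaming side, here is a sketch of the two handlers you would otherwise maintain. It assumes the SSE stream has already been split into lines; the function names are illustrative, and the sample payloads are hand-written and abridged to the fields the parsers read.

```python
import json
from typing import Iterable, Iterator


def openai_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text from Chat Completions chunks: choices[].delta.content."""
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel that ends the stream
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def anthropic_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text from Messages API events: content_block_delta / text_delta."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skips the separate "event: ..." lines
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "message_stop":
            break
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]


OPENAI_STREAM = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]

ANTHROPIC_STREAM = [
    'event: message_start',
    'data: {"type":"message_start","message":{"role":"assistant"}}',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hel"}}',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"lo"}}',
    'event: message_stop',
    'data: {"type":"message_stop"}',
]

print("".join(openai_text_deltas(OPENAI_STREAM)))
print("".join(anthropic_text_deltas(ANTHROPIC_STREAM)))
```

Behind a unified interface only the first parser has to live in your codebase, whichever vendor ends up serving the tokens.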

Why it's worth it: four concrete wins

Standardizing on one interface isn't only about typing less. It changes what your team can do operationally:

Swap and compare models freely. Because the call site never changes, you can A/B a prompt across three models by changing one variable. Cost and quality become tunable knobs instead of architectural commitments.
Add providers without rewrites. When a new model ships, and strong ones now arrive every few weeks, onboarding it is a config entry, not a sprint. You can adopt the current best price-performance without waiting on integration work.
Centralize keys, limits, and cost. One place holds every provider credential, enforces rate limits, and records spend. You get a single source of truth for "what did we spend, on which model, for which feature" instead of stitching together vendor dashboards.
Resilience via fallback. With many providers behind one interface, a request that gets a 429 or a 5xx from one vendor can be retried transparently on another. A single provider's outage stops being your outage.
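
The fallback win is the easiest one to sketch. The version below assumes a hypothetical call(model, prompt) function that raises a ProviderError carrying the HTTP status; both names are made up for illustration and a real gateway would add backoff, timeouts, and logging. The point is only that once every model shares a call signature, fallback is a loop.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error type carrying the upstream HTTP status."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Rate limits and server-side errors are worth trying elsewhere.
    return status == 429 or 500 <= status <= 599


def complete_with_fallback(
    models: Sequence[str],
    prompt: str,
    call: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, text) from the first success."""
    last_error: Optional[ProviderError] = None
    for model in models:
        try:
            return model, call(model, prompt)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise  # e.g. a 400 is our bug, not an outage
            last_error = err
    raise RuntimeError("all models failed") from last_error


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real request: the first model is rate limited."""
    if model == "gpt-4o-mini":
        raise ProviderError(429, "rate limited")
    return f"[{model}] echo: {prompt}"


used, text = complete_with_fallback(
    ["gpt-4o-mini", "claude-3-5-sonnet"], "hello", fake_call
)
print(used, text)
```

Non-retryable errors are re-raised on purpose: a malformed request will fail on every provider, so falling back would only hide the bug.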

How it's delivered: a gateway normalizes the formats

You can hand-roll a thin adapter layer yourself, but the production-grade version of a unified API is a gateway (or proxy) that sits between your app and the providers and does the normalization for you. It accepts a request in a standard shape, translates it to whatever the target provider expects, calls that provider with the right key, and translates the response back. For the full mental model of that layer — routing, fallback, caching, observability — see what is an LLM gateway.
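
As a rough picture of that translate-call-translate loop, here is a sketch of the two pure functions at its core for one target provider: Chat Completions request in, Anthropic Messages request out, and the response mapped back. It is not flo2's implementation. The function names are illustrative, it assumes plain-string message content, and it ignores tools, images, and streaming.

```python
from typing import Any, Dict

# Anthropic stop_reason -> Chat Completions finish_reason
FINISH_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def to_anthropic_request(chat_req: Dict[str, Any],
                         default_max_tokens: int = 1024) -> Dict[str, Any]:
    """Lift system messages to the top-level `system` field."""
    system_parts = []
    messages = []
    for msg in chat_req["messages"]:
        if msg["role"] == "system":
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    out: Dict[str, Any] = {
        "model": chat_req["model"],
        # max_tokens is required by the Messages API, optional in Chat Completions
        "max_tokens": chat_req.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    return out


def to_chat_response(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Map content blocks and usage back to the Chat Completions shape."""
    text = "".join(b["text"] for b in resp["content"] if b["type"] == "text")
    usage = resp["usage"]
    return {
        "object": "chat.completion",
        "model": resp["model"],
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": FINISH_REASONS.get(resp.get("stop_reason"), "stop"),
        }],
        "usage": {
            "prompt_tokens": usage["input_tokens"],
            "completion_tokens": usage["output_tokens"],
            "total_tokens": usage["input_tokens"] + usage["output_tokens"],
        },
    }


chat_request = {
    "model": "claude-3-5-sonnet-latest",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Say hello."},
    ],
}

# Hand-written sample in the Messages API response shape.
anthropic_response = {
    "type": "message",
    "role": "assistant",
    "model": "claude-3-5-sonnet-latest",
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 3},
}

print(to_anthropic_request(chat_request))
print(to_chat_response(anthropic_response)["choices"][0]["message"]["content"])
```

The caller still reads choices[0].message.content and usage.prompt_tokens, so parsing and cost-tracking code never learns which vendor answered.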

The detail that matters for adoption is which front doors the gateway speaks. flo2 exposes both the OpenAI Chat Completions API and the Anthropic Messages API. So an OpenAI-SDK client and an Anthropic-SDK client can each keep their native format and still route to the same set of models — no rewriting an Anthropic app into Chat Completions shape just to reach a Llama model, and vice versa. The gateway handles the translation on both sides.

Before and after

Here's the "before": two providers, two SDKs, two payload shapes, two response shapes, two key variables. The logic to support both is mostly format-shuffling.

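# BEFORE: provider-specific code paths
import os

from openai import OpenAI
from anthropic import Anthropic

provider = "openai"  # or "anthropic"
prompt = "Summarize the plot of Hamlet in one sentence."

if provider == "openai":
    # Key read from the SDK's conventional environment variable
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    text = r.choices[0].message.content
elif provider == "anthropic":
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    r = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,  # required by the Messages API
        messages=[{"role": "user", "content": prompt}],
    )
    text = r.content[0].text  # different shape, different parsing
else:
    raise ValueError(f"unsupported provider: {provider}")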

And the "after": one client, one base URL, one key. To target a different model — OpenAI, Anthropic, Gemini, Groq — you change the model string and nothing else. The response shape stays constant.

# AFTER: one unified interface, route by model name
from openai import OpenAI

client = OpenAI(
    base_url="https://flo2.com/v1",
    api_key="your_flo2_key",  # placeholder; load the real key from your environment
)
prompt = "Summarize the plot of Hamlet in one sentence."

r = client.chat.completions.create(
    model="auto",  # or "gpt-4o-mini" / "claude-3-5-sonnet" / "llama-3.3-70b"
    messages=[{"role": "user", "content": prompt}],
)
text = r.choices[0].message.content

The "after" works with your existing OpenAI SDK and any framework built on it. Pass a concrete model name to pin a provider, or pass auto and let flo2 pick a model that fits the task. Because you bring your own provider keys and pay each vendor directly, flo2 adds zero token markup — the unified interface is a routing convenience, not a reseller margin.

What a good unified API adds beyond the interface

The single endpoint is the foundation; the value compounds with what the layer does on top of it. With flo2 specifically, one OpenAI- and Anthropic-compatible key gives you smart routing to the cheapest or fastest model that fits, fallback across providers when one fails, request racing to cut tail latency, A/B comparisons with a judge to measure model–task fit, response caching, and true per-call cost accounting across every provider in one view. None of that is reachable when each model is a separate hard-coded integration — it only becomes possible once everything sits behind one interface.
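
Of these, request racing is the simplest to picture in code. The sketch below shows the general technique, not flo2's implementation: fire the same prompt at several models concurrently, keep the first successful answer, and cancel the rest. The call coroutine and the model names are hypothetical stand-ins with simulated latencies.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple


async def race(
    models: Sequence[str],
    prompt: str,
    call: Callable[[str, str], Awaitable[str]],
) -> Tuple[str, str]:
    """Return (model, text) from whichever model answers successfully first."""
    tasks = {asyncio.create_task(call(m, prompt)): m for m in models}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return tasks[task], task.result()
            # Everything that just finished had failed: keep waiting on the rest.
        raise RuntimeError("every raced model failed")
    finally:
        # Cancel the losers and collect them so nothing is left dangling.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real request, with made-up latencies per model."""
    delays = {"slow-model": 0.2, "fast-model": 0.01}
    await asyncio.sleep(delays[model])
    return f"[{model}] {prompt}"


winner, text = asyncio.run(race(["slow-model", "fast-model"], "hello", fake_call))
print(winner, text)
```

Racing trades extra spend for lower tail latency, since every raced request may be billed by its provider even when its answer is discarded.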

Takeaways

A unified LLM API puts one consistent request/response shape — usually OpenAI Chat Completions — in front of many providers.
It kills the per-vendor integration tax (auth, payloads, parsing, streaming) and the lock-in that comes from coding against one provider's exact format.
Switching or comparing models becomes a one-line model change, which unlocks free swapping, easy provider onboarding, centralized cost/keys, and fallback resilience.
A gateway delivers it by normalizing formats; flo2 speaks both the OpenAI and Anthropic APIs so either SDK routes to the same models.

If you're tired of maintaining one code path per provider, route every model through a single endpoint with flo2 — it's free during Beta and adds no markup on top of provider pricing. New to the underlying concepts? Start with an OpenAI-compatible API and what is an LLM gateway.
