2026-06-03 · flo2 blog

OpenAI-Compatible API: One Base URL for Every Model

An OpenAI-compatible API is any HTTP endpoint that speaks the same request and response schema as OpenAI's Chat Completions API, typically exposed at /v1/chat/completions. If you can call OpenAI, you can call an OpenAI-compatible endpoint by changing the base_url and the api_key, then naming a model the provider actually serves. That's the whole trick, and it's why you can point the official OpenAI SDK at Groq, Mistral, DeepInfra, Together, Ollama, and dozens of other providers without rewriting your application.

This post explains what "OpenAI-compatible" actually means, why nearly every provider now ships one, how to switch with a one-line base_url override, the feature-parity gotchas that bite people in production, and how an LLM gateway collapses all of those endpoints behind a single OpenAI-compatible base URL.

What "OpenAI-compatible" means

"OpenAI-compatible" is a contract, not a product. A provider claiming compatibility implements the same wire format as OpenAI's Chat Completions endpoint:

Request shape: a JSON body with model, a messages array of { "role": "...", "content": "..." } objects, plus optional fields like temperature, max_tokens, stream, and tools.
Response shape: a JSON object with id, choices[].message.content, a finish_reason, and a usage block reporting prompt_tokens, completion_tokens, and total_tokens.
Auth: a bearer token in the Authorization: Bearer <key> header.
Streaming: when stream: true, server-sent events (SSE) emit incremental delta chunks, terminated by data: [DONE].

Because the shape is fixed, any client that already knows how to build that request and parse that response works unchanged. The model name and the endpoint URL become the only things that vary between providers.
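
To make the streaming part of that contract concrete, here is a minimal sketch of a parser for the SSE lines described above. The function name collect_stream_text and the sample chunks are illustrative assumptions written by hand, not output captured from any provider, and a production client would normally let the SDK handle this.

```python
import json

def collect_stream_text(sse_lines):
    """Join the content deltas from Chat Completions SSE lines into one string."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts)

# Hand-written sample chunks (an assumption for illustration only).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    '',
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    'data: [DONE]',
]
print(collect_stream_text(sample))  # Hello
```

The parser tolerates chunks with an empty delta or no content key, since the first and last chunks of a stream often carry only a role or a finish_reason.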

Why so many providers offer an OpenAI-compatible endpoint

The answer is distribution. OpenAI's SDKs (openai-python, openai-node) and the ecosystem built on top of them — LangChain, LlamaIndex, Vercel AI SDK, instructor, countless internal wrappers — all target the Chat Completions schema. By implementing that same schema, a new provider inherits the entire ecosystem for free. You don't have to learn a bespoke SDK or rewrite your prompt-construction code; you just repoint an existing client.

That's why a drop-in OpenAI API is now table stakes. A non-exhaustive list of providers shipping an OpenAI-compatible endpoint includes:

Groq — https://api.groq.com/openai/v1
Mistral — https://api.mistral.ai/v1
DeepInfra — https://api.deepinfra.com/v1/openai
Together — https://api.together.xyz/v1
Cerebras — https://api.cerebras.ai/v1
xAI (Grok) — https://api.x.ai/v1
Ollama (local) — http://localhost:11434/v1

Even providers with their own native API — Anthropic, Google Gemini — also expose an OpenAI-compatible surface so teams can adopt their models without abandoning existing tooling.

How to switch: set base_url and api_key

Using the OpenAI SDK with other models is mostly an exercise in the OpenAI base_url override. The SDK builds the request; you just tell it where to send it and which key to use.

Here's a call against Groq using the Python SDK. Nothing at the call site is Groq-specific except the constructor arguments and the model string:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="gsk_your_groq_key",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Explain an OpenAI-compatible API in one sentence."},
    ],
)

print(resp.choices[0].message.content)
print(resp.usage)  # prompt_tokens / completion_tokens / total_tokens

The equivalent request with plain curl (without the system message), hitting the OpenAI-compatible /chat/completions path directly:

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "Explain an OpenAI-compatible API in one sentence."}
    ]
  }'

To go back to OpenAI, drop the base_url (the SDK defaults to https://api.openai.com/v1) and supply an OpenAI key and an OpenAI model name. To target Mistral or Together, swap the URL, the key, and the model name. The rest of the body stays identical.
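
One way to see how little changes between providers is to build the raw HTTP request yourself. The sketch below uses only the standard library and constructs the request without sending it; BASE_URLS and build_chat_request are hypothetical names introduced for illustration, and the URLs are the ones listed earlier in this post.

```python
import json
import urllib.request

# Base URLs from the provider list above (plus OpenAI's default).
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "mistral": "https://api.mistral.ai/v1",
    "together": "https://api.together.xyz/v1",
}

def build_chat_request(provider, api_key, model, messages):
    """Build (but do not send) a Chat Completions request for one provider."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        BASE_URLS[provider] + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

messages = [{"role": "user", "content": "Explain an OpenAI-compatible API in one sentence."}]
req = build_chat_request("groq", "gsk_your_groq_key", "llama-3.3-70b-versatile", messages)
print(req.full_url)  # https://api.groq.com/openai/v1/chat/completions
```

Only the provider key, the API key, and the model string differ per call. Passing req to urllib.request.urlopen would send it, which requires a real key.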

The gotchas: "compatible" is a spectrum

The dangerous assumption is that "OpenAI-compatible" means "100% identical." It rarely does. The core messages-in, content-out path is reliable, but optional features have uneven coverage. Test these per provider before relying on them:

Tools / function calling: most providers accept the tools parameter, but support for tool_choice, parallel tool calls, and strict argument schemas varies. Some return tool calls in slightly different shapes.
JSON / structured output: response_format: { "type": "json_object" } is widely supported; the newer json_schema strict mode is not universal. A provider may silently ignore the field and return prose.
Vision / multimodal: image inputs via content parts work only on models that support them, and the accepted formats (URL vs. base64) differ.
Streaming usage: not every provider includes a usage block in the final SSE chunk, and the stream_options: { "include_usage": true } flag isn't honored everywhere. If you bill on tokens, verify this.
Token usage fields: cached-token and reasoning-token breakdowns are inconsistent; some providers omit fields your dashboards expect.
Parameter handling: unknown parameters may be silently dropped or may 400. logprobs, seed, and n > 1 are common gaps.

None of this makes OpenAI compatibility less useful — it just means you should treat each provider's compatibility claim as "supports the common case" and validate the specific features your code depends on.
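
Two of these gaps are cheap to guard against in client code. The helpers below are a minimal sketch, assuming you work with responses as plain dicts (for example from resp.model_dump() in the Python SDK); normalize_usage and parse_json_reply are hypothetical names, not part of any SDK.

```python
import json

def normalize_usage(response):
    """Return the three core token counts, using None for anything a provider omits."""
    usage = response.get("usage") or {}
    return {
        key: usage.get(key)
        for key in ("prompt_tokens", "completion_tokens", "total_tokens")
    }

def parse_json_reply(text):
    """Return the parsed object, or None if the model answered in prose instead of JSON."""
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return None
    return parsed if isinstance(parsed, dict) else None

# A response with no usage block, as some streaming endpoints return.
print(normalize_usage({"choices": []}))
print(parse_json_reply("Sure! Here is the JSON you asked for."))  # None
print(parse_json_reply('{"ok": true}'))  # {'ok': True}
```

Returning None instead of raising lets the caller decide whether to retry, fall back to another provider, or record the request as unbilled.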

One OpenAI-compatible base URL for every model

Pointing the SDK at one provider is easy. Managing keys, base URLs, fallbacks, and per-model quirks across many providers is where it gets tedious — and that's the problem an LLM gateway solves. Instead of juggling seven base URLs and seven keys in your code, you call one OpenAI-compatible endpoint and let the gateway route to the right provider.

flo2 is a developer-first gateway built around exactly this. You bring your own provider keys (OpenAI, Anthropic, Gemini, Groq, Cerebras, DeepInfra, Mistral, xAI, OpenRouter) and pay each provider directly — flo2 adds zero token markup. One key fronts all of them:

from openai import OpenAI

client = OpenAI(
    base_url="https://flo2.com/v1",
    api_key="your_flo2_key",
)

resp = client.chat.completions.create(
    model="auto",  # route by name, or let flo2 pick the cheapest/fastest fit
    messages=[{"role": "user", "content": "Summarize this changelog."}],
)

Because flo2 exposes a standard OpenAI-compatible surface, your existing SDK code and any OpenAI-targeting framework work without changes. Routing happens by model name: ask for a specific model and it goes there; ask for auto and flo2 picks a model that fits the task, with smart routing, fallback, and racing so a single provider outage doesn't take you down. It can run A/B comparisons with a judge to measure model–task fit, cache responses, and give you true cost accounting across every provider in one place.
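
For a sense of what fallback means in practice, here is a minimal client-side sketch of the idea. It is not flo2's implementation: call_with_fallback, the provider names, and the fake_send stand-in are all hypothetical, and a real router would also handle timeouts, retries, and narrower error types.

```python
def call_with_fallback(targets, send):
    """Try each target in order and return the first successful result.

    `targets` is an ordered list of hypothetical (provider, model) pairs and
    `send` is any callable that performs the request and raises on failure.
    """
    errors = []
    for target in targets:
        try:
            return target, send(target)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append((target, exc))
    raise RuntimeError(f"all {len(errors)} targets failed: {errors}")

# Stand-in for a network call: the first provider is simulated as down.
def fake_send(target):
    provider, model = target
    if provider == "provider-a":
        raise ConnectionError("simulated outage")
    return f"reply from {provider}/{model}"

used, reply = call_with_fallback(
    [("provider-a", "model-x"), ("provider-b", "model-y")], fake_send
)
print(used, reply)
```

Writing this by hand for every provider pair is the tedium a gateway removes: the ordering, error handling, and bookkeeping move behind the single endpoint.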

One more thing worth knowing if your stack is split across ecosystems: flo2 also speaks the Anthropic Messages API, so Anthropic-SDK clients get the same single-key routing without translating to Chat Completions first. You get both an OpenAI-compatible and an Anthropic-compatible front door to the same router.

Takeaways

An OpenAI-compatible API implements OpenAI's Chat Completions request/response schema, usually at /v1/chat/completions.
Switching providers is a base_url + api_key change plus a model name the provider serves; the rest of the request body stays the same.
Feature parity (tools, JSON mode, vision, streaming usage) varies — validate the features you depend on per provider.
A gateway gives you one OpenAI-compatible base URL in front of many providers, with routing, fallback, and real cost accounting.

If you want to stop hard-coding base URLs and start routing across providers from a single endpoint, try flo2: it's free during Beta and adds no markup on top of provider pricing. New to the concept? Start with "What is an LLM gateway".
