2026-06-03 · flo2 blog

OpenRouter Chat Completions API: Endpoint, Setup & Example

The OpenRouter Chat Completions endpoint is the main surface you'll use to send inference requests through OpenRouter. It speaks the same wire format as OpenAI's Chat Completions API, which means you can reach it with the official OpenAI SDK, curl, or any HTTP client by swapping in a single base URL. This guide covers the exact endpoint shape, authentication, the model field that selects a provider, streaming, and how any OpenAI-compatible API gateway — including flo2 — gives you one stable integration point regardless of which model is running behind it.

The OpenRouter Chat Completions endpoint

OpenRouter exposes its Chat Completions surface at:

POST https://openrouter.ai/api/v1/chat/completions

That means the OpenRouter /v1/chat/completions path lives under the base URL https://openrouter.ai/api/v1. Set that as your base_url, supply an OpenRouter API key, and use OpenRouter model slugs. Most tools, libraries, and SDKs that speak the OpenAI Chat Completions format should then work without further code changes.

Authentication follows the standard bearer-token pattern. Every request must carry your OpenRouter API key in the Authorization header:

Authorization: Bearer sk-or-your-key-here

OpenRouter also recommends two optional headers that help with usage attribution and abuse prevention:

HTTP-Referer: https://yourapp.com
X-Title: Your App Name

These headers are optional for basic usage but worth including from the start so your traffic is correctly identified in dashboards and rate-limit policies.
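
As a minimal sketch, the headers described above can be assembled in one place. The helper name build_headers and its parameters are illustrative assumptions, not part of any OpenRouter SDK.

```python
def build_headers(api_key, referer=None, title=None):
    """Return request headers for OpenRouter (hypothetical helper, not an official API)."""
    if not api_key:
        raise ValueError("an OpenRouter API key is required")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # The attribution headers are optional, so only send them when provided.
    if referer:
        headers["HTTP-Referer"] = referer
    if title:
        headers["X-Title"] = title
    return headers


print(build_headers("sk-or-your-key-here", referer="https://yourapp.com", title="Your App Name"))
```

Keeping the header logic in a single function means the attribution values can come from configuration rather than being repeated at every call site.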

Request and response shape

The request body follows the OpenAI Chat Completions schema. A minimal JSON body looks like this:

{
  "model": "openai/gpt-4o-mini",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "What is the capital of France?" }
  ]
}

The response is equally familiar:

{
  "id": "gen-...",
  "model": "openai/gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33
  }
}

Fields you already use with OpenAI — temperature, max_tokens, top_p, stop, tools, response_format — are forwarded to the underlying provider when that provider supports them.
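
To make the response shape concrete, here is a small sketch that reads the sample body from this section with the standard library only. The function name summarize is an assumption for illustration; the field names are the ones shown in the sample response.

```python
import json

# Body copied from the sample response in this section.
SAMPLE = """
{
  "id": "gen-...",
  "model": "openai/gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 24, "completion_tokens": 9, "total_tokens": 33}
}
"""


def summarize(body):
    """Pull the reply text, finish reason, and token count out of a parsed response."""
    choice = body["choices"][0]
    usage = body.get("usage", {})  # treat usage as optional to stay defensive
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


print(summarize(json.loads(SAMPLE)))
```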

How the model field selects a provider and model

The model field is the one thing that's genuinely different from a plain OpenAI call. Instead of a bare model name like gpt-4o, you pass a provider/model-slug pair:

"model": "anthropic/claude-3.5-sonnet"
"model": "google/gemini-2.5-pro"
"model": "meta-llama/llama-3-70b-instruct"
"model": "mistralai/mistral-large"

OpenRouter uses the prefix to route the request to the right upstream provider. Always pass the full slug: a bare model name such as gpt-4o is ambiguous across providers, and you should not rely on it being resolved. You can browse the full model list at https://openrouter.ai/models.
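
A client-side sanity check can catch a missing prefix before the request is sent. This is only a sketch: split_model_slug is a hypothetical helper, and it says nothing about how OpenRouter itself parses or routes the value.

```python
def split_model_slug(slug):
    """Split 'provider/model' into its two parts, rejecting bare model names."""
    provider, sep, model = slug.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {slug!r}")
    return provider, model


print(split_model_slug("openai/gpt-4o-mini"))
print(split_model_slug("mistralai/mistral-large"))
```

Only the first slash is treated as the separator, so anything after it stays in the model part untouched.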

curl example

Here is a complete OpenRouter Chat Completions request using curl:

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "HTTP-Referer: https://yourapp.com" \
  -H "X-Title: My App" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Explain async/await in Python in two sentences." }
    ]
  }'

Run this with a real key and you'll get a standard Chat Completions response. Latency depends on the model and provider you selected.
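
For readers without curl, roughly the same request can be sketched with Python's standard library. The function names build_request and send are assumptions for illustration. Building the request is kept separate from sending it so the payload can be inspected without a network call or a real key.

```python
import json
import urllib.request

URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(api_key, prompt, model="openai/gpt-4o-mini"):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON body (needs a real key and network)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


req = build_request("sk-or-your-key-here", "Explain async/await in Python in two sentences.")
print(req.get_method(), req.full_url)
```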

Python example with the OpenAI SDK

Because the endpoint is OpenAI-compatible, the official openai Python package works with no changes to your inference logic:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    default_headers={
        "HTTP-Referer": "https://yourapp.com",
        "X-Title": "My App",
    },
)

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[
        {"role": "system", "content": "You are a terse senior engineer."},
        {"role": "user",   "content": "What's wrong with mutable default arguments in Python?"},
    ],
)

print(response.choices[0].message.content)

Two things change versus a plain OpenAI call: the client configuration (base_url, plus your OpenRouter key and the optional attribution headers) and the provider-prefixed model. The rest of your existing SDK usage (retries, timeouts, with_raw_response, the async client) carries over unchanged.

Streaming responses

Set stream=True (or "stream": true in JSON) and the endpoint switches to server-sent events (SSE). Each event delivers a delta chunk; the stream closes with data: [DONE].

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Count to ten, slowly."}],
    stream=True,
)

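for chunk in stream:
    # A chunk may arrive with an empty choices list (for example a usage-only chunk).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()  # newline after stream ends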

The streaming shape (choices[0].delta.content, a finish reason on the last content chunk, a final [DONE] sentinel) follows OpenAI's SSE format. Not every model on OpenRouter supports streaming, but all the major ones do.
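
If you consume the stream without the SDK, the SSE lines have to be parsed by hand. The sketch below assumes the input is an iterable of already-decoded text lines; iter_sse_content is a hypothetical helper name. It skips blank lines and SSE comment lines (those starting with a colon, which the SSE format permits and a gateway may use as keep-alives) and stops at the [DONE] sentinel.

```python
import json


def iter_sse_content(lines):
    """Yield the text deltas from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        # Skip blank separators, SSE comments, and any non-data fields.
        if not line or line.startswith(":") or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


sample = [
    ": keep-alive comment",
    'data: {"choices":[{"delta":{"role":"assistant","content":""}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints "Hello"
```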

Why an OpenAI-compatible gateway keeps your integration stable

The attractive part of the OpenRouter Chat Completions endpoint is that it collapses many provider APIs into one. But it also introduces a dependency: if OpenRouter has an outage, raises prices, or doesn't yet carry a model you need, your only option is to change your code.

The better architectural answer is to put an OpenAI-compatible gateway in front of everything, including OpenRouter, so that your application code never changes and only the routing config does. See our OpenAI-compatible API guide for a full breakdown of the pattern.
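
One way to picture "only the routing config changes" is a small lookup that turns a gateway name into client settings. This is a sketch under assumptions: the GATEWAYS table, the client_settings helper, and the LLM_GATEWAY environment variable are all hypothetical names, and the two base URLs and key variables are simply the ones used elsewhere in this article.

```python
import os

# Hypothetical routing table; add or swap entries without touching call sites.
GATEWAYS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
    "flo2": {"base_url": "https://flo2.com/v1", "key_env": "FLO2_API_KEY"},
}


def client_settings(name=None, env=os.environ):
    """Return base_url and api_key for the chosen gateway."""
    name = name or env.get("LLM_GATEWAY", "openrouter")
    if name not in GATEWAYS:
        raise ValueError(f"unknown gateway {name!r}")
    entry = GATEWAYS[name]
    return {"base_url": entry["base_url"], "api_key": env.get(entry["key_env"], "")}


print(client_settings("openrouter", env={"OPENROUTER_API_KEY": "sk-or-your-key-here"}))
```

The returned keys match the base_url and api_key arguments of the OpenAI client, so application code could construct it with OpenAI(**client_settings()) and never mention a specific gateway.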

How flo2 fits in

flo2 is a developer-first LLM gateway that exposes a single OpenAI-compatible /v1/chat/completions endpoint. You bring your own provider keys — OpenRouter, Anthropic, OpenAI, Groq, or any other — and flo2 routes requests with zero token markup. During the beta it's free to use.

Concretely, switching from OpenRouter directly to flo2 is the same two-line swap:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://flo2.com/v1",   # flo2 gateway
    api_key=os.environ["FLO2_API_KEY"],
)

# same model slugs, same messages, same stream flag
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello."}],
)

Because flo2 is OpenAI-compatible, every curl snippet and SDK call in this article works against it — just change the base URL. You get fallback routing, provider redundancy, and a single invoice instead of managing credits across multiple dashboards. Check the OpenRouter API key guide for help generating the upstream keys you'd pass to flo2.
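
To illustrate what fallback routing means in general, here is a client-side sketch that tries a list of calls in order and returns the first one that succeeds. It is not flo2's implementation; first_success is a hypothetical helper, and each attempt is assumed to be a zero-argument callable (for example a lambda wrapping client.chat.completions.create with a different model or client).

```python
def first_success(attempts):
    """Call each zero-argument callable in order; return the first result that does not raise."""
    errors = []
    for attempt in attempts:
        try:
            return attempt()
        except Exception as exc:  # deliberately broad for a sketch; narrow this in real code
            errors.append(exc)
    last = errors[-1] if errors else None
    raise RuntimeError(f"all {len(errors)} attempts failed") from last


def primary():
    # Stand-in for a provider call that is currently failing.
    raise TimeoutError("primary provider unavailable")


def secondary():
    # Stand-in for a working provider call.
    return "reply from secondary"


print(first_success([primary, secondary]))  # prints "reply from secondary"
```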

If you want a gateway that routes across providers without markup on every token, flo2 is worth five minutes during the free beta.
