2026-06-03 · flo2 blog

Use Mistral with the OpenAI SDK: Compatible API & Base URL

Mistral's API is largely OpenAI-compatible at the wire level: point the standard openai Python or JavaScript client at https://api.mistral.ai/v1, swap in your Mistral key and a Mistral model name, and existing Chat Completions code typically runs without further changes. In practice, migrating an app from OpenAI to Mistral, or running Mistral alongside other providers, is mostly a base-URL, key, and model-name change rather than a rewrite. This guide walks through Mistral's OpenAI-compatible endpoint in detail: curl and Python examples, streaming, what the compatibility covers, gotchas, and how to route Mistral through a gateway for fallback and cost control. Verify current model IDs and feature details in the Mistral documentation, since model names and supported parameters change quickly.

Mistral's OpenAI-compatible base URL

Mistral exposes a Chat Completions endpoint that follows the OpenAI wire format. The base URL is:

https://api.mistral.ai/v1

Authentication is a standard bearer token: your Mistral API key from La Plateforme, sent in the Authorization: Bearer <key> header. This is the same header the OpenAI SDK sends by default, which is why the client needs no changes beyond the base URL and key.

Model IDs are Mistral-specific. Mistral publishes a range of models — flagship large models, efficient small models, and code-focused models like Codestral and Devstral. Always confirm the exact model identifiers in the Mistral models overview before committing a model string to your codebase; the catalog and version suffixes change as new releases land. The Mistral API guide covers model families and API key setup in more depth.

curl: a minimal request to the Mistral chat completions endpoint

Before wiring anything into application code, a raw curl call is the fastest way to confirm your key and base URL are working:

export MISTRAL_API_KEY="your_mistral_key"

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-small-latest",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user",   "content": "What is mixture-of-experts architecture?"}
    ]
  }'

The response is the standard OpenAI shape: a choices array, a message.content string, a finish_reason, and a usage object with prompt_tokens, completion_tokens, and total_tokens. If the JSON comes back cleanly, the endpoint and key are good. Substitute the model string with a current ID from the Mistral docs — mistral-small-latest is used here as an example; verify it is still valid before shipping.
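To make the response shape concrete, here is a minimal sketch that pulls out the fields named above using only the Python standard library. The sample body is hand-written for illustration, not captured from a real Mistral response, and summarize_completion is a hypothetical helper name.

```python
import json

# Hand-written sample in the OpenAI Chat Completions shape.
# The values are illustrative, not a captured Mistral response.
SAMPLE_RESPONSE = """
{
  "id": "example-id",
  "object": "chat.completion",
  "model": "mistral-small-latest",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Example answer."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 21, "completion_tokens": 4, "total_tokens": 25}
}
"""


def summarize_completion(raw_json: str) -> dict:
    """Extract the answer text, finish reason, and token counts."""
    body = json.loads(raw_json)
    first_choice = body["choices"][0]
    usage = body.get("usage", {})  # tolerate a missing usage block
    return {
        "content": first_choice["message"]["content"],
        "finish_reason": first_choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_completion(SAMPLE_RESPONSE)
print(summary["content"], summary["finish_reason"], summary["total_tokens"])
```

Piping the curl output into a script like this is a quick way to check that the fields your application depends on are actually present.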

Use Mistral with the OpenAI Python SDK

The OpenAI Python client takes base_url and api_key as constructor arguments. Point both at Mistral and everything downstream — message building, response parsing, tool-call handling — stays identical:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.mistral.ai/v1",
    api_key=os.environ["MISTRAL_API_KEY"],
)

resp = client.chat.completions.create(
    model="mistral-small-latest",   # verify current model IDs in Mistral docs
    messages=[
        {"role": "system", "content": "Reply in one sentence."},
        {"role": "user",   "content": "Why do developers choose European LLM providers?"},
    ],
)

print(resp.choices[0].message.content)
print(resp.usage)   # prompt_tokens, completion_tokens, total_tokens

Any framework that accepts an OpenAI base_url override works the same way: LangChain, LlamaIndex, instructor, the Vercel AI SDK. They all construct the same HTTP request underneath, so pointing them at Mistral is the same two-argument change.

Streaming with the Mistral OpenAI-compatible endpoint

Mistral supports streaming responses on the compatible endpoint. Set stream=True and iterate exactly as you would against the OpenAI API:

stream = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[
        {"role": "user", "content": "Explain tokenization to a junior developer."}
    ],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:   # a usage-only chunk may carry no choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

The wire protocol is server-sent events with data: lines terminated by data: [DONE] — identical to OpenAI. Existing streaming parsers work without changes. To capture token counts from a stream, check whether Mistral supports stream_options={"include_usage": True} for the model you are using — that OpenAI-compatible parameter appends a usage block to the final chunk. Verify availability in the Mistral docs.
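If you are not using the SDK, the stream can be consumed by hand. The sketch below parses data: lines and stops at the [DONE] sentinel, using only the standard library. The sample lines are hand-written in the OpenAI chunk shape rather than captured from Mistral, and iter_sse_content is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # for example, a usage-only chunk
        delta = choices[0].get("delta", {}).get("content")
        if delta:
            yield delta


# Hand-written sample stream for illustration.
SAMPLE_STREAM = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_sse_content(SAMPLE_STREAM))
print(text)
```

In a real client the lines would come from the HTTP response body; the parsing logic stays the same.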

JavaScript / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.mistral.ai/v1",
  apiKey:  process.env.MISTRAL_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "mistral-small-latest",   // verify in Mistral docs
  messages: [{ role: "user", content: "List three open-weight Mistral models." }],
});

console.log(resp.choices[0].message.content);

What Mistral's compatibility layer covers — and what it doesn't

Mistral's compatible endpoint targets the Chat Completions surface. Here is what to expect:

Chat completions — fully supported. Multi-turn conversations with system, user, and assistant roles work as expected.
Streaming — supported. Set stream=True/stream: true and consume server-sent events in the standard way.
Common sampling parameters — temperature, top_p, max_tokens, and stop are generally supported. Verify model-level limits and any parameter restrictions in the Mistral docs.
Tool / function calling — Mistral supports the tools / tool_choice pattern, but behavior and supported models evolve. Check the Mistral docs for which models accept function calls and whether there are edge cases in the response shape.
JSON mode / structured output — response_format: {type: "json_object"} support is model-dependent. Verify for the specific model you are targeting.
Mistral-native features — capabilities like the Mistral native client's file handling, OCR endpoints, or any Codestral-specific fill-in-the-middle route sit outside the standard Chat Completions surface. Use the Mistral API reference directly for those.
Outside the Chat Completions surface — the Assistants API and Responses API are OpenAI-specific products, and image generation, audio/TTS, and fine-tuning management are not served through the compatible Chat Completions route. Embeddings are available, but on a separate route. Mistral has its own equivalents for some of these, accessed through Mistral-native endpoints.
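One way to act on this list is to filter outgoing request parameters against an explicit allowlist and log whatever gets dropped. The sketch below assumes an allowlist built from the parameters described above as generally supported; treat it as a starting point to adjust after checking the Mistral docs, not as an authoritative list. The names ALLOWED_PARAMS and split_request_params are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Assumption: derived from the parameters this article lists as generally
# supported. Add tools, response_format, etc. once verified for your model.
ALLOWED_PARAMS = {
    "model", "messages", "stream",
    "temperature", "top_p", "max_tokens", "stop",
}


def split_request_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (params to send, sorted names of params that were dropped)."""
    kept = {key: value for key, value in params.items() if key in ALLOWED_PARAMS}
    dropped = sorted(key for key in params if key not in ALLOWED_PARAMS)
    return kept, dropped


request = {
    "model": "mistral-small-latest",
    "messages": [{"role": "user", "content": "Hi"}],
    "temperature": 0.2,
    "logprobs": True,
    "n": 2,
}

kept, dropped = split_request_params(request)
print("sending:", sorted(kept))
print("dropped:", dropped)
```

Logging the dropped names during migration surfaces every place your code relies on a parameter you have not yet verified.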

Migrating an existing OpenAI app to Mistral

For an app that uses core chat completions, the migration comes down to three configuration values (API key, base URL, and model name), with no other changes to application code:

# Before (OpenAI)
OPENAI_API_KEY="sk-..."
# base_url defaults to https://api.openai.com/v1
# model: "gpt-4o"

# After (Mistral)
MISTRAL_API_KEY="..."         # your Mistral key from La Plateforme
MISTRAL_BASE_URL="https://api.mistral.ai/v1"
# model: "mistral-large-latest"  — verify current ID in Mistral docs

In code, if you already read the base URL, key, and model string from the environment (which you should), the client setup does not change at all:

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    api_key=os.environ["OPENAI_API_KEY"],
)
model = os.environ.get("OPENAI_MODEL", "gpt-4o")

Set OPENAI_BASE_URL=https://api.mistral.ai/v1, OPENAI_API_KEY to your Mistral key, and OPENAI_MODEL to a Mistral model ID. Application code is untouched. This pattern also makes it trivial to A/B test Mistral against your current provider: run both configurations in parallel and compare quality, latency, and cost.
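The same idea can be written as a small settings loader, which also makes the A/B setup explicit. This is a sketch under the assumption that you keep the OPENAI_* variable names from the snippet above; load_llm_settings and the example key values are hypothetical, and it uses plain dicts so it runs without the openai package installed.

```python
import os
from typing import Dict, Mapping


def load_llm_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read base URL, key, and model, defaulting to OpenAI's values."""
    return {
        "base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "api_key": env["OPENAI_API_KEY"],  # required, so fail loudly if unset
        "model": env.get("OPENAI_MODEL", "gpt-4o"),
    }


# Two configurations for a side-by-side comparison. Keys are placeholders.
openai_env = {"OPENAI_API_KEY": "sk-example"}
mistral_env = {
    "OPENAI_BASE_URL": "https://api.mistral.ai/v1",
    "OPENAI_API_KEY": "mistral-example-key",
    "OPENAI_MODEL": "mistral-large-latest",  # verify current ID in Mistral docs
}

for name, env in (("openai", openai_env), ("mistral", mistral_env)):
    settings = load_llm_settings(env)
    print(name, settings["base_url"], settings["model"])
```

Passing base_url and api_key from each settings dict to its own OpenAI client gives you two clients that differ only in configuration.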

Gotchas when migrating to Mistral

Model name mismatch. Mistral uses its own model identifiers. Any routing logic or config keyed on gpt-4o or other provider-specific strings needs a Mistral equivalent. Keep model names in environment variables or config, not scattered across application code, so the switch is one file.
Unsupported or silently ignored parameters. Mistral may not support every parameter OpenAI accepts — things like logprobs, n > 1, or frequency/presence penalties. Test each parameter your code sends explicitly against Mistral before going to production. Silently ignored parameters are harder to catch than errors.
Context window differences. Mistral models have their own context-window sizes, which may differ from the OpenAI models you replaced. A prompt that fits inside gpt-4o's window might overflow or truncate differently on a Mistral model. Verify max context in the Mistral docs.
Rate limits are independent. Mistral enforces its own RPM and TPM limits, separate from OpenAI's. A burst workload that fits within OpenAI's quotas might still get HTTP 429 from Mistral. Honor the Retry-After header when the response includes one, back off exponentially otherwise, or add a fallback provider (more below).
Tool-call response shape edge cases. Even within the OpenAI-compatible surface, minor differences in how tool results are returned — particularly around parallel tool calls — can surface during migration. Run your tool-calling flows explicitly against Mistral in a test environment before promoting to production.
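For the rate-limit gotcha, a retry wrapper is usually the first line of defense. The sketch below is a minimal version, assuming a hypothetical RateLimited exception that stands in for whatever 429 error your client raises. It honors a Retry-After value when one is provided and falls back to exponential backoff with jitter otherwise.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error from your client."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, if the server sent one


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,  # must be at least 1
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                # Exponential backoff plus jitter to avoid retry bursts.
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


# Demo: fail twice with a Retry-After hint, then succeed.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited(retry_after=0.01)
    return "ok"


waits = []
result = call_with_backoff(flaky, sleep=waits.append)  # record waits, do not sleep
print(result, waits)
```

With the OpenAI SDK you would catch its rate-limit error instead and read the header from the attached response; check the SDK docs for the exact attribute names before relying on them.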

Routing Mistral behind a gateway

Pointing the OpenAI SDK at api.mistral.ai/v1 is the right first step. The limitation is that it hard-codes a single provider: when Mistral rate-limits you, a specific model is at capacity, or you want to benchmark Mistral against another provider on live traffic, you are back to editing application code. A gateway decouples provider selection from application logic.

That is what flo2 is built for. flo2 is a developer-first LLM gateway with zero token markup. You bring your own Mistral key — plus keys for OpenAI, Anthropic, Gemini, Groq, Cerebras, DeepInfra, and others — and pay each provider directly at their published rates. A single flo2 key, accessed through an OpenAI-compatible or Anthropic-compatible endpoint, routes each request to the cheapest or fastest provider, with automatic fallback chains so a Mistral 429 rolls over to another provider instead of surfacing as an error. Free during Beta.

import os
from openai import OpenAI

# One stable base URL — flo2 routes to Mistral (or best available provider)
client = OpenAI(
    base_url="https://flo2.com/v1",
    api_key=os.environ["FLO2_API_KEY"],
)

resp = client.chat.completions.create(
    model="mistral-small-latest",   # pin to Mistral, or let flo2 route automatically
    messages=[
        {"role": "user", "content": "Summarize this pull request diff."},
    ],
)

print(resp.choices[0].message.content)

Because flo2 exposes the same OpenAI-compatible surface you just used against Mistral, switching is a base_url and api_key change, identical to the Mistral migration itself. You get Mistral's models when they are the best fit, automatic fallback when they are not, request racing, where whichever provider responds first wins, and per-call cost accounting across every provider in one view.
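To illustrate what a fallback chain means in general, here is a client-side sketch of the idea. It is not how flo2 implements routing; it only shows the control flow of trying providers in order until one succeeds. ProviderError, complete_with_fallback, and both stub functions are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error for a failed provider call, such as a 429."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (provider name, answer)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            errors.append(f"{name}: {err}")  # remember why this one failed
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stubs standing in for real API calls.
def mistral_stub(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def backup_stub(prompt: str) -> str:
    return f"echo: {prompt}"


used, answer = complete_with_fallback(
    "hi",
    [("mistral", mistral_stub), ("backup", backup_stub)],
)
print(used, answer)
```

Doing this inside each application means repeating the logic and the provider keys everywhere, which is the duplication a gateway is meant to remove.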

For a full walkthrough of Mistral's model lineup, API key setup, pricing structure, and code models, see the Mistral API guide. For the broader picture of how OpenAI-compatible endpoints work across providers, see OpenAI-compatible API. To start routing Mistral requests with zero markup and automatic fallback, get started with flo2.
