Use Mistral with the OpenAI SDK: Compatible API & Base URL
Mistral's API is largely OpenAI-compatible at the wire level: point the standard openai Python or JavaScript client at https://api.mistral.ai/v1, swap in your Mistral key and a Mistral model name, and existing Chat Completions code for core chat use cases typically runs without further changes. That makes migrating an app from OpenAI to Mistral, or running Mistral alongside other providers, mostly a base-URL and key change rather than a rewrite. This guide walks through Mistral's OpenAI-compatible endpoint in detail: curl, Python, and JavaScript examples, what the compatibility covers, streaming, gotchas, and how to route Mistral through a gateway for fallback and cost control. Verify current model IDs and feature details in the Mistral documentation, because model names and supported parameters evolve quickly.
Mistral's OpenAI-compatible base URL
Mistral exposes a Chat Completions endpoint that follows the OpenAI wire format. The base URL is:
https://api.mistral.ai/v1
Authentication is a standard bearer token: your Mistral API key from La Plateforme, sent in the Authorization: Bearer <key> header. This is the same header the OpenAI SDK sends by default, so together with the matching request and response format, no client-side changes are needed beyond the base URL and key.
Model IDs are Mistral-specific. Mistral publishes a range of models — flagship large models, efficient small models, and code-focused models like Codestral and Devstral. Always confirm the exact model identifiers in the Mistral models overview before committing a model string to your codebase; the catalog and version suffixes change as new releases land. The Mistral API guide covers model families and API key setup in more depth.
curl: a minimal request to the Mistral chat completions endpoint
Before wiring anything into application code, a raw curl call is the fastest way to confirm your key and base URL are working:
export MISTRAL_API_KEY="your_mistral_key"
curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-small-latest",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user", "content": "What is mixture-of-experts architecture?"}
    ]
  }'
The response is the standard OpenAI shape: a choices array, a message.content string, a finish_reason, and a usage object with prompt_tokens, completion_tokens, and total_tokens. If the JSON comes back cleanly, the endpoint and key are good. Substitute the model string with a current ID from the Mistral docs — mistral-small-latest is used here as an example; verify it is still valid before shipping.
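To make that response shape concrete, here is a minimal Python sketch that pulls the same fields out of a response body. The sample JSON is hand-written for illustration (the id, text, and token counts are made up, not captured from a real Mistral call), and summarize_completion is a hypothetical helper name rather than part of any SDK.

```python
import json

# Hand-written sample in the OpenAI Chat Completions shape (illustrative values only).
SAMPLE_RESPONSE = """
{
  "id": "example-id",
  "object": "chat.completion",
  "model": "mistral-small-latest",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Example answer."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 5, "total_tokens": 25}
}
"""


def summarize_completion(raw_json: str) -> dict:
    """Extract the fields most callers need from a chat completion body."""
    body = json.loads(raw_json)
    first_choice = body["choices"][0]
    usage = body.get("usage", {})  # tolerate a missing usage block
    return {
        "content": first_choice["message"]["content"],
        "finish_reason": first_choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


if __name__ == "__main__":
    print(summarize_completion(SAMPLE_RESPONSE))
```

Piping the curl output into a helper like this is a quick way to confirm that the fields your application reads are actually present for the model you picked.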
Use Mistral with the OpenAI Python SDK
The OpenAI Python client takes base_url and api_key as constructor arguments. Point both at Mistral and everything downstream — message building, response parsing, tool-call handling — stays identical:
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.mistral.ai/v1",
    api_key=os.environ["MISTRAL_API_KEY"],
)

resp = client.chat.completions.create(
    model="mistral-small-latest",  # verify current model IDs in Mistral docs
    messages=[
        {"role": "system", "content": "Reply in one sentence."},
        {"role": "user", "content": "Why do developers choose European LLM providers?"},
    ],
)

print(resp.choices[0].message.content)
print(resp.usage)  # prompt_tokens, completion_tokens, total_tokens
Any framework that accepts an OpenAI base_url override works the same way: LangChain, LlamaIndex, instructor, the Vercel AI SDK. They all construct the same HTTP request underneath, so pointing them at Mistral is the same two-argument change.
Streaming with the Mistral OpenAI-compatible endpoint
Mistral supports streaming responses on the compatible endpoint. Set stream=True and iterate exactly as you would against the OpenAI API:
stream = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[
        {"role": "user", "content": "Explain tokenization to a junior developer."}
    ],
    stream=True,
)

for chunk in stream:
    # Some chunks (for example a usage-only final chunk) may carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
The wire protocol is server-sent events with data: lines terminated by data: [DONE] — identical to OpenAI. Existing streaming parsers work without changes. To capture token counts from a stream, check whether Mistral supports stream_options={"include_usage": True} for the model you are using — that OpenAI-compatible parameter appends a usage block to the final chunk. Verify availability in the Mistral docs.
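For readers who parse the stream without an SDK, the following is a minimal sketch of that wire protocol in Python. It assumes each event arrives as a single data: line holding a JSON chunk, and that a usage block, if the model sends one, arrives in a chunk with an empty choices list. The sample lines are hand-written for illustration, and iter_sse_chunks and collect_stream are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

DONE_SENTINEL = "[DONE]"


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == DONE_SENTINEL:
            return  # end of stream
        yield json.loads(payload)


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join the content deltas and keep the usage block if one is sent."""
    parts = []
    usage = None
    for chunk in iter_sse_chunks(lines):
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                parts.append(delta)
    return "".join(parts), usage


# Hand-written sample stream (illustrative, not captured from Mistral).
SAMPLE_LINES = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Tokens "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "are pieces."}, "finish_reason": "stop"}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 4, "total_tokens": 13}}',
    "",
    "data: [DONE]",
]

if __name__ == "__main__":
    text, usage = collect_stream(SAMPLE_LINES)
    print(text)
    print(usage)
```

In real code the lines would come from a streaming HTTP response body rather than a list, and a parser that must handle multi-line data fields should follow the full server-sent events specification.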
JavaScript / TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.mistral.ai/v1",
  apiKey: process.env.MISTRAL_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "mistral-small-latest", // verify in Mistral docs
  messages: [{ role: "user", content: "List three open-weight Mistral models." }],
});

console.log(resp.choices[0].message.content);
What Mistral's compatibility layer covers — and what it doesn't
Mistral's compatible endpoint targets the Chat Completions surface. Here is what to expect:
- Chat completions: fully supported. Multi-turn conversations with system, user, and assistant roles work as expected.
- Streaming: supported. Set stream=True / stream: true and consume server-sent events in the standard way.
- Common sampling parameters: temperature, top_p, max_tokens, and stop are generally supported. Verify model-level limits and any parameter restrictions in the Mistral docs.
- Tool / function calling: Mistral supports the tools / tool_choice pattern, but behavior and supported models evolve. Check the Mistral docs for which models accept function calls and whether there are edge cases in the response shape.
- JSON mode / structured output: response_format: {type: "json_object"} support is model-dependent. Verify for the specific model you are targeting.
- Mistral-native features: capabilities like the Mistral native client's file handling, OCR endpoints, or any Codestral-specific fill-in-the-middle route sit outside the standard Chat Completions surface. Use the Mistral API reference directly for those.
- Not available through the Chat Completions route: the Assistants API, Responses API, image generation, audio/TTS, and fine-tuning management are OpenAI-specific products. Embeddings exist on Mistral, but on a separate route. Mistral has its own equivalents for some of the others, accessed through Mistral-native endpoints.
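As an illustration of the optional fields in the list above, here is a hedged Python sketch that assembles a Chat Completions request body and adds tools, tool_choice, and response_format only when asked. The get_weather tool and the build_chat_payload helper are hypothetical. The field layout follows the OpenAI Chat Completions format, and whether a given Mistral model accepts each field still needs checking in the Mistral docs.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical tool definition in the OpenAI "tools" format.
GET_WEATHER_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_chat_payload(
    model: str,
    messages: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]] = None,
    json_mode: bool = False,
    **sampling: Any,
) -> Dict[str, Any]:
    """Build a Chat Completions body, adding optional fields only on request."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    # Pass through sampling parameters (temperature, top_p, max_tokens, stop)
    # only when the caller set them, so nothing is sent implicitly.
    payload.update({k: v for k, v in sampling.items() if v is not None})
    if tools:
        payload["tools"] = tools
        payload["tool_choice"] = "auto"
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    return payload


if __name__ == "__main__":
    body = build_chat_payload(
        "mistral-small-latest",  # example ID; verify in the Mistral docs
        [{"role": "user", "content": "What is the weather in Paris?"}],
        tools=[GET_WEATHER_TOOL],
        temperature=0.2,
    )
    print(json.dumps(body, indent=2))
```

Keeping the payload explicit like this makes it easier to see exactly which optional parameters a request sends when you test them one by one against a new provider.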
Migrating an existing OpenAI app to Mistral
For an app that uses core chat completions, the migration comes down to three configuration values (API key, base URL, and model name), with no other changes in application code:
# Before (OpenAI)
OPENAI_API_KEY="sk-..."
# base_url defaults to https://api.openai.com/v1
# model: "gpt-4o"
# After (Mistral)
MISTRAL_API_KEY="..." # your Mistral key from La Plateforme
MISTRAL_BASE_URL="https://api.mistral.ai/v1"
# model: "mistral-large-latest" — verify current ID in Mistral docs
If you already externalize the base URL, key, and model string (which you should), the client reads all three from the environment:
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    api_key=os.environ["OPENAI_API_KEY"],
)
model = os.environ.get("OPENAI_MODEL", "gpt-4o")
Set OPENAI_BASE_URL=https://api.mistral.ai/v1, OPENAI_API_KEY to your Mistral key, and OPENAI_MODEL to a Mistral model ID. Application code is untouched. This pattern also makes it trivial to A/B test Mistral against your current provider: run both configurations in parallel and compare quality, latency, and cost.
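One way to set up that side-by-side comparison is to load two provider configurations from prefixed environment variables. The sketch below is an assumption-laden illustration: ProviderConfig and load_provider_config are hypothetical names, MISTRAL_MODEL is an invented variable that follows the same naming pattern as the others, and the keys shown are placeholders.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    base_url: str
    api_key: str
    model: str


def load_provider_config(
    env: Mapping[str, str],
    prefix: str,
    default_base_url: Optional[str] = None,
    default_model: Optional[str] = None,
) -> ProviderConfig:
    """Read <PREFIX>_BASE_URL, <PREFIX>_API_KEY and <PREFIX>_MODEL from env."""
    values = {
        "BASE_URL": env.get(f"{prefix}_BASE_URL", default_base_url),
        "API_KEY": env.get(f"{prefix}_API_KEY"),
        "MODEL": env.get(f"{prefix}_MODEL", default_model),
    }
    # Fail loudly on incomplete configuration instead of at request time.
    missing = [f"{prefix}_{name}" for name, value in values.items() if not value]
    if missing:
        raise KeyError("missing configuration: " + ", ".join(missing))
    return ProviderConfig(values["BASE_URL"], values["API_KEY"], values["MODEL"])


# Placeholder environment for illustration; real code would pass os.environ.
EXAMPLE_ENV = {
    "OPENAI_API_KEY": "sk-placeholder",
    "MISTRAL_API_KEY": "mistral-placeholder",
    "MISTRAL_BASE_URL": "https://api.mistral.ai/v1",
    "MISTRAL_MODEL": "mistral-small-latest",  # example ID; verify in the Mistral docs
}

if __name__ == "__main__":
    arm_a = load_provider_config(EXAMPLE_ENV, "OPENAI", "https://api.openai.com/v1", "gpt-4o")
    arm_b = load_provider_config(EXAMPLE_ENV, "MISTRAL")
    for arm in (arm_a, arm_b):
        print(arm.base_url, arm.model)
```

Each ProviderConfig can then be passed to its own OpenAI(base_url=..., api_key=...) client, so the same prompt is sent to both arms and the results compared.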
Gotchas when migrating to Mistral
- Model name mismatch. Mistral uses its own model identifiers. Any routing logic or config keyed on gpt-4o or other provider-specific strings needs a Mistral equivalent. Keep model names in environment variables or config, not scattered across application code, so the switch is one file.
- Unsupported or silently ignored parameters. Mistral may not support every parameter OpenAI accepts, such as logprobs, n > 1, or frequency/presence penalties. Test each parameter your code sends explicitly against Mistral before going to production. Silently ignored parameters are harder to catch than errors.
- Context window differences. Mistral models have their own context-window sizes, which may differ from the OpenAI models you replaced. A prompt that fits inside gpt-4o's window might overflow or truncate differently on a Mistral model. Verify max context in the Mistral docs.
- Rate limits are independent. Mistral enforces its own RPM and TPM limits, separate from OpenAI's. A burst workload within OpenAI's quotas might hit HTTP 429 from Mistral. Read the Retry-After header, back off appropriately, or add a fallback provider (more below).
- Tool-call response shape edge cases. Even within the OpenAI-compatible surface, minor differences in how tool results are returned, particularly around parallel tool calls, can surface during migration. Run your tool-calling flows explicitly against Mistral in a test environment before promoting to production.
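The rate-limit advice above can be sketched as a small delay calculator. This is a minimal illustration, not a complete retry policy: retry_delay_seconds is a hypothetical helper, it assumes Retry-After (when present) carries a number of seconds, and it falls back to capped exponential backoff otherwise. The HTTP-date form of the header is deliberately not parsed here, and production code would usually add random jitter.

```python
from typing import Mapping


def retry_delay_seconds(
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
) -> float:
    """Return how long to wait before retry number `attempt` (0-based)."""
    retry_after = None
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            retry_after = value
    if retry_after is not None:
        try:
            # Honour the server's hint, clamped to a sane range.
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # HTTP-date form: not handled in this sketch, fall through
    # No usable hint: exponential backoff of base * 2^attempt, capped.
    return min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    print(retry_delay_seconds({"Retry-After": "7"}, attempt=0))
    print(retry_delay_seconds({}, attempt=3))
```

A caller would sleep for the returned number of seconds after each HTTP 429 and give up, or switch providers, after a fixed number of attempts.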
Routing Mistral behind a gateway
Pointing the OpenAI SDK at api.mistral.ai/v1 is the right first step. The limitation is that each client is tied to a single provider: when Mistral rate-limits you, a specific model is at capacity, or you want to benchmark Mistral against another provider on live traffic, you are back to changing configuration and redeploying, or writing failover logic into the application yourself. A gateway decouples provider selection from application logic.
That is what flo2 is built for. flo2 is a developer-first LLM gateway with zero token markup. You bring your own Mistral key — plus keys for OpenAI, Anthropic, Gemini, Groq, Cerebras, DeepInfra, and others — and pay each provider directly at their published rates. A single flo2 key, accessed through an OpenAI-compatible or Anthropic-compatible endpoint, routes each request to the cheapest or fastest provider, with automatic fallback chains so a Mistral 429 rolls over to another provider instead of surfacing as an error. Free during Beta.
import os

from openai import OpenAI

# One stable base URL: flo2 routes to Mistral (or best available provider)
client = OpenAI(
    base_url="https://flo2.com/v1",
    api_key=os.environ["FLO2_API_KEY"],
)

resp = client.chat.completions.create(
    model="mistral-small-latest",  # pin to Mistral, or let flo2 route automatically
    messages=[
        {"role": "user", "content": "Summarize this pull request diff."},
    ],
)

print(resp.choices[0].message.content)
Because flo2 exposes the same OpenAI-compatible surface you just used against Mistral, switching is a base_url and api_key change, identical to the Mistral migration itself. You get Mistral's models when they are the best fit, automatic fallback when they are not, request racing that returns whichever provider responds first, and per-call cost accounting across every provider in one view.
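To picture what a fallback chain does conceptually, here is a small client-side sketch in Python. It is not flo2's implementation and makes no claim about how any gateway works internally: RateLimited, complete_with_fallback, and the two fake providers are hypothetical stand-ins, and real provider calls would wrap SDK clients and catch their actual rate-limit exceptions.

```python
from typing import Callable, List, Sequence, Tuple


class RateLimited(Exception):
    """Stand-in for a provider's HTTP 429 error (hypothetical)."""


Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(providers: Sequence[Provider], prompt: str) -> Tuple[str, str]:
    """Try providers in order; roll over to the next one on a rate limit."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider was skipped
    raise RuntimeError("all providers rate-limited: " + "; ".join(errors))


def fake_mistral(prompt: str) -> str:
    # Simulates a provider that is currently rate-limiting requests.
    raise RateLimited("429 Too Many Requests")


def fake_backup(prompt: str) -> str:
    # Simulates a healthy second provider.
    return f"echo: {prompt}"


if __name__ == "__main__":
    chain = [("mistral", fake_mistral), ("backup", fake_backup)]
    print(complete_with_fallback(chain, "Summarize this pull request diff."))
```

Writing this loop yourself means maintaining provider order, error mapping, and model-name translation in application code, which is the work a gateway takes over.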
For a full walkthrough of Mistral's model lineup, API key setup, pricing structure, and code models, see the Mistral API guide. For the broader picture of how OpenAI-compatible endpoints work across providers, see OpenAI-compatible API. To start routing Mistral requests with zero markup and automatic fallback, get started with flo2.