2026-06-03 · flo2 blog

Mistral API Guide: Models, Pricing & OpenAI-Compatible Setup

If you want capable models from a European provider with a clean, OpenAI-compatible surface, the Mistral API belongs on your shortlist. Mistral is a Paris-based lab that ships an unusually broad range: commercial flagship and small models for general chat and tool use, a line of open-weight models you can run yourself, and dedicated code models — Codestral and Devstral — aimed at developer workflows. You reach all of it through La Plateforme, Mistral's developer platform, using either Mistral's native client or the plain openai SDK pointed at one base URL. This guide covers the model families, getting a Mistral API key, the OpenAI-compatible setup with working examples, what Mistral is good at, Mistral API pricing conceptually, and where a gateway fits.

One ground rule up front: model IDs, the exact lineup, and per-token prices on Mistral move quickly, and a name that's right today can be stale next quarter. Treat Mistral's official models and pricing pages as the source of truth and verify anything below against them before you ship. This guide gives you the shape, not hard-coded figures.

Mistral API models: commercial, open-weight, and code

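Rather than memorizing version numbers, think about Mistral's catalog in three buckets: commercial flagship and small models for general chat and tool use, open-weight models you can run yourself, and code models such as Codestral and Devstral. The framing outlasts any single ID.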

Mistral also ships embedding models and has expanded into multimodal and OCR-style capabilities. Because all of this shifts as the lab releases new generations and deprecates old ones, check Mistral's current models page for live IDs, context windows, modalities, and deprecation dates before you hard-code a model string.

Getting a Mistral API key on La Plateforme

Access runs through La Plateforme (sometimes written "Mistral le Plateforme" in searches), and three steps stand between you and your first token:

Treat the key like any other secret: load it from an environment variable, never commit it to source control, and rotate it if it leaks. The examples below read it from MISTRAL_API_KEY.

export MISTRAL_API_KEY="your_key_here"
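On the Python side, a minimal sketch of reading that variable and failing early with a readable message when it is missing. The helper name load_mistral_key is hypothetical and not part of any SDK.

```python
import os


def load_mistral_key(env=os.environ):
    """Return the Mistral API key from the environment, or raise a clear error."""
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("MISTRAL_API_KEY is not set; export it before running.")
    return key
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.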

The Mistral OpenAI-compatible endpoint

The reason Mistral is painless to adopt: alongside its native API it exposes an OpenAI-compatible surface. If your code already speaks to /v1/chat/completions, switching to Mistral is mostly a base URL and an API-key swap — no new SDK, no rewrite. The base URL is:

https://api.mistral.ai/v1

Point any OpenAI-style client at that base URL, pass your Mistral key as the bearer token, and set model to a current Mistral model ID. Here's a minimal curl call against chat completions:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<current-mistral-model-id>",
    "messages": [
      {"role": "user", "content": "Explain function calling in one sentence."}
    ]
  }'

Replace <current-mistral-model-id> with a real ID from Mistral's models page — a current Mistral Large or Small variant, or a Codestral model for code. Request and response shapes mirror the OpenAI Chat Completions API, so fields like temperature, max_tokens, and stream behave as you'd expect. (Mistral's native endpoints differ in a few details; if you use those, follow Mistral's own reference.)
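To make the optional fields concrete, here is a sketch that builds the same request with temperature and max_tokens set, using only the Python standard library. It constructs the request without sending it. The helper name build_chat_request and the default values are illustrative assumptions, not recommendations from Mistral.

```python
import json
import urllib.request

BASE_URL = "https://api.mistral.ai/v1"


def build_chat_request(api_key, model, prompt, temperature=0.2, max_tokens=256):
    """Build (but do not send) a chat completions request in the OpenAI shape."""
    body = {
        "model": model,  # a current ID from Mistral's models page
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # illustrative default
        "max_tokens": max_tokens,    # illustrative default
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate makes the payload easy to inspect or log first.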

Mistral with the Python openai client

Because the endpoint is OpenAI-compatible, the official openai Python package talks to Mistral with two overrides — base_url and api_key. No Mistral-specific library is required, though Mistral does publish its own mistralai SDK if you'd rather use native features directly.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.mistral.ai/v1",
    api_key=os.environ["MISTRAL_API_KEY"],
)

resp = client.chat.completions.create(
    model="<current-mistral-model-id>",   # from Mistral's models page
    messages=[
        {"role": "user", "content": "Give me three uses for a European LLM provider."}
    ],
)

print(resp.choices[0].message.content)

Streaming works identically — pass stream=True and iterate the chunks:

stream = client.chat.completions.create(
    model="<current-mistral-model-id>",
    messages=[{"role": "user", "content": "Write a haiku about open weights."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # defensive: some chunks may carry no choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Function/tool calling also follows the OpenAI shape here — you pass a tools array and read back tool_calls — which makes Mistral a drop-in for agent code already written against that pattern. For what "compatible" does and doesn't guarantee, see our explainer on the OpenAI-compatible API.
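As a rough sketch of that pattern, the block below defines one tool in the OpenAI tools shape and dispatches tool_calls to local functions. It works on plain dicts (the raw JSON shape) so it runs without a network call; with the openai SDK the same fields are attributes on objects instead. The get_weather tool, its handler, and the sample call are hypothetical and hand-written, not real model output. The tool message follows the OpenAI shape; check Mistral's reference for any additional fields it expects.

```python
import json

# Tool definition in the OpenAI "tools" shape. get_weather is a hypothetical tool.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city):
    # Stand-in implementation; a real tool would call a weather service.
    return {"city": city, "forecast": "unknown"}


HANDLERS = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Run each tool call and return 'tool' role messages for the follow-up request."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        result = HANDLERS[name](**args)
        messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
        )
    return messages


# Hand-written example in the OpenAI tool_calls shape, not real model output.
sample_calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }
]
print(run_tool_calls(sample_calls))
```

In a live loop you would pass TOOLS as the tools argument, append the returned messages to the conversation, and call chat completions again so the model can use the results.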

What the Mistral API is known for

Mistral's reputation rests on a few things that don't usually come bundled together. Lean into them where they matter.

Mistral API pricing, conceptually

Mistral bills the usual way — per token, with input and output priced separately — and the rate scales with model size: the Large-tier flagship costs more per token than the Small-tier models, and code models are priced on their own terms. There is generally a free experimentation tier for low-volume development, but its existence and limits change, so don't bank on it without checking.
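The arithmetic behind per-token billing can be sketched as below. The rates are placeholders for illustration only, not Mistral's actual prices, and quoting them per million tokens is an assumption; substitute the figures and units from Mistral's pricing page.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate one request's cost; prices are assumed to be per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Placeholder rates for illustration only, NOT Mistral's actual prices.
PLACEHOLDER_RATES = {
    "small-tier": {"input": 1.0, "output": 3.0},
    "large-tier": {"input": 4.0, "output": 12.0},
}

for tier, rate in PLACEHOLDER_RATES.items():
    cost = estimate_cost(2_000, 500, rate["input"], rate["output"])
    print(f"{tier}: {cost:.6f}")
```

Because input and output are priced separately, the same request can cost noticeably more when the response is long, so it helps to track both token counts rather than a single total.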

Two cost levers are worth keeping in mind:

For exact numbers, defer to Mistral's official pricing page rather than any figure quoted in a third-party article (including this one). Per-token rates and free-tier terms are precisely what drifts, so verify current values before building a cost model. For how Mistral sits among budget options, see our roundup of the cheapest LLM API.

Routing Mistral behind a gateway for fallback and cost

Here's how Mistral tends to slot into a real stack: a Small-tier model as the cheap default for most traffic; the flagship for hard cases; Codestral on the code paths; and an open-weight model you self-host if a workload outgrows per-token pricing. That's a lot of routing decisions, and hard-wiring each to a specific Mistral endpoint makes that endpoint's availability your app's availability.
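A minimal sketch of that routing decision follows. The model IDs are placeholders to be replaced with current IDs from Mistral's models page, and the pick_model helper and its two flags are hypothetical simplifications; the self-hosted open-weight route is left out for brevity.

```python
# Placeholder IDs; substitute current model IDs from Mistral's models page.
ROUTES = {
    "default": "<small-tier-model-id>",
    "hard": "<flagship-model-id>",
    "code": "<codestral-model-id>",
}


def pick_model(is_code=False, is_hard=False):
    """Choose a model ID: code paths first, then hard cases, else the cheap default."""
    if is_code:
        return ROUTES["code"]
    if is_hard:
        return ROUTES["hard"]
    return ROUTES["default"]
```

How a request gets classified as hard or code-related is the real design question; this sketch assumes the caller already knows.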

The clean pattern is to put Mistral behind an LLM gateway. Your code calls one stable endpoint; the gateway sends everyday requests to a small model, escalates hard ones to the flagship, and falls back automatically to another provider when Mistral is unavailable or rate-limited. You get Mistral's strengths — European hosting, code models, open weights — without scattering provider-specific retry logic across your services. It's also the natural seam for the open-weight split: keep most traffic on Mistral's hosted API and send the slice that needs self-hosting to your own deployment.
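The fallback behavior can be illustrated with a small sketch. This is not how any particular gateway implements it; the ProviderError class, the complete_with_fallback helper, and the stand-in provider functions are all hypothetical, and real callables would wrap HTTP clients.

```python
class ProviderError(Exception):
    """Raised by a provider callable when it is unavailable or rate-limited."""


def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return (name, text) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in callables; real ones would wrap HTTP clients for each provider.
def mistral_down(prompt):
    raise ProviderError("rate limited")


def backup_provider(prompt):
    return f"echo: {prompt}"


name, text = complete_with_fallback(
    "hello", [("mistral", mistral_down), ("backup", backup_provider)]
)
print(name, text)
```

Keeping this logic in one place, whether in a gateway or a shared helper, is what stops provider-specific retry code from spreading across services.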

flo2 is a developer-first, bring-your-own-key LLM gateway built for exactly this. You add your own Mistral key — plus OpenAI, Anthropic, Gemini, Groq, Cerebras, DeepInfra, xAI, and OpenRouter — and pay each provider directly with zero markup on tokens; flo2 takes no per-token cut, which makes it a genuine zero-markup OpenRouter alternative. One OpenAI- and Anthropic-compatible key routes each request to the cheapest or fastest model, falls back automatically when a provider is down, and gives you true per-call cost accounting on every Mistral model. It's free during Beta, so you can wire Mistral in behind a fallback and start measuring today.
