2026-06-03 · flo2 blog

Mistral API Guide: Models, Pricing & OpenAI-Compatible Setup

If you want capable models from a European provider with a clean, OpenAI-compatible surface, the Mistral API belongs on your shortlist. Mistral is a Paris-based lab that ships an unusually broad range: commercial flagship and small models for general chat and tool use, a line of open-weight models you can run yourself, and dedicated code models — Codestral and Devstral — aimed at developer workflows. You reach all of it through La Plateforme, Mistral's developer platform, using either Mistral's native client or the plain openai SDK pointed at one base URL. This guide covers the model families, getting a Mistral API key, the OpenAI-compatible setup with working examples, what Mistral is good at, Mistral API pricing conceptually, and where a gateway fits.

One ground rule up front: model IDs, the exact lineup, and per-token prices on Mistral move quickly, and a name that's right today can be stale next quarter. Treat Mistral's official models and pricing pages as the source of truth and verify anything below against them before you ship. This guide gives you the shape, not hard-coded figures.

Mistral API models: commercial, open-weight, and code

Rather than memorizing version numbers, think about Mistral's catalog in three buckets — the framing outlasts any single ID.

Commercial flagship models — the large, general-purpose tier (the Mistral Large lineage) for the hard end of the spread: complex reasoning, nuanced instruction-following, multilingual work, and demanding tool use. Reach for it when answer quality is the priority.
Commercial small / efficient models — the Mistral Small lineage and related efficient tiers, tuned for cost and latency on the everyday workload: chat, summarization, extraction, classification, and routing. They give good answers, faster and cheaper, when you don't need the flagship.
Open-weight models — Mistral built its reputation on releasing capable open models (the original Mistral 7B and the Mixtral mixture-of-experts models among them), which you can run through the API or self-host. Licenses vary — some Apache-2.0, others under Mistral's own research/community license — so check the license on the specific model before you build on it.
Code models — Codestral targets code generation, completion, and fill-in-the-middle across many languages, while Devstral aims at agentic software-engineering tasks. That's why the Codestral API shows up so often in developer searches. Note that code-completion endpoints can have their own route and terms, separate from chat.

Mistral also ships embedding models and has expanded into multimodal and OCR-style capabilities. Because all of this shifts as the lab releases new generations and deprecates old ones, check Mistral's current models page for live IDs, context windows, modalities, and deprecation dates before you hard-code a model string.
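Rather than copying IDs from a web page, you can ask the API which models your key can see. Below is a minimal sketch using only the Python standard library. It assumes that Mistral's `GET /v1/models` returns an OpenAI-style list, meaning a `data` array of objects that each carry an `id`. The helper names are illustrative. If you use the `openai` client configured as shown later, `client.models.list()` should reach the same route.

```python
import json
import os
import urllib.request

MODELS_URL = "https://api.mistral.ai/v1/models"


def fetch_models(api_key: str, timeout: float = 30) -> dict:
    """GET the model list; assumes an OpenAI-style {"data": [...]} response."""
    req = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def model_ids(payload: dict) -> list[str]:
    """Extract sorted, de-duplicated model IDs from a /v1/models response."""
    return sorted({m["id"] for m in payload.get("data", []) if "id" in m})


def main() -> None:
    """Print every model ID visible to the key in MISTRAL_API_KEY (network call)."""
    for model_id in model_ids(fetch_models(os.environ["MISTRAL_API_KEY"])):
        print(model_id)
```

Running `main()` before you hard-code a model string is a cheap way to catch an ID that has been renamed or deprecated.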

Getting a Mistral API key on La Plateforme

Access runs through La Plateforme (sometimes written "Mistral le Plateforme" in searches), and three steps stand between you and your first token:

Sign up at Mistral's developer console and open the API Keys section.
Create a new key and copy it immediately — like most providers, the full secret is shown once.
Set up billing. Mistral has historically offered a free experimentation tier alongside paid usage; whether one is active, and its limits, is something to confirm in the console.

Treat the key like any other secret: load it from an environment variable, never commit it to source control, and rotate it if it leaks. The examples below read it from MISTRAL_API_KEY.

export MISTRAL_API_KEY="your_key_here"
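A small helper can enforce the environment-variable rule in code. It reads the key once, fails fast if the key is missing, and never prints it in full. This is a hypothetical helper, not part of any Mistral SDK, and the number of characters left visible is an arbitrary choice.

```python
import os


def load_api_key(var: str = "MISTRAL_API_KEY") -> str:
    """Read the key from the environment and fail fast with a clear message."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running.")
    return key


def mask(secret: str, visible: int = 4) -> str:
    """Return a log-safe version of a secret that shows only its last few chars."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```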

The Mistral OpenAI-compatible endpoint

The reason Mistral is painless to adopt: alongside its native API it exposes an OpenAI-compatible surface. If your code already speaks to /v1/chat/completions, switching to Mistral is mostly a base URL and an API-key swap — no new SDK, no rewrite. The base URL is:

https://api.mistral.ai/v1

Point any OpenAI-style client at that base URL, pass your Mistral key as the bearer token, and set model to a current Mistral model ID. Here's a minimal curl call against chat completions:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<current-mistral-model-id>",
    "messages": [
      {"role": "user", "content": "Explain function calling in one sentence."}
    ]
  }'

Replace <current-mistral-model-id> with a real ID from Mistral's models page: a current Mistral Large or Small variant, or a Codestral model for code. Request and response shapes closely mirror the OpenAI Chat Completions API, so common fields like temperature, max_tokens, and stream behave as you'd expect. Less common OpenAI parameters may not be supported, and Mistral's native endpoints differ in a few details, so check Mistral's own API reference for anything beyond the basics.
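The following example shows that nothing SDK-specific is happening. It makes the same call as the curl example, using only Python's standard library. This is a sketch that assumes the chat completions route accepts `temperature` and `max_tokens` in the OpenAI shape. The function names and default values are illustrative, not Mistral recommendations.

```python
import json
import urllib.request

CHAT_URL = "https://api.mistral.ai/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str,
                       temperature: float = 0.3,
                       max_tokens: int = 200) -> urllib.request.Request:
    """Build the same POST the curl example sends, plus two common sampling fields."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(req: urllib.request.Request, timeout: float = 60) -> tuple[str, dict]:
    """Send the request; return the reply text and the token-usage block."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"], body.get("usage", {})


# Usage (network call; substitute a real model ID):
# text, usage = send_chat(build_chat_request(key, "<current-mistral-model-id>", "Hi"))
```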

Mistral with the Python openai client

Because the endpoint is OpenAI-compatible, the official openai Python package talks to Mistral with two overrides — base_url and api_key. No Mistral-specific library is required, though Mistral does publish its own mistralai SDK if you'd rather use native features directly.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.mistral.ai/v1",
    api_key=os.environ["MISTRAL_API_KEY"],
)

resp = client.chat.completions.create(
    model="<current-mistral-model-id>",   # from Mistral's models page
    messages=[
        {"role": "user", "content": "Give me three uses for a European LLM provider."}
    ],
)

print(resp.choices[0].message.content)

Streaming works identically — pass stream=True and iterate the chunks:

stream = client.chat.completions.create(
    model="<current-mistral-model-id>",
    messages=[{"role": "user", "content": "Write a haiku about open weights."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
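The SDK hides the wire format, but it helps to know roughly what `stream=True` consumes. The parser below is a sketch that assumes Mistral streams OpenAI-style server-sent events: one `data:` line per JSON chunk, with text in `choices[].delta.content`, terminated by `data: [DONE]`. Verify the exact framing against Mistral's API reference. The function name is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_stream_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and keep-alive comments carry no payload
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if isinstance(text, str) and text:
                yield text


# Usage with a raw HTTP response from a request whose JSON body sets "stream": true:
# with urllib.request.urlopen(req) as resp:
#     for text in iter_stream_deltas(line.decode("utf-8") for line in resp):
#         print(text, end="", flush=True)
```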

Function/tool calling also follows the OpenAI shape here — you pass a tools array and read back tool_calls — which makes Mistral a drop-in for agent code already written against that pattern. For what "compatible" does and doesn't guarantee, see our explainer on the OpenAI-compatible API.
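Here is a sketch of one tool-calling round trip in that shape. It sends a `tools` array, executes any `tool_calls` the model returns, appends the results as `tool` messages, and asks again. It assumes `client` is the `OpenAI` client configured for Mistral above; any object with the same `chat.completions.create` method works. The `get_order_status` tool and its data are hypothetical, and the function handles only a single round of calls.

```python
import json

# Hypothetical in-memory data standing in for a real backend.
ORDERS = {"A100": "shipped", "B200": "processing"}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the fulfillment status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]


def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": ORDERS.get(order_id, "unknown")}


HANDLERS = {"get_order_status": get_order_status}


def run_tool(name: str, arguments: str) -> str:
    """Dispatch one tool call; the model sends its arguments as a JSON string."""
    return json.dumps(HANDLERS[name](**json.loads(arguments)))


def ask_with_tools(client, model: str, question: str):
    messages = [{"role": "user", "content": question}]
    first = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    msg = first.choices[0].message
    if not msg.tool_calls:
        return msg.content  # the model answered without calling a tool
    # Echo the assistant turn, including its tool calls, back into the history.
    messages.append({
        "role": "assistant",
        "content": msg.content,
        "tool_calls": [
            {"id": c.id, "type": "function",
             "function": {"name": c.function.name, "arguments": c.function.arguments}}
            for c in msg.tool_calls
        ],
    })
    for call in msg.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_tool(call.function.name, call.function.arguments),
        })
    final = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    return final.choices[0].message.content


# Usage: ask_with_tools(client, "<current-mistral-model-id>", "Where is order A100?")
```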

What the Mistral API is known for

Mistral's reputation rests on a few things that don't usually come bundled together. Lean into them where they matter.

A European provider. For teams that care about where their AI vendor is based and how data is handled under EU norms, a Paris-headquartered lab is a meaningful differentiator. Always read the actual data-handling terms — but for many organizations this is the headline reason Mistral is on the list.
Strong open-weight models. Mistral helped popularize high-quality open models (Mistral 7B and the Mixtral MoE line). This gives you the rare option to prototype on a hosted endpoint and later self-host the same model family if cost, latency, or data control demands it.
Dedicated code models. Codestral (generation, completion, fill-in-the-middle) and Devstral (agentic coding) are purpose-built for developer tooling — autocomplete, refactoring, code-aware agents — rather than chat models pressed into the role.
Solid function calling and multilingual range. Mistral handles tool use and several European languages well, which suits agents and multilingual apps.
Easy migration. Because the endpoint is OpenAI-compatible, you can A/B a Mistral model against your current one by changing a base URL, an API key, and a model string.
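Because only the base URL, key, and model string change, an A/B test can be as simple as a config lookup plus a deterministic traffic split. This sketch assumes you are comparing against OpenAI. The arm names, percentages, and model placeholders are hypothetical. Hashing the user ID, rather than picking an arm randomly per request, keeps each user on the same arm across calls.

```python
import hashlib
import os

# Hypothetical experiment arms; replace the placeholders with real model IDs.
ARMS = {
    "control": {"base_url": "https://api.openai.com/v1",
                "key_env": "OPENAI_API_KEY", "model": "<current-openai-model-id>"},
    "mistral": {"base_url": "https://api.mistral.ai/v1",
                "key_env": "MISTRAL_API_KEY", "model": "<current-mistral-model-id>"},
}


def assign_arm(user_id: str, mistral_percent: int = 10) -> str:
    """Map a user to a stable bucket in [0, 100) and pick an arm from it."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "mistral" if bucket < mistral_percent else "control"


def client_settings(arm: str) -> dict:
    """Keyword arguments for openai.OpenAI(...) plus the model to request."""
    cfg = ARMS[arm]
    return {
        "base_url": cfg["base_url"],
        "api_key": os.environ.get(cfg["key_env"], ""),
        "model": cfg["model"],
    }


# Usage:
# s = client_settings(assign_arm("user-42"))
# client = OpenAI(base_url=s["base_url"], api_key=s["api_key"])
# client.chat.completions.create(model=s["model"], messages=[...])
```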

Mistral API pricing, conceptually

Mistral bills the usual way — per token, with input and output priced separately — and the rate scales with model size: the Large-tier flagship costs more per token than the Small-tier models, and code models are priced on their own terms. There is generally a free experimentation tier for low-volume development, but its existence and limits change, so don't bank on it without checking.
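Per-token billing reduces to simple arithmetic on the `usage` block that OpenAI-shaped responses return (`prompt_tokens` and `completion_tokens`). The sketch below expresses rates per million tokens. The numbers in `SMALL_TIER` and `LARGE_TIER` are made-up placeholders, not Mistral prices, so substitute values from the official pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Per-million-token prices; fill in from Mistral's pricing page."""
    input_per_m: float
    output_per_m: float


def request_cost(prompt_tokens: int, completion_tokens: int, rate: Rate) -> float:
    """Input and output tokens are billed separately, each at its own rate."""
    return (prompt_tokens * rate.input_per_m
            + completion_tokens * rate.output_per_m) / 1_000_000


# PLACEHOLDER rates for illustration only; these are NOT real Mistral prices.
SMALL_TIER = Rate(input_per_m=0.5, output_per_m=1.5)
LARGE_TIER = Rate(input_per_m=5.0, output_per_m=15.0)

# Usage with a response from the openai client:
# cost = request_cost(resp.usage.prompt_tokens, resp.usage.completion_tokens, SMALL_TIER)
```

Running the same token counts through both tiers, using real rates, is the quickest way to see the gap that model selection can close.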

Two cost levers are worth keeping in mind:

Pick the smallest model that clears the bar. The biggest lever on a Mistral bill is matching model to task — routing routine work to a Small-tier model and reserving the flagship for genuinely hard prompts often cuts cost far more than shaving tokens.
Self-hosting the open models. Because several Mistral models are open-weight, very high-volume workloads can sometimes be cheaper to run on your own infrastructure than to pay per token — at the cost of operating the serving stack yourself.
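Whether self-hosting pays off comes down to a break-even volume: the monthly token count at which a fixed infrastructure bill equals per-token API spend. The following is a back-of-the-envelope sketch. Every input is a hypothetical placeholder, and the fixed cost should cover the operational overhead of running the serving stack, not just hardware.

```python
def blended_rate(input_per_m: float, output_per_m: float, output_share: float) -> float:
    """Average per-million-token price for a given share of output tokens."""
    return input_per_m * (1 - output_share) + output_per_m * output_share


def breakeven_tokens(fixed_monthly_cost: float, rate_per_m: float) -> float:
    """Monthly tokens above which a fixed self-hosting bill beats per-token pricing."""
    return fixed_monthly_cost / rate_per_m * 1_000_000


# Hypothetical inputs: placeholder API rates, 25% output tokens, and a
# placeholder all-in monthly cost for GPUs plus the people who run them.
rate = blended_rate(input_per_m=2.0, output_per_m=6.0, output_share=0.25)
threshold = breakeven_tokens(fixed_monthly_cost=6000.0, rate_per_m=rate)
```

With these placeholder numbers, the blended rate is 3.0 per million tokens and the break-even point is two billion tokens a month. Below that volume, per-token pricing wins, assuming your self-hosted capacity could even serve that load for the fixed cost.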

For exact numbers, defer to Mistral's official pricing page rather than any figure quoted in a third-party article (including this one). Per-token rates and free-tier terms are precisely what drifts, so verify current values before building a cost model. For how Mistral sits among budget options, see our roundup of the cheapest LLM APIs.

Routing Mistral behind a gateway for fallback and cost

Here's how Mistral tends to slot into a real stack: a Small-tier model as the cheap default for most traffic; the flagship for hard cases; Codestral on the code paths; and an open-weight model you self-host if a workload outgrows per-token pricing. That's a lot of routing decisions, and hard-wiring each to a specific Mistral endpoint makes that endpoint's availability your app's availability.
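The routing decisions in that setup can be expressed as a lookup table plus a rule for when to escalate. In this sketch, the task labels, the length threshold, and all model IDs are hypothetical placeholders. The URL `http://localhost:8000/v1` stands in for a self-hosted OpenAI-compatible server you would run yourself. Codestral access may use its own endpoint and key, so check Mistral's docs before reusing the chat base URL for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    base_url: str
    model: str


MISTRAL = "https://api.mistral.ai/v1"
SELF_HOSTED = "http://localhost:8000/v1"  # hypothetical local OpenAI-compatible server

# Placeholder model IDs; take real ones from Mistral's models page.
ROUTES = {
    "default": Route(MISTRAL, "<current-small-model-id>"),
    "hard": Route(MISTRAL, "<current-large-model-id>"),
    "code": Route(MISTRAL, "<current-codestral-model-id>"),  # may need its own endpoint/key
    "bulk": Route(SELF_HOSTED, "<self-hosted-open-weight-model>"),
}


def pick_route(task: str, prompt: str, hard_threshold_chars: int = 4000) -> Route:
    """Hypothetical policy: explicit task labels win; very long prompts escalate."""
    if task in ("hard", "code", "bulk"):
        return ROUTES[task]
    if len(prompt) > hard_threshold_chars:
        return ROUTES["hard"]
    return ROUTES["default"]
```

Prompt length is a crude proxy for difficulty. A classifier or an explicit flag from the caller is often a better signal.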

The clean pattern is to put Mistral behind an LLM gateway. Your code calls one stable endpoint; the gateway sends everyday requests to a small model, escalates hard ones to the flagship, and falls back automatically to another provider when Mistral is unavailable or rate-limited. You get Mistral's strengths — European hosting, code models, open weights — without scattering provider-specific retry logic across your services. It's also the natural seam for the open-weight split: keep most traffic on Mistral's hosted API and send the slice that needs self-hosting to your own deployment.
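Whether a gateway handles it or your own code does, fallback boils down to an ordered list of providers and a rule for which errors justify moving on. In this minimal standard-library sketch, rate limits (HTTP 429), server errors (5xx), and network failures fall through to the next provider. Other client errors, such as a bad key (401), are raised immediately, because retrying elsewhere would hide a configuration bug. The provider dicts and the `send` parameter are illustrative assumptions, not any gateway's API.

```python
import json
import urllib.error
import urllib.request

# Errors worth retrying on another provider: rate limits and server-side failures.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def post_chat(base_url: str, api_key: str, model: str, messages: list,
              timeout: float = 30) -> dict:
    """POST to an OpenAI-compatible /chat/completions route; return the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def chat_with_fallback(providers: list, messages: list, send=post_chat) -> tuple:
    """Try providers in order; return (provider_name, response_body)."""
    failures = []
    for p in providers:
        try:
            return p["name"], send(p["base_url"], p["api_key"], p["model"], messages)
        except urllib.error.HTTPError as err:  # must precede URLError, its base class
            if err.code not in RETRYABLE_STATUS:
                raise  # e.g. 400/401: a bug or config error, not an outage
            failures.append(f"{p['name']}: HTTP {err.code}")
        except (urllib.error.URLError, TimeoutError) as err:
            failures.append(f"{p['name']}: {err}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))


# Usage: list Mistral first, then a backup with its own key and model, e.g.
# providers = [{"name": "mistral", "base_url": "https://api.mistral.ai/v1",
#               "api_key": key, "model": "<current-mistral-model-id>"}, ...]
```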

flo2 is a developer-first, bring-your-own-key LLM gateway built for exactly this. You add your own Mistral key — plus OpenAI, Anthropic, Gemini, Groq, Cerebras, DeepInfra, xAI, and OpenRouter — and pay each provider directly with zero markup on tokens; flo2 takes no per-token cut, which makes it a genuine zero-markup OpenRouter alternative. One OpenAI- and Anthropic-compatible key routes each request to the cheapest or fastest model, falls back automatically when a provider is down, and gives you true per-call cost accounting on every Mistral model. It's free during Beta, so you can wire Mistral in behind a fallback and start measuring today.
