2026-06-03 · flo2 blog

DeepSeek API Guide: Models, Pricing & OpenAI-Compatible Setup

If you're hunting for serious model quality without a serious bill, the DeepSeek API is probably already on your shortlist. DeepSeek made its name shipping open, capable models — a strong general-purpose chat model in the V3 family and a dedicated reasoning model in the R1 family — priced well below what the frontier US labs charge for comparable work. The pitch for developers is simple: a familiar OpenAI-compatible endpoint, solid output quality, and a per-token cost low enough that bulk jobs which were uneconomical elsewhere suddenly pencil out. This guide covers the models, getting a key, the OpenAI-compatible setup with working examples, what DeepSeek is genuinely good at, how to think about pricing, and where a gateway fits.

One ground rule up front: model IDs, exact prices, and promotional discounts on DeepSeek move quickly, and a number that's right today can be stale next quarter. Treat DeepSeek's official models and pricing pages as the source of truth, and verify anything below against them before shipping.

DeepSeek API models: V3-class chat and R1-class reasoning

DeepSeek's lineup centers on two jobs, general-purpose chat (the V3 family) and dedicated reasoning (the R1 family), and it helps to think in those terms rather than memorizing a single version number.

DeepSeek exposes these through stable model IDs rather than forcing you to track every release. The long-standing OpenAI-compatible IDs are deepseek-chat for the chat model and deepseek-reasoner for the reasoning model. The catch: that mapping is a moving target — the IDs point at whatever the current generation is under the hood, and DeepSeek has signaled deprecations as the lineup advances to newer generations. So check DeepSeek's current models page for the live IDs, their context windows, and any deprecation dates before you hard-code a string from a blog post.

Getting a DeepSeek API key

Access runs through DeepSeek's developer platform, and three steps stand between you and your first token:

Treat the key like any other secret: load it from an environment variable, never commit it to source control, and rotate it if it leaks. The examples below read it from DEEPSEEK_API_KEY.

export DEEPSEEK_API_KEY="sk-your_key_here"
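A minimal sketch of a fail-fast loader for that variable, assuming you would rather get a clear error at startup than an authentication failure on the first request. The helper name load_api_key is illustrative and not part of any SDK.

```python
import os


def load_api_key(env_var: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with an actionable message instead of a 401 later on.
        raise RuntimeError(
            f"{env_var} is not set; export it before running (never hard-code it)."
        )
    return key
```

Calling load_api_key() once at startup keeps the key out of source files and surfaces a missing variable immediately.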

The DeepSeek OpenAI-compatible endpoint

The reason DeepSeek is so painless to adopt: its API is OpenAI-compatible. If your code already calls OpenAI's /v1/chat/completions, moving a call to DeepSeek is mostly a base URL and API key swap, with no new SDK and no rewrite. Note that the examples here call /chat/completions directly on the base URL, without a /v1 prefix. The base URL is:

https://api.deepseek.com

Point any OpenAI-style client at that base URL, pass your DeepSeek key as the bearer token, and set model to a current DeepSeek model ID. Here's a minimal curl call against chat completions:

curl https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "user", "content": "Explain a vector database in one sentence."}
    ]
  }'

Request and response shapes mirror the OpenAI Chat Completions API, so fields like temperature, max_tokens, and stream behave as you'd expect. Swap "deepseek-chat" for "deepseek-reasoner" when you want the reasoning model — but confirm both IDs against the current docs, since they're the ones most likely to change.
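For readers without curl or the openai package, the same request can be sketched with only the Python standard library. This assumes the base URL and path from the curl example. The helper names build_chat_request and send_chat_request are hypothetical, and the default model ID should be checked against the current docs.

```python
import json
import urllib.request

BASE_URL = "https://api.deepseek.com"  # base URL from this guide


def build_chat_request(prompt: str, api_key: str, model: str = "deepseek-chat",
                       **options) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,  # verify the current model ID before shipping
        "messages": [{"role": "user", "content": prompt}],
        **options,  # optional fields such as temperature or max_tokens
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request (a network call) and return the first choice's text."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Keeping the build step separate from the send step makes the payload easy to inspect or log before any tokens are spent.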

DeepSeek with the Python openai client

Because the API is OpenAI-compatible, the official openai Python package talks to DeepSeek with two overrides — base_url and api_key. No DeepSeek-specific library required.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek-chat",          # verify current model ID
    messages=[
        {"role": "user", "content": "Give me three uses for a cheap, capable LLM."}
    ],
)

print(resp.choices[0].message.content)

Streaming works identically — pass stream=True and iterate the chunks. One reasoning-model wrinkle: deepseek-reasoner exposes the model's chain of thought separately from the final answer (commonly on a reasoning_content field), so you can log the thinking and the conclusion independently. Check the reasoning-model docs for exact field names and how thinking vs. non-thinking modes are toggled, as that shifts between generations.
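Here is a rough sketch of separating the thinking from the answer while streaming. It assumes OpenAI-style streaming chunks and that the reasoning text arrives on delta.reasoning_content, which is the commonly seen field name but should be confirmed in the current docs. The function name collect_stream is illustrative.

```python
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, str]:
    """Accumulate a streamed response into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        # Assumed field name; getattr keeps this working when it is absent.
        thinking = getattr(delta, "reasoning_content", None)
        if thinking:
            reasoning_parts.append(thinking)
        text = getattr(delta, "content", None)
        if text:
            answer_parts.append(text)
    return "".join(reasoning_parts), "".join(answer_parts)
```

With the client from the Python example above, passing the result of client.chat.completions.create(..., stream=True) to collect_stream would yield the two strings, ready to log separately.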

What the DeepSeek API is known for

DeepSeek's reputation rests on one combination that's hard to find together: strong quality at very low cost. Lean into it where that trade matters.

DeepSeek API pricing, conceptually

DeepSeek bills the usual way — per token, input and output priced separately, with the reasoning model costing more in practice because it emits extra reasoning tokens. It lands in nearly every "cheapest model" conversation because those per-token rates sit at the low end of the market for the quality you get. For exact numbers, defer to DeepSeek's official pricing page rather than any figure quoted in a third-party article (including this one).
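To make the billing model concrete, here is a small arithmetic sketch. The prices are placeholders chosen for easy math, not DeepSeek's real rates, and the function name estimate_cost_usd is illustrative. It also assumes reasoning tokens are billed at the output rate, so pass them separately only if your output count excludes them.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float,
                      reasoning_tokens: int = 0) -> float:
    """Estimate one request's cost from token counts and per-million prices.

    Assumption: reasoning tokens are billed at the output rate.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Placeholder prices, NOT real DeepSeek rates: $1 in / $2 out per million tokens.
chat = estimate_cost_usd(2_000, 500, 1.0, 2.0)
reasoner = estimate_cost_usd(2_000, 500, 1.0, 2.0, reasoning_tokens=3_000)
print(f"chat: ${chat:.4f}  reasoner: ${reasoner:.4f}")
```

Same prompt, same visible answer, but the reasoning call costs three times as much in this made-up case, purely because of the extra tokens it emits.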

Two distinctive cost levers are worth understanding:

For where DeepSeek sits among the budget options and how to model true cost, see our roundup of the cheapest LLM API.

A note on data governance

This part is neutral but important. As with any external API, read the provider's data-handling terms — retention, training use, and processing location — and check them against your own obligations. Teams with stricter data-governance or data-residency requirements sometimes prefer not to send data to a given first-party endpoint directly. The pragmatic path is to run a DeepSeek-class model via a provider or region that meets your policy: the open models are hosted by third-party inference providers (for example DeepInfra or OpenRouter) and can also be self-hosted on infrastructure you control. The point isn't to discourage DeepSeek — it's that "which model" and "whose endpoint" are separable decisions, and a gateway makes that separation easy to manage.
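One way to picture the "whose endpoint" decision as its own piece of configuration is sketched below. Everything here is illustrative: the Endpoint class, the pick_endpoint function, the INTERNAL_LLM_KEY variable, and the second URL are made-up placeholders, and real policy checks would be richer than a single boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    base_url: str
    api_key_env: str  # name of the env var holding the key, never the key itself


# The second entry is a hypothetical self-hosted or third-party deployment.
FIRST_PARTY = Endpoint("deepseek", "https://api.deepseek.com", "DEEPSEEK_API_KEY")
COMPLIANT_HOST = Endpoint("compliant-host", "https://llm.internal.example/v1",
                          "INTERNAL_LLM_KEY")


def pick_endpoint(contains_sensitive_data: bool) -> Endpoint:
    """Choose whose endpoint serves a request; the model choice is made elsewhere."""
    return COMPLIANT_HOST if contains_sensitive_data else FIRST_PARTY
```

Because both endpoints speak the same OpenAI-style protocol, the calling code only needs a different base URL and key per request.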

Routing DeepSeek for cheap bulk work, with fallback

Here's how DeepSeek tends to slot into a real stack: it's the cheap, capable default for the bulk of your traffic, with something else on standby for the cases where it isn't the right fit: a provider outage, a task that wants a specific frontier model, or a policy that routes certain data elsewhere. Hard-wiring your app to a single DeepSeek endpoint makes its availability your availability.

The clean pattern is to put DeepSeek behind an LLM gateway. Your code calls one stable endpoint; the gateway routes everyday and high-volume requests to a DeepSeek model for the cost win, and falls back automatically to another provider when DeepSeek is unavailable or when a request needs a different model. You get DeepSeek's economics as the cheap path plus a safety net for the tail, without scattering provider-specific retry logic across your services. It's also the natural seam for the data-governance split above: route most traffic to DeepSeek, send the sensitive slice to a compliant DeepSeek-class host or your own deployment.
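The fallback idea can be sketched in a few lines. This illustrates the pattern only, not how flo2 or any particular gateway implements it. The names Provider and complete_with_fallback are hypothetical, and each provider is represented as a plain function so the sketch stays independent of any SDK.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (name, function that answers a prompt)


def complete_with_fallback(prompt: str,
                           providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Listing the DeepSeek-backed function first makes it the cheap default. Later entries are only reached when an earlier call raises, which keeps retry logic in one place instead of spread across services.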

flo2 is a developer-first, bring-your-own-key LLM gateway built for exactly this. You add your own provider keys — DeepInfra, OpenRouter, OpenAI, Anthropic, Gemini, Groq, Cerebras, Mistral, xAI — and pay each provider directly with zero markup on tokens; flo2 takes no per-token cut, which makes it a genuine zero-markup OpenRouter alternative. One OpenAI- and Anthropic-compatible key routes each request to the cheapest or fastest model and falls back automatically when a provider is down, so a DeepSeek-class model can be your low-cost default without becoming a single point of failure. It's free during Beta, so you can wire DeepSeek in behind a fallback and start measuring today.
