2026-06-03 · flo2 blog

Use Groq with the OpenAI SDK: Base URL & Compatible API

If you already have code built on the OpenAI SDK, you can talk to Groq without rewriting any of it. Groq is OpenAI-compatible: it exposes the same Chat Completions wire format at the base URL https://api.groq.com/openai/v1, so the official openai client speaks to Groq's LPU-backed models the moment you change three values — the base URL, the API key, and the model name. This guide gives you the exact curl, Python, and Node calls, spells out what Groq's compatibility layer covers (and where to double-check the docs), and shows how to migrate an existing OpenAI app over in a single sitting.

Where Groq's OpenAI-compatible endpoint lives

Groq mounts its OpenAI-compatible routes under an /openai/v1 prefix — note that it's not a bare /v1 like some other providers. The base URL you hand to any OpenAI client is:

https://api.groq.com/openai/v1

From there you get the familiar paths, the most important being /openai/v1/chat/completions. You authenticate with a Groq API key (the ones that start with gsk_), created in the GroqCloud console, passed as a standard bearer token. Two details save a lot of 404 and 401 debugging up front:

The path segment is /openai/v1. If you drop /openai and point a client at https://api.groq.com/v1, requests miss the compatibility routes entirely and fail with a not-found style error instead of reaching a model.
Your OpenAI key won't work here, and your Groq key won't work against OpenAI. Each provider validates its own keys, so swap the credential when you swap the base URL.
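Both pitfalls can be caught before a request ever leaves your machine. The sketch below is one way to do that; the helper names (chat_completions_url, looks_like_groq_key) are made up for this post, and the gsk_ prefix test is only a sanity check based on the key format described above, not a validation rule published by Groq.

```python
GROQ_BASE_URL = "https://api.groq.com/openai/v1"


def chat_completions_url(base_url: str = GROQ_BASE_URL) -> str:
    """Join a base URL and the chat completions path, tolerating a trailing slash."""
    base = base_url.rstrip("/")
    if "api.groq.com" in base and not base.endswith("/openai/v1"):
        # Catches the bare /v1 mistake before it turns into a failed request.
        raise ValueError(f"Groq base URL should end with /openai/v1, got {base!r}")
    return base + "/chat/completions"


def looks_like_groq_key(api_key: str) -> bool:
    """Cheap sanity check only: Groq keys start with gsk_."""
    return api_key.startswith("gsk_")


print(chat_completions_url())
```

Running a check like this at startup turns a confusing runtime failure into an immediate, readable configuration error.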

A curl call to Groq /chat/completions

The fastest way to confirm your key and base URL are correct is a direct request to the Groq /chat/completions endpoint. Export your key first, then:

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'

The response is the standard OpenAI shape: a choices array with message.content, a finish_reason, and a usage object reporting prompt_tokens, completion_tokens, and total_tokens. The model string above is an example — Groq's catalog of hosted models changes over time, so check Groq's current model list in their docs and use a model ID that's live for your account rather than hard-coding one you read in a blog post.
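To make that shape concrete, here is a minimal sketch that pulls the usual fields out of a response body. The sample JSON is hand-written for illustration (the values are placeholders, not real Groq output), and summarize is a hypothetical helper name.

```python
import json

# Hand-written sample in the standard Chat Completions shape.
# The values are placeholders, not real Groq output.
SAMPLE_RESPONSE = """
{
  "object": "chat.completion",
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello there."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 4, "total_tokens": 24}
}
"""


def summarize(raw: str) -> dict:
    """Pull out the fields most callers need from a chat completion body."""
    body = json.loads(raw)
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


print(summarize(SAMPLE_RESPONSE))
```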

Use Groq with the OpenAI Python library

This is the payoff of compatibility. To use Groq with the openai library, you change the constructor arguments and the model string — nothing about the call site moves.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],  # your gsk_... key, read from the environment
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Explain what an LPU is in one sentence."},
    ],
)

print(resp.choices[0].message.content)
print(resp.usage)  # prompt_tokens / completion_tokens / total_tokens

That's the entire change. Frameworks that let you override the OpenAI base URL (LangChain, LlamaIndex, the Vercel AI SDK, instructor, your own internal wrapper) can generally target Groq the same way, provided they send Chat Completions requests under the hood rather than calling an OpenAI-only endpoint.

Streaming responses

Streaming behaves exactly as it does against OpenAI: set stream=True and iterate the chunks. With Groq's high tokens-per-second throughput this is where the speed is most visible — deltas arrive fast.

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Count from 1 to 10."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

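The same server-sent-events protocol is used over the wire (data: chunks terminated by data: [DONE]), so a streaming client written for OpenAI parses Groq's stream without changes. If you bill on tokens, request usage in the stream with stream_options={"include_usage": True} and confirm the final chunk carries the usage block. In OpenAI's protocol that usage chunk arrives with an empty choices list, so once you turn the option on, guard the chunk.choices[0] access in the loop above (for example, skip any chunk whose choices list is empty).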
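If you are not using the SDK, the wire format is simple enough to parse by hand. The following is a rough sketch using only the standard library; the sample lines are hand-written rather than captured from Groq, the helper names are hypothetical, and the parser assumes one JSON object per data: line. It also tolerates a usage-only chunk, on the assumption that Groq mirrors OpenAI's behavior there.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Return (full_text, usage); usage stays None if the stream never sends it."""
    parts, usage = [], None
    for chunk in iter_sse_chunks(lines):
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):  # may be empty on a usage-only chunk
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), usage


# Hand-written sample stream, not real Groq output.
SAMPLE_STREAM = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "1, 2"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", 3"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 5, "total_tokens": 14}}',
    "",
    "data: [DONE]",
]

text, usage = collect(SAMPLE_STREAM)
print(text, usage)
```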

The Groq base URL in Node

The JavaScript/TypeScript SDK follows the identical pattern — set baseURL and apiKey:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.groq.com/openai/v1",
  apiKey: process.env.GROQ_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Write a haiku about fast inference." }],
});

console.log(resp.choices[0].message.content);

What Groq's compatibility layer supports

Groq's OpenAI-compatible surface covers the common path well, but "compatible" is a contract for the core request, not a guarantee that every optional OpenAI field behaves identically. As a practical map:

Chat completions — well supported, including system/user/assistant roles and multi-turn conversations.
Streaming — supported via stream=True, with optional usage accounting in the final chunk.
Common parameters — temperature, top_p, max_tokens, stop, and seed are handled.
Tools / function calling — available on models that support it, using the standard tools and tool_choice parameters. Support is model-dependent, so confirm a given model handles tools before you rely on it.
JSON / structured output — response_format for JSON output is supported on capable models; strict JSON-schema modes vary by model.

Because Groq adds and retires models and tunes feature coverage on its own cadence, treat the list above as a starting point and verify against Groq's current documentation for the model you intend to use — especially for tools, structured output, and any per-model parameter limits. Don't trust a model ID or a feature claim copied from elsewhere; check what's live for your account.
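For the tools entry in particular, it helps to see what the request and the reply look like. This is a minimal sketch in the standard OpenAI tools format: get_weather is a hypothetical tool, the assistant message is hand-written rather than real model output, and whether a given Groq model actually emits tool calls is something to verify per model.

```python
import json

# Hypothetical tool definition in the standard OpenAI "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def build_request(model: str, user_text: str) -> dict:
    """Assemble a Chat Completions body that offers the tool to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": TOOLS,
        "tool_choice": "auto",
    }


def extract_tool_calls(message: dict) -> list:
    """Return (name, parsed_arguments) pairs; arguments arrive as a JSON string."""
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


# Hand-written assistant message, not real model output.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"},
        }
    ],
}

print(extract_tool_calls(sample_message))
```

Parsing the arguments string inside a try/except is a sensible next step in real code, since a model can return malformed JSON.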

Parity caveats worth knowing

A few OpenAI features either don't map cleanly or only exist on the OpenAI side. These are the ones that most commonly surprise people moving an app to Groq:

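OpenAI-only endpoints. The Assistants API and image generation are OpenAI products, not part of Groq's Chat Completions surface. Audio is a partial exception: Groq serves its own speech and transcription routes under the same /openai/v1 prefix, with their own models and limits, so treat them as a separate integration from chat. Coverage of newer OpenAI surfaces such as the Responses API has shifted over time, so check Groq's docs before assuming an endpoint is missing or present.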
Unsupported parameters. Fields OpenAI accepts but a Groq model doesn't may be ignored or rejected. logprobs, n > 1, and penalty parameters are common gaps — test the ones your code sends.
Rate limits differ. Groq enforces its own requests-per-minute and tokens-per-minute limits, and a burst that was fine on OpenAI can return HTTP 429 here. Read the Retry-After header and back off, or front Groq with fallback (below).
Model names won't match. There is no gpt-4o on Groq. You must pass a Groq model ID, which means routing logic keyed on OpenAI model names needs updating.
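The rate-limit caveat is the one most worth handling in code. Below is a rough sketch of backoff that honors Retry-After when the server sends it in seconds and falls back to exponential backoff with jitter otherwise. RateLimited is a hypothetical stand-in for whatever 429 error your HTTP client raises, and the base and cap values are arbitrary defaults, not Groq recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the 429 error your HTTP client raises."""

    def __init__(self, headers=None):
        super().__init__("HTTP 429")
        self.headers = headers or {}


def retry_delay(headers: dict, attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Prefer the server's Retry-After (seconds); else exponential backoff with jitter."""
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, min(float(value), cap))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall through to computed backoff
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)


def call_with_backoff(fn, max_attempts: int = 4, sleep=time.sleep):
    """Call fn(), retrying on RateLimited up to max_attempts times in total."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(retry_delay(err.headers, attempt))
```

The openai SDK also retries some failures on its own via the max_retries client option, so check what it already covers before layering a wrapper like this on top.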

Migrate an existing OpenAI app to Groq

If your app already uses the OpenAI SDK, the migration is genuinely three values. You do not touch your messages construction, your streaming loop, or your response parsing:

base_url → https://api.groq.com/openai/v1
api_key → your gsk_... Groq key
model → a current Groq model ID instead of an OpenAI one

The cleanest way to do it is to read all three from environment variables so you can flip between OpenAI and Groq without a code change:

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["LLM_BASE_URL"],   # e.g. https://api.groq.com/openai/v1
    api_key=os.environ["LLM_API_KEY"],     # e.g. gsk_...
)

resp = client.chat.completions.create(
    model=os.environ["LLM_MODEL"],         # e.g. llama-3.3-70b-versatile
    messages=[{"role": "user", "content": "Summarize this changelog."}],
)

After the swap, run your test suite and watch for the parity items above — primarily any unsupported parameters you were passing and any code branching on OpenAI model names. Everything that only relies on the core chat-completion path should pass untouched. If you want the conceptual background on why this drop-in swap works across so many vendors, see OpenAI-compatible API.
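One way to handle those two parity items in a single place is a small adapter that runs just before the request is sent. This is a sketch under assumptions: adapt_for_groq is a hypothetical helper, the model mapping is a placeholder rather than a claim that the models are equivalent, and the dropped parameters are simply the common gaps named earlier, so adjust the set after testing your own calls.

```python
# Placeholder mapping: choose Groq model IDs that are live for your account.
MODEL_MAP = {
    "gpt-4o": "llama-3.3-70b-versatile",
}

# Parameters named earlier as common gaps; adjust after testing your own calls.
DROP_PARAMS = {"logprobs", "presence_penalty", "frequency_penalty"}


def adapt_for_groq(kwargs: dict) -> tuple:
    """Return (adapted_kwargs, dropped_names) without mutating the input."""
    adapted, dropped = {}, []
    for key, value in kwargs.items():
        if key in DROP_PARAMS or (key == "n" and value != 1):
            dropped.append(key)  # remember what was removed so the caller can log it
            continue
        adapted[key] = value
    # Unknown model names pass through unchanged.
    adapted["model"] = MODEL_MAP.get(kwargs["model"], kwargs["model"])
    return adapted, dropped


adapted, dropped = adapt_for_groq(
    {"model": "gpt-4o", "messages": [], "n": 2, "logprobs": True, "temperature": 0.2}
)
print(adapted, dropped)
```

Silently dropping a parameter changes behavior, so logging the dropped names (or raising instead) is usually the safer choice while you are still migrating.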

Keep flexibility: put Groq behind a gateway

Pointing your SDK straight at Groq is the right first step. But hard-coding one provider's base URL trades one lock-in for another: when Groq rate-limits you, retires a model, or you simply want to compare it against another vendor, you're back to editing application code and juggling keys. A gateway removes that by giving you one endpoint with many providers behind it. You can read the bigger picture in what is an LLM gateway.
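To see why this is tedious to maintain in application code, here is roughly what a hand-rolled fallback chain looks like. It is a client-side illustration of the idea only, not how any particular gateway implements it; first_success and the two stand-in callables are hypothetical, and a real version would wrap clients that each carry their own base URL and key.

```python
from typing import Callable, Sequence, Tuple


def first_success(providers: Sequence[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each (name, call) in order; return (name, result) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as err:  # a real version would catch only retryable errors
            errors.append(f"{name}: {err}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in callables that simulate one failing and one healthy provider.
def groq_call() -> str:
    raise TimeoutError("simulated rate limit")


def backup_call() -> str:
    return "summary from backup"


print(first_success([("groq", groq_call), ("backup", backup_call)]))
```

Every provider you add means another key, another model name, and another set of error types to classify, which is the bookkeeping a gateway is meant to absorb.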

That's the job flo2 does. flo2 is a developer-first LLM gateway with zero token markup: you bring your own provider keys — Groq, plus OpenAI, Anthropic, Gemini, Cerebras, DeepInfra, Mistral, xAI, and OpenRouter — and pay each provider directly. A single key, usable through both an OpenAI-compatible and an Anthropic-compatible API, routes each request to the cheapest or fastest model that fits, with fallback chains so a Groq 429 or outage automatically rolls over to another provider instead of failing your user.

from openai import OpenAI

client = OpenAI(
    base_url="https://flo2.com/v1",
    api_key="your_flo2_key",
)

resp = client.chat.completions.create(
    model="auto",  # or name a Groq model; flo2 routes and falls back for you
    messages=[{"role": "user", "content": "Summarize this changelog."}],
)

Because flo2 exposes the same OpenAI-compatible surface you just learned, the migration is once again a base_url + api_key change — except now you get Groq's speed when it's the best fit, automatic fallback when it isn't, plus smart routing, AI racing, A/B testing with an LLM judge to measure model–task fit, opt-in response caching, and true per-call cost accounting across every provider in one place. As the zero-markup OpenRouter alternative, it's free during Beta. Get Groq's throughput today, and keep the freedom to route elsewhere tomorrow, with flo2.
