2026-06-03 · flo2 blog

OpenAI Responses API Explained (vs Chat Completions)

The OpenAI Responses API is OpenAI's newer, purpose-built endpoint for agentic and multi-step workloads. If you have been using Chat Completions since the GPT-3.5 era, you will find the Responses API familiar in spirit but meaningfully different in shape: it was designed from the ground up to handle tool use, multi-turn state, and built-in capabilities that Chat Completions supports only as optional add-ons. Understanding when to reach for each one, and how a gateway can accept both shapes without forcing you to choose, will save you from an expensive rewrite later.

What Is the OpenAI Responses API?

OpenAI introduced the Responses API (served at /v1/responses) as the successor to Chat Completions for workloads that go beyond a single stateless exchange. Where Chat Completions treats every request as a fresh conversation that you reconstruct by re-sending the full message history, the Responses API is built around response objects and typed output items. That makes it easier for the model to call tools, iterate over steps, and produce structured results that the caller can inspect and continue from.

A few things are different by design: typed output items, server-side conversation state, and built-in hosted tools.

Responses API vs Chat Completions: Key Differences

It helps to see the two endpoint shapes side-by-side rather than in the abstract.

Chat Completions (/v1/chat/completions)

The Chat Completions request you already know:

POST /v1/chat/completions
{
  "model": "gpt-4o",
  "messages": [
    { "role": "system",  "content": "You are a helpful assistant." },
    { "role": "user",    "content": "What is the capital of France?" }
  ]
}

The answer comes back in choices[0].message.content. If you want conversational context, you are responsible for re-sending the entire message history, including earlier assistant replies, on every request.
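As a minimal sketch of what carrying the history yourself involves, the Python below builds two consecutive Chat Completions payloads. The helper names build_chat_request and record_reply are hypothetical, and fake_response is a hand-written stand-in for an API reply, so nothing here touches the network.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_chat_request(model: str, history: List[Message], user_text: str) -> dict:
    """Return a Chat Completions payload that carries the whole conversation."""
    return {
        "model": model,
        "messages": history + [{"role": "user", "content": user_text}],
    }


def record_reply(history: List[Message], user_text: str, response: dict) -> List[Message]:
    """Append the user turn and the assistant reply so the next request can resend them."""
    reply = response["choices"][0]["message"]["content"]
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": reply},
    ]


history = [{"role": "system", "content": "You are a helpful assistant."}]
first = build_chat_request("gpt-4o", history, "What is the capital of France?")

# Hand-written stand-in for the API response (assumption, not real output).
fake_response = {"choices": [{"message": {"role": "assistant", "content": "Paris."}}]}
history = record_reply(history, "What is the capital of France?", fake_response)

# The second request resends every earlier turn plus the new question.
second = build_chat_request("gpt-4o", history, "And what is its population?")
```

The payload grows by two messages per turn, which is the cost that server-side state is meant to remove.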

Responses API (/v1/responses)

A minimal Responses API call looks like this:

POST /v1/responses
Authorization: Bearer <your-openai-key>
Content-Type: application/json

{
  "model": "gpt-4o",
  "instructions": "You are a helpful assistant.",
  "input": "What is the capital of France?",
  "tools": [
    { "type": "web_search_preview" }
  ]
}

The response wraps the answer inside an output array. If the model decides to call web search first, it emits a tool-call item before the final message item, so you see the sequence of steps the model took rather than just the answer:

{
  "id": "resp_abc123",
  "model": "gpt-4o",
  "output": [
    {
      "type": "web_search_call",
      "id": "ws_1",
      "status": "completed"
    },
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "The capital of France is Paris." }
      ]
    }
  ],
  "usage": { "input_tokens": 42, "output_tokens": 9, "total_tokens": 51 }
}
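Because the answer is one item among several, client code has to walk the output array rather than index a fixed path. The sketch below does that for the sample response above. The helper names extract_text and item_types are hypothetical, and the code assumes only the fields shown in that sample.

```python
def extract_text(response: dict) -> str:
    """Concatenate every output_text piece found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool-call items such as web_search_call
        for piece in item.get("content", []):
            if piece.get("type") == "output_text":
                parts.append(piece["text"])
    return "".join(parts)


def item_types(response: dict) -> list:
    """List the item types in order, which shows the steps the model took."""
    return [item.get("type") for item in response.get("output", [])]


# The sample response from above, trimmed to the fields the helpers read.
sample = {
    "id": "resp_abc123",
    "model": "gpt-4o",
    "output": [
        {"type": "web_search_call", "id": "ws_1", "status": "completed"},
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris."}
            ],
        },
    ],
}

answer = extract_text(sample)
steps = item_types(sample)
```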

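To continue the conversation, pass "previous_response_id": "resp_abc123" in your next request. OpenAI reconstructs the thread from that pointer, so your request payload stays small regardless of conversation length. The earlier turns still count toward billed input tokens; only the payload you send shrinks.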
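A follow-up request built this way might look like the sketch below. The helper name follow_up_request is hypothetical, and reusing the model of the previous response is an assumption made for the example, not a requirement stated in this article.

```python
def follow_up_request(previous: dict, user_text: str) -> dict:
    """Build the next Responses API payload from the previous response object."""
    return {
        "model": previous["model"],              # assumption: keep the same model
        "previous_response_id": previous["id"],  # pointer to the stored thread
        "input": user_text,                      # only the new turn is sent
    }


previous = {"id": "resp_abc123", "model": "gpt-4o"}
payload = follow_up_request(previous, "And what is its population?")
```

The payload contains a single new turn no matter how many exchanges preceded it.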

When to Use Responses API vs Chat Completions

Neither endpoint is universally better. The right choice depends on what your application actually does.

Stick with Chat Completions when

Consider the Responses API when

Migration Considerations

If you are moving an existing Chat Completions integration to the Responses API, the main things to adapt are:
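Judging from the two request shapes shown earlier, the request side of that adaptation can be sketched as a small converter: system messages become instructions, and the remaining messages become input. The function name chat_to_responses is hypothetical, and the sketch assumes plain string content only, so tool calls, images, and other message types are out of scope.

```python
def chat_to_responses(chat_request: dict) -> dict:
    """Convert a simple Chat Completions payload into a Responses API payload."""
    system_parts = []
    input_items = []
    for message in chat_request["messages"]:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            input_items.append(
                {"role": message["role"], "content": message["content"]}
            )

    converted = {"model": chat_request["model"], "input": input_items}
    if system_parts:
        # Several system messages are joined; the order is preserved.
        converted["instructions"] = "\n".join(system_parts)
    return converted


chat_request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}
converted = chat_to_responses(chat_request)
```

On the response side, code that reads choices[0].message.content has to read the output array instead, and code that resends history can switch to previous_response_id.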

How an LLM Gateway Bridges Both Shapes

The practical challenge for teams that want to hedge across providers is that Chat Completions is the lingua franca of the ecosystem and the Responses API is OpenAI-only. Every time you want to route a request to Claude or a Groq-hosted model you are back to Chat Completions — or you are writing a translation layer yourself.

A gateway that speaks multiple API shapes removes that constraint. flo2 accepts Chat Completions (/v1/chat/completions), the Responses endpoint, and the Anthropic Messages API, and routes each request to whichever provider and model you configure — cheapest, fastest, or a specific target. You keep one integration in your application code regardless of which endpoint shape you prefer, and you can switch underlying providers without a rewrite.
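To make the idea concrete, here is a rough sketch of how a gateway could reduce the three request shapes to one internal form before routing. This is not flo2's implementation: the normalize function, the SHAPES table, and the internal dictionary layout are all hypothetical, and only simple string content is handled.

```python
SHAPES = {
    "/v1/chat/completions": "chat_completions",
    "/v1/responses": "responses",
    "/v1/messages": "anthropic_messages",
}


def normalize(path: str, body: dict) -> dict:
    """Map an incoming request to a hypothetical shape-independent form."""
    shape = SHAPES.get(path)
    if shape is None:
        raise ValueError(f"unsupported path: {path}")

    if shape == "chat_completions":
        # System prompt lives inside the messages list.
        system = "\n".join(
            m["content"] for m in body["messages"] if m["role"] == "system"
        )
        turns = [m for m in body["messages"] if m["role"] != "system"]
    elif shape == "responses":
        # System prompt is the top-level instructions field.
        system = body.get("instructions", "")
        raw = body["input"]
        turns = [{"role": "user", "content": raw}] if isinstance(raw, str) else list(raw)
    else:
        # Anthropic Messages keeps the system prompt in a top-level system field.
        system = body.get("system", "")
        turns = list(body["messages"])

    return {"model": body["model"], "system": system, "turns": turns}


from_chat = normalize("/v1/chat/completions", {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
})
from_responses = normalize("/v1/responses", {
    "model": "gpt-4o",
    "instructions": "You are a helpful assistant.",
    "input": "What is the capital of France?",
})
```

Once both requests reduce to the same internal form, picking a provider becomes a separate decision from parsing the request, which is what makes switching providers a configuration change.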

That matters especially for teams building agents. You might start with the Responses API and OpenAI's built-in tools, decide later that a Claude model gives better results for your use case, and want to route there without rearchitecting your client. A gateway that understands both surfaces makes that a configuration change rather than a coding sprint.

For more on how gateways normalize API surfaces, see what is an LLM gateway.

Summary

The OpenAI Responses API is a cleaner abstraction for agentic, multi-step, and tool-heavy workloads — typed output items, server-side state, and built-in hosted tools. Chat Completions remains the right choice when you need broad provider compatibility or are running stateless one-shot tasks. The two endpoints are not mutually exclusive in your architecture; a gateway that accepts both shapes lets you use whichever fits the task while keeping the flexibility to route to any model underneath.

Try routing your Responses API and Chat Completions traffic through flo2 — zero token markup, BYO provider keys, free during beta.
