2026-06-03 · flo2 blog

OpenAI Responses API Explained (vs Chat Completions)

The OpenAI Responses API is OpenAI's newer, purpose-built endpoint for agentic and multi-step workloads. If you have been using Chat Completions since the GPT-3.5 era, you will find the Responses API familiar in spirit but meaningfully different in shape: it was designed around tool use, multi-turn state, and built-in capabilities that Chat Completions supports only as add-ons. Understanding when to reach for each one, and how a gateway can accept both shapes without forcing you to choose, will save you from an expensive rewrite later.

What Is the OpenAI Responses API?

OpenAI introduced the Responses API (served at /v1/responses) as the successor to Chat Completions for workloads that go beyond a single stateless exchange. Chat Completions treats every request as a fresh conversation that you reconstruct by resending the full message history. The Responses API instead returns a response object with an id and a list of typed output items, which makes it easier for the model to call tools, iterate over steps, and produce structured results that the caller can inspect and continue from.

A few things are different by design:

Input shape: instead of a flat messages array, the Responses API accepts an input field that can contain richer content types — text, images, files, and tool results — alongside a persistent conversation thread when you pass a previous_response_id.
Output shape: the response wraps model output inside an output array of typed items (text, tool calls, reasoning steps) rather than choices[].message. Each item has a type field so you can pattern-match without parsing free-form strings.
Built-in tools: web search, file search, and code execution are available as first-class hosted tools — you declare them in a tools array and the model calls them internally without you wiring up a function-call loop. Check the current OpenAI docs for the exact tool list, as it evolves quickly.
State continuity: pass the id from one response as previous_response_id in the next call and OpenAI handles reconstructing the thread server-side. You do not resend the full history on every turn.
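
To make the input shape concrete, here is a minimal Python sketch that builds a Responses request body mixing text and an image. The helper name build_multimodal_request and the example URL are hypothetical, and the code only assembles a dictionary rather than calling the API. The input_text and input_image part types reflect OpenAI's documented shapes at the time of writing, so check the current reference before relying on them.

```python
import json


def build_multimodal_request(model, question, image_url):
    """Build a /v1/responses body with one text part and one image part.

    Hypothetical helper for illustration only; nothing is sent over the network.
    """
    return {
        "model": model,
        "input": [
            {
                "role": "user",
                "content": [
                    # Each content part carries its own type tag.
                    {"type": "input_text", "text": question},
                    {"type": "input_image", "image_url": image_url},
                ],
            }
        ],
    }


request_body = build_multimodal_request(
    "gpt-4o", "What is in this picture?", "https://example.com/cat.png"
)
print(json.dumps(request_body, indent=2))
```

For a text-only request you can skip the array entirely and pass a plain string as input, as the minimal request later in this post does.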

Responses API vs Chat Completions: Key Differences

It helps to see the two endpoint shapes side-by-side rather than in the abstract.

Chat Completions (/v1/chat/completions)

The Chat Completions request you already know:

POST /v1/chat/completions
{
  "model": "gpt-4o",
  "messages": [
    { "role": "system",  "content": "You are a helpful assistant." },
    { "role": "user",    "content": "What is the capital of France?" }
  ]
}

The response comes back under choices[0].message.content. If you want conversational context, you are responsible for resending the entire message history, including earlier assistant replies, on every request.
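
As a rough illustration of that bookkeeping, the Python sketch below keeps a client-side history list and rebuilds the request body for each turn. The helper names add_turn and build_chat_request are hypothetical, and the assistant reply is hard-coded where a real client would copy it from choices[0].message.content.

```python
def add_turn(history, role, content):
    """Return a new history list with one more message appended."""
    return history + [{"role": role, "content": content}]


def build_chat_request(model, history):
    """Build a /v1/chat/completions body that carries the whole history."""
    return {"model": model, "messages": list(history)}


history = [{"role": "system", "content": "You are a helpful assistant."}]
history = add_turn(history, "user", "What is the capital of France?")
# After the first call returns, the assistant reply must be stored too.
history = add_turn(history, "assistant", "The capital of France is Paris.")
history = add_turn(history, "user", "What is its population?")

second_request = build_chat_request("gpt-4o", history)
print(len(second_request["messages"]))  # 4
```

The request body grows by two messages per exchange, which is the cost the Responses API's server-side state is meant to remove.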

Responses API (/v1/responses)

A minimal Responses API call looks like this:

POST /v1/responses
Authorization: Bearer <your-openai-key>
Content-Type: application/json

{
  "model": "gpt-4o",
  "instructions": "You are a helpful assistant.",
  "input": "What is the capital of France?",
  "tools": [
    { "type": "web_search_preview" }
  ]
}

The response wraps the answer inside an output array. If the model decides to call web search first, it emits a tool-call item before the final message item, so you see the sequence of steps the model took rather than just the answer:

{
  "id": "resp_abc123",
  "model": "gpt-4o",
  "output": [
    {
      "type": "web_search_call",
      "id": "ws_1",
      "status": "completed"
    },
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "The capital of France is Paris." }
      ]
    }
  ],
  "usage": { "input_tokens": 42, "output_tokens": 9, "total_tokens": 51 }
}
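
One way to consume a response like the one above is to walk the output array and dispatch on each item's type. The Python sketch below is illustrative only: extract_text and tool_call_types are hypothetical helpers, and treating every type that ends in "_call" as a tool call is an assumption based on the naming seen in this example, not a documented rule.

```python
def extract_text(response):
    """Concatenate the text of every output_text part inside message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls and any other non-message items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


def tool_call_types(response):
    """List the types of items that look like tool calls, in emitted order.

    Assumption: tool-call item types end in "_call".
    """
    return [
        item["type"]
        for item in response.get("output", [])
        if item.get("type", "").endswith("_call")
    ]


response = {
    "id": "resp_abc123",
    "model": "gpt-4o",
    "output": [
        {"type": "web_search_call", "id": "ws_1", "status": "completed"},
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris."}
            ],
        },
    ],
}

print(extract_text(response))     # The capital of France is Paris.
print(tool_call_types(response))  # ['web_search_call']
```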

To continue the conversation, pass "previous_response_id": "resp_abc123" in your next request. OpenAI reconstructs the thread from that pointer — your payload stays small regardless of conversation length.
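
A follow-up request can be sketched in Python as below. The helper build_follow_up is hypothetical and only builds the request body. It resends instructions on the assumption that they are not carried over between linked responses, which is what OpenAI's reference states at the time of writing. The approach also assumes the earlier response was stored on OpenAI's side. Verify both against the current docs.

```python
def build_follow_up(previous_response, user_text, instructions=None):
    """Build a /v1/responses body that continues an earlier response.

    Hypothetical helper; it reads only the id and model of the prior response.
    """
    body = {
        "model": previous_response["model"],
        "previous_response_id": previous_response["id"],
        "input": user_text,  # only the new turn, not the full history
    }
    # Assumption: instructions must be resent because they are not
    # inherited from the previous response.
    if instructions is not None:
        body["instructions"] = instructions
    return body


first_response = {"id": "resp_abc123", "model": "gpt-4o"}
follow_up = build_follow_up(
    first_response,
    "What is its population?",
    instructions="You are a helpful assistant.",
)
print(follow_up)
```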

When to Use Responses API vs Chat Completions

Neither endpoint is universally better. The right choice depends on what your application actually does.

Stick with Chat Completions when

You have an existing integration and no pain point that the Responses API solves — the migration cost is real.
You need the broadest compatibility with non-OpenAI providers. The OpenAI-compatible API surface that Groq, DeepInfra, Mistral, and others implement targets Chat Completions, not Responses.
Your workload is stateless one-shot completions: summarization, classification, single-turn Q&A. Chat Completions is simpler and has less overhead for these.
You are routing across providers and want a common request shape. Most third-party models do not expose a Responses-compatible endpoint yet.

Consider the Responses API when

You are building an agent that loops over tool calls across multiple turns and you want OpenAI to manage state rather than you stitching history together.
You want to use built-in hosted tools (web search, file search, code interpreter) without implementing a function-call orchestration loop yourself.
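You need structured, typed output items, so that text, tool calls, and reasoning arrive as separate entries you can dispatch on instead of checking choices[0].message for content and tool_calls separately.
You are building on OpenAI exclusively and long conversation threads are making your Chat Completions payloads large. Note that previous_response_id shrinks the request body, but the earlier turns are still billed as input tokens, so it is a convenience rather than a cost saving.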

Migration Considerations

If you are moving an existing Chat Completions integration to the Responses API, the main things to adapt are:

Message history management: delete your history-reconstruction logic if you use previous_response_id. If you keep managing history yourself, pass the full thread as the input array.
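Response parsing: update your output handler from choices[0].message.content to iterating output items, filtering by type === "message", and reading the output_text parts inside each item's content.
Tool definitions: function-calling tools defined for Chat Completions use the nested shape { "type": "function", "function": {...} }. The Responses API flattens this and puts name, description, and parameters next to type: { "type": "function", "name": "...", "description": "...", "parameters": {...} }. Check the current OpenAI reference, since the two shapes are similar but not identical.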
Streaming: the Responses API streams via server-sent events like Chat Completions does, but the event types differ. Your SSE parser will need updating.
Provider lock-in: the Responses API is OpenAI-specific. If you might route to Anthropic or a third-party model tomorrow, that becomes harder once you have committed to Responses-only output parsing.
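
For the tool-definition step, the conversion can be sketched in Python as follows. The function chat_tool_to_responses_tool and the get_weather tool are hypothetical, and the sketch assumes the only structural difference is that the Responses API lifts the fields of the nested function object to the top level. Confirm the field list against the current OpenAI reference.

```python
def chat_tool_to_responses_tool(chat_tool):
    """Convert a Chat Completions function tool to the flat Responses shape."""
    if chat_tool.get("type") != "function":
        raise ValueError("only function tools are handled in this sketch")
    fn = chat_tool["function"]
    converted = {"type": "function", "name": fn["name"]}
    # Copy optional fields only when present, so nothing is invented.
    for key in ("description", "parameters", "strict"):
        if key in fn:
            converted[key] = fn[key]
    return converted


chat_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

responses_tool = chat_tool_to_responses_tool(chat_tool)
print(responses_tool["name"])        # get_weather
print("function" in responses_tool)  # False
```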

How an LLM Gateway Bridges Both Shapes

The practical challenge for teams that want to hedge across providers is that Chat Completions is the lingua franca of the ecosystem and the Responses API is OpenAI-only. Every time you want to route a request to Claude or a Groq-hosted model you are back to Chat Completions — or you are writing a translation layer yourself.

A gateway that speaks multiple API shapes removes that constraint. flo2 accepts Chat Completions (/v1/chat/completions), the Responses endpoint, and the Anthropic Messages API, and routes each request to whichever provider and model you configure — cheapest, fastest, or a specific target. You keep one integration in your application code regardless of which endpoint shape you prefer, and you can switch underlying providers without a rewrite.
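
To give a feel for what such a translation layer involves, here is a deliberately small Python sketch that maps a Chat Completions request body onto a Responses request body. It is not flo2's implementation. The function name is hypothetical, it handles only plain-text system, user, and assistant messages, and folding system messages into instructions is one possible design choice among several.

```python
def chat_request_to_responses_request(chat_body):
    """Translate a simple Chat Completions body into a Responses body."""
    system_parts = []
    input_items = []
    for message in chat_body["messages"]:
        role = message["role"]
        if role == "system":
            system_parts.append(message["content"])
        elif role in ("user", "assistant"):
            input_items.append({"role": role, "content": message["content"]})
        else:
            # Tool messages and other roles need real mapping logic.
            raise ValueError(f"unsupported role in this sketch: {role}")

    body = {"model": chat_body["model"], "input": input_items}
    if system_parts:
        body["instructions"] = "\n\n".join(system_parts)
    # The output token limit has a different name on each endpoint.
    if "max_tokens" in chat_body:
        body["max_output_tokens"] = chat_body["max_tokens"]
    return body


chat_body = {
    "model": "gpt-4o",
    "max_tokens": 100,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

print(chat_request_to_responses_request(chat_body))
```

A production gateway also has to translate tool calls, streaming events, and the response body in the other direction, which is where most of the real work sits.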

That matters especially for teams building agents. You might start with the Responses API and OpenAI's built-in tools, decide later that a Claude model gives better results for your use case, and want to route there without rearchitecting your client. A gateway that understands both surfaces makes that a configuration change rather than a coding sprint.

For more on how gateways normalize API surfaces, see what is an LLM gateway.

Summary

The OpenAI Responses API is a cleaner abstraction for agentic, multi-step, and tool-heavy workloads — typed output items, server-side state, and built-in hosted tools. Chat Completions remains the right choice when you need broad provider compatibility or are running stateless one-shot tasks. The two endpoints are not mutually exclusive in your architecture; a gateway that accepts both shapes lets you use whichever fits the task while keeping the flexibility to route to any model underneath.

Try routing your Responses API and Chat Completions traffic through flo2 — zero token markup, BYO provider keys, free during beta.
