2026-06-03 · flo2 blog

Anthropic API Guide: Claude, Messages API & First Request

If you want to build with Claude, you go through the Anthropic API. It is Anthropic's first-party way to call Claude models programmatically, with no scraping or unofficial wrappers required. This guide walks through what you need to make your first real request: getting an Anthropic API key, understanding the authentication headers, calling the Anthropic Messages API endpoint with both curl and the Claude API Python SDK, and grasping the concepts that make Anthropic's API different from OpenAI's Chat Completions format. Along the way you'll see what prompt caching, content blocks, tool use, and streaming look like in practice, and how a gateway that speaks the Messages API natively gives you flexibility without forcing you to rewrite your code.

Getting an Anthropic API key

Everything starts at console.anthropic.com. Create an account, verify your email, and navigate to API Keys in the left sidebar. Click Create Key, give it a descriptive name, and copy the value immediately — Anthropic only shows it once.

Store it as an environment variable. Never hardcode it in source files or commit it to version control.

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

Your key controls billing and rate limits. If it leaks, rotate it immediately from the console. Anthropic has offered trial credits so that new accounts can experiment before adding a payment method, but availability and limits change, so check what the console currently shows for your account.

Authentication: headers you must send

Every request to the Anthropic API needs two headers:

x-api-key: your API key (OpenAI uses Authorization: Bearer instead, so existing OpenAI client code will not authenticate as-is)
anthropic-version: a date string like 2023-06-01 that pins the API contract. Anthropic uses it to introduce breaking changes without disrupting existing callers. Always send the latest stable version listed in the docs.

You also need Content-Type: application/json for any request that sends a body.
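As a minimal sketch of the header rules above, the helper below builds the header set for a request. The function name build_headers is made up for this post, and the default version string is simply the one used in this guide; check the docs for the current value.

```python
def build_headers(api_key: str, version: str = "2023-06-01") -> dict:
    """Return the headers a Messages API request with a JSON body needs."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "x-api-key": api_key,          # Anthropic does not use "Authorization: Bearer"
        "anthropic-version": version,  # pins the API contract
        "content-type": "application/json",
    }


# Placeholder key for illustration; load the real one from the environment.
headers = build_headers("sk-ant-your-key-here")
```

In real code you would pass os.environ["ANTHROPIC_API_KEY"] rather than a literal string.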

The Messages endpoint: your first curl request

The core endpoint is:

POST https://api.anthropic.com/v1/messages

Here is a minimal working request with curl:

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Explain the difference between a mutex and a semaphore in two sentences."
      }
    ]
  }'

The response is a JSON object. The model's answer lives inside content, which is an array of content blocks — this is the first meaningful difference from OpenAI. A text block looks like:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "A mutex is a lock that only one thread can hold at a time, used to protect a shared resource from concurrent modification..."
    }
  ],
  "model": "claude-opus-4-5",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 28,
    "output_tokens": 61
  }
}
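Because content is an array, reading the answer means walking the blocks rather than grabbing one string. Here is a small sketch that works on the parsed JSON of a response like the one above. The helper name extract_text is ours, not part of any SDK, and the sample text is shortened.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every "text" block in a Messages API response."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# A trimmed copy of the response shown above.
sample_response = {
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "text", "text": "A mutex is a lock that only one thread can hold at a time."}
    ],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 28, "output_tokens": 61},
}

answer = extract_text(sample_response)
# A stop_reason of "max_tokens" means the answer was cut off at the token limit.
was_truncated = sample_response["stop_reason"] == "max_tokens"
```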

Note that max_tokens is required in every Messages API request, unlike OpenAI, where it is optional. Omitting it returns a validation error.

Claude API Python SDK example

The official anthropic Python package wraps the REST API cleanly. Install it:

pip install anthropic

Then the same request in Python:

import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Explain the difference between a mutex and a semaphore in two sentences."
        }
    ]
)

print(message.content[0].text)

The SDK handles the version header, retries on transient errors, and surfaces typed response objects — message.usage.input_tokens, message.stop_reason, etc. It reads ANTHROPIC_API_KEY from the environment automatically if you omit api_key.

Adding a system prompt

In the Anthropic Messages API, the system prompt is a top-level parameter, not a message in the array. This is one of the sharpest differences from OpenAI, where "role": "system" lives inside messages.

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=512,
    system="You are a terse technical writer. Answer in plain English, no jargon.",
    messages=[
        {"role": "user", "content": "What is a race condition?"}
    ]
)

Key concepts unique to the Anthropic Messages API

Content blocks

Both the request and response use content blocks instead of plain strings. Each block has a type field, such as "text", "image", "tool_use", or "tool_result". This makes it straightforward to mix text and images in a single message or to embed tool invocations inline without a separate channel. When you send a simple string as content, the SDK and API accept it as shorthand for a single text block, but the response always comes back as a typed array.
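To make the block structure concrete, here is a sketch of a user message that mixes an image and a text block. The helper name image_message and the placeholder bytes are ours; the block layout (a base64 source with a media_type) follows the Messages API format, but confirm the supported media types in the docs before relying on it.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Build a user message holding one image block followed by one text block."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": media_type, "data": encoded},
            },
            {"type": "text", "text": prompt},
        ],
    }


# Placeholder bytes stand in for a real PNG file read from disk.
message = image_message("What does this diagram show?", b"not-a-real-png")
```

The resulting dict goes into the messages array exactly where a plain-string message would.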

Tool use

Anthropic's function-calling equivalent is called tool use. You define tools with a JSON schema, Claude returns a tool_use content block when it wants to call one, and you reply with a tool_result block in the next user turn. The schema is similar to OpenAI's function-calling format but not identical — field names and nesting differ. Check the Anthropic tool use docs for the exact shape before you wire up an agent.
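The round trip described above can be sketched without touching the network. The get_weather tool, its handler, and the sample assistant content are invented for illustration; only the block shapes (input_schema on the tool, tool_use with id, name, and input, tool_result with tool_use_id) are meant to mirror the API.

```python
# Tool definition you would pass in the request's "tools" list.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> str:
    # Hypothetical stand-in for a real weather lookup.
    return f"Sunny in {city}"


HANDLERS = {"get_weather": get_weather}


def answer_tool_calls(assistant_content: list) -> dict:
    """Run every tool_use block and build the follow-up user message."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue  # ignore text blocks that accompany the tool call
        output = HANDLERS[block["name"]](**block["input"])
        results.append(
            {"type": "tool_result", "tool_use_id": block["id"], "content": output}
        )
    return {"role": "user", "content": results}


# Example assistant content as it might come back when Claude wants the tool.
assistant_content = [
    {"type": "text", "text": "Let me check the weather."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {"city": "Oslo"}},
]
reply = answer_tool_calls(assistant_content)
```

You would append the assistant turn and then reply to messages and call the endpoint again so Claude can use the result.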

Streaming

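Set "stream": true in the request body (stream=True on client.messages.create in the Python SDK) to get server-sent events. The SDK also exposes a higher-level context manager, client.messages.stream, that yields text as it arrives: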

with client.messages.stream(
    model="claude-opus-4-5",
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a haiku about async IO."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

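Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta (which carries the stop reason and final usage), and message_stop. For raw SSE over HTTP, the delta.type field on each content_block_delta tells you whether you are receiving a text chunk (text_delta) or a tool-call argument fragment (input_json_delta).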
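If you consume the raw event stream yourself instead of using the SDK, the parsing can be sketched as below. The function name iter_text_deltas and the sample lines are ours, and the sample is a hand-written approximation of a stream, not captured output. The sketch assumes each event's JSON arrives on a single data: line.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text chunks from the lines of a Messages API event stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") != "content_block_delta":
            continue  # message_start, message_delta, message_stop, and so on
        delta = event["delta"]
        if delta.get("type") == "text_delta":
            yield delta["text"]
        # Deltas of type "input_json_delta" carry tool-call arguments; ignored here.


sample_stream = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"id": "msg_example", "role": "assistant"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

streamed_text = "".join(iter_text_deltas(sample_stream))
```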

Prompt caching

Anthropic's prompt caching feature lets you mark the end of a reusable prompt prefix by setting a cache_control field on a content block. When the same prefix is reused across requests, reads from the cache are billed at a lower input-token rate, while the request that first writes the prefix to the cache pays somewhat more than the normal input rate. This is valuable for large system prompts, long documents, or tool definitions that stay constant across many calls. The exact cache duration and pricing are in the Anthropic prompt caching docs. The pricing ratio changes over time, so verify before you build a cost model around it.
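A request that caches a long document might be assembled roughly like this. The helper names and the instruction text are ours. The sketch assumes the system field is sent as a list of text blocks, with cache_control set on the last block of the prefix you want reused, and that the response usage reports cache_creation_input_tokens and cache_read_input_tokens.

```python
def cached_system_request(model: str, long_document: str, question: str,
                          max_tokens: int = 512) -> dict:
    """Build a Messages API body whose system prompt ends in a cache breakpoint."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {"type": "text", "text": "Answer questions using the document below."},
            {
                "type": "text",
                "text": long_document,
                # Everything up to and including this block is eligible for caching.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": question}],
    }


def cache_stats(usage: dict) -> tuple:
    """Return (tokens written to cache, tokens read from cache) from a usage dict."""
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    return written, read


body = cached_system_request("claude-opus-4-5", "...long document text...", "Summarize section 2.")
```

On a repeat call with the same prefix you would expect the read count to be non-zero; if it stays at zero, the prefix probably changed or was too short to cache.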

How Anthropic pricing and rate limits work (conceptually)

Anthropic charges per input token and per output token, with rates that vary by model. Smaller, faster models cost less than larger frontier ones. Output tokens are typically priced higher than input tokens because they require more compute per token generated.
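The arithmetic is simple enough to sketch. The rates below are placeholders chosen to make the math easy to follow, not real prices for any model; substitute the current per-million-token rates from Anthropic's pricing page.

```python
from decimal import Decimal


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_mtok: Decimal, output_per_mtok: Decimal) -> Decimal:
    """Estimate request cost given per-million-token rates for input and output."""
    million = Decimal(1_000_000)
    return (Decimal(input_tokens) * input_per_mtok
            + Decimal(output_tokens) * output_per_mtok) / million


# Token counts from the sample response earlier; rates are made-up placeholders.
cost = estimate_cost(28, 61, Decimal("1.00"), Decimal("5.00"))
```

With these placeholder rates the sample request works out to (28 × 1.00 + 61 × 5.00) / 1,000,000 = 0.000333 currency units. Decimal avoids the rounding drift that floats introduce when you sum many small costs.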

Rate limits are measured in requests per minute (RPM) and tokens per minute (TPM), with input and output tokens counted separately. New accounts start on a lower usage tier, and limits rise as the account moves up tiers; the console shows your current tier and limits. When you hit a limit you get a 429 response with a retry-after header, so build exponential backoff into any production client from day one.
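For clients that call the HTTP endpoint directly (the official SDK already retries transient errors), backoff can be sketched as follows. RateLimited is a hypothetical exception your own HTTP layer would raise on a 429, optionally carrying a retry-after value in seconds; the delay values are illustrative defaults, not recommendations from Anthropic.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by your HTTP layer when the API returns 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry call() on RateLimited, using exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            delay = random.uniform(0, ceiling)
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # never retry sooner than asked
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


# Demo: a fake call that is rate limited twice, then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


waits = []
result = with_backoff(flaky_call, sleep=waits.append)  # record delays instead of sleeping
```

Passing sleep as a parameter keeps the function testable: the demo records the delays instead of actually waiting.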

Do not treat any number from a blog post (including this one) as current. Model prices and rate-limit tiers change. Always verify in the Anthropic models overview and the billing section of the console before committing to a cost estimate.

Anthropic Messages API vs OpenAI Chat Completions: the key differences

If you are coming from OpenAI, these are the places you will trip:

Authentication header: Anthropic uses x-api-key; OpenAI uses Authorization: Bearer.
Required versioning: Anthropic requires the anthropic-version header; OpenAI has no equivalent required header.
System prompt: Anthropic's system is a top-level field; OpenAI's is a messages entry with "role": "system".
max_tokens: required by Anthropic; optional in OpenAI.
Response shape: Anthropic returns content as a typed array of blocks; OpenAI returns choices[0].message.content as a string.
Tool call field names: the concept is similar, but field names and nesting differ.
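To see several of these differences at once, here is a rough sketch that converts a simple OpenAI-style request body into the Messages API shape. The function name and the default_max_tokens value are ours. It handles only plain-string message content and leaves out tools, images, and model-name mapping, which need their own handling.

```python
def to_anthropic_request(openai_request: dict, default_max_tokens: int = 1024) -> dict:
    """Convert a basic Chat Completions body into a Messages API body."""
    system_parts = []
    messages = []
    for msg in openai_request["messages"]:
        if msg["role"] == "system":
            system_parts.append(msg["content"])  # hoisted to the top-level field
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    request = {
        "model": openai_request["model"],  # assumes the name is already a Claude model
        # max_tokens is required by Anthropic, so fall back to a default.
        "max_tokens": openai_request.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        request["system"] = "\n\n".join(system_parts)
    return request


converted = to_anthropic_request({
    "model": "claude-opus-4-5",
    "messages": [
        {"role": "system", "content": "You are a terse technical writer."},
        {"role": "user", "content": "What is a race condition?"},
    ],
})
```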

For a detailed side-by-side mapping of every field, see Anthropic vs OpenAI format.

Using a gateway that speaks the Anthropic Messages API

One practical implication of Anthropic's distinct format: if you write your codebase against the Messages API, you are locked to providers that speak it. Anthropic itself does, obviously. But if you want to route the same call to a cheaper or faster model behind a different provider — without maintaining parallel code paths — you need a layer in between.

A purpose-built LLM gateway like flo2 speaks the Anthropic Messages API natively. You point your existing anthropic client at flo2's endpoint by overriding the base URL, bring your own provider keys, and flo2 routes your request to the model you specify — or to the cheapest or fastest available option automatically. There is no token markup on top of the provider's price, and you do not need to rewrite your request format. Your Anthropic-style code keeps working; the routing layer handles the rest.

import anthropic
import os

# Point your Anthropic client at flo2 instead
client = anthropic.Anthropic(
    api_key=os.environ["FLO2_API_KEY"],
    base_url="https://api.flo2.com",
)

message = client.messages.create(
    model="claude-opus-4-5",   # or any model flo2 routes
    max_tokens=512,
    messages=[
        {"role": "user", "content": "What is prompt caching?"}
    ]
)

print(message.content[0].text)

Only two arguments change: base_url and api_key. Everything else is identical to calling Anthropic directly. To understand why that layer is worth adding even for teams that only use Claude today, see what is an LLM gateway.

flo2 is free during its beta period — bring your own provider keys, pay providers directly at their published rates, and get routing, fallback, and observability without a per-token surcharge on top.
