2026-06-03 · flo2 blog

LLM Function Calling (Tool Use): How It Works, With Examples

LLM function calling (also called tool calling or tool use) lets a language model request that your application run a specific piece of code on its behalf — then use the result to form its final answer. Instead of the model hallucinating a database value or fabricating an API response, it stops, asks your code to fetch the real data, and continues only once it has it. If you are building agents, retrieval pipelines, or any application that needs a model to interact with external systems, understanding the function calling flow is non-negotiable.

What function calling actually is

The term is slightly misleading: the model does not call a function. It returns a structured request describing which function it wants called and with what arguments. Your application receives that request, runs the actual code, and sends the result back to the model in a follow-up message. The model then produces its final reply using the real output.

You describe available tools to the model using JSON Schema. Each tool definition includes a name, a description the model uses to decide when to invoke it, and a parameters object in JSON Schema format that specifies what arguments the model should supply. The richer and more precise your descriptions, the more reliably the model picks the right tool with the right arguments.

The request/response flow, step by step

The full flow has four stages:

1. Send the tool definitions and the user message. Your request includes a tools array alongside the normal messages array.
2. The model returns a tool call. Instead of (or in addition to) a text reply, the response contains a tool_calls array specifying the function name and the JSON-encoded arguments.
3. You execute the function and return the result. Append the assistant's tool-call message to the conversation, then add a tool role message containing the result.
4. The model produces the final answer. Given the real result, the model generates a natural-language reply or triggers another tool call.

Here is a minimal Python example using the OpenAI SDK, pointed at the flo2 endpoint, that answers a weather question. The weather lookup itself is stubbed with a canned reading:

import json
from openai import OpenAI

client = OpenAI(base_url="https://api.flo2.com/v1", api_key="YOUR_FLO2_KEY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Return current temperature and conditions for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'San Francisco'",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit.",
                    },
                },
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Tokyo right now?"}]

# --- Stage 1 & 2: model decides to call the tool ---
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

assistant_msg = response.choices[0].message
messages.append(assistant_msg)  # keep the tool-call in the history

# --- Stage 3: run the actual function ---
for tool_call in assistant_msg.tool_calls or []:
    args = json.loads(tool_call.function.arguments)

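    # Your real implementation goes here, e.g. a weather API lookup using
    # args["city"] and args.get("unit"). This stub returns a canned reading.
    result = {"temperature": 18, "unit": "celsius", "conditions": "Partly cloudy"}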

    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    })

# --- Stage 4: model generates final answer ---
final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)

print(final.choices[0].message.content)
# Example output (exact wording varies): "It's currently 18 °C and partly cloudy in Tokyo."

OpenAI tools vs Anthropic tool_use: key differences

Both providers support the concept, but the wire format differs. Knowing the differences matters if you call the native APIs directly; if you use an OpenAI-compatible API gateway, these differences are normalized for you.

OpenAI format

Tools are defined in a tools array; each entry has "type": "function" and a nested "function" object with name, description, and parameters.
The model's tool invocation appears in choices[0].message.tool_calls, an array of objects with id, type, and function.name / function.arguments (a JSON string).
Tool results go back as a message with "role": "tool" and a matching tool_call_id.
finish_reason is "tool_calls" when the model wants to invoke a tool.

Anthropic format

Tools are defined in a top-level tools array with name, description, and input_schema (JSON Schema).
The model's tool invocation is a content block of "type": "tool_use" inside the content array, carrying id, name, and input (already a parsed object, not a JSON string).
Tool results go back in a user message as a content block with "type": "tool_result" and matching tool_use_id.
stop_reason is "tool_use".

The conceptual flow is identical; only the field names and nesting differ. An LLM gateway that passes tool parameters through to whichever underlying model you're routing to — and normalizes the response back to the OpenAI shape — means you write the function-calling logic once and swap models without touching application code.
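To make the mapping concrete, here is a minimal sketch of the two translations a gateway would need: an OpenAI-style tool definition into the Anthropic shape, and an Anthropic tool_use block back into an OpenAI-style tool call. The helper names and the example id are hypothetical, and this is not a description of how flo2 or any other gateway implements it; it only uses the field names listed above.

```python
import json

def openai_tool_to_anthropic(tool: dict) -> dict:
    """Map an OpenAI-style tool definition to the Anthropic shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Anthropic calls the JSON Schema "input_schema" instead of "parameters".
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }

def anthropic_tool_use_to_openai(block: dict) -> dict:
    """Map an Anthropic tool_use content block to an OpenAI-style tool call."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            # OpenAI carries arguments as a JSON string; Anthropic as a parsed object.
            "arguments": json.dumps(block["input"]),
        },
    }

openai_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Return current temperature and conditions for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
anthropic_tool = openai_tool_to_anthropic(openai_tool)

# A hand-written tool_use block; the id is made up for illustration.
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_example",
    "name": "get_current_weather",
    "input": {"city": "Tokyo"},
}
openai_call = anthropic_tool_use_to_openai(tool_use_block)
print(json.dumps(anthropic_tool, indent=2))
print(json.dumps(openai_call, indent=2))
```

The one non-mechanical step is the arguments field: it has to be serialized in one direction and parsed in the other.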

Best practices for production function calling

Write clear, specific schemas

The model reads your description fields to decide whether and how to call a tool. Vague descriptions produce wrong invocations. Be explicit about units, formats, and what the function returns. If a parameter expects an ISO 8601 date string, say so. If a city name should include the country code for disambiguation, say so. Treat the description as documentation for a developer who has never seen your codebase — because from the model's perspective, that's exactly what it is.
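As an illustration, a tool definition written in that spirit might look like the sketch below. The search_flights tool, its parameters, and the result format described are all hypothetical; the point is only that every field states its format, units, and an example value.

```python
# Hypothetical tool definition; names and limits are illustrative only.
search_flights_tool = {
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": (
            "Search one-way flights between two airports on a given date. "
            "Returns up to 10 options sorted by price, each with airline, "
            "local departure time (ISO 8601), and price in USD."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {
                    "type": "string",
                    "description": "Departure airport as a 3-letter IATA code, e.g. 'SFO'.",
                },
                "destination": {
                    "type": "string",
                    "description": "Arrival airport as a 3-letter IATA code, e.g. 'NRT'.",
                },
                "date": {
                    "type": "string",
                    "description": "Departure date in ISO 8601 format (YYYY-MM-DD), e.g. '2026-07-14'.",
                },
                "max_price_usd": {
                    "type": "number",
                    "description": "Optional upper price limit in US dollars.",
                },
            },
            "required": ["origin", "destination", "date"],
        },
    },
}

# A cheap lint you could run in CI: every parameter must carry a description.
params = search_flights_tool["function"]["parameters"]["properties"]
undocumented = [name for name, spec in params.items() if not spec.get("description")]
print("undocumented parameters:", undocumented)
```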

Validate arguments before executing

The model's output is probabilistic. Even with a well-defined schema, arguments may be missing, mistyped, or semantically wrong. Use a JSON Schema validator (e.g., ajv in Node.js, jsonschema in Python) before passing arguments to real business logic. Return a descriptive error as the tool result when validation fails — the model can then self-correct and call the tool again with better arguments, instead of your application throwing an uncaught exception.
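The sketch below shows the validate-then-report pattern using only the standard library. It checks required fields, basic types, and enum values, which is a small subset of JSON Schema; in production a full validator such as jsonschema would likely replace the hand-written validate_args helper, which is a hypothetical name introduced here.

```python
import json
from typing import Optional, Tuple

JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

def validate_args(raw_arguments: str, schema: dict) -> Tuple[Optional[dict], Optional[str]]:
    """Return (args, None) when valid, or (None, error message) otherwise."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, f"arguments are not valid JSON: {exc.msg}"
    if not isinstance(args, dict):
        return None, "arguments must be a JSON object"
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            return None, f"missing required argument '{name}'"
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            return None, f"unexpected argument '{name}'"
        expected = JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types explicitly.
        stray_bool = isinstance(value, bool) and spec.get("type") in ("integer", "number")
        if expected and (stray_bool or not isinstance(value, expected)):
            return None, f"argument '{name}' must be of type {spec['type']}"
        if "enum" in spec and value not in spec["enum"]:
            return None, f"argument '{name}' must be one of {spec['enum']}"
    return args, None

weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

# The model asked for an unsupported unit: report it instead of raising.
args, error = validate_args('{"city": "Tokyo", "unit": "kelvin"}', weather_schema)
tool_content = json.dumps({"error": error}) if error else json.dumps({"ok": True})
print(tool_content)
```

The error string goes back as the content of the tool message, so the model sees exactly which argument was wrong and can retry.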

Handle parallel tool calls

OpenAI models may return multiple tool calls in a single response when they can be executed independently. Always iterate over the full tool_calls array rather than assuming a single call. Run independent calls in parallel to minimize latency, then append all tool result messages before the next completion request.
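One way to run independent calls concurrently is a thread pool, sketched below. Tool calls are shown as plain dicts in the OpenAI shape (the SDK returns objects with attribute access instead), and the two tools, the registry, and the ids are hypothetical stand-ins that return canned data.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tools returning canned data.
def get_current_weather(city, unit="celsius"):
    return {"city": city, "temperature": 18, "unit": unit}

def get_local_time(city):
    return {"city": city, "time": "09:30"}

REGISTRY = {
    "get_current_weather": get_current_weather,
    "get_local_time": get_local_time,
}

def run_tool_call(call: dict) -> dict:
    """Execute one tool call and build the matching tool-role message."""
    fn = REGISTRY[call["function"]["name"]]
    args = json.loads(call["function"]["arguments"])
    return {
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(fn(**args)),
    }

def run_tool_calls(tool_calls: list) -> list:
    """Run all calls concurrently; results come back in the original order."""
    if not tool_calls:
        return []
    with ThreadPoolExecutor(max_workers=len(tool_calls)) as pool:
        return list(pool.map(run_tool_call, tool_calls))

tool_calls = [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_current_weather", "arguments": '{"city": "Tokyo"}'}},
    {"id": "call_2", "type": "function",
     "function": {"name": "get_local_time", "arguments": '{"city": "Tokyo"}'}},
]
tool_messages = run_tool_calls(tool_calls)
for msg in tool_messages:
    print(msg["tool_call_id"], msg["content"])
```

Threads suit I/O-bound tools such as HTTP lookups. Each result keeps its own tool_call_id, so the model can match results to requests.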

Handle the no-tool-call case

When tool_choice is "auto", the model decides whether to call a tool at all. Your loop must check whether assistant_msg.tool_calls is non-empty before attempting to iterate. If the model answered directly, skip the tool execution stage and use the response as-is.

Set an iteration cap

Agents that loop — call tool, get result, call another tool — can run indefinitely if something goes wrong. Always enforce a maximum number of iterations (typically 5–15 depending on task complexity). If the limit is reached, surface an error or partial result rather than burning tokens in an infinite loop.
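A capped loop can be sketched as follows. The complete and execute_tool callables are assumptions standing in for your completion request and tool dispatcher, and messages are plain dicts rather than SDK objects; the scripted model exists only to make the example runnable offline. The loop also covers the no-tool-call case by returning as soon as the model answers directly.

```python
import json
from typing import Callable

def run_agent(
    complete: Callable[[list], dict],
    execute_tool: Callable[[str, dict], dict],
    messages: list,
    max_iterations: int = 8,
) -> str:
    """Loop until the model answers without tool calls, or the cap is hit."""
    for _ in range(max_iterations):
        assistant = complete(messages)
        messages.append(assistant)
        tool_calls = assistant.get("tool_calls") or []
        if not tool_calls:
            # The model answered directly: skip tool execution.
            return assistant.get("content") or ""
        for call in tool_calls:
            args = json.loads(call["function"]["arguments"])
            result = execute_tool(call["function"]["name"], args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError(f"no final answer after {max_iterations} iterations")

def make_scripted_model(replies: list) -> Callable[[list], dict]:
    """Fake model for offline runs: returns the given replies in order."""
    remaining = iter(replies)
    return lambda messages: next(remaining)

weather_call = {
    "id": "call_1", "type": "function",
    "function": {"name": "get_current_weather", "arguments": '{"city": "Tokyo"}'},
}
model = make_scripted_model([
    {"role": "assistant", "content": None, "tool_calls": [weather_call]},
    {"role": "assistant", "content": "It is 18 °C in Tokyo."},
])

def execute_tool(name: str, args: dict) -> dict:
    return {"temperature": 18, "unit": "celsius"}  # canned result

history = [{"role": "user", "content": "Weather in Tokyo?"}]
answer = run_agent(model, execute_tool, history)
print(answer)
```

Raising on the cap is one choice. Returning the partial history to the caller instead would let the application show what the agent managed to complete.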

Common use cases

Understanding function calling unlocks a wide category of applications that would otherwise require brittle prompt engineering or post-processing hacks:

Retrieval-augmented generation (RAG). Instead of pre-emptively stuffing context, give the model a search_documents tool and let it decide when retrieval is needed and what to query for.
Database and API actions. A customer support agent can call lookup_order, issue_refund, or update_shipping_address with arguments the model extracts from the conversation — no brittle regex parsing required. See also LLM JSON mode for cases where you need structured output without a full tool-calling loop.
Multi-step agents. Complex tasks decompose naturally into tool sequences: search for context, read a file, write a summary, post a result. Each step is a real function call with a real result, not a hallucinated chain of thought.
Code execution and sandboxed computation. Pass a run_python tool pointing at a sandbox and let the model write and execute code to answer quantitative questions.
Structured data extraction. Define a schema as a "tool" with no side effects, set tool_choice to force the model to call it, and read the structured output from the call's arguments. For complex schemas this is often more reliable than JSON mode, though the arguments should still be validated unless your provider guarantees schema conformance.
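The structured-extraction pattern from the last item can be sketched like this. The record_contact tool and both helper functions are hypothetical; the tool_choice value uses the OpenAI Chat Completions form for forcing a specific function. The response here is a hand-written dict so the example runs without a network call.

```python
import json

# Hypothetical extraction "tool": it has no side effects, only a schema.
record_contact_tool = {
    "type": "function",
    "function": {
        "name": "record_contact",
        "description": "Record the contact details mentioned in the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Full name of the person."},
                "email": {"type": "string", "description": "Email address, lowercased."},
            },
            "required": ["name", "email"],
        },
    },
}

def build_extraction_request(text: str, model: str) -> dict:
    """Build request kwargs that force the model to call record_contact."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": text}],
        "tools": [record_contact_tool],
        "tool_choice": {"type": "function", "function": {"name": "record_contact"}},
    }

def parse_extraction(assistant_message: dict) -> dict:
    """Read the structured payload out of the forced tool call."""
    call = assistant_message["tool_calls"][0]
    return json.loads(call["function"]["arguments"])

request = build_extraction_request("Reach Ada Lovelace at ada@example.com.", "gpt-4o-mini")

# Hand-written stand-in for the assistant message a real call would return.
fake_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {
            "name": "record_contact",
            "arguments": '{"name": "Ada Lovelace", "email": "ada@example.com"}',
        },
    }],
}
contact = parse_extraction(fake_message)
print(contact)
```

With the OpenAI SDK, the request dict would be passed as client.chat.completions.create(**request). No tool result needs to be sent back, because the arguments themselves are the output you wanted.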

How a gateway handles tool params across providers

One of the less-discussed benefits of routing through a gateway is transparent tool-parameter passthrough. When you send a request with a tools array to a gateway like flo2, it translates the OpenAI-format tool definition into the correct native format for whichever model you've routed to — GPT-4o, Claude, Gemini, Llama, or others — and translates the response back. You write your tool definitions once, in the OpenAI schema you already know, and they work across every supported model without modification.

This matters in practice because the models that are best at function calling change frequently. A new Anthropic release may outperform GPT-4o on tool-use benchmarks one month, then the ranking flips. With a gateway, you change a model string in your router config and the rest of your application — tool definitions, execution logic, result parsing — stays exactly the same.

If you need zero token markup and want to bring your own provider API keys while routing tool-calling requests across many models behind one endpoint, take a look at flo2 — it's free during beta and ships an OpenAI-compatible API that passes tools, tool_choice, and parallel tool calls through to whichever underlying model you select.
