2026-06-03 · flo2 blog

System Prompts: How to Write Them (With Examples)

A system prompt is the first thing an LLM reads before any user message arrives. It sets the model's role, rules, tone, and constraints — and getting it right is the single highest-leverage thing you can do to make your LLM application reliable. This guide covers what belongs in a system prompt, concrete system prompt examples, best practices, and the mistakes that silently degrade quality in production.

What Is a System Prompt?

A system prompt (also called a system message) is a block of text you supply to the model that sits outside the conversation turn structure. It is not part of the user's message and not part of the assistant's reply — it is a privileged channel for developer-controlled instructions.

The exact API surface differs by provider:

OpenAI calls it the system message and delivers it as the first element in the messages array with "role": "system".
Anthropic delivers it as a top-level system parameter, separate from the messages array entirely.

Both serve the same purpose: give the model stable, high-authority context before the first user turn. When you route through an OpenAI-compatible gateway such as an LLM gateway, both formats resolve to the same underlying concept regardless of which provider handles the request.

What to Put in a System Prompt

A well-structured system prompt covers six areas. You do not need all six in every app — cut what does not apply — but having a mental checklist prevents gaps that lead to unpredictable behavior.

1. Role and persona

Tell the model who it is. A concise sentence or two is enough. "You are a senior support engineer at Acme Corp" is better than "You are a helpful assistant" because it constrains tone, vocabulary, and the level of technical detail the model should assume.

2. Task definition

State what the model is supposed to do in this application. Is it answering questions? Drafting emails? Classifying text? Summarizing documents? An explicit task statement stops the model from trying to do everything at once.

3. Format and output rules

Specify the shape of the response: Markdown or plain text, JSON schema, maximum length, language. If downstream code parses the output, this section is not optional.

4. Do's and don'ts

Enumerate explicit permissions and prohibitions. "Always cite the source URL when referencing documentation" or "Never speculate about pricing — direct the user to the pricing page." Explicit rules outperform vague guidance like "be accurate."

5. Tools and context

If the model has access to function calls, retrieval results, or a knowledge cutoff, say so. Tell the model what it has access to so it does not hallucinate capabilities or fail to use the ones it has.

6. Refusals and escalation paths

Define what the model should do when it cannot help: hand off to a human, return a specific error message, or ask a clarifying question. Undefined failure modes produce inconsistent user experiences.

A Concrete System Prompt Example

Here is a system prompt for a developer-facing documentation assistant. It is intentionally readable — not a wall of text:

You are a technical documentation assistant for the Acme SDK.

**Role**: Answer developer questions about the SDK with precise, example-driven responses.
**Audience**: Software engineers integrating the SDK into production applications.

**Output rules**:
- Use Markdown. Include code blocks for every example.
- Keep answers under 400 words unless the topic genuinely requires more.
- Always specify the programming language in fenced code blocks.

**Permissions**:
- Reference the SDK changelog and migration guides when relevant.
- Suggest related documentation pages at the end of your answer.

**Prohibitions**:
- Do not speculate about unreleased features.
- Do not provide legal or security compliance advice — direct users to the security team.

**When you cannot answer**: Reply with "I don't have reliable information on this yet. Please open a support ticket at support.acme.com."

Notice the structure: role first, then audience, then output format, then explicit do's and don'ts, then a defined fallback. Each section is short. The model does not have to infer intent from ambiguity.

System Prompt Best Practices

Be specific, not aspirational

"Be helpful and accurate" is not a constraint — every model already tries to do that. Replace vague adjectives with concrete rules. "If the user asks a question your context does not answer, say so explicitly rather than guessing" is actionable. "Be accurate" is not.

Put stable content first to benefit from prompt caching

Providers including Anthropic and OpenAI cache the computed representations of prompt prefixes. When the start of your prompt is identical across requests, the provider skips re-processing that prefix — cutting both cost and latency. This means your role definition, task description, and standing rules should come before any dynamic content (retrieved documents, user-specific context, timestamps). Stable content at the top, dynamic content at the bottom. Read more about how this works in prompt caching.

Keep it concise

A 5,000-token system prompt is not better than a 500-token one by default. Longer prompts dilute the signal of the most important instructions. Every sentence should earn its place. If you find yourself writing the same constraint three different ways, pick the clearest version and delete the other two.

Test variations systematically

The system prompt is a configuration parameter. Treat it like one: version-control it, evaluate outputs against a fixed set of test cases when you change it, and track regressions. Small wording changes can shift model behavior significantly — "do not" versus "avoid" versus "never" are not equivalent to a model.

Use the right field for the right content

Instructions belong in the system prompt. Few-shot examples usually belong there too, especially if they are stable. Conversation history belongs in the messages array. Retrieved documents that change per request belong at the end of the system prompt or in an early user turn — after the stable instructions, so caching still captures the stable prefix.

Common System Prompt Mistakes

Contradictory instructions. "Be concise" in one sentence and "always explain your reasoning step by step" three lines later. The model will resolve the contradiction inconsistently. Pick one.
Putting dynamic content before stable content. If a timestamp or user name appears at the top of your system prompt, every request looks like a cache miss. Move it down.
No defined fallback. When the model does not know the answer and has no instruction for that case, it improvises. Define the fallback.
Assuming the model infers implicit rules. If you need the model to respond only in Spanish, say so explicitly. Implicit expectations become support tickets.
Never testing the system prompt in isolation. Run your system prompt against adversarial inputs — users asking off-topic questions, requests to ignore instructions, edge-case phrasings. Production traffic will find every gap.

Stable System Prompts and Cost Efficiency

One practical consequence of getting system prompt structure right is real cost reduction. When your system prompt is stable across requests — same role definition, same rules, same few-shot examples — the provider caches those tokens. On Anthropic's API, cached input tokens are billed at a significantly lower rate than fresh input tokens; OpenAI's prompt caching applies automatically to eligible requests. A 1,000-token stable system prompt called ten thousand times per day accumulates meaningful savings without any code change beyond ordering your content correctly.

If you are routing across multiple providers through a single endpoint, an LLM gateway ensures your prompt structure reaches each provider correctly — including passing the system parameter in the Anthropic-native format and the messages[0].role = "system" format for OpenAI-compatible endpoints — so you capture caching benefits regardless of which model handles the request.

Start writing better system prompts today, and route them through flo2 to get zero-markup access to every major provider with a single API key — keeping the prompt caching savings for yourself.

One key, every model — zero markup.

Bring your own provider keys. flo2 routes to the cheapest, fastest model with fallback, racing and true cost accounting — free during Beta.

Get your flo2 key →