2026-06-03 · flo2 blog

LLM Cost Calculator: Estimate Your Token Spend Accurately

Before you commit to a model, a provider, or an architecture, you need a reliable LLM cost calculator — a mental (or scripted) model that tells you what a single API call costs, what a day of traffic costs, and what happens to the bill when usage doubles. This guide walks through the core formula, a worked example, how to estimate tokens before you run anything, the factors most developers undercount, and how to close the loop by measuring actual cost from logs rather than estimates.

The formula: how to calculate LLM cost

Every provider quotes prices per million tokens, split into (at least) two rates: one for input tokens and one for output tokens. Output tokens are almost always more expensive — frequently 3–5× higher — because generating tokens is sequential and hardware-intensive, while processing input can be parallelised. See input vs output tokens for a deeper look at why the asymmetry exists.

The baseline formula for a single call is:

# All prices are per million tokens (MTok)
# Obtain current rates from /llm-pricing or each provider's pricing page

call_cost = (input_tokens  / 1_000_000) * input_price_per_mtok
          + (output_tokens / 1_000_000) * output_price_per_mtok

# If the provider supports prompt caching and you have a cache hit,
# replace the cached portion of input_tokens with the discounted rate:
call_cost_with_cache =
    (uncached_input_tokens / 1_000_000) * input_price_per_mtok
  + (cached_input_tokens  / 1_000_000) * cached_input_price_per_mtok
  + (output_tokens        / 1_000_000) * output_price_per_mtok

That is the complete calculation. Everything else in LLM cost estimation is either plugging in the right numbers or accounting for the factors that inflate those numbers beyond your first naive estimate.

Worked illustrative example

The numbers below are invented for illustration. Real prices vary by model, provider, and date — always verify current rates before budgeting.

Illustrative input price: $1.00 / MTok (uncached)
Illustrative cached input price: $0.25 / MTok (75% discount)
Illustrative output price: $4.00 / MTok
Request shape: 800 uncached input tokens, 400 cached input tokens, 350 output tokens

uncached_cost = (800   / 1_000_000) * 1.00  = $0.000800
cached_cost   = (400   / 1_000_000) * 0.25  = $0.000100
output_cost   = (350   / 1_000_000) * 4.00  = $0.001400

call_cost     = $0.000800 + $0.000100 + $0.001400 = $0.002300

The output side — 350 tokens — contributes 61 % of the total cost despite being only a quarter of the total token count. That ratio is typical, and it's why optimising output length is usually a bigger lever than shrinking input length. See AI tokenomics for a framework built around this asymmetry.

How to estimate token counts before you run anything

You rarely know exact token counts at design time. Two practical approaches:

The characters-÷-4 heuristic

For English prose, one token is roughly 3–4 characters. Dividing character count by 4 gives a workable ballpark. Code, JSON, and non-Latin scripts tokenise less efficiently — assume 2–3 characters per token for code-heavy prompts.

# Quick estimate (English prose)
estimated_tokens ≈ len(text_in_characters) / 4

# More conservative for code / structured data
estimated_tokens ≈ len(text_in_characters) / 3

This is fast and good enough for napkin math. For budget-sensitive decisions, use a proper tokenizer.

Using a tokenizer

Most major providers publish the tokenizer they use. OpenAI-compatible models typically use a BPE tokenizer available via the tiktoken library. Running your representative prompts through the tokenizer before launch gives you exact counts and eliminates guesswork from your cost estimates.

Scaling to monthly cost

Once you have a per-call cost, projecting to monthly spend is straightforward:

daily_cost   = requests_per_day * call_cost
monthly_cost = daily_cost * 30

# Or, using token totals directly:
monthly_input_tokens  = requests_per_day * 30 * avg_input_tokens
monthly_output_tokens = requests_per_day * 30 * avg_output_tokens

monthly_cost = (monthly_input_tokens  / 1_000_000) * input_price
             + (monthly_output_tokens / 1_000_000) * output_price

At 5,000 requests per day with the illustrative $0.0023 per call above, daily cost is $11.50 and monthly cost is roughly $345. Double the output length — say, a feature that now asks for structured JSON reasoning instead of a single answer — and the monthly bill approximately doubles, because output tokens dominate. Planning for that inflection point before it hits is exactly what cost modelling is for.

The factors most developers undercount

Initial estimates are almost always optimistic. Here are the costs that inflate real bills beyond the naive calculation:

System prompts and few-shot overhead. A 600-token system prompt appears on every single request. At 5,000 requests per day that is 3 million input tokens per day from the system prompt alone — before the user says a word. If you are not using prompt caching, this is pure waste.
Conversation history / long context. Chat applications that send the full message history on every turn have input token counts that grow linearly with conversation length. A 10-turn conversation at 200 tokens per turn adds 2,000 input tokens compared to the first turn. Budget for the average, not the minimum.
Output length variance. Your average output token count may be 300, but your 95th-percentile call might produce 1,200 tokens. If you are billing customers a flat rate, the long tail pays out of your margin. Model the distribution, not just the mean.
Retries. A retry strategy that fires on 5xx errors or timeouts doubles your cost on affected calls. If 3% of calls retry once, your effective token consumption rises by 3%. If you also retry on bad output (JSON parse failures, failed evals), add that rate too. Tools like exponential backoff reduce retry frequency but do not eliminate the cost.
Streaming and abandoned requests. A user who closes the tab mid-stream has still incurred output token costs up to the point of disconnection, because the model generated those tokens. Some providers bill all output tokens generated; others bill only tokens delivered. Know which you're on.
Tool calls and multi-step agents. An agent loop that makes four LLM calls per user task has four times the token cost of a single call — plus the overhead of serialising tool results back into the context, which adds input tokens on every step. Agent costs compound quickly.

Estimates vs. actuals: closing the loop with per-call logs

Estimates get you to launch. Logs tell you what is actually happening. Every provider API response includes a usage object with exact token counts for that call:

# Typical usage object in an OpenAI-compatible response
{
  "usage": {
    "prompt_tokens": 1247,
    "completion_tokens": 318,
    "prompt_tokens_details": {
      "cached_tokens": 400
    }
  }
}

# Compute actual cost per call server-side
actual_cost = (
  (usage.prompt_tokens - cached_tokens) / 1_000_000 * input_price
  + cached_tokens                        / 1_000_000 * cached_input_price
  + usage.completion_tokens              / 1_000_000 * output_price
)

If you log this on every request, you get a real-time view of cost by model, by endpoint, by user, or by feature. If you do not log it, you are estimating forever — and your estimates will drift as prompts change, users grow, and conversation histories lengthen.

This is where per-call cost accounting pays for itself. flo2 logs token counts and computes the dollar cost on every proxied call, so you see your actual spend without wiring up your own logging pipeline. Because flo2 charges zero token markup and lets you bring your own provider keys, the cost you see in the logs is the cost you actually pay — not a marked-up figure that hides the delta. Check the LLM pricing page for current rates across providers, and compare them against your token logs to verify your estimates at any time.

Putting it together: a practical checklist

Identify your request shape: average input tokens, average cached input tokens, average output tokens, plus the p95 for output.
Apply the formula above with current prices from your provider's pricing page.
Multiply by daily request volume and 30 to get a monthly figure.
Add a retry multiplier (even a conservative 1.03× matters at scale).
Add system-prompt token cost separately — it is often the single largest overlooked line item.
Ship with per-call token logging on day one so estimates give way to actuals as quickly as possible.
Revisit the model after every meaningful prompt change or traffic milestone.

A cost calculator is only as good as the inputs you feed it. Start with estimates, move to actuals, and let the gap between them drive your next optimisation. That loop — estimate, measure, compare, improve — is the core discipline of building on LLMs without cost surprises.

If you want the measuring step handled for you out of the box, flo2 does per-call cost accounting across providers with no token markup and no locked-in pricing. Free during Beta.

One key, every model — zero markup.

Bring your own provider keys. flo2 routes to the cheapest, fastest model with fallback, racing and true cost accounting — free during Beta.

Get your flo2 key →