2026-06-03 · flo2 blog

LLM Cost Calculator: Estimate Your Token Spend Accurately

Before you commit to a model, a provider, or an architecture, you need a reliable LLM cost calculator — a mental (or scripted) model that tells you what a single API call costs, what a day of traffic costs, and what happens to the bill when usage doubles. This guide walks through the core formula, a worked example, how to estimate tokens before you run anything, the factors most developers undercount, and how to close the loop by measuring actual cost from logs rather than estimates.

The formula: how to calculate LLM cost

Every provider quotes prices per million tokens, split into (at least) two rates: one for input tokens and one for output tokens. Output tokens are almost always more expensive — frequently 3–5× higher — because generating tokens is sequential and hardware-intensive, while processing input can be parallelised. See input vs output tokens for a deeper look at why the asymmetry exists.

The baseline formula for a single call is:

# All prices are per million tokens (MTok)
# Obtain current rates from /llm-pricing or each provider's pricing page

call_cost = (input_tokens  / 1_000_000) * input_price_per_mtok
          + (output_tokens / 1_000_000) * output_price_per_mtok

# If the provider supports prompt caching and you have a cache hit,
# replace the cached portion of input_tokens with the discounted rate:
call_cost_with_cache =
    (uncached_input_tokens / 1_000_000) * input_price_per_mtok
  + (cached_input_tokens  / 1_000_000) * cached_input_price_per_mtok
  + (output_tokens        / 1_000_000) * output_price_per_mtok

That is the complete calculation. Everything else in LLM cost estimation is either plugging in the right numbers or accounting for the factors that inflate those numbers beyond your first naive estimate.

Worked illustrative example

The numbers below are invented for illustration. Real prices vary by model, provider, and date — always verify current rates before budgeting.

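# Illustrative inputs: 800 uncached input tokens at $1.00/MTok,
# 400 cached input tokens at $0.25/MTok, 350 output tokens at $4.00/MTok
uncached_cost = (800   / 1_000_000) * 1.00  = $0.000800
cached_cost   = (400   / 1_000_000) * 0.25  = $0.000100
output_cost   = (350   / 1_000_000) * 4.00  = $0.001400

call_cost     = $0.000800 + $0.000100 + $0.001400 = $0.002300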

The output side (350 tokens) contributes about 61 % of the total cost despite being under a quarter (roughly 23 %) of the total token count. That ratio is typical, and it's why optimising output length is usually a bigger lever than shrinking input length. See AI tokenomics for a framework built around this asymmetry.
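The formula and the worked example above can be sketched as a small Python function. This is a minimal sketch, not a definitive implementation: the `Prices` class and `call_cost` function are hypothetical names introduced here, and the rates are the invented ones from the example, not real prices.

```python
from dataclasses import dataclass

MTOK = 1_000_000  # prices are quoted per million tokens


@dataclass(frozen=True)
class Prices:
    """Per-million-token rates in USD (illustrative, not real prices)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def call_cost(prices: Prices, uncached_input_tokens: int,
              cached_input_tokens: int, output_tokens: int) -> dict:
    """Return the cost of one call, broken down by component."""
    parts = {
        "uncached_input": uncached_input_tokens / MTOK * prices.input_per_mtok,
        "cached_input": cached_input_tokens / MTOK * prices.cached_input_per_mtok,
        "output": output_tokens / MTOK * prices.output_per_mtok,
    }
    parts["total"] = sum(parts.values())
    return parts


# Invented rates and token counts from the worked example.
example_prices = Prices(input_per_mtok=1.00, cached_input_per_mtok=0.25,
                        output_per_mtok=4.00)
cost = call_cost(example_prices, uncached_input_tokens=800,
                 cached_input_tokens=400, output_tokens=350)
output_share = cost["output"] / cost["total"]

print(f"total: ${cost['total']:.6f}")       # total: $0.002300
print(f"output share: {output_share:.0%}")  # output share: 61%
```

Returning the breakdown rather than a single number makes it easy to see which component dominates before deciding what to optimise.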

How to estimate token counts before you run anything

You rarely know exact token counts at design time. Two practical approaches:

The characters-÷-4 heuristic

For English prose, one token is roughly 3–4 characters. Dividing character count by 4 gives a workable ballpark. Code, JSON, and non-Latin scripts tokenise less efficiently — assume 2–3 characters per token for code-heavy prompts.

# Quick estimate (English prose)
estimated_tokens ≈ len(text_in_characters) / 4

# More conservative for code / structured data
estimated_tokens ≈ len(text_in_characters) / 3

This is fast and good enough for napkin math. For budget-sensitive decisions, use a proper tokenizer.
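As a sketch of the heuristic in Python (the function name `estimate_tokens` and the sample strings are hypothetical, and the result is a ballpark, not a tokenizer count):

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: character count divided by characters per token."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so short strings are never estimated at zero tokens.
    return math.ceil(len(text) / chars_per_token)


prose = "Summarise the attached report in three bullet points."
payload = '{"user_id": 42, "items": [1, 2, 3]}'

prose_estimate = estimate_tokens(prose)                         # ~4 chars per token
payload_estimate = estimate_tokens(payload, chars_per_token=3)  # more conservative
```

Rounding up is a deliberate choice here: for budgeting, a slight overestimate is usually safer than an underestimate.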

Using a tokenizer

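Most major providers publish the tokenizer they use or expose a token-counting endpoint. OpenAI models use BPE encodings available through the tiktoken library; other model families ship their own tokenizers, so counts from one do not transfer to another. Running your representative prompts through the matching tokenizer before launch gives you near-exact counts (chat formatting can add a few tokens per message) and removes most of the guesswork from your cost estimates.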
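A minimal sketch of counting tokens with tiktoken, assuming the package is installed (`pip install tiktoken`). The `count_tokens` helper is a hypothetical name, and `cl100k_base` is used only as an example encoding; pick the encoding that matches your model. If the tokenizer cannot be loaded, the sketch falls back to the characters-÷-4 heuristic.

```python
import math


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken when possible, else estimate as chars / 4."""
    try:
        import tiktoken  # third-party dependency

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # tiktoken is not installed, or its encoding data could not be loaded
        # (for example, when running offline). Fall back to the rough heuristic.
        return math.ceil(len(text) / 4)


prompt = "Summarise the attached report in three bullet points."
n_tokens = count_tokens(prompt)
```

Counting the text alone does not include any per-message overhead the provider adds for chat formatting, so treat the result as a lower bound on billed input tokens.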

Scaling to monthly cost

Once you have a per-call cost, projecting to monthly spend is straightforward:

daily_cost   = requests_per_day * call_cost
monthly_cost = daily_cost * 30

# Or, using token totals directly:
monthly_input_tokens  = requests_per_day * 30 * avg_input_tokens
monthly_output_tokens = requests_per_day * 30 * avg_output_tokens

monthly_cost = (monthly_input_tokens  / 1_000_000) * input_price
             + (monthly_output_tokens / 1_000_000) * output_price

At 5,000 requests per day with the illustrative $0.0023 per call above, daily cost is $11.50 and monthly cost is roughly $345. Double the output length (say, a feature that now asks for structured JSON reasoning instead of a single answer) and the per-call cost rises from $0.0023 to $0.0037, so the monthly bill grows by about 61 % to roughly $555, because output tokens dominate. Planning for that inflection point before it hits is exactly what cost modelling is for.
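The projection can be sketched as a small helper. `project_spend` is a hypothetical name, the $0.0023 per-call figure is the illustrative one from the worked example, and a 30-day month is assumed as in the formula above.

```python
def project_spend(call_cost: float, requests_per_day: int, days: int = 30) -> dict:
    """Project daily and monthly spend from an average per-call cost."""
    daily = requests_per_day * call_cost
    return {"daily": daily, "monthly": daily * days}


baseline = project_spend(call_cost=0.0023, requests_per_day=5_000)
doubled_traffic = project_spend(call_cost=0.0023, requests_per_day=10_000)

print(f"baseline: ${baseline['daily']:.2f}/day, ${baseline['monthly']:.2f}/month")
# baseline: $11.50/day, $345.00/month
print(f"doubled traffic: ${doubled_traffic['monthly']:.2f}/month")
# doubled traffic: $690.00/month
```

Spend scales linearly with request volume, so the per-call cost is the number worth getting right first.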

The factors most developers undercount

Initial estimates are almost always optimistic. Here are the costs that inflate real bills beyond the naive calculation:

Estimates vs. actuals: closing the loop with per-call logs

Estimates get you to launch. Logs tell you what is actually happening. Most provider APIs return a usage object with the exact token counts for each call, though field names vary by provider:

# Typical usage object in an OpenAI-compatible response
{
  "usage": {
    "prompt_tokens": 1247,
    "completion_tokens": 318,
    "prompt_tokens_details": {
      "cached_tokens": 400
    }
  }
}

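# Compute actual cost per call server-side
# prompt_tokens includes the cached tokens here, so subtract them first
cached_tokens = usage.prompt_tokens_details.cached_tokens  # treat as 0 if absent
actual_cost = (
  (usage.prompt_tokens - cached_tokens) / 1_000_000 * input_price
  + cached_tokens                        / 1_000_000 * cached_input_price
  + usage.completion_tokens              / 1_000_000 * output_price
)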

If you log this on every request, you get a real-time view of cost by model, by endpoint, by user, or by feature. If you do not log it, you are estimating forever — and your estimates will drift as prompts change, users grow, and conversation histories lengthen.
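A minimal sketch of that logging step in Python, assuming the OpenAI-compatible usage shape shown above. The function names, the in-memory `spend_by_feature` store, and the rates are all hypothetical; a real system would write to a database or metrics pipeline and load current prices from configuration.

```python
from collections import defaultdict

MTOK = 1_000_000  # prices are quoted per million tokens


def actual_cost(usage: dict, input_price: float, cached_input_price: float,
                output_price: float) -> float:
    """Dollar cost of one call, computed from an OpenAI-style usage object."""
    # prompt_tokens_details may be missing or null when caching is not in use.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    uncached = usage["prompt_tokens"] - cached
    return (uncached / MTOK * input_price
            + cached / MTOK * cached_input_price
            + usage["completion_tokens"] / MTOK * output_price)


# Running totals keyed by a label of your choice (feature, endpoint, user, ...).
spend_by_feature = defaultdict(float)


def record_call(feature: str, usage: dict) -> float:
    """Compute the cost of one call and add it to the feature's running total."""
    # Illustrative rates only; substitute your provider's current prices.
    cost = actual_cost(usage, input_price=1.00, cached_input_price=0.25,
                       output_price=4.00)
    spend_by_feature[feature] += cost
    return cost


usage = {
    "prompt_tokens": 1247,
    "completion_tokens": 318,
    "prompt_tokens_details": {"cached_tokens": 400},
}
logged = record_call("summarise", usage)
print(f"${logged:.6f}")  # $0.002219
```

Keying totals by feature or endpoint is what turns a raw bill into something you can act on: the most expensive label is the first place to look for savings.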

This is where per-call cost accounting pays for itself. flo2 logs token counts and computes the dollar cost on every proxied call, so you see your actual spend without wiring up your own logging pipeline. Because flo2 charges zero token markup and lets you bring your own provider keys, the cost you see in the logs is the cost you actually pay — not a marked-up figure that hides the delta. Check the LLM pricing page for current rates across providers, and compare them against your token logs to verify your estimates at any time.

Putting it together: a practical checklist

A cost calculator is only as good as the inputs you feed it. Start with estimates, move to actuals, and let the gap between them drive your next optimisation. That loop — estimate, measure, compare, improve — is the core discipline of building on LLMs without cost surprises.

If you want the measuring step handled for you out of the box, flo2 does per-call cost accounting across providers with no token markup and no locked-in pricing. Free during Beta.
