Token Counter

Estimate tokens and cost for any LLM prompt.

Characters: 183

Words: 30

Input tokens: ~43

Per-call cost estimate

Token counts are approximate (~4 chars/token). Prices reflect public list pricing and may shift over time.

| Model | Input $/1M | Output $/1M | Est. cost |
| --- | --- | --- | --- |
| GPT-4o | $5.00 | $15.00 | $0.007715 |
| GPT-4o mini | $0.15 | $0.60 | $0.000306 |
| GPT-4 Turbo | $10.00 | $30.00 | $0.015430 |
| Claude 3.7 Sonnet | $3.00 | $15.00 | $0.007629 |
| Claude 3.5 Haiku | $0.80 | $4.00 | $0.002034 |
| Claude 3 Opus | $15.00 | $75.00 | $0.038145 |
| Gemini 1.5 Pro | $1.25 | $5.00 | $0.002554 |
| Gemini 1.5 Flash | $0.07 | $0.30 | $0.000153 |

About Token Counter

An approximation-based token counter. It uses a ~4-character-per-token heuristic (close to what BPE tokenizers produce in practice) and a hand-maintained price table to estimate per-call cost across major LLMs.

Why we built it

We were tab-switching between five different token counters to compare costs. So we built one with all the models in one place.

How to use

  1. Paste your prompt into the editor.
  2. Watch token estimates update across every major model.
  3. See per-call cost in USD based on current pricing.

Every modern LLM API charges by the token. Context windows are measured in tokens. Rate limits are measured in tokens per minute. Caching savings are measured in cached input tokens. The token is the unit you actually pay for and the unit you actually run out of — and yet most people guess. This counter gives you an instant, per-model token estimate plus per-call cost so you can budget your prompts before you ship them.

Tokenizers aren't interchangeable

GPT-4o uses `o200k_base` (200,000-token vocabulary). Claude uses a custom BPE tokenizer that closely tracks `cl100k_base`. Gemini uses SentencePiece. Llama 3 uses Tiktoken-style BPE on a 128k vocabulary. The result: the same string can tokenize to 12 tokens in GPT-4o, 14 in Claude Sonnet, and 13 in Gemini. For English prose the spread is small — usually within 10%. For code, JSON, non-English text, or emoji-heavy strings, the spread widens. This counter approximates all of them with a ~4-char-per-token heuristic that lands within roughly 5–10% of the true count for typical text.
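A minimal sketch of the ~4-char-per-token heuristic described above. This is an illustration, not the counter's actual code: the function names, the round-up rule, and the ±10% band are assumptions. The sample readout on this page shows ~43 tokens for 183 characters, so the tool evidently rounds or weights slightly differently from a plain division by four.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic from the text; this is not a real tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate: character count divided by ~4, rounded up."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def error_band(estimate: int, tolerance: float = 0.10) -> tuple[int, int]:
    """Range the true count plausibly falls in for typical English text."""
    low = math.floor(estimate * (1 - tolerance))
    high = math.ceil(estimate * (1 + tolerance))
    return low, high


prompt = "Summarise the attached report in three bullet points."
tokens = estimate_tokens(prompt)
low, high = error_band(tokens)
print(f"{len(prompt)} chars -> ~{tokens} tokens (plausibly {low}-{high})")
```

For code, JSON, or non-English input, a wider tolerance than 10% is the safer assumption, since that is where the heuristic drifts furthest from real tokenizers.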

Where the cost actually comes from

For most LLM workloads, an output token costs 2–5× as much as an input token. A 1K-token prompt that produces a 500-token answer often spends more on the 500 output tokens than on the 1K input tokens. This counter shows input cost, projected output cost (you set the expected output length), and total, so the number you read is close to the number you'll be billed. Prompt caching, where available (Claude, OpenAI), can push cached input tokens to as little as ~10% of normal cost; the exact discount depends on the provider and model. We don't apply caching automatically, but we surface the savings so you can decide.
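The breakdown into input, output, and cached-input cost can be sketched as follows. The `call_cost` function and `CostBreakdown` class are hypothetical names, the $5.00/$15.00 prices are the GPT-4o row from the table on this page, and the flat 10% multiplier for cached tokens is the rough figure quoted above rather than any provider's exact rate. Real cache pricing may also include a write surcharge, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class CostBreakdown:
    input_usd: float
    output_usd: float

    @property
    def total_usd(self) -> float:
        return self.input_usd + self.output_usd


def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float,
              cached_tokens: int = 0,
              cached_multiplier: float = 0.10) -> CostBreakdown:
    """Per-call cost in USD; cached input tokens are billed at a reduced rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    # Cached tokens count as a fraction of a normal input token (assumed ~10%).
    billable_input = fresh_tokens + cached_tokens * cached_multiplier
    input_usd = billable_input * input_price_per_m / 1_000_000
    output_usd = output_tokens * output_price_per_m / 1_000_000
    return CostBreakdown(input_usd, output_usd)


uncached = call_cost(2_000, 500, 5.00, 15.00)
cached = call_cost(2_000, 500, 5.00, 15.00, cached_tokens=1_500)
print(f"no cache:   in ${uncached.input_usd:.6f}  out ${uncached.output_usd:.6f}")
print(f"with cache: in ${cached.input_usd:.6f}  out ${cached.output_usd:.6f}")
```

Note that caching only touches the input side; the output cost is identical in both calls.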

Frequently asked questions

Quick answers to the questions people actually ask about Token Counter.

Is this an exact token count or an estimate?

An estimate. Different models use different tokenizers — GPT-4o uses tiktoken's `o200k_base`, Claude uses its own BPE variant, Gemini uses SentencePiece. Running each true tokenizer in a browser would mean shipping 5–10 MB of vocabulary files. Instead, we use a ~4-character-per-token approximation, which is accurate to within 5–10% for English text. For exact counts, use the model's official tokenizer.

How is the cost calculated?

We multiply estimated input tokens by each model's published input price (quoted per 1M tokens). Output cost is shown separately because output tokens are usually 2–5× more expensive than input. Prices are hand-maintained from the model providers' public pricing pages and refreshed regularly. Always sanity-check against the provider's pricing page for production budgeting.
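As a worked example, the sketch below recomputes the Est. cost column of the table on this page. The prices are copied from that table and the 43 input tokens come from the sample readout. The 500 output tokens are an assumption inferred from the arithmetic (it is the value that reproduces the listed figures), not something the page states. The names `PRICES` and `estimate_cost` are illustrative.

```python
# (input $/1M tokens, output $/1M tokens), copied from the table on this page
PRICES = {
    "GPT-4o": (5.00, 15.00),
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3.7 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "Claude 3 Opus": (15.00, 75.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "Gemini 1.5 Flash": (0.07, 0.30),
}

INPUT_TOKENS = 43              # from the sample readout
ASSUMED_OUTPUT_TOKENS = 500    # inferred assumption, not stated on the page


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Per-call cost in USD from token counts and per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


for model, (in_price, out_price) in PRICES.items():
    cost = estimate_cost(INPUT_TOKENS, ASSUMED_OUTPUT_TOKENS, in_price, out_price)
    print(f"{model:<18} ${cost:.6f}")
```

With these inputs the output term dominates every row, which is why the cheapest model by input price is not automatically the cheapest per call.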

Which models are included?

Current OpenAI (GPT-4o, GPT-4 Turbo, GPT-4o mini, o-series reasoning models), Anthropic Claude (Haiku 4.5, Sonnet 4.6, Opus 4.7), Google (Gemini 2.5 Pro, Gemini 2.5 Flash), and open-weight reference points for Llama and DeepSeek. Updated as new models ship.

Why does the same text use different token counts on each model?

Each model's tokenizer was trained on a slightly different corpus and has a different vocabulary size. GPT-4o's `o200k_base` has 200,000 tokens; older Claude tokenizers had ~100,000. Common English words usually tokenize to 1 token everywhere. Code, non-English text, and unusual symbols vary the most — that's where the per-model differences appear.

Will it tell me if I'll exceed the context window?

Yes. For each model, the counter shows current tokens vs. context window. Context windows in 2026 range from 200K (Claude Sonnet 4.6 default) to 1M (Claude Opus 4.7 1M, Gemini 2.5 Pro). If your prompt's estimated count exceeds the window, you'll need to truncate, summarise, or split the prompt across calls. Prompt caching lowers cost but does not reduce how many tokens count against the window.
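A context-window check along these lines might look like the sketch below. The function name, the 10% safety margin, and the reserved-output parameter are assumptions made for illustration; the 200K and 1M window sizes are the two figures quoted above. Because the token count is an estimate, the sketch pads it before comparing, and it leaves room for the reply, since output tokens generally share the same window.

```python
import math


def context_usage(estimated_tokens: int, window: int,
                  reserved_output: int = 0,
                  safety_margin: float = 0.10) -> dict:
    """Compare a padded token estimate (plus reply budget) against a window."""
    # Pad the estimate because the ~4 chars/token heuristic can undercount.
    padded = math.ceil(estimated_tokens * (1 + safety_margin)) + reserved_output
    return {
        "padded_tokens": padded,
        "window": window,
        "used_fraction": padded / window,
        "fits": padded <= window,
    }


estimate = 180_000  # hypothetical prompt size
for window in (200_000, 1_000_000):
    usage = context_usage(estimate, window, reserved_output=4_000)
    status = "fits" if usage["fits"] else "too large"
    print(f"{window:>9,} window: {usage['used_fraction']:.0%} used, {status}")
```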

Does it count system prompts and tool definitions?

Whatever you paste, it counts. If you want a real budget for a Claude or GPT call, paste the system prompt, the user message, and any tool schema JSON — all three count against your context window. Tool definitions can balloon your prompt fast if you're not careful.
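To see how the three parts add up, the sketch below estimates each one separately with the ~4 chars/token heuristic. Everything in it is illustrative: the system prompt, the user message, and the `get_weather` tool are made up, and serialising the tool schema with `json.dumps` only approximates whatever format a provider actually injects into the prompt. JSON also tends to tokenize less efficiently than prose, so the tool figure is more likely to be an undercount.

```python
import json
import math


def estimate_tokens(text: str, chars_per_token: float = 4) -> int:
    """Rough token estimate using the ~4 chars/token heuristic."""
    return math.ceil(len(text) / chars_per_token) if text else 0


system_prompt = "You are a concise assistant. Answer in plain English."
user_message = "What's the weather in Oslo tomorrow?"

# Hypothetical tool definition, shaped like a typical JSON-schema tool spec.
tools = [{
    "name": "get_weather",
    "description": "Look up the weather forecast for a city on a given date.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "date": {"type": "string", "description": "ISO 8601 date"},
        },
        "required": ["city", "date"],
    },
}]

parts = {
    "system": system_prompt,
    "user": user_message,
    "tools": json.dumps(tools),
}
budget = {name: estimate_tokens(text) for name, text in parts.items()}
total = sum(budget.values())

for name, count in budget.items():
    print(f"{name:<7} ~{count} tokens")
print(f"total   ~{total} tokens")
```

Even with a single small tool, the schema is the largest of the three parts here, which is the ballooning effect described above.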