What is a token?
A token is the unit large language models read and bill in — not a word, not a character. It's a chunk of text produced by the model's tokenizer, often a whole short word, a word-piece, or a single punctuation mark. As a rough rule of thumb, 1 token ≈ 4 characters of English ≈ ¾ of a word, but the real number depends entirely on the text: code, emoji, and non-English scripts tokenize very differently.
This tool counts tokens the way the models actually do, using OpenAI's openly-published tokenizers, then turns that count into the two answers you really need: what it costs and whether it fits.
Why count tokens?
Two reasons, and they're the whole job of this tool:
- Cost. Every API charges per token — separately for the prompt you send (input) and the reply you get back (output). A prompt that looks small can get expensive at scale. Counting first means no surprise bills.
- Context limits. Every model has a maximum context window — the total tokens it can consider at once (input plus the reply). Go over it and the request fails or silently truncates. Counting first means no cut-off prompts.
How to read the cost estimate
- Input cost is your prompt's tokens × the model's input price.
- Total cost adds the reply: it assumes the model generates the number of output tokens you reserve with the "Reserve room for the reply" buttons. Set that to your realistic expected answer length.
- Prices are listed per million tokens by every provider; the tool does the division for you.
Costs are shown in USD because that's how the providers publish them.
Context fit
The Context fit column shows whether your input — plus the reply you reserved room for — stays inside each model's window. Green means it fits, with the remaining headroom shown. Red means it exceeds the window, with the overflow amount. A bigger window (Gemini's 1–2M tokens) isn't always cheaper, so weigh fit against the cost columns.
Why some counts are "estimate"
OpenAI publishes its exact tokenizer, so GPT-4o, GPT-4 Turbo, and GPT-3.5 counts are exact. Anthropic and Google don't ship a public JavaScript tokenizer, so their rows use the closest available encoding as a proxy and are clearly labelled estimate. They're close enough to plan around, but treat them as approximate — never as a billing guarantee. Showing you which is which is the honest thing to do.
FAQ
How accurate is the token count? For the OpenAI models it's exact — it uses the same BPE tokenizer the API uses. For Claude and Gemini it's a labelled estimate, because those providers don't publish a browser tokenizer.
Does my text get uploaded? No. Counting happens entirely in your browser. Nothing is sent to a server or to any model — this tool never calls an API.
Which models are supported? GPT-4o, GPT-4o mini, GPT-4 Turbo, GPT-3.5 Turbo, Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku, Gemini 1.5 Pro, and Gemini 1.5 Flash.
Why might the price be wrong? Model prices change. Every result is stamped with a "pricing as of" date — if it's old, double-check the provider's current pricing page before relying on the numbers.
Is anything stored on my device? No. The tool keeps no history and writes nothing to local storage.