
How AI Reads Your Text: Tokens, Costs, and Context Windows Explained

Language models do not read words — they read tokens. Understanding tokens is the key to predicting what an AI request will cost and whether your prompt will even fit. Here is how it works, in plain English.

Mahdi Moradi | June 4, 2026 | 7 min read

Photo by Johnny Briggs on Unsplash

When you send a prompt to ChatGPT, Claude, or Gemini, the model never sees your words the way you wrote them. Before anything happens, your text is chopped into small pieces called tokens — and almost everything that matters about working with AI, from the bill you pay to whether your prompt is accepted at all, is measured in those tokens. Once you understand them, a lot of confusing AI behaviour suddenly makes sense.

What exactly is a token?

A token is a chunk of text the model treats as a single unit. It is often shorter than a whole word but longer than a single letter. Common short words like "the" or "and" are one token each. Longer or rarer words get split into pieces: "tokenization" might become "token" + "ization". Spaces, punctuation, and line breaks count too.

A useful rule of thumb

For everyday English, 1 token is roughly 4 characters, or about ¾ of a word. So 1,000 tokens is around 750 words. It is only an average: the real count depends on what you write and on the tokenizer each model uses.

That "depends on what you write" part is important. Code, with its brackets and symbols, tokenizes far less efficiently than prose. Emoji can cost several tokens each. And languages that do not use the Latin alphabet — Chinese, Japanese, Arabic — often use noticeably more tokens per character than English, because the tokenizer was trained mostly on English-heavy data.
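To make the rule of thumb concrete, here is a minimal Python sketch that turns it into a quick estimator. The function names are made up for illustration, and the constants are just the averages quoted above, so treat the results as ballpark figures for English prose, not real token counts.

```python
import math

# Averages from the rule of thumb above; real tokenizers will differ.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate from the character count of English prose."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from a word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_words_from_tokens(token_count: int) -> int:
    """Rough word count that a given number of tokens can hold."""
    return round(token_count * WORDS_PER_TOKEN)


print(estimate_words_from_tokens(1_000))  # 750
print(estimate_tokens_from_words(750))    # 1000
```

For code, emoji, or non-Latin scripts, expect the true count to be higher than this estimate.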

[Image: typewriter keys close up with paper]

Code and symbols tokenize very differently from plain prose.

Why tokens decide your bill

Every major AI API charges by the token, and it charges twice: once for the tokens you send (the input, or prompt) and again for the tokens the model generates back (the output, or completion). Output tokens are almost always more expensive than input tokens — often three to five times more — because generating text is the costly part.

Providers publish prices per million tokens, which can make small numbers look harmless. But costs scale with usage. A prompt that costs a fraction of a cent once becomes a real number when you run it across thousands of documents, or stuff a long history into every request. This is exactly why estimating before you build is worth the thirty seconds it takes.
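The arithmetic is simple enough to sketch in a few lines of Python. The prices below are hypothetical placeholders chosen only to show the shape of the calculation (output priced at four times input), not any provider's real rates, and request_cost is an illustrative helper, not part of any SDK.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, with prices quoted per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices in dollars per million tokens (assumptions, not real rates).
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00

one_request = request_cost(1_500, 500, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M)
batch_cost = one_request * 10_000  # the same prompt run across 10,000 documents

print(f"${one_request:.4f}")   # $0.0070
print(f"${batch_cost:,.2f}")   # $70.00
```

Under these assumed prices, the 500 output tokens cost more than the 1,500 input tokens, which is why trimming the reply length often saves more than trimming the prompt.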

See the cost before you spend it

ZipTools' Token Counter shows the token count, estimated input cost, and total cost across GPT-4o, Claude, and Gemini as you type — all in your browser, with nothing uploaded. Paste a prompt and you will know what it costs before you ever call an API.

The context window: will it even fit?

Every model has a context window — the maximum number of tokens it can hold in mind at once. Crucially, that budget covers both your input and the reply. If your prompt is 120,000 tokens and the window is 128,000, you have only 8,000 tokens left for the answer. Ask for more and the request fails or, worse, quietly truncates and gives you a half-formed response.

GPT-4o family: 128,000 tokens
Claude 3 family: 200,000 tokens
Gemini 1.5 Pro: up to 2,000,000 tokens
GPT-3.5 Turbo: 16,385 tokens — small, and easy to overflow with long documents

A bigger window is not automatically better. Larger context often costs more, and stuffing a window full of marginally relevant text can actually make answers worse. The practical move is to check that your input plus a realistic reply fits comfortably, with headroom to spare.
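The fit check can be sketched in Python using the window sizes listed above. The 10% headroom default and the helper names are assumptions made for this example, and the sketch ignores any separate cap a model may place on output length.

```python
# Context window sizes as listed in this article.
CONTEXT_WINDOWS = {
    "GPT-4o family": 128_000,
    "Claude 3 family": 200_000,
    "Gemini 1.5 Pro": 2_000_000,
    "GPT-3.5 Turbo": 16_385,
}


def remaining_for_reply(window: int, input_tokens: int) -> int:
    """Tokens left for the answer once the prompt is in the window."""
    return max(window - input_tokens, 0)


def fits(window: int, input_tokens: int, reply_tokens: int,
         headroom_pct: int = 10) -> bool:
    """True if prompt plus reply fit while leaving headroom_pct of the window free."""
    usable = window * (100 - headroom_pct) // 100
    return input_tokens + reply_tokens <= usable


print(remaining_for_reply(128_000, 120_000))  # 8000

# Which of the listed models take a 120,000-token prompt plus a 4,000-token reply?
fitting = [name for name, window in CONTEXT_WINDOWS.items()
           if fits(window, 120_000, 4_000)]
print(fitting)  # ['Claude 3 family', 'Gemini 1.5 Pro']
```

The GPT-4o window technically holds the 124,000 tokens, but it fails the check because nothing is left over as headroom.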

Why token counts sometimes disagree

Paste the same sentence into two different counters and you may get two different numbers. That is not a bug — different model families use different tokenizers. OpenAI publishes its tokenizers openly, so a good counter can give you an exact GPT-4o or GPT-4 count. Anthropic and Google do not ship a public browser tokenizer, so any count for Claude or Gemini is a close approximation, not a guarantee.

Beware confidently wrong tools

A counter that presents an approximate Claude or Gemini count as if it were exact is misleading you. Trust the ones that label estimates honestly — and that stamp their prices with a date, because model pricing changes regularly.
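Here is a minimal Python sketch of what honest labelling could look like in a counter. It assumes the optional open-source tiktoken package for OpenAI counts and falls back to the 4-characters-per-token rule of thumb for everything else. The count_tokens function and its "exact" and "estimate" labels are hypothetical names invented for this example.

```python
import math


def count_tokens(text: str, model_family: str = "openai") -> tuple[int, str]:
    """Return (count, label), where label is "exact" or "estimate"."""
    if model_family == "openai":
        try:
            import tiktoken  # optional dependency: pip install tiktoken

            # o200k_base is the encoding tiktoken uses for GPT-4o.
            encoding = tiktoken.get_encoding("o200k_base")
            return len(encoding.encode(text)), "exact"
        except Exception:
            # tiktoken is missing or the encoding could not be loaded:
            # fall through to the labelled estimate below.
            pass
    # No local tokenizer for this family, so use the rough rule of thumb
    # and say so instead of presenting the number as exact.
    return math.ceil(len(text) / 4), "estimate"
```

Calling count_tokens("Hello, world!", "claude") returns a count labelled "estimate", so the caller always knows which kind of number it is looking at.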

Practical habits that save tokens (and money)

Trim the obvious fat — remove boilerplate, repeated instructions, and dead context from your prompts
Reserve only the output length you actually need; do not pay for a 4,000-token reply when 400 will do
Summarise long histories instead of resending the full transcript every turn
Pick the smallest model that does the job: GPT-4o mini and Claude Haiku cost a fraction of the flagship price
Count before you scale: estimate one request, then multiply by your real volume
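The habit of summarising long histories is worth a quick calculation, because resending the full transcript makes total input grow with the square of the number of turns. The Python sketch below is a simplified model: it assumes every turn adds the same number of tokens and that a summary can be held to a fixed cap. Both are illustrative assumptions, as are the function names.

```python
def tokens_resending_full_history(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every request resends the whole transcript."""
    # Request k carries k turns of text, so the total is t * (1 + 2 + ... + n).
    return tokens_per_turn * turns * (turns + 1) // 2


def tokens_with_summary(turns: int, tokens_per_turn: int, summary_cap: int) -> int:
    """Total input tokens when earlier turns are replaced by a capped summary."""
    total = 0
    for turn in range(1, turns + 1):
        history = (turn - 1) * tokens_per_turn
        # Send the history as-is while it is short, otherwise the capped summary.
        total += min(history, summary_cap) + tokens_per_turn
    return total


full = tokens_resending_full_history(turns=20, tokens_per_turn=500)
summarised = tokens_with_summary(turns=20, tokens_per_turn=500, summary_cap=1_000)

print(full)        # 105000
print(summarised)  # 28500
```

In this toy conversation, summarising cuts input tokens by more than two thirds. The sketch leaves out the tokens spent producing the summaries themselves.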

None of this requires being an AI expert. It just requires being able to see the numbers — and that is exactly the gap a good token counter fills. Paste your prompt, read the count, the cost, and the fit, and you are making an informed decision instead of a hopeful guess.

Try it on your own prompt

Open the Token Counter, paste a prompt you actually use, and compare the cost across models side by side. It is free, instant, and your text never leaves your browser.

Mahdi Moradi

Full-stack software engineer and founder of Bornara AI, building free privacy-first tools at ZipTools. Based in Calgary, Canada.
