Question 1

How much does prompt caching save?

Accepted Answer

It depends on two things: how much of your prompt is static (repeated across calls) and how large your provider's cache-read discount is. Caching discounts only the reused prefix (the system prompt, few-shot examples, and any fixed long context), so if 70% of your prompt is static and reads are 90% cheaper, you cut roughly 0.70 × 0.90 = 63% off your input bill once the cache is warm. Set your numbers above for an estimate. Output tokens are never cached and aren't affected.
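As a rough sketch of that arithmetic, the warm-cache saving is the static share multiplied by the read discount. The function name `input_savings_fraction` is made up for illustration, and the model deliberately ignores any cache-write surcharge and any calls that miss the cache:

```python
def input_savings_fraction(static_share: float, read_discount: float) -> float:
    """Fraction of the input bill saved when every call hits a warm cache.

    static_share:  fraction of input tokens in the reused prefix (0..1)
    read_discount: fraction by which cached reads are cheaper (0..1)
    Simplifying assumption: no write surcharge and no cache misses.
    """
    if not (0.0 <= static_share <= 1.0 and 0.0 <= read_discount <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    return static_share * read_discount


# The example from the answer: 70% static prompt, 90% cheaper reads.
savings = input_savings_fraction(0.70, 0.90)
print(f"{savings:.0%}")  # prints 63%
```

The remaining 30% of the prompt is billed at full price, which is why the saving can never exceed the static share itself.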

Question 2

What's the cache-read discount for Anthropic, OpenAI, and Gemini?

Accepted Answer

Ballparks (confirm current values with each provider, as they change): Anthropic cache reads are about 90% cheaper than base input (with a ~25% surcharge the first time the cache is written); OpenAI cached input is roughly 50% off (more on some models); Google Gemini offers context caching with its own per-model pricing. This tool lets you set the discount directly so it stays accurate for whichever provider and model you use.

Question 3

What is prompt caching and when is it worth it?

Accepted Answer

Providers can store the processed form of a repeated prompt prefix so subsequent calls skip re-processing it, billing those tokens at a steep discount. It's worth it whenever you send the same large static context many times within the cache's lifetime — agents (which re-send a growing history every step), RAG systems with a fixed instruction block, chatbots with a big system prompt, or batch jobs over a shared document. If every prompt is unique, caching won't help.

Question 4

Why doesn't caching discount the whole prompt?

Accepted Answer

Only the stable prefix can be cached. The provider caches everything up to the first point where your prompt changes, so the dynamic tail (the user's actual question, the current tool result) is always billed at full price, and you benefit only if the static part comes first and is reused. Structuring prompts with the fixed content at the top maximizes the cacheable share.
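To illustrate why ordering matters, here is a minimal sketch that measures how many leading tokens two prompts share. It uses whole words as stand-in tokens, and `shared_prefix_len` is a hypothetical helper rather than any provider's API. Real providers tokenize differently and may add their own rules, such as minimum cacheable lengths or explicit cache markers.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading tokens that two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break  # first difference ends the cacheable prefix
        n += 1
    return n


# Words stand in for tokens here (an assumption for illustration).
system = ["You", "are", "a", "support", "bot", "."]

# Static content first: the whole system prompt is shared.
static_first_1 = system + ["Q:", "reset", "password"]
static_first_2 = system + ["Q:", "cancel", "order"]

# Dynamic content first: the prompts diverge almost immediately.
dynamic_first_1 = ["Q:", "reset", "password"] + system
dynamic_first_2 = ["Q:", "cancel", "order"] + system

print(shared_prefix_len(static_first_1, static_first_2))    # prints 7
print(shared_prefix_len(dynamic_first_1, dynamic_first_2))  # prints 1
```

Both orderings contain the same system prompt, but only the static-first layout leaves it inside the shared prefix.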

Question 5

Are there downsides or limits?

Accepted Answer

Caches expire (a short TTL, around 5 minutes on some providers), so infrequent traffic may never hit a warm cache. There's also usually a small one-time write surcharge on the first call that populates the cache. Net, caching is a clear win for high-frequency, large-static-prefix workloads and roughly neutral (or slightly worse where a write surcharge applies) for sparse, all-unique traffic. This calculator helps you tell which side you're on.
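A simple break-even sketch can show how quickly the write surcharge is repaid. It assumes the first call writes the cache and every later call arrives within the TTL and reads it. The function name `prefix_cost_ratio` is made up for illustration, and the 90% discount and 25% surcharge are the ballpark figures quoted above, not confirmed prices:

```python
def prefix_cost_ratio(calls: int, read_discount: float,
                      write_surcharge: float) -> float:
    """Cost of the static prefix with caching, relative to no caching.

    Assumes call 1 pays (1 + write_surcharge) and each later call pays
    (1 - read_discount), i.e. the cache never expires between calls.
    A result below 1.0 means caching is cheaper.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    cached = (1 + write_surcharge) + (calls - 1) * (1 - read_discount)
    return cached / calls


for n in (1, 2, 10):
    print(n, round(prefix_cost_ratio(n, 0.90, 0.25), 3))
# prints:
# 1 1.25
# 2 0.675
# 10 0.215
```

Under these assumptions a single call costs 25% more on the prefix, while a second call within the TTL already puts caching ahead. If calls arrive further apart than the TTL, each one behaves like the first row.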

Prompt Caching Savings Calculator

