Free tool

AI Agent Cost Calculator

What does it actually cost to run an AI agent? An agent is a multi-step loop that re-sends its growing context every step — so it costs far more than a single API call. Set your usage to see the monthly spend across the major models, and watch how the bill climbs as steps and context grow.

Per run: ~40k input + 3k output tokens — note the input is ~3.3× the base because context compounds each step.

  • GPT-4o mini (cheapest): $7.92/mo
  • Gemini 2.5 Flash: $20.00/mo
  • Claude Haiku: $44.80/mo
  • Gemini 2.5 Pro: $82.00/mo
  • GPT-4o: $132/mo
  • Claude Sonnet 4: $168/mo
  • Claude Opus 4: $840/mo

Agents amplify token spend — a local model at $0/token changes the math entirely.

Approximate public list prices ($/1M tokens). A real agent's cost varies with caching, retries and parallel branches — this is a directional estimate, not a quote.

Frequently asked

How much does it cost to run an AI agent?

It depends on three things: how many times the agent runs, how many LLM calls (steps) each run takes, and how big the context is at each step. Set those above and you'll get a monthly estimate for each major model. The key driver people miss is that an agent re-sends its growing history every step, so a 10-step agent costs far more than 10 single calls — often 3-6× more once context compounds.
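As a rough sketch of the arithmetic behind such an estimate: the monthly bill is the number of runs times the per-run input and output tokens, each priced per million tokens. The price table, the function name `monthly_cost`, and the usage figures (1,000 runs per month, 40,000 input and 3,200 output tokens per run) are assumptions chosen for illustration, not the calculator's actual code. Confirm current prices with each provider.

```python
# Assumed list prices in USD per 1M tokens: (input, output).
# Illustrative values only; check each provider's current pricing.
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Haiku": (0.80, 4.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
}


def monthly_cost(runs_per_month, input_tokens_per_run, output_tokens_per_run,
                 input_price, output_price):
    """Estimated monthly spend in USD; prices are per 1M tokens."""
    per_run = (input_tokens_per_run * input_price
               + output_tokens_per_run * output_price) / 1_000_000
    return runs_per_month * per_run


if __name__ == "__main__":
    # Hypothetical usage: 1,000 runs/month, 40k input + 3.2k output per run.
    for model, (p_in, p_out) in PRICES.items():
        cost = monthly_cost(1_000, 40_000, 3_200, p_in, p_out)
        print(f"{model}: ${cost:,.2f}/mo")
```

With these assumed inputs the sketch lands on the monthly figures listed above (for example, $7.92 for GPT-4o mini and $840 for Claude Opus 4), which suggests the estimate is a plain linear sum with no caching or retry adjustments.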

Why does context accumulation make agents expensive?

An agent is a loop: at each step it sends the system prompt, the task, everything it has said so far, and every tool result back to the model, then gets a new response. So step 8 carries the tokens from steps 1-7. The input sent at each step grows roughly linearly with the step number, so total input tokens over a run grow roughly quadratically with the step count, and you pay for that input on every step. Toggle 'Context accumulates' to see the difference; it's usually the largest line in an agent's bill.
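The growth can be sketched with a simplified model of the loop. The function name `input_tokens_per_step`, its parameters, and the token counts below are hypothetical. The sketch assumes every step adds one tool result and one response of fixed size, which a real agent will not do exactly.

```python
def input_tokens_per_step(steps, base_tokens, tool_tokens, output_tokens,
                          accumulate=True):
    """Input tokens sent at each step of a simplified agent loop.

    base_tokens:   system prompt plus task, sent on every step
    tool_tokens:   tool result entering the context at each step
    output_tokens: model response produced at each step
    """
    per_step = []
    for k in range(1, steps + 1):
        if accumulate:
            # Every tool result so far, plus every earlier response.
            history = k * tool_tokens + (k - 1) * output_tokens
        else:
            # Only the current tool result; earlier steps are dropped.
            history = tool_tokens
        per_step.append(base_tokens + history)
    return per_step


if __name__ == "__main__":
    # Hypothetical 8-step run: 1,000 base, 500 per tool result, 400 per response.
    flat = input_tokens_per_step(8, 1_000, 500, 400, accumulate=False)
    grown = input_tokens_per_step(8, 1_000, 500, 400, accumulate=True)
    print(sum(flat), sum(grown), round(sum(grown) / sum(flat), 1))
```

With these made-up numbers the accumulating run sends 37,200 input tokens against 12,000 without accumulation, about 3.1 times as many, and the gap widens as the step count rises.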

How do I cut my agent's cost?

Four levers: (1) use a cheaper or smaller model for routine steps and reserve the frontier model for the hard ones; (2) trim or summarize the running context instead of re-sending everything; (3) cache the stable system prompt (most providers bill cached tokens far less); and (4) reduce step count with better planning. Running a capable model locally removes per-token cost entirely, which is why agentic workloads are a strong case for local inference.
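To get a feel for lever (3), here is a minimal sketch of how a cached system prompt might change the input bill for one run. The function name `input_cost_with_cache` and the default `cached_fraction` of 0.1 are assumptions, not any provider's actual rate. The sketch also ignores cache-write surcharges and cache expiry, which some providers apply.

```python
def input_cost_with_cache(step_inputs, prefix_tokens, input_price,
                          cached_fraction=0.1):
    """Input cost in USD for one run when a stable prefix is cached.

    step_inputs:     total input tokens sent at each step
    prefix_tokens:   stable tokens (e.g. the system prompt) leading each step
    input_price:     USD per 1M uncached input tokens
    cached_fraction: assumed price of a cached token relative to an uncached one
    """
    cost = 0.0
    for i, tokens in enumerate(step_inputs):
        # Assume the first step pays full price and later steps hit the cache.
        cached = prefix_tokens if i > 0 else 0
        uncached = tokens - cached
        cost += (uncached + cached * cached_fraction) * input_price / 1_000_000
    return cost


if __name__ == "__main__":
    steps = [2_000, 3_000, 4_000]  # hypothetical per-step input sizes
    print(input_cost_with_cache(steps, prefix_tokens=1_000, input_price=3.0))
    print(input_cost_with_cache(steps, prefix_tokens=0, input_price=3.0))
```

The saving scales with how much of each step's input is stable, so caching helps most when the system prompt is large relative to the running history.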

What counts as a 'step' and 'tool-result tokens'?

A step is one LLM call in the agent's loop (reason → act → observe). Tool-result tokens are whatever a tool returns — a web page, a database row, a file — that gets fed back into the model's context for the next step. These are easy to underestimate: a single scraped page can be thousands of tokens, and in an accumulating agent they're re-sent on every subsequent step.
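The re-sending effect for a single tool result can be sketched as below. The function name `tool_result_input_tokens` and the example numbers are hypothetical. The sketch assumes an accumulating agent that never trims or summarizes its context.

```python
def tool_result_input_tokens(result_tokens, fetched_at_step, total_steps):
    """Total input tokens billed for one tool result over an accumulating run.

    The result enters the context on the step after it was fetched and is
    re-sent on every later step (steps are 1-indexed).
    """
    times_sent = max(total_steps - fetched_at_step, 0)
    return result_tokens * times_sent


if __name__ == "__main__":
    # Hypothetical: a 5,000-token page fetched at step 2 of a 10-step run.
    print(tool_result_input_tokens(5_000, fetched_at_step=2, total_steps=10))
```

In this example the page is sent on steps 3 through 10, so a 5,000-token result accounts for 40,000 input tokens by the end of the run.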

Are these prices exact?

They're approximate public list prices ($/1M tokens) and are directional, not a quote — confirm current pricing with each provider. A real agent's bill also shifts with prompt caching, retries, and parallel branches. Use this to compare models and sanity-check a budget, not to bill a client.
