Training a model is a one-time headline number. Inference is the recurring bill that scales with every user and every request, and it is what quietly decides whether an AI product survives.
The eye-watering numbers in AI coverage are almost always about training: the cost to build a model from scratch. Those numbers are real, but they are also a one-time, mostly fixed cost. The number that actually determines whether an AI product lives or dies is inference, the cost of running the model every single time someone uses it. Training is the down payment. Inference is the rent, and the rent never stops.
Training is a sunk cost; inference is a tax on growth
Once a model is trained, that money is spent. Inference is different in a way that should keep founders up at night: it scales directly with usage. Every active user, every request, every retry adds to the bill. The more successful your product, the more it costs to run, and that cost arrives in real time whether or not the revenue does.
This inverts the usual software intuition. Traditional software gets cheaper per user as you grow, because the cost of serving one more user rounds to zero. AI products do not get that gift for free. More users means more inference means more cost, unless you engineer your way out of it. A viral hit can become an existential threat: the spike in usage you celebrated on Monday is the bill you cannot pay on Friday.
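To make the scaling concrete, here is a minimal back-of-the-envelope sketch. Every number in it (requests per user, retry rate, tokens per request, price per 1,000 tokens) is a made-up assumption for illustration, not a quote from any provider, and `UsageProfile` and `monthly_inference_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UsageProfile:
    """Hypothetical per-user usage and pricing assumptions."""
    requests_per_user: float    # requests per active user per month
    retry_rate: float           # fraction of requests that trigger one extra call
    tokens_per_request: int     # input + output tokens per model call
    price_per_1k_tokens: float  # USD per 1,000 tokens (assumed, not a real price)


def monthly_inference_cost(active_users: int, p: UsageProfile) -> float:
    """Estimate the monthly inference bill in USD for a given number of users."""
    # Every request, plus every retry, is a billable model call.
    calls = active_users * p.requests_per_user * (1 + p.retry_rate)
    return calls * p.tokens_per_request / 1000 * p.price_per_1k_tokens


# Illustrative assumptions only.
profile = UsageProfile(
    requests_per_user=100,
    retry_rate=0.1,
    tokens_per_request=2000,
    price_per_1k_tokens=0.01,
)

for users in (1_000, 10_000, 100_000):
    cost = monthly_inference_cost(users, profile)
    print(f"{users:>7} users: ${cost:,.0f}/month (${cost / users:.2f} per user)")
```

Under these assumptions the per-user cost stays flat as the user count grows, so ten times the users means ten times the bill. Nothing in the arithmetic gets cheaper with scale unless one of the inputs (tokens per request, retry rate, price) is deliberately driven down.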
The trap of the free-and-generous launch
The failure pattern is easy to spot in hindsight. A product launches with a generous free tier and an expensive model behind every interaction. Users love it. Usage climbs. And the cost of serving that usage climbs right alongside, with no revenue keeping pace.
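The gap between cost and revenue in that pattern can be sketched with a few lines of arithmetic. The conversion rate, per-user serving cost, and subscription price below are invented for illustration, and `monthly_margin` is a hypothetical helper, not a standard formula.

```python
def monthly_margin(
    total_users: int,
    paid_conversion: float,
    cost_per_user: float,
    price_per_paid_user: float,
) -> float:
    """Monthly revenue minus inference cost, in USD, under simple assumptions."""
    # Only a fraction of users pay, but every user generates inference cost.
    paid_users = round(total_users * paid_conversion)
    revenue = paid_users * price_per_paid_user
    cost = total_users * cost_per_user
    return revenue - cost


# Illustrative assumptions: 2% of users pay $20/month, and serving any user
# (free or paid) costs $2/month in inference.
for users in (10_000, 50_000, 100_000):
    margin = monthly_margin(
        users,
        paid_conversion=0.02,
        cost_per_user=2.0,
        price_per_paid_user=20.0,
    )
    print(f"{users:>7} users: margin ${margin:,.0f}/month")
```

With these numbers each user brings in about $0.40 of revenue against $2.00 of cost, so the loss grows in step with the user count. Growth only helps once revenue per user exceeds the cost of serving that user.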






