Most production AI tasks are routine, and a new class of small models handles them at a fraction of the cost. The frontier models are becoming the exception, not the default.
For two years the story of AI was a race to build the biggest model. The signal now points the other way. Teams shipping real products are quietly routing most of their traffic to small models, the kind that run cheaply and answer in a fraction of a second, and reserving the expensive frontier models for the handful of requests that actually need them.
This is not a downgrade. It is a recognition of what most AI workloads actually look like once they leave the demo and meet real users.
Most requests are boring
If you instrument a deployed AI feature and look at what users actually ask, a pattern shows up fast. The bulk of requests are short, repetitive, and shallow: classify this ticket, extract these fields, rewrite this sentence, decide whether this comment is spam, tag this photo, summarize this paragraph. None of that needs a model that can also pass a law exam or write a working compiler.
A rough split most teams recognize:
- A large majority of calls are routine and well within a small model's reach.
- A smaller slice needs real reasoning, long context, or careful synthesis.
- A thin sliver is genuinely hard and worth paying frontier prices for.
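The three-way split above can be pictured as a router sitting in front of the models. Below is a minimal rule-based sketch. It assumes hypothetical task labels, an arbitrary 20,000-character cutoff for "long context", and a caller-supplied `hard` flag. Production systems may instead use a trained classifier, or escalate when the small model's answer looks unreliable, so treat this as an illustration rather than a recipe.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"        # routine, high-volume work
    MID = "mid"            # real reasoning, long context, or careful synthesis
    FRONTIER = "frontier"  # the thin sliver of genuinely hard requests


# Hypothetical task labels for routine work; a real system would derive
# these from its own instrumented traffic.
ROUTINE_TASKS = {"classify", "extract", "rewrite", "spam_check", "tag", "summarize"}

# Assumed cutoff for "long context"; tune it to the small model's window.
LONG_CONTEXT_CHARS = 20_000


def route(task: str, prompt: str, hard: bool = False) -> Tier:
    """Pick a model tier with simple rules (illustrative only)."""
    if hard:
        # The caller has flagged the request as genuinely hard.
        return Tier.FRONTIER
    if task in ROUTINE_TASKS and len(prompt) <= LONG_CONTEXT_CHARS:
        # Short, routine request: the cheap tier handles it.
        return Tier.SMALL
    # Unfamiliar task or long input: step up one tier.
    return Tier.MID


if __name__ == "__main__":
    print(route("classify", "Printer on floor 3 is jammed again"))
    print(route("summarize", "x" * 50_000))
    print(route("draft_contract", "Merge these two NDAs", hard=True))
```

The rules send known routine tasks to the cheap tier first, so the expensive path is taken only on an explicit signal.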
When the cheap tier can absorb most of the volume, the economics change completely. You are no longer paying premium rates for a thousand requests to find the ten that needed it. You pay premium only for the ten.
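To make the arithmetic concrete, here is a back-of-the-envelope comparison for 1,000 requests, 10 of which need the frontier model. The per-request prices are invented for illustration (a 100x gap between tiers) and are not quoted from any provider. Costs are kept in integer microdollars to avoid floating-point rounding.

```python
# Hypothetical per-request prices in microdollars (1e-6 USD); real prices
# vary by provider, model, and token count.
SMALL_PRICE_UDOLLARS = 200        # assumed cheap-tier cost per request
FRONTIER_PRICE_UDOLLARS = 20_000  # assumed frontier cost per request (100x)


def all_frontier_cost(total_requests: int) -> int:
    """Cost when every request goes to the frontier model."""
    return total_requests * FRONTIER_PRICE_UDOLLARS


def routed_cost(total_requests: int, hard_requests: int) -> int:
    """Cost when only the hard requests go to the frontier model."""
    routine = total_requests - hard_requests
    return routine * SMALL_PRICE_UDOLLARS + hard_requests * FRONTIER_PRICE_UDOLLARS


if __name__ == "__main__":
    total, hard = 1_000, 10
    baseline = all_frontier_cost(total)
    routed = routed_cost(total, hard)
    print(f"all frontier: ${baseline / 1e6:.3f}")
    print(f"routed:       ${routed / 1e6:.3f}")
    print(f"savings:      {baseline / routed:.1f}x")
```

With these assumed prices, sending everything to the frontier model costs $20.000, while routing costs $0.398, about 50 times less. About half of the routed bill is the ten frontier calls ($0.200) and the rest is the 990 cheap calls ($0.198); the ratio shifts with real prices and with how many requests turn out to be hard.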