The Pattern Is Already Visible
The cost of running a frontier-class AI model has dropped dramatically over the past two years. What cost tens of dollars per million tokens in early 2023 now costs a fraction of a dollar for equivalent capability. Model providers are competing on price. Open-source alternatives are closing the gap. Hardware is getting faster.
And yet, enterprise AI budgets are growing faster than they ever have. Organizations that started with one or two AI use cases are now running dozens. The median enterprise AI bill is climbing year over year — not because unit costs went up, but because consumption expanded faster than prices fell.
This is exactly what Jevons described. Efficiency doesn’t reduce demand for a resource people want more of. It unlocks it.
Stanford economist Erik Brynjolfsson has identified three conditions for a Jevons Paradox to take hold: the technology makes workers meaningfully more productive, that productivity translates into lower effective costs, and demand responds elastically — meaning people find many more ways to use it when it gets cheaper.
All three conditions hold for AI compute. And unlike agriculture — where food demand eventually hits a biological ceiling even as farming gets more efficient — AI demand has no obvious saturation point. Every cost reduction opens new applications that weren’t viable before.
Three Forces Compounding the Effect
It’s not just that people use more AI when it gets cheaper. Three specific dynamics are compounding the effect in ways engineering leaders should understand.
Application proliferation. When inference was expensive, organizations were selective. Only the highest-value use cases justified the cost. As prices dropped, the economic threshold fell with them. Summarization, classification, content generation, code review, internal search — use cases that didn’t pencil out a year ago are now in production. Each one adds to the aggregate bill.
Volume intensity per application. Cheaper inference doesn’t just enable new applications — it changes how existing ones behave. A customer support system that handled hundreds of AI interactions per day at premium pricing might handle tens of thousands at current rates. The per-interaction cost fell. The line item grew.
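The arithmetic behind this force is worth making concrete. A minimal sketch, using hypothetical prices and volumes (not figures from any specific provider): the unit cost can fall by more than an order of magnitude while the line item still grows several-fold.

```python
# Hypothetical numbers illustrating volume intensity: per-interaction
# cost falls sharply, but usage grows faster, so the bill rises.

old_price_per_call = 0.50   # USD per AI interaction at premium pricing
new_price_per_call = 0.02   # USD per interaction at current rates

old_volume = 300            # interactions/day when cost gated usage
new_volume = 30_000         # interactions/day once cost stops being a gate

old_daily_bill = old_price_per_call * old_volume   # 150.0 USD
new_daily_bill = new_price_per_call * new_volume   # 600.0 USD

print(f"Unit cost fell {old_price_per_call / new_price_per_call:.0f}x")
print(f"Daily bill grew {new_daily_bill / old_daily_bill:.1f}x")
```

With these assumed numbers, a 25x unit-cost reduction still produces a 4x larger bill: the Jevons dynamic in two multiplications.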
Model complexity escalation. This is the force most engineering teams underestimate. The industry isn’t standing still at simpler, cheaper models. Post-training techniques now use multiples of the compute required for the original training run. Test-time scaling — where models “think longer” on harder problems — can consume orders of magnitude more compute than a simple inference call. Users and product teams naturally gravitate toward more capable (and more expensive) model configurations as they become available. Deloitte projects that inference workloads will represent roughly two-thirds of all AI compute by the end of 2026, up from about one-third in 2023.
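Test-time scaling changes the cost equation in a way that per-token prices hide. A back-of-envelope sketch, with hypothetical prices and token counts: even when the newer model is cheaper per token, a long reasoning trace can make each query far more expensive than the direct call it replaced.

```python
# Hypothetical illustration of model-complexity escalation: a cheaper
# per-token model that "thinks longer" still costs more per query.

price_fast = 1.00 / 1_000_000       # USD/token, assumed simple model
price_reasoning = 0.60 / 1_000_000  # USD/token, newer model, cheaper

tokens_fast = 800          # direct answer, no extended reasoning
tokens_reasoning = 40_000  # long test-time reasoning trace

cost_fast = price_fast * tokens_fast                  # 0.0008 USD
cost_reasoning = price_reasoning * tokens_reasoning   # 0.0240 USD

print(f"Per-query cost grew {cost_reasoning / cost_fast:.0f}x "
      f"despite a 40% lower token price")
```

The per-token price fell 40% in this sketch; the per-query cost rose 30x. Teams that budget on token prices alone will miss this entirely.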
Each of these forces independently drives consumption upward. Together, they compound.
We’re Still Early
The temptation when hearing about Jevons Paradox is to assume it describes a mature phenomenon — something that’s already played out. For AI, the opposite is true. We’re closer to the beginning of the demand curve than the middle.
Consider the infrastructure signals. Global AI data center capital expenditure is expected to reach hundreds of billions of dollars in 2026, with projections heading toward a trillion dollars annually by the end of the decade. Bain estimates 200 gigawatts of new compute capacity will be needed globally by 2030 — enough to push data centers from roughly 2% to nearly 9% of total US electricity consumption. The power grid, which has seen flat demand growth for two decades, is suddenly a bottleneck.
AI’s computational demand has been growing at roughly twice the rate of Moore’s Law over the past decade. Efficiency gains are real, but the frontier of what people want to do with AI keeps advancing faster. Every new capability — multimodal understanding, agentic workflows, real-time reasoning — opens another wave of demand that dwarfs the savings from the last round of optimization.
This is what Jevons saw in the coal industry, scaled up and accelerated. The resource gets used more efficiently. And precisely because of that efficiency, far more of it gets used.
What This Means for Engineering Leaders
Jevons Paradox isn’t an argument against efficiency. More efficient AI is unambiguously good — it enables more value creation, broader access, and better products. The paradox is a warning against a specific planning error: assuming that efficiency gains will translate into lower total spend.
If your AI cost strategy depends on “costs coming down,” you’re planning for a world that Jevons would recognize as a fantasy.
Here’s what holds up instead:
Treat cost as an architecture decision, not a finance review. Roughly 80% of cloud costs are locked in at design time — in the choice of model, the data pipeline architecture, the caching strategy, the retry logic. By the time finance reviews the quarterly bill, the decisions that shaped it are months old and expensive to change. FinOps as an architecture discipline is the only approach that catches cost at the point where it’s still cheap to alter.
Build for elastic demand, not fixed budgets. If Jevons is right — and the data says he is — your AI usage will grow faster than you expect. Architectures that assume stable consumption will break. Design systems with model routing, automatic right-sizing, and cost attribution from the start. The inference tax is already the dominant cost driver — and agentic workloads will multiply it further.
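What "model routing with cost attribution from the start" can look like in practice: a minimal sketch, assuming hypothetical model names, prices, and a `route` helper of my own invention (no real provider API is implied). The idea is to send each request to the cheapest tier adequate for the task and tag every call with an owner for later cost attribution.

```python
# Sketch of cost-aware model routing with per-team cost attribution.
# All model names and prices below are hypothetical.

from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    usd_per_1m_tokens: float

# Assumed capability tiers; real names and prices vary by provider.
TIERS = {
    "simple": ModelTier("small-fast", 0.10),
    "standard": ModelTier("mid-general", 1.00),
    "complex": ModelTier("large-reasoning", 10.00),
}

def route(task_complexity: str, team: str, est_tokens: int) -> dict:
    """Pick the cheapest adequate tier and attribute the cost to a team."""
    tier = TIERS.get(task_complexity, TIERS["standard"])
    est_cost = tier.usd_per_1m_tokens * est_tokens / 1_000_000
    return {"model": tier.name, "team": team, "est_cost_usd": est_cost}

call = route("simple", team="support", est_tokens=2_000)
print(call)  # cheap tier selected, cost tagged to 'support'
```

The design point is that routing and attribution are in the request path from day one, so when volume grows 100x, the visibility grows with it instead of being retrofitted.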
Get visibility into the full cost stack. The token bill you see from your model provider is typically only 20–40% of your actual AI infrastructure cost. The rest — data pipelines, vector databases, monitoring, storage, orchestration — hides across your cloud bill in ways that standard reporting wasn’t designed to reveal. You can’t govern what you can’t measure, and Jevons guarantees that the unmeasured portions will grow.
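A back-of-envelope sketch of what the full stack can look like, using hypothetical line items chosen to be consistent with the 20-40% figure above (these are illustrative proportions, not benchmarks):

```python
# Hypothetical monthly AI cost stack: the token bill is the visible tip;
# the rest is scattered across the cloud bill.

monthly_token_bill = 40_000.0  # USD, what the model provider invoices

# Assumed line items that standard reporting rarely groups together.
hidden = {
    "data pipelines": 25_000.0,
    "vector database": 12_000.0,
    "monitoring/observability": 8_000.0,
    "storage and egress": 10_000.0,
    "orchestration/compute": 15_000.0,
}

total = monthly_token_bill + sum(hidden.values())
visible_share = monthly_token_bill / total

print(f"Total AI infra cost: ${total:,.0f}/month")
print(f"Token bill is only {visible_share:.0%} of it")
```

In this sketch the provider invoice is about 36% of true spend, squarely in the 20-40% range, and every hidden line item is one that grows with consumption.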
Account for environmental compounding. Every watt of compute your AI consumes has a carbon footprint. As demand scales, so does environmental impact — even if each individual inference is more efficient than before. GreenOps practices that align cost optimization with carbon reduction aren’t a nice-to-have. They’re a structural necessity in a world where total consumption keeps rising.
Efficiency Is the Accelerant
Jevons Paradox reframes how engineering leaders should think about AI’s trajectory. Efficiency isn’t the brake on AI spending — it’s the accelerant. Every optimization makes AI accessible to more use cases, more users, and more ambitious architectures. That’s a good thing for the value AI creates. It’s a dangerous thing if your cost model assumes the opposite.
The organizations that build cost-aware architectures now — with visibility, governance, and elastic design — will turn expanding demand into competitive advantage. They’ll capture the upside of cheaper, more capable AI while keeping spend proportional to value.
The ones that wait for efficiency to solve their cost problem will keep waiting. Jevons told us how this ends 160 years ago.
Sources
- NPR Planet Money, “Why the AI world is suddenly obsessed with Jevons paradox” — Original paradox history and Nadella’s invocation after DeepSeek
- NavyaAI, “Tokens got 99.7% cheaper. So why did your AI bill triple?” — Token pricing trajectory and enterprise spending data
- Deloitte, “Why AI’s next phase will likely demand more computational power, not less” — Post-training compute multipliers and inference workload projections
- Bain & Company, “How Can We Meet AI’s Insatiable Demand for Compute Power?” — 200 GW compute capacity forecast and electricity demand projections
- arXiv, “From Efficiency Gains to Rebound Effects: The Problem of Jevons’ Paradox in AI” — Academic analysis of rebound effects in AI energy consumption