What is the AI compute crunch, and why are AI tools hitting usage limits?

In late March, some of the heaviest users of Anthropic’s Claude large language models began posting screenshots of a strange new scarcity: they were burning through five-hour usage limits in as little as 20 minutes. Complaints spread across Reddit, GitHub and X. Anthropic told subscribers that their sessions would hit usage limits more quickly during peak hours. The company also blocked certain third-party tools, including OpenClaw, from piggybacking on its flat-rate subscription plans. A few weeks earlier, Boris Cherny, who runs Claude Code, had said that a default setting governing how long the model thinks before answering had been dialed down.
Users immediately wondered why a paid AI tool was suddenly giving them less. Had the AI boom started to outpace the machines needed to sustain it?
The pressure is not limited to Anthropic. OpenAI has started shutting down Sora, its video-generation platform, as the number of developers using its Codex coding assistant has soared to four million per week. Investors and developers are now talking about a “compute crunch”: the possibility that demand for AI will grow faster than companies can build data centers and power them.
The stakes are higher than developer frustration. If AI becomes the everyday interface for coding, science, learning, medicine, customer service, defense planning, and office work, then access to computing becomes access to economic speed. And those limits are already showing up in the products people use.
The numbers are already high. In a July 2025 white paper, Anthropic projected that the U.S. AI sector would need at least 50 gigawatts of electrical capacity by 2028 to maintain global AI leadership, or roughly the output of 50 large nuclear reactors. The International Energy Agency predicts that global data center electricity consumption is on track to double by 2030.
The computation itself is not new. Every conversation with Claude or GPT runs on the same kind of underlying machine that calculates spreadsheet totals and renders video games: wafers of silicon etched with billions of microscopic switches, organized into specialized processors. Training a frontier model may require tens of thousands of these processors to run for weeks or months. And once the model is trained, it consumes computation again every time someone asks it a question. That demand now strains the entire supply chain. On January 15, Taiwan Semiconductor Manufacturing Company (TSMC), which makes most of the world’s advanced AI chips, announced that it would spend up to $56 billion this year alone to expand its capacity. Its customers keep asking for more.
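A common rule of thumb from the scaling-law literature gives a rough feel for the magnitudes involved. The sketch below uses the approximation that training a dense model costs about 6 × parameters × training tokens floating-point operations and that generating one token costs about 2 × parameters; the model size, token count and per-chip speed are illustrative assumptions, not the undisclosed figures for any Claude or GPT model.

```python
# Back-of-envelope estimate of AI compute, using standard approximations:
# training ~ 6 * N * D FLOPs (N = parameters, D = training tokens),
# inference ~ 2 * N FLOPs per generated token.
# All specific numbers below are illustrative assumptions.

N = 400e9           # assumed model size: 400 billion parameters
D = 10e12           # assumed training data: 10 trillion tokens

training_flops = 6 * N * D                 # ~2.4e25 FLOPs for one training run
inference_flops_per_token = 2 * N          # ~8e11 FLOPs per generated token

# One accelerator sustaining ~1e15 FLOPs/s (1 petaFLOP/s) would need:
gpu_flops_per_second = 1e15
training_gpu_seconds = training_flops / gpu_flops_per_second
print(f"Training: ~{training_gpu_seconds / 86400 / 365:.0f} GPU-years")
print(f"Inference: ~{inference_flops_per_token:.1e} FLOPs per token")
```

Divided across tens of thousands of accelerators, a figure on that scale shrinks to days or weeks of wall-clock time, the same order of magnitude as the training runs described above.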
AI policy expert Lennart Heim is a helpful guide to this machinery. He previously led computing research at the RAND Center on AI, Security and Technology and co-founded Epoch AI, which tracks the resources behind cutting-edge AI models. His beat is where a cloud dashboard becomes a construction project, where digital demand collides with factories, transformers, chips and cables.
[An edited transcript of the telephone interview follows.]
Developers say the rate limits and the blocking of third-party tools look like a compute problem. What does a compute shortage really mean?
When we talk about “compute,” we mean computing power. For AI, training compute scales with the size of the model: larger neural networks need more data, and processing more data requires more computing power. What has been underestimated for years is that the same relationship applies to deployment. Running the model for users (inference) is also extremely compute-intensive, because larger models need more computing power per answer. So if more people use AI, with more tokens and more intensity, you will need more compute. If 10 times more people use AI 10 times more intensively, you will need roughly 100 times more compute.
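A minimal arithmetic sketch of the multiplication Heim describes; the baseline user and token figures are arbitrary illustrations, not real usage data.

```python
# Serving compute grows with (number of users) x (tokens each user consumes).
# If both grow 10x, total demand grows ~100x.

users = 1_000_000            # assumed baseline user count (illustrative)
tokens_per_user = 50_000     # assumed monthly tokens per user (illustrative)

baseline = users * tokens_per_user
heavier_use = (10 * users) * (10 * tokens_per_user)

print(heavier_use / baseline)   # -> 100.0
```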
Why does a flat-rate subscription break down for AI in a way that previous internet services didn’t?
The internet was built on flat-rate subscriptions: you pay $20 a month and get effectively unlimited use. That works when the marginal cost per user is low: a power user of Google Workspace doesn’t cost Google much more than a light user. With AI, that breaks down. Using AI 10 times more costs the provider roughly 10 times more money. Paying per token means you are literally paying for the resources you consume; paying a flat $20 means you often burn through more compute than $20 can buy. That is why we see rate limits mainly on the monthly subscription plans. At some point, you have to impose limits.
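A hedged illustration of why the flat-rate model breaks: revenue is fixed while serving cost grows with tokens. The per-million-token cost below is an assumed placeholder, not any provider’s actual price or cost.

```python
# Why a $20/month flat rate can lose money on heavy users: serving cost scales
# roughly linearly with tokens, while revenue stays fixed. The cost figure is
# an assumed placeholder, not a real price sheet.

subscription_price = 20.00                # dollars per month, flat
assumed_cost_per_million_tokens = 5.00    # dollars (illustrative assumption)

for monthly_tokens in (1e6, 10e6, 100e6):
    cost = monthly_tokens / 1e6 * assumed_cost_per_million_tokens
    margin = subscription_price - cost
    print(f"{monthly_tokens:>12,.0f} tokens -> cost ${cost:7.2f}, margin ${margin:8.2f}")
```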
Beyond rate limits, what levers do these companies have to control how much compute users consume?
They have several levers. If you use ChatGPT, you default to a mode called Auto: you ask a question, and ChatGPT decides which model should answer it. Does it need a really smart model that thinks for a long time, or are you just asking about the weather, in which case a cheaper model can answer immediately? Anthropic has started defaulting to Claude Sonnet, a smaller, less powerful model. It runs cheaply, but you also get less intelligence out of it.
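One way to picture that routing lever is a dispatcher that sends easy questions to a small, cheap model and hard ones to a large, slow one. The toy sketch below is not OpenAI’s or Anthropic’s actual router; the model names and the keyword heuristic are stand-ins for what in practice would be a learned classifier.

```python
# Toy model router: guess how much "thinking" a request needs and pick a model
# tier accordingly. Real routers use learned classifiers, not keyword checks;
# this only illustrates the compute-saving idea.

HARD_HINTS = ("prove", "debug", "refactor", "step by step", "analyze")

def route(prompt: str) -> str:
    looks_hard = len(prompt) > 400 or any(h in prompt.lower() for h in HARD_HINTS)
    # The large reasoning model burns far more compute per answer than the small one.
    return "large-reasoning-model" if looks_hard else "small-fast-model"

print(route("What's the weather like in Berlin?"))          # small-fast-model
print(route("Debug this race condition step by step..."))   # large-reasoning-model
```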
People also don’t use these tools effectively. It’s like asking Albert Einstein how to open a bottle of wine.
OpenAI’s Codex offers more usage for the money than Claude Code does. Is that sustainable, or will we see everyone move to more restrictive plans?
OpenAI is the company with more money and a higher valuation, and it simply has more compute. Building a data center is difficult; building chips is perhaps the hardest thing in the world. Even if OpenAI stopped developing good models tomorrow, they have a ton of compute, which gives them a ton of power.
Anthropic’s problem is that data centers are incredibly expensive (you have to pay NVIDIA a lot of money), and if you overbuild, you spend huge amounts of money on idle capacity. You want to build exactly what you need, but you can’t predict what that will be.
The future will continue to be somewhat compute-constrained, and eventually market mechanisms will resolve it: you raise the price. Right now, I would say, these companies prefer to keep prices low, so that everyone gets to experience the technology, rather than raise them.
Walk me through the supply chain. What are the biggest bottlenecks preventing AI companies from simply building more compute?
Historically, software companies were able to grow 10 to 100 times in a short period of time because they were not bound by physical constraints: this is the philosophy of Silicon Valley. But if we had 100 times more AI users tomorrow, we simply wouldn’t have enough compute to serve them.
This mindset runs straight into the supply chain. TSMC, for example, is a company that goes bankrupt if it builds a fab with no customers, one that isn’t roughly 80 percent utilized. Sam Altman comes in saying he needs 100 times more chips, and they say, “You’re crazy.” That is part of the reason we have a compute shortage.
Same thing further down the chain: once you have the chips, you need electricity, which means you need gas turbines. You go to the gas turbine manufacturers and say, “We need N times more gas turbines,” and they say, “You’re kidding me; this industry has been stable for the past decade.” This is where the digital world meets the physical world. At the moment we don’t have enough memory. Much of what exists will go to AI chips, which means memory prices will rise and your smartphone will cost more next year. Companies want to make more memory but don’t have enough clean-room space. They need special factories, called fabrication plants, or fabs, but only a handful of companies in the world can build them, and they are all full.
Do training new models and serving user queries compete for the same resources?
Companies want to build bigger, better systems so they can raise more money and eventually build AGI, and at the same time they want to make money now. Inference spikes when everyone is awake and using the products; training runs continuously.
A better frame is probably not training versus inference but R&D compute versus serving compute: researchers need compute to test ideas. Recent reporting suggests that the majority of compute, around 60 percent, goes to R&D. That shows how these companies are constantly making trade-offs between building better products and allocating compute to users.



