Nvidia CEO Jensen Huang says the next AI boom belongs to inference

Jensen Huang took the stage at the SAP Center on Monday for his GTC keynote and did what he does best: turn a product keynote into a zoning hearing for the future. The Nvidia ($NVDA) founder and CEO opened the company’s closely watched developer conference by promising a tour of “every layer” of AI, then spent much of it arguing that the company isn’t just selling chips into a hot market. No. The company wants to define the entire physical system of the AI economy: computing, networking, storage, software, models, factories and – because subtlety is clearly out of season – maybe even orbital data centers.
The keynote blasted announcements in all directions, but the real message was more specific than the confetti cannon suggested. Huang wanted investors, customers, and competitors to hear four things clearly: demand for AI continues to grow fast enough to justify indecent spending; inference is now at the center of the battlefield; agents are expected to step out of chatbots and into the daily machinery of office work; and the next gold rush after digital AI could be physical AI, in which robots, autonomous systems and industrial software burn through even more data and infrastructure.
Well, that’s a big number
Huang’s biggest flex was numerical. He marked CUDA’s 20th anniversary, called it the engine of accelerated computing, said computing demand had increased “1 million times over the last few years,” then upped the ante by saying he now sees at least $1 trillion in revenue opportunities between 2025 and 2027. Tellingly, Nvidia talks about that figure like it’s the down payment.
That number also did some quiet cleanup work. Nvidia has spent months answering the usual questions that arise whenever a company becomes the lead cashier of a capital spending spree: How long can it last, what happens when hyperscalers get cautious about costs, and how much of the next phase trickles down to custom chips and cheaper alternatives?
Huang’s response was to broaden the perspective. The token, as GTC’s opening video put it, is the cornerstone of the new era of AI. Huang’s point was that the activity around those tokens would not be limited to training giant models and admiring them in benchmarks. It moves into production, where the meter never stops running.
Inference Takes Center Stage
One of the sharpest lines in the keynote was perhaps also the simplest: “The inflection of inference has arrived.” Huang divided inference into two stages – prefill and decode – and presented a system in which Nvidia’s Vera Rubin chips handle the prefill work while Groq-derived silicon tackles decode, the step that produces the answer. This matters because inference is where Nvidia’s next war gets complicated. Training made the company rich. Serving hundreds of millions of users in real time is where customers start asking rude questions about cost, latency, and whether they really need the same silicon at each stage.
Huang’s response was therefore classic Nvidia: don’t defend the GPU in isolation, swallow the whole stack. He described Vera Rubin as “a generational leap” built around seven chips and five rack-scale systems, with Nvidia claiming the platform can train a large mixture-of-experts model with a quarter of the GPUs Blackwell needs and deliver up to 10x the inference throughput per watt at a tenth of the cost per token. He also used the speech to look past Rubin to the future Feynman platform, because in Nvidia-land the next generation is already in the wings before the current one has finished its bow.
This was the deeper message from San Jose. Huang wasn’t so much pitching a faster game as more dependency. Nvidia announced a Vera Rubin DSX AI factory reference design, DSX simulation tools for planning AI factories before they are built, and a broader menu of storage, networking, and system components intended to operate as a vertically integrated machine. The message was hard to ignore: stop thinking about servers, start thinking about campuses. Or, if you are Nvidia, start sending invoices like a utility.
Agents leave the demo stage
If the hardware talk was about keeping Nvidia at the center of inference, the software talk was about making sure enterprise AI did not become someone else’s party. Huang said that “Claude Code and OpenClaw have triggered the agent inflection point,” adding that “employees will be supercharged by the specialized, custom frontier agent teams they will deploy and manage.”
Nvidia paired that rhetoric with its Agent Toolkit, its OpenShell runtime, and its AI-Q model – software it says can help companies build autonomous agents with policy guardrails and, in AI-Q’s case, cut query costs by more than 50% through a hybrid mix of frontier models and Nvidia’s own open models.
Nestled in all that openness was a strategic hedge. Nvidia unveiled the Nemotron Coalition with Black Forest Labs, Cursor, LangChain, Mistral, Perplexity, Reflection AI, Sarvam and Thinking Machines Lab, with the first project intended to support the upcoming Nemotron 4 family of models. Read the subtext and it’s pretty clear: Nvidia doesn’t want the future of AI software neatly divided between a few giant vendors of closed models and a stack of commodity hardware underneath. It also wants a hand in the open model layer – the layer that determines who can build, tune and own AI outside the walls of the biggest labs.
The robotics pitch gets bigger
Huang stretched Nvidia’s story beyond digital assistants for a while, and GTC pushed the theme even harder. Nvidia announced a plan for a physical AI data factory with Microsoft ($MSFT) Azure and Nebius, intended to automate how training data is generated, augmented and evaluated for robotics, vision AI agents and autonomous vehicles. The pitch is simple: real-world data is scarce, edge cases are tedious to capture, and synthetic data and simulation can turn compute into the raw material these systems need.
Huang also previewed GR00T N2, a next-generation robot foundation model built on DreamZero research that the company says more than doubles the success rate of leading VLA models on new tasks in new environments. This section of the keynote may end up aging best. Chatbots excited Wall Street; physical AI is what could keep the infrastructure frenzy going for years, because robots, industrial systems, and autonomous machines don’t just need models: they need endless amounts of training data, simulation, networking, sensors, and edge computing. Huang even took the story a step further and said Nvidia is going into space, with future Vera Rubin-based systems aimed at orbital data centers and autonomous space operations. Sure, that sounds a bit like a man who has spotted a few unclaimed squares left on the bingo card. But it also sounds like a company determined to make “AI infrastructure” mean almost every expensive machine in sight.
By the time Huang finished, the keynote read less like a launch calendar than a map of empire. Yes, there was DLSS 5 for graphics, new industrial software tie-ins, cutting-edge telecom partnerships, and an avalanche of developer plumbing. But the lasting takeaway was simpler and much more important: Nvidia wants AI to stop being understood as a software category and start being treated as a utility-scale infrastructure project, with Nvidia hardware and software baked into every layer.
It’s a very Jensen Huang message – meticulously staged and only slightly modest. What should unsettle his competitors is that, for now at least, he still has plenty of customers willing to rely on him.


