Microsoft just killed triangle-by-triangle ray tracing

It’s hard to believe that the whole real-time ray-tracing thing started all the way back in 2018, with the release of the NVIDIA GeForce 20-series of GPUs. Suddenly, the lighting method used in high-end pre-rendered Hollywood movies could be used in live video games. Well, at least that was the promise.
In practice, ray-tracing hasn’t made the impact it could have. Part of the issue is that the consoles in this generation launched with barely any ray-tracing capability at all, but even on PC it’s really only the higher-end systems that could make decent use of it. That might be changing, because as part of a series of developer announcements, Microsoft is laying out the roadmap for a fundamental shift in this technology, which promises to make it much more efficient and widespread in the future.
Clustered geometry
The end of triangle-by-triangle ray tracing
In case you need a refresher, here’s an excellent explainer by NVIDIA on what ray-tracing actually is.
The irony of ray-tracing is that tracing the actual rays isn’t always the most taxing part of the process. Before you can do the lighting pass, an enormous amount of data about the scene has to be prepared to optimize the ray-tracing process. That data lives in what’s called an acceleration structure, which is used to tell whether a ray has intersected an object in the scene or not.
How it’s worked so far is that the acceleration structure is built from the triangles in the geometry of the scene. That works just fine at small scale with just a few triangles, but modern graphics have enormous amounts of geometry.
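To make the idea of an acceleration structure a bit more concrete, here is a minimal sketch of the basic trick behind it: wrap groups of triangles in bounding boxes, and only test the triangles inside boxes the ray actually crosses. This is a toy illustration in Python, not DirectX code. The names `Box`, `ray_hits_box`, and `boxes_to_test` are made up for this example, and a real acceleration structure nests boxes inside boxes rather than keeping a flat list.

```python
from dataclasses import dataclass


@dataclass
class Box:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner


def ray_hits_box(origin, direction, box):
    """Slab test: True if the ray origin + t * direction (t >= 0) crosses the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box.lo, box.hi):
        if d == 0.0:
            # Ray runs parallel to this pair of planes, so it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def boxes_to_test(origin, direction, boxes):
    """Indices of the boxes whose triangles still need individual ray tests."""
    return [i for i, b in enumerate(boxes) if ray_hits_box(origin, direction, b)]
```

Every box the ray misses is a whole batch of triangles that never needs to be checked, which is where the speed-up comes from. The catch is that those boxes have to be rebuilt whenever the geometry inside them changes.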
Even worse, that geometry is often dynamic in nature. For example, with mesh shaders you can generate or manage geometry more dynamically, but that can impact the acceleration structure. Modern video games also make use of asset streaming, which means new geometry is constantly being streamed in and has to be added to that data structure, while geometry that has left memory has to be removed from it.
This can result in a situation where your GPU spends a big part of its time on a ray-traced scene just rebuilding the whole acceleration structure. This is why Microsoft has come up with a new approach to building that data structure it calls “Clustered Geometry.”
Instead of treating every triangle as an individual unit, the new model groups nearby triangles into compact clusters, which are then used as building blocks for acceleration structures. That sounds fancy, but what does it actually mean for us? Here are the key takeaways:
- Acceleration structures can be built faster because they’re assembled from pre-grouped chunks
- Memory usage improves because clusters can be reused across multiple structures
- Work can be spread across frames instead of done all at once
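As a rough illustration of the grouping step, the sketch below buckets triangles into clusters by position and then computes one bounding box per cluster. It is a simplification under stated assumptions: the grid-cell grouping, the `cell_size` parameter, and the function names are invented for this example and are not how DirectX defines or builds clusters.

```python
from collections import defaultdict


def centroid(tri):
    """Average of a triangle's three (x, y, z) vertices."""
    return tuple(sum(v[i] for v in tri) / 3.0 for i in range(3))


def build_clusters(triangles, cell_size=1.0):
    """Group triangles whose centroids land in the same grid cell (assumed scheme)."""
    cells = defaultdict(list)
    for tri in triangles:
        key = tuple(int(c // cell_size) for c in centroid(tri))
        cells[key].append(tri)
    return list(cells.values())


def cluster_bounds(cluster):
    """One bounding box (lo, hi) covering every vertex in the cluster."""
    points = [v for tri in cluster for v in tri]
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return lo, hi
```

The structure built on top then only has to deal with a handful of cluster boxes instead of every individual triangle, and a cluster that hasn't changed can be reused as-is.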
This concept isn’t new in computer graphics as a whole. It’s already been applied in technologies like Unreal Engine’s Nanite virtualized geometry, but this brings it into the mainstream.
Partitioned TLAS (PTLAS): A new hierarchy level
Ray tracing relies on a hierarchy of data structures to keep things fast. At a high level:
- BLAS = individual objects (Bottom Level Acceleration Structures)
- TLAS = the whole scene (Top Level Acceleration Structures)
Again, the TLAS can become a bottleneck in large dynamic scenes. The idea behind Partitioned TLAS is straightforward: instead of treating the top-level structure as one big object, it’s split into smaller, more manageable pieces.
It’s been a long time coming, given how games have changed over the years. PTLAS helps with geometry streaming from disk, scenes with lots of animation, and dynamic level-of-detail systems. If you’ve been paying attention to current and last generation games, all of these should sound familiar, because modern games are packed with these scenarios.
With PTLAS, game engines only need to update the parts of the scene that have changed. If a full rebuild of the data structures isn’t strictly necessary, you skip all that computational effort. Now, updates and rebuilds are modular.
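The "only update what changed" idea can be sketched with a simple dirty flag per partition. This is a toy model, assuming a fixed number of partitions chosen up front. The `PartitionedScene` class and its methods are hypothetical names for illustration and don't correspond to any actual DirectX interface.

```python
class PartitionedScene:
    """Toy top-level structure split into independently rebuilt partitions."""

    def __init__(self, partition_count):
        self.instances = [dict() for _ in range(partition_count)]
        self.built = [None] * partition_count  # stand-in for built structure data
        self.dirty = set(range(partition_count))  # nothing is built yet

    def set_instance(self, partition, name, bounds):
        """Add or move an object; only its partition is marked for rebuild."""
        self.instances[partition][name] = bounds
        self.dirty.add(partition)

    def remove_instance(self, partition, name):
        """Drop an object, for example when its geometry is streamed out."""
        if self.instances[partition].pop(name, None) is not None:
            self.dirty.add(partition)

    def update(self):
        """Rebuild only the dirty partitions and return their indices."""
        rebuilt = sorted(self.dirty)
        for p in rebuilt:
            self.built[p] = sorted(self.instances[p].items())
        self.dirty.clear()
        return rebuilt
```

In this model, moving a single crate in one corner of the level touches one partition, and a frame where nothing moved rebuilds nothing at all.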
GPU-driven acceleration structures
Your CPU is getting cut out
I think many people are surprised to learn that their CPUs play an important part in the ray-tracing process. Up to this point, your CPU has been responsible for coordinating much of it.
So the frame rate hit you take when turning on ray-tracing is partly CPU-limited too. DXR has always allowed acceleration structures to be built on the GPU, but the new direction pushes further toward letting the GPU handle more of the process autonomously. With less work on the CPU, overall performance, especially minimum performance, should get a good kick in the pants.
Clustered geometry + PTLAS = scalable ray tracing
Less of a cliff, more of a mountain
When you combine these two technologies, you get way more scalability with ray tracing. That means more systems can handle it, and the performance requirement increases more gracefully with higher resolution or more scene detail.
It’s going to be a while before this is implemented in actual shipping games, but the future of ray tracing seems bright indeed, even if the actual version number of DirectX appears to be frozen in time.

