In 2022, Nvidia introduced hardware-level Shader Execution Reordering (SER) with its RTX 40-series GPUs in order to make ray tracing less taxing. Now, it's officially part of DXR 1.2, which is included in the new DirectX Agility SDK (version 1.619). The announcement blog isn't a casual read because of all the technical jargon, so let's break down what this actually means and how it improves performance.
SER allows the GPU to find patterns across rays, grouping together the ones that will run similar shaders so they execute better in parallel. It works hand-in-hand with Opacity Micromaps (OMMs), the other highlight feature included in DXR 1.2, which save processing power by telling the GPU when it doesn't need to run a shader at all, such as when a ray hits a part of a surface that is already known to be fully transparent or fully opaque.
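To make the grouping idea concrete, here is a toy model rather than anything from the DXR API. It assumes a GPU runs rays in fixed-size batches ("waves") and that a wave has to execute once for every distinct shader its rays need. The wave size of 32, the shader IDs, and the function names are all illustrative assumptions.

```python
from typing import List

WAVE_SIZE = 32  # assumed lanes per wave; real hardware varies


def shader_passes(shader_ids: List[int], wave_size: int = WAVE_SIZE) -> int:
    """Count shader passes: each wave runs once per distinct shader among its rays."""
    total = 0
    for start in range(0, len(shader_ids), wave_size):
        wave = shader_ids[start:start + wave_size]
        total += len(set(wave))
    return total


def reorder_by_shader(shader_ids: List[int]) -> List[int]:
    """Stand-in for reordering: put rays that need the same shader next to each other."""
    return sorted(shader_ids)


# 64 rays that alternate between 4 hypothetical hit shaders
rays = [i % 4 for i in range(64)]

passes_before = shader_passes(rays)
passes_after = shader_passes(reorder_by_shader(rays))
print(passes_before, passes_after)  # prints "8 4"
```

In this sketch, the unsorted rays force each of the two waves to run all four shaders, while the sorted rays leave each wave with only two. Real hardware reorders using hints from the application instead of a full sort, but the payoff is the same kind of reduced divergence.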
Your graphics card only shades the visible pixels because the Opacity Micromaps give it precise hints about which parts of the scene are opaque (and which aren't). So, SER begins by grouping similar ray-traced shaders together, and then the OMMs let it skip the "invisible" ones entirely. Reducing unnecessary shader work lets games maintain higher frame rates, especially in complex scenes.
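The skipping behavior can be sketched the same way. This is a simplified model, not Microsoft's implementation: it assumes each ray hit lands on a micro-triangle pre-marked as opaque, transparent, or unknown, and that only "unknown" hits need to run an alpha-test shader. The three-state enum, the 0.5 alpha cutoff, and the function names are assumptions for illustration.

```python
from enum import Enum
from typing import List, Tuple


class MicroState(Enum):
    TRANSPARENT = 0
    OPAQUE = 1
    UNKNOWN = 2


def alpha_test_shader(alpha: float) -> bool:
    """Stand-in for a shader that decides whether a hit counts (assumed 0.5 cutoff)."""
    return alpha >= 0.5


def trace_with_omm(hits: List[Tuple[MicroState, float]]) -> Tuple[int, int]:
    """Return (accepted hits, shader invocations) when the micromap is consulted first."""
    accepted = 0
    invocations = 0
    for state, alpha in hits:
        if state is MicroState.OPAQUE:
            accepted += 1      # known opaque: accept without running the shader
        elif state is MicroState.TRANSPARENT:
            continue           # known transparent: ignore without running the shader
        else:
            invocations += 1   # unknown: fall back to the shader
            if alpha_test_shader(alpha):
                accepted += 1
    return accepted, invocations


def trace_without_omm(hits: List[Tuple[MicroState, float]]) -> Tuple[int, int]:
    """Same result, but the shader runs on every hit."""
    accepted = sum(1 for _, alpha in hits if alpha_test_shader(alpha))
    return accepted, len(hits)


hits = [
    (MicroState.OPAQUE, 1.0),
    (MicroState.TRANSPARENT, 0.0),
    (MicroState.UNKNOWN, 0.7),
    (MicroState.UNKNOWN, 0.2),
    (MicroState.OPAQUE, 1.0),
]
print(trace_with_omm(hits))     # prints "(3, 2)"
print(trace_without_omm(hits))  # prints "(3, 5)"
```

Both paths accept the same three hits, but the micromap version runs the shader twice instead of five times. That gap is the saved work the article describes.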
In a separate blog post, Microsoft shows its own demo for SER, where a scene is rendered with and without it. Using SER, Nvidia GPUs saw a 40% boost in performance, while some Intel Arc B-series GPUs got up to 90% more FPS. Now that the feature is being standardized, we could potentially see Intel and AMD implement their own hardware-level SER in next-gen GPUs.
The last noteworthy inclusion in this SDK update is Shader Model 6.9, which is what actually enables developers to interface with both OMMs and SER. This will make game developers very happy, but it's ultimately up to them to implement these features before players ever see an upgrade. To be clear, these features were announced last year but just came out of preview today.
There are a lot more details in the blog that we didn't go over, such as support for long vectors, 16-bit float operations, and general changes to streamline hardware overhead. Some of them target the poorly optimized games we see today, which struggle with anything less than 12 GB of VRAM. It's all early, programmer-focused patchwork for now, but it could translate to real-world improvements soon.
