In 3DMark Speed Way, we have improved our existing illumination pipeline by removing its reliance on precomputed illumination data (such as baked lightmaps and probes), making it fully dynamic. To highlight the dynamic illumination, we limit other effects and keep the rest of the rendering pipeline simple.

CPU side

Since Speed Way does not include a CPU test, the CPU's main role is to build command lists for the GPU to execute. The task system allows highly parallel execution: rendering work, including scene updates, visibility evaluation, and command list building, runs on multiple CPU threads, one per available logical CPU core. This shortens CPU rendering time and reduces the chance of the CPU becoming a bottleneck.
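As a rough illustration of the one-thread-per-logical-core model, the following Python sketch splits a frame's visible draws into chunks and records one command list per worker. The `build_command_list` and `record_frame` names, the string commands, and the chunking scheme are hypothetical. Python's global interpreter lock also means this shows how the work is split, not a real parallel speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def build_command_list(draws):
    """Record one hypothetical command list for a chunk of visible draws."""
    commands = []
    for mesh_id, material_id in draws:
        commands.append(f"set_material {material_id}")
        commands.append(f"draw {mesh_id}")
    return commands


def record_frame(visible_draws, worker_count=None):
    """Split visible draws across one worker per logical core and return
    the per-worker command lists in submission order."""
    worker_count = worker_count or os.cpu_count() or 1
    chunk = max(1, -(-len(visible_draws) // worker_count))  # ceiling division
    chunks = [visible_draws[i:i + chunk] for i in range(0, len(visible_draws), chunk)]
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # map() preserves input order, so the submission order stays deterministic.
        return list(pool.map(build_command_list, chunks))
```

Keeping the per-worker lists in chunk order lets them be submitted in a deterministic order even though they are recorded concurrently.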

GPU side

The GPU side of the rendering is composed of multiple rasterization, compute, and ray tracing passes. Some passes run on an asynchronous compute queue.

What are meshes and mesh shaders?

In 3D graphics, a mesh is the set of vertices, edges, and faces that define the shape of an object. In the traditional graphics pipeline, all of a mesh's geometry data must pass through the vertex processing stages before any further steps can be taken.

This is wasteful because some geometry is almost always on the back side of objects, outside the camera's field of view, or hidden behind other objects. Mesh shaders make it possible to skip certain processing stages for this non-visible geometry efficiently.

Mesh shaders replace the old approach with a new programming model that allows more efficient and flexible processing of the geometry data in a mesh. This is achieved by processing chunks of the mesh, called meshlets, in parallel. Amplification shaders, which run before the mesh shaders in the same pipeline, can efficiently determine which meshlets are visible before they are processed.

The end result is a far more efficient way of culling meshlets and shading than traditional methods.
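The kind of per-meshlet test an amplification shader performs can be sketched in Python as a bounding-sphere frustum check. This is a simplified CPU model under stated assumptions: each meshlet carries a world-space bounding sphere, frustum planes are given as inward-facing `(normal, d)` pairs, and the function names are hypothetical.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Meshlet:
    center: Vec3   # bounding-sphere center in world space
    radius: float  # bounding-sphere radius


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sphere_in_frustum(center, radius, planes):
    # Each plane is (normal, d) with the normal pointing into the frustum;
    # the sphere is culled if it lies entirely behind any plane.
    return all(dot(n, center) + d >= -radius for n, d in planes)


def amplification_pass(meshlets, planes):
    """Return the indices of meshlets that survive culling. In an amplification
    shader these would go into the payload, and one mesh shader group would be
    dispatched per surviving meshlet."""
    return [i for i, m in enumerate(meshlets) if sphere_in_frustum(m.center, m.radius, planes)]
```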

Continuous mesh LOD and culling

G-buffer and shadow map rendering use amplification and mesh shaders to select levels of detail (LODs) per mesh cluster, similar to the 3DMark Mesh Shader feature test. Most meshes are hand-optimized and have only one constant LOD, but highly detailed geometry benefits from this LOD selection.

Like the 3DMark Mesh Shader feature test, Speed Way uses cluster culling. Simple proxy geometry is drawn first, and the bounding spheres of clusters of complex geometry are tested against it to find the clusters hidden behind it. Vertex processing is skipped for clusters that are behind the proxy geometry in the current frame.
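A minimal sketch of the occlusion test against proxy geometry, assuming a coarse tiled depth buffer that stores the farthest proxy depth per tile (a common hierarchical-Z style approach, not necessarily the one Speed Way uses). To keep it short, the camera is assumed to be orthographic, so screen coordinates equal world x/y and depth equals z.

```python
import math


def sphere_occluded(center, radius, depth_tiles, tile_size):
    """Return True if a cluster's bounding sphere lies entirely behind proxy geometry.

    depth_tiles[row][col] holds the farthest proxy depth in that tile, or
    math.inf where the tile is not fully covered by proxy geometry."""
    x, y, z = center
    nearest = z - radius  # closest depth any point of the sphere can reach
    rows, cols = len(depth_tiles), len(depth_tiles[0])
    c0 = max(0, math.floor((x - radius) / tile_size))
    c1 = min(cols - 1, math.floor((x + radius) / tile_size))
    r0 = max(0, math.floor((y - radius) / tile_size))
    r1 = min(rows - 1, math.floor((y + radius) / tile_size))
    if c0 > c1 or r0 > r1:
        return False  # off screen: leave this case to frustum culling
    # Occluded only if the sphere is behind the proxy in every tile it covers.
    return all(
        nearest > depth_tiles[r][c]
        for r in range(r0, r1 + 1)
        for c in range(c0, c1 + 1)
    )
```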

Both culling and LOD selection are applied per cluster. A mesh is an object with a single material, divided into clusters of configurable size. Each cluster contains more vertices and triangles than a single mesh shader group can output (a meshlet). This allows dynamic LOD selection per cluster, because a variable number of mesh shader groups can be generated for a single cluster: more detailed LODs cause the amplification shader to dispatch more mesh shader groups.
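The relationship between LOD selection and the number of dispatched mesh shader groups can be sketched as follows. The projected-size heuristic, the `full_detail_px` parameter, and the 128-triangle group size are assumptions for illustration. The text does not specify how Speed Way chooses an LOD.

```python
import math

MAX_TRIANGLES_PER_GROUP = 128  # hypothetical meshlet size; D3D12 allows up to 256 primitives


def select_lod(radius, distance, screen_height_px, lod_count, full_detail_px=512.0):
    """Pick an LOD index (0 = most detailed) from the cluster's projected size.
    Each halving of the projected size below full_detail_px drops one LOD
    (an assumed heuristic)."""
    projected_px = radius / max(distance, 1e-6) * screen_height_px
    if projected_px >= full_detail_px:
        return 0
    lod = int(math.log2(full_detail_px / projected_px))
    return min(lod, lod_count - 1)


def mesh_groups_for_cluster(lod_triangle_counts, radius, distance, screen_height_px):
    """Return (lod, group_count); the amplification shader would dispatch
    group_count mesh shader groups for this cluster."""
    lod = select_lod(radius, distance, screen_height_px, len(lod_triangle_counts))
    groups = math.ceil(lod_triangle_counts[lod] / MAX_TRIANGLES_PER_GROUP)
    return lod, groups
```

For a cluster with LOD triangle counts of 2048, 1024, 512, and 256, moving it from 2 to 8 units away on a 1080-pixel-tall screen drops it from 16 mesh shader groups to 8.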

Direct illumination

Direct illumination uses a basic deferred renderer. Lights are culled with world-space clustering because, unlike a screen-space structure, a world-space one also covers points outside the camera view and can therefore be reused by ray-traced effects. Shading is executed in a compute shader.
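World-space light clustering can be sketched with a sparse grid keyed by integer cell coordinates. Because a lookup accepts any world-space position, the same structure serves G-buffer pixels and ray hit points alike. The dictionary-based grid, the point-light bounds, and the function names are assumptions. A GPU implementation would more likely use fixed-size buffers.

```python
import math
from collections import defaultdict


def build_light_clusters(lights, cell_size):
    """Assign point lights (center, radius) to a sparse world-space grid.
    Each cell lists the lights whose bounding box overlaps it."""
    clusters = defaultdict(list)
    for index, (center, radius) in enumerate(lights):
        lo = [math.floor((c - radius) / cell_size) for c in center]
        hi = [math.floor((c + radius) / cell_size) for c in center]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    clusters[(i, j, k)].append(index)
    return clusters


def lights_at(clusters, position, cell_size):
    """Candidate lights for any world-space point, on screen or at a ray hit."""
    key = tuple(math.floor(p / cell_size) for p in position)
    return clusters.get(key, [])  # .get() does not insert empty cells
```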

What is ray tracing?

Ray tracing is a technique for simulating how light and shadows behave in a rendered scene. Compared to traditional rasterization, ray tracing produces far more realistic lighting and shadow effects. While the technology is not new, only in recent years have consumer GPUs become capable of running ray-traced games in real time at frame rates acceptable to gamers.

Real-time ray tracing brings a new level of realism to in-game graphics. With DirectX Raytracing, games can render accurate real-time reflections of dynamic objects, including objects outside the main camera view. Reflections are not just for mirrors and chrome: many other types of surfaces in games look more realistic with ray tracing.

Ray tracing passes and hit shaders

In the passes that use the ray tracing pipeline, illumination is computed in hit shaders. The glass reflections instead use inline ray queries in the pixel shader, so the hit shaders do not affect that pass.

Indirect specular illumination

Indirect specular illumination (reflections) is approximated using the diffuse lightmap's spherical harmonics for the roughest surfaces and ray tracing for all other surfaces. A spatiotemporal denoiser is applied to the noisy ray-traced reflections. For the rougher ray-traced reflections, a separate hit shader approximates the lighting of the reflected surface using the direct illumination lightmap.
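The roughness-based routing described above can be expressed as a small classifier. The three paths follow the text, but the two threshold values and all names are hypothetical, since the actual cutoffs are not given.

```python
from enum import Enum


class ReflectionPath(Enum):
    RAY_TRACED_FULL = "ray traced, fully lit hit shader"
    RAY_TRACED_LIGHTMAP = "ray traced, hit shader reads the direct illumination lightmap"
    SH_LIGHTMAP = "diffuse lightmap spherical harmonics, no ray"


# Hypothetical thresholds; the real values are not stated.
LIGHTMAP_HIT_ROUGHNESS = 0.3
SH_ONLY_ROUGHNESS = 0.7


def reflection_path(roughness):
    """Choose how a pixel's reflection is computed from its surface roughness."""
    if roughness >= SH_ONLY_ROUGHNESS:
        return ReflectionPath.SH_LIGHTMAP
    if roughness >= LIGHTMAP_HIT_ROUGHNESS:
        return ReflectionPath.RAY_TRACED_LIGHTMAP
    return ReflectionPath.RAY_TRACED_FULL
```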

To produce more accurate motion vectors for reflections, surface curvature is also stored in the G-buffer. A relatively accurate estimate of the curvature within each triangle is precomputed and stored as a per-triangle value for each mesh.
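One common way to precompute a per-triangle curvature value is to average the normal curvature along the triangle's three edges using its vertex normals. This is shown here as an assumption, not as Speed Way's actual method. On a sphere of radius R with exact normals, this estimate returns exactly 1/R.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangle_curvature(positions, normals):
    """Estimate the average normal curvature over one triangle from its vertex
    positions and unit vertex normals, averaging along each edge:
    k_ij = dot(n_j - n_i, p_j - p_i) / |p_j - p_i|^2."""
    total = 0.0
    for i, j in ((0, 1), (1, 2), (2, 0)):
        edge = sub(positions[j], positions[i])
        total += dot(sub(normals[j], normals[i]), edge) / dot(edge, edge)
    return total / 3.0
```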

Direct illumination lightmap

A low-resolution lightmap of diffuse direct lighting is computed every frame. This allows direct light to be evaluated efficiently for diffuse global illumination and, in some cases, for reflections as well.
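Evaluating the direct illumination lightmap amounts to computing diffuse irradiance per texel. This sketch assumes point lights with inverse-square falloff. It takes a `visible` callback in place of a shadow ray or shadow map lookup. The data layout and names are illustrative only.

```python
import math


def direct_lightmap(texels, lights, visible):
    """Compute diffuse direct irradiance for each lightmap texel.

    texels:  list of (position, unit normal) in world space
    lights:  list of (position, intensity) point lights
    visible: callable(texel_position, light_position) -> bool, an assumed
             stand-in for a shadow ray or shadow map lookup."""
    result = []
    for pos, normal in texels:
        irradiance = 0.0
        for light_pos, intensity in lights:
            to_light = [l - p for l, p in zip(light_pos, pos)]
            dist2 = sum(c * c for c in to_light)
            cos_theta = sum(n * c for n, c in zip(normal, to_light)) / math.sqrt(dist2)
            if cos_theta > 0.0 and visible(pos, light_pos):
                irradiance += intensity * cos_theta / dist2  # Lambert, inverse-square falloff
        result.append(irradiance)
    return result
```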

Indirect diffuse illumination

Indirect diffuse illumination is implemented using dynamically updated lightmaps: each frame, one indirect bounce is evaluated in the lightmap's texture space using path tracing. The result is denoised in screen space and heavily filtered over time in lightmap space. The lightmap is encoded with spherical harmonics to preserve directional information.
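The spherical harmonics encoding and lightmap-space temporal filtering can be sketched with order-1 (L1, four-coefficient) SH and an exponential moving average. The SH order, the uniform sphere sampling, and the `alpha` blend factor are assumptions. The text states only that the lightmap is SH-encoded and heavily filtered over time.

```python
import math

Y00 = 0.5 * math.sqrt(1.0 / math.pi)   # constant band
Y1 = math.sqrt(3.0 / (4.0 * math.pi))  # linear band scale


def sh_basis(d):
    x, y, z = d
    return (Y00, Y1 * y, Y1 * z, Y1 * x)


def project_samples(samples):
    """Monte Carlo projection of (unit direction, radiance) samples onto L1 SH,
    assuming the directions are drawn uniformly over the sphere (pdf = 1 / (4*pi))."""
    coeffs = [0.0] * 4
    weight = 4.0 * math.pi / len(samples)
    for direction, radiance in samples:
        for i, basis in enumerate(sh_basis(direction)):
            coeffs[i] += weight * radiance * basis
    return coeffs


def temporal_blend(history, current, alpha=0.05):
    """Exponential moving average in lightmap space; a small alpha filters heavily."""
    return [h + alpha * (c - h) for h, c in zip(history, current)]


def evaluate(coeffs, d):
    """Reconstruct the radiance arriving from direction d."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(d)))
```

Radiance arriving mostly from one side keeps a larger value in that direction after projection, which is the directional information a plain scalar lightmap would lose.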

The direct illumination lightmap is used to look up the direct illumination of the surface reached by each evaluated bounce.

Transparent geometry draw

For rendering transparent geometry, we use a variant of the order-independent transparency technique called Order-Independent Transparency Approximation with Raster Order Views. In short, the transparent geometry is first rendered to approximate a per-pixel visibility function (accumulated transparency), with each fragment merged into a compressed representation of that function. The transparent geometry is then rendered again, illuminated, and additively blended, weighted by the visibility function.
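The two passes can be modeled per pixel in Python, loosely following adaptive order-independent transparency. The first pass merges fragments into a fixed number of (depth, transmittance) nodes. When the budget is exceeded, it collapses the adjacent pair whose merge changes the area under the visibility curve least. The second pass weights each fragment by the visibility in front of it. The node budget and function names are hypothetical. The raster order view synchronization that makes the per-pixel read-modify-write safe on the GPU is not modeled.

```python
from bisect import bisect_left

MAX_NODES = 4  # hypothetical per-pixel node budget


def insert_fragment(nodes, depth, alpha):
    """First pass: merge one fragment into a pixel's compressed visibility function,
    a depth-sorted list of [depth, transmittance] nodes, where each transmittance
    is the light remaining behind that node."""
    i = bisect_left([n[0] for n in nodes], depth)
    before = nodes[i - 1][1] if i > 0 else 1.0
    nodes.insert(i, [depth, before * (1.0 - alpha)])
    for node in nodes[i + 1:]:
        node[1] *= 1.0 - alpha  # everything behind is attenuated by the new fragment
    if len(nodes) > MAX_NODES:
        def merge_cost(k):
            # Area change from moving node k's transmittance step forward to node k-1.
            return (nodes[k][0] - nodes[k - 1][0]) * (nodes[k - 1][1] - nodes[k][1])

        k = min(range(1, len(nodes)), key=merge_cost)
        nodes[k - 1][1] = nodes[k][1]
        del nodes[k]


def visibility(nodes, depth):
    """Transmittance in front of a fragment at the given depth."""
    result = 1.0
    for node_depth, transmittance in nodes:
        if node_depth >= depth:
            break
        result = transmittance
    return result


def resolve(fragments):
    """fragments: (depth, alpha, color) in arbitrary order; returns the blended color."""
    nodes = []
    for depth, alpha, _ in fragments:
        insert_fragment(nodes, depth, alpha)
    # Second pass: re-render, weight each fragment by visibility, blend additively.
    return sum(color * alpha * visibility(nodes, depth) for depth, alpha, color in fragments)
```

Merging always keeps the transmittance of the later node, so the total transmittance behind all fragments is preserved even after compression. Only the placement of the intermediate steps is approximated.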

Ambient occlusion

Ambient occlusion uses an adaptive screen-space technique and is computed in a series of compute shader passes.

Particles

Particles are simulated on the GPU using the asynchronous compute queue. Simulation work is submitted to the asynchronous queue while G-buffer and shadow map rendering commands are submitted to the main command queue, so the two workloads can overlap on the GPU.
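The queue overlap can be modeled with two Python threads and a `threading.Event` standing in for the GPU fence that the main queue waits on before drawing particles. The semi-implicit Euler integration, the one-dimensional `[position, velocity]` particles, and the queue functions are assumptions for illustration.

```python
import threading


def run_frame(particles, dt, gravity=-9.81):
    """Model one frame with two queues: the async compute queue integrates the
    particles while the main queue renders the G-buffer and shadow maps."""
    log, lock = [], threading.Lock()
    simulated = threading.Event()  # stand-in for the GPU fence

    def record(entry):
        with lock:
            log.append(entry)

    def async_compute_queue():
        for p in particles:  # semi-implicit Euler on [position, velocity]
            p[1] += gravity * dt
            p[0] += p[1] * dt
        record("simulate particles")
        simulated.set()  # signal the fence

    def main_queue():
        record("G-buffer")
        record("shadow maps")
        simulated.wait()  # wait on the fence before consuming particle data
        record("render particles")

    workers = [threading.Thread(target=async_compute_queue), threading.Thread(target=main_queue)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return log
```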

Particle illumination

Particles are rendered as transparent surfaces with approximated visibility.