VRMark Cyan Room runs on a DirectX 12 engine that uses a multithreaded pipeline optimized for VR rendering.
The engine pipeline is optimized for VR. Scene update, shadow map draw, particle simulations, physics simulation, and geometry visibility solving and culling are executed only once per frame, and the results are shared for both eye views. All other rendering passes are executed per eye view.
The rendering, including scene update, visibility evaluation, and command list building, is done with multiple CPU threads, using one thread per available logical CPU core. The aim is to reduce the CPU load by spreading the work across multiple cores.
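The per-core threading described above can be illustrated with a minimal CPU-side sketch. This is not the engine's actual code: `recordInParallel` and the doubling "record" step are hypothetical stand-ins for filling one command list per thread, and the point is only the partitioning and the deterministic submission order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: split a list of draw items into one contiguous chunk
// per available logical core, "record" each chunk on its own thread, then
// gather the per-thread results in a fixed order for submission.
std::vector<int> recordInParallel(const std::vector<int>& drawItems) {
    std::size_t n = drawItems.size();
    unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    threads = static_cast<unsigned>(
        std::min<std::size_t>(threads, std::max<std::size_t>(n, 1)));

    std::vector<std::vector<int>> perThread(threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            // Each worker records its contiguous slice of the draw list,
            // standing in for building one command list per thread.
            std::size_t begin = n * t / threads;
            std::size_t end = n * (t + 1) / threads;
            for (std::size_t i = begin; i < end; ++i)
                perThread[t].push_back(drawItems[i] * 2); // "record" a command
        });
    }
    for (auto& th : pool) th.join();

    // Results are submitted in a deterministic order regardless of which
    // worker thread finished first.
    std::vector<int> submitted;
    for (const auto& chunk : perThread)
        submitted.insert(submitted.end(), chunk.begin(), chunk.end());
    return submitted;
}
```

The fixed gather order matters: recording can happen in any order across threads, but the GPU must see command lists executed in a stable sequence.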
Cyan Room implements explicit multi-GPU rendering for systems with two GPUs in a linked-node configuration. Heterogeneous adapters are not supported.
The Umbra occlusion library is used to accelerate and optimize object visibility evaluation for all cameras, including the main camera and light views. Culling only runs on the CPU and does not consume GPU resources.
One descriptor heap for each descriptor type is created when the scene is loaded and then used during the tests. Hardware Tier 1 is sufficient for containing all the required descriptors in the heaps. Root signature constants and descriptors are used when suitable.
Implicit resource heaps are used for most resources.
Asynchronous compute is used extensively to overlap multiple rendering passes and achieve maximum utilization of the GPU.
The following rendering passes/features for the left eye are run asynchronously:
- Particle simulation
- Light culling
- MSAA edge detection
The engine supports Phong tessellation and displacement-map-based detail tessellation. Tessellation factors are adjusted to give a sensible edge length for the output geometry on the render target. For shadow maps, the edge length is also calculated from the main camera to reduce aliasing due to different tessellation factors between the main camera and shadow map camera. Back-facing patches and those outside of the view frustum are culled by setting the tessellation factor to zero. When the size of an object's bounding box on the render target drops below a threshold, tessellation is turned off by disabling hull and domain shaders. If an object has several geometry LODs, tessellation is used on the most detailed LOD.
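The factor selection described above can be sketched on the CPU side. This is an illustrative model, not the engine's shader code: the target edge length, the clamp range, and the function name are assumptions, though a factor of zero culling a patch and 64 being the Direct3D tessellation maximum are standard.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative sketch: derive a tessellation factor so that output triangle
// edges land near a target pixel length on the render target. Back-facing
// patches and patches outside the frustum are culled by forcing the factor
// to zero, which makes the tessellator discard the patch entirely.
float tessFactor(float edgeLengthPixels, float targetEdgePixels,
                 bool backFacing, bool outsideFrustum) {
    if (backFacing || outsideFrustum)
        return 0.0f; // a zero factor culls the patch in the tessellator
    float f = edgeLengthPixels / targetEdgePixels;
    // Clamp to the valid range; 64 is the Direct3D tessellation maximum.
    return std::clamp(f, 1.0f, 64.0f);
}
```

Computing shadow-map factors from the main camera, as the text describes, amounts to feeding this function the main camera's projected edge length even when rendering the shadow view, so both views tessellate identically.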
Objects are rendered in two steps: first, all opaque objects are drawn into the G-buffer. In the second step, transparent objects are rendered to another target using an order-independent transparency algorithm; that target is later resolved on top of the surface illumination.
Geometry rendering uses a LOD system to reduce the number of vertices and triangles in far-away objects. This also results in a larger average on-screen triangle size.
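A minimal sketch of LOD selection by projected size follows. The threshold values, the descending-threshold convention, and the function name are illustrative assumptions; the source only states that far-away (hence small on-screen) objects get reduced geometry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: choose a geometry LOD from the object's projected
// bounding-box size on the render target. LOD 0 is the most detailed;
// smaller on-screen objects select coarser LODs, which keeps the average
// on-screen triangle size larger.
std::size_t selectLod(float screenSizePixels,
                      const std::vector<float>& lodThresholds) {
    // lodThresholds is sorted in descending order of on-screen size.
    for (std::size_t i = 0; i < lodThresholds.size(); ++i)
        if (screenSizePixels >= lodThresholds[i])
            return i;
    return lodThresholds.size(); // below all thresholds: coarsest LOD
}
```

Under the same convention, the tessellation cutoff described earlier is just one more threshold: below it the hull and domain shaders are disabled outright.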
The material system uses physically-based materials. The system supports textures for albedo, metallicity, normal, roughness, displacement, luminance, blend, opacity, detail normal, and cavity. A material need not use all textures.
Opaque objects are directly rendered to the G-buffer.
Transparent objects are rendered using a technique called Weighted Order-Independent Transparency. The technique requires only two render targets and special blending settings to achieve a good approximation of true transparency in the scene.
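The resolve math behind this class of technique can be shown in a few lines, here reduced to a single color channel. This is a sketch of the general weighted OIT scheme, not the engine's exact weighting function (which the source does not specify): one target accumulates weighted premultiplied color and weighted alpha additively, the other accumulates the product of (1 - alpha) multiplicatively, and the resolve composites the weighted average over the background.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Frag {
    float color;  // single channel for brevity
    float alpha;
    float weight; // per-fragment weight (e.g. depth-based)
};

// Sketch of a weighted order-independent transparency resolve. Two render
// targets suffice: RT0 accumulates weight*alpha*color and weight*alpha with
// additive blending; RT1 accumulates the product of (1 - alpha) with
// multiplicative blending. Fragment order does not affect the result.
float resolveOit(const std::vector<Frag>& frags, float background) {
    float accumColor = 0.0f, accumAlpha = 0.0f, revealage = 1.0f;
    for (const Frag& f : frags) {
        accumColor += f.weight * f.alpha * f.color; // RT0.rgb, additive
        accumAlpha += f.weight * f.alpha;           // RT0.a, additive
        revealage  *= 1.0f - f.alpha;               // RT1, multiplicative
    }
    float avg = accumColor / std::max(accumAlpha, 1e-5f);
    // Composite the averaged transparent color over the background.
    return avg * (1.0f - revealage) + background * revealage;
}
```

Because every term is a sum or a product, the blend is commutative, which is exactly what removes the need to sort transparent geometry.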
There are also additively blended objects, which do not require special treatment.
The lighting of opaque surfaces is evaluated using a tiled method in multiple separate passes.
Before the main illumination passes, compute shaders are used to cull lights and mark the tiles that are to be illuminated for shadowed and unshadowed lights.
Every lighting pass has its own result texture. All illumination passes are executed on 8x8 tiles.
Prebaked global diffuse illumination and prebaked environment reflections are evaluated for all tiles.
The contribution of unshadowed lights is evaluated using per-tile light-culling data.
Shadowed lights are evaluated similarly, but with their own light-culling data and shadow maps.
Shadowed and unshadowed passes are executed indirectly only on tiles that contain appropriate light data.
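The per-tile light lists that drive these passes can be sketched on the CPU. The engine performs this in a compute shader; here lights are reduced to screen-space bounding circles for brevity, and `cullLights` and `ScreenLight` are hypothetical names. Only the 8x8-pixel tile size comes from the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch of per-tile light culling on an 8x8-pixel tile grid:
// each light (given as a screen-space bounding circle) appends its index to
// the light list of every tile its bounds overlap.
struct ScreenLight { float x, y, radius; };

std::vector<std::vector<uint32_t>>
cullLights(const std::vector<ScreenLight>& lights, int width, int height) {
    const int tile = 8;
    int tilesX = (width + tile - 1) / tile;
    int tilesY = (height + tile - 1) / tile;
    std::vector<std::vector<uint32_t>> tileLists(tilesX * tilesY);
    for (uint32_t li = 0; li < lights.size(); ++li) {
        const ScreenLight& l = lights[li];
        // Clamp the light's tile-space bounding box to the grid.
        int x0 = std::max(0, (int)((l.x - l.radius) / tile));
        int x1 = std::min(tilesX - 1, (int)((l.x + l.radius) / tile));
        int y0 = std::max(0, (int)((l.y - l.radius) / tile));
        int y1 = std::min(tilesY - 1, (int)((l.y + l.radius) / tile));
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                tileLists[ty * tilesX + tx].push_back(li);
    }
    return tileLists;
}
```

Tiles whose lists come out empty are what the indirect execution skips: only tiles that actually contain light data are dispatched in the shadowed and unshadowed passes.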
The combined result is fed to the post-processing stages.
Shadows are sampled in surface illumination shaders.
Particles are simulated on the GPU, with simulation work submitted to the asynchronous compute queue. G-buffer and shadow map rendering commands are submitted to the main command queue.
Particles can be illuminated with scene lights or they can be self-illuminated. The output buffers of the GPU light culling pass are used as inputs for illuminated particles. The illuminated particles are drawn without tessellation and they are illuminated in the pixel shader. Particles are blended together with the same order-independent technique as transparent geometries.
Bloom is based on a compute shader FFT that evaluates several effects with one filter kernel. The filter combines blur, streak, lenticular halo, and anamorphic flare effects.
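The reason one FFT filter kernel can carry several effects at once is that convolution becomes a pointwise multiply in the frequency domain, so blur, streak, halo, and flare kernels can be summed into a single combined kernel and applied in one pass. The sketch below demonstrates the principle in 1D with a naive O(n^2) DFT for readability; the engine's compute-shader FFT is the fast 2D equivalent, and none of these names are from the source.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;
const double kPi = 3.141592653589793;

// Naive DFT (O(n^2)); a real implementation would use an FFT.
std::vector<cd> dft(const std::vector<cd>& x, bool inverse) {
    std::size_t n = x.size();
    std::vector<cd> out(n);
    double sign = inverse ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            out[k] += x[t] * std::polar(1.0, sign * 2.0 * kPi * k * t / n);
    if (inverse)
        for (cd& v : out) v /= static_cast<double>(n);
    return out;
}

// Circular convolution of a signal with a same-length kernel via the
// frequency domain: transform both, multiply pointwise, transform back.
std::vector<double> convolve(const std::vector<double>& signal,
                             const std::vector<double>& kernel) {
    std::size_t n = signal.size();
    std::vector<cd> a(signal.begin(), signal.end());
    std::vector<cd> b(kernel.begin(), kernel.end());
    std::vector<cd> fa = dft(a, false), fb = dft(b, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= fb[i]; // one combined kernel
    std::vector<cd> res = dft(fa, true);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = res[i].real();
    return out;
}
```

Convolving a unit impulse reproduces the kernel, which is a convenient sanity check: for bloom, a bright isolated pixel spreads into exactly the combined blur-streak-halo-flare shape.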
Deferred multi-sample anti-aliasing (MSAA)
MSAA is implemented in the following fashion:
- A multi-sampled G-buffer is drawn
- Edges are detected, an optimal sample mask is generated, and single-sample luminance and depth are output
- Illumination is multi-sampled with the sample mask on the edges
- Single sampled pixels use resolved G-buffer surfaces
- The rest of the pipeline (GI, post-processing) uses single sampled resources
At the beginning of every frame, a multi-sampled G-buffer is created with the selected sample count. Supported sample counts are 2, 4, and 8. Multi-sampled textures are drawn in geometry draw tasks.
After the geometry draw tasks are done, edge pixels are detected. Edge detection is based on depth, normals, and fog density. This method produces significantly fewer complex pixels than using SV_Coverage. Detection is done in a separate edge-renderer shader pass, which takes the multi-sampled G-buffer as shader resource views and finds the geometry edges. In addition, the edge detector identifies how many samples in a multi-sampled fragment contain unique data and computes a weighting factor for each unique sample (for example, if a texel is fully covered by a rasterized fragment, this corresponds to a single unique sample with a weight of 4 in the case of MSAA x4). These data (the edge bitmask and the weighting factors) are packed into a 16-bit unsigned normalized edge texture.
The illumination pass takes the G-buffer and the edge texture as resources. If the current shaded position is on an edge, illumination is calculated with a contribution from each unique MSAA sample, weighted by the corresponding weighting factor extracted from the edge texture. This calculation is distributed across the whole thread group.
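The unique-sample analysis above can be sketched as follows. This is an assumed model of the described behavior, not the engine's shader: samples whose G-buffer data match are grouped, one representative per group is kept, and each representative's weight is its group size, so a fully covered texel yields one unique sample with weight 4 under MSAA x4. The packing into the 16-bit edge texture is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct UniqueSample {
    int index;  // which MSAA sample represents this group
    int weight; // how many of the fragment's samples share its data
};

// Illustrative sketch of unique-sample detection for one multi-sampled
// fragment: samples holding identical G-buffer data (reduced here to one
// integer) collapse into a single weighted representative.
std::vector<UniqueSample> analyzeSamples(const std::vector<uint32_t>& samples) {
    std::vector<UniqueSample> unique;
    for (int i = 0; i < (int)samples.size(); ++i) {
        bool found = false;
        for (UniqueSample& u : unique) {
            if (samples[u.index] == samples[i]) {
                ++u.weight; // same data as an earlier sample: grow its group
                found = true;
                break;
            }
        }
        if (!found)
            unique.push_back({i, 1}); // first occurrence of this sample data
    }
    return unique;
}
```

The illumination pass then shades only the representatives and sums them with these weights, which is why interior (fully covered) pixels cost a single sample while true geometry edges pay for each distinct surface they cover.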
Fast approximate anti-aliasing (FXAA)
FXAA is implemented in the post-processing chain. The implementation is described in this whitepaper.
Conservative Morphological Anti-Aliasing (CMAA)
CMAA is implemented in the post-processing chain using this implementation.
The engine uses the OpenAL Soft library to produce spatial effects for the scene audio based on distance and location relative to the camera. Audio occlusion and acoustics are not simulated in audio effects.