The Fire Strike engine uses DirectX 11 feature level 11.
The multithreading model is based on DX11 deferred device contexts and command lists. The engine uses one thread per available logical CPU core. One of the threads is the main thread, which uses both the immediate device context and a deferred device context. The other threads are worker threads, which use only deferred device contexts.
Rendering workload is distributed between the threads by distributing items (e.g. geometries and lights) in the rendered scene to the threads. Each thread is assigned a roughly equal amount of scene items.
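The distribution described above can be illustrated with a minimal sketch. The function name `partition_items` and the contiguous chunking strategy are assumptions made for illustration; the engine's actual assignment scheme is not specified here.

```python
def partition_items(items, thread_count):
    """Split items into thread_count contiguous chunks whose sizes differ by at most one."""
    if thread_count < 1:
        raise ValueError("thread_count must be at least 1")
    base, extra = divmod(len(items), thread_count)
    chunks = []
    start = 0
    for thread_index in range(thread_count):
        # The first `extra` threads take one additional item each.
        size = base + (1 if thread_index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

For example, ten scene items on four threads would give chunk sizes of 3, 3, 2 and 2 under this scheme.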
When rendering a frame, each thread does the work associated with the items assigned to it. That includes, for example, computation of transformation matrix hierarchies, computation of shader parameters (constant buffer contents and dynamic vertex data) and recording of DX API calls to a command list. When the main thread has finished the tasks associated with its own items, it executes the command lists recorded by the worker threads.
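The record-then-execute flow can be sketched in plain Python as an analogy. This is not Direct3D code: `record_command_list` and `render_frame` are hypothetical names, strings stand in for recorded API calls, and the round-robin item assignment is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(items):
    """Stand-in for per-thread work: returns a list of recorded draw commands."""
    return [f"draw({item})" for item in items]


def render_frame(items, thread_count):
    # Hypothetical round-robin assignment of scene items to threads.
    assignments = [items[i::thread_count] for i in range(thread_count)]
    executed = []
    with ThreadPoolExecutor(max_workers=max(1, thread_count - 1)) as pool:
        # Worker threads record command lists for their own items.
        futures = [pool.submit(record_command_list, a) for a in assignments[1:]]
        # The main thread first handles the items assigned to itself.
        executed.extend(record_command_list(assignments[0]))
        # It then executes the worker command lists in a fixed order.
        for future in futures:
            executed.extend(future.result())
    return executed
```

Executing the worker results in submission order keeps the frame deterministic regardless of which worker finishes first.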
The engine supports rendering with and without tessellation. The supported tessellation techniques are PN Triangles, Phong, and displacement map based detail tessellation. Both triangle and quad based tessellation are supported.
Tessellation factors are adjusted to achieve the desired edge length for output geometry on the render target. Additionally, patches that are back facing and patches that are outside of the view frustum are culled by setting the tessellation factor to zero.
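A minimal sketch of how an edge tessellation factor could follow from a desired on-screen edge length. The function and parameter names are hypothetical, and the lower clamp of 1 is an assumption. The upper clamp of 64 is the maximum tessellation factor in Direct3D 11.

```python
MAX_TESS_FACTOR = 64.0  # Direct3D 11 maximum tessellation factor


def edge_tess_factor(edge_length_px, target_edge_px, culled=False):
    """Return a tessellation factor for one patch edge.

    edge_length_px: projected length of the patch edge on the render target.
    target_edge_px: desired length of output triangle edges (assumed parameter).
    culled: True for back-facing patches or patches outside the view frustum.
    """
    if culled:
        # A factor of zero makes the tessellator discard the patch.
        return 0.0
    if target_edge_px <= 0:
        raise ValueError("target_edge_px must be positive")
    factor = edge_length_px / target_edge_px
    return min(max(factor, 1.0), MAX_TESS_FACTOR)
```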
Tessellation is turned entirely off by disabling hull and domain shaders when the size of the object’s bounding box on the render target drops below a given threshold. This applies both to G-buffer and shadow map drawing.
Lighting is done in a deferred style. Geometry attributes are first rendered to a set of render targets. Ambient occlusion is then computed from the depth and normal data. Finally, illumination is rendered based on those attributes.
Two different surface shading models and G-buffer compositions are supported. The more complex model uses four textures and a depth texture as the G-buffer. The simpler model uses two textures and a depth texture.
The surface illumination model is either a combination of Oren-Nayar diffuse reflectance and Cook-Torrance specular reflectance, or a basic Blinn-Phong reflectance model. The simple surface shading model is used in the Feature Level 10 demo and tests, while the complex model is used in the Feature Level 11 demo and tests. Optionally, atmospheric attenuation is also computed.
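As an illustration of the simpler of the two models, the following is a textbook Blinn-Phong sketch. It is not the engine's shader code; the function names and the scalar return values (diffuse and specular terms before multiplication by material and light colors) are choices made for this example.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def blinn_phong(normal, to_light, to_view, shininess):
    """Return (diffuse, specular) scalar terms for one light."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_view)
    n_dot_l = max(dot(n, l), 0.0)
    if n_dot_l == 0.0:
        # Surface faces away from the light: no contribution.
        return 0.0, 0.0
    # Half vector between the light and view directions.
    h = normalize(tuple(a + b for a, b in zip(l, v)))
    specular = max(dot(n, h), 0.0) ** shininess
    return n_dot_l, specular
```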
Horizon-based screen space ambient occlusion can be applied to the surface illumination.
Point, spot and directional lights are supported. Spot and directional lights can be shadowed. For spot lights, the shadow texture size is selected based on the size of the light volume in screen space. Shadow maps are sampled using a best candidate sample distribution. The sample pattern is dithered with a 4 × 4 pixel pattern.
The renderer supports volume illumination. It is computed by approximating the light scattered towards the viewer by the medium between the eye and the visible surface on each lit pixel. The approximation is based on volume ray casting and the Rayleigh-Mie scattering and attenuation model.
One ray is cast on each lit pixel for each light. The cast ray is sampled at several depth levels. Sampling quality is improved by dithering sampling depths with a 4 × 4 pixel pattern. The achieved result is blurred to combine the different sampling depths on neighboring pixels before combining the volume illumination with the surface illumination.
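The dithered depth sampling can be sketched as follows. The specific 4 × 4 pattern is not given in the text, so a standard 4 × 4 ordered-dither (Bayer) matrix is assumed here. The function name and the uniform step along the ray are also assumptions.

```python
# Assumed pattern: a standard 4 x 4 ordered-dither (Bayer) matrix.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def sample_depths(pixel_x, pixel_y, ray_length, sample_count):
    """Return the depths along a ray at which the medium is sampled.

    Neighboring pixels get different fractional offsets, so a 4 x 4 block
    covers 16 times as many distinct depths as a single pixel does.
    """
    step = ray_length / sample_count
    offset = (BAYER_4X4[pixel_y % 4][pixel_x % 4] + 0.5) / 16.0
    return [(i + offset) * step for i in range(sample_count)]
```

Blurring the result across neighboring pixels then combines these interleaved depths, which is why the blur step precedes combining volume and surface illumination.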
When rendering illumination, there are two high dynamic range render targets. One is for surface illumination and the other for volume illumination.
Particle effects are rendered on top of opaque surface illumination with additive or alpha blending. Particles are simulated on the GPU. Particles can be either simply self-illuminated or receive illumination from scene lights.
Lights that participate in particle illumination can be individually selected. To illuminate particles, the selected lights are rendered to three volume textures that are fitted into view frustum. The textures contain incident radiance in each texel stored as spherical harmonics. Each of the three textures holds data for one color channel storing four coefficients. Incident radiance from each light is rendered to these volume textures as part of light rendering.
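Four spherical harmonics coefficients correspond to the first two bands (one constant term and three linear terms). The sketch below projects a single directional light into those four coefficients for one color channel and evaluates the result in a given direction. The function names are hypothetical, and the sign and ordering convention of the linear terms is an assumption; it only needs to match between projection and evaluation.

```python
import math

Y0 = 0.5 * math.sqrt(1.0 / math.pi)  # band 0 constant, about 0.282095
Y1 = 0.5 * math.sqrt(3.0 / math.pi)  # band 1 scale, about 0.488603


def sh_basis(direction):
    """First four real spherical harmonics basis values for a unit direction."""
    x, y, z = direction
    return (Y0, Y1 * y, Y1 * z, Y1 * x)


def project_light(direction, intensity):
    """Coefficients for light arriving from `direction` (one color channel)."""
    return tuple(intensity * b for b in sh_basis(direction))


def evaluate(coeffs, direction):
    """Reconstruct the stored radiance in a given unit direction."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(direction)))
```

With only two bands the reconstruction is very smooth and can become negative facing away from the light, so a real shader would typically clamp the result.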
When rendering illuminated particles, hull and domain shaders are enabled. Incident radiance volume texture sampling is done in the domain shader. Tessellation factors are set to produce fixed-size triangles in screen pixels. Tessellation is used to avoid sampling incident radiance textures in the pixel shader.
Particles can cast shadows on opaque surfaces and on other particles. For generating particle shadows, particle transmittance is first rendered to a 3D texture. The transmittance texture is rendered from the shadow-casting light like a shadow map. After particles have been rendered to the texture, an accumulated transmittance 3D texture is generated by accumulating values of each depth slice in the transmittance texture. The accumulated transmittance texture can then be sampled when rendering illumination or incident radiance that is used to illuminate particles.
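The accumulation step can be sketched for a single texel column. It is assumed here that accumulation means a running product of per-slice transmittance, ordered from the light outward; the text does not state the exact operator, and the function name is hypothetical.

```python
def accumulate_transmittance(slices):
    """Running product of per-slice transmittance values in [0, 1].

    slices[k] is the fraction of light passing through depth slice k,
    with slice 0 nearest the light. Element k of the result is the
    fraction of light that survives slices 0..k.
    """
    accumulated = []
    running = 1.0
    for transmittance in slices:
        running *= transmittance
        accumulated.append(running)
    return accumulated
```

For example, two half-transparent slices behind an empty one give `[1.0, 0.5, 0.25]`.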
Particles can be used to generate a distortion effect. For particles that generate the effect, a distortion field is rendered to a texture using a 3D noise texture as input. This field is then used to distort the input image in the post-processing phase.
Depth of field
The effect is computed using the following procedure:
- Circle of confusion radius is computed for all screen pixels and stored in a full resolution texture.
- Half and quarter resolution versions are made from the radius texture and the original illumination texture.
- Positions of out-of-focus pixels whose circle of confusion radius exceeds a predefined threshold are appended to a buffer.
- The position buffer is used as point primitive vertex data and, using Geometry Shaders, the image of a hexagon-shaped bokeh is splatted to the positions of these vertices. Splatting is done to a texture that is divided into regions with different resolutions using multiple viewports. The first region is screen resolution and the rest are a series of halved regions down to 1x1 texel resolution. The screen space radius of the splatted bokeh determines the used resolution. The larger the radius the smaller the used splatting resolution.
- The previous two steps (appending out-of-focus pixel positions and splatting the bokeh) are performed separately for half and quarter resolution image data with different radius thresholds. Larger bokehs are generated from lower resolution image data.
- The different regions of the splatting texture are combined by up-scaling the data in the smaller resolution regions step by step to the screen resolution region.
- The out-of-focus illumination is combined with the original illumination.
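The layout of the splatting regions in the procedure above can be sketched as follows. `splat_regions` lists the chain of halved resolutions down to 1x1, with integer halving as an assumption. `region_for_radius` is a hypothetical selection rule (one region step per doubling of bokeh radius) that only illustrates the stated relationship between radius and resolution; the actual thresholds are not given in the text.

```python
import math


def splat_regions(width, height):
    """Resolutions of the splatting regions, from screen size down to 1x1."""
    regions = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        regions.append((width, height))
    return regions


def region_for_radius(radius_px, full_res_max_radius_px, region_count):
    """Pick a region index: larger bokeh radius means lower resolution."""
    if radius_px <= full_res_max_radius_px:
        return 0
    level = int(math.log2(radius_px / full_res_max_radius_px))
    return min(level, region_count - 1)
```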
The effect is computed by first applying a filter to the computed illumination in the frequency domain, as in the bloom effect. The filtered result is then splatted in several scales and intensities on top of the input image using additive blending. The effect is computed in the same resolution as the bloom effect, so the forward FFT needs to be performed only once for both effects. As in the bloom effect, the forward and inverse FFTs are performed using compute shaders and 32-bit floating point textures.
The bloom effect is computed by transforming the computed illumination to the frequency domain using the Fast Fourier Transform (FFT) and applying the bloom filter to the input in that domain. An inverse FFT is then applied to the filtered image. The forward FFT, the bloom filter and the inverse FFT are all done with compute shaders. The effect is computed in reduced resolution. The input image resolution is halved two or three times depending on settings and then rounded up to the nearest power of two. The FFTs are computed using 32-bit floating point textures. A procedurally pre-computed texture is used as the bloom filter. The filter combines blur, streak, lenticular halo and anamorphic flare effects.
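The reduced resolution used for the FFT can be worked out with a short sketch. The function names are hypothetical. Halving with integer division and rounding each dimension up independently are assumptions.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def bloom_resolution(width, height, halvings):
    """Halve the input resolution `halvings` times, then round up to powers of two."""
    for _ in range(halvings):
        width = max(1, width // 2)
        height = max(1, height // 2)
    return next_power_of_two(width), next_power_of_two(height)
```

Under these assumptions a 1920 x 1080 input halved twice (480 x 270) would be processed at 512 x 512, and halved three times (240 x 135) at 256 x 256.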
MSAA and FXAA anti-aliasing methods are supported.
In the MSAA method, G-buffer textures are multisampled with the chosen sample count. An edge mask is generated based on differences in G-buffer sample values. The mask is used in the illumination phase to select the pixels for which illumination is evaluated for all G-buffer samples. For pixels that are not considered edge pixels, illumination is evaluated only for the first G-buffer sample. Volume illumination is always evaluated only for the first G-buffer sample due to its low-frequency nature.
FXAA is applied after tone mapping, making it the final step in post-processing.
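The edge mask logic can be sketched per pixel. The difference metric (largest absolute channel difference against the first sample), the threshold parameter and the averaging of per-sample results are assumptions for illustration; `shade` stands in for the illumination evaluation.

```python
def is_edge_pixel(samples, threshold):
    """True if any G-buffer sample differs noticeably from the first one.

    samples: list of per-sample G-buffer values, each a tuple of floats.
    """
    first = samples[0]
    for sample in samples[1:]:
        if max(abs(a - b) for a, b in zip(first, sample)) > threshold:
            return True
    return False


def shade_pixel(samples, threshold, shade):
    """Evaluate illumination for all samples on edges, else for the first only."""
    if is_edge_pixel(samples, threshold):
        return sum(shade(s) for s in samples) / len(samples)
    return shade(samples[0])
```

The saving comes from interior pixels, where all samples hold nearly the same attributes and one illumination evaluation is enough.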
The implementation of the smoke simulation is based on Ronald Fedkiw's paper "Visual Simulation of Smoke" with the addition of a viscous term as in Jos Stam's "Stable Fluids", but without a temperature simulation. The smoke is thus simulated in a uniform grid where velocity is modeled with the incompressible Euler equations. Advection is solved with a semi-Lagrangian method.
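Semi-Lagrangian advection can be illustrated in one dimension: for each grid cell, trace backwards along the velocity and interpolate the field at the departure point. This is a sketch of the general technique, not the engine's compute shader. The clamped boundary handling and linear interpolation are assumptions.

```python
def advect_1d(field, velocity, dt, dx=1.0):
    """Advect `field` by `velocity` over one time step on a uniform 1D grid."""
    n = len(field)
    result = []
    for i in range(n):
        # Trace backwards from cell i to where the quantity came from.
        x = i - velocity[i] * dt / dx
        # Clamp the departure point to the grid.
        x = min(max(x, 0.0), n - 1.0)
        i0 = int(x)
        i1 = min(i0 + 1, n - 1)
        t = x - i0
        # Linear interpolation between the two neighboring cells.
        result.append((1.0 - t) * field[i0] + t * field[i1])
    return result
```

Because every output value is an interpolation of existing values, the scheme stays stable for any time step, at the cost of some numerical smoothing.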
The vorticity confinement method is then applied to the velocity field to reinforce vortices. Diffusion and projection are then computed by the Jacobi iteration method. The simulation is done entirely with compute shaders. Cylinders that interact with the smoke are implicit objects which are voxelized into the velocity and density fields in compute shaders.
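The Jacobi iteration used for diffusion and projection can be illustrated on a one-dimensional Poisson problem. This sketch solves the discrete equation p[i-1] - 2 p[i] + p[i+1] = dx² b[i] with the end values held fixed. The 1D setting, the fixed boundary values and the function name are simplifications chosen for this example.

```python
def jacobi_poisson_1d(b, iterations, dx=1.0):
    """Approximate the solution of the discrete 1D Poisson equation.

    b: right-hand side per grid point (for projection, the velocity divergence).
    The first and last entries of the solution stay at zero.
    """
    n = len(b)
    p = [0.0] * n
    for _ in range(iterations):
        new_p = p[:]
        for i in range(1, n - 1):
            # Each update reads only values from the previous iteration,
            # which is what makes the method easy to run in parallel.
            new_p[i] = 0.5 * (p[i - 1] + p[i + 1] - dx * dx * b[i])
        p = new_p
    return p
```

With `b = [-2.0] * 5` the iteration converges towards `[0, 3, 4, 3, 0]`, which satisfies the discrete equation exactly.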