The Sling Shot engine uses OpenGL ES 3.0 on Android and iOS devices.
The engine uses one thread per available CPU core. One thread is designated the main thread and makes all graphics API calls. The others are worker threads, which make no API calls.
The rendering workload is distributed by assigning the items in the rendered scene (e.g. geometries and lights) to the threads. Each thread is assigned a roughly equal number of scene items. When rendering a frame, each thread does the work associated with its own items. That includes, for example, computing transformation matrix hierarchies and shader parameters (constant buffer contents and dynamic vertex data). When the main thread has finished the tasks associated with its own items, it executes the API calls for the items assigned to the worker threads.
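The distribution scheme described above can be sketched as follows. This is a minimal illustration, not the engine's code: `partition_items`, `prepare_item`, and `render_frame` are hypothetical names, the per-item work is a placeholder, and contiguous slicing is only one possible way to get roughly equal shares.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_items(items, thread_count):
    """Split items into thread_count contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(len(items), thread_count)
    slices, start = [], 0
    for i in range(thread_count):
        size = base + (1 if i < extra else 0)
        slices.append(items[start:start + size])
        start += size
    return slices

def prepare_item(item):
    # Placeholder for per-item CPU work (matrix hierarchies, shader parameters).
    return ("draw", item)

def render_frame(items, thread_count):
    """Prepare items on all threads, but submit commands only from the calling (main) thread."""
    slices = partition_items(items, thread_count)
    submitted = []  # stands in for the graphics API command stream
    with ThreadPoolExecutor(max_workers=max(1, thread_count - 1)) as pool:
        # Workers prepare their slices and never touch `submitted`.
        futures = [pool.submit(lambda s=s: [prepare_item(i) for i in s])
                   for s in slices[1:]]
        # The main thread handles its own slice first...
        submitted.extend(prepare_item(i) for i in slices[0])
        # ...then issues the calls for the items prepared by the workers.
        for future in futures:
            submitted.extend(future.result())
    return submitted
```

With ten items and four threads, the slices hold 3, 3, 2, and 2 items, and all ten commands are submitted from the calling thread.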
Lighting is done in a deferred style. Geometry attributes are first rendered to a set of render targets. Finally, illumination is rendered based on those attributes.
The G-buffer is composed of two 32-bits-per-pixel textures and a depth texture. The surface illumination model is the basic Blinn-Phong reflectance model.
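For reference, the Blinn-Phong model evaluates a diffuse term from the angle between the surface normal and the light direction, and a specular term from the angle between the normal and the half vector. The sketch below shows the textbook form for a single light. The function names and the `shininess` parameter are illustrative assumptions; the guide does not specify the engine's exact parameters or how they are stored in the G-buffer.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def blinn_phong(normal, to_light, to_viewer, shininess):
    """Return (diffuse, specular) scalar terms for one light (textbook Blinn-Phong)."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_viewer)
    n_dot_l = max(dot(n, l), 0.0)
    if n_dot_l == 0.0:
        return 0.0, 0.0  # surface faces away from the light
    half = tuple(a + b for a, b in zip(l, v))
    if all(c == 0.0 for c in half):
        return n_dot_l, 0.0  # light and viewer directions are exactly opposite
    h = normalize(half)
    specular = max(dot(n, h), 0.0) ** shininess
    return n_dot_l, specular
```

A renderer would multiply these terms by the light color and the material's diffuse and specular colors before accumulating the result.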
Point, spot, and directional lights are supported. Spot and directional lights can be shadowed. For spot lights, the shadow texture size is selected based on the size of the light volume in screen space. Shadow maps are sampled using a best-candidate sample distribution. The sample pattern is dithered with a 4×4 pixel pattern.
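A best-candidate distribution can be generated with Mitchell's algorithm: for each new sample, draw several random candidates and keep the one farthest from all samples chosen so far. The sketch below builds such a pattern in the unit disk. The guide does not say how the 4×4 dither is applied, so the per-pixel rotation in `rotated_pattern` is an assumption, as are all names and default values.

```python
import math
import random

def best_candidate_disk(sample_count, candidates_per_sample=16, seed=1):
    """Generate well-spread points in the unit disk with Mitchell's best-candidate algorithm."""
    if sample_count <= 0:
        return []
    rng = random.Random(seed)

    def random_point_in_disk():
        # Rejection sampling: retry until the point falls inside the disk.
        while True:
            x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            if x * x + y * y <= 1.0:
                return (x, y)

    samples = [random_point_in_disk()]
    while len(samples) < sample_count:
        best, best_distance = None, -1.0
        for _ in range(candidates_per_sample):
            candidate = random_point_in_disk()
            nearest = min(math.dist(candidate, s) for s in samples)
            if nearest > best_distance:
                best, best_distance = candidate, nearest
        samples.append(best)
    return samples

def rotated_pattern(samples, pixel_x, pixel_y):
    """Rotate the pattern by one of 16 angles chosen from the pixel's position in a 4x4 tile.

    Assumption: rotation is one plausible way to dither the pattern per pixel.
    """
    index = (pixel_y % 4) * 4 + (pixel_x % 4)
    angle = 2.0 * math.pi * index / 16.0
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in samples]
```

Each offset would then be scaled by the filter radius and added to the shadow map lookup coordinate before the depth comparison.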
The renderer supports volume illumination. It is computed by approximating the light scattered towards the viewer by the medium between the eye and the visible surface on each lit pixel. The approximation is based on volume ray casting and a simple scattering and attenuation model.
One ray is cast on each lit pixel for each light. The cast ray is sampled at several depth levels. Sampling quality is improved by dithering the sampling depths with a 4×4 pixel pattern. The result is blurred to combine the different sampling depths on neighboring pixels before the volume illumination is combined with the surface illumination.
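The per-pixel ray casting can be sketched as below. The guide only says the model is "simple", so the homogeneous medium with exponential attenuation, the ordered (Bayer) dither matrix, and every name and default value here are assumptions. `light_at` is a hypothetical callback returning the light intensity reaching a point at distance `t` along the view ray.

```python
import math

# Standard 4x4 ordered-dither (Bayer) matrix; its use here is an assumption.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def volume_illumination(pixel_x, pixel_y, surface_depth, light_at,
                        sample_count=4, scattering=0.05, extinction=0.05):
    """Approximate light scattered toward the viewer along one view ray."""
    step = surface_depth / sample_count
    # Each pixel in a 4x4 tile samples at a slightly different depth offset.
    offset = (BAYER_4X4[pixel_y % 4][pixel_x % 4] + 0.5) / 16.0
    total = 0.0
    for i in range(sample_count):
        t = (i + offset) * step                    # distance from the eye
        transmittance = math.exp(-extinction * t)  # attenuation back to the eye
        total += light_at(t) * scattering * transmittance * step
    return total
```

Because the 16 pixels of a tile sample 16 different depth sets, the subsequent blur effectively merges them into a denser sampling of the ray than any single pixel computes.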
When rendering illumination, there are two high dynamic range render targets. One is for surface illumination and the other for volume illumination.
Particle effects are rendered on top of opaque surface illumination with additive or alpha blending. Particles are simulated on the GPU utilizing transform feedback. Particles are simply self-illuminated.
Depth of field
The effect is computed by filtering the rendered illumination in half resolution with three separable skewed box filters that form a hexagonal bokeh pattern when combined.
The filtering is performed in two passes that exploit similarities in the three filters to avoid duplicate work.
The first pass renders to two render targets, and the second pass renders to one target, combining the results of the three filters. Before filtering, a circle of confusion radius is evaluated for each pixel and the illumination is premultiplied by the radius.
After filtering, the illumination is reconstructed by dividing the result by the radius. This makes the filter gather out-of-focus illumination and prevents in-focus illumination from bleeding into neighboring pixels.
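The premultiply-and-divide step can be illustrated in one dimension. In the sketch below a plain box filter stands in for the skewed box filters, and the reconstruction divides by the identically filtered radius. The function names, the `epsilon` guard, and the fallback for fully in-focus regions are assumptions, not details from the engine.

```python
def box_filter(values, half_width):
    """Average each value with its neighbors, clamping the window at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out

def depth_of_field_1d(illumination, coc_radius, half_width=1, epsilon=1e-6):
    """Blur illumination weighted by each pixel's circle of confusion radius."""
    # Premultiply: in-focus pixels (radius 0) contribute nothing to the blur.
    premultiplied = [c * r for c, r in zip(illumination, coc_radius)]
    filtered_color = box_filter(premultiplied, half_width)
    filtered_radius = box_filter(coc_radius, half_width)
    # Reconstruct: divide by the filtered radius, keeping the original value
    # where the whole neighborhood is in focus.
    return [fc / fr if fr > epsilon else c
            for fc, fr, c in zip(filtered_color, filtered_radius, illumination)]
```

For example, with illumination `[10, 0, 0]` and radii `[0, 1, 1]`, the bright in-focus pixel has zero weight, so it does not leak into the blurred value of its out-of-focus neighbor.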
Bloom
The effect is computed by transforming the computed illumination to the frequency domain using a Fast Fourier Transform (FFT) and applying a bloom filter to the input in that domain. An inverse FFT is then applied to the filtered image.
The forward FFT, the application of the bloom filter, and the inverse FFT are done using the fragment shader. The FFT is performed with the Cooley-Tukey algorithm as a series of render passes.
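The structure of the computation can be sketched on the CPU in one dimension. The iterative radix-2 Cooley-Tukey form below runs log2(N) butterfly passes, each reading one buffer and writing another, which is roughly how the algorithm maps onto a series of render passes. This is an illustration under those assumptions, not the shader code; a 2D image transform would apply the same passes along rows and then along columns. All names are hypothetical.

```python
import cmath

def reverse_bits(index, bit_count):
    result = 0
    for _ in range(bit_count):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def fft_passes(values, inverse=False):
    """Iterative radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder the input so that each pass combines adjacent blocks.
    data = [complex(values[reverse_bits(i, bits)]) for i in range(n)]
    sign = 1.0 if inverse else -1.0
    size = 2
    while size <= n:  # one iteration corresponds to one render pass
        half = size // 2
        output = list(data)  # ping-pong: read `data`, write `output`
        for start in range(0, n, size):
            for k in range(half):
                twiddle = cmath.exp(sign * 2j * cmath.pi * k / size)
                a = data[start + k]
                b = data[start + k + half] * twiddle
                output[start + k] = a + b
                output[start + k + half] = a - b
        data = output
        size *= 2
    if inverse:
        data = [v / n for v in data]
    return data

def filter_in_frequency_domain(signal, filter_spectrum):
    """Forward FFT, per-frequency multiply by the filter, then inverse FFT."""
    spectrum = fft_passes(signal)
    filtered = [s * f for s, f in zip(spectrum, filter_spectrum)]
    return [v.real for v in fft_passes(filtered, inverse=True)]
```

Multiplying in the frequency domain is equivalent to a circular convolution in the spatial domain, which is why a large bloom kernel costs one multiply per frequency rather than one multiply per kernel tap per pixel.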
The effect is computed in 256 × 256 resolution in both Sling Shot and Sling Shot Extreme. In Sling Shot, the FFTs are computed using 16-bit floating-point textures. In Sling Shot Extreme, the FFTs are computed using 32-bit floating-point textures. A procedurally pre-computed texture is used as the bloom filter. The filter combines blur, streak, lenticular halo and anamorphic flare effects.
In Sling Shot Extreme, compute shaders are used for the FFT and bloom. These shaders require a total of 256 invocations within a work group.