Quantization is the process of reducing the precision of computations from floating-point to integer data types, improving performance on compatible hardware. This lowers memory requirements during inference and can shrink a neural network to as little as a quarter of its original size.
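As a minimal illustration of that mapping, the NumPy sketch below quantizes a float32 tensor to int8 using an affine scale and zero-point. The variable names are illustrative only, and the roughly 4x size reduction falls directly out of the change from 4-byte to 1-byte storage:

```python
import numpy as np

# Affine (asymmetric) int8 quantization of a float32 tensor.
weights = np.random.randn(1000).astype(np.float32)

# Map the observed float range onto the int8 range [-128, 127].
w_min, w_max = float(weights.min()), float(weights.max())
scale = (w_max - w_min) / 255.0
zero_point = int(round(-128 - w_min / scale))

quantized = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
dequantized = (quantized.astype(np.float32) - zero_point) * scale

# float32 stores 4 bytes per value, int8 stores 1 -- the source of the ~4x size reduction.
print(weights.nbytes / quantized.nbytes)           # 4.0
print(float(np.abs(weights - dequantized).max()))  # small rounding error
```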
The benchmark includes quantized models for testing integer performance on the supported engines and hardware. There are two main methods of producing quantized models. The first, quantization-aware training, calibrates the model during the training process and often yields better accuracy. The second, post-training quantization, takes a pre-trained float model and converts it to an integer data type.
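The core of post-training quantization is a calibration step: representative inputs are run through the float model to observe activation ranges, from which fixed scales are derived. The sketch below illustrates this under stated assumptions; the layer, weights, and calibration data are all placeholders, not the benchmark's actual models:

```python
import numpy as np

# Post-training calibration sketch: run representative inputs through the
# float model, record the activation range, and derive a fixed scale for
# inference. The layer and calibration data are placeholders.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)

def relu_layer(x: np.ndarray) -> np.ndarray:
    return np.maximum(x @ w, 0.0)

calibration_batches = [rng.standard_normal((8, 64)).astype(np.float32) for _ in range(10)]

# Observe the largest activation value over the calibration set.
observed_max = max(float(relu_layer(x).max()) for x in calibration_batches)

# ReLU outputs are non-negative, so a uint8 range [0, 255] suffices.
activation_scale = observed_max / 255.0
print(activation_scale)
```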
In the Procyon AI Computer Vision Benchmark, we use post-training quantization to produce integer models compatible with the inference engines included in the benchmark. Below you will find information on the methods used to quantize models for each supported inference engine, followed by a generic example of the technique.
- Microsoft® Windows ML - Documentation
- Qualcomm ® SNPE - Documentation
- Intel® OpenVINO™ - Documentation
- NVIDIA® TensorRT™ - Documentation
- Apple® Core ML™ - Documentation
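As a concrete, hedged example of the general technique, the sketch below applies post-training static quantization to an ONNX model using ONNX Runtime's quantization tooling. The model paths, input name, and random calibration data are placeholders; each engine listed above applies its own, engine-specific variant of this process, and the benchmark's exact settings are not reproduced here:

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few representative inputs to the calibrator (placeholder data)."""
    def __init__(self, input_name: str, num_batches: int = 8):
        self.data = iter(
            {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_batches)
        )

    def get_next(self):
        return next(self.data, None)

quantize_static(
    "model_fp32.onnx",               # pre-trained float model (placeholder path)
    "model_int8.onnx",               # quantized output model (placeholder path)
    RandomCalibrationReader("input"),
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QUInt8,
)
```

In a real pipeline, the random arrays would be replaced by preprocessed images from the model's validation set, since the observed ranges directly determine quantization quality.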
The performance data produced by the integer models in the benchmark represents the integer performance of each supported inference engine. However, output quality can vary because each inference engine implements its post-training quantization methods differently. To compare output quality across the various engines and model precisions, please see the Computer Vision section of the UL Procyon AI quality metrics here.
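One simple way to quantify such differences is to compare the predictions of the float and integer models on the same inputs. The sketch below computes top-1 agreement from two sets of logits; the synthetic arrays stand in for real engine outputs:

```python
import numpy as np

# Illustrative quality check: fraction of inputs on which the float and
# quantized models pick the same class. The logits here are synthetic;
# in practice they would come from each engine's inference API.
def top1_agreement(float_logits: np.ndarray, int8_logits: np.ndarray) -> float:
    return float(np.mean(float_logits.argmax(axis=1) == int8_logits.argmax(axis=1)))

rng = np.random.default_rng(0)
float_logits = rng.standard_normal((100, 1000)).astype(np.float32)
int8_logits = float_logits + 0.05 * rng.standard_normal((100, 1000)).astype(np.float32)
print(top1_agreement(float_logits, int8_logits))
```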