The overall UL Procyon AI Inference Benchmark score is the sum of the Performance and Quality scores for the float and integer variants of the AI models when using NNAPI. The higher the score, the better the performance.
UL Procyon AI Inference Benchmark score = Integer performance score + Float performance score + Integer quality score + Float quality score
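This can be expressed as a minimal sketch in Python, assuming the four sub-scores have already been computed as described in the sections below (the function and parameter names are illustrative, not part of the benchmark):

    def overall_score(integer_performance, float_performance,
                      integer_quality, float_quality):
        # The overall benchmark score is simply the sum of the four sub-scores.
        return (integer_performance + float_performance
                + integer_quality + float_quality)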
Performance scores
Performance scores are calculated from the average inference time for each AI model. The higher the score, the better the performance.
Float performance score
The float performance score represents the combined performance of the Float variants of the AI models.
Float performance score = K * (1 / GEOMEAN(average inference times of the float models)), where K = 200,000
K is a scaling constant used to bring the score in line with the traditional range for UL benchmarks.
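As an illustrative sketch (not the benchmark's actual implementation), the calculation could be written in Python as follows; the function name and the example inference times are assumptions, and the formula does not specify a time unit:

    import math

    def performance_score(avg_inference_times, k):
        # Geometric mean of the per-model average inference times.
        geomean = math.prod(avg_inference_times) ** (1.0 / len(avg_inference_times))
        # Faster inference (a smaller geometric mean) yields a higher score.
        return k * (1.0 / geomean)

    # Hypothetical average inference times for the float model variants.
    float_score = performance_score([11.2, 35.8, 20.4, 5.1], k=200_000)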
Integer performance score
The integer performance score represents the combined performance of the Integer variants of the AI models.
Integer performance score = K * (1 / GEOMEAN(average inference times of the integer models)), where K = 47,000
K is a scaling constant used to bring the score in line with the traditional range for UL benchmarks.
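The same performance_score sketch from the float section applies here; only the scaling constant changes (the inference times below are again hypothetical):

    # Hypothetical average inference times for the integer model variants.
    integer_score = performance_score([6.3, 18.9, 11.0, 2.7], k=47_000)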
Quality scores
Quality scores are calculated by comparing the model outputs with pre-calculated test outputs. The higher the score, the better the quality.
The method for evaluating the quality of model outputs depends on the AI task as described below. After evaluating the output from each task, we use the following formula to calculate the Float and Integer quality scores.
Quality score = K * GEOMEAN(AVG of model quality results), where K = 450
K is a scaling constant used to bring the score in line with the traditional range for UL benchmarks.
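A sketch of this calculation in Python, assuming each model contributes a list of per-input quality results (the names and data structure are illustrative):

    import math

    def quality_score(per_model_quality_results, k=450):
        # First average the quality results within each model...
        per_model_avgs = [sum(r) / len(r) for r in per_model_quality_results]
        # ...then take the geometric mean across models and scale by K.
        geomean = math.prod(per_model_avgs) ** (1.0 / len(per_model_avgs))
        return k * geomean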
Image classification tasks
The quality of the outputs is calculated based on the Top-1 accuracy score on the test set. Top-1 accuracy means the model's highest-probability prediction must match the ground-truth label.
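A minimal sketch of Top-1 accuracy, assuming the model produces one probability vector per test image (all names are illustrative):

    def top1_accuracy(probability_vectors, ground_truth_labels):
        # A prediction is correct when the index of the highest-probability
        # class matches the ground-truth class index.
        correct = sum(
            1 for probs, label in zip(probability_vectors, ground_truth_labels)
            if probs.index(max(probs)) == label
        )
        return correct / len(ground_truth_labels)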
Image segmentation tasks
The quality of the output image is calculated by taking the Euclidean distance between the output image and the ground-truth image and then normalizing the result based on the size of the image. The formula below is used to calculate the quality of a single output image.
Quality = 1 / (sum of the Euclidean distances of all pixels / (513 * 513 * 3)), where 513 * 513 * 3 is the image width, height, and number of color channels.
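One plausible reading of this formula, sketched in Python, treats each pixel as an (r, g, b) point and sums the per-pixel Euclidean distances; the pixel representation and the handling of a perfect match are assumptions:

    import math

    def segmentation_quality(output_pixels, truth_pixels,
                             width=513, height=513, channels=3):
        # Sum the Euclidean distance between corresponding pixels.
        total_distance = sum(
            math.dist(out_px, truth_px)
            for out_px, truth_px in zip(output_pixels, truth_pixels)
        )
        if total_distance == 0:
            return float("inf")  # identical images; not specified by the formula
        # Normalize by the image size, then invert:
        # a smaller distance means a higher quality value.
        return 1.0 / (total_distance / (width * height * channels))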
Object detection tasks
The quality of the output boundaries and recognition is calculated using the F1 score. A detected boundary must reach an Intersection over Union (IoU) of at least 50% with a ground-truth boundary for that input image to count as a match.
The IoU values are used to identify whether each detection is a true positive, a false positive, or a false negative. From these counts we calculate the precision and recall, and finally the F1 score, of the model. The formula for calculating the F1 score for an input is shown below.
F1 score = 2 * ((Recall * Precision) / (Recall + Precision))
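A simplified sketch of this calculation, assuming each detection has already been matched to its best-overlapping ground-truth box (a full implementation would also prevent two detections from claiming the same ground-truth box):

    def f1_score(detection_ious, num_ground_truth_boxes, iou_threshold=0.5):
        # A detection is a true positive when its best IoU with a
        # ground-truth box meets the 50% threshold.
        tp = sum(1 for iou in detection_ious if iou >= iou_threshold)
        fp = len(detection_ious) - tp      # detections that matched nothing
        fn = num_ground_truth_boxes - tp   # ground-truth boxes left unmatched
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * (recall * precision) / (recall + precision)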