The overall UL Procyon AI Inference Benchmark score is the sum of the Performance and Quality scores for the float and integer variants of the AI models when using NNAPI. The higher the score, the better the performance.

UL Procyon AI Inference Benchmark score = Integer performance score + Float performance score + Integer quality score + Float quality score
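
As a concrete illustration of this sum, the sketch below uses made-up sub-scores purely to show the arithmetic; they are not real benchmark results. All code examples in this section are Python sketches of the published formulas, not the benchmark's actual implementation.

# Hypothetical sub-scores, for illustration only (not real results).
integer_performance = 1200
float_performance = 950
integer_quality = 1400
float_quality = 1350

overall_score = (integer_performance + float_performance
                 + integer_quality + float_quality)
print(overall_score)  # 4900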

Performance scores

Performance scores are calculated from the average inference time for each AI model. The higher the score, the better the performance.

Float performance score

The float performance score represents the combined performance of the Float variants of the AI models.

Float performance score = K * (1 / GEOMEAN(average inference times of float models)) 
Where K = 200,000

K is a scaling constant used to bring the score in line with the traditional range for UL benchmarks.
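
A minimal Python sketch of this calculation is shown below. The function name and the sample timings are illustrative assumptions; only the formula and the constant K = 200,000 come from the definition above.

import math

def performance_score(average_inference_times_ms, k):
    # Score = k / geometric mean of the per-model average inference times.
    n = len(average_inference_times_ms)
    geomean = math.prod(average_inference_times_ms) ** (1 / n)
    return k * (1 / geomean)

# Hypothetical average inference times (ms) for the float model variants.
float_times = [12.5, 8.3, 21.7, 5.9]
print(performance_score(float_times, k=200_000))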

Integer performance score

The integer performance score represents the combined performance of the Integer variants of the AI models.

Integer performance score = K * (1 / GEOMEAN(average inference times of integer models))
Where K = 47,000

K is a scaling constant used to bring the score in line with the traditional range for UL benchmarks.
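
The calculation is identical to the float case apart from the scaling constant, so the performance_score sketch above can be reused; the timings here are again hypothetical.

# Hypothetical average inference times (ms) for the integer model variants.
integer_times = [6.2, 4.1, 10.8, 3.0]
print(performance_score(integer_times, k=47_000))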

Quality scores

Quality scores are calculated by comparing the model outputs with pre-calculated test outputs. The higher the score, the better the quality.

The method for evaluating the quality of model outputs depends on the AI task as described below. After evaluating the output from each task, we use the following formula to calculate the Float and Integer quality scores.

Quality score = K * GEOMEAN(average of model quality results)
Where K = 450

K is a scaling constant used to bring the score in line with the traditional range for UL benchmarks.
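
As a sketch, assuming each model contributes the average of its per-test-item quality results (the function name and all values here are placeholders):

import math

def quality_score(per_model_results, k=450):
    # k * geometric mean of each model's average quality result.
    averages = [sum(results) / len(results) for results in per_model_results]
    geomean = math.prod(averages) ** (1 / len(averages))
    return k * geomean

# Hypothetical per-model quality results, one list per AI model.
results = [[0.91, 0.89, 0.94], [0.78, 0.81, 0.80]]
print(quality_score(results))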

Image classification tasks 

The quality of the outputs is calculated from the Top-1 accuracy score on the test set. Top-1 accuracy means that the model's highest-probability answer must match the ground-truth answer.
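
A minimal sketch of a Top-1 accuracy calculation, assuming each model output is a vector of class probabilities; the function and variable names are illustrative:

def top1_accuracy(probabilities, ground_truth):
    # Fraction of samples whose highest-probability class matches the label.
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == label
        for p, label in zip(probabilities, ground_truth)
    )
    return hits / len(ground_truth)

# Hypothetical outputs over three classes for two samples.
probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
labels = [1, 2]
print(top1_accuracy(probs, labels))  # 0.5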

Image segmentation tasks

The quality of the output image is calculated by taking the Euclidean distance between the output image and the ground-truth image and then normalizing the result based on the size of the image. The formula below is used to calculate the quality of a single output image.

Quality = 1 / (sum of the per-pixel Euclidean distances / (513 * 513 * 3))
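
A NumPy sketch of one possible reading of this formula, taking the per-pixel Euclidean distance across the three colour channels and normalizing by the element count; the interpretation and the names are assumptions:

import numpy as np

def segmentation_quality(output, truth):
    # output and truth are (513, 513, 3) arrays.
    per_pixel = np.linalg.norm(output.astype(float) - truth.astype(float), axis=-1)
    # Note: identical images would divide by zero; handling that is out of scope here.
    return 1.0 / (per_pixel.sum() / (513 * 513 * 3))

out = np.zeros((513, 513, 3))
gt = np.full((513, 513, 3), 2.0)
print(segmentation_quality(out, gt))  # ~0.87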

Object detection tasks

The quality of the output boundaries and recognition is calculated using the F1 score. A detected boundary is matched to a ground-truth boundary for that input image only when their Intersection over Union (IoU) exceeds the 50% threshold.

The IoUs are used to classify each detection as a true positive, false positive, or false negative. These counts enable us to calculate the precision, the recall, and finally the F1 score of the model. The formula for calculating the F1 score for an input is shown below.

F1 score = 2 * ((Precision * Recall) / (Precision + Recall))
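
A sketch of the counting and the F1 calculation, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples and a simple greedy matching at the 50% IoU threshold; real detection evaluators typically also rank detections by confidence, which is omitted here:

def iou(a, b):
    # Intersection over Union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def f1_score(detections, ground_truths, threshold=0.5):
    matched = set()
    tp = 0
    for det in detections:
        for i, gt in enumerate(ground_truths):
            if i not in matched and iou(det, gt) > threshold:
                matched.add(i)
                tp += 1
                break
    fp = len(detections) - tp       # detections that matched no ground truth
    fn = len(ground_truths) - tp    # ground-truth boxes that were never matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)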