The final benchmark result is calculated as follows:


Where the constant is used to scale the result to an expected range for current average hardware: 


AverageInferences is the geometric mean of the results of the individual model steps.

ConvNeXt is the number of images per second in the ConvNeXt image classification step.

DETR is the number of inferences per second in the object detection step.

Ersgan is the number of inferences per second in the video super-resolution step.


CombinedBLIP is calculated as:

  • BLIPDecoder is the number of inferences per second for the decoding component of the image captioning step
  • BLIPEncoder is the number of inferences per second for the encoding component of the image captioning step


CombinedSAM2 is calculated as:

  • Sam2Decoder is the number of inferences per second for the decoding component of the image segmentation step
  • Sam2Encoder is the number of inferences per second for the encoding component of the image segmentation step
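
As a rough illustration of the AverageInferences term, the sketch below takes the geometric mean of one result per model step using Python's standard library. It assumes the five inputs are ConvNeXt, DETR, Ersgan, CombinedBLIP and CombinedSAM2. The numbers are made-up placeholders rather than measurements. The scaling constant and the CombinedBLIP and CombinedSAM2 formulas are not reproduced here, so the combined values are simply taken as given.

```python
from statistics import geometric_mean  # available in Python 3.8+

# Hypothetical per-step results in inferences per second.
# These values are placeholders for illustration, not real measurements.
step_results = {
    "ConvNeXt": 120.0,
    "DETR": 45.0,
    "Ersgan": 8.0,
    "CombinedBLIP": 30.0,   # assumed to be already combined from encoder/decoder
    "CombinedSAM2": 12.0,   # assumed to be already combined from encoder/decoder
}

# Geometric mean: the n-th root of the product of the n step results.
average_inferences = geometric_mean(step_results.values())

print(f"AverageInferences = {average_inferences:.2f}")
```

A geometric mean is a common choice for this kind of aggregate because a step with a very high rate cannot dominate the result the way it would in an arithmetic mean: doubling any single step's result raises the mean by the same factor, whichever step it is.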