The UL Procyon® AI Computer Vision Benchmark measures the machine learning inference performance of Windows devices using common machine-vision tasks such as image classification, image segmentation, object detection, and super-resolution. These tasks are executed using a range of popular, state-of-the-art neural networks and can run on the device's CPU, GPU, or a dedicated AI accelerator to compare hardware performance differences.

MobileNet V3

MobileNet V3 is a compact visual recognition model that was created specifically for mobile devices. The benchmark uses MobileNet V3 to identify the subject of an image, taking an image as the input and outputting a list of probabilities for the content in the image. The benchmark uses the large minimalistic variant of MobileNet V3.

Input size: 1x224x224x3

Output size: 1x1001
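As a sketch of how these shapes fit together: the model takes one 224x224 RGB image in NHWC layout and returns scores for 1,001 classes (commonly the 1,000 ImageNet classes plus a background class). The snippet below is illustrative only, not the benchmark's actual pipeline; it assumes the raw output is unnormalized logits that need a softmax.

```python
import numpy as np

# Hypothetical raw logits with MobileNet V3's output shape: (1, 1001).
# In a real run these would come from the inference engine.
rng = np.random.default_rng(0)
logits = rng.standard_normal((1, 1001)).astype(np.float32)

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

probs = softmax(logits)                       # list of per-class probabilities
top_class = int(np.argmax(probs, axis=-1)[0]) # index of the most likely class

assert probs.shape == (1, 1001)
```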

Inception V4

Inception V4 is a state-of-the-art model for image classification tasks. Designed for accuracy, it is a much wider and deeper model than MobileNet. The benchmark uses Inception V4 to identify the subject of an image, taking an image as the input and outputting a list of probabilities for the content identified in the image.

Input size: 1x299x299x3

Output size: 1x1001
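Inception V4 produces the same 1x1001 probability vector as MobileNet V3, just from a larger 299x299 input. A common way to consume such a vector is to report the top-5 predictions; the sketch below assumes an already-normalized probability vector and uses placeholder data.

```python
import numpy as np

# Placeholder probabilities with Inception V4's output shape (1, 1001).
rng = np.random.default_rng(1)
probs = rng.random((1, 1001)).astype(np.float32)
probs /= probs.sum()  # pretend these are softmax-normalized class probabilities

# Indices of the five highest-probability classes, best first.
top5 = np.argsort(probs[0])[::-1][:5]
```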

YOLO V3

YOLO, which stands for You Only Look Once, is an object detection model that aims to identify the location of objects in an image. The benchmark uses YOLO V3 to produce bounding boxes around objects, with a confidence score for each detection.

Input size: 1x416x416x3

Output size: 1x13x13x255, 1x26x26x255, 1x52x52x255
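The three outputs are detection heads at different grid resolutions. For a standard COCO-trained YOLO V3, the 255 channels decompose as 3 anchor boxes x (4 box coordinates + 1 objectness score + 80 class scores); the reshape below shows that arithmetic on placeholder data (the class count is an assumption about the trained variant, not stated by the benchmark).

```python
import numpy as np

# Placeholder for the coarsest YOLO V3 head: a 13x13 grid with 255 channels.
head = np.zeros((1, 13, 13, 255), dtype=np.float32)

num_anchors, num_classes = 3, 80  # assumed COCO configuration
# Split the channel dimension into per-anchor predictions:
# 255 == num_anchors * (4 box coords + 1 objectness + num_classes)
decoded = head.reshape(1, 13, 13, num_anchors, 5 + num_classes)

assert decoded.shape == (1, 13, 13, 3, 85)
```

The 26x26 and 52x52 heads decode the same way; the finer grids catch smaller objects.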

DeepLab V3

DeepLab is an image segmentation model that aims to cluster the pixels of an image that belong to the same object class. Semantic image segmentation labels each region of the image with a class of object. The benchmark uses MobileNet V2 for feature extraction, enabling fast inference with little difference in quality compared with larger models.

Input size: 1x513x513x3

Output size: 1x513x513x21
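The 21 output channels are per-pixel class scores (consistent with Pascal VOC's 20 object classes plus background, though the benchmark does not state the label set). A per-pixel argmax collapses the score volume into a single-channel label map; the snippet below shows this on placeholder data.

```python
import numpy as np

# Placeholder DeepLab V3 output: per-pixel scores for 21 classes.
rng = np.random.default_rng(2)
scores = rng.standard_normal((1, 513, 513, 21)).astype(np.float32)

# Argmax over the class axis yields one class label per pixel.
label_map = np.argmax(scores, axis=-1)

assert label_map.shape == (1, 513, 513)
```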

ResNet 50

ResNet 50 is an image classification model that provides a novel way of adding more convolutional layers with the use of residual blocks. Its release enabled the training of much deeper neural networks than was previously possible. The benchmark uses ResNet 50 to identify image subjects, outputting a list of probabilities for the content identified in the image.

Input size: 1x3x224x224

Output size: 1x1000
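Note that ResNet 50's input is channels-first (NCHW, 1x3x224x224), unlike the NHWC layouts above, and its output has 1,000 classes with no background slot. A pipeline feeding the same preprocessed image to both layouts would need a transpose; the sketch below is illustrative, not benchmark code.

```python
import numpy as np

# A hypothetical preprocessed 224x224 RGB image in NHWC layout,
# as consumed by the TensorFlow-style models above.
nhwc = np.zeros((1, 224, 224, 3), dtype=np.float32)

# ResNet 50 expects NCHW (channels first), so move the channel axis forward.
nchw = nhwc.transpose(0, 3, 1, 2)

assert nchw.shape == (1, 3, 224, 224)
```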

Real-ESRGAN

Real-ESRGAN is a super-resolution model trained on synthetic data for increasing the resolution of an image, reconstructing a higher-resolution image from a lower-resolution counterpart. The model used in the benchmark is the general image variant of Real-ESRGAN, and it upscales a 250x250-pixel image to a 1000x1000 image.

Input size: 1x3x250x250

Output size: 1x3x1000x1000
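The shapes imply a 4x scale factor per side: batch and channel dimensions are preserved while height and width quadruple (250 to 1000). The snippet below uses nearest-neighbour upsampling as a stand-in for the network, purely to demonstrate the shape arithmetic; it is not the model itself.

```python
import numpy as np

scale = 4  # 250 -> 1000 pixels per side
low_res = np.zeros((1, 3, 250, 250), dtype=np.float32)

# Nearest-neighbour upsampling: repeat each pixel `scale` times along
# height (axis 2) and width (axis 3). Same output shape as Real-ESRGAN.
upscaled = low_res.repeat(scale, axis=2).repeat(scale, axis=3)

assert upscaled.shape == (1, 3, 1000, 1000)
```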