Image classification

In AI image classification, the AI model inspects an image or video frame and classifies its contents. For the image classification test in the Procyon Computer Vision Benchmark, we use the ConvNeXt-Tiny (ImageNet-1K) AI model.
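
The mechanics can be sketched without the real model: a classifier like ConvNeXt-Tiny ends with one logit per ImageNet-1K class, and the prediction is the class with the highest softmax probability. The toy labels and logit values below are made up for illustration; only the softmax-then-argmax step reflects how classification output is actually read.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three hypothetical classes. A real ConvNeXt-Tiny head
# emits 1000 logits, one per ImageNet-1K class.
labels = ["tabby cat", "golden retriever", "espresso"]
logits = [1.2, 4.7, 0.3]

probs = softmax(logits)
top = max(range(len(probs)), key=probs.__getitem__)
print(labels[top])  # the class with the highest probability
```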

Uses for image classification include smart visual search functions, such as searching, sorting and tagging images or videos in a content library; retail inventory management; and even product quality control or assistance with medical diagnosis.

Image captioning

Image captioning refers to generating natural‑language descriptions of an image using an AI model that combines visual understanding with language generation. In the Procyon Computer Vision Benchmark, this task uses the BLIP (Base) model, where each caption is produced through one encoder pass followed by multiple decoder steps.
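
The encode-once, decode-many pattern described above can be illustrated with stand-in functions (this is not BLIP itself; the "encoder", "decoder" and canned caption below are placeholders): the image is encoded a single time, then the caption is built one token per decoder step until an end-of-sequence marker appears.

```python
def encode_image(image):
    # Stand-in for the vision encoder: runs exactly once per image.
    return sum(image) % 7

def decode_step(features, generated):
    # Stand-in for the language decoder: a real decoder conditions on the
    # image features and the tokens generated so far; here we just replay
    # a canned caption and finish with an end-of-sequence token.
    caption = ["a", "dog", "on", "a", "beach", "<eos>"]
    return caption[len(generated)]

image = [3, 1, 4, 1, 5]        # placeholder pixel data
features = encode_image(image)  # one encoder pass
tokens = []
while True:                     # multiple decoder steps
    token = decode_step(features, tokens)
    if token == "<eos>":
        break
    tokens.append(token)
print(" ".join(tokens))
```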

This workload mirrors several emerging Windows 11 scenarios, such as AI-enhanced accessibility capabilities, smart content tagging in applications like Photos, and screenshot or visual summary features in productivity tools.

Video Pipeline

Video object detection

Object detection identifies both what is in an image and where each object is located, typically outputting a bounding box (its position, width and height) together with a class label for each detection. Any application that needs to both identify and localize objects relies on object detection.

This test uses the Base DETR model with a ResNet50 backbone. As everyday PC tasks increasingly rely on visual understanding to speed up work, this test offers a key insight into the daily impact of AI in the office.
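
Since each detection pairs a class label with a box, detections are commonly scored against ground truth using intersection-over-union (IoU) of the two boxes. The sketch below is a generic illustration, not Procyon's scoring code; the example boxes are invented, and the (x, y, width, height) box convention is an assumption for this sketch.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x, y, width, height).
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(ay2, by2) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

# A detection pairs a class label with a localising bounding box.
detection = {"label": "person", "box": (10, 20, 50, 100)}
ground_truth = (12, 22, 50, 100)
print(round(iou(detection["box"], ground_truth), 3))
```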

Video segmentation

Video or image segmentation is the technique of partitioning an image or video frame into distinct regions, identifying which pixels belong to which object.

This test uses the SAM2 small variant AI model by Meta. AI image or video segmentation is used for tasks such as blurring the background of a video or applying masks to objects.
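
A segmentation model such as SAM2 ultimately yields a per-pixel mask; downstream effects like background blur then treat masked and unmasked pixels differently. The toy sketch below (not SAM2 code; the tiny grayscale "frame", mask and dimming factor are all invented) dims every background pixel while leaving the masked foreground untouched.

```python
def apply_mask(frame, mask, dim=0.25):
    # Keep foreground pixels (mask == 1) as-is; dim background pixels.
    return [
        [px if m else int(px * dim) for px, m in zip(frow, mrow)]
        for frow, mrow in zip(frame, mask)
    ]

# 3x3 grayscale "frame" and a binary segmentation mask marking the object.
frame = [
    [100, 100, 100],
    [100, 200, 100],
    [100, 200, 100],
]
mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]

out = apply_mask(frame, mask)
print(out)
```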

Video upscaling

AI-enhanced upscaling takes an initial video or image and improves its fidelity by using AI to infer and re-add missing detail. The video upscaling section of this benchmark uses the Real-ESRGAN AI model.
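
To see what the AI adds, it helps to contrast it with the simplest non-AI baseline: nearest-neighbour upscaling, which just repeats existing pixels and so adds no new detail. The sketch below implements that baseline on a tiny grayscale image (the 2x2 input is invented); a model like Real-ESRGAN instead predicts plausible new detail for the enlarged frame.

```python
def upscale_nearest(img, factor=2):
    # Repeat each pixel `factor` times horizontally and each row
    # `factor` times vertically; no new information is created.
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

small = [[0, 255],
         [255, 0]]
big = upscale_nearest(small)
for row in big:
    print(row)
```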

This AI use case can be used to improve low-quality images or video feeds, or reduce the bandwidth needed for a clear picture, such as for a video call made from a place with poor signal.