The AI Text Generation Benchmark supports most hardware vendors and two inference engines at launch. 

ONNX Runtime with DirectML

  • Compatible with supported AMD, NVIDIA, and Intel hardware that meets the benchmark's minimum requirements.

Intel OpenVINO

  • Compatible only with Intel Arc hardware.
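At runtime, an application typically asks the inference engine which execution providers are available and falls back down a preference list. The sketch below shows that selection logic in isolation, using ONNX Runtime's real provider names (`DmlExecutionProvider` for DirectML, `OpenVINOExecutionProvider`, `CPUExecutionProvider`); the `pick_provider` helper is a hypothetical illustration, not part of the benchmark itself.

```python
# Sketch of execution-provider fallback using ONNX Runtime's provider names.
# pick_provider is a hypothetical helper for illustration only.

PREFERRED_PROVIDERS = (
    "DmlExecutionProvider",       # DirectML (AMD, NVIDIA, Intel GPUs on Windows)
    "OpenVINOExecutionProvider",  # Intel OpenVINO (Intel Arc)
    "CPUExecutionProvider",       # always-available fallback
)

def pick_provider(available, preferences=PREFERRED_PROVIDERS):
    """Return the first preferred provider that the runtime reports as available."""
    for provider in preferences:
        if provider in available:
            return provider
    return None

# With a real session, the chosen provider would be passed to ONNX Runtime, e.g.:
#   import onnxruntime as ort
#   provider = pick_provider(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=[provider])
print(pick_provider(["CPUExecutionProvider", "DmlExecutionProvider"]))
```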


AI Models Used

The benchmark consists of four AI models of increasing size; as parameter count grows, so do system requirements.

The parameter count indicates the size of the trained model and correlates with its system RAM or VRAM requirements.
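As a rough rule of thumb, a model's weight footprint is its parameter count multiplied by the bytes per parameter at the chosen precision (4 bytes for FP32, 2 for FP16, 1 for INT8, 0.5 for INT4), ignoring activation and KV-cache overhead. The back-of-the-envelope sketch below is illustrative only; the function name and figures are assumptions, not the benchmark's official system requirements.

```python
# Back-of-the-envelope estimate of model weight memory, ignoring
# activation and KV-cache overhead. Illustrative only; these are not
# the benchmark's official system requirements.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_weight_gb(params_billions, precision="fp16"):
    """Approximate weight size in GB (1 GB taken as 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]

for name, params in [("Phi-3.5-mini", 3.8), ("Mistral-7B", 7.3),
                     ("Llama-3.1-8B", 8.0), ("Llama-2-13B", 13.0)]:
    print(f"{name}: ~{estimate_weight_gb(params):.1f} GB at FP16")
```

This is why the lightest model in the benchmark can target iGPUs while the heaviest needs a high-end discrete GPU.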


Phi-3.5-mini-instruct

  • Parameters: 3.8 billion
  • Workload: Light
  • Developer: Microsoft
  • Target hardware for test: Lightweight accelerators (e.g. iGPUs)
  • Description: Phi-3.5-mini is designed for memory-constrained environments and is the lightest workload in the benchmark. It is a good representation of an AI model that might be used while the user is focused on other tasks.
  • Official site: https://azure.microsoft.com/en-us/products/phi/#Use-cases


Mistral-7B-Instruct

  • Parameters: 7.3 billion
  • Workload: Medium
  • Developer: Mistral AI
  • Target hardware for test: Integrated to discrete AI accelerators (e.g. iGPUs & desktop GPUs)
  • Description: This is a very widely used AI model, designed to be easily customized for a variety of tasks.
  • Official site: https://mistral.ai/news/announcing-mistral-7b/


Llama-3.1-8B-Instruct

  • Parameters: 8 billion
  • Workload: Medium
  • Developer: Meta
  • Target hardware for test: Integrated to discrete AI accelerators (e.g. iGPUs & desktop GPUs)
  • Description: This is a newer and more compact version of Llama, with lower system requirements and support for a wider range of GPUs.
  • Official site: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1


Llama-2-13B-chat-hf

  • Parameters: 13 billion
  • Workload: Heavy
  • Developer: Meta
  • Target hardware for test: Powerful discrete AI accelerators (e.g. high-end desktop GPUs)
  • Description: This is the most demanding AI model in the benchmark; at launch it runs only on high-end discrete GPUs. Its workload generally exceeds the heaviest text generation tasks a typical user would encounter.
  • Official site: https://www.llama.com/docs/model-cards-and-prompt-formats/other-models#meta-llama-2