In some regions, UL Procyon cannot automatically download the required AI models. In these cases, users will have to manually download the models.
PyTorch Models
Stable Diffusion 1.5
| HFID | nmkd/stable-diffusion-1.5-fp16 |
| Link | https://huggingface.co/nmkd/stable-diffusion-1.5-fp16/tree/main |
| Revision | b80ddddd72f4bafc3d0832f32e2d5ea3212f0d59 |
| Note | Used for TensorRT, ONNXRuntime-DirectML Olive-Optimized, OpenVINO and CoreML. Conversion is run locally. |
Stable Diffusion XL
| HFID | stabilityai/stable-diffusion-xl-base-1.0 |
| Link | https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main |
| Variant | Pytorch fp16 (safetensors) |
| Revision | 76d28af79639c28a79fa5c6c6468febd3490a37e |
| Note | Used for TensorRT, OpenVINO and Core ML. Conversion is run locally. Only the .safetensors variants of the models are needed. |
| HFID | madebyollin/sdxl-vae-fp16-fix |
| Link | https://huggingface.co/madebyollin/sdxl-vae-fp16-fix |
| Variant | fp16 (safetensors) |
| Revision | 207b116dae70ace3637169f1ddd2434b91b3a8cd |
| Note | Used for TensorRT, Olive Optimized model for ONNX Runtime with DirectML, OpenVINO and Core ML. Conversion is run locally. |
Converted Olive-optimized ONNX models
Stable Diffusion XL
| HFID | greentree/SDXL-olive-optimized |
| Link | https://huggingface.co/greentree/SDXL-olive-optimized/tree/main |
| Revision | 2411094a7a9fff6ae91f51157e8b60d3c5f19895 |
| Note | Used for ONNX Runtime with DirectML. No conversion is run. |
Converted AMD-optimized ONNX models
Stable Diffusion 1.5
| HFID | amd/stable-diffusion-1.5_io16_amdgpu |
| Link | https://huggingface.co/amd/stable-diffusion-1.5_io16_amdgpu |
| Revision | 4f74843d677e2bd705ca4a05a102c5a0cbe63015 |
| Use | Used for ONNX Runtime with DirectML. No conversion is run. |
Stable Diffusion XL
| HFID | amd/stable-diffusion-xl-1.0_io16_amdgpu |
| Link | https://huggingface.co/amd/stable-diffusion-xl-1.0_io16_amdgpu |
| Revision | 9ba9fa7e800a38164328606ae193535b2e8df65f |
| Use | Used for ONNX Runtime with DirectML. No conversion is run. |
Quantized OpenVINO IR models
Stable Diffusion 1.5
| HFID | intel/sd-1.5-square-quantized |
| Link | https://huggingface.co/Intel/sd-1.5-square-quantized/tree/main/INT8 |
| Variant | int8a16 Quantized OVIR |
| Revision | 95260894b20743af5a86255c93bcf3a81febb1df |
| Use | Used for OpenVINO Runtime with w8a16 precision. No conversion is run for these models. Requires the full SD15 fp16 pytorch models for converting the Text Encoder and VAE. |
| Files | INT8/unet_int8a16.bin, INT8/unet_int8a16.xml |
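When only a few files from a repository are needed, it can help to build direct links to them at the pinned revision so that they can be fetched from a browser or any download tool on a machine that can reach Hugging Face. The sketch below is illustrative only: the function names are hypothetical, and it assumes the standard Hugging Face URL layout (`/tree/<revision>` for browsing, `/resolve/<revision>/<file>` for downloading a single file). The repository ID is taken from the Link row above.

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"


def tree_url(repo_id: str, revision: str) -> str:
    """Browser link to the repository contents at an exact revision."""
    return f"{HF_BASE}/{repo_id}/tree/{revision}"


def file_url(repo_id: str, revision: str, filename: str) -> str:
    """Direct download link for one file at an exact revision."""
    # quote() keeps "/" as-is, so subfolders such as INT8/ are preserved.
    return f"{HF_BASE}/{repo_id}/resolve/{revision}/{quote(filename)}"


if __name__ == "__main__":
    repo = "Intel/sd-1.5-square-quantized"
    rev = "95260894b20743af5a86255c93bcf3a81febb1df"
    for name in ("INT8/unet_int8a16.bin", "INT8/unet_int8a16.xml"):
        print(file_url(repo, rev, name))
```

Pinning the revision in the URL means the downloaded files correspond to the Revision listed in the table, even if the repository is updated later.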
Quantized RyzenAI ONNX models
Stable Diffusion 1.5
| HFID | amd/stable-diffusion-1.5-amdnpu |
| Link | https://huggingface.co/amd/stable-diffusion-1.5-amdnpu/tree/main |
| Revision | 7133b64502cab4c217cacdb452d2bbed18c0a166 |
| Use | Used for ONNX Runtime RyzenAI NPU execution. No conversion is run for these models. |
| Files | scheduler/scheduler_config.json, text_encoder/model.onnx, tokenizer/merges.txt, tokenizer/special_tokens_map.json, tokenizer/tokenizer_config.json, tokenizer/vocab.json, unet/dd_metastate_SD15_Unet_NhwcConv_0-conv_inConv.ctrlpkt, unet/dd_metastate_SD15_Unet_NhwcConv_0-conv_inConv.fconst, unet/dd_metastate_SD15_Unet_NhwcConv_0-conv_inConv.state, unet/dd_metastate_SD15_Unet_NhwcConv_0-conv_inConv.super, unet/model_NHWC.onnx, unet_w8a16/.cache, unet/.cache/NhwcConv_0-conv_inConv_meta.json, unet/.cache/ops-config.json, vae_decoder/dd_metastate_Sd15_Decoder_NhwcConv_0-post_quant_convConv.ctrlpkt, vae_decoder/dd_metastate_Sd15_Decoder_NhwcConv_0-post_quant_convConv.fconst, vae_decoder/dd_metastate_Sd15_Decoder_NhwcConv_0-post_quant_convConv.state, vae_decoder/dd_metastate_Sd15_Decoder_NhwcConv_0-post_quant_convConv.super, vae_decoder/model_NHWC.onnx, vae_decoder/.cache/NhwcConv_0-post_quant_convConv_meta.json |
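With a file list this long, it is easy to miss an entry during a manual download. The following is a minimal sketch of a completeness check, not part of Procyon: the helper name `missing_files` is hypothetical, and it assumes the downloaded copy keeps the same relative paths as the Files row above (before any renaming for installation).

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(model_dir: str, required: Iterable[str]) -> List[str]:
    """Return the required relative paths that do not exist under model_dir."""
    root = Path(model_dir)
    # exists() is used rather than is_file() because some entries
    # in the list (for example a .cache folder) are directories.
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    # A few entries from the Files row; extend with the full list as needed.
    required = [
        "scheduler/scheduler_config.json",
        "text_encoder/model.onnx",
        "tokenizer/vocab.json",
        "unet/model_NHWC.onnx",
        "vae_decoder/model_NHWC.onnx",
    ]
    # "downloads/stable-diffusion-1.5-amdnpu" is a placeholder location.
    for rel in missing_files("downloads/stable-diffusion-1.5-amdnpu", required):
        print("missing:", rel)
```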
Quantized Qualcomm models
Stable Diffusion 1.5 Quantized
| HFID | qualcomm/Stable-Diffusion-v1.5_aihub |
| Link | Snapdragon X, Snapdragon X2 |
| MD5 | V73/unet: 46baa043270f73fefabf00bc3c9b7661, V73/vae: 9b2c32794208e5b829d17e2c0f04a942, V73/text_encoder: b8e6ada03d3350a33f774ad27b01fe50, V81/unet: 5c57b511cff600d280894f96e5a6824e, V81/vae: ba6f91e3bd17c02decdc4a103f0551c8, V81/text_encoder: d3347ae78f124723ff503cee5a73462d |
| Note | Used for QNN Runtime with w8a16 precision. No conversion is run. Requires the tokenizer and scheduler of the original SD15 fp16 pytorch model to be placed on disk as well. |
| Files | v73/text_encoder/text_encoder.bin, v73/unet/unet.bin, v73/vae_decoder/vae.bin, v81/text_encoder/text_encoder.bin, v81/unet/unet.bin, v81/vae_decoder/vae.bin |
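The MD5 values in the table can be used to confirm that a manually downloaded file arrived intact. Below is a small sketch using Python's standard `hashlib` module. The function names are hypothetical, and it assumes that each checksum applies to the corresponding `.bin` file (for example, the V73/unet value to `v73/unet/unet.bin`), which the table does not state explicitly.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large model files are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with an expected value, ignoring case and spaces."""
    return md5_of(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Assumed pairing of file and checksum; adjust the path to your download.
    ok = matches("v73/unet/unet.bin", "46baa043270f73fefabf00bc3c9b7661")
    print("checksum OK" if ok else "checksum mismatch")
```

A mismatch usually points to an incomplete or corrupted download, in which case the file should be downloaded again before it is placed in the models folder.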
Installing the models
For Windows
By default, the benchmark is installed in
%ProgramData%\UL\Procyon\chops\dlc\ai-imagegeneration-benchmark\
- If it does not exist, create a subfolder named ‘models’ in this directory:
%ProgramData%\UL\Procyon\chops\dlc\ai-imagegeneration-benchmark\models
- In this ‘models’ folder, create the following subfolders based on the tests you want to run:
  - For non-converted PyTorch models:
    Create a subfolder ‘pytorch’ and place each full PyTorch model in it, with the model’s HF ID in the folder structure. E.g.
    ...\ai-imagegeneration-benchmark\models\pytorch\nmkd\stable-diffusion-1.5-fp16\<each subfolder of the model>
    Please note:
    The first run of benchmarks using these models can take significantly longer, as the models need to be converted.
  - For converted Olive-optimized ONNX models for ONNX Runtime with DirectML:
    Create a subfolder ‘onnx_olive_optimized’ and place each full model in it, with the model’s HF ID in the folder structure. E.g.
    ...\ai-imagegeneration-benchmark\models\onnx_olive_optimized\nmkd\stable-diffusion-1.5-fp16\<each subfolder of the model>
  - For converted AMD-optimized ONNX models for ONNX Runtime with DirectML:
    Create a subfolder ‘onnx_amd_optimized’ and place each full model in it, with the model’s HF ID in the folder structure. E.g.
    ...\ai-imagegeneration-benchmark\models\onnx_amd_optimized\nmkd\stable-diffusion-1.5-fp16\<each subfolder of the model>
  - For quantized ONNX RyzenAI models for ONNX Runtime with RyzenAI:
    Create a subfolder ‘onnx_amd_optimized’ and place each full model in it, with the model’s HF ID in the folder structure. E.g.
    ...\ai-imagegeneration-benchmark\models\onnx_amd_optimized\amd\stable-diffusion-1.5-amdnpu\<each subfolder of the model>
    Note that unet and vae_decoder have a _w8a16 suffix in the directory name.
  - For quantized OVIR models for OpenVINO Runtime:
    Create a directory ‘ovir\<HF ID>\unet_w8a16’ and place each part of the w8a16 model in it:
    ...\ai-imagegeneration-benchmark\models\ovir\intel\sd-1.5-square-quantized\unet_w8a16\<each required unet part>
  - For quantized QNN models for QNN Runtime:
    Create a directory ‘qnn\<HF ID>\unet’ and place each model in it:
    ...\ai-imagegeneration-benchmark\models\qnn\qualcomm\Stable-Diffusion-v1.5_aihub\<architecture>\<submodel>\<submodel>.bin
    keeping the original names of the files:
    ...\<v73 or v81>\text_encoder\text_encoder.bin
    ...\<v73 or v81>\unet\unet.bin
    ...\<v73 or v81>\vae_decoder\vae.bin
    Follow the instructions for non-converted PyTorch models above for the required PyTorch model files.
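The Windows folder layout described above follows one pattern: a subfolder per model type, then the HF ID split into owner and model name. The sketch below illustrates that pattern. The dictionary keys and function names are hypothetical, the `C:\ProgramData` fallback is an assumption for when the `ProgramData` environment variable is not set, and the extra levels for OVIR (`unet_w8a16`) and QNN (`<architecture>\<submodel>`) still have to be appended by hand.

```python
import os
from pathlib import PureWindowsPath
from typing import Optional

# Model type (hypothetical key) -> subfolder name given in the instructions.
SUBFOLDERS = {
    "pytorch": "pytorch",
    "olive_onnx": "onnx_olive_optimized",
    "amd_onnx": "onnx_amd_optimized",
    "ryzenai_onnx": "onnx_amd_optimized",
    "openvino_ir": "ovir",
    "qnn": "qnn",
}


def default_models_root() -> PureWindowsPath:
    """Default 'models' folder under %ProgramData% (fallback is assumed)."""
    program_data = os.environ.get("ProgramData", r"C:\ProgramData")
    return PureWindowsPath(
        program_data, "UL", "Procyon", "chops", "dlc",
        "ai-imagegeneration-benchmark", "models",
    )


def model_dir(kind: str, hf_id: str,
              models_root: Optional[str] = None) -> PureWindowsPath:
    """Build <models>\\<subfolder>\\<owner>\\<model name> for one model."""
    root = PureWindowsPath(models_root) if models_root else default_models_root()
    owner, name = hf_id.split("/", 1)
    return root / SUBFOLDERS[kind] / owner / name


if __name__ == "__main__":
    print(model_dir("pytorch", "nmkd/stable-diffusion-1.5-fp16"))
    # OVIR models need the extra unet_w8a16 level:
    print(model_dir("openvino_ir", "intel/sd-1.5-square-quantized") / "unet_w8a16")
```

`PureWindowsPath` only builds path strings and does not touch the file system, so the sketch can be run on any operating system to preview the target folders.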
For macOS
The benchmark models are installed in different directories depending on how the AI Image Generation Benchmark was installed and whether it is run as the root user.
- When using the .pkg installed version of Procyon Image Generation as a non-root user (default), the benchmark models are installed into the following directory:
  /Users/Shared/Library/UL/Procyon/mac-ai-imagegeneration-benchmark/models
- When run as root, the models are instead installed into the AI Image Generation Benchmark for macOS installation directory:
  /Library/UL/Procyon/AIImageGeneration/chops/dlc/mac-ai-imagegeneration-benchmark/models
- When extracting the models from a .zip package, the models are installed into the extracted zip:
  <path-to-extracted-zip>/AIImageGeneration/chops/dlc/mac-ai-imagegeneration-benchmark/models
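The three macOS cases above can be summarized as a small lookup. This is a sketch only: the function name is hypothetical, and treating the extracted-zip case as taking priority over the root check is an assumption, since the instructions do not say how the two combine.

```python
from pathlib import PurePosixPath
from typing import Optional

PKG_USER_DIR = "/Users/Shared/Library/UL/Procyon/mac-ai-imagegeneration-benchmark/models"
PKG_ROOT_DIR = "/Library/UL/Procyon/AIImageGeneration/chops/dlc/mac-ai-imagegeneration-benchmark/models"
ZIP_SUFFIX = "AIImageGeneration/chops/dlc/mac-ai-imagegeneration-benchmark/models"


def macos_models_dir(is_root: bool,
                     extracted_zip: Optional[str] = None) -> PurePosixPath:
    """Pick the models directory for the three documented macOS cases."""
    if extracted_zip is not None:
        # Assumption: a zip-based install always uses the extracted folder.
        return PurePosixPath(extracted_zip) / ZIP_SUFFIX
    return PurePosixPath(PKG_ROOT_DIR if is_root else PKG_USER_DIR)


if __name__ == "__main__":
    import os
    # os.geteuid() is available on macOS and other Unix-like systems.
    print(macos_models_dir(is_root=(os.geteuid() == 0)))
```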
Note:
Not every model for every engine needs to be present in the installation directory.
- For OpenVINO, only the OVIR models must exist.
- For ONNX Runtime-DirectML, only the Olive-optimized ONNX models must exist.
- For TensorRT, only the Engine created for the current settings (batch size, resolution) and hardware must exist. If these settings change, the Engine is generated again from the CUDA-optimized ONNX models.