This test models use cases of video conferencing applications. The test uses two scenarios: a private call and a group call.
Implementation
The Video Conferencing test uses Windows Media Foundation for video playback and encoding. Face detection is implemented with the OpenCV library (http://opencv.org).
The Video Conferencing test supports OpenCL. The benchmark application selects a preferred OpenCL device to use.
Face detection uses the cascade classifier haarcascade_frontalface_alt.xml.
Parameters for one-to-one video conferencing: scale factor 1.1, min neighbors 10, min size 110x110 and max size 300x300.
Parameters for group video conferencing: scale factor 1.05, min neighbors 5, min size 110x110 and max size 300x300.
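The parameters above correspond to the arguments of OpenCV's `CascadeClassifier.detectMultiScale`. The following Python sketch shows how they might be passed; it is not the benchmark's own code. The `DETECT_PARAMS` table and the `detect_faces` helper are hypothetical names, and the sketch assumes the `opencv-python` package, which ships the cascade file under `cv2.data.haarcascades`.

```python
# Detection parameters as listed above, keyed by scenario.
DETECT_PARAMS = {
    "one_to_one": dict(scaleFactor=1.1, minNeighbors=10,
                       minSize=(110, 110), maxSize=(300, 300)),
    "group": dict(scaleFactor=1.05, minNeighbors=5,
                  minSize=(110, 110), maxSize=(300, 300)),
}


def detect_faces(gray_frame, scenario):
    """Return face rectangles (x, y, w, h) found in a grayscale frame."""
    import cv2  # imported lazily so the parameter table is usable without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_alt.xml")
    return cascade.detectMultiScale(gray_frame, **DETECT_PARAMS[scenario])
```

A real pipeline would load the classifier once and reuse it across frames rather than constructing it on every call. Note that the group scenario's smaller scale factor and lower neighbor threshold make the detector search more scales per frame, which is consistent with the lower group face detect rates reported in the results table.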
Part 1: one-to-one video conferencing with basic quality video
- Encode: 720p, 30 FPS, H.264 video, bitrate 14380 kb/s
- Playback: 720p, 30 FPS, H.264 video, bitrate 11773 kb/s
- Two video streams (a local and a remote one)
- Both streams are displayed on screen, downscaled to a fixed-resolution window.
- Face detection performed on the local stream
- Stage 1 - CPU:
  - Code path: x86/x64
  - Runtime: 10 s
- Stage 1 - OpenCL:
  - Condition to run: a suitable OpenCL device must be available
  - Code path: OpenCL
  - Runtime: 10 s
Part 2: group video conferencing with high quality outgoing video
- Encode: 1080p, 30 FPS, H.264 video, bitrate 12731 kb/s
- Playbacks: 720p, 30 FPS, H.264 video, bitrate 10152 - 12251 kb/s
- Four streams (a local and three remote ones)
- All streams are displayed on screen, downscaled to a fixed-resolution window.
- Face detection performed on the local stream
- Stage 2 - CPU:
  - Code path: x86/x64
  - Runtime: 10 s
- Stage 2 - OpenCL:
  - Condition to run: a suitable OpenCL device must be available
  - Code path: OpenCL
  - Runtime: 10 s
Workloads
In both the private and group call scenarios, the sent video stream is processed in the following manner:
- The caller's face location is detected at periodic intervals.
- The perceived quality of each frame is improved by blurring the background, based on the face location information.
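The two steps above can be sketched as a simple per-frame loop. This is an illustration only: `detect_face` and `blur_background` are hypothetical callables standing in for the real detector and filter, and the `detect_every` interval is an assumed value, since the text does not state how often detection runs.

```python
def process_stream(frames, detect_face, blur_background, detect_every=5):
    """Yield processed frames, re-detecting the face every `detect_every` frames.

    detect_face(frame) -> face location or None   (hypothetical callable)
    blur_background(frame, face) -> frame         (hypothetical callable)
    detect_every is an assumed interval, not a value from the benchmark.
    """
    face = None
    for index, frame in enumerate(frames):
        if index % detect_every == 0:
            found = detect_face(frame)
            if found is not None:
                face = found  # keep the last known location if detection fails
        # Without any known face location, pass the frame through unchanged.
        yield blur_background(frame, face) if face is not None else frame
```

Reusing the last known face location between detections keeps the expensive detector off the per-frame path while the blur still runs on every frame.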
Private call scenario
In the private call scenario, the test runs a 1-to-1 call at a resolution of 1280 × 720 at 30 FPS. The workload measures the frame rate of the video call.
Playback private CPU = M_1
Where:
- M_1 = dbg_pcm10_chat_play_private_average_frame_rate_cpu
Playback private OCL = M_2
Where:
- M_2 = dbg_pcm10_chat_play_private_average_frame_rate_ocl
Encode private OCL = M_3 / M_4
Where:
- M_3 = dbg_pcm10_chat_play_private_average_frame_rate_ocl
- M_4 = dbg_pcm10_chat_encode_private_elapsed_ocl
Face detect private CPU = 1000 / M_5
Where:
- M_5 = dbg_pcm10_chat_encode_private_facedetect_average_time_per_frame_cpu
Face detect private OCL = 1000 / M_6
Where:
- M_6 = dbg_pcm10_chat_encode_private_facedetect_average_time_per_frame_ocl
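The face detect results convert an average per-frame time into a rate. A minimal sketch of that conversion follows, assuming the metric is reported in milliseconds, which the factor of 1000 and the FPS unit suggest but the text does not state outright. The function name is hypothetical.

```python
def face_detect_fps(avg_ms_per_frame):
    """Convert an average per-frame detection time (assumed ms) to frames per second."""
    if avg_ms_per_frame <= 0:
        raise ValueError("average time per frame must be positive")
    return 1000.0 / avg_ms_per_frame
```

For example, an average detection time of 20 ms per frame gives `face_detect_fps(20.0) == 50.0`.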
Group call scenario
In the group call scenario, the call has four participants and the outgoing video is encoded at a resolution of 1920 × 1080 at 30 FPS. The workload measures the frame rate of the video call.
Playback group CPU = geomean(M_7, M_8, M_9)
Where:
- M_7 = dbg_pcm10_chat_play_private_average_frame_rate_cpu_p1
- M_8 = dbg_pcm10_chat_play_private_average_frame_rate_cpu_p2
- M_9 = dbg_pcm10_chat_play_private_average_frame_rate_cpu_p3
Playback group OCL = geomean(M_10, M_11, M_12)
Where:
- M_10 = dbg_pcm10_chat_play_group_average_frame_rate_cpu_p1
- M_11 = dbg_pcm10_chat_play_group_average_frame_rate_cpu_p2
- M_12 = dbg_pcm10_chat_play_group_average_frame_rate_cpu_p3
Encode group OCL = M_13 / M_14
Where:
- M_13 = dbg_pcm10_chat_encode_group_sink_frames_ocl
- M_14 = dbg_pcm10_chat_encode_group_elapsed_ocl
Face detect group CPU = 1000 / M_15
Where:
- M_15 = dbg_pcm10_chat_encode_group_facedetect_average_time_per_frame_cpu
Face detect group OCL = 1000 / M_16
Where:
- M_16 = dbg_pcm10_chat_encode_group_facedetect_average_time_per_frame_ocl
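The group playback results combine the three remote streams with a geometric mean, and the encode result divides a frame count by an elapsed time. A rough sketch of both calculations follows; the function names are hypothetical, and the elapsed time is assumed to be in seconds so that the quotient is in FPS.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def encode_rate(sink_frames, elapsed_seconds):
    """Encoded frames divided by elapsed time (assumed seconds) gives FPS."""
    return sink_frames / elapsed_seconds
```

Unlike an arithmetic mean, the geometric mean is pulled down noticeably by a single slow stream, so one stuttering participant lowers the group playback result more than it would lower a plain average.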
Video Conferencing score
We use a weighted geometric mean to calculate the Video Conferencing score from the workload scores.
Video Conferencing score = K * geomean(R_1, R_2*, R_3)
* The geometric mean weight of R_2 is 2.
Where:
- K = scoring coefficient = 275
- R_1 = Overall playback rate = geomean(A_1, A_2, A_3, A_4)
- R_2 = Overall encode rate = geomean(A_5, A_6)
- R_3 = Overall face detect rate = geomean(A_7, A_8, A_9, A_10)
Where:
Result | Definition | Unit | Typical range |
---|---|---|---|
A_1 | Playback private CPU | FPS | 30 |
A_2 | Playback private OCL | FPS | 30 |
A_3 | Playback group CPU | FPS | 30 |
A_4 | Playback group OCL | FPS | 30 |
A_5 | Encode private OCL | FPS | 22-30 |
A_6 | Encode group OCL | FPS | 15-23 |
A_7 | Face detect private CPU | FPS | 30-71 |
A_8 | Face detect private OCL | FPS | 51-97 |
A_9 | Face detect group CPU | FPS | 5.6-13 |
A_10 | Face detect group OCL | FPS | 11-31 |
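The score formula can be sketched in Python as follows. This is an illustration under stated assumptions, not the benchmark's implementation: it assumes R_1 and R_3 each carry weight 1 and that the weights are normalized, so the weighted geometric mean is (R_1 * R_2^2 * R_3)^(1/4). The function names are hypothetical.

```python
import math

K = 275  # scoring coefficient from the formula above


def weighted_geomean(values, weights):
    """Weighted geometric mean with weights normalized by their sum (assumed)."""
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)


def geomean(values):
    """Unweighted geometric mean."""
    return weighted_geomean(values, [1] * len(values))


def video_conferencing_score(a):
    """Compute the score from the results A_1..A_10, given in table order."""
    if len(a) != 10:
        raise ValueError("expected ten results, A_1 through A_10")
    r1 = geomean(a[0:4])   # overall playback rate
    r2 = geomean(a[4:6])   # overall encode rate
    r3 = geomean(a[6:10])  # overall face detect rate
    # R_2 has weight 2; weights of 1 for R_1 and R_3 are an assumption.
    return K * weighted_geomean([r1, r2, r3], [1, 2, 1])
```

Under these assumptions, a system that scored exactly 30 FPS on every result would receive a score of K * 30, and a change in the encode results moves the score twice as strongly (in log terms) as the same relative change in playback or face detection.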