This test models use cases of video conferencing applications. The test uses two scenarios: a private call and a group call.

Implementation

The Video Conferencing test uses Windows Media Foundation for video playback and encoding. Face detection is implemented using the OpenCV library (http://opencv.org).

The Video Conferencing test supports OpenCL. The benchmark application selects a preferred OpenCL device to use.

Face detection uses the cascade classifier haarcascade_frontalface_alt.xml.

Parameters for one-to-one video conferencing: scale factor 1.1, min neighbors 10, min size 110x110 and max size 300x300.

Parameters for group video conferencing: scale factor 1.05, min neighbors 5, min size 110x110 and max size 300x300.
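The two parameter sets above map onto the arguments of OpenCV's `CascadeClassifier.detectMultiScale`. The Python sketch below assumes the `opencv-python` package, which ships `haarcascade_frontalface_alt.xml` under `cv2.data.haarcascades`. The names `CascadeParams`, `PRIVATE_CALL_PARAMS`, `GROUP_CALL_PARAMS` and `detect_faces` are hypothetical and do not come from the benchmark. Passing a `cv2.UMat` lets OpenCV's transparent API run the detector through OpenCL when a device is available.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Tuple

@dataclass(frozen=True)
class CascadeParams:
    """Hypothetical container for the cascade classifier settings listed above."""
    scale_factor: float
    min_neighbors: int
    min_size: Tuple[int, int]  # (width, height) in pixels
    max_size: Tuple[int, int]

# Values from the one-to-one and group parameter lists.
PRIVATE_CALL_PARAMS = CascadeParams(1.1, 10, (110, 110), (300, 300))
GROUP_CALL_PARAMS = CascadeParams(1.05, 5, (110, 110), (300, 300))

@lru_cache(maxsize=1)
def _load_classifier():
    import cv2  # imported lazily so the parameter sets work without OpenCV installed
    return cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_alt.xml")

def detect_faces(gray_frame, params: CascadeParams, use_opencl: bool = False):
    """Return (x, y, w, h) face rectangles found in an 8-bit grayscale frame."""
    import cv2
    image = gray_frame
    if use_opencl and cv2.ocl.haveOpenCL():
        cv2.ocl.setUseOpenCL(True)
        image = cv2.UMat(gray_frame)  # lets OpenCV's transparent API use OpenCL
    return _load_classifier().detectMultiScale(
        image,
        scaleFactor=params.scale_factor,
        minNeighbors=params.min_neighbors,
        minSize=params.min_size,
        maxSize=params.max_size,
    )
```

A smaller scale factor searches more image scales, and a lower minimum-neighbors value accepts detections with less support, so the group settings are more sensitive and more expensive per frame than the one-to-one settings.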

Part 1: one-to-one video conferencing with basic-quality video

  • Encode: 720p, 30 FPS, H.264 video, bitrate 14380 kb/s 
  • Playback: 720p, 30 FPS, H.264 video, bitrate 11773 kb/s
  • Two video streams (a local and a remote one)
  • Both streams are displayed on screen, downscaled to fit a fixed-resolution window.
  • Face detection performed on the local stream
  • Stage 1 - CPU:
    • Code path: x86/x64
    • Runtime: 10s
  • Stage 1 - OpenCL:
    • Condition to run: a suitable OpenCL device must be available
    • Code path: OpenCL
    • Runtime: 10s

Part 2: group video conferencing with high-quality outgoing video

  • Encode: 1080p, 30 FPS, H.264 video, bitrate 12731 kb/s
  • Playbacks: 720p, 30 FPS, H.264 video, bitrate 10152 - 12251 kb/s
  • Four streams (a local and three remote ones)
  • All streams are displayed on screen, downscaled to fit a fixed-resolution window.
  • Face detection performed on the local stream
  • Stage 2 - CPU:
    • Code path: x86/x64
    • Runtime: 10s
  • Stage 2 - OpenCL:
    • Condition to run: a suitable OpenCL device must be available
    • Code path: OpenCL
    • Runtime: 10s
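Each part runs a CPU stage and, only when a suitable OpenCL device is present, an OpenCL stage of the same length. A minimal Python sketch of that run plan, with the hypothetical names `Stage` and `build_run_plan` and the device check reduced to a boolean, might look like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Stage:
    part: int        # 1 = one-to-one call, 2 = group call
    code_path: str   # "x86/x64" or "OpenCL"
    runtime_s: int   # stage runtime in seconds

def build_run_plan(opencl_available: bool) -> List[Stage]:
    """List stages in the order above; OpenCL stages are skipped without a suitable device."""
    plan = []
    for part in (1, 2):
        plan.append(Stage(part, "x86/x64", 10))
        if opencl_available:  # the "Condition to run" for the OpenCL stages
            plan.append(Stage(part, "OpenCL", 10))
    return plan
```

With a suitable device the plan contains four stages and 40 s of stage runtime; without one, it contains only the two CPU stages and 20 s.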

Workloads

In both the private and group call scenarios, the sent video stream is processed in the following manner:

  • The caller's face location is detected at periodic intervals.
  • The perceived quality of each frame is improved by blurring the background, using the detected face location.
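The two processing steps can be sketched in plain Python on a grayscale frame stored as a list of rows. The detection interval of 10 frames, the 3×3 box blur, and the names `blur_background` and `process_stream` are assumptions for illustration; the benchmark's actual blur filter and detection interval are not specified here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[List[int]]           # grayscale frame as rows of 0-255 pixel values
Rect = Tuple[int, int, int, int]  # face rectangle (x, y, width, height)

def blur_background(frame: Frame, face: Optional[Rect], radius: int = 1) -> Frame:
    """Return a copy of frame with a box blur applied outside the face rectangle."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(h):
        for x in range(w):
            if face is not None:
                fx, fy, fw, fh = face
                if fx <= x < fx + fw and fy <= y < fy + fh:
                    continue  # keep the face region sharp
            # Average the (2*radius+1)^2 neighborhood, clipped at the frame edges.
            total = count = 0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += frame[ny][nx]
                    count += 1
            out[y][x] = total // count
    return out

def process_stream(frames: Sequence[Frame],
                   detect: Callable[[Frame], Optional[Rect]],
                   interval: int = 10) -> List[Frame]:
    """Detect the face every `interval` frames and blur the background of every frame."""
    face: Optional[Rect] = None
    processed = []
    for i, frame in enumerate(frames):
        if i % interval == 0:
            face = detect(frame)  # reuse the last location between detections
        processed.append(blur_background(frame, face))
    return processed
```

Running detection only periodically keeps its cost off most frames, while the blur, which is cheap by comparison, is applied to every frame using the most recent face location.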

Private call scenario

In the private call scenario, the test runs a one-to-one call at a resolution of 1280 × 720 at 30 FPS. The workload measures the playback frame rate, the encoding rate, and the face detection rate.

Playback private CPU    =    M_1

Where:
M_1    =    dbg_pcm10_chat_play_private_average_frame_rate_cpu

Playback private OCL    =    M_2

Where:
M_2    =    dbg_pcm10_chat_play_private_average_frame_rate_ocl

Encode private OCL    =    M_3 / M_4

Where:
M_3    =    dbg_pcm10_chat_play_private_average_frame_rate_ocl
M_4    =    dbg_pcm10_chat_encode_private_elapsed_ocl

Face detect private CPU    =    1000 / M_5

Where:
M_5    =    dbg_pcm10_chat_encode_private_facedetect_average_time_per_frame_cpu

Face detect private OCL    =    1000 / M_6

Where:
M_6    =    dbg_pcm10_chat_encode_private_facedetect_average_time_per_frame_ocl
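The conversions above can be sketched in Python. The `1000 / M` form implies that the face detection metrics are average times per frame in milliseconds, so the result is in frames per second. For the encode rate, the sketch assumes the numerator is a count of encoded frames and the elapsed time is in seconds, which matches the FPS unit used for the results. The function names are hypothetical.

```python
def fps_from_ms_per_frame(avg_ms_per_frame: float) -> float:
    """Face detect rate: 1000 / average time per frame (ms) gives frames per second."""
    if avg_ms_per_frame <= 0:
        raise ValueError("average time per frame must be positive")
    return 1000.0 / avg_ms_per_frame

def encode_fps(frames: float, elapsed_s: float) -> float:
    """Encode rate: frames divided by elapsed time in seconds (assumed unit)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return frames / elapsed_s

# Illustrative inputs only, not measured results.
print(fps_from_ms_per_frame(20.0))  # 50.0
print(encode_fps(270, 10.0))        # 27.0
```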

Group call scenario

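In the group call scenario, the call has four participants. The outgoing video is encoded at 1920 × 1080 at 30 FPS, and the three incoming streams are played back at 1280 × 720 at 30 FPS. The workload measures the playback frame rate, the encoding rate, and the face detection rate.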

Playback group CPU    =    geomean(M_7, M_8, M_9)

Where:
M_7    =    dbg_pcm10_chat_play_private_average_frame_rate_cpu_p1
M_8    =    dbg_pcm10_chat_play_private_average_frame_rate_cpu_p2
M_9    =    dbg_pcm10_chat_play_private_average_frame_rate_cpu_p3

Playback group OCL    =    geomean(M_10, M_11, M_12)

Where:
M_10    =    dbg_pcm10_chat_play_group_average_frame_rate_cpu_p1
M_11    =    dbg_pcm10_chat_play_group_average_frame_rate_cpu_p2
M_12    =    dbg_pcm10_chat_play_group_average_frame_rate_cpu_p3

Encode group OCL    =    M_13 / M_14

Where:
M_13    =    dbg_pcm10_chat_encode_group_sink_frames_ocl
M_14    =    dbg_pcm10_chat_encode_group_elapsed_ocl

Face detect group CPU    =    1000 / M_15

Where:
M_15    =    dbg_pcm10_chat_encode_group_facedetect_average_time_per_frame_cpu

Face detect group OCL    =    1000 / M_16

Where:
M_16    =    dbg_pcm10_chat_encode_group_facedetect_average_time_per_frame_ocl
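Playback group CPU and Playback group OCL combine the frame rates of the three remote streams with an unweighted geometric mean. A short Python sketch, with a hypothetical function name:

```python
import math
from typing import Sequence

def geomean(values: Sequence[float]) -> float:
    """Unweighted geometric mean of positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Three remote streams at illustrative rates (FPS).
print(round(geomean([10.0, 20.0, 40.0]), 6))  # 20.0
```

With one stream at 10 FPS and two at 30 FPS, the geometric mean is about 20.8 FPS, compared with an arithmetic mean of about 23.3 FPS, so a single lagging stream weighs more heavily on the result.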

Video Conferencing score

We use a weighted geometric mean to calculate the Video Conferencing score from the workload scores.

Video Conferencing score    =    K * geomean(R_1, R_2*, R_3)

* R_2 has a weight of 2 in the geometric mean; R_1 and R_3 have a weight of 1.

Where: 
K      =    scoring coefficient        =    275
R_1    =    Overall playback rate      =    geomean(A_1,A_2,A_3,A_4)
R_2    =    Overall encode rate        =    geomean(A_5,A_6)
R_3    =    Overall face detect rate   =    geomean(A_7,A_8,A_9,A_10)

Where:

Result   Definition                 Unit   Typical range
A_1      Playback private CPU       FPS    30
A_2      Playback private OCL       FPS    30
A_3      Playback group CPU         FPS    30
A_4      Playback group OCL         FPS    30
A_5      Encode private OCL         FPS    22-30
A_6      Encode group OCL           FPS    15-23
A_7      Face detect private CPU    FPS    30-71
A_8      Face detect private OCL    FPS    51-97
A_9      Face detect group CPU      FPS    5.6-13
A_10     Face detect group OCL      FPS    11-31
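The full scoring chain can be sketched in Python as below. The weight of 2 on R_2 is applied as a standard weighted geometric mean, exp(sum(w * ln v) / sum(w)), which gives K * (R_1 * R_2^2 * R_3)^(1/4); this reading of the weighting and the function names are assumptions.

```python
import math
from typing import Sequence

K = 275  # scoring coefficient

def weighted_geomean(values: Sequence[float], weights: Sequence[float]) -> float:
    """exp(sum(w * ln v) / sum(w)) for positive values and weights."""
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)

def video_conferencing_score(a: Sequence[float]) -> float:
    """a holds A_1..A_10 in order (a[0] is A_1)."""
    if len(a) != 10:
        raise ValueError("expected ten results, A_1 to A_10")
    r1 = weighted_geomean(a[0:4], [1] * 4)   # overall playback rate: A_1..A_4
    r2 = weighted_geomean(a[4:6], [1] * 2)   # overall encode rate: A_5, A_6
    r3 = weighted_geomean(a[6:10], [1] * 4)  # overall face detect rate: A_7..A_10
    return K * weighted_geomean([r1, r2, r3], [1, 2, 1])  # R_2 has weight 2

# Illustrative: every result at 30 FPS gives K * 30.
print(round(video_conferencing_score([30.0] * 10), 6))  # 8250.0
```

Because R_2 counts twice, a 10% rise in both encode results raises the score by roughly 4.9%, while a 10% rise in all four playback results raises it by roughly 2.4%.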