Subject Screening Overview
Subject screening algorithms detect and discard subjects whose ratings appear to be invalid. MATLAB® code implementing several algorithms is available below.
To Screen Subjects...
A subject's scores can be erroneous for a multitude of reasons, including failures on the part of the subject, the experimenter, the rating method, the video playback system, or the rating recording system. Many experimenters err on the side of discarding subjects, preferring to lose a valid subject rather than retain an invalid one. The goal is to eliminate all subjects whose data might be invalid. The motivation is to retain only subjects who are able to detect just noticeable differences and can rate video sequences consistently.
Currently, all automated subject screening algorithms apply this philosophy. They rely upon thresholds, which are essentially
educated guesses. These thresholds may need to be adjusted for new
types of experiments.
...Or Not To Screen Subjects
Discarding subjects can make it appear that you are doctoring your data to fit your hypothesis. Many psychologists believe it is inappropriate to discard any subject, unless the subject clearly misunderstood the rating task or found the task too difficult. Examples of subjects who misunderstood the task are someone who rated whether or not they liked the video content when asked to rate the video quality, or someone who applied the scale in reverse (e.g., marked "excellent" when meaning "bad" and vice versa). An example of a subject who found the task too difficult is someone who rated all sequences identically.
The philosophy is that if we cannot explain why a subject scored differently, then we must assume that the
differences are genuine and need to be included in the data
analysis. To apply this philosophy, either keep all subjects
or examine the data manually for obvious problems. Each
discarded subject must be justified in the experiment report.
And a Compromise
Alternatively, analyze the data twice: once with subject screening and once without. The experiment report may emphasize the analysis performed on the screened data, while also mentioning any opposing conclusions reached by the analysis of the non-screened data. This compromise approach is advisable when subject screening eliminates a large fraction of subjects, because a large fraction of discarded subjects will lead some researchers to doubt the validity of an experiment. The experiment report should mention exactly why each subject was discarded.
MATLAB Code for Popular Subject Screening Algorithms
Following is a list of popular subject
screening techniques. MATLAB code for the automated techniques
is available here. This code may be used for
any purpose, commercial or non-commercial. Please contact me if you find
any bugs or errors in this code.
ITU-R Rec. BT.500 Annex 2 Clause 2.3.1
ITU-R Rec. BT.500 Annex 2 Clause 2.3.1 recommends a technique for screening Double Stimulus Impairment Scale (DSIS) and Double Stimulus Continuous Quality Scale (DSCQS) data; however, the technique has also been applied to tests conducted with other methods. It discards subjects whose ratings frequently disagree with those of other subjects. The technique is intended for scores that are approximately normally distributed; the clause includes a kurtosis check and applies wider rejection bounds when a presentation's scores are not normal. Note that because the BT.500 technique analyzes agreement, it might discard a subject who scores consistently higher or lower than other subjects, despite agreeing on the ranking of sequences by quality.
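The following is a minimal MATLAB sketch of the Clause 2.3.1 procedure, written from the description in the Recommendation rather than copied from the downloadable code. It assumes the raw scores are stored with one row per subject and one column per rated presentation; the function name and data layout are illustrative, and the 5% and 0.3 decision constants should be verified against BT.500 before use.

    function rejected = bt500_clause231_screening(scores)
    % Sketch of ITU-R BT.500 Annex 2 Clause 2.3.1 subject screening.
    % scores: (subjects x presentations) matrix of raw ratings (assumed layout).
    % rejected: indices of subjects that fail the screening test.
    [num_subj, num_pres] = size(scores);
    P = zeros(num_subj, 1);   % count of scores above the upper bound
    Q = zeros(num_subj, 1);   % count of scores below the lower bound
    for t = 1:num_pres
        col = scores(:, t);
        m = mean(col);
        s = std(col);
        if s == 0
            continue;         % all subjects agree exactly; nothing to count
        end
        % Kurtosis coefficient beta2 = m4 / m2^2; values between 2 and 4
        % are treated as a normal distribution.
        m2 = mean((col - m).^2);
        m4 = mean((col - m).^4);
        beta2 = m4 / m2^2;
        if beta2 >= 2 && beta2 <= 4
            k = 2;            % normal distribution: 2 standard deviations
        else
            k = sqrt(20);     % non-normal: sqrt(20) standard deviations
        end
        P = P + (col >= m + k * s);
        Q = Q + (col <= m - k * s);
    end
    % Reject a subject whose scores are outliers on more than 5% of the
    % presentations and are not strongly one-sided.
    ratio1 = (P + Q) / num_pres;
    ratio2 = abs(P - Q) ./ max(P + Q, 1);
    rejected = find(ratio1 > 0.05 & ratio2 < 0.3);
    end

For example, rejected = bt500_clause231_screening(scores) returns the indices of subjects to remove before recomputing mean opinion scores.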
ITU-R Rec. BT.500 Annex 2 Clause 2.3.2 recommends a technique for Single Stimulus Continuous Quality Evaluation (SSCQE); however, that algorithm is not implemented in the downloadable MATLAB code.
ITU-R BT.1788 (SAMVIQ)
Annex 2 Clauses 3.2, 3.3 and 3.4 of ITU-R BT.1788, also known as SAMVIQ, demand that subjects vote on quality degradations in a stable and coherent manner. The rejection criteria use both Pearson correlation and Spearman rank correlation. This technique rejects subjects whose scores do not correlate with those of other subjects (i.e., who rank impairments differently).
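A minimal MATLAB sketch of this style of check is given below; it is not the downloadable implementation. Each subject is compared against the mean scores of the remaining subjects using both Pearson and Spearman correlation. The scores layout, the 0.85 thresholds, and the midrank helper are illustrative assumptions, not values taken from BT.1788.

    function rejected = samviq_style_screening(scores)
    % Sketch of a SAMVIQ-style coherence check using Pearson and Spearman
    % correlation. scores: (subjects x clips) matrix (assumed layout).
    pearson_thresh  = 0.85;   % illustrative threshold, not from BT.1788
    spearman_thresh = 0.85;   % illustrative threshold, not from BT.1788
    num_subj = size(scores, 1);
    rejected = [];
    for i = 1:num_subj
        others = mean(scores([1:i-1, i+1:num_subj], :), 1);  % mean of everyone else
        % Pearson correlation between this subject and the others' mean scores.
        c = corrcoef(scores(i, :), others);
        pearson_r = c(1, 2);
        % Spearman rank correlation: Pearson correlation of the midranks.
        c = corrcoef(midrank(scores(i, :)), midrank(others));
        spearman_r = c(1, 2);
        if pearson_r < pearson_thresh || spearman_r < spearman_thresh
            rejected(end+1) = i; %#ok<AGROW>
        end
    end
    end

    function r = midrank(x)
    % Rank the values of x, assigning tied values the mean of their ranks.
    [~, order] = sort(x);
    r = zeros(size(x));
    r(order) = 1:numel(x);
    [~, ~, grp] = unique(x);
    for g = 1:max(grp)
        idx = (grp == g);
        r(idx) = mean(r(idx));
    end
    end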
VQEG HDTV Test Plan, Annex I
The Video Quality Experts Group (VQEG) HDTV Phase I Test Plan includes a method for screening subjects in Annex I (page 37). The rejection criterion tests the consistency of scores using Pearson correlation on a per-clip basis. This technique rejects subjects whose scores do not correlate with those of other subjects (e.g., who rank impairments differently). The thresholds were chosen to be appropriate for Absolute Category Rating (ACR) tests. Note that if the threshold were adjusted to be very low (e.g., 0.30), then the VQEG HDTV test plan rejection criterion would probably only eliminate subjects who did not understand the rating task.
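A minimal MATLAB sketch of a per-clip consistency check of this kind is shown below. Each subject's clip scores are correlated with the mean opinion scores over all subjects; the data layout and the default 0.75 threshold are illustrative assumptions rather than the exact values printed in Annex I.

    function rejected = vqeg_hdtv_style_screening(scores, threshold)
    % Sketch of a per-clip Pearson-correlation consistency check.
    % scores: (subjects x clips) matrix (assumed layout).
    if nargin < 2
        threshold = 0.75;     % illustrative threshold; see Annex I for the real value
    end
    num_subj = size(scores, 1);
    mos = mean(scores, 1);    % mean opinion score for each clip
    r = zeros(num_subj, 1);
    for i = 1:num_subj
        c = corrcoef(scores(i, :), mos);
        r(i) = c(1, 2);       % per-clip Pearson correlation for subject i
    end
    rejected = find(r < threshold);
    end

Lowering the threshold (e.g., toward 0.30, as noted above) turns this into a loose check that flags only subjects who plainly did not understand the rating task.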
VQEG MM Test Plan, Annex VI
The Video Quality Experts Group (VQEG) Multimedia Phase I Test Plan includes a method for screening subjects in Annex VI (page 57). The rejection criteria test the consistency of scores using Pearson correlation both on a per-clip basis and on scores averaged across all scenes associated with one impairment (i.e., per Hypothetical Reference Circuit, or HRC). This technique rejects subjects whose scores do not correlate with those of other subjects (e.g., who rank impairments differently). The thresholds were chosen to be appropriate for ACR tests.
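A minimal MATLAB sketch combining the per-clip test from the previous section with the per-HRC averaging is given below. The HRC labels, the thresholds, and the rule used to combine the two correlations are illustrative assumptions; Annex VI specifies the exact values and combination rule.

    function rejected = vqeg_mm_style_screening(scores, hrc)
    % Sketch of a screening check that combines per-clip and per-HRC
    % Pearson correlations. scores: (subjects x clips); hrc: (1 x clips)
    % vector assigning each clip to an HRC (assumed layout).
    clip_thresh = 0.75;   % illustrative per-clip threshold
    hrc_thresh  = 0.80;   % illustrative per-HRC threshold
    num_subj = size(scores, 1);
    hrc_ids = unique(hrc);
    % Average each subject's scores over all scenes that share an HRC.
    hrc_scores = zeros(num_subj, numel(hrc_ids));
    for h = 1:numel(hrc_ids)
        hrc_scores(:, h) = mean(scores(:, hrc == hrc_ids(h)), 2);
    end
    clip_mos = mean(scores, 1);      % per-clip mean opinion scores
    hrc_mos  = mean(hrc_scores, 1);  % per-HRC mean opinion scores
    rejected = [];
    for i = 1:num_subj
        c1 = corrcoef(scores(i, :), clip_mos);
        c2 = corrcoef(hrc_scores(i, :), hrc_mos);
        % Flag the subject when either correlation falls below its threshold;
        % the actual combination rule is defined in Annex VI.
        if c1(1, 2) < clip_thresh || c2(1, 2) < hrc_thresh
            rejected(end+1) = i; %#ok<AGROW>
        end
    end
    end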