Institute for Telecommunication Sciences
the research laboratory of the National Telecommunications and Information Administration

Audio Demos for Frame Duration Study

Demo 1:  In Support of Figure 1.

Figure 1: Speech quality as a function of frame duration for the oracle magnitude recovery (OMR) case. Noise types are coffee shop (blue), saw (red), and white (gold), each at 0 dB SNR. Dashed and solid lines show lower- and higher-stationarity speech (ΨL and ΨH), respectively. Nominal POLQA MOS-LQO scale: 1 means “Bad”, 5 means “Excellent”.

Example Audio Files

  1 ms 4 ms 10 ms 20 ms 40 ms 100 ms 400 ms
ΨL, Coffee Shop Noise low_coffee_1ms.wav  low_coffee_4ms.wav low_coffee_10ms.wav low_coffee_20ms.wav low_coffee_40ms.wav low_coffee_100ms.wav low_coffee_400ms.wav
ΨL, Saw Noise low_saw_1ms.wav low_saw_4ms.wav low_saw_10ms.wav low_saw_20ms.wav low_saw_40ms.wav low_saw_100ms.wav low_saw_400ms.wav
ΨL, White Noise low_white_1ms.wav low_white_4ms.wav low_white_10ms.wav low_white_20ms.wav low_white_40ms.wav low_white_100ms.wav low_white_400ms.wav
ΨH, Coffee Shop Noise high_coffee_1ms.wav high_coffee_4ms.wav high_coffee_10ms.wav high_coffee_20ms.wav high_coffee_40ms.wav high_coffee_100ms.wav high_coffee_400ms.wav
ΨH, Saw Noise high_saw_1ms.wav high_saw_4ms.wav high_saw_10ms.wav high_saw_20ms.wav high_saw_40ms.wav high_saw_100ms.wav high_saw_400ms.wav
ΨH, White Noise high_white_1ms.wav high_white_4ms.wav high_white_10ms.wav high_white_20ms.wav high_white_40ms.wav high_white_100ms.wav high_white_400ms.wav

Original speech files are low_original.wav and high_original.wav.

All 44 of these .wav files are available in FrameDurationDemo1.zip (51 MB).
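The count of 44 follows from the table above: 2 stationarity classes × 3 noise types × 7 frame durations, plus the 2 original speech files. The sketch below rebuilds that list of names from the pattern visible in the table. The function name `demo1_filenames` is hypothetical, and the sketch assumes the archive uses exactly the file names shown on this page; the internal layout of the zip is not stated here.

```python
from itertools import product

# Values taken from the Demo 1 table on this page.
STATIONARITY = ("low", "high")               # ΨL and ΨH speech
NOISE_TYPES = ("coffee", "saw", "white")
FRAME_DURATIONS_MS = (1, 4, 10, 20, 40, 100, 400)


def demo1_filenames():
    """Return the Demo 1 .wav names implied by the table (hypothetical helper)."""
    processed = [
        f"{stat}_{noise}_{dur}ms.wav"
        for stat, noise, dur in product(STATIONARITY, NOISE_TYPES, FRAME_DURATIONS_MS)
    ]
    originals = [f"{stat}_original.wav" for stat in STATIONARITY]
    return processed + originals
```

Comparing this list against the contents of the downloaded archive is one quick way to check that nothing is missing.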


Demo 2:  In Support of Figure 2.

Figure 2: Speech quality as a function of frame duration for coffee shop noise at 0 dB SNR. Oracle binary mask (OBM), oracle magnitude recovery (OMR), and convolutional noise model (CNM) results are shown in blue, red, and black, respectively. Dashed and solid lines show lower- and higher-stationarity speech (ΨL and ΨH), respectively. Nominal POLQA MOS-LQO scale: 1 means “Bad”, 5 means “Excellent”. The simple CNM captures the vast majority of the OBM and OMR quality effects.

Example Audio Files

  1 ms 4 ms 10 ms 20 ms 40 ms 100 ms 400 ms
ΨL, Convolutional Noise Model eqn. (9) low_CNM_1ms.wav low_CNM_4ms.wav low_CNM_10ms.wav low_CNM_20ms.wav low_CNM_40ms.wav low_CNM_100ms.wav low_CNM_400ms.wav
ΨL, OBM low_OBM_1ms.wav low_OBM_4ms.wav low_OBM_10ms.wav low_OBM_20ms.wav low_OBM_40ms.wav low_OBM_100ms.wav low_OBM_400ms.wav
ΨL, OMR low_OMR_1ms.wav low_OMR_4ms.wav low_OMR_10ms.wav low_OMR_20ms.wav low_OMR_40ms.wav low_OMR_100ms.wav low_OMR_400ms.wav
ΨH, Convolutional Noise Model eqn. (9) high_CNM_1ms.wav high_CNM_4ms.wav high_CNM_10ms.wav high_CNM_20ms.wav high_CNM_40ms.wav high_CNM_100ms.wav high_CNM_400ms.wav
ΨH, OBM high_OBM_1ms.wav high_OBM_4ms.wav high_OBM_10ms.wav high_OBM_20ms.wav high_OBM_40ms.wav high_OBM_100ms.wav high_OBM_400ms.wav
ΨH, OMR high_OMR_1ms.wav high_OMR_4ms.wav high_OMR_10ms.wav high_OMR_20ms.wav high_OMR_40ms.wav high_OMR_100ms.wav high_OMR_400ms.wav

Original speech files are low_original.wav and high_original.wav.

All 44 of these .wav files are available in FrameDurationDemo2.zip (48 MB).


Demo 3:  Greater Stationarity Reduces Perceptibility of Temporal Blurring

   Original Temporal Blurring (via CNM), 500 ms frame
Less Stationary Piano Excerpt (Ψ=27 ms) pianoLS_org.wav pianoLS_cnm.wav
More Stationary Piano Excerpt (Ψ=230 ms) pianoMS_org.wav pianoMS_cnm.wav

 


Demo 4:  Similarity of Artifacts

At shorter frame durations, the artifacts caused by random frame phase (RFP) sound similar to the artifacts of the convolutional noise model (CNM), oracle binary mask (OBM), and oracle magnitude recovery (OMR). Random frame phase simply means multiplying all samples of each time-domain frame by either +1 or -1, chosen at random.

  1 ms 4 ms 10 ms
Random Frame Phase low_RFP_1ms.wav low_RFP_4ms.wav low_RFP_10ms.wav
Convolutional Noise Model eqn. (9) low_CNM_1ms.wav low_CNM_4ms.wav low_CNM_10ms.wav
OBM low_OBM_1ms.wav low_OBM_4ms.wav low_OBM_10ms.wav
OMR low_OMR_1ms.wav low_OMR_4ms.wav low_OMR_10ms.wav
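The random frame phase operation described above is simple enough to sketch in a few lines. The following is a minimal illustration, not the code used to produce these demo files. It assumes non-overlapping rectangular frames with no windowing or overlap-add, and a +1/-1 choice that is equally likely and independent per frame; none of these framing details are stated on this page. The function name `random_frame_phase` and its parameters are hypothetical.

```python
import random


def random_frame_phase(samples, sample_rate_hz, frame_ms, seed=None):
    """Multiply all samples of each frame by +1 or -1, chosen at random.

    Assumptions (not from the source): frames are contiguous and
    non-overlapping, and the last frame may be shorter than the rest.
    """
    rng = random.Random(seed)
    # Frame length in samples, at least one sample per frame.
    frame_len = max(1, round(sample_rate_hz * frame_ms / 1000))
    out = []
    for start in range(0, len(samples), frame_len):
        sign = rng.choice((1, -1))  # one random sign per frame
        out.extend(sign * s for s in samples[start:start + frame_len])
    return out
```

For example, `random_frame_phase(x, sample_rate_hz=8000, frame_ms=1)` flips the polarity of randomly selected 8-sample frames of `x`. Sample magnitudes are unchanged; only the sign pattern across frames is randomized, which introduces discontinuities at frame boundaries wherever the sign changes.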

 


Demo 5:  Some Musical Examples

Four musical excerpts (2 sec each) processed with convolutional noise model (CNM), oracle binary mask (OBM), and oracle magnitude recovery (OMR) for the case of coffee shop noise at 0 dB SNR.

  1 ms 4 ms 10 ms 40 ms 100 ms 400 ms
Convolutional Noise Model eqn. (9) CNM_1ms.wav CNM_4ms.wav CNM_10ms.wav CNM_40ms.wav CNM_100ms.wav CNM_400ms.wav
OBM OBM_1ms.wav OBM_4ms.wav OBM_10ms.wav OBM_40ms.wav OBM_100ms.wav OBM_400ms.wav
OMR OMR_1ms.wav OMR_4ms.wav OMR_10ms.wav OMR_40ms.wav OMR_100ms.wav OMR_400ms.wav

Original music file is original.wav.

All 19 of these .wav files are available in FrameDurationDemo5.zip (12 MB).