Audio Quality Research

Outputs

Digital compression and transmission of speech and audio signals are two of the enabling technologies that have contributed to the current explosion of telecommunications and broadcasting offerings. Examples of these offerings include cellular telephones, personal communications systems (PCS), satellite telephony, digital audio broadcasting, voice messaging systems, voice over Internet protocol (VoIP) services, MiniDisc equipment, Moving Picture Experts Group (MPEG)-1 Layer 3 (MP3) music files, and MPEG Advanced Audio Coding (AAC) systems. Digital compression allows these systems to deliver high-quality speech using bit rates between 4 and 64 kbit/s. Audio signals, including music and entertainment soundtracks, are typically delivered at rates between 16 and 256 kbit/s per channel. Compressed speech and audio signals can be transmitted as data packets, thus sharing channel capacity with other data streams. Some compression and transmission systems employ multiple streams of packets to increase robustness to transmission failures, or to allow for higher quality audio when and where bandwidth permits.

These digital compression and transmission techniques and the associated economic trade-offs have created important new speech and audio quality issues. Equipment manufacturers, service providers, and users all seek equipment and services that maximize delivered audio quality under applicable transmission channel constraints. However, because of the increasingly complex, time-varying interactions among audio signal content, source coding, channel coding, and channel conditions, audio quality is becoming increasingly difficult to define and to measure. The ITS Audio Quality Research Program develops and verifies tools that assist with audio quality assessment and optimization.

The most fundamental and correct measures of audio quality come from subjective listening experiments. The Audio Quality Research Program uses these experiments to provide answers to pressing questions in the field. In order to generate the most useful and reliable results, this work is done in accordance with applicable recommendations from the International Telecommunication Union (ITU). In FY99, program staff designed, conducted, and analyzed an experiment that investigated the links between variable transmission delays and speech quality. Variable transmission delays can occur in packetized transmission systems, and are presently considered a major factor in the design of VoIP systems. The Figure shows how packet delay variation can lead to a unique sort of speech distortion. Results from the experiment will provide equipment designers with new guidance for their efforts to mitigate the effects of delay variation.
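
As a rough illustration of how packet delay variation can turn into gaps in received speech, the following Python sketch simulates a receiver with a fixed playout buffer. The 20 ms packet interval, the delay values, the 60 ms playout delay, and the function names are all assumptions made for this example; none of them are taken from the FY99 experiment.

```python
FRAME_MS = 20  # assumed packet interval; hypothetical value for illustration


def playout_schedule(network_delays_ms, playout_delay_ms, frame_ms=FRAME_MS):
    """Return (index, arrival_ms, deadline_ms, on_time) for each packet.

    Packet i is sent at i * frame_ms, arrives after its own network delay,
    and must be played at send time + playout_delay_ms.
    """
    schedule = []
    for i, delay in enumerate(network_delays_ms):
        send_time = i * frame_ms
        arrival = send_time + delay
        deadline = send_time + playout_delay_ms
        schedule.append((i, arrival, deadline, arrival <= deadline))
    return schedule


def late_packet_indices(schedule):
    """Indices of packets that miss their playout deadline."""
    return [i for i, _, _, on_time in schedule if not on_time]


def gap_ms(schedule, frame_ms=FRAME_MS):
    """Total duration of speech lost to late packets."""
    return frame_ms * len(late_packet_indices(schedule))


# Hypothetical one-way network delays, in milliseconds, for seven packets.
delays = [40, 42, 95, 41, 60, 120, 43]
schedule = playout_schedule(delays, playout_delay_ms=60)
late = late_packet_indices(schedule)
lost = gap_ms(schedule)
```

In this sketch a larger playout delay would absorb more of the variation at the cost of added end-to-end delay, which is one form of the trade-off that designers face when mitigating delay variation.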

Carefully conducted subjective listening experiments tend to be fairly complex and time consuming, and the associated costs make them inappropriate for some applications. The Audio Quality Research Program continues to be involved in developing and evaluating practical alternatives to subjective listening experiments. The program has recently made a significant contribution in this area: the measuring normalizing block (MNB) algorithms for estimating the perceived quality of 4 kHz bandwidth speech. The MNB algorithms work because they model both human hearing and human judgement. A simple hearing model is followed by a more sophisticated judgement model. The judgement model involves measuring and normalizing spectral deviations at multiple time and frequency scales. To most realistically emulate listeners' patterns of adaptation and reaction to spectral deviations, the measuring normalizing blocks are combined so that analysis proceeds from larger scales to smaller scales. When speech quality estimates from the MNB algorithms are compared with the results of subjective listening experiments, a good degree of correlation is found. Improvement over earlier algorithms is particularly evident for highly compressed speech transmission systems, and those suffering from bit errors and frame erasures. Thus these algorithms furnish industry, Government, and other users with valuable tools that provide rapid and reliable quality feedback in an increased number of application areas. The MNB algorithms are specified in both American National Standards Institute (ANSI) Telecommunications Standard T1.518-1998 (see Publications Cited) and ITU-T Recommendation P.861, Appendix II, 1998.
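
The idea of measuring and then normalizing spectral deviations, working from larger scales to smaller scales, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the standardized MNB algorithms: the published algorithms define their own hearing model, block arrangement, and mapping to a quality estimate, none of which are reproduced here. The function names, the two-level band split, and the toy data are assumptions made for this example.

```python
def measure_and_normalize(ref, test, band):
    """Measure the mean deviation (test - ref) over one frequency band and
    all frames, then remove that deviation from the test signal in the band.

    ref, test: lists of frames; each frame is a list of log-spectral values.
    band: (lo, hi) bin index range, hi exclusive.
    """
    lo, hi = band
    count = len(ref) * (hi - lo)
    deviation = sum(
        t[k] - r[k] for r, t in zip(ref, test) for k in range(lo, hi)
    ) / count
    normalized = [
        frame[:lo] + [v - deviation for v in frame[lo:hi]] + frame[hi:]
        for frame in test
    ]
    return deviation, normalized


def coarse_to_fine(ref, test, n_bins):
    """Apply the block at the full-band scale first, then at half-band scale.

    Deviations already explained at the larger scale are removed before the
    smaller scale is examined.
    """
    half = n_bins // 2
    scales = [[(0, n_bins)], [(0, half), (half, n_bins)]]
    measurements = []
    for bands in scales:
        for band in bands:
            deviation, test = measure_and_normalize(ref, test, band)
            measurements.append(deviation)
    return measurements


# Toy data: two frames, four frequency bins (hypothetical values).
reference = [[0, 0, 0, 0], [0, 0, 0, 0]]
degraded = [[1, 1, 3, 3], [1, 1, 3, 3]]
measurements = coarse_to_fine(reference, degraded, n_bins=4)
```

For the toy data the full-band measurement captures the overall level shift, and the half-band measurements capture only the spectral tilt that remains after that shift is normalized away.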

During FY99, program staff continued to apply the MNB algorithms to a variety of speech transmission systems, with an emphasis on emerging VoIP systems. These studies provided valuable information for the future refinement and extension of the MNB algorithms. One of these extensions is frame-energy plane partitioning, which allows separate estimators to be applied to different parts of speech signals. As an example, pauses in speech may be analyzed for background noise, while active speech is analyzed for distortion. This approach offers the potential of more reliable quality estimates in the presence of background noise. These improved estimates will be important for PCS, cellular, and satellite telephones operated in vehicles or other noisy environments.
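
The general idea of applying separate estimators to pauses and to active speech can be sketched as follows. This Python example is an assumption-laden illustration, not the program's frame-energy plane partitioning method: it splits frames with a single energy threshold on the reference signal, and the threshold, the two estimators, and all names are hypothetical.

```python
def frame_energy(frame):
    """Mean squared sample value of one frame."""
    return sum(x * x for x in frame) / len(frame)


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def partition_frames(ref_frames, threshold):
    """Split frame indices into active speech and pauses by reference energy."""
    active = [i for i, f in enumerate(ref_frames) if frame_energy(f) >= threshold]
    pause = [i for i, f in enumerate(ref_frames) if frame_energy(f) < threshold]
    return active, pause


def partitioned_estimates(ref_frames, test_frames, threshold):
    """Return (background_noise, distortion).

    Background noise: mean energy of the test signal during reference pauses.
    Distortion: mean energy of (test - ref) during active speech.
    """
    active, pause = partition_frames(ref_frames, threshold)
    noise = mean([frame_energy(test_frames[i]) for i in pause])
    distortion = mean([
        frame_energy([t - r for r, t in zip(ref_frames[i], test_frames[i])])
        for i in active
    ])
    return noise, distortion


# Hypothetical two-sample frames: frames 0 and 2 are pauses in the reference.
ref_frames = [[0.0, 0.0], [1.0, -1.0], [0.0, 0.0], [2.0, 2.0]]
test_frames = [[0.5, -0.5], [1.0, -1.0], [0.0, 0.0], [2.0, 1.0]]
noise, distortion = partitioned_estimates(ref_frames, test_frames, threshold=0.5)
```

Keeping the two estimates separate means that noise heard during pauses does not have to be averaged together with distortion of the speech itself, which is the motivation given above for partitioning.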

In FY99, program staff also made significant efforts to track and analyze speech transmission delay in systems where that delay has significant variation. This work builds on the subjective listening experiment described above and is particularly applicable to VoIP systems. It is expected that this work will lead to tools that provide more effective quality feedback for VoIP systems. Other staff efforts included the application of quality assessment results in the coding and transmission areas, with a potential for improved audio quality and more robust transmission.
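
One simple way to follow a delay that changes over time is to estimate it separately for successive segments of the signal. The Python sketch below does this with a plain cross-correlation search. It is offered only as an illustration under stated assumptions: the source does not describe the technique that program staff used, and the window length, search range, function names, and toy signals here are hypothetical.

```python
def estimate_delay(ref, received, max_lag):
    """Return the lag (in samples) in 0..max_lag that maximizes the
    cross-correlation between ref and received."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            r * received[i + lag]
            for i, r in enumerate(ref)
            if i + lag < len(received)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def track_delay(ref, received, window, max_lag):
    """Estimate the delay of each successive window of the reference signal.

    Each estimate is relative to the window's own position, so a constant
    delay yields the same value for every window.
    """
    delays = []
    for start in range(0, len(ref) - window + 1, window):
        segment = ref[start:start + window]
        delays.append(estimate_delay(segment, received[start:], max_lag))
    return delays


# Toy signals: the first burst arrives 2 samples late, the second 5 samples late.
burst_a = [0, 1, 0, -1, 0, 2, 0, 0]
burst_b = [0, 3, 0, 0, -2, 0, 1, 0]
sent = burst_a + burst_b
received = [0, 0] + burst_a + [0, 0, 0] + burst_b
delays = track_delay(sent, received, window=8, max_lag=6)
```

A practical tool would need to cope with coding distortion, lost packets, and silent segments, all of which a raw waveform correlation like this one handles poorly.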

Throughout FY99, program staff disseminated program results to industry, Government, and academia through numerous technical publications; participation in workshops, conferences, and symposia; submitted and invited talks and lectures; and laboratory demonstrations. Laboratory demonstrations often focused on subjective testing capabilities and issues, and on prototype audio quality test instruments that implement MNB algorithms in near real time.

Example of a speech waveform impaired by delay variation in a packet network.

Recent Publications

S. Voran, "Advances in objective estimation of perceived speech quality," in Proc. 1999 IEEE Workshop on Speech Coding, Porvoo, Finland, Jun. 1999.

S. Voran, "Objective estimation of perceived speech quality, Part I: Development of the measuring normalizing block technique; Part II: Evaluation of the measuring normalizing block technique," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 4, pp. 371-382 and pp. 383-390, Jul. 1999.

For more information, contact:
Stephen D. Voran
303-497-3839
e-mail: sv@its.bldrdoc.gov