Institute for Telecommunication Sciences
the research laboratory of the National Telecommunications and Information Administration

Publications & Talks

Audio Quality Research Program publications and talks are listed here in reverse chronological order. Abstracts and slideshows are available in HTML format and documents are available in the Adobe Acrobat portable document format.

Stephen D. Voran; Andrew A. Catellier, "Intelligibility Robustness of Five Speech Codec Modes in Frame-Erasure and Background-Noise Environments," NTIA Technical Report TR-18-529, December 2017

Frame erasures and background noise are two factors that can interact with speech coding to reduce speech intelligibility and thus impair public safety mission-critical voice communications. We conducted two tests of intelligibility in the face of th...

Stephen D. Voran, "The Selection of Spectral Magnitude Exponents for Separating Two Sources Is Dominated by Phase Distribution Not Magnitude Distribution," Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 15-18, 2017

Separating an acoustic signal into desired and undesired components is an important and well-established problem. It is commonly addressed by decomposing spectral magnitudes after exponentiation and the choice of exponent has been studied from numero...

Stephen D. Voran, "A Multiple Bandwidth Objective Speech Intelligibility Estimator Based on Articulation Index Band Correlations and Attention," Proceedings of the 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP2017), New Orleans, LA, March 5-9, 2017

We present ABC-MRT16—a new algorithm for objective estimation of speech intelligibility following the Modified Rhyme Test (MRT) paradigm. ABC-MRT16 is simple, effective and robust. When compared to subjective MRT data from 367 diverse conditions that...

Stephen D. Voran; Andrew A. Catellier, "A Crowdsourced Speech Intelligibility Test that Agrees with, Has Higher Repeatability than, Lab Tests," NTIA Technical Memo TM-17-523, February 2017

Crowdsourcing of subjective speech, audio, and video quality of experience (QoE) tests has received much interest and study, but crowdsourcing of speech intelligibility testing has not. We hypothesize that speech intelligibility tests offer a unique ...

Andrew A. Catellier; Stephen D. Voran, "Intelligibility of Selected Speech Codecs in Frame-Erasure Conditions," NTIA Technical Report TR-17-522, November 2016

We describe the design, implementation, and analysis of a speech intelligibility test. The test included five codec modes, four frame-erasure rates, and two background noise environments, for a total of 40 conditions. The test protocol required twent...

Stephen D. Voran, "Exploration of the Additivity Approximation for Spectral Magnitudes," 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 19, 2015

The separation of acoustic signals is often accomplished through subtractive decompositions of frequency-domain representations. This is typically enabled by the zero phase approximation or the uncorrelated signals approximation but both of these are...

Stephen D. Voran; Andrew A. Catellier, "Speech Codec Intelligibility Testing in Support of Mission-Critical Voice Applications for LTE," NTIA Technical Report TR-15-520, September 2015

We describe a major effort to quantify the speech intelligibility associated with a range of narrowband, wideband, and fullband digital audio coding algorithms in various acoustic noise environments. The work emphasizes the relationship between these...

Stephen D. Voran, "Using articulation index band correlations to objectively estimate speech intelligibility consistent with the modified rhyme test," 2013 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics, October 20-23, 2013

We present an objective estimator of speech intelligibility that follows the paradigm of the Modified Rhyme Test (MRT). For each input, the estimator uses temporal correlations within articulation index bands to select one of six possible words from ...

Stephen D. Voran, "Lossless Compression of G.711 Speech Using Only Look-Up Tables," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2013, pp. 8179–8183

The lossless compression algorithm specified in ITU-T Recommendation G.711.0 provides bit-exact G.711 speech coding at reduced bit-rates. We introduce two Look-Up Coders (LUCs) that also offer bit-exact G.711 speech coding at reduced rates but the LU...

Stephen D. Voran; Andrew A. Catellier, "When Should a Speech Coding Quality Increase be Allowed Within a Talk-Spurt?," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2013, pp. 8149–8153

The value or harm associated with an increase in speech coding quality depends on the type of the increase as well as the temporal location of the increase in an utterance. For example, some increases in speech coding bandwidth can be perceived as im...

David J. Atkinson; Andrew A. Catellier, "Intelligibility of Analog FM and Updated P25 Radio Systems in the Presence of Fireground Noise: Test Plan and Results," NTIA Technical Report TR-13-495, May 2013

This report describes a modified rhyme test (MRT) conducted to characterize the behavior of digital and analog communication in the presence of background noise and moderate RF channel degradation. This is done through the use of reference systems to...

David J. Atkinson; Stephen D. Voran; Andrew A. Catellier, "Intelligibility of the Adaptive Multi-Rate Speech Coder in Emergency-Response Environments," NTIA Technical Report TR-13-493, December 2012

This report describes speech intelligibility testing conducted on the Adaptive Multi-Rate (AMR) speech coder in several different environments simulating emergency response conditions and especially fireground conditions. The intelligibility testing ...

Stephen D. Voran; Andrew A. Catellier, "Gradient Ascent Subjective Multimedia Quality Testing," EURASIP Journal on Image and Video Processing, vol. 2011, Article ID 472185, 14 pages, March 15, 2011. doi: 10.1155/2011/472185

Subjective testing is the most direct means of assessing multimedia quality as experienced by users. When multiple dimensions must be evaluated, these tests can become slow and costly. We present gradient ascent subjective testing (GAST) as an efficie...

Stephen D. Voran; Andrew A. Catellier, "Multiple Description Speech Coding Using Speech Polarity Decomposition," Proceedings of the IEEE Global Communications Conference (GLOBECOM 2010), pp.1-6, Miami, December 6-10, 2010. doi: 10.1109/GLOCOM.2010.5683769

We present and evaluate a new multiple-description coding extension to the international standard for pulse code modulation speech coding (ITU-T Rec. G.711). This extension is inserted between the G.711 encoder and decoder. It uses speech-polarity de...

S. Voran, "Subjective Ratings of Instantaneous and Gradual Transitions from Narrowband to Wideband Active Speech," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4674-4677, Dallas, March 15-19, 2010. doi: 10.1109/ICASSP.2010.5495187

In advanced heterogeneous telecommunication networks, network resources can dynamically dictate the type of speech coding that is used. An increase in resources allows for lower coding distortion or it might also be used to provide wideband speech in...

Andrew A. Catellier; Stephen D. Voran, "Low Rate Speech Coding and Random Bit Errors: A Subjective Speech Quality Matching Experiment," NTIA Technical Report TR-10-462, October 2009

When bit errors are introduced between a speech encoder and a speech decoder, the quality of the received speech is reduced. The specific relationship between speech quality and bit error rate (BER) can be different for each speech coding and channel...

Stephen D. Voran; Andrew A. Catellier, "Gradient Ascent Paired-Comparison Subjective Quality Testing," Proceedings of the IEEE First International Workshop on Quality of Multimedia Experience (QoMEX 2009), pp. 133-138, San Diego, July 29-31, 2009. doi: 10.1109/QOMEX.2009.5246964

(This paper won the QoMEX 2009 Best Paper Award.) Subjective testing is the most direct means of assessing audio, video, and multimedia quality as experienced by users and maximizing the information gathered while minimizing the number of trials ...

Andrew A. Catellier; Stephen D. Voran, "Relationships Between Intelligibility, Speaker Identification, and the Detection of Dramatized Urgency," NTIA Technical Report TR-09-459, November 2008

The systems used for public safety speech communications must be intelligible. It is also desirable that they transmit secondary information, such as the attributes of a speaker's voice. This secondary information can allow a user to identify the spe...

David J. Atkinson; Andrew A. Catellier, "Intelligibility of Selected Radio Systems in the Presence of Fireground Noise: Test Plan and Results," NTIA Technical Report TR-08-453, June 2008

This report describes an experiment conducted to measure the intelligibility of selected radio communication systems when those systems are employed in high-background-noise environments experienced by firefighters. The test plan for a Modified Rhyme...

Andrew A. Catellier; Stephen D. Voran, "Speaker Identification in Low-Rate Coded Speech," Proceedings of the 7th International MESAQIN (Measurement of Audio and Video Quality in Networks) Conference, Prague, Czech Republic, May 2008.

While useful speech communication systems must be intelligible, most systems aim to transmit secondary information, such as attributes of a speaker's voice, as well. This secondary information can allow a listener to identify the speaker and his emot...

S. Voran, "Listener Detection of Talker Stress in Low-rate Coded Speech," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), pp. 4813-4816, Las Vegas, March 31-April 4, 2008. doi: 10.1109/ICASSP.2008.4518734

We describe an experiment where listeners were asked to detect two specific forms of stress in talkers' recorded voices heard via six different simulated communication systems. Both task-induced stress and dramatized urgency were used. Communication ...

S. Voran, "Lossless Audio Coding with Bandwidth Extension Layers," Proceedings of the 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 239-242, New Paltz, NY, October 21-24, 2007. doi: 10.1109/ASPAA.2007.4393036

Layered audio coding typically offers reduced distortion as bit rate is increased, but that distortion is spread across the entire band until the lossless coding bit rate is reached and distortion is eliminated. We propose a layered audio coding para...

S. Voran, "Reducing Quantization Error by Matching Pseudoerror Statistics," Proceedings of the 12th IEEE Digital Signal Processing Workshop, pp. 187-192, Grand Teton National Park, Wyoming, September 24-27, 2006. doi: 10.1109/DSPWS.2006.265440

We investigate the use of an adaptive processor (a quantizer pseudoinverse) and the statistics of the associated pseudoerror signal to reduce quantization error in scalar quantizers when a small amount of prior knowledge about the signal x is availab...

S. Voran, "Listening-Time Relationships in a Subjective Speech Quality Experiment," Proceedings of the 5th International MESAQIN (Measurement of Speech and Audio Quality in Networks) Conference, Prague, Czech Republic, June 2006.

We have designed, conducted, and analyzed a subjective speech quality experiment with unrestricted timing where subjects can vote whenever their opinions are fully formed, rather than at fixed time intervals. Analysis of the resulting listening times...

S. Voran, "A Basic Experiment on Time-Varying Speech Quality," Proceedings of the 4th International MESAQIN (Measurement of Speech and Audio Quality in Networks) Conference, Prague, Czech Republic, June 2005.

We present a general formulation of a basic open question regarding the perception of time-varying speech quality. We then describe the design, implementation, conduct, and analysis of a practical experiment that addresses a small but fundamental par...

S. Voran, "Multiple-Description PCM Speech Coding by Complementary Asymmetric Vector Quantizers," Proceedings of the 2005 IEEE Region 5 and IEEE Denver Section Technical, Professional and Student Development Workshop, pp. 59-65, Boulder, CO, April 7-8, 2005. doi: 10.1109/TPSD.2005.1614348

We describe new 2-channel multiple-description speech coders based on the ITU-T Recommendation G.711 PCM speech coder. The new coders operate in the PCM code domain in order to exploit the companding gain of PCM. They apply pairs of complementary asy...

S. Voran, "A Multiple-Description PCM Speech Coder using Structured Dual Vector Quantizers," Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '05), pp. 129-132, Philadelphia, March 2005. doi: 10.1109/ICASSP.2005.1415067

We describe a 2-channel multiple-description speech coder based on the ITU-T Recommendation G.711 PCM speech coder. The new coder operates in the PCM code domain in order to exploit the companding gain of PCM. It applies a pair of 2-dimensional struc...

S. Voran, "Compensating for Gain in Objective Quality Estimation Algorithms," Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), vol.3, pp. 1068-71, Montreal, May 17-21, 2004. doi: 10.1109/ICASSP.2004.1326733

When objectively estimating speech, audio, or video quality, it is often necessary to compensate for a system gain or to "gain match" two or more signals. One can take three views of a system, leading to three different definitions of gain, and three...

S. Voran, "A Bottom-Up Algorithm for Estimating Time-Varying Delays in Coded Speech," Proceedings of the 3rd International Conference on Measurement of Speech and Audio Quality in Networks, Prague, Czech Republic, May 2004.

In packetized speech transmission, end-to-end delay can vary, even over short timescales. Estimating the resulting speech delay histories is critical to diagnostic and quality estimation efforts. We present a new bottom-up algorithm for estimating ti...

S. Voran, "Perception of Temporal Discontinuity Impairments in Coded Speech - A Proposal for Objective Estimators and Some Subjective Test Results," Proceedings of the 2nd International Conference on Measurement of Speech and Audio Quality in Networks, Prague, Czech Republic, May 2003.

Temporal discontinuities in received speech are a reality of Internet Telephony or Voice over Internet Protocol (VoIP) systems. These relatively new impairments pose unique challenges to objective estimators of perceived speech quality. We suggest th...

S. Voran, "The Channel-Optimized Multiple-Description Scalar Quantizer," Proceedings of the 2002 IEEE 10th Digital Signal Processing Workshop, pp. 400-405, Callaway Gardens, Pine Mountain, Georgia, October 13-16, 2002. doi: 10.1109/DSPWS.2002.1231141

Multiple-description coding is one way to gain robustness against lossy channels. We extend the multiple-description scalar quantizer (MDSQ) to a channel-optimized MDSQ (COMDSQ) that minimizes mean-squared error for a given channel environment. We di...

Stephen D. Voran, "An iterated nested least-squares algorithm for fitting multiple data sets," NTIA Technical Memo TM-03-397, October 2002

A multiple data set fitting problem often arises in conjunction with the development of objective estimators of perceived audio or video quality. In such development work, we often seek the best linear relationship between a set of objective audio or...

Stephen D. Voran, "Compensating for system gain: Motivations, derivations, and relations for three common solutions," NTIA Technical Memo TM-03-398, October 2002

It is often desirable to compensate for system gain, especially before objectively estimating perceived audio or video quality from system inputs and outputs. A common approach is to scale the system output to compensate for system gain. One can take...

Stephen D. Voran, "Estimation of system gain and bias using noisy observations with known noise power ratio," NTIA Technical Report TR-02-395, September 2002

The identification of linear systems from input and output observations is an important and well-studied topic. When both the input and output observations are noisy, the resulting problem is sometimes called the "errors in variables" problem. Existi...

S. Voran, "Results on Reverse Water-Filling, SNR, and Log-Spectral Error in Codebook-Based Coding," Proceedings of the 2000 IEEE Workshop on Speech Coding, Delavan, Wisconsin, September 17-20, 2000. doi: 10.1109/SCFT.2000.878387

This paper identifies optimum levels of reverse water-filling for codebook-based coding of noise and speech signals. We find that there is little to be gained from optimizing an effective rate parameter. We identify trade-offs between SNR and log-spe...

Stephen D. Voran; Stephen Wolf, "Objective Estimation of Video and Speech Quality to Support Network QoS Efforts," 2nd Department of Energy/Internet2 Quality of Service Workshop, pp. 38-40, Houston, TX, February 2000.

One of the questions that ongoing QoS efforts seek to answer is: "Given fixed network resources, how does one provide the highest possible quality of service to the maximal number of users in a fair way, even when those users are generating competing...

S. Voran, "Objective Estimation of Perceived Speech Quality, Part 1: Development of the Measuring Normalizing Block Technique," IEEE Transactions on Speech and Audio Processing, vol.7, no.4, pp.371-382, July 1999. doi: 10.1109/89.771259

Perceived speech quality is most directly measured by subjective listening tests. These tests are often slow and expensive, and numerous attempts have been made to supplement them with objective estimators of perceived speech quality. These attempts ...

S. Voran, "Objective Estimation of Perceived Speech Quality, Part 2: Evaluation of the Measuring Normalizing Block Technique," IEEE Transactions on Speech and Audio Processing, vol.7, no.4, pp.383-390, July 1999. doi: 10.1109/89.771260

Part 1 of this paper describes a new approach to the objective estimation of perceived speech quality. This new approach uses a simple but effective perceptual transformation and a distance measure that consists of a hierarchy of measuring normalizin...

S. Voran, "Advances in Objective Estimation of Perceived Speech Quality," Proceedings of the 1999 IEEE Workshop on Speech Coding, pp. 138-140, Porvoo, Finland, June 1999. doi: 10.1109/SCFT.1999.781510

We present two techniques that can be used to enhance objective estimators of perceived speech quality. Frame normalization and frame-energy plane partitioning are described and applied to a log-spectral-error-based estimator. The resulting estimator...

S. Voran, "Observations on Frequency-Domain Companding for Audio Coding," Proceedings of the Eighth IEEE Digital Signal Processing Workshop, Bryce Canyon National Park, Utah, August 1998.

Frequency-domain companding can be used in conjunction with audio coders that produce white coding noise. In [1-2] it is demonstrated empirically that this technique colors white coding noise so that it is better masked by audio signals, resulting in...

S. Voran, "A Simplified Version of the ITU Algorithm for Objective Measurement of Speech Codec Quality," Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol.1, pp. 537-54, Seattle, May 1998. doi: 10.1109/ICASSP.1998.674486

ITU-T Recommendation P.861 describes an objective speech quality assessment algorithm for speech codecs. This algorithm transforms codec input and output speech signals into a perceptual domain, compares them, and generates a noise disturbance value,...

Stephen D. Voran, "Objective Estimation of Perceived Speech Quality Using Measuring Normalizing Blocks," NTIA Technical Report TR-98-347, April 1998

Perceived speech quality is most directly measured by subjective listening tests. These tests are often slow and expensive, and numerous attempts have been made to supplement them with objective estimators of perceived speech quality. These attempts ...

S. Voran, "Perception-Based Bit-Allocation Algorithms for Audio Coding," Proceedings of the 1997 IEEE AASP Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 19-22, 1997. doi: 10.1109/ASPAA.1997.625586

We describe six algorithms for bit allocation in audio coding. Each algorithm stems from the minimization of a different perceptually-motivated objective function. Three of these objective functions are extensions of existing ones, and three are new....

David J. Atkinson; Stephen D. Voran, "Summary of Objective Audio Quality Measure Performance Data Presented to T1A1," ANSI T1A1 Contribution T1A1.7/97-042, October 22, 1997.

This contribution aggregates the available performance data on the MNB and P.861 objective speech quality measures. Specifically, results presented in contributions T1A1.7/97-032 and T1A1.7/97-034 are examined. Based on examination of the aggregated ...

S. Voran, "Estimation of Perceived Speech Quality using Measuring Normalizing Blocks," Proceedings of the 1997 IEEE Workshop on Speech Coding for Telecommunications, pp. 83-84, Pocono Manor, PA, September 7-10, 1997. doi: 10.1109/SCFT.1997.623907

We describe a new approach to the estimation of perceived speech quality. The approach uses a simple, but effective, perceptual transformation to emulate hearing and a hierarchy of Measuring Normalizing Blocks (MNB's) to emulate auditory judgment. Th...

S. Voran, "Listener Ratings of Speech Passbands," Proceedings of the 1997 IEEE Workshop on Speech Coding for Telecommunications, pp. 81-82, Pocono Manor, PA, September 7-10, 1997. doi: 10.1109/SCFT.1997.623906

We describe a listening experiment that measures the perceived speech quality of 19 speech passbands using 8 talkers and 28 listeners. Results are referenced to the traditional wide-band and narrow-band telephony passbands. Our findings may help thos...

S. Voran, "An Algorithm for Estimating the Delay of Telephony Speech," Contribution to ITU-T SG-12 Experts Group on Speech Quality, Document Number SQ 75.96, September 1996.

This contribution is provided for informational purposes. It contains a description of an algorithm that has successfully been used to estimate the delay of telephony band speech. The algorithm features a coarse stage that uses speech envelopes and a ...

S. Voran, "Observations on Auditory Excitation and Masking Patterns," Proceedings of the 1995 Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 206-209, New Paltz, NY, October 15-18, 1995. doi: 10.1109/ASPAA.1995.482992

Excitation patterns and masking patterns are used extensively in perceptual audio coders and quality assessment algorithms. Numerous algorithms for calculating these patterns have been proposed. This paper provides comparisons among the patterns gene...

S. Voran; C. Sholl, "Perception-based Objective Estimators of Speech Quality," Proceedings of the 1995 IEEE Workshop on Speech Coding for Telecommunications, pp. 13-14, Annapolis, MD, September 20-22, 1995. doi: 10.1109/SCFT.1995.658103

Four proposed perception-based techniques for objectively estimating speech quality and three traditional estimators are applied to coded speech samples. Agreement between objective estimates and corresponding subjective test scores is reported. Seve...

S. Voran, "Techniques for Comparing Objective and Subjective Speech Quality Tests," Proceedings of the Speech Quality Assessment Workshop at Ruhr-Universität Bochum, Germany, pp. 59-64, November 1994.

Objective (or instrumental) tests of speech quality have been proposed as ways to reduce the need for expensive and time-consuming subjective (or auditory) tests. Both types of tests attempt to quantify the range of opinions that listeners express in...

Stephen Voran; Stephen Wolf, "Proposed Framework for Subjective Audiovisual Testing," ANSI T1A1 Contribution T1A1.5/93-151, November 8, 1993.

Working Group T1A1.5 is supporting ITU-T Study Group 12 in developing subjective audiovisual testing methods under Question 22/12 which addresses audiovisual quality in multimedia services. A previous contribution from Bellcore, T1A1.5/93-104, descri...

S. Voran, "Observations on the T-Reference Condition for Speech Coder Evaluation," Contribution to CCITT SG XII Experts Group on Speech Quality, Document Number SQ 13.92, February 1992.

In a Study Group XII Contribution dated September 1991, John Rosenberger and Bill Cotton of Bellcore introduced an algorithm for generating temporally correlated distortion on 8 kHz sampled speech data. This distortion is parameterized by a single in...