Institute for Telecommunication Sciences
the research laboratory of the National Telecommunications and Information Administration

ITS Audio Quality Research Program

The ITS Audio Quality Research Program addresses selected open questions in digital speech and audio enhancement, compression, transmission, and quality assessment. Our contributions are most easily appreciated via our Publications & Talks page.

The Latest: We are developing no-reference (NR) waveform-based convolutional neural network (CNN) architectures that can accurately estimate speech quality or speech intelligibility. Our latest work produced Wideband Audio Waveform Evaluation Networks, or WAWEnets. The paper was presented at ICASSP 2020, and software is available here.

WAWEnets achieve very high correlation with established full-reference (FR) quality and intelligibility estimators (PESQ, POLQA, PEMO, STOI). Network training is made possible by using FR estimates as training targets. Each target value is associated with a three-second segment of speech, and we are now using over 300,000 such segments (250 hours of speech). Our earlier work using narrowband speech produced Narrowband Audio Waveform Evaluation Networks, or NAWEnets, described here.
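As a rough illustration of the segment bookkeeping described above, the following sketch cuts a waveform into three-second segments and confirms that 300,000 such segments amount to 250 hours. The 16 kHz sample rate, the non-overlapping segmentation, and the function names are assumptions made for illustration; they are not taken from the WAWEnet software.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed wideband sample rate in Hz (not stated in the text)
SEGMENT_SECONDS = 3    # segment duration stated in the text


def split_into_segments(waveform, sample_rate=SAMPLE_RATE, seconds=SEGMENT_SECONDS):
    """Cut a 1-D waveform into non-overlapping fixed-length segments.

    Trailing samples that do not fill a whole segment are dropped.
    Non-overlapping segmentation is an assumption of this sketch.
    """
    seg_len = sample_rate * seconds
    n_segments = len(waveform) // seg_len
    return waveform[: n_segments * seg_len].reshape(n_segments, seg_len)


def total_hours(n_segments, seconds=SEGMENT_SECONDS):
    """Total duration, in hours, of n_segments segments."""
    return n_segments * seconds / 3600


# Ten seconds of (silent) audio yields three complete three-second segments.
segments = split_into_segments(np.zeros(10 * SAMPLE_RATE))
print(segments.shape)

# 300,000 segments x 3 s = 900,000 s = 250 hours.
print(total_hours(300_000))
```

In a training setup of the kind described, each row of `segments` would be paired with one FR estimate (for example a PESQ or STOI value) that serves as the training target.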

Recently: Other work addresses optimal frame durations for separation of audio signals in the context of oracle binary masking and oracle magnitude restoration. We can now offer a set of Audio Demos for Frame Duration Study that supports a draft paper.

In the draft paper we demonstrate that the optimal processing frame duration in oracle binary masking and oracle magnitude restoration is determined by joint minimization of two antagonistic artifacts: temporal blurring (which increases with frame duration) and log-spectral-error change per unit time (which decreases with frame duration). These effects are related to the stationarity of the signals, but saying that “stationarity determines optimal frame duration” falls far short of describing the true nature and complexity of the interaction.
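To make the role of frame duration concrete, here is a minimal sketch of oracle binary masking in which the frame duration is an explicit parameter. It is not the method of the draft paper: the Hann window, the 50% frame overlap, the "target magnitude exceeds interferer magnitude" criterion, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def stft(x, frame_len):
    """Short-time Fourier transform with a Hann window and 50% overlap (assumed)."""
    hop = frame_len // 2
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)


def oracle_binary_mask(target, interferer, sample_rate, frame_ms):
    """Compute an oracle binary mask and apply it to the mixture spectrogram.

    "Oracle" means the mask is built from the separately known target and
    interferer. A time-frequency cell is kept when the target magnitude
    exceeds the interferer magnitude (an assumed criterion).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000))
    T = stft(target, frame_len)
    I = stft(interferer, frame_len)
    mask = np.abs(T) > np.abs(I)
    masked_mixture = mask * (T + I)  # the STFT is linear, so STFT(mixture) = T + I
    return mask, masked_mixture


# Toy signals: a 440 Hz target and a 3000 Hz interferer, one second each.
fs = 16_000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
interferer = np.sin(2 * np.pi * 3000 * t)

# The same signals analyzed with two different frame durations.
for frame_ms in (8, 32):
    mask, _ = oracle_binary_mask(target, interferer, fs, frame_ms)
    print(frame_ms, mask.shape)
```

Shorter frames give more frames with fewer frequency bins, and longer frames give the reverse; this is the time-frequency resolution trade-off that underlies the choice of frame duration.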

The Bigger Picture: The quality of speech sent over a telecommunication system depends on a variety of factors, such as the background noise in the environment, the algorithms used to digitally enhance and code the speech signal, the bandwidth used in transmitting the speech signal, and others. The ITS Audio Quality Research Program supports community-wide efforts towards robust and adaptable telecommunication speech services and equipment with high quality and intelligibility.

In the program we identify and address open issues in these areas, and we develop and characterize algorithm innovations as well. In particular, we seek to improve tools and techniques for quantitatively characterizing the user experience of speech quality and speech intelligibility, both through subjective testing and by means of signal processing algorithms.