ITS Audio Quality Research Program
The ITS Audio Quality Research Program addresses selected open questions in digital speech and audio enhancement, compression, transmission, and quality assessment. Our contributions are most easily appreciated via our Publications & Talks page.
The Latest: We are developing no-reference (NR) waveform-based convolutional neural network (CNN) architectures that can accurately estimate speech quality or speech intelligibility.
These Audio Waveform Evaluation Networks achieve very high correlation with established full-reference (FR) quality and intelligibility estimators (PESQ, POLQA, PEMO, STOI). Network training is made possible by using FR estimates as training targets. Each target value is associated with a three-second segment of speech, and over 300,000 such segments (250 hours of speech) are now in use. Our initial efforts produced the Narrowband Audio Waveform Evaluation Networks, or NAWEnets, described here. Our follow-on work led to Wideband Audio Waveform Evaluation Networks, or WAWEnets, which will be presented in early May at ICASSP 2020.
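The pairing of speech segments with FR training targets can be sketched as follows. This is a minimal illustration, not the ITS training pipeline: the names `make_training_pairs` and `total_hours`, the non-overlapping segmentation, and the `fr_estimator` callable (a stand-in for an estimator such as PESQ or STOI) are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple

SEGMENT_SECONDS = 3.0  # segment duration stated in the text


def make_training_pairs(
    reference: Sequence[float],
    degraded: Sequence[float],
    sample_rate: int,
    fr_estimator: Callable[[Sequence[float], Sequence[float]], float],
) -> List[Tuple[Sequence[float], float]]:
    """Cut time-aligned signals into three-second segments and label each one.

    The FR estimator sees both the reference and the degraded segment.
    The returned pair holds only the degraded segment and the target,
    which is all a no-reference network would be given.
    Assumption: segments do not overlap and a trailing partial segment is dropped.
    """
    if len(reference) != len(degraded):
        raise ValueError("reference and degraded signals must be the same length")
    seg_len = int(round(SEGMENT_SECONDS * sample_rate))
    pairs = []
    for start in range(0, len(degraded) - seg_len + 1, seg_len):
        ref_seg = reference[start:start + seg_len]
        deg_seg = degraded[start:start + seg_len]
        pairs.append((deg_seg, fr_estimator(ref_seg, deg_seg)))
    return pairs


def total_hours(num_segments: int, seconds_per_segment: float = SEGMENT_SECONDS) -> float:
    """Total duration, in hours, of a set of equal-length segments."""
    return num_segments * seconds_per_segment / 3600.0
```

With the figures quoted above, `total_hours(300_000)` evaluates to 250.0, which matches the stated 250 hours of speech.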
Recently: Other work addresses optimal frame durations for separating audio signals in the context of oracle binary masking and oracle magnitude restoration. We can now offer a set of Audio Demos for Frame Duration Study that supports a draft paper.
In the draft paper we demonstrate that the optimal processing frame duration in oracle binary masking and oracle magnitude restoration is determined by joint minimization of two antagonistic artifacts: temporal blurring (which increases with frame duration) and log-spectral-error change per unit time (which decreases with frame duration). These effects are related to the stationarity of the signals, but saying that “stationarity determines optimal frame duration” falls far short of describing the true nature and complexity of the interaction.
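To make the role of frame duration concrete, here is a minimal sketch of oracle binary masking with the frame duration exposed as a parameter. It is not the method of the draft paper: the Hann window, the 50% overlap, the weighted overlap-add resynthesis, and the function names (`stft`, `istft`, `oracle_binary_mask`) are assumptions chosen for illustration, and oracle magnitude restoration is not covered.

```python
import numpy as np


def _periodic_hann(frame_len):
    # Periodic Hann window: drop the last sample of a symmetric window.
    return np.hanning(frame_len + 1)[:-1]


def stft(x, frame_len, hop):
    """Windowed frames of x, transformed with a real FFT (one row per frame)."""
    window = _periodic_hann(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, frame_len, hop, length):
    """Weighted overlap-add resynthesis, normalized by the summed squared window."""
    window = _periodic_hann(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        out[start:start + frame_len] += frame * window
        norm[start:start + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-12)


def oracle_binary_mask(target, interferer, sample_rate, frame_ms):
    """Estimate the target from the mixture using a mask built from the known sources.

    "Oracle" means the mask is computed with access to the separate target and
    interferer signals, which are not available in a real separation task.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = frame_len // 2  # assumed 50% overlap
    mixture = target + interferer
    t_spec = stft(target, frame_len, hop)
    i_spec = stft(interferer, frame_len, hop)
    m_spec = stft(mixture, frame_len, hop)
    # Keep each time-frequency cell where the target is stronger than the interferer.
    mask = np.abs(t_spec) > np.abs(i_spec)
    return istft(m_spec * mask, frame_len, hop, len(mixture))
```

Calling `oracle_binary_mask` over a range of `frame_ms` values and comparing each estimate with the known target is one way to explore how the two artifacts trade off against each other for a given pair of signals.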
The Bigger Picture: The quality of speech sent over a telecommunication system depends on a variety of factors, including the background noise in the environment, the algorithms used to digitally enhance and code the speech signal, and the bandwidth used in transmitting the speech signal. The ITS Audio Quality Research Program supports community-wide efforts towards robust and adaptable telecommunication speech services and equipment with high quality and intelligibility.
In the program, we identify and address open issues in these areas, and we develop and characterize algorithmic innovations. In particular, we seek to improve tools and techniques for quantitatively characterizing the user experience of speech quality and speech intelligibility, both through subjective testing and by means of signal processing algorithms.