High-quality Speech Coding with Sample RNN
Janusz Klejsa, Per Hedelin, Cong Zhou, Roy M. Fejgin, Lars Villemoes
- 12 May 2019
- pp 7155-7159
TL;DR: A speech coding scheme based on a SampleRNN generative model matches or surpasses the perceptual quality of state-of-the-art classic wide-band codecs while operating at significantly lower bitrates.
Abstract: We provide a speech coding scheme employing a generative model based on SampleRNN that, while operating at significantly lower bitrates, matches or surpasses the perceptual quality of state-of-the-art classic wide-band codecs. Moreover, it is demonstrated that the proposed scheme can provide a meaningful rate-distortion trade-off without retraining. We evaluate the proposed scheme in a series of listening tests and discuss limitations of the approach.
Citations
ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, Andrew Hines
- 26 May 2020
TL;DR: ViSQOL v3 is an open-source C++ library and binary with permissive licensing that can be deployed beyond the research context into production use. Feedback from internal production teams at Google helped to improve this release, and the paper shows the cases where the metric is most applicable and highlights its limitations.
Generative Speech Coding with Predictive Variance Regularization
W. Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Hengchin Yeh
- 06 Jun 2021
TL;DR: Predictive variance regularization is proposed to reduce the sensitivity of the maximum likelihood criterion to outliers; that sensitivity, together with the ineffectiveness of modeling a sum of independent signals with a single autoregressive model, is identified as a cause of degraded performance in generative speech coding.
82
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
Jean-Marc Valin, Jan Skoglund
- 15 Sep 2019
TL;DR: LPCNet operating at 1.6 kb/s is shown to achieve significantly higher quality than MELP, and uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate, opening the way for new codec designs based on neural synthesis models.
70
Cascaded Cross-Module Residual Learning Towards Lightweight End-to-End Speech Coding
Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim
- 15 Sep 2019
TL;DR: A cross-module residual learning (CMRL) pipeline is proposed in which each module reconstructs the residual left by its preceding modules; the resulting codec shows better objective performance than AMR-WB and Opus.
53
Improving Opus Low Bit Rate Quality with Neural Speech Synthesis
Jan Skoglund, Jean-Marc Valin
- 25 Oct 2020
TL;DR: The authors propose a backward-compatible way of improving low-bit-rate Opus quality by re-synthesizing speech from the decoded parameters of the Opus decoder.
35
References
•Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
- 01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
138.5K
•Posted Content
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
82.5K
Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation
Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
- 01 Jan 2014
TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
•Book
Vector Quantization and Signal Compression
Allen Gersho, Robert M. Gray
- 01 Jan 1991
TL;DR: The authors present the theory and design of scalar and vector quantizers for signal compression, together with related techniques such as linear prediction, predictive quantization, and transform coding.
8K
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, Koray Kavukcuoglu
- 12 Sep 2016
TL;DR: WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
5.2K