High-quality Speech Coding with Sample RNN

doi:10.1109/ICASSP.2019.8682435

•Open Access•Proceedings Article•10.1109/ICASSP.2019.8682435

Janusz Klejsa, +4 more

- 12 May 2019

- pp 7155-7159

47

TL;DR: This paper presents a speech coding scheme employing a generative model based on SampleRNN that, while operating at significantly lower bitrates, matches or surpasses the perceptual quality of state-of-the-art classic wide-band codecs.

Citations

•Proceedings Article•10.1109/QOMEX48832.2020.9123150

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

Michael Chinen, +5 more

- 26 May 2020

TL;DR: ViSQOL v3 is an open-source C++ library and binary with permissive licensing that can be deployed beyond the research context into production use. Feedback from internal production teams at Google helped improve this release; the paper shows the cases where the metric is most applicable and highlights its limitations.

83

•Proceedings Article•10.1109/ICASSP39728.2021.9415120

Generative Speech Coding with Predictive Variance Regularization

W. Bastiaan Kleijn, +7 more

- 06 Jun 2021

TL;DR: This paper proposes predictive variance regularization to reduce the sensitivity of the maximum likelihood criterion to outliers, and addresses the ineffectiveness of modeling a sum of independent signals with a single autoregressive model.

82

•Proceedings Article•10.21437/INTERSPEECH.2019-1255

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

Jean-Marc Valin, +1 more

- 15 Sep 2019

TL;DR: It is demonstrated that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP, and that uncompressed LPCNet can exceed the quality of a waveform codec operating at a low bitrate, opening the way for new codec designs based on neural synthesis models.

70

•Proceedings Article•10.21437/INTERSPEECH.2019-1816

Cascaded Cross-Module Residual Learning Towards Lightweight End-to-End Speech Coding

Kai Zhen, +4 more

- 15 Sep 2019

TL;DR: In this paper, a cross-module residual learning (CMRL) pipeline is proposed as a module carrier with each module reconstructing the residual from its preceding modules, which shows better objective performance than AMR-WB and OPUS.

53

•Proceedings Article•10.21437/INTERSPEECH.2020-2939

Improving Opus Low Bit Rate Quality with Neural Speech Synthesis

Jan Skoglund, +1 more

- 25 Oct 2020

TL;DR: In this article, the authors propose a backward compatible way of improving low bit rate Opus quality by re-synthesizing speech from the decoded parameters of the Opus decoder.

35

References

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: This article introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

82.5K

•Proceedings Article•10.3115/V1/D14-1179

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

Kyunghyun Cho, +8 more

- 01 Jan 2014

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

28.6K

•Book

Vector Quantization and Signal Compression

Allen Gersho, +1 more

- 01 Jan 1991

TL;DR: A textbook on the theory and design of quantizers for signal compression, covering scalar quantization, predictive and transform coding, and vector quantization, with linear prediction (including the Levinson-Durbin algorithm) among its background topics.

8K

WaveNet: A Generative Model for Raw Audio

Aaron van den Oord, +8 more

- 12 Sep 2016

TL;DR: WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.

5.2K