Speech and Audio Processing

doi:10.1016/B978-1-85617-678-1.00012-0

Book Chapter

Hazarathaiah Malepati

- 01 Jan 2010

pp 595-635

53

TL;DR: This chapter discusses sound and audio signals, then explores how audio data is presented to the processor from a variety of audio converters.

Citations

•Posted Content

On Improving Deep Reinforcement Learning for POMDPs

Pengfei Zhu, +3 more

- 26 Apr 2017

- arXiv: Learning

TL;DR: This work proposes a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains and demonstrates the effectiveness of the new architecture in several partially observable domains, including flickering Atari games.

130

•Proceedings Article•10.1109/ASRU.2015.7404803

Multilingual representations for low resource speech recognition and keyword search

Jia Cui, +18 more

- 11 Sep 2015

TL;DR: This paper examines the impact of multilingual acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program and shows that these multilingual representations significantly improve ASR and KWS performance.

105

•Journal Article•10.5120/20249-2617

Speech Recognition System: A Review

Nitin Washani, +1 more

- 22 Apr 2015

- International Journal of Computer Applic...

TL;DR: This paper presents the advances made in speech recognition systems, highlights their pressing problems, and divides the system into Front End and Back End so that each part is easier to understand and describe.

52

•Proceedings Article•10.1109/ICCUBEA.2015.135

Analysis of Speech Features for Emotion Detection: A Review

Rode Snehal Sudhakar, +1 more

- 26 Feb 2015

TL;DR: Emotion detection from speech is very important in human-machine interaction; it involves modules for speech-to-text conversion, feature extraction, feature selection, and classification of those features to identify emotions.

29

•Proceedings Article•10.1109/ICTTA.2006.1684487

Speech Recognition for Disabilities People

B. Ben Mosbah

- 16 Oct 2006

TL;DR: This work adapts existing speech recognition systems to people with articulatory handicaps, using a dynamic training approach that lets the system adapt progressively to the user during use.

26
