Book Chapter
DOI: 10.1016/B978-1-85617-678-1.00012-0
Speech and Audio Processing
Hazarathaiah Malepati
- 01 Jan 2010
pp 595-635
Abstract: This chapter discusses sound and audio signals, and then explores how audio data is presented to the processor from a variety of audio converters. It also describes the formats in which audio data is stored and processed, and reviews the compromises associated with selecting data sizes. Audio and speech coding is used in digital audio broadcasting (DAB), VoIP phones, media players, military applications, cinema, home entertainment systems, and distance learning, among many other applications. Sound is a longitudinal displacement wave that propagates through a medium, such as air. Speech signals can be considered a subset of audio signals. Speech signals contain information about the time-varying characteristics of the excitation source and the vocal tract system. Speech signals are nonstationary; at best, they can be considered quasistationary over short time periods. Sound waves are characterized by their amplitude and frequency. Amplitude describes the sound pressure displacement above and below the equilibrium atmospheric level. In other words, the amplitude of a sound wave is a gauge of pressure change, measured in decibels (dB). The lowest sound amplitude that the human ear can perceive is called the "threshold of hearing," denoted by 0 dB SPL.
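As an illustration of the decibel scale mentioned in the abstract, the following is a minimal sketch of converting between RMS sound pressure and dB SPL. It assumes the conventional reference pressure of 20 micropascals for sound in air, which the abstract does not state explicitly; the function names `pressure_to_db_spl` and `db_spl_to_pressure` are hypothetical and are not taken from the chapter.

```python
import math

# Assumption: conventional reference pressure for sound in air (20 micropascals).
# By definition, this pressure corresponds to 0 dB SPL, the threshold of hearing.
P_REF_PA = 20e-6


def pressure_to_db_spl(p_rms_pa: float) -> float:
    """Convert an RMS sound pressure in pascals to a level in dB SPL."""
    if p_rms_pa <= 0:
        raise ValueError("RMS pressure must be positive")
    # Sound pressure level is 20 * log10 of the ratio to the reference pressure.
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


def db_spl_to_pressure(level_db: float) -> float:
    """Convert a level in dB SPL back to an RMS sound pressure in pascals."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    for pressure in (20e-6, 0.02, 1.0):
        print(f"{pressure:g} Pa -> {pressure_to_db_spl(pressure):.2f} dB SPL")
```

Because the scale is logarithmic, every tenfold increase in pressure adds 20 dB, which is why the threshold of hearing sits at 0 dB SPL rather than at zero pressure.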
Citations
•Posted Content
On Improving Deep Reinforcement Learning for POMDPs
TL;DR: This work proposes a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains and demonstrates the effectiveness of the new architecture in several partially observable domains, including flickering Atari games.
130
Multilingual representations for low resource speech recognition and keyword search
Jia Cui,Brian Kingsbury,Bhuvana Ramabhadran,Abhinav Sethy,Kartik Audhkhasi,Xiaodong Cui,Ellen Kislal,Lidia Mangu,Markus Nussbaum-Thom,Michael Picheny,Zoltán Tüske,Pavel Golik,Ralf Schlüter,Hermann Ney,Mark J. F. Gales,Kate Knill,Anton Ragni,Haipeng Wang,P.C. Woodland +18 more
- 11 Sep 2015
TL;DR: This paper examines the impact of multilingual acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program and shows that these multilingual representations significantly improve ASR and KWS performance.
Speech Recognition System: A Review
Nitin Washani,Sandeep Sharma +1 more
TL;DR: This paper presents the advances made in speech recognition systems, highlights their pressing problems, and classifies the system into a front end and a back end to make each part easier to understand and represent.
Analysis of Speech Features for Emotion Detection: A Review
Rode Snehal Sudhakar,Manjare Chandraprabha Anil +1 more
- 26 Feb 2015
TL;DR: Emotion detection from speech is very important in human-machine interaction; it involves modules for speech-to-text conversion, feature extraction, feature selection, and classification of those features to identify emotions.
29
Speech Recognition for Disabilities People
B. Ben Mosbah
- 16 Oct 2006
TL;DR: This work adapts existing speech recognition systems to people with articulatory handicaps, using a dynamic training approach that lets the system adapt progressively to the user during use.
26