Top 1104 papers published in the topic of Speech processing in 2006

Journal Article•10.1016/J.SPECOM.2006.04.003•

Emotional speech recognition: Resources, features, and methods

Dimitrios Ververidis¹, Constantine Kotropoulos¹•Institutions (1)

01 Sep 2006-Speech Communication

TL;DR: This paper overviews emotional speech recognition with three goals in mind, among them providing an up-to-date record of the available emotional speech data collections, and it examines classification techniques that exploit timing information separately from those that ignore it.

1,015 citations

Journal Article•10.1121/1.2166600•

A glimpsing model of speech perception in noise.

Martin Cooke¹•Institutions (1)

University of Sheffield¹

28 Feb 2006-Journal of the Acoustical Society of America

TL;DR: This study used an automatic speech recognition system, adapted for use with partially specified inputs, to identify consonants in noise; a transmitted information analysis revealed that cues to voicing are degraded more in the model than in human auditory processing.

Abstract: Do listeners process noisy speech by taking advantage of "glimpses", that is, spectrotemporal regions in which the target signal is least affected by the background? This study used an automatic speech recognition system, adapted for use with partially specified inputs, to identify consonants in noise. Twelve masking conditions were chosen to create a range of glimpse sizes. Several different glimpsing models were employed, differing in the local signal-to-noise ratio (SNR) used for detection, the minimum glimpse size, and the use of information in the masked regions. Recognition results were compared with behavioral data. A quantitative analysis demonstrated that the proportion of the time-frequency plane glimpsed is a good predictor of intelligibility. Recognition scores in each noise condition confirmed that sufficient information exists in glimpses to support consonant identification. Close fits to listeners' performance were obtained at two local SNR thresholds: one at around 8 dB and another in the range -5 to -2 dB. A transmitted information analysis revealed that cues to voicing are degraded more in the model than in human auditory processing.
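
The glimpse-proportion measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the target and masker are already available as separate time-frequency energy grids in dB (the hypothetical inputs `target_db` and `masker_db`), and it leaves out the minimum glimpse size and the auditory front end that the study's models also varied. The function name and the 0 dB default threshold are illustrative choices, not taken from the paper.

```python
def glimpse_proportion(target_db, masker_db, threshold_db=0.0):
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db.

    target_db and masker_db are equally shaped lists of rows (frames),
    each row holding per-channel energies in dB. Both are assumed inputs.
    """
    total = 0
    glimpsed = 0
    for target_row, masker_row in zip(target_db, masker_db):
        for target, masker in zip(target_row, masker_row):
            total += 1
            # Local SNR in dB is the difference of the two dB energies.
            if target - masker > threshold_db:
                glimpsed += 1
    return glimpsed / total if total else 0.0


if __name__ == "__main__":
    target = [[60.0, 40.0], [50.0, 55.0]]
    masker = [[50.0, 50.0], [50.0, 50.0]]
    print(glimpse_proportion(target, masker))        # 0.5
    print(glimpse_proportion(target, masker, -5.0))  # 0.75
```

Lowering the threshold admits more cells as glimpses, which is the knob the abstract refers to when it reports fits at two different local SNR thresholds.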

791 citations

Journal Article•10.1016/J.ACHA.2005.07.001•

On signal reconstruction without phase

Radu Balan¹, Peter G. Casazza², Dan Edidin²•Institutions (2)

Princeton University¹, University of Missouri²

01 May 2006-Applied and Computational Harmonic Analysis

TL;DR: In this paper, the authors construct new classes of Parseval frames for a Hilbert space which allow signal reconstruction from the absolute value of the frame coefficients without using phase or its estimation.

744 citations

Journal Article•10.1109/TSA.2005.860851•

New insights into the noise reduction Wiener filter

Jingdong Chen¹, Jacob Benesty², Yiteng Huang³, Simon Doclo⁴•Institutions (4)

Bell Labs¹, Institut national de la recherche scientifique², Alcatel-Lucent³, Katholieke Universiteit Leuven⁴

01 Jul 2006-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: This paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction and shows that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction.

Abstract: The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion. When no a priori knowledge is available, we can still achieve better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. When multiple microphone sensors are available, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.
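
The trade-off between SNR gain and speech distortion can be explored numerically with a rough sketch. It is not the paper's time-domain formulation: it assumes a per-frequency-bin Wiener gain computed from known speech and noise power spectra (`speech_psd` and `noise_psd` are hypothetical inputs), and the distortion figure is a frequency-domain analogue of a speech-distortion index rather than the paper's exact definition.

```python
def wiener_gain_stats(speech_psd, noise_psd):
    """Return (input SNR, output SNR, distortion index) for per-bin Wiener gains.

    speech_psd and noise_psd are equally long lists of positive powers,
    assumed known here purely for illustration.
    """
    # Per-bin Wiener gain: speech power over noisy-signal power.
    gains = [s / (s + n) for s, n in zip(speech_psd, noise_psd)]

    in_snr = sum(speech_psd) / sum(noise_psd)

    # Power of the filtered speech and filtered noise components.
    out_speech = sum(g * g * s for g, s in zip(gains, speech_psd))
    out_noise = sum(g * g * n for g, n in zip(gains, noise_psd))
    out_snr = out_speech / out_noise

    # Share of speech power lost to attenuation (0 means no distortion).
    distortion = sum(
        (1.0 - g) ** 2 * s for g, s in zip(gains, speech_psd)
    ) / sum(speech_psd)
    return in_snr, out_snr, distortion


if __name__ == "__main__":
    in_snr, out_snr, distortion = wiener_gain_stats([10.0, 1.0], [1.0, 1.0])
    print(in_snr, out_snr, distortion)
```

In this example the output SNR exceeds the input SNR while the distortion figure is nonzero, in line with the abstract's observation that noise reduction comes at the cost of some speech degradation.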

725 citations

Journal Article•10.1016/J.SPECOM.2005.08.005•

A noise-estimation algorithm for highly non-stationary environments

Sundarrajan Rangachari¹, Philipos C. Loizou¹•Institutions (1)

University of Texas at Dallas¹

01 Feb 2006-Speech Communication

TL;DR: When integrated in speech enhancement, the proposed noise-estimation algorithm was preferred over other noise-estimation algorithms, indicating that the local minimum estimation algorithm adapts very quickly to highly non-stationary noise environments.

497 citations

Book•

Digital Speech Transmission: Enhancement, Coding and Error Concealment

Peter Vary, Rainer Martin

3 Mar 2006

TL;DR: This book covers topics including models of speech production and hearing, the performance of the auditory organs, and the statistical properties of speech signals and DFT coefficients.

Abstract: 1 Introduction. 2 Models of Speech Production and Hearing. 2.1 Organs of Speech Production. 2.2 Characteristics of Speech Signals. 2.3 Model of Speech Production. 2.4 Anatomy of Hearing. 2.5 Performance of the Auditory Organs. Bibliography. 3 Spectral Transformations. 3.1 Fourier Transform of Continuous Signals. 3.2 Fourier Transform of Discrete Signals. 3.3 Linear Shift Invariant Systems. 3.4 The z-Transform. 3.5 The Discrete Fourier Transform. 3.6 Fast Convolution. 3.7 Cepstral Analysis. Bibliography. 4 Filter Banks for Spectral Analysis and Synthesis. 4.1 Spectral Analysis Using Narrow-Band Filters. 4.2 Polyphase Network Filter Banks. 4.3 Quadrature Mirror Filter Banks. Bibliography. 5 Stochastic Signals and Estimation. 5.1 Basic Concepts. 5.2 Expectations and Moments. 5.3 Bivariate Statistics. 5.4 Probability and Information. 5.5 Multivariate Statistics. 5.6 Stochastic Processes. 5.7 Estimation of Statistical Quantities by Time Averages. 5.8 Power Spectral Densities. 5.9 Estimation of the Power Spectral Density. 5.10 Statistical Properties of Speech Signals. 5.11 Statistical Properties of DFT Coefficients. 5.12 Optimal Estimation. Bibliography. 6 Linear Prediction. 6.1 Vocal Tract Models and Short-Term Prediction. 6.2 Optimal Prediction Coefficients for Stationary Signals. 6.3 Predictor Adaptation. 6.4 Long-Term Prediction. Bibliography. 7 Quantization. 7.1 Analog Samples and Digital Presentation. 7.2 Uniform Quantization. 7.3 Non-uniform Quantization. 7.4 Optimal Quantization. 7.5 Adaptive Quantization. 7.6 Vector Quantization. 7.6.1 Principle. Bibliography. 8 Speech Coding. 8.1 Classification of Speech Coding Algorithms. 8.2 Model-Based Predictive Coding. 8.3 Differential Waveform Coding. 8.4 Parametric Coding. 8.5 Hybrid Coding. 8.6 Adaptive Postfiltering. Bibliography. 9 Error Concealment and Softbit Decoding. 9.1 Hardbit Source Decoding. 9.2 Conventional Error Concealment. 9.3 Softbits and L-Values. 9.4 Softbit Source Decoding (SD). 9.5 Application to Model Parameters. 9.6 Further Improvements. Bibliography. 10 Bandwidth Extension of Speech Signals (BWE). 10.1 Narrowband versus Wideband Telephony. 10.2 Speech Coding with Integrated BWE. 10.3 BWE without Auxiliary Transmission. Bibliography. 11 Single and Dual Channel Noise Reduction. 11.1 Introduction. 11.2 Linear MMSE Estimators. 11.3 Speech Enhancement in the DFT Domain. 11.4 Optimal Non-Linear Estimators. 11.5 Joint Optimum Detection and Estimation of Speech. 11.6 Computation of Likelihood Ratios. 11.7 Estimation of the A Priori Probability of Speech Presence. 11.8 VAD and Noise Estimation Techniques. 11.9 Dual-Channel Noise Reduction. Bibliography. 12 Multi-Channel Noise Reduction. 12.1 Introduction. 12.2 Spatial Sampling of Sound Fields. 12.3 Beamforming. 12.4 Performance Measures and Spatial Aliasing. 12.5 Design of Fixed Beamformers. 12.6 Adaptive Beamformers. Bibliography. 13 Acoustic Echo Control. 13.1 The Echo Control Problem. 13.2 Evaluation Criteria. 13.3 The Wiener Solution. 13.4 The LMS and NLMS Algorithm. 13.5 Convergence Analysis and Control of the LMS Algorithm. 13.6 Geometric Projection Interpretation of the NLMS Algorithm. 13.7 The Affine Projection Algorithm. 13.8 Least-Squares and Recursive Least-Squares Algorithms. 13.9 Block Processing and Frequency-Domain Adaptive Filters. 13.9.1 Block LMS Algorithm. 13.10 Additional Measures for Echo Control. 13.11 Stereophonic Acoustic Echo Control. A Codec Standards. B Speech Quality Assessment. Bibliography.

364 citations

Journal Article•10.1037/0012-1649.42.4.643•

Infants' early ability to segment the conversational speech signal predicts later language development: a retrospective analysis.

Rochelle S. Newman¹, Nan Bernstein Ratner¹, Ann Marie Jusczyk², Peter W. Jusczyk², Kathy Ayala Dow¹, +1 more•Institutions (2)

University of Maryland, College Park¹, Johns Hopkins University²

01 Jul 2006-Developmental Psychology

TL;DR: Analysis of relationships between infants' early speech processing performance and later language and cognitive outcomes suggests that speech segmentation ability is an important prerequisite for successful language development, and it offers potential for developing measures to detect language impairment at an earlier age.

Abstract: Two studies examined relationships between infants' early speech processing performance and later language and cognitive outcomes. Study 1 found that performance on speech segmentation tasks before 12 months of age related to expressive vocabulary at 24 months. However, performance on other tasks was not related to 2-year vocabulary. Study 2 assessed linguistic and cognitive skills at 4-6 years of age for children who had participated in segmentation studies as infants. Children who had been able to segment words from fluent speech scored higher on language measures, but not general IQ, as preschoolers. Results suggest that speech segmentation ability is an important prerequisite for successful language development, and they offer potential for developing measures to detect language impairment at an earlier age.

348 citations

Journal Article•10.1250/AST.27.349•

STRAIGHT, exploitation of the other aspect of VOCODER : Perceptually isomorphic decomposition of speech sounds

Hideki Kawahara¹•Institutions (1)

Wakayama University¹

01 Nov 2006-Acoustical Science and Technology

TL;DR: This review outlines historical backgrounds, architecture, underlying principles, and representative applications of STRAIGHT.

Abstract: STRAIGHT, a speech analysis, modification, and synthesis system, is an extension of the classical channel VOCODER that exploits the advantages of progress in information processing technologies and a new conceptualization of the role of repetitive structures in speech sounds. This review outlines historical backgrounds, architecture, underlying principles, and representative applications of STRAIGHT.

328 citations

Journal Article•10.1121/1.2188377•

Cue weighting in auditory categorization: implications for first and second language acquisition.

Lori L. Holt¹, Andrew J. Lotto²•Institutions (2)

Carnegie Mellon University¹, University of Texas at Austin²

27 Apr 2006-Journal of the Acoustical Society of America

TL;DR: The results demonstrate that even when equally informative and discriminable, acoustic cues are not necessarily equally weighted in categorization; listeners exhibit biases when integrating multiple acoustic dimensions.

Abstract: The ability to integrate and weight information across dimensions is central to perception and is particularly important for speech categorization. The present experiments investigate cue weighting by training participants to categorize sounds drawn from a two-dimensional acoustic space defined by the center frequency (CF) and modulation frequency (MF) of frequency-modulated sine waves. These dimensions were psychophysically matched to be equally discriminable and, in the first experiment, were equally informative for accurate categorization. Nevertheless, listeners' category responses reflected a bias for use of CF. This bias remained even when the informativeness of CF was decreased by shifting distributions to create more overlap in CF. A reversal of weighting (MF over CF) was obtained when distribution variance was increased for CF. These results demonstrate that even when equally informative and discriminable, acoustic cues are not necessarily equally weighted in categorization; listeners exhibit biases when integrating multiple acoustic dimensions. Moreover, changes in weighting strategies can be affected by changes in input distribution parameters. This methodology provides potential insights into acquisition of speech sound categories, particularly second language categories. One implication is that ineffective cue weighting strategies for phonetic categories may be alleviated by manipulating variance of uninformative dimensions in training stimuli.

324 citations

Journal Article•10.1121/1.2213572•

Effects of language experience and stimulus complexity on the categorical perception of pitch direction

Yisheng Xu¹, Jackson T. Gandour, Alexander L. Francis•Institutions (1)

Purdue University¹

08 Aug 2006-Journal of the Acoustical Society of America

TL;DR: The results of this cross-language study of the categorical nature of tone perception lead the authors to adopt a memory-based, multistore model of perception in which categorization is domain-general but influenced by long-term categorical representations.

Abstract: Whether or not categorical perception results from the operation of a special, language-specific, speech mode remains controversial. In this cross-language (Mandarin Chinese, English) study of the categorical nature of tone perception, we compared native Mandarin and English speakers’ perception of a physical continuum of fundamental frequency contours ranging from a level to rising tone in both Mandarin speech and a homologous (nonspeech) harmonic tone. This design permits us to evaluate the effect of language experience by comparing Chinese and English groups; to determine whether categorical perception is speech-specific or domain-general by comparing speech to nonspeech stimuli for both groups; and to examine whether categorical perception involves a separate categorical process, distinct from regions of sensory discontinuity, by comparing speech to nonspeech stimuli for English listeners. Results show evidence of strong categorical perception of speech stimuli for Chinese but not English listeners. Categorical perception of nonspeech stimuli was comparable to that for speech stimuli for Chinese but weaker for English listeners, and perception of nonspeech stimuli was more categorical for English listeners than was perception of speech stimuli. These findings lead us to adopt a memory-based, multistore model of perception in which categorization is domain-general but influenced by long-term categorical representations.

304 citations

Journal Article•10.1037/0096-1523.32.5.1276•

Does a regional accent perturb speech processing?

Caroline Floccia¹, Jeremy Goslin², Frédérique Girard³, Gabrielle Konopczynski³•Institutions (3)

Centre national de la recherche scientifique¹, University of Plymouth², University of Franche-Comté³

01 Jan 2006-Journal of Experimental Psychology: Human Perception and Performance

TL;DR: The findings of these experiments indicate that regional accent normalization involves a short-term adjustment mechanism that develops as a certain amount of accented signal is available, resulting in a temporary perturbation in speech processing.

Abstract: The processing costs involved in regional accent normalization were evaluated by measuring differences in lexical decision latencies for targets placed at the end of sentences with different French regional accents. Over a series of 6 experiments, the authors examined the time course of comprehension disruption by manipulating the duration and presentation conditions of accented speech. Taken together, the findings of these experiments indicate that regional accent normalization involves a short-term adjustment mechanism that develops as a certain amount of accented signal is available, resulting in a temporary perturbation in speech processing.

Journal Article•10.1080/14992020600782956•

Benefits of bilateral cochlear implants and/or hearing aids in children

Ruth Y. Litovsky¹, Patti M. Johnstone¹, Shelly Godar¹•Institutions (1)

University of Wisconsin-Madison¹

01 Jan 2006-International Journal of Audiology

TL;DR: This evaluation of functional benefits from bilateral stimulation in 20 children ages 4–14 (10 with two cochlear implants, 10 with one cochlear implant and one hearing aid) shows that both groups perform similarly when speech reception thresholds are evaluated, but the benefit from wearing two devices rather than one appears to be significantly greater in the group with two implants than in the bimodal group.

Abstract: This study evaluated functional benefits from bilateral stimulation in 20 children ages 4-14; 10 use two cochlear implants (CIs) and 10 use one CI and one hearing aid (HA). Localization acuity was measured with the minimum audible angle (MAA). Speech intelligibility was measured in quiet, and in the presence of 2-talker competing speech using the CRISP forced-choice test. Results show that both groups perform similarly when speech reception thresholds are evaluated. However, there appears to be benefit (improved MAA and speech thresholds) from wearing two devices compared with a single device that is significantly greater in the group with two CIs than in the bimodal group. Individual variability also suggests that some children perform similarly to normal-hearing children, while others clearly do not. Future advances in binaural fitting strategies and improved speech processing schemes that maximize binaural sensitivity will no doubt contribute to increasing the binaurally-driven advantages in persons with bilateral CIs.

Journal Article•10.1121/1.2217714•

Adaptive control of vowel formant frequency: evidence from real-time formant manipulation.

David W. Purcell¹, Kevin G. Munhall•Institutions (1)

Queen's University¹

08 Aug 2006-Journal of the Acoustical Society of America

TL;DR: In two studies the first formant of monosyllabic consonant-vowel-consonant words was shifted electronically and fed back to the participant very quickly so that participants perceived the modified speech as their own productions and appeared to more actively stabilize their productions from trial-to-trial.

Abstract: Auditory feedback during speech production is known to play a role in speech sound acquisition and is also important for the maintenance of accurate articulation. In two studies the first formant (F1) of monosyllabic consonant-vowel-consonant words (CVCs) was shifted electronically and fed back to the participant very quickly so that participants perceived the modified speech as their own productions. When feedback was shifted up (experiment 1 and 2) or down (experiment 1) participants compensated by producing F1 in the opposite frequency direction from baseline. The threshold size of manipulation that initiated a compensation in F1 was usually greater than 60Hz. When normal feedback was returned, F1 did not return immediately to baseline but showed an exponential deadaptation pattern. Experiment 1 showed that this effect was not influenced by the direction of the F1 shift, with both raising and lowering of F1 exhibiting the same effects. Experiment 2 showed that manipulating the number of trials that F1 ...

Journal Article•10.1016/J.SPECOM.2006.09.003•

Binary and Ratio Time-frequency Masks for Robust Speech Recognition

Soundararajan Srinivasan¹, Nicoleta Roman¹, DeLiang Wang¹•Institutions (1)

Ohio State University¹

01 Nov 2006-Speech Communication

TL;DR: In this article, a time-varying Wiener filter is used to specify the ratio of a target signal and a noisy mixture in a local time-frequency unit, which is then fed to a conventional speech recognizer operating in the cepstral domain.
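
The two mask types named in the title can be sketched as follows. This is an illustrative definition that assumes the target and noise energies in each time-frequency unit are known, as in an oracle setting; the paper estimates them instead. The mixture energy is approximated as the sum of the two energies, which holds only if target and noise are uncorrelated, and the function names and the `lc_db` criterion parameter are hypothetical.

```python
def binary_mask(target_energy, noise_energy, lc_db=0.0):
    """1 where the local SNR exceeds lc_db (in dB), else 0."""
    # Convert the dB criterion to a linear energy ratio once.
    ratio = 10.0 ** (lc_db / 10.0)
    return [
        [1 if t > ratio * n else 0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, noise_energy)
    ]


def ratio_mask(target_energy, noise_energy):
    """Target energy divided by (approximate) mixture energy, per unit."""
    return [
        [t / (t + n) if (t + n) > 0.0 else 0.0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, noise_energy)
    ]


if __name__ == "__main__":
    target = [[4.0, 1.0]]
    noise = [[1.0, 4.0]]
    print(binary_mask(target, noise))  # [[1, 0]]
    print(ratio_mask(target, noise))
```

The binary mask makes a hard keep-or-discard decision per unit, while the ratio mask keeps a soft weight between 0 and 1, which is the Wiener-style quantity the summary describes.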

Journal Article•10.1121/1.2173514•

Compensation following real-time manipulation of formants in isolated vowels

David W. Purcell¹, Kevin G. Munhall•Institutions (1)

Queen's University¹

28 Mar 2006-Journal of the Acoustical Society of America

TL;DR: The rapid formant compensations found here suggest that auditory feedback control is similar for both F0 and formants.

Abstract: Auditory feedback influences human speech production, as demonstrated by studies using rapid pitch and loudness changes. Feedback has also been investigated using the gradual manipulation of formants in adaptation studies with whispered speech. In the work reported here, the first formant of steady-state isolated vowels was unexpectedly altered within trials for voiced speech. This was achieved using a real-time formant tracking and filtering system developed for this purpose. The first formant of vowel /ɛ/ was manipulated 100% toward either /æ/ or /ɪ/, and participants responded by altering their production with average F1 compensation as large as 16.3% and 10.6% of the applied formant shift, respectively. Compensation was estimated to begin <460 ms after stimulus onset. The rapid formant compensations found here suggest that auditory feedback control is similar for both F0 and formants.

Journal Article•10.1109/TSP.2006.874403•

Voice activity detection based on multiple statistical models

Joon-Hyuk Chang¹, Nam Soo Kim², Sanjit K. Mitra³•Institutions (3)

Inha University¹, Seoul National University², University of California, Santa Barbara³

01 Jun 2006-IEEE Transactions on Signal Processing

TL;DR: This paper proposes a class of VAD algorithms based on several statistical models; in addition to the Gaussian model, it incorporates the complex Laplacian and Gamma probability density functions into the analysis of statistical properties.

Abstract: One of the key issues in practical speech processing is to achieve robust voice activity detection (VAD) against the background noise. Most of the statistical model-based approaches have tried to employ the Gaussian assumption in the discrete Fourier transform (DFT) domain, which, however, deviates from the real observation. In this paper, we propose a class of VAD algorithms based on several statistical models. In addition to the Gaussian model, we also incorporate the complex Laplacian and Gamma probability density functions into our analysis of statistical properties. With goodness-of-fit tests, we analyze the statistical properties of the DFT spectra of the noisy speech under various noise conditions. Based on the statistical analysis, the likelihood ratio test under the given statistical models is established for the purpose of VAD. Since the statistical characteristics of the speech signal are differently affected by the noise types and levels, to cope with the time-varying environments, our approach is aimed at finding adaptively an appropriate statistical model in an online fashion. The performance of the proposed VAD approaches in both the stationary and nonstationary noise environments is evaluated with the aid of an objective measure.
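
For orientation, the likelihood ratio test under the Gaussian assumption that the abstract takes as its starting point can be sketched as below. This covers only the widely used Gaussian-model form; the Laplacian and Gamma variants and the online model selection proposed in the paper are not shown. The per-bin a posteriori SNRs `gamma` and a priori SNRs `xi` are assumed to be supplied by some estimator, and the function names and the threshold value are hypothetical.

```python
import math


def gaussian_log_lr(gamma, xi):
    """Per-bin log-likelihood ratio of speech presence vs. absence.

    gamma: a posteriori SNR (noisy power over noise variance).
    xi: a priori SNR (speech variance over noise variance).
    """
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def vad_is_speech(gammas, xis, threshold=0.0):
    """Declare speech when the mean per-bin log-likelihood ratio exceeds threshold."""
    ratios = [gaussian_log_lr(g, x) for g, x in zip(gammas, xis)]
    return sum(ratios) / len(ratios) > threshold


if __name__ == "__main__":
    print(vad_is_speech([10.0, 12.0], [5.0, 5.0], threshold=0.5))  # True
    print(vad_is_speech([1.0, 1.0], [0.1, 0.1], threshold=0.5))    # False
```

Averaging the per-bin ratios over the frame is one common way to form the frame-level decision statistic; the threshold would normally be tuned on data.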

Journal Article•10.1109/TSA.2005.858055•

Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations

Nima Mesgarani¹, Malcolm Slaney², Shihab A. Shamma¹•Institutions (2)

University of Maryland, College Park¹, IBM²

01 Dec 2006-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds is described.

Abstract: We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it is still difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). Generalization of the system to signals in high level of additive noise and reverberation is evaluated and compared to two existing approaches (Scheirer and Slaney, 2002 and Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.

Journal Article•10.1121/1.2188331•

Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei

Matthew P. Aylett, Alice Turk¹•Institutions (1)

University of Edinburgh¹

27 Apr 2006-Journal of the Acoustical Society of America

TL;DR: Tests of the smooth signal redundancy hypothesis with a very high-quality corpus collected for speech synthesis confirm the duration/language redundancy results achieved in previous work, and show a significant relationship between language redundancy factors and the first two formants, although these results vary considerably by vowel.

Abstract: The language redundancy of a syllable, measured by its predictability given its context and inherent frequency, has been shown to have a strong inverse relationship with syllabic duration. This relationship is predicted by the smooth signal redundancy hypothesis, which proposes that robust communication in a noisy environment can be achieved with an inverse relationship between language redundancy and the predictability given acoustic observations (acoustic redundancy). A general version of the hypothesis predicts similar relationships between the spectral characteristics of speech and language redundancy. However, investigating this claim is hampered by difficulties in measuring the spectral characteristics of speech within large conversational corpora, and difficulties in forming models of acoustic redundancy based on these spectral characteristics. This paper addresses these difficulties by testing the smooth signal redundancy hypothesis with a very high-quality corpus collected for speech synthesis, and presents both durational and spectral data from vowel nuclei on a vowel-by-vowel basis. Results confirm the duration/language redundancy results achieved in previous work, and show a significant relationship between language redundancy factors and the first two formants, although these results vary considerably by vowel. In general, however, vowels show increased centralization with increased language redundancy.

Book Chapter•10.1159/000094648•

Speech processing in vocoder-centric cochlear implants

Philipos C. Loizou¹•Institutions (1)

University of Texas at Dallas¹

13 Jul 2006-Advances in oto-rhino-laryngology

TL;DR: An overview of the various vocoder-centric processing strategies proposed for cochlear implants since the late 1990s is provided including the strategies used in different commercially available implant processors.

Abstract: The principles of the most recent cochlear implant processors are similar to that of the channel vocoder, originally used for transmitting speech over telephone lines with much less bandwidth than that required for transmitting the unprocessed speech signal. An overview of the various vocoder-centric processing strategies proposed for cochlear implants since the late 1990s is provided including the strategies used in different commercially available implant processors. Special emphasis is placed on reviewing the strategies designed to enhance pitch information for potentially better music perception. The various noise suppression strategies proposed over the years based on multi-microphone and single-microphone inputs are also described.

Journal Article•10.1109/TSA.2005.858066•

A two-stage algorithm for one-microphone reverberant speech enhancement

Mingyang Wu¹, DeLiang Wang²•Institutions (2)

FICO¹, Ohio State University²

01 Dec 2006-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A comparison with a recent enhancement algorithm is made on a corpus of speech utterances in a number of reverberant conditions, and the results show that the proposed algorithm performs substantially better.

Abstract: Under noise-free conditions, the quality of reverberant speech is dependent on two distinct perceptual components: coloration and long-term reverberation. They correspond to two physical variables: signal-to-reverberant energy ratio (SRR) and reverberation time, respectively. Inspired by this observation, we propose a two-stage reverberant speech enhancement algorithm using one microphone. In the first stage, an inverse filter is estimated to reduce coloration effects or increase SRR. The second stage employs spectral subtraction to minimize the influence of long-term reverberation. The proposed algorithm significantly improves the quality of reverberant speech. A comparison with a recent enhancement algorithm is made on a corpus of speech utterances in a number of reverberant conditions, and the results show that our algorithm performs substantially better.
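
The second stage relies on spectral subtraction, which in its generic power-domain form can be sketched as below. This is not the paper's specific procedure: how the long-term reverberation power is estimated is not reproduced here, so `late_reverb_power` is simply assumed to be given, and the `over_subtraction` and `floor` parameters are illustrative.

```python
def spectral_subtraction(power, late_reverb_power, over_subtraction=1.0, floor=0.01):
    """Subtract an interference power estimate from each frequency bin.

    power: observed power per bin for one frame.
    late_reverb_power: estimated interference power per bin (assumed given).
    The result never drops below floor * observed power, which avoids
    negative power values.
    """
    return [
        max(p - over_subtraction * r, floor * p)
        for p, r in zip(power, late_reverb_power)
    ]


if __name__ == "__main__":
    print(spectral_subtraction([4.0, 1.0], [1.0, 2.0]))  # [3.0, 0.01]
```

The spectral floor is a common safeguard: without it, bins where the estimate exceeds the observed power would yield negative values.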

Patent•

Robust separation of speech signals in a noisy environment

Erik Visser¹, Jeremy Toman¹, Kwokleung Chan¹•Institutions (1)

Qualcomm¹

21 Jul 2006

TL;DR: In this paper, a method for improving the quality of a speech signal extracted from a noisy acoustic environment is provided, where a signal separation process (180) is associated with a voice activity detector (185).

Abstract: A method for improving the quality of a speech signal extracted from a noisy acoustic environment is provided. In one approach, a signal separation process (180) is associated with a voice activity detector (185). The voice activity detector (185) is a two-channel (178,182) detector, which enables a particularly robust and accurate detection of voice activity. When speech is detected, the voice activity detector generates a control signal (411). The control signal (411) is used to activate, adjust, or control signal separation processes or post-processing operations (195) to improve the quality of the resulting speech signal. In another approach, a signal separation process (180) is provided as a learning stage (752) and an output stage (756). The learning stage (752) aggressively adjusts to current acoustic conditions and passes coefficients to the output stage (756). The output stage (756) adapts more slowly and generates a speech-content signal (181,770) and a noise dominant signal (407,773). When the learning stage (752) becomes unstable, only the learning stage (752) is reset, allowing the output stage (756) to continue outputting a high quality speech signal.

Patent•

Method and apparatus for improved estimation of non-stationary noise for speech enhancement

David Zhao, Willem Bastiaan Kleijn, Alexander Ypma, De Vries Bert

23 Aug 2006

TL;DR: In this article, the authors proposed a speech enhancement system that is able to suppress highly non-stationary noise, which can be adapted to a hearing aid or a headset, using a speech model and a noise model having at least one shape and gain.

Abstract: A central aspect of the invention relates to a method of enhancing speech, the method comprising the steps of: receiving noisy speech comprising a clean speech component and a non-stationary noise component; providing a speech model; providing a noise model having at least one shape and a gain; dynamically modifying the noise model based on the speech model and the received noisy speech; and enhancing the noisy speech at least based on the modified noise model. Hereby is achieved a method of speech enhancement that is able to suppress highly non-stationary noise. Another aspect of the invention relates to a speech enhancement system that may be adapted to be used in a hearing system, such as a hearing aid or a headset.

Journal Article•10.1016/J.TINS.2006.05.011•

Nature and nurture in language acquisition: anatomical and functional brain-imaging studies in infants.

Ghislaine Dehaene-Lambertz¹, Lucie Hertz-Pannier², Jessica Dubois•Institutions (2)

French Institute of Health and Medical Research¹, Necker-Enfants Malades Hospital²

01 Jul 2006-Trends in Neurosciences

TL;DR: Researchers examine how the infant brain processes verbal stimuli before learning to reveal a structural and functional organization close to what is described in adults and suggest a strong bias for speech processing in these regions that might guide infants as they discover the properties of their native language.

Journal Article•10.1121/1.2179657•

Comparing the rhythm and melody of speech and music: The case of British English and French

Aniruddh D. Patel¹, John R. Iversen, Jason C. Rosenberg•Institutions (1)

The Neurosciences Institute¹

27 Apr 2006-Journal of the Acoustical Society of America

TL;DR: This study applies quantitative methods to the speech and music of England and France to reveal that music reflects patterns of durational contrast between successive vowels in spoken sentences, as well as patterns of pitch interval variability in speech.

Abstract: For over half a century, musicologists and linguists have suggested that the prosody of a culture's native language is reflected in the rhythms and melodies of its instrumental music. Testing this idea requires quantitative methods for comparing musical and spoken rhythm and melody. This study applies such methods to the speech and music of England and France. The results reveal that music reflects patterns of durational contrast between successive vowels in spoken sentences, as well as patterns of pitch interval variability in speech. The methods presented here are suitable for studying speech-music relations in a broad range of cultures.
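
One widely used measure of durational contrast between successive intervals is the normalized Pairwise Variability Index (nPVI). The abstract does not name its measure, so treating nPVI as the relevant one is an assumption here; the function name `npvi` is illustrative, and the inputs could be vowel durations in speech or note durations in a musical theme.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index of a duration sequence.

    Averages the absolute difference between each pair of successive
    durations, normalized by the pair's mean, and scales by 100.
    Higher values indicate greater contrast between neighbours.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    total = 0.0
    for a, b in zip(durations, durations[1:]):
        total += abs(a - b) / ((a + b) / 2.0)
    return 100.0 * total / (len(durations) - 1)
```

Because each difference is divided by the local mean, the index is largely insensitive to overall tempo or speech rate, which is what makes the same computation usable on both spoken and musical sequences.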

Journal Article•10.1109/TASL.2006.874669•

A Dynamic Compressive Gammachirp Auditory Filterbank

Toshio Irino¹, Roy D. Patterson•Institutions (1)

Wakayama University¹

01 Nov 2006-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A fast-acting level control circuit for the cGC filter is described and it is shown how psychophysical data involving two-tone suppression and compression can be used to estimate the parameter values for this dynamic version of the cGC filter (referred to as the "dcGC" filter).

Abstract: It is now common to use knowledge about human auditory processing in the development of audio signal processors. Until recently, however, such systems were limited by their linearity. The auditory filter system is known to be level-dependent as evidenced by psychophysical data on masking, compression, and two-tone suppression. However, there were no analysis/synthesis schemes with nonlinear filterbanks. This paper describes such a scheme based on the compressive gammachirp (cGC) auditory filter. It was developed to extend the gammatone filter concept to accommodate the changes in psychophysical filter shape that are observed to occur with changes in stimulus level in simultaneous, tone-in-noise masking. In models of simultaneous noise masking, the temporal dynamics of the filtering can be ignored. Analysis/synthesis systems, however, are intended for use with speech sounds where the glottal cycle can be long with respect to auditory time constants, and so they require specification of the temporal dynamics of the auditory filter. In this paper, we describe a fast-acting level control circuit for the cGC filter and show how psychophysical data involving two-tone suppression and compression can be used to estimate the parameter values for this dynamic version of the cGC filter (referred to as the "dcGC" filter). One important advantage of analysis/synthesis systems with a dcGC filterbank is that they can inherit previously refined signal processing algorithms developed with conventional short-time Fourier transforms (STFTs) and linear filterbanks.

What is the relationship between phonological short-term memory and speech processing?

Charlotte Jacquemot, Sophie K. Scott

1 Jan 2006

TL;DR: It is proposed that pSTM arises from the cycling of information between two phonological buffers, one involved in speech perception and one in speech production, and that models of speech perception and production, and the understanding of their neural bases, will benefit from incorporating these processes.

Abstract: Traditionally, models of speech comprehension and production do not depend on concepts and processes from the phonological short-term memory (pSTM) literature. Likewise, in working memory research, pSTM is considered to be a language-independent system that facilitates language acquisition rather than speech processing per se. We discuss couplings between pSTM, speech perception and speech production, and we propose that pSTM arises from the cycling of information between two phonological buffers, one involved in speech perception and one in speech production. We discuss the specific role of these processes in speech processing, and argue that models of speech perception and production, and our understanding of their neural bases, will benefit from incorporating them.

Journal Article•10.1109/TSA.2005.860774•

The ATR Multilingual Speech-to-Speech Translation System

Satoshi Nakamura, Konstantin Markov, Hiromi Nakaiwa, Genichiro Kikui, Hisashi Kawai, Takatoshi Jitsuhiro, Jinsong Zhang, H. Yamamoto, Eiichiro Sumita, Seiichi Yamamoto

01 Dec 2006-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages, uses a parallel multilingual database consisting of over 600 000 sentences that cover a broad range of travel-related conversations.

Abstract: In this paper, we describe the ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages (Japanese and Chinese). There are three main modules of our S2ST system: large-vocabulary continuous speech recognition, machine text-to-text (T2T) translation, and text-to-speech synthesis. All of them are multilingual and are designed using state-of-the-art technologies developed at ATR. A corpus-based statistical machine learning framework forms the basis of our system design. We use a parallel multilingual database consisting of over 600 000 sentences that cover a broad range of travel-related conversations. Recent evaluation of the overall system showed that speech-to-speech translation quality is high, being at the level of a person having a Test of English for International Communication (TOEIC) score of 750 out of the perfect score of 990.

Journal Article•10.1111/J.1467-7687.2006.00482.X•

Names in frames: infants interpret words in sentence frames faster than words in isolation

Anne Fernald¹, Nereyda Hurtado¹•Institutions (1)

Stanford University¹

01 May 2006-Developmental Science

TL;DR: Online measures of speech processing are used in a looking-while-listening procedure and suggest familiar frames may enable the infant to 'listen ahead' more efficiently for the focused word at the end of the sentence.

Abstract: In child-directed speech (CDS), adults often use utterances with very few words; many include short, frequently used sentence frames, while others consist of a single word in isolation. Do such features of CDS provide perceptual advantages for the child? Based on descriptive analyses of parental speech, some researchers argue that isolated words should help infants in word recognition by facilitating segmentation, while others predict no advantage. To address this question directly, we used online measures of speech processing in a looking-while-listening procedure. In two experiments, 18-month-olds were presented with familiar object names in isolation and in a sentence frame. Infants were 120 ms slower to interpret target words in isolation than when the same words were preceded by a familiar carrier phrase, suggesting that the sentence frame facilitated word recognition. Familiar frames may enable the infant to ‘listen ahead’ more efficiently for the focused word at the end of the sentence.

Journal Article•10.1016/J.NEUROIMAGE.2005.10.002•

Perceiving identical sounds as speech or non-speech modulates activity in the left posterior superior temporal sulcus

Riikka Möttönen¹, Gemma A. Calvert², Iiro P. Jääskeläinen¹,³, Paul M. Matthews⁴, Thomas Thesen², Jyrki Tuomainen¹,⁵, Mikko Sams¹•Institutions (5)

Helsinki University of Technology¹, University of Bath², Massachusetts Institute of Technology³, University of Oxford⁴, University of Turku⁵

01 Apr 2006-NeuroImage

TL;DR: The present findings suggest that activation of the neural speech representations in the left STSp might be a pre-requisite for hearing sounds as speech.

Journal Article•10.1159/000095616•

Acoustic plus electric speech processing: preliminary results of a multicenter clinical trial of the Iowa/Nucleus Hybrid implant.

Bruce J. Gantz¹, Christopher W. Turner, Kate Gfeller•Institutions (1)

Roy J. and Lucille A. Carver College of Medicine¹

01 Jan 2006-Audiology and Neuro-otology

TL;DR: The improvement of speech in noise and melody recognition is linked to the ability to distinguish fine pitch differences as the result of preserved residual low-frequency acoustic hearing.

Abstract: Aim: This communication details the latest preliminary results from an ongoing multicenter single-subject design clinical trial of the Iowa/Nucleus Hybrid 10-mm cochlear implant. Selection criteria, surgical strategies used for hearing preservation, and the benefits of preserved residual low-frequency hearing, improved word understanding in noise, and music appreciation are described. Patients and Methods: The device has been implanted in 48 individuals with residual low-frequency hearing. Results: Hearing preservation has been accomplished in 46/48 subjects. Acoustic speech perception has also been preserved. Combined acoustic plus electric speech processing has enabled most of this group of volunteers to gain improved word understanding as compared to their preoperative hearing with bilateral hearing aids. A subset of subjects with 12 months or more experience demonstrates CNC word understanding continues to improve more than 24 months after implantation. Improved word understanding in noise is also a benefit of acoustic plus electric speech processing. Conclusions: The improvement of speech in noise and melody recognition is linked to the ability to distinguish fine pitch differences as the result of preserved residual low-frequency acoustic hearing. Both of these measures are very important in real life to the hearing impaired. Preservation of residual low-frequency hearing should be considered when expanding candidate selection criteria for standard cochlear implants.
