Reconstructing Speech from Human Auditory Cortex
Brian N. Pasley, Stephen V. David, Nima Mesgarani, Adeen Flinker, Shihab A. Shamma, Nathan E. Crone, Robert T. Knight, Edward F. Chang
TL;DR: Direct brain recordings from neurosurgical patients listening to speech reveal that the acoustic speech signals can be reconstructed from neural activity in auditory cortex.
Abstract: How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.
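The linear reconstruction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes ridge-regularized least squares, a fixed window of time lags, and synthetic data standing in for electrode recordings and an auditory spectrogram. All function names, parameter values, and the simulated 3-sample response delay are hypothetical choices made for illustration.

```python
import numpy as np


def lagged_design(responses, n_lags):
    """Stack neural responses at lags 0..n_lags-1 following each time point.

    responses: array of shape (n_times, n_electrodes)
    returns:   array of shape (n_times, n_electrodes * n_lags)
    """
    n_times, n_elec = responses.shape
    X = np.zeros((n_times, n_elec * n_lags))
    for lag in range(n_lags):
        # Activity at time t + lag is used to predict the stimulus at time t;
        # rows near the end are zero-padded where no future sample exists.
        X[: n_times - lag, lag * n_elec:(lag + 1) * n_elec] = responses[lag:]
    return X


def fit_reconstruction(responses, spectrogram, n_lags=10, ridge=1.0):
    """Fit a linear map from lagged responses to spectrogram channels.

    The ridge penalty is an assumption of this sketch, not taken from the paper.
    """
    X = lagged_design(responses, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ spectrogram)


def reconstruct(responses, weights, n_lags=10):
    """Apply fitted weights to new responses to estimate the spectrogram."""
    return lagged_design(responses, n_lags) @ weights


def demo(seed=0):
    """Simulate data, fit on one segment, and score on a held-out segment."""
    rng = np.random.default_rng(seed)
    n_times, n_freq, n_elec, delay = 2000, 8, 16, 3

    # Synthetic "spectrogram": random energy in each frequency channel.
    spectrogram = rng.standard_normal((n_times, n_freq))

    # Synthetic "electrodes": noisy linear mixtures of the delayed stimulus.
    mixing = rng.standard_normal((n_freq, n_elec))
    responses = 0.5 * rng.standard_normal((n_times, n_elec))
    responses[delay:] += spectrogram[:-delay] @ mixing

    train, test = slice(0, 1500), slice(1500, n_times)
    weights = fit_reconstruction(responses[train], spectrogram[train])
    estimate = reconstruct(responses[test], weights)

    # Reconstruction accuracy: correlation between actual and reconstructed
    # time courses, averaged over frequency channels.
    corrs = [
        np.corrcoef(spectrogram[test][:, f], estimate[:, f])[0, 1]
        for f in range(n_freq)
    ]
    return float(np.mean(corrs))


if __name__ == "__main__":
    print("mean held-out correlation:", demo())
```

Scoring on a held-out segment keeps the accuracy estimate from being inflated by overfitting. A nonlinear variant in the spirit of the abstract would swap the spectrogram target for a modulation-energy representation, which this sketch does not attempt.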
Citations
Cortical tracking of hierarchical linguistic structures in connected speech
Nai Ding, Lucia Melloni, Hang Zhang, Xing Tian, David Poeppel
TL;DR: It is found that, during listening to connected speech, cortical activity of different timescales concurrently tracked the time course of abstract linguistic structures at different hierarchical levels, such as words, phrases and sentences.
Selective cortical representation of attended speaker in multi-talker speech perception
Nima Mesgarani, Edward F. Chang
TL;DR: It is demonstrated that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone.
Mechanisms Underlying Selective Neuronal Tracking of Attended Speech at a “Cocktail Party”
Elana Zion Golumbic, Nai Ding, Stephan Bickel, Peter Lakatos, Catherine A. Schevon, Guy M. McKhann, Robert R. Goodman, Ronald G. Emerson, Ashesh D. Mehta, Jonathan Z. Simon, David Poeppel, Charles E. Schroeder
TL;DR: It is found that brain activity dynamically tracks speech streams using both low-frequency phase and high-frequency amplitude fluctuations and that optimal encoding likely combines the two.
922
Emergence of neural encoding of auditory objects while listening to competing speakers
Nai Ding, Jonathan Z. Simon
TL;DR: Recording from subjects selectively listening to one of two competing speakers using magnetoencephalography indicates that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
883
Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG
James O’Sullivan, Alan J. Power, Nima Mesgarani, Siddharth Rajaram, John J. Foxe, Barbara G. Shinn-Cunningham, Malcolm Slaney, Shihab A. Shamma, Edmund C. Lalor
TL;DR: It is shown that single-trial unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment and a significant correlation between the EEG-based measure of attention and performance on a high-level attention task is shown.
830