TL;DR: It is proposed that pleasure in music arises from interactions between cortical loops that enable predictions and expectancies to emerge from sound patterns and subcortical systems responsible for reward and valuation.
Abstract: Music has existed in human societies since prehistory, perhaps because it allows expression and regulation of emotion and evokes pleasure. In this review, we present findings from cognitive neuroscience that bear on the question of how we get from perception of sound patterns to pleasurable responses. First, we identify some of the auditory cortical circuits that are responsible for encoding and storing tonal patterns and discuss evidence that cortical loops between auditory and frontal cortices are important for maintaining musical information in working memory and for the recognition of structural regularities in musical patterns, which then lead to expectancies. Second, we review evidence concerning the mesolimbic striatal system and its involvement in reward, motivation, and pleasure in other domains. Recent data indicate that this dopaminergic system mediates pleasure associated with music; specifically, reward value for music can be coded by activity levels in the nucleus accumbens, whose functional connectivity with auditory and frontal areas increases as a function of increasing musical reward. We propose that pleasure in music arises from interactions between cortical loops that enable predictions and expectancies to emerge from sound patterns and subcortical systems responsible for reward and valuation.
TL;DR: The data suggest that selective attention generates a dynamically evolving model of attended auditory stimulus streams in the form of modulatory subthreshold oscillations across tonotopically organized neuronal ensembles in A1 that enhances the representation of attended stimuli.
TL;DR: The fundamental perceptual unit in hearing is the 'auditory object', which is the computational result of the auditory system's capacity to detect, extract, segregate and group spectrotemporal regularities in the acoustic environment.
Abstract: The fundamental perceptual unit in hearing is the 'auditory object'. Similar to visual objects, auditory objects are the computational result of the auditory system's capacity to detect, extract, segregate and group spectrotemporal regularities in the acoustic environment; the multitude of acoustic stimuli around us together form the auditory scene. However, unlike the visual scene, resolving the component objects within the auditory scene crucially depends on their temporal structure. Neural correlates of auditory objects are found throughout the auditory system. However, neural responses do not become correlated with a listener's perceptual reports until the level of the cortex. The roles of different neural structures and the contribution of different cognitive states to the perception of auditory objects are not yet fully understood.
TL;DR: The results suggest that, in a complex listening environment, auditory cortex can selectively encode a speech stream in a background insensitive manner, and this stable neural representation of speech provides a plausible basis for background-invariant recognition of speech.
Abstract: Speech recognition is remarkably robust to the listening background, even when the energy of background sounds strongly overlaps with that of speech. How the brain transforms the corrupted acoustic signal into a reliable neural representation suitable for speech recognition, however, remains elusive. Here, we hypothesize that this transformation is performed at the level of auditory cortex through adaptive neural encoding, and we test the hypothesis by recording, using MEG, the neural responses of human subjects listening to a narrated story. Spectrally matched stationary noise, which has maximal acoustic overlap with the speech, is mixed in at various intensity levels. Despite the severe acoustic interference caused by this noise, it is here demonstrated that low-frequency auditory cortical activity is reliably synchronized to the slow temporal modulations of speech, even when the noise is twice as strong as the speech. Such a reliable neural representation is maintained by intensity contrast gain control and by adaptive processing of temporal modulations at different time scales, corresponding to the neural δ and θ bands. Critically, the precision of this neural synchronization predicts how well a listener can recognize speech in noise, indicating that the precision of the auditory cortical representation limits the performance of speech recognition in noise. Together, these results suggest that, in a complex listening environment, auditory cortex can selectively encode a speech stream in a background insensitive manner, and this stable neural representation of speech provides a plausible basis for background-invariant recognition of speech.
TL;DR: It is shown that misophonia is a disorder that produces distinct autonomic effects not seen in typically developed individuals, and heightened ratings and skin conductance responses to auditory, but not visual stimuli, relative to a group of typically developed controls.
Abstract: Misophonia is a relatively unexplored chronic condition in which a person experiences autonomic arousal (analogous to an involuntary “fight-or-flight” response) to certain innocuous or repetitive sounds such as chewing, pen clicking and lip smacking. Misophonics report anxiety, panic and rage when exposed to trigger sounds, compromising their ability to complete everyday tasks and engage in healthy and normal social interactions. Across two experiments, we measured behavioral and physiological characteristics of the condition. Interviews (Experiment 1) with misophonics showed that the most problematic sounds are generally related to other people's behavior (pen clicking, chewing sounds). Misophonics are, however, not bothered when they produce these “trigger” sounds themselves, and some report mimicry as a coping strategy. Next (Experiment 2), we tested the hypothesis that misophonics’ subjective experiences evoke an anomalous physiological response to certain auditory stimuli. Misophonic individuals showed heightened ratings and skin conductance responses to auditory, but not visual stimuli, relative to a group of typically developed controls, supporting this general viewpoint and indicating that misophonia is a disorder that produces distinct autonomic effects not seen in typically developed individuals.
TL;DR: Findings show that measures of frequency, rise time, and duration discrimination as well as amplitude modulation and frequency modulation detection were most often impaired in individuals with dyslexia.
Abstract: A review of research that uses behavioral, electroencephalographic, and/or magnetoencephalographic methods to investigate auditory processing deficits in individuals with dyslexia is presented. Findings show that measures of frequency, rise time, and duration discrimination as well as amplitude modulation and frequency modulation detection were most often impaired in individuals with dyslexia. Findings were less consistent for intensity and gap perception. Additional factors that mediate auditory processing deficits in individuals with dyslexia and their implications are discussed.
TL;DR: The notion that tone language speakers and musically trained individuals have higher performance than English-speaking listeners for the perceptual-cognitive processing necessary for basic auditory as well as complex music perception is supported.
Abstract: Psychophysiological evidence suggests that music and language are intimately coupled such that experience/training in one domain can influence processing required in the other domain. While the influence of music on language processing is now well-documented, evidence of language-to-music effects has yet to be firmly established. Here, using a cross-sectional design, we compared the performance of musicians to that of tone-language (Cantonese) speakers on tasks of auditory pitch acuity, music perception, and general cognitive ability (e.g., fluid intelligence, working memory). While musicians demonstrated superior performance on all auditory measures, comparable perceptual enhancements were observed for Cantonese participants, relative to English-speaking nonmusicians. These results provide evidence that tone-language background is associated with higher auditory perceptual performance for music listening. Musicians and Cantonese speakers also showed superior working memory capacity relative to nonmusician controls, suggesting that in addition to basic perceptual enhancements, tone-language background and music training might also be associated with enhanced general cognitive abilities. Our findings support the notion that tone language speakers and musically trained individuals have higher performance than English-speaking listeners for the perceptual-cognitive processing necessary for basic auditory as well as complex music perception. These results illustrate bidirectional influences between the domains of music and language.
TL;DR: Evidence is provided that the auditory system summarizes the temporal details of sounds using time-averaged statistics. Once sounds are of moderate length, the brain's representation is limited to these statistics, which, for different examples of the same texture, converge to the same values with increasing duration.
Abstract: Sensory signals are transduced at high resolution, but their structure must be stored in a more compact format. Here we provide evidence that the auditory system summarizes the temporal details of sounds using time-averaged statistics. We measured discrimination of 'sound textures' that were characterized by particular statistical properties, as normally result from the superposition of many acoustic features in auditory scenes. When listeners discriminated examples of different textures, performance improved with excerpt duration. In contrast, when listeners discriminated different examples of the same texture, performance declined with duration, a paradoxical result given that the information available for discrimination grows with duration. These results indicate that once these sounds are of moderate length, the brain's representation is limited to time-averaged statistics, which, for different examples of the same texture, converge to the same values with increasing duration. Such statistical representations produce good categorical discrimination, but limit the ability to discern temporal detail.
TL;DR: The influence of top-down cognitive control on 2 putatively distinct forms of auditory distraction was investigated. Attentional capture by an auditory deviation was abolished when focal-task engagement was promoted, either by increasing the difficulty of encoding the visual to-be-remembered stimuli (Experiments 1 and 2) or by providing foreknowledge of an imminent deviation (Experiment 2), whereas the changing-state effect was not modulated by either manipulation (Experiment 3).
Abstract: The influence of top-down cognitive control on 2 putatively distinct forms of distraction was investigated. Attentional capture by a task-irrelevant auditory deviation (e.g., a female-spoken token following a sequence of male-spoken tokens)—as indexed by its disruption of a visually presented recall task—was abolished when focal-task engagement was promoted either by increasing the difficulty of encoding the visual to-be-remembered stimuli (by reducing their perceptual discriminability; Experiments 1 and 2) or by providing foreknowledge of an imminent deviation (Experiment 2). In contrast, distraction from continuously changing auditory stimuli (“changing-state effect”) was not modulated by task-difficulty or foreknowledge (Experiment 3). We also confirmed that individual differences in working memory capacity—typically associated with maintaining task-engagement in the face of distraction—predict the magnitude of the deviation effect, but not the changing-state effect. This convergence of experimental and psychometric data strongly supports a duplex-mechanism account of auditory distraction: Auditory attentional capture (deviation effect) is open to top-down cognitive control, whereas auditory distraction caused by direct conflict between the sound and focal-task processing (changing-state effect) is relatively immune to such control.
TL;DR: This paper showed that language-specific tone preference emerges even earlier for lexical tones, at 4 to 9 months of age, compared to 6 and 12 months for vowels and consonants.
TL;DR: It is shown, in the context of a dual-pathway model, that internal simulation shapes perception in a context-dependent manner.
Abstract: The computational role of efference copies is widely appreciated in action and perception research, but their properties for speech processing remain murky. We tested the functional specificity of auditory efference copies using magnetoencephalography recordings in an unconventional pairing: We used a classical cognitive manipulation (mental imagery) to elicit internal simulation and estimation with a well-established experimental paradigm (one-shot repetition) to assess neuronal specificity. Participants performed tasks that differentially implicated internal prediction of sensory consequences (overt speaking, imagined speaking, and imagined hearing), and their modulatory effects on the perception of an auditory syllable probe were assessed. Remarkably, the neural responses to overt syllable probes vary systematically, both in terms of directionality (suppression, enhancement) and temporal dynamics (early, late), as a function of the preceding covert mental imagery adaptor. We show, in the context of a dual-pathway model, that internal simulation shapes perception in a context-dependent manner.
TL;DR: It is found that music synchronizes brain responses across listeners in bilateral auditory midbrain and thalamus, primary auditory and auditory association cortex, right‐lateralized structures in frontal and parietal cortex, and motor planning regions of the brain.
Abstract: Music is a cultural universal and a rich part of the human experience. However, little is known about common brain systems that support the processing and integration of extended, naturalistic ‘real-world’ music stimuli. We examined this question by presenting extended excerpts of symphonic music, and two pseudomusical stimuli in which the temporal and spectral structure of the Natural Music condition were disrupted, to non-musician participants undergoing functional brain imaging and analysing synchronized spatiotemporal activity patterns between listeners. We found that music synchronizes brain responses across listeners in bilateral auditory midbrain and thalamus, primary auditory and auditory association cortex, right-lateralized structures in frontal and parietal cortex, and motor planning regions of the brain. These effects were greater for natural music compared to the pseudo-musical control conditions. Remarkably, inter-subject synchronization in the inferior colliculus and medial geniculate nucleus was also greater for the natural music condition, indicating that synchronization at these early stages of auditory processing is not simply driven by spectro-temporal features of the stimulus. Increased synchronization during music listening was also evident in a right-hemisphere fronto-parietal attention network and bilateral cortical regions involved in motor planning. While these brain structures have previously been implicated in various aspects of musical processing, our results are the first to show that these regions track structural elements of a musical stimulus over extended time periods lasting minutes. Our results show that a hierarchical distributed network is synchronized between individuals during the processing of extended musical sequences, and provide new insight into the temporal integration of complex and biologically salient auditory sequences.
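The idea of synchronized activity between listeners can be illustrated with a small sketch. This is not the authors' analysis pipeline: it assumes one common way of quantifying inter-subject synchronization, the mean pairwise Pearson correlation of regional time courses across listeners. The function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length time courses."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_pairwise_isc(time_courses):
    """Average the correlation over every pair of listeners.

    time_courses: one list per listener, all sampled from the same
    brain region while hearing the same stimulus (assumed layout).
    """
    pairs = list(combinations(time_courses, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy example: three listeners whose responses rise and fall together.
listeners = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.0, 1.0, 2.0, 3.0],
]
isc = mean_pairwise_isc(listeners)
```

Under this assumed measure, a value near 1 means the region follows the stimulus in the same way across listeners, and a value near 0 means responses are idiosyncratic. Comparing the value between natural and scrambled stimuli would mirror the contrast described in the abstract.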
TL;DR: Tapping performance was related to reading, attention, and backward masking, which motivates future research investigating whether beat synchronization training can improve not only reading ability, but potentially executive function and auditory processing as well.
TL;DR: A population of neurons in the zebra finch auditory cortex are described that represent vocalizations with a sparse code and that maintain their vocalization-like firing patterns in levels of background sound that permit behavioral recognition.
TL;DR: Multisensory integration of somatosensory, visual, auditory, and olfactory stimuli by the migraine brain may be an important concept for understanding migraine.
Abstract: Purpose of review Migraine attacks consist of head pain and hypersensitivities to somatosensory, visual, auditory, and olfactory stimuli. Investigating how the migraine brain simultaneously processes and responds to multiple incoming stimuli may yield insights into migraine pathophysiology and migraine symptoms. Recent findings The presence and intensity of hypersensitivity to one stimulus type are positively associated with the presence and intensity of hypersensitivities to other stimuli and to headache intensity. Furthermore, exposure to visual, auditory, and olfactory stimuli can trigger migraine attacks. These relationships suggest a role for multisensory integration in migraine. Summary Multisensory integration of somatosensory, visual, auditory, and olfactory stimuli by the migraine brain may be an important concept for understanding migraine.
TL;DR: Current evidence suggests that music-based rehabilitation can be effective in many developmental, psychiatric, and neurological disorders, such as autism, depression, schizophrenia, and stroke, as well as in many chronic somatic illnesses that cause pain and anxiety.
Abstract: Music is a highly versatile form of art and communication that has been an essential part of human society since its early days. Neuroimaging studies indicate that music is a powerful stimulus also for the human brain, engaging not just the auditory cortex but also a vast, bilateral network of temporal, frontal, parietal, cerebellar, and limbic brain areas that govern auditory perception, syntactic and semantic processing, attention and memory, emotion and mood control, and motor skills. Studies of amusia, a severe form of musical impairment, highlight the right temporal and frontal cortices as the core neural substrates for adequate perception and production of music. Many of the basic auditory and musical skills, such as pitch and timbre perception, start developing already in utero, and babies are born with a natural preference for music and singing. Music has many important roles and functions throughout life, ranging from emotional self-regulation, mood enhancement, and identity formation to promoting the development of verbal, motor, cognitive, and social skills and maintaining their healthy functioning in old age. Music is also used clinically as a part of treatment in many illnesses, which involve affective, attention, memory, communication, or motor deficits. Although more research is still needed, current evidence suggests that music-based rehabilitation can be effective in many developmental, psychiatric, and neurological disorders, such as autism, depression, schizophrenia, and stroke, as well as in many chronic somatic illnesses that cause pain and anxiety. WIREs Cogn Sci 2013, 4:441-451. doi: 10.1002/wcs.1237 The authors have declared no conflicts of interest for this article.
TL;DR: Findings show that disruptions within, but not outside, the articulatory motor cortex impair automatic auditory discrimination of speech sounds, providing evidence for the importance of auditory-motor processes in efficient neural analysis of speech sounds.
Abstract: The motor regions that control movements of the articulators activate during listening to speech and contribute to performance in demanding speech recognition and discrimination tasks. Whether the articulatory motor cortex modulates auditory processing of speech sounds is unknown. Here, we aimed to determine whether the articulatory motor cortex affects the auditory mechanisms underlying discrimination of speech sounds in the absence of demanding speech tasks. Using electroencephalography, we recorded responses to changes in sound sequences, while participants watched a silent video. We also disrupted the lip or the hand representation in left motor cortex using transcranial magnetic stimulation. Disruption of the lip representation suppressed responses to changes in speech sounds, but not piano tones. In contrast, disruption of the hand representation had no effect on responses to changes in speech sounds. These findings show that disruptions within, but not outside, the articulatory motor cortex impair automatic auditory discrimination of speech sounds. The findings provide evidence for the importance of auditory-motor processes in efficient neural analysis of speech sounds.
TL;DR: To investigate whether primary auditory cortex is able to tune into attended frequency channels and can switch channels on demand, high-resolution fMRI was used to map the fine-scale frequency-tuning of primary auditory areas A1 and R in six human participants.
Abstract: Cocktail parties, busy streets, and other noisy environments pose a difficult challenge to the auditory system: how to focus attention on selected sounds while ignoring others? Neurons of primary auditory cortex, many of which are sharply tuned to sound frequency, could help solve this problem by filtering selected sound information based on frequency-content. To investigate whether this occurs, we used high-resolution fMRI at 7 tesla to map the fine-scale frequency-tuning (1.5 mm isotropic resolution) of primary auditory areas A1 and R in six human participants. Then, in a selective attention experiment, participants heard low (250 Hz)- and high (4000 Hz)-frequency streams of tones presented at the same time (dual-stream) and were instructed to focus attention onto one stream versus the other, switching back and forth every 30 s. Attention to low-frequency tones enhanced neural responses within low-frequency-tuned voxels relative to high, and when attention switched the pattern quickly reversed. Thus, like a radio, human primary auditory cortex is able to tune into attended frequency channels and can switch channels on demand.
TL;DR: It is proposed that beyond the well known cortical tonotopic organization, multipeaked spectral tuning amplifies selected combinations of frequency bands to serve to detect behaviorally relevant and complex sound features, aid in segregating auditory scenes, and explain prominent perceptual phenomena such as octave invariance.
Abstract: We examine the mechanisms by which the human auditory cortex processes the frequency content of natural sounds. Through mathematical modeling of ultra-high field (7 T) functional magnetic resonance imaging responses to natural sounds, we derive frequency-tuning curves of cortical neuronal populations. With a data-driven analysis, we divide the auditory cortex into five spatially distributed clusters, each characterized by a spectral tuning profile. Beyond neuronal populations with simple single-peaked spectral tuning (grouped into two clusters), we observe that ∼60% of auditory populations are sensitive to multiple frequency bands. Specifically, we observe sensitivity to multiple frequency bands (1) at exactly one octave distance from each other, (2) at multiple harmonically related frequency intervals, and (3) with no apparent relationship to each other. We propose that beyond the well known cortical tonotopic organization, multipeaked spectral tuning amplifies selected combinations of frequency bands. Such selective amplification might serve to detect behaviorally relevant and complex sound features, aid in segregating auditory scenes, and explain prominent perceptual phenomena such as octave invariance.
TL;DR: The three experiments reported here demonstrated a cross-modal influence of an auditory rhythm on the temporal allocation of visual attention and support a general entrainment perspective on attention to events over time.
Abstract: The three experiments reported here demonstrated a cross-modal influence of an auditory rhythm on the temporal allocation of visual attention. In Experiment 1, participants moved their eyes to a test dot with a temporal onset that was either synchronous or asynchronous with a preceding auditory rhythm. Saccadic latencies were faster for the synchronous condition than for the asynchronous conditions. In Experiment 2, the effect was replicated in a condition in which the auditory context stopped prior to the onset of the test dot, and the effect did not occur in a condition in which auditory tones were presented at irregular intervals. Experiment 3 replicated the effect using an accuracy measure within a nontimed visual task. Together, the experiments’ findings support a general entrainment perspective on attention to events over time.
TL;DR: This proof-of-concept study shows that the gait of older adults may be manipulated using auditory stimuli; future work will investigate which structures of auditory stimuli lead to improvements in functional status in older adults.
Abstract: Gait variability in the context of a deterministic dynamical system may be quantified using nonlinear time series analyses that characterize the complexity of the system. Pathological gait exhibits altered gait variability. It can be either too periodic and predictable, or too random and disordered, as is the case with aging. While gait therapies often focus on restoration of linear measures such as gait speed or stride length, we propose that the goal of gait therapy should be to restore optimal gait variability, which exhibits chaotic fluctuations and is the balance between predictability and complexity. In this context, our purpose was to investigate how listening to different auditory stimuli affects gait variability. Twenty-seven young and 27 elderly subjects walked on a treadmill for 5 min while listening to white noise, a chaotic rhythm, a metronome, and with no auditory stimulus. Stride length, step width, and stride intervals were calculated for all conditions. Detrended Fluctuation Analysis was then performed on these time series. A quadratic trend analysis determined that an idealized inverted-U shape described the relationship between gait variability and the structure of the auditory stimuli for the elderly group, but not for the young group. This proof-of-concept study shows that the gait of older adults may be manipulated using auditory stimuli. Future work will investigate which structures of auditory stimuli lead to improvements in functional status in older adults.
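Detrended Fluctuation Analysis, as named in the abstract, can be sketched as follows. This is a minimal first-order DFA in plain Python, not the authors' implementation. The window sizes in `scales` and the synthetic test series are assumptions made for illustration, and the abstract does not state which parameters the study used.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_alpha(series, scales=(8, 16, 32, 64, 128)):
    """Estimate the DFA scaling exponent alpha of a time series."""
    # Step 1: integrate the mean-centred series to obtain the profile.
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)

    log_n, log_f = [], []
    for n in scales:
        n_windows = len(profile) // n
        if n_windows < 2:
            continue  # too few windows for a stable estimate
        t = list(range(n))
        sq_resid = 0.0
        # Step 2: remove a linear trend from each non-overlapping window.
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = linear_fit(t, segment)
            sq_resid += sum((y - (slope * x + intercept)) ** 2
                            for x, y in zip(t, segment))
        # Step 3: root-mean-square fluctuation at this window size.
        fluctuation = math.sqrt(sq_resid / (n_windows * n))
        log_n.append(math.log(n))
        log_f.append(math.log(fluctuation))

    # Step 4: alpha is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_n, log_f)
    return alpha


# Synthetic check (hypothetical data, not gait recordings).
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
random_walk, position = [], 0.0
for step in white_noise:
    position += step
    random_walk.append(position)

alpha_white = dfa_alpha(white_noise)
alpha_walk = dfa_alpha(random_walk)
```

For uncorrelated noise alpha is expected near 0.5, and for a random walk near 1.5. Applied to stride intervals, the exponent would summarize how structured the stride-to-stride fluctuations are under each auditory condition.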
TL;DR: It is concluded that beat gestures are integrated with speech early in time and modulate sensory/phonological levels of processing. The results support the possible role of beats as a highlighter that helps the listener direct the focus of attention to important information and modulates the parsing of the speech stream.
TL;DR: The aim of the study was to statistically integrate studies investigating language lateralization in schizophrenia patients using dichotic listening. Effect sizes suggest that reduced language lateralization is a weak trait marker for schizophrenia as such and a strong trait marker for the experience of auditory hallucinations within the schizophrenia population.
Abstract: Reduced left-hemispheric language lateralization has been proposed to be a trait marker for schizophrenia, but the empirical evidence is ambiguous. Recent studies suggest that auditory hallucinations are critical for whether a patient shows reduced language lateralization. Therefore, the aim of the study was to statistically integrate studies investigating language lateralization in schizophrenia patients using dichotic listening. To this end, two meta-analyses were conducted, one comparing schizophrenia patients with healthy controls (n = 1407), the other comparing schizophrenia patients experiencing auditory hallucinations with non-hallucinating controls (n = 407). Schizophrenia patients showed weaker language lateralization than healthy controls but the effect size was small (g = -0.26). When patients with auditory hallucinations were compared to non-hallucinating controls, the effect size was substantially larger (g = -0.45). These effect sizes suggest that reduced language lateralization is a weak trait marker for schizophrenia as such and a strong trait marker for the experience of auditory hallucinations within the schizophrenia population.
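The effect sizes above are reported as g. As a hedged illustration, the sketch below computes the standard Hedges' g for two independent groups: Cohen's d with a pooled standard deviation, multiplied by the usual small-sample correction. The abstract does not say which estimator the meta-analyses used, and the group means, standard deviations, and sample sizes in the example are made up.

```python
import math


def hedges_g(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Standardized mean difference with small-sample correction."""
    df = n_1 + n_2 - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    cohens_d = (mean_1 - mean_2) / pooled_sd
    # Common approximation to the Hedges correction factor.
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return cohens_d * correction


# Hypothetical laterality scores: patients (group 1) vs controls (group 2).
g = hedges_g(mean_1=10.0, mean_2=12.0, sd_1=4.0, sd_2=4.0, n_1=50, n_2=50)
```

A negative g here means the first group scores lower than the second, which is the direction of the reported effects (weaker lateralization in patients than in controls).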
TL;DR: Until greater consensus is reached, any diagnosis of (C)APD should be qualified by an explicit statement of the criteria used, and calls to abandon the use of (C)APD as a global label should also be supported.
Abstract: Purpose To quantify how 9 different diagnostic criteria affected potential (central) auditory processing disorder ([C]APD) diagnoses in a large sample of children referred for (central) auditory pr...
TL;DR: Sustained oscillatory patterns associated with voluntary engagement of auditory spatial attention are revealed, with the frontoparietal and temporal gamma increases being best predictors of subsequent behavioral performance.
Abstract: In everyday listening situations, we need to constantly switch between alternative sound sources and engage attention according to cues that match our goals and expectations. The exact neuronal bases of these processes are poorly understood. We investigated oscillatory brain networks controlling auditory attention using cortically constrained fMRI-weighted magnetoencephalography/EEG source estimates. During consecutive trials, participants were instructed to shift attention based on a cue, presented in the ear where a target was likely to follow. To promote audiospatial attention effects, the targets were embedded in streams of dichotically presented standard tones. Occasionally, an unexpected novel sound occurred opposite to the cued ear to trigger involuntary orienting. According to our cortical power correlation analyses, increased frontoparietal/temporal 30–100 Hz gamma activity at 200–1400 msec after cued orienting predicted fast and accurate discrimination of subsequent targets. This sustained corre...
TL;DR: Evidence is found that the temporal modulation transfer function (TMTF) of human auditory perception is not simply low-pass in nature, but rather exhibits a peak in sensitivity in the syllabic range (∼2-5 Hz).
TL;DR: Results show that significant compensation among totally blind listeners for virtual auditory spatial distance leads to benefits across a range of simulated acoustic environments, and no significant differences in performance were observed between listeners with partial non-correctable visual losses and sighted controls.
Abstract: Totally blind listeners often demonstrate better than normal capabilities when performing spatial hearing tasks. Accurate representation of three-dimensional auditory space requires the processing of available distance information between the listener and the sound source; however, auditory distance cues vary greatly depending upon the acoustic properties of the environment, and it is not known which distance cues are important to totally blind listeners. Our data show that totally blind listeners display better performance compared to sighted age-matched controls for distance discrimination tasks in anechoic and reverberant virtual rooms simulated using a room-image procedure. Totally blind listeners use two major auditory distance cues to stationary sound sources, level and direct-to-reverberant ratio, more effectively than sighted controls for many of the virtual distances tested. These results show that significant compensation among totally blind listeners for virtual auditory spatial distance leads to benefits across a range of simulated acoustic environments. No significant differences in performance were observed between listeners with partial non-correctable visual losses and sighted controls, suggesting that sensory compensation for virtual distance does not occur for listeners with partial vision loss.
TL;DR: In about a quarter of the world's languages, grammatical evidentials express means of perception. In some languages verbs of vision subsume cognitive meanings, while in others cognition is associated with a verb of auditory perception, touch, or smell.
Abstract: Every language has a way of talking about seeing, hearing, smelling, tasting and touching. In about a quarter of the world's languages, grammatical evidentials express means of perception. In some languages verbs of vision subsume cognitive meanings. In others, cognition is associated with a verb of auditory perception, touch, or smell. 'Vision' is not the universally preferred means of perception. In numerous cultures, taboos are associated with forbidden visual experience. Vision may be considered intrusive and aggressive, and linked with power. In contrast, 'hearing' and 'listening'.
TL;DR: The temporal dynamics of the dorsal auditory pathway described here offer a new understanding of its functional organization and demonstrate that temporal information is essential to resolve neural circuits underlying complex behaviors.
Abstract: Neuroanatomical models hypothesize a role for the dorsal auditory pathway in phonological processing as a feedforward efferent system (Davis and Johnsrude, 2007; Rauschecker and Scott, 2009; Hickok et al., 2011). But the functional organization of the pathway, in terms of time course of interactions between auditory, somatosensory, and motor regions, and the hemispheric lateralization pattern is largely unknown. Here, ambiguous duplex syllables, with elements presented dichotically at varying interaural asynchronies, were used to parametrically modulate phonological processing and associated neural activity in the human dorsal auditory stream. Subjects performed syllable and chirp identification tasks, while event-related potentials and functional magnetic resonance images were concurrently collected. Joint independent component analysis was applied to fuse the neuroimaging data and study the neural dynamics of brain regions involved in phonological processing with high spatiotemporal resolution. Results revealed a highly interactive neural network associated with phonological processing, composed of functional fields in posterior superior temporal gyrus (pSTG), inferior parietal lobule (IPL), and ventral central sulcus (vCS) that were engaged early and almost simultaneously (at 80–100 ms), consistent with a direct influence of articulatory somatomotor areas on phonemic perception. Left hemispheric lateralization was observed 250 ms earlier in IPL and vCS than pSTG, suggesting that functional specialization of somatomotor (and not auditory) areas determined lateralization in the dorsal auditory pathway. The temporal dynamics of the dorsal auditory pathway described here offer a new understanding of its functional organization and demonstrate that temporal information is essential to resolve neural circuits underlying complex behaviors.