TL;DR: This work reviews the cognitive neuroscience literature of both motor and auditory domains, highlighting the value of studying interactions between these systems in a musical context, and proposes some ideas concerning the role of the premotor cortex in integration of higher order features of music with appropriately timed and organized actions.
Abstract: Music performance is both a natural human activity, present in all societies, and one of the most complex and demanding cognitive challenges that the human mind can undertake. Unlike most other sensory-motor activities, music performance requires precise timing of several hierarchically organized actions, as well as precise control over pitch interval production, implemented through diverse effectors according to the instrument involved. We review the cognitive neuroscience literature of both motor and auditory domains, highlighting the value of studying interactions between these systems in a musical context, and propose some ideas concerning the role of the premotor cortex in integration of higher order features of music with appropriately timed and organized actions.
TL;DR: Watching videos of speech and music enhanced temporal and frequency encoding in the auditory brainstem, particularly in musicians, demonstrating practice-related changes in the early sensory encoding of auditory and audiovisual information.
Abstract: Musical training is known to modify cortical organization. Here, we show that such modifications extend to subcortical sensory structures and generalize to processing of speech. Musicians had earlier and larger brainstem responses than nonmusician controls to both speech and music stimuli presented in auditory and audiovisual conditions, evident as early as 10 ms after acoustic onset. Phase-locking to stimulus periodicity, which likely underlies perception of pitch, was enhanced in musicians and strongly correlated with length of musical practice. In addition, viewing videos of speech (lip-reading) and music (instrument being played) enhanced temporal and frequency encoding in the auditory brainstem, particularly in musicians. These findings demonstrate practice-related changes in the early sensory encoding of auditory and audiovisual information.
TL;DR: Current research seeks to unravel the complex interactions of pre-attentive and attentive processing of the acoustic scene, the role of auditory attention in mediating receptive-field plasticity in both auditory spatial and auditory feature processing, and the contrasts and parallels between auditory and visual attention pathways and mechanisms.
TL;DR: It is concluded that cochlear damage, or similar types of deafferentation from peripheral input, triggers reorganization in the central auditory system that produces permanent alterations in the ongoing oscillatory dynamics at the higher layers of the auditory hierarchical stream.
Abstract: Tinnitus is defined by an auditory perception in the absence of an external source of sound. This condition provides the distinctive possibility of extracting neural coding of perceptual representation. Previously, we had established that tinnitus is characterized by enhanced magnetic slow-wave activity (approximately 4 Hz) in perisylvian or putatively auditory regions. Because of works linking high-frequency oscillations to conscious sensory perception and positive symptoms in a variety of disorders, we examined gamma band activity during brief periods of marked enhancement of slow-wave activity. These periods were extracted from 5 min of resting spontaneous magnetoencephalography activity in 26 tinnitus and 21 control subjects. Results revealed the following, particularly within a frequency range of 50-60 Hz: (1) Both groups showed significant increases in gamma band activity after onset of slow waves. (2) Gamma is more prominent in tinnitus subjects than in controls. (3) Activity at approximately 55 Hz determines the laterality of the tinnitus perception. Based on present and previous results, we have concluded that cochlear damage, or similar types of deafferentation from peripheral input, triggers reorganization in the central auditory system. This produces permanent alterations in the ongoing oscillatory dynamics at the higher layers of the auditory hierarchical stream. The change results in enhanced slow-wave activity reflecting altered corticothalamic and corticolimbic interplay. Such enhancement facilitates and sustains gamma activity as a neural code of phantom perception, in this case auditory.
TL;DR: This article investigated whether producing a visual beat leads to changes in how acoustic prominence is realized in speech and in how prominence is perceived by observers, and found that visual beats have a significant effect on the perceived prominence of the target words.
TL;DR: Temporal correspondence between auditory and visual streams affects a network of both multisensory (mSTS) and sensory-specific areas in humans, including even primary visual and auditory cortex, with stronger responses for corresponding and thus related audiovisual inputs.
Abstract: The brain should integrate related but not unrelated information from different senses. Temporal patterning of inputs to different modalities may provide critical information about whether those inputs are related or not. We studied effects of temporal correspondence between auditory and visual streams on human brain activity with functional magnetic resonance imaging (fMRI). Streams of visual flashes with irregularly jittered, arrhythmic timing could appear on right or left, with or without a stream of auditory tones that coincided perfectly when present (highly unlikely by chance), were noncoincident with vision (different erratic, arrhythmic pattern with same temporal statistics), or an auditory stream appeared alone. fMRI revealed blood oxygenation level-dependent (BOLD) increases in multisensory superior temporal sulcus (mSTS), contralateral to a visual stream when coincident with an auditory stream, and BOLD decreases for noncoincidence relative to unisensory baselines. Contralateral primary visual cortex and auditory cortex were also affected by audiovisual temporal correspondence or noncorrespondence, as confirmed in individuals. Connectivity analyses indicated enhanced influence from mSTS on primary sensory areas, rather than vice versa, during audiovisual correspondence. Temporal correspondence between auditory and visual streams affects a network of both multisensory (mSTS) and sensory-specific areas in humans, including even primary visual and auditory cortex, with stronger responses for corresponding and thus related audiovisual inputs.
TL;DR: The authors investigated whether the "unity assumption," according to which an observer assumes that two different sensory signals refer to the same underlying multisensory event, influences the multisensory integration of audiovisual speech stimuli.
Abstract: We investigated whether the “unity assumption,” according to which an observer assumes that two different sensory signals refer to the same underlying multisensory event, influences the multisensory integration of audiovisual speech stimuli. Syllables (Experiments 1, 3, and 4) or words (Experiment 2) were presented to participants at a range of different stimulus onset asynchronies using the method of constant stimuli. Participants made unspeeded temporal order judgments regarding which stream (either auditory or visual) had been presented first. The auditory and visual speech stimuli in Experiments 1–3 were either gender matched (i.e., a female face presented together with a female voice) or else gender mismatched (i.e., a female face presented together with a male voice). In Experiment 4, different utterances from the same female speaker were used to generate the matched and mismatched speech video clips. In terms of the just noticeable difference, the participants in all four experiments found it easier to judge which sensory modality had been presented first when evaluating mismatched stimuli than when evaluating the matched-speech stimuli. These results therefore provide the first empirical support for the “unity assumption” in the domain of the multisensory temporal integration of audiovisual speech stimuli.
TL;DR: Emotional and neutral sounds rated for valence and arousal were used to investigate the influence of emotions on timing in reproduction and verbal estimation tasks, suggesting that both activation and attentional processes modulate the timing of emotional events.
Abstract: Emotional and neutral sounds rated for valence and arousal were used to investigate the influence of emotions on timing in reproduction and verbal estimation tasks with durations from 2 s to 6 s. Results revealed an effect of emotion on temporal judgment, with emotional stimuli judged to be longer than neutral ones for a similar arousal level. Within scalar expectancy theory (J. Gibbon, R. Church, & W. Meck, 1984), this suggests that emotion-induced activation generates an increase in pacemaker rate, leading to a longer perceived duration. A further exploration of self-assessed emotional dimensions showed an effect of valence and arousal. Negative sounds were judged to be longer than positive ones, indicating that negative stimuli generate a greater increase of activation. High-arousing stimuli were perceived to be shorter than low-arousing ones. Consistent with attentional models of timing, this seems to reflect a decrease of attention devoted to time, leading to a shorter perceived duration. These effects, robust across the 2 tasks, are limited to short intervals and overall suggest that both activation and attentional processes modulate the timing of emotional events.
TL;DR: This volume on perceiving sound sources covers human sound source identification, the role of memory in auditory perception, auditory attention and informational masking, and the effects of harmonicity and regularity on the perception of sound sources, among other topics.
Abstract: Perceiving Sound Sources.- Human Sound Source Identification.- Size Information in the Production and Perception of Communication Sounds.- The Role of Memory in Auditory Perception.- Auditory Attention and Filters.- Informational Masking.- Effects of Harmonicity and Regularity on the Perception of Sound Sources.- Spatial Hearing and Perceiving Sources.- Envelope Processing and Sound-Source Perception.- Speech as a Sound Source.- Sound Source Perception and Stream Segregation in Nonhuman Vertebrate Animals.
TL;DR: The MMN literature offers tentative support for the hypothesis that auditory temporal processing is impaired in language and literacy disorders, but the field is plagued by methodological inconsistencies, low reliability of measures, and low statistical power.
Abstract: A popular theoretical account of developmental language and literacy disorders implicates poor auditory temporal processing in their etiology, but evidence from studies using behavioral measures has yielded inconsistent results. The mismatch negativity (MMN) component of the auditory event-related potential has been recommended as an alternative, relatively objective, measure of the brain's ability to discriminate sounds that is suitable for children with limited attention or motivation. A literature search revealed 26 studies of the MMN in individuals with dyslexia or specific language impairment and 4 studies of infants or children at familial risk of these disorders. Findings were highly inconsistent. Overall, attenuation of the MMN and atypical lateralization in the clinical group were most likely to be found in studies using rapidly presented stimuli, including nonverbal sounds. The MMN literature offers tentative support for the hypothesis that auditory temporal processing is impaired in language and literacy disorders, but the field is plagued by methodological inconsistencies, low reliability of measures, and low statistical power. The article concludes with recommendations for improving this state of affairs.
TL;DR: Evidence is provided that perception of the illusory second flash is based on a very rapid dynamic interplay between auditory and visual cortical areas that is triggered by the second sound.
Abstract: When a single flash of light is presented interposed between two brief auditory stimuli separated by 60-100 ms, subjects typically report perceiving two flashes (Shams et al., 2000, 2002). We investigated the timing and localization of the cortical processes that underlie this illusory flash effect in 34 subjects by means of 64-channel recordings of event-related potentials (ERPs). A difference ERP calculated to isolate neural activity associated with the illusory second flash revealed an early modulation of visual cortex activity at 30-60 ms after the second sound, which was larger in amplitude in subjects who saw the illusory flash more frequently. These subjects also showed this early modulation in response to other combinations of auditory and visual stimuli, thus pointing to consistent individual differences in the neural connectivity that underlies cross-modal integration. The overall pattern of cortical activity associated with the cross-modally induced illusory flash, however, differed markedly from that evoked by a real second flash. A trial-by-trial analysis showed that short-latency ERP activity localized to auditory cortex and polymodal cortex of the temporal lobe, concurrent with gamma bursts in visual cortex, were associated with perception of the double-flash illusion. These results provide evidence that perception of the illusory second flash is based on a very rapid dynamic interplay between auditory and visual cortical areas that is triggered by the second sound.
TL;DR: In this article, a model of how the attentional system controls the flow of bottom-up auditory information with regard to ongoing task demands to organize goal-oriented behavior is presented.
Abstract: It has been proposed that the functional role of the mismatch negativity (MMN) generating process is to issue a call for focal attention toward any auditory change violating the preceding acoustic regularity. This paper reviews the evidence supporting such a functional role and outlines a model of how the attentional system controls the flow of bottom-up auditory information with regard to ongoing-task demands to organize goal-oriented behavior. Specifically, the data obtained in auditory-auditory and auditory-visual distraction paradigms demonstrated that the unexpected occurrence of deviant auditory stimuli or novel sounds captures attention involuntarily, as they distract current task performance. These data indicate that such a process of distraction takes place in three successive stages associated, respectively, with MMN, P3a/novelty-P3, and reorienting negativity (RON), and that the latter two are modulated by the demands of the task at hand.
TL;DR: A central capacity supplemented by modality- or code-specific storage is suggested and avenues for further research on the role of processing in central storage are pointed to.
Abstract: If working memory is limited by central capacity (e.g., the focus of attention; N. Cowan, 2001), then storage limits for information in a single modality should apply also to the simultaneous storage of information from different modalities. The authors investigated this by combining a visual-array comparison task with a novel auditory-array comparison task in 5 experiments. Participants were to remember only the visual, only the auditory (unimodal memory conditions), or both arrays (bimodal memory conditions). Experiments 1 and 2 showed significant dual-task tradeoffs for visual but not for auditory capacity. In Experiments 3-5, the authors eliminated modality-specific memory by using postperceptual masks. Dual-task costs occurred for both modalities, and the number of auditory and visual items remembered together was no more than the higher of the unimodal capacities (visual: 3-4 items). The findings suggest a central capacity supplemented by modality- or code-specific storage and point to avenues for further research on the role of processing in central storage.
TL;DR: This article selectively reviews psychophysical and computational studies of streaming and comprehensively reviews more recent neurophysiological studies that have provided important insights into the mechanisms of streaming.
Abstract: Auditory stream segregation (or streaming) is a phenomenon in which 2 or more repeating sounds differing in at least 1 acoustic attribute are perceived as 2 or more separate sound sources (i.e., streams). This article selectively reviews psychophysical and computational studies of streaming and comprehensively reviews more recent neurophysiological studies that have provided important insights into the mechanisms of streaming. On the basis of these studies, segregation of sounds is likely to occur beginning in the auditory periphery and continuing at least to primary auditory cortex for simple cues such as pure-tone frequency but at stages as high as secondary auditory cortex for more complex cues such as periodicity pitch. Attention-dependent and perception-dependent processes are likely to take place in primary or secondary auditory cortex and may also involve higher level areas outside of auditory cortex. Topographic maps of acoustic attributes, stimulus-specific suppression, and competition between representations are among the neurophysiological mechanisms that likely contribute to streaming. A framework for future research is proposed.
TL;DR: A closer inspection of the individual data indicates that the core of the literacy problem is situated at the level of higher-order phonological processing, and the results are interpreted as evidence for dysfunctional processing along the auditory-to-articulation stream that is implied in phonological processing.
TL;DR: It is demonstrated that many important features of sequential streaming can be explained relatively simply based on neural responses in the auditory cortex.
TL;DR: Synchronization of neural activity preceding self-generated actions may reflect the operation of the forward model, which acts to dampen sensations resulting from those actions. If this is true, pre-action synchrony should be related to subsequent sensory suppression.
Abstract: Objective: Synchronization of neural activity preceding self-generated actions may reflect the operation of the forward model, which acts to dampen sensations resulting from those actions. If this is true, pre-action synchrony should be related to subsequent sensory suppression. Deficits in this mechanism may be characteristic of schizophrenia and related to positive symptoms, such as auditory hallucinations. If so, schizophrenia patients should have reduced neural synchrony preceding movements, especially patients with severe hallucinations. Method: In 24 patients with schizophrenia or schizoaffective disorder and 25 healthy comparison subjects, the authors related prespeech neural synchrony to subsequent auditory cortical responsiveness to the spoken sound, compared prespeech neural synchrony in schizophrenia patients and healthy comparison subjects, and related prespeech neural synchrony to auditory hallucination severity in patients. To assess neural synchrony, phase coherence of single-trial EEG prec...
TL;DR: Experimental results that demonstrate experience-dependent plasticity in the central auditory representations of sound frequency, level and temporal sequence, as well as in the representations of binaural localization cues in both developing and adult animals are reviewed.
TL;DR: The results from two experiments utilizing a cued discrimination task demonstrate that selective attention to a single sensory modality prevents the integration of matching multisensory stimuli that is normally observed when attention is divided between sensory modalities.
Abstract: Stimuli occurring in multiple sensory modalities that are temporally synchronous or spatially coincident can be integrated together to enhance perception. Additionally, the semantic content or meaning of a stimulus can influence cross-modal interactions, improving task performance when these stimuli convey semantically congruent or matching information, but impairing performance when they contain non-matching or distracting information. Attention is one mechanism that is known to alter processing of sensory stimuli by enhancing perception of task-relevant information and suppressing perception of task-irrelevant stimuli. It is not known, however, to what extent attention to a single sensory modality can minimize the impact of stimuli in the unattended sensory modality and reduce the integration of stimuli across multiple sensory modalities. Our hypothesis was that modality-specific selective attention would limit processing of stimuli in the unattended sensory modality, resulting in a reduction of performance enhancements produced by semantically matching multisensory stimuli, and a reduction in performance decrements produced by semantically non-matching multisensory stimuli. The results from two experiments utilizing a cued discrimination task demonstrate that selective attention to a single sensory modality prevents the integration of matching multisensory stimuli that is normally observed when attention is divided between sensory modalities. Attention did not reliably alter the amount of distraction caused by non-matching multisensory stimuli on this task; however, these findings highlight a critical role for modality-specific selective attention in modulating multisensory integration.
TL;DR: Findings suggest that auditory stimulation, auditory selective attention and cross-modal effects of visual stimulation each cause transient excitatory and (surround) inhibitory modulations in the auditory cortex that could support auditory sensory memory, pre-attentive detection of sound novelty, and enhanced perception during selective attention.
TL;DR: The data clearly indicate the usefulness of Arg3.1/arc and BDNF for monitoring trauma-induced activity changes and the associated putative plasticity responses in the auditory system.
TL;DR: The results confirm the notion that some auditory system processing mechanisms are impaired in children with dyslexia and that audiovisual training can diminish these deficits.
Abstract: Reading disability is associated with phonological problems which might originate in auditory processing disorders. The aim of the present study was 2-fold: first, the perceptual skills of average-reading children and children with dyslexia were compared in a categorical perception task assessing the processing of a phonemic contrast based on voice onset time (VOT). The medial olivocochlear (MOC) system, an inhibitory pathway functioning under central control, was also explored. Secondly, we investigated whether audiovisual training focusing on voicing contrast could modify VOT sensitivity and, in parallel, induce MOC system plasticity. The results showed an altered voicing sensitivity in some children with dyslexia, and that the most severely impaired children presented the most severe reading difficulties. These deficits in VOT perception were sometimes accompanied by MOC function abnormalities, in particular a reduction in or even absence of the asymmetry in favour of the right ear found in average-reading children. Audiovisual training significantly improved reading and shifted the categorical perception curve of certain children with dyslexia towards the average-reading children's pattern of voicing sensitivity. Likewise, in certain children MOC functioning showed increased asymmetry in favour of the right ear following audiovisual training. The training-related improvements in reading score were greatest in children presenting the greatest changes in MOC lateralization. Taken together, these results confirm the notion that some auditory system processing mechanisms are impaired in children with dyslexia and that audiovisual training can diminish these deficits.
TL;DR: It is argued here that this occurs by rapid central adaptation to background odours combined with a pattern-matching system to recognise discrete sets of spatial and temporal olfactory features—an odour object.
Abstract: Object recognition is a crucial component of both visual and auditory perception. It is also critical for olfaction. Most odours are composed of 10s or 100s of volatile components, yet they are perceived as unitary perceptual events against a continually shifting olfactory background (i.e. figure-ground segregation). We argue here that this occurs by rapid central adaptation to background odours combined with a pattern-matching system to recognise discrete sets of spatial and temporal olfactory features-an odour object. We present supporting neuropsychological, learning, and developmental evidence and then describe the neural circuitry which underpins this. The vagaries of an object-recognition approach are then discussed, with emphasis on the putative importance of memory, multimodal representations, and top-down processing.
TL;DR: The results reveal that many A1 single-neuron responses closely follow the illusory percept, and A1 neurons represented the missing segment of occluded tonal foregrounds by responding to discontinuous foregrounds interrupted by intense noise as if they were responding to the complete foregrounds.
TL;DR: The MMN can serve as an index of pitch features that are differentially weighted depending on a listener's experience with lexical tones and their acoustic correlates within a particular tone space.
Abstract: Purpose: An auditory electrophysiological study was conducted to explore the influence of language experience on the saliency of dimensions underlying cortical pitch processing. Methods: Mismatch negativity (MMN) responses to Mandarin tones were recorded in Chinese and English participants (n = 10 per group) using a passive oddball paradigm. Stimuli consisted of three tones (T1: high level; T2: high rising; T3: low falling-rising). There were three oddball conditions (standard/deviant): T1/T2, T1/T3, T2/T3. In the T1/T2 and T1/T3 conditions, each tonal pair represented a contrast between a level and a contour tone; the T2/T3 condition, a contrast between two contour tones. Twenty dissimilarity matrices were created using the MMN mean amplitude measured from the Fz location for each condition per participant, and analyzed by an individual differences multidimensional scaling model. Results: Two pitch dimensions were revealed, interpretively labeled as 'height' and 'contour'. The latter was found to be more important for Chinese than English subjects. Using individual weights on the contour dimension, a discriminant function showed that 17 out of 20 participants were correctly classified into their respective language groups. Conclusions: The MMN can serve as an index of pitch features that are differentially weighted depending on a listener's experience with lexical tones and their acoustic correlates within a particular tone space.
TL;DR: The hypothesis that auditory processing is less domain-specific in autism than in typical development is tested and findings are largely consistent with perceptual theories of autism, which propose that a processing bias towards featural/low-level information characterizes the disorder.
Abstract: Neurological and behavioral findings indicate that atypical auditory processing characterizes autism. The present study tested the hypothesis that auditory processing is less domain-specific in autism than in typical development. Participants with autism and controls completed a pitch sequence discrimination task in which same/different judgments of music and/or speech stimulus pairs were made. A signal detection analysis showed no difference in pitch sensitivity across conditions in the autism group, while controls exhibited significantly poorer performance in conditions incorporating speech. The results are largely consistent with perceptual theories of autism, which propose that a processing bias towards featural/low-level information characterizes the disorder, as well as supporting the notion that such individuals exhibit selective attention to a limited number of simultaneously presented cues.
TL;DR: Functional magnetic resonance imaging in nine healthy subjects revealed activations in the same superior and inferior parietal, and posterior prefrontal areas in the auditory and visual orienting tasks when these tasks were compared with the corresponding maintenance tasks.
Abstract: We studied orienting and maintenance of spatial attention in audition and vision. Functional magnetic resonance imaging (fMRI) in nine healthy subjects revealed activations in the same superior and inferior parietal, and posterior prefrontal areas in the auditory and visual orienting tasks when these tasks were compared with the corresponding maintenance tasks. Attention-related activations in the thalamus and cerebellum were observed during the auditory orienting and maintenance tasks and during the visual orienting task. In addition to the supratemporal auditory cortices, auditory orienting and maintenance produced stronger activity than the respective visual tasks in the inferior parietal and prefrontal cortices, whereas only the occipital visual cortex and the superior parietal cortex showed stronger activity during the visual tasks than during the auditory tasks. Differences between the brain networks involved in auditory and visual spatial attention could be, for example, due to different encoding of auditory and visual spatial information or differences in stimulus-driven (bottom-up triggered) and voluntary (top-down controlled) attention between the auditory and visual modalities, or both.
TL;DR: Tone pairs were used that were perceptually bistable with regard to the direction of the pitch change between the two tones; the authors predicted that skilled pianists would hear the pitch more often as rising with an L-R key-press order than with an R-L key-press order.
Abstract: According to common-coding theory (Hommel, Müsseler, Aschersleben, & Prinz, 2001), actions are coded in terms of their perceptual effects. The related theory of internal models (e.g., Wolpert & Kawato, 1998) assumes that forward models automatically generate predictions of the sensory consequences of actions and compare them with the actual sensory input. Both theories predict not only effects of perception on action, but also effects of action on perception. Influences of action on visual perception have indeed been found (Hamilton, Wolpert, & Frith, 2004; Miall et al., 2006; Schubö, Aschersleben, & Prinz, 2001; Wühr & Müsseler, 2001). For example, Wohlschläger (2000) required participants to turn a knob or to press keys in a left-to-right (L-R) or right-to-left (R-L) order while observing perceptually bistable rotating visual displays. The direction of the manual action significantly biased the perceived direction of rotation: It made participants see the display differently. Can action also influence auditory perception? We used tone pairs (derived from research on the tritone paradox; see Deutsch, Kuyper, & Fisher, 1987) that were perceptually bistable with regard to the direction of the pitch change between the two tones. Skilled pianists were required to play these tone pairs by depressing keys on a keyboard and to judge whether the pitch went up or down. We predicted that they would hear the pitch more often as rising with an L-R key-press order than with an R-L key-press order.
TL;DR: Speech parameters were significantly related to phonological awareness and low-level auditory measures, and the risk group presented a slight but significant deficit in speech-in-noise perception, particularly in the most difficult listening condition.
TL;DR: This book describes three stages of human auditory development and describes the role of attention and other higher-level processes in early audition in infants and children.