TL;DR: This study investigates spatial localization of audio-visual stimuli and finds that for severely blurred visual stimuli sound captures vision, the reverse of the usual capture of sound by vision, whereas for less blurred stimuli neither sense dominates and perception follows the mean position.
TL;DR: Building on previous findings, the knowledge that is available is reviewed, specific mechanisms for the contextual facilitation of object recognition are proposed, and important open questions are highlighted.
Abstract: We see the world in scenes, where visual objects occur in rich surroundings, often embedded in a typical context with other related objects. How does the human brain analyse and use these common associations? This article reviews the knowledge that is available, proposes specific mechanisms for the contextual facilitation of object recognition, and highlights important open questions. Although much has already been revealed about the cognitive and cortical mechanisms that subserve recognition of individual objects, surprisingly little is known about the neural underpinnings of contextual analysis and scene perception. Building on previous findings, we now have the means to address the question of how the brain integrates individual elements to construct the visual experience.
TL;DR: Functional magnetic resonance imaging shows that activity in the posterior parietal cortex is tightly correlated with the limited amount of scene information that can be stored in visual short-term memory, suggesting that the posterior parietal cortex is a key neural locus of our impoverished mental representation of the visual world.
Abstract: At any instant, our visual system allows us to perceive a rich and detailed visual world. Yet our internal, explicit representation of this visual world is extremely sparse: we can only hold in mind a minute fraction of the visual scene. These mental representations are stored in visual short-term memory (VSTM). Even though VSTM is essential for the execution of a wide array of perceptual and cognitive functions, and is supported by an extensive network of brain regions, its storage capacity is severely limited. With the use of functional magnetic resonance imaging, we show here that this capacity limit is neurally reflected in one node of this network: activity in the posterior parietal cortex is tightly correlated with the limited amount of scene information that can be stored in VSTM. These results suggest that the posterior parietal cortex is a key neural locus of our impoverished mental representation of the visual world.
TL;DR: The emulation theory of representation is developed and explored as a framework that can revealingly synthesize a wide variety of representational functions of the brain, including reasoning, theory of mind phenomena, and language.
Abstract: The emulation theory of representation is developed and explored as a framework that can revealingly synthesize a wide variety of representational functions of the brain. The framework is based on constructs from control theory (forward models) and signal processing (Kalman filters). The idea is that in addition to simply engaging with the body and environment, the brain constructs neural circuits that act as models of the body and environment. During overt sensorimotor engagement, these models are driven by efference copies in parallel with the body and environment, in order to provide expectations of the sensory feedback, and to enhance and process sensory information. These models can also be run off-line in order to produce imagery, estimate outcomes of different actions, and evaluate and develop motor plans. The framework is initially developed within the context of motor control, where it has been shown that inner models running in parallel with the body can reduce the effects of feedback delay problems. The same mechanisms can account for motor imagery as the off-line driving of the emulator via efference copies. The framework is extended to account for visual imagery as the off-line driving of an emulator of the motor-visual loop. I also show how such systems can provide for amodal spatial imagery. Perception, including visual perception, results from such models being used to form expectations of, and to interpret, sensory input. I close by briefly outlining other cognitive functions that might also be synthesized within this framework, including reasoning, theory of mind phenomena, and language.
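The forward-model and Kalman-filter constructs named in this abstract can be illustrated with a minimal scalar sketch. It is not the paper's model: the function names (`kalman_step`, `run_offline`), the one-dimensional state, and the default gains and noise variances are all assumptions chosen for illustration. The predict step plays the role of the emulator driven by an efference copy; the update step uses sensory feedback to correct it; `run_offline` drives the same model with commands only, as in imagery.

```python
def kalman_step(x_est, p_est, u, z, a=1.0, b=1.0, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its variance.
    u: efference copy of the motor command; z: noisy sensory measurement.
    a, b, q, r: assumed dynamics, control gain, process and sensor noise variances.
    """
    # Predict: the forward model (emulator) runs on the efference copy alone.
    x_pred = a * x_est + b * u
    p_pred = a * p_est * a + q
    # Update: correct the expectation with the sensory residual.
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)  # blended estimate
    p_new = (1.0 - k) * p_pred         # reduced uncertainty
    return x_new, p_new, x_pred


def run_offline(x0, commands, a=1.0, b=1.0):
    """Off-line emulation: drive the forward model with commands only, no feedback."""
    x = x0
    trajectory = []
    for u in commands:
        x = a * x + b * u
        trajectory.append(x)
    return trajectory
```

In this sketch the updated estimate always lies between the model's expectation and the measurement, weighted by their assumed reliabilities.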
TL;DR: Fixational eye movements have been studied to elucidate how the brain makes our environment visible, and recent studies have focused on determining how visible perception is encoded by neurons in various visual areas of the brain.
Abstract: Our eyes continually move even while we fix our gaze on an object. Although these fixational eye movements have a magnitude that should make them visible to us, we are unaware of them. If fixational eye movements are counteracted, our visual perception fades completely as a result of neural adaptation. So, our visual system has a built-in paradox — we must fix our gaze to inspect the minute details of our world, but if we were to fixate perfectly, the entire world would fade from view. Owing to their role in counteracting adaptation, fixational eye movements have been studied to elucidate how the brain makes our environment visible. Moreover, because we are not aware of these eye movements, they have been studied to understand the underpinnings of visual awareness. Recent studies of fixational eye movements have focused on determining how visible perception is encoded by neurons in various visual areas of the brain.
TL;DR: This work has shown how complexity may be managed and ambiguity resolved through the task-dependent, probabilistic integration of prior object knowledge with image features.
Abstract: We perceive the shapes and material properties of objects quickly and reliably despite the complexity and objective ambiguities of natural images. Typical images are highly complex because they consist of many objects embedded in background clutter. Moreover, the image features of an object are extremely variable and ambiguous owing to the effects of projection, occlusion, background clutter, and illumination. The very success of everyday vision implies neural mechanisms, yet to be understood, that discount irrelevant information and organize ambiguous or noisy local image features into objects and surfaces. Recent work in Bayesian theories of visual perception has shown how complexity may be managed and ambiguity resolved through the task-dependent, probabilistic integration of prior object knowledge with image features.
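The probabilistic integration of prior knowledge with image features described here can be sketched in its simplest form: a Gaussian prior combined with one Gaussian observation. This is a generic textbook case rather than any specific model from the reviewed work; the function name `gaussian_posterior` and the one-dimensional setting are assumptions made for illustration.

```python
def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian prior and one Gaussian observation."""
    # Weight on the observation grows as the prior becomes less certain.
    w = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + w * (obs - prior_mean)
    post_var = prior_var * obs_var / (prior_var + obs_var)
    return post_mean, post_var
```

With equal variances the estimate falls halfway between prior and observation; as the observation becomes noisier (larger `obs_var`), the estimate is pulled toward the prior, which is how ambiguous image features end up being resolved by prior knowledge in this kind of scheme.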
TL;DR: Recent findings and methods employed to uncover the functional properties of the human visual cortex are reviewed, focusing on two themes: functional specialization and hierarchical processing.
Abstract: The discovery and analysis of cortical visual areas is a major accomplishment of visual neuroscience. In the past decade the use of noninvasive functional imaging, particularly functional magnetic resonance imaging (fMRI), has dramatically increased our detailed knowledge of the functional organization of the human visual cortex and its relation to visual perception. The fMRI method offers a major advantage over other techniques applied in neuroscience by providing a large-scale neuroanatomical perspective that stems from its ability to image the entire brain essentially at once. This bird's eye view has the potential to reveal large-scale principles within the very complex plethora of visual areas. Thus, it could arrange the entire constellation of human visual areas in a unified functional organizational framework. Here we review recent findings and methods employed to uncover the functional properties of the human visual cortex focusing on two themes: functional specialization and hierarchical processing.
TL;DR: Using police officers and undergraduates as participants, the authors suggest that some associations between social groups and concepts are bidirectional and operate as visual tuning devices--producing shifts in perception and attention of a sort likely to influence decision making and behavior.
Abstract: Using police officers and undergraduates as participants, the authors investigated the influence of stereotypic associations on visual processing in 5 studies. Study 1 demonstrates that Black faces influence participants' ability to spontaneously detect degraded images of crime-relevant objects. Conversely, Studies 2-4 demonstrate that activating abstract concepts (i.e., crime and basketball) induces attentional biases toward Black male faces. Moreover, these processing biases may be related to the degree to which a social group member is physically representative of the social group (Studies 4-5). These studies, taken together, suggest that some associations between social groups and concepts are bidirectional and operate as visual tuning devices--producing shifts in perception and attention of a sort likely to influence decision making and behavior.
TL;DR: The Reverse Hierarchy Theory is extended to describe the dynamics of skill acquisition and interpret recent behavioral and electrophysiological findings.
TL;DR: The emulation theory of representation is a framework that can revealingly synthesize a wide variety of representational functions of the brain, including reasoning, theory of mind phenomena, and language.
Abstract: The emulation theory of representation is developed and explored as a framework that can revealingly synthesize a wide variety of representational functions of the brain. The framework is based on constructs from control theory (forward models) and signal processing (Kalman filters). The idea is that in addition to simply engaging with the body and environment, the brain constructs neural circuits that act as models of the body and environment. During overt sensorimotor engagement, these models are driven by efference copies in parallel with the body and environment, in order to provide expectations of the sensory feedback, and to enhance and process sensory information. These models can also be run off-line in order to produce imagery, estimate outcomes of different actions, and evaluate and develop motor plans. The framework is initially developed within the context of motor control, where it has been shown that inner models running in parallel with the body can reduce the effects of feedback delay problems. The same mechanisms can account for motor imagery as the off-line driving of the emulator via efference copies. The framework is extended to account for visual imagery as the off-line driving of an emulator of the motor-visual loop. I also show how such systems can provide for amodal spatial imagery. Perception, including visual perception, results from such models being used to form expectations of, and to interpret, sensory input. I close by briefly outlining other cognitive functions that might also be synthesized within this framework, including reasoning, theory of mind phenomena, and language.
TL;DR: Hierarchical regression analyses of speeded naming and lexical decision performance were used to estimate the unique predictive variance of phonological onset features, lexical variables, and semantic variables in visual word recognition, shedding light on recent empirical controversies in the word recognition literature.
Abstract: Speeded visual word naming and lexical decision performance are reported for 2428 words for young adults and healthy older adults. Hierarchical regression techniques were used to investigate the unique predictive variance of phonological features in the onsets, lexical variables (e.g., measures of consistency, frequency, familiarity, neighborhood size, and length), and semantic variables (e.g., imageability and semantic connectivity). The influence of most variables was highly task dependent, with the results shedding light on recent empirical controversies in the available word recognition literature. Semantic-level variables accounted for unique variance in both speeded naming and lexical decision performance, with the latter task producing the largest semantic-level effects. Discussion focuses on the utility of large-scale regression studies in providing a complementary approach to the standard factorial designs to investigate visual word recognition.
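The idea of "unique predictive variance" in a hierarchical regression can be sketched as the gain in R² when a block of predictors is added to a model that already contains the earlier blocks. The sketch below is a generic illustration, not the study's analysis: `fit_r2` and `unique_variance` are hypothetical helper names, and the solver is a plain normal-equations implementation kept dependency-free for readability.

```python
def fit_r2(columns, y):
    """R^2 of an ordinary least-squares fit of y on predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [predictor[i] for predictor in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
                 for i in range(n))
    return 1.0 - ss_res / ss_tot


def unique_variance(base_columns, added_columns, y):
    """Increase in R^2 when added_columns enter after base_columns."""
    return fit_r2(base_columns + added_columns, y) - fit_r2(base_columns, y)
```

A call such as `unique_variance([frequency], [imageability], latency)` would, under these assumptions, estimate how much variance a semantic variable explains beyond a lexical one; the variable names in that call are illustrative only.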
TL;DR: It is concluded that learning to read results in the progressive development of an inferotemporal region increasingly responsive to visual words, which is aptly named the visual word form area (VWFA).
TL;DR: Although the vast majority of activated voxels were activated during both conditions, the spatial overlap was neither complete nor uniform; the overlap was much more pronounced in frontal and parietal regions than in temporal and occipital regions, which may indicate that cognitive control processes function comparably in both imagery and perception.
TL;DR: It is suggested that pSTS/MTG is specialized for integrating different types of information both within modalities (e.g., visual form, visual motion) and across modalities (auditory and visual).
TL;DR: Three studies suggest that affect has a surprisingly physical basis, and reveal that, although evaluations activate areas of visual space, spatial positions do not activate evaluations.
Abstract: Metaphors linking spatial location and affect (e.g., feeling up or down) may have subtle, but pervasive, effects on evaluation. In three studies, participants evaluated words presented on a computer. In Study 1, evaluations of positive words were faster when words were in the up rather than the down position, whereas evaluations of negative words were faster when words were in the down rather than the up position. In Study 2, positive evaluations activated higher areas of visual space, whereas negative evaluations activated lower areas of visual space. Study 3 revealed that, although evaluations activate areas of visual space, spatial positions do not activate evaluations. The studies suggest that affect has a surprisingly physical basis.
TL;DR: Richardson and Dale investigated whether the eye movements of a speaker and a listener to a visual common ground can provide insight into a discourse, and found that speaker and listener eye movements were coupled and that the strength of this coupling positively correlated with listeners' comprehension.
Abstract: Looking To Understand: The Coupling Between Speakers’ and Listeners’ Eye Movements and its Relationship to Discourse Comprehension. Daniel C. Richardson (richardson@psych.stanford.edu), Department of Psychology, Stanford University, Stanford, CA 94305, USA. Rick Dale (rad28@cornell.edu), Department of Psychology, Cornell University, Ithaca, NY 14853, USA.
Abstract. Uniquely poised between perception and cognition, eye movements can reveal cognitive processes such as speech planning, language comprehension, memory, mental imagery and decision making. The current experiment investigates whether the eye movements of a speaker and a listener to a visual common ground can provide insight into a discourse. While their eye movements were being recorded, participants spoke extemporaneously about a TV show whose cast members they were viewing. Later, other participants listened to these speeches while their eyes were tracked. Within this naturalistic paradigm using spontaneous speech, a number of results linking eye movements to speech comprehension, speech production and memory were replicated. More importantly, a cross-recurrence analysis demonstrated that speaker and listener eye movements were coupled, and that the strength of this relationship positively correlated with listeners’ comprehension. Just as the mental state of a single person can be reflected in patterns of eye movements, the commonality of mental states that is brought about by successful communication is mirrored in a similarity between speaker and listener’s eye movements.
Introduction. Imagine standing in front of a painting, discussing it with a friend. As you talk, your eyes will scan across the image, moving approximately three times a second. They will be drawn by characteristics of the image itself, areas of contrast or detail, as well as features of the objects or people portrayed. Eye movements are driven both by properties of the visual world and processes in a person’s mind. Your gaze might also be influenced by what your friend is saying, what you say in reply, what is thought but not said, and where you agree and disagree. If this is so, what is the relationship between your eye movements and those of your friend? How is that relationship related to the flow of conversation between you? Language use often occurs within rich visual contexts such as this, and the interplay between linguistic processes and visual perception is of increasing interest to psycholinguists and vision researchers (Henderson & Ferreira, 2004). As yet, however, such processes have been limited to experiments that examine the eye movements of the speaker or the listener in isolation. Language use, more often than not, occurs within a richer social context as well. Direct eye contact between conversants plays an interesting, crucial role in coordinating a conversation (Bavelas, Coates, & Johnson, 2002), and in conveying various attitudes or social roles (Argyle & Cook, 1976). The focus of the current experiment, however, is cases such as those introduced at the outset, where conversants are not looking at each other, but at some visual scene that is the topic of the conversation. More common examples might be discussing a diagram drawn on a whiteboard, figuring out together how to do something on a computer, or talking during a movie.
Eye movement research.
Eye movements of a speaker. If a speaker is asked to describe a simple scene, they will fixate the objects in the order in which they are mentioned, around 900ms before naming them (Griffin & Bock, 2000; Meyer, Sleiderink, & Levelt, 1998). Since such pictures can be identified rapidly, it is argued that during this time speakers are not just retrieving words but selecting and planning which to use.
Eye movements of a listener. Eye movement research has shown that there is a tight interdependence between speech recognition and visual perception. Eye movements to potential referents for a word can provide evidence for a lexical item being recognized before the word is finished being spoken. The link between visual and linguistic processing can also be seen in eye movements that disambiguate syntactic structures (Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995) and anticipate the future agents of actions (Kamide, Altmann, & Haywood, 2003). Recent studies of the eye movements of a participant engaged in a conversation with another naive participant reveal a remarkable sensitivity to the referential domains established by the task, the visual context and the preceding conversation (Brown-Schmidt, Campana, & Tanenhaus, 2004). Qualitatively, eye movement research reveals a very close, time-locked integration between visual and linguistic processing (Tanenhaus, Magnuson, Dahan, & Chambers, 2000). Although fixation times are heavily modulated by context, as a very rough quantitative guide, research suggests that listeners will fixate an object around 400-800ms after the name onset.
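The cross-recurrence analysis mentioned in this entry can be sketched for categorical gaze data as follows. This is a simplified, generic version and may differ from the authors' procedure: the function name `cross_recurrence`, the coding of gaze as one label per time sample, and the lag range are assumptions. For each lag, it reports the proportion of samples in which the listener, `lag` samples later, is looking at the same target the speaker was looking at.

```python
def cross_recurrence(speaker, listener, max_lag):
    """Lag profile of gaze coupling between two categorical fixation sequences.

    speaker, listener: lists of fixated-target labels, one per time sample.
    Returns {lag: proportion of samples where listener[t + lag] == speaker[t]}.
    """
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        matches = 0
        total = 0
        for t, target in enumerate(speaker):
            s = t + lag
            if 0 <= s < len(listener):  # only count overlapping samples
                total += 1
                if listener[s] == target:
                    matches += 1
        profile[lag] = matches / total if total else 0.0
    return profile
```

A peak at a positive lag would indicate that the listener's gaze tends to follow the speaker's by that many samples; relating the height of that peak to comprehension scores is the kind of comparison the abstract describes.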
TL;DR: This investigation revealed that neuronal interactions between occipito-temporal, parietal and frontal regions are task- and stimulus-dependent and mediated by content-sensitive forward connections from early visual areas.
Abstract: Functional magnetic resonance imaging (fMRI) studies have identified category-selective regions in ventral occipito-temporal cortex that respond preferentially to faces and other objects. The extent to which these patterns of activation are modulated by bottom-up or top-down mechanisms is currently unknown. We combined fMRI and dynamic causal modelling to investigate neuronal interactions between occipito-temporal, parietal and frontal regions, during visual perception and visual imagery of faces, houses and chairs. Our results indicate that, during visual perception, category-selective patterns of activation in extrastriate cortex are mediated by content-sensitive forward connections from early visual areas. In contrast, during visual imagery, category-selective activation is mediated by content-sensitive backward connections from prefrontal cortex. Additionally, we report content-unrelated connectivity between parietal cortex and the category-selective regions, during both perception and imagery. Thus, our investigation revealed that neuronal interactions between occipito-temporal, parietal and frontal regions are task- and stimulus-dependent. Sensory representations of faces and objects are mediated by bottom-up mechanisms arising in early visual areas and top-down mechanisms arising in prefrontal cortex, during perception and imagery respectively. Additionally non-selective, top-down processes, originating in superior parietal areas, contribute to the generation of mental images, regardless of their content, and their maintenance in the 'mind's eye'.
TL;DR: It is demonstrated that perceptual learning can improve basic representations within an adult visual system that did not develop during the critical period, and induction of low-level changes might yield significant perceptual benefits that transfer to higher visual tasks.
Abstract: Practicing certain visual tasks leads, as a result of a process termed “perceptual learning,” to a significant improvement in performance. Learning is specific for basic stimulus features such as local orientation, retinal location, and eye of presentation, suggesting modification of neuronal processes at the primary visual cortex in adults. It is not known, however, whether such low-level learning affects higher-level visual tasks such as recognition. By systematic low-level training of an adult visual system malfunctioning as a result of abnormal development (leading to amblyopia) of the primary visual cortex during the “critical period,” we show here that induction of low-level changes might yield significant perceptual benefits that transfer to higher visual tasks. The training procedure resulted in a 2-fold improvement in contrast sensitivity and in letter-recognition tasks. These findings demonstrate that perceptual learning can improve basic representations within an adult visual system that did not develop during the critical period.
TL;DR: The results support the notion that binocular rivalry involves a more automatic, stimulus-driven form of visual competition than Necker cube reversal, and as a consequence, is less easily biased by selective attention.
Abstract: It is debated whether different forms of bistable perception result from common or separate neural mechanisms. Binocular rivalry involves perceptual alternations between competing monocular images, whereas ambiguous figures such as the Necker cube lead to alternations between two possible pictorial interpretations. Previous studies have shown that observers can voluntarily control the alternation rate of both rivalry and Necker cube reversal, perhaps suggesting that bistable perception results from a common mechanism of top-down selection. However, according to the biased competition model of selective attention, attention should be able to enhance the attended percept and suppress the unattended percept. Here, we investigated selective attentional modulation of dominance durations in bistable perception. Observers consistently showed much weaker selective attentional control for rivalry than for Necker cube reversal, even for rivalry displays that maximized the opportunities for feature-, object-, or space-based attentional selection. In contrast, nonselective control of alternation rate was comparably strong for both forms of bistable perception and corresponded poorly with estimates of selective attentional control. Our results support the notion that binocular rivalry involves a more automatic, stimulus-driven form of visual competition than Necker cube reversal, and as a consequence, is less easily biased by selective attention.
TL;DR: The combined influence of visual and auditory inputs upon object identification was examined using pictures and vocalizations of animals; subjects were significantly faster and more accurate at identifying targets when the picture and vocalization were matched (i.e., from the same animal) than when the target was represented in only one sensory modality.
Abstract: Multisensory object-recognition processes were investigated by examining the combined influence of visual and auditory inputs upon object identification--in this case, pictures and vocalizations of animals. Behaviorally, subjects were significantly faster and more accurate at identifying targets when the picture and vocalization were matched (i.e., from the same animal) than when the target was represented in only one sensory modality. This behavioral enhancement was accompanied by a modulation of the evoked potential in the latency range and general topographic region of the visual evoked N1 component, which is associated with early feature processing in the ventral visual stream. High-density topographic mapping and dipole modeling of this multisensory effect were consistent with generators in lateral occipito-temporal cortices, suggesting that auditory inputs were modulating processing in regions of the lateral occipital cortices. Both the timing and scalp topography of this modulation suggest that there are multisensory effects during what is considered to be a relatively early stage of visual object-recognition processes, and that this modulation occurs in regions of the visual system that have traditionally been held to be unisensory processing areas. Multisensory inputs also modulated the visual 'selection-negativity', an attention-dependent component of the evoked potential that is usually evoked when subjects selectively attend to a particular feature of a visual stimulus.
TL;DR: Bimodal syllables were identified more rapidly than auditory alone stimuli and the latency of the effect indicates that integration operates at pre‐representational stages of stimulus analysis, probably via feedback projections from visual and/or polymodal areas.
Abstract: While everyone has experienced that seeing lip movements may improve speech perception, little is known about the neural mechanisms by which audiovisual speech information is combined. Event-related potentials (ERPs) were recorded while subjects performed an auditory recognition task among four different natural syllables randomly presented in the auditory (A), visual (V) or congruent bimodal (AV) condition. We found that: (i) bimodal syllables were identified more rapidly than auditory alone stimuli; (ii) this behavioural facilitation was associated with cross-modal [AV-(A+V)] ERP effects around 120-190 ms latency, expressed mainly as a decrease of unimodal N1 generator activities in the auditory cortex. This finding provides evidence for suppressive, speech-specific audiovisual integration mechanisms, which are likely to be related to the dominance of the auditory modality for speech perception. Furthermore, the latency of the effect indicates that integration operates at pre-representational stages of stimulus analysis, probably via feedback projections from visual and/or polymodal areas.
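The cross-modal [AV-(A+V)] contrast named in this abstract compares the bimodal response with the sum of the two unimodal responses, sample by sample. A minimal sketch follows; the function name `additive_model_effect` and the representation of each ERP as a list of amplitudes on a shared time base are assumptions for illustration, not details from the study.

```python
def additive_model_effect(av, a, v):
    """Sample-wise AV - (A + V) difference between ERP waveforms.

    av, a, v: amplitude lists for the bimodal, auditory-only and visual-only
    conditions, assumed to share the same time base and electrode.
    """
    if not (len(av) == len(a) == len(v)):
        raise ValueError("waveforms must have the same number of samples")
    # Zero means the bimodal response equals the sum of the unimodal ones.
    return [av_t - (a_t + v_t) for av_t, a_t, v_t in zip(av, a, v)]
```

Any non-zero value in a given latency window marks a departure from simple additivity, which is the sense in which the abstract speaks of cross-modal effects around 120-190 ms.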
TL;DR: An original paradigm is used to show that seeing the speaker's lips enables the listener to hear better and hence to understand better, and this early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.
TL;DR: The results suggest that medial temporal cortex permits rapid categorization of the visual input, while the frontal cortex is part of a capacity-limited attentional bottleneck to conscious report.
TL;DR: Strong and potentially mechanistic links between the multiple facets of multisensory integration that contribute to the perceptual Gestalt are suggested.
Abstract: The brain integrates information from multiple sensory modalities and, through this process, generates a coherent and apparently seamless percept of the external world. Although multisensory integration typically binds information that is derived from the same event, when multisensory cues are somewhat discordant they can result in illusory percepts such as the "ventriloquism effect." These biases in stimulus localization are generally accompanied by the perceptual unification of the two stimuli. In the current study, we sought to further elucidate the relationship between localization biases, perceptual unification and measures of a participant's uncertainty in target localization (i.e., variability). Participants performed an auditory localization task in which they were also asked to report on whether they perceived the auditory and visual stimuli to be perceptually unified. The auditory and visual stimuli were delivered at a variety of spatial (0 degrees, 5 degrees, 10 degrees, 15 degrees) and temporal (200, 500, 800 ms) disparities. Localization bias and reports of perceptual unity occurred even with substantial spatial (i.e., 15 degrees) and temporal (i.e., 800 ms) disparities. Trial-by-trial comparison of these measures revealed a striking correlation: regardless of their disparity, whenever the auditory and visual stimuli were perceived as unified, they were localized at or very near the light. In contrast, when the stimuli were perceived as not unified, auditory localization was often biased away from the visual stimulus. Furthermore, localization variability was significantly less when the stimuli were perceived as unified. Intriguingly, on non-unity trials such variability increased with decreasing disparity. Together, these results suggest strong and potentially mechanistic links between the multiple facets of multisensory integration that contribute to our perceptual Gestalt.
TL;DR: These findings suggest that the perceptual representation of 3D shape involves a relatively abstract data structure that is based primarily on qualitative properties that can be reliably determined from visual information.
TL;DR: Functional MR imaging of human subjects as they performed a task that required simultaneous attention to two briefly displayed and masked targets at locations separated by distractor stimuli reveals retinotopically specific enhanced activation in striate and extrastriate visual cortical representations of the two attended stimuli and no enhancement at the intervening representation of distracting stimuli.
TL;DR: It is concluded that even in natural conditions, when many features have to be processed simultaneously, functional specialization is preserved, and suggested that each specialized area is directly responsible for the creation of a feature‐specific conscious percept (a microconsciousness).
TL;DR: Results confirm that rhythmic movement is more strongly attracted to auditory than to visual rhythms, and to the extent that this is an innate proclivity, it may have been an important factor in the evolution of music.
Abstract: People often move in synchrony with auditory rhythms (e.g., music), whereas synchronization of movement with purely visual rhythms is rare. In two experiments, this apparent attraction of movement to auditory rhythms was investigated by requiring participants to tap their index finger in synchrony with an isochronous auditory (tone) or visual (flashing light) target sequence while a distractor sequence was presented in the other modality at one of various phase relationships. The obtained asynchronies and their variability showed that auditory distractors strongly attracted participants' taps, whereas visual distractors had much weaker effects, if any. This asymmetry held regardless of the spatial congruence or relative salience of the stimuli in the two modalities. When different irregular timing patterns were imposed on target and distractor sequences, participants' taps tended to track the timing pattern of auditory distractor sequences when they were approximately in phase with visual target sequences, but not the reverse. These results confirm that rhythmic movement is more strongly attracted to auditory than to visual rhythms. To the extent that this is an innate proclivity, it may have been an important factor in the evolution of music.
TL;DR: A double dissociation between visual and spatial interference supports a fractionation of visuospatial short-term memory into separate visual and spatial components, an interpretation that could be defended against alternative explanations in terms of trade-offs in resource allocation between memory tasks and interference tasks.
Abstract: A visual short-term memory task was more strongly disrupted by visual than spatial interference, and a spatial memory task was simultaneously more strongly disrupted by spatial than visual interference. This double dissociation supports a fractionation of visuospatial short-term memory into separate visual and spatial components. In 6 experiments, this interpretation could be defended against alternative explanations in terms of trade-offs in resource allocation between memory tasks and interference tasks, in terms of an involvement of short-term consolidation and long-term memory, in terms of differential phonological-loop and central-executive involvement, and in terms of similarity-based interference.
TL;DR: The results show that during auditory speech perception, there is increased excitability of the motor system underlying speech production and that this increase is significantly correlated with activity in the posterior part of the left inferior frontal gyrus (Broca's area), which is proposed to prime the motor system in response to heard speech even when no speech output is required.
Abstract: Studies in both human and nonhuman primates indicate that motor and premotor cortical regions participate in auditory and visual perception of actions. Previous studies, using transcranial magnetic stimulation (TMS), showed that perceiving visual and auditory speech increased the excitability of the orofacial motor system during speech perception. Such studies, however, cannot tell us which brain regions mediate this effect. In this study, we used the technique of combining positron emission tomography with TMS to identify the brain regions that modulate the excitability of the motor system during speech perception. Our results show that during auditory speech perception, there is increased excitability of the motor system underlying speech production and that this increase is significantly correlated with activity in the posterior part of the left inferior frontal gyrus (Broca's area). We propose that this area "primes" the motor system in response to heard speech even when no speech output is required and, as such, operates at the interface of perception and action.