TL;DR: In this paper, the authors proposed a method to improve the quality of visual underwater scenes using Generative Adversarial Networks (GANs), with the goal of improving input to vision-driven behaviors further down the autonomy pipeline.
Abstract: Autonomous underwater vehicles (AUVs) rely on a variety of sensors (acoustic, inertial, and visual) for intelligent decision making. Due to its non-intrusive, passive nature and high information content, vision is an attractive sensing modality, particularly at shallower depths. However, factors such as light refraction and absorption, suspended particles in the water, and color distortion affect the quality of visual data, resulting in noisy and distorted images. AUVs that rely on visual sensing thus face difficult challenges and consequently exhibit poor performance on vision-driven tasks. This paper proposes a method to improve the quality of visual underwater scenes using Generative Adversarial Networks (GANs), with the goal of improving input to vision-driven behaviors further down the autonomy pipeline. Furthermore, we show how recently proposed methods are able to generate a dataset for the purpose of such underwater image restoration. For any visually guided underwater robot, this improvement can result in increased safety and reliability through robust visual perception. To that effect, we present quantitative and qualitative data demonstrating that the proposed approach produces more visually appealing images and that the corrected images increase the accuracy of a diver tracking algorithm.
TL;DR: This work investigated the fate of weak visual stimuli in the visual and frontal cortex of awake monkeys trained to report stimulus presence and proposed a model in which stimuli become consciously reportable when they elicit a nonlinear ignition process in higher cortical areas.
Abstract: Why are some visual stimuli consciously detected, whereas others remain subliminal? We investigated the fate of weak visual stimuli in the visual and frontal cortex of awake monkeys trained to report stimulus presence. Reported stimuli were associated with strong sustained activity in the frontal cortex, and frontal activity was weaker and quickly decayed for unreported stimuli. Information about weak stimuli could be lost at successive stages en route from the visual to the frontal cortex, and these propagation failures were confirmed through microstimulation of area V1. Fluctuations in response bias and sensitivity during perception of identical stimuli were traced back to prestimulus brain-state markers. A model in which stimuli become consciously reportable when they elicit a nonlinear ignition process in higher cortical areas explained our results.
TL;DR: In this article, the authors combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis.
Abstract: Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when they were rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared with whole objects, suggesting the need for additional computations. These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. The recurrent model was able to predict which images of heavily occluded objects were easier or harder for humans to recognize, could capture the effect of introducing a backward mask on recognition behavior, and was consistent with the physiological delays along the human ventral visual stream. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.
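The idea of attractor-based pattern completion can be illustrated with a toy Hopfield network. This is a minimal sketch under stated assumptions, not the authors' model: it stores a few random ±1 patterns with a Hebbian rule and recovers one of them from a cue in which only about 15% of the units are visible (the rest are "occluded" by setting them to 0). All names, sizes, and the random data are illustrative.

```python
import numpy as np


def train_hopfield(patterns: np.ndarray) -> np.ndarray:
    """Hebbian weights for +/-1 patterns (rows of `patterns`), no self-connections."""
    n_units = patterns.shape[1]
    weights = patterns.T @ patterns / n_units
    np.fill_diagonal(weights, 0.0)
    return weights


def recall(weights: np.ndarray, cue: np.ndarray, n_steps: int = 10) -> np.ndarray:
    """Synchronous updates; occluded units start at 0 and are filled in by the dynamics."""
    state = cue.astype(float).copy()
    for _ in range(n_steps):
        state = np.where(weights @ state >= 0.0, 1.0, -1.0)
    return state


rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(3, 200))  # 3 stored "objects", 200 units each
weights = train_hopfield(patterns)

visible = rng.random(200) < 0.15                   # keep roughly 15% of the units
cue = np.where(visible, patterns[0], 0.0)          # occluded units set to 0
completed = recall(weights, cue)
print("fraction of units recovered:", np.mean(completed == patterns[0]))
```

A purely feed-forward readout of `cue` sees only the visible units; the recurrent updates let the stored pattern act as an attractor that fills in the rest, which is the intuition behind the recurrent model's robustness to occlusion.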
TL;DR: Both the group-average and individual-subject results reveal robust signals across much of the brain, including occipital, temporal, parietal, and frontal cortex as well as subcortical areas, and split-half analyses show strong within-subject reliability, further demonstrating the high quality of the data.
Abstract: About a quarter of human cerebral cortex is dedicated mainly to visual processing. The large-scale spatial organization of visual cortex can be measured with functional magnetic resonance imaging (fMRI) while subjects view spatially modulated visual stimuli, also known as "retinotopic mapping." One of the datasets collected by the Human Connectome Project involved ultrahigh-field (7 Tesla) fMRI retinotopic mapping in 181 healthy young adults (1.6-mm resolution), yielding the largest freely available collection of retinotopy data. Here, we describe the experimental paradigm and the results of model-based analysis of the fMRI data. These results provide estimates of population receptive field position and size. Our analyses include both results from individual subjects as well as results obtained by averaging fMRI time series across subjects at each cortical and subcortical location and then fitting models. Both the group-average and individual-subject results reveal robust signals across much of the brain, including occipital, temporal, parietal, and frontal cortex as well as subcortical areas. The group-average results agree well with previously published parcellations of visual areas. In addition, split-half analyses show strong within-subject reliability, further demonstrating the high quality of the data. We make publicly available the analysis results for individual subjects and the group average, as well as associated stimuli and analysis code. These resources provide an opportunity for studying fine-scale individual variability in cortical and subcortical organization and the properties of high-resolution fMRI. In addition, they provide a set of observations that can be compared with other Human Connectome Project measures acquired in these same participants.
TL;DR: While alpha oscillations are strongly associated with reductions in visual attention, they also appear to play important roles in regulating the timing and temporal resolution of perception. They are further associated with top-down control and may facilitate the transmission of predictions to visual cortex.
Abstract: A central feature of human brain activity is the alpha rhythm: a 7-13 Hz oscillation observed most notably over occipitoparietal brain regions during periods of eyes-closed rest. Alpha oscillations covary with changes in visual processing and have been associated with a broad range of neurocognitive functions. In this article, we review these associations and suggest that alpha oscillations can be thought to exhibit at least five distinct 'characters': those of the inhibitor, perceiver, predictor, communicator and stabiliser. In short, while alpha oscillations are strongly associated with reductions in visual attention, they also appear to play important roles in regulating the timing and temporal resolution of perception. Furthermore, alpha oscillations are strongly associated with top-down control and may facilitate transmission of predictions to visual cortex. This is in addition to promoting communication between frontal and posterior brain regions more generally, as well as maintaining ongoing perceptual states. We discuss why alpha oscillations might associate with such a broad range of cognitive functions and suggest ways in which these diverse associations can be studied experimentally.
TL;DR: This study develops a new BCI speller based on miniature asymmetric visual evoked potentials (aVEPs), which encodes 32 characters with a space-code division multiple access scheme and decodes EEG features with a discriminative canonical pattern matching algorithm.
Abstract: Goal: Traditional visual brain–computer interfaces (BCIs) have preferred large stimuli to attract the user's attention and elicit distinct electroencephalography (EEG) features. However, the visual stimuli are of no interest to users, as they merely serve as hidden codes behind the characters. Furthermore, strong visual stimuli can cause visual fatigue and other adverse symptoms in users. Therefore, it is imperative for visual BCIs to use small and inconspicuous visual stimuli to code characters. Methods: This study developed a new BCI speller based on miniature asymmetric visual evoked potentials (aVEPs), which encodes 32 characters with a space-code division multiple access scheme and decodes EEG features with a discriminative canonical pattern matching algorithm. Notably, the visual stimulus used in this study subtended only 0.5° of visual angle and was placed outside foveal vision on the lateral side, so it could induce only a miniature potential of about 0.5 μV in amplitude and about 16.5 dB in signal-to-noise ratio. A total of 12 subjects were recruited to use the miniature aVEP speller in both offline and online tests. Results: Information transfer rates of up to 63.33 b/min were achieved in online tests (online demo URL: https://www.youtube.com/edit?o=U&video_id=kC7btB3mvGY). Conclusion: Experimental results demonstrate the feasibility of using very small and inconspicuous visual stimuli to implement an efficient BCI system, even though the elicited EEG features are very weak. Significance: The proposed technique can broaden the range of BCIs and strengthen brain-computer communication.
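Information transfer rate (ITR) figures in bits per minute are commonly computed with the Wolpaw formula, B = log2 N + P log2 P + (1 - P) log2[(1 - P)/(N - 1)] bits per selection, multiplied by selections per minute. The abstract does not state which formula or selection time was used, so this sketch assumes the Wolpaw definition; the accuracy and timing values in the example are hypothetical and are not taken from the study.

```python
import math


def wolpaw_itr(n_targets: int, accuracy: float, seconds_per_selection: float) -> float:
    """Wolpaw information transfer rate in bits/min.

    Assumes equiprobable targets and errors spread evenly over the other
    n_targets - 1 targets.
    """
    if n_targets < 2 or not 0.0 <= accuracy <= 1.0 or seconds_per_selection <= 0:
        raise ValueError("invalid arguments")
    n, p = n_targets, accuracy
    bits = math.log2(n)
    if p > 0.0:
        bits += p * math.log2(p)  # the p*log2(p) term is 0 in the limit p -> 0
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits * 60.0 / seconds_per_selection


# Hypothetical example: 32 targets (as in the speller), perfect accuracy, 5 s per selection.
print(wolpaw_itr(32, 1.0, 5.0))  # 5 bits per selection x 12 selections/min = 60.0
```

At chance accuracy (P = 1/N) the formula gives 0 bits, and any drop in accuracy or increase in selection time lowers the rate, which is why small, weak stimuli must still be decoded accurately and quickly to reach a competitive ITR.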
TL;DR: It is found that the peak frequency of alpha oscillations decreased when visual task demands required temporal integration compared with segregation, and alpha frequency was strategically modulated immediately before and during stimulus processing, suggesting a preparatory top-down source of modulation.
Abstract: Temporal integration in visual perception is thought to occur within cycles of occipital alpha-band (8–12 Hz) oscillations. Successive stimuli may be integrated when they fall within the same alpha cycle and segregated for different alpha cycles. Consequently, the speed of alpha oscillations correlates with the temporal resolution of perception, such that lower alpha frequencies provide longer time windows for perceptual integration and higher alpha frequencies correspond to faster sampling and segregation. Can the brain’s rhythmic activity be dynamically controlled to adjust its processing speed according to different visual task demands? We recorded magnetoencephalography (MEG) while participants switched between task instructions for temporal integration and segregation, holding stimuli and task difficulty constant. We found that the peak frequency of alpha oscillations decreased when visual task demands required temporal integration compared with segregation. Alpha frequency was strategically modulated immediately before and during stimulus processing, suggesting a preparatory top-down source of modulation. Its neural generators were located in occipital and inferotemporal cortex. The frequency modulation was specific to alpha oscillations and did not occur in the delta (1–3 Hz), theta (3–7 Hz), beta (15–30 Hz), or gamma (30–50 Hz) frequency range. These results show that alpha frequency is under top-down control to increase or decrease the temporal resolution of visual perception.
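Peak alpha frequency, the quantity reported to shift between integration and segregation, is often estimated as the frequency of maximum spectral power within the alpha band. The sketch below shows that idea for a single synthetic channel, assuming a Hann-windowed, zero-padded periodogram and an 8–12 Hz search band; it is not the authors' MEG pipeline, which the abstract does not describe, and the function name and signal are illustrative.

```python
import numpy as np


def peak_alpha_frequency(signal: np.ndarray, fs: float, band=(8.0, 12.0), pad_factor: int = 4) -> float:
    """Frequency (Hz) of maximum power within `band`, from a Hann-windowed, zero-padded periodogram."""
    n = signal.size
    windowed = (signal - signal.mean()) * np.hanning(n)
    n_fft = pad_factor * n                      # zero padding gives a finer frequency grid
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[in_band][np.argmax(power[in_band])])


# Synthetic 4 s recording: a 10.5 Hz "alpha" rhythm plus white noise.
fs = 250.0
t = np.arange(int(4 * fs)) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 10.5 * t) + 0.3 * rng.standard_normal(t.size)
print(peak_alpha_frequency(x, fs))
```

Under the sampling-window account in the abstract, a lower peak frequency corresponds to a longer cycle (about 105 ms at 9.5 Hz versus about 87 ms at 11.5 Hz), and therefore to a longer window for integrating successive stimuli.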
TL;DR: This work explores a method for reconstructing visual stimuli from brain activity using a deep convolutional generative adversarial network capable of generating grayscale photos similar to the stimuli presented during two functional magnetic resonance imaging experiments.
TL;DR: This paper shows how a compact convolutional neural network (Compact-CNN), which only requires raw EEG signals for automatic feature extraction, can be used to decode signals from a 12-class SSVEP dataset without the need for user-specific calibration.
Abstract: Steady-State Visual Evoked Potentials (SSVEPs) are neural oscillations from the parietal and occipital regions of the brain that are evoked by flickering visual stimuli. SSVEPs are robust signals measurable in the electroencephalogram (EEG) and are commonly used in brain-computer interfaces (BCIs). However, methods for high-accuracy decoding of SSVEPs usually require hand-crafted approaches that leverage domain-specific knowledge of the stimulus signals, such as specific temporal frequencies in the visual stimuli and their relative spatial arrangement. When this knowledge is unavailable, such as when SSVEP signals are acquired asynchronously, such approaches tend to fail. In this paper, we show how a compact convolutional neural network (Compact-CNN), which requires only raw EEG signals for automatic feature extraction, can be used to decode signals from a 12-class SSVEP dataset without the need for any domain-specific knowledge or calibration data. We report an across-subject mean accuracy of approximately 80% (chance being 8.3%) and show that this is substantially better than current state-of-the-art hand-crafted approaches using canonical correlation analysis (CCA) and Combined-CCA. Furthermore, we analyze our Compact-CNN to examine the underlying feature representation, discovering that the deep learner extracts additional phase- and amplitude-related features associated with the structure of the dataset. We discuss how our Compact-CNN shows promise for BCI applications that allow users to freely gaze at or attend to any stimulus at any time (e.g., asynchronous BCI), and how it provides a method for analyzing SSVEP signals that might augment our understanding of basic processing in the visual cortex.
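The CCA baseline mentioned above is, in its standard form, a template match: multichannel EEG is correlated with sine and cosine references at each candidate stimulation frequency and its harmonics, and the frequency with the largest canonical correlation is chosen. The sketch below implements that standard CCA detector with NumPy (not Combined-CCA and not the Compact-CNN); the sampling rate, candidate frequencies, harmonic count, and synthetic data are assumptions for illustration. Note that it needs the stimulus frequencies as prior knowledge, which is exactly the domain knowledge the Compact-CNN avoids.

```python
import numpy as np


def max_canonical_corr(x: np.ndarray, y: np.ndarray) -> float:
    """Largest canonical correlation between the columns of x and y (rows are samples)."""
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    # The singular values of Qx^T Qy are the canonical correlations (largest first).
    return float(np.linalg.svd(qx.T @ qy, compute_uv=False)[0])


def reference_signals(freq: float, fs: float, n_samples: int, n_harmonics: int = 2) -> np.ndarray:
    """Sine/cosine references at `freq` and its harmonics, shape (n_samples, 2 * n_harmonics)."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * freq * t), np.cos(2 * np.pi * h * freq * t)]
    return np.column_stack(cols)


def classify_ssvep(eeg: np.ndarray, fs: float, freqs, n_harmonics: int = 2):
    """Return the index of the best-matching stimulus frequency and all CCA scores."""
    n_samples = eeg.shape[0]
    scores = [max_canonical_corr(eeg, reference_signals(f, fs, n_samples, n_harmonics)) for f in freqs]
    return int(np.argmax(scores)), scores


# Synthetic 2 s, 8-channel trial with a 10 Hz response (different phase per channel) plus noise.
fs, n_samples, n_channels = 256.0, 512, 8
t = np.arange(n_samples) / fs
rng = np.random.default_rng(0)
phases = rng.uniform(0, 2 * np.pi, n_channels)
eeg = np.sin(2 * np.pi * 10.0 * t[:, None] + phases) + rng.standard_normal((n_samples, n_channels))
best, scores = classify_ssvep(eeg, fs, [8.0, 9.0, 10.0, 11.0, 12.0])
print(best, np.round(scores, 2))
```

The sine and cosine pair lets CCA absorb the unknown response phase on each channel, but the method still fails if the true stimulus frequency is not among the candidates, which is the asynchronous scenario the abstract targets.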
TL;DR: The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information.
Abstract: Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized a priori through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN). In contrast, similarity of cortical responses in scene-selective areas was uniquely explained by mid- and high-level DNN features only, while an object label model did not contribute uniquely to either domain. The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information.
TL;DR: An emotion prioritization effect is discovered: for the authors' images, emotion-eliciting content attracts human attention strongly, but such advantage diminishes dramatically after initial fixation, and the proposed network outperforms the state-of-the-art on three benchmark datasets, by effectively capturing the relative importance of human attention within an image.
Abstract: Image sentiment influences visual perception. Emotion-eliciting stimuli such as happy faces and poisonous snakes are generally prioritized in human attention. However, little research has evaluated the interrelationships of image sentiment and visual saliency. In this paper, we present the first study to focus on the relation between emotional properties of an image and visual attention. We first create the EMOtional attention dataset (EMOd). It is a diverse set of emotion-eliciting images, and each image has (1) eye-tracking data collected from 16 subjects, (2) intensive image context labels including object contour, object sentiment, object semantic category, and high-level perceptual attributes such as image aesthetics and elicited emotions. We perform extensive analyses on EMOd to identify how image sentiment relates to human attention. We discover an emotion prioritization effect: for our images, emotion-eliciting content attracts human attention strongly, but such advantage diminishes dramatically after initial fixation. Aiming to model the human emotion prioritization computationally, we design a deep neural network for saliency prediction, which includes a novel subnetwork that learns the spatial and semantic context of the image scene. The proposed network outperforms the state-of-the-art on three benchmark datasets, by effectively capturing the relative importance of human attention within an image. The code, models, and dataset are available online at https://nus-sesame.top/emotionalattention/.
TL;DR: It is argued that research should focus on how color processing is adapted to the surface properties of objects in the natural environment in order to bridge the gap between the known early stages of color perception and the subjective appearance of color.
Abstract: Color has been scientifically investigated by linking color appearance to colorimetric measurements of the light that enters the eye. However, the main purpose of color perception is not to determine the properties of incident light, but to aid the visual perception of objects and materials in our environment. We review the state of the art on object colors, color constancy, and color categories to gain insight into the functional aspects of color perception. The common ground between these areas of research is that color appearance is tightly linked to the identification of objects and materials and the communication across observers. In conclusion, we argue that research should focus on how color processing is adapted to the surface properties of objects in the natural environment in order to bridge the gap between the known early stages of color perception and the subjective appearance of color.
TL;DR: In this article, the authors propose that a key mechanism is the reorganization of spatiotemporal visual fields, which transiently increases the temporal and spatial uncertainty of visual representations just before and during saccades.
Abstract: The perceptual consequences of eye movements are manifold: Each large saccade is accompanied by a drop in sensitivity to luminance-contrast, low-frequency stimuli, impacting both conscious vision and involuntary responses, including pupillary constrictions. They also produce transient distortions of space, time, and number, which cannot be attributed to the mere motion on the retinae. All these are signs that the visual system evokes active processes to predict and counteract the consequences of saccades. We propose that a key mechanism is the reorganization of spatiotemporal visual fields, which transiently increases the temporal and spatial uncertainty of visual representations just before and during saccades. On one hand, this accounts for the spatiotemporal distortions of visual perception; on the other hand, it implements a mechanism for fusing pre- and postsaccadic stimuli. This, together with the active suppression of motion signals, ensures the stability and continuity of our visual experience.
TL;DR: The authors present a group of phenomena, such as vection and sensory reweighting, that show how visual motion signals are used to maintain balance, and suggest that the list of visual risk factors for falls should take into account the relationship between visual motion perception and balance control.
Abstract: Falls are the leading cause of accidental injury and death among older adults. One in three adults over the age of 65 falls each year. As the elderly population grows, falls are becoming a major public health concern, and there is a pressing need to understand their causes thoroughly. While it is well documented that visual functions such as visual acuity, contrast sensitivity, and stereo acuity are correlated with fall risk, little attention has been paid to the relationship between falls and the ability of the visual system to perceive motion in the environment. The omission of visual motion perception from the literature is a critical gap, because motion perception is essential for maintaining balance. In the present article, we first review existing studies of visual risk factors for falls and the effect of ageing vision on falls. We then present a group of phenomena, such as vection and sensory reweighting, that provide information on how visual motion signals are used to maintain balance. We suggest that the current list of visual risk factors for falls should be expanded to take into account the relationship between visual motion perception and balance control.
TL;DR: Focusing on the functions of the dorsal stream in the auditory and language system, this work tries to reconcile the various models of Where, How and When into one coherent concept of sensorimotor integration.
TL;DR: This paper studies the problem of predicting head movement, head–eye motion, and scanpaths of viewers when they are watching 360-degree images in commodity HMDs, and designs a model to predict saliency maps for the first two and scanpaths for the last one.
Abstract: Estimating the salient areas of visual stimuli that are likely to attract viewers' visual attention is a challenging task because of the high complexity of cognitive behaviors in the brain. Many researchers have worked in this field with considerable success. Application areas ranging from computer vision and computer graphics to multimedia processing can benefit from saliency detection, since the detected saliency reflects the visual importance of different areas of the visual stimuli. 360-degree images and videos record the whole scene in the 3D world, so their resolutions are usually very high. However, when watching 360-degree stimuli, observers can only see the part of the scene in the viewport, which is presented to their eyes through a Head Mounted Display (HMD). Streaming the whole video or rendering the whole scene may therefore waste resources. If the current field of view can be predicted, streaming and rendering can focus on the scene within it. Furthermore, if salient areas in the scene can be predicted, finer processing can be applied to the visually important areas. The prediction of salient regions for traditional images and videos has been extensively studied. However, conventional saliency prediction methods are not fully adequate for 360-degree content, because 360-degree stimuli have some unique characteristics, and related studies in this area are limited. In this paper, we study the problem of predicting head movement, head–eye motion, and scanpaths of viewers when they are watching 360-degree images in commodity HMDs. Three types of data are specifically analyzed. The first is head movement data, which can be regarded as the movement of the viewport. The second is head–eye motion data, which combines the motion of the head and the movement of the eyes within the viewport. The third is the scanpath data of observers over the entire panorama, which records both position and time information. Our model is designed to predict saliency maps for the first two and scanpaths for the last one. Experimental results demonstrate the effectiveness of our model.
TL;DR: Testing the effects of noradrenaline (NE) levels on perceptual abilities and on visually evoked electroencephalography and fMRI responses showed that detection sensitivity, discrimination accuracy, and subjective visibility change in accordance with NE levels, whereas decision bias is not affected.
TL;DR: Patients with chronic dysfunction following TBI may require occupational, vestibular, cognitive, and other forms of physical therapy, and may benefit from visual rehabilitation, including reading-related oculomotor training and the prescribing of spectacles with a variety of tints and prism combinations.
Abstract: Traumatic brain injury (TBI) and its associated concussion are major causes of disability and death. All ages can be affected, but children, young adults, and the elderly are particularly susceptible. A decline in mortality has resulted in many more individuals living with a disability caused by TBI, including disabilities affecting vision. This review (1) describes the major clinical and pathological features of TBI, (2) describes the visual signs and symptoms associated with the disorder, and (3) discusses the assessment of quality of life and visual rehabilitation of the patient. Defects in primary vision, such as visual acuity and visual fields; in eye movements, including vergence, saccadic, and smooth pursuit movements; and in more complex aspects of vision involving visual perception, motion vision (‘akinopsia’), and visuo-spatial function have all been reported in TBI. Eye movement dysfunction may be an early sign of TBI. Overall, TBI can result in a variety of visual problems, with many patients exhibiting multiple visual defects in combination with a decline in overall health. Patients with chronic dysfunction following TBI may require occupational, vestibular, cognitive, and other forms of physical therapy. Such patients may also benefit from visual rehabilitation, including reading-related oculomotor training and the prescribing of spectacles with a variety of tints and prism combinations.
TL;DR: This Perspective highlights a series of influential studies over the last five decades examining the role of the posterior parietal cortex in visual perception and motor planning and integrates long-standing views of PPC functions with more recent evidence to propose a more general model framework to explain integrative sensory, motor, and cognitive functions of the PPC.
TL;DR: This work used a novel visual “roving standard” paradigm to elicit mismatch responses in humans by unexpected changes in either color or emotional expression of faces and combined computational modeling and electroencephalography to test whether visual mismatch responses reflected trial-by-trial pwPEs.
Abstract: Predictive coding (PC) posits that the brain uses a generative model to infer the environmental causes of its sensory data and uses precision-weighted prediction errors (pwPEs) to continuously update this model. While supported by much circumstantial evidence, experimental tests grounded in formal trial-by-trial predictions are rare. One partial exception is event-related potential (ERP) studies of the auditory mismatch negativity (MMN), where computational models have found signatures of pwPEs and related model-updating processes. Here, we tested this hypothesis in the visual domain, examining possible links between visual mismatch responses and pwPEs. We used a novel visual "roving standard" paradigm to elicit mismatch responses in humans (of both sexes) by unexpected changes in either color or emotional expression of faces. Using a hierarchical Bayesian model, we simulated pwPE trajectories of a Bayes-optimal observer and used these to conduct a comprehensive trial-by-trial analysis across the time × sensor space. We found significant modulation of brain activity by both color and emotion pwPEs. The scalp distribution and timing of these single-trial pwPE responses were in agreement with visual mismatch responses obtained by traditional averaging and subtraction (deviant-minus-standard) approaches. Finally, we compared the Bayesian model to a more classical change model of MMN. Model comparison revealed that trial-wise pwPEs explained the observed mismatch responses better than categorical change detection. Our results suggest that visual mismatch responses reflect trial-wise pwPEs, as postulated by PC. 
These findings go beyond classical ERP analyses of visual mismatch and illustrate the utility of computational analyses for studying automatic perceptual processes.
SIGNIFICANCE STATEMENT: Human perception is thought to rely on a predictive model of the environment that is updated via precision-weighted prediction errors (pwPEs) when events violate expectations. This "predictive coding" view is supported by studies of the auditory mismatch negativity brain potential. However, it is less well known whether visual perception of mismatch relies on similar processes. Here we combined computational modeling and electroencephalography to test whether visual mismatch responses reflected trial-by-trial pwPEs. Applying a Bayesian model to series of face stimuli that violated expectations about color or emotional expression, we found significant modulation of brain activity by both color and emotion pwPEs. A categorical change detection model performed less convincingly. Our findings support the predictive coding interpretation of visual mismatch responses.
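The notion of a precision-weighted prediction error can be made concrete with the simplest case: a Gaussian belief updated by a Gaussian observation. This is not the hierarchical Bayesian model used in the study; it is a one-level sketch, assuming known precisions (inverse variances), in which the belief moves by the prediction error scaled by the ratio of input precision to posterior precision. The function name is hypothetical.

```python
def gaussian_update(mu: float, pi_prior: float, x: float, pi_input: float):
    """One conjugate Gaussian update; precisions are inverse variances.

    Returns the posterior mean, posterior precision, and the precision-weighted
    prediction error (pwPE) that moved the mean.
    """
    pi_post = pi_prior + pi_input          # precisions add
    prediction_error = x - mu              # how surprising the input was
    pwpe = (pi_input / pi_post) * prediction_error
    return mu + pwpe, pi_post, pwpe


# The same surprise (x - mu = 1) moves the belief more when the input is more precise.
print(gaussian_update(0.0, 1.0, 1.0, 1.0))  # (0.5, 2.0, 0.5)
print(gaussian_update(0.0, 1.0, 1.0, 3.0))  # (0.75, 4.0, 0.75)
```

The posterior mean equals the precision-weighted average of prior and input, (pi_prior * mu + pi_input * x) / (pi_prior + pi_input), so the pwPE is simply the step from prior to posterior. In a trial-by-trial analysis, a regressor of this kind varies across trials even when the stimulus category (standard or deviant) does not, which is what lets it be compared with a categorical change model.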
TL;DR: A neural signature of serial dependence is demonstrated in numerosity perception emerging early in the visual processing stream even in the absence of an explicit task, which is consistent with the view that these biases smooth out noise from neural signals to establish perceptual continuity.
Abstract: Attractive serial dependence refers to an adaptive change in the representation of sensory information, whereby a current stimulus appears to be similar to a previous one. The nature of this phenomenon is controversial, however, as serial dependence could arise from biased perceptual representations or from biased traces of working memory representation at a decisional stage. Here, we demonstrated a neural signature of serial dependence in numerosity perception emerging early in the visual processing stream even in the absence of an explicit task. Furthermore, a psychophysical experiment revealed that numerosity perception is biased by a previously presented stimulus in an attractive way, not by repulsive adaptation. These results suggest that serial dependence is a perceptual phenomenon starting from early levels of visual processing and occurring independently from a decision process, which is consistent with the view that these biases smooth out noise from neural signals to establish perceptual continuity.
TL;DR: It is shown that, compared to perception, imagery decoding becomes significant later and representations at the start of imagery already overlap with later time points, which suggests that during imagery, the entire visual representation is activated at once or that there are large differences in the timing of imagery between trials.
Abstract: Visual perception and imagery rely on similar representations in the visual cortex. During perception, visual activity is characterized by distinct processing stages, but the temporal dynamics underlying imagery remain unclear. Here, we investigated the dynamics of visual imagery in human participants using magnetoencephalography. Firstly, we show that, compared to perception, imagery decoding becomes significant later and representations at the start of imagery already overlap with later time points. This suggests that during imagery, the entire visual representation is activated at once or that there are large differences in the timing of imagery between trials. Secondly, we found consistent overlap between imagery and perceptual processing around 160 ms and from 300 ms after stimulus onset. This indicates that the N170 gets reactivated during imagery and that imagery does not rely on early perceptual representations. Together, these results provide important insights for our understanding of the neural mechanisms of visual imagery.
TL;DR: Five stages of development for human V1 that start in infancy and continue across the life span are described, compared with visual and anatomical milestones, and implications for translating treatments for visual disorders that depend on neuroplasticity of V1 function are discussed.
Abstract: The primary visual cortex (V1) is the first cortical area that processes visual information. Normal development of V1 depends on binocular vision during the critical period, and age-related losses of vision are linked with neurobiological changes in V1. Animal studies have provided important details about the neurobiological mechanisms in V1 that support normal vision or are changed by visual diseases. There is very little information, however, about those neurobiological mechanisms in human V1. That lack of information has hampered the translation of biologically inspired treatments from preclinical models to effective clinical treatments. We have studied human V1 to characterize the expression of neurobiological mechanisms that regulate visual perception and neuroplasticity. We have identified five stages of development for human V1 that start in infancy and continue across the life span. Here, we describe these stages, compare them with visual and anatomical milestones, and discuss implications for translating treatments for visual disorders that depend on neuroplasticity of V1 function.
TL;DR: The rhesus macaque superior colliculus, a structure instrumental for rapid visual exploration with saccades, detects low spatial frequencies, which are the most prevalent in natural scenes, much more rapidly than high spatial frequencies.
Abstract: Visual brain areas exhibit tuning characteristics well suited for image statistics present in our natural environment. However, visual sensation is an active process, and if there are any brain areas that ought to be particularly in tune with natural scene statistics, it would be sensory-motor areas critical for guiding behavior. Here we found that the rhesus macaque superior colliculus, a structure instrumental for rapid visual exploration with saccades, detects low spatial frequencies, which are the most prevalent in natural scenes, much more rapidly than high spatial frequencies. Importantly, this accelerated detection happens independently of whether a neuron is more or less sensitive to low spatial frequencies to begin with. At the population level, the superior colliculus additionally over-represents low spatial frequencies in neural response sensitivity, even at near-foveal eccentricities. Thus, the superior colliculus possesses both temporal and response gain mechanisms for efficient gaze realignment in low-spatial-frequency-dominated natural environments.
TL;DR: It is demonstrated that mice exhibit visual selective attention, paving the way to use classic attention paradigms in mice to study the genetic and neuronal circuit mechanisms of selective attention.
TL;DR: A practical consequence of the results is that it is important to control for sex in vision research, and that findings of sex differences for cognitive measures using visually based tasks should confirm that their results cannot be explained by baseline sex differences in visual perception.
Abstract: Despite well-established sex differences for cognition, audition, and somatosensation, few studies have investigated whether there are also sex differences in visual perception. We report the results of fifteen perceptual measures (such as visual acuity, visual backward masking, contrast detection threshold or motion detection) for a cohort of over 800 participants. On six of the fifteen tests, males significantly outperformed females. On no test did females significantly outperform males. Given this heterogeneity of the sex effects, it is unlikely that the sex differences are due to any single mechanism. A practical consequence of the results is that it is important to control for sex in vision research, and that findings of sex differences for cognitive measures using visually based tasks should confirm that their results cannot be explained by baseline sex differences in visual perception.
TL;DR: The authors develop an efficient and robust computational framework to perform Bayesian model comparison of causal inference strategies, which incorporates a number of alternative assumptions about the observers, and use it to investigate whether human observers’ performance in an explicit cause attribution task and an implicit heading discrimination task can be modeled as a causal inference process.
Abstract: The precision of multisensory perception improves when cues arising from the same cause are integrated, such as visual and vestibular heading cues for an observer moving through a stationary environment. In order to determine how the cues should be processed, the brain must infer the causal relationship underlying the multisensory cues. In heading perception, however, it is unclear whether observers follow the Bayesian strategy, a simpler non-Bayesian heuristic, or even perform causal inference at all. We developed an efficient and robust computational framework to perform Bayesian model comparison of causal inference strategies, which incorporates a number of alternative assumptions about the observers. With this framework, we investigated whether human observers’ performance in an explicit cause attribution task and an implicit heading discrimination task can be modeled as a causal inference process. In the explicit causal inference task, all subjects accounted for cue disparity when reporting judgments of common cause, although not necessarily all in a Bayesian fashion. By contrast, but in agreement with previous findings, data from the heading discrimination task alone could not rule out that several of the same observers were adopting a forced-fusion strategy, whereby cues are integrated regardless of disparity. Only when we combined evidence from both tasks were we able to rule out forced fusion in the heading discrimination task. Crucially, findings were robust across a number of variants of models and analyses. Our results demonstrate that our proposed computational framework allows researchers to ask complex questions within a rigorous Bayesian framework that accounts for parameter and model uncertainty.
TL;DR: By more intimately relating methods and theories from VWM and VLTM to one another, new advances can be made that may shed light on the kinds of representational content and structure supporting human visual memory.
Abstract: The majority of research on visual memory has taken a compartmentalized approach, focusing exclusively on memory over shorter or longer durations, that is, visual working memory (VWM) or visual episodic long-term memory (VLTM), respectively. This tutorial provides a review spanning the two areas, with readers in mind who may only be familiar with one or the other. The review is divided into six sections. It starts by distinguishing VWM and VLTM from one another, in terms of how they are generally defined and their relative functions. This is followed by a review of the major theories and methods guiding VLTM and VWM research. The final section is devoted to identifying points of overlap and distinction across the two literatures to provide a synthesis that will inform future research in both fields. By more intimately relating methods and theories from VWM and VLTM to one another, new advances can be made that may shed light on the kinds of representational content and structure supporting human visual memory.