TL;DR: This review discusses evidence from studies of different varieties of selective attention and examines how these varieties alter the processing of stimuli by neurons within the visual system, current knowledge of their causal basis, and methods for assessing attentional dysfunctions.
Abstract: Selective visual attention describes the tendency of visual processing to be confined largely to stimuli that are relevant to behavior. It is among the most fundamental of cognitive functions, particularly in humans and other primates for whom vision is the dominant sense. We review recent progress in identifying the neural mechanisms of selective visual attention. We discuss evidence from studies of different varieties of selective attention and examine how these varieties alter the processing of stimuli by neurons within the visual system, current knowledge of their causal basis, and methods for assessing attentional dysfunctions. In addition, we identify some key questions that remain in identifying the neural mechanisms that give rise to the selective processing of visual information.
TL;DR: Zhang et al. designed three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learned a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images.
TL;DR: It is found that post-perceptual decisions about orientation were indeed systematically biased toward previous stimuli and this positive bias did not strongly depend on the spatial location of previous stimuli (replicating previous work), but observers' perception was repelled away from previous stimuli, particularly when previous stimuli were presented at the same spatial location.
TL;DR: Challenging the uni-causal and left-lateralized phonological explanation of dyslexia, the results demonstrate that learning to read depends also on an efficient right neural network for the global analysis of the visual scene.
Abstract: Individuals perceive the wor(l)d hierarchically. Firstly, the global visual scene is processed by the right hemisphere, and later, the local features are perceived by the left hemisphere. Based on this hierarchical analysis, humans evolved a unique communication ability: reading. However, for about 10% of people reading acquisition is extremely difficult; they are affected by a heritable neurodevelopmental disorder called dyslexia. Differences in perceiving the wor(l)d might be one of the causes of reading disabilities. Here we show multiple causal links between the global before local perception and learning to read. Five behavioral experiments in 353 children reveal that: (i) a local before global perception characterizes three independent groups of unselected children with dyslexia; (ii) two global before local perception trainings improve reading skills in children with dyslexia; and stringently (iii) pre-reading local before global perception longitudinally predicts future poor readers. Challenging the uni-causal and left-lateralized phonological explanation of dyslexia, our results demonstrate that learning to read depends also on an efficient right neural network for the global analysis of the visual scene. These results provide new insights in learning strategies and pave the way for early identification and possible prevention programs.
TL;DR: This paper proposes a novel convolutional neural network (CNN) based FR-IQA model, named Deep Image Quality Assessment (DeepQA), in which the behavior of the HVS is learned from the underlying data distribution of IQA databases and which achieves state-of-the-art prediction accuracy among FR-IQA models.
Abstract: Since human observers are the ultimate receivers of digital images, image quality metrics should be designed from a human-oriented perspective. Conventionally, a number of full-reference image quality assessment (FR-IQA) methods adopted various computational models of the human visual system (HVS) from psychological vision science research. In this paper, we propose a novel convolutional neural networks (CNN) based FR-IQA model, named Deep Image Quality Assessment (DeepQA), where the behavior of the HVS is learned from the underlying data distribution of IQA databases. Different from previous studies, our model seeks the optimal visual weight based on understanding of database information itself without any prior knowledge of the HVS. Through the experiments, we show that the predicted visual sensitivity maps agree with the human subjective opinions. In addition, DeepQA achieves the state-of-the-art prediction accuracy among FR-IQA models.
TL;DR: It is shown that the more similar the neural response during imagery is to the neural response during perception, the more vivid or perception-like the imagery experience is.
Abstract: Research into the neural correlates of individual differences in imagery vividness points to an important role of the early visual cortex. However, there is also great fluctuation of vividness within individuals, such that only looking at differences between people necessarily obscures the picture. In this study, we show that variation in moment-to-moment experienced vividness of visual imagery, within human subjects, depends on the activity of a large network of brain areas, including frontal, parietal, and visual areas. Furthermore, using a novel multivariate analysis technique, we show that the neural overlap between imagery and perception in the entire visual system correlates with experienced imagery vividness. This shows that the neural basis of imagery vividness is much more complicated than studies of individual differences seemed to suggest. SIGNIFICANCE STATEMENT Visual imagery is the ability to visualize objects that are not in our direct line of sight: something that is important for memory, spatial reasoning, and many other tasks. It is known that the better people are at visual imagery, the better they can perform these tasks. However, the neural correlates of moment-to-moment variation in visual imagery remain unclear. In this study, we show that the more the neural response during imagery is similar to the neural response during perception, the more vivid or perception-like the imagery experience is.
TL;DR: The authors found that the visual cortex of 4-6-month-old infants contains regions that respond preferentially to abstract categories (faces and scenes), with a spatial organization similar to adults, but their response profiles and patterns of activity across multiple visual categories differ between infants and adults.
Abstract: How much of the structure of the human mind and brain is already specified at birth, and how much arises from experience? In this article, we consider the test case of extrastriate visual cortex, where a highly systematic functional organization is present in virtually every normal adult, including regions preferring behaviourally significant stimulus categories, such as faces, bodies, and scenes. Novel methods were developed to scan awake infants with fMRI, while they viewed multiple categories of visual stimuli. Here we report that the visual cortex of 4–6-month-old infants contains regions that respond preferentially to abstract categories (faces and scenes), with a spatial organization similar to adults. However, precise response profiles and patterns of activity across multiple visual categories differ between infants and adults. These results demonstrate that the large-scale organization of category preferences in visual cortex is adult-like within a few months after birth, but is subsequently refined through development. Adult visual cortex is organized into regions that respond to categories such as faces and scenes, but it is unclear if this depends on experience. Here, authors measured brain activity in 4–6 month old infants looking at faces and scenes and find that their visual cortex is organized similarly to adults.
TL;DR: The mouse visual system structure, function, and development literature is reviewed and the similarities and differences between the visual system of this and other model species are commented on.
Abstract: Vision is the sense humans rely on most to navigate the world, make decisions, and perform complex tasks. Understanding how humans see thus represents one of the most fundamental and important goals of neuroscience. The use of the mouse as a model for parsing how vision works at a fundamental level started approximately a decade ago, ushered in by the mouse's convenient size, relatively low cost, and, above all, amenability to genetic perturbations. In the course of that effort, a large cadre of new and powerful tools for in vivo labeling, monitoring, and manipulation of neurons were applied to this species. As a consequence, a significant body of work now exists on the architecture, function, and development of mouse central visual pathways. Excitingly, much of that work includes causal testing of the role of specific cell types and circuits in visual perception and behavior, something rare to find in studies of the visual system of other species. Indeed, one could argue that more information is now available about the mouse visual system than any other sensory system, in any species, including humans. As such, the mouse visual system has become a platform for multilevel analysis of the mammalian central nervous system generally. Here we review the mouse visual system structure, function, and development literature and comment on the similarities and differences between the visual system of this and other model species. We also make it a point to highlight the aspects of mouse visual circuitry that remain opaque and that are in need of additional experimentation to enrich our understanding of how vision works on a broad scale.
TL;DR: The directed coupling (effective connectivity) between fronto-parietal and visual areas during perception and imagery is examined to highlight the importance of top-down processing in internally as well as externally driven visual experience.
Abstract: Research suggests that perception and imagination engage neuronal representations in the same visual areas. However, the underlying mechanisms that differentiate sensory perception from imagination remain unclear. Here, we examine the directed coupling (effective connectivity) between fronto-parietal and visual areas during perception and imagery. We found an increase in bottom-up coupling during perception relative to baseline and an increase in top-down coupling during both perception and imagery, with a much stronger increase during imagery. Modulation of the coupling from frontal to early visual areas was common to both perception and imagery. Furthermore, we show that the experienced vividness during imagery was selectively associated with increases in top-down connectivity to early visual cortex. These results highlight the importance of top-down processing in internally as well as externally driven visual experience.
TL;DR: This review describes recent neuroimaging findings regarding the macro- and microscopic anatomical features of the ventral face network, the characteristics of white matter connections, and basic computations performed by population receptive fields within face-selective regions composing this network.
Abstract: Face perception is critical for normal social functioning and is mediated by a network of regions in the ventral visual stream. In this review, we describe recent neuroimaging findings regarding the macro- and microscopic anatomical features of the ventral face network, the characteristics of white matter connections, and basic computations performed by population receptive fields within face-selective regions composing this network. We emphasize the importance of the neural tissue properties and white matter connections of each region, as these anatomical properties may be tightly linked to the functional characteristics of the ventral face network. We end by considering how empirical investigations of the neural architecture of the face network may inform the development of computational models and shed light on how computations in the face network enable efficient face perception.
TL;DR: In this paper, the authors analyzed how the number of visual representations affects the role these representational competencies play during students' learning of content knowledge, and compared two common scenarios: text plus a single type of visual representation (T+SV) and text plus multiple types of visual representations (T+MV).
Abstract: Visual representations play a critical role in enhancing science, technology, engineering, and mathematics (STEM) learning. Educational psychology research shows that adding visual representations to text can enhance students’ learning of content knowledge, compared to text-only. But should students learn with a single type of visual representation or with multiple different types of visual representations? This article addresses this question from the perspective of the representation dilemma, namely that students often learn content they do not yet understand from representations they do not yet understand. To benefit from visual representations, students therefore need representational competencies, that is, knowledge about how visual representations depict information about the content. This article reviews literature on representational competencies involved in students’ learning of content knowledge. Building on this review, this article analyzes how the number of visual representations affects the role these representational competencies play during students’ learning of content knowledge. To this end, the article compares two common scenarios: text plus a single type of visual representation (T+SV) and text plus multiple types of visual representations (T+MV). The comparison yields seven hypotheses that describe under which conditions T+MV scenarios are more effective than T+SV scenarios. Finally, the article reviews empirical evidence for each hypothesis and discusses open questions about the representation dilemma.
TL;DR: In this article, the authors use conditional generative adversarial networks to achieve cross-modal audio-visual generation of musical performances, and show that the model can generate one modality (audio or visual) from the other (visual or audio) to a good extent.
Abstract: Cross-modal audio-visual perception has been a long-lasting topic in psychology and neurology, and various studies have discovered strong correlations in human perception of auditory and visual stimuli. Despite work on computational multimodal modeling, the problem of cross-modal audio-visual generation has not been systematically studied in the literature. In this paper, we make the first attempt to solve this cross-modal generation problem leveraging the power of deep generative adversarial training. Specifically, we use conditional generative adversarial networks to achieve cross-modal audio-visual generation of musical performances. We explore different encoding methods for audio and visual signals, and work on two scenarios: instrument-oriented generation and pose-oriented generation. Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments. Our experiments using both classification and human evaluation demonstrate that our model has the ability to generate one modality, i.e., audio/visual, from the other modality, i.e., visual/audio, to a good extent. Our experiments on various design choices along with the datasets will facilitate future research in this new problem space.
TL;DR: The authors suggest that this problem can be resolved by questioning the utility of the classical low- to high-level framework of visual perception for scene processing, and discuss why low- and mid-level properties may be particularly diagnostic for the behavioural goals specific to scene perception as compared to object recognition.
Abstract: Visual scene analysis in humans has been characterized by the presence of regions in extrastriate cortex that are selectively responsive to scenes compared with objects or faces. While these regions have often been interpreted as representing high-level properties of scenes (e.g. category), they also exhibit substantial sensitivity to low-level (e.g. spatial frequency) and mid-level (e.g. spatial layout) properties, and it is unclear how these disparate findings can be united in a single framework. In this opinion piece, we suggest that this problem can be resolved by questioning the utility of the classical low- to high-level framework of visual perception for scene processing, and discuss why low- and mid-level properties may be particularly diagnostic for the behavioural goals specific to scene perception as compared to object recognition. In particular, we highlight the contributions of low-level vision to scene representation by reviewing (i) retinotopic biases and receptive field properties of scene-selective regions and (ii) the temporal dynamics of scene perception that demonstrate overlap of low- and mid-level feature representations with those of scene category. We discuss the relevance of these findings for scene perception and suggest a more expansive framework for visual scene analysis. This article is part of the themed issue 'Auditory and visual scene analysis'.
TL;DR: Performance improvements resulting from reweighting or readout of sensory inputs to decision provide a strong theoretical framework for interpreting perceptual learning and transfer that may prove useful in optimizing learning in real-world applications.
Abstract: Visual perceptual learning through practice or training can significantly improve performance on visual tasks. Originally seen as a manifestation of plasticity in the primary visual cortex, perceptual learning is more readily understood as improvements in the function of brain networks that integrate processes, including sensory representations, decision, attention, and reward, and balance plasticity with system stability. This review considers the primary phenomena of perceptual learning, theories of perceptual learning, and perceptual learning's effect on signal and noise in visual processing and decision. Models, especially computational models, play a key role in behavioral and physiological investigations of the mechanisms of perceptual learning and for understanding, predicting, and optimizing human perceptual processes, learning, and performance. Performance improvements resulting from reweighting or readout of sensory inputs to decision provide a strong theoretical framework for interpreting perceptual learning and transfer that may prove useful in optimizing learning in real-world applications.
TL;DR: Evidence is provided that prestimulus α power influences the level of subjective awareness of threshold visual stimuli but does not influence visual sensitivity when a decision has to be made regarding stimulus features, finding a clear dissociation between the influence of ongoing neural activity on conscious awareness and objective performance.
Abstract: Prestimulus oscillatory neural activity has been linked to perceptual outcomes during performance of psychophysical detection and discrimination tasks. Specifically, the power and phase of low frequency oscillations have been found to predict whether an upcoming weak visual target will be detected or not. However, the mechanisms by which baseline oscillatory activity influences perception remain unclear. Recent studies suggest that the frequently reported negative relationship between α power and stimulus detection may be explained by changes in detection criterion (i.e., increased target present responses regardless of whether the target was present/absent) driven by the state of neural excitability, rather than changes in visual sensitivity (i.e., more veridical percepts). Here, we recorded EEG while human participants performed a luminance discrimination task on perithreshold stimuli in combination with single-trial ratings of perceptual awareness. Our aim was to investigate whether the power and/or phase of prestimulus oscillatory activity predict discrimination accuracy and/or perceptual awareness on a trial-by-trial basis. Prestimulus power (3-28 Hz) was inversely related to perceptual awareness ratings (i.e., higher ratings in states of low prestimulus power/high excitability) but did not predict discrimination accuracy. In contrast, prestimulus oscillatory phase did not predict awareness ratings or accuracy in any frequency band. These results provide evidence that prestimulus α power influences the level of subjective awareness of threshold visual stimuli but does not influence visual sensitivity when a decision has to be made regarding stimulus features. Hence, we find a clear dissociation between the influence of ongoing neural activity on conscious awareness and objective performance.
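The distinction this abstract draws between detection criterion and visual sensitivity can be illustrated with a minimal signal detection sketch. It assumes the standard equal-variance Gaussian model, and the hit and false-alarm rates are hypothetical values chosen for illustration, not data from the study.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa             # sensitivity
    criterion = -0.5 * (z_hit + z_fa)  # response bias; lower means more liberal
    return d_prime, criterion


# Hypothetical rates (assumption): the second observer reports "target present"
# more often on both target-present and target-absent trials.
conservative = sdt_measures(hit_rate=0.69, false_alarm_rate=0.07)
liberal = sdt_measures(hit_rate=0.93, false_alarm_rate=0.31)

print("conservative: d'=%.3f, c=%.3f" % conservative)
print("liberal:      d'=%.3f, c=%.3f" % liberal)
```

In this toy case both observers have the same sensitivity and differ only in criterion, which is the kind of pattern a pure shift in response bias would produce.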
TL;DR: This paper proposes an image captioning system that exploits the parallel structures between images and sentences and makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image.
Abstract: Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image captioning system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience where the attention shifts among the visual regions—such transitions impose a thread of ordering in visual perception. This alignment characterizes the flow of latent meaning, which encodes what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. The contexts adapt language models for word generation to specific scene types. We benchmark our system and contrast to published results on several popular datasets, using both automatic evaluation metrics and human evaluation. We show that either region-based attention or scene-specific contexts improves systems without those components. Furthermore, combining these two modeling ingredients attains the state-of-the-art performance.
TL;DR: Ultra-fast functional magnetic resonance imaging is used to measure BOLD activity at precisely defined receptive field locations in visual cortex (V1) of human volunteers, showing that after familiarizing subjects with a spatial sequence, flashing only the starting point of the sequence triggers an activity wave in V1 that resembles the full stimulus sequence.
Abstract: Perception is guided by the anticipation of future events. It has been hypothesized that this process may be implemented by pattern completion in early visual cortex, in which a stimulus sequence is recreated after only a subset of the visual input is provided. Here we test this hypothesis using ultra-fast functional magnetic resonance imaging to measure BOLD activity at precisely defined receptive field locations in visual cortex (V1) of human volunteers. We find that after familiarizing subjects with a spatial sequence, flashing only the starting point of the sequence triggers an activity wave in V1 that resembles the full stimulus sequence. This preplay activity is temporally compressed compared to the actual stimulus sequence and remains present even when attention is diverted from the stimulus sequence. Preplay might therefore constitute an automatic prediction mechanism for temporal sequences in V1.
TL;DR: It is found that behavioral responses made immediately after viewing a stimulus show evidence of adaptation, but not attractive serial dependence, and it is demonstrated that when leading mathematical models of working memory are adjusted to account for these trial-history effects, their fit to behavioral data is substantially improved.
Abstract: Recent experiments have shown that visual cognition blends current input with that from the recent past to guide ongoing decision making. This serial dependence appears to exploit the temporal autocorrelation normally present in visual scenes to promote perceptual stability. While this benefit has been assumed, evidence that serial dependence directly alters stimulus perception has been limited. In the present study, we parametrically vary the delay between stimulus and response in a spatial delayed response task to explore the trajectory of serial dependence from the moment of perception into post-perceptual visual working memory. We find that behavioral responses made immediately after viewing a stimulus show evidence of adaptation, but not attractive serial dependence. Only as the memory period lengthens is a blending of past and present information apparent in behavior, reaching its maximum with a delay of six seconds. These results dovetail with other recent findings to bolster the interpretation that serial dependence is a phenomenon of mnemonic rather than perceptual processes. However, even while this pattern of effects in group-averaged data has now been found consistently, we show that the relative strengths of adaptation and serial dependence vary widely across individuals. Finally, we demonstrate that when leading mathematical models of working memory are adjusted to account for these trial-history effects, their fit to behavioral data is substantially improved.
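The contrast between attraction (serial dependence) and repulsion (adaptation) can be made concrete with a small sketch that signs each response error by the direction of the previous stimulus. The summary statistic, the function names, the 180-degree period, and the simulated data are all assumptions for illustration; published analyses often fit a tuned curve to the errors instead of taking a simple mean.

```python
def wrap(angle, period=180.0):
    """Wrap an angular difference into [-period/2, period/2)."""
    half = period / 2.0
    return (angle + half) % period - half


def history_bias(stimuli, responses, period=180.0):
    """Mean response error, signed toward the previous stimulus.

    Positive values indicate attraction toward the previous stimulus;
    negative values indicate repulsion away from it.
    """
    signed_errors = []
    for prev, cur, resp in zip(stimuli, stimuli[1:], responses[1:]):
        delta = wrap(prev - cur, period)  # direction of the previous stimulus
        error = wrap(resp - cur, period)  # direction of the response error
        if delta != 0:
            signed_errors.append(error if delta > 0 else -error)
    return sum(signed_errors) / len(signed_errors)


# Hypothetical trial sequence (degrees), not data from the study.
stimuli = [10.0, 40.0, 20.0, 60.0, 30.0, 50.0]


def simulate(gain):
    """Responses pulled toward (gain > 0) or pushed from (gain < 0) the previous stimulus."""
    responses = [stimuli[0]]
    for prev, cur in zip(stimuli, stimuli[1:]):
        responses.append(cur + gain * wrap(prev - cur))
    return responses


attraction = history_bias(stimuli, simulate(+0.1))
repulsion = history_bias(stimuli, simulate(-0.1))
print(attraction, repulsion)
```

Computing this statistic separately for each stimulus-response delay would give a rough picture of how the sign of the bias changes over the memory period.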
TL;DR: The literature reviewed revealed no clear evidence for an effect of NH on the development of the eye and optic nerve, and it is currently unknown whether NH affects visual function in mid-to-late childhood when many visual functions reach adult levels.
Abstract: Background: Many newborn babies experience low blood glucose concentrations, a condition referred to as neonatal hypoglycaemia (NH). The effect of NH on visual de
TL;DR: It is shown that locomotion reduces by at least a factor of 3 the time needed for information to accumulate in the visual cortex that allows the distinction of different visual stimuli, and that the effect of locomotion is to increase information in cells of all layers of the visual cortex.
Abstract: Neurons in mouse primary visual cortex (V1) are selective for particular properties of visual stimuli. Locomotion causes a change in cortical state that leaves their selectivity unchanged but strengthens their responses. Both locomotion and the change in cortical state are thought to be initiated by projections from the mesencephalic locomotor region, the latter through a disinhibitory circuit in V1. By recording simultaneously from a large number of single neurons in alert mice viewing moving gratings, we investigated the relationship between locomotion and the information contained within the neural population. We found that locomotion improved encoding of visual stimuli in V1 by two mechanisms. First, locomotion-induced increases in firing rates enhanced the mutual information between visual stimuli and single neuron responses over a fixed window of time. Second, stimulus discriminability was improved, even for fixed population firing rates, because of a decrease in noise correlations across the population. These two mechanisms contributed differently to improvements in discriminability across cortical layers, with changes in firing rates most important in the upper layers and changes in noise correlations most important in layer V. Together, these changes resulted in a threefold to fivefold reduction in the time needed to precisely encode grating direction and orientation. These results support the hypothesis that cortical state shifts during locomotion to accommodate an increased load on the visual system when mice are moving. SIGNIFICANCE STATEMENT This paper contains three novel findings about the representation of information in neurons within the primary visual cortex of the mouse. First, we show that locomotion reduces by at least a factor of 3 the time needed for information to accumulate in the visual cortex that allows the distinction of different visual stimuli. Second, we show that the effect of locomotion is to increase information in cells of all layers of the visual cortex. Third, we show that the means by which information is enhanced by locomotion differs between the upper layers, where the major effect is the increasing of firing rates, and in layer V, where the major effect is the reduction in noise correlations.
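The second mechanism, improved discriminability from lower noise correlations at fixed firing rates, can be sketched with a toy two-neuron population. This is an illustrative assumption rather than the paper's analysis: it uses the linear discriminability d' = sqrt(Δμᵀ Σ⁻¹ Δμ) with equal variances, hypothetical numbers, and two neurons whose mean responses change in the same direction between the two stimuli.

```python
import math


def linear_dprime(delta_mu, variance, rho):
    """Linear discriminability of two stimuli from a two-neuron population.

    Computes sqrt(dmu^T Sigma^-1 dmu), where Sigma has equal variances on the
    diagonal and noise correlation rho between the two neurons.
    """
    a, b = delta_mu
    det = variance * variance * (1.0 - rho * rho)
    # Sigma^-1 = (1 / det) * [[variance, -rho * variance], [-rho * variance, variance]]
    quad = (variance * (a * a + b * b) - 2.0 * rho * variance * a * b) / det
    return math.sqrt(quad)


# Hypothetical values: identical mean difference and variance in both states,
# so only the noise correlation differs.
baseline = linear_dprime(delta_mu=(1.0, 1.0), variance=1.0, rho=0.3)
decorrelated = linear_dprime(delta_mu=(1.0, 1.0), variance=1.0, rho=0.1)
print(baseline, decorrelated)
```

Lowering the correlation helps here because the shared noise lies along the same direction as the signal difference; with oppositely signed mean differences the effect would reverse, so the sketch illustrates one configuration only.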
TL;DR: Eye tracking literature in radiology indicates several search patterns are related to high levels of expertise, but teaching novices to search as an expert may not be effective; experimental research is needed to find out which search strategies can improve image perception in learners.
Abstract: Eye tracking research has been conducted for decades to gain understanding of visual diagnosis such as in radiology. For educational purposes, it is important to identify visual search patterns that are related to high perceptual performance and to identify effective teaching strategies. This review of eye-tracking literature in the radiology domain aims to identify visual search patterns associated with high perceptual performance. Databases PubMed, EMBASE, ERIC, PsycINFO, Scopus and Web of Science were searched using 'visual perception' OR 'eye tracking' AND 'radiology' and synonyms. Two authors independently screened search results and included eye tracking studies concerning visual skills in radiology published between January 1, 1994 and July 31, 2015. Two authors independently assessed study quality with the Medical Education Research Study Quality Instrument, and extracted study data with respect to design, participant and task characteristics, and variables. A thematic analysis was conducted to extract and arrange study results, and a textual narrative synthesis was applied for data integration and interpretation. The search resulted in 22 relevant full-text articles. Thematic analysis resulted in six themes that informed the relation between visual search and level of expertise: (1) time on task, (2) eye movement characteristics of experts, (3) differences in visual attention, (4) visual search patterns, (5) search patterns in cross sectional stack imaging, and (6) teaching visual search strategies. Expert search was found to be characterized by a global-focal search pattern, which represents an initial global impression, followed by a detailed, focal search-to-find mode. Specific task-related search patterns, like drilling through CT scans and systematic search in chest X-rays, were found to be related to high expert levels. One study investigated teaching of visual search strategies, and did not find a significant effect on perceptual performance. Eye tracking literature in radiology indicates several search patterns are related to high levels of expertise, but teaching novices to search as an expert may not be effective. Experimental research is needed to find out which search strategies can improve image perception in learners.
TL;DR: It is shown that training-induced increased sensitivity to a low-level feature, namely low spatial frequency (LSF), alters neural processing of this feature in high-level visual stimuli and suggests that SF discrimination learning transfers from simple stimuli to complex objects.
Abstract: Perception of visual stimuli improves with training, but improvements are specific to trained stimuli, rendering the development of generic training programs challenging. It remains unknown to what extent training of low-level visual features transfers to high-level visual perception, and whether this is accompanied by neuroplastic changes. The current event-related potential (ERP) study showed that training-induced increased sensitivity to a low-level feature, namely low spatial frequency (LSF), alters neural processing of this feature in high-level visual stimuli. Specifically, neural activity related to face processing (N170) was decreased for low (trained) but not high (untrained) SF content in faces following LSF training. These novel results suggest that: (1) SF discrimination learning transfers from simple stimuli to complex objects; and that (2) training the use of specific SF information affects neural processing of facial information. These findings may open up a new avenue to improve face recognition skills in individuals with atypical SF processing, such as in cataract or Autism Spectrum Disorder (ASD).
TL;DR: Electroencephalography spectral changes during a sustained attention task within a real classroom environment are investigated to establish a basis for developing a system capable of estimating the level of visual attention during real classroom activities by monitoring changes in the EEG spectra.
Abstract: Sustained attention is a process that enables the maintenance of response persistence and continuous effort over extended periods of time. Performing attention-related tasks in real conditions involves the need to ignore a variety of distractions and inhibit attention shifts to irrelevant activities. This study investigates EEG spectral changes during a sustained attention task within a real classroom environment. Eighteen healthy students were instructed to recognize as fast as possible special visual targets that were displayed during regular university lectures. Sorting their EEG spectra with respect to response times, which indicated the level of visual alertness to randomly introduced visual stimuli, revealed significant changes in the brain oscillation patterns. The results of power-frequency analysis demonstrated a relationship between variations in the EEG spectral dynamics and impaired performance in the sustained attention task. Across subjects and sessions, prolongation of the response time was preceded by an increase in the delta and theta EEG powers over the occipital region, and decrease in the beta power over the occipital and temporal regions. Meanwhile, implementation of the complex attention task paradigm into a real-world classroom setting makes it possible to investigate specific mutual links between brain activities and factors that cause impaired behavioral performance, such as development and manifestation of classroom mental fatigue. The findings of the study set a basis for developing a system capable of estimating the level of visual attention during real classroom activities by monitoring changes in the EEG spectra.
TL;DR: The proposed model utilizes the recently studied attention mechanism to jointly discover the relevant local regions and build a sentiment classifier on top of these local regions; it is capable of automatically discovering sentimental local regions of given images and outperforms existing state-of-the-art algorithms for visual sentiment analysis.
Abstract: Visual sentiment analysis, which studies the emotional response of humans to visual stimuli such as images and videos, has been an interesting and challenging problem. It tries to understand the high-level content of visual data. The success of current models can be attributed to the development of robust algorithms from computer vision. Most of the existing models try to solve the problem by proposing either robust features or more complex models. In particular, visual features from the whole image or video are the main proposed inputs. Little attention has been paid to local areas, which we believe are highly relevant to humans' emotional response to the whole image. In this work, we study the impact of local image regions on visual sentiment analysis. Our proposed model utilizes the recently studied attention mechanism to jointly discover the relevant local regions and build a sentiment classifier on top of these local regions. The experimental results suggest that 1) our model is capable of automatically discovering sentimental local regions of given images and 2) it outperforms existing state-of-the-art algorithms for visual sentiment analysis.
TL;DR: This paper emphasizes the significance of synthetic data to vision system design and suggests a novel research methodology for perception and understanding of complex scenes.
Abstract: In the study of image and vision computing, the generalization capability of an algorithm often determines whether it is able to work well in complex scenes. The goal of this review article is to survey the use of photorealistic image synthesis methods in addressing the problems of visual perception and understanding. Currently, the ACP Methodology comprising artificial systems, computational experiments, and parallel execution is playing an essential role in modeling and control of complex systems. This paper extends the ACP Methodology into the computer vision field, by proposing the concept and basic framework of Parallel Vision. In this paper, we first review previous works related to Parallel Vision, in terms of synthetic data generation and utilization. We detail the utility of synthetic data for feature analysis, object analysis, scene analysis, and other analyses. Then we propose the basic framework of Parallel Vision, which is composed of an ACP trilogy (artificial scenes, computational experiments, and parallel execution). We also present some in-depth thoughts and perspectives on Parallel Vision. This paper emphasizes the significance of synthetic data to vision system design and suggests a novel research methodology for perception and understanding of complex scenes.
TL;DR: The results indicate that although a lack of mental imagery can be compensated for under some conditions, mental imagery has a functional role in other areas of visual cognition, one of which is high-precision working memory.
TL;DR: This work uses a novel approach in human fMRI and MEG studies to reveal supra-additive scene-object interactions and characterize the functional neuroanatomy and neural dynamics of such scene-based object facilitation.
Abstract: Scenes strongly facilitate object recognition, such as when we make out the shape of a distant boat on the water. Yet, although scenes and objects are known to interact in perception, neuroimaging research has primarily provided evidence for separate scene- and object-selective cortical pathways. This raises the question of how these pathways interact to support context-based perception. Here we used a novel approach in human fMRI and MEG studies to reveal supra-additive scene-object interactions. Participants (men and women) viewed degraded objects that were hard to recognize when presented in isolation but easy to recognize within their original scene context, in which no other associated objects were present. fMRI decoding showed that the multivariate representation of the objects' category (animate/inanimate) in object-selective cortex was strongly enhanced by the presence of scene context, even though the scenes alone did not evoke category-selective response patterns. This effect in object-selective cortex was correlated with concurrent activity in scene-selective regions. MEG decoding results revealed that scene-based facilitation of object processing peaked at 320 ms after stimulus onset, 100 ms later than peak decoding of intact objects. Together, results suggest that expectations derived from scene information, processed in scene-selective cortex, feed back to shape object representations in visual cortex. These findings characterize, in space and time, functional interactions between scene- and object-processing pathways. SIGNIFICANCE STATEMENT: Although scenes and objects are known to contextually interact in visual perception, the study of high-level vision has mostly focused on the dissociation between their selective neural pathways. The current findings are the first to reveal direct facilitation of object recognition and neural representation by scene background, even in the absence of contextually associated objects. 
Using a multivariate approach to both fMRI and MEG, we characterize the functional neuroanatomy and neural dynamics of such scene-based object facilitation. Finally, the correlation of this effect with scene-selective activity suggests that, although functionally distinct, scene and object processing pathways do interact at a perceptual level to fill in for insufficient visual detail.
TL;DR: In particular, vehicle and pedestrian detection, lane detection and drivable surface detection are presented as three important applications for visual perception and CPU, GPU, FPGA and ASIC are discussed as the major components to form an efficient hardware platform for real-time operation.
TL;DR: It is proposed that coordinated visual attention between parents and toddlers is primarily a sensory-motor behavior that emerges from multiple pathways to the same functional end.
TL;DR: This sourcebook for anatomic studies in the neuropsychology of visual perception contains chapters on disorders of visual agnosias, impaired object perception and spatial neglect, and abnormal visual imagery.
Abstract: This sourcebook for anatomic studies in the neuropsychology of visual perception contains chapters on disorders of visual agnosias, impaired object perception and spatial neglect, and abnormal visual imagery. The neurological basis of visual perception and the disorders that result from brain damage are discussed.