TL;DR: In this paper, a large-scale dataset of tens of thousands of units in six cortical and two thalamic regions in the brains of mice responding to a battery of visual stimuli is presented.
Abstract: The anatomy of the mammalian visual system, from the retina to the neocortex, is organized hierarchically [1]. However, direct observation of cellular-level functional interactions across this hierarchy is lacking due to the challenge of simultaneously recording activity across numerous regions. Here we describe a large, open dataset, part of the Allen Brain Observatory [2], that surveys spiking from tens of thousands of units in six cortical and two thalamic regions in the brains of mice responding to a battery of visual stimuli. Using cross-correlation analysis, we reveal that the organization of inter-area functional connectivity during visual stimulation mirrors the anatomical hierarchy from the Allen Mouse Brain Connectivity Atlas [3]. We find that four classical hierarchical measures (response latency, receptive-field size, phase-locking to drifting gratings and response decay timescale) are all correlated with the hierarchy. Moreover, recordings obtained during a visual task reveal that the correlation between neural activity and behavioural choice also increases along the hierarchy. Our study provides a foundation for understanding coding and signal propagation across hierarchically organized cortical and thalamic visual areas.
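The cross-correlation analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline, which is not described here: the function names, the 1 ms binning, the ±10 ms window and the toy spike times are all assumptions chosen for illustration. The idea is to count spike pairs from two units by their time lag; a peak at a positive lag suggests the second unit tends to fire shortly after the first.

```python
from collections import Counter

def cross_correlogram(spikes_a, spikes_b, max_lag_ms=10, bin_ms=1):
    """Count spike pairs by lag (t_b - t_a), in bins of bin_ms milliseconds.

    Hypothetical helper: spike times are assumed to be in milliseconds.
    """
    counts = Counter()
    for ta in spikes_a:
        for tb in spikes_b:
            lag = tb - ta
            if abs(lag) <= max_lag_ms:
                # Snap the lag to the nearest bin centre.
                counts[round(lag / bin_ms) * bin_ms] += 1
    lags = range(-max_lag_ms, max_lag_ms + 1, bin_ms)
    return {lag: counts.get(lag, 0) for lag in lags}

def peak_lag(ccg):
    """Lag (ms) at which the correlogram has its largest count."""
    return max(ccg, key=ccg.get)

# Toy data (made up): unit B fires 3 ms after every spike of unit A.
unit_a = [10, 50, 90, 130, 170]
unit_b = [t + 3 for t in unit_a]
ccg = cross_correlogram(unit_a, unit_b)
print(peak_lag(ccg))
```

A real analysis would normally also correct for stimulus-locked firing and firing-rate differences before interpreting a peak; this sketch omits those steps.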
TL;DR: This article showed that prefrontal cortex acts as a domain-general controller for both attention and selection in rhesus monkeys, and that attention facilitated behavior by enhancing and transforming the representation of the selected memory or attended stimulus.
Abstract: Cognitive control guides behaviour by controlling what, when, and how information is represented in the brain [1]. For example, attention controls sensory processing; top-down signals from prefrontal and parietal cortex strengthen the representation of task-relevant stimuli [2-4]. A similar 'selection' mechanism is thought to control the representations held 'in mind', that is, in working memory [5-10]. Here we show that shared neural mechanisms underlie the selection of items from working memory and attention to sensory stimuli. We trained rhesus monkeys to switch between two tasks, either selecting one item from a set of items held in working memory or attending to one stimulus from a set of visual stimuli. Neural recordings showed that similar representations in prefrontal cortex encoded the control of both selection and attention, suggesting that prefrontal cortex acts as a domain-general controller. By contrast, both attention and selection were represented independently in parietal and visual cortex. Both selection and attention facilitated behaviour by enhancing and transforming the representation of the selected memory or attended stimulus. Specifically, during the selection task, memory items were initially represented in independent subspaces of neural activity in prefrontal cortex. Selecting an item caused its representation to transform from its own subspace to a new subspace used to guide behaviour. A similar transformation occurred for attention. Our results suggest that prefrontal cortex controls cognition by dynamically transforming representations to control what and when cognitive computations are engaged.
TL;DR: The previous research and application of visual perception in different industrial fields such as product surface defect detection, intelligent agricultural production, intelligent driving, image synthesis, and event reconstruction are reviewed.
Abstract: Visual perception refers to the process of organizing, identifying, and interpreting visual information in environmental awareness and understanding. With the rapid progress of multimedia acquisition technology, research on visual perception has been a hot topic in academia and industrial applications. Especially after the introduction of artificial intelligence theory, intelligent visual perception has been widely used to promote the development of industrial production towards intelligence. In this article, we review the previous research and application of visual perception in different industrial fields such as product surface defect detection, intelligent agricultural production, intelligent driving, image synthesis, and event reconstruction. The applications cover most of the intelligent visual perception processing technologies. This survey provides a comprehensive reference for research in this direction. Finally, this article also summarizes the current challenges of visual perception and predicts its future development trends.
TL;DR: In this article, the authors introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets and show that existing state-of-the-art short-term models are limited for long-term tasks.
Abstract: Our world offers a never-ending stream of visual stimuli, yet today’s vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms comparable state-of-the-art on the AVA dataset.
TL;DR: Findings illustrate the need for communication-friendly face coverings and emphasise the need to be communication-aware when wearing a face covering.
Abstract: To understand the impact of face coverings on hearing and communication. An online survey consisting of closed-set and open-ended questions distributed within the UK to gain insights into experienc...
TL;DR: ArtEmis is a large-scale dataset with accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.
Abstract: We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by visual artworks and ask the annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. As we demonstrate below, this leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., "freedom" or "love"), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences. We focus on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. Our dataset, termed ArtEmis, contains 455K emotion attributions and explanations from humans, on 80K artworks from WikiArt. Building on this data, we train and demonstrate a series of captioning systems capable of expressing and explaining emotions from visual stimuli. Remarkably, the captions produced by these systems often succeed in reflecting the semantic and abstract content of the image, going well beyond systems trained on existing datasets. The collected dataset and developed methods are available at https://artemisdataset.org.
TL;DR: In this article, the authors analyzed large-scale optical and electrophysiological recordings from six visual cortical areas in behaving mice that were repeatedly presented with the same natural movies and found representational drift over timescales spanning minutes to days across multiple visual areas, cortical layers, and cell types.
TL;DR: The superior colliculus is a conserved sensorimotor structure that integrates visual and other sensory information to drive reflexive behaviors, and the evidence for this is strong and compelling.
TL;DR: A hierarchical visual perception (HVP) module to imitate the primate visual cortex for hierarchical perception learning is proposed, and with the HVP module incorporated, a lightweight SOD network is designed, namely, HVPNet.
Abstract: Recently, salient object detection (SOD) has witnessed vast progress with the rapid development of convolutional neural networks (CNNs). However, the improvement of SOD accuracy comes with the increase in network depth and width, resulting in large network size and heavy computational overhead. This prevents state-of-the-art SOD methods from being deployed into practical platforms, especially mobile devices. To promote the deployment of real-world SOD applications, we aim at developing a lightweight SOD model in this article. Our observation comes from that the primate visual system processes visual signals hierarchically with different receptive fields and eccentricities in different visual cortex areas. Inspired by this, we propose a hierarchical visual perception (HVP) module to imitate the primate visual cortex for hierarchical perception learning. With the HVP module incorporated, we design a lightweight SOD network, namely, HVPNet. Extensive experiments on popular benchmarks demonstrate that HVPNet achieves highly competitive accuracy compared with state-of-the-art SOD methods while running at 4.3 frames/s CPU speed and 333.2 frames/s GPU speed with only 1.23M parameters.
TL;DR: This book discusses the role of attention in perception, the nature and function of memory, and theories of cognition, from metaphors to computational models.
Abstract: Introduction to Cognitive Psychology. Cognitive Processes. Experimental Psychology. Computer Models of Information Processing. Cognitive Neuropsychology. Minds, Brains and Computers. Perception and Attention. The Biological Bases of Perception. Psychological Approaches to Visual Perception. Visual Illusions. Marr's Theory. Object Recognition Processes. Perception: A Summary. Attention. The Role of Attention in Perception. Automaticity. The Spotlight Model of Visual Attention. Visual Attention. Perception, Attention and Consciousness. Disorders of Perception and Attention. Introduction. Blindsight. Unilateral Spatial Neglect. Visual Agnosia. Disorders of Face Processing - Prosopagnosia and Related Conditions. Memory. The Nature and Function of Memory. Multistore Models and Working Memory. Ebbinghaus and the First Long-term Memory Experiments. The Role of Knowledge, Meaning, and Schemas in Memory. Input Processing and Encoding. Retrieval Cues and Feature Overlap. Retrieval Mechanisms in Recall and Recognition. Automatic and Controlled Memory Processes. Memory in Real Life. Disorders of Memory. The Tragic Effects of Amnesia. The Causes of Organic Amnesia. Short-term and Long-term Memory Impairments. Anterograde and Retrograde Amnesia. Memory Functions Preserved in Amnesia. Other Types of Amnesia. Thinking, Problem-solving and Reasoning. Introduction. Early Research on Problem-solving. Problem-space Theory of Problem-solving. Problem-solving and Knowledge. Deductive and Inductive Reasoning. Statistical Reasoning. Everyday Reasoning. Disorders of Thinking. Executive Function and the Frontal Lobes. Introduction. The frontal Lobes. Problem-solving and Reasoning Deficits. The Executive Functions of the Frontal Lobes. Language. Introduction. The Language System. Psychology and Linguistics. Recognising Spoken and Written Words. Production of Spoken Words. Sentence Comprehension. Sentence Production. Discourse Level. Disorders of Language. Introduction. 
Historical Perspective. The Psycholinguistic. Disruptions to Language Processing at Word Level. Disruption to Processing of Syntax. Disruption to Processing of Discourse. Theories of Cognition: From Metaphors to Computational Models. Symbol-based Systems. Connectionist Systems. Symbols and Neurons Compared.
TL;DR: In this paper, the authors evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis, and show that CNNs do not fully capture higher level visual representations of real-world objects, nor those of artificial objects, either at lower or higher levels of visual representations.
Abstract: Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence and CNNs’ impressive ability to fully capture lower level visual representation of real-world objects, we show that CNNs do not fully capture higher level visual representations of real-world objects, nor those of artificial objects, either at lower or higher levels of visual representations. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates some fundamental differences exist in how the brain and CNNs represent visual information. Convolutional neural networks are increasingly used to model human vision. Here, the authors compare the performance of 14 different CNNs and human fMRI responses to real-world and artificial objects to show some fundamental differences exist between them.
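Representational similarity analysis, the comparison method named in this abstract, can be sketched in a few lines. This is a generic illustration, not the study's actual procedure: the function names, the use of 1 minus Pearson correlation as the dissimilarity measure, the use of Pearson correlation to compare the two dissimilarity matrices (rank correlations are a common alternative), and the toy response patterns are all assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rdm_upper(patterns):
    """Upper triangle of the representational dissimilarity matrix.

    patterns[i] is the response pattern to stimulus i; dissimilarity
    is taken as 1 - Pearson r between the patterns of two stimuli.
    """
    pairs = combinations(range(len(patterns)), 2)
    return [1 - pearson(patterns[i], patterns[j]) for i, j in pairs]

def rsa_score(patterns_a, patterns_b):
    """Correlation between the two systems' dissimilarity structures."""
    return pearson(rdm_upper(patterns_a), rdm_upper(patterns_b))

# Toy data (made up): 4 stimuli x 3 response channels.
brain = [[1, 2, 3], [1, 3, 2], [3, 1, 2], [2, 1, 4]]
# A "model" whose responses are a rescaled copy of the brain patterns.
model = [[2 * v + 1 for v in row] for row in brain]
print(rsa_score(brain, model))
```

Because the comparison is made on dissimilarity structure rather than on raw responses, the two systems do not need the same number of channels (voxels versus network units), only the same set of stimuli.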
TL;DR: In this article, the authors propose a fully memristor-based artificial visual perception nervous system (AVPNS), which consists of a quantum-dot-based photoelectric memristor and a nanosheet-based threshold-switching (TS) memristor.
Abstract: The visual perception system is the most important system for human learning since it receives over 80% of the learning information from the outside world. With the exponential growth of artificial intelligence technology, there is a pressing need for high-energy and area-efficiency visual perception systems capable of processing efficiently the received natural information. Currently, memristors with their elaborate dynamics, excellent scalability, and information (e.g., visual, pressure, sound, etc.) perception ability exhibit tremendous potential for the application of visual perception. Here, we propose a fully memristor-based artificial visual perception nervous system (AVPNS) which consists of a quantum-dot-based photoelectric memristor and a nanosheet-based threshold-switching (TS) memristor. We use a photoelectric and a TS memristor to implement the synapse and leaky integrate-and-fire (LIF) neuron functions, respectively. With the proposed AVPNS we successfully demonstrate the biological image perception, integration and fire, as well as the biosensitization process. Furthermore, the self-regulation process of a speed meeting control system in driverless automobiles can be accurately and conceptually emulated by this system. Our work shows that the functions of the biological visual nervous system may be systematically emulated by a memristor-based hardware system, thus expanding the spectrum of memristor applications in artificial intelligence.
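The leaky integrate-and-fire (LIF) behaviour that the TS memristor is said to implement can be illustrated with a short software sketch. This is an abstract textbook-style model, not a model of the device: the function name, the parameter values (time constant, threshold, reset) and the Euler update are all assumptions made for illustration.

```python
def simulate_lif(input_current, dt=1.0, tau=10.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate a leaky integrate-and-fire neuron with Euler steps.

    Returns (spike_steps, membrane_trace). All units are arbitrary
    and all default parameter values are illustrative assumptions.
    """
    v = v_rest
    spikes, trace = [], []
    for step, i_in in enumerate(input_current):
        # Integrate: the potential leaks toward rest and is driven by input.
        v += dt * (-(v - v_rest) + i_in) / tau
        if v >= v_thresh:
            # Fire: record the spike and reset the potential.
            spikes.append(step)
            v = v_reset
        trace.append(v)
    return spikes, trace

# A strong constant input drives regular firing; a weak one never
# reaches threshold because the leak balances the drive.
strong_spikes, _ = simulate_lif([2.0] * 50)
weak_spikes, _ = simulate_lif([0.5] * 50)
print(len(strong_spikes), len(weak_spikes))
```

The contrast between the two inputs shows the thresholding property that makes an LIF unit useful as a neuron: sub-threshold drive is integrated and forgotten, while supra-threshold drive is converted into a spike train.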
TL;DR: A prototype neuromorphic vision system is proposed and demonstrated by networking a retinomorphic sensor with a memristive crossbar that allows for fast letter recognition and object tracking and indicates the capabilities of image sensing, processing and recognition in the full analog regime.
Abstract: Compared to human vision, conventional machine vision composed of an image sensor and processor suffers from high latency and large power consumption due to physically separated image sensing and processing. A neuromorphic vision system with brain-inspired visual perception provides a promising solution to the problem. Here we propose and demonstrate a prototype neuromorphic vision system by networking a retinomorphic sensor with a memristive crossbar. We fabricate the retinomorphic sensor by using WSe2/h-BN/Al2O3 van der Waals heterostructures with gate-tunable photoresponses, to closely mimic the human retinal capabilities in simultaneously sensing and processing images. We then network the sensor with a large-scale Pt/Ta/HfO2/Ta one-transistor-one-resistor (1T1R) memristive crossbar, which plays a similar role to the visual cortex in the human brain. The realized neuromorphic vision system allows for fast letter recognition and object tracking, indicating the capabilities of image sensing, processing and recognition in the full analog regime. Our work suggests that such a neuromorphic vision system may open up unprecedented opportunities in future visual perception applications.
TL;DR: The ecoset dataset is a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans.
Abstract: Deep neural networks provide the current best models of visual information processing in the primate brain. Drawing on work from computer vision, the most commonly used networks are pretrained on data from the ImageNet Large Scale Visual Recognition Challenge. This dataset comprises images from 1,000 categories, selected to provide a challenging testbed for automated visual object recognition systems. Moving beyond this common practice, we here introduce ecoset, a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans. Ecoset categories were chosen to be both frequent in linguistic usage and concrete, thereby mirroring important physical objects in the world. We test the effects of training on this ecologically more valid dataset using multiple instances of two neural network architectures: AlexNet and vNet, a novel architecture designed to mimic the progressive increase in receptive field sizes along the human ventral stream. We show that training on ecoset leads to significant improvements in predicting representations in human higher-level visual cortex and perceptual judgments, surpassing the previous state of the art. Significant and highly consistent benefits are demonstrated for both architectures on two separate functional magnetic resonance imaging (fMRI) datasets and behavioral data, jointly covering responses to 1,292 visual stimuli from a wide variety of object categories. These results suggest that computational visual neuroscience may take better advantage of the deep learning framework by using image sets that reflect the human perceptual and cognitive experience. Ecoset and trained network models are openly available to the research community.
TL;DR: This work proposes a model, EEG-ChannelNet, to learn a brain manifold for EEG classification and introduces a multimodal approach that uses deep image and EEG encoders, trained in a siamese configuration, for learning a joint manifold that maximizes a compatibility measure between visual features and brain representations.
Abstract: This work presents a novel method of exploring human brain-visual representations, with a view towards replicating these processes in machines. The core idea is to learn plausible computational and biological representations by correlating human neural activity and natural images. Thus, we first propose a model, EEG-ChannelNet , to learn a brain manifold for EEG classification. After verifying that visual information can be extracted from EEG data, we introduce a multimodal approach that uses deep image and EEG encoders, trained in a siamese configuration, for learning a joint manifold that maximizes a compatibility measure between visual features and brain representations. We then carry out image classification and saliency detection on the learned manifold. Performance analyses show that our approach satisfactorily decodes visual information from neural signals. This, in turn, can be used to effectively supervise the training of deep learning models, as demonstrated by the high performance of image classification and saliency detection on out-of-training classes. The obtained results show that the learned brain-visual features lead to improved performance and simultaneously bring deep models more in line with cognitive neuroscience work related to visual perception and attention.
TL;DR: In this paper, the authors measured visual acuity at isoeccentric peripheral locations (10 deg eccentricity), every 15° of polar angle. On each trial, observers judged the orientation (± 45°) of one of four equidistant, suprathreshold grating stimuli varying in spatial frequency (SF).
Abstract: Human vision is heterogeneous around the visual field. At a fixed eccentricity, performance is better along the horizontal than the vertical meridian and along the lower than the upper vertical meridian. These asymmetric patterns, termed performance fields, have been found in numerous visual tasks, including those mediated by contrast sensitivity and spatial resolution. However, it is unknown whether spatial resolution asymmetries are confined to the cardinal meridians or whether and how far they extend into the upper and lower hemifields. Here, we measured visual acuity at isoeccentric peripheral locations (10 deg eccentricity), every 15° of polar angle. On each trial, observers judged the orientation (± 45°) of one of four equidistant, suprathreshold grating stimuli varying in spatial frequency (SF). On each block, we measured performance as a function of stimulus SF at 4 of 24 isoeccentric locations. We estimated the 75%-correct SF threshold, SF cutoff point (i.e., chance-level), and slope of the psychometric function for each location. We found higher SF estimates (i.e., better acuity) for the horizontal than the vertical meridian and for the lower than the upper vertical meridian. These asymmetries were most pronounced at the cardinal meridians and decreased gradually as the angular distance from the vertical meridian increased. This gradual change in acuity with polar angle reflected a shift of the psychometric function without changes in slope. The same pattern was found under binocular and monocular viewing conditions. These findings advance our understanding of visual processing around the visual field and help constrain models of visual perception.
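The notion of a 75%-correct SF threshold can be made concrete with a simplified sketch. The abstract describes estimating the threshold from a fitted psychometric function; the version below instead interpolates between measured points on a log-SF axis, which is a deliberate simplification. The function name and the accuracy values are made up for illustration.

```python
from math import exp, log

def threshold_sf(sfs, accuracy, criterion=0.75):
    """Interpolate the SF at which accuracy falls to `criterion`.

    sfs must be increasing; accuracy is the proportion correct at each
    SF. Interpolation is linear in log(SF). Returns None if accuracy
    never crosses the criterion. Hypothetical helper, not a curve fit.
    """
    points = list(zip(sfs, accuracy))
    for (sf_lo, p_lo), (sf_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= criterion >= p_hi and p_lo != p_hi:
            frac = (p_lo - criterion) / (p_lo - p_hi)
            return exp(log(sf_lo) + frac * (log(sf_hi) - log(sf_lo)))
    return None

# Toy data (made up): accuracy declines as SF (cycles/deg) increases.
sfs = [1, 2, 4, 8, 16]
accuracy = [1.0, 0.95, 0.85, 0.65, 0.5]
print(threshold_sf(sfs, accuracy))
```

A higher threshold SF at one location than at another corresponds to better acuity there, which is how the meridian asymmetries in the abstract are expressed.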
TL;DR: The experimental results not only provide strong support for the modularity theory of brain cognitive function, but also again show the superiority of the proposed Bi-LSTM model with attention mechanism.
TL;DR: In this paper, six empirical studies present examples of how to capture visual perception in the complexity of a classroom lesson, and one theoretical contribution provides the very first model of teachers' cognitions during teaching in relation to their visual perception, which in turn will allow future research to move beyond explorations towards hypothesis testing.
Abstract: Classrooms full of pupils can be very overwhelming, both for teachers and students, as well as for their joint interactions. It is thus crucial that both can distil the relevant information in this complex scenario and interpret it appropriately. This distilling and interpreting happen to a large extent via visual perception, which is the core focus of the current Special Issue. Six empirical studies present examples of how to capture visual perception in the complexity of a classroom lesson. These examples open up new avenues that go beyond studying perception in restricted and artificial laboratory scenarios, ranging from studies using video recordings of authentic lessons to studies of actual classrooms. This movement towards more realistic scenarios makes it possible to study visual perception in classrooms from new perspectives, namely that of the teachers, the learners, and their interactions. This in turn sheds novel light onto well-established theoretical concepts, namely students’ engagement during actual lessons, teachers’ professional vision while teaching, and establishment of joint attention between teachers and students in a lesson. Additionally, one theoretical contribution provides the very first model of teachers’ cognitions during teaching in relation to their visual perception, which in turn will allow future research to move beyond explorations towards hypothesis testing. However, to fully thrive, this field of research has to address two crucial challenges: (i) the heterogeneity of its methodological approaches (e.g., varying age groups, subjects taught, lesson formats) and (ii) the recording and processing of personal data of many people (often minors). Hence, these new approaches bear not only new chances for insights but also new responsibilities for the researchers.
TL;DR: In this article, the authors developed and tested a computational framework to investigate how aesthetic values are formed, and they showed that it is possible to explain human preferences for a visual art piece based on a mixture of low and high-level features of the image.
Abstract: It is an open question whether preferences for visual art can be lawfully predicted from the basic constituent elements of a visual image. Here, we developed and tested a computational framework to investigate how aesthetic values are formed. We show that it is possible to explain human preferences for a visual art piece based on a mixture of low- and high-level features of the image. Subjective value ratings could be predicted not only within but also across individuals, using a regression model with a common set of interpretable features. We also show that the features predicting aesthetic preference can emerge hierarchically within a deep convolutional neural network trained only for object recognition. Our findings suggest that human preferences for art can be explained at least in part as a systematic integration over the underlying visual features of an image.
TL;DR: The authors showed that even very sophisticated relations, such as support, fit, cause, chase, and even social interaction, display key signatures of automatic visual processing, revealing surprisingly rich content in visual perception itself.
TL;DR: In this paper, the authors showed that top-down signals originating in the frontal eye fields causally shape visual cortex activity and perception through mechanisms of oscillatory phase realignment at the beta frequency.
Abstract: Voluntary allocation of visual attention is controlled by top-down signals generated within the Frontal Eye Fields (FEFs) that can change the excitability of lower-level visual areas. However, the mechanism through which this control is achieved remains elusive. Here, we emulated the generation of an attentional signal using single-pulse transcranial magnetic stimulation to activate the FEFs and tracked its consequences over the visual cortex. First, we documented changes to brain oscillations using electroencephalography and found evidence for a phase reset over occipital sites at beta frequency. We then probed for perceptual consequences of this top-down triggered phase reset and assessed its anatomical specificity. We show that FEF activation leads to cyclic modulation of visual perception and extrastriate but not primary visual cortex excitability, again at beta frequency. We conclude that top-down signals originating in FEF causally shape visual cortex activity and perception through mechanisms of oscillatory realignment. Visual attention requires top-down modulation from the frontal eye fields to change cortical excitability of visual cortex. Here, the authors show that these top-down signals shape perception through mechanisms of oscillatory phase realignment at the beta frequency.
TL;DR: This paper conducted a large-scale study of the production and visual perception of facial expressions of emotion in the wild and found that of the 16,384 possible facial configurations that people can theoretically produce, only 35 were successfully used to transmit emotive information across cultures, and only 8 within a smaller number of cultures.
Abstract: Automatic recognition of emotion from facial expressions is an intense area of research, with a potentially long list of important applications. Yet, the study of emotion requires knowing which facial expressions are used within and across cultures in the wild, not in controlled lab conditions; but such studies do not exist. Which and how many cross-cultural and cultural-specific facial expressions do people commonly use? And, what affect variables does each expression communicate to observers? If we are to design technology that understands the emotion of users, we need answers to these two fundamental questions. In this paper, we present the first large-scale study of the production and visual perception of facial expressions of emotion in the wild. We find that of the 16,384 possible facial configurations that people can theoretically produce, only 35 are successfully used to transmit emotive information across cultures, and only 8 within a smaller number of cultures. Crucially, we find that visual analysis of cross-cultural expressions yields consistent perception of emotion categories and valence, but not arousal. In contrast, visual analysis of cultural-specific expressions yields consistent perception of valence and arousal, but not of emotion categories. Additionally, we find that the number of expressions used to communicate each emotion is also different, e.g., 17 expressions transmit happiness, but only 1 is used to convey disgust.
TL;DR: This article found that cortical responses to single objects were predicted by the statistical ensembles in which they typically occur, and that this link between objects and their visual contexts was made most strongly in parahippocampal cortex, overlapping with the anterior portion of scene-selective parahippocampal place area.
Abstract: A central regularity of visual perception is the co-occurrence of objects in the natural environment. Here we use machine learning and fMRI to test the hypothesis that object co-occurrence statistics are encoded in the human visual system and elicited by the perception of individual objects. We identified low-dimensional representations that capture the latent statistical structure of object co-occurrence in real-world scenes, and we mapped these statistical representations onto voxel-wise fMRI responses during object viewing. We found that cortical responses to single objects were predicted by the statistical ensembles in which they typically occur, and that this link between objects and their visual contexts was made most strongly in parahippocampal cortex, overlapping with the anterior portion of scene-selective parahippocampal place area. In contrast, a language-based statistical model of the co-occurrence of object names in written text predicted responses in neighboring regions of object-selective visual cortex. Together, these findings show that the sensory coding of objects in the human brain reflects the latent statistics of object context in visual and linguistic experience. When people view an object, they can often guess the setting from which it was drawn and the other objects that might be found in that setting. Here the authors identify regions of the human visual system that represent this information about which objects tend to appear together in the world.
TL;DR: A critical review of the related significant aspects is provided and an overview of existing applications of deep learning in computational visual perception is included, which shows that there is a significant improvement in the accuracy using dropout and data augmentation.
Abstract: Computational visual perception, also known as computer vision, is a field of artificial intelligence that enables computers to process digital images and videos in a way similar to biological vision. It involves developing methods that replicate the capabilities of biological vision. The goal of computer vision is to surpass the capabilities of biological vision in extracting useful information from visual data. The massive volume of data generated today is one of the driving factors for the tremendous growth of computer vision. This survey incorporates an overview of existing applications of deep learning in computational visual perception. The survey explores various deep learning techniques adapted to solve computer vision problems using deep convolutional neural networks and deep generative adversarial networks. The pitfalls of deep learning and their solutions are briefly discussed. The solutions discussed are dropout and data augmentation. The results show that there is a significant improvement in accuracy using dropout and data augmentation. Applications of deep convolutional neural networks, namely image classification, localization and detection, document analysis, and speech recognition, are discussed in detail. An in-depth analysis of deep generative adversarial network applications, namely image-to-image translation, image denoising, face aging, and facial attribute editing, is provided. The deep generative adversarial network is an unsupervised learning method, but adding a certain number of labels in practical applications can improve its generative ability. Although it is challenging to acquire many data labels, a small number can be acquired. Therefore, combining semi-supervised learning and generative adversarial networks is one of the future directions.
This article surveys the recent developments in this direction, provides a critical review of the related significant aspects, and investigates the current opportunities and future challenges in many emerging fields such as handwriting recognition, semantic mapping, webcam-based eye trackers, lumen center detection, query-by-string word, intermittently closed and open lakes and lagoons, and landslides.
TL;DR: This paper found that responses of mouse lateral posterior nucleus (LP) neurons projecting to higher visual areas likely derive from feedforward input from primary visual cortex (V1) combined with information from many cortical and subcortical areas, including superior colliculus.
TL;DR: In this article, the authors present a checklist for comparative studies of visual reasoning in humans and machines, highlighting how to overcome potential pitfalls in design and inference and the importance of aligning experimental conditions.
Abstract: With the rise of machines to human-level performance in complex recognition tasks, a growing amount of work is directed toward comparing information processing in humans and machines. These studies are an exciting chance to learn about one system by studying the other. Here, we propose ideas on how to design, conduct, and interpret experiments such that they adequately support the investigation of mechanisms when comparing human and machine perception. We demonstrate and apply these ideas through three case studies. The first case study shows how human bias can affect the interpretation of results and that several analytic tools can help to overcome this human reference point. In the second case study, we highlight the difference between necessary and sufficient mechanisms in visual reasoning tasks. Thereby, we show that contrary to previous suggestions, feedback mechanisms might not be necessary for the tasks in question. The third case study highlights the importance of aligning experimental conditions. We find that a previously observed difference in object recognition does not hold when adapting the experiment to make conditions more equitable between humans and machines. In presenting a checklist for comparative studies of visual reasoning in humans and machines, we hope to highlight how to overcome potential pitfalls in design and inference.
TL;DR: In the VR setting, the orange daylight led to warmer thermal perception in (close-to-) comfortable temperatures, resulting in a color-induced thermal perception and indicating that orange glazing should be used with caution in a slightly warm environment.
Abstract: Objective: Temperature–color interaction effects on subjective perception and physiological responses are investigated using a novel hybrid experimental method combining thermal and visual stimuli fr...
TL;DR: Widespread stable visual organization beyond the traditional visual system, in default-mode network and hippocampus is demonstrated, indicating that visual–spatial organization is a fundamental coding principle that structures the communication between distant brain regions.
Abstract: The human visual system is organized as a hierarchy of maps that share the topography of the retina. Known retinotopic maps have been identified using simple visual stimuli under strict fixation, conditions different from everyday vision which is active, dynamic, and complex. This means that it remains unknown how much of the brain is truly visually organized. Here I demonstrate widespread stable visual organization beyond the traditional visual system, in default-mode network and hippocampus. Detailed topographic connectivity with primary visual cortex during movie-watching, resting-state, and retinotopic-mapping experiments revealed that visual-spatial representations throughout the brain are warped by cognitive state. Specifically, traditionally visual regions alternate with default-mode network and hippocampus in preferentially representing the center of the visual field. This visual role of default-mode network and hippocampus would allow these regions to interface between abstract memories and concrete sensory impressions. Together, these results indicate that visual-spatial organization is a fundamental coding principle that structures the communication between distant brain regions.
TL;DR: In this article, the authors used high-field 7T proton Magnetic Resonance Spectroscopy (1H-MRS) to study the effect of reduced occipital GABA on visual perception and symptom severity in acute major depressive disorder.
Abstract: Major depressive disorder (MDD) is a complex state-dependent psychiatric illness for which biomarkers linking psychophysical, biochemical, and psychopathological changes remain elusive. Earlier studies demonstrate reduced GABA in lower-order occipital cortex in acute MDD, leaving open its validity and significance for higher-order visual perception. The goal of our study is to fill that gap by combining psychophysical investigation of visual perception with measurement of GABA concentration in middle temporal visual area (hMT+) in acute depressed MDD. Psychophysically, we observe a highly specific deficit in visual surround motion suppression in a large sample of acute MDD subjects which, importantly, correlates with symptom severity. Both the visual deficit and its relation to symptom severity are replicated in the smaller MDD sample that received MRS. Using high-field 7T proton magnetic resonance spectroscopy (1H-MRS), we find that acute MDD subjects exhibit decreased GABA concentration in visual MT+ which, unlike in healthy subjects, no longer correlates with their visual motion performance, i.e., impaired SI. In sum, our combined psychophysical-biochemical study demonstrates an important role of reduced occipital GABA for altered visual perception and psychopathological symptoms in acute MDD. Bridging the gap from the biochemical level of occipital GABA over visual-perceptual changes to psychopathological symptoms, our findings point to the importance of the occipital cortex in acute depressed MDD, including its role as a candidate biomarker.
TL;DR: This article shows that the brain simultaneously represents multiple successive images at each time instant by multiplexing them along a neural cascade, a result that can be explained by a hierarchy of neural assemblies that continuously propagate multiple visual contents.
Abstract: The human brain continuously processes streams of visual input. Yet, a single image typically triggers neural responses that extend beyond 1s. To understand how the brain encodes and maintains successive images, we analyzed with electroencephalography the brain activity of human subjects while they watched ∼5000 visual stimuli presented in fast sequences. First, we confirm that each stimulus can be decoded from brain activity for ∼1s, and we demonstrate that the brain simultaneously represents multiple images at each time instant. Second, we source localize the corresponding brain responses in the expected visual hierarchy and show that distinct brain regions represent, at each time instant, different snapshots of past stimulations. Third, we propose a simple framework to further characterize the dynamical system of these traveling waves. Our results show that a chain of neural circuits, which each consist of (1) a hidden maintenance mechanism and (2) an observable update mechanism, accounts for the dynamics of macroscopic brain representations elicited by visual sequences. Together, these results detail a simple architecture explaining how successive visual events and their respective timings can be simultaneously represented in the brain. SIGNIFICANCE STATEMENT: Our retinas are continuously bombarded with a rich flux of visual input. Yet, how our brain continuously processes such visual streams is a major challenge to neuroscience. Here, we developed techniques to decode and track, from human brain activity, multiple images flashed in rapid succession. Our results show that the brain simultaneously represents multiple successive images at each time instant by multiplexing them along a neural cascade. Dynamical modeling shows that these results can be explained by a hierarchy of neural assemblies that continuously propagate multiple visual contents. Overall, this study sheds new light on the biological basis of our visual experience.