Topic

Visual learning

About: Visual learning is a research topic. Over its lifetime, 2235 publications have appeared within this topic, receiving 59823 citations. The topic is also known as: GO:0008542.

Papers

Posted Content•

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Jeff Donahue¹, Lisa Anne Hendricks¹, Marcus Rohrbach¹, Subhashini Venugopalan², Sergio Guadarrama¹, Kate Saenko³, Trevor Darrell¹•Institutions (3)

University of California, Berkeley¹, University of Texas at Austin², University of Massachusetts Lowell³

17 Nov 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: Develops a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and shows that such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

5,438 citations
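
The abstract above describes per-frame convolutional features feeding a recurrent model that handles variable-length sequences. The following is a minimal forward-pass sketch of that idea in NumPy, not the authors' implementation. Several pieces are assumptions made for illustration: `frame_features` is a one-layer stand-in for a real convnet, the recurrent unit is a standard LSTM cell, the per-frame class probabilities are simply averaged, and all names, sizes, and random weights are hypothetical. No training is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def frame_features(frame, W_feat):
    """Stand-in for a convnet (assumption): flatten the frame, one linear layer, ReLU."""
    return np.maximum(0.0, W_feat @ frame.ravel())

def lstm_step(x, h, c, W, b):
    """One LSTM update; W maps [x, h] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h]) + b
    H = h.size
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell update
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def classify_clip(frames, params):
    """Run features -> LSTM over every frame and average the per-frame softmax outputs."""
    W_feat, W_lstm, b_lstm, W_out = params
    H = W_out.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    probs = []
    for frame in frames:
        x = frame_features(frame, W_feat)
        h, c = lstm_step(x, h, c, W_lstm, b_lstm)
        logits = W_out @ h
        e = np.exp(logits - logits.max())
        probs.append(e / e.sum())
    return np.mean(probs, axis=0)

def init_params(rng, frame_size, feat_dim, hidden, n_classes):
    """Random, untrained weights (hypothetical sizes, for shape-checking only)."""
    W_feat = rng.normal(0, 0.1, (feat_dim, frame_size))
    W_lstm = rng.normal(0, 0.1, (4 * hidden, feat_dim + hidden))
    b_lstm = np.zeros(4 * hidden)
    W_out = rng.normal(0, 0.1, (n_classes, hidden))
    return W_feat, W_lstm, b_lstm, W_out

rng = np.random.default_rng(0)
params = init_params(rng, frame_size=16, feat_dim=8, hidden=6, n_classes=3)
short_clip = rng.normal(size=(4, 4, 4))   # 4 frames of 4x4 "pixels"
long_clip = rng.normal(size=(9, 4, 4))    # 9 frames, same model
p_short = classify_clip(short_clip, params)
p_long = classify_clip(long_clip, params)
```

The same parameters process clips of either length, which is the property the abstract highlights for recurrent models.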

Proceedings Article•10.1109/CVPR.2015.7298878•

Long-term recurrent convolutional networks for visual recognition and description

Jeff Donahue¹, Lisa Anne Hendricks¹, Sergio Guadarrama¹, Marcus Rohrbach¹, Subhashini Venugopalan², Trevor Darrell¹, Kate Saenko³•Institutions (3)

University of California, Berkeley¹, University of Texas at Austin², University of Massachusetts Lowell³

7 Jun 2015

Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they can be compositional in spatial and temporal “layers”. Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

5,088 citations

Proceedings Article•10.1109/CVPR.2008.4587756•

Learning realistic human actions from movies

Ivan Laptev¹, Marcin Marszalek¹, Cordelia Schmid¹, B. Rozenfeld²•Institutions (2)

French Institute for Research in Computer Science and Automation¹, Bar-Ilan University²

23 Jun 2008

TL;DR: A new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs is presented and shown to improve state-of-the-art results on the standard KTH action dataset.

Abstract: The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.

4,182 citations
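
The abstract above mentions multi-channel non-linear SVMs but does not state the kernel. As a hedged illustration, the sketch below builds one common kind of multi-channel kernel for histogram features: a chi-square distance per channel, normalized by that channel's mean distance and combined as K = exp(-sum_c D_c / A_c). The choice of chi-square distance, the normalization, the function names, and the random histograms are all assumptions made for this example, not details taken from the listing.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two non-negative histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def channel_distances(hists):
    """Pairwise chi-square distances for one channel; hists has shape (n_samples, n_bins)."""
    n = len(hists)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = chi2_distance(hists[i], hists[j])
    return D

def multichannel_kernel(channels):
    """Combine channels as K = exp(-sum_c D_c / A_c).

    A_c is the mean off-diagonal distance in channel c (an assumed normalization);
    it must be non-zero, so each channel needs at least two distinct histograms.
    """
    n = len(channels[0])
    total = np.zeros((n, n))
    for hists in channels:
        D = channel_distances(hists)
        A = D.sum() / (n * (n - 1))
        total += D / A
    return np.exp(-total)

# Two hypothetical channels (e.g. two descriptor types) for six video samples.
rng = np.random.default_rng(0)
channels = [rng.dirichlet(np.ones(bins), size=6) for bins in (8, 16)]
K = multichannel_kernel(channels)
```

A Gram matrix like `K` could then be passed to any SVM implementation that accepts a precomputed kernel.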

Journal Article•10.1037/0033-295X.96.4.523•

A Distributed, Developmental Model of Word Recognition and Naming.

Mark S. Seidenberg¹, James L. McClelland•Institutions (1)

McGill University¹

14 Jul 1989-Psychological Review

TL;DR: A parallel distributed processing model of visual word recognition and pronunciation is described, which consists of sets of orthographic and phonological units and an interlevel of hidden units; the model's behavior early in the learning phase corresponds to that of children acquiring word recognition skills.

Abstract: A parallel distributed processing model of visual word recognition and pronunciation is described. The model consists of sets of orthographic and phonological units and an interlevel of hidden units. Weights on connections between units were modified during a training phase using the back-propagation learning algorithm. The model simulates many aspects of human performance, including (a) differences between words in terms of processing difficulty, (b) pronunciation of novel items, (c) differences between readers in terms of word recognition skill, (d) transitions from beginning to skilled reading, and (e) differences in performance on lexical decision and naming tasks. The model's behavior early in the learning phase corresponds to that of children acquiring word recognition skills. Training with a smaller number of hidden units produces output characteristic of many dyslexic readers. Naming is simulated without pronunciation rules, and lexical decisions are simulated without accessing word-level representations. The performance of the model is largely determined by three factors: the nature of the input, a significant fragment of written English; the learning rule, which encodes the implicit structure of the orthography in the weights on connections; and the architecture of the system, which influences the scope of what can be learned.

3,886 citations
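
The abstract above describes input (orthographic) units, a layer of hidden units, and output (phonological) units, with connection weights adjusted by back-propagation. The toy sketch below shows that training loop on invented binary patterns. It is not the published model: the layer sizes, the random patterns, the sigmoid units without biases, the squared-error loss, and the learning rate are all assumptions chosen to keep the example small.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(orth, phon, n_hidden, epochs=2000, lr=0.5, seed=0):
    """Full-batch back-propagation for an input -> hidden -> output network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (orth.shape[1], n_hidden))   # orthographic -> hidden
    W2 = rng.normal(0, 0.5, (n_hidden, phon.shape[1]))   # hidden -> phonological
    losses = []
    for _ in range(epochs):
        # Forward pass.
        hidden = sigmoid(orth @ W1)
        out = sigmoid(hidden @ W2)
        err = out - phon
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: squared-error gradient, up to a constant factor.
        d_out = err * out * (1 - out)
        d_hid = (d_out @ W2.T) * hidden * (1 - hidden)
        W2 -= lr * hidden.T @ d_out / len(orth)
        W1 -= lr * orth.T @ d_hid / len(orth)
    return W1, W2, losses

# Invented toy data: 8 "words" with 6 input features and 4 output features each.
rng = np.random.default_rng(1)
orth = rng.integers(0, 2, size=(8, 6)).astype(float)
phon = rng.integers(0, 2, size=(8, 4)).astype(float)

W1, W2, losses = train(orth, phon, n_hidden=5)
```

Rerunning `train` with a smaller `n_hidden` is the kind of manipulation the abstract refers to when it mentions training with fewer hidden units, though nothing here reproduces the reported results.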

Journal Article•10.1109/34.598227•

Probabilistic visual learning for object representation

Baback Moghaddam¹, Alex Pentland¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jul 1997-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An unsupervised technique for visual learning is presented, which is based on density estimation in high-dimensional spaces using an eigenspace decomposition and is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and nonrigid objects.

Abstract: We present an unsupervised technique for visual learning, which is based on density estimation in high-dimensional spaces using an eigenspace decomposition. Two types of density estimates are derived for modeling the training data: a multivariate Gaussian (for unimodal distributions) and a mixture-of-Gaussians model (for multimodal distributions). Those probability densities are then used to formulate a maximum-likelihood estimation framework for visual search and target detection for automatic object recognition and coding. Our learning technique is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and nonrigid objects, such as hands.

1,694 citations
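
The abstract above describes density estimation in a high-dimensional space using an eigenspace decomposition, with a multivariate Gaussian for unimodal data. The sketch below illustrates one way such an estimate can be computed: a full Gaussian inside the top principal subspace, plus a single isotropic variance for the residual outside it. Modeling the residual with the mean of the discarded eigenvalues is an assumption of this example, as are the function names and the synthetic data. The mixture-of-Gaussians case is not covered.

```python
import numpy as np

def fit_eigenspace_gaussian(X, n_components):
    """Fit a Gaussian in the top principal subspace plus an isotropic residual term."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = eigvals[:n_components]                  # variances kept in the subspace
    U = eigvecs[:, :n_components]                 # principal directions
    rho = eigvals[n_components:].mean()           # assumed residual variance
    return mean, U, lam, rho

def log_density(x, model):
    """Log of the estimated density at x (in-subspace term plus residual term)."""
    mean, U, lam, rho = model
    d = x - mean
    y = U.T @ d                                   # coordinates in the subspace
    resid = max(d @ d - y @ y, 0.0)               # squared distance from the subspace
    n, m = d.size, lam.size
    in_space = -0.5 * np.sum(y ** 2 / lam) - 0.5 * np.sum(np.log(2 * np.pi * lam))
    out_space = -0.5 * resid / rho - 0.5 * (n - m) * np.log(2 * np.pi * rho)
    return in_space + out_space

# Synthetic data: two high-variance directions and three low-variance ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 0.5, 0.5, 0.5])
model = fit_eigenspace_gaussian(X, n_components=2)

near = log_density(model[0], model)               # at the sample mean
far = log_density(model[0] + 10.0, model)         # far from the data
```

Thresholding `log_density` over candidate image windows is one way a score like this could drive detection, in the spirit of the maximum-likelihood framework the abstract mentions.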

Performance Metrics

Papers: 2,324

Citations: 11,311

No. of papers in the topic in previous years
Year	Papers
2025	9
2024	8
2023	29
2022	38
2021	81
2020	125
