HOGgles: Visualizing Object Detection Features
Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba
- 01 Dec 2013
- pp 1-8
TL;DR: Algorithms to visualize feature spaces used by object detectors allow a human to put on 'HOG goggles' and perceive the visual world as a HOG based object detector sees it, and allow us to analyze object detection systems in new ways and gain new insight into the detector's failures.
Abstract: We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on 'HOG goggles' and perceive the visual world as a HOG based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector's failures. For example, when we visualize the features for high scoring false alarms, we discovered that, although they are clearly wrong in image space, they do look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and indicates that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors. By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.
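To make concrete what "HOG space" refers to, here is a minimal sketch of a simplified HOG-style descriptor. It is not the paper's inversion algorithm, and it is not the exact HOG variant used by any particular detector: the function name `hog_cells`, the 8-pixel cell size, the 9 unsigned orientation bins, and the per-cell L2 normalization (in place of block normalization and vote interpolation) are all simplifying assumptions chosen for illustration.

```python
import numpy as np

def hog_cells(image, cell=8, bins=9):
    """Simplified HOG-style descriptor (illustrative assumption, not the paper's code).

    Returns an array of shape (rows, cols, bins) holding one magnitude-weighted
    orientation histogram per cell, L2-normalized per cell.
    """
    image = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(image)                 # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % np.pi    # unsigned orientation in [0, pi)
    # Hard-assign each pixel to one orientation bin (no interpolation between bins).
    bin_index = np.minimum((orientation / np.pi * bins).astype(int), bins - 1)

    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell),
                      slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_index[window].ravel(),
                                     weights=magnitude[window].ravel(),
                                     minlength=bins)

    # Per-cell L2 normalization; real HOG pipelines normalize over blocks of cells.
    norms = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / np.maximum(norms, 1e-12)
```

Because each cell keeps only a coarse histogram of gradient orientations, many different image patches map to similar descriptors, which is consistent with the abstract's observation that false alarms can resemble true positives in feature space.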
Figures
![Figure 1: An image from PASCAL and a high scoring car detection from DPM [8]. Why did the detector fail?](/figures/figure-1-an-image-from-pascal-and-a-high-scoring-car-3ot397rg.png)
Figure 1: An image from PASCAL and a high scoring car detection from DPM [8]. Why did the detector fail? 
Figure 2: We show the crop for the false car detection from Figure 1. On the right, we show our visualization of the HOG features for the same patch. Our visualization reveals that this false alarm actually looks like a car in HOG space. 
Table 1: We evaluate the performance of our inversion algorithm by comparing the inverse to the ground truth image using the mean normalized cross correlation. Higher is better; a score of 1 is perfect. See supplemental for full table.
![Table 2: We evaluate visualization performance across twenty PASCAL VOC categories by asking MTurk workers to classify our inversions. Numbers are percent classified correctly; higher is better. Chance is 0.05. Glyph refers to the standard black-and-white HOG diagram popularized by [3]. Paired dictionary learning provides the best visualizations for humans. Expert refers to MIT PhD students in computer vision performing the same visualization challenge with HOG glyphs. See supplemental for full table.](/figures/table-2-we-evaluate-visualization-performance-across-twenty-39bmr4l6.png)
Table 2: We evaluate visualization performance across twenty PASCAL VOC categories by asking MTurk workers to classify our inversions. Numbers are percent classified correctly; higher is better. Chance is 0.05. Glyph refers to the standard black-and-white HOG diagram popularized by [3]. Paired dictionary learning provides the best visualizations for humans. Expert refers to MIT PhD students in computer vision performing the same visualization challenge with HOG glyphs. See supplemental for full table. 
Figure 13: HOG inversion reveals the world that object detectors see. The left shows a man standing in a dark room. If we compute HOG on this image and invert it, the previously dark scene behind the man emerges. Notice the wall structure, the lamp post, and the chair in the bottom right hand corner. 
Figure 4: In this paper, we present algorithms to visualize HOG features. Our visualizations are perceptually intuitive for humans to understand.
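The Table 1 caption scores inversions by mean normalized cross correlation against the ground truth image. The sketch below shows one common way to compute such a score. The zero-mean formulation, the function names `normalized_cross_correlation` and `mean_ncc`, and the convention of returning 0 for constant images are assumptions made here; the caption does not specify the exact normalization used.

```python
import numpy as np

def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross correlation of two same-sized images.

    The result lies in [-1, 1]; 1 means the images match up to a positive
    gain and an offset in brightness.
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("images must have the same number of pixels")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # undefined for constant images; 0 is a convention assumed here
    return float(np.dot(a, b) / denom)

def mean_ncc(originals, inversions):
    """Average the per-image score over paired (original, inversion) images."""
    scores = [normalized_cross_correlation(o, r)
              for o, r in zip(originals, inversions)]
    return float(np.mean(scores))
```

Under this formulation an inversion that reproduces the original exactly scores 1, which matches the caption's statement that a score of 1 is perfect.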
Citations
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
- 23 Jun 2014
TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
- 01 Oct 2017
TL;DR: This work combines existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and applies it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures.
•Posted Content
Rich feature hierarchies for accurate object detection and semantic segmentation
TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Justin Johnson, Alexandre Alahi, Li Fei-Fei
- 08 Oct 2016
TL;DR: The authors combine the benefits of both approaches and propose perceptual loss functions for training feed-forward networks for image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time.
•Posted Content
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
TL;DR: This work considers image transformation problems and proposes perceptual loss functions for training feed-forward networks for image transformation tasks. It shows results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time.
References
Histograms of oriented gradients for human detection
Navneet Dalal, Bill Triggs
- 20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
The Pascal Visual Object Classes (VOC) Challenge
TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, along with whether the methods are statistically different, what they are learning from the images, and what they find easy or confuse.
Object recognition from local scale-invariant features
David G. Lowe
- 20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Object Detection with Discriminatively Trained Part-Based Models
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
The Pascal Visual Object Classes Challenge: A Retrospective
TL;DR: A review of the Pascal Visual Object Classes challenge from 2008-2012 and an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.