HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection
Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun
- 27 Jun 2016
- pp 845-853
TL;DR: HyperNet is built on an elaborately designed Hyper Feature, which first aggregates hierarchical feature maps and then compresses them into a uniform space. The same Hyper Features are shared between proposal generation and object detection through an end-to-end joint training strategy.
Abstract: Almost all of the current top-performing object detection networks employ region proposals to guide the search for object instances. State-of-the-art region proposal methods usually need several thousand proposals to get high recall, thus hurting the detection efficiency. Although the latest Region Proposal Network method gets promising detection accuracy with several hundred proposals, it still struggles in small-size object detection and precise localization (e.g., large IoU thresholds), mainly due to the coarseness of its feature maps. In this paper, we present a deep hierarchical network, namely HyperNet, for handling region proposal generation and object detection jointly. Our HyperNet is primarily based on an elaborately designed Hyper Feature which aggregates hierarchical feature maps first and then compresses them into a uniform space. The Hyper Features well incorporate deep but highly semantic, intermediate but really complementary, and shallow but naturally high-resolution features of the image, thus enabling us to construct HyperNet by sharing them both in generating proposals and detecting objects via an end-to-end joint training strategy. For the deep VGG16 model, our method achieves completely leading recall and state-of-the-art object detection accuracy on PASCAL VOC 2007 and 2012 using only 100 proposals per image. It runs with a speed of 5 fps (including all steps) on a GPU, thus having the potential for real-time processing.
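The aggregate-then-compress idea behind the Hyper Feature can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, the shallow map is brought to the middle resolution by 2x2 max pooling, the deep map by nearest-neighbour upsampling, and the compression step is a plain 1x1 channel projection with hand-picked weights.

```python
from typing import List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def max_pool_2x(fmap: FeatureMap) -> FeatureMap:
    """Halve the spatial resolution by taking the max over each 2x2 block."""
    return [
        [
            [max(ch[r][c], ch[r][c + 1], ch[r + 1][c], ch[r + 1][c + 1])
             for c in range(0, len(ch[0]), 2)]
            for r in range(0, len(ch), 2)
        ]
        for ch in fmap
    ]


def upsample_2x(fmap: FeatureMap) -> FeatureMap:
    """Double the spatial resolution by repeating each value (nearest neighbour)."""
    return [
        [[ch[r // 2][c // 2] for c in range(2 * len(ch[0]))]
         for r in range(2 * len(ch))]
        for ch in fmap
    ]


def compress(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 projection: each output channel is a weighted sum of the input channels."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w * fmap[k][r][c] for k, w in enumerate(channel_weights))
          for c in range(width)]
         for r in range(height)]
        for channel_weights in weights
    ]


def hyper_feature(shallow: FeatureMap, middle: FeatureMap, deep: FeatureMap,
                  weights: List[List[float]]) -> FeatureMap:
    """Align three levels to the middle resolution, concatenate, then compress."""
    aligned = max_pool_2x(shallow) + middle + upsample_2x(deep)  # channel concat
    return compress(aligned, weights)


# Toy input: one channel per level, at 4x4, 2x2 and 1x1 resolution.
shallow = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
middle = [[[1, 0], [0, 1]]]
deep = [[[2]]]
weights = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]  # 3 input channels -> 2 output channels

features = hyper_feature(shallow, middle, deep, weights)
print(features)
```

In this sketch, a single shared `features` tensor would be the input to both the proposal branch and the detection branch, which is the sharing the abstract describes.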
Citations
FFESSD: An Accurate and Efficient Single-Shot Detector for Target Detection
Wenxu Shi, Shengli Bao, Dailun Tan
TL;DR: This work can be widely used in classification and recognition of image targets, especially for the tasks that need to detect the smaller targets and meet the requirement of real-time detection.
38
A comparison of CNN-based face and head detectors for real-time video surveillance applications
Le Thanh Nguyen-Meidine, Eric Granger, Madhu Kiran, Louis-Antoine Blais-Morin
- 01 Nov 2017
TL;DR: In this article, the authors compared the accuracy and complexity of state-of-the-art CNN architectures that are suitable for face and head detection in real-time video surveillance applications.
38
Every Feature Counts: An Improved One-Stage Detector in Thermal Imagery
Yu Cao, Tong Zhou, Xinhua Zhu, Yan Su
- 01 Dec 2019
TL;DR: This work proposes a DNN-based one-stage detector, ThermalDet, which inherits and further improves the architecture of RefineDet, and demonstrates that ThermalDet performs better than state-of-the-art methods such as MMTOD-UNIT and MMTOD-CG.
38
Embedding Visual Hierarchy With Deep Networks for Large-Scale Visual Recognition
TL;DR: By learning the tree classifier, the deep network and the visual hierarchy adaptation jointly in an end-to-end manner, the LMM algorithm can achieve higher accuracy rates on hierarchical visual recognition.
38
CenterNet++ for Object Detection
Kailiang Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian
TL;DR: CenterNet++ is a bottom-up object detection approach that achieves state-of-the-art performance on the MS-COCO dataset, outperforming existing bottom-up detectors and achieving comparable performance to top-down approaches.
38
References
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
102.6K
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: A deep convolutional neural network with five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art ImageNet classification performance.
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
51.9K
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh K. Divvala, Ross Girshick, Ali Farhadi
- 27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.