Objects as Extreme Points
Yang Yang, Min Li, Bo Meng, Zihao Huang, Junxing Ren, Degang Sun, and 5 more
- 08 Nov 2021
- pp 195-208
TL;DR: The authors propose an Extreme-Point-Prediction-Based object detector (EPP-Net), which directly regresses the relative displacement vector between each pixel and the four extreme points of an object.
Abstract: Object detection can be regarded as a pixel clustering task, and its boundary is determined by four extreme points (leftmost, top, rightmost, and bottom). However, most studies focus on the center or corner points of the object, which are conditional results of the extreme points. In this paper, we present an Extreme-Point-Prediction-Based object detector (EPP-Net), which directly regresses the relative displacement vector between each pixel and the four extreme points. We also propose a new metric to measure the similarity between two groups of extreme points, namely, Extreme Intersection over Union (EIoU), and incorporate this EIoU as a new regression loss. Moreover, we propose a novel branch to predict the EIoU between the ground-truth and the prediction results, and take it as the localization confidence to filter out poor detection results. On the MS-COCO dataset, our method achieves an average precision (AP) of 44.0% with ResNet-50 and an AP of 50.3% with ResNeXt-101-DCN. The proposed EPP-Net provides a new method to detect objects and achieves very competitive performance among the state-of-the-art anchor-free detectors.
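The regression target described in the abstract can be illustrated with a small sketch. It assumes the 8D output is laid out as four (dx, dy) pairs in the order leftmost, top, rightmost, bottom, and that the bounding box is recovered as the smallest axis-aligned rectangle enclosing the decoded points. The function names and the ordering are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def decode_extreme_points(pixel: Point, offsets: Sequence[float]) -> List[Point]:
    """Turn an 8D displacement vector into four absolute extreme points.

    Assumed layout (not confirmed by the source): (dx, dy) pairs in the
    order leftmost, top, rightmost, bottom, relative to `pixel`.
    """
    if len(offsets) != 8:
        raise ValueError("expected 8 offsets: four (dx, dy) pairs")
    px, py = pixel
    return [(px + offsets[2 * i], py + offsets[2 * i + 1]) for i in range(4)]


def enclosing_bbox(points: Sequence[Point]) -> Box:
    """Smallest axis-aligned rectangle that contains all given points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Illustrative values only: one pixel and one made-up prediction.
pixel = (50.0, 40.0)
offsets = [-30.0, 5.0, 10.0, -25.0, 35.0, -10.0, -5.0, 20.0]
points = decode_extreme_points(pixel, offsets)
bbox = enclosing_bbox(points)
```

Under these assumptions the box follows from the extreme points alone, which matches the abstract's framing of center and corner points as conditional results of the extreme points.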
Figures

Fig. 3. Positive Sampling. The red points denote the positive samples. The center sampling strategy from FCOS takes the positive area as a square, whereas we dynamically adjust the sampling area according to the bbox shape. 
Table 3. EIoU vs. other counterparts. EIoU-branch denotes our EIoU predictor. QFL denotes the joint representation of IoU score and classification. 
Table 2. EIoU loss vs. Smooth-ℓ1 loss. Settings are the same as the EPP-Net in Table 1. The performance of EIoU loss is much better than that of Smooth-ℓ1 loss. 
Fig. 1. Illustration of EPP-Net predictions. As shown in this image, since the boundary of an object is determined by the extreme points, the bounding box (bbox) is actually a conditional result. Therefore, EPP-Net predicts four relative displacements, i.e., an 8D vector, as the location of the object. 
Fig. 4. Illustration of EIoU loss. The four extreme points are taken as a convex quadrilateral composed of four vectors. To simplify the calculation, the IoU of the smallest enclosing rectangles and the cosine similarity between each pair of corresponding vectors are used to measure the similarity of two groups of extreme points. 
Fig. 5. Qualitative results on the val2017 split. Extreme points and bbox detection results of EPP-Net are shown on the same image. With ResNet-50, our model (the model with an AP of 39.5%) can achieve excellent detection results in various scenes.
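The two ingredients named in the Fig. 4 caption, the IoU of the smallest enclosing rectangles and the cosine similarity between paired vectors, can be sketched as follows. This is not the paper's EIoU formula: the captions do not say how the two terms are combined, so the sketch returns them separately. The choice of edge vectors (leftmost to top, top to rightmost, rightmost to bottom, bottom to leftmost) and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def enclosing_rect(points: Sequence[Point]) -> Box:
    """Smallest axis-aligned rectangle that contains all given points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def rect_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def edge_vectors(points: Sequence[Point]) -> List[Point]:
    """Edges of the quadrilateral, each from one point to the next (cyclic)."""
    n = len(points)
    return [
        (points[(i + 1) % n][0] - points[i][0], points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    ]


def cosine(u: Point, v: Point) -> float:
    """Cosine similarity of two 2D vectors; 0.0 if either has zero length."""
    norm_u, norm_v = math.hypot(*u), math.hypot(*v)
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return (u[0] * v[0] + u[1] * v[1]) / (norm_u * norm_v)


def extreme_similarity(pred: Sequence[Point], gt: Sequence[Point]) -> Tuple[float, float]:
    """Return (rectangle IoU, mean cosine similarity of paired edge vectors)."""
    iou = rect_iou(enclosing_rect(pred), enclosing_rect(gt))
    pairs = zip(edge_vectors(pred), edge_vectors(gt))
    mean_cos = sum(cosine(u, v) for u, v in pairs) / 4
    return iou, mean_cos


# Illustrative values only: ground truth and a prediction shifted 5 px right.
gt = [(20.0, 45.0), (60.0, 15.0), (85.0, 30.0), (45.0, 60.0)]
pred = [(x + 5.0, y) for x, y in gt]
iou, mean_cos = extreme_similarity(pred, gt)
```

In this example the shift lowers the rectangle IoU while leaving the edge directions unchanged, which shows why the two terms carry different information: one reflects overlap, the other reflects the shape of the quadrilateral.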