Objects as Extreme Points

doi:10.1007/978-3-030-89370-5_15

Open Access · Proceedings Article · 10.1007/978-3-030-89370-5_15


- 08 Nov 2021

- pp 195-208

TL;DR: Zhang et al. propose an Extreme-Point-Prediction-Based object detector (EPP-Net), which directly regresses the relative displacement vector between each pixel and each of the four extreme points.

Abstract: Object detection can be regarded as a pixel clustering task in which the boundary of each object is determined by its four extreme points (leftmost, top, rightmost, and bottom). However, most studies focus on the center or corner points of the object, which are conditional results of (i.e., derived from) the extreme points. In this paper, we present an Extreme-Point-Prediction-Based object detector (EPP-Net), which directly regresses the relative displacement vector between each pixel and each of the four extreme points. We also propose a new metric, Extreme Intersection over Union (EIoU), to measure the similarity between two groups of extreme points, and incorporate EIoU as a new regression loss. Moreover, we propose a novel branch that predicts the EIoU between the ground truth and the prediction, and use this prediction as the localization confidence to filter out poor detection results. On the MS-COCO dataset, our method achieves an average precision (AP) of 44.0% with ResNet-50 and an AP of 50.3% with ResNeXt-101-DCN. EPP-Net thus offers a new way to detect objects and achieves very competitive performance among state-of-the-art anchor-free detectors.
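The abstract says the predicted EIoU serves as a localization confidence for filtering detections, but it does not state how that confidence is combined with the classification score. The sketch below assumes a simple product of the two, in the spirit of how FCOS multiplies the classification score by centerness. The `Detection` class, the `rescore_and_filter` function, and the 0.05 threshold are hypothetical illustrations, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: int
    cls_score: float  # classification probability in [0, 1]
    pred_eiou: float  # output of the EIoU-prediction branch, assumed in [0, 1]


def rescore_and_filter(
    dets: List[Detection], threshold: float = 0.05
) -> List[Tuple[float, Detection]]:
    """Rank detections by cls_score * pred_eiou and drop those below `threshold`.

    ASSUMPTION: the product rule and the threshold value are illustrative,
    not taken from the EPP-Net paper.
    """
    scored = [(d.cls_score * d.pred_eiou, d) for d in dets]
    kept = [(s, d) for s, d in scored if s >= threshold]
    # Highest combined score first, the order in which NMS would consume them.
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept


if __name__ == "__main__":
    dets = [
        Detection(label=0, cls_score=0.9, pred_eiou=0.8),   # confident, well localized
        Detection(label=1, cls_score=0.9, pred_eiou=0.02),  # confident, poorly localized
        Detection(label=2, cls_score=0.5, pred_eiou=0.9),
    ]
    for score, det in rescore_and_filter(dets):
        print(det.label, round(score, 3))
```

In this example the second detection has a high classification score, but its predicted EIoU of 0.02 pulls the combined score down to 0.018, below the threshold, so it is discarded while the other two are kept.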


Figures

Fig. 3. Positive Sampling. The red points denote the positive samples. The center sampling strategy from FCOS uses a square positive area, whereas we dynamically adjust the sampling area according to the bbox shape.
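To make the contrast in Fig. 3 concrete, the first function below follows FCOS-style center sampling: a location is positive if it falls in a square of half-side `radius * stride` around the box center, clipped to the box (1.5 is the radius commonly used in FCOS configurations). The caption does not say how EPP-Net adapts the region to the box shape, so `shape_adaptive_sample` and its `ratio` parameter are a hypothetical illustration in which the positive region is simply the box shrunk about its center.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def fcos_center_sample(x: float, y: float, box: Box, stride: int, radius: float = 1.5) -> bool:
    """FCOS-style center sampling: positive if (x, y) lies in a square of
    half-side radius * stride around the box center, clipped to the box."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    r = radius * stride
    return (max(x1, cx - r) < x < min(x2, cx + r)
            and max(y1, cy - r) < y < min(y2, cy + r))


def shape_adaptive_sample(x: float, y: float, box: Box, ratio: float = 0.5) -> bool:
    """HYPOTHETICAL shape-aware variant: the positive region is the box shrunk
    by `ratio` about its center, so elongated boxes get elongated regions."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = ratio * (x2 - x1) / 2, ratio * (y2 - y1) / 2
    return abs(x - cx) < half_w and abs(y - cy) < half_h


if __name__ == "__main__":
    wide_box = (0, 0, 200, 20)  # a long, thin object
    # 40 px left of center: outside FCOS's 24-px-wide square at stride 8,
    # but inside the box-shaped region of the hypothetical variant.
    print(fcos_center_sample(60, 10, wide_box, stride=8))  # False
    print(shape_adaptive_sample(60, 10, wide_box))          # True
```

For a long, thin object the fixed square covers only a small slice of the box, which is the kind of mismatch a shape-dependent region avoids.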

Table 3. EIoU vs. other counterparts. EIoU-branch denotes our EIoU predictor. QFL denotes the joint representation of IoU score and classification.

Table 2. EIoU loss vs. Smooth-ℓ1 loss. Settings are the same as for the EPP-Net in Table 1. The performance of EIoU loss is much better than that of Smooth-ℓ1 loss.

Fig. 1. Illustration of EPP-Net predictions. As shown in this image, since the boundary of an object is determined by its extreme points, the bounding box (bbox) is actually a conditional result. Therefore, EPP-Net predicts four relative displacements, i.e., an 8D vector, as the location of the object.
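The 8D representation in Fig. 1 can be sketched as follows: encode the displacement from a pixel to each extreme point, decode it back, and derive the bbox from the extreme points. This assumes displacements in raw pixel units (no stride normalization) and the point order leftmost, top, rightmost, bottom; all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def encode_extremes(pixel: Point, extremes: List[Point]) -> List[float]:
    """8D target: (dx, dy) from `pixel` to each extreme point.

    Assumed order: leftmost, top, rightmost, bottom; raw pixel units.
    """
    px, py = pixel
    target: List[float] = []
    for ex, ey in extremes:
        target.extend([ex - px, ey - py])
    return target


def decode_extremes(pixel: Point, offsets: List[float]) -> List[Point]:
    """Inverse of encode_extremes: add each (dx, dy) back to the pixel."""
    px, py = pixel
    return [(px + offsets[2 * i], py + offsets[2 * i + 1]) for i in range(4)]


def extremes_to_bbox(extremes: List[Point]) -> Box:
    """The bbox as a conditional result: x from leftmost/rightmost, y from top/bottom."""
    (left_x, _), (_, top_y), (right_x, _), (_, bottom_y) = extremes
    return (left_x, top_y, right_x, bottom_y)


if __name__ == "__main__":
    extremes = [(10, 40), (30, 5), (70, 35), (45, 90)]  # leftmost, top, rightmost, bottom
    pixel = (40, 50)
    offsets = encode_extremes(pixel, extremes)
    print(offsets)                                       # [-30, -10, -10, -45, 30, -15, 5, 40]
    print(decode_extremes(pixel, offsets) == extremes)   # True
    print(extremes_to_bbox(extremes))                    # (10, 5, 70, 90)
```

The box keeps only one coordinate from each extreme point, which is why the 8D vector carries strictly more information than a 4D box target.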

Fig. 4. Illustration of EIoU loss. The four extreme points are treated as a convex quadrilateral composed of four vectors. To simplify the calculation, the IoU of the smallest enclosing rectangles and the cosine similarity between each pair of corresponding vectors are used to measure the similarity of two groups of extreme points.
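Fig. 4 names the two ingredients of EIoU (IoU of the smallest enclosing rectangles, and cosine similarity between paired vectors) but not how they are combined or exactly which four vectors are used. The sketch below assumes the vectors are the quadrilateral's edges (leftmost to top to rightmost to bottom and back), rescales each cosine from [-1, 1] to [0, 1], averages them, multiplies by the rectangle IoU, and uses `1 - EIoU` as the loss. These choices are hypothetical and may differ from the paper's formulation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def enclosing_rect(pts: List[Point]) -> Rect:
    """Smallest axis-aligned rectangle containing all points."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


def rect_area(r: Rect) -> float:
    # Clamp at zero so an empty intersection has zero area.
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def rect_iou(a: Rect, b: Rect) -> float:
    inter = rect_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = rect_area(a) + rect_area(b) - inter
    return inter / union if union > 0 else 0.0


def edge_vectors(pts: List[Point]) -> List[Point]:
    """ASSUMED vectors: edges leftmost -> top -> rightmost -> bottom -> leftmost."""
    return [
        (pts[(i + 1) % 4][0] - pts[i][0], pts[(i + 1) % 4][1] - pts[i][1])
        for i in range(4)
    ]


def cosine(u: Point, v: Point, eps: float = 1e-9) -> float:
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v) + eps)


def eiou(pred: List[Point], gt: List[Point]) -> float:
    iou = rect_iou(enclosing_rect(pred), enclosing_rect(gt))
    # HYPOTHETICAL combination: rescale each cosine to [0, 1], average, multiply by IoU.
    pairs = zip(edge_vectors(pred), edge_vectors(gt))
    shape_sim = sum((1.0 + cosine(u, v)) / 2.0 for u, v in pairs) / 4.0
    return iou * shape_sim


def eiou_loss(pred: List[Point], gt: List[Point]) -> float:
    return 1.0 - eiou(pred, gt)


if __name__ == "__main__":
    gt = [(10, 40), (30, 5), (70, 35), (45, 90)]
    pred = [(10, 60), (50, 5), (70, 20), (30, 90)]  # same enclosing rectangle as gt
    print(rect_iou(enclosing_rect(pred), enclosing_rect(gt)))  # 1.0
    print(eiou(pred, gt) < 1.0)                                 # True
```

Here the two point sets share an enclosing rectangle, so a plain box IoU rates them as identical; in this sketch only the cosine term separates them.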

Fig. 5. Qualitative results on the val2017 split. Extreme points and bbox detection results of EPP-Net are shown on the same image. With a ResNet-50 backbone, our model (the variant with 39.5% AP) achieves excellent detection results in various scenes.

References

Journal Article · 10.1109/TPAMI.2016.2577031

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 01 Jun 2017

- IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.


Book Chapter · 10.1007/978-3-319-10602-1_48

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

- 06 Sep 2014

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.


Proceedings Article · 10.1109/CVPR.2016.91

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, +3 more

- 27 Jun 2016

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.


Proceedings Article · 10.1109/CVPR.2017.106

Feature Pyramid Networks for Object Detection

Tsung-Yi Lin, +5 more

- 21 Jul 2017

TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.


Proceedings Article · 10.1109/ICCV.2017.322

Mask R-CNN

Kaiming He, +3 more

- 20 Mar 2017

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.


...


