Precise Single-stage Detector

doi:10.48550/arXiv.2210.04252

Journal Article•10.48550/arXiv.2210.04252

Aisha Chandio, +7 more

- 09 Oct 2022

- arXiv.org

- Vol. abs/2210.04252

12

TL;DR: A modified version of the Single Shot Multibox Detector (SSD) with improved features and a more efficient loss function that predicts the IOU between prediction boxes and ground-truth boxes; a threshold IOU guides classification training and attenuates the scores used by the NMS algorithm.

Abstract: Background and objectives: Deep learning (DL) algorithms have shown impressive performance in various tasks. Among them, single-stage object detectors (SSD) mainly depend on a classification network to extract features, multiple feature maps to predict, and classification confidence to guide the filtering of overlapping prediction boxes. However, two problems still cause inaccurate results: (1) during feature extraction, as semantic information is acquired layer by layer, local information is gradually lost, resulting in less representative feature maps; (2) during Non-Maximum Suppression (NMS), because the classification and regression tasks are inconsistent, the classification confidence and predicted detection position cannot accurately indicate the position of the prediction boxes. Methods: To address these issues, we propose a new architecture, a modified version of the Single Shot Multibox Detector (SSD), named Precise Single Stage Detector (PSSD). Firstly, we improve the features by adding extra layers to SSD. Secondly, we construct a simple and effective feature enhancement module that expands the receptive field step by step for each layer and enhances its local and semantic information. Finally, we design a more efficient loss function to predict the IOU between the prediction boxes and ground-truth boxes; the threshold IOU guides classification training and attenuates the scores, which are used by the NMS algorithm. Main results: Benefiting from the above optimizations, the proposed model PSSD achieves strong performance in real time. Specifically, on a Titan Xp with an input size of 320 pixels, PSSD achieves 33.8 mAP at 45 FPS on the MS COCO benchmark and 81.28 mAP at 66 FPS on Pascal VOC 2007, outperforming state-of-the-art object detection models. The proposed model also performs well with a larger input size: at 512 pixels, PSSD obtains 37.2 mAP at 27 FPS on MS COCO and 82.82 mAP at 40 FPS on Pascal VOC 2007. The experimental results show that the proposed model offers a better trade-off between speed and accuracy.
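
The idea of attenuating classification scores with a predicted IOU before NMS can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the attenuation rule, so the sketch assumes a simple product of classification score and predicted IOU, and the names `iou`, `attenuate_scores`, and `nms` are hypothetical helpers chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def attenuate_scores(cls_scores: Sequence[float],
                     pred_ious: Sequence[float]) -> List[float]:
    """ASSUMPTION: scale each classification score by its predicted IOU.
    The paper's exact attenuation rule may differ."""
    return [s * q for s, q in zip(cls_scores, pred_ious)]


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Toy data: boxes 0 and 1 overlap heavily; box 0 has the higher
# classification score but the lower predicted localization quality.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
cls_scores = [0.9, 0.8, 0.7]
pred_ious = [0.5, 0.9, 0.8]

kept_plain = nms(boxes, cls_scores)
kept_attenuated = nms(boxes, attenuate_scores(cls_scores, pred_ious))
```

In this toy case, ranking by classification score alone keeps box 0 and suppresses box 1, while ranking by the attenuated score keeps box 1, the candidate with the higher predicted IOU.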

Citations

Journal Article•10.1007/s10462-023-10426-2

ME-CCNN: Multi-encoded images and a cascade convolutional neural network for breast tumor segmentation and recognition

Ramin Ranjbarzadeh, +6 more

- 18 Feb 2023

- Artificial Intelligence Review

TL;DR: Several encoding approaches are first proposed to achieve an effective breast cancer recognition system as well as create new images from the input image to analyze the input texture more effectively without using a deep CNN model.

49

Journal Article•10.48550/arXiv.2301.02830

Advanced Data Augmentation Approaches: A Comprehensive Survey and Future directions

Teerath Kumar

- arXiv.org

TL;DR: In this article, the authors give an overview of data augmentation, present a novel and comprehensive taxonomy of the reviewed techniques, discuss their strengths and limitations, and report the impact of these techniques on three popular computer vision tasks: image classification, object detection, and semantic segmentation.

23

Posted Content•10.48550/arxiv.2301.02830

Image Data Augmentation Approaches: A Comprehensive Survey and Future directions

07 Jan 2023

TL;DR: This article presents a taxonomy of advanced data augmentation techniques: the authors provide background on existing techniques, a comprehensive taxonomy, and a comprehensive analysis of the effect of each technique on different tasks.

13

Journal Article•10.1109/ICETECC56662.2022.10069052

Advanced Audio Aid for Blind People

Savera Sarwar, +5 more

- 17 Nov 2022

TL;DR: In this article, a real-time object detection and reading system for blind persons is presented. However, the system is not suitable for audio, cannot read printed text, and cannot identify objects in the path of blind persons.

8

Journal Article•10.48550/arxiv.2309.04762

AudRandAug: Random Image Augmentations for Audio Classification

Teerath Kumar, +4 more

- 09 Sep 2023

- arXiv.org

TL;DR: AudRandAug is introduced, an adaptation of RandAug for audio data augmentation, which converts audio into an image-like pattern and outperforms other existing data augmentation methods in accuracy.

5

References

Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

117.9K

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

102.6K

Journal Article•10.1145/3065386

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017

- Communications of The ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

98.2K

Proceedings Article•10.1109/CVPR.2009.5206848

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

- 20 Jun 2009

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

75.9K

Journal Article•10.1109/TPAMI.2016.2577031

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 01 Jun 2017

- IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

64.4K
