Aggregated Residual Transformations for Deep Neural Networks
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
- 21 Jul 2017
- pp 5987-5995
TL;DR: ResNeXt is a simple, highly modularized network architecture for image classification, constructed by repeating a building block that aggregates a set of transformations with the same topology.
Abstract: We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call cardinality (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online.
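The central idea of the abstract is a block that sums C transformations with the same topology on top of an identity shortcut, y = x + sum_{i=1..C} T_i(x), where C is the cardinality. The plain-Python sketch below illustrates that idea under stated assumptions and is not the authors' code. Each path T_i is a hypothetical two-layer linear/ReLU map on vectors, not the convolutional stack of the real block, and batch norm is omitted. The parameter-count helpers count weights only for 1x1, 3x3, 1x1 bottleneck paths. The comparison of one 64-wide bottleneck with 32 paths of width 4 on a 256-d input is an illustrative setting. It shows how cardinality can be increased while complexity stays roughly constant.

```python
import random

Vector = list[float]
Matrix = list[list[float]]


# --- Complexity: weights only (biases and batch norm ignored) ---

def bottleneck_params(d_in: int, width: int) -> int:
    """One 1x1 (d_in->width), 3x3 (width->width), 1x1 (width->d_in) path."""
    return d_in * width + 3 * 3 * width * width + width * d_in


def resnext_params(d_in: int, cardinality: int, d: int) -> int:
    """`cardinality` parallel bottleneck paths of width d with the same topology."""
    return cardinality * bottleneck_params(d_in, d)


# --- Toy aggregated transformation on plain vectors (hypothetical paths) ---

def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def make_path(d_in: int, d: int, rng: random.Random) -> tuple[Matrix, Matrix]:
    """Hypothetical T_i: project d_in -> d, ReLU, project d -> d_in."""
    down = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d)]
    up = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d_in)]
    return down, up


def apply_path(path: tuple[Matrix, Matrix], x: Vector) -> Vector:
    down, up = path
    return matvec(up, relu(matvec(down, x)))


def aggregated_block(paths: list[tuple[Matrix, Matrix]], x: Vector) -> Vector:
    """y = x + sum_i T_i(x); the shortcut is the identity."""
    y = list(x)
    for path in paths:
        y = [yi + ti for yi, ti in zip(y, apply_path(path, x))]
    return y


def merge_paths(paths: list[tuple[Matrix, Matrix]]) -> tuple[Matrix, Matrix]:
    """Concatenate C paths of width d into one equivalent path of width C*d."""
    down = [row for d_i, _ in paths for row in d_i]
    d_in = len(paths[0][1])
    up = [[w for _, u_i in paths for w in u_i[r]] for r in range(d_in)]
    return down, up


# Usage: cardinality C = 4, path width d = 2, input dimension 8.
rng = random.Random(0)
paths = [make_path(8, 2, rng) for _ in range(4)]
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
y = aggregated_block(paths, x)
y_merged = [xi + ti for xi, ti in zip(x, apply_path(merge_paths(paths), x))]

print(bottleneck_params(256, 64))  # 69632
print(resnext_params(256, 32, 4))  # 70144
```

`merge_paths` shows why the multi-branch form is cheap to implement. Stacking the paths' weights yields one wider path that produces the same output up to floating-point rounding. In the convolutional block, the analogous reformulation expresses the middle 3x3 layer as a grouped convolution.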
Citations
An intelligent driven deep residual learning framework for brain tumor classification using MRI images
TL;DR: In this article, the authors propose an optimization-based deep convolutional ResNet model combined with a novel evolutionary algorithm that optimizes the architecture and hyperparameters of the deep ResNet without the need for human experts or manual architecture design.
ELECTRICITY: An Efficient Multi-Camera Vehicle Tracking System for Intelligent City
Yijun Qian, Lijun Yu, Wenhe Liu, Alexander G. Hauptmann
- 14 Jun 2020
TL;DR: ELECTRICITY, an efficient multi-camera vehicle tracking system with an aggregation loss and a fast multi-target cross-camera tracking strategy, won first place in the City-Scale Multi-Camera Vehicle Tracking track of the AI City 2020 Challenge.
•Posted Content
CAT: Cross Attention in Vision Transformer
TL;DR: Cross Attention Transformer (CAT) proposes a new attention mechanism for Transformers. It alternates between attention within each image patch, rather than over the whole image, to capture local information, and attention between image patches divided from single-channel feature maps, to capture global information.
Learning Multi-Object Tracking and Segmentation From Automatic Annotations
Lorenzo Porzi, Markus Hofinger, Idoia Ruiz, Joan Serrat, Samuel Rota Bulò, Peter Kontschieder
- 14 Jun 2020
TL;DR: The proposed track-mining algorithm turns raw street-level videos into high-fidelity MOTS training data. It is scalable, removes the need for expensive and time-consuming manual annotation, and leverages state-of-the-art instance segmentation results in combination with optical flow predictions.
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
TL;DR: This paper provides a comprehensive survey on how to efficiently execute sparse and irregular tensor computations of ML models on hardware accelerators and categorizes different hardware designs and acceleration techniques.
References
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: The authors train a deep convolutional neural network that achieves state-of-the-art ImageNet classification performance. It consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
- 07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieved the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Related Papers
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger
- 21 Jul 2017
Karen Simonyan, Andrew Zisserman
- 04 Sep 2014