Proceedings Article
DOI: 10.1109/ICIP40778.2020.9190675
DCM: A Dense-Attention Context Module For Semantic Segmentation
Li Shenghua, Quan Zhou, Jia Liu, Wang Jie, Yawen Fan, Xiaofu Wu, Longin Jan Latecki
- 01 Oct 2020
- pp 1431-1435
TL;DR: A new attention-augmented module, the Dense-attention Context Module (DCM), is presented to connect common backbones to decoding heads; experiments on the Cityscapes dataset show promising results.
Abstract: For image semantic segmentation, a fully convolutional network is usually employed as the encoder to abstract visual features of the input image, and a meticulously designed decoder is used to decode the final feature map of the backbone. The output resolution of backbones designed for image classification is too low for the segmentation task. Most existing methods for obtaining the final high-resolution feature map cannot fully utilize the information in different layers of the backbone. To adequately extract the information of a single layer, the multi-scale context information of different layers, and the global information of the backbone, we present a new attention-augmented module named Dense-attention Context Module (DCM), which connects common backbones to the decoding heads. Experiments show promising results for our method on the Cityscapes dataset.
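The abstract does not spell out how DCM combines features, so the following is only a generic sketch of one way multi-layer backbone features could be fused with attention weights. It is not the authors' DCM. The function names (`upsample_nearest`, `global_average`, `fuse_with_attention`), the single-channel 2D-list feature maps, and the choice of a softmax over per-layer global averages are all assumptions made for illustration.

```python
import math


def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2D list to (out_h, out_w)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def global_average(fmap):
    """Global average pooling: one scalar summarising a whole map."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_attention(feature_maps):
    """Hypothetical fusion: resize every layer to the largest resolution,
    weight each layer by a softmax over its global average, and sum."""
    out_h = max(len(f) for f in feature_maps)
    out_w = max(len(f[0]) for f in feature_maps)
    resized = [upsample_nearest(f, out_h, out_w) for f in feature_maps]
    weights = softmax([global_average(f) for f in resized])
    fused = [
        [sum(w * f[r][c] for w, f in zip(weights, resized)) for c in range(out_w)]
        for r in range(out_h)
    ]
    return fused, weights


# Toy inputs: a high-resolution shallow map and a low-resolution deep map.
shallow = [[0.0] * 4 for _ in range(4)]
deep = [[1.0, 2.0], [3.0, 4.0]]
fused, weights = fuse_with_attention([shallow, deep])
```

In this toy run the fused map takes the shallow layer's 4x4 resolution, and the deep layer receives the larger weight because its global average is higher. A real module would typically learn its attention weights rather than derive them from raw averages.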
Citations
Multi-Encoder Context Aggregation Network for Structured and Unstructured Urban Street Scene Analysis
- 01 Jan 2023
TL;DR: A multi-encoder Context Aggregation Network (MCANet) is proposed for real-time semantic scene segmentation; it is reported to offer the best combination of low model complexity and state-of-the-art performance on benchmark datasets.
Mixture lightweight transformer for scene understanding
TL;DR: Wang et al. propose a mixture lightweight Transformer backbone for image understanding, in which each Transformer block, called SH-Transformer, adopts Single-Head Self-Attention (SHSA) and a Convolutional Inception Module (CIM).