BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Open Access • Posted Content

- 05 Apr 2020

- arXiv: Computer Vision and Pattern Recog...

880

TL;DR: This work proposes an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2), which performs favourably against a few state-of-the-art real-time semantic segmentation approaches.

Abstract: Low-level details and high-level semantics are both essential to the semantic segmentation task. However, to speed up model inference, current approaches almost always sacrifice the low-level details, which leads to a considerable decrease in accuracy. We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for real-time semantic segmentation. To this end, we propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2). This architecture involves: (i) a Detail Branch, with wide channels and shallow layers to capture low-level details and generate a high-resolution feature representation; (ii) a Semantic Branch, with narrow channels and deep layers to obtain high-level semantic context. The Semantic Branch is lightweight owing to its reduced channel capacity and a fast-downsampling strategy. Furthermore, we design a Guided Aggregation Layer to enhance mutual connections and fuse both types of feature representation. In addition, a booster training strategy is designed to improve segmentation performance without any extra inference cost. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture performs favourably against a few state-of-the-art real-time semantic segmentation approaches. Specifically, for a 2,048×1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set at a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods while also achieving better segmentation accuracy.
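
The abstract reports accuracy as Mean IoU. As a rough illustration of how that metric is usually computed, the sketch below builds a confusion matrix from flat label lists and averages the per-class intersection-over-union. It is a simplified, assumed formulation, not the official Cityscapes evaluation code (which, for instance, also handles ignored labels); the function names and the toy labels are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:  # a class with no pixels at all contributes nothing
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes (labels made up for illustration).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(confusion_matrix(y_true, y_pred, 3)))
```

For the speed figure, 156 FPS corresponds to roughly 1000 / 156 ≈ 6.4 ms per frame.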

Citations

Journal Article•10.1080/17538947.2023.2301675

Extracting urban impervious surface based on optical and SAR images cross-modal multi-scale features fusion network

Songjing Guo, +4 more

- 10 Jan 2024

- International Journal of Digital Earth

TL;DR: This study proposes CMFFNet, a cross-modal multi-scale features fusion network, to extract urban impervious surface from optical and SAR images, leveraging complementary information and multi-scale characteristics for improved accuracy and outperforming current mainstream methods.

3

Journal Article•10.48550/arXiv.2212.13764

Representation Separation for Semantic Segmentation with Vision Transformers

Yuanduo Hong, +4 more

- 28 Dec 2022

- arXiv.org

TL;DR: This paper presents an efficient framework of representation separation at the local patch level and the global region level for semantic segmentation with vision transformers, targeting the peculiar over-smoothness of ViTs.

3

Journal Article•10.1109/tvcg.2023.3235364

Identity-Aware and Shape-Aware Propagation of Face Editing in Videos.

Yue-Ren Jiang, +3 more

- 09 Jan 2023

- IEEE Transactions on Visualization and C...

TL;DR: The authors disentangle the StyleGAN2 latent vectors of human face video frames to decouple the appearance, shape, expression, and motion from identity in order to reduce the difficulties of maintaining the identity and keeping the original 3D motion.

3

Journal Article•10.1016/j.jksuci.2024.101929

RailSegVITNet: A lightweight VIT-based real-time track surface segmentation network for improving railroad safety

Zhichao Chen, +2 more

- 01 Jan 2024

- Journal of King Saud University - Comput...

TL;DR: This study proposes RailSegVITNet, a lightweight deep learning model for real-time track surface segmentation, achieving 91.43% MIoU on Railway-seg dataset with 2.01 GFLOPs and 1.4M parameters, outperforming popular models on Railsem19 dataset.

3

Journal Article•10.1109/tcsvt.2023.3325360

BSSNet: A Real-Time Semantic Segmentation Network for Road Scenes Inspired from AutoEncoder

Guangjie Han

TL;DR: BSSNet is a novel real-time semantic segmentation network for road scenes that efficiently extracts spatial and border information using an AutoEncoder structure.

3

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: This work proposes a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place on the ILSVRC 2015 classification task.

198.7K

•Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

117.9K

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

102.6K

•Journal Article•10.1145/3065386

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017

- Communications of The ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

98.2K

•Book Chapter•10.1007/978-3-319-24574-4_28

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, +2 more

- 05 Oct 2015

TL;DR: This work proposes a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

92K

Related Papers (5)

Fully convolutional networks for semantic segmentation

Pyramid Scene Parsing Network

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Deep Residual Learning for Image Recognition

The Cityscapes Dataset for Semantic Urban Scene Understanding