Top 7465 papers published in the topic of Segmentation in 2020

Showing papers on "Segmentation" published in 2020

Proceedings Article•10.1109/ICASSP40776.2020.9053405•

UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation

Huimin Huang¹, Lanfen Lin¹, Ruofeng Tong¹, Hongjie Hu², Qiaowei Zhang², Yutaro Iwamoto³, Xian-Hua Han³, Yen-Wei Chen³, Jian Wu¹•Institutions (3)

Zhejiang University¹, Sir Run Run Shaw Hospital², Ritsumeikan University³

4 May 2020

TL;DR: A novel UNet 3+ is proposed, which takes advantage of full-scale skip connections and deep supervisions, and can reduce the network parameters to improve the computation efficiency.

Abstract: Recently, a growing interest has been seen in deep learning-based semantic segmentation. UNet, which is one of the deep learning networks with an encoder-decoder architecture, is widely used in medical image segmentation. Combining multi-scale features is one of the important factors for accurate segmentation. UNet++ was developed as a modified UNet by designing an architecture with nested and dense skip connections. However, it does not explore sufficient information from full scales and there is still a large room for improvement. In this paper, we propose a novel UNet 3+, which takes advantage of full-scale skip connections and deep supervisions. The full-scale skip connections incorporate low-level details with high-level semantics from feature maps in different scales, while the deep supervision learns hierarchical representations from the full-scale aggregated feature maps. The proposed method is especially beneficial for organs that appear at varying scales. In addition to accuracy improvements, the proposed UNet 3+ can reduce the network parameters to improve the computation efficiency. We further propose a hybrid loss function and devise a classification-guided module to enhance the organ boundary and reduce the over-segmentation in a non-organ image, yielding more accurate segmentation results. The effectiveness of the proposed method is demonstrated on two datasets. The code is available at: github.com/ZJUGiveLab/UNet-Version
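A rough sketch of what "full-scale skip connections" could mean for one decoder stage: every scale contributes a feature map that is resampled to the stage's resolution before aggregation. The function name, the stage labels (`enc1`, `dec4`, ...), the five-level depth, and the choice of max-pooling and upsampling are assumptions of this sketch, not details stated in the abstract.

```python
def full_scale_sources(level, depth=5):
    """List the feature maps that feed decoder stage `level` (1 = finest scale).

    Returns (source_name, resampling_op, scale_factor) tuples. Names and the
    resampling operators are hypothetical, chosen only for illustration.
    """
    if not 1 <= level < depth:
        raise ValueError("level must satisfy 1 <= level < depth")
    sources = []
    for k in range(1, depth + 1):
        if k < level:
            # Finer encoder maps are shrunk to the current resolution.
            sources.append((f"enc{k}", "maxpool", 2 ** (level - k)))
        elif k == level:
            # Same-scale encoder map is passed through unchanged.
            sources.append((f"enc{k}", "identity", 1))
        else:
            # Coarser maps are enlarged; the deepest stage is the encoder output.
            name = f"enc{k}" if k == depth else f"dec{k}"
            sources.append((name, "upsample", 2 ** (k - level)))
    return sources
```

Under these assumptions, `full_scale_sources(3)` lists five inputs, one per scale, which would then be concatenated along the channel axis.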

2,302 citations

Posted Content•

Image Segmentation Using Deep Learning: A Survey

Shervin Minaee, Yuri Boykov¹, Fatih Porikli², Antonio Plaza³, Nasser Kehtarnavaz⁴, Demetri Terzopoulos⁵•Institutions (5)

University of Waterloo¹, Australian National University², University of Extremadura³, University of Texas at Dallas⁴, University of California, Los Angeles⁵

15 Jan 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A comprehensive review of recent pioneering efforts in semantic and instance segmentation is provided, covering convolutional pixel-labeling networks, encoder-decoder architectures, multiscale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings.

Abstract: Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.

1,915 citations

Journal Article•10.1016/J.NEUNET.2019.08.025•

MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation

Nabil Ibtehaz¹, M. Sohel Rahman²•Institutions (2)

Samsung¹, Bangladesh University of Engineering and Technology²

01 Jan 2020-Neural Networks

TL;DR: This work develops a novel architecture, MultiResUNet, as the potential successor to the U-Net architecture, and tests and compares it with the classical U-Net on a vast repertoire of multimodal medical images.

1,885 citations

Journal Article•10.1016/J.ISPRSJPRS.2020.01.013•

ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data

Foivos I. Diakogiannis¹, François Waldner¹, Peter Caccetta¹, Chen Wu²•Institutions (2)

Commonwealth Scientific and Industrial Research Organisation¹, University of Western Australia²

01 Apr 2020-ISPRS Journal of Photogrammetry and Remote Sensing

TL;DR: In this article, a novel deep learning architecture, ResUNet-a, is proposed for the task of semantic segmentation of monotemporal very high-resolution aerial images.

Abstract: Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications. Due to the large within-class and small between-class variance in pixel values of objects of interest, this remains a challenging task. In recent years, deep convolutional neural networks have started being used in remote sensing applications and demonstrate state of the art performance for pixel level classification of objects. Here we propose a reliable framework for performant results for the task of semantic segmentation of monotemporal very high resolution aerial images. Our framework consists of a novel deep learning architecture, ResUNet-a, and a novel loss function based on the Dice loss. ResUNet-a uses a UNet encoder/decoder backbone, in combination with residual connections, atrous convolutions, pyramid scene parsing pooling and multi-tasking inference. ResUNet-a infers sequentially the boundary of the objects, the distance transform of the segmentation mask, the segmentation mask and a colored reconstruction of the input. Each of the tasks is conditioned on the inference of the previous ones, thus establishing a conditioned relationship between the various tasks, as this is described through the architecture’s computation graph. We analyse the performance of several flavours of the Generalized Dice loss for semantic segmentation, and we introduce a novel variant loss function for semantic segmentation of objects that has excellent convergence properties and behaves well even under the presence of highly imbalanced classes. The performance of our modeling framework is evaluated on the ISPRS 2D Potsdam dataset. Results show state-of-the-art performance with an average F1 score of 92.9% over all classes for our best model.

1,165 citations

Proceedings Article•10.1109/CIBCB48159.2020.9277638•

A survey of loss functions for semantic segmentation

Shruti Jadon¹•Institutions (1)

University of Massachusetts Amherst¹

27 Oct 2020

TL;DR: A new log-cosh dice loss function is introduced and it is showcased that certain loss functions perform well across all data-sets and can be taken as a good baseline choice in unknown data distribution scenarios.

Abstract: Image Segmentation has been an active field of research as it has a wide range of applications, ranging from automated disease detection to self driving cars. In the past five years, various papers came up with different objective loss functions used in different cases such as biased data, sparse segmentation, etc. In this paper, we have summarized some of the well-known loss functions widely used for Image Segmentation and listed out the cases where their usage can help in fast and better convergence of a model. Furthermore, we have also introduced a new log-cosh dice loss function and compared its performance on NBFS skull-segmentation open source data-set with widely used loss functions. We also showcased that certain loss functions perform well across all data-sets and can be taken as a good baseline choice in unknown data distribution scenarios.
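As a minimal sketch of the log-cosh Dice idea named in the abstract, the snippet below wraps a soft Dice loss in log(cosh(.)). The function names, the flat-list inputs, and the smoothing constant `eps` are assumptions made here for illustration; the paper's own formulation may differ in details.

```python
import math

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss for flat lists of probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)

def log_cosh_dice_loss(pred, target):
    """Smooth the Dice loss by passing it through log(cosh(x))."""
    return math.log(math.cosh(dice_loss(pred, target)))
```

A perfect prediction gives a Dice loss of 0 and therefore a log-cosh Dice loss of 0; the log-cosh wrapper behaves roughly quadratically near zero, which is the usual motivation for using it as a smoothing function.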

1,107 citations

Proceedings Article•10.1109/CVPR42600.2020.00466•

PointPainting: Sequential Fusion for 3D Object Detection

Sourabh Vora, Alex H. Lang, Bassam Helou, Oscar Beijbom

14 Jun 2020

TL;DR: PointPainting projects lidar points into the output of an image-only semantic segmentation network and appends the class scores to each point; the painted point cloud can then be fed to any lidar-only method.

Abstract: Camera and lidar are important sensor modalities for robotics in general and self-driving cars in particular. The sensors provide complementary information offering an opportunity for tight sensor-fusion. Surprisingly, lidar-only methods outperform fusion methods on the main benchmark datasets, suggesting a gap in the literature. In this work, we propose PointPainting: a sequential fusion method to fill this gap. PointPainting works by projecting lidar points into the output of an image-only semantic segmentation network and appending the class scores to each point. The appended (painted) point cloud can then be fed to any lidar-only method. Experiments show large improvements on three different state-of-the-art methods, PointRCNN, VoxelNet and PointPillars on the KITTI and nuScenes datasets. The painted version of PointRCNN represents a new state of the art on the KITTI leaderboard for the bird's-eye view detection task. In ablation, we study how the effects of Painting depend on the quality and format of the semantic segmentation output, and demonstrate how latency can be minimized through pipelining.
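The painting step described above can be sketched in a few lines. This is an illustration only: it assumes the points are already expressed in the camera frame, that `proj` is a hypothetical 3x4 pinhole projection matrix, and that `class_scores[v][u]` holds the per-class scores of the pixel in row `v`, column `u`. Real pipelines also need the lidar-to-camera transform, which is omitted here.

```python
import math

def paint_points(points, class_scores, proj):
    """Append per-pixel class scores to every point that projects into the image."""
    height, width = len(class_scores), len(class_scores[0])
    painted = []
    for x, y, z in points:
        hom = (x, y, z, 1.0)
        # Homogeneous pinhole projection: (u * depth, v * depth, depth).
        u_h, v_h, depth = [sum(p * q for p, q in zip(row, hom)) for row in proj]
        if depth <= 0:
            continue  # point lies behind the camera
        u, v = math.floor(u_h / depth), math.floor(v_h / depth)
        if 0 <= u < width and 0 <= v < height:
            painted.append((x, y, z, *class_scores[v][u]))
    return painted
```

Points that fall outside the image are dropped in this sketch; a real system might instead keep them with a default score vector.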

1,106 citations

Journal Article•10.1109/TMI.2020.2996645•

Inf-Net: Automatic COVID-19 Lung Infection Segmentation From CT Images

Deng-Ping Fan, Tao Zhou, Ge-Peng Ji¹, Yi Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao²•Institutions (2)

Wuhan University¹, Zayed University²

22 May 2020-IEEE Transactions on Medical Imaging

TL;DR: A COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices, where a parallel partial decoder is used to aggregate the high-level features and generate a global map.

Abstract: Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics, and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, inhibiting the training of a deep model. To address these challenges, a novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices. In our Inf-Net, a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, the implicit reverse attention and explicit edge-attention are utilized to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which only requires a few labeled images and leverages primarily unlabeled data. Our semi-supervised framework can improve the learning ability and achieve a higher performance. Extensive experiments on our COVID-SemiSeg and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance.

1,054 citations

Proceedings Article•10.1109/CVPR42600.2020.00990•

End-to-End Learning of Visual Representations From Uncurated Instructional Videos

Antoine Miech¹, Jean-Baptiste Alayrac, Lucas Smaira, Ivan Laptev¹, Josef Sivic¹, Andrew Zisserman•Institutions (1)

French Institute for Research in Computer Science and Automation¹

14 Jun 2020

TL;DR: This work proposes a new learning approach, MIL-NCE, capable of addressing misalignments inherent in narrated videos and outperforms all published self-supervised approaches for these tasks as well as several fully supervised baselines.

Abstract: Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video models still rely on manually annotated data. With the recent introduction of the HowTo100M dataset, narrated videos now offer the possibility of learning video representations without manual supervision. In this work we propose a new learning approach, MIL-NCE, capable of addressing misalignments inherent in narrated videos. With this approach we are able to learn strong video representations from scratch, without the need for any manual annotation. We evaluate our representations on a wide range of four downstream tasks over eight datasets: action recognition (HMDB-51, UCF-101, Kinetics-700), text-to-video retrieval (YouCook2, MSR-VTT), action localization (YouTube-8M Segments, CrossTask) and action segmentation (COIN). Our method outperforms all published self-supervised approaches for these tasks as well as several fully supervised baselines.
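The abstract names MIL-NCE without spelling out the objective. One common reading, sketched here under that assumption, is a noise-contrastive loss in which several candidate positive pairs (for example, narrations close in time to a clip) share the numerator, so a single well-aligned candidate is enough to lower the loss. The function name and the plain-list inputs of similarity scores are hypothetical.

```python
import math

def mil_nce_loss(pos_scores, neg_scores):
    """Contrastive loss over a bag of candidate positives and a set of negatives.

    Each score is assumed to be a video/text embedding dot product.
    No numerical stabilisation is applied in this sketch.
    """
    pos = sum(math.exp(s) for s in pos_scores)
    neg = sum(math.exp(s) for s in neg_scores)
    return -math.log(pos / (pos + neg))
```

With one positive and one negative of equal score the loss is log 2; adding a strongly matching candidate to the positive bag drives it toward zero.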

943 citations

Proceedings Article•10.1109/CVPR42600.2020.00982•

PointRend: Image Segmentation As Rendering

Alexander Kirillov¹, Yuxin Wu¹, Kaiming He¹, Ross Girshick¹•Institutions (1)

Facebook¹

14 Jun 2020

TL;DR: PointRend is a point-based rendering module that performs segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm, producing crisp object boundaries in regions that are over-smoothed by previous methods.

Abstract: We present a new method for efficient high-quality image segmentation of objects and scenes. By analogizing classical computer graphics methods for efficient rendering with over- and undersampling challenges faced in pixel labeling tasks, we develop a unique perspective of image segmentation as a rendering problem. From this vantage, we present the PointRend (Point-based Rendering) neural network module: a module that performs point-based segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm. PointRend can be flexibly applied to both instance and semantic segmentation tasks by building on top of existing state-of-the-art models. While many concrete implementations of the general idea are possible, we show that a simple design already achieves excellent results. Qualitatively, PointRend outputs crisp object boundaries in regions that are over-smoothed by previous methods. Quantitatively, PointRend yields significant gains on COCO and Cityscapes, for both instance and semantic segmentation. PointRend's efficiency enables output resolutions that are otherwise impractical in terms of memory or computation compared to existing approaches. Code has been made available at https://github.com/facebookresearch/detectron2/tree/master/projects/PointRend.
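One way to picture "adaptively selected locations" is to re-predict only where the current mask is least certain. The sketch below picks the `k` pixels whose foreground probability is closest to 0.5; the function name, the binary-mask setting, and the 0.5 criterion are assumptions for illustration rather than the module's actual selection rule.

```python
def most_uncertain_points(prob_map, k):
    """Return the (row, col) of the k pixels whose probability is nearest 0.5."""
    scored = [
        (abs(p - 0.5), row, col)
        for row, line in enumerate(prob_map)
        for col, p in enumerate(line)
    ]
    scored.sort()  # smallest distance to 0.5 first
    return [(row, col) for _, row, col in scored[:k]]
```

In a subdivision loop, such points would be re-predicted at each upsampling step while confident interior and background pixels are simply interpolated.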

882 citations

Posted Content•

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Changqian Yu¹,², Changxin Gao², Jingbo Wang³, Gang Yu⁴, Chunhua Shen¹, Nong Sang²•Institutions (4)

University of Adelaide¹, Huazhong University of Science and Technology², The Chinese University of Hong Kong³, Tencent⁴

05 Apr 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2), which performs favourably against a few state-of-the-art real-time semantic segmentation approaches.

Abstract: The low-level details and high-level semantics are both essential to the semantic segmentation task. However, to speed up the model inference, current approaches almost always sacrifice the low-level details, which leads to a considerable accuracy decrease. We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation. To this end, we propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2). This architecture involves: (i) a Detail Branch, with wide channels and shallow layers to capture low-level details and generate high-resolution feature representation; (ii) a Semantic Branch, with narrow channels and deep layers to obtain high-level semantic context. The Semantic Branch is lightweight due to reducing the channel capacity and a fast-downsampling strategy. Furthermore, we design a Guided Aggregation Layer to enhance mutual connections and fuse both types of feature representation. Besides, a booster training strategy is designed to improve the segmentation performance without any extra inference cost. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture performs favourably against a few state-of-the-art real-time semantic segmentation approaches. Specifically, for a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods, yet we achieve better segmentation accuracy.
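For readers unfamiliar with the Mean IoU figure quoted above, the metric can be sketched as the per-class intersection over union, averaged over classes. The helper below works on flat label lists and skips classes absent from both prediction and ground truth; benchmark toolkits typically accumulate counts over a whole dataset before averaging, so this is only an illustration with a hypothetical function name.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flat label lists of equal length."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # ignore classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

The sketch assumes at least one class is present; otherwise the final division has nothing to average.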

881 citations

Book Chapter•10.1007/978-3-030-58452-8_17•

Conditional Convolutions for Instance Segmentation

Zhi Tian¹, Chunhua Shen¹, Hao Chen¹•Institutions (1)

University of Adelaide¹

23 Aug 2020

TL;DR: A simpler instance segmentation method that can achieve improved performance in both accuracy and inference speed on the COCO dataset, and outperform a few recent methods including well-tuned Mask RCNN baselines, without longer training schedules needed.

Abstract: We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation). Top-performing instance segmentation methods such as Mask R-CNN rely on ROI operations (typically ROIPool or ROIAlign) to obtain the final instance masks. In contrast, we propose to solve instance segmentation from a new perspective. Instead of using instance-wise ROIs as inputs to a network of fixed weights, we employ dynamic instance-aware networks, conditioned on instances. CondInst enjoys two advantages: (1) Instance segmentation is solved by a fully convolutional network, eliminating the need for ROI cropping and feature alignment. (2) Due to the much improved capacity of dynamically-generated conditional convolutions, the mask head can be very compact (e.g., 3 conv. layers, each having only 8 channels), leading to significantly faster inference. We demonstrate a simpler instance segmentation method that can achieve improved performance in both accuracy and inference speed. On the COCO dataset, we outperform a few recent methods including well-tuned Mask R-CNN baselines, without longer training schedules needed. Code is available: https://git.io/AdelaiDet.
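The idea of a mask head "conditioned on instances" can be illustrated with a deliberately tiny example: the feature map is shared, and only the filter parameters change per instance. The single 1x1 convolution, the function name, and the nested-list tensors are simplifications assumed here; the abstract describes a head of three small conv layers whose weights are generated dynamically.

```python
import math

def dynamic_mask_head(features, weights, bias):
    """Apply one instance-specific 1x1 conv plus sigmoid to shared features.

    features: C x H x W nested lists shared by all instances.
    weights (length C) and bias: parameters predicted for ONE instance.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            logit = bias + sum(weights[ch] * features[ch][r][c] for ch in range(channels))
            row.append(1.0 / (1.0 + math.exp(-logit)))
        mask.append(row)
    return mask
```

Calling the function twice on the same features with two different parameter sets yields two different masks, which is the property that removes the need for ROI cropping in this style of design.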

Book Chapter•10.1007/978-3-030-37734-2_37•

Kvasir-SEG: A Segmented Polyp Dataset

Debesh Jha, Pia H. Smedsrud, Michael Riegler, Pål Halvorsen, Thomas de Lange¹, Dag Johansen, Håvard D. Johansen•Institutions (1)

University of Oslo¹

5 Jan 2020

TL;DR: This paper presents Kvasir-SEG: an open-access dataset of gastrointestinal polyp images and corresponding segmentation masks, manually annotated by a medical doctor and then verified by an experienced gastroenterologist, and demonstrates the use of the dataset with a traditional segmentation approach and a modern deep-learning based Convolutional Neural Network approach.

Abstract: Pixel-wise image segmentation is a highly demanding task in medical-image analysis. In practice, it is difficult to find annotated medical images with corresponding segmentation masks. In this paper, we present Kvasir-SEG: an open-access dataset of gastrointestinal polyp images and corresponding segmentation masks, manually annotated by a medical doctor and then verified by an experienced gastroenterologist. Moreover, we also generated the bounding boxes of the polyp regions with the help of segmentation masks. We demonstrate the use of our dataset with a traditional segmentation approach and a modern deep-learning based Convolutional Neural Network (CNN) approach. The dataset will be of value for researchers to reproduce results and compare methods. By adding segmentation masks to the Kvasir dataset, which only provide frame-wise annotations, we enable multimedia and computer vision researchers to contribute in the field of polyp segmentation and automatic analysis of colonoscopy images.
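Deriving a bounding box from a segmentation mask, as mentioned in the abstract, can be sketched as taking the extreme foreground coordinates. The function name and the `(x_min, y_min, x_max, y_max)` convention are assumptions; this version returns one box around all foreground pixels, whereas images with several separate polyps would need a connected-component step first.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if empty."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for line in mask for c, v in enumerate(line) if v]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))
```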

Book Chapter•10.1007/978-3-030-59725-2_26•

PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Deng-Ping Fan, Ge-Peng Ji¹, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao•Institutions (1)

Wuhan University¹

4 Oct 2020

TL;DR: A parallel reverse attention network (PraNet) is proposed for accurate polyp segmentation in colonoscopy images; it first aggregates the features in high-level layers using a parallel partial decoder (PPD) and then generates a global map as the initial guidance area for the following components.

Abstract: Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of size, color and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine the boundary cues using the reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating some misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly, and presents a number of advantages in terms of generalizability, and real-time segmentation efficiency (~50 fps).
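A minimal sketch of a reverse attention step, assuming the usual formulation in which the attention weight is one minus the sigmoid of the coarse prediction, so that already-confident foreground is suppressed and the remaining regions (typically boundaries) are emphasised. The function name and nested-list tensors are hypothetical, and the coarse map is assumed to be already resized to the feature resolution.

```python
import math

def reverse_attention(features, coarse_logits):
    """Weight each feature channel by 1 - sigmoid(coarse prediction).

    features: C x H x W nested lists; coarse_logits: H x W nested lists.
    """
    weights = [
        [1.0 - 1.0 / (1.0 + math.exp(-z)) for z in row] for row in coarse_logits
    ]
    return [
        [[f * w for f, w in zip(f_row, w_row)] for f_row, w_row in zip(channel, weights)]
        for channel in features
    ]
```

A pixel with a strongly positive coarse logit is almost erased from the features, while an undecided pixel (logit near 0) keeps half of its activation.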

Posted Content•

SOLOv2: Dynamic and Fast Instance Segmentation

Xinlong Wang¹, Rufeng Zhang², Tao Kong³, Lei Li, Chunhua Shen¹•Institutions (3)

University of Adelaide¹, Tongji University², Tsinghua University³

23 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: State-of-the-art results in object detection (from the authors' mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation.

Abstract: In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. "SOLO: segmenting objects by locations". Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation. Code is available at: this https URL
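The "one shot" character of Matrix NMS can be sketched as follows: instead of removing masks sequentially, every score is decayed according to its overlap with higher-scoring masks, using only pairwise IoUs. This pure-Python version uses loops where a real implementation would use matrix operations; the Gaussian decay, the `sigma` default, and the flat 0/1 mask lists are assumptions of this sketch.

```python
import math

def matrix_nms(scores, masks, sigma=0.5):
    """Decay mask scores by overlap; `scores` must be sorted in descending order."""
    n = len(scores)
    # iou[i][j] is filled only for i < j (higher-scoring mask i vs. mask j).
    iou = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            inter = sum(a and b for a, b in zip(masks[i], masks[j]))
            union = sum(a or b for a, b in zip(masks[i], masks[j]))
            iou[i][j] = inter / union if union else 0.0
    # Largest overlap of each mask with any higher-scoring mask.
    cmax = [max(iou[i][j] for i in range(n)) for j in range(n)]
    decayed = []
    for j in range(n):
        decay = min(
            (math.exp(-(iou[i][j] ** 2 - cmax[i] ** 2) / sigma) for i in range(j)),
            default=1.0,
        )
        decayed.append(scores[j] * decay)
    return decayed
```

A duplicate of the top mask has its score cut sharply, while a disjoint mask keeps its score; a final threshold on the decayed scores would then select the surviving masks.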

Posted Content•

PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Deng-Ping Fan, Ge-Peng Ji¹, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao•Institutions (1)

Wuhan University¹

13 Jun 2020-arXiv: Image and Video Processing

TL;DR: Quantitative and qualitative evaluations on five challenging datasets across six metrics show that the PraNet improves the segmentation accuracy significantly, and presents a number of advantages in terms of generalizability, and real-time segmentation efficiency.

Abstract: Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of size, color and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine the boundary cues using a reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating any misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly, and presents a number of advantages in terms of generalizability, and real-time segmentation efficiency.

Proceedings Article•10.1109/CVPR42600.2020.00414•

FDA: Fourier Domain Adaptation for Semantic Segmentation

Yanchao Yang¹, Stefano Soatto¹•Institutions (1)

University of California, Los Angeles¹

14 Jun 2020

TL;DR: A simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other; the results indicate that even simple procedures can discount nuisance variability in the data that more sophisticated methods struggle to learn away.

Abstract: We describe a simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other. We illustrate the method in semantic segmentation, where densely annotated images are aplenty in one domain (synthetic data), but difficult to obtain in another (real images). Current state-of-the-art methods are complex, some requiring adversarial optimization to render the backbone of a neural network invariant to the discrete domain selection variable. Our method does not require any training to perform the domain alignment, just a simple Fourier Transform and its inverse. Despite its simplicity, it achieves state-of-the-art performance in the current benchmarks, when integrated into a relatively standard semantic segmentation model. Our results indicate that even simple procedures can discount nuisance variability in the data that more sophisticated methods struggle to learn away.
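The spectrum swap can be sketched with NumPy's FFT routines. In this sketch the source image keeps its phase while a small centred block of its amplitude spectrum is replaced by the target's; the function name, the single-channel input, and the `beta` parameter controlling the block size are assumptions made for illustration.

```python
import numpy as np

def fda_source_to_target(src, trg, beta=0.1):
    """Replace the low-frequency amplitude of `src` with that of `trg`.

    src, trg: 2D float arrays of identical shape (one image channel).
    """
    fft_src = np.fft.fft2(src)
    fft_trg = np.fft.fft2(trg)
    amp_src, phase_src = np.abs(fft_src), np.angle(fft_src)
    amp_trg = np.abs(fft_trg)

    # Centre the zero-frequency term so low frequencies form a square block.
    amp_src = np.fft.fftshift(amp_src)
    amp_trg = np.fft.fftshift(amp_trg)
    h, w = src.shape
    b = int(np.floor(min(h, w) * beta))
    ch, cw = h // 2, w // 2
    amp_src[ch - b:ch + b + 1, cw - b:cw + b + 1] = amp_trg[ch - b:ch + b + 1, cw - b:cw + b + 1]
    amp_src = np.fft.ifftshift(amp_src)

    # Recombine the mixed amplitude with the untouched source phase.
    mixed = amp_src * np.exp(1j * phase_src)
    return np.real(np.fft.ifft2(mixed))
```

Because the zero-frequency amplitude is part of the swapped block, the output takes on the target's mean intensity while keeping the source's spatial structure, which gives an intuition for why no training is needed for the alignment step.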

Journal Article•10.1007/S10845-019-01476-X•

Segmentation-based deep-learning approach for surface-defect detection

Domen Tabernik¹, Samo Šela, Jure Skvarč, Danijel Skočaj¹•Institutions (1)

University of Ljubljana¹

01 Mar 2020-Journal of Intelligent Manufacturing

TL;DR: A segmentation-based deep-learning architecture that is designed for the detection and segmentation of surface anomalies and is demonstrated on a specific domain of surface-crack detection.

Abstract: Automated surface-anomaly detection using machine learning has become an interesting and promising area of research, with a very high and direct impact on the application domain of visual inspection. Deep-learning methods have become the most suitable approaches for this task. They allow the inspection system to learn to detect the surface anomaly by simply showing it a number of exemplar images. This paper presents a segmentation-based deep-learning architecture that is designed for the detection and segmentation of surface anomalies and is demonstrated on a specific domain of surface-crack detection. The design of the architecture enables the model to be trained using a small number of samples, which is an important requirement for practical applications. The proposed model is compared with the related deep-learning methods, including the state-of-the-art commercial software, showing that the proposed approach outperforms the related methods on the specific domain of surface-crack detection. The large number of experiments also sheds light on the required precision of the annotation, the number of required training samples and on the required computational cost. Experiments are performed on a newly created dataset based on a real-world quality control case and demonstrate that the proposed approach is able to learn on a small number of defective surfaces, using only approximately 25–30 defective training samples, instead of hundreds or thousands, which is usually the case in deep-learning applications. This makes the deep-learning method practical for use in industry where the number of available defective samples is limited. The dataset is also made publicly available to encourage the development and evaluation of new methods for surface-defect detection.

Proceedings Article•10.1109/CBMS49503.2020.00111•

DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation

Debesh Jha, Michael Riegler, Dag Johansen, Pål Halvorsen¹, Håvard D. Johansen•Institutions (1)

Metropolitan University¹

28 Jul 2020

TL;DR: Encouraging results show that DoubleU-Net can be used as a strong baseline for both medical image segmentation and cross-dataset evaluation testing to measure the generalizability of Deep Learning (DL) models.

Abstract: Semantic image segmentation is the process of labeling each pixel of an image with its corresponding class. An encoder-decoder based approach, like U-Net and its variants, is a popular strategy for solving medical image segmentation tasks. To improve the performance of U-Net on various segmentation tasks, we propose a novel architecture called DoubleU-Net, which is a combination of two U-Net architectures stacked on top of each other. The first U-Net uses a pre-trained VGG-19 as the encoder, which has already learned features from ImageNet and can be transferred to another task easily. To capture more semantic information efficiently, we added another U-Net at the bottom. We also adopt Atrous Spatial Pyramid Pooling (ASPP) to capture contextual information within the network. We have evaluated DoubleU-Net using four medical segmentation datasets, covering various imaging modalities such as colonoscopy, dermoscopy, and microscopy. Experiments on the MICCAI 2015 segmentation challenge, the CVC-ClinicDB, the 2018 Data Science Bowl challenge, and the Lesion boundary segmentation datasets demonstrate that the DoubleU-Net outperforms U-Net and the baseline models. Moreover, DoubleU-Net produces more accurate segmentation masks, especially in the case of the CVC-ClinicDB and MICCAI 2015 segmentation challenge datasets, which have challenging images such as smaller and flat polyps. These results show the improvement over the existing U-Net model. The encouraging results, produced on various medical image segmentation datasets, show that DoubleU-Net can be used as a strong baseline for both medical image segmentation and cross-dataset evaluation testing to measure the generalizability of Deep Learning (DL) models.

Posted Content•10.1101/2020.04.22.20074948•

Inf-Net: Automatic COVID-19 Lung Infection Segmentation from CT Scans

Deng-Ping Fan, Tao Zhou, Ge-Peng Ji¹, Yi Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao²•Institutions (2)

Wuhan University¹, Zayed University²

22 Apr 2020-medRxiv

TL;DR: A novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT scans and outperforms most cutting-edge segmentation models and advances the state-of-the-art technology.

Abstract: Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics, and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, inhibiting the training of a deep model. To address these challenges, a novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices. In our Inf-Net, a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, the implicit reverse attention and explicit edge-attention are utilized to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which only requires a few labeled images and leverages primarily unlabeled data. Our semi-supervised framework can improve the learning ability and achieve a higher performance. Extensive experiments on our COVID-SemiSeg and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance.

Proceedings Article•10.1109/CVPR42600.2020.00860•

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Hao Chen¹, Kunyang Sun¹, Zhi Tian¹, Chunhua Shen¹, Yongming Huang², Youliang Yan³•Institutions (3)

University of Adelaide¹, Southeast University², Huawei³

14 Jun 2020

TL;DR: The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.

Abstract: Instance segmentation is one of the fundamental vision tasks. Recently, fully convolutional instance segmentation methods have drawn much attention as they are often simpler and more efficient than two-stage approaches like Mask R-CNN. To date, almost all such approaches fall behind the two-stage Mask R-CNN method in mask precision when models have similar computation complexity, leaving great room for improvement. In this work, we achieve improved mask prediction by effectively combining instance-level information with semantic information with lower-level fine-granularity. Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches. The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference. BlendMask can be easily incorporated with the state-of-the-art one-stage detection frameworks and outperforms Mask R-CNN under the same training schedule while being faster. A light-weight version of BlendMask achieves 36.0 mAP at 27 FPS evaluated on a single 1080Ti. Because of its simplicity and efficacy, we hope that our BlendMask could serve as a simple yet strong baseline for a wide range of instance-wise prediction tasks.

Proceedings Article•10.1109/CVPR42600.2020.01269•

Semi-Supervised Semantic Segmentation With Cross-Consistency Training

Yassine Ouali¹, Céline Hudelot¹, Myriam Tami¹•Institutions (1)

Université Paris-Saclay¹

14 Jun 2020

TL;DR: This work observes that for semantic segmentation, the low-density regions are more apparent within the hidden representations than within the inputs, and proposes cross-consistency training, where an invariance of the predictions is enforced over different perturbations applied to the outputs of the encoder.

Abstract: In this paper, we present a novel cross-consistency based semi-supervised approach for semantic segmentation. Consistency training has proven to be a powerful semi-supervised learning framework for leveraging unlabeled data under the cluster assumption, in which the decision boundary should lie in low-density regions. In this work, we first observe that for semantic segmentation, the low-density regions are more apparent within the hidden representations than within the inputs. We thus propose cross-consistency training, where an invariance of the predictions is enforced over different perturbations applied to the outputs of the encoder. Concretely, a shared encoder and a main decoder are trained in a supervised manner using the available labeled examples. To leverage the unlabeled examples, we enforce a consistency between the main decoder predictions and those of the auxiliary decoders, taking as inputs different perturbed versions of the encoder's output, and consequently, improving the encoder's representations. The proposed method is simple and can easily be extended to use additional training signal, such as image-level labels or pixel-level labels across different domains. We perform an ablation study to tease apart the effectiveness of each component, and conduct extensive experiments to demonstrate that our method achieves state-of-the-art results in several datasets.
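
The consistency term described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the distance measure, so mean squared error is assumed here, and the function names `mean_squared_error` and `consistency_loss` are hypothetical. Predictions are shown as flat lists of per-pixel probabilities.

```python
def mean_squared_error(a, b):
    """Mean squared difference between two equally long prediction lists."""
    if len(a) != len(b):
        raise ValueError("predictions must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(main_pred, aux_preds):
    """Average distance between the main decoder output and each auxiliary output.

    Assumption: each auxiliary prediction comes from a decoder that received a
    differently perturbed version of the shared encoder's output.
    """
    if not aux_preds:
        raise ValueError("at least one auxiliary prediction is required")
    return sum(mean_squared_error(main_pred, aux) for aux in aux_preds) / len(aux_preds)


# Toy example: one auxiliary decoder agrees with the main decoder, one deviates.
main = [0.9, 0.1, 0.8]
aux = [[0.9, 0.1, 0.8], [0.7, 0.3, 0.8]]
loss = consistency_loss(main, aux)
```

On unlabeled images a term of this kind would be the only training signal, which is why it is computed against the main decoder's prediction rather than against a ground-truth mask.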

Book Chapter•10.1007/978-3-030-58523-5_38•

SOLO: Segmenting Objects by Locations

Xinlong Wang¹, Tao Kong, Chunhua Shen¹, Yuning Jiang, Lei Li•Institutions (1)

University of Adelaide¹

23 Aug 2020

TL;DR: SOLO introduces the notion of “instance categories”, which assigns categories to each pixel within an instance according to the instance’s location and size, converting instance segmentation into a single-shot classification-solvable problem.

Abstract: We present a new, embarrassingly simple approach to instance segmentation. Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that has made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the “detect-then-segment” strategy (e.g., Mask R-CNN), or predict embedding vectors first then use clustering techniques to group pixels into individual instances. We view the task of instance segmentation from a completely new perspective by introducing the notion of “instance categories”, which assigns categories to each pixel within an instance according to the instance’s location and size, thus nicely converting instance segmentation into a single-shot classification-solvable problem. We demonstrate a much simpler and more flexible instance segmentation framework with strong performance, achieving on par accuracy with Mask R-CNN and outperforming recent single-shot instance segmenters in accuracy. We hope that this simple and strong framework can serve as a baseline for many instance-level recognition tasks besides instance segmentation. Code is available at https://git.io/AdelaiDet.
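
One way to picture a location-based "instance category" is to divide the image into a regular grid and label an instance by the cell its center falls in. The sketch below assumes exactly that; the grid scheme, the function name `instance_category`, and the row-major indexing are illustrative assumptions, since the abstract only says that categories depend on location and size.

```python
def instance_category(center_x, center_y, image_w, image_h, grid_size):
    """Map an instance center to a category index on a grid_size x grid_size grid.

    Assumption: categories are numbered row by row; a center on the far
    border is clamped into the last cell.
    """
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row * grid_size + col


# A 400 x 400 image split into a 4 x 4 grid gives 16 location categories.
cat = instance_category(300, 100, 400, 400, 4)
corner = instance_category(400, 400, 400, 400, 4)
```

With a scheme like this, two objects of the same semantic class receive different categories as long as their centers land in different cells, which is what lets a classifier separate instances.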

Proceedings Article•10.1109/CVPR42600.2020.01229•

Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

Yude Wang¹, Jie Zhang¹, Meina Kan¹, Shiguang Shan¹, Xilin Chen¹•Institutions (1)

Chinese Academy of Sciences¹

14 Jun 2020

TL;DR: This paper proposes a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap between full and weak supervision.

Abstract: Image-level weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years. Most advanced solutions exploit the class activation map (CAM). However, CAMs can hardly serve as the object mask due to the gap between full and weak supervision. In this paper, we propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap. Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation, whose pixel-level labels take the same spatial transformation as the input images during data augmentation. However, this constraint is lost on the CAMs trained by image-level supervision. Therefore, we propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning. Moreover, we propose a pixel correlation module (PCM), which exploits context appearance information and refines the prediction of the current pixel by its similar neighbors, leading to further improvement in CAM consistency. Extensive experiments on the PASCAL VOC 2012 dataset demonstrate that our method outperforms state-of-the-art methods using the same level of supervision. The code is released online.

Proceedings Article•10.1109/CVPR42600.2020.01221•

PolarMask: Single Shot Instance Segmentation With Polar Representation

Enze Xie¹, Peize Sun², Xiaoge Song³, Wenhai Wang³, Xuebo Liu⁴, Ding Liang⁴, Chunhua Shen⁵, Ping Luo¹•Institutions (5)

University of Hong Kong¹, Xi'an Jiaotong University², Nanjing University³, SenseTime⁴, University of Adelaide⁵

14 Jun 2020

TL;DR: PolarMask formulates instance segmentation as predicting the contour of an instance through instance center classification and dense distance regression in polar coordinates, and can be easily embedded into most off-the-shelf detection methods.

Abstract: In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used by easily embedding it into most off-the-shelf detection methods. Our method, termed PolarMask, formulates the instance segmentation problem as predicting contour of instance through instance center classification and dense distance regression in a polar coordinate. Moreover, we propose two effective approaches to deal with sampling high-quality center examples and optimization for dense distance regression, respectively, which can significantly improve the performance and simplify the training process. Without any bells and whistles, PolarMask achieves 32.9% in mask mAP with single-model and single-scale training/testing on the challenging COCO dataset. For the first time, we show that the complexity of instance segmentation, in terms of both design and computation complexity, can be the same as bounding box object detection and this much simpler and flexible instance segmentation framework can achieve competitive accuracy. We hope that the proposed PolarMask framework can serve as a fundamental and strong baseline for single shot instance segmentation task.
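
As a rough illustration of the polar representation described here, the sketch below turns a center point and a list of ray lengths into contour vertices. The function name `polar_to_contour` and the choice of uniformly spaced angles starting at 0 are assumptions for illustration; the abstract does not state how the rays are sampled.

```python
import math


def polar_to_contour(center, distances):
    """Convert a center (x, y) and n ray lengths into n contour points.

    Assumption: the n rays are spaced uniformly over 360 degrees,
    with the first ray pointing along the positive x axis.
    """
    cx, cy = center
    n = len(distances)
    points = []
    for i, d in enumerate(distances):
        theta = 2.0 * math.pi * i / n
        points.append((cx + d * math.cos(theta), cy + d * math.sin(theta)))
    return points


# Four rays of length 5 around the center (10, 20).
contour = polar_to_contour((10.0, 20.0), [5.0, 5.0, 5.0, 5.0])
```

Under this view a mask is fully described by one classified center and a fixed-length vector of distances, which is why the prediction head can look much like a bounding-box regressor.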

Journal Article•10.1609/AAAI.V34I07.6812•

Real-Time Scene Text Detection with Differentiable Binarization

Minghui Liao¹, Zhaoyi Wan, Cong Yao, Kai Chen², Xiang Bai¹•Institutions (2)

Huazhong University of Science and Technology¹, Shanghai Jiao Tong University²

3 Apr 2020

TL;DR: This paper proposes a module named Differentiable Binarization (DB), which lets a segmentation network adaptively set the thresholds for binarization; this not only simplifies the post-processing but also enhances the performance of text detection.

Abstract: Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB.
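
The abstract does not give the binarization function itself. One common way to make thresholding differentiable is to replace the hard step with a steep sigmoid of the difference between the probability and a per-pixel threshold; the sketch below assumes that form. The steepness factor `k` and its default value are illustrative assumptions, as are the function names.

```python
import math


def hard_binarize(p, t):
    """Standard thresholding: a step function with zero gradient almost everywhere."""
    return 1.0 if p >= t else 0.0


def soft_binarize(p, t, k=50.0):
    """Steep sigmoid approximation of the step function.

    Assumption: k is an illustrative steepness factor, not a value from the abstract.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


# Per-pixel probabilities and per-pixel thresholds (toy values).
prob_map = [0.05, 0.30, 0.45, 0.90]
thresholds = [0.30, 0.30, 0.40, 0.40]
soft = [soft_binarize(p, t) for p, t in zip(prob_map, thresholds)]
hard = [hard_binarize(p, t) for p, t in zip(prob_map, thresholds)]
```

Because the soft version has a non-zero gradient with respect to both the probability and the threshold, the threshold values can be learned together with the rest of the network instead of being fixed by hand.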

Journal Article•10.1109/TCYB.2020.2992433•

SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation

Xiaolin Zhang¹, Yunchao Wei¹, Yi Yang¹, Thomas S. Huang²•Institutions (2)

University of Technology, Sydney¹, University of Illinois at Urbana–Champaign²

04 Jun 2020-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: This article proposes a simple yet effective similarity guidance network to tackle the one-shot (SG-One) segmentation problem, aiming at predicting the segmentation mask of a query image with the reference to one densely labeled support image of the same category.

Abstract: One-shot image semantic segmentation poses a challenging task of recognizing the object regions from unseen categories with only one annotated example as supervision. In this article, we propose a simple yet effective similarity guidance network to tackle the one-shot (SG-One) segmentation problem. We aim at predicting the segmentation mask of a query image with the reference to one densely labeled support image of the same category. To obtain the robust representative feature of the support image, we first adopt a masked average pooling strategy for producing the guidance features by only taking the pixels belonging to the support image into account. We then leverage the cosine similarity to build the relationship between the guidance features and features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be adopted to guide the process of segmenting objects. Furthermore, our SG-One is a unified framework that can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, our SG-One achieves the mIoU score of 46.3%, surpassing the baseline methods.
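
The two operations named in this abstract, masked average pooling and cosine similarity, can be sketched in plain Python as follows. Feature maps are represented as nested lists (height x width x channels) purely for illustration; the function names and toy values are hypothetical, and a real model would use tensors produced by a convolutional backbone.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where the mask is 1.

    features: H x W grid of C-dimensional vectors (nested lists).
    mask:     H x W grid of 0/1 values marking the annotated support object.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between two vectors; eps avoids division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def similarity_map(query_features, guidance):
    """Compare every query pixel's feature vector with the guidance vector."""
    return [[cosine_similarity(vec, guidance) for vec in row] for row in query_features]


# Toy 2 x 2 support feature map with 2 channels; the object covers the top row.
support = [[[1.0, 0.0], [3.0, 0.0]],
           [[0.0, 2.0], [0.0, 4.0]]]
support_mask = [[1, 1],
                [0, 0]]
guidance = masked_average_pooling(support, support_mask)

# Toy 1 x 2 query feature map: the first pixel resembles the object, the second does not.
query = [[[5.0, 0.0], [0.0, 1.0]]]
sim = similarity_map(query, guidance)
```

Cosine similarity ignores vector magnitude, so the query pixel `[5.0, 0.0]` matches the guidance vector `[2.0, 0.0]` almost perfectly even though their norms differ.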

Proceedings Article•10.1109/CVPR42600.2020.01392•

CenterMask: Real-Time Anchor-Free Instance Segmentation

Youngwan Lee¹, Jongyoul Park¹•Institutions (1)

Electronics and Telecommunications Research Institute¹

14 Jun 2020

TL;DR: CenterMask adds a novel spatial attention-guided mask (SAG-Mask) branch to the anchor-free one-stage object detector FCOS, in the same vein as Mask R-CNN.

Abstract: We propose a simple yet efficient anchor-free instance segmentation method, called CenterMask, that adds a novel spatial attention-guided mask (SAG-Mask) branch to the anchor-free one-stage object detector FCOS, in the same vein as Mask R-CNN. Plugged into the FCOS object detector, the SAG-Mask branch predicts a segmentation mask on each box with the spatial attention map that helps to focus on informative pixels and suppress noise. We also present an improved backbone network, VoVNetV2, with two effective strategies: (1) residual connection for alleviating the optimization problem of larger VoVNet and (2) effective Squeeze-Excitation (eSE) dealing with the channel information loss problem of original SE. With SAG-Mask and VoVNetV2, we design CenterMask and CenterMask-Lite that are targeted to large and small models, respectively. Using the same ResNet-101-FPN backbone, CenterMask achieves 38.3%, surpassing all previous state-of-the-art methods while at a much faster speed. CenterMask-Lite also outperforms the state-of-the-art by large margins at over 35fps on Titan Xp. We hope that CenterMask and VoVNetV2 can serve as a solid baseline of real-time instance segmentation and backbone network for various vision tasks, respectively. The code is available at https://github.com/youngwanLEE/CenterMask.

Book Chapter•10.1007/978-3-030-58548-8_7•

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

Huiyu Wang¹, Yukun Zhu², Bradley Ray Green², Hartwig Adam², Alan L. Yuille¹, Liang-Chieh Chen²•Institutions (2)

Johns Hopkins University¹, Google²

23 Aug 2020

TL;DR: Axial-DeepLab proposes a position-sensitive axial-attention layer, a novel building block that can be stacked to form axial-attention models for image classification and dense prediction.

Abstract: Convolution exploits locality for efficiency at a cost of missing long range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works prove it possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computation complexity and allows performing attention within a larger or even global region. In companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over bottom-up state-of-the-art on COCO test-dev. This previous state-of-the-art is attained by our small variant that is 3.8× parameter-efficient and 27× computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
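
To make the complexity reduction concrete, one can count query-key pairs. In full 2D self-attention each of the H·W positions attends to all H·W positions, whereas after factorization each position attends along its column (H positions) and then along its row (W positions). The sketch below only counts these pairs; it ignores the channel dimension and constant factors, and the function names are hypothetical.

```python
def full_attention_pairs(h, w):
    """Query-key pairs when every position attends to every position."""
    n = h * w
    return n * n


def axial_attention_pairs(h, w):
    """Query-key pairs for one height-axis pass plus one width-axis pass."""
    return h * w * (h + w)


h, w = 64, 64
full = full_attention_pairs(h, w)    # 16,777,216 pairs
axial = axial_attention_pairs(h, w)  # 524,288 pairs
ratio = full / axial                 # 32.0
```

The ratio equals H·W / (H + W), so the saving grows with resolution, which is consistent with the abstract's point that factorization makes attention over a larger or even global region affordable.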

Journal Article•10.1016/J.COMPBIOMED.2020.104037•

Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: Classification and segmentation.

Amine Amyar¹, Amine Amyar², Romain Modzelewski², Hua Li³, Su Ruan²•Institutions (3)

GE Healthcare¹, University of Rouen², University of Illinois at Urbana–Champaign³

08 Oct 2020-Computers in Biology and Medicine

TL;DR: An automatic classification and segmentation tool to help screen for COVID-19 pneumonia using chest CT imaging, showing very encouraging performance with a Dice coefficient higher than 0.88 for the segmentation and an area under the ROC curve higher than 97% for the classification.
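
For readers unfamiliar with the segmentation metric quoted here, the Dice coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flat binary masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Two masks with 3 foreground pixels each, 2 of which overlap.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, target)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixels, so a value above 0.88 indicates substantial overlap with the reference annotation.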

Proceedings Article•10.1109/CVPR42600.2020.01249•

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Bowen Cheng¹, Maxwell D. Collins¹, Yukun Zhu¹, Ting Liu¹, Thomas S. Huang², Hartwig Adam¹, Liang-Chieh Chen¹•Institutions (2)

Google¹, University of Illinois at Urbana–Champaign²

14 Jun 2020

TL;DR: Panoptic-DeepLab adopts dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively, aiming to establish a solid baseline for bottom-up methods that can achieve performance comparable to two-stage methods.

Abstract: In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve performance comparable to two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts the dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab simultaneously ranks first at all three Cityscapes benchmarks, setting the new state-of-the-art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real-time with a single 1025x2049 image (15.8 frames per second), while achieving a competitive performance on Cityscapes (54.1% PQ on test set). On Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the challenge winner in 2018 by a healthy margin of 1.5%. Finally, our Panoptic-DeepLab also performs on par with several top-down approaches on the challenging COCO dataset. For the first time, we demonstrate a bottom-up approach could deliver state-of-the-art results on panoptic segmentation.
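
The abstract mentions a class-agnostic instance branch based on instance center regression but does not spell out how pixels are grouped. A plausible reading, assumed here for illustration only, is that each pixel predicts an offset toward its instance center and is assigned to the nearest detected center. The function name `group_pixels` and the toy values are hypothetical.

```python
def group_pixels(pixels, offsets, centers):
    """Assign each pixel to the index of the center nearest to (pixel + offset).

    pixels:  list of (y, x) pixel coordinates.
    offsets: list of predicted (dy, dx) offsets, one per pixel.
    centers: list of detected (y, x) instance centers.
    Assumption: squared Euclidean distance decides the assignment.
    """
    labels = []
    for (y, x), (dy, dx) in zip(pixels, offsets):
        py, px = y + dy, x + dx
        nearest = min(
            range(len(centers)),
            key=lambda i: (centers[i][0] - py) ** 2 + (centers[i][1] - px) ** 2,
        )
        labels.append(nearest)
    return labels


# Two detected centers and four pixels with slightly noisy offset predictions.
centers = [(2.0, 2.0), (2.0, 8.0)]
pixels = [(1, 1), (3, 3), (2, 7), (1, 9)]
offsets = [(1.0, 1.0), (-1.0, -0.8), (0.0, 1.1), (0.9, -1.0)]
labels = group_pixels(pixels, offsets, centers)
```

Because the grouping never looks at class labels, it can be combined with the output of any semantic segmentation branch to form a panoptic result.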
