Top 4201 papers published in the topic of Segmentation in 2017

Showing papers on "Segmentation" published in 2017

Proceedings Article•10.1109/CVPR.2017.16•

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Charles R. Qi¹, Hao Su¹, Kaichun Mo¹, Leonidas J. Guibas¹•Institutions (1)

21 Jul 2017

TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.

Abstract: Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
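
The permutation invariance mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the invariance is obtained; the version below assumes a shared per-point layer followed by max pooling over points, which is one common symmetric aggregation. The function name `global_feature`, the layer sizes, and the random weights are all hypothetical.

```python
import numpy as np

def global_feature(points, weights, bias):
    """Apply the same linear layer + ReLU to every point, then max-pool over points.

    points: (N, 3) array of xyz coordinates
    weights: (3, F) shared weights, bias: (F,)
    Returns a (F,) feature that does not depend on the order of the points.
    """
    per_point = np.maximum(points @ weights + bias, 0.0)  # (N, F), rows are independent
    return per_point.max(axis=0)                          # symmetric aggregation

rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))   # toy point cloud (assumed size)
weights = rng.normal(size=(3, 16))   # hypothetical, untrained weights
bias = rng.normal(size=16)

feat = global_feature(points, weights, bias)

# Shuffle the point order and recompute: the global feature should match.
shuffled = points[rng.permutation(len(points))]
feat_shuffled = global_feature(shuffled, weights, bias)
print(np.allclose(feat, feat_shuffled))
```

Because every point goes through the same function and the maximum ignores ordering, reordering the input rows leaves the pooled feature unchanged.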

15,726 citations

Journal Article•10.1109/TPAMI.2016.2572683•

Fully Convolutional Networks for Semantic Segmentation

Evan Shelhamer¹, Jonathan Long¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

01 Apr 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Fully convolutional networks (FCNs) take input of arbitrary size and produce correspondingly sized output; a skip architecture combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.

Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional networks achieve improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image.
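
The skip architecture described in the abstract can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes per-class score maps are already available at two resolutions, uses nearest-neighbour upsampling in place of whatever learned or interpolated upsampling a real network would use, and fuses by element-wise addition. The names `upsample_nearest` and `fuse_skip` are hypothetical.

```python
import numpy as np

def upsample_nearest(scores, factor):
    """Repeat every cell factor x factor times: (C, H, W) -> (C, H*factor, W*factor)."""
    return scores.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_skip(coarse_scores, fine_scores):
    """Upsample the deep, coarse class scores and add the shallow, fine class scores."""
    factor = fine_scores.shape[1] // coarse_scores.shape[1]
    return upsample_nearest(coarse_scores, factor) + fine_scores

# Toy score maps for 2 classes: a 2x2 coarse map and a 4x4 fine map.
coarse = np.arange(8, dtype=float).reshape(2, 2, 2)
fine = np.ones((2, 4, 4))

fused = fuse_skip(coarse, fine)   # (2, 4, 4) fused class scores
labels = fused.argmax(axis=0)     # per-pixel class decision at the fine resolution
print(fused.shape, labels.shape)
```

The coarse map contributes the class evidence and the fine map contributes spatial detail; the prediction is made at the finer of the two resolutions.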

10,676 citations

Proceedings Article•10.1109/CVPR.2017.549•

RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation

Guosheng Lin¹, Anton Milan², Chunhua Shen², Ian Reid²•Institutions (2)

Nanyang Technological University¹, University of Adelaide²

21 Jul 2017

TL;DR: RefineNet is presented, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections and introduces chained residual pooling, which captures rich background context in an efficient manner.

Abstract: Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments and set new state-of-the-art results on seven public datasets. In particular, we achieve an intersection-over-union score of 83.4 on the challenging PASCAL VOC 2012 dataset, which is the best reported result to date.
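
For readers unfamiliar with the intersection-over-union score quoted above, here is a minimal sketch of mean IoU on a single label map. It is a simplification: benchmark protocols typically accumulate counts over the whole dataset before averaging, and the helper name `mean_iou` and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average per-class intersection-over-union between two integer label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps; leave it out of the mean
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

target = np.array([[0, 0, 1],
                   [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])

# Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is approximately 0.5.
print(mean_iou(pred, target, num_classes=3))
```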

3,373 citations

Journal Article•10.1109/TPAMI.2016.2646371•

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi¹, Xiang Bai¹, Cong Yao¹•Institutions (1)

Huazhong University of Science and Technology¹

01 Nov 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes a novel neural network architecture that integrates feature extraction, sequence modeling and transcription into a unified framework, and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.

Abstract: Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.

3,210 citations

Journal Article•10.1093/BIOINFORMATICS/BTX180•

Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification.

Ignacio Arganda-Carreras¹,², Verena Kaynig³, Curtis Rueden⁴, Kevin W. Eliceiri⁴, Johannes Schindelin⁴, Albert Cardona⁵, H. Sebastian Seung⁶•Institutions (6)

Donostia International Physics Center¹, Ikerbasque², Harvard University³, University of Wisconsin-Madison⁴, Howard Hughes Medical Institute⁵, Princeton University⁶

01 Aug 2017-Bioinformatics

TL;DR: The Trainable Weka Segmentation (TWS), a machine learning tool that leverages a limited number of manual annotations in order to train a classifier and segment the remaining data automatically, is introduced.

Abstract: Summary: State-of-the-art light and electron microscopes are capable of acquiring large image datasets, but quantitatively evaluating the data often involves manually annotating structures of interest. This process is time-consuming and often a major bottleneck in the evaluation pipeline. To overcome this problem, we have introduced the Trainable Weka Segmentation (TWS), a machine learning tool that leverages a limited number of manual annotations in order to train a classifier and segment the remaining data automatically. In addition, TWS can provide unsupervised segmentation learning schemes (clustering) and can be customized to employ user-designed image features or classifiers. Availability and implementation: TWS is distributed as open-source software as part of the Fiji image processing distribution of ImageJ at http://imagej.net/Trainable_Weka_Segmentation. Contact: ignacio.arganda@ehu.eus. Supplementary information: Supplementary data are available at Bioinformatics online.

2,077 citations

Book Chapter•10.1007/978-3-319-67558-9_28•

Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations

Carole H. Sudre¹,², Wenqi Li², Tom Vercauteren², Sebastien Ourselin¹,², M. Jorge Cardoso¹,²•Institutions (2)

UCL Institute of Neurology¹, University College London²

14 Sep 2017

TL;DR: In this paper, the authors investigate the behavior of loss functions used for imbalanced segmentation (weighted cross-entropy, sensitivity, and Dice) and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks.

Abstract: Deep-learning has proved in recent years to be a powerful tool for image analysis and is now widely used to segment both 2D and 3D medical images. Deep-learning segmentation frameworks rely not only on the choice of network architecture but also on the choice of loss function. When the segmentation process targets rare observations, a severe class imbalance is likely to occur between candidate labels, thus resulting in sub-optimal performance. In order to mitigate this issue, strategies such as the weighted cross-entropy function, the sensitivity function or the Dice loss function, have been proposed. In this work, we investigate the behavior of these loss functions and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks. We also propose to use the class re-balancing properties of the Generalized Dice overlap, a known metric for segmentation assessment, as a robust and accurate deep-learning loss function for unbalanced tasks.
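
A rough sketch of a Generalised Dice style loss is given below. The abstract does not state the formula; this version assumes the commonly described form in which each class is weighted by the inverse square of its reference volume, so that rare classes are not swamped by large ones. The function name `generalized_dice_loss`, the `eps` smoothing term, and the toy data are assumptions for illustration.

```python
import numpy as np

def generalized_dice_loss(probs, onehot, eps=1e-7):
    """Generalised Dice style loss with inverse squared-volume class weights.

    probs:  (num_classes, num_voxels) predicted probabilities
    onehot: (num_classes, num_voxels) one-hot reference labels
    """
    ref_volume = onehot.sum(axis=1)                # voxels per class in the reference
    weights = 1.0 / (ref_volume ** 2 + eps)        # assumed re-balancing weights
    intersect = (weights * (probs * onehot).sum(axis=1)).sum()
    denom = (weights * (probs + onehot).sum(axis=1)).sum()
    return 1.0 - 2.0 * intersect / (denom + eps)

# Toy case: 10 voxels, only one of which belongs to the rare class 1.
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
onehot = np.stack([labels == 0, labels == 1]).astype(float)

perfect = onehot.copy()                                # predicts the reference exactly
all_background = np.stack([np.ones(10), np.zeros(10)])  # misses the rare voxel entirely

loss_perfect = generalized_dice_loss(perfect, onehot)
loss_missed = generalized_dice_loss(all_background, onehot)
print(loss_perfect, loss_missed)
```

In this toy case, missing the single rare voxel yields a loss close to 0.82 even though 9 of 10 voxels are labelled correctly, which is the re-balancing effect the abstract refers to.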

2,055 citations

Proceedings Article•10.1109/VCIP.2017.8305148•

LinkNet: Exploiting encoder representations for efficient semantic segmentation

Abhishek Chaurasia¹, Eugenio Culurciello¹•Institutions (1)

Purdue University¹

14 Jun 2017

TL;DR: In this paper, the authors propose a novel deep neural network architecture that learns without any significant increase in the number of parameters, giving state-of-the-art performance on CamVid and comparable results on the Cityscapes dataset.

Abstract: Pixel-wise semantic segmentation for visual scene understanding not only needs to be accurate, but also efficient in order to find any use in real-time application. Existing algorithms even though are accurate but they do not focus on utilizing the parameters of neural network efficiently. As a result they are huge in terms of parameters and number of operations; hence slow too. In this paper, we propose a novel deep neural network architecture which allows it to learn without any significant increase in number of parameters. Our network uses only 11.5 million parameters and 21.2 GFLOPs for processing an image of resolution 3 × 640 × 360. It gives state-of-the-art performance on CamVid and comparable results on Cityscapes dataset. We also compare our networks processing time on NVIDIA GPU and embedded system device with existing state-of-the-art architectures for different image resolutions.

1,833 citations

Proceedings Article•10.1109/CVPRW.2017.156•

The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

Simon Jégou, Michal Drozdzal¹, David Vazquez, Adriana Romero, Yoshua Bengio•Institutions (1)

École Polytechnique de Montréal¹

21 Jul 2017

TL;DR: In this article, the authors extend DenseNets to semantic segmentation and achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module nor pretraining.

Abstract: State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions. Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train. In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module nor pretraining. Moreover, due to smart construction of the model, our approach has much less parameters than currently published best entries for these datasets.

1,686 citations

Proceedings Article•10.1109/CVPR.2017.189•

Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network

Chao Peng¹, Xiangyu Zhang, Gang Yu, Guiming Luo¹, Jian Sun•Institutions (1)

Tsinghua University¹

21 Jul 2017

TL;DR: This work proposes a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation and suggests a residual-based boundary refinement to further refine the object boundaries.

Abstract: One of recent trends [31, 32, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters is more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-art performance on two public benchmarks and significantly outperforms previous results, 82.2% (vs 80.2%) on PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on Cityscapes dataset.

1,625 citations

Posted Content•

A Review on Deep Learning Techniques Applied to Semantic Segmentation.

Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez

22 Apr 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: A review of deep learning methods for semantic segmentation across various application areas, covering terminology and background concepts, the main datasets and challenges (to help researchers decide which best suit their needs and targets), existing methods, and quantitative results.

Abstract: Image semantic segmentation is more and more being of interest for computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are exposed to help researchers decide which are the ones that best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets in which they were evaluated, following up with a discussion of the results. At last, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.

1,448 citations

Book Chapter•10.1007/978-3-319-67558-9_28•

Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations

Carole H. Sudre¹,², Wenqi Li², Tom Vercauteren², Sebastien Ourselin¹,², M. Jorge Cardoso¹,²•Institutions (2)

UCL Institute of Neurology¹, University College London²

11 Jul 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work investigates the behavior of loss functions used for imbalanced segmentation and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks, and proposes to use the class re-balancing properties of the Generalized Dice overlap as a robust and accurate deep-learning loss function for unbalanced tasks.

Proceedings Article•10.1109/CVPR.2017.472•

Fully Convolutional Instance-Aware Semantic Segmentation

Yi Li¹, Haozhi Qi¹, Jifeng Dai¹, Xiangyang Ji², Yichen Wei¹•Institutions (2)

Microsoft¹, Tsinghua University²

21 Jul 2017

TL;DR: This paper presents the first fully convolutional end-to-end solution for the instance-aware semantic segmentation task, which achieves state-of-the-art performance in both accuracy and efficiency and won the COCO 2016 segmentation competition by a large margin.

Abstract: We present the first fully convolutional end-to-end solution for instance-aware semantic segmentation task. It inherits all the merits of FCNs for semantic segmentation [29] and instance mask proposal [5]. It performs instance mask prediction and classification jointly. The underlying convolutional representation is fully shared between the two sub-tasks, as well as between all regions of interest. The network architecture is highly integrated and efficient. It achieves state-of-the-art performance in both accuracy and efficiency. It wins the COCO 2016 segmentation competition by a large margin. Code would be released at https://github.com/daijifeng001/TA-FCN.

Proceedings Article•10.1109/ICCV.2017.487•

Structure-Measure: A New Way to Evaluate Foreground Maps

Deng-Ping Fan¹, Ming-Ming Cheng¹, Yun Liu¹, Tao Li¹, Ali Borji²•Institutions (2)

Nankai University¹, University of Central Florida²

1 Oct 2017

TL;DR: In this paper, the structural similarity measure (Structure-measure) is proposed to evaluate non-binary foreground maps, which simultaneously evaluates region-aware and object-aware structural similarity between a saliency map and a ground-truth map.

Abstract: Foreground map evaluation is crucial for gauging the progress of object segmentation algorithms, in particular in the field of salient object detection where the purpose is to accurately detect and segment the most salient object in a scene. Several widely-used measures such as Area Under the Curve (AUC), Average Precision (AP) and the recently proposed \(F_\beta^w\) (Fbw) have been used to evaluate the similarity between a non-binary saliency map (SM) and a ground-truth (GT) map. These measures are based on pixel-wise errors and often ignore the structural similarities. Behavioral vision studies, however, have shown that the human visual system is highly sensitive to structures in scenes. Here, we propose a novel, efficient, and easy to calculate measure known as structural similarity measure (Structure-measure) to evaluate non-binary foreground maps. Our new measure simultaneously evaluates region-aware and object-aware structural similarity between a SM and a GT map. We demonstrate superiority of our measure over existing ones using 5 meta-measures on 5 benchmark datasets.

Posted Content•

The 2017 DAVIS Challenge on Video Object Segmentation

Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, Alexander Sorkine-Hornung, Luc Van Gool

03 Apr 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: The scope of the benchmark, the main characteristics of the dataset, the evaluation metrics of the competition, and a detailed analysis of the results of the participants to the challenge are described.

Abstract: We present the 2017 DAVIS Challenge on Video Object Segmentation, a public dataset, benchmark, and competition specifically designed for the task of video object segmentation. Following the footsteps of other successful initiatives, such as ILSVRC and PASCAL VOC, which established the avenue of research in the fields of scene classification and semantic segmentation, the DAVIS Challenge comprises a dataset, an evaluation methodology, and a public competition with a dedicated workshop co-located with CVPR 2017. The DAVIS Challenge follows up on the recent publication of DAVIS (Densely-Annotated VIdeo Segmentation), which has fostered the development of several novel state-of-the-art video object segmentation techniques. In this paper we describe the scope of the benchmark, highlight the main characteristics of the dataset, define the evaluation metrics of the competition, and present a detailed analysis of the results of the participants to the challenge.

Posted Content•

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

Chao Peng¹, Xiangyu Zhang, Gang Yu, Guiming Luo¹, Jian Sun•Institutions (1)

Tsinghua University¹

08 Mar 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, a Global Convolutional Network (GCN) is proposed to address both the classification and localization issues for the semantic segmentation, which achieves state-of-the-art performance on two public benchmarks.

Abstract: One of recent trends [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters is more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-art performance on two public benchmarks and significantly outperforms previous results, 82.2% (vs 80.2%) on PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on Cityscapes dataset.

Posted Content•

ICNet for Real-Time Semantic Segmentation on High-Resolution Images.

Hengshuang Zhao¹, Xiaojuan Qi¹, Xiaoyong Shen², Jianping Shi³, Jiaya Jia¹•Institutions (3)

The Chinese University of Hong Kong¹, Tencent², SenseTime³

27 Apr 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: An image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address the challenging task of real-time semantic segmentation is proposed and in-depth analysis of the framework is provided.

Abstract: We focus on the challenging task of real-time semantic segmentation in this paper. It finds many practical applications and yet is with fundamental difficulty of reducing a large portion of computation for pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide in-depth analysis of our framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card with decent quality results evaluated on challenging datasets like Cityscapes, CamVid and COCO-Stuff.

Journal Article•10.1109/TMI.2017.2677499•

A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology

Neeraj Kumar¹, Ruchika Verma¹, Sanuj Sharma¹, Surabhi Bhargava¹, Abhishek Vahadane¹, Amit Sethi¹•Institutions (1)

Indian Institute of Technology Guwahati¹

06 Mar 2017-IEEE Transactions on Medical Imaging

TL;DR: A large publicly accessible data set of hematoxylin and eosin (H&E)-stained tissue images with more than 21000 painstakingly annotated nuclear boundaries is introduced, whose quality was validated by a medical doctor.

Abstract: Nuclear segmentation in digital microscopic tissue images can enable extraction of high-quality features for nuclear morphometrics and other analysis in computational pathology. Conventional image processing techniques, such as Otsu thresholding and watershed segmentation, do not work effectively on challenging cases, such as chromatin-sparse and crowded nuclei. In contrast, machine learning-based segmentation can generalize across various nuclear appearances. However, training machine learning algorithms requires data sets of images, in which a vast number of nuclei have been annotated. Publicly accessible and annotated data sets, along with widely agreed upon metrics to compare techniques, have catalyzed tremendous innovation and progress on other image classification problems, particularly in object recognition. Inspired by their success, we introduce a large publicly accessible data set of hematoxylin and eosin (H&E)-stained tissue images with more than 21000 painstakingly annotated nuclear boundaries, whose quality was validated by a medical doctor. Because our data set is taken from multiple hospitals and includes a diversity of nuclear appearances from several patients, disease states, and organs, techniques trained on it are likely to generalize well and work right out-of-the-box on other H&E-stained images. We also propose a new metric to evaluate nuclear segmentation results that penalizes object- and pixel-level errors in a unified manner, unlike previous metrics that penalize only one type of error. We also propose a segmentation technique based on deep learning that lays a special emphasis on identifying the nuclear boundaries, including those between the touching or overlapping nuclei, and works well on a diverse set of test images.

Proceedings Article•10.1109/ICCV.2017.153•

Adversarial Examples for Semantic Segmentation and Object Detection

Cihang Xie¹, Jianyu Wang², Zhishuai Zhang¹, Yuyin Zhou¹, Lingxi Xie¹, Alan L. Yuille¹•Institutions (2)

Johns Hopkins University¹, Baidu²

1 Oct 2017

TL;DR: This paper proposes Dense Adversary Generation (DAG), an algorithm for generating adversarial examples against state-of-the-art networks for segmentation and detection, and finds that the adversarial perturbations can be transferred across networks with different training data, different architectures, and even different recognition tasks.

Abstract: It has been well demonstrated that adversarial examples, i.e., natural images with visually imperceptible perturbations added, cause deep networks to fail on image classification. In this paper, we extend adversarial examples to semantic segmentation and object detection which are much more difficult. Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the target is a pixel or a receptive field in segmentation, and an object proposal in detection). This inspires us to optimize a loss function over a set of targets for generating adversarial perturbations. Based on this, we propose a novel algorithm named Dense Adversary Generation (DAG), which applies to the state-of-the-art networks for segmentation and detection. We find that the adversarial perturbations can be transferred across networks with different training data, based on different architectures, and even for different recognition tasks. In particular, the transfer ability across networks with the same architecture is more significant than in other cases. Besides, we show that summing up heterogeneous perturbations often leads to better transfer performance, which provides an effective method of black-box adversarial attack.

Book Chapter•10.1007/978-3-319-67389-9_44•

Tversky loss function for image segmentation using 3D fully convolutional deep networks

Seyed Sadegh Mohseni Salehi¹,², Deniz Erdogmus², Ali Gholipour¹•Institutions (2)

Boston Children's Hospital¹, Northeastern University²

10 Sep 2017

TL;DR: A generalized loss function based on the Tversky index is proposed to address the issue of data imbalance and achieve much better trade-off between precision and recall in training 3D fully convolutional deep neural networks.

Abstract: Fully convolutional deep neural networks carry out excellent potential for fast and accurate image segmentation. One of the main challenges in training these networks is data imbalance, which is particularly problematic in medical imaging applications such as lesion segmentation where the number of lesion voxels is often much lower than the number of non-lesion voxels. Training with unbalanced data can lead to predictions that are severely biased towards high precision but low recall (sensitivity), which is undesired especially in medical applications where false negatives are much less tolerable than false positives. Several methods have been proposed to deal with this problem including balanced sampling, two step training, sample re-weighting, and similarity loss functions. In this paper, we propose a generalized loss function based on the Tversky index to address the issue of data imbalance and achieve much better trade-off between precision and recall in training 3D fully convolutional deep neural networks. Experimental results in multiple sclerosis lesion segmentation on magnetic resonance images show improved \(F_2\) score, Dice coefficient, and the area under the precision-recall curve in test data. Based on these results we suggest Tversky loss function as a generalized framework to effectively train deep neural networks.
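
The precision/recall trade-off described above can be made concrete with a small sketch of a Tversky-index loss. The abstract does not give the formula; this version assumes the usual form TP / (TP + alpha * FP + beta * FN), where alpha weights false positives and beta weights false negatives. The function names, the `eps` smoothing term, and the toy arrays are assumptions for illustration.

```python
import numpy as np

def tversky_index(probs, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky index for a binary foreground mask.

    probs:  predicted foreground probabilities in [0, 1]
    target: binary reference mask of the same shape
    alpha weights false positives, beta weights false negatives (assumed convention).
    """
    tp = (probs * target).sum()
    fp = (probs * (1.0 - target)).sum()
    fn = ((1.0 - probs) * target).sum()
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def tversky_loss(probs, target, alpha=0.5, beta=0.5):
    return 1.0 - tversky_index(probs, target, alpha, beta)

# Toy case: 4 lesion voxels; the prediction finds 2, misses 2, and adds 1 false positive.
target = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0], dtype=float)

dice_like = tversky_index(pred, target, alpha=0.5, beta=0.5)  # reduces to the Dice coefficient
loss_balanced = tversky_loss(pred, target, alpha=0.5, beta=0.5)
loss_recall = tversky_loss(pred, target, alpha=0.3, beta=0.7)  # penalises misses more
print(dice_like, loss_balanced, loss_recall)
```

With alpha = beta = 0.5 the index coincides with the Dice coefficient; raising beta above alpha makes the missed lesion voxels cost more than the spurious one, which pushes training toward higher recall.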

Proceedings Article•10.1109/CVPR.2017.565•

One-Shot Video Object Segmentation

Sergi Caelles¹, Kevis-Kokitsi Maninis¹, Jordi Pont-Tuset¹, Laura Leal-Taixé², Daniel Cremers², L. Van Gool¹•Institutions (2)

ETH Zurich¹, Technische Universität München²

1 Jan 2017

TL;DR: One-shot video object segmentation (OSVOS) as mentioned in this paper is based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence.

Abstract: This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame. We present One-Shot Video Object Segmentation (OSVOS), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot). Although all frames are processed independently, the results are temporally coherent and stable. We perform experiments on two annotated video segmentation databases, which show that OSVOS is fast and improves the state of the art by a significant margin (79.8% vs 68.0%).

Proceedings Article•10.1109/CVPR.2017.687•

Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

Yunchao Wei¹, Jiashi Feng¹, Xiaodan Liang², Ming-Ming Cheng, Yao Zhao³, Shuicheng Yan¹•Institutions (3)

National University of Singapore¹, Carnegie Mellon University², Beijing Jiaotong University³

1 Jul 2017

TL;DR: This work investigates a principled way to progressively mine discriminative object regions using classification networks to address the weakly-supervised semantic segmentation problem, and proposes a new adversarial erasing approach for localizing and expanding object regions progressively.

Abstract: We investigate a principled way to progressively mine discriminative object regions using classification networks to address the weakly-supervised semantic segmentation problem. Classification networks are only responsive to small and sparse discriminative regions of the object of interest, which deviates from the requirement of the segmentation task, which needs to localize dense, interior and integral regions for pixel-wise inference. To mitigate this gap, we propose a new adversarial erasing approach for localizing and expanding object regions progressively. Starting with a single small object region, our proposed approach drives the classification network to sequentially discover new and complementary object regions by erasing the currently mined regions in an adversarial manner. These localized regions eventually constitute a dense and complete object region for learning semantic segmentation. To further enhance the quality of the regions discovered by adversarial erasing, an online prohibitive segmentation learning approach is developed to collaborate with adversarial erasing by providing auxiliary segmentation supervision modulated by the more reliable classification scores. Despite its apparent simplicity, the proposed approach achieves 55.0% and 55.7% mean Intersection-over-Union (mIoU) scores on the PASCAL VOC 2012 val and test sets, respectively, which are the new state of the art.
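
The mIoU metric reported in this abstract can be sketched in a few lines of NumPy. This is a generic confusion-matrix formulation, not the official PASCAL VOC evaluation code. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes for integer label maps."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # assumption: ignore classes that never occur
    return float(np.mean(intersection[valid] / union[valid]))
```

For example, `mean_iou([0, 1, 1, 1], [0, 0, 1, 1], num_classes=2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1.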

Proceedings Article•10.1109/CVPR.2017.181•

Simple Does It: Weakly Supervised Instance and Semantic Segmentation

Anna Khoreva¹, Rodrigo Benenson¹, Jan Hosang¹, Matthias Hein², Bernt Schiele¹•Institutions (2)

Max Planck Society¹, Saarland University²

1 Jul 2017

TL;DR: The authors proposed a weak supervision approach that does not require modification of the segmentation training procedure, and showed that when carefully designing the input labels from given bounding boxes, even a single round of training is enough to improve over previously reported weakly supervised results.

Abstract: Semantic labelling and instance segmentation are two tasks that require particularly costly annotations. Starting from weak supervision in the form of bounding box detection annotations, we propose a new approach that does not require modification of the segmentation training procedure. We show that when carefully designing the input labels from given bounding boxes, even a single round of training is enough to improve over previously reported weakly supervised results. Overall, our weak supervision approach reaches ~95% of the quality of the fully supervised model, both for semantic labelling and instance segmentation.

Proceedings Article•10.5244/C.31.167•

One-Shot Learning for Semantic Segmentation

Amirreza Shaban¹, Shray Bansal¹, Zhen Liu¹, Irfan Essa¹, Byron Boots¹•Institutions (1)

Georgia Institute of Technology¹

11 Sep 2017

TL;DR: This paper trains a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN), which is then used to perform dense pixel-level prediction on a test image for the new semantic class.

Abstract: Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement compared to the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3 times faster.

Journal Article•10.1109/TMI.2016.2548501•

Automatic segmentation of MR brain images with a convolutional neural network

Pim Moeskops¹, Max A. Viergever¹, Adriënne M. Mendrik¹, Linda S. de Vries¹, Manon J. N. L. Benders¹, Ivana Išgum¹•Institutions (1)

Utrecht University¹

11 Apr 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a convolutional neural network (CNN) was used for segmentation of MR brain images into a number of tissue classes using a single anatomical MR image only.

Abstract: Automatic segmentation in MR brain images is important for quantitative analysis in large-scale studies with images acquired at all ages. This paper presents a method for the automatic segmentation of MR brain images into a number of tissue classes using a convolutional neural network. To ensure that the method obtains accurate segmentation details as well as spatial consistency, the network uses multiple patch sizes and multiple convolution kernel sizes to acquire multi-scale information about each voxel. The method is not dependent on explicit features, but learns to recognise the information that is important for the classification based on training data. The method requires a single anatomical MR image only. The segmentation method is applied to five different data sets: coronal T2-weighted images of preterm infants acquired at 30 weeks postmenstrual age (PMA) and 40 weeks PMA, axial T2-weighted images of preterm infants acquired at 40 weeks PMA, axial T1-weighted images of ageing adults acquired at an average age of 70 years, and T1-weighted images of young adults acquired at an average age of 23 years. The method obtained the following average Dice coefficients over all segmented tissue classes for each data set, respectively: 0.87, 0.82, 0.84, 0.86 and 0.91. The results demonstrate that the method obtains accurate segmentations in all five sets, and hence demonstrate its robustness to differences in age and acquisition protocol.

Journal Article•10.1016/J.NEUROIMAGE.2017.04.041•

VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images

Hao Chen¹, Qi Dou¹, Lequan Yu¹, Jing Qin², Pheng-Ann Heng¹•Institutions (2)

The Chinese University of Hong Kong¹, Hong Kong Polytechnic University²

23 Apr 2017-NeuroImage

TL;DR: An auto‐context version of the VoxResNet is proposed by combining the low‐level image appearance features, implicit shape information, and high‐level context together for further improving the segmentation performance, and achieved the best performance in the 2013 MICCAI MRBrainS challenge.

Journal Article•10.1109/MSP.2017.2739299•

Convolutional Neural Networks for Inverse Problems in Imaging: A Review

Michael T. McCann¹, Kyong Hwan Jin¹, Michael Unser¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

09 Nov 2017-IEEE Signal Processing Magazine

TL;DR: This article reviews recent experimental work that uses convolutional neural networks to solve inverse problems in imaging, such as denoising, deconvolution, superresolution, and medical image reconstruction, with a focus on the critical design decisions.

Abstract: In this article, we review recent uses of convolutional neural networks (CNNs) to solve inverse problems in imaging. It has recently become feasible to train deep CNNs on large databases of images, and they have shown outstanding performance on object classification and segmentation tasks. Motivated by these successes, researchers have begun to apply CNNs to the resolution of inverse problems such as denoising, deconvolution, superresolution, and medical image reconstruction, and they have started to report improvements over state-of-the-art methods, including sparsity-based techniques such as compressed sensing. Here, we review the recent experimental work in these areas, with a focus on the critical design decisions.

Book Chapter•10.1007/978-3-319-60964-5_44•

Automatic Brain Tumor Detection and Segmentation Using U-Net Based Fully Convolutional Networks

Hao Dong¹, Guang Yang², Fangde Liu¹, Yuanhan Mo¹, Yike Guo¹•Institutions (2)

Imperial College London¹, St George's, University of London²

11 Jul 2017

TL;DR: Dong et al. propose a fully automatic method for brain tumor segmentation, developed using U-Net based deep convolutional networks and evaluated on the Multimodal Brain Tumor Image Segmentation (BRATS 2015) datasets, which contain 220 high-grade and 54 low-grade brain tumor cases.

Abstract: A major challenge in brain tumor treatment planning and quantitative evaluation is determination of the tumor extent. The noninvasive magnetic resonance imaging (MRI) technique has emerged as a front-line diagnostic tool for brain tumors without ionizing radiation. Manual segmentation of brain tumor extent from 3D MRI volumes is a very time-consuming task, and performance relies heavily on the operator's experience. In this context, a reliable fully automatic method for brain tumor segmentation is necessary for efficient measurement of the tumor extent. In this study, we propose a fully automatic method for brain tumor segmentation, which is developed using U-Net based deep convolutional networks. Our method was evaluated on the Multimodal Brain Tumor Image Segmentation (BRATS 2015) datasets, which contain 220 high-grade and 54 low-grade brain tumor cases. Cross-validation has shown that our method can obtain promising segmentation efficiently.

Posted Content•

Towards Automatic Learning of Procedures from Web Instructional Videos

Luowei Zhou¹, Chenliang Xu², Jason J. Corso¹•Institutions (2)

University of Michigan¹, University of Rochester²

28 Mar 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: A segment-level recurrent network is proposed for generating procedure segments by modeling the dependencies across segments and it is shown that the proposed model outperforms competitive baselines in procedure segmentation.

Abstract: The potential for agents, whether embodied or software, to learn by observing other agents performing procedures involving objects and actions is rich. Current research on automatic procedure learning heavily relies on action labels or video subtitles, even during the evaluation phase, which makes them infeasible in real-world scenarios. This leads to our question: can the human-consensus structure of a procedure be learned from a large set of long, unconstrained videos (e.g., instructional videos from YouTube) with only visual evidence? To answer this question, we introduce the problem of procedure segmentation--to segment a video procedure into category-independent procedure segments. Given that no large-scale dataset is available for this problem, we collect a large-scale procedure segmentation dataset with procedure segments temporally localized and described; we use cooking videos and name the dataset YouCook2. We propose a segment-level recurrent network for generating procedure segments by modeling the dependencies across segments. The generated segments can be used as pre-processing for other tasks, such as dense video captioning and event parsing. We show in our experiments that the proposed model outperforms competitive baselines in procedure segmentation.

Proceedings Article•10.17863/CAM.56535•

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

Alex Kendall¹, Vijay Badrinarayanan¹, Roberto Cipolla¹•Institutions (1)

University of Cambridge¹

1 Jan 2017

TL;DR: Bayesian SegNet as discussed by the authors uses Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels, which improves segmentation performance by 2-3% across a number of datasets and architectures.

Abstract: We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet. Semantic segmentation is an important tool for visual scene understanding and a meaningful measure of uncertainty is essential for decision making. Our contribution is a practical system which is able to predict pixel-wise class labels with a measure of model uncertainty using Bayesian deep learning. We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels. In addition, we show that modelling uncertainty improves segmentation performance by 2-3% across a number of datasets and architectures such as SegNet, FCN, Dilation Network and DenseNet.
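
The idea of Monte Carlo sampling with dropout at test time can be sketched with a toy per-pixel linear classifier in NumPy. Everything in this sketch is a hypothetical stand-in rather than the Bayesian SegNet architecture, including `toy_forward`, the linear weights, and the use of mean per-class variance as the uncertainty score. It only shows the sampling pattern: keep dropout active, run several stochastic passes, then average.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def toy_forward(features, weights, rng, drop_p=0.5):
    """One stochastic pass: dropout on the features, then a linear classifier.

    features has shape (H, W, F); weights has shape (F, C).
    """
    keep = rng.random(features.shape) >= drop_p
    dropped = features * keep / (1.0 - drop_p)  # inverted dropout scaling
    return softmax(dropped @ weights)            # (H, W, C) class probabilities


def mc_dropout_segment(features, weights, n_samples=20, seed=0):
    """Average several dropout passes and report a per-pixel uncertainty."""
    rng = np.random.default_rng(seed)
    samples = np.stack(
        [toy_forward(features, weights, rng) for _ in range(n_samples)]
    )                                                # (n_samples, H, W, C)
    mean_probs = samples.mean(axis=0)                # Monte Carlo estimate
    labels = mean_probs.argmax(axis=-1)              # predicted label per pixel
    uncertainty = samples.var(axis=0).mean(axis=-1)  # variance across passes
    return labels, mean_probs, uncertainty
```

In a real network the stochastic pass would be the full encoder-decoder with its dropout layers left enabled at inference time. The averaging step stays the same.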

Journal Article•10.1109/TPAMI.2016.2636150•

STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation

Yunchao Wei¹, Xiaodan Liang, Yunpeng Chen², Xiaohui Shen³, Ming-Ming Cheng⁴, Jiashi Feng², Yao Zhao¹, Shuicheng Yan²•Institutions (4)

Beijing Jiaotong University¹, National University of Singapore², Adobe Systems³, Nankai University⁴

01 Nov 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A simple to complex (STC) framework in which only image-level annotations are used to learn DCNNs for semantic segmentation. Experiments on the PASCAL VOC 2012 segmentation benchmark demonstrate its superiority over other state-of-the-art methods.

Abstract: Recently, significant improvement has been made in semantic object segmentation due to the development of deep convolutional neural networks (DCNNs). Training such a DCNN usually relies on a large number of images with pixel-level segmentation masks, and annotating these images is very costly in terms of both money and human effort. In this paper, we propose a simple to complex (STC) framework in which only image-level annotations are utilized to learn DCNNs for semantic segmentation. Specifically, we first train an initial segmentation network called Initial-DCNN with the saliency maps of simple images (i.e., those with a single category of major object(s) and clean background). These saliency maps can be automatically obtained by existing bottom-up salient object detection techniques, where no supervision information is needed. Then, a better network called Enhanced-DCNN is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN as well as the image-level annotations. Finally, more pixel-level segmentation masks of complex images (two or more categories of objects with cluttered background), which are inferred by using Enhanced-DCNN and image-level annotations, are utilized as the supervision information to learn the Powerful-DCNN for semantic segmentation. Our method utilizes 40K simple images from Flickr.com and 10K complex images from PASCAL VOC to boost the segmentation network step by step. Extensive experimental results on the PASCAL VOC 2012 segmentation benchmark demonstrate the superiority of the proposed STC framework over other state-of-the-art methods.
