TL;DR: A method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel, alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information.
Abstract: Scene labeling consists of labeling each pixel in an image with the category of the object it belongs to. We propose a method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel. The method alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information. We report results using multiple postprocessing methods to produce the final labeling. Among those, we propose a technique to automatically retrieve, from a pool of segmentation components, an optimal set of components that best explain the scene; these components are arbitrary, for example, they can be taken from a segmentation tree or from any family of oversegmentations. The system yields record accuracies on the SIFT Flow dataset (33 classes) and the Barcelona dataset (170 classes) and near-record accuracy on the Stanford Background dataset (eight classes), while being an order of magnitude faster than competing approaches, producing a 320×240 image labeling in less than a second, including feature extraction.
TL;DR: A new solution for the label fusion problem is proposed, in which weighted voting is formulated in terms of minimizing the total expectation of labeling error and pairwise dependency between atlases is explicitly modeled as the joint probability of two atlases making a segmentation error at a voxel.
Abstract: Multi-atlas segmentation is an effective approach for automatically labeling objects of interest in biomedical images. In this approach, multiple expert-segmented example images, called atlases, are registered to a target image, and deformed atlas segmentations are combined using label fusion. Among the proposed label fusion strategies, weighted voting with spatially varying weight distributions derived from atlas-target intensity similarity has been particularly successful. However, one limitation of these strategies is that the weights are computed independently for each atlas, without taking into account the fact that different atlases may produce similar label errors. To address this limitation, we propose a new solution for the label fusion problem in which weighted voting is formulated in terms of minimizing the total expectation of labeling error and in which pairwise dependency between atlases is explicitly modeled as the joint probability of two atlases making a segmentation error at a voxel. This probability is approximated using intensity similarity between a pair of atlases and the target image in the neighborhood of each voxel. We validate our method in two medical image segmentation problems: hippocampus segmentation and hippocampus subfield segmentation in magnetic resonance (MR) images. For both problems, we show consistent and significant improvement over label fusion strategies that assign atlas weights independently.
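As a rough illustration of weighting atlases jointly rather than independently, the sketch below computes voting weights at a single voxel by minimizing w^T M w subject to the weights summing to one, which has the closed form w = M^{-1} 1 / (1^T M^{-1} 1). The dependency matrix `M`, its values, and all function names are hypothetical and chosen for illustration; how the paper estimates the pairwise error probabilities from intensity similarity is not reproduced here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fusion_weights(dependency):
    """Weights minimizing w^T M w subject to sum(w) == 1."""
    ones = [1.0] * len(dependency)
    unnormalized = solve_linear(dependency, ones)  # M^{-1} 1
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def fuse(votes, weights):
    """Weighted vote for a binary label at one voxel (votes are 0 or 1)."""
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score >= 0.5 else 0


# Hypothetical dependency matrix: atlases 0 and 1 tend to fail together,
# so each of them receives less weight than the more independent atlas 2.
M = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
weights = fusion_weights(M)
```

With this matrix the two correlated atlases share a reduced weight, so their agreement counts for less than it would under independent weighting.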
TL;DR: This work proposes algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information and shows how this contextual information in turn improves object recognition.
Abstract: We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.
TL;DR: This method is fast, fully automatic, and makes minimal assumptions about the video, which enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations.
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
TL;DR: Given the advantages of magnetic resonance imaging over other diagnostic imaging, this survey is focused on MRI brain tumor segmentation, and semiautomatic and fully automatic techniques are emphasized.
TL;DR: A novel system for voxel classification integrating three 2D CNNs, associated one-to-one with the xy, yz and zx planes of a 3D image, that performs better than a state-of-the-art method using 3D multi-scale features.
Abstract: Segmentation of anatomical structures in medical images is often based on a voxel/pixel classification approach. Deep learning systems, such as convolutional neural networks (CNNs), can infer a hierarchical representation of images that fosters categorization. We propose a novel system for voxel classification integrating three 2D CNNs, which have a one-to-one association with the xy, yz and zx planes of the 3D image, respectively. We applied our method to the segmentation of tibial cartilage in low field knee MRI scans and tested it on 114 unseen scans. Although our method uses only 2D features at a single scale, it performs better than a state-of-the-art method using 3D multi-scale features. In the latter approach, the features and the classifier have been carefully adapted to the problem at hand. That we were able to get better results by a deep learning architecture that autonomously learns the features from the images is the main insight of this study.
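To make the tri-planar input concrete, the sketch below extracts the three orthogonal 2D patches (xy, yz, zx) centered on one voxel of a 3D volume. It covers only the input side: the function name, the `volume[x][y][z]` indexing, and the `half` patch-size parameter are assumptions made for illustration, and the CNNs themselves are not shown.

```python
def triplanar_patches(volume, x, y, z, half):
    """Return the xy, yz and zx patches centered on voxel (x, y, z).

    volume is indexed as volume[x][y][z]. Each patch is a square list of
    lists with side 2 * half + 1. The voxel must lie at least `half`
    voxels away from every border of the volume.
    """
    span = range(-half, half + 1)
    # xy plane: z is fixed, rows follow x, columns follow y.
    xy = [[volume[x + i][y + j][z] for j in span] for i in span]
    # yz plane: x is fixed, rows follow y, columns follow z.
    yz = [[volume[x][y + i][z + j] for j in span] for i in span]
    # zx plane: y is fixed, rows follow z, columns follow x.
    zx = [[volume[x + j][y][z + i] for j in span] for i in span]
    return xy, yz, zx


# Toy 5x5x5 volume whose voxel value encodes its own coordinates.
toy = [[[100 * i + 10 * j + k for k in range(5)]
        for j in range(5)] for i in range(5)]
xy_patch, yz_patch, zx_patch = triplanar_patches(toy, 2, 2, 2, half=1)
```

In a classifier of this kind, each patch would be fed to its own 2D network and the three outputs combined to label the center voxel.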
TL;DR: An unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments that outperforms state-of-the-art approaches in the dataset, showing its efficiency and robustness to challenges in different video sequences.
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines for better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches in the dataset, showing its efficiency and robustness to challenges in different video sequences.
TL;DR: This work argues that a per-image score instead of one computed over the entire dataset brings a lot more insight, and proposes new ways to evaluate semantic segmentation.
Abstract: In this work, we consider the evaluation of the semantic segmentation task. We discuss the strengths and limitations of the few existing measures, and propose new ways to evaluate semantic segmentation. First, we argue that a per-image score instead of one computed over the entire dataset brings a lot more insight. Second, we propose to take contours more carefully into account. Based on the conducted experiments, we suggest best practices for the evaluation. Finally, we present a user study we conducted to better understand how the quality of image segmentations is perceived by humans.
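The difference between a dataset-level and a per-image score can be seen in a small sketch using intersection-over-union (IoU) for a single class. IoU is used here only as a familiar stand-in; the function names and the toy data are assumptions, and the specific measures the paper proposes are not reproduced.

```python
def iou_counts(pred, truth, label):
    """Return (intersection, union) pixel counts for one class in one image."""
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
    return inter, union


def dataset_iou(images, label):
    """IoU computed from counts pooled over all images."""
    counts = [iou_counts(p, t, label) for p, t in images]
    inter = sum(c[0] for c in counts)
    union = sum(c[1] for c in counts)
    return inter / union


def per_image_iou(images, label):
    """Mean of per-image IoU, skipping images where the class is absent."""
    scores = []
    for p, t in images:
        inter, union = iou_counts(p, t, label)
        if union > 0:
            scores.append(inter / union)
    return sum(scores) / len(scores)


# Image A: a large object segmented perfectly.
# Image B: a small object missed entirely (predicted one pixel off).
image_a = ([1] * 8 + [0] * 2, [1] * 8 + [0] * 2)
image_b = ([0, 1] + [0] * 8, [1, 0] + [0] * 8)
images = [image_a, image_b]

pooled = dataset_iou(images, label=1)     # 0.8: dominated by the large object
averaged = per_image_iou(images, label=1)  # 0.5: the failure on image B shows
```

The pooled score hides the complete miss on the small object, while the per-image average exposes it.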
TL;DR: A method to reduce calculation time, achieve high accuracy, and increase sensitivity compared to the original Frangi method is presented and a new high resolution fundus database is proposed to compare it to the state-of-the-art algorithms.
Abstract: One of the most common modalities to examine the human eye is the eye-fundus photograph. The evaluation of fundus photographs is carried out by medical experts during time-consuming visual inspection. Our aim is to accelerate this process using computer aided diagnosis. As a first step, it is necessary to segment structures in the images for tissue differentiation. As the eye is the only organ where the vasculature can be imaged in an in vivo and noninterventional way without using expensive scanners, the vessel tree is one of the most interesting and important structures to analyze. The quality and resolution of fundus images are rapidly increasing. Thus, segmentation methods need to be adapted to the new challenges of high resolutions. In this paper, we present a method to reduce calculation time, achieve high accuracy, and increase sensitivity compared to the original Frangi method. This method contains approaches to avoid potential problems like specular reflexes of thick vessels. The proposed method is evaluated using the STARE and DRIVE databases and we propose a new high resolution fundus database to compare it to the state-of-the-art algorithms. The results show an average accuracy above 94% and low computational needs. This outperforms state-of-the-art methods.
TL;DR: The proposed method for automatically extracting blood vessels from colour retinal images is based on the fact that by changing the length of a basic line detector, line detectors at varying scales are achieved and it produces accurate segmentation on central reflex vessels while keeping close vessels well separated.
TL;DR: This work improves a state of the art method for estimating human joint locations from videos by taking its K-best estimates and using additional segmentation cues and temporal constraints to select the best one, which localizes body joints more accurately than existing methods.
Abstract: We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state of the art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimations output by the existing method and incorporate additional segmentation cues and temporal constraints to select the "best" one. Then we group the estimated joints into five body parts (e.g. the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements (by temporal-part-sets) which are characteristic of human actions. It is interpretable, compact, and also robust to errors on joint estimations. Experimental results first show that our approach is able to localize body joints more accurately than existing methods. Next we show that it outperforms state of the art action recognizers on the UCF sport, the Keck Gesture and the MSR-Action3D datasets.
TL;DR: This survey examines methods that have been proposed to segment 3D point clouds into multiple homogeneous regions and outlines the promising future research directions.
Abstract: 3D point cloud segmentation is the process of classifying point clouds into multiple homogeneous regions, such that the points in the same region have the same properties. The segmentation is challenging because of the high redundancy, uneven sampling density, and lack of explicit structure of point cloud data. This problem has many applications in robotics such as intelligent vehicles, autonomous mapping and navigation. Many authors have introduced different approaches and algorithms. In this survey, we examine methods that have been proposed to segment 3D point clouds. The advantages, disadvantages, and design mechanisms of these methods are analyzed and discussed. Finally, we outline the promising future research directions.
TL;DR: This work addresses multi-class segmentation of indoor scenes with RGB-D inputs by applying a multiscale convolutional network to learn features directly from the images and the depth information.
Abstract: This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. We obtain state-of-the-art on the NYU-v2 depth dataset with an accuracy of 64.5%. We illustrate the labeling of indoor scenes in video sequences that could be processed in real-time using appropriate hardware such as an FPGA.
TL;DR: This survey, which aims to provide a comprehensive state-of-the-art review of the field, also addresses several challenges associated with these systems and applications.
Abstract: This review article extensively surveys the current progress made toward video-based human activity recognition. Three aspects of human activity recognition are addressed, including core technology, human activity recognition systems, and applications from low-level to high-level representation. In the core technology, three critical processing stages are thoroughly discussed, mainly: human object segmentation, feature extraction and representation, and activity detection and classification algorithms. In the human activity recognition systems, three main types are mentioned, including single person activity recognition, multiple people interaction and crowd behavior, and abnormal activity recognition. Finally, the domains of applications are discussed in detail, specifically surveillance environments, entertainment environments and healthcare systems. Our survey, which aims to provide a comprehensive state-of-the-art review of the field, also addresses several challenges associated with these systems and applications. Moreover, in this survey, various applications are discussed in great detail, specifically the applications in healthcare monitoring systems.
TL;DR: A systematic survey of graph theoretical methods for image segmentation, where the problem is modeled in terms of partitioning a graph into several sub-graphs such that each of them represents a meaningful object of interest in the image.
TL;DR: Experiments based on Kapur's entropy indicate that the ABC algorithm can be efficiently used in multilevel thresholding, and CPU time results show that the algorithms are scalable and that the running times of the algorithms seem to grow at a linear rate as the problem size increases.
Abstract: Segmentation is a critical task in image processing. Bi-level segmentation involves dividing the whole image into partitions based on a threshold value, whereas multilevel segmentation involves multiple threshold values. A successful segmentation assigns proper threshold values to optimise a criterion such as entropy or between-class variance. High computational cost and inefficiency of an exhaustive search for the optimal thresholds leads to the use of global search heuristics to set the optimal thresholds. An emerging area in global heuristics is swarm-intelligence, which models the collective behaviour of the organisms. In this paper, two successful swarm-intelligence-based global optimisation algorithms, particle swarm optimisation (PSO) and artificial bee colony (ABC), have been employed to find the optimal multilevel thresholds. Kapur's entropy, one of the maximum entropy techniques, and between-class variance have been investigated as fitness functions. Experiments have been performed on test images using various numbers of thresholds. The results were assessed using statistical tools and suggest that Otsu's technique, PSO and ABC show equal performance when the number of thresholds is two, while the ABC algorithm performs better than PSO and Otsu's technique when the number of thresholds is greater than two. Experiments based on Kapur's entropy indicate that the ABC algorithm can be efficiently used in multilevel thresholding. Moreover, segmentation methods are required to have a minimum running time in addition to high performance. Therefore, the CPU times of ABC and PSO have been investigated to check their validity in real-time. The CPU time results show that the algorithms are scalable and that the running times of the algorithms seem to grow at a linear rate as the problem size increases.
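For reference, the fitness function based on Kapur's entropy can be sketched as the sum of the Shannon entropies of the histogram classes created by the thresholds. The sketch below pairs it with an exhaustive search, which is exactly the costly baseline that PSO and ABC are meant to replace; it is included only because it is short and deterministic. Function names and the threshold convention (bins below a threshold go to the lower class) are assumptions.

```python
from itertools import combinations
from math import log


def kapur_entropy(hist, thresholds):
    """Sum of class entropies for a histogram split at the given thresholds.

    A threshold t places bins with index < t in the lower class and bins
    with index >= t in the upper class.
    """
    total = sum(hist)
    bounds = [0] + sorted(thresholds) + [len(hist)]
    entropy = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        mass = sum(hist[lo:hi]) / total
        if mass == 0:
            continue  # An empty class contributes nothing.
        for count in hist[lo:hi]:
            if count > 0:
                p = count / total / mass  # Probability within the class.
                entropy -= p * log(p)
    return entropy


def best_thresholds(hist, k):
    """Exhaustive search for the k thresholds maximizing Kapur's entropy."""
    candidates = combinations(range(1, len(hist)), k)
    return max(candidates, key=lambda ts: kapur_entropy(hist, ts))


# Toy 4-bin histogram: the balanced split at bin 2 maximizes the entropy.
toy_hist = [1, 1, 1, 1]
chosen = best_thresholds(toy_hist, 1)
```

The number of candidate threshold sets grows combinatorially with the number of thresholds and gray levels, which is why a global search heuristic becomes attractive for multilevel thresholding.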
TL;DR: An automated nuclei segmentation method that works with hematoxylin and eosin stained breast cancer histopathology images, which represent regions of whole digital slides, is developed.
Abstract: The introduction of fast digital slide scanners that provide whole slide images has led to a revival of interest in image analysis applications in pathology. Segmentation of cells and nuclei is an important first step towards automatic analysis of digitized microscopy images. We therefore developed an automated nuclei segmentation method that works with hematoxylin and eosin (H&E) stained breast cancer histopathology images, which represent regions of whole digital slides. The procedure can be divided into four main steps: 1) pre-processing with color unmixing and morphological operators, 2) marker-controlled watershed segmentation at multiple scales and with different markers, 3) post-processing for rejection of false regions and 4) merging of the results from multiple scales. The procedure was developed on a set of 21 breast cancer cases (subset A) and tested on a separate validation set of 18 cases (subset B). The evaluation was done in terms of both detection accuracy (sensitivity and positive predictive value) and segmentation accuracy (Dice coefficient). The mean estimated sensitivity for subset A was 0.875 (±0.092) and for subset B 0.853 (±0.077). The mean estimated positive predictive value was 0.904 (±0.075) and 0.886 (±0.069) for subsets A and B, respectively. For both subsets, the distribution of the Dice coefficients had a high peak around 0.9, with the vast majority of segmentations having values larger than 0.8.
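The evaluation measures named above have simple definitions, sketched here for flat binary masks and for detection counts. The function names and toy inputs are illustrative assumptions; how the paper matches detected nuclei to ground truth to obtain the counts is not covered.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as a perfect match.
    return 2 * overlap / size if size else 1.0


def sensitivity(true_pos, false_neg):
    """Fraction of ground-truth objects that were detected."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of detections that correspond to a ground-truth object."""
    return true_pos / (true_pos + false_pos)


# Toy example: two masks of three pixels each that share two pixels.
score = dice([1, 1, 1, 0], [0, 1, 1, 1])
```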
TL;DR: This article demonstrates how the multi‐atlas approach can be extended to work with input atlases that are unique and extremely time consuming to construct by generating a library of multiple automatically generated templates of different brains (MAGeT Brain), and demonstrates the efficacy of the method for the mouse and human.
Abstract: Classically, model-based segmentation procedures match magnetic resonance imaging (MRI) volumes to an expertly labeled atlas using nonlinear registration. The accuracy of these techniques is limited due to atlas biases, misregistration, and resampling error. Multi-atlas-based approaches are used as a remedy and involve matching each subject to a number of manually labeled templates. This approach yields numerous independent segmentations that are fused using a voxel-by-voxel label-voting procedure. In this article, we demonstrate how the multi-atlas approach can be extended to work with input atlases that are unique and extremely time consuming to construct by generating a library of multiple automatically generated templates of different brains (MAGeT Brain). We demonstrate the efficacy of our method for the mouse and human using two different nonlinear registration algorithms (ANIMAL and ANTs). The input atlases consist of a high-resolution mouse brain atlas and an atlas of the human basal ganglia and thalamus derived from serial histological data. MAGeT Brain segmentation improves the identification of the mouse anterior commissure (mean Dice Kappa value κ = 0.801), but may be encountering a ceiling effect for hippocampal segmentations. Applying MAGeT Brain to human subcortical structures improves segmentation accuracy for all structures compared to regular model-based techniques (κ = 0.845, 0.752, and 0.861 for the striatum, globus pallidus, and thalamus, respectively). Experiments performed with three manually derived input templates suggest that MAGeT Brain can approach or exceed the accuracy of multi-atlas label-fusion segmentation (κ = 0.894, 0.815, and 0.895 for the striatum, globus pallidus, and thalamus, respectively).
TL;DR: Comparison with other state-of-the-art brain tumor segmentation works on the publicly available low-grade glioma BRATS2012 dataset shows that the segmentation results are more consistent and on average outperform these methods for the patients where ground truth is made available.
Abstract: A stochastic model for characterizing tumor texture in brain magnetic resonance (MR) images is proposed. The efficacy of the model is demonstrated in patient-independent brain tumor texture feature extraction and tumor segmentation in magnetic resonance images (MRIs). Due to complex appearance in MRI, brain tumor texture is formulated using a multiresolution-fractal model known as multifractional Brownian motion (mBm). Detailed mathematical derivation for mBm model and corresponding novel algorithm to extract spatially varying multifractal features are proposed. A multifractal feature-based brain tumor segmentation method is developed next. To evaluate efficacy, tumor segmentation performance using proposed multifractal feature is compared with that using Gabor-like multiscale texton feature. Furthermore, novel patient-independent tumor segmentation scheme is proposed by extending the well-known AdaBoost algorithm. The modification of AdaBoost algorithm involves assigning weights to component classifiers based on their ability to classify difficult samples and confidence in such classification. Experimental results for 14 patients with over 300 MRIs show the efficacy of the proposed technique in automatic segmentation of tumors in brain MRIs. Finally, comparison with other state-of-the-art brain tumor segmentation works on the publicly available low-grade glioma BRATS2012 dataset shows that our segmentation results are more consistent and on average outperform these methods for the patients where ground truth is made available.
TL;DR: A novel document image binarization technique that addresses the segmentation of text from badly degraded document images by using adaptive image contrast, a combination of the local image contrast and the local image gradient that is tolerant to text and background variation caused by different types of document degradations.
Abstract: Segmentation of text from badly degraded document images is a very challenging task due to the high inter/intra-variation between the document background and the foreground text of different document images. In this paper, we propose a novel document image binarization technique that addresses these issues by using adaptive image contrast. The adaptive image contrast is a combination of the local image contrast and the local image gradient that is tolerant to text and background variation caused by different types of document degradations. In the proposed technique, an adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and combined with Canny's edge map to identify the text stroke edge pixels. The document text is further segmented by a local threshold that is estimated based on the intensities of detected text stroke edge pixels within a local window. The proposed method is simple, robust, and involves minimum parameter tuning. It has been tested on three public datasets that are used in the recent document image binarization contest (DIBCO) 2009 & 2011 and handwritten-DIBCO 2010 and achieves accuracies of 93.5%, 87.8%, and 92.03%, respectively, that are significantly higher than or close to that of the best-performing methods reported in the three contests. Experiments on the Bickley diary dataset that consists of several challenging bad quality document images also show the superior performance of our proposed method, compared with other techniques.
TL;DR: A general, fully-automated method for multi-organ segmentation of abdominal computed tomography (CT) scans based on a hierarchical atlas registration and weighting scheme that generates target specific priors from an atlas database by combining aspects from multi-atlas registration and patch-based segmentation, two widely used methods in brain segmentation.
Abstract: A robust automated segmentation of abdominal organs can be crucial for computer aided diagnosis and laparoscopic surgery assistance. Many existing methods are specialized to the segmentation of individual organs and struggle to deal with the variability of the shape and position of abdominal organs. We present a general, fully-automated method for multi-organ segmentation of abdominal computed tomography (CT) scans. The method is based on a hierarchical atlas registration and weighting scheme that generates target specific priors from an atlas database by combining aspects from multi-atlas registration and patch-based segmentation, two widely used methods in brain segmentation. The final segmentation is obtained by applying an automatically learned intensity model in a graph-cuts optimization step, incorporating high-level spatial knowledge. The proposed approach makes it possible to deal with high inter-subject variation while being flexible enough to be applied to different organs. We have evaluated the segmentation on a database of 150 manually segmented CT images. The achieved results compare well to state-of-the-art methods that are usually tailored to more specific questions, with Dice overlap values of 94%, 93%, 70%, and 92% for liver, kidneys, pancreas, and spleen, respectively.
TL;DR: The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations, and improves the categorization accuracy over the state-of-the-art.
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful Grab Cut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
TL;DR: In this article, a new index for the hydromorphological assessment of Italian rivers has been developed for the EU Water Framework Directive requirements, but its use can be extended to other applications in river management.
TL;DR: It is argued that image segmentation and dense 3D reconstruction contribute valuable information to each other's task and a rigorous mathematical framework is proposed to formulate and solve a joint segmentation and dense reconstruction problem.
Abstract: Both image segmentation and dense 3D modeling from images represent an intrinsically ill-posed problem. Strong regularizers are therefore required to constrain the solutions from being 'too noisy'. Unfortunately, these priors generally yield overly smooth reconstructions and/or segmentations in certain regions whereas they fail in other areas to constrain the solution sufficiently. In this paper we argue that image segmentation and dense 3D reconstruction contribute valuable information to each other's task. As a consequence, we propose a rigorous mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem. Image segmentations provide geometric cues about which surface orientations are more likely to appear at a certain location in space whereas a dense 3D reconstruction yields a suitable regularization for the segmentation problem by lifting the labeling from 2D images to 3D space. We show how appearance-based cues and 3D surface orientation priors can be learned from training data and subsequently used for class-specific regularization. Experimental results on several real data sets highlight the advantages of our joint formulation.
TL;DR: A new local ranking strategy for template selection based on the locally normalised cross correlation (LNCC) and an extension to the classical STAPLE algorithm by Warfield et al. (2004) are proposed, which is referred to as STEPS for Similarity and Truth Estimation for Propagated Segmentations, which obtains more accurate segmentations even when using only a third of the templates, reducing the dependence on large template databases.
TL;DR: It is shown that combining the proposed detection and segmentation algorithm with a state-of-the-art classification algorithm leads to significant improvements in performance, especially for datasets which are considered particularly hard for recognition, e.g. bird species.
Abstract: We propose a detection and segmentation algorithm for the purposes of fine-grained recognition. The algorithm first detects low-level regions that could potentially belong to the object and then performs a full-object segmentation through propagation. Apart from segmenting the object, we can also "zoom in" on the object, i.e. center it, normalize it for scale, and thus discount the effects of the background. We then show that combining this with a state-of-the-art classification algorithm leads to significant improvements in performance, especially for datasets which are considered particularly hard for recognition, e.g. bird species. The proposed algorithm is much more efficient than other known methods in similar scenarios. Our method is also simpler and we apply it here to different classes of objects, e.g. birds, flowers, cats and dogs. We tested the algorithm on a number of benchmark datasets for fine-grained categorization. It outperforms all the known state-of-the-art methods on these datasets, sometimes by as much as 11%. It improves the performance of our baseline algorithm by 3-4%, consistently on all datasets. We also observed more than a 4% improvement in the recognition performance on a challenging large-scale flower dataset, containing 578 species of flowers and 250,000 images.
TL;DR: This work proposes a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut and shows that in many applications this simple term makes NP-hard segmentation functionals unnecessary.
Abstract: Among image segmentation algorithms there are two major groups: (a) methods assuming known appearance models and (b) methods estimating appearance models jointly with segmentation. Typically, the first group optimizes appearance log-likelihoods in combination with some spatial regularization. This problem is relatively simple and many methods guarantee globally optimal results. The second group treats model parameters as additional variables, transforming simple segmentation energies into high-order NP-hard functionals (Zhu-Yuille, Chan-Vese, GrabCut, etc.). It is known that such methods indirectly minimize the appearance overlap between the segments. We propose a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut. We show that in many applications our simple term makes NP-hard segmentation functionals unnecessary. Our one cut algorithm effectively replaces approximate iterative optimization techniques based on block coordinate descent.
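To make the quantity being measured concrete, the following is a minimal sketch that evaluates an L1 distance between object and background appearance models for a given segmentation. It assumes the appearance models are intensity histograms with a fixed number of bins, and it normalizes each histogram to sum to one, which is a choice made here for illustration and may differ from the paper's formulation. The function name `l1_appearance_distance` is hypothetical, and the sketch only scores a segmentation; it does not perform the graph cut optimization.

```python
from collections import Counter


def l1_appearance_distance(pixels, mask, n_bins=4, max_value=256):
    """L1 distance between normalized object and background histograms.

    pixels: flat sequence of integer intensities in [0, max_value).
    mask:   flat sequence of 0/1 flags, 1 meaning "object".
    """
    obj = Counter()
    bkg = Counter()
    for value, is_object in zip(pixels, mask):
        bin_index = value * n_bins // max_value
        if is_object:
            obj[bin_index] += 1
        else:
            bkg[bin_index] += 1
    n_obj = sum(obj.values())
    n_bkg = sum(bkg.values())
    if n_obj == 0 or n_bkg == 0:
        return 0.0  # one segment is empty: no distance defined, return 0 here
    # Counter returns 0 for bins that never occurred.
    return sum(abs(obj[b] / n_obj - bkg[b] / n_bkg) for b in range(n_bins))


pixels = [10, 20, 200, 220, 30, 250]
clean_split = [1, 1, 0, 0, 1, 0]   # dark pixels are object, bright are background
mixed_split = [1, 0, 1, 0, 1, 0]   # each segment mixes dark and bright pixels

print(l1_appearance_distance(pixels, clean_split))
print(l1_appearance_distance(pixels, mixed_split))
```

The clean split yields the larger distance because the two histograms do not overlap, which matches the abstract's point that good segmentations tend to minimize appearance overlap between segments.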
TL;DR: A multi-atlas method for cardiac magnetic resonance (MR) image segmentation is proposed that formulates a patch-based label fusion model in a Bayesian framework and improves image registration accuracy by utilizing label information, which in turn improves segmentation accuracy.
Abstract: The evaluation of ventricular function is important for the diagnosis of cardiovascular diseases. It typically involves measurement of the left ventricular (LV) mass and LV cavity volume. Manual delineation of the myocardial contours is time-consuming and dependent on the subjective experience of the expert observer. In this paper, a multi-atlas method is proposed for cardiac magnetic resonance (MR) image segmentation. The proposed method is novel in two aspects. First, it formulates a patch-based label fusion model in a Bayesian framework. Second, it improves image registration accuracy by utilizing label information, which leads to improvement of segmentation accuracy. The proposed method was evaluated on a cardiac MR image set of 28 subjects. The average Dice overlap metric of our segmentation is 0.92 for the LV cavity, 0.89 for the right ventricular cavity and 0.82 for the myocardium. The results show that the proposed method is able to provide accurate information for clinical diagnosis.
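The Dice overlap metric reported above is defined as 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. A minimal sketch follows, assuming binary masks stored as flat sequences of 0/1 values; the function name `dice_overlap` and the toy masks are illustrative only and are not taken from the paper's data.

```python
def dice_overlap(seg_a, seg_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    if len(seg_a) != len(seg_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in seg_a if a)
    size_b = sum(1 for b in seg_b if b)
    intersection = sum(1 for a, b in zip(seg_a, seg_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * intersection / (size_a + size_b)


automatic = [0, 1, 1, 1, 0, 0]  # hypothetical automatic segmentation
manual = [0, 0, 1, 1, 1, 0]     # hypothetical expert segmentation
print(dice_overlap(automatic, manual))  # 2 * 2 / (3 + 3), about 0.667
```

A value of 1 means the two masks coincide exactly and 0 means they share no voxels, so the reported 0.92 for the LV cavity indicates close agreement with the manual delineation.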
TL;DR: The joint label fusion and corrective learning techniques, which won first place in the 2012 MICCAI Multi-Atlas Labeling Challenge, are presented together with an Insight-Toolkit based open source implementation that extends the methods to work with multi-modality imaging data and is more suitable for segmentation problems with multiple labels.
Abstract: Label fusion based multi-atlas segmentation has proven to be one of the most competitive techniques for medical image segmentation. This technique transfers segmentations from expert-labeled images, called atlases, to a novel image using deformable image registration. Errors produced by label transfer are further reduced by label fusion that combines the results produced by all atlases into a consensus solution. Among the proposed label fusion strategies, weighted voting with spatially varying weight distributions derived from atlas-target intensity similarity is a simple and highly effective label fusion technique. However, one limitation of most weighted voting methods is that the weights are computed independently for each atlas, without taking into account the fact that different atlases may produce similar label errors. To address this problem, we recently developed the joint label fusion technique and the corrective learning technique, which won first place in the 2012 MICCAI Multi-Atlas Labeling Challenge and were among the top performers in the 2013 MICCAI Segmentation: Algorithms, Theory and Applications (SATA) challenge. To make our techniques more accessible to the scientific research community, we describe an Insight-Toolkit based open source implementation of our label fusion methods. Our implementation extends our methods to work with multi-modality imaging data and is more suitable for segmentation problems with multiple labels. We demonstrate the usage of our tools through applying them to the 2012 MICCAI Multi-Atlas Labeling Challenge brain image dataset and the 2013 SATA challenge canine leg image dataset. We report the best results on these two datasets so far.
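For orientation, the baseline that the abstract contrasts against, weighted voting with weights computed independently per atlas, can be sketched as follows. This is not joint label fusion and not the Insight-Toolkit implementation. It assumes already-registered atlases stored as flat lists, a Gaussian weight on the single-voxel intensity difference (real methods typically compare neighbourhoods), and a hypothetical function name `weighted_vote` with an illustrative `sigma` parameter.

```python
import math


def weighted_vote(atlas_labels, atlas_intensities, target_intensities, sigma=10.0):
    """Fuse atlas labels per voxel, weighting each atlas by intensity similarity."""
    fused = []
    for v in range(len(target_intensities)):
        votes = {}
        for labels, intensities in zip(atlas_labels, atlas_intensities):
            diff = intensities[v] - target_intensities[v]
            # Weight is computed independently for each atlas at this voxel.
            weight = math.exp(-(diff * diff) / (2.0 * sigma * sigma))
            votes[labels[v]] = votes.get(labels[v], 0.0) + weight
        fused.append(max(votes, key=votes.get))
    return fused


target = [100, 100]
atlas_intensities = [[100, 140], [102, 140], [60, 100]]
atlas_labels = [[1, 1], [1, 1], [0, 0]]

fused = weighted_vote(atlas_labels, atlas_intensities, target)
print(fused)
```

At the second voxel two atlases vote for label 1, but both match the target intensity poorly, so the single well-matching atlas wins and the fused label is 0. The same example also shows the limitation named in the abstract: the two poorly matching atlases make the same error, yet their weights are computed as if they were independent.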