Top 255 papers published in the topic of Pyramid (image processing) in 2016

Showing papers on "Pyramid (image processing)" published in 2016

Book Chapter•10.1007/978-3-319-46478-7_38•

Towards Perspective-Free Object Counting with Deep Learning

Daniel Oñoro-Rubio¹, Roberto J. López-Sastre¹•Institutions (1)

8 Oct 2016

TL;DR: A novel convolutional neural network solution, named Counting CNN (CCNN), formulated as a regression model where the network learns how to map the appearance of the image patches to their corresponding object density maps, able to estimate object densities in different very crowded scenarios.

Abstract: In this paper we address the problem of counting object instances in images. Our models are able to precisely estimate the number of vehicles in a traffic congestion, or to count the humans in a very crowded scene. Our first contribution is the proposal of a novel convolutional neural network solution, named Counting CNN (CCNN). Essentially, the CCNN is formulated as a regression model where the network learns how to map the appearance of the image patches to their corresponding object density maps. Our second contribution consists of a scale-aware counting model, the Hydra CNN, able to estimate object densities in different very crowded scenarios where no geometric information of the scene can be provided. Hydra CNN learns a multiscale non-linear regression model which uses a pyramid of image patches extracted at multiple scales to perform the final density prediction. We report an extensive experimental evaluation, using up to three different object counting benchmarks, where we show how our solutions achieve state-of-the-art performance.
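
The two ideas in this abstract, density-map regression targets and a pyramid of patches at multiple scales, can be illustrated with a small NumPy sketch. This is not the authors' code: the function names, the kernel size and sigma, and the centre-crop scheme used to form the patch pyramid are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a 2D Gaussian kernel of odd side length `size` that sums to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def density_map(shape, points, size=15, sigma=4.0):
    """Place one unit-mass Gaussian at each annotated (row, col) object centre.

    Summing the map gives the object count, provided the centres are far
    enough from the border that no kernel mass is cropped away.
    """
    h, w = shape
    pad = size // 2
    padded = np.zeros((h + 2 * pad, w + 2 * pad))
    kernel = gaussian_kernel(size, sigma)
    for r, c in points:
        padded[r:r + size, c:c + size] += kernel
    return padded[pad:-pad, pad:-pad]

def patch_pyramid(patch, n_scales=3, out_size=16):
    """Hypothetical patch pyramid: centre crops covering 1/n, 2/n, ..., n/n of
    the patch, each resampled to out_size x out_size by nearest neighbour."""
    h, w = patch.shape
    levels = []
    for s in range(1, n_scales + 1):
        ch, cw = h * s // n_scales, w * s // n_scales
        top, left = (h - ch) // 2, (w - cw) // 2
        crop = patch[top:top + ch, left:left + cw]
        rows = np.arange(out_size) * ch // out_size
        cols = np.arange(out_size) * cw // out_size
        levels.append(crop[np.ix_(rows, cols)])
    return levels

dm = density_map((64, 64), [(20, 20), (30, 40), (50, 10)])
print(round(dm.sum(), 6))  # integral of the density map = number of objects
```

A regression network would be trained to predict `dm` (or a patch of it) from image appearance; in a multi-scale design each pyramid level would feed a separate input branch.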

824 citations

Journal Article•10.1109/TGRS.2015.2493201•

Anomaly Detection in Hyperspectral Images Based on Low-Rank and Sparse Representation

Yang Xu¹, Zebin Wu¹, Jun Li², Antonio Plaza³, Zhihui Wei¹•Institutions (3)

Nanjing University of Science and Technology¹, Sun Yat-sen University², University of Extremadura³

01 Apr 2016-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: A novel method for anomaly detection in hyperspectral images (HSIs) is proposed, based on low-rank and sparse representation and on the separation of the background and the anomalies in the observed data.

Abstract: A novel method for anomaly detection in hyperspectral images (HSIs) is proposed based on low-rank and sparse representation. The proposed method is based on the separation of the background and the anomalies in the observed data. Since each pixel in the background can be approximately represented by a background dictionary and the representation coefficients of all pixels form a low-rank matrix, a low-rank representation is used to model the background part. To better characterize each pixel's local representation, a sparsity-inducing regularization term is added to the representation coefficients. Moreover, a dictionary construction strategy is adopted to make the dictionary more stable and discriminative. Then, the anomalies are determined by the response of the residual matrix. An important advantage of the proposed algorithm is that it combines the global and local structure in the HSI. Experiments have been conducted using both simulated and real data sets. These experiments indicate that our algorithm achieves very promising anomaly detection performance.
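
The model described in this abstract can be written compactly as an optimization problem. The following is a sketch rather than a transcription of the paper's equations: the symbols X (pixel spectra as columns), D (background dictionary), S (representation coefficients), E (residual), and the weights β and λ are notation assumed here for illustration.

```latex
\min_{S,\,E}\; \|S\|_{*} + \beta \,\|S\|_{1} + \lambda \,\|E\|_{2,1}
\quad \text{subject to} \quad X = D S + E
```

Under these assumptions, the nuclear norm favours a low-rank coefficient matrix (global background structure), the ℓ1 term adds the sparsity-inducing regularization (local structure), and the ℓ2,1 norm keeps the residual column-sparse, so a pixel i could be scored by the ℓ2 norm of column i of E.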

545 citations

Book Chapter•10.1007/978-3-319-28549-8_3•

Transforms and Operators for Directional Bioimage Analysis: A Survey

Zsuzsanna Püspöki¹, Martin Storath¹, Daniel Sage¹, Michael Unser¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

01 Jan 2016-Advances in Anatomy Embryology and Cell Biology

TL;DR: The intent is to provide image-processing methods that can be deployed in algorithms that analyze biomedical images with improved rotation invariance and high directional sensitivity, and address the problem of matching directional patterns by proposing steerable filters.

Abstract: We give a methodology-oriented perspective on directional image analysis and rotation-invariant processing. We review the state of the art in the field and make connections with recent mathematical developments in functional analysis and wavelet theory. We unify our perspective within a common framework using operators. The intent is to provide image-processing methods that can be deployed in algorithms that analyze biomedical images with improved rotation invariance and high directional sensitivity. We start our survey with classical methods such as directional-gradient and the structure tensor. Then, we discuss how these methods can be improved with respect to robustness, invariance to geometric transformations (with a particular interest in scaling), and computation cost. To address robustness against noise, we move forward to higher degrees of directional selectivity and discuss Hessian-based detection schemes. To present multiscale approaches, we explain the differences between Fourier filters, directional wavelets, curvelets, and shearlets. To reduce the computational cost, we address the problem of matching directional patterns by proposing steerable filters, where one might perform arbitrary rotations and optimizations without discretizing the orientation. We define the property of steerability and give an introduction to the design of steerable filters. We cover the spectrum from simple steerable filters through pyramid schemes up to steerable wavelets. We also present illustrations on the design of steerable wavelets and their application to pattern recognition.
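
Steerability, as described in this abstract, means that a filter rotated to any angle can be synthesized as a linear combination of a few fixed basis filters, so the orientation never has to be discretized. A minimal NumPy sketch using the classical first-derivative-of-Gaussian example follows; the function names, kernel size, and sigma are assumptions made for illustration and are not taken from the survey.

```python
import numpy as np

def gaussian_derivative_basis(size=9, sigma=1.5):
    """Return the two basis filters: x- and y-derivatives of a 2D Gaussian."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)  # x varies along columns, y along rows
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    gx = -x / sigma ** 2 * g
    gy = -y / sigma ** 2 * g
    return gx, gy

def steer(gx, gy, theta):
    """Synthesize the derivative filter oriented at angle theta (radians)."""
    return np.cos(theta) * gx + np.sin(theta) * gy

def directly_rotated(size, sigma, theta):
    """Reference: derivative of the Gaussian along direction theta, computed directly."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    u = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the steering direction
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return -u / sigma ** 2 * g

gx, gy = gaussian_derivative_basis()
print(np.allclose(steer(gx, gy, 0.7), directly_rotated(9, 1.5, 0.7)))
```

Because convolution is linear, the same weights apply to filter responses: an image needs to be filtered only with `gx` and `gy`, and the response at any angle is obtained by combining the two outputs.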

482 citations

Journal Article•10.1109/TIP.2015.2495260•

Efficient Algorithms for Convolutional Sparse Representations

Brendt Wohlberg¹•Institutions (1)

Los Alamos National Laboratory¹

01 Jan 2016-IEEE Transactions on Image Processing

TL;DR: New, efficient algorithms that substantially improve on the performance of other recent methods of sparse representation are presented, contributing to the development of this type of representation as a practical tool for a wider range of problems.

Abstract: When applying sparse representation techniques to images, the standard approach is to independently compute the representations for a set of overlapping image patches. This method performs very well in a variety of applications, but results in a representation that is multi-valued and not optimized with respect to the entire image. An alternative representation structure is provided by a convolutional sparse representation, in which a sparse representation of an entire image is computed by replacing the linear combination of a set of dictionary vectors by the sum of a set of convolutions with dictionary filters. The resulting representation is both single-valued and jointly optimized over the entire image. While this form of a sparse representation has been applied to a variety of problems in signal and image processing and computer vision, the computational expense of the corresponding optimization problems has restricted application to relatively small signals and images. This paper presents new, efficient algorithms that substantially improve on the performance of other recent methods, contributing to the development of this type of representation as a practical tool for a wider range of problems.
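
The contrast drawn in this abstract between patch-based and convolutional sparse representations can be made concrete with the two optimization problems side by side. The notation is assumed here for illustration: s is a signal or image (or a vectorized patch in the first problem), D a dictionary matrix, d_m the dictionary filters, x and x_m the coefficient vector and coefficient maps, and λ a sparsity weight.

```latex
% Patch-based sparse coding: solved independently for each patch s
\min_{x}\; \frac{1}{2}\,\| D x - s \|_2^2 + \lambda \,\| x \|_1

% Convolutional sparse coding: one problem for the entire image s
\min_{\{x_m\}}\; \frac{1}{2}\,\Big\| \sum_{m} d_m * x_m - s \Big\|_2^2
  + \lambda \sum_{m} \| x_m \|_1
```

In the second form the matrix-vector product is replaced by a sum of convolutions, which is why the representation is single-valued and jointly optimized over the whole image, and also why the problem is much larger than its patch-wise counterpart.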

419 citations

Two Dimensional Signal And Image Processing

Karolin Baecker

1 Jan 2016

TL;DR: The two dimensional signal and image processing is universally compatible with any devices to read and is available in the book collection an online access to it is set as public so you can download it instantly.

Abstract: Thank you for downloading two dimensional signal and image processing. As you may know, people have look hundreds times for their chosen novels like this two dimensional signal and image processing, but end up in malicious downloads. Rather than enjoying a good book with a cup of coffee in the afternoon, instead they juggled with some infectious virus inside their computer. two dimensional signal and image processing is available in our book collection an online access to it is set as public so you can download it instantly. Our digital library spans in multiple locations, allowing you to get the most less latency time to download any of our books like this one. Kindly say, the two dimensional signal and image processing is universally compatible with any devices to read.

282 citations

Book Chapter•10.1007/978-3-319-46466-4_41•

Human Attribute Recognition by Deep Hierarchical Contexts

Yining Li¹, Chen Huang¹, Chen Change Loy¹, Xiaoou Tang¹•Institutions (1)

The Chinese University of Hong Kong¹

8 Oct 2016

TL;DR: This work trains a Convolutional Neural Network to select the most attribute-descriptive human parts from all poselet detections, and combines them with the whole body as a pose-normalized deep representation, which surpasses competitive baselines on this dataset and other popular ones.

Abstract: We present an approach for recognizing human attributes in unconstrained settings. We train a Convolutional Neural Network (CNN) to select the most attribute-descriptive human parts from all poselet detections, and combine them with the whole body as a pose-normalized deep representation. We further improve by using deep hierarchical contexts ranging from human-centric level to scene level. Human-centric context captures human relations, which we compute from the nearest neighbor parts of other people on a pyramid of CNN feature maps. The matched parts are then average pooled and they act as a similarity regularization. To utilize the scene context, we re-score human-centric predictions by the global scene classification score jointly learned in our CNN, yielding final scene-aware predictions. To facilitate our study, a large-scale WIDER Attribute dataset (Dataset URL: http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute) is introduced with human attribute and image event annotations, and our method surpasses competitive baselines on this dataset and other popular ones.

221 citations

Journal Article•10.1016/J.NEUCOM.2016.02.047•

Union Laplacian pyramid with multiple features for medical image fusion

Jiao Du¹, Weisheng Li¹, Bin Xiao¹, Qamar Nawaz¹•Institutions (1)

Chongqing University of Posts and Telecommunications¹

19 Jun 2016-Neurocomputing

TL;DR: Visual and statistical analyses show that the quality of fused image can be significantly improved over that of typical image quality assessment metrics in terms of structural similarity, peak-signal-to-noise ratio, standard deviation, and tone mapped image quality index metrics.

214 citations

Posted Content•

Exploring Context with Deep Structured models for Semantic Segmentation

Guosheng Lin¹, Chunhua Shen², Anton van den Hengel², Ian Reid²•Institutions (2)

Nanyang Technological University¹, University of Adelaide²

10 Mar 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work formulates deep structured models by combining CNNs and Conditional Random Fields for learning the patch-patch context between image regions, and formulates CNN-based pairwise potential functions to capture semantic correlations between neighboring patches.

Abstract: State-of-the-art semantic image segmentation methods are mostly based on training deep convolutional neural networks (CNNs). In this work, we propose to improve semantic segmentation with the use of contextual information. In particular, we explore 'patch-patch' context and 'patch-background' context in deep CNNs. We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions. Specifically, we formulate CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied in order to avoid repeated expensive CRF inference during the course of back propagation. For capturing the patch-background context, we show that a network design with traditional multi-scale image inputs and sliding pyramid pooling is very effective for improving performance. We perform a comprehensive evaluation of the proposed method. We achieve new state-of-the-art performance on a number of challenging semantic segmentation datasets including the NYUDv2, PASCAL-VOC2012, Cityscapes, PASCAL-Context, SUN-RGBD, SIFT-flow, and KITTI datasets. In particular, we report an intersection-over-union score of 77.8 on the PASCAL-VOC2012 dataset.

94 citations

Journal Article•10.1109/TIFS.2016.2535899•

Fingerprint Liveness Detection From Single Image Using Low-Level Features and Shape Analysis

Rohit K. Dubey¹, Jonathan Goh¹, Vrizlynn L. L. Thing¹•Institutions (1)

Agency for Science, Technology and Research¹

29 Feb 2016-IEEE Transactions on Information Forensics and Security

TL;DR: This paper proposes to combine low-level gradient features from speeded-up robust features, pyramid extension of the histograms of oriented gradient and texture features from Gabor wavelet using dynamic score level integration and extract these features from a single fingerprint image to overcome the issues faced in dynamic software approaches, which require user cooperation and longer computational time.

Abstract: Fingerprint-based authentication systems have developed rapidly in recent years. However, current fingerprint-based biometric systems are vulnerable to spoofing attacks. Moreover, a single feature-based static approach does not perform equally over different fingerprint sensors and spoofing materials. In this paper, we propose a static software approach. We propose to combine low-level gradient features from speeded-up robust features, pyramid extension of the histograms of oriented gradient and texture features from Gabor wavelet using dynamic score level integration. We extract these features from a single fingerprint image to overcome the issues faced in dynamic software approaches, which require user cooperation and longer computational time. An experimental analysis done on LivDet 2011 data produced an average equal error rate (EER) of 3.95% over four databases. The result outperforms the existing best average EER of 9.625%. We also performed experiments with the LivDet 2013 database and achieved an average classification error rate of 2.27% in comparison with 12.87% obtained by the LivDet 2013 competition winner.
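
The equal error rate (EER) quoted in this abstract is the operating point at which the rate of accepted spoofs equals the rate of rejected live samples. A minimal sketch of how an EER can be estimated from classifier scores is shown below; the function name, the convention that higher scores mean "live", and the toy scores are assumptions made for illustration and are not the paper's evaluation protocol.

```python
import numpy as np

def equal_error_rate(live_scores, spoof_scores):
    """Estimate the EER by sweeping the decision threshold over all observed scores.

    A sample is accepted as live when its score is >= the threshold.
    """
    live = np.asarray(live_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([live, spoof])):
        false_accept = np.mean(spoof >= t)  # spoofs wrongly accepted
        false_reject = np.mean(live < t)    # live samples wrongly rejected
        gap = abs(false_accept - false_reject)
        if gap < best_gap:
            best_gap, eer = gap, (false_accept + false_reject) / 2.0
    return eer

# Toy scores: one live sample and one spoof sample fall on the wrong side.
print(equal_error_rate([0.6, 0.7, 0.8, 0.4], [0.1, 0.2, 0.3, 0.5]))
```

With finite data the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the threshold where they are closest.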

88 citations

Proceedings Article•

Learning cross-domain neural networks for sketch-based 3D shape retrieval

Fan Zhu¹, Jin Xie¹, Yi Fang¹•Institutions (1)

New York University Abu Dhabi¹

12 Feb 2016

TL;DR: Experimental results suggest that both CDNN and PCDNN can outperform state-of-the-art methods, and that PCDNN can further improve on CDNN by employing a hierarchical structure.

Abstract: Sketch-based 3D shape retrieval, which returns a set of relevant 3D shapes based on users' input sketch queries, has been receiving increasing attention in both the graphics and vision communities. In this work, we address the sketch-based 3D shape retrieval problem with a novel Cross-Domain Neural Networks (CDNN) approach, which is further extended to Pyramid Cross-Domain Neural Networks (PCDNN) by incorporating a hierarchical structure. In order to alleviate the discrepancies between sketch features and 3D shape features, a neural network pair that forces identical representations at the target layer for instances of the same class is trained for sketches and 3D shapes respectively. By constructing cross-domain neural networks at multiple pyramid levels, a many-to-one relationship is established between a 3D shape feature and sketch features extracted from different scales. We evaluate the effectiveness of both the CDNN and PCDNN approaches on the extended large-scale SHREC 2014 benchmark and compare them with other well-established methods. Experimental results suggest that both CDNN and PCDNN can outperform state-of-the-art methods, and that PCDNN can further improve on CDNN by employing a hierarchical structure.

77 citations

Journal Article•10.1109/TIP.2015.2502147•

Detecting Densely Distributed Graph Patterns for Fine-Grained Image Categorization

Luming Zhang¹, Yang Yang², Meng Wang¹, Richang Hong¹, Liqiang Nie³, Xuelong Li⁴•Institutions (4)

Hefei University of Technology¹, University of Electronic Science and Technology of China², National University of Singapore³, Chinese Academy of Sciences⁴

01 Feb 2016-IEEE Transactions on Image Processing

TL;DR: A dense graph mining algorithm is developed to discover graphlets representative to each super-/sub-category, and the discovered graphlets from each sub-category accurately capture those tiny discriminative object components, e.g., bird claws, heads, and bodies.

Abstract: Fine-grained image categorization is a challenging task aiming at distinguishing objects belonging to the same basic-level category, e.g., leaf or mushroom. It is a useful technique that can be applied to species recognition, face verification, and so on. Most of the existing methods either have difficulty detecting discriminative object components automatically, or suffer from the limited amount of training data in each sub-category. To solve these problems, this paper proposes a new fine-grained image categorization model. The key is a dense graph mining algorithm that hierarchically localizes discriminative object parts in each image. More specifically, to mimic the human hierarchical perception mechanism, a superpixel pyramid is generated for each image. Thereby, graphlets from each layer are constructed to seamlessly capture object components. Intuitively, graphlets representative of each super-/sub-category are densely distributed in their feature space. Thus, a dense graph mining algorithm is developed to discover graphlets representative of each super-/sub-category. Finally, the discovered graphlets from pairwise images are integrated into an image kernel for fine-grained recognition. Theoretically, the learned kernel can generalize several state-of-the-art image kernels. Experiments on nine image sets demonstrate the advantage of our method. Moreover, the discovered graphlets from each sub-category accurately capture those tiny discriminative object components, e.g., bird claws, heads, and bodies.

Proceedings Article•10.1109/ICIP.2016.7532434•

Multi-scale blocks based image emotion classification using multiple instance learning

Tianrong Rao¹, Min Xu¹, Huiying Liu², Jinqiao Wang³, Ian Burnett¹•Institutions (3)

University of Technology, Sydney¹, Institute for Infocomm Research Singapore², Chinese Academy of Sciences³

3 Aug 2016

TL;DR: This work proposes an emotion classification method based on multi-scale blocks using Multiple Instance Learning (MIL), which reduces the need for exact labelling and is employed to classify the dominant emotion type of the image.

Abstract: Emotional factors usually affect users' preferences for and evaluations of images. Although affective image analysis attracts increasing attention, there are still three major challenges remaining: 1) it is difficult to classify an image into a single emotion type since different regions within an image can represent different emotions; 2) there is a gap between low-level features and high-level emotions; and 3) it is difficult to collect a training set of reliable emotional image content. To address these three issues, we propose an emotion classification method based on multi-scale blocks using Multiple Instance Learning (MIL). We first extract blocks of an image at multiple scales using two image segmentation methods, pyramid segmentation and simple linear iterative clustering (SLIC), and represent each block using the bag-of-visual-words (BoVW) method. Then, to bridge the "affective gap", probabilistic latent semantic analysis (pLSA) is employed to estimate the latent topic distribution as a mid-level representation of each block. Finally, MIL, which reduces the need for exact labelling, is employed to classify the dominant emotion type of the image. Experiments carried out on three widely used datasets demonstrate that our proposed method with SLIC improves on the state-of-the-art results of image emotion classification by 5.1% on average.

Proceedings Article•10.1109/CVPR.2016.298•

Laplacian Patch-Based Image Synthesis

Jooho Lee¹, Inchang Choi¹, Min H. Kim¹•Institutions (1)

KAIST¹

27 Jun 2016

TL;DR: The Laplacian pyramid has the advantage of being isotropic in detecting changes to provide more consistent performance in decomposing the base structure and the detailed localization and does not require heavy computation as it employs approximation by the differences of Gaussians.

Abstract: Patch-based image synthesis has been enriched with global optimization on the image pyramid. Subsequently, gradient-based synthesis has improved structural coherence and details. However, the gradient operator is directional and inconsistent and requires computing multiple operators. It also introduces a significantly heavy computational burden to solve the Poisson equation, which is often accompanied by artifacts in non-integrable gradient fields. In this paper, we propose a patch-based synthesis using a Laplacian pyramid to improve searching correspondence with enhanced awareness of edge structures. Contrary to the gradient operators, the Laplacian pyramid has the advantage of being isotropic in detecting changes to provide more consistent performance in decomposing the base structure and the detailed localization. Furthermore, it does not require heavy computation as it employs approximation by the differences of Gaussians. We examine the potential of the Laplacian pyramid for enhanced edge-aware correspondence search. We demonstrate the effectiveness of the Laplacian-based approach over the state-of-the-art patch-based image synthesis methods.
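
A Laplacian pyramid stores, at each level, the difference between an image and a smoothed, coarser version of itself, plus a small low-resolution residual. The sketch below shows a generic decomposition and its exact inverse in NumPy; it is not the synthesis method of the paper. The 5-tap binomial kernel, the nearest-neighbour upsampling, and all function names are assumptions made to keep the example short.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # binomial approximation of a Gaussian

def blur(img):
    """Separable 5-tap blur with reflective borders; output has the input's shape."""
    padded = np.pad(img, 2, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, KERNEL, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, KERNEL, mode="valid"), 0, rows)

def downsample(img):
    """Blur, then keep every second row and column."""
    return blur(img)[::2, ::2]

def upsample(img, shape):
    """Nearest-neighbour upsampling by 2, cropped to the target shape."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Return [band_0, ..., band_{levels-2}, coarsest_residual]."""
    pyramid, current = [], img.astype(float)
    for _ in range(levels - 1):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller, current.shape))  # band-pass detail
        current = smaller
    pyramid.append(current)  # low-pass residual
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition by adding the bands back, coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = band + upsample(current, band.shape)
    return current

image = np.random.default_rng(0).random((32, 32))
bands = laplacian_pyramid(image, levels=3)
print([b.shape for b in bands], np.allclose(reconstruct(bands), image))
```

Each band responds to intensity changes in all directions at one scale, which is the isotropy property the abstract contrasts with directional gradient operators.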

Journal Article•10.1016/J.NEUCOM.2015.12.042•

Speed up deep neural network based pedestrian detection by sharing features across multi-scale models

Xiaoheng Jiang¹, Yanwei Pang¹, Xuelong Li², Jing Pan³•Institutions (3)

Tianjin University¹, Chinese Academy of Sciences², Tianjin University of Technology and Education³

12 Apr 2016-Neurocomputing

TL;DR: This paper proposes to share features across a group of DNNs that correspond to pedestrian models of different sizes that can detect pedestrians of several different scales on one single layer of an image pyramid to improve detection efficiency.

Proceedings Article•10.1117/12.2233963•

PASSATA - Object oriented numerical simulation software for adaptive optics

Guido Agapito, Alfio Puglisi, Simone Esposito

26 Jul 2016-arXiv: Instrumentation and Methods for Astrophysics

TL;DR: The latest version of the PyrAmid Simulator Software for Adaptive opTics Arcetri (PASSATA), an IDL and CUDA based object oriented software developed in the Adaptive Optics group of the Arcetri observatory for Monte-Carlo end-to-end adaptive optics simulations, is presented.

Abstract: We present the latest version of the PyrAmid Simulator Software for Adaptive opTics Arcetri (PASSATA), an IDL and CUDA based object oriented software developed in the Adaptive Optics group of the Arcetri observatory for Monte-Carlo end-to-end adaptive optics simulations. The original aim of this software was to evaluate the performance of a single conjugate adaptive optics system for a ground-based telescope with a pyramid wavefront sensor. After some years of development, the current version of PASSATA is able to simulate several adaptive optics systems: single conjugate, multi conjugate and ground layer, with Shack Hartmann and Pyramid wavefront sensors. It can simulate from 8m to 40m class telescopes, with diffraction limited and resolved sources at finite or infinite distance from the pupil. The main advantages of this software are the versatility given by the object oriented approach and the speed given by the CUDA implementation of the most computationally demanding routines. We describe the software with its latest developments and present some examples of application.

Book Chapter•10.1007/978-3-319-46448-0_24•

Fast 6D Pose Estimation from a Monocular Image Using Hierarchical Pose Trees

Yoshinori Konishi¹, Yuki Hanzawa¹, Masato Kawade¹, Manabu Hashimoto²•Institutions (2)

Omron¹, Chukyo University²

8 Oct 2016

TL;DR: In this paper, the authors propose a perspectively cumulated orientation feature (PCOF) based on the orientation histograms extracted from randomly generated 2D projection images using 3D CAD data; templates using PCOF explicitly handle a certain range of 3D object poses.

Abstract: It has been shown that template-based approaches can quickly estimate the 6D pose of texture-less objects from a monocular image. However, they tend to be slow when the number of templates amounts to tens of thousands for handling a wider range of 3D object poses. To alleviate this problem, we propose a novel image feature and a tree-structured model. Our proposed perspectively cumulated orientation feature (PCOF) is based on the orientation histograms extracted from randomly generated 2D projection images using 3D CAD data, and templates using PCOF explicitly handle a certain range of 3D object poses. Hierarchical pose trees (HPT) are built by clustering 3D object poses and reducing the resolutions of templates, and they accelerate 6D pose estimation based on a coarse-to-fine strategy with an image pyramid. In the experimental evaluation on our texture-less object dataset, the combination of PCOF and HPT showed higher accuracy and faster speed in comparison with state-of-the-art techniques.

Posted Content•

Multigrid Neural Architectures

Tsung-Wei Ke¹, Michael Maire², Stella X. Yu¹•Institutions (2)

University of California, Berkeley¹, Simon Fraser University²

23 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes a multigrid extension of convolutional neural networks (CNNs) whose layers operate across scale space, on a pyramid of grids, and shows that relatively shallow multigrid networks can learn spatial transformation tasks on which current CNNs fail.

Abstract: We propose a multigrid extension of convolutional neural networks (CNNs). Rather than manipulating representations living on a single spatial grid, our network layers operate across scale space, on a pyramid of grids. They consume multigrid inputs and produce multigrid outputs; convolutional filters themselves have both within-scale and cross-scale extent. This aspect is distinct from simple multiscale designs, which only process the input at different scales. Viewed in terms of information flow, a multigrid network passes messages across a spatial pyramid. As a consequence, receptive field size grows exponentially with depth, facilitating rapid integration of context. Most critically, multigrid structure enables networks to learn internal attention and dynamic routing mechanisms, and use them to accomplish tasks on which modern CNNs fail. Experiments demonstrate wide-ranging performance advantages of multigrid. On CIFAR and ImageNet classification tasks, flipping from a single grid to multigrid within the standard CNN paradigm improves accuracy, while being compute and parameter efficient. Multigrid is independent of other architectural choices; we show synergy in combination with residual connections. Multigrid yields dramatic improvement on a synthetic semantic segmentation dataset. Most strikingly, relatively shallow multigrid networks can learn to directly perform spatial transformation tasks, where, in contrast, current CNNs fail. Together, our results suggest that continuous evolution of features on a multigrid pyramid is a more powerful alternative to existing CNN designs on a flat grid.

Proceedings Article•10.1109/ISBI.2016.7493400•

X-ray image classification using domain transferred convolutional neural networks and local sparse spatial pyramid

Euijoon Ahn¹, Ashnil Kumar¹, Jinman Kim¹, Changyang Li¹, Dagan Feng¹, Michael J. Fulham¹•Institutions (1)

University of Sydney¹

13 Apr 2016

TL;DR: A late-fusion of domain transferred convolutional neural networks (DT-CNNs) with sparse spatial pyramid (SSP) features derived from a local image dictionary is proposed, which is robust as it exploits the rich generic information provided by the DT-CNNs and uses the specific local features and characteristics inherent in the X-ray images.

Abstract: The classification of medical images is a critical step for imaging-based clinical decision support systems. Existing classification methods for X-ray images, however, generally represent the image using only local texture or generic image features (e.g. color or shape) derived from predefined feature spaces. This limits the ability to quantify the image characteristics using general data-derived features learned from image datasets. In this study we present a new algorithm to improve the performance of X-ray image classification, where we propose a late-fusion of domain transferred convolutional neural networks (DT-CNNs) with sparse spatial pyramid (SSP) features derived from a local image dictionary. Our method is robust as it exploits the rich generic information provided by the DT-CNNs and uses the specific local features and characteristics inherent in the X-ray images. Our method was evaluated on a public dataset of X-ray images and was compared to several state-of-the-art approaches. Experimental results show that our method was the most accurate for classification.

Posted Content•

Adaptive Deep Pyramid Matching for Remote Sensing Scene Classification.

Qingshan Liu, Renlong Hang, Huihui Song, Fuping Zhu, Javier Plaza, Antonio Plaza

11 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: A new adaptive deep pyramid matching (ADPM) model is proposed that takes advantage of the features from all of the convolutional layers for remote sensing image classification, and significantly improves the performance when compared to other state-of-the-art methods.

Abstract: Convolutional neural networks (CNNs) have attracted increasing attention in the remote sensing community. Most CNNs only take the last fully-connected layers as features for the classification of remotely sensed images, discarding the other convolutional layer features which may also be helpful for classification purposes. In this paper, we propose a new adaptive deep pyramid matching (ADPM) model that takes advantage of the features from all of the convolutional layers for remote sensing image classification. To this end, the optimal fusing weights for different convolutional layers are learned from the data itself. In remotely sensed scenes, the objects of interest exhibit different scales in distinct scenes, and even a single scene may contain objects with different sizes. To address this issue, we select the CNN with spatial pyramid pooling (SPP-net) as the basic deep network, and further construct a multi-scale ADPM model to learn complementary information from multi-scale images. Our experiments have been conducted using two widely used remote sensing image databases, and the results show that the proposed method significantly improves the performance when compared to other state-of-the-art methods.
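
Spatial pyramid pooling, which this abstract relies on to handle objects and scenes of different sizes, turns a feature map of arbitrary spatial size into a fixed-length vector by max-pooling over progressively finer grids. The NumPy sketch below illustrates the idea only; the function name, the grid levels (1, 2, 4), and the way cell boundaries are rounded are assumptions and do not reproduce the exact windowing of SPP-net or the ADPM model.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over n x n grids for each n in `levels`.

    Returns a vector of length C * sum(n * n), independent of H and W.
    Assumes H and W are at least max(levels) so that no grid cell is empty.
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        row_edges = np.linspace(0, h, n + 1).astype(int)
        col_edges = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[:, row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))  # one value per channel
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
a = spatial_pyramid_pool(rng.random((8, 13, 17)))
b = spatial_pyramid_pool(rng.random((8, 20, 9)))
print(a.shape, b.shape)  # same length for differently sized inputs
```

Because the output length depends only on the number of channels and the grid levels, a fixed-size classifier can follow convolutional layers that were applied to images of different scales.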

Journal Article•10.1145/2886775•

Semantic Photo Retargeting Under Noisy Image Labels

Luming Zhang¹, Xuelong Li², Liqiang Nie³, Yan Yan⁴, Roger Zimmermann³•Institutions (4)

Hefei University of Technology¹, Chinese Academy of Sciences², National University of Singapore³, University of Trento⁴

20 May 2016-ACM Transactions on Multimedia Computing, Communications, and Applications

TL;DR: A new semantically aware photo retargeting method is proposed that shrinks a photo according to region semantics, together with a probabilistic model that enforces the spatial layout of a retargeted photo to be maximally similar to those of the training photos.

Abstract: With the popularity of mobile devices, photo retargeting has become a useful technique that adapts a high-resolution photo onto a low-resolution screen. Conventional approaches are limited in two aspects. The first is the de-emphasized role of semantic content, which is many times more important than low-level features in photo aesthetics. The second is the importance of image spatial modeling: toward a semantically reasonable retargeted photo, the spatial distribution of objects within an image should be accurately learned. To solve these two problems, we propose a new semantically aware photo retargeting method that shrinks a photo according to region semantics. The key technique is a mechanism transferring semantics of noisy image labels (inaccurate labels predicted by a learner like an SVM) into different image regions. In particular, we first project the local aesthetic features (graphlets in this work) onto a semantic space, wherein image labels are selectively encoded according to their noise level. Then, a category-sharing model is proposed to robustly discover the semantics of each image region. The model is motivated by the observation that the semantic distribution of graphlets from images tagged by a common label remains stable in the presence of noisy labels. Thereafter, a spatial pyramid is constructed to hierarchically encode the spatial layout of graphlet semantics. Based on this, a probabilistic model is proposed to enforce the spatial layout of a retargeted photo to be maximally similar to those from the training photos. Experimental results show that (1) noisy image labels predicted by different learners can improve the retargeting performance, according to both qualitative and quantitative analysis, and (2) the category-sharing model stays stable even when 32.36% of image labels are incorrectly predicted.

Proceedings Article•10.1109/AIPR.2016.8010595•

Registering large volume serial-section electron microscopy image sets for neural circuit reconstruction using FFT signal whitening

Arthur W. Wetzel¹, Jennifer Bakal¹, Markus Dittrich¹, David G. C. Hildebrand², Josh Morgan², Jeff W. Lichtman²•Institutions (2)

Pittsburgh Supercomputing Center¹, Harvard University²

1 Oct 2016

TL;DR: In this article, a Signal Whitening Fourier Transform Image Registration (SWiFT-IR) approach is proposed to align mouse and zebrafish brain datasets acquired using the wafer mapper ssEM imaging technology recently developed at Harvard University.

Abstract: The detailed reconstruction of neural anatomy for connectomics studies requires a combination of resolution and large three-dimensional data capture provided by serial section electron microscopy (ssEM). The convergence of high throughput ssEM imaging and improved tissue preparation methods now allows ssEM capture of complete specimen volumes up to cubic millimeter scale. The resulting multi-terabyte image sets span thousands of serial sections and must be precisely registered into coherent volumetric forms in which neural circuits can be traced and segmented. This paper introduces a Signal Whitening Fourier Transform Image Registration approach (SWiFT-IR) under development at the Pittsburgh Supercomputing Center and its use to align mouse and zebrafish brain datasets acquired using the wafer mapper ssEM imaging technology recently developed at Harvard University. Unlike other methods now used for ssEM registration, SWiFT-IR modifies its spatial frequency response during image matching to maximize a signal-to-noise measure used as its primary indicator of alignment quality. This alignment signal is more robust to rapid variations in biological content and unavoidable data distortions than either phase-only or standard Pearson correlation, thus allowing more precise alignment and statistical confidence. These improvements in turn enable an iterative registration procedure based on projections through multiple sections rather than more typical adjacent-pair matching methods. This projection approach, when coupled with known anatomical constraints and iteratively applied in a multi-resolution pyramid fashion, drives the alignment into a smooth form that properly represents complex and widely varying anatomical content such as the full cross-section zebrafish data.

Journal Article•10.1016/J.PATREC.2016.03.024•

A multi-process system for HEp-2 cells classification based on SVM

Donato Cascio¹, Vincenzo Taormina¹, Marco Cipolla¹, Salvatore Bruno¹, Francesco Fauci, Giuseppe Raso¹•Institutions (1)

University of Palermo¹

15 Oct 2016-Pattern Recognition Letters

TL;DR: A system able to classify pre-segmented immunofluorescence images of HEp-2 cells into six classes based on the one-against-one (OAO) scheme is described.

Proceedings Article•10.1109/AIPR.2016.8010595•

Registering large volume serial-section electron microscopy image sets for neural circuit reconstruction using FFT signal whitening

Arthur W. Wetzel¹, Jennifer Bakal¹, Markus Dittrich¹, David G. C. Hildebrand², Josh Morgan², Jeff W. Lichtman²•Institutions (2)

Pittsburgh Supercomputing Center¹, Harvard University²

14 Dec 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: A Signal Whitening Fourier Transform Image Registration approach (SWiFT-IR) under development at the Pittsburgh Supercomputing Center and its use to align mouse and zebrafish brain datasets acquired using the wafer mapper ssEM imaging technology recently developed at Harvard University are introduced.

Journal Article•10.1109/TCSVT.2015.2418585•

Blur-Kernel Bound Estimation From Pyramid Statistics

Shaoguo Liu¹, Haibo Wang², Jue Wang³, Chunhong Pan¹•Institutions (3)

Chinese Academy of Sciences¹, Shandong University², Adobe Systems³

01 May 2016-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: Experimental results show that the proposed method can estimate accurate blur kernel sizes, enabling existing blind deconvolution methods to achieve best possible results.

Abstract: This letter presents an approach for automatically estimating the spatial bound of the blur kernel in a motion-blurred image based on the statistics of multilevel image gradients. We observe that blur has a significant impact on the histogram of oriented gradients (HOGs) at higher levels of an image pyramid, but has much less of an impact at coarser levels. Based on this fact, we estimate the spatial bound of the unknown blur kernel using a learning-based approach. We first learn a generic pyramid HOG model from natural sharp images, then given an HOG pyramid of a blurry image, we predict the corresponding model of its latent sharp image. Finally, we learn another model to predict the spatial kernel bound from the difference between the observed and the predicted HOG pyramids. Experimental results show that the proposed method can estimate accurate blur kernel sizes, enabling existing blind deconvolution methods to achieve best possible results.
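
To make the notion of a pyramid of gradient histograms concrete, here is a minimal sketch that computes one magnitude-weighted orientation histogram per pyramid level. It is a simplified stand-in for a HOG pyramid (a single global histogram per level, no cells or block normalization), and it does not include the learned models the abstract describes; the function names, the 2x2 averaging used for downsampling, and the choice of 8 bins are all assumptions made for illustration.

```python
import math


def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def orientation_histogram(img, bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    hist = [0.0] * bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = img[r][c + 1] - img[r][c - 1]  # central differences
            gy = img[r + 1][c] - img[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # fold into [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def hog_pyramid(img, levels=3):
    """Return one normalized orientation histogram per pyramid level."""
    pyramid = []
    for _ in range(levels):
        pyramid.append(orientation_histogram(img))
        img = downsample(img)
    return pyramid
```

Comparing the per-level histograms of a blurry image with those expected for a sharp one is the kind of difference signal the abstract refers to; how that difference maps to a kernel bound is learned in the paper and is not reproduced here.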

Journal Article•10.1109/TIP.2016.2590825•

Edge-Guided Image Gap Interpolation Using Multi-Scale Transformation

Bahareh Langari¹, Saeed Vaseghi², Ales Prochazka², Babak Vaziri, Farzad Tahmasebi Aria³•Institutions (3)

Brunel University London¹, Institute of Chemical Technology in Prague², Middlesex University³

13 Jul 2016-IEEE Transactions on Image Processing

TL;DR: Improvements in image gap restoration through the incorporation of edge-based directional interpolation within multi-scale pyramid transforms are presented, demonstrating that the proposed method improves peak-signal-to-noise-ratio by 1-5 dB compared with a range of best published works.

Abstract: This paper presents improvements in image gap restoration through the incorporation of edge-based directional interpolation within multi-scale pyramid transforms. Two types of image edges are reconstructed: 1) the local edges or textures, inferred from the gradients of the neighboring pixels and 2) the global edges between image objects or segments, inferred using a Canny detector. Through a process of pyramid transformation and downsampling, the image is progressively transformed into a series of reduced size layers until at the pyramid apex the gap size is one sample. At each layer, an edge skeleton image is extracted for edge-guided interpolation. The process is then reversed; from the apex, at each layer, the missing samples are estimated (an iterative method is used in the last stage of upsampling), up-sampled, and combined with the available samples of the next layer. Discrete cosine transform and a family of discrete wavelet transforms are utilized as alternatives for pyramid construction. Evaluations over a range of images, in regular and random loss pattern, at loss rates of up to 40%, demonstrate that the proposed method improves peak-signal-to-noise-ratio by 1–5 dB compared with a range of best published works.

Journal Article•10.1007/S11760-016-0876-7•

Image de-fencing framework with hybrid inpainting algorithm

Muhammad Shahid Farid¹, Muhammad Shahid Farid², Arif Mahmood³, Marco Grangetto¹•Institutions (3)

University of Turin¹, University of the Punjab², Qatar University³

02 Mar 2016-Signal, Image and Video Processing

TL;DR: A novel image de-fencing algorithm that effectively detects and removes fences with minimal user input is presented and is able to remove both regular and irregular fences.

Abstract: Detection and removal of fences from digital images become essential when an important part of the scene turns out to be occluded by such unwanted structures. Image de-fencing is challenging because manually marking fence boundaries is tedious and time-consuming. In this paper, a novel image de-fencing algorithm that effectively detects and removes fences with minimal user input is presented. The user is only requested to mark a few fence pixels; then, color models are estimated and used to train a Bayes classifier to segment the fence and the background. Finally, the fence mask is refined using connected component analysis and morphological operators. To restore the occluded region, a hybrid inpainting algorithm is proposed that integrates an exemplar-based technique with a pyramid-based interpolation approach. In contrast to previous solutions, which work only for regular pattern fences, the proposed technique is able to remove both regular and irregular fences. A large number of experiments are carried out on a wide variety of images containing different types of fences, demonstrating the effectiveness of the proposed approach. The proposed approach is also compared with state-of-the-art image de-fencing and inpainting techniques and shows convincing results.

Book Chapter•10.1007/978-3-319-46484-8_41•

Deep Self-correlation Descriptor for Dense Cross-Modal Correspondence

Seungryong Kim¹, Dongbo Min², Stephen Lin³, Kwanghoon Sohn¹•Institutions (3)

Yonsei University¹, Chungnam National University², Microsoft³

8 Oct 2016

TL;DR: A novel descriptor, called deep self-correlation (DSC), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions, which can be robust to cross-modal imaging and densely computed in an efficient manner that significantly reduces computational redundancy.

Abstract: We present a novel descriptor, called deep self-correlation (DSC), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. Motivated by local self-similarity (LSS), we formulate a novel descriptor by leveraging LSS in a deep architecture, leading to better discriminative power and greater robustness to non-rigid image deformations than state-of-the-art descriptors. The DSC first computes self-correlation surfaces over a local support window for randomly sampled patches, and then builds hierarchical self-correlation surfaces by performing an average pooling within a deep architecture. Finally, the feature responses on the self-correlation surfaces are encoded through a spatial pyramid pooling in a circular configuration. In contrast to convolutional neural networks (CNNs) based descriptors, the DSC is training-free, is robust to cross-modal imaging, and can be densely computed in an efficient manner that significantly reduces computational redundancy. The state-of-the-art performance of DSC on challenging cases of cross-modal image pairs is demonstrated through extensive experiments.

Proceedings Article•10.1109/COASE.2016.7743558•

Automated identification of components in raster piping and instrumentation diagram with minimal pre-processing

Wei Chian Tan¹, I-Ming Chen¹, Hoon Kiang Tan²•Institutions (2)

Nanyang Technological University¹, Lloyd's Register²

1 Aug 2016

TL;DR: A novel framework for automated recognition of components in a Piping and Instrumentation Diagram (P&ID) of raster form using Local Binary Pattern (LBP) as descriptor and concept of Spatial Pyramid Matching (SPM).

Abstract: This paper proposes a novel framework for automated recognition of components in a Piping and Instrumentation Diagram (P&ID) of raster form. Contour is used as the main clue for visual recognition through the use of Local Binary Pattern (LBP) as the descriptor and the concept of Spatial Pyramid Matching (SPM). Comparison of two image patches is done by calculating the l1 distance between the two corresponding LBP-based descriptors. First, the framework requires at least one example image per type of component to be recognised; the corresponding LBP- and SPM-based descriptor is determined and stored. A linear sliding window approach is used to detect a small set of top candidates from a pool of all sub-images in the original image. Verification against the entire library of symbols is performed on each candidate selected from the previous stage, using the concept of nearest-neighbour-based classification. The method has demonstrated state-of-the-art performance on a new challenging dataset created with advice from a group of experienced engineers in the marine and offshore industry.
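
The descriptor pipeline described above (LBP codes, spatial pyramid histograms, l1 comparison) can be sketched in a few lines of plain Python. This is only an illustrative reading of the abstract, not the authors' implementation: the basic 8-neighbour LBP variant, the pyramid levels (1, 2), the unnormalized 256-bin histograms, and all function names are assumptions.

```python
def lbp_codes(img):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D image."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = [[0] * (w - 2) for _ in range(h - 2)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright.
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            codes[r - 1][c - 1] = code
    return codes


def pyramid_histogram(codes, levels=(1, 2)):
    """Concatenate 256-bin LBP histograms over an n x n grid per level."""
    h, w = len(codes), len(codes[0])
    desc = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                hist = [0] * 256
                for r in range(i * h // n, (i + 1) * h // n):
                    for c in range(j * w // n, (j + 1) * w // n):
                        hist[codes[r][c]] += 1
                desc.extend(hist)
    return desc


def l1_distance(a, b):
    """Sum of absolute differences between two equal-length descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

A nearest-neighbour check then amounts to computing `l1_distance` between a candidate patch descriptor and each stored symbol descriptor and keeping the smallest.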

Proceedings Article•10.1109/DCC.2016.23•

Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing

Jie Lin¹, Olivier Morère², Julie Petta, Vijay Chandrasekhar¹, Antoine Veillard³•Institutions (3)

Institute for Infocomm Research Singapore¹, Pierre-and-Marie-Curie University², University of Paris³

1 Mar 2016

TL;DR: Unsupervised Triplet Hashing (UTH) is a fully unsupervised method to compute extremely compact binary hashes from high-dimensional global descriptors; it consists of two successive deep learning steps.

Abstract: A typical image retrieval pipeline starts with the comparison of global descriptors from a large database to find a short list of candidate matches. A good image descriptor is key to the retrieval pipeline and should reconcile two contradictory requirements: providing recall rates as high as possible and being as compact as possible for fast matching. Following the recent successes of Deep Convolutional Neural Networks (DCNN) for large-scale image classification, descriptors extracted from DCNNs are increasingly used in place of traditional hand-crafted descriptors such as Fisher Vectors (FV), with better retrieval performance. Nevertheless, the dimensionality of a typical DCNN descriptor (extracted either from the visual feature pyramid or the fully-connected layers) remains quite high at several thousand scalar values. In this paper, we propose Unsupervised Triplet Hashing (UTH), a fully unsupervised method to compute extremely compact binary hashes (in the 32-256 bit range) from high-dimensional global descriptors. UTH consists of two successive deep learning steps. First, Stacked Restricted Boltzmann Machines (SRBM), a type of unsupervised deep neural net, are used to learn binary embedding functions able to bring the descriptor size down to the desired bitrate. SRBMs are typically able to ensure a very high compression rate at the expense of losing some desirable metric properties of the original DCNN descriptor space. Then, triplet networks, a rank learning scheme based on weight-sharing nets, are used to fine-tune the binary embedding functions to retain as much as possible of the useful metric properties of the original space. A thorough empirical evaluation conducted on multiple publicly available datasets using DCNN descriptors shows that our method is able to significantly outperform state-of-the-art unsupervised schemes in the target bit range.
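
The appeal of such compact binary hashes is that the first-stage comparison reduces to a Hamming distance. The sketch below shows that matching step only, under the assumption that each hash is stored as a Python integer; it does not show how UTH learns the hashes, and the names `hamming` and `shortlist` are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary hashes differ."""
    return bin(a ^ b).count("1")


def shortlist(query: int, database: dict, k: int = 2) -> list:
    """Return the k database keys whose hashes are closest to the query.

    Ties are broken by key so the result is deterministic.
    """
    return sorted(database,
                  key=lambda name: (hamming(query, database[name]), name))[:k]
```

For example, with 8-bit hashes `{"a": 0b11110000, "b": 0b11110001, "c": 0b00001111}` and the query `0b11110011`, the distances are 2, 1, and 6, so the short list of size two is `["b", "a"]`.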

Journal Article•10.1016/J.NEUCOM.2015.09.049•

Video-based facial expression recognition using learned spatiotemporal pyramid sparse coding features

Fei Long¹, Marian Stewart Bartlett²•Institutions (2)

Xiamen University¹, University of California, San Diego²

15 Jan 2016-Neurocomputing

TL;DR: Experimental results on the widely used Cohn-Kanade database show that the classification performance can be improved effectively by considering the spatiotemporal layout of facial expressions, and the method outperforms popular methods using hand-designed features.
