Scispace (Formerly Typeset)
Showing papers on "Pyramid (image processing)" published in 2016
Book Chapter•10.1007/978-3-319-46478-7_38•
Towards Perspective-Free Object Counting with Deep Learning

Daniel Oñoro-Rubio1, Roberto J. López-Sastre1•
University of Alcalá1
8 Oct 2016
TL;DR: A novel convolutional neural network solution, named Counting CNN (CCNN), formulated as a regression model where the network learns how to map the appearance of the image patches to their corresponding object density maps, able to estimate object densities in different very crowded scenarios.
Abstract: In this paper we address the problem of counting object instances in images. Our models are able to precisely estimate the number of vehicles in traffic congestion, or to count the humans in a very crowded scene. Our first contribution is the proposal of a novel convolutional neural network solution, named Counting CNN (CCNN). Essentially, the CCNN is formulated as a regression model where the network learns how to map the appearance of the image patches to their corresponding object density maps. Our second contribution consists of a scale-aware counting model, the Hydra CNN, able to estimate object densities in different very crowded scenarios where no geometric information of the scene can be provided. Hydra CNN learns a multiscale non-linear regression model which uses a pyramid of image patches extracted at multiple scales to perform the final density prediction. We report an extensive experimental evaluation, using up to three different object counting benchmarks, where we show how our solutions achieve state-of-the-art performance.

824 citations
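
The "pyramid of image patches extracted at multiple scales" mentioned in the abstract can be illustrated with a small sketch. It assumes, for illustration only, that each level is a central crop of the patch resized to a common size by nearest-neighbour sampling; the function names, crop fractions and output size are hypothetical and are not taken from the paper.

```python
def center_crop(patch, fraction):
    """Return the central region covering `fraction` of each side of a 2D list."""
    h, w = len(patch), len(patch[0])
    ch, cw = max(1, round(h * fraction)), max(1, round(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in patch[top:top + ch]]


def resize_nearest(img, out_h, out_w):
    """Resize a 2D list to out_h x out_w with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def patch_pyramid(patch, fractions=(1.0, 0.5), out_size=4):
    """Build one fixed-size view of the patch per scale (assumed scheme)."""
    return [resize_nearest(center_crop(patch, f), out_size, out_size)
            for f in fractions]


# Toy 8x8 "patch" whose pixel value encodes its position (row * 8 + col).
patch = [[r * 8 + c for c in range(8)] for r in range(8)]
levels = patch_pyramid(patch)
```

Each level has the same spatial size, so a regressor with one input branch per level could consume them side by side; how the branches are combined is left out of this sketch.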

Journal Article•10.1109/TGRS.2015.2493201•
Anomaly Detection in Hyperspectral Images Based on Low-Rank and Sparse Representation

Yang Xu1, Zebin Wu1, Jun Li2, Antonio Plaza3, Zhihui Wei1 •
Nanjing University of Science and Technology1, Sun Yat-sen University2, University of Extremadura3
01 Apr 2016-IEEE Transactions on Geoscience and Remote Sensing
TL;DR: A novel method for anomaly detection in hyperspectral images (HSIs) is proposed that uses low-rank and sparse representation to separate the background from the anomalies in the observed data.
Abstract: A novel method for anomaly detection in hyperspectral images (HSIs) is proposed based on low-rank and sparse representation. The proposed method is based on the separation of the background and the anomalies in the observed data. Since each pixel in the background can be approximately represented by a background dictionary and the representation coefficients of all pixels form a low-rank matrix, a low-rank representation is used to model the background part. To better characterize each pixel's local representation, a sparsity-inducing regularization term is added to the representation coefficients. Moreover, a dictionary construction strategy is adopted to make the dictionary more stable and discriminative. Then, the anomalies are determined by the response of the residual matrix. An important advantage of the proposed algorithm is that it combines the global and local structure in the HSI. Experiments have been conducted using both simulated and real data sets. These experiments indicate that our algorithm achieves very promising anomaly detection performance.

545 citations
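
The final step in the abstract, where anomalies are "determined by the response of the residual matrix", can be sketched as follows. This is only an illustration under stated assumptions: the background dictionary D and coefficients Z are taken as given (the low-rank and sparse optimization that produces them is omitted), and the response is assumed to be the Euclidean norm of each pixel's residual column. All names and the toy data are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def residual_scores(X, D, Z):
    """Score each pixel (column of X) by the norm of its residual X - D Z."""
    DZ = matmul(D, Z)
    bands, pixels = len(X), len(X[0])
    return [math.sqrt(sum((X[b][p] - DZ[b][p]) ** 2 for b in range(bands)))
            for p in range(pixels)]


# Toy data: 2 spectral bands, 3 pixels, one background atom.
D = [[1.0], [2.0]]
Z = [[1.0, 2.0, 1.0]]
X = [[1.0, 2.0, 4.0],
     [2.0, 4.0, 2.0]]  # the third pixel deviates from the background model
scores = residual_scores(X, D, Z)
```

Pixels that the background model explains well get a score near zero, while the deviating pixel stands out; a threshold on these scores would then flag anomalies.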

Book Chapter•10.1007/978-3-319-28549-8_3•
Transforms and Operators for Directional Bioimage Analysis: A Survey

Zsuzsanna Püspöki1, Martin Storath1, Daniel Sage1, Michael Unser1•
École Polytechnique Fédérale de Lausanne1
01 Jan 2016-Advances in Anatomy Embryology and Cell Biology
TL;DR: The intent is to provide image-processing methods that can be deployed in algorithms that analyze biomedical images with improved rotation invariance and high directional sensitivity, and address the problem of matching directional patterns by proposing steerable filters.
Abstract: We give a methodology-oriented perspective on directional image analysis and rotation-invariant processing. We review the state of the art in the field and make connections with recent mathematical developments in functional analysis and wavelet theory. We unify our perspective within a common framework using operators. The intent is to provide image-processing methods that can be deployed in algorithms that analyze biomedical images with improved rotation invariance and high directional sensitivity. We start our survey with classical methods such as directional-gradient and the structure tensor. Then, we discuss how these methods can be improved with respect to robustness, invariance to geometric transformations (with a particular interest in scaling), and computation cost. To address robustness against noise, we move forward to higher degrees of directional selectivity and discuss Hessian-based detection schemes. To present multiscale approaches, we explain the differences between Fourier filters, directional wavelets, curvelets, and shearlets. To reduce the computational cost, we address the problem of matching directional patterns by proposing steerable filters, where one might perform arbitrary rotations and optimizations without discretizing the orientation. We define the property of steerability and give an introduction to the design of steerable filters. We cover the spectrum from simple steerable filters through pyramid schemes up to steerable wavelets. We also present illustrations on the design of steerable wavelets and their application to pattern recognition.

482 citations

Journal Article•10.1109/TIP.2015.2495260•
Efficient Algorithms for Convolutional Sparse Representations

Brendt Wohlberg1•
Los Alamos National Laboratory1
01 Jan 2016-IEEE Transactions on Image Processing
TL;DR: New, efficient algorithms that substantially improve on the performance of other recent methods of sparse representation are presented, contributing to the development of this type of representation as a practical tool for a wider range of problems.
Abstract: When applying sparse representation techniques to images, the standard approach is to independently compute the representations for a set of overlapping image patches. This method performs very well in a variety of applications, but results in a representation that is multi-valued and not optimized with respect to the entire image. An alternative representation structure is provided by a convolutional sparse representation, in which a sparse representation of an entire image is computed by replacing the linear combination of a set of dictionary vectors by the sum of a set of convolutions with dictionary filters. The resulting representation is both single-valued and jointly optimized over the entire image. While this form of a sparse representation has been applied to a variety of problems in signal and image processing and computer vision, the computational expense of the corresponding optimization problems has restricted application to relatively small signals and images. This paper presents new, efficient algorithms that substantially improve on the performance of other recent methods, contributing to the development of this type of representation as a practical tool for a wider range of problems.

419 citations
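
The synthesis model described in the abstract, a signal written as "the sum of a set of convolutions with dictionary filters", can be sketched in one dimension. The sketch only reconstructs a signal from given sparse coefficient maps; it does not implement the paper's optimization algorithms. Circular boundary handling, the filter values and the function names are assumptions made for illustration.

```python
def circ_conv(d, x):
    """Circular convolution of a short filter d with a coefficient map x."""
    n = len(x)
    return [sum(d[k] * x[(i - k) % n] for k in range(len(d)))
            for i in range(n)]


def synthesize(filters, maps):
    """Sum the convolutions of each filter with its own coefficient map."""
    n = len(maps[0])
    out = [0.0] * n
    for d, x in zip(filters, maps):
        for i, v in enumerate(circ_conv(d, x)):
            out[i] += v
    return out


# Two toy filters and two sparse coefficient maps (one nonzero entry each).
filters = [[1.0, 1.0], [1.0, -1.0]]
maps = [[0, 2, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
signal = synthesize(filters, maps)
```

Because every coefficient map spans the whole signal, each sample has exactly one reconstructed value, in contrast to overlapping patches that each propose their own value for the same sample.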

Two Dimensional Signal And Image Processing

Karolin Baecker
1 Jan 2016
TL;DR: The two dimensional signal and image processing is universally compatible with any devices to read and is available in the book collection an online access to it is set as public so you can download it instantly.
Abstract: Thank you for downloading two dimensional signal and image processing. As you may know, people have look hundreds times for their chosen novels like this two dimensional signal and image processing, but end up in malicious downloads. Rather than enjoying a good book with a cup of coffee in the afternoon, instead they juggled with some infectious virus inside their computer. two dimensional signal and image processing is available in our book collection an online access to it is set as public so you can download it instantly. Our digital library spans in multiple locations, allowing you to get the most less latency time to download any of our books like this one. Kindly say, the two dimensional signal and image processing is universally compatible with any devices to read.

282 citations

Book Chapter•10.1007/978-3-319-46466-4_41•
Human Attribute Recognition by Deep Hierarchical Contexts

Yining Li1, Chen Huang1, Chen Change Loy1, Xiaoou Tang1•
The Chinese University of Hong Kong1
8 Oct 2016
TL;DR: This work trains a Convolutional Neural Network to select the most attribute-descriptive human parts from all poselet detections, and combines them with the whole body as a pose-normalized deep representation, which surpasses competitive baselines on this dataset and other popular ones.
Abstract: We present an approach for recognizing human attributes in unconstrained settings. We train a Convolutional Neural Network (CNN) to select the most attribute-descriptive human parts from all poselet detections, and combine them with the whole body as a pose-normalized deep representation. We further improve by using deep hierarchical contexts ranging from human-centric level to scene level. Human-centric context captures human relations, which we compute from the nearest neighbor parts of other people on a pyramid of CNN feature maps. The matched parts are then average pooled and they act as a similarity regularization. To utilize the scene context, we re-score human-centric predictions by the global scene classification score jointly learned in our CNN, yielding final scene-aware predictions. To facilitate our study, a large-scale WIDER Attribute dataset (Dataset URL: http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute) is introduced with human attribute and image event annotations, and our method surpasses competitive baselines on this dataset and other popular ones.

221 citations

Journal Article•10.1016/J.NEUCOM.2016.02.047•
Union Laplacian pyramid with multiple features for medical image fusion

Jiao Du1, Weisheng Li1, Bin Xiao1, Qamar Nawaz1•
Chongqing University of Posts and Telecommunications1
19 Jun 2016-Neurocomputing
TL;DR: Visual and statistical analyses show that the quality of the fused image can be significantly improved, as measured by structural similarity, peak signal-to-noise ratio, standard deviation, and tone-mapped image quality index metrics.

214 citations

Posted Content•
Exploring Context with Deep Structured models for Semantic Segmentation

Guosheng Lin1, Chunhua Shen2, Anton van den Hengel2, Ian Reid2•
Nanyang Technological University1, University of Adelaide2
10 Mar 2016-arXiv: Computer Vision and Pattern Recognition
TL;DR: This work formulates deep structured models by combining CNNs and Conditional Random Fields for learning the patch-patch context between image regions, and formulates CNN-based pairwise potential functions to capture semantic correlations between neighboring patches.
Abstract: State-of-the-art semantic image segmentation methods are mostly based on training deep convolutional neural networks (CNNs). In this work, we propose to improve semantic segmentation with the use of contextual information. In particular, we explore 'patch-patch' context and 'patch-background' context in deep CNNs. We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions. Specifically, we formulate CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied in order to avoid repeated expensive CRF inference during the course of back propagation. For capturing the patch-background context, we show that a network design with traditional multi-scale image inputs and sliding pyramid pooling is very effective for improving performance. We perform a comprehensive evaluation of the proposed method. We achieve new state-of-the-art performance on a number of challenging semantic segmentation datasets including the NYUDv2, PASCAL-VOC2012, Cityscapes, PASCAL-Context, SUN-RGBD, SIFT-flow, and KITTI datasets. In particular, we report an intersection-over-union score of 77.8 on the PASCAL-VOC2012 dataset.

94 citations

Journal Article•10.1109/TIFS.2016.2535899•
Fingerprint Liveness Detection From Single Image Using Low-Level Features and Shape Analysis

Rohit K. Dubey1, Jonathan Goh1, Vrizlynn L. L. Thing1•
Agency for Science, Technology and Research1
29 Feb 2016-IEEE Transactions on Information Forensics and Security
TL;DR: This paper proposes to combine low-level gradient features from speeded-up robust features, pyramid extension of the histograms of oriented gradient and texture features from Gabor wavelet using dynamic score level integration and extract these features from a single fingerprint image to overcome the issues faced in dynamic software approaches, which require user cooperation and longer computational time.
Abstract: Fingerprint-based authentication systems have developed rapidly in recent years. However, current fingerprint-based biometric systems are vulnerable to spoofing attacks. Moreover, a single-feature-based static approach does not perform equally over different fingerprint sensors and spoofing materials. In this paper, we propose a static software approach. We propose to combine low-level gradient features from speeded-up robust features, pyramid extension of the histograms of oriented gradient and texture features from Gabor wavelet using dynamic score level integration. We extract these features from a single fingerprint image to overcome the issues faced in dynamic software approaches, which require user cooperation and longer computational time. An experimental analysis done on LivDet 2011 data produced an average equal error rate (EER) of 3.95% over four databases. The result outperforms the existing best average EER of 9.625%. We also performed experiments with the LivDet 2013 database and achieved an average classification error rate of 2.27% in comparison with 12.87% obtained by the LivDet 2013 competition winner.

88 citations

Proceedings Article•
Learning cross-domain neural networks for sketch-based 3D shape retrieval

Fan Zhu1, Jin Xie1, Yi Fang1•
New York University Abu Dhabi1
12 Feb 2016
TL;DR: Experimental results suggest that both CDNN and PCDNN can outperform state-of-the-art methods, where PCDNN can further improve on CDNN by employing a hierarchical structure.
Abstract: Sketch-based 3D shape retrieval, which returns a set of relevant 3D shapes based on users' input sketch queries, has been receiving increasing attention in both the graphics and vision communities. In this work, we address the sketch-based 3D shape retrieval problem with a novel Cross-Domain Neural Networks (CDNN) approach, which is further extended to Pyramid Cross-Domain Neural Networks (PCDNN) by incorporating a hierarchical structure. In order to alleviate the discrepancies between sketch features and 3D shape features, a neural network pair that forces identical representations at the target layer for instances of the same class is trained for sketches and 3D shapes respectively. By constructing cross-domain neural networks at multiple pyramid levels, a many-to-one relationship is established between a 3D shape feature and sketch features extracted from different scales. We evaluate the effectiveness of both the CDNN and PCDNN approaches on the extended large-scale SHREC 2014 benchmark and compare with some other well established methods. Experimental results suggest that both CDNN and PCDNN can outperform state-of-the-art methods, where PCDNN can further improve on CDNN by employing a hierarchical structure.

77 citations

Journal Article•10.1109/TIP.2015.2502147•
Detecting Densely Distributed Graph Patterns for Fine-Grained Image Categorization

Luming Zhang1, Yang Yang2, Meng Wang1, Richang Hong1, Liqiang Nie3, Xuelong Li4 •
Hefei University of Technology1, University of Electronic Science and Technology of China2, National University of Singapore3, Chinese Academy of Sciences4
01 Feb 2016-IEEE Transactions on Image Processing
TL;DR: A dense graph mining algorithm is developed to discover graphlets representative of each super-/sub-category, and the discovered graphlets from each sub-category accurately capture those tiny discriminative object components, e.g., bird claws, heads, and bodies.
Abstract: Fine-grained image categorization is a challenging task aiming at distinguishing objects belonging to the same basic-level category, e.g., leaf or mushroom. It is a useful technique that can be applied for species recognition, face verification, and so on. Most of the existing methods either have difficulty detecting discriminative object components automatically, or suffer from the limited amount of training data in each sub-category. To solve these problems, this paper proposes a new fine-grained image categorization model. The key is a dense graph mining algorithm that hierarchically localizes discriminative object parts in each image. More specifically, to mimic the human hierarchical perception mechanism, a superpixel pyramid is generated for each image. Thereby, graphlets from each layer are constructed to seamlessly capture object components. Intuitively, graphlets representative of each super-/sub-category are densely distributed in their feature space. Thus, a dense graph mining algorithm is developed to discover graphlets representative of each super-/sub-category. Finally, the discovered graphlets from pairwise images are integrated into an image kernel for fine-grained recognition. Theoretically, the learned kernel can generalize several state-of-the-art image kernels. Experiments on nine image sets demonstrate the advantage of our method. Moreover, the discovered graphlets from each sub-category accurately capture those tiny discriminative object components, e.g., bird claws, heads, and bodies.
Proceedings Article•10.1109/ICIP.2016.7532434•
Multi-scale blocks based image emotion classification using multiple instance learning

Tianrong Rao1, Min Xu1, Huiying Liu2, Jinqiao Wang3, Ian Burnett1 •
University of Technology, Sydney1, Institute for Infocomm Research Singapore2, Chinese Academy of Sciences3
3 Aug 2016
TL;DR: This work proposes an emotion classification method based on multi-scale blocks using Multiple Instance Learning (MIL), which reduces the need for exact labelling and is employed to classify the dominant emotion type of the image.
Abstract: Emotional factors usually affect users' preferences for and evaluations of images. Although affective image analysis attracts increasing attention, there are still three major challenges remaining: 1) it is difficult to classify an image into a single emotion type since different regions within an image can represent different emotions; 2) there is a gap between low-level features and high-level emotions; and 3) it is difficult to collect a training set of reliable emotional image content. To address these three issues, we propose an emotion classification method based on multi-scale blocks using Multiple Instance Learning (MIL). We first extract blocks of an image at multiple scales using two image segmentation methods, pyramid segmentation and simple linear iterative clustering (SLIC), and represent each block using the bag-of-visual-words (BoVW) method. Then, to bridge the “affective gap”, probabilistic latent semantic analysis (pLSA) is employed to estimate the latent topic distribution as a mid-level representation of each block. Finally, MIL, which reduces the need for exact labelling, is employed to classify the dominant emotion type of the image. Experiments carried out on three widely used datasets demonstrate that our proposed method with SLIC effectively improves the state-of-the-art results of image emotion classification by 5.1% on average.
Proceedings Article•10.1109/CVPR.2016.298•
Laplacian Patch-Based Image Synthesis

Jooho Lee1, Inchang Choi1, Min H. Kim1•
KAIST1
27 Jun 2016
TL;DR: The Laplacian pyramid has the advantage of being isotropic in detecting changes to provide more consistent performance in decomposing the base structure and the detailed localization and does not require heavy computation as it employs approximation by the differences of Gaussians.
Abstract: Patch-based image synthesis has been enriched with global optimization on the image pyramid. Subsequently, gradient-based synthesis has improved structural coherence and details. However, the gradient operator is directional and inconsistent and requires computing multiple operators. It also introduces a significantly heavy computational burden to solve the Poisson equation, which is often accompanied by artifacts in non-integrable gradient fields. In this paper, we propose a patch-based synthesis using a Laplacian pyramid to improve searching correspondence with enhanced awareness of edge structures. Contrary to the gradient operators, the Laplacian pyramid has the advantage of being isotropic in detecting changes to provide more consistent performance in decomposing the base structure and the detailed localization. Furthermore, it does not require heavy computation as it employs approximation by the differences of Gaussians. We examine the potential of the Laplacian pyramid for enhanced edge-aware correspondence search. We demonstrate the effectiveness of the Laplacian-based approach over the state-of-the-art patch-based image synthesis methods.
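
As a rough illustration of the Laplacian pyramid that the abstract relies on, the sketch below decomposes a 1D signal into band-pass detail levels plus a coarse base, and then reconstructs it. It is a generic textbook-style construction, not the paper's synthesis method: the [1, 2, 1]/4 smoothing kernel, nearest-neighbour upsampling, the 1D setting and all function names are simplifying assumptions.

```python
def blur(sig):
    """Smooth with a [1, 2, 1] / 4 kernel, clamping indices at the borders."""
    n = len(sig)
    return [(sig[max(i - 1, 0)] + 2 * sig[i] + sig[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def downsample(sig):
    """Blur, then keep every second sample."""
    return blur(sig)[::2]


def upsample(sig, n):
    """Stretch back to length n by repeating samples (nearest neighbour)."""
    return [sig[min(i // 2, len(sig) - 1)] for i in range(n)]


def build_laplacian_pyramid(sig, levels):
    """Return `levels` detail bands followed by the coarse base signal."""
    pyramid, current = [], list(sig)
    for _ in range(levels):
        smaller = downsample(current)
        predicted = upsample(smaller, len(current))
        pyramid.append([c - p for c, p in zip(current, predicted)])
        current = smaller
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the decomposition by adding each detail band back in."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        up = upsample(current, len(band))
        current = [u + b for u, b in zip(up, band)]
    return current


signal = [4.0, 8.0, 6.0, 2.0, 0.0, 2.0, 10.0, 12.0]
pyramid = build_laplacian_pyramid(signal, levels=2)
restored = reconstruct(pyramid)
```

Each detail band stores only what the coarser level fails to predict, so reconstruction recovers the input; the base carries the overall structure while the bands carry localized detail.
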
Journal Article•10.1016/J.NEUCOM.2015.12.042•
Speed up deep neural network based pedestrian detection by sharing features across multi-scale models

Xiaoheng Jiang1, Yanwei Pang1, Xuelong Li2, Jing Pan3•
Tianjin University1, Chinese Academy of Sciences2, Tianjin University of Technology and Education3
12 Apr 2016-Neurocomputing
TL;DR: This paper proposes to share features across a group of DNNs that correspond to pedestrian models of different sizes that can detect pedestrians of several different scales on one single layer of an image pyramid to improve detection efficiency.
Proceedings Article•10.1117/12.2233963•
PASSATA - Object oriented numerical simulation software for adaptive optics

Guido Agapito, Alfio Puglisi, Simone Esposito
26 Jul 2016-arXiv: Instrumentation and Methods for Astrophysics
TL;DR: The latest version of the PyrAmid Simulator Software for Adaptive opTics Arcetri (PASSATA), an IDL and CUDA based object oriented software developed in the Adaptive Optics group of the Arcetri observatory for Monte-Carlo end-to-end adaptive optics simulations, is presented.
Abstract: We present the latest version of the PyrAmid Simulator Software for Adaptive opTics Arcetri (PASSATA), an IDL and CUDA based object oriented software developed in the Adaptive Optics group of the Arcetri observatory for Monte-Carlo end-to-end adaptive optics simulations. The original aim of this software was to evaluate the performance of a single conjugate adaptive optics system for a ground-based telescope with a pyramid wavefront sensor. After some years of development, the current version of PASSATA is able to simulate several adaptive optics systems: single conjugate, multi conjugate and ground layer, with Shack Hartmann and Pyramid wavefront sensors. It can simulate from 8m to 40m class telescopes, with diffraction limited and resolved sources at finite or infinite distance from the pupil. The main advantages of this software are the versatility given by the object oriented approach and the speed given by the CUDA implementation of the most computationally demanding routines. We describe the software with its latest developments and present some examples of application.
Book Chapter•10.1007/978-3-319-46448-0_24•
Fast 6D Pose Estimation from a Monocular Image Using Hierarchical Pose Trees

Yoshinori Konishi1, Yuki Hanzawa1, Masato Kawade1, Manabu Hashimoto2•
Omron1, Chukyo University2
8 Oct 2016
TL;DR: In this paper, the authors propose a perspectively cumulated orientation feature (PCOF) based on the orientation histograms extracted from randomly generated 2D projection images using 3D CAD data, and templates using PCOF explicitly handle a certain range of 3D object poses.
Abstract: It has been shown that template-based approaches can quickly estimate the 6D pose of texture-less objects from a monocular image. However, they tend to be slow when the number of templates amounts to tens of thousands for handling a wider range of 3D object poses. To alleviate this problem, we propose a novel image feature and a tree-structured model. Our proposed perspectively cumulated orientation feature (PCOF) is based on the orientation histograms extracted from randomly generated 2D projection images using 3D CAD data, and templates using PCOF explicitly handle a certain range of 3D object poses. The hierarchical pose trees (HPT) are built by clustering 3D object poses and reducing the resolutions of templates, and HPT accelerates 6D pose estimation based on a coarse-to-fine strategy with an image pyramid. In the experimental evaluation on our texture-less object dataset, the combination of PCOF and HPT showed higher accuracy and faster speed in comparison with state-of-the-art techniques.
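
The "coarse-to-fine strategy with an image pyramid" can be illustrated generically: match a template at a reduced resolution first, then refine only around the coarse hit at full resolution. The sketch below does this for a 1D signal with a sum-of-squared-differences score. It is not the PCOF feature or the hierarchical pose tree from the paper; the two-level pyramid, the scoring function and all names are assumptions for illustration.

```python
def downsample(sig):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(sig[i] + sig[i + 1]) / 2 for i in range(0, len(sig) - 1, 2)]


def ssd(sig, tmpl, pos):
    """Sum of squared differences between tmpl and sig starting at pos."""
    return sum((sig[pos + k] - tmpl[k]) ** 2 for k in range(len(tmpl)))


def best_match(sig, tmpl, candidates):
    """Return the candidate position with the lowest SSD (first on ties)."""
    valid = [p for p in candidates if 0 <= p <= len(sig) - len(tmpl)]
    return min(valid, key=lambda p: ssd(sig, tmpl, p))


def coarse_to_fine(sig, tmpl, radius=1):
    """Search exhaustively at half resolution, then refine near the hit."""
    coarse_sig, coarse_tmpl = downsample(sig), downsample(tmpl)
    coarse = best_match(coarse_sig, coarse_tmpl, range(len(coarse_sig)))
    center = 2 * coarse  # map the coarse position back to full resolution
    return best_match(sig, tmpl, range(center - radius, center + radius + 1))


template = [5.0, 9.0, 9.0, 5.0]
signal = [0.0] * 16
signal[7:11] = template  # the template is embedded at position 7
found = coarse_to_fine(signal, template)
```

The full-resolution search touches only a few positions around the coarse estimate, which is where the speed-up of a pyramid search comes from; with more levels the same refinement step would be repeated at each resolution.
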
Posted Content•
Multigrid Neural Architectures

Tsung-Wei Ke1, Michael Maire2, Stella X. Yu1•
University of California, Berkeley1, Simon Fraser University2
23 Nov 2016-arXiv: Computer Vision and Pattern Recognition
TL;DR: This paper proposes a multigrid extension of convolutional neural networks (CNNs) whose layers operate across scale space, on a pyramid of grids, and shows that relatively shallow multigrid networks can learn spatial transformation tasks on which current CNNs fail.
Abstract: We propose a multigrid extension of convolutional neural networks (CNNs). Rather than manipulating representations living on a single spatial grid, our network layers operate across scale space, on a pyramid of grids. They consume multigrid inputs and produce multigrid outputs; convolutional filters themselves have both within-scale and cross-scale extent. This aspect is distinct from simple multiscale designs, which only process the input at different scales. Viewed in terms of information flow, a multigrid network passes messages across a spatial pyramid. As a consequence, receptive field size grows exponentially with depth, facilitating rapid integration of context. Most critically, multigrid structure enables networks to learn internal attention and dynamic routing mechanisms, and use them to accomplish tasks on which modern CNNs fail. Experiments demonstrate wide-ranging performance advantages of multigrid. On CIFAR and ImageNet classification tasks, flipping from a single grid to multigrid within the standard CNN paradigm improves accuracy, while being compute and parameter efficient. Multigrid is independent of other architectural choices; we show synergy in combination with residual connections. Multigrid yields dramatic improvement on a synthetic semantic segmentation dataset. Most strikingly, relatively shallow multigrid networks can learn to directly perform spatial transformation tasks, where, in contrast, current CNNs fail. Together, our results suggest that continuous evolution of features on a multigrid pyramid is a more powerful alternative to existing CNN designs on a flat grid.
Proceedings Article•10.1109/ISBI.2016.7493400•
X-ray image classification using domain transferred convolutional neural networks and local sparse spatial pyramid

Euijoon Ahn1, Ashnil Kumar1, Jinman Kim1, Changyang Li1, Dagan Feng1, Michael J. Fulham1 •
University of Sydney1
13 Apr 2016
TL;DR: A late-fusion of domain transferred convolutional neural networks (DT-CNNs) with sparse spatial pyramid (SSP) features derived from a local image dictionary is proposed, which is robust as it exploits the rich generic information provided by the DT-CNNs and uses the specific local features and characteristics inherent in the X-ray images.
Abstract: The classification of medical images is a critical step for imaging-based clinical decision support systems. Existing classification methods for X-ray images, however, generally represent the image using only local texture or generic image features (e.g. color or shape) derived from predefined feature spaces. This limits the ability to quantify the image characteristics using general data-derived features learned from image datasets. In this study we present a new algorithm to improve the performance of X-ray image classification, where we propose a late-fusion of domain transferred convolutional neural networks (DT-CNNs) with sparse spatial pyramid (SSP) features derived from a local image dictionary. Our method is robust as it exploits the rich generic information provided by the DT-CNNs and uses the specific local features and characteristics inherent in the X-ray images. Our method was evaluated on a public dataset of X-ray images and was compared to several state-of-the-art approaches. Experimental results show that our method was the most accurate for classification.
Posted Content•
Adaptive Deep Pyramid Matching for Remote Sensing Scene Classification.

Qingshan Liu, Renlong Hang, Huihui Song, Fuping Zhu, Javier Plaza, Antonio Plaza 
11 Nov 2016-arXiv: Computer Vision and Pattern Recognition
TL;DR: A new adaptive deep pyramid matching (ADPM) model is proposed that takes advantage of the features from all of the convolutional layers for remote sensing image classification, and significantly improves the performance when compared to other state-of-the-art methods.
Abstract: Convolutional neural networks (CNNs) have attracted increasing attention in the remote sensing community. Most CNNs only take the last fully-connected layers as features for the classification of remotely sensed images, discarding the other convolutional layer features which may also be helpful for classification purposes. In this paper, we propose a new adaptive deep pyramid matching (ADPM) model that takes advantage of the features from all of the convolutional layers for remote sensing image classification. To this end, the optimal fusing weights for different convolutional layers are learned from the data itself. In remotely sensed scenes, the objects of interest exhibit different scales in distinct scenes, and even a single scene may contain objects with different sizes. To address this issue, we select the CNN with spatial pyramid pooling (SPP-net) as the basic deep network, and further construct a multi-scale ADPM model to learn complementary information from multi-scale images. Our experiments have been conducted using two widely used remote sensing image databases, and the results show that the proposed method significantly improves the performance when compared to other state-of-the-art methods.
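
The spatial pyramid pooling idea behind SPP-net, which the abstract uses to cope with objects and scenes of different sizes, can be sketched as max pooling over progressively finer grids so that feature maps of any size yield a vector of the same length. The sketch below is a simplified single-channel version; the grid levels, the way bin boundaries are computed and the function name are assumptions, not the ADPM model itself.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2D map over an n x n grid for each n in `levels`."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i; each bin covers at least one row.
            r0, r1 = i * h // n, max((i + 1) * h // n, i * h // n + 1)
            for j in range(n):
                # Column range of bin j; each bin covers at least one column.
                c0, c1 = j * w // n, max((j + 1) * w // n, j * w // n + 1)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two maps of different sizes produce vectors of the same length (1 + 4 = 5).
square = [[r * 4 + c for c in range(4)] for r in range(4)]
wide = [[1, 2, 3],
        [4, 5, 6]]
vec_square = spatial_pyramid_pool(square)
vec_wide = spatial_pyramid_pool(wide)
```

The fixed output length is what lets a classifier accept inputs at several scales without resizing them to one resolution first.
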
Journal Article•10.1145/2886775•
Semantic Photo Retargeting Under Noisy Image Labels

Luming Zhang1, Xuelong Li2, Liqiang Nie3, Yan Yan4, Roger Zimmermann3 •
Hefei University of Technology1, Chinese Academy of Sciences2, National University of Singapore3, University of Trento4
20 May 2016-ACM Transactions on Multimedia Computing, Communications, and Applications
TL;DR: A new semantically aware photo retargeting that shrinks a photo according to region semantics and a probabilistic model is proposed to enforce the spatial layout of a retargeted photo to be maximally similar to those from the training photos.
Abstract: With the popularity of mobile devices, photo retargeting has become a useful technique that adapts a high-resolution photo onto a low-resolution screen. Conventional approaches are limited in two aspects. The first is the de-emphasized role of semantic content, which is many times more important than low-level features in photo aesthetics. The second is the importance of image spatial modeling: toward a semantically reasonable retargeted photo, the spatial distribution of objects within an image should be accurately learned. To solve these two problems, we propose a new semantically aware photo retargeting method that shrinks a photo according to region semantics. The key technique is a mechanism transferring semantics of noisy image labels (inaccurate labels predicted by a learner like an SVM) into different image regions. In particular, we first project the local aesthetic features (graphlets in this work) onto a semantic space, wherein image labels are selectively encoded according to their noise level. Then, a category-sharing model is proposed to robustly discover the semantics of each image region. The model is motivated by the observation that the semantic distribution of graphlets from images tagged by a common label remains stable in the presence of noisy labels. Thereafter, a spatial pyramid is constructed to hierarchically encode the spatial layout of graphlet semantics. Based on this, a probabilistic model is proposed to enforce the spatial layout of a retargeted photo to be maximally similar to those from the training photos. Experimental results show that (1) noisy image labels predicted by different learners can improve the retargeting performance, according to both qualitative and quantitative analysis, and (2) the category-sharing model stays stable even when 32.36% of image labels are incorrectly predicted.
Proceedings Article•10.1109/AIPR.2016.8010595•
Registering large volume serial-section electron microscopy image sets for neural circuit reconstruction using FFT signal whitening

[...]

Arthur W. Wetzel1, Jennifer Bakal1, Markus Dittrich1, David G. C. Hildebrand2, Josh Morgan2, Jeff W. Lichtman2 •
Pittsburgh Supercomputing Center1, Harvard University2
1 Oct 2016
TL;DR: In this article, a Signal Whitening Fourier Transform Image Registration (SWiFT-IR) approach is proposed to align mouse and zebrafish brain datasets acquired using the wafer mapper ssEM imaging technology recently developed at Harvard University.
Abstract: The detailed reconstruction of neural anatomy for connectomics studies requires a combination of resolution and large three-dimensional data capture provided by serial section electron microscopy (ssEM). The convergence of high throughput ssEM imaging and improved tissue preparation methods now allows ssEM capture of complete specimen volumes up to cubic millimeter scale. The resulting multi-terabyte image sets span thousands of serial sections and must be precisely registered into coherent volumetric forms in which neural circuits can be traced and segmented. This paper introduces a Signal Whitening Fourier Transform Image Registration approach (SWiFT-IR) under development at the Pittsburgh Supercomputing Center and its use to align mouse and zebrafish brain datasets acquired using the wafer mapper ssEM imaging technology recently developed at Harvard University. Unlike other methods now used for ssEM registration, SWiFT-IR modifies its spatial frequency response during image matching to maximize a signal-to-noise measure used as its primary indicator of alignment quality. This alignment signal is more robust to rapid variations in biological content and unavoidable data distortions than either phase-only or standard Pearson correlation, thus allowing more precise alignment and statistical confidence. These improvements in turn enable an iterative registration procedure based on projections through multiple sections rather than more typical adjacent-pair matching methods. This projection approach, when coupled with known anatomical constraints and iteratively applied in a multi-resolution pyramid fashion, drives the alignment into a smooth form that properly represents complex and widely varying anatomical content such as the full cross-section zebrafish data.
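The idea of a correlation that sits between standard and phase-only matching can be illustrated with a minimal 1-D sketch. This is not the SWiFT-IR implementation: the function names, the naive DFT, and the `alpha` exponent (0 gives plain circular cross-correlation, 1 gives phase-only correlation, and the default 0.65 is an arbitrary illustrative value) are all assumptions made for this example.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for a short demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def whitened_shift(reference, moved, alpha=0.65):
    """Estimate the circular shift of `moved` relative to `reference`.

    alpha is a hypothetical whitening exponent: each bin of the cross-power
    spectrum is divided by its magnitude raised to alpha.
    """
    ref_f = dft(reference)
    mov_f = dft(moved)
    spectrum = []
    for r, m in zip(ref_f, mov_f):
        cross = m * r.conjugate()
        mag = abs(cross)
        # Skip (zero out) bins that carry no energy to avoid dividing by zero.
        spectrum.append(cross / mag ** alpha if mag > 1e-12 else 0j)
    surface = [v.real for v in dft(spectrum, inverse=True)]
    # The index of the correlation peak is the estimated shift.
    return max(range(len(surface)), key=surface.__getitem__)
```

In a multi-resolution setting, a routine of this kind would be run on coarse levels first and the estimate refined on finer levels; that coupling is omitted here.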
Journal Article•10.1016/J.PATREC.2016.03.024•
A multi-process system for HEp-2 cells classification based on SVM

[...]

Donato Cascio1, Vincenzo Taormina1, Marco Cipolla1, Salvatore Bruno1, Francesco Fauci, Giuseppe Raso1 •
University of Palermo1
15 Oct 2016-Pattern Recognition Letters
TL;DR: A system able to classify pre-segmented immunofluorescence images of HEp-2 cells into six classes based on the one-against-one (OAO) scheme is described.
Proceedings Article•10.1109/AIPR.2016.8010595•
Registering large volume serial-section electron microscopy image sets for neural circuit reconstruction using FFT signal whitening

[...]

Arthur W. Wetzel1, Jennifer Bakal1, Markus Dittrich1, David G. C. Hildebrand2, Josh Morgan2, Jeff W. Lichtman2 •
Pittsburgh Supercomputing Center1, Harvard University2
14 Dec 2016-arXiv: Computer Vision and Pattern Recognition
TL;DR: A Signal Whitening Fourier Transform Image Registration approach (SWiFT-IR) under development at the Pittsburgh Supercomputing Center and its use to align mouse and zebrafish brain datasets acquired using the wafer mapper ssEM imaging technology recently developed at Harvard University are introduced.
Abstract: The detailed reconstruction of neural anatomy for connectomics studies requires a combination of resolution and large three-dimensional data capture provided by serial section electron microscopy (ssEM). The convergence of high throughput ssEM imaging and improved tissue preparation methods now allows ssEM capture of complete specimen volumes up to cubic millimeter scale. The resulting multi-terabyte image sets span thousands of serial sections and must be precisely registered into coherent volumetric forms in which neural circuits can be traced and segmented. This paper introduces a Signal Whitening Fourier Transform Image Registration approach (SWiFT-IR) under development at the Pittsburgh Supercomputing Center and its use to align mouse and zebrafish brain datasets acquired using the wafer mapper ssEM imaging technology recently developed at Harvard University. Unlike other methods now used for ssEM registration, SWiFT-IR modifies its spatial frequency response during image matching to maximize a signal-to-noise measure used as its primary indicator of alignment quality. This alignment signal is more robust to rapid variations in biological content and unavoidable data distortions than either phase-only or standard Pearson correlation, thus allowing more precise alignment and statistical confidence. These improvements in turn enable an iterative registration procedure based on projections through multiple sections rather than more typical adjacent-pair matching methods. This projection approach, when coupled with known anatomical constraints and iteratively applied in a multi-resolution pyramid fashion, drives the alignment into a smooth form that properly represents complex and widely varying anatomical content such as the full cross-section zebrafish data.
Journal Article•10.1109/TCSVT.2015.2418585•
Blur-Kernel Bound Estimation From Pyramid Statistics

[...]

Shaoguo Liu1, Haibo Wang2, Jue Wang3, Chunhong Pan1•
Chinese Academy of Sciences1, Shandong University2, Adobe Systems3
01 May 2016-IEEE Transactions on Circuits and Systems for Video Technology
TL;DR: Experimental results show that the proposed method can estimate accurate blur kernel sizes, enabling existing blind deconvolution methods to achieve best possible results.
Abstract: This letter presents an approach for automatically estimating the spatial bound of the blur kernel in a motion-blurred image based on the statistics of multilevel image gradients. We observe that blur has a significant impact on the histogram of oriented gradients (HOGs) at the finer (higher-resolution) levels of an image pyramid, but has much less of an impact at coarser levels. Based on this fact, we estimate the spatial bound of the unknown blur kernel using a learning-based approach. We first learn a generic pyramid HOG model from natural sharp images; then, given an HOG pyramid of a blurry image, we predict the corresponding model of its latent sharp image. Finally, we learn another model to predict the spatial kernel bound from the difference between the observed and the predicted HOG pyramids. Experimental results show that the proposed method can estimate accurate blur kernel sizes, enabling existing blind deconvolution methods to achieve best possible results.
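A pyramid of gradient-orientation histograms, the kind of per-level statistic the abstract compares between fine and coarse levels, can be sketched as follows. This is a simplified illustration, not the authors' feature: the 2x2 averaging pyramid, the single global histogram per level, and all function names are assumptions.

```python
import math

def downsample(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def build_pyramid(img, levels):
    """Level 0 is the input image; each further level is half the size."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def orientation_histogram(img, bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations, L1-normalised."""
    hist = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def pyramid_hog(img, levels=3, bins=8):
    """One orientation histogram per pyramid level, finest level first."""
    return [orientation_histogram(level, bins) for level in build_pyramid(img, levels)]
```

Comparing the histograms of a blurry image with those of a sharp one, level by level, gives a rough feel for where in the pyramid blur changes the gradient statistics.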
Journal Article•10.1109/TIP.2016.2590825•
Edge-Guided Image Gap Interpolation Using Multi-Scale Transformation

[...]

Bahareh Langari1, Saeed Vaseghi2, Ales Prochazka2, Babak Vaziri, Farzad Tahmasebi Aria3 •
Brunel University London1, Institute of Chemical Technology in Prague2, Middlesex University3
13 Jul 2016-IEEE Transactions on Image Processing
TL;DR: Improvements in image gap restoration through the incorporation of edge-based directional interpolation within multi-scale pyramid transforms are presented, demonstrating that the proposed method improves peak-signal-to-noise-ratio by 1-5 dB compared with a range of best published works.
Abstract: This paper presents improvements in image gap restoration through the incorporation of edge-based directional interpolation within multi-scale pyramid transforms. Two types of image edges are reconstructed: 1) the local edges or textures, inferred from the gradients of the neighboring pixels and 2) the global edges between image objects or segments, inferred using a Canny detector. Through a process of pyramid transformation and downsampling, the image is progressively transformed into a series of reduced size layers until at the pyramid apex the gap size is one sample. At each layer, an edge skeleton image is extracted for edge-guided interpolation. The process is then reversed; from the apex, at each layer, the missing samples are estimated (an iterative method is used in the last stage of upsampling), up-sampled, and combined with the available samples of the next layer. Discrete cosine transform and a family of discrete wavelet transforms are utilized as alternatives for pyramid construction. Evaluations over a range of images, in regular and random loss pattern, at loss rates of up to 40%, demonstrate that the proposed method improves peak-signal-to-noise-ratio by 1–5 dB compared with a range of best published works.
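The coarse-to-fine structure described above (reduce until the gap disappears, then fill on the way back up) can be sketched in one dimension. This sketch swaps in plain averaging and nearest-coarse-sample filling in place of the paper's edge-guided interpolation and DCT/wavelet pyramids; the function name and the use of `None` for missing samples are assumptions.

```python
def fill_gaps(signal):
    """Fill missing samples (None) by recursing through a pyramid of half-size layers.

    Known samples are never modified. At least one sample must be known,
    otherwise the gaps remain None.
    """
    if len(signal) == 1 or all(v is not None for v in signal):
        return list(signal)
    # Build the next (coarser) layer: average whatever is available in each pair.
    coarse = []
    for i in range(0, len(signal), 2):
        pair = [v for v in signal[i:i + 2] if v is not None]
        coarse.append(sum(pair) / len(pair) if pair else None)
    # Recurse until the gap has vanished, then come back down.
    coarse = fill_gaps(coarse)
    # Up-sample: each missing sample takes the value of its coarse parent.
    return [v if v is not None else coarse[i // 2] for i, v in enumerate(signal)]
```

For example, `fill_gaps([1.0, None, None, None, 5.0, 5.0, None, 3.0])` keeps the five known samples and fills the three-sample gap from the coarser layers.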
Journal Article•10.1007/S11760-016-0876-7•
Image de-fencing framework with hybrid inpainting algorithm

[...]

Muhammad Shahid Farid1, Muhammad Shahid Farid2, Arif Mahmood3, Marco Grangetto1•
University of Turin1, University of the Punjab2, Qatar University3
02 Mar 2016-Signal, Image and Video Processing
TL;DR: A novel image de-fencing algorithm that effectively detects and removes fences with minimal user input is presented and is able to remove both regular and irregular fences.
Abstract: Detection and removal of fences from digital images become essential when an important part of the scene turns out to be occluded by such unwanted structures. Image de-fencing is challenging because manually marking fence boundaries is tedious and time-consuming. In this paper, a novel image de-fencing algorithm that effectively detects and removes fences with minimal user input is presented. The user is only requested to mark a few fence pixels; then, color models are estimated and used to train a Bayes classifier to segment the fence and the background. Finally, the fence mask is refined by exploiting connected component analysis and morphological operators. To restore the occluded region, a hybrid inpainting algorithm is proposed that integrates an exemplar-based technique with a pyramid-based interpolation approach. In contrast to previous solutions, which work only for regular pattern fences, the proposed technique is able to remove both regular and irregular fences. A large number of experiments are carried out on a wide variety of images containing different types of fences, demonstrating the effectiveness of the proposed approach. The proposed approach is also compared with state-of-the-art image de-fencing and inpainting techniques and shows convincing results.
Book Chapter•10.1007/978-3-319-46484-8_41•
Deep Self-correlation Descriptor for Dense Cross-Modal Correspondence

[...]

Seungryong Kim1, Dongbo Min2, Stephen Lin3, Kwanghoon Sohn1•
Yonsei University1, Chungnam National University2, Microsoft3
8 Oct 2016
TL;DR: A novel descriptor, called deep self-correlation (DSC), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions, which can be robust to cross-modal imaging and densely computed in an efficient manner that significantly reduces computational redundancy.
Abstract: We present a novel descriptor, called deep self-correlation (DSC), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. Motivated by local self-similarity (LSS), we formulate a novel descriptor by leveraging LSS in a deep architecture, leading to better discriminative power and greater robustness to non-rigid image deformations than state-of-the-art descriptors. The DSC first computes self-correlation surfaces over a local support window for randomly sampled patches, and then builds hierarchical self-correlation surfaces by performing an average pooling within a deep architecture. Finally, the feature responses on the self-correlation surfaces are encoded through a spatial pyramid pooling in a circular configuration. In contrast to convolutional neural networks (CNNs) based descriptors, the DSC is training-free, is robust to cross-modal imaging, and can be densely computed in an efficient manner that significantly reduces computational redundancy. The state-of-the-art performance of DSC on challenging cases of cross-modal image pairs is demonstrated through extensive experiments.
Proceedings Article•10.1109/COASE.2016.7743558•
Automated identification of components in raster piping and instrumentation diagram with minimal pre-processing

[...]

Wei Chian Tan1, I-Ming Chen1, Hoon Kiang Tan2•
Nanyang Technological University1, Lloyd's Register2
1 Aug 2016
TL;DR: A novel framework for automated recognition of components in a Piping and Instrumentation Diagram (P&ID) of raster form, using Local Binary Pattern (LBP) descriptors and the concept of Spatial Pyramid Matching (SPM).
Abstract: This paper proposes a novel framework for automated recognition of components in a Piping and Instrumentation Diagram (P&ID) of raster form. Contour is used as the main clue for visual recognition through the use of Local Binary Pattern (LBP) as descriptor and the concept of Spatial Pyramid Matching (SPM). Comparison of two image patches is done by calculating the l1 distance between the two corresponding LBP-based descriptors. First, the framework requires at least one example image per type of component to be recognised; the corresponding LBP- and SPM-based descriptor is determined and stored. A linear sliding window approach is used to detect a small set of top candidates from a pool of all sub-images in the original image. Verification against the entire library of symbols is then performed on each candidate selected in the previous stage, using nearest-neighbour classification. The method has demonstrated state-of-the-art performance on a new challenging dataset created with advice from a group of experienced engineers in the marine and offshore industry.
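The descriptor comparison described above can be sketched as follows, assuming the basic 8-neighbour LBP, an unweighted concatenation of per-cell histograms over a small spatial pyramid, and hypothetical function names; the paper's exact LBP variant, pyramid weighting, and sliding-window stage are not reproduced.

```python
def lbp_codes(img):
    """8-neighbour Local Binary Pattern code for every interior pixel."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, len(img) - 1):
        row = []
        for x in range(1, len(img[0]) - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

def cell_histogram(codes, y0, y1, x0, x1):
    """256-bin histogram of LBP codes inside one rectangular cell."""
    hist = [0] * 256
    for y in range(y0, y1):
        for x in range(x0, x1):
            hist[codes[y][x]] += 1
    return hist

def spm_descriptor(img, levels=2):
    """Concatenate LBP histograms over a 1x1, 2x2, ... grid of cells."""
    codes = lbp_codes(img)
    h, w = len(codes), len(codes[0])
    desc = []
    for level in range(levels):
        cells = 2 ** level
        for cy in range(cells):
            for cx in range(cells):
                desc.extend(cell_histogram(codes,
                                           cy * h // cells, (cy + 1) * h // cells,
                                           cx * w // cells, (cx + 1) * w // cells))
    return desc

def l1_distance(a, b):
    """Sum of absolute differences between two descriptors of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b))
```

A nearest-neighbour check then amounts to computing `l1_distance` between a candidate patch descriptor and each stored example descriptor and keeping the smallest.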
Proceedings Article•10.1109/DCC.2016.23•
Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing

[...]

Jie Lin1, Olivier Morère2, Julie Petta, Vijay Chandrasekhar1, Antoine Veillard3 •
Institute for Infocomm Research Singapore1, Pierre-and-Marie-Curie University2, University of Paris3
1 Mar 2016
TL;DR: Unsupervised Triplet Hashing (UTH) is a fully unsupervised method that computes extremely compact binary hashes from high-dimensional global descriptors in two successive deep learning steps.
Abstract: A typical image retrieval pipeline starts with the comparison of global descriptors from a large database to find a short list of candidate matches. A good image descriptor is key to the retrieval pipeline and should reconcile two contradictory requirements: providing recall rates as high as possible and being as compact as possible for fast matching. Following the recent successes of Deep Convolutional Neural Networks (DCNN) for large scale image classification, descriptors extracted from DCNNs are increasingly used in place of the traditional hand crafted descriptors such as Fisher Vectors (FV) with better retrieval performances. Nevertheless, the dimensionality of a typical DCNN descriptor, extracted either from the visual feature pyramid or the fully-connected layers, remains quite high at several thousands of scalar values. In this paper, we propose Unsupervised Triplet Hashing (UTH), a fully unsupervised method to compute extremely compact binary hashes (in the 32-256 bit range) from high-dimensional global descriptors. UTH consists of two successive deep learning steps. First, Stacked Restricted Boltzmann Machines (SRBM), a type of unsupervised deep neural nets, are used to learn binary embedding functions able to bring the descriptor size down to the desired bitrate. SRBMs are typically able to ensure a very high compression rate at the expense of losing some desirable metric properties of the original DCNN descriptor space. Then, triplet networks, a rank learning scheme based on weight sharing nets, are used to fine-tune the binary embedding functions to retain as much as possible of the useful metric properties of the original space. A thorough empirical evaluation conducted on multiple publicly available datasets using DCNN descriptors shows that our method is able to significantly outperform state-of-the-art unsupervised schemes in the target bit range.
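The matching side of such a pipeline, comparing compact binary codes by Hamming distance, can be sketched as below. The learned embedding (SRBM plus triplet fine-tuning) is not reproduced; `binarize` uses a simple sign threshold as a stand-in, and all names are assumptions.

```python
def binarize(vector, threshold=0.0):
    """Pack a real-valued embedding into an int, one bit per dimension.

    Stand-in for a learned binary embedding: a bit is 1 when the value
    exceeds the threshold.
    """
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value > threshold else 0)
    return code

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")

def rank(query, database):
    """Database indices sorted from closest to farthest in Hamming distance."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
```

Because the codes are plain integers, a short list of candidates can be produced with XOR and bit counting alone, which is what makes very short hashes attractive for the first retrieval stage.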
Journal Article•10.1016/J.NEUCOM.2015.09.049•
Video-based facial expression recognition using learned spatiotemporal pyramid sparse coding features

[...]

Fei Long1, Marian Stewart Bartlett2•
Xiamen University1, University of California, San Diego2
15 Jan 2016-Neurocomputing
TL;DR: Experimental results on widely used Cohn-Kanade database show that the classification performance can be improved effectively by considering spatiotemporal layout of facial expressions, and the method outperforms popular methods using hand-designed features.
...
