Scispace (Formerly Typeset)
Showing papers on "Pyramid (image processing)" published in 2012
Journal Article•10.1007/S00348-012-1345-X•
Multi-frame pyramid correlation for time-resolved PIV

Andrea Sciacchitano1, Fulvio Scarano1, Bernhard Wieneke•
Delft University of Technology1
15 Jul 2012-Experiments in Fluids
TL;DR: In this article, a pyramid correlation algorithm was proposed to increase the precision and robustness of time-resolved particle image velocimetry (TR-PIV) measurements.
Abstract: A novel technique is introduced to increase the precision and robustness of time-resolved particle image velocimetry (TR-PIV) measurements. The innovative element of the technique is the linear combination of the correlation signal computed at different separation time intervals. The domain of the correlation signal resulting from different temporal separations is matched via homothetic transformation prior to the averaging of the correlation maps. The resulting ensemble-averaged correlation function features a significantly higher signal-to-noise ratio and a more precise velocity estimation due to the evaluation of a larger particle image displacement. The method relies on a local optimization of the observation time between snapshots taking into account the local out-of-plane motion, continuum deformation due to in-plane velocity gradient and acceleration errors. The performance of the pyramid correlation algorithm is assessed on a synthetically generated image sequence reproducing a three-dimensional Batchelor vortex; experiments conducted in air and water flows are used to assess the performance on time-resolved PIV image sequences. The numerical assessment demonstrates the effectiveness of the pyramid correlation technique in reducing both random and bias errors by a factor of 3 and one order of magnitude, respectively. The experimental assessment yields a significant increase of signal strength indicating enhanced measurement robustness. Moreover, the amplitude of noisy fluctuations is considerably attenuated and higher precision is obtained for the evaluation of time-resolved velocity and acceleration.

147 citations
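
To illustrate the idea in the abstract above of matching correlation domains by a homothetic (uniform scaling) transformation before averaging, here is a minimal 1-D sketch. It is not the authors' implementation: the function names are hypothetical, the maps are combined with equal weights rather than a locally optimized linear combination, shifts are assumed non-negative, and linear interpolation is an assumed resampling choice.

```python
def sample_linear(values, position):
    """Linearly interpolate a 1-D list at a fractional index (clamped to the ends)."""
    position = min(max(position, 0.0), len(values) - 1.0)
    lower = int(position)
    upper = min(lower + 1, len(values) - 1)
    frac = position - lower
    return values[lower] * (1.0 - frac) + values[upper] * frac


def pyramid_average(maps):
    """Average correlation maps after rescaling them onto a common domain.

    maps[k - 1] is the correlation map for a separation of k frames, indexed
    by shift in pixels. A displacement d per frame peaks at k * d in map k,
    so each map is stretched onto the domain of the largest separation.
    """
    largest = len(maps)
    combined = []
    for shift in range(len(maps[-1])):
        total = 0.0
        for k, corr in enumerate(maps, start=1):
            # Homothetic mapping: shift s in the largest-separation domain
            # corresponds to shift s * k / largest in map k.
            total += sample_linear(corr, shift * k / largest)
        combined.append(total / largest)  # equal weights (an assumption)
    return combined


def displacement_per_frame(maps):
    """Locate the peak of the averaged map and convert it to pixels per frame."""
    combined = pyramid_average(maps)
    peak = max(range(len(combined)), key=combined.__getitem__)
    return peak / len(maps)
```

Because the peak is read in the domain of the largest separation, one pixel of peak-location error corresponds to a smaller error in displacement per frame, which is the intuition behind evaluating a larger particle image displacement.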

Journal Article•10.1109/TCSVT.2011.2177182•
Incremental Learning in Human Action Recognition Based on Snippets

Rashid Minhas, Abdul Adeel Mohammed1, Q. M. Jonathan Wu2•
University of Waterloo1, University of Windsor2
01 Nov 2012-IEEE Transactions on Circuits and Systems for Video Technology
TL;DR: This paper presents a systematic framework for recognizing human actions without relying on impractical assumptions, such as processing of an entire video or requiring a large look-ahead of frames to label an incoming video.
Abstract: In this paper, we present a systematic framework for recognizing human actions without relying on impractical assumptions, such as processing of an entire video or requiring a large look-ahead of frames to label an incoming video. As a secondary goal, we examine incremental learning as an overlooked obstruction to the implementation of reliable real-time recognition. Assuming weak appearance constancy, the shape of an actor is approximated by adaptively changing intensity histograms to extract pyramid histograms of oriented gradient features. As action progresses, the shape update is carried out by adjustment of a few blocks within a tracking window to closely track evolving contours. The nonlinear dynamics of an action are learned using a recursive analytic approach, which transforms training into a simple linear representation. Such a learning strategy has two advantages: 1) minimized error rates, and significant savings in computational time; and 2) elimination of the widely accepted limitations of batch-mode training for action recognition. The effectiveness of our proposed framework is corroborated by experimental validation against the state of the art.

94 citations

Book•10.1002/9781118568767•
Signal and Image Multiresolution Analysis

Abdeldjalil Ouahabi
1 Oct 2012

89 citations

Journal Article•10.1016/J.PATCOG.2011.09.020•
Discriminative compact pyramids for object and scene recognition

Noha M. Elfiky1, Fahad Shahbaz Khan1, Joost van de Weijer1, Jordi Gonzàlez1•
Autonomous University of Barcelona1
01 Apr 2012-Pattern Recognition
TL;DR: Comparison to clustering based on agglomerative information bottleneck (AIB) shows that the method obtains superior results at significantly lower computational costs and the optimal combination of multiple features in the context of the compact pyramid representation is investigated.

68 citations

Journal Article•10.1109/TII.2011.2173943•
Design and Implementation of a Pipelined Datapath for High-Speed Face Detection Using FPGA

Seunghun Jin1, Dongkyun Kim1, Thuy Tuong Nguyen1, Daijin Kim2, Munsang Kim3, Jae Wook Jeon1 •
Sungkyunkwan University1, Pohang University of Science and Technology2, Korea Institute of Science and Technology3
01 Feb 2012-IEEE Transactions on Industrial Informatics
TL;DR: This paper presents the design and implementation of a pipelined datapath for real-time face detection using cascades of boosted classifiers, and proposes three methods (symmetric image downscaling, classifier sharing, and cascade merging) to achieve the desired processing speed and area efficiency.
Abstract: This paper presents the design and implementation of a pipelined datapath for real-time face detection using cascades of boosted classifiers. We propose the following methods: symmetric image downscaling, classifier sharing, and cascade merging, to achieve the desired processing speed and area efficiency. First, an image pyramid with 16 levels is generated from the input image to simultaneously detect faces with different scales. The downscaled images are then transferred to the first stage of the cascade that is shared between the corresponding image pairs based on the pixel validity of the symmetric image pyramid. The last method exploits the different hit ratios of the cascade stages. We use a tree-structured cascade of classifiers since most of the nonface elements are eliminated during the early stages of the classifier. The use of a synthesis tool confirms that the proposed design reduces resource utilization by one-eighth without accuracy loss, compared to the fully parallelized implementation of the same algorithm. We implemented the proposed hardware architecture on a Xilinx Virtex-5 LX330 FPGA. The indicative throughput is 307 frames/s irrespective of the number of faces in the scene for standard VGA (640 × 480) images with an operating frequency of 125.59 MHz. We may ensure that face detection results are generated at each clock cycle after the initial pipeline delay, using this fully pipelined datapath for tree-structured cascade classifiers.

64 citations
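
As a rough consistency check on the figures quoted in the abstract above (125.59 MHz clock, 307 frames/s, 640 × 480 input), the sketch below computes the clock cycles available per frame and per input pixel. The function names are hypothetical, and reading the surplus over one cycle per pixel as time spent on the downscaled pyramid levels is an assumption, not something the abstract states.

```python
# Back-of-envelope check using only numbers quoted in the abstract.
CLOCK_HZ = 125.59e6        # operating frequency
FRAMES_PER_SECOND = 307    # reported throughput
WIDTH, HEIGHT = 640, 480   # standard VGA


def cycles_per_frame(clock_hz, fps):
    """Clock cycles available for each frame at the given throughput."""
    return clock_hz / fps


def cycles_per_pixel(clock_hz, fps, width, height):
    """Clock cycles available per pixel of the full-resolution input."""
    return cycles_per_frame(clock_hz, fps) / (width * height)


if __name__ == "__main__":
    print(round(cycles_per_frame(CLOCK_HZ, FRAMES_PER_SECOND)))
    print(round(cycles_per_pixel(CLOCK_HZ, FRAMES_PER_SECOND, WIDTH, HEIGHT), 2))
```

The quoted numbers work out to roughly 409,000 cycles per frame, or about 1.33 cycles per input pixel, which is consistent with a datapath that emits one result per clock cycle while also processing the smaller pyramid levels.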

Proceedings Article•10.1109/CVPR.2012.6248087•
SURFing the point clouds: Selective 3D spatial pyramids for category-level object recognition

Carolina Redondo-Cabrera1, Roberto J. López-Sastre1, Javier Acevedo-Rodríguez1, Saturnino Maldonado-Bascón1•
University of Alcalá1
16 Jun 2012
TL;DR: A Bag-of-Words representation in 3D, which is used in conjunction with a SVM classification machinery, and the 3D Spatial Pyramid Matching Kernel, which works by partitioning a working volume into fine sub-volumes, and computing a hierarchical weighted sum of histogram intersections at each level of the pyramid structure.
Abstract: This paper proposes a novel approach to recognize object categories in point clouds. By quantizing 3D SURF local descriptors, computed on partial 3D shapes extracted from the point clouds, a vocabulary of 3D visual words is generated. Using this codebook, we build a Bag-of-Words representation in 3D, which is used in conjunction with a SVM classification machinery. We also introduce the 3D Spatial Pyramid Matching Kernel, which works by partitioning a working volume into fine sub-volumes, and computing a hierarchical weighted sum of histogram intersections at each level of the pyramid structure. With the aim of increasing both the classification accuracy and the computational efficiency of the kernel, we propose selective hierarchical volume decomposition strategies, based on representative and discriminative (sub-)volume selection processes, which drastically reduce the pyramid to consider. Results on the challenging large-scale RGB-D object dataset show that our kernels significantly outperform the state-of-the-art results by using a single 3D shape feature type extracted from individual depth images.

49 citations
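
The "hierarchical weighted sum of histogram intersections" described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' kernel: the level weights are borrowed from the standard 2D spatial pyramid matching formulation (an assumption), coordinates are assumed to be normalized to [0, 1), the function names are hypothetical, and the selective sub-volume strategies from the paper are not implemented.

```python
from collections import Counter


def cell_histogram(points, words, level):
    """Count visual words per sub-volume at one pyramid level.

    points: (x, y, z) tuples with coordinates in [0, 1).
    words:  visual-word index for each point.
    level:  the working volume is split into 2**level cells per axis.
    """
    cells = 2 ** level
    hist = Counter()
    for (x, y, z), word in zip(points, words):
        hist[(int(x * cells), int(y * cells), int(z * cells), word)] += 1
    return hist


def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def spm_kernel_3d(points_a, words_a, points_b, words_b, levels=2):
    """Weighted sum of per-level intersections; finer levels weigh more."""
    inter = [
        intersection(
            cell_histogram(points_a, words_a, level),
            cell_histogram(points_b, words_b, level),
        )
        for level in range(levels + 1)
    ]
    # Assumed weights: 1/2**L for level 0 and 1/2**(L - l + 1) for level l >= 1.
    score = inter[0] / 2 ** levels
    for level in range(1, levels + 1):
        score += inter[level] / 2 ** (levels - level + 1)
    return score
```

With these weights the coefficients sum to one, so two identical sets of n points score exactly n, and the score drops as matching words fall into different sub-volumes at the finer levels.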

Proceedings Article•10.5244/C.26.89•
Spatial orientations of visual word pairs to improve Bag-of-Visual-Words model

Rahat Khan, Cécile Barat, Damien Muselet, Christophe Ducottet
4 Sep 2012
TL;DR: This paper presents a novel approach to incorporate spatial information in the bag-of-visual-words model for category level and scene classification by taking advantage of the orientation of the segments formed by Pairs of Identical visual Words (PIW).
Abstract: This paper presents a novel approach to incorporate spatial information in the bag-of-visual-words model for category level and scene classification. In the traditional bag-of-visual-words model, feature vectors are histograms of visual words. This representation is appearance based and does not contain any information regarding the arrangement of the visual words in the 2D image space. In this framework, we present a simple and efficient way to infuse spatial information. Particularly, we are interested in explicit global relationships among the spatial positions of visual words. Therefore, we take advantage of the orientation of the segments formed by Pairs of Identical visual Words (PIW). An evenly distributed normalized histogram of angles of PIW is computed. Histograms produced by each word type constitute a powerful description of intra type visual words relationships. Experiments on challenging datasets demonstrate that our method is competitive with the concurrent ones. We also show that our method provides important complementary information to the spatial pyramid matching and can improve the overall performance.

48 citations
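
The histogram of segment orientations for Pairs of Identical visual Words described in the abstract above can be sketched as below. This is an illustrative reading, not the authors' code: the function name, the default bin count, and the choice to fold angles into [0, π) (treating a segment as undirected) are assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def piw_angle_histograms(positions, words, bins=4):
    """Per-word normalized histogram of the angles of same-word segments.

    positions: (x, y) image coordinates of each detected feature.
    words:     visual-word index assigned to each feature.
    bins:      number of equal-width angle bins over [0, pi).
    """
    by_word = defaultdict(list)
    for pos, word in zip(positions, words):
        by_word[word].append(pos)

    histograms = {}
    for word, points in by_word.items():
        hist = [0.0] * bins
        for (x1, y1), (x2, y2) in combinations(points, 2):
            # Fold the direction into [0, pi): a segment has no orientation sign.
            angle = math.atan2(y2 - y1, x2 - x1) % math.pi
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += 1.0
        total = sum(hist)
        # Words that occur only once form no pair and keep an all-zero histogram.
        histograms[word] = [h / total for h in hist] if total else hist
    return histograms
```

Concatenating the per-word histograms would give one spatial feature vector per image, which could then sit alongside the usual word-count histogram.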

Proceedings Article•
Ulcer detection in wireless capsule endoscopy images

Lecheng Yu1, Pong C. Yuen2, Jianhuang Lai1•
Sun Yat-sen University1, Hong Kong Baptist University2
1 Nov 2012
TL;DR: Experimental results show that the proposed method outperforms single feature methods and existing methods for ulcer detection.
Abstract: The invention of wireless capsule endoscopy greatly helps physicians to view small intestine images without causing much pain to patients. It has become very popular around the world for its usability and performance. However, a physician requires a long time (around 45 minutes) to examine a capsule endoscopy video generated from each examination. In this paper, we propose a new image processing method using a combination of local features for ulcer detection. The proposed method is developed based on the bag-of-words model and a feature fusion technique. Image patches are described by LBP and SIFT features. The pyramid bag-of-words is employed to model and represent the images, and SVM classifiers are trained. Finally, the feature fusion technique is employed to draw a final conclusion. Experimental results show that the proposed method outperforms single feature methods and existing methods.

44 citations

Patent•
Image processing method and apparatus using multi-layer representation

Aron Baik1•
Samsung1
18 Dec 2012
TL;DR: In this paper, a multi-layer representation for a reference view including information about a scene may be generated based on at least one input view, and the representation may be expanded to obtain information about occluded regions.
Abstract: A method and apparatus for processing an image using a multi-layer representation is provided. A multi-layer representation for a reference view including information about a scene may be generated based on at least one input view. The multi-layer representation may be expanded to obtain information about a portion occluded by the at least one common input view. Output views viewed at different viewpoints may be generated using the expanded multi-layer representation.

44 citations

Proceedings Article•10.1145/2393347.2396384•
Detection bank: an object detection based video representation for multimedia event recognition

Tim Althoff1, Hyun Oh Song1, Trevor Darrell1•
University of California, Berkeley1
29 Oct 2012
TL;DR: In this paper, the authors proposed an image representation called Detection Bank, which is based on the detection images from a large number of windowed object detectors where an image is represented by different statistics derived from these detections and extended to video by aggregating the key frame level image representations through mean and max pooling.
Abstract: While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture the complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative objects are detected in a video sequence. In this paper, we propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors, where an image is represented by different statistics derived from these detections. This representation is extended to video by aggregating the key frame level image representations through mean and max pooling. We empirically show that it captures complementary information to state-of-the-art representations such as Spatial Pyramid Matching and Object Bank. These descriptors combined with our Detection Bank representation significantly outperform any of the representations alone on TRECVID MED 2011 data.

35 citations

Book Chapter•10.1007/978-3-642-33863-2_30•
Spatio-temporal SIFT and its application to human action classification

Manal Al Ghamdi1, Lei Zhang2, Yoshihiko Gotoh1•
University of Sheffield1, Harbin Engineering University2
7 Oct 2012
TL;DR: A spatio-temporal difference-of-Gaussian (DoG) pyramid to detect the local extrema, aiming at processing video streams, and shows that the approach was able to produce results comparable to the state of the art.
Abstract: This paper presents a space-time extension of scale-invariant feature transform (SIFT) originally applied to the 2-dimensional (2D) volumetric images. Most of the previous extensions dealt with 3-dimensional (3D) spatial information using a combination of a 2D detector and a 3D descriptor for applications such as medical image analysis. In this work we build a spatio-temporal difference-of-Gaussian (DoG) pyramid to detect the local extrema, aiming at processing video streams. Interest points are extracted not only from the spatial plane (xy) but also from the planes along the time axis (xt and yt). The space-time extension was evaluated using the human action classification task. Experiments with the KTH and the UCF sports datasets show that the approach was able to produce results comparable to the state of the art.
Proceedings Article•
Multiresolution Deep Belief Networks

Yichuan Tang1, Abdel-rahman Mohamed•
University of Toronto1
21 Mar 2012
TL;DR: A model based on the Deep Belief Network which learns features from the multiscale representation of images and demonstrates the superiority of MrDBNs at modeling face images in the domain of generative learning.
Abstract: Motivated by the observation that coarse and fine resolutions of an image reveal different structures in the underlying visual phenomenon, we present a model based on the Deep Belief Network (DBN) which learns features from the multiscale representation of images. A Laplacian Pyramid is first constructed for each image. DBNs are then trained separately at each level of the pyramid. Finally, a top level RBM combines these DBNs into a single network we call the Multiresolution Deep Belief Network (MrDBN). Experiments show that MrDBNs generalize better than standard DBNs on NORB classification and TIMIT phone recognition. In the domain of generative learning, we demonstrate the superiority of MrDBNs at modeling face images.
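
For readers unfamiliar with the Laplacian Pyramid mentioned in the abstract above, the sketch below builds one and reconstructs the image from it. It is a simplified stand-in, not the authors' preprocessing: 2x2 block averaging and pixel replication replace the usual Gaussian filtering, images are plain nested lists, all function names are hypothetical, and both image dimensions are assumed divisible by 2**levels.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, len(img[0]), 2)
        ]
        for r in range(0, len(img), 2)
    ]


def upsample(img):
    """Double the resolution by pixel replication."""
    out = []
    for row in img:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarse], finest level first."""
    pyramid = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        # Each detail level stores what the coarser level cannot represent.
        pyramid.append(subtract(current, upsample(smaller)))
        current = smaller
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert laplacian_pyramid by upsampling and adding the details back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = add(upsample(current), detail)
    return current
```

Each level isolates structure at one scale, which is why a separate model per level, as the abstract describes, sees different aspects of the same image.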
Journal Article•10.1016/J.PROCS.2012.09.126•
A Hybrid Approach to Localize Farsi Text in Natural Scene Images

Maryam Darab1, Mohammad Rahmati1•
Amirkabir University of Technology1
01 Jan 2012-Procedia Computer Science
TL;DR: A new color based method is proposed for extracting candidate regions, then the texts in natural scene images are detected by combining edge and color features, and variations due to text size and orientation are resolved by a new pyramid of images.
Journal Article•10.1145/2366145.2366152•
Sparse PDF maps for non-linear multi-resolution image operations

Markus Hadwiger1, Ronell Sicat1, Johanna Beyer1, Jens Krüger2, Torsten Möller3 •
King Abdullah University of Science and Technology1, Intel2, Simon Fraser University3
1 Nov 2012
TL;DR: The sparsity of sPDF-maps makes them feasible for gigapixel images, while enabling direct evaluation of a variety of non-linear operators from the same representation, and is illustrated for antialiased color mapping, O(n) local Laplacian filters, smoothed local histogram filters, and bilateral filters.
Abstract: We introduce a new type of multi-resolution image pyramid for high-resolution images called sparse pdf maps (sPDF-maps). Each pyramid level consists of a sparse encoding of continuous probability density functions (pdfs) of pixel neighborhoods in the original image. The encoded pdfs enable the accurate computation of non-linear image operations directly in any pyramid level with proper pre-filtering for anti-aliasing, without accessing higher or lower resolutions. The sparsity of sPDF-maps makes them feasible for gigapixel images, while enabling direct evaluation of a variety of non-linear operators from the same representation. We illustrate this versatility for antialiased color mapping, O(n) local Laplacian filters, smoothed local histogram filters (e.g., median or mode filters), and bilateral filters.
Proceedings Article•10.1145/2425333.2425391•
Novel color Gabor-LBP-PHOG (GLP) descriptors for object and scene image classification

Atreyee Sinha1, Sugata Banerji1, Chengjun Liu1•
New Jersey Institute of Technology1
16 Dec 2012
TL;DR: This paper proposes a novel Gabor-LBP-PHOG (GLP) image descriptor which performs well on different image categories and makes a comparative assessment of the classification performance of the GLP descriptor in six different color spaces.
Abstract: This paper presents a novel set of color descriptors for object and scene image classification. We first introduce a new Gabor-PHOG (GPHOG) descriptor by concatenating the Pyramid of Histograms of Oriented Gradients (PHOG) of the local Gabor filtered images. Second, we derive the Gabor-LBP (GLBP) descriptor by accumulating the Local Binary Patterns (LBP) histograms of all the component images produced by applying Gabor filters. Then, by combining the GPHOG and the GLBP descriptors using an optimal feature representation method, we propose a novel Gabor-LBP-PHOG (GLP) image descriptor which performs well on different image categories. Next, we make a comparative assessment of the classification performance of the GLP descriptor in six different color spaces. Finally, we present a novel Fused Color GLP (FC-GLP) feature by integrating the PCA features of the six color GLP descriptors. The Principal Component Analysis (PCA) and the Enhanced Fisher Model (EFM) are applied for feature extraction and the nearest neighbor classification rule is used for classification. The effectiveness of the proposed GLP and FC-GLP feature vectors for image classification is evaluated using three grand challenge datasets, namely the Caltech 256 dataset, the MIT Scene dataset and the UIUC Sports Event dataset.
Patent•
Contour vector feature-based embedded real-time image matching method

Ruilin Bai, Jian Ni, Feng Ji
5 Sep 2012
TL;DR: In this paper, a contour vector feature-based embedded real-time image matching method is proposed, which uses the linear feature based on X and Y direction vectors, and has strong capability of resisting image distortion, noise, shading, illumination changes, polarity inversion and so on.
Abstract: The invention provides a contour vector feature-based embedded real-time image matching method. The method uses the linear feature based on X and Y direction vectors, and has strong capability of resisting image distortion, noise, shading, illumination changes, polarity inversion and so on. An image pyramid search strategy is used, templates are quickly matched in a high-layer low-resolution image to be tested, and then, a target position is found out accurately by stepwise downward search, so that matching time is reduced greatly. According to the template image specific information, the best pyramid hierarchy number and the best rotation angle step size for the pyramid template matching of each layer are calculated automatically. An image pyramid highest-layer three-level screening matching strategy is provided, treatment is carried out according to the specific content of the image to be tested, and the first level of screening and the second level of screening are carried out; the non-target position is eliminated just by the addition and subtraction and the conditional statements, which is more efficient in the embedded system than using the multiplication and division; and the third level only processes fewer positions meeting the requirements of the above two levels, so that the matching speed is improved greatly. The overall method can realize the work of matching and locating the target at any angle and any coordinate.
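
The image pyramid search strategy in the abstract above (match in a low-resolution top layer, then refine by stepwise downward search) can be sketched as follows. This is a generic illustration, not the patented method: it scores candidates with a plain sum of absolute differences on pixel values instead of contour vector features, omits rotation and the three-level screening, and uses hypothetical function names. The template is assumed to be at least 2**layers pixels on each side and no larger than the image.

```python
def halve(img):
    """Build the next pyramid layer by averaging 2x2 blocks (integer pixels)."""
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) // 4
            for c in range(0, len(img[0]) - 1, 2)
        ]
        for r in range(0, len(img) - 1, 2)
    ]


def sad(image, template, top, left):
    """Sum of absolute differences; scoring needs only additions and subtractions."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(image[top + r][left + c] - value)
    return total


def best_match(image, template, candidates):
    """Return the in-bounds (row, col) candidate with the lowest SAD."""
    max_top = len(image) - len(template)
    max_left = len(image[0]) - len(template[0])
    best_score, best_pos = None, None
    for top, left in candidates:
        if 0 <= top <= max_top and 0 <= left <= max_left:
            score = sad(image, template, top, left)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (top, left)
    return best_pos


def pyramid_match(image, template, layers=2):
    """Exhaustive search in the top layer, then a 3x3 refinement per layer below."""
    images, templates = [image], [template]
    for _ in range(layers):
        images.append(halve(images[-1]))
        templates.append(halve(templates[-1]))

    top_img, top_tpl = images[-1], templates[-1]
    candidates = [
        (r, c)
        for r in range(len(top_img) - len(top_tpl) + 1)
        for c in range(len(top_img[0]) - len(top_tpl[0]) + 1)
    ]
    pos = best_match(top_img, top_tpl, candidates)

    for level in range(layers - 1, -1, -1):
        r0, c0 = pos[0] * 2, pos[1] * 2  # project the match one layer down
        candidates = [(r0 + dr, c0 + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
        pos = best_match(images[level], templates[level], candidates)
    return pos
```

The time saving comes from the exhaustive search running only on the smallest layer; every finer layer evaluates at most nine positions.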
Patent•
Marine ship detection method in optical remote sensing image

Changren Zhu, Guo Jun
12 Sep 2012
TL;DR: In this paper, an optical remote sensing image marine ship detection method based on local contrast information and a space pyramid characteristic is presented.
Abstract: The invention provides an optical remote sensing image marine ship detection method based on local contrast information and a space pyramid characteristic. A technical scheme is characterized by: firstly, sliding a window in a sea area based on local contrast so as to carry out suspected object detection of a ship and reducing a false alarm of ship detection; then, for a suspected object area obtained through segmentation, taking a neighborhood according to a certain size of the window, using a space pyramid matching model to extract space context information so as to carry out classification, deleting background interference, acquiring a ship detection result and reducing the false alarm of the ship detection. By using the method of the invention, white polarity performance and black polarity performance problems of the ship can be effectively inhibited. Simultaneously, for a similarity problem of the ship object and the other interference and a difference problem possessed by ship object, the local neighborhood context information is introduced to carry out characteristic description and identification of the ship. The object and the background interference is distinguished and a false alarm rate of the ship detection can be effectively inhibited.
Patent•
Moving picture capturing device, information processing system, information processing device, and image data processing method

Akio Ohba1, Hiroyuki Segawa1•
Sony Computer Entertainment1
5 Apr 2012
TL;DR: In this paper, a data selection unit selects and extracts necessary data from respective streams of pieces of data of the synthesized image, a RAW image, and a 1/1 demosaiced image and generates a stream of data to be transmitted.
Abstract: An image synthesis unit receives respective pixel values for a single horizontal row of a ¼ demosaiced image, a 1/16 demosaiced image, and a 1/64 demosaiced image from a pyramid filter for reducing, in a plurality of stages, a frame of a moving image that is captured. The image synthesis unit then connects the pixel values in a predetermined rule so as to generate a virtual synthesized image and outputs the synthesized image in the form of streams. A control unit of an image transmission unit notifies a data selection unit of a request from a host terminal. The data selection unit selects and extracts necessary data from respective streams of pieces of data of the synthesized image, a RAW image, and a 1/1 demosaiced image, and generates a stream of data to be transmitted. A packetizing unit packetizes the stream and transmits the packetized stream to the host terminal.
Patent•
Steady automatic matching method for high-resolution satellite image connecting points

Ming Yang, Chen Chujiang, Yu Shaohuai, Zhang Xiao, Liyuan Wang, Yingdan Wu 
19 Sep 2012
TL;DR: In this article, a steady automatic matching method for high-resolution satellite image connecting points is proposed, which comprises the following steps of: A, performing automatic enhancement by using Wallis filter technology, generating pyramid images of each layer and extracting the images by using feature extraction operators; B, forecasting an initial point position of identical points by using satellite image orientation parameters and upper pyramid matching results of characteristic points, establishing an epipolar geometric constraint equation, and performing geometric coarse correction on matched window images; C, removing error matched points in the matching result of the pyramid image images of the
Abstract: The invention discloses a steady automatic matching method for high-resolution satellite image connecting points. The method comprises the following steps of: A, performing automatic enhancement by using Wallis filter technology, generating pyramid images of each layer and extracting the images by using feature extraction operators; B, forecasting an initial point position of identical points by using satellite image orientation parameters and upper pyramid matching results of characteristic points, establishing an epipolar geometric constraint equation, and performing geometric coarse correction on matched window images; C, removing error matched points in the matching result of the pyramid images of each layer by using an RFM model-based block adjustment method; and D, repeating the steps B, C and D till the primary image layer, and finally refining the matching result by using a double-sheet least square matching method. The method can greatly reduce the artificial editing workload of connecting point measurement and improve the automation degree of satellite image data processing by combining block adjustment and satellite image matching, and has remarkable economic benefit and social benefit.
Proceedings Article•10.1109/WACV.2012.6163023•
Learning reconfigurable scene representation by tangram model

Jun Zhu1, Tianfu Wu, Song-Chun Zhu, Xiaokang Yang1, Wenjun Zhang1 •
Shanghai Jiao Tong University1
9 Jan 2012
TL;DR: The tangram model is capable of capturing meaningful spatial configurations as well as appearance for various scene categories, and achieves state-of-the-art classification performance on the LSP 15-class scene dataset and the MIT 67-class indoor scene dataset.
Abstract: This paper proposes a method to learn reconfigurable and sparse scene representation in the joint space of spatial configuration and appearance in a principled way. We call it the tangram model, which has three properties: (1) Unlike fixed structure of the spatial pyramid widely used in the literature, we propose a compositional shape dictionary organized in an And-Or directed acyclic graph (AOG) to quantize the space of spatial configurations. (2) The shape primitives (called tans) in the dictionary can be described by using any “off-the-shelf” appearance features according to different tasks. (3) A dynamic programming (DP) algorithm is utilized to learn the globally optimal parse tree in the joint space of spatial configuration and appearance. We demonstrate the tangram model in both a generative learning formulation and a discriminative matching kernel. In experiments, we show that the tangram model is capable of capturing meaningful spatial configurations as well as appearance for various scene categories, and achieves state-of-the-art classification performance on the LSP 15-class scene dataset and the MIT 67-class indoor scene dataset.
Patent•
Device and method for omnidirectional image display

Olav Christensen
21 Sep 2012
TL;DR: In this article, an inverted, truncated pyramid bearing semi-reflective facets visible from outside of the apparatus in all horizontal directions is described.
Abstract: The invention generally relates to the display of images viewable from any direction and devices therefor. In some aspects, the invention provides an apparatus for displaying a hologram-like image. The apparatus includes an inverted, truncated pyramid bearing semi-reflective facets visible from outside of the apparatus in all horizontal directions. A base frame supports the truncated pyramid, houses an imaging system, and provides image source surfaces that display images beneath each of the facets to be reflected by each of the facets.
Selection of relevant features and TerraSAR-X products for classification of high resolution SAR images

Corneliu Octavian Dumitru1, Jagmal Singh1, Mihai Datcu1•
German Aerospace Center1
23 Apr 2012
TL;DR: In this paper, the authors compare features and TerraSAR-X products for the classification of SAR images, using a multi-resolution pyramid generated for TerraSAR-X MGD products.
Abstract: Feature extraction and classification using synthetic aperture radar (SAR) images has been a very active research field in recent years. Although many features have been proposed and many classifiers have been employed, few works compare these features across different TerraSAR-X (TSX) products. In principle, many features, such as the gray level co-occurrence matrix, Gabor filters, quadrature mirror filters, and the non-linear short time Fourier transform, can be very useful for TSX image classification. However, many of these features may be completely irrelevant for classification when different TSX products (standard or special process products) are used. Therefore, an important research direction is to identify the best features and the appropriate TSX product for them, using the Support Vector Machine as the classifier and precision-recall as the measure of classification accuracy. The precision-recall was computed for all these features and products, after which we identified the feature and the product that perform better than the others. The results show that: (1) the best feature extraction method is Gabor filters (with different scales and orientations) for almost all of the TSX products, with an average precision (over all classes) between 89.72% and 97.41% and an average recall between 33.59% and 44.16% (depending on the TSX product), and (2) the best product from the multi-resolution product pyramid is the standard MGD-RE product. Our dataset consisted of TerraSAR-X High Resolution Spotlight products taken over Venice and Toulouse, where the actual ground cover was known to us. The novelty of this article lies in the fact that these features are applied to SAR images and compared with each other for a multi-resolution pyramid generated for TerraSAR-X MGD products.
Proceedings Article•
A new depth descriptor for pedestrian detection in RGB-D images

Ningbo Wang1, Xiaojin Gong1, Jilin Liu1•
Zhejiang University1
1 Nov 2012
TL;DR: This paper proposes a new feature descriptor, Pyramid Depth Self-Similarities (PDSS), based on the idea that depth information of people has high local self-similarities, and proves that PDSS is an effective complement to Histogram of Oriented Depth (HOD).
Abstract: With the development of depth camera technology, it is feasible to get high quality color and depth images synchronously in real time. Thus, RGB-D-based applications are becoming more and more popular, such as pedestrian detection in RGB-D data. As the key point in this application is to search for better descriptions, in this paper we propose a new feature descriptor, Pyramid Depth Self-Similarities (PDSS), for depth images. It is based on the idea that depth information of people has high local self-similarities. The experiments, where RGB-D data is collected by a Kinect sensor, prove that PDSS is an effective complement to Histogram of Oriented Depth (HOD). Furthermore, the combination of Histogram of Oriented Gradients (HOG), HOD and PDSS improves the detection performance.
Proceedings Article•10.1109/ICIP.2012.6467506•
A novel scheme for 4-D Light-Field compression based on 3-D representation by multi-focus images

Takashi Sakamoto1, Kazuya Kodama1, Takayuki Hamamoto2•
National Institute of Informatics1, Tokyo University of Science2
1 Sep 2012
TL;DR: A method of reconstructing Light-Field directly from 3-D information composed of multi-focus images without any scene estimation is derived, which is robust even at very low bit-rate.
Abstract: Light-Field enables us to observe scenes from free viewpoints. However, it generally consists of enormous 4-D data that are not suitable for storage or transmission without effective compression. A 4-D Light-Field is highly redundant because it essentially contains only 3-D scene information. Although robust 3-D scene estimation, such as depth recovery from a Light-Field, is not easy, we have derived a method of reconstructing a Light-Field directly from 3-D information composed of multi-focus images, without any scene estimation. Conversely, it is easy to synthesize multi-focus images from a Light-Field. In this paper, based on that method, we propose a novel Light-Field compression scheme that uses synthesized multi-focus images as an effective representation of 3-D scenes. Multi-focus images are easily compressed because they contain mostly low-frequency components. We show experimental results using synthetic and real images. The reconstruction quality of the method is robust even at very low bit-rates.
Proceedings Article•10.1109/IGARSS.2012.6351017•
Bilateral pyramid based pansharpening of multispectral satellite images

Nur Huseyin Kaplan1, Isin Erer1•
Istanbul Technical University1
22 Jul 2012
TL;DR: A new fusion method based on bilateral pyramid for multispectral and panchromatic images is presented and is compared with the widely used IHS, ATWT substitutive and ATWT additive fusion methods.
Abstract: A new fusion method based on the bilateral pyramid for multispectral and panchromatic images is presented. The fused image is obtained by two different rules: substitutive and additive methods. The bilateral pyramid is a multiscale decomposition method which decomposes an input image into a base layer representing the low frequency content and several detail layers representing the high frequency part of the image. In the substitutive method, both MS and PAN images are decomposed using the bilateral pyramid. The detail layers of the PAN image are added to the base layer of the MS image. In the additive method, the detail layers of the PAN image are directly added to the MS image. The proposed method is compared with the widely used IHS (intensity-hue-saturation), ATWT substitutive and ATWT additive fusion methods. The resulting images as well as evaluation metrics demonstrate that the proposed algorithm has better performance.
Proceedings Article•10.1109/CVPR.2012.6247851•
A-Optimal Non-negative Projection for image representation

Haifeng Liu1, Zheng Yang1, Zhaohui Wu1, Xuelong Li2•
Zhejiang University1, Chinese Academy of Sciences2
16 Jun 2012
TL;DR: This paper proposes a novel method, called A-Optimal Non-negative Projection (ANP), which imposes a constraint on the encoding factor as a regularizer during matrix factorization to preserve more intrinsic characteristics of the data regardless of any specific labels.
Abstract: As a central problem in computer vision and pattern recognition, data representation has attracted great attention in the past years. Non-negative matrix factorization (NMF), a useful data representation method, contributes greatly to finding the latent structure of the data and leads to a parts-based representation by decomposing the data matrix into a few bases and encodings with non-negative constraints. However, the non-negative constraint alone is insufficient for obtaining a more robust data representation. In this paper, we propose a novel method, called A-Optimal Non-negative Projection (ANP), for image data representation and further analysis. ANP imposes a constraint on the encoding factor as a regularizer during matrix factorization. In this way, the learned data representation leads to a stable linear model no matter what kind of data label is selected for further processing. Thus, it can preserve more intrinsic characteristics of the data regardless of any specific labels. We demonstrate the effectiveness of this novel algorithm through a set of evaluations on real world applications.
Proceedings Article•10.1109/SOCIALCOM-PASSAT.2012.69•
Facial Action Detection Using Block-Based Pyramid Appearance Descriptors

Bihan Jiang1, Michel Valstar2, Maja Pantic1•
Imperial College London1, University of Nottingham2
3 Sep 2012
TL;DR: Results show that the proposed descriptor B-PLPQ outperforms all other tested methods for the problem of FACS Action Unit analysis and that systems which utilise a pyramid representation outperform those that use basic appearance descriptors.
Abstract: Facial expression is one of the most important non-verbal behavioural cues in social signals. Constructing an effective face representation from images is an essential step for successful facial behaviour analysis. Most existing face descriptors operate on a single scale, and do not leverage coarse vs. fine methods such as image pyramids. In this work, we propose the sparse appearance descriptors Block-based Pyramid Local Binary Pattern (B-PLBP) and Block-based Pyramid Local Phase Quantisation (B-PLPQ). The effectiveness of our proposed descriptors is evaluated by a real-time facial action recognition system. The performance of B-PLBP and B-PLPQ is also compared with Block-based Local Binary Patterns (B-LBP) and Block-based Local Phase Quantisation (B-LPQ). The system proposed here enables detection of a much larger range of facial behaviour by detecting 22 facial muscle actions (Action Units, AUs), which can be practically applied for social behaviour analysis and synthesis. Results show that our proposed descriptor B-PLPQ outperforms all other tested methods for the problem of FACS Action Unit analysis and that systems which utilise a pyramid representation outperform those that use basic appearance descriptors.
Patent•
Object tracking method and system

Lei Wang, Yafeng Deng, Ying Huang
18 Apr 2012
TL;DR: A target tracking method and system in which a video image of the tracking target is acquired and, before the target model is established, the video image is analyzed to find the area corresponding to the complete target; the target model is then established from that area.
Abstract: The invention discloses a target tracking method and a system thereof. The method comprises the following steps: a video image of a tracking target is acquired; before a target model of the tracking target is established, the video image is analyzed to acquire an area corresponding to the complete target of the tracking target in the video image, and a target model of the tracking target is established according to the area corresponding to the acquired complete target; after the target model of the tracking target is established, the video image is analyzed to acquire possible locations of the target area of the tracking target in the current image, and these possible locations are taken as candidate target areas; the features of each candidate target area are respectively matched with the target model based on detection of interest points, the partial image feature extraction technology and the pyramid matching algorithm, and the candidate target area with the largest matching result is taken as the current target area of the tracking target. The technical proposal provided by the invention can improve the success rate of tracking.
Journal Article•10.1016/J.IMAVIS.2012.04.002•
Compact and adaptive spatial pyramids for scene recognition

Noha M. Elfiky1, Jordi Gonzàlez1, F. Xavier Roca1•
Autonomous University of Barcelona1
01 Aug 2012-Image and Vision Computing
TL;DR: The role of texture along with its spatial layout for scene recognition is analyzed, and a novel spatial texture descriptor (PC-TPLBP) is presented for the problem of scene recognition, showing the importance of combining PC-TPLBP with pixel-based (local) features for improving performance.
Proceedings Article•10.1109/IJCNN.2012.6252374•
Multi-scale local pattern co-occurrence matrix for textural image classification

Xiangping Sun1, Jin Wang1, Ronghua Chen1, Mary F.H. She1, Lingxue Kong1 •
Deakin University1
10 Jun 2012
TL;DR: A novel multi-scale local pattern co-occurrence matrix (MS_LPCM) descriptor is proposed to characterize textural images through four major steps and has shown a higher classification accuracy and lower computing cost as compared with other state-of-the-art algorithms.
Abstract: Textural image classification technologies have been extensively explored and widely applied in many areas. It is advantageous to combine both the occurrence and spatial distribution of local patterns to describe a texture. However, most existing state-of-the-art approaches for textural image classification only employ the occurrence histogram of local patterns to describe textures, without considering their co-occurrence information. They are also usually very time-consuming because of the vector quantization involved. Moreover, those feature extraction paradigms are implemented at a single scale. In this paper we propose a novel multi-scale local pattern co-occurrence matrix (MS_LPCM) descriptor to characterize textural images through four major steps. Firstly, Gaussian filtering pyramid preprocessing is employed to obtain multi-scale images; secondly, a local binary pattern (LBP) operator is applied on each textural image to create an LBP image; thirdly, the gray-level co-occurrence matrix (GLCM) is utilized to extract a local pattern co-occurrence matrix (LPCM) from the LBP images as the features; finally, all LPCM features from the same textural image at different scales are concatenated as the final feature vectors for classification. The experimental results on three benchmark databases in this study have shown a higher classification accuracy and lower computing cost as compared with other state-of-the-art algorithms.
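The four steps listed in this abstract can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: 2x2 block averaging stands in for the Gaussian filtering pyramid, the LBP operator is the basic 8-neighbour variant, the co-occurrence matrix uses a single horizontal offset, and all function names and parameter defaults are hypothetical.

```python
# Offsets (dy, dx) of the 8 neighbours, clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def downsample(img):
    """Step 1 (simplified): halve the resolution by 2x2 block averaging."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def lbp_image(img):
    """Step 2: basic 8-neighbour LBP code for every interior pixel."""
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= img[y][x]:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def cooccurrence(codes, n_codes=256, dy=0, dx=1):
    """Step 3: normalised, flattened co-occurrence matrix of LBP codes."""
    counts = [0] * (n_codes * n_codes)
    h = len(codes)
    w = len(codes[0]) if codes else 0
    total = 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                counts[codes[y][x] * n_codes + codes[yy][xx]] += 1
                total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)


def ms_lpcm(img, levels=2):
    """Step 4: concatenate the co-occurrence features from every scale."""
    features = []
    current = img
    for _ in range(levels):
        features.extend(cooccurrence(lbp_image(current)))
        current = downsample(current)
    return features
```

With 256 LBP codes each scale contributes 256 x 256 values, so a practical version would likely reduce the code alphabet (for example with uniform patterns) before building the matrix; that choice is left open here.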
...
