TL;DR: In this article, a pyramid correlation algorithm was proposed to increase the precision and robustness of time-resolved particle image velocimetry (TR-PIV) measurements.
Abstract: A novel technique is introduced to increase the precision and robustness of time-resolved particle image velocimetry (TR-PIV) measurements. The innovative element of the technique is the linear combination of the correlation signal computed at different separation time intervals. The domain of the correlation signal resulting from different temporal separations is matched via homothetic transformation prior to the averaging of the correlation maps. The resulting ensemble-averaged correlation function features a significantly higher signal-to-noise ratio and a more precise velocity estimation due to the evaluation of a larger particle image displacement. The method relies on a local optimization of the observation time between snapshots, taking into account the local out-of-plane motion, continuum deformation due to the in-plane velocity gradient, and acceleration errors. The performance of the pyramid correlation algorithm is assessed on a synthetically generated image sequence reproducing a three-dimensional Batchelor vortex; experiments conducted in air and water flows are used to assess the performance on time-resolved PIV image sequences. The numerical assessment demonstrates the effectiveness of the pyramid correlation technique in reducing random errors by a factor of 3 and bias errors by one order of magnitude. The experimental assessment yields a significant increase in signal strength, indicating enhanced measurement robustness. Moreover, the amplitude of noisy fluctuations is considerably attenuated and higher precision is obtained for the evaluation of time-resolved velocity and acceleration.
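The core step, rescaling correlation maps from different separations onto a common displacement axis before averaging, can be illustrated with a minimal 1D sketch. It assumes integer displacements, circular correlation, a uniform translation of the pattern, and equal weights for all separations. The function names are hypothetical. The sketch also omits the local optimization of the observation time and the sub-pixel peak fitting that real TR-PIV processing would use.

```python
def shift(signal, s):
    """Circularly shift a 1D signal by s samples to the right."""
    n = len(signal)
    return [signal[(j - s) % n] for j in range(n)]


def cross_correlation(a, b, lag):
    """Circular cross-correlation of a and b at an integer lag."""
    n = len(a)
    return sum(a[i] * b[(i + lag) % n] for i in range(n))


def pyramid_correlation(frames, max_disp, separations):
    """Ensemble-average correlation maps computed at several frame separations.

    The map for separation k peaks at lag k * d, where d is the displacement
    per frame. Sampling it at lag k * d rescales (homothetically) its axis onto
    the common per-frame displacement axis d before the maps are averaged.
    """
    combined = {}
    for d in range(-max_disp, max_disp + 1):
        total = sum(cross_correlation(frames[0], frames[k], k * d) for k in separations)
        combined[d] = total / len(separations)
    return combined


# Hypothetical "particle" pattern moving by 2 samples per frame.
base = [0.0] * 32
base[5], base[6], base[20] = 1.0, 0.5, 0.8
frames = [shift(base, 2 * k) for k in range(4)]

corr = pyramid_correlation(frames, max_disp=4, separations=[1, 2, 3])
best = max(corr, key=corr.get)
print(best, round(corr[best], 2))  # 2 1.89
```

After rescaling, the peaks of all maps coincide at the true per-frame displacement. This alignment is what makes averaging them worthwhile.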
TL;DR: This paper presents a systematic framework for recognizing human actions without relying on impractical assumptions, such as processing of an entire video or requiring a large look-ahead of frames to label an incoming video.
Abstract: In this paper, we present a systematic framework for recognizing human actions without relying on impractical assumptions, such as processing of an entire video or requiring a large look-ahead of frames to label an incoming video. As a secondary goal, we examine incremental learning as an overlooked obstruction to the implementation of reliable real-time recognition. Assuming weak appearance constancy, the shape of an actor is approximated by adaptively changing intensity histograms to extract pyramid histograms of oriented gradient features. As action progresses, the shape update is carried out by adjustment of a few blocks within a tracking window to closely track evolving contours. The nonlinear dynamics of an action are learned using a recursive analytic approach, which transforms training into a simple linear representation. Such a learning strategy has two advantages: 1) minimized error rates, and significant savings in computational time; and 2) elimination of the widely accepted limitations of batch-mode training for action recognition. The effectiveness of our proposed framework is corroborated by experimental validation against the state of the art.
TL;DR: Comparison with clustering based on the agglomerative information bottleneck (AIB) shows that the method obtains superior results at a significantly lower computational cost; the optimal combination of multiple features in the context of the compact pyramid representation is also investigated.
TL;DR: This paper presents the design and implementation of a pipelined datapath for real-time face detection using cascades of boosted classifiers, and proposes the following methods to achieve the desired processing speed and area efficiency: symmetric image downscaling, classifier sharing, and cascade merging.
Abstract: This paper presents the design and implementation of a pipelined datapath for real-time face detection using cascades of boosted classifiers. We propose the following methods to achieve the desired processing speed and area efficiency: symmetric image downscaling, classifier sharing, and cascade merging. First, an image pyramid with 16 levels is generated from the input image to detect faces at different scales simultaneously. The downscaled images are then transferred to the first stage of the cascade, which is shared between the corresponding image pairs based on the pixel validity of the symmetric image pyramid. The last method exploits the different hit ratios of the cascade stages: we use a tree-structured cascade of classifiers because most non-face elements are eliminated during the early stages of the classifier. Results from a synthesis tool confirm that the proposed design reduces resource utilization by one-eighth without accuracy loss, compared with a fully parallelized implementation of the same algorithm. We implemented the proposed hardware architecture on a Xilinx Virtex-5 LX330 FPGA. The indicative throughput is 307 frames/s, irrespective of the number of faces in the scene, for standard VGA (640 × 480) images at an operating frequency of 125.59 MHz. With this fully pipelined datapath for tree-structured cascade classifiers, face detection results are generated at every clock cycle after the initial pipeline delay.
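The design relies on early cascade stages rejecting most non-face windows. The sketch below shows that early-exit behaviour in software for a plain linear cascade. It does not model the tree-structured cascade, stage sharing, or the hardware pipeline. The two scoring functions and thresholds are hypothetical stand-ins for boosted stage classifiers.

```python
def mean(window):
    return sum(window) / len(window)


def contrast(window):
    return max(window) - min(window)


def run_cascade(window, stages):
    """Evaluate stages in order and stop at the first rejection.

    Returns (accepted, number_of_stages_evaluated).
    """
    for index, (score_fn, threshold) in enumerate(stages):
        if score_fn(window) < threshold:
            return False, index + 1  # rejected early; later stages are skipped
    return True, len(stages)


# Hypothetical stages: cheap tests first, so most non-face windows exit early.
stages = [(mean, 0.2), (contrast, 0.5)]

print(run_cascade([0.0, 0.1, 0.05, 0.0], stages))  # (False, 1)
print(run_cascade([0.3, 0.3, 0.35, 0.3], stages))  # (False, 2)
print(run_cascade([0.1, 0.9, 0.4, 0.2], stages))   # (True, 2)
```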
TL;DR: The paper builds a Bag-of-Words representation in 3D, used in conjunction with an SVM classifier, and introduces the 3D Spatial Pyramid Matching Kernel, which partitions a working volume into fine sub-volumes and computes a hierarchical weighted sum of histogram intersections at each level of the pyramid structure.
Abstract: This paper proposes a novel approach to recognize object categories in point clouds. By quantizing 3D SURF local descriptors, computed on partial 3D shapes extracted from the point clouds, a vocabulary of 3D visual words is generated. Using this codebook, we build a Bag-of-Words representation in 3D, which is used in conjunction with an SVM classifier. We also introduce the 3D Spatial Pyramid Matching Kernel, which works by partitioning a working volume into fine sub-volumes and computing a hierarchical weighted sum of histogram intersections at each level of the pyramid structure. With the aim of increasing both the classification accuracy and the computational efficiency of the kernel, we propose selective hierarchical volume decomposition strategies, based on representative and discriminative (sub-)volume selection processes, which drastically reduce the number of pyramid sub-volumes to consider. Results on the challenging large-scale RGB-D object dataset show that our kernels significantly outperform the state-of-the-art results while using a single 3D shape feature type extracted from individual depth images.
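A minimal sketch of a spatial pyramid matching kernel over 3D cells follows. It assumes that points are already normalized to the unit cube and quantized to visual words. The abstract does not give the level weights, so the sketch uses the standard 2D spatial pyramid weighting (1/2^L for level 0, 1/2^(L-l+1) for level l ≥ 1), extended to cubic cells. The selective decomposition strategies are not modelled, and the function names are hypothetical.

```python
from collections import Counter


def level_histogram(points, level):
    """Count (cell, word) pairs after splitting the unit cube into 2**level cells per axis."""
    cells = 2 ** level
    hist = Counter()
    for x, y, z, word in points:
        cell = (min(int(x * cells), cells - 1),
                min(int(y * cells), cells - 1),
                min(int(z * cells), cells - 1))
        hist[(cell, word)] += 1
    return hist


def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def spatial_pyramid_kernel(points_a, points_b, levels=2):
    """Weighted sum of per-level histogram intersections (finer levels weigh more)."""
    total = 0.0
    for level in range(levels + 1):
        weight = 1 / 2 ** levels if level == 0 else 1 / 2 ** (levels - level + 1)
        total += weight * intersection(level_histogram(points_a, level),
                                       level_histogram(points_b, level))
    return total


a = [(0.1, 0.1, 0.1, "w1"), (0.9, 0.8, 0.2, "w2"), (0.4, 0.6, 0.7, "w1")]
b = [(0.2, 0.1, 0.1, "w1"), (0.6, 0.9, 0.9, "w2")]
print(spatial_pyramid_kernel(a, a))  # 3.0
print(spatial_pyramid_kernel(a, b))  # 1.25
```

With these weights, K(X, X) equals the number of points in X, which is a convenient sanity check.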
TL;DR: This paper presents a novel approach to incorporate spatial information in the bag-of-visual-words model for category level and scene classification by taking advantage of the orientation of the segments formed by Pairs of Identical visual Words (PIW).
Abstract: This paper presents a novel approach to incorporate spatial information in the bag-of-visual-words model for category-level and scene classification. In the traditional bag-of-visual-words model, feature vectors are histograms of visual words. This representation is appearance-based and does not contain any information regarding the arrangement of the visual words in the 2D image space. In this framework, we present a simple and efficient way to infuse spatial information. In particular, we are interested in explicit global relationships among the spatial positions of visual words. Therefore, we take advantage of the orientation of the segments formed by Pairs of Identical visual Words (PIW). An evenly distributed normalized histogram of the angles of PIW is computed. The histograms produced by each word type constitute a powerful description of intra-type visual word relationships. Experiments on challenging datasets demonstrate that our method is competitive with concurrent ones. We also show that our method provides important complementary information to spatial pyramid matching and can improve the overall performance.
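The PIW idea can be sketched as follows: for every visual word, take all pairs of its occurrences and histogram the orientations of the segments joining them. The sketch makes several assumptions that the abstract does not confirm. "Evenly distributed" is read as equal-width angle bins. Segment orientations are folded into [0, π) because a segment has no direction. Each word's histogram is L1-normalized. The function name and bin count are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def piw_histogram(keypoints, num_bins=4):
    """Histogram of segment orientations between pairs of identical visual words.

    keypoints: list of (x, y, word). Returns {word: normalized histogram}.
    """
    positions = defaultdict(list)
    for x, y, word in keypoints:
        positions[word].append((x, y))
    histograms = {}
    for word, pts in positions.items():
        counts = [0] * num_bins
        for (x1, y1), (x2, y2) in combinations(pts, 2):
            angle = math.atan2(y2 - y1, x2 - x1) % math.pi  # undirected segment
            counts[min(int(angle / math.pi * num_bins), num_bins - 1)] += 1
        total = sum(counts)
        histograms[word] = [c / total for c in counts] if total else counts
    return histograms


kps = [(0, 0, "a"), (4, 0, "a"), (0, 3, "a"), (1, 1, "b")]
h = piw_histogram(kps)
print([round(v, 3) for v in h["a"]])  # [0.333, 0.0, 0.333, 0.333]
print(h["b"])                         # [0, 0, 0, 0]
```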
TL;DR: Experimental results show that the proposed method outperforms single feature methods and existing methods for ulcer detection.
Abstract: The invention of wireless capsule endoscopy greatly helps physicians view images of the small intestine without causing much pain to patients. It has become very popular around the world for its usability and performance. However, a physician needs a long time (around 45 minutes) to examine the capsule endoscopy video generated by each examination. In this paper, we propose a new image processing method that uses a combination of local features for ulcer detection. The proposed method is based on the bag-of-words model and a feature fusion technique. Image patches are described by LBP and SIFT features. The pyramid bag-of-words is employed to model and represent the images, and SVM classifiers are trained. Finally, a feature fusion technique is employed to draw a final conclusion. Experimental results show that the proposed method outperforms single-feature methods and existing methods.
TL;DR: In this paper, a multi-layer representation for a reference view including information about a scene may be generated based on at least one input view, and the representation may be expanded to obtain information about occluded regions.
Abstract: A method and apparatus for processing an image using a multi-layer representation is provided. A multi-layer representation for a reference view including information about a scene may be generated based on at least one input view. The multi-layer representation may be expanded to obtain information about a portion occluded by the at least one common input view. Output views viewed at different viewpoints may be generated using the expanded multi-layer representation.
TL;DR: In this paper, the authors propose an image representation called Detection Bank, based on the detection images from a large number of windowed object detectors, in which an image is represented by different statistics derived from these detections. The representation is extended to video by aggregating key-frame-level image representations through mean and max pooling.
Abstract: While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture the complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative objects are detected in a video sequence. In this paper, we propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors, where an image is represented by different statistics derived from these detections. This representation is extended to video by aggregating the key-frame-level image representations through mean and max pooling. We empirically show that it captures information complementary to state-of-the-art representations such as Spatial Pyramid Matching and Object Bank. These descriptors combined with our Detection Bank representation significantly outperform any of the representations alone on TRECVID MED 2011 data.
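The video-level aggregation is the simplest part of the pipeline and can be sketched directly. Given one feature vector per key frame, mean pooling and max pooling each produce a single video vector of the same length. Concatenating the two, as done here, is an assumption; the paper may use each pooled vector separately. The function name is hypothetical.

```python
def pool_keyframes(frame_features):
    """Aggregate per-key-frame vectors into one video vector.

    Returns the element-wise mean followed by the element-wise max.
    """
    dims = len(frame_features[0])
    n = len(frame_features)
    mean_pool = [sum(f[d] for f in frame_features) / n for d in range(dims)]
    max_pool = [max(f[d] for f in frame_features) for d in range(dims)]
    return mean_pool + max_pool


frames = [[0.25, 0.0, 1.0], [0.75, 0.5, 0.5]]
print(pool_keyframes(frames))  # [0.5, 0.25, 0.75, 0.75, 0.5, 1.0]
```

Mean pooling reflects how consistently a detector fires across the video. Max pooling keeps a strong detection even if it appears in only one key frame.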
TL;DR: The paper builds a spatio-temporal difference-of-Gaussian (DoG) pyramid to detect local extrema in video streams and shows that the approach produces results comparable to the state of the art.
Abstract: This paper presents a space-time extension of the scale-invariant feature transform (SIFT), which was originally applied to two-dimensional (2D) images. Most previous extensions dealt with three-dimensional (3D) spatial information, using a combination of a 2D detector and a 3D descriptor for applications such as medical image analysis. In this work, we build a spatio-temporal difference-of-Gaussian (DoG) pyramid to detect local extrema, aiming at processing video streams. Interest points are extracted not only from the spatial plane (xy) but also from the planes along the time axis (xt and yt). The space-time extension was evaluated on the human action classification task. Experiments with the KTH and UCF Sports datasets show that the approach produces results comparable to the state of the art.
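To make the detection step concrete, here is a pure-Python sketch of DoG extremum detection on a single x-t slice (rows are time, columns are x). It makes several simplifying assumptions: one isotropic set of sigmas shared by space and time, no octave downsampling, replicated borders, and a hypothetical contrast threshold. It illustrates the generic DoG detector, not the authors' exact spatio-temporal pyramid.

```python
import math


def gaussian_kernel(sigma):
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights], radius


def blur(plane, sigma):
    """Separable Gaussian blur with replicated (clamped) borders."""
    kernel, radius = gaussian_kernel(sigma)
    rows, cols = len(plane), len(plane[0])

    def clamp(v, hi):
        return min(max(v, 0), hi - 1)

    horizontal = [[sum(kernel[k + radius] * plane[r][clamp(c + k, cols)]
                       for k in range(-radius, radius + 1))
                   for c in range(cols)] for r in range(rows)]
    return [[sum(kernel[k + radius] * horizontal[clamp(r + k, rows)][c]
                 for k in range(-radius, radius + 1))
             for c in range(cols)] for r in range(rows)]


def dog_extrema(plane, sigmas, threshold=0.01):
    """Return (scale, row, col) points that are extrema among their 26 DoG neighbours."""
    blurred = [blur(plane, s) for s in sigmas]
    dogs = [[[b1 - b0 for b0, b1 in zip(r0, r1)]
             for r0, r1 in zip(blurred[i], blurred[i + 1])]
            for i in range(len(blurred) - 1)]
    rows, cols = len(plane), len(plane[0])
    points = []
    for s in range(1, len(dogs) - 1):
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                value = dogs[s][r][c]
                if abs(value) < threshold:
                    continue
                neighbours = [dogs[s + ds][r + dr][c + dc]
                              for ds in (-1, 0, 1) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                              if (ds, dr, dc) != (0, 0, 0)]
                if value > max(neighbours) or value < min(neighbours):
                    points.append((s, r, c))
    return points


# Hypothetical 15x15 x-t slice containing one Gaussian blob (sigma = 2) at its centre.
size, centre = 15, 7
plane = [[math.exp(-((t - centre) ** 2 + (x - centre) ** 2) / 8.0) for x in range(size)]
         for t in range(size)]
keypoints = dog_extrema(plane, sigmas=[1.0, 1.6, 2.56, 4.096])
print(keypoints)
```

The blob is found as a DoG minimum at the middle scale, whose sigma is closest to the blob's size. A real detector would repeat this on the xy, xt, and yt planes.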
TL;DR: The paper presents a model based on the Deep Belief Network (DBN) that learns features from a multiscale representation of images, and demonstrates the superiority of the resulting Multiresolution DBNs (MrDBNs) at modeling face images in the domain of generative learning.
Abstract: Motivated by the observation that coarse and fine resolutions of an image reveal different structures in the underlying visual phenomenon, we present a model based on the Deep Belief Network (DBN) which learns features from the multiscale representation of images. A Laplacian Pyramid is first constructed for each image. DBNs are then trained separately at each level of the pyramid. Finally, a top level RBM combines these DBNs into a single network we call the Multiresolution Deep Belief Network (MrDBN). Experiments show that MrDBNs generalize better than standard DBNs on NORB classification and TIMIT phone recognition. In the domain of generative learning, we demonstrate the superiority of MrDBNs at modeling face images.
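The first step, building a Laplacian pyramid, can be sketched in 1D for brevity. Each level stores the detail lost when the signal is reduced and re-expanded, and the coarsest low-pass residual closes the pyramid, so the original can be rebuilt exactly. The [1, 2, 1]/4 blur and nearest-neighbour expansion are simplifying choices; the classic construction uses a 5-tap kernel with interpolating expansion. The DBN training on each level is not shown, and the function names are hypothetical.

```python
def downsample(signal):
    """Blur with a [1, 2, 1] / 4 kernel (clamped borders) and keep every other sample."""
    n = len(signal)
    blurred = [(signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
               for i in range(n)]
    return blurred[::2]


def upsample(signal, length):
    """Nearest-neighbour expansion back to the given length."""
    return [signal[i // 2] for i in range(length)]


def laplacian_pyramid(signal, levels):
    """Return band-pass detail levels followed by the coarsest low-pass residual."""
    pyramid = []
    current = signal
    for _ in range(levels):
        coarse = downsample(current)
        pyramid.append([a - b for a, b in zip(current, upsample(coarse, len(current)))])
        current = coarse
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the decomposition by expanding and adding back each detail level."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = [d + u for d, u in zip(detail, upsample(current, len(detail)))]
    return current


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5]
pyr = laplacian_pyramid(signal, 2)
print([len(level) for level in pyr])  # [9, 5, 3]
```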
TL;DR: A new color-based method is proposed for extracting candidate regions; text in natural scene images is then detected by combining edge and color features, and variations due to text size and orientation are resolved by a new image pyramid.
TL;DR: The sparsity of sPDF-maps makes them feasible for gigapixel images, while enabling direct evaluation of a variety of non-linear operators from the same representation, and is illustrated for antialiased color mapping, O(n) local Laplacian filters, smoothed local histogram filters, and bilateral filters.
Abstract: We introduce a new type of multi-resolution image pyramid for high-resolution images called sparse pdf maps (sPDF-maps). Each pyramid level consists of a sparse encoding of continuous probability density functions (pdfs) of pixel neighborhoods in the original image. The encoded pdfs enable the accurate computation of non-linear image operations directly in any pyramid level with proper pre-filtering for anti-aliasing, without accessing higher or lower resolutions. The sparsity of sPDF-maps makes them feasible for gigapixel images, while enabling direct evaluation of a variety of non-linear operators from the same representation. We illustrate this versatility for antialiased color mapping, O(n) local Laplacian filters, smoothed local histogram filters (e.g., median or mode filters), and bilateral filters.
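The motivation for storing per-neighborhood pdfs is that non-linear operators do not commute with averaging. The sketch below shows this for one coarse pixel. Averaging first and then applying a threshold gives a different result from taking the expected thresholded value under the neighborhood's value distribution. It uses a dense, exact histogram (`collections.Counter`) of a single neighborhood rather than the sparse encoding of continuous pdfs used by sPDF-maps. The threshold operator and function names are hypothetical.

```python
from collections import Counter


def naive_downsample_then_map(pixels, fn):
    """Average the neighborhood first, then apply the non-linear operator."""
    return fn(sum(pixels) / len(pixels))


def pdf_downsample_then_map(pixels, fn):
    """Keep the neighborhood's value distribution and take the expectation of fn."""
    pdf = Counter(pixels)
    n = len(pixels)
    return sum(fn(value) * count / n for value, count in pdf.items())


def threshold(v):
    # Hypothetical non-linear operator: a hard threshold at 0.5.
    return 1.0 if v >= 0.5 else 0.0


neighborhood = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(naive_downsample_then_map(neighborhood, threshold))  # 0.0
print(pdf_downsample_then_map(neighborhood, threshold))    # 0.375
```

The second value matches what averaging the thresholded full-resolution pixels would give. This is the anti-aliased result a coarse level should reproduce without accessing the finer levels.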
TL;DR: This paper proposes a novel Gabor-LBP-PHOG (GLP) image descriptor which performs well on different image categories and makes a comparative assessment of the classification performance of the GLP descriptor in six different color spaces.
Abstract: This paper presents a novel set of color descriptors for object and scene image classification. We first introduce a new Gabor-PHOG (GPHOG) descriptor by concatenating the Pyramid of Histograms of Oriented Gradients (PHOG) of the local Gabor filtered images. Second, we derive the Gabor-LBP (GLBP) descriptor by accumulating the Local Binary Patterns (LBP) histograms of all the component images produced by applying Gabor filters. Then, by combining the GPHOG and the GLBP descriptors using an optimal feature representation method, we propose a novel Gabor-LBP-PHOG (GLP) image descriptor which performs well on different image categories. Next, we make a comparative assessment of the classification performance of the GLP descriptor in six different color spaces. Finally, we present a novel Fused Color GLP (FC-GLP) feature by integrating the PCA features of the six color GLP descriptors. The Principal Component Analysis (PCA) and the Enhanced Fisher Model (EFM) are applied for feature extraction and the nearest neighbor classification rule is used for classification. The effectiveness of the proposed GLP and FC-GLP feature vectors for image classification is evaluated using three grand challenge datasets, namely the Caltech 256 dataset, the MIT Scene dataset and the UIUC Sports Event dataset.
TL;DR: In this paper, a contour vector feature-based embedded real-time image matching method is proposed, which uses linear features based on X- and Y-direction vectors and is highly robust to image distortion, noise, shading, illumination changes, polarity inversion, and so on.
Abstract: The invention provides a contour vector feature-based embedded real-time image matching method. The method uses linear features based on X- and Y-direction vectors and is highly robust to image distortion, noise, shading, illumination changes, polarity inversion, and so on. An image pyramid search strategy is used: the template is first matched quickly in the high-layer, low-resolution image under test, and the target position is then located accurately by a stepwise downward search, which greatly reduces matching time. The optimal number of pyramid levels and the optimal rotation angle step for template matching at each layer are calculated automatically from the specific information in the template image. A three-level screening matching strategy is applied at the highest pyramid layer according to the specific content of the image under test. The first and second screening levels eliminate non-target positions using only additions, subtractions, and conditional statements, which is more efficient on an embedded system than multiplication and division; the third level processes only the few positions that pass the first two levels, so the matching speed is greatly improved. The overall method can match and locate the target at any angle and any coordinate.
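The coarse-to-fine pyramid search can be sketched in 1D. An exhaustive search runs only at the smallest level, and the result is then refined within a small window at each finer level. The sketch makes several substitutions: a sum of absolute differences (SAD) score, which needs only additions, subtractions, and absolute values, replaces the patent's contour-vector similarity; there is no rotation search and no three-level screening; and the refinement window is a fixed ±1 around the doubled position. All names are hypothetical.

```python
def sad(signal, template, offset):
    """Sum of absolute differences between the template and the signal at an offset."""
    return sum(abs(signal[offset + i] - t) for i, t in enumerate(template))


def halve(signal):
    """Average neighbouring pairs to build the next (coarser) pyramid level."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def coarse_to_fine_match(signal, template, levels):
    """Exhaustive search at the coarsest level, then refine one level at a time."""
    signals, templates = [signal], [template]
    for _ in range(levels):
        signals.append(halve(signals[-1]))
        templates.append(halve(templates[-1]))

    # Full search only on the smallest level.
    top_sig, top_tpl = signals[-1], templates[-1]
    best = min(range(len(top_sig) - len(top_tpl) + 1),
               key=lambda o: sad(top_sig, top_tpl, o))

    # Stepwise downward search: double the position and check a small window.
    for level in range(levels - 1, -1, -1):
        sig, tpl = signals[level], templates[level]
        last = len(sig) - len(tpl)
        candidates = range(max(0, 2 * best - 1), min(last, 2 * best + 2) + 1)
        best = min(candidates, key=lambda o: sad(sig, tpl, o))
    return best


template = [2, 6, 10, 6, 2, 0, 4, 8]
signal = [0] * 32
signal[12:20] = template
print(coarse_to_fine_match(signal, template, levels=2))  # 12
```

Only the top level is searched exhaustively. Each finer level evaluates at most four candidate positions, which is where the time saving comes from.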
TL;DR: In this paper, an optical remote sensing image marine ship detection method based on local contrast information and a space pyramid characteristic is presented.
Abstract: The invention provides an optical remote sensing image marine ship detection method based on local contrast information and a space pyramid characteristic. The technical scheme is as follows: first, a window is slid over the sea area and suspected ship objects are detected based on local contrast, reducing false alarms in ship detection; then, for each suspected object region obtained through segmentation, a neighborhood is taken according to a certain window size, and a spatial pyramid matching model is used to extract spatial context information for classification, removing background interference, obtaining the ship detection result, and reducing false alarms. With the method of the invention, problems caused by ships appearing with white or black polarity can be effectively suppressed. At the same time, to address the similarity between ships and other interference and the variability among ships themselves, local neighborhood context information is introduced to describe and identify ships. Ships are thereby distinguished from background interference, and the false alarm rate of ship detection is effectively reduced.
TL;DR: In this paper, a data selection unit selects and extracts necessary data from respective streams of pieces of data of the synthesized image, a RAW image, and a 1/1 demosaiced image and generates a stream of data to be transmitted.
Abstract: An image synthesis unit receives respective pixel values for a single horizontal row of a ¼ demosaiced image, a 1/16 demosaiced image, and a 1/64 demosaiced image from a pyramid filter for reducing, in a plurality of stages, a frame of a moving image that is captured. The image synthesis unit then connects the pixel values according to a predetermined rule so as to generate a virtual synthesized image and outputs the synthesized image in the form of streams. A control unit of an image transmission unit notifies a data selection unit of a request from a host terminal. The data selection unit selects and extracts necessary data from respective streams of pieces of data of the synthesized image, a RAW image, and a 1/1 demosaiced image, and generates a stream of data to be transmitted. A packetizing unit packetizes the stream and transmits the packetized stream to the host terminal.
TL;DR: In this article, a robust automatic matching method for tie points in high-resolution satellite images is proposed. It combines Wallis-filter enhancement, pyramid image matching guided by satellite orientation parameters and epipolar constraints, RFM model-based block adjustment to remove mismatched points at each pyramid layer, and a final least-squares refinement.
Abstract: The invention discloses a robust automatic matching method for tie points (connecting points) in high-resolution satellite images. The method comprises the following steps: A, performing automatic enhancement using Wallis filter technology, generating the pyramid images of each layer, and extracting features from them with feature extraction operators; B, predicting the initial positions of corresponding points using the satellite image orientation parameters and the matching results of feature points from the upper pyramid level, establishing an epipolar geometric constraint equation, and performing coarse geometric correction on the matching window images; C, removing mismatched points from the matching result of each pyramid layer using an RFM model-based block adjustment method; and D, repeating steps B and C down to the original image layer, and finally refining the matching result using two-image (double-sheet) least-squares matching. By combining block adjustment with satellite image matching, the method can greatly reduce the manual editing workload of tie-point measurement, improve the degree of automation of satellite image data processing, and has notable economic and social benefits.
TL;DR: The tangram model is capable of capturing meaningful spatial configurations as well as appearance for various scene categories, and achieves state-of-the-art classification performance on the LSP 15-class scene dataset and the MIT 67-class indoor scene dataset.
Abstract: This paper proposes a method to learn reconfigurable and sparse scene representation in the joint space of spatial configuration and appearance in a principled way. We call it the tangram model, which has three properties: (1) Unlike the fixed structure of the spatial pyramid widely used in the literature, we propose a compositional shape dictionary organized in an And-Or directed acyclic graph (AOG) to quantize the space of spatial configurations. (2) The shape primitives (called tans) in the dictionary can be described by using any “off-the-shelf” appearance features according to different tasks. (3) A dynamic programming (DP) algorithm is utilized to learn the globally optimal parse tree in the joint space of spatial configuration and appearance. We demonstrate the tangram model in both a generative learning formulation and a discriminative matching kernel. In experiments, we show that the tangram model is capable of capturing meaningful spatial configurations as well as appearance for various scene categories, and achieves state-of-the-art classification performance on the LSP 15-class scene dataset and the MIT 67-class indoor scene dataset.
TL;DR: In this article, an apparatus for displaying images viewable from any direction is described, built around an inverted, truncated pyramid bearing semi-reflective facets visible from outside of the apparatus in all horizontal directions.
Abstract: The invention generally relates to the display of images viewable from any direction and devices therefor. In some aspects, the invention provides an apparatus for displaying a hologram-like image. The apparatus includes an inverted, truncated pyramid bearing semi-reflective facets visible from outside of the apparatus in all horizontal directions. A base frame supports the truncated pyramid, houses an imaging system, and provides image source surfaces that display images beneath each of the facets to be reflected by each of the facets.
TL;DR: In this paper, the authors compare different features and TerraSAR-X products for SAR image classification, evaluating them on a multi-resolution pyramid generated for TerraSAR-X MGD products.
Abstract: Feature extraction and classification using synthetic aperture radar (SAR) images has been a very active research field in recent years. Although many features have been proposed and many classifiers have been employed, few works compare these features across different TerraSAR-X (TSX) products. In principle, many features, such as the gray-level co-occurrence matrix, Gabor filters, quadrature mirror filters, and the non-linear short-time Fourier transform, can be very useful for TSX image classification. However, many of these features may be completely irrelevant for classification when different TSX products (standard or special process products) are used. Therefore, an important research direction is to identify the best features and the appropriate TSX product for them, using a Support Vector Machine classifier with precision and recall as measures of classification accuracy. Precision and recall were computed for all these features and products, and the feature and product that perform better than the others were then identified. The results show that: (1) the best feature extraction method is Gabor filters (with different scales and orientations) for almost all of the TSX products, with an average precision (over all classes) between 89.72% and 97.41% and an average recall between 33.59% and 44.16% (depending on the TSX product); and (2) the best product from the multi-resolution product pyramid is the standard MGD-RE product. Our dataset consisted of TerraSAR-X High Resolution Spotlight products taken over Venice and Toulouse, where the actual ground cover was known to us. The novelty of this article lies in the fact that these features are applied to SAR images and compared with each other for a multi-resolution pyramid generated for TerraSAR-X MGD products.
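The abstract reports precision and recall averaged over all classes. A minimal sketch of that evaluation, given true and predicted labels, follows. Macro averaging (the unweighted mean over classes) is an assumption, and the class labels are hypothetical. The SVM classifier itself is not shown.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def class_averaged(true_labels, predicted_labels):
    """Unweighted (macro) average of per-class precision and recall."""
    classes = sorted(set(true_labels))
    scores = [precision_recall(true_labels, predicted_labels, c) for c in classes]
    return (sum(p for p, _ in scores) / len(scores),
            sum(r for _, r in scores) / len(scores))


true = ["water", "water", "urban", "urban", "forest", "forest"]
pred = ["water", "urban", "urban", "urban", "forest", "water"]
p, r = class_averaged(true, pred)
print(round(p, 3), round(r, 3))  # 0.722 0.667
```

Averaging per class keeps a rare land-cover class from being swamped by a dominant one. Precision and recall can also diverge sharply, as in the Gabor results reported above.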
TL;DR: This paper proposes a new feature descriptor, Pyramid Depth Self-Similarities (PDSS), based on the idea that depth information of people has high local self-similarities, and proves that PDSS is an effective complement to Histogram of Oriented Depth (HOD).
Abstract: With the development of depth camera technology, it is feasible to obtain high-quality color and depth images synchronously in real time. Thus, RGB-D-based applications, such as pedestrian detection in RGB-D data, are becoming more and more popular. Since the key to this application is finding better descriptors, in this paper we propose a new feature descriptor, Pyramid Depth Self-Similarities (PDSS), for depth images. It is based on the idea that the depth information of people has high local self-similarity. The experiments, in which RGB-D data is collected by a Kinect sensor, prove that PDSS is an effective complement to Histogram of Oriented Depth (HOD). Furthermore, the combination of Histogram of Oriented Gradients (HOG), HOD and PDSS improves the detection performance.
TL;DR: A method of reconstructing Light-Field directly from 3-D information composed of multi-focus images without any scene estimation is derived, which is robust even at very low bit-rate.
Abstract: Light-Field enables us to observe scenes from free viewpoints. However, it generally consists of an enormous amount of 4-D data, which is not suitable for storage or transmission without effective compression. 4-D Light-Field data is highly redundant because it essentially contains only 3-D scene information. Although robust 3-D scene estimation, such as depth recovery from a Light-Field, is not easy, we successfully derived a method of reconstructing a Light-Field directly from 3-D information composed of multi-focus images without any scene estimation. Conversely, it is easy to synthesize multi-focus images from a Light-Field. In this paper, based on this method, we propose a novel Light-Field compression scheme that uses synthesized multi-focus images as an effective representation of 3-D scenes. Multi-focus images are easily compressed because they contain mostly low-frequency components. We show experimental results using synthetic and real images. The reconstruction quality of the method is robust even at very low bit rates.
TL;DR: A new fusion method based on bilateral pyramid for multispectral and panchromatic images is presented and is compared with the widely used IHS, ATWT substitutive and ATWT additive fusion methods.
Abstract: A new fusion method based on a bilateral pyramid for multispectral (MS) and panchromatic (PAN) images is presented. The fused image is obtained by two different rules: substitutive and additive. A bilateral pyramid is a multiscale decomposition method that decomposes an input image into a base layer representing the low-frequency content and several detail layers representing the high-frequency part of the image. In the substitutive method, both the MS and PAN images are decomposed using the bilateral pyramid, and the detail layers of the PAN image are added to the base layer of the MS image. In the additive method, the detail layers of the PAN image are directly added to the MS image. The proposed method is compared with the widely used IHS (intensity-hue-saturation), ATWT substitutive, and ATWT additive fusion methods. The resulting images as well as evaluation metrics demonstrate that the proposed algorithm performs better.
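The two fusion rules can be sketched on 1D signals. In the sketch, a bilateral pyramid repeatedly applies an edge-preserving bilateral filter with growing spatial extent. Each detail layer is the difference between consecutive smoothings, and the last smoothing is the base, so base plus details reproduces the input. The abstract does not give these choices, so they are assumptions: no downsampling between levels, a doubling spatial sigma, and fixed range sigma and radius. The sketch also assumes the MS band is already co-registered and resampled to the PAN grid, and all names are hypothetical.

```python
import math


def bilateral_1d(signal, sigma_s, sigma_r, radius):
    """Edge-preserving smoothing: weights depend on distance and intensity difference."""
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((signal[i] - signal[j]) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out


def bilateral_pyramid(signal, levels, sigma_r=0.2, radius=2):
    """Return (base, details); base + sum(details) reproduces the input."""
    details, current = [], signal
    for level in range(levels):
        smoother = bilateral_1d(current, sigma_s=2.0 ** level, sigma_r=sigma_r,
                                radius=radius * 2 ** level)
        details.append([c - s for c, s in zip(current, smoother)])
        current = smoother
    return current, details


def fuse_additive(ms, pan, levels):
    """Additive rule: add the PAN detail layers directly to the MS band."""
    _, pan_details = bilateral_pyramid(pan, levels)
    return [m + sum(d[i] for d in pan_details) for i, m in enumerate(ms)]


def fuse_substitutive(ms, pan, levels):
    """Substitutive rule: MS base layer plus the PAN detail layers."""
    ms_base, _ = bilateral_pyramid(ms, levels)
    _, pan_details = bilateral_pyramid(pan, levels)
    return [b + sum(d[i] for d in pan_details) for i, b in enumerate(ms_base)]


pan = [0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9]
ms = [0.5, 0.5, 0.6, 0.6, 0.5, 0.5, 0.6, 0.6]
print([round(v, 3) for v in fuse_additive(ms, pan, 2)])
print([round(v, 3) for v in fuse_substitutive(ms, pan, 2)])
```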
TL;DR: This paper proposes a novel method, called A-Optimal Non-negative Projection (ANP), which imposes a constraint on the encoding factor as a regularizer during matrix factorization to preserve more intrinsic characteristics of the data regardless of any specific labels.
Abstract: As a central problem in computer vision and pattern recognition, data representation has attracted great attention in recent years. Non-negative matrix factorization (NMF), a useful data representation method, contributes greatly to finding the latent structure of data and yields a parts-based representation by decomposing the data matrix into a few bases and encodings with non-negativity constraints. However, the non-negativity constraint alone is insufficient to obtain a more robust data representation. In this paper, we propose a novel method, called A-Optimal Non-negative Projection (ANP), for image data representation and further analysis. ANP imposes a constraint on the encoding factor as a regularizer during matrix factorization. In this way, the learned data representation leads to a stable linear model no matter what kind of data label is selected for further processing. Thus, it can preserve more intrinsic characteristics of the data regardless of any specific labels. We demonstrate the effectiveness of this novel algorithm through a set of evaluations on real-world applications.
TL;DR: Results show that the proposed descriptor B-PLPQ outperforms all other tested methods for the problem of FACS Action Unit analysis and that systems which utilise a pyramid representation outperform those that use basic appearance descriptors.
Abstract: Facial expression is one of the most important non-verbal behavioural cues in social signals. Constructing an effective face representation from images is an essential step for successful facial behaviour analysis. Most existing face descriptors operate at a single scale and do not leverage coarse-to-fine methods such as image pyramids. In this work, we propose the sparse appearance descriptors Block-based Pyramid Local Binary Pattern (B-PLBP) and Block-based Pyramid Local Phase Quantisation (B-PLPQ). The effectiveness of our proposed descriptors is evaluated by a real-time facial action recognition system. The performance of B-PLBP and B-PLPQ is also compared with Block-based Local Binary Patterns (B-LBP) and Block-based Local Phase Quantisation (B-LPQ). The system proposed here enables the detection of a much larger range of facial behaviour by detecting 22 facial muscle actions (Action Units, AUs), and can be practically applied to social behaviour analysis and synthesis. Results show that our proposed descriptor B-PLPQ outperforms all other tested methods for the problem of FACS Action Unit analysis, and that systems which utilise a pyramid representation outperform those that use basic appearance descriptors.
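A block-based pyramid LBP feature can be sketched as follows. It computes LBP codes at each pyramid level, splits each code map into a grid of blocks, and concatenates the per-block histograms. This is a rough illustration, not the authors' descriptor. It assumes basic 256-code LBP, a pyramid built by 2x2 averaging, a 4x4 block grid and normalised histograms. The sparsity of the published descriptors and the LPQ variant (B-PLPQ) are not reproduced, and all names and parameters are hypothetical.

```python
import numpy as np

# 8 neighbours in circular order; bit i of the code compares neighbour i with the centre.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp(img):
    """Basic 8-neighbour LBP codes (0..255) for the interior pixels of a 2-D image."""
    h, w = img.shape
    c = img[1:-1, 1:-1]
    codes = np.zeros(c.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= c).astype(np.int64) << bit
    return codes


def downsample(img):
    """Halve resolution by 2x2 averaging (an odd last row/column is cropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0


def block_histograms(codes, grid=4):
    """Split a code map into grid x grid blocks; one normalised 256-bin histogram each."""
    feats = []
    for rows in np.array_split(codes, grid, axis=0):
        for block in np.array_split(rows, grid, axis=1):
            hist = np.bincount(block.ravel(), minlength=256).astype(float)
            feats.append(hist / max(block.size, 1))
    return np.concatenate(feats)


def block_pyramid_lbp(img, levels=3, grid=4):
    """Concatenate block LBP histograms from fine to coarse pyramid levels."""
    feats = []
    current = np.asarray(img, dtype=float)
    for _ in range(levels):
        feats.append(block_histograms(lbp(current), grid))
        current = downsample(current)
    return np.concatenate(feats)


if __name__ == "__main__":
    face = np.random.default_rng(2).random((64, 64))  # stand-in for an aligned face crop
    print(block_pyramid_lbp(face).shape)
```

The blocks preserve coarse spatial layout within each level. The pyramid adds the coarse-to-fine information that single-scale B-LBP lacks.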
TL;DR: Zhang et al. proposed a target tracking method and system in which a video image of the tracking target is acquired; before a target model of the tracking target is established, the video image is analyzed to acquire the area corresponding to the complete tracking target in the video image, and the target model is established from the area corresponding to the acquired complete target.
Abstract: The invention discloses a target tracking method and a system thereof. The method comprises the following steps. A video image of a tracking target is acquired. Before a target model of the tracking target is established, the video image is analyzed to acquire an area corresponding to the complete tracking target in the video image, and a target model of the tracking target is established according to the area corresponding to the acquired complete target. After the target model of the tracking target is established, the video image is analyzed to acquire possible locations of the tracking target's area in the current image, and these possible locations are taken as candidate target areas. The features of each candidate target area are matched against the target model using interest-point detection, partial image feature extraction and a pyramid matching algorithm, and the candidate target area with the largest matching result is taken as the current target area of the tracking target. The technical solution provided by the invention can improve the tracking success rate.
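The abstract names a pyramid matching algorithm without giving details. The sketch below therefore assumes a pyramid match kernel in the style of Grauman and Darrell, applied to sets of quantised local-feature coordinates. Each set is binned on grids that double in size per level. Histogram intersection counts matches at each level, and matches that first appear at a coarser level receive a weight that halves per level. The candidate with the highest normalised score is selected, following the "largest matching result" rule above. Feature extraction is omitted, and all names and parameters are hypothetical.

```python
import numpy as np
from collections import Counter


def pyramid_match(X, Y, levels=4):
    """Pyramid match score between point sets X, Y (n x d arrays of non-negative ints)."""
    score, prev = 0.0, 0
    for i in range(levels):
        # Histogram of each set on a grid with bin side 2**i.
        hx = Counter(map(tuple, X // 2 ** i))
        hy = Counter(map(tuple, Y // 2 ** i))
        inter = sum(min(c, hy[b]) for b, c in hx.items())  # histogram intersection
        score += (inter - prev) / 2 ** i  # new matches found at this level, down-weighted
        prev = inter
    return score


def normalized_match(X, Y, levels=4):
    """Score in [0, 1]; equals 1 only when X and Y are the same multiset of points."""
    return pyramid_match(X, Y, levels) / np.sqrt(
        pyramid_match(X, X, levels) * pyramid_match(Y, Y, levels))


def select_target(model_feats, candidates, levels=4):
    """Index of the candidate area whose features best match the target model."""
    scores = [normalized_match(model_feats, c, levels) for c in candidates]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    model = rng.integers(0, 16, size=(30, 2))  # stand-in for quantised model features
    cands = [rng.integers(0, 16, size=(30, 2)) for _ in range(3)] + [model.copy()]
    print(select_target(model, cands)[0])
```

The self-match of a set of n points equals n, so the normalisation keeps scores comparable across candidate areas with different feature counts.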
TL;DR: The role of texture, along with its spatial layout, in scene recognition is analyzed, and a novel spatial texture descriptor (PC-TPLBP) is presented for the problem of scene recognition, showing the importance of combining PC-TPLBP with pixel-based (local) features to improve performance.
TL;DR: A novel multi-scale local pattern co-occurrence matrix (MS_LPCM) descriptor is proposed to characterize textural images in four major steps; it shows higher classification accuracy and lower computational cost than other state-of-the-art algorithms.
Abstract: Textural image classification technologies have been extensively explored and widely applied in many areas. It is advantageous to combine both the occurrence and the spatial distribution of local patterns to describe a texture. However, most existing state-of-the-art approaches for textural image classification only employ the occurrence histogram of local patterns to describe textures, without considering their co-occurrence information. They are also usually very time-consuming because of the vector quantization involved. Moreover, these feature extraction paradigms are implemented at a single scale. In this paper, we propose a novel multi-scale local pattern co-occurrence matrix (MS_LPCM) descriptor that characterizes textural images in four major steps. First, Gaussian filtering pyramid preprocessing is employed to obtain multi-scale images. Second, a local binary pattern (LBP) operator is applied to each textural image to create an LBP image. Third, the gray-level co-occurrence matrix (GLCM) is used to extract a local pattern co-occurrence matrix (LPCM) from the LBP images as features. Finally, all LPCM features from the same textural image at different scales are concatenated into the final feature vector for classification. The experimental results on three benchmark databases show higher classification accuracy and lower computational cost than other state-of-the-art algorithms.
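The four MS_LPCM steps can be sketched end to end in Python. This is a minimal illustration, not the authors' code. The abstract does not state the LBP variant, GLCM offsets or pyramid parameters, so several choices here are assumptions. The sketch uses the rotation-invariant uniform LBP with 8 neighbours (10 labels, which keeps each co-occurrence matrix at 10 x 10), three offsets, a Gaussian blur with sigma 1 followed by 2x subsampling, and three scales. The classifier is omitted, and all names are hypothetical.

```python
import numpy as np

# 8 neighbours in circular (clockwise) order around the centre pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian filter with reflected borders; output has img's shape."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()

    def conv(v):
        return np.convolve(v, k, mode="valid")  # 'valid' trims the padding again

    padded = np.pad(img, r, mode="reflect")
    return np.apply_along_axis(conv, 0, np.apply_along_axis(conv, 1, padded))


def lbp_riu2(img):
    """Rotation-invariant uniform LBP (P=8, R=1): labels 0..8 if uniform, else 9."""
    h, w = img.shape
    c = img[1:-1, 1:-1]
    bits = np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= c
                     for dy, dx in OFFSETS]).astype(np.int64)
    ones = bits.sum(axis=0)
    transitions = np.abs(bits - np.roll(bits, 1, axis=0)).sum(axis=0)  # circular 0/1 changes
    return np.where(transitions <= 2, ones, 9)


def glcm(labels, levels=10, offset=(0, 1)):
    """Normalised co-occurrence matrix of label pairs at a non-negative (dy, dx) offset."""
    dy, dx = offset
    h, w = labels.shape
    a = labels[:h - dy, :w - dx]
    b = labels[dy:, dx:]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)  # count each (a, b) label pair
    return m / m.sum()


def ms_lpcm(img, scales=3, offsets=((0, 1), (1, 0), (1, 1))):
    # Step 1: Gaussian pyramid (blur, then keep every second row and column).
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(scales - 1):
        pyramid.append(gaussian_blur(pyramid[-1])[::2, ::2])
    feats = []
    for level in pyramid:
        labels = lbp_riu2(level)                  # Step 2: LBP image
        for off in offsets:                       # Step 3: LPCM = GLCM of LBP labels
            feats.append(glcm(labels, 10, off).ravel())
    return np.concatenate(feats)                  # Step 4: concatenate across scales


if __name__ == "__main__":
    texture = np.random.default_rng(4).random((64, 64))  # stand-in for a texture patch
    print(ms_lpcm(texture).shape)
```

The resulting vector has scales x offsets x 100 entries (900 here). No codebook or vector quantization is involved, which is consistent with the low computational cost the abstract reports.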