TL;DR: Quantization-based Hashing (QBH) is a generic framework which incorporates the advantages of quantization error reduction methods into conventional property preserving hashing methods and can be applied to both unsupervised and supervised hashing methods.
TL;DR: A novel compression algorithm is presented here, which first spectrally decorrelates the image using Vector Quantization and Principal Component Analysis, and then applies JPEG2000 to the Principal Components exploiting spatial correlations for compression.
Abstract: Compression of hyperspectral imagery increases the efficiency of image storage and transmission. It is especially useful to alleviate congestion in the downlinks of planes and satellites, from which these images are usually taken. A novel compression algorithm is presented here. It first spectrally decorrelates the image using Vector Quantization and Principal Component Analysis (PCA), and then applies JPEG2000 to the Principal Components (PCs), exploiting spatial correlations for compression. We take advantage of the fact that dimensionality reduction preserves more information in the first components, allocating more depth to the first PCs. We optimize the selection of parameters by maximizing the distortion-ratio performance across the test images. An increase of 1 to 3 dB in Signal-to-Noise Ratio (SNR) for the same compression ratio is found over just using PCA + JPEG2000, while also speeding up compression and decompression by more than 10%. A formula is proposed which determines the configuration of the algorithm, obtaining results that range from heavily compressed, low-SNR images to lightly compressed, near-lossless ones.
TL;DR: A rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators is built and a simple penalty term is proposed that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other.
Abstract: We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space that is implicitly induced by a MASO directly links DNs to the theory of vector quantization (VQ) and $K$-means clustering, which opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop and validate a new distance metric for signals and images that quantifies the difference between their VQ encodings. (This paper is a significantly expanded version of A Spline Theory of Deep Learning from ICML 2018.)
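The statement that, conditioned on the input, the network output is an affine transformation of that input can be checked numerically on a small ReLU network. The sketch below is only an illustration: the two-layer network, its random weights, and the helper names `relu_net` and `local_affine` are assumptions made here, not code from the paper.

```python
import numpy as np

def relu_net(x, W1, b1, W2, b2):
    """Two-layer ReLU network: W2 @ relu(W1 @ x + b1) + b2."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def local_affine(x, W1, b1, W2, b2):
    """Return (A, c) such that the network equals A @ x + c on x's activation region."""
    mask = (W1 @ x + b1 > 0).astype(float)   # activation pattern selected by x
    A = W2 @ (mask[:, None] * W1)            # rows of A act as input-dependent templates
    c = W2 @ (mask * b1) + b2
    return A, c

# Hypothetical random weights, for illustration only.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
x = rng.normal(size=3)

A, c = local_affine(x, W1, b1, W2, b2)
y_net = relu_net(x, W1, b1, W2, b2)
y_affine = A @ x + c                         # inner products of x with the templates, plus offset
```

The activation pattern `mask` identifies the region of the input-space partition that `x` falls in, which is the sense in which the partition resembles a VQ encoding of the input.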
TL;DR: The asymptotic characterization of the Gaussian NRDF is used to provide a new equivalent realization scheme with feedback, which is characterized by a resource allocation problem across the dimension of the vector source, and a predictive coding scheme via lattice quantization with subtractive dither and joint memoryless entropy coding is derived.
Abstract: We deal with zero-delay source coding of a vector-valued Gauss–Markov source subject to a mean-squared error (MSE) fidelity criterion characterized by the operational zero-delay vector-valued Gaussian rate distortion function (RDF). We address this problem by considering the nonanticipative RDF (NRDF), which is a lower bound to the causal optimal performance theoretically attainable function (or simply causal RDF) and operational zero-delay RDF. We recall the realization that corresponds to the optimal “test-channel” of the Gaussian NRDF, when considering a vector Gauss–Markov source subject to a MSE distortion in the finite time horizon. Then, we introduce sufficient conditions to show existence of solution for this problem in the infinite time horizon (or asymptotic regime). For the asymptotic regime, we use the asymptotic characterization of the Gaussian NRDF to provide a new equivalent realization scheme with feedback, which is characterized by a resource allocation (reverse-waterfilling) problem across the dimension of the vector source. We leverage the new realization to derive a predictive coding scheme via lattice quantization with subtractive dither and joint memoryless entropy coding. This coding scheme offers an upper bound to the operational zero-delay vector-valued Gaussian RDF. When we use scalar quantization, then for $r$ active dimensions of the vector Gauss–Markov source the gap between the obtained lower and theoretical upper bounds is less than or equal to $0.254r + 1$ bits/vector. However, we further show that it is possible when we use vector quantization, and assume infinite dimensional Gauss–Markov sources to make the previous gap to be negligible, i.e., Gaussian NRDF approximates the operational zero-delay Gaussian RDF. We also extend our results to vector-valued Gaussian sources of any finite memory under mild conditions. Our theoretical framework is demonstrated with illustrative numerical experiments.
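Reverse-waterfilling can be illustrated with the classical case of independent Gaussian components under a total MSE budget. This is a minimal sketch of that textbook allocation, not the NRDF realization derived in the paper; the function name, the bisection search, and the example variances are assumptions, and the code presumes 0 < D <= sum of the variances.

```python
import numpy as np

def reverse_waterfill(variances, D, iters=100):
    """Classical reverse water-filling: D_i = min(theta, sigma_i^2) with sum(D_i) = D."""
    variances = np.asarray(variances, dtype=float)
    lo, hi = 0.0, variances.max()
    for _ in range(iters):                      # bisection on the water level theta
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, variances).sum() > D:
            hi = theta
        else:
            lo = theta
    d = np.minimum(theta, variances)            # per-component distortion
    rate = 0.5 * np.sum(np.log2(variances / d)) # bits per vector
    active = int(np.sum(d < variances))         # components that receive a positive rate
    return theta, d, rate, active

theta, d, rate, active = reverse_waterfill([4.0, 1.0, 0.25], D=1.0)
```

In this example the weakest component (variance 0.25) lies below the water level, so it is reproduced at zero rate and only two dimensions are active.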
TL;DR: GradiVeQ is a gradient vector quantization technique that leverages the strong linear correlations between CNN gradients, exploiting them through principal component analysis (PCA) for substantial gradient dimension reduction.
Abstract: Data parallelism can boost the training speed of convolutional neural networks (CNN), but could suffer from significant communication costs caused by gradient aggregation. To alleviate this problem, several scalar quantization techniques have been developed to compress the gradients. But these techniques could perform poorly when used together with decentralized aggregation protocols like ring all-reduce (RAR), mainly due to their inability to directly aggregate compressed gradients. In this paper, we empirically demonstrate the strong linear correlations between CNN gradients, and propose a gradient vector quantization technique, named GradiVeQ, to exploit these correlations through principal component analysis (PCA) for substantial gradient dimension reduction. GradiVeQ enables direct aggregation of compressed gradients, hence allows us to build a distributed learning system that parallelizes GradiVeQ gradient compression and RAR communications. Extensive experiments on popular CNNs demonstrate that applying GradiVeQ slashes the wall-clock gradient aggregation time of the original RAR by more than 5x without noticeable accuracy loss, and reduces the end-to-end training time by almost 50%. The results also show that GradiVeQ is compatible with scalar quantization techniques such as QSGD (Quantized SGD), and achieves a much higher speed-up gain under the same compression ratio.
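Why a PCA-based compressor allows compressed gradients to be aggregated directly can be seen from linearity: projecting each gradient and then summing gives the same result as summing and then projecting. The sketch below illustrates only that property with synthetic data; the dimensions, the synthetic low-rank "gradients", and the variable names are assumptions, and it is not the GradiVeQ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k, workers = 64, 4, 3

# Synthetic "gradients" lying near a k-dimensional subspace (assumption for illustration).
basis = rng.normal(size=(k, dim))
history = rng.normal(size=(200, k)) @ basis + 0.01 * rng.normal(size=(200, dim))

# Principal directions from past gradients via SVD (rows of Vt are orthonormal).
_, _, Vt = np.linalg.svd(history, full_matrices=False)
P = Vt[:k]                                       # (k, dim) projection matrix

grads = rng.normal(size=(workers, k)) @ basis    # one new gradient per worker
compressed = grads @ P.T                         # each worker sends k numbers instead of dim

# Because the projection is linear, compressed gradients can be summed directly.
agg_compressed = compressed.sum(axis=0)
recovered = agg_compressed @ P                   # decompress once, after aggregation
true_sum = grads.sum(axis=0)
rel_err = np.linalg.norm(recovered - true_sum) / np.linalg.norm(true_sum)
```

A nonlinear scalar quantizer does not have this property in general, which is the obstacle the abstract describes for ring all-reduce.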
TL;DR: Wu et al. developed a deep learning-based compression model to reduce the data rate of multichannel action potentials; the model is built upon a deep compressive autoencoder with discrete latent embeddings.
Abstract: Objective: Understanding the coordinated activity underlying brain computations requires large-scale, simultaneous recordings from distributed neuronal structures at a cellular-level resolution. One major hurdle in designing high-bandwidth, high-precision, large-scale neural interfaces lies in the formidable data streams (tens to hundreds of Gbps) that are generated by the recorder chip and need to be transferred online to a remote computer. The data rates can require hundreds to thousands of I/O pads on the recorder chip and power consumption on the order of Watts for data streaming alone. One of the solutions is to reduce the bandwidth of neural signals before transmission. Approach: We developed a deep learning-based compression model to reduce the data rate of multichannel action potentials. The proposed compression model is built upon a deep compressive autoencoder (CAE) with discrete latent embeddings. The encoder network of CAE is equipped with residual transformations to extract representative features from spikes, which are mapped into the latent embedding space and updated via vector quantization (VQ). The indexes of the VQ codebook are further entropy coded as the compressed signals. The decoder network reconstructs spike waveforms with high quality from the quantized latent embeddings through stacked deconvolution. Main results: Extensive experimental results on both synthetic and in vivo datasets show that the proposed model consistently outperforms conventional methods that utilize hand-crafted features and/or signal-agnostic transformations and compressive sensing by achieving much higher compression ratios (20-500×) and better or comparable reconstruction accuracies. Testing results also indicate that CAE is robust against a diverse range of imperfections, such as waveform variation and spike misalignment, and has minor influence on spike sorting accuracy. Furthermore, we have estimated the hardware cost and real-time performance of CAE and shown that it could support thousands of recording channels simultaneously without excessive power/heat dissipation. Significance: The proposed model can reduce the required data transmission bandwidth in large-scale recording experiments and maintain good signal qualities, which will be helpful for designing power-efficient and lightweight wireless neural interfaces. We have open sourced the code implementation of the work at https://github.com/tong-wu-umn/spike-compression-autoencoder.
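The VQ step described in the Approach, mapping latent vectors to codebook indexes that are then entropy coded, can be sketched as a nearest-codeword lookup followed by an entropy estimate of the index stream. This is a toy illustration under stated assumptions: the tiny hand-written codebook, the latent vectors, and both helper functions are hypothetical and are not taken from the linked repository.

```python
import numpy as np

def vq_encode(z, codebook):
    """Return the index of the nearest codeword (squared Euclidean) for each row of z."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def empirical_entropy_bits(indices, num_codes):
    """Empirical entropy of the index stream, a lower bound on bits/index for entropy coding."""
    counts = np.bincount(indices, minlength=num_codes)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical 2-D codebook and encoder outputs.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 1.9], [0.2, 0.1]])

idx = vq_encode(z, codebook)        # what would be transmitted
z_q = codebook[idx]                 # what the decoder would receive
bits = empirical_entropy_bits(idx, len(codebook))
```

Only the integer indexes need to be transmitted; the decoder holds the same codebook and looks the vectors up again.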
TL;DR: A novel complexity-constrained distributed variable-rate quantized CS method, which minimizes a weighted sum between the mean square error signal reconstruction distortion and the average encoding rate.
Abstract: This paper addresses lossy distributed source coding for acquiring correlated sparse sources via compressed sensing (CS) in wireless sensor networks. Noisy CS measurements are separately encoded at a finite rate by each sensor, followed by the joint reconstruction of the sources at the decoder. We develop a novel complexity-constrained distributed variable-rate quantized CS method, which minimizes a weighted sum between the mean square error signal reconstruction distortion and the average encoding rate. The encoding complexity of each sensor is restrained by pre-quantizing the encoder input, i.e., the CS measurements, via vector quantization. Following the entropy-constrained design, each encoder is modeled as a quantizer followed by a lossless entropy encoder, and variable-rate coding is incorporated via rate measures of an entropy bound. For a two-sensor system, necessary optimality conditions are derived, practical training algorithms are proposed, and complexity analysis is provided. Numerical results show that the proposed method achieves superior compression performance as compared with baseline methods, and lends itself to versatile setups with different performance requirements.
TL;DR: An autoencoder model with a latent space defined by a hierarchy of categorical variables, utilizing a recently proposed vector quantization based approach, which allows continuous embeddings to be associated with each latent variable value.
Abstract: Scripts define knowledge about how everyday scenarios (such as going to a restaurant) are expected to unfold. One of the challenges to learning scripts is the hierarchical nature of the knowledge. For example, a suspect arrested might plead innocent or guilty, and a very different track of events is then expected to happen. To capture this type of information, we propose an autoencoder model with a latent space defined by a hierarchy of categorical variables. We utilize a recently proposed vector quantization based approach, which allows continuous embeddings to be associated with each latent variable value. This permits the decoder to softly decide what portions of the latent hierarchy to condition on by attending over the value embeddings for a given setting. Our model effectively encodes and generates scripts, outperforming a recent language modeling-based method on several standard tasks, and allowing the autoencoder model to achieve substantially lower perplexity scores compared to the previous language modeling-based method.
TL;DR: The proposed system addresses the challenge of DME grading and shows promising effectiveness when compared with state-of-the-art approaches.
Abstract: Background: Diabetic macular edema (DME) is one of the severe complications of diabetic retinopathy; it causes severe vision loss and leads to blindness in severe cases if left untreated. Objective: To grade the severity of DME in retinal images. Methods: Firstly, the macula is localized using its anatomical features and the information of the macula location with respect to the optic disc. Secondly, a novel method for exudate detection is proposed. The possible exudate regions are segmented using a vector quantization technique and formulated using a set of feature vectors. Semi-supervised learning with a graph-based classifier is employed to identify the true exudates. Thirdly, the disease severity is graded into different stages based on the location of exudates and the macula coordinates. Results: The results are obtained with mean values of 0.975 and 0.942 for accuracy and F1-score, respectively. Conclusion: The present work contributes to macula localization, exudate candidate identification with vector quantization, and exudate candidate classification with semi-supervised learning. The proposed method and the state-of-the-art approaches are compared in terms of performance, and experimental results show that the proposed system overcomes the challenge of DME grading and demonstrates promising effectiveness.
TL;DR: This paper reviews various speech emotion databases and the algorithms available for SER, including the hidden Markov model, Gaussian mixture model, vector quantization, artificial neural networks, and deep neural networks.
Abstract: In recent years, there has been a growing interest in speech emotion recognition (SER) by analyzing input speech. SER can be considered simply as a pattern recognition task that includes feature extraction, a classifier, and a speech emotion database. The objective of this paper is to provide a comprehensive review of the literature available on SER. Several audio features are available, including linear predictive coding coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and Teager energy based features. For the classifier, many algorithms are available, including hidden Markov model (HMM), Gaussian mixture model (GMM), vector quantization (VQ), artificial neural networks (ANN), and deep neural networks (DNN). In this paper, we also review various speech emotion databases. Finally, recent related works on SER using DNN are discussed.
TL;DR: This paper proposes an event-based method to train a feedforward spiking neural network (SNN) layer for extracting visual features with low reconstruction loss comparable with state-of-the-art visual coding approaches, yet the rule is local in both time and space, thus biologically plausible and hardware friendly.
TL;DR: A generic, low-cost, mm-wave radar-based gesture recognition system is presented; because performance suffers from inaccurate AoA estimation, a two-radar setup is evaluated that increases the estimation accuracy by 8-9%.
Abstract: Gesture recognition is gaining attention as an attractive feature for the development of ubiquitous, context-aware, IoT applications. Use of radars as a primary or secondary system is tempting, as they can operate in darkness, high light intensity environments, and longer distances than many competitor systems. Starting from this observation, we present a generic, low-cost, mm-wave radar-based gesture recognition system. Among potential benefits of mm-wave radars are a high spatial resolution due to small wavelength, the availability of multiple antennas in a small area and the low interference due to the natural attenuation of mm-wave radiation. We experimentally evaluate our COTS solution considering eight different gestures and using two low-complexity classification algorithms: the unsupervised Self Organized Map (SOM) and the supervised Learning Vector Quantization (LVQ). To test robustness, we consider gestures performed by a human hand and a human body, at short and long distance. From our preliminary evaluations, we observe that LVQ and SOM correctly detect 75% and 60% of all gestures, respectively, from the raw, unprocessed data. The detection rate is significantly higher (>90%) for selected gesture groups. We argue that performance suffers due to inaccurate AoA estimation. Accordingly, we evaluate our system employing a two-radar setup that increases the estimation accuracy by 8-9%.
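The supervised classifier mentioned above, Learning Vector Quantization, can be sketched with the basic LVQ1 update rule: the nearest prototype is pulled toward a sample of its own class and pushed away otherwise. The toy 2-D data, the learning rate, the epoch count, and the function names below are assumptions for illustration; the abstract does not state which LVQ variant or features were used.

```python
import numpy as np

def lvq1_fit(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    """Basic LVQ1 training loop over labeled samples."""
    W = prototypes.astype(float).copy()
    for _ in range(epochs):
        for x, label in zip(X, y):
            j = np.argmin(((W - x) ** 2).sum(axis=1))    # winning prototype
            sign = 1.0 if proto_labels[j] == label else -1.0
            W[j] += sign * lr * (x - W[j])               # attract if correct, repel otherwise
    return W

def lvq_predict(X, W, proto_labels):
    """Assign each sample the label of its nearest prototype."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    return proto_labels[d2.argmin(axis=1)]

# Hypothetical two-class toy data standing in for gesture feature vectors.
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
proto_labels = np.array([0, 1])
W = lvq1_fit(X, y, np.array([[0.4, 0.4], [0.6, 0.6]]), proto_labels)
pred = lvq_predict(X, W, proto_labels)
```

A Self Organized Map differs mainly in that it ignores the labels during training and also updates the neighbors of the winning unit.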
TL;DR: Quantization techniques are applied to many challenging finance applications, including pricing claims with path dependence and early exercise features, stochastic optimal control, and filtering.
Abstract: Quantization techniques have been applied in many challenging finance applications, including pricing claims with path dependence and early exercise features, stochastic optimal control, filtering ...
TL;DR: A brief overview is presented of speaker verification, the authentication of individuals by analyzing speech utterances, covering feature extraction and the speaker modeling on which verification mainly depends.
Abstract: Biometrics is used as a form of identification in many access control systems. Some of them are fingerprint, iris, face, speech, and retina. Speech biometrics is used for speaker verification. Speech is the most convenient way to communicate with people and machines, so it plays a vital role in signal processing. Automatic speaker verification is the authentication of individuals by doing analysis on speech utterances. Speaker verification is a pattern matching problem. Many technologies are used for processing and storing voice prints. Some of them are Frequency Estimation, Hidden Markov Models, Gaussian Mixture Models, Neural Networks, Vector Quantization, and Decision Trees. Speaker verification mainly depends upon speaker modeling, and this paper presents a brief overview of the speaker verification system with feature extraction and speaker modeling. The Bob spear toolkit is used for evaluation and experiments for the results and analysis. Bob spear is an open-source toolkit for speech processing. For evaluation purposes, three algorithms are used, namely GMM, ISV, and JFA, with the same preprocessing and feature extraction techniques.
TL;DR: The paper discusses different speaker modeling techniques such as Vector Quantization (VQ) and Gaussian Mixture Model (GMM).
Abstract: Speaker Recognition is the process of recognizing the speaker from the individual's speech biometrics. The voice characteristics of every speaker are different and thus can be used to construct a model. This model is later used to recognize an enrolled speaker from the list of available speakers. The paper makes an effort to discuss different speaker modeling techniques like Vector Quantization (VQ), Gaussian Mixture Model (GMM), Neural Networks (NN), etc. Also, different techniques for extraction of voice characteristics like Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) are discussed. Further, an in-depth analysis of these surveyed techniques is made to identify their advantages and limitations. The work in the field of Speaker Recognition Systems began in the 1950s and has been evolving since then; it has wide applications in the fields of security, forensics, authentication, etc.
TL;DR: A steganographic scheme that employs vector quantization (VQ) transformation and the least significant bits (LSB) to embed secret data into a cover image and produces stego-images with slightly better quality in terms of PSNR.
Abstract: Internet of things (IoT) realizes the concept of bringing things connected together. Data are exchanged and controlled within one or more IoT networks. Sensitive data transferred between different IoT networks may also lead to data leakage. One way to reduce the risk of these problems is to employ steganography while delivering secret information over the IoT networks. This paper presents a steganographic scheme that employs vector quantization (VQ) transformation and the least significant bits (LSB) to embed secret data into a cover image. We devise a new technique, namely two-level encoding, to separate the pixels of a $4\times 4$ VQ-transformed image block into two groups, the LSB group and the secret data group, in the first level. Then we use an indirect approach that embeds VQ indexes in the LSB group and secret data in the secret data group in the second level. The embedded VQ indexes are used to represent the VQ-transformed image blocks, and the secret data are used as the difference values to adjust the VQ-transformed image blocks and to create stego-image blocks, such that the stego-image blocks become more similar to the original image blocks after embedding. Compared with other similar work, the experimental results show that the proposed scheme produces stego-images with slightly better quality in terms of PSNR; the experimental results also indicate that it provides an embedding capacity about ten times as large as that of the prior similar schemes. Moreover, the proposed scheme is able to pass popular detection tests, such as the Chi-square test and AUMP LSB, both of which detect whether an image uses LSB for data hiding.
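The LSB building block the scheme relies on can be sketched as plain bit replacement: each carrier pixel keeps its seven most significant bits and stores one payload bit in its lowest bit. This is generic LSB embedding on a flattened $4\times 4$ block, not the paper's two-level encoding; the pixel values, payload bits, and function names are assumptions for illustration.

```python
import numpy as np

def embed_lsb(pixels, bits):
    """Write one payload bit into the LSB of each of the first len(bits) pixels."""
    out = pixels.copy()
    out[: len(bits)] = (out[: len(bits)] & 0xFE) | bits
    return out

def extract_lsb(pixels, n):
    """Read back the first n embedded bits."""
    return pixels[:n] & 1

# Hypothetical flattened 4x4 block (16 pixels) and an 8-bit payload.
block = np.array([52, 55, 61, 66, 70, 61, 64, 73,
                  63, 59, 55, 90, 109, 85, 69, 72], dtype=np.uint8)
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)

stego = embed_lsb(block, bits)
recovered = extract_lsb(stego, len(bits))
max_change = int(np.abs(stego.astype(int) - block.astype(int)).max())
```

Each carrier pixel changes by at most one gray level, which is why LSB embedding has little effect on PSNR.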
TL;DR: In this article, a quantile-based estimator is presented, which is based on the Gauss-Markov theorem, and applied to both direct current and alternating current input signals with unknown characteristics.
Abstract: The estimation of signal parameters using quantized data is a recurrent problem in electrical engineering. As an example, this includes the estimation of a noisy constant value and of the parameters of a sinewave, that is, its amplitude, initial record phase, and offset. Conventional algorithms, such as the arithmetic mean in the case of the estimation of a constant, are known not to be optimal in the presence of quantization errors. They provide biased estimates if particular conditions regarding the quantization process are not met, as usually happens in practice. In this paper, a quantile-based estimator is presented, which is based on the Gauss-Markov theorem. The general theory is first described and the estimator is then applied to both direct current and alternating current input signals with unknown characteristics. Using simulations and experimental results, it is shown that the new estimator outperforms conventional estimators in both problems, by removing the estimation bias.
TL;DR: A system is proposed that performs gender classification in a reliable, efficient, and robust way by combining an image processing algorithm with data mining methodologies.
Abstract: Face recognition is widely used in many applications and has been researched extensively for decades. Face recognition in combination with gender classification is a present-day need. Gender classification has many specifications which have to be understood and calculated. This paper proposes a system that can perform gender classification in a reliable, efficient, and robust way. The technique is a combination of an image processing algorithm and data mining methodologies. The system applies the standard steps of image processing, such as acquisition, pre-processing, and feature extraction using the LBG vector quantization method; the extracted features are passed to data mining algorithms such as Naive Bayes, SVM Poly Kernel, SVM RBF Kernel, and KNN for classification. Classification results are obtained for the above classification techniques and analysis is performed on these results.
TL;DR: A sub-selection based matrix manipulation algorithm is proposed, which can significantly reduce the computational cost of code learning and justify the resulting sub-selective quantization by proving its theoretic properties.
Abstract: Recently, with the explosive growth of visual content on the Internet, large-scale image search has attracted intensive attention. It has been shown that mapping high-dimensional image descriptors to compact binary codes can lead to considerable efficiency gains in both storage and performing similarity computation of images. However, most existing methods still suffer from expensive training devoted to large-scale binary code learning. To address this issue, we propose a sub-selection based matrix manipulation algorithm, which can significantly reduce the computational cost of code learning. As case studies, we apply the sub-selection algorithm to several popular quantization techniques including cases using linear and nonlinear mappings. Crucially, we can justify the resulting sub-selective quantization by proving its theoretic properties. Extensive experiments are carried out on three image benchmarks with up to one million samples, corroborating the efficacy of the sub-selective quantization method in terms of image retrieval.
TL;DR: By publishing the framework and the respective theoretical considerations and justifications before finalizing the numerical experiments, the authors hope to jump-start the incorporation of prototype-based learning in neural networks and vice versa.
Abstract: Neural networks currently dominate the machine learning community and they do so for good reasons. Their accuracy on complex tasks such as image classification is unrivaled at the moment and with recent improvements they are reasonably easy to train. Nevertheless, neural networks are lacking robustness and interpretability. Prototype-based vector quantization methods on the other hand are known for being robust and interpretable. For this reason, we propose techniques and strategies to merge both approaches. This contribution will particularly highlight the similarities between them and outline how to construct a prototype-based classification layer for multilayer networks. Additionally, we provide an alternative, prototype-based approach to the classical convolution operation. Numerical results are not part of this report; instead, the focus lies on establishing a strong theoretical framework. By publishing our framework and the respective theoretical considerations and justifications before finalizing our numerical experiments, we hope to jump-start the incorporation of prototype-based learning in neural networks and vice versa.
TL;DR: This work proposes an accelerating simulation for MPS using vector quantization (VQ), called VQ-MPS, which exhibits significantly better performance in terms of computational time, pattern reproductions, and spatial uncertainty.
Abstract: Multiple-point statistics (MPS) is a prominent algorithm to simulate categorical variables based on a sequential simulation procedure. Assuming training images (TIs) as prior conceptual models, MPS extracts patterns from TIs using a template and records their occurrences in a database. However, complex patterns increase the size of the database and require considerable time to retrieve the desired elements. In order to speed up simulation and improve simulation quality over state-of-the-art MPS methods, we propose an accelerating simulation for MPS using vector quantization (VQ), called VQ-MPS. First, a variable representation is presented to make categorical variables applicable for vector quantization. Second, we adopt a tree-structured VQ to compress the database so that stationary simulations are realized. Finally, a transformed template and classified VQ are used to address nonstationarity. A two-dimensional (2D) stationary channelized reservoir image is used to validate the proposed VQ-MPS. In comparison with several existing MPS programs, our method exhibits significantly better performance in terms of computational time, pattern reproductions, and spatial uncertainty. Further demonstrations consist of a 2D four facies simulation, two 2D nonstationary channel simulations, and a three-dimensional (3D) rock simulation. The results reveal that our proposed method is also capable of solving multifacies, nonstationarity, and 3D simulations based on 2D TIs.
TL;DR: Two fragile watermarking schemes for tamper localization in digital images can resist collage, vector quantization, content only and constant average attacks and provide better tamper detection rate.
Abstract: This paper presents two fragile watermarking schemes for tamper localization in digital images. In the first scheme, the unprotected image is divided into $8\times 8$ blocks. A 128-bit hash value is computed for each block by using the MD5 algorithm. The hash bits are embedded into the two LSBs of the pixel values. The same pattern is used in the second scheme with slight modifications: the block size is $16\times 16$, the SHA-256 algorithm is used to generate a 256-bit hash value, and the hash bits are inserted into the LSB of the pixel values. Experimental results show that the schemes can resist collage, vector quantization, content only and constant average attacks. The schemes provide a better tamper detection rate.
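The sizes in the first scheme line up exactly: an $8\times 8$ block has 64 pixels, and two LSBs per pixel give 128 bits, the length of an MD5 digest (likewise $16\times 16 = 256$ pixels with one LSB each for SHA-256). The sketch below illustrates that bookkeeping. It assumes the hash is computed over the block with its two LSBs cleared, which the abstract does not specify, and the function names are hypothetical.

```python
import hashlib
import numpy as np

def embed_block_hash(block):
    """block: (8, 8) uint8. Returns the block with the MD5 of its 6 MSB planes in its 2 LSBs."""
    msb = block & 0xFC                                    # clear the two LSBs (assumed hash input)
    digest = hashlib.md5(msb.tobytes()).digest()          # 16 bytes = 128 bits
    bits = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))
    pairs = bits.reshape(64, 2)                           # two hash bits per pixel
    two_bit = (pairs[:, 0] << 1) | pairs[:, 1]
    return msb | two_bit.reshape(8, 8).astype(np.uint8)

def block_is_intact(block):
    """A block verifies if re-embedding its own hash leaves it unchanged."""
    return np.array_equal(embed_block_hash(block), block)

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)   # hypothetical image block
marked = embed_block_hash(original)

tampered = marked.copy()
tampered[0, 0] ^= 0x10                                    # flip one content bit
```

Running the check block by block is what localizes tampering: only the blocks whose stored hash no longer matches their content are flagged.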
TL;DR: The proposed hybrid approach achieves fusion of the conventional global and patch-based approaches for target representation to synergize the advantages of both approaches and outperforms all the state-of-the-art algorithms in all considered scenarios.
Abstract: Arbitrary object tracking is a challenging task in computer vision, as many factors affecting the target representation must be considered. A target template based on only the global appearance or on only the local appearance is unable to capture the discriminating information required for the robust performance of a tracker. In this paper, the target appearance is represented using a hybrid of global and local appearances along with a framework to exploit the Integral Channel Features (ICF). The proposed hybrid approach achieves fusion of the conventional global and patch-based approaches for target representation to synergize the advantages of both approaches. The ICF approach under the hybrid approach integrates heterogeneous sources of information of the target and provides feature strength to the hybrid template. The use of ICF also expedites the extraction of the structural and color features from video frames as the features are collected over multiple channels. The target appearance representation is updated based on only samples with appearances similar to the target appearance using clustering and vector quantization. These factors offer the proposed algorithm robustness to occlusion, illumination changes, and in-plane rotation. Further experimentation analyzes the effects of a change in the scale of the bounding box on the tracking performance of the proposed algorithm. The proposed approach outperforms all the state-of-the-art algorithms in all considered scenarios.
TL;DR: A novel reversible data hiding scheme for Two-stage VQ (Vector quantization) compressed images based on SOC (Search-order coding) scheme is proposed, which improves VQ by obtaining better reconstructed image and generating indices with higher correlation.
TL;DR: The CVQ-SA algorithm, with codebook optimization by Simulated Annealing for the compression of CT images, was validated in terms of metrics like Peak Signal-to-Noise Ratio, Mean Square Error and Compression Ratio, and the result was superior when compared with classical VQ, CVQ, JPEG lossless and JPEG lossy algorithms.
Abstract: The role of compression is vital in telemedicine for the storage and transmission of medical images. This work is based on the Contextual Vector Quantization (CVQ) compression algorithm with codebook optimization by Simulated Annealing (SA) for the compression of CT images. The region of interest (foreground) and background are separated initially by a region growing algorithm. The region of interest is encoded with a low compression ratio and high bit rate; the background region is encoded with a high compression ratio and low bit rate. The codebooks generated from the foreground and background are merged and optimized by the simulated annealing algorithm. The performance of the CVQ-SA algorithm was validated in terms of metrics like Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE) and Compression Ratio (CR), and the result was superior when compared with classical VQ, CVQ, JPEG lossless and JPEG lossy algorithms. The algorithms are developed in Matlab 2010a and tested on real-time abdomen CT datasets. The quality of the reconstructed image was also validated by metrics like Structural Content (SC), Normalized Absolute Error (NAE) and Normalized Cross Correlation (NCC), and statistical analysis was performed by the Mann-Whitney U test. The outcome of this work will be an aid in the field of telemedicine for the transfer of medical images.
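For readers unfamiliar with the three headline metrics, the Python sketch below shows their standard definitions. It is only an illustration, not the authors' Matlab code: images are represented as flat lists of pixel values, and the `peak` parameter defaults to 255 on the assumption of 8-bit data (CT data often uses a larger bit depth, in which case `peak` would need to change).

```python
import math


def mse(original, reconstructed):
    """Mean Square Error between two equally sized pixel sequences."""
    n = len(original)
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / n


def psnr(original, reconstructed, peak=255):
    """Peak Signal-to-Noise Ratio in dB; infinite for identical images."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


def compression_ratio(original_bytes, compressed_bytes):
    """Ratio of the original size to the compressed size."""
    return original_bytes / compressed_bytes
```

A higher PSNR and a lower MSE both indicate a reconstruction closer to the original, while a higher CR indicates a smaller compressed file.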
TL;DR: In this paper, a new lossy compression scheme is proposed which employs the codebook concept; for the generation of the codebook, a new technique denoted ABC-GA, which is a combination of artificial bee colony and genetic algorithms, is employed.
Abstract: In recent years, the volume of image data employed for Internet and other applications has been increasing at an enormous rate. To cope with the existing limitations on storage space and network bandwidth, it has become necessary to develop more efficient compression techniques. Lossy compression is more popular than lossless compression, as it is used in a wider variety of applications. In lossy compression, it is necessary to maintain the quality of the reconstructed image when the compression scheme is applied. Thus, compression ratio and reconstructed image quality are the two important parameters on which the performance of a lossy compression scheme is judged. In this paper, a new lossy compression scheme is proposed which employs the codebook concept. For the generation of the codebook, a new technique denoted ABC-GA, which is a combination of artificial bee colony and genetic algorithms, is employed. The performance of the proposed compression scheme is evaluated using two different types of databases, namely, CLEF med 2009 and standard images (Lena, Barbara etc.). The experimental results show that the proposed technique performs better than the existing algorithms, yielding average PSNR values of 43.05, 41.58, 40.06, 37.41, 35.24 for compression ratios 10, 20, 40, 60, 80 respectively in the case of standard images.
TL;DR: This project applies a speaker recognition technique to classify the speaker's voice in the evidence against the voice of the suspect, using the Learning Vector Quantization Neural Network method.
Abstract: Biometric features are now often used to identify suspects in law enforcement processes. One of these biometric features is the voice: speaker recognition is used to discriminate between people by their voices. The problem addressed in this study is how to classify an audio sample found in the evidence against the voice of the suspect. In this final project, a prototype application is built that applies a speaker recognition technique to classify the speaker's voice in the evidence and the voice of the suspect. To compare the recordings, sound features are extracted using the Mel-frequency Cepstral Coefficients (MFCC) method, and the Learning Vector Quantization Neural Network (JST-LVQ) method is used to classify the extracted features. With LVQ, the accuracy in recognizing the speaker's voice is fairly good: the best accuracy is 73.33% when recognizing a speaker uttering the same sentence, and 46.67% for a different sentence. The results obtained are thus in accordance with expectations.
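The classification step can be illustrated with the basic LVQ1 update rule: the prototype nearest to a training sample is pulled toward it when their labels agree and pushed away when they differ. The Python sketch below is a minimal version under stated assumptions. The abstract does not say which LVQ variant, learning rate, or number of prototypes was used, so `lr`, `epochs`, and the function names are hypothetical, and plain feature vectors stand in for the MFCC features.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(prototypes[i][0], x)),
    )


def lvq1_train(prototypes, samples, lr=0.1, epochs=20):
    """Train with the LVQ1 rule.

    prototypes and samples are lists of (vector, label) pairs.
    """
    protos = [(list(w), lab) for w, lab in prototypes]
    for _ in range(epochs):
        for x, label in samples:
            i = nearest(protos, x)
            w, lab = protos[i]
            # Attract the winning prototype on a label match, repel otherwise.
            sign = 1.0 if lab == label else -1.0
            protos[i] = ([wj + sign * lr * (xj - wj) for wj, xj in zip(w, x)], lab)
    return protos


def lvq_predict(protos, x):
    """Label of the prototype nearest to x."""
    return protos[nearest(protos, x)][1]
```

In a speaker-recognition setting, each label would correspond to a speaker, and a new utterance would be assigned to the speaker whose prototype lies closest to its feature vector.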
TL;DR: Results show higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), indicating better reconstruction, and the superiority of the proposed hybrid Adaptive Differential Evolution and Pattern Search (hADE-PS) optimized vector quantization over DE is demonstrated.
Abstract: A novel Vector Quantization (VQ) technique for encoding the bi-orthogonal wavelet decomposed image using a hybrid of Adaptive Differential Evolution (ADE) and a Pattern Search (PS) optimization algorithm (hADE-PS) is proposed. ADE is a modified version of Differential Evolution (DE) in which the mutation operation is made adaptive based on the ascending/descending objective function or fitness value. It is tested on twelve numerical benchmark functions, and the results are compared with and found better than those of the Genetic Algorithm (GA), ordinary DE and FA. ADE is a global optimizer which explores the global search space and PS is a local optimizer which exploits a local search space, so ADE is hybridized with PS. In the proposed VQ, 62.5% of the codewords in the codebook are assigned and optimized for the approximation coefficients, and the remaining 37.5% are equally assigned to the horizontal, vertical and diagonal coefficients. The superiority of the proposed hybrid Adaptive Differential Evolution and Pattern Search (hADE-PS) optimized vector quantization over DE is demonstrated. The proposed technique is compared with DE-based VQ and ADE-based quantization and with the standard LBG algorithm. Results show higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), indicating better reconstruction.
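The codeword split described above amounts to 5/8 of the codebook for the approximation subband and 1/8 (12.5%) for each of the three detail subbands. The short Python sketch below works through that arithmetic. The codebook size is not given in the abstract, so the value 256 used in the usage note is purely an assumed example, and the function name is hypothetical.

```python
def allocate_codewords(codebook_size):
    """Split a codebook 62.5% / 12.5% / 12.5% / 12.5% across wavelet subbands.

    The size must be divisible by 8 so that every share is a whole number.
    """
    if codebook_size % 8 != 0:
        raise ValueError("codebook_size must be divisible by 8")
    detail = codebook_size // 8           # 12.5% for each detail subband
    return {
        "approximation": 5 * detail,      # 62.5% of the codebook
        "horizontal": detail,
        "vertical": detail,
        "diagonal": detail,
    }
```

For an assumed codebook of 256 codewords, `allocate_codewords(256)` gives 160 codewords for the approximation coefficients and 32 for each of the horizontal, vertical and diagonal coefficients.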
TL;DR: In this paper, the authors propose a method for learning policies and compact state representations separately but simultaneously for policy approximation in reinforcement learning, where the encoder autonomously selects observations online to train on, in order to maximize code sparsity.
Abstract: Deep reinforcement learning, applied to vision-based problems like Atari games, maps pixels directly to actions; internally, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it. By separating the image processing from decision-making, one could better understand the complexity of each task, as well as potentially find smaller policy representations that are easier for humans to understand and may generalize better. To this end, we propose a new method for learning policies and compact state representations separately but simultaneously for policy approximation in reinforcement learning. State representations are generated by an encoder based on two novel algorithms: Increasing Dictionary Vector Quantization makes the encoder capable of growing its dictionary size over time, to address new observations as they appear in an open-ended online-learning context; Direct Residuals Sparse Coding encodes observations by disregarding reconstruction error minimization, and aiming instead for highest information inclusion. The encoder autonomously selects observations online to train on, in order to maximize code sparsity. As the dictionary size increases, the encoder produces increasingly larger inputs for the neural network: this is addressed by a variation of the Exponential Natural Evolution Strategies algorithm which adapts its probability distribution dimensionality along the run. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on the game's controls). These are still capable of achieving results comparable---and occasionally superior---to state-of-the-art techniques which use two orders of magnitude more neurons.