TL;DR: Vector quantization is intrinsically superior to predictive coding, transform coding, and other suboptimal and ad hoc procedures since it achieves optimal rate distortion performance subject only to a constraint on memory or block length of the observable signal segment being encoded.
Abstract: Vector quantization is intrinsically superior to predictive coding, transform coding, and other suboptimal and ad hoc procedures since it achieves optimal rate distortion performance subject only to a constraint on memory or block length of the observable signal segment being encoded. The key limitation of existing techniques is the very large randomly generated code books which must be stored, and the computational complexity of the associated encoding procedures. The quantization operation is decomposed into its rudimentary structural components. This leads to a simple and elegant approach to derive analytical properties of optimal quantizers. Some useful properties of quantizers and algorithmic approaches are given, which are relevant to the complexity of both storage and processing in the encoding operation. Highly disordered quantizers, which have been designed using a clustering algorithm, are considered. Finally, lattice quantizers are examined which circumvent the need for a code book by using a highly structured code based on lattices. The code vectors are algorithmically generated in a simple manner rather than stored in a code book, and fast algorithms perform the encoding algorithm with negligible complexity.
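To illustrate how a lattice quantizer can encode without a stored code book, here is a minimal Python sketch. The abstract does not say which lattices are used, so the choice of the D_n lattice (integer vectors whose coordinates sum to an even number) and the function names are assumptions made purely for illustration. The nearest lattice point is found by rounding, plus one correction when the parity is wrong.

```python
import math

def round_nearest(v):
    # Round to the nearest integer, with ties going upward (avoids banker's rounding).
    return math.floor(v + 0.5)

def nearest_dn(x):
    """Return the point of D_n (integer vectors with even coordinate sum) nearest to x."""
    f = [round_nearest(v) for v in x]
    if sum(f) % 2 == 0:
        return f
    # Odd sum: re-round the coordinate with the largest rounding error the other way.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f

print(nearest_dn([0.6, 0.1]))       # parity correction applied
print(nearest_dn([1.2, 0.9]))       # plain rounding already has an even sum
print(nearest_dn([0.4, 0.4, 1.1]))
```

The cost is one rounding per coordinate plus one comparison pass, independent of how many code vectors the lattice offers.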
TL;DR: Experimental results show that the quantizer performance is very close to a theoretically predicted asymptotically optimal rate distortion relationship for Euclidean distance measures.
Abstract: In this paper, we present a multiple stage vector quantization technique which allows easy expansion of the original vector quantizer design to operate at higher bit rates for lower distortion. The computation and storage reduction is achieved by the fact that the overall requirements are the sum of the requirements of each stage instead of an exponentially increasing function of the bit rate as in the original one stage design. In the case of Euclidean distance measures such as the log area ratio measure, experimental results show that the quantizer performance is very close to a theoretically predicted asymptotically optimal rate distortion relationship.
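The additive storage claim can be made concrete with a small Python sketch of a two-stage quantizer in which the second stage quantizes the residual left by the first. The codebooks, the squared-error distortion, and the function names below are illustrative assumptions, not the design from the paper.

```python
def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(codebook, x):
    # Index of the codeword with minimum squared error to x.
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i], x))

def msvq_encode(stages, x):
    """Quantize x stage by stage; each stage encodes the previous stage's residual."""
    indices, residual = [], list(x)
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def msvq_decode(stages, indices):
    """Reconstruct by summing the selected codeword of every stage."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

# Hypothetical codebooks: 2 coarse vectors, then 4 refinement vectors.
stages = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[-1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
]
indices = msvq_encode(stages, [4.8, 3.1])
print(indices, msvq_decode(stages, indices))
```

Here 2 + 4 = 6 stored vectors give 2 x 4 = 8 distinct reproductions, and the search examines 6 candidates instead of 8; the gap widens quickly as the bit rate grows.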
TL;DR: Finite-state vector quantizers are designed and simulated for Gauss-Markov sources and sampled speech data, and the resulting performance and storage requirements are compared with ordinary memoryless vector quantization.
Abstract: A finite-state vector quantizer is a finite-state machine used for data compression: Each successive source vector is encoded into a codeword using a minimum distortion rule and a code book that depends on the encoder state. The current state and the selected codeword then determine the next encoder state. A finite-state vector quantizer is capable of making better use of the memory in a source than is an ordinary memoryless vector quantizer of the same dimension or blocklength. Design techniques are introduced for finite-state vector quantizers that combine ad hoc algorithms with an algorithm for the design of memoryless vector quantizers. Finite-state vector quantizers are designed and simulated for Gauss-Markov sources and sampled speech data, and the resulting performance and storage requirements are compared with ordinary memoryless vector quantization.
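The encoder loop described above can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the two per-state code books, the next-state rule (the new state is simply the index just selected), and the squared-error distortion.

```python
def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def fsvq_encode(codebooks, next_state, vectors, state=0):
    """Encode each vector with the code book of the current state, then update the state."""
    indices = []
    for x in vectors:
        cb = codebooks[state]
        i = min(range(len(cb)), key=lambda j: sqdist(cb[j], x))
        indices.append(i)
        state = next_state(state, i)
    return indices

def fsvq_decode(codebooks, next_state, indices, state=0):
    """The decoder tracks the same state sequence, so no state bits are transmitted."""
    out = []
    for i in indices:
        out.append(codebooks[state][i])
        state = next_state(state, i)
    return out

# Hypothetical design: state 0 covers low values, state 1 covers high values.
codebooks = {0: [[0.0], [1.0]], 1: [[1.0], [2.0]]}
follow_index = lambda state, index: index

source = [[0.9], [1.8], [1.2], [0.1]]
indices = fsvq_encode(codebooks, follow_index, source)
print(indices, fsvq_decode(codebooks, follow_index, indices))
```

Each sample costs one bit, yet three distinct reproduction levels appear in the output, because the state carries information about the recent past of the source.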
TL;DR: Asymptotic results from the statistical theory of k-means clustering are applied to problems of vector quantization and the behavior of quantizers constructed from long training sequences of data is analyzed.
Abstract: Asymptotic results from the statistical theory of k-means clustering are applied to problems of vector quantization. The behavior of quantizers constructed from long training sequences of data is analyzed by relating it to the consistency problem for k-means.
TL;DR: A comparison of the results for the real speech and the simulated speech provides a quantitative measure of the accuracy of such models and, hence, of the applicability of information theory bounds and code designs based on probabilistic models.
Abstract: An algorithm for the design of vector quantizers that are locally optimum in the sense of minimizing an average quantitative distortion measure is used to design 1 and 2 bit/sample vector quantizers for both real sampled speech and a simulated speech-like auto-regressive random process. Both weighted and unweighted squared-error distortion measures are considered. Several comparisons are made and discussed based on the average distortions of the vector quantization schemes. The results for the simulated speech are compared to mathematical performance bounds from information theory to provide an indication of how nearly globally optimal vector quantization is for such highly correlated sources. A comparison of the results for the real speech and the simulated speech provides a quantitative measure of the accuracy of such models and, hence, of the applicability of information theory bounds and code designs based on probabilistic models. The signal-to-quantization-noise ratios of vector quantizers designed to minimize squared-error distortion are compared to those of several popular speech waveform coding systems of similar rates.
TL;DR: Vector quantization appears to be a powerful and promising technique for image coding and results for coding rates from 0.5 to 1.5 bits/pixel are discussed.
Abstract: An image is partitioned into cells of p x p pixels. Each cell is regarded as a vector of dimension p^2 and is encoded by searching through a codebook for a nearest matching representative vector. A binary word identifying the selected representative vector is assigned as the codeword to describe the original cell. The decoder uses this codeword to address a codebook. Each entry of the codebook contains a full precision digital representation of one of the N representative vectors. The codebook design is based on a clustering technique for vector quantizer design preceded by a classification of training cells into edge or shade cells. Results for coding rates from 0.5 to 1.5 bits/pixel are discussed. Vector quantization appears to be a powerful and promising technique for image coding.
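The cell-by-cell encoding and the resulting bit rate can be sketched as follows. The tiny image, the two-entry codebook, the squared-error match, and the function names are hypothetical; the sketch also assumes the image dimensions are multiples of p and omits the edge/shade classification step.

```python
import math

def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def cells(image, p):
    """Yield the p x p cells of the image in raster order, each flattened to length p*p."""
    for r in range(0, len(image), p):
        for c in range(0, len(image[0]), p):
            yield [image[r + i][c + j] for i in range(p) for j in range(p)]

def encode_image(image, p, codebook):
    """Replace every cell by the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda k: sqdist(codebook[k], cell))
            for cell in cells(image, p)]

def rate_bits_per_pixel(n_codewords, p):
    # log2(N) bits identify one of N vectors, shared by the p*p pixels of a cell.
    return math.log2(n_codewords) / (p * p)

image = [
    [10, 12, 250, 251],
    [11, 9, 249, 252],
    [248, 250, 8, 10],
    [251, 247, 12, 11],
]
codebook = [[10] * 4, [250] * 4]   # one dark and one bright 2 x 2 cell
print(encode_image(image, 2, codebook))
print(rate_bits_per_pixel(256, 4))  # e.g. N = 256 and p = 4 give 0.5 bit/pixel
```

The rate formula shows the trade-off behind the quoted range: larger cells lower the rate for a fixed N, while larger codebooks raise it.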
TL;DR: The approach is a generalization of a recently developed speech coding technique called speech coding by vector quantization based on the minimization of cross-entropy, and can be viewed as a refinement of a general classification method due to Kullback.
Abstract: This paper considers the problem of classifying an input vector of measurements by a nearest neighbor rule applied to a fixed set of vectors. The fixed vectors are sometimes called characteristic feature vectors, codewords, cluster centers, models, reproductions, etc. The nearest neighbor rule considered uses a non-Euclidean information-theoretic distortion measure that is not a metric, but that nevertheless leads to a classification method that is optimal in a well-defined sense and is also computationally attractive. Furthermore, the distortion measure results in a simple method of computing cluster centroids. Our approach is based on the minimization of cross-entropy (also called discrimination information, directed divergence, K-L number), and can be viewed as a refinement of a general classification method due to Kullback. The refinement exploits special properties of cross-entropy that hold when the probability densities involved happen to be minimum cross-entropy densities. The approach is a generalization of a recently developed speech coding technique called speech coding by vector quantization.
TL;DR: The distortion performance of the vector quantization approach for LPC voice coding is examined both analytically and experimentally to show its relationship with the residual minimization process in LPC analysis.
Abstract: The distortion performance of the vector quantization approach for LPC voice coding is examined both analytically and experimentally. Analytically, interpretations of the interparameter coupling effects of a distortion measure and the clustering nature of the algorithm for LPC vector quantization are obtained to show its relationship with the residual minimization process in LPC analysis. Experimentally, a large database of speech is used to compare its performance and properties to scalar quantization. The results lend further insight into the superior performance of vector quantization.
TL;DR: For a memoryless Gaussian source at 1 bit per sample, it is shown that at least three dimensions are required for a vector quantizer to outperform a scalar quantizer, and multiple distinct local optima are demonstrated.
Abstract: Two results are presented on vector quantizers meeting necessary conditions for optimality. First a simple generalization of well-known centroid and moment properties of the squared-error distortion measure to a weighted quadratic distortion measure with an input dependent weighting is presented. The second result is an application of the squared-error special case of the first result to a simulation study of the design of 1 bit per sample two- and three-dimensional quantizers for a memoryless Gaussian source using the generalized Lloyd technique. The existence of multiple distinct local optima is demonstrated, thereby showing that sufficient conditions for unique local optima do not exist for this simple common case. It is also shown that at least three dimensions are required for a vector quantizer to outperform a scalar quantizer for this source.
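A minimal Python sketch of a Lloyd-style iteration shows how distinct local optima can arise. It alternates nearest-neighbor partitioning with centroid updates under squared error. The four-point training set and the two starting codebooks are toy assumptions chosen so the effect is visible by hand; they are not the Gaussian experiment of the paper.

```python
def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def lloyd(data, codebook, max_iters=50):
    """Alternate nearest-neighbor partitioning and centroid updates until nothing changes."""
    codebook = [list(c) for c in codebook]
    for _ in range(max_iters):
        groups = [[] for _ in codebook]
        for x in data:
            i = min(range(len(codebook)), key=lambda j: sqdist(codebook[j], x))
            groups[i].append(x)
        # Each codeword moves to the centroid of its cell; empty cells keep their codeword.
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, codebook)]
        if updated == codebook:
            break
        codebook = updated
    return codebook

def total_distortion(data, codebook):
    return sum(min(sqdist(c, x) for c in codebook) for x in data)

data = [(0, 0), (4, 0), (0, 1), (4, 1)]    # corners of a 4 x 1 rectangle
cb_a = lloyd(data, [(0, 0), (0, 1)])        # start split along the short side
cb_b = lloyd(data, [(0, 0), (4, 0)])        # start split along the long side
print(cb_a, total_distortion(data, cb_a))
print(cb_b, total_distortion(data, cb_b))
```

Both results satisfy the nearest-neighbor and centroid conditions, yet their distortions differ by a factor of 16, so meeting the necessary conditions does not guarantee a global optimum.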
TL;DR: The type of stochastic stability obtained gives almost-sure convergence of time averages of functions of the joint input-state-output process.
Abstract: Feedback quantization schemes (such as delta modulation, adaptive quantization, differential pulse code modulation (DPCM), and adaptive differential pulse code modulation (ADPCM)) encode an information source by quantizing the source letter at each time i using a quantizer, which is uniquely determined by examining some function of the past outputs and inputs called the state of the encoder at time i. The quantized output letter at time i is fed back to the encoder, which then moves to a new state at time i+1 which is a function of the state at time i and the encoder output at time i. In an earlier paper a stochastic stability result was obtained for a class of feedback quantization schemes which includes delta modulation and some adaptive quantization schemes. In this paper a similar result is obtained for a class of feedback quantization schemes which includes linear DPCM and some ADPCM encoding schemes. The type of stochastic stability obtained gives almost-sure convergence of time averages of functions of the joint input-state-output process. This is stronger than the type of stochastic stability obtained previously by Gersho, Goodman, Goldstein, and Liu, who showed convergence in distribution of the time i input-state-output as i → ∞.
TL;DR: The paper reports also on results of MSC coding of speech, where both the strategy of adaptive quantization and of adaptive prediction were included in coder design.
Abstract: This paper deals with the application of multipath search coding (MSC) concepts to the coding of stationary memoryless and correlated sources and of speech signals at a rate of one bit per sample. We have made use of three MSC classes: 1) codebook coding (vector quantization), 2) tree coding, and 3) trellis coding. This paper explains the performances of these coders and compares them both with those of conventional coders and with rate-distortion bounds. Figs. 2 and 3 demonstrate the potentials of MSC coding strategies. The paper reports also on results of MSC coding of speech, where both the strategy of adaptive quantization and of adaptive prediction were included in coder design.
TL;DR: Vector quantizers of one and two bits per sample are designed for a training sequence of 640000 speech samples and tested on a speaker not in the training sequence; both full search and tree search vector quantizers are considered.
Abstract: Vector quantizers of one and two bits per sample are designed for a training sequence of 640000 speech samples and tested on a speaker not in the training sequence. Both full search vector quantizers and tree search vector quantizers are considered. The tree searched codes are suboptimal in an information theory sense, but they have a greatly reduced search effort and provide a vector successive approximation quantizer.
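The trade-off between full search and tree search can be sketched with a depth-2 binary tree. The tree layout, its scalar test vectors, and the function names are hypothetical and chosen only to make the suboptimality easy to verify by hand.

```python
def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def node(vec, kids=()):
    return {"vec": vec, "kids": list(kids)}

def tree_encode(root, x):
    """Descend the tree, choosing the closer child at each level; return (path, leaf vector)."""
    path, current = [], root
    while current["kids"]:
        kids = current["kids"]
        i = min(range(len(kids)), key=lambda j: sqdist(kids[j]["vec"], x))
        path.append(i)
        current = kids[i]
    return path, current["vec"]

def leaves(n):
    if not n["kids"]:
        return [n["vec"]]
    return [v for k in n["kids"] for v in leaves(k)]

def full_search(root, x):
    """Exhaustive search over all leaves, for comparison."""
    return min(leaves(root), key=lambda v: sqdist(v, x))

# Hypothetical tree: each internal test vector is the mean of the leaves below it.
root = node(None, [
    node([2.0], [node([1.0]), node([3.0])]),
    node([8.0], [node([6.0]), node([10.0])]),
])

print(tree_encode(root, [4.8]))   # 4 distance computations (2 per level)
print(full_search(root, [4.8]))   # 4 leaves here, but 2**depth in general
```

For the input 4.8 the tree search settles on 3.0 while the full search finds the closer 6.0, which is the suboptimality the abstract refers to. The path bits also act as a successive approximation: the first bit alone already identifies the coarse reproduction 2.0.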
TL;DR: An isolated word recognizer based on vector quantization at the acoustic level and on stochastic modeling at the phonetic level is described and results obtained are encouraging and suggest that further optimization is possible.
Abstract: An isolated word recognizer based on vector quantization at the acoustic level and on stochastic modeling at the phonetic level is described. The power of this approach lies in its best utilization of the training data. The first experimental results obtained are encouraging and suggest that further optimization is possible.
TL;DR: The design and simulation of a multirate voice digitizer (MRVD) that switches between two speech compression systems, each based on a recently developed vector quantization (VQ) coding technique, are described; its RELP-VQ system is shown to have a simpler architecture than other RELP systems and to provide comparable speech quality.
Abstract: The importance of integrating voice and data over digital networks has increased during the last few years primarily because of the growing popularity of such networks. Of particular interest are efficient voice digitizing terminals, capable of operating at various data rates in both circuit-switched and packet-switched data networks. Several such terminals, including two or more speech compression algorithms, have been proposed and implemented. Typically the terminal switches between a low-rate (500 - 4000 bits/s) vocoding scheme and a medium-rate (7000 - 16000 bits/s) waveform coding algorithm, depending on, among other things, the network congestion and on the desired voice quality and robustness. We here describe the design and simulation of a multirate voice digitizer (MRVD) that switches between two speech compression systems, each based on a recently developed vector quantization (VQ) coding technique. This technique consists of the off-line interactive design of a codebook minimizing an average distortion measure, followed by the use of the codebook in an on-line nearest neighbor encoding scheme. One of the two systems is a rate-distortion speech coder that resembles a linear predictive coding (LPC) speech compression system but has a much lower rate (800 bits/s and below). We call this the LPC-VQ system, and it is similar to other previously reported systems [15],[19],[21]. The only difference is that the LPC parameters are extracted using the Burg method instead of the autocorrelation method. We here show that this provides both qualitative and quantitative improvements. The other system of our MRVD is a residual-excited linear predictive (RELP) speech compression system using VQ in both model selection and residual digitization. The residual waveform is digitized at 1 or 2 bits/sample, resulting in rates of 7300 and 13800 bits/s, respectively. We call this the RELP-VQ system. 
When compared to other RELP systems [6]-[8], it is shown to have a simpler architecture and to provide comparable speech quality. In a direct comparison with an APC scheme, our RELP-VQ system was determined to provide a more natural speech sound. Another interesting result presented is the quantitative comparison of the application of the VQ algorithm to the original speech waveform and its residuals.
TL;DR: A method for speaker independent isolated digit recognition based on modeling entire words as discrete probabilistic functions of a Markov process is described; training comprises linear prediction analysis, vector quantization of the LPCs, and an algorithm for estimating the parameters of a hidden Markov process.
Abstract: A method for speaker independent isolated digit recognition based on modeling entire words as discrete probabilistic functions of a Markov process is described. Training is a three‐part process comprising conventional methods of linear prediction analysis and vector quantization of the LPCs followed by an algorithm [L. E. Baum, Inequalities 3, 1–8 (1972)] for estimating the parameters of a hidden Markov process. Recognition utilizes linear prediction and vector quantization steps prior to maximum likelihood classification based on the Viterbi algorithm [A. J. Viterbi, IEEE Trans. Inf. Theo. IT‐13, 260–269 (1967)]. After training based on a 1000‐token set, recognition experiments were conducted on a separate 1000‐token test set obtained from 100 new talkers. In this test a 3.5% error rate was observed which is comparable to that measured in an identical test of an LPC/DTW system [L. R. Rabiner et al., IEEE Trans. Acoust. Speech Signal Process. ASSP‐27, 336–349 (1979)]. The computational demand for recognit...
TL;DR: It is shown that an optimum multidimensional quantizer preserves the mean vector of the input and that the mean square quantization error is given by the sum of the component variances of the input minus the sum of the variances of the output.
Abstract: Two results in minimum mean square error quantization theory are presented. The first section gives a simplified derivation of a well-known upper bound to the distortion introduced by a k-dimensional optimum quantizer. It is then shown that an optimum multidimensional quantizer preserves the mean vector of the input and that the mean square quantization error is given by the sum of the component variances of the input minus the sum of the variances of the output.
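Both properties can be checked numerically on a toy example. The sketch below assumes a small two-dimensional data set with equal sample weights and a two-word quantizer whose codewords are the centroids of their cells, the condition under which the properties hold; the data and names are illustrative only.

```python
def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sum_of_variances(vectors):
    """Sum over components of the (population) variance of that component."""
    m = mean_vector(vectors)
    n = len(vectors)
    return sum(sum((v[k] - m[k]) ** 2 for v in vectors) / n for k in range(len(m)))

def quantize(codebook, x):
    return min(codebook, key=lambda c: sqdist(c, x))

inputs = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
codebook = [(0.0, 0.5), (4.0, 0.5)]   # each codeword is the centroid of its cell
outputs = [quantize(codebook, x) for x in inputs]

mse = sum(sqdist(x, y) for x, y in zip(inputs, outputs)) / len(inputs)
print(mean_vector(inputs), mean_vector(outputs))
print(mse, sum_of_variances(inputs) - sum_of_variances(outputs))
```

The input and output means coincide, and the mean square error equals the drop in total variance from input to output.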
TL;DR: A new, fast method for discrete utterance recognition of telephone bandwidth speech that obviates time normalization and uses approximately 6000 bits to represent each utterance in the recognition vocabulary is presented.
Abstract: We present a new, fast method for discrete utterance recognition of telephone bandwidth speech. The method is based on speech coding by vector quantization and minimum cross-entropy pattern classification. Separate vector quantization codebooks are designed from training sequences for each word in the recognition vocabulary. Inputs from outside the training sequence are classified by performing vector quantization and finding the codebook that achieves the lowest average distortion per speech frame. The new method obviates time normalization and uses approximately 6000 bits to represent each utterance in the recognition vocabulary. Preliminary limited testing on speaker dependent digit recognition has demonstrated excellent performance. Detailed tests are now in progress.
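The classification rule (quantize the input with every word's codebook and pick the codebook with the lowest average distortion per frame) can be sketched as below. Squared error stands in for the cross-entropy-based distortion of the paper, and the two-dimensional "frames", the codebooks, and the function names are hypothetical.

```python
def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def average_distortion(codebook, frames):
    """Mean, over frames, of the distortion to the nearest codeword."""
    return sum(min(sqdist(c, f) for c in codebook) for f in frames) / len(frames)

def classify(codebooks, frames):
    """Return the word whose codebook quantizes the utterance with least average distortion."""
    return min(codebooks, key=lambda word: average_distortion(codebooks[word], frames))

# Hypothetical per-word codebooks, one per vocabulary entry.
codebooks = {
    "one": [[0.0, 0.0], [1.0, 0.0]],
    "two": [[5.0, 5.0], [6.0, 5.0]],
}
utterance = [[0.9, 0.1], [0.1, -0.1], [1.1, 0.2]]
print(classify(codebooks, utterance))
```

Because the score is an average over frames with no regard to their order, utterances of different lengths can be compared directly, which is why no time normalization is needed.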
TL;DR: Results indicate that the proposed techniques for very low data rate compression are feasible for intelligible speech transmission at bit rates of 400 bps and 200 bps.
Abstract: Speech compression techniques for very low data rate compression are studied. The techniques are based on a standard LPC analysis/synthesis (vocoder) system. Significant advances are made in the quantization algorithms to achieve bit rates of 200 to 400 bps. Frame predictive vector quantization is developed to compress the bit rate for the LPC model filter to under 250 bps. The vector quantization technique developed applies to continuous speech and is independent of both speaker and vocabulary. An innovative LPC compression technique, matrix quantization, is also developed to compress the LPC model filter to a rate under 150 bps. The design is applicable to continuous speech and unlimited vocabulary. At this stage of development, it is adapted to a single speaker, but theoretically it can be generalized to a selection of speakers or even the general population. Subjective evaluation of both the vector and matrix LPC quantization approaches using the diagnostic rhyme test (DRT) has been performed and the test scores are analyzed in detail. The results indicate that the proposed techniques are feasible for intelligible speech transmission at bit rates of 400 bps and 200 bps.
TL;DR: The importance of prequantization is demonstrated by the design of the optimum uniform two-dimensional (hexagonal) quantizer, which is used to design two-dimensional quantizers that operate in real time.
Abstract: The theoretical advantages of two-dimensional quantization over univariate quantization have been studied in the literature. However, in many cases there is no known implementation for the two-dimensional quantizer that can operate in real time. A new approach to the design of two-dimensional quantizers is presented. This technique, called prequantization, is used to design two-dimensional quantizers that operate in real time. The importance of prequantization is demonstrated by the design of the optimum uniform two-dimensional (hexagonal) quantizer. Additional examples are given to illustrate the flexibility of this design approach.
TL;DR: In this article, the design of an 800 bps LPC vocoder based on vector quantization is presented, and subjective evaluations under different channel error and acoustic-ambient noise conditions are discussed.
Abstract: Design of an 800 bps LPC vocoder based on vector quantization is presented. Subjective evaluations under different channel error and acoustic-ambient noise conditions are discussed. The results indicate that it preserves much of the intelligibility as well as robustness of LPC. Further reduction in bit rate is achieved by eliminating frame to frame redundancy in the vocoder parameters. Techniques include frame repeat coding and the newly developed matrix coding technique.
TL;DR: Two speech compression systems based on codebooks of inverse filters produced by off-line linear predictive coding (LPC) and vector quantization (VQ) techniques are considered.
Abstract: Two speech compression systems based on codebooks of inverse filters produced by off-line linear predictive coding (LPC) and vector quantization (VQ) techniques are considered. The first system is a pitch excited vocoder that is a variation on a speech coding system based upon vector quantization. The encoder selects an LPC inverse filter from a finite codebook that best "matches" an observed frame of sampled speech. This filter is in turn used to determine the voicing and digitized pitch information. Unlike LPC systems, the digitization is performed in a single step on the data rather than in separate modeling and digitization steps. The second system is a tree encoding system that uses the filter selected by an inverse filter matching vocoder to "color" a tree that is then searched for a minimum distortion path for the original sampled speech waveform. This system can be viewed as a hybrid between an adaptive predictive coder and a universal tree encoder. The two systems are described, simulated, and compared with other similar systems.
TL;DR: The paper presents the full description and discusses the performances of a 4800 bit per second residual excited linear prediction vocoder using a type of binary-tree search vector-quantization approach and includes preference tests for comparison with other types of 4800 bit/sec vocoders.
Abstract: The paper presents the full description and discusses the performances of a 4800 bit per second residual excited linear prediction vocoder. The LPC analysis is efficiently performed using a type of binary-tree search vector-quantization approach. The technique, which is described in ref (1), uses a set of hyperplane equations to perform a hierarchical pattern classification of the input autocorrelation vector in the autocorrelation space. The end result of the search is the integer i_1, which is the index of the most appropriate (in the Itakura-distance sense) prediction filter out of a set of N preset filters. The search requires only log2(N) dot products. In this case vector quantization presents two advantages over the classical approach of the Durbin algorithm followed by scalar quantization. First, a faster algorithm is obtained. Second, the same accuracy in filter representation is possible with fewer bits per second and consequently more bits can be allocated for representing the residual and gain. The residual is vector quantized in the time domain by blocks of 16 samples according to the approach of ref (2). The 16-sample block is essentially encoded using the integer i_2, which is the index of the most appropriate 16-sample waveform out of a set of M preset prototype waveforms stored in memory. The paper includes preference tests for comparison with other types of 4800 bit/sec vocoders. Some sample recordings will be presented at the conference. Finally, preliminary results in the attempt to implement the vocoder in real time on a MAP 200 array processor are discussed.
TL;DR: The results of the study indicate that through the use of adaptive prediction and quantization, a high level of image fidelity can be obtained for both intensity and density images at information rates well below one bit/pixel.
Abstract: This paper is a preliminary report on a study of the application of two-dimensional linear prediction in image quantization. The study has focused on three major concerns: implementation of an adaptive linear predictor, adaptive quantization of the prediction error signal, and the adaptive predictive coding of density (logarithm of intensity) images. The results of the study indicate that through the use of adaptive prediction and quantization, a high level of image fidelity can be obtained for both intensity and density images at information rates well below one bit/pixel.
TL;DR: An 800 bit/s vector quantization linear predictive coding (LPC) vocoder has been developed that preserves most of the intelligibility of an LPC system and compatibility with any LPC-10 vocoder is guaranteed.
Abstract: An 800 bit/s vector quantization linear predictive coding (LPC) vocoder has been developed. The recently developed LPC vector quantization theory is applied to reduce the bit rate for LPC coefficients coding by a factor of four. Branch search techniques and separation of voiced and unvoiced codebooks are applied for better algorithm efficiency. Differential coding is applied to reduce the bit rate for the pitch and gain parameters by one third. Formal subjective evaluation shows that the 800 bit/s vocoder preserves most of the intelligibility of an LPC system. It is also robust under different transmission error and acoustic conditions. Informal listening comparisons show the quality to be acceptable and sometimes very close to 2400 bit/s LPC speech. The computational cost of the 800 bit/s vocoder is equivalent to or even lower than the 2400 bit/s LPC-10. Compatibility with any LPC-10 vocoder is guaranteed because the 800 bit/s design only differs in the quantization and encoding algorithms. Further bit rate reduction can be achieved by removing frame to frame redundancy in the code.
TL;DR: Low-rate vector quantizers are designed and simulated for highly correlated Gauss-Markov sources and the resulting performance is compared with Arnstein's optimized predictive quantizer and with Huang and Schultheiss' optimized transform coder.
Abstract: Low-rate vector quantizers are designed and simulated for highly correlated Gauss-Markov sources and the resulting performance is compared with Arnstein's optimized predictive quantizer and with Huang and Schultheiss' optimized transform coder. Two implementations of vector quantizers are considered: full search vector quantizers, which are optimal but require large codebook searches, and tree searched vector quantizers, which are suboptimal but require far less searching. The various systems are compared on the basis of performance, complexity, and generality of design techniques.