TL;DR: The self-organizing map, an architecture suggested for artificial neural networks, is explained by presenting simulation experiments and practical applications, and an algorithm which orders responses spatially is reviewed, focusing on best matching cell selection and adaptation of the weight vectors.
Abstract: The self-organized map, an architecture suggested for artificial neural networks, is explained by presenting simulation experiments and practical applications. The self-organizing map has the property of effectively creating spatially organized internal representations of various features of input signals and their abstractions. One result of this is that the self-organization process can discover semantic relationships in sentences. Brain maps, semantic maps, and early work on competitive learning are reviewed. The self-organizing map algorithm (an algorithm which orders responses spatially) is reviewed, focusing on best matching cell selection and adaptation of the weight vectors. Suggestions for applying the self-organizing map algorithm, demonstrations of the ordering process, and an example of hierarchical clustering of data are presented. Fine tuning the map by learning vector quantization is addressed. The use of self-organized maps in practical speech recognition and a simulation experiment on semantic mapping are discussed.
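The two steps the abstract singles out, best matching cell selection and adaptation of the weight vectors, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a one-dimensional chain of cells, squared Euclidean distance, a rectangular neighbourhood, and a linearly decaying learning rate and radius. The names `best_matching_cell`, `som_step`, and `train` and all numeric constants are hypothetical.

```python
import random

def best_matching_cell(weights, x):
    """Return the index of the cell whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def som_step(weights, x, alpha, radius):
    """Move the winning cell and its chain neighbours toward the input x."""
    c = best_matching_cell(weights, x)
    for i, w in enumerate(weights):
        if abs(i - c) <= radius:  # rectangular neighbourhood (assumption)
            weights[i] = [wk + alpha * (xk - wk) for wk, xk in zip(w, x)]
    return c

def train(n_cells=10, steps=5000, seed=0):
    """Train a 1-D map on scalar inputs drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    weights = [[rng.random()] for _ in range(n_cells)]
    for t in range(steps):
        frac = 1.0 - t / steps
        alpha = 0.5 * frac + 0.01      # decaying learning rate (assumption)
        radius = int(round(3 * frac))  # shrinking neighbourhood (assumption)
        som_step(weights, [rng.random()], alpha, radius)
    return weights
```

Calling `train()` returns the adapted weight vectors. Shrinking the neighbourhood over time is the usual way to let the map order itself globally first and refine locally afterwards.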
TL;DR: It is concluded that the channel-optimized vector quantizer design algorithm, if used carefully, can result in a fairly robust system with no additional delay.
Abstract: Several issues related to vector quantization for noisy channels are discussed. An algorithm based on simulated annealing is developed for assigning binary codewords to the vector quantizer code-vectors. It is shown that this algorithm could result in dramatic performance improvements as compared to randomly selected codewords. A modification of the simulated annealing algorithm for binary codeword assignment is developed for the case where the bits in the codeword are subjected to unequal error probabilities (resulting from unequal levels of error protection). An algorithm for the design of an optimal vector quantizer for a noisy channel is briefly discussed, and its robustness under channel mismatch conditions is studied. Numerical results for a stationary first-order Gauss-Markov source and a binary symmetric channel are provided. It is concluded that the channel-optimized vector quantizer design algorithm, if used carefully, can result in a fairly robust system with no additional delay. The case in which the communication channel is nonstationary (as in mobile radio channels) is studied, and some preliminary ideas for quantizer design are presented.
TL;DR: In this article, a new voice conversion technique through vector quantization and spectrum mapping is proposed, which is based on mapping codebooks which represent the correspondence between different speakers' codebooks.
Abstract: A new voice conversion technique through vector quantization and spectrum mapping is proposed. This technique is based on mapping codebooks which represent the correspondence between different speakers' codebooks. The mapping codebooks for spectrum parameters, power values, and pitch frequencies are separately generated using training utterances. This technique makes it possible to precisely control voice individuality. The performance of this technique is confirmed by spectrum distortion and pitch frequency difference. To evaluate the overall performance of this technique, listening tests are carried out on two kinds of voice conversions: one between male and female speakers, the other between male speakers. In the male-to-female conversion experiment, all converted utterances are judged as female, and in the male-to-male conversion, 57% of them are identified as the target speaker.
TL;DR: The binary switching algorithm is introduced, based on the objective of minimizing a useful upper bound on the average system distortion, which yields a significant reduction in average distortion, and converges in reasonable running times.
Abstract: A pseudo-Gray code is an assignment of n-bit binary indexes to 2^n points in a Euclidean space so that the Hamming distance between two points corresponds closely to the Euclidean distance. Pseudo-Gray coding provides a redundancy-free error protection scheme for vector quantization (VQ) of analog signals when the binary indexes are used as channel symbols on a discrete memoryless channel and the points are signal codevectors. Binary indexes are assigned to codevectors in a way that reduces the average quantization distortion introduced in the reproduced source vectors when a transmitted index is corrupted by channel noise. A globally optimal solution to this problem is generally intractable due to an inherently large computational complexity. A locally optimal solution, the binary switching algorithm, is introduced, based on the objective of minimizing a useful upper bound on the average system distortion. The algorithm yields a significant reduction in average distortion, and converges in reasonable running times. The use of pseudo-Gray coding is motivated by the increasing need for low-bit-rate VQ-based encoding systems that operate on noisy channels, such as in mobile radio speech communications.
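The idea of reassigning indexes so that single-bit channel errors land on nearby codevectors can be illustrated with a simplified local search. The sketch below is not the binary switching algorithm as published: it assumes equally likely codevectors, counts only single-bit errors, and accepts any pairwise index swap that lowers the cost instead of ranking codevectors by their individual contribution. The names `channel_distortion` and `pairwise_switch` are hypothetical.

```python
def hamming1_neighbours(index, n_bits):
    """All indexes that differ from `index` in exactly one bit."""
    return [index ^ (1 << b) for b in range(n_bits)]

def channel_distortion(codevectors, assignment, n_bits):
    """Sum of squared distances over ordered codevector pairs one bit apart.

    assignment[i] is the binary index given to codevector i.
    """
    owner = {idx: i for i, idx in enumerate(assignment)}
    total = 0.0
    for i, idx in enumerate(assignment):
        for nb in hamming1_neighbours(idx, n_bits):
            j = owner[nb]
            total += sum((a - b) ** 2
                         for a, b in zip(codevectors[i], codevectors[j]))
    return total

def pairwise_switch(codevectors, n_bits):
    """Swap index pairs while any swap strictly lowers the cost."""
    assignment = list(range(len(codevectors)))
    best = channel_distortion(codevectors, assignment, n_bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(assignment)):
            for j in range(i + 1, len(assignment)):
                assignment[i], assignment[j] = assignment[j], assignment[i]
                cost = channel_distortion(codevectors, assignment, n_bits)
                if cost < best - 1e-12:
                    best, improved = cost, True
                else:  # undo a swap that did not help
                    assignment[i], assignment[j] = assignment[j], assignment[i]
    return assignment, best
```

For the four scalar codevectors `[[0.0], [3.0], [1.0], [2.0]]` with 2-bit indexes, the natural assignment costs 24.0 under this measure and the search lowers it to 20.0. Like the published method, this reaches only a local optimum in general.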
TL;DR: A new method for filling a color table is presented that produces pictures of similar quality as existing methods, but requires less memory and execution time.
Abstract: A new method for filling a color table is presented that produces pictures of similar quality as existing methods, but requires less memory and execution time. All colors of an image are inserted in an octree, and this octree is reduced from the leaves to the root in such a way that every pixel has a well defined maximum error. The algorithm is described in PASCAL notation.
TL;DR: A tree-structured TFM (TSTFM) is presented as a computationally inexpensive alternative to the TFM algorithm that has some new properties that prove to be useful for VQ and in the context of visual perception.
Abstract: The topological feature map (TFM) algorithm introduced by T. Kohonen (1982) implements two important properties: a vector quantization (VQ) and a topology-preserving mapping. A tree-structured TFM (TSTFM) is presented as a computationally inexpensive alternative to the TFM algorithm. The computational complexity of the TSTFM is O(log N) rather than O(N) for the TFM. In addition, the TSTFM has some new properties that prove to be useful for VQ and in the context of visual perception: increased performance in VQ compared to the tree-structured VQ of A. Buzo et al. (1980) and hierarchical mapping of code vectors.
TL;DR: A split vector quantization approach is used to overcome the complexity problem of LPC quantization: the LPC vector is divided into two parts and each part is vector‐quantized separately.
Abstract: Linear prediction coding (LPC) parameters are widely used in various speech processing applications for representing the spectral envelope information of speech. For low‐bit‐rate speech coding applications, it is important to quantize these parameters accurately using as few bits as possible without sacrificing the speech quality. Though the vector quantizers are more efficient than the scalar quantizers, their use for fine quantization of LPC information (using 24–26 bits/frame) is impeded due to their prohibitively high complexity. In this paper, a split vector quantization approach is used to overcome the complexity problem. Here, the LPC vector is divided into two parts and each part is vector‐quantized separately. The splitting of LPC vector is studied in the following three domains: (1) line spectral‐pair frequency (LSF), (2) arc‐sine reflection coefficient, and (3) log area ratio. Splitting in LSF domain is found to be the best. Using the localized spectral properties of the LSF parameters, a weigh...
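The complexity argument behind splitting can be made concrete with a short sketch. It assumes plain squared-error nearest-neighbour search in each part and, for the cost estimate, an equal division of the bits between the parts. Neither assumption comes from the abstract, and the function names are hypothetical.

```python
def nearest(codebook, v):
    """Index of the codevector closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def split_vq_encode(vector, split_at, codebook_lo, codebook_hi):
    """Quantize the two parts of the vector independently."""
    lo, hi = vector[:split_at], vector[split_at:]
    return nearest(codebook_lo, lo), nearest(codebook_hi, hi)

def split_vq_decode(indexes, codebook_lo, codebook_hi):
    """Concatenate the two selected codevectors."""
    i, j = indexes
    return list(codebook_lo[i]) + list(codebook_hi[j])

def search_cost(total_bits, parts):
    """Codevectors compared per frame if the bits are shared equally."""
    return parts * 2 ** (total_bits // parts)
```

Under these assumptions a single 24-bit codebook needs `search_cost(24, 1)` = 16,777,216 comparisons per frame, while two 12-bit codebooks need `search_cost(24, 2)` = 8,192. The price is that correlation between the two parts is no longer exploited.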
TL;DR: The authors show how a collection of neural units can be used efficiently for VQ encoding, with the units performing the bulk of the computation in parallel, and describe two unsupervised neural network learning algorithms for training the vector quantizer.
Abstract: Using neural networks for vector quantization (VQ) is described. The authors show how a collection of neural units can be used efficiently for VQ encoding, with the units performing the bulk of the computation in parallel, and describe two unsupervised neural network learning algorithms for training the vector quantizer. A powerful feature of the new training algorithms is that the VQ codewords are determined in an adaptive manner, compared to the popular LBG training algorithm, which requires that all the training data be processed in a batch mode. The neural network approach allows for the possibility of training the vector quantizer online, thus adapting to the changing statistics of the input data. The authors compare the neural network VQ algorithms to the LBG algorithm for encoding a large database of speech signals and for encoding images.
TL;DR: In this paper, a speech coding system is presented which recursively executes a filter-applied "Toeplitz characteristic" by causing a drive signal (i.e., an excitation signal) to be converted into a "Toeplitz matrix" when detecting a pitch period in which distortion of the input vector and the vector subsequent to the application of filter-applied computation to the drive signal vector in the pitch forecast called either "closed loop" or "compatible code book" is minimized.
Abstract: This invention provides a novel speech coding system which recursively executes a filter-applied "Toeplitz characteristic" by causing a drive signal (i.e., an excitation signal) to be converted into a "Toeplitz matrix" when detecting a pitch period in which distortion of the input vector and the vector subsequent to the application of filter-applied computation to the drive signal vector in the pitch forecast called either "closed loop" or "compatible code book" is minimized. The vector quantization method substantially making up the speech coding system of the invention is characteristically used by the system.
TL;DR: In this article, a binary encoder for vector quantization is provided which comprises a plurality of identical two-level branch selectors (18, 19, 20, 21) connected in a turnaround cascade pipeline array.
Abstract: A binary encoder for vector quantization is provided which comprises a plurality of identical two-level branch selectors (18, 19, 20, 21) connected in a turnaround cascade pipeline array. The upper levels of the two-level selectors are connected in series and the first selector (18) receives a formatted digital data vector input. The upper level of the last selector (21) has its output (25) connected to its own lower level input and the outputs of the lower level selectors are connected in series so that the last lower level selector in the turnaround cascade resides in the first two-level selector. The output of the last lower level selector (18) provides a desired compressed data vector output (29).
TL;DR: A novel framework for digital image compression called visual pattern image coding, or VPIC, is presented; a set of visual patterns is defined independent of the images to be coded, and there is no training phase required.
Abstract: A novel framework for digital image compression called visual pattern image coding, or VPIC, is presented. In VPIC, a set of visual patterns is defined independent of the images to be coded. Each visual pattern is a subimage of limited spatial support that is visually meaningful to a normal human observer. The patterns are used as a basis for efficient image representation; since it is assumed that the images to be coded are natural optical images to be viewed by human observers, visual pattern design is developed using relevant psychophysical and physiological data. Although VPIC bears certain resemblances to block truncation (BTC) and vector quantization (VQ) image coding, there are important differences. First, there is no training phase required: the visual patterns derive from models of perceptual mechanisms; second, the assignment of patterns to image regions is not based on a standard (norm) error criterion; expensive search operations are eliminated.
TL;DR: The quality (SNR value) of the images encoded by the proposed A-VQ method is the same as that of a memoryless vector quantizer, but the bit rate would be reduced by a factor of approximately two when compared to a memoryless vector quantizer.
Abstract: A novel vector quantization scheme, called the address-vector quantizer (A-VQ), is proposed. It is based on exploiting the interblock correlation by encoding a group of blocks together using an address-codebook. The address-codebook consists of a set of address-codevectors where each codevector represents a combination of addresses (indexes). Each element of this codevector is an address of an entry in the LBG-codebook, representing a vector quantized block. The address-codebook consists of two regions: one is the active (addressable) region, and the other is the inactive (nonaddressable) region. During the encoding process the codevectors in the address-codebook are reordered adaptively in order to bring the most probable address-codevectors into the active region. When encoding an address-codevector, the active region of the address-codebook is checked, and if such an address combination exists, its index is transmitted to the receiver. Otherwise, the address of each block is transmitted individually. The quality (SNR value) of the images encoded by the proposed A-VQ method is the same as that of a memoryless vector quantizer, but the bit rate would be reduced by a factor of approximately two when compared to a memoryless vector quantizer.
TL;DR: Special fast procedures for the code excited linear predictive coding (CELP) algorithm have been developed to make implementation on modest hardware possible and their storage requirement and numerical accuracy are discussed.
Abstract: Special fast procedures for the code excited linear predictive coding (CELP) algorithm have been developed to make implementation on modest hardware possible. The advantages, as well as the disadvantages, of the various fast procedures are discussed. A general formalism for the algorithm is developed, followed by the discussion of the individual procedures which are grouped according to their features. Along with the computational complexity of each procedure, its storage requirement and numerical accuracy are discussed. A large number of the fast procedures are designed to search through a particular type of codebook (most of the codebooks are stochastic in character, while a few are deterministic). Other fast procedures can be used for arbitrary codebooks and are thus also applicable to trained codebooks. Some of the fast procedures designed for stochastic codebooks can also be used for the computation of the closed pitch loop parameters, which can be interpreted as a search through a time-dependent codebook.
TL;DR: The range of applicability of nonlinear interpolative vector quantization is illustrated with examples in which optimal nonlinear estimation from quantized data is needed for efficient signal compression.
Abstract: A process by which a reduced-dimensionality feature vector can be extracted from a high-dimensionality signal vector and then vector quantized with lower complexity than direct quantization of the signal vector is discussed. In this procedure, a receiver must estimate, or interpolate, the signal vector from the quantized features. The task of recovering a high-dimensional signal vector from a reduced-dimensionality feature vector can be viewed as a generalized form of interpolation or prediction. A way in which optimal nonlinear interpolation can be achieved with negligible complexity, eliminating the need for ad hoc linear or nonlinear interpolation techniques, is presented. The range of applicability of nonlinear interpolative vector quantization is illustrated with examples in which optimal nonlinear estimation from quantized data is needed for efficient signal compression.
TL;DR: An approach to speaker recognition based on feedforward neural models is investigated, and recognition performance is shown to be comparable to that of a vector quantization approach based on personalized codebooks.
Abstract: An approach to speaker recognition based on feedforward neural models is investigated. Each person known to the system has a personalized neural net that is trained to be active for only that person's speech. By including speech from many people in the training data of each net this approach can directly model differences in people's speech. The chosen architecture and amount of training performed is shown to strongly affect the recognition performance. Large models with two hidden layers are shown to be inferior to models with only a single hidden layer and fewer weights. Recognition performance is shown to be comparable to that of a vector quantization approach based on personalized codebooks. The neural approach outperforms the codebook system for small model sizes but does slightly less well for larger models. A multitransputer implementation used in the training phase is described. Near linear speedup is obtained by splitting the training data to give independent subtasks, and a dynamic allocation scheme is used to assign these tasks to processors.
TL;DR: A method of quantizing the shape of pitch contour segments of Mandarin speech by using orthogonal polynomial representation and vector quantization techniques is proposed.
Abstract: A method of quantizing the shape of pitch contour segments of Mandarin speech by using orthogonal polynomial representation and vector quantization techniques is proposed. Only a very limited number of representative pitch contour patterns of words can be found in Mandarin conversation; therefore, pitch information can be represented by the shape and the length of the pitch contour segment word by word instead of frame by frame. An average bit rate of 0.78 b/frame (34.67 b/s) for voiced sounds was achieved. The method is a variable-rate coding scheme with an average delay of 317 ms.
TL;DR: A speech coder employs vector quantization of LPC parameters, interpolation, and trellis coding for improved speech coding at low bit rates (400 bps).
Abstract: A speech coder employs vector quantization of LPC parameters, interpolation, and trellis coding for improved speech coding at low bit rates (400 bps). The speech coder has an LPC analysis module for converting input speech to LPC parameters, an LSP conversion module for converting LPC parameters into line spectrum frequencies (LSP) data, and a vector quantization and interpolation (VQ/I) module for encoding the LSP data into vector indexes for transmission by applying LPC spectral amplitude as weighting coefficients to the LSP data. The VQ/I module outputs one vector index for every two LPC frames in order to reduce the transmission bit rate, and the omitted frames are interpolated on the receiving end. A decoder correspondingly decodes incoming indexes to LPC parameters and synthesizes them into output speech. Trellis coders with an adaptive tracking function encode the pitch and gain parameters of the LPC frames. A universal codebook stores codewords according to a plurality of accents. The speech coder automatically identifies a speaker's accent and selects the corresponding vocabulary of codewords in order to more intelligibly encode and decode the speaker's speech.
TL;DR: The third technique is a joint optimization of a vector quantizer and a noiseless variable-rate code, which has the potential to yield the highest performance of all three techniques.
Abstract: Three techniques for variable-rate vector quantizer design are applied to medical images. The first two are extensions of an algorithm for optimal pruning in tree-structured classification and regression due to Breiman et al. The code design algorithms find subtrees of a given tree-structured vector quantizer (TSVQ), each one optimal in that it has the lowest average distortion of all subtrees of the TSVQ with the same or lesser average rate. Since the resulting subtrees have variable depth, natural variable-rate coders result. The third technique is a joint optimization of a vector quantizer and a noiseless variable-rate code. This technique is relatively complex but it has the potential to yield the highest performance of all three techniques.
TL;DR: An adaptive vector quantization source-coding system based on SPAN, a neural network that allows a network to adapt its structure by adding neurons, killing neurons, and modifying the structural relationships between neurons in the network, is proposed.
Abstract: A neural network model, called SPAN (space partition network), is presented. This model differs from most of the currently seen neural networks in that it allows a network to adapt its structure by adding neurons, killing neurons, and modifying the structural relationships between neurons in the network. An adaptive vector quantization source-coding system based on SPAN is proposed. The major advantage of using SPAN as the codebook of a vector quantizer is that SPAN can capture the local context of the source signal space and map onto a lattice structure. A fast codebook-searching method utilizing the local context of the lattice is proposed, and a coding scheme, called the path coding method, for eliminating the correlation buried in the source sequence is introduced. The performance of the proposed coder is compared to an LBG (Y. Linde, A. Buzo, and R.M. Gray, 1980) coder on synthesized Gauss-Markov sources. Simulation results show that, without using the path coding method, SPAN yields performance similar to an LBG coder; however, if the path coding method is used, SPAN displays a much better performance than the LBG for highly correlated signal sources.
TL;DR: A simple method is investigated, to re-estimate the vector quantization codebook without continuous probability density function assumptions, and preliminary experiments show that such reestimation methods are as effective as the semicontinuous model, especially when the continuous probability density function assumption is inappropriate.
Abstract: The semicontinuous hidden Markov model is used in a 1000-word speaker-independent continuous speech recognition system and compared with the continuous mixture model and the discrete model. When the acoustic parameter is not well modeled by the continuous probability density, it is observed that the model assumption problems may cause the recognition accuracy of the semicontinuous model to be inferior to the discrete model. A simple method based on the semicontinuous model is investigated, to re-estimate the vector quantization codebook without continuous probability density function assumptions. Preliminary experiments show that such reestimation methods are as effective as the semicontinuous model, especially when the continuous probability density function assumption is inappropriate.
TL;DR: In this paper, a transformation coding device which applies highly efficient coding to a digital image signal, and in which a plurality of quantization characteristics are prepared and one quantization characteristic is adaptively selected in accordance with a predicted distortion for an input signal and a sequence of transformation coefficients, is presented.
Abstract: A transformation coding device which applies highly efficient coding to a digital image signal, and in which a plurality of quantization characteristics are prepared and one quantization characteristic is adaptively selected in accordance with a predicted distortion for an input signal and a sequence of transformation coefficients, thereby allowing the quantization to be performed in accordance with statistical properties of an input signal to be implemented, and at the same time, allowing a signal to be efficiently compressed.
TL;DR: In this paper, a gain-shape vector quantization apparatus for compressing the data of a voice signal is presented, where a code book portion is constituted by a plurality of shape vectors which produce a plurality of selected shape vectors.
Abstract: A gain-shape vector quantization apparatus for compressing the data of a voice signal. A code book portion is constituted by a plurality of shape vectors which produce a plurality of selected shape vectors. A plurality of variable gain circuits impart gains to each shape vector produced from the code book portion. A plurality of synthesis filters regenerate signals from the outputs of the variable gain circuits. An adder adds the signals regenerated by the synthesis filters. An evaluation unit produces an index to select a plurality of shape vectors in the code book portion in order to minimize an error between the output of the adder and an input speech signal and further produces a gain adjusting signal for the variable gain circuits.
TL;DR: A scheme is proposed which is based on vector quantization (VQ) for the data-compression of multichannel ECG waveforms, and both m-AZTEC and CVQ provide data compression, and their performance improves as the number of channels increases.
Abstract: A scheme is proposed which is based on vector quantization (VQ) for the data-compression of multichannel ECG waveforms. N-channel ECG is first coded using m-AZTEC, a new, multichannel extension of the AZTEC algorithm. As in AZTEC, the waveform is approximated using only lines and slopes; however, in m-AZTEC, the N channels are coded simultaneously into a sequence of N+1 dimensional vectors, thus exploiting the correlation that exists across channels in the AZTEC duration parameter. Classified VQ (CVQ) of the m-AZTEC output is next performed to exploit the correlation in the other AZTEC parameter, namely, the value parameter. CVQ preserves the waveform morphology by treating the lines and slopes as two perceptually distinct classes. Both m-AZTEC and CVQ provide data compression, and their performance improves as the number of channels increases.
TL;DR: An image coding scheme based on the properties of the early stages of the human visual system with satisfactory image quality is presented, which can be seen as an empirical confirmation of the suitability of vector quantization in subband coding.
Abstract: We present an image coding scheme based on the properties of the early stages of the human visual system. The image signal is decomposed via even and odd symmetric, frequency and orientation selective band-pass filters in analogy to the quadrature phase simple cell pairs in the visual cortex. The resulting analytic signal is transformed into a local amplitude and local phase representation in order to achieve a better match to its signal statistics. Both intra filter dependencies of the analytic signal and inter filter dependencies between different orientation filters are exploited by a suitable vector quantization scheme. Inter orientation filter dependencies are demonstrated by means of a statistical evaluation of the multidimensional probability density function. The results can be seen as an empirical confirmation of the suitability of vector quantization in subband coding. Instead of generating a code book by use of a conventional design algorithm, we suggest a feature specific partitioning of the multidimensional signal space matched to the properties of human vision. Using this coding scheme, satisfactory image quality can be obtained with about 0.78 bit/pixel.
TL;DR: In this article, a residual signal quantization technique used in the adaptive predictive coding of speech signals is based in the frequency domain and the number of bits used to quantize each frequency coefficient is determined by an estimate of the power of the input signal at that frequency, and the quantization noise power spectrum is shaped, and can be selectively shaped so as to form a desired reconstruction noise power distribution.
Abstract: A residual signal quantization technique used in the adaptive predictive coding of speech signals is based in the frequency domain. In predictive coders, a residual signal that results after redundancies are removed from the input signal using linear prediction techniques is quantized. The technique invented involves a transformation of the residual signal to the frequency domain and a quantization of the frequency domain coefficients. Further, the number of bits used to quantize each frequency coefficient is determined by an estimate of the power of the input signal at that frequency. Once the number of bits to be used for quantization is determined, the quantization noise power spectrum is shaped, and can be selectively shaped so as to form a desired reconstruction noise power distribution.
TL;DR: An encoding/decoding system for sequentially encoding input digital signals and sequentially decoding the encoded signals on the basis of the frequency of occurrence of the input signals, thereby improving an encoding efficiency.
Abstract: An encoding/decoding system for sequentially encoding input digital signals and sequentially decoding the encoded signals on the basis of the frequency of occurrence of the input digital signals, thereby improving an encoding efficiency. In a variable length encoding/decoding system, sequential lists of encoded values are used in an encoder and sequential lists of decoded values are stored in a decoder for utilizing the frequency of occurrence of the signals. In a vector quantization encoding/decoding system, the frequency of occurrence of input image signals is used to produce code books for storing vectors that are adjacent to an input vector and to form a block using the input image signals located in the same area in consecutive frames.
TL;DR: Several attempts to improve recognition accuracy with the use of supervised clustering techniques are described, which improved the phonetic recognition capability of the vector quantization, but the overall word and sentence recognition accuracy did not improve.
Abstract: Several attempts to improve recognition accuracy with the use of supervised clustering techniques are described. These techniques modify the distance metric and/or the clustering procedure in a discrete hidden Markov model recognition system in an attempt to improve phonetic modeling. Three techniques considered are linear discriminant analysis, a hierarchical supervised vector quantization technique, and Kohonen's LVQ2 technique. All experiments were performed on the DARPA resource management speech corpus using the BBN BYBLOS system. Even though the techniques improved the phonetic recognition capability of the vector quantization, the overall word and sentence recognition accuracy did not improve.
TL;DR: A connectionist approach to automatic speaker identification based on the learning vector quantization (VQ) algorithm is presented, based on a nearest-neighbor principle, with adaptation through learning.
Abstract: A connectionist approach to automatic speaker identification based on the learning vector quantization (VQ) algorithm is presented. For each adherent to the identification system, a number of references is fixed. The algorithm is based on a nearest-neighbor principle, with adaptation through learning. The identification is realized by comparing to a given threshold the distance of the unknown utterance to the nearest reference. Preliminary tests run on a ten-speaker set show an identification rate of 97% for MFC coefficients. The identification system and database used and the results obtained for different combinations of parameters are given. The system is evaluated by comparing its performance with a Bayesian system.
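The decision rule described above, nearest reference followed by a threshold test, might look like the following sketch. It is a simplification under stated assumptions: each utterance is reduced to a single feature vector, distance is Euclidean, and the learning step that adapts the references is omitted. The function name `identify` and the data layout are hypothetical.

```python
import math

def identify(utterance, references, threshold):
    """Return the speaker owning the nearest reference, or None if too far.

    references maps a speaker name to a list of reference vectors.
    """
    best_speaker, best_dist = None, math.inf
    for speaker, refs in references.items():
        for r in refs:
            d = math.dist(utterance, r)  # Euclidean distance (assumption)
            if d < best_dist:
                best_speaker, best_dist = speaker, d
    # Reject the utterance when even the nearest reference exceeds the threshold.
    return best_speaker if best_dist <= threshold else None
```

With `references = {"a": [[0.0, 0.0]], "b": [[3.0, 4.0]]}` and a threshold of 1.0, the input `[0.1, 0.0]` is attributed to speaker "a", while `[10.0, 10.0]` is rejected.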
TL;DR: In this paper, a large code book is divided into two sections, having higher and lower priority, representing common characteristics and specific characteristics of images, and the best match of the two sections is utilized.
Abstract: An adaptive vector quantization scheme suitable for packet video adapts to the varying characteristics of the actual image sequence being compressed. A large code book is divided into two sections, having higher and lower priority, representing common characteristics and specific characteristics of images. Each new image to be coded is first compared to the common characteristics section of the code book. If a match of acceptable quality is not found, then it is compared to the specific characteristics section. The best match of the two sections is utilized. Entries in the two sections are reorganized and/or exchanged as a function of the usage of the code vectors therein. The rate and extent of adaptation is dictated by the update interval and the desired level of quality, respectively, without requiring any transmission of the vectors themselves or the side information.
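The two-stage search over the common and specific sections can be sketched as below. This is an illustrative reading of the abstract, not the patented method: it assumes squared error as the quality measure and a fixed acceptance limit `max_error`, and it leaves out the usage-driven reorganization of entries. All names are hypothetical.

```python
def encode_block(block, common, specific, max_error):
    """Search the common section first; fall back to the specific section."""
    def best(section):
        errs = [sum((c - x) ** 2 for c, x in zip(cv, block)) for cv in section]
        i = min(range(len(errs)), key=errs.__getitem__)
        return i, errs[i]

    i, err = best(common)
    if err <= max_error:  # acceptable match found in the high-priority section
        return ("common", i)
    j, err2 = best(specific)
    # Otherwise use whichever section offers the better match.
    return ("specific", j) if err2 < err else ("common", i)
```

Searching the high-priority section first means most blocks are coded without touching the larger, lower-priority section.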