TL;DR: This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization, and focuses primarily on the coding of speech signals and parameters.
Abstract: Quantization, the process of approximating continuous-amplitude signals by digital (discrete-amplitude) signals, is an important aspect of data compression or coding, the field concerned with the reduction of the number of bits necessary to transmit or store analog data, subject to a distortion or fidelity criterion. The independent quantization of each signal value or parameter is termed scalar quantization, while the joint quantization of a block of parameters is termed block or vector quantization. This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization. Vector quantization is presented as a process of redundancy removal that makes effective use of four interrelated properties of vector parameters: linear dependency (correlation), nonlinear dependency, shape of the probability density function (pdf), and vector dimensionality itself. In contrast, scalar quantization can utilize effectively only linear dependency and pdf shape. The basic concepts are illustrated by means of simple examples and the theoretical limits of vector quantizer performance are reviewed, based on results from rate-distortion theory. Practical issues relating to quantizer design, implementation, and performance in actual applications are explored. While many of the methods presented are quite general and can be used for the coding of arbitrary signals, this paper focuses primarily on the coding of speech signals and parameters.
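The joint quantization of a block of parameters described above can be sketched as a nearest-neighbor search over a codebook. This is a minimal illustration only: the squared-error distortion, the four-entry codebook, and the names `vq_encode` and `vq_decode` are assumptions made for the example, not details taken from the paper.

```python
def squared_error(x, y):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def vq_encode(vector, codebook):
    """Return the index of the codevector closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_error(vector, codebook[i]))


def vq_decode(index, codebook):
    """Reconstruct a vector by table lookup."""
    return codebook[index]


# Hypothetical 2-dimensional codebook with four entries (2 bits per vector).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]

index = vq_encode((0.9, 1.2), codebook)
reconstruction = vq_decode(index, codebook)
```

Only the index is transmitted; the decoder holds the same codebook, so the rate here is 2 bits for a pair of samples, that is, 1 bit per sample.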
TL;DR: A vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker and was used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum distance (distortion) classification rule.
Abstract: In this study a vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker. A set of such codebooks were then used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum distance (distortion) classification rule. A series of speaker recognition experiments was performed using a 100-talker (50 male and 50 female) telephone recording database consisting of isolated digit utterances. For ten random but different isolated digits, over 98% speaker identification accuracy was achieved. The effects, on performance, of different system parameters such as codebook sizes, the number of test digits, phonetic richness of the text, and difference in recording sessions were also studied in detail.
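The minimum distance (distortion) classification rule can be sketched as follows. The example assumes a squared-error distortion and tiny made-up codebooks; the study's actual spectral features, distortion measure, and codebook sizes are not reproduced, and the names `average_distortion` and `identify_speaker` are hypothetical.

```python
def squared_error(x, y):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def average_distortion(frames, codebook):
    """Mean distortion of the frames when each is coded with its nearest codevector."""
    total = sum(min(squared_error(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify_speaker(frames, codebooks):
    """Pick the speaker whose codebook gives the smallest average distortion."""
    return min(codebooks, key=lambda name: average_distortion(frames, codebooks[name]))


# Hypothetical per-speaker codebooks (one per enrolled talker).
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 0.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 5.0)],
}
test_frames = [(0.1, 0.0), (0.9, 0.1), (0.2, -0.1)]
best = identify_speaker(test_frames, codebooks)
```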
TL;DR: Preliminary results indicate that higher quality or lower bit rates may be achieved with enough computational resources, and an extension of the centroid computation used in vector quantization is presented.
Abstract: Rate-distortion theory provides the motivation for using data compression techniques on matrices of N LPC vectors. This leads to a simple extension of speech coding techniques using vector quantization. The effects of using the generalized Lloyd algorithm on such matrices using a summed Itakura-Saito distortion measure are studied, and an extension of the centroid computation used in vector quantization is presented. The matrix quantizers so obtained offer substantial reductions in bit rates relative to full-search vector quantizers. Bit rates as low as 150 bits/s for the LPC matrix information (inclusive of gain, but without pitch and voicing) have been achieved for a single speaker, having average test sequence and codebook distortions comparable to those in the equivalent full-search vector quantizer operating at 350 bits/s. Preliminary results indicate that higher quality or lower bit rates may be achieved with enough computational resources.
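For orientation, one iteration of the generalized Lloyd algorithm on ordinary vectors can be sketched as below. Note the substitutions: squared error stands in for the summed Itakura-Saito measure, and the centroid is the plain arithmetic mean rather than the extended centroid computation the paper presents for matrices. The function name `lloyd_iteration` and the data are illustrative assumptions.

```python
def squared_error(x, y):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def lloyd_iteration(training, codebook):
    """One generalized Lloyd step: nearest-neighbor partition, then centroid update."""
    cells = [[] for _ in codebook]
    for v in training:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_error(v, codebook[k]))
        cells[nearest].append(v)

    new_codebook = []
    for old, cell in zip(codebook, cells):
        if cell:
            dim = len(old)
            centroid = tuple(sum(v[d] for v in cell) / len(cell) for d in range(dim))
            new_codebook.append(centroid)
        else:
            # Keep an unused codevector unchanged (one simple policy among several).
            new_codebook.append(old)
    return new_codebook


training = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
codebook = [(1.0, 1.0), (9.0, 9.0)]
updated = lloyd_iteration(training, codebook)
```

Repeating the step until the average distortion stops decreasing gives a locally optimal codebook for the training set.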
TL;DR: Design methods for vector quantizers with the embedded coding property are presented and their performance simulated for the medium-band of 32-8 Kbits/sec.
Abstract: Embedded speech coders are characterized by the property that their output quality degrades gracefully as their bit rate is decreased. Design methods for vector quantizers with the embedded coding property are presented and their performance simulated for the medium-band of 32-8 Kbits/sec. Listening tests indicate that these coders can provide good quality speech at 32 and 24 Kbits/sec and intelligible speech down to 8 Kbits/sec.
TL;DR: Simulations on real images show significant improvement over the conventional DPCM and tree codes using these new techniques, and the strong robustness property of these coding schemes is also experimentally demonstrated.
Abstract: The purpose of this paper is to present new image coding schemes based on a predictive vector quantization (PVQ) approach. The predictive part of the encoder is used to partially remove redundancy, and the VQ part further removes the residual redundancy and selects good quantization levels for the global waveform. Two implementations of this coding approach have been devised, namely, sliding block PVQ and block tree PVQ. Simulations on real images show significant improvement over the conventional DPCM and tree codes using these new techniques. The strong robustness property of these coding schemes is also experimentally demonstrated.
TL;DR: The algorithm for FSVQ design for waveform coders is extended to FSVQ design of linear predictive coded (LPC) speech parameter vectors using an Itakura-Saito distortion measure, and a new technique for the iterative improvement of the next-state function, based on an algorithm from adaptive stochastic automata theory, is introduced.
Abstract: A finite-state vector quantizer (FSVQ) is a switched vector quantizer where the sequence of quantizers selected by the encoder can be tracked by the decoder. It can be viewed as an adaptive vector quantizer with backward estimation, a vector generalization of an AQB system. Recently a family of algorithms for the design of FSVQ's for waveform coding application has been introduced. These algorithms first design an initial set of vector quantizers together with a next-state function giving the rule by which the next quantizer is selected. The codebooks of this initial FSVQ are then iteratively improved by a natural extension of the usual memoryless vector quantizer design algorithm. The next-state function, however, is not modified from its initial form. In this paper we present two extensions of the FSVQ design algorithms. First, the algorithm for FSVQ design for waveform coders is extended to FSVQ design of linear predictive coded (LPC) speech parameter vectors using an Itakura-Saito distortion measure. Second, we introduce a new technique for the iterative improvement of the next-state function based on an algorithm from adaptive stochastic automata theory. The design algorithms are simulated for an LPC FSVQ and the results are compared with each other and to ordinary memoryless vector quantization. Several open problems suggested by the simulation results are presented.
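The idea that the decoder can track the encoder's sequence of quantizers can be sketched with a next-state function that depends only on the current state and the transmitted index. Everything concrete here is an assumption for illustration: the squared-error distortion (the paper uses Itakura-Saito for LPC vectors), the two one-dimensional state codebooks, and the toy rule `next_state`.

```python
def squared_error(x, y):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def fsvq_encode(vectors, codebooks, next_state, state=0):
    """Encode each vector with the codebook of the current state."""
    indices = []
    for v in vectors:
        cb = codebooks[state]
        i = min(range(len(cb)), key=lambda k: squared_error(v, cb[k]))
        indices.append(i)
        # The next state depends only on data the decoder also has.
        state = next_state(state, i)
    return indices


def fsvq_decode(indices, codebooks, next_state, state=0):
    """Replay the same state sequence from the received indices."""
    output = []
    for i in indices:
        output.append(codebooks[state][i])
        state = next_state(state, i)
    return output


# Hypothetical two-state quantizer; state 1 covers larger amplitudes.
codebooks = {0: [(0.0,), (1.0,)], 1: [(1.0,), (2.0,)]}


def next_state(state, index):
    """Toy rule: the chosen index becomes the next state."""
    return index


indices = fsvq_encode([(0.9,), (2.1,), (0.2,)], codebooks, next_state)
decoded = fsvq_decode(indices, codebooks, next_state)
```

No side information about the state is sent; both ends start in the same state and apply the same rule.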
TL;DR: Vector quantization schemes usually apply speech- and speaker-dependent codebooks to achieve good speech quality at medium bit rates; this paper instead transforms the speech waveforms into signals that ideally no longer contain speech- and speaker-specific features.
Abstract: In vector quantization schemes, speech- and speaker-dependent codebooks are usually applied in order to achieve good speech quality at medium bit rates. This paper deals with another approach: the speech waveforms are transformed into signals which ideally no longer contain speech- and speaker-specific features. Thus these signals can be encoded by a universal vector quantizer. This concept is realized by a system called RELP-VQ. The performance of this RELP-VQ scheme was evaluated by SNR measurements as well as by informal listening tests including female and male English and German speakers.
TL;DR: This paper shows how to integrate efficient and accurate speech modeling methods and network search procedures to give a speaker-independent, syntax-directed, connected word recognition system which requires only a modest amount of computation, and whose performance is comparable to that of previous recognizers requiring an order of magnitude more computation.
Abstract: In the last several years, a wide variety of techniques have been developed which make practical the implementation and development of large networks for recognizing connected sequences of words. Included among these techniques are efficient and accurate speech modeling methods (e.g., vector quantization, hidden Markov models) and efficient, optimal network search procedures (i.e., level building). In this paper we show how to integrate these techniques to give a speaker-independent, syntax-directed, connected word recognition system which requires only a modest amount of computation, and whose performance is comparable to that of previous recognizers requiring an order of magnitude more computation. In particular, the recognizer we studied was an airlines information and reservation system using a 129 word vocabulary, and a deterministic syntax (grammar) with 144 states, 450 state transitions, and 21 final states, generating more than 6 × 10^9 sentences. An evaluation of the system, using six talkers each speaking 51 test sentences, yielded a sentence accuracy of about 75 percent resulting from a word accuracy of about 93 percent, for an average speaking rate of about 210 words per minute.
TL;DR: A new class of adaptive vector quantizers is presented in which a codebook is gradually changed to track time-varying source statistics by dynamically updating some of the codevectors.
Abstract: This paper presents a new class of adaptive vector quantizers in which a codebook is gradually changed to track time-varying source statistics by dynamically updating some of the codevectors. The partial distortion of a codevector, defined as the product of the average distortion when a particular codevector is selected and the probability that selection occurs, is used as a measure to determine which codevectors are updated. Codevectors with small partial distortion are deleted from the codebook, and those with large partial distortion are shifted or split so that the input vectors that are mapped into those codevectors are quantized more finely to give smaller partial distortions. Experimental results illustrate the application of this adaptive vector quantizer to image coding.
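The partial distortion defined above (average distortion when a codevector is selected, multiplied by the probability that it is selected) reduces to the codevector's summed distortion divided by the total number of input vectors. A minimal sketch, assuming squared-error distortion and a made-up one-dimensional codebook; the function name `partial_distortions` is hypothetical.

```python
def squared_error(x, y):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def partial_distortions(vectors, codebook):
    """Per-codevector partial distortion: (mean distortion when chosen) * P(chosen)."""
    totals = [0.0] * len(codebook)
    for v in vectors:
        dists = [squared_error(v, c) for c in codebook]
        chosen = dists.index(min(dists))
        totals[chosen] += dists[chosen]
    # (total_i / count_i) * (count_i / N) simplifies to total_i / N.
    return [t / len(vectors) for t in totals]


codebook = [(0.0,), (10.0,)]
vectors = [(0.0,), (1.0,), (10.0,), (13.0,)]
pd = partial_distortions(vectors, codebook)

# The codevector with the largest partial distortion is the candidate to shift or split.
split_candidate = pd.index(max(pd))
```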
TL;DR: Several vector quantization approaches to the problem of text-dependent speaker verification are described, and detailed experimental results are presented and discussed.
Abstract: Several vector quantization approaches to the problem of text-dependent speaker verification are described. In each of these approaches, a source codebook is designed to represent a particular speaker saying a particular utterance. Later, this same utterance is spoken by a speaker to be verified and is encoded in the source codebook representing the speaker whose identity was claimed. The speaker is accepted if the verification utterance's quantization distortion is less than a prespecified speaker-specific threshold. The best approach achieved a 0.7 percent false acceptance rate and a 0.6 percent false rejection rate on a speaker population comprising 16 admissible speakers and 111 casual imposters. The approaches are described, and detailed experimental results are presented and discussed.
TL;DR: A new approach to isolated-word speech recognition using vector quantization (VQ) is examined.
Abstract: A new approach to isolated-word speech recognition using vector quantization (VQ) is examined. In this approach, words are recognized by means of sequences of VQ codebooks, called multisection codebooks. A separate multisection codebook is designed for each word in the recognition vocabulary by dividing the word into equal-length sections and designing a standard VQ codebook for each section. Unknown words are classified by dividing them into corresponding sections, encoding them with the multisection codebooks, and finding the multisection codebook that yields the smallest average distortion. For speaker-independent recognition of the digits, this approach achieved a recognition accuracy of 98 percent. In addition, the approach achieved greater than 99 percent accuracy for speaker-dependent recognition of the digits with only one distortion computation per input frame per vocabulary word. The approach is described, detailed experimental results are presented and discussed, and computational requirements are analyzed.
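The classification step with multisection codebooks can be sketched as below. The sketch assumes squared-error distortion, one-dimensional frames, and single-entry section codebooks purely for brevity; the helper names and the two-word vocabulary are hypothetical, and the input is assumed to have at least as many frames as sections.

```python
def squared_error(x, y):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def split_sections(frames, n_sections):
    """Divide a frame sequence into n_sections contiguous, near-equal parts."""
    n = len(frames)
    bounds = [k * n // n_sections for k in range(n_sections + 1)]
    return [frames[bounds[k]:bounds[k + 1]] for k in range(n_sections)]


def multisection_distortion(frames, section_codebooks):
    """Average distortion when each section is coded with its own codebook."""
    sections = split_sections(frames, len(section_codebooks))
    total = 0.0
    for section, codebook in zip(sections, section_codebooks):
        for frame in section:
            total += min(squared_error(frame, c) for c in codebook)
    return total / len(frames)


def recognize(frames, word_models):
    """Return the word whose multisection codebook gives the smallest distortion."""
    return min(word_models,
               key=lambda w: multisection_distortion(frames, word_models[w]))


# Two hypothetical words with the same sounds in opposite temporal order.
word_models = {
    "up": [[(0.0,)], [(1.0,)]],
    "down": [[(1.0,)], [(0.0,)]],
}
word = recognize([(0.1,), (0.0,), (0.9,), (1.0,)], word_models)
```

A single codebook per word would not separate these two words, since both contain the same codevectors; the section ordering is what carries the temporal information.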
TL;DR: These product codes separate the mean and orientation information from each source vector and encode this information independently to allow the residual to be vector quantized more accurately.
Abstract: There is a growing interest in the use of vector quantization for coding digital images. A key issue to be resolved is how to achieve perceptually pleasing results while limiting encoding complexity to tolerable levels. In this paper, product codes are described which improve the quality of the encoded edges and textures for a given level of complexity. These product codes separate the mean and orientation information from each source vector and encode this information independently to allow the residual to be vector quantized more accurately. The color image coder also reduces the required bit rate by taking advantage of spectral redundancy. Experimental results indicate that an improvement of almost 1.4 dB in SNR can be achieved over a Discrete Cosine Transform block coder of comparable complexity, with negligible computational complexity added by the product structure.
TL;DR: A network-based approach to speaker-independent isolated digit recognition using a pronunciation network whose arcs represent classes of acoustic-phonetic segments based on vector quantization of LPC spectra.
Abstract: This paper describes a network-based approach to speaker-independent digit recognition. The digits are modeled by a pronunciation network whose arcs represent classes of acoustic-phonetic segments. Each arc is associated with a matcher for rating an input speech interval as an example of the corresponding segment class. The matchers are based on vector quantization of LPC spectra. Recognition involves finding a minimum quantization distortion path through the network by dynamic programming. The system has been evaluated in an extensive series of speaker-independent isolated digit (one-nine, oh and zero) recognition experiments using a 225-talker, multidialect database developed by Texas Instruments (TI). The best recognizer configurations achieved accuracies of 97-99 percent on the TI database.
TL;DR: The results show that the proposed preprocessor has the capability of reducing computation for recognition by up to an order of magnitude, while maintaining the same performance as that obtained using a DTW comparison without the preprocessor.
Abstract: In this paper, we propose a speaker-independent isolated word recognition system whose performance is comparable to that of a conventional isolated word recognizer, but whose computation is greatly reduced. The structure of the proposed recognizer consists of a word-based vector quantization (VQ) preprocessor, followed by a conventional DTW postprocessor. The purpose of the preprocessor is essentially to eliminate from further consideration all words in the vocabulary which are unlikely recognition candidates. In some cases, the preprocessor will be able to eliminate all word candidates except one; for such cases, there is no further processing required for word recognition. In all other cases (i.e., when more than one word candidate is passed on), a dynamic time warping (DTW) processor is used to resolve finer acoustical distinctions among the remaining word candidates. The performance of this type of recognizer (i.e., using a word-based preprocessor and a standard DTW comparison to make finer distinctions) is affected by a number of factors involved with the details of exactly how the system is implemented, e.g., the distortion measure used in the preprocessor and in the DTW comparison, the size of the VQ codebook for each vocabulary word, the decision thresholds of the preprocessor, etc. Several of these factors were studied experimentally using testing databases consisting of isolated digits and words from a vocabulary of 129 airline terms. The results show that the proposed preprocessor has the capability of reducing computation for recognition by up to an order of magnitude, while maintaining the same performance as that obtained using a DTW comparison without the preprocessor. A somewhat smaller reduction in memory over the straight DTW implementation is also obtained in the proposed approach.
TL;DR: An application of source coding to speaker recognition is described, where each speaker is represented by a sequence of vector quantization codebooks; known input utterances are classified using these codebook sequences and the resulting classification distortion is compared to a rejection threshold.
Abstract: An application of source coding to speaker recognition is described. The method is text-dependent - the text spoken is known, and the problem is to determine who said it. Each speaker is represented by a sequence of vector quantization codebooks; known input utterances are classified using these codebook sequences and the resulting classification distortion is compared to a rejection threshold. On a 16 speaker test population with an additional 111 imposters, this method achieved a false rejection rate of 0.8%, an imposter acceptance rate of 1.8%, and within the 16 speakers, an identification error rate of 0.0%.
TL;DR: A new algorithm is proposed which efficiently allocates a given quota of bits, based on the actually measured quantizer performance, without any prior assumptions on its behavior, which can enhance the performance of traditional scalar coders as well as any modern coder which employs dynamic bit allocation.
Abstract: Traditional solutions to the bit allocation problem assume nicely behaved quantizers whose distortion versus rate characteristic is a standard exponentially decreasing function. This model is often inadequate and leads to suboptimal bit allocations for actual quantizer characteristics. This inadequacy prevails particularly in coding systems based on vector quantization (VQ). The distortion-rate characteristic for VQ suffers from irregularities because of suboptimality of the codebook and of the codebook search method. We propose a new algorithm which efficiently allocates a given quota of bits, based on the actually measured quantizer performance, without any prior assumptions on its behavior. The algorithm can enhance the performance of traditional scalar coders as well as any modern coder which employs dynamic bit allocation.
TL;DR: In this paper, a video coding apparatus has a quantizer that is controlled by a quantization control signal, such that quantization is performed more coarsely as the calculated distance becomes longer.
Abstract: A video coding apparatus having a quantizer that is controlled by a quantization control signal. Stationary picture elements are coded such that quantization is coarse, which reduces the output information for the picture elements in the stationary image region and increases compression of the signal. The apparatus may include a calculation system for calculating a distance of a stationary picture element from the moving image region. The quantization is performed more coarsely as the calculated distance becomes longer. The quantizer may also be controlled such that coarse quantization is performed over a plurality of frames for the picture elements in the stationary region, and at a predetermined interval finer quantization is performed with respect to the same stationary picture elements, for further improving the picture. Transition of a picture element from the stationary region to the moving region may be detected, and the quantizer may be controlled to perform fine quantization when the transition is detected, and thereafter perform coarse quantization.
TL;DR: The application of vector quantization to code images in the spatial and the frequency domains is discussed; residual vector quantizers, predictive vector quantizers, sub-band vector quantizers, and binary vector quantizers are also mentioned.
Abstract: This paper presents a review of vector quantization techniques in image coding. The application of vector quantization to code images in the spatial and the frequency domains is discussed; residual vector quantizers, predictive vector quantizers, sub-band vector quantizers, and binary vector quantizers are also mentioned. The paper is intended as a short review of current vector quantization techniques.
TL;DR: New vector quantization methods are presented for high-efficiency coding and transmission of color images using only three factors: mean, deviation, and normalized vector.
TL;DR: This paper investigates the zero-input stability properties of the exact second-order recursive digital filter having both overflow and quantization non-linearities and presents three sets of conditions to ensure asymptotic overflow-stability in the presence of quantization.
Abstract: This paper investigates the zero-input stability properties of the exact second-order recursive digital filter having both overflow and quantization non-linearities. Two examples demonstrate the adverse influence of quantization on the overflow-stability property of the filter. Three sets of conditions are presented to ensure asymptotic overflow-stability in the presence of quantization. Using these criteria, various regions in the coefficient plane corresponding to different minimum internal wordlengths required to ensure the non-interaction of overflow and quantization are derived. These results thus form a useful design criterion.
TL;DR: The main objective is improving the excitation representation in a linear predictive coding scheme and, hence, the subjective quality of synthesized speech signals.
Abstract: Considerable effort has been and is currently being concentrated on improving the speech quality at low and very low bit rates. Recently new models of LPC excitation have been devised, which are able to yield good quality speech by exploiting our knowledge of the human speech production and perception processes. Unfortunately, these models generally require too much computational load to be easily implemented on currently available hardware. This paper describes an efficient speech coder, capable of providing acceptable quality speech, within the limitations of both low bit rate (approximately 2.4 kbit/s) and real-time implementation. The coder is based upon pattern classification and cluster analysis with perceptually-meaningful error minimization criteria. Our main objective is improving the excitation representation in a linear predictive coding scheme and, hence, the subjective quality of synthesized speech signals.
TL;DR: New algorithms are presented for image sequence coding based upon the combination of frame replenishment and vector quantization, demonstrating that good quality reconstructed image sequences at bit rate as low as 0.6 - 0.7 bits/pixel are obtained using adaptive algorithms.
Abstract: In this paper, new algorithms are presented for image sequence coding based upon the combination of frame replenishment and vector quantization. The results reported include both non-adaptive and adaptive vector quantization. The adaptivity is performed by replenishing the codebook according to the local statistics of the image sequence. A statistical model is used to initiate the adaptive operation. The experimental results demonstrate that good quality reconstructed image sequences at bit rate as low as 0.6 - 0.7 bits/pixel are obtained using adaptive algorithms.
TL;DR: In this method, digital samples of a speech signal are filtered by a linear-prediction inverse filter whose coefficients are chosen from a codebook of quantized filter coefficient vectors, yielding a residual signal that is subdivided into vectors and quantized with residual vectors from a codebook.
Abstract: This method provides a filtering of digital samples of speech signal by a linear-prediction inverse filter, whose coefficients are chosen out of a codebook of quantized filter coefficient vectors, obtaining a residual signal subdivided into vectors. The weighted mean-square error made in quantizing said vectors with quantized residual vectors contained in a codebook and forming excitation waveforms is computed. The coding signal for each block of samples consists of the coefficient vector index chosen for the inverse filter as well as of the indices of the vectors of the excitation waveforms which have generated minimum weighted mean-square error. During the decoding phase, a synthesis filter, having the same coefficients as chosen for the inverse filter, is excited by quantized-residual vectors chosen during the coding phase (FIGS. 1, 2).
TL;DR: A text‐independent speaker clustering approach to speaker‐independent speaker recognition through vector quantization (VQ) was investigated, where the distortion value was used as a clustering measure.
Abstract: A text‐independent speaker clustering approach to speaker‐independent speaker recognition through vector quantization (VQ) was investigated, where the distortion value was used as a clustering measure. To show the possibility of the text‐independent speaker clustering, speaker recognition experiments were carried out using the Harvard sentence database. Nine male speakers uttered ten different Harvard sentences each. Codebooks were generated from the first five sentences for each speaker using the Weighted Likelihood Ratio (WLR) measure through LPC analysis. Using 128 vectors in each codebook, a speaker recognition rate of 98% was attained on the latter five Harvard sentences. Effects of codebook size and input length are also discussed. The above approach based on framewise VQ only utilizes the static distribution of LPC spectra. VQ for multiframe codebooks was used to represent the coarticulation units. The results of speaker recognition experiments based on multi‐frame codebooks will be compared with fixed length VQ approaches.
TL;DR: The performance of a block cosine image coding system with an adaptive quantizer matched to the statistics of the transform coefficients is described, which results in significant improvement in reconstructed image quality compared to fixed quantization schemes designed under the Gaussian assumption.
Abstract: Quantizers for block transform image coding systems are typically designed under the assumption of Gaussian statistics for the transform coefficients. While convincing arguments can be provided in support of this approach, empirical evidence is presented demonstrating that, except possibly for the dc term, wide departures from Gaussian behavior can be expected for real-world imagery at typical block sizes. In this paper we describe the performance of a block cosine image coding system with an adaptive quantizer matched to the statistics of the transform coefficients. The adaptive quantizer is based upon a recently developed algorithm which employs a training sequence in the design procedure. At encoding rates of approximately 1 bit/pixel and above, this approach results in significant improvement in reconstructed image quality compared to fixed quantization schemes designed under the Gaussian assumption. For rates much below 1 bit/pixel the relative improvement is negligible.
TL;DR: A network-based approach to speaker-independent connected digit recognition using a pronunciation network whose arcs represent classes of acoustic-phonetic segments based on vector quantization of LPC spectra and the use of gross acoustic features for pruning.
Abstract: This paper describes a network-based approach to speaker-independent connected digit recognition. The digits are modeled by a pronunciation network whose arcs represent classes of acoustic-phonetic segments. Each arc is associated with a matcher for rating an input speech interval as an example of the corresponding segment class. The matchers are based on vector quantization of LPC spectra and the use of gross acoustic features for pruning. Recognition involves finding a minimum quantization distortion path through the network by dynamic programming. The system has been evaluated using a portion of a large multi-dialect database developed by Texas Instruments (TI). Using a baseline network of concatenated independent digit models, string and digit accuracies of 86% and 97% respectively have been obtained.
TL;DR: Experimental results show that a significant gain in segmental SNR can be obtained over nonadaptive VQ with a negligible increase in complexity.
Abstract: A class of adaptive vector quantizers (VQs) that can dynamically adjust the 'gain' of codevectors according to the input signal level is introduced. The encoder uses a gain estimator to determine a suitable normalization of each input vector prior to VQ coding. The normalized vectors have reduced dynamic range and can then be more efficiently coded. At the receiver, the VQ decoder output is multiplied by the estimated gain. Both forward and backward adaptation are considered and several different gain estimators are compared and evaluated. An approach to optimizing the design of gain estimators is introduced. Some of the more obvious techniques for achieving gain adaptation are substantially less effective than the use of optimized gain estimators. A novel design technique that is needed to generate the appropriate gain-normalized codebook for the vector quantizer is introduced. Experimental results show that a significant gain in segmental SNR can be obtained over nonadaptive VQ with a negligible increase in complexity.
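The normalize, quantize, and rescale chain can be sketched for the forward-adaptive case as follows. The RMS gain estimator is one of the "more obvious" choices and is used here only as an assumption; it is not the optimized estimator the paper introduces. The gain is also passed to the decoder unquantized, which a real coder would not do, and the codebook and function names are hypothetical.

```python
import math


def squared_error(x, y):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rms_gain(vector, floor=1e-9):
    """Simple forward gain estimate: RMS of the vector, floored to avoid division by zero."""
    return max(math.sqrt(sum(a * a for a in vector) / len(vector)), floor)


def encode(vector, codebook):
    """Normalize by the estimated gain, then search the gain-normalized codebook."""
    gain = rms_gain(vector)
    normalized = [a / gain for a in vector]
    index = min(range(len(codebook)),
                key=lambda i: squared_error(normalized, codebook[i]))
    return index, gain


def decode(index, gain, codebook):
    """Multiply the decoder's codevector by the gain."""
    return [gain * c for c in codebook[index]]


# Hypothetical unit-RMS shape codebook shared by loud and quiet inputs.
codebook = [(1.0, 1.0), (1.0, -1.0)]

index_loud, gain_loud = encode((3.0, 3.0), codebook)
index_quiet, gain_quiet = encode((0.5, -0.5), codebook)
loud = decode(index_loud, gain_loud, codebook)
quiet = decode(index_quiet, gain_quiet, codebook)
```

Because the codebook only has to represent normalized shapes, the same two entries serve inputs whose levels differ by a factor of six.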
TL;DR: A new combined encoding scheme is introduced for the luminance and chrominance signals based on the technique of vector quantization, taking into account the properties of the human visual system.
Abstract: A demand for color picture transmission is growing rapidly for various integrated services. For transmission, color pictures must be encoded, but it is widely understood that this can be most conveniently done in the form of luminance and chrominance signals, since the properties of the human visual system can be exploited to achieve considerable redundancy reduction. For encoding these component signals, it is most efficient if they can be jointly encoded. In this paper, a new combined encoding scheme is introduced for the luminance and chrominance signals based on the technique of vector quantization, taking into account the properties of the human visual system. However, these encoded signals are converted into primary RGB signals for display, and the corresponding increase of the encoding noise has not been discussed in the literature. In this paper, an encoding process is considered as a part of the total system between a camera and a display. Two different adaptive quantization schemes are incorporated in the encoding, and it will be shown that a good picture quality can be obtained with 2.0-2.5 bits/pel.
TL;DR: Recently a new structure for isolated-word recognition was proposed in which a separate Vector Quantization (VQ) code book was designed for each word in the vocabulary, and a technique for incorporating temporal structure into the preprocessor was proposed.
Abstract: Recently a new structure for isolated-word recognition was proposed in which a separate Vector Quantization (VQ) code book was designed for each word in the vocabulary. The word-based VQs were used as a front-end preprocessor to eliminate word candidates whose distortion scores were large; a dynamic time-warping processor then resolved the choice among the remaining word candidates. The above scheme worked very well for small vocabularies; however, the major flaw was the lack of temporal information in the word-based VQ processor. As such, as the vocabulary grew in size and complexity, the ability of the VQ processor to resolve among similar sounding words decreased dramatically, and the effectiveness of the proposed recognition structure similarly decreased. To alleviate this difficulty a technique for incorporating temporal structure into the preprocessor is proposed. In particular, the probability density function of the time of occurrence for each vector in the code book is estimated from a training sequence. In the recognizer, the spectral distance score of the VQ is combined with a temporal distance score, for each frame in the word. An evaluation of the modified recognizer showed slightly improved performance on the digits vocabulary and greatly improved performance on a vocabulary of 129 airlines terms.
TL;DR: This investigation was to isolate the effects of different distance measures in a recognizer from the other types of processing typically used in recognition, using a vector quantization approach to give the single-frame reference patterns required by the recognizer.
Abstract: One of the most fundamental concepts used in the standard pattern recognition model for speech recognition is that of distance between pairs of frames of speech. Several distance measures have been proposed and studied in the context of an overall speech recognizer. The purpose of this investigation was to isolate the effects of different distance measures in a recognizer from the other types of processing typically used in recognition. The way in which this isolation was achieved was to use a recognizer based on single-frame distance scores, using a vector quantization approach to give the single-frame reference patterns required by the recognizer. The vocabulary for recognition was the set of continuant vowels extracted from carrier words. A speaker-dependent vowel recognition experiment was carried out using seven talkers (four male, three female) and five distance measures. Results indicated that there were differences in performance for the different distance measures when the number of code-book patterns per vowel was one or two; however, when the number of code-book patterns was four or more, these differences in performance became insignificant.