Scispace (Formerly Typeset)
Showing papers on "Pitch detection algorithm published in 1997"
Patent•
Speech synthesizing system and redundancy-reduced waveform database therefor

Hirofumi Nishimura1, Toshimitsu Minowa1, Arai Yasuhiko1•
Panasonic1
23 Oct 1997-Journal of the Acoustical Society of America
TL;DR: In this paper, a speech synthesizing system using a redundancy-reduced waveform database is disclosed, where each waveform of a sample set of voice segments necessary and sufficient for speech synthesis is classified into groups of pitch waveforms closely similar to one another.
Abstract: A speech synthesizing system using a redundancy-reduced waveform database is disclosed. Each waveform of a sample set of voice segments necessary and sufficient for speech synthesis is divided into pitch waveforms, which are classified into groups of pitch waveforms closely similar to one another. One of the pitch waveforms of each group is selected as a representative of the group and is given a pitch waveform ID. The waveform database at least comprises a pitch waveform pointer table each record of which comprises a voice segment ID of each of the voice segments and pitch waveform IDs the pitch waveforms of which, when combined in the listed order, constitute a waveform identified by the voice segment ID and a pitch waveform table of pitch waveform IDs and corresponding pitch waveforms. This enables the waveform database size to be reduced. For each of pitch waveforms the database lacks, one of the pitch waveform IDs adjacent to the lacking pitch waveform ID in the pitch waveform pointer table is used without deforming the pitch waveform.

146 citations
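
The pointer-table layout described in this abstract can be illustrated with a small sketch. It assumes a relative Euclidean distance as the similarity test and takes the first waveform of each group as its representative; the class name `WaveformDatabase`, the `tol` parameter, and the method names are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class WaveformDatabase:
    """Toy redundancy-reduced store: voice segments reference shared pitch waveforms by ID."""

    pitch_waveforms: dict = field(default_factory=dict)  # pitch waveform ID -> samples
    pointer_table: dict = field(default_factory=dict)    # voice segment ID -> ordered IDs

    def _find_similar(self, w, tol):
        # Assumed similarity test: relative Euclidean distance below `tol`.
        for wid, ref in self.pitch_waveforms.items():
            if ref.shape == w.shape:
                err = np.linalg.norm(ref - w) / (np.linalg.norm(ref) + 1e-12)
                if err < tol:
                    return wid
        return None

    def add_segment(self, segment_id, waveforms, tol=0.05):
        ids = []
        for w in waveforms:
            w = np.asarray(w, dtype=float)
            wid = self._find_similar(w, tol)
            if wid is None:
                # The first waveform of a new group becomes its representative.
                wid = len(self.pitch_waveforms)
                self.pitch_waveforms[wid] = w
            ids.append(wid)
        self.pointer_table[segment_id] = ids

    def reconstruct(self, segment_id):
        # Concatenate the representatives in the listed order.
        ids = self.pointer_table[segment_id]
        return np.concatenate([self.pitch_waveforms[i] for i in ids])


# Three pitch waveforms, two of them nearly identical, share one stored entry.
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
db = WaveformDatabase()
db.add_segment("seg1", [np.sin(theta), 1.01 * np.sin(theta), np.cos(theta)])
segment = db.reconstruct("seg1")
```

The size reduction comes from storing two waveforms instead of three while the pointer table still lists three IDs for the segment.
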

Journal Article•10.1016/S0167-6393(97)00002-2•
A pitch determination and voiced/unvoiced decision algorithm for noisy speech

Jean Rouat1, Yong Chun Liu1, Daniel Morissette1•
Université du Québec à Chicoutimi1
15 Apr 1997-Speech Communication
TL;DR: The voiced/unvoiced and unvoiced/voiced performance and the pitch estimation errors of the proposed PDA and the reference system are reported in detail for three speech databases.

108 citations

Patent•
Pitch determiner for a speech analyzer

Jian-Cheng Huang1, Floyd D. Simpson1, Xiaojun Li1•
Motorola1
29 Dec 1997
TL;DR: A pitch function generator (414) as discussed by the authors generates a plurality of pitch components representing a pitch function for one or more sequential segments of speech, which are represented by a predetermined number of digitized speech samples.
Abstract: A pitch determiner (414) for use with a speech analyzer includes a pitch function generator (414) which generates a plurality of pitch components representing a pitch function for one or more sequential segments of speech, which are represented by a predetermined number of digitized speech samples. A pitch enhancer (1116) enhances the pitch function of a current segment of speech utilizing the pitch function of one or more sequential segments of speech to generate a plurality of enhanced pitch components. A pitch detector (1118) detects the pitch of the current segment of speech by determining the pitch of an enhanced pitch component having a largest amplitude of the plurality of enhanced pitch components.

43 citations

Patent•
Statistical methods and apparatus for pitch extraction in speech recognition, synthesis and regeneration

Chengjun Julian Chen1•
IBM1
31 Oct 1997-Journal of the Acoustical Society of America
TL;DR: In this article, a method and apparatus for extracting pitch value information from speech is presented, which selects at least three highest peaks from a normalized autocorrelation function and produces a plurality of frequency candidates for pitch value determination, and is further used to perform both forward and backward searching when an anchor point cannot be readily identified.
Abstract: A method and apparatus for extracting pitch value information from speech. The method selects at least three highest peaks from a normalized autocorrelation function and produces a plurality of frequency candidates for pitch value determination. The plurality of frequency candidates is used to identify anchor points in pitch values, and is further used to perform both forward and backward searching when an anchor point cannot be readily identified. The running mean or average of determined pitch values is maintained and used in conjunction with the identified valid pitch values in a final determination of the pitch estimation using a weighted least squares fit for identified non-valid frames.

37 citations
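
The first step of this abstract, picking the three highest peaks of a normalized autocorrelation function as frequency candidates, can be sketched as follows. This is only an illustration under stated assumptions: the search range (`fmin`, `fmax`), the biased autocorrelation estimate, and the function name `pitch_candidates` are hypothetical, and the anchor-point search and weighted least squares stages are not covered.

```python
import numpy as np


def pitch_candidates(frame, fs, fmin=60.0, fmax=400.0, n_peaks=3):
    """Return up to n_peaks (frequency_hz, score) pairs, highest score first."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Autocorrelation via FFT, zero-padded to avoid circular wrap-around.
    spec = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(spec * np.conj(spec))[:n]
    if acf[0] <= 0:
        return []
    nacf = acf / acf[0]  # normalized so that lag 0 equals 1
    lo = max(1, int(fs / fmax))
    hi = min(n - 2, int(fs / fmin))
    lags = np.arange(lo, hi + 1)
    # Local maxima of the normalized autocorrelation inside the search range.
    is_peak = (nacf[lags] > nacf[lags - 1]) & (nacf[lags] >= nacf[lags + 1])
    peak_lags = lags[is_peak]
    order = np.argsort(nacf[peak_lags])[::-1][:n_peaks]
    return [(float(fs / peak_lags[i]), float(nacf[peak_lags[i]])) for i in order]


# Synthetic 200 Hz tone with one weaker harmonic, 50 ms at 8 kHz.
fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t + 0.5)
candidates = pitch_candidates(frame, fs)
```

For this test signal the candidates correspond to lags of one, two, and three pitch periods, which shows why a later stage is needed to choose among them.
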

Journal Article•10.1109/89.554783•
Real-time fundamental frequency estimation by least-square fitting

A. Choi1•
University of Hong Kong1
01 Mar 1997-IEEE Transactions on Speech and Audio Processing
TL;DR: Characterization of the error function of fitting a sinusoid to the signal segment allows its spectrum to be deduced and the algorithm to be implemented efficiently.
Abstract: For real-time applications, a fundamental frequency estimation algorithm must be able to obtain accurate estimates from short signal segments. Characterization of the error function of fitting a sinusoid to the signal segment allows its spectrum to be deduced and the algorithm to be implemented efficiently. Musical signals are discussed in particular.

35 citations
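
To make the idea of fitting a sinusoid to a short segment concrete, here is a brute-force sketch: for each trial frequency, the best-fitting amplitude and phase are found by linear least squares and the frequency with the smallest residual is kept. It is not the efficient implementation the abstract refers to; the grid search, the single-sinusoid model, and the name `ls_frequency` are assumptions made for illustration.

```python
import numpy as np


def ls_frequency(segment, fs, f_grid):
    """Return (frequency_hz, squared_error) of the best single-sinusoid fit."""
    x = np.asarray(segment, dtype=float)
    t = np.arange(len(x)) / fs
    best_f, best_err = None, np.inf
    for f in f_grid:
        # A sinusoid of unknown amplitude and phase is linear in (a, b):
        # a*cos(2*pi*f*t) + b*sin(2*pi*f*t)
        basis = np.column_stack([np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
        err = float(np.sum((x - basis @ coef) ** 2))
        if err < best_err:
            best_f, best_err = float(f), err
    return best_f, best_err


# 20 ms of a 220 Hz tone sampled at 8 kHz (fewer than five periods).
fs = 8000
t = np.arange(160) / fs
segment = 0.8 * np.sin(2 * np.pi * 220.0 * t + 0.3)
f_hat, err = ls_frequency(segment, fs, np.arange(100.0, 500.0, 0.5))
```

Because the fit does not rely on FFT bin spacing, the estimate can be finer than the resolution a 20 ms transform would give, at the cost of one small least-squares solve per trial frequency.
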

Proceedings Article•10.1109/ICASSP.1997.599658•
A novel frequency domain filtered-X LMS algorithm for active noise reduction

T. Kosaka, Stephen J. Elliott1, C.C. Boucher•
University of Southampton1
21 Apr 1997
TL;DR: The frequency domain filtered-X LMS algorithm showed better performance than the conventional time domain algorithm in simulations of single channel active control systems and is able to improve the convergence of multiple channel systems by compensating for the coupling between the control channels.
Abstract: A frequency domain implementation of the LMS algorithm has significant advantages. In broadband applications it is important to use the correct window function before Fourier transformation to obtain an unbiased estimation of the required cross correlation function and to eliminate wrap-around effects. In the frequency domain filtered-X LMS algorithm described in this paper, the control filter is updated in the frequency domain as a background task, while control filtering is performed in the time domain, to minimize the processing delays. The frequency domain algorithm showed better performance than the conventional time domain algorithm in simulations of single channel active control systems. The algorithm is also able to improve the convergence of multiple channel systems by compensating for the coupling between the control channels.

26 citations

Patent•
Speech encoding/decoding method and apparatus using a pitch reliability measure

Kazuyuki Iijima1, Masayuki Nishiguchi1, Jun Matsumoto1•
Sony Broadcast & Professional Research Laboratories1
11 Sep 1997
TL;DR: In this paper, a pitch detection method and apparatus capable of realizing high-precision pitch detection even for speech signals in which half-pitch or double-pitch exhibits stronger autocorrelation than the pitch for detection is presented.
Abstract: A pitch detection method and apparatus capable of realizing high-precision pitch detection even for speech signals in which half-pitch or double-pitch exhibits stronger autocorrelation than the pitch for detection. An input speech signal is judged as to voicedness or unvoicedness and a voiced portion and an unvoiced portion of the input speech signal are encoded by a sinusoidal analytic encoding unit 114 and by a code excitation encoding unit 120, respectively, for producing respective encoded outputs. The sinusoidal analytic encoding unit 114 performs pitch search on the encoded outputs for finding the pitch information from the input speech signal and sets the high-reliability pitch information based on the detected pitch information. The results of pitch detection are determined based on the high-reliability pitch information.

20 citations

Proceedings Article•10.1109/CCECE.1997.614827•
Evaluation of various FFT methods for single tone detection and frequency estimation

Y.T. Chan, Q. Ma1, Hing Cheung So2, R. Inkol•
Royal Military College of Canada1, City University of Hong Kong2
25 May 1997
TL;DR: The standard periodogram generally gives the best detection performance and the minimum mean square frequency error for a fixed length of signal data; however, if the FFT length is fixed, the Welch method gives the best performance.
Abstract: The periodogram, implemented using the fast Fourier transform (FFT), is widely used for the detection and frequency measurement of single tones. This paper evaluates the detection and frequency estimation performance of the periodogram and its variants, such as the Welch and Bartlett methods and the polyphase-FFT. Performance results for the detection and frequency estimation performance of the periodogram and its variants are presented and compared. The standard periodogram generally gives the best detection performance and the minimum mean square frequency error for a fixed length of signal data. However, if the FFT length is fixed the Welch method gives the best performance.

20 citations
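
The two estimators compared in this abstract can be sketched side by side. The code below is a minimal illustration, assuming a Hann window, 50% overlap, and a 256-point segment for the Welch variant; these settings and the function names are hypothetical and are not taken from the paper, and the polyphase-FFT variant is not shown.

```python
import numpy as np


def periodogram_peak(x, fs):
    """Frequency of the largest periodogram bin (DC excluded), one FFT over all data."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    k = int(np.argmax(power[1:])) + 1
    return k * fs / len(x)


def welch_peak(x, fs, seg_len=256, overlap=0.5):
    """Frequency of the largest bin after averaging windowed, overlapping short FFTs."""
    x = np.asarray(x, dtype=float)
    hop = max(1, int(seg_len * (1 - overlap)))
    window = np.hanning(seg_len)
    acc = np.zeros(seg_len // 2 + 1)
    count = 0
    for start in range(0, len(x) - seg_len + 1, hop):
        seg = x[start:start + seg_len] * window
        acc += np.abs(np.fft.rfft(seg)) ** 2
        count += 1
    if count == 0:
        raise ValueError("signal shorter than one segment")
    k = int(np.argmax(acc[1:])) + 1
    return k * fs / seg_len


# A 125 Hz tone in white noise at roughly 0 dB SNR.
rng = np.random.default_rng(0)
fs = 1000.0
n = np.arange(1024)
x = np.sqrt(2.0) * np.cos(2 * np.pi * 125.0 * n / fs) + rng.standard_normal(1024)
f_per = periodogram_peak(x, fs)
f_welch = welch_peak(x, fs)
```

The single long FFT has a bin spacing of `fs / 1024`, while the Welch estimate uses `fs / 256` bins but averages several segments, which is the trade-off between data length and FFT length that the abstract evaluates.
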

Patent•
Method and apparatus for encoding/decoding voiced speech based on pitch intensity of input speech signal

Masayuki Nishiguchi1, Kazuyuki Iijima1, Jun Matsumoto1•
Sony Broadcast & Professional Research Laboratories1
8 Sep 1997
TL;DR: In this paper, the pitch intensity information, which is a parameter containing the information representing not only pitch intensity of the input speech signal but also the proximity to the voiced speech or the unvoiced speech of the speech signal, is generated by a voiced/unvoiced (V/UV) discrimination unit and pitch intensity generating circuit.
Abstract: A speech encoding method, a speech decoding method and corresponding apparatus capable of outputting non-buzzing spontaneous playback speech in a voiced portion includes a sinusoidal analysis encoding unit on the decoder side that detects the pitch of the voiced portion of the input speech signal. The pitch intensity information, which is a parameter containing the information representing not only the pitch intensity of the input speech signal but also the information representing proximity to the voiced speech or the unvoiced speech of the speech signal, is generated by a voiced/unvoiced (V/UV) discrimination unit and pitch intensity information generating circuit. The pitch intensity data is sent along with the encoded speech signal to the encoding side which then adds the noise component controlled on the basis of the pitch intensity information to the voiced portion of the encoded speech signal in a voiced speech synthesis portion and decodes and outputs the resulting signal.

17 citations

Patent•10.1121/1.429357•
Pitch extraction method and device utilizing autocorrelation of a plurality of frequency bands

Kazuyuki Iijima1, Masayuki Nishiguchi1, Jun Matsumoto1, Shiro Omori1•
Sony Broadcast & Professional Research Laboratories1
24 Jan 1997-Journal of the Acoustical Society of America
TL;DR: In this paper, a pitch extraction method and apparatus whereby the pitch of a speech signal having various characteristics can be extracted accurately is presented. The pitch reliability of the input speech signals, band-limited by the HPF 12 and the LPF 16, is computed in evaluation parameter calculation units.
Abstract: A pitch extraction method and apparatus whereby the pitch of a speech signal having various characteristics can be extracted accurately. The frame-based input speech signal, band-limited by an HPF 12 and an LPF 16, is sent to autocorrelation computing units 13, 17 where autocorrelation data is found. The pitch lag is computed and normalized in the pitch intensity/pitch lag computing units 14, 18. The pitch reliability of the input speech signals, limited by the HPF 12 and the LPF 16, is computed in evaluation parameter calculation units. A selection unit 20 selects one of the parameters obtained from the input speech signal, limited by the HPF 12 and the LPF 16, using the pitch lag and the evaluation parameter.

16 citations

Patent•
Speech analysis method and speech encoding method and apparatus

Kazuyuki Iijima1, Akira Inoue1, Jun Matsumoto1, Masayuki Nishiguchi1•
Sony Broadcast & Professional Research Laboratories1
17 Oct 1997
TL;DR: In this article, the frequency spectrum of the input speech is split on the frequency axis into plural bands in each of which pitch search and evaluation of amplitudes of the harmonics are carried out simultaneously using an optimum pitch derived from the spectral shape.
Abstract: A speech analysis method and a speech encoding method and apparatus in which, even if the harmonics of the speech spectrum are offset from integer multiples of the fundamental wave, the amplitudes of the harmonics can be evaluated correctly for producing a playback output of high clarity. To this end, the frequency spectrum of the input speech is split on the frequency axis into plural bands in each of which pitch search and evaluation of amplitudes of the harmonics are carried out simultaneously using an optimum pitch derived from the spectral shape. Using the structure of the harmonics as the spectral shape, and based on the rough pitch previously detected by an open-loop rough pitch search, a high-precision pitch search comprised of a first pitch search for the frequency spectrum in its entirety and a second pitch search of higher precision than the first pitch search is carried out. The second pitch search is performed independently for each of the high range side and the low range side of the frequency spectrum.
Patent•
Pitch detection method and apparatus uses voiced/unvoiced decision in a frame other than the current frame of a speech signal

Kazuyuki Iijima1, Masayuki Nishiguchi1, Jun Matsumoto1•
Sony Broadcast & Professional Research Laboratories1
11 Sep 1997
TL;DR: In this paper, an input speech signal is judged as to voicedness or unvoicedness, and a voiced portion and an unvoiced portion of the input signal are encoded by a sinusoidal analytic encoding unit 114 and by a code excitation encoding unit 120, respectively, for producing respective encoded outputs.
Abstract: For realizing high-precision pitch detection even for speech signals in which half-pitch or double-pitch exhibits stronger autocorrelation than the pitch to be detected, an input speech signal is judged as to voicedness or unvoicedness and a voiced portion and an unvoiced portion of the input speech signal are encoded by a sinusoidal analytic encoding unit 114 and by a code excitation encoding unit 120, respectively, for producing respective encoded outputs. The sinusoidal analytic encoding unit 114 performs pitch search on the encoded outputs for finding the pitch information from the input speech signal and sets the high-reliability pitch information based on the detected pitch information. The results of pitch detection are determined using the high-reliability pitch information and the results of the voicedness/unvoicedness decision for frames other than the current frame.
Patent•
First formant location determination and removal from speech correlation information for pitch detection

Mark A. Ireton1, John G. Bartkowiak1•
Advanced Micro Devices1
24 Oct 1997
TL;DR: In this article, an autocorrelation function is calculated for a range of time-delay values over which the pitch period and its multiples might be expected to occur.
Abstract: A vocoder system and method for estimating the pitch of a speech signal. The speech signal comprises a stream of digitized speech samples. The speech samples are partitioned into frames. For each frame of the speech signal, the following processing steps are performed. First, an optimal order-two inverse filter is determined based on the samples of the speech frame. Second, a dominant formant frequency is calculated from the coefficients of the optimal order-two inverse filter. Third, an autocorrelation function is calculated on the samples of the speech frame. The autocorrelation is performed for a range of time-delay values over which the pitch period and its multiples might be expected to occur. Fourth, the peaks of the autocorrelation function are analyzed incorporating the knowledge of the dominant formant period (which is the inverse of the dominant formant frequency). Normally, the dominant formant is the first formant. Thus, the dominant formant period defines the expected time-delay for the first formant peak in the autocorrelation function. As such, any peak in the autocorrelation function occurring with a time-delay equal to the dominant formant period is treated with increased caution before being accepted as the pitch period.
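
The first, second, and fourth steps above can be sketched as follows. This is a minimal illustration assuming the order-two inverse filter is obtained by the autocorrelation method of linear prediction on a Hamming-windowed frame and that "caution" is modelled as a simple flag; the function names and the one-sample tolerance are hypothetical and not taken from the patent.

```python
import numpy as np


def dominant_formant_hz(frame, fs):
    """Estimate the dominant formant from an order-two linear predictor."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    # Autocorrelation at lags 0, 1, 2.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(3)])
    # Normal equations for the predictor x[n] ~ a1*x[n-1] + a2*x[n-2].
    R = np.array([[r[0], r[1]], [r[1], r[0]]])
    a1, a2 = np.linalg.solve(R, r[1:3])
    # Roots of z^2 - a1*z - a2; the pole angle gives the resonance frequency.
    poles = np.roots([1.0, -a1, -a2])
    return float(abs(np.angle(poles[0])) * fs / (2 * np.pi))


def flag_formant_lags(peak_lags, fs, formant_hz, tol=1.0):
    """Flag autocorrelation peak lags that coincide with the dominant formant period."""
    if formant_hz <= 0:
        return [False for _ in peak_lags]
    period = fs / formant_hz
    return [abs(lag - period) <= tol for lag in peak_lags]


# A 700 Hz resonance stands in for the first formant; 50 is a plausible pitch lag.
fs = 8000
n = np.arange(400)
frame = np.sin(2 * np.pi * 700 * n / fs + 0.3)
formant = dominant_formant_hz(frame, fs)
flags = flag_formant_lags([11, 50], fs, formant)
```

A pitch detector built on this would down-weight or re-check the flagged lag (about 11 samples here) rather than accept it outright as the pitch period.
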
A pitch pulse evolution model for linear predictive coding of speech

Peter Kabal, Jacek Stachurski
1 Jan 1997
TL;DR: A new speech compression technique designed for near toll quality speech coding at bit rates as low as 4 kb/s is presented, and a robust algorithm for extracting noisy pitch pulses from the LP residual based on error minimization with respect to a set of model pulses is developed.
Abstract: Speech coding is important in the effort to make more efficient use of digital telecommunication networks, particularly wireless systems, and to reduce the memory requirements in speech storage systems. The desire for a low-rate digital representation of speech is often contrary to the demand for a high quality speech reconstruction. In this thesis we present a new speech compression technique designed for near toll quality speech coding at bit rates as low as 4 kb/s. In low-rate speech coding based on linear prediction (LP), poor modelling of the LP excitation for voiced, quasi-periodic segments contributes to the degradation of the quality of the reconstructed speech. In this dissertation, we present a new speech coding method designed for improved modelling of the LP excitation. Conceptually, the LP excitation is decomposed into a series of underlying pitch pulses and a simultaneous unvoiced noise-like signal. The underlying pitch pulses are estimated from noisy observations, i.e. the pitch pulses extracted from the LP residual. Since the pulses change little from one time instant to another, we call our representation the Pitch Pulse Evolution (PPE) model. The PPE model provides a framework to analyze and effectively control the periodicity of voiced speech. We have developed a robust algorithm for extracting noisy pitch pulses from the LP residual based on error minimization with respect to a set of model pulses, and we have examined a number of methods for calculating the underlying pulses. The evolving pitch pulse waveshapes, the pulse positions, and the unvoiced signal are encoded separately. The positions and the shapes of the underlying pulses need only be coded infrequently, and the characteristics of intermediate pulses are obtained by interpolation. The software implementation of a 4 kb/s PPE coder is described. 
The main features of the implemented PPE coder are: a novel approach to pitch analysis; estimation of evolving pitch pulses which enables control over the pulse characteristics; and a unique coding scheme which avoids the time dilation and contraction of individual pitch pulses found in other waveform interpolation coders.
Proceedings Article•10.1109/TENCON.1997.648274•
Harmonic-plus-noise decomposition and its application in voiced/unvoiced classification

R. Ahn1, W.H. Holmes•
University of New South Wales1
2 Dec 1997
TL;DR: In this article, an improved algorithm to decompose the harmonic and the noise components of voiced speech is presented, which makes the method more accurate and robust by employing a harmonic extrapolation and a noise extrapolation in alternating iterative steps and by including a new pitch detection algorithm.
Abstract: In this paper, we present an improved algorithm to decompose the harmonic and the noise components of voiced speech. The improvements make the method more accurate and robust by employing a harmonic extrapolation and a noise extrapolation in alternating iterative steps and by including a new pitch detection algorithm. This new technique has been found to improve both the convergence and accuracy of separation of the harmonic and the noise components. In separating the noise and the harmonic components, this improved harmonic-plus-noise (H+N) decomposition method provides many useful ways to measure the strength of voicing. Two such measures are investigated with respect to their ability to discern voiced and unvoiced segments of speech. They are the harmonic-to-noise energy ratio and the sub-band harmonic-to-noise energy ratio. Tests show that these measures perform more reliably and more robustly in comparison to classical measures such as the zero-crossing rate, the LPC prediction gain, the 1st LP coefficient and the RMS energy.
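
A harmonic-to-noise energy ratio of the kind mentioned here can be illustrated with a much simpler decomposition than the iterative one in the paper. The sketch below assumes the pitch `f0` is already known and obtains the harmonic part by a least-squares fit of sinusoids at multiples of `f0`; the function name and the `n_harmonics` default are hypothetical.

```python
import numpy as np


def harmonic_to_noise_ratio_db(frame, fs, f0, n_harmonics=10):
    """Energy ratio (dB) between a least-squares harmonic fit and its residual."""
    x = np.asarray(frame, dtype=float)
    t = np.arange(len(x)) / fs
    k_max = min(n_harmonics, int((fs / 2) // f0))
    if k_max < 1:
        raise ValueError("f0 must lie below the Nyquist frequency")
    cols = []
    for k in range(1, k_max + 1):
        cols.append(np.cos(2 * np.pi * k * f0 * t))
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    basis = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    harmonic = basis @ coef          # harmonic component
    noise = x - harmonic             # everything the harmonics cannot explain
    e_h = np.sum(harmonic ** 2)
    e_n = np.sum(noise ** 2) + 1e-12
    return float(10.0 * np.log10(e_h / e_n + 1e-12))


# A voiced-like frame (three harmonics of 125 Hz plus weak noise) versus pure noise.
rng = np.random.default_rng(1)
fs = 8000
t = np.arange(320) / fs
voiced = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in (1, 2, 3))
voiced = voiced + 0.05 * rng.standard_normal(len(t))
unvoiced = rng.standard_normal(len(t))
hnr_voiced = harmonic_to_noise_ratio_db(voiced, fs, 125.0)
hnr_unvoiced = harmonic_to_noise_ratio_db(unvoiced, fs, 125.0)
```

Thresholding this ratio gives a crude voiced/unvoiced decision; a sub-band version would apply the same ratio to groups of harmonics separately.
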
Robust hybrid pitch detector for pathologic voice analysis

B. Boyanov, S. Hadjitodorov, B. Teston, D. Doskov
1 Jun 1997
TL;DR: A hybrid speech period (T0) detector characterized by parallel analyses of three speech signals in the temporal, spectral and cepstral domains and preprocessing for periodic/aperiodic (unvoiced) separation (PAS) is proposed.
Abstract: A hybrid speech period (T0) detector characterized by parallel analyses of three speech signals in the temporal, spectral and cepstral domains and preprocessing for periodic/aperiodic (unvoiced) separation (PAS) is proposed. The preprocessing is realized by analysis in these three domains and PAS by a multilayer perceptron neural network. Two phonations of the vowel "a" of 40 speakers and 62 patients were analyzed. For the proposed detector, errors were significantly reduced.
Proceedings Article•10.1109/SCFT.1997.623886•
An improved harmonic-plus-noise decomposition method and its application in pitch determination

R. Ahn1, W.H. Holmes•
University of New South Wales1
7 Sep 1997
TL;DR: In this article, an improved method to decompose the harmonic and the noise components of voiced speech is presented, which makes the method more accurate and robust by applying noise and harmonic separation in alternative iterations and by including a new pitch detection algorithm.
Abstract: This paper presents an improved method to decompose the harmonic and the noise components of voiced speech. The improvements make the method more accurate and robust by applying noise and harmonic separation in alternative iterations and by including a new pitch detection algorithm. This new pitch detection algorithm utilises the proposed decomposition as a main component. The results show that the improved decomposition converges faster than the original method.
Patent•10.1121/1.429364•
Efficient pitch estimation method

Ma Wei
19 Jun 1997-Journal of the Acoustical Society of America
TL;DR: In this paper, a method and means to estimate the pitch of a speech or acoustic signal within a vocoder is presented, where center clipping and low-pass filtering are used to eliminate the formants from the speech and acoustic signals.
Abstract: A method and means to estimate the pitch of a speech or acoustic signal within a vocoder begins with the center clipping and low-pass filtering of the speech or acoustic signal to eliminate the formants from the speech or acoustic signal. An error function is calculated for each pitch within the speech or acoustic signal. A fast tracking method is used to select the estimated pitch for the speech or acoustic signal. A final check for the doubling of the pitch will minimize any incorrect estimation of the pitch.
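
The preprocessing described here, center clipping followed by low-pass filtering, can be sketched as below. As a stand-in for the patent's error function, which the abstract does not specify, the sketch picks the lag that maximizes a biased autocorrelation; the clipping ratio, the moving-average filter, and the function names are assumptions, and the tracking and pitch-doubling check are not shown.

```python
import numpy as np


def center_clip(x, ratio=0.3):
    """Zero out samples within +/- ratio*max|x| and shift the rest toward zero."""
    x = np.asarray(x, dtype=float)
    c = ratio * np.max(np.abs(x))
    return np.where(x > c, x - c, np.where(x < -c, x + c, 0.0))


def moving_average(x, length=5):
    """Very simple low-pass filter (assumed here; any low-pass design would do)."""
    return np.convolve(x, np.ones(length) / length, mode="same")


def estimate_pitch_hz(frame, fs, fmin=60.0, fmax=400.0):
    y = moving_average(center_clip(frame))
    n = len(y)
    lo, hi = int(fs / fmax), min(n - 1, int(fs / fmin))
    # Biased autocorrelation over the candidate pitch lags.
    acf = np.array([np.dot(y[: n - lag], y[lag:]) for lag in range(lo, hi + 1)])
    best_lag = lo + int(np.argmax(acf))
    return fs / best_lag


# 160 Hz fundamental with a strong 1280 Hz component standing in for a formant.
fs = 8000
n = np.arange(400)
frame = np.sin(2 * np.pi * 160 * n / fs) + 0.6 * np.sin(2 * np.pi * 1280 * n / fs)
pitch = estimate_pitch_hz(frame, fs)
```

Clipping and smoothing suppress the high-frequency ripple so that the autocorrelation maximum falls at the pitch lag rather than at a lag tied to the 1280 Hz component.
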
Proceedings Article•10.1109/ICASSP.1997.596216•
Robust pitch detection of speech signals using steerable filters

Jinhai Cai1, Zhi-Qiang Liu•
University of Melbourne1
21 Apr 1997
TL;DR: The novel pitch determination algorithms employ steerable filters to obtain the direction of pitch change; they not only make full use of the information within an analysis frame, but also optimally utilize the information from neighboring frames by taking advantage of the pitch direction.
Abstract: Most of the well known and widely used pitch determination algorithms are frame-based. They only consider the local stationarity of speech within the analysis frame. However, our novel pitch determination algorithms employ steerable filters to obtain the direction of pitch change. Therefore, the proposed algorithms not only make full use of the information within an analysis frame, but also optimally utilize the information from neighboring frames by taking advantage of the pitch direction. This allows us to use more than one frame to enhance pitch peaks for non-stationary, noisy speech signals. As a result, the proposed algorithms are superior to conventional methods in terms of accuracy and reliability, and are robust to noise. Besides, the direction of pitch change can be estimated in different domains. Therefore, our algorithms can be applied in either the time or the frequency domain, or both.
Proceedings Article•10.1109/PACRIM.1997.620358•
OSLP: a new technique in linear prediction of speech

S. Dhanjal
20 Aug 1997
TL;DR: In the work reported in this paper, pitch detection algorithms, other than the SIFT, were employed to investigate if the overall performance of OSLP can be further improved.
Abstract: Odd sample linear prediction (OSLP) is a relatively new and efficient technique for analysis/synthesis of speech signals. OSLP is based on the classical theory of linear prediction. The performance of OSLP was further improved by the author. Pitch detection constitutes an important component of this technique. In the work reported so far, the SIFT algorithm, due to Markel and Gray (1976), was used for pitch detection mainly because the SIFT is also based on the classical theory of linear prediction. However, dozens of pitch detection algorithms have been reported in the literature. In the work reported in this paper, pitch detection algorithms other than the SIFT were employed to investigate whether the overall performance of OSLP can be further improved.
Proceedings Article•10.1109/ICICS.1997.647151•
Investigation of the spectral envelope estimation vocoder and improved pitch estimation based on the sinusoidal speech model

Weihua Zhang1, Hyun-Soo Kim, W.H. Holmes•
University of New South Wales1
9 Sep 1997
TL;DR: This paper investigates the properties and limitations of the SEEVOC algorithm, and analysis of the effect of inaccurate coarse pitch gives a new insight into the spectral envelope estimator, and shows that it is important to start with a reasonably accurate coarse pitch value.
Abstract: In most low rate speech coders the quality of the synthesized speech depends greatly on the performance of the spectral coding stage, in which the spectral envelope is encoded. The spectral envelope estimation vocoder (SEEVOC) is a successful spectral envelope coding method, and also plays an important role in speech coding based on the sinusoidal model. This paper investigates the properties and limitations of the SEEVOC algorithm, which requires an estimate of the coarse pitch. Our analysis of the effect of inaccurate coarse pitch gives a new insight into the spectral envelope estimator, and shows that it is important to start with a reasonably accurate coarse pitch value. Also, we optimize and generalize the properties of the SEEVOC algorithm and propose a new method to improve pitch estimation based on the sinusoidal speech model.
Patent•
Detection method of musical performance position and detection method of pitch

Szalay Andreas1•
Yamaha Corporation1
10 Jan 1997
TL;DR: In this article, a neural network was used to detect a playing position and also to accurately and quickly detect a pitch in a MIDI output part and convert it into MIDI data, which were then sent to a sound source.
Abstract: PURPOSE: To detect a playing position and also to accurately and quickly detect a pitch. CONSTITUTION: A 1st pitch detection part 13 detects the pitch quickly by making good use of a neural net 15 and also detects the playing position. A 2nd pitch detection part 12 detects the accurate pitch from a zero-cross point. A comparison part 17 outputs the pitch which is detected early and supplies it to a QUANTIZER 18. Playing position data and pitch data quantized by the QUANTIZER 18 are supplied to a MIDI output part 19 and converted into MIDI data, which are supplied to a sound source (TG) 20.
Patent•
Very low bit rate time domain speech analyzer for voice messaging

Jian-Cheng Huang1, Floyd D. Simpson1, Xiaojun Li1•
Motorola1
7 Jan 1997
TL;DR: A speech analyzer (107) as mentioned in this paper compresses a voice message for transmission and includes an LPC analyzer which derives spectral vectors from segments of speech; a memory (1910) which stores predetermined spectral vectors identified by indexes, the indexes also identifying predetermined voicing vectors stored within a receiver.
Abstract: A speech analyzer (107) compresses a voice message for transmission and includes an LPC analyzer (406) which derives spectral vectors from segments of speech; a memory (1910) which stores predetermined spectral vectors identified by indexes, the indexes also identifying predetermined voicing vectors stored within a receiver; a quantizer (422) which compares the spectral vector derived with the predetermined spectral vectors to select one of the predetermined spectral vectors; and an output buffer for storing the index identifying the predetermined spectral vector selected. The speech analyzer (107) also includes a pitch determiner (414) which includes a pitch function generator (414) which generates a pitch function from a segment of speech. A pitch enhancer (1116) enhances the pitch function of a current segment of speech utilizing the pitch function of one or more sequential segments of speech and a pitch detector (1118) detects the pitch of the current segment of speech.
Proceedings Article•10.1109/SCFT.1997.623876•
Pitch estimation using spectral covariance method for low-delay MBE vocoder

Yong Duk Cho1, Hong Kook Kim, Moo Young Kim, Sang Ryong Kim•
Samsung1
7 Sep 1997
TL;DR: In this article, the authors proposed a weighted spectral AbS method which can reduce gross pitch errors without extra lookahead in comparison with the spectral AbS method, where the weight is defined by the covariance of the excitation spectrum of the speech signal to compensate for the gross pitch error.
Abstract: The synthesized speech quality of the multiband excitation (MBE) vocoder depends highly on the accuracy of the pitch estimation, and it uses pitch tracking and spectral analysis-by-synthesis (AbS) procedures to determine the fine pitch with reduced gross pitch errors. However, pitch tracking has limits in practice because it requires a large lookahead. Also, the spectral AbS, if used alone, results in frequent gross pitch errors. So we propose a weighted spectral AbS method which can reduce gross pitch errors without extra lookahead in comparison with the spectral AbS method. The weight in the proposed method is defined by the covariance of the excitation spectrum of the speech signal to compensate for the gross pitch error. From the comparison of the pitch contours between the spectral AbS and the weighted spectral AbS methods, it is confirmed that the proposed method considerably reduces gross pitch errors compared with the spectral AbS method.
Patent•
Harmonic pitch detector

Takahashi Satoshi, Kubota Tatsuo
15 Aug 1997
TL;DR: A cepstrum-based detector that finds a harmonic pitch automatically with a low erroneous detection rate, even when the target signal has a low S/N ratio.
Abstract: PROBLEM TO BE SOLVED: To detect a pitch automatically with a low erroneous detection rate even when a target signal has a low S/N ratio. SOLUTION: Sound emitted from a target is received by a microphone 1 and, after passing through a low pass filter 2, undergoes A/D conversion 3 to produce digital data. A power spectrum is then determined by an FFT processor 4 and a detection processor 5. Frequency averaging processing 6 and time averaging processing are applied to the power spectrum to improve the S/N ratio; the logarithm of the signal is then obtained by a logarithm processor 8, and the result undergoes an FFT 9 again to determine the cepstrum. The cepstrum obtained is subjected to an averaging processing 10 and time averaging processing 11 to improve the S/N ratio. Finally, peak detection and judgment processing are performed by a post processor 12 to detect the target harmonic pitch.
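The core of the chain described in this abstract (power spectrum, logarithm, second transform, peak picking) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented detector: the function name `cepstral_pitch`, the Hann window, and the `fmin`/`fmax` search range are hypothetical, the frequency and time averaging stages are omitted, and an inverse real FFT stands in for the second FFT (the two differ only by scaling for a real, symmetric log spectrum).

```python
import numpy as np

def cepstral_pitch(signal, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame via the cepstrum.

    Assumptions: a single frame, a Hann window, and a search range of
    fmin..fmax Hz; none of these come from the abstract.
    """
    frame = np.asarray(signal, dtype=float) * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(frame)) ** 2      # power spectrum
    log_power = np.log(power + 1e-12)            # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_power)           # second transform
    qmin = int(sample_rate / fmax)               # shortest period, in samples
    qmax = int(sample_rate / fmin)               # longest period, in samples
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return sample_rate / peak

# Synthetic harmonic tone at 200 Hz, sampled at 8 kHz.
sample_rate = 8000
t = np.arange(1024) / sample_rate
tone = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 20))
estimate = cepstral_pitch(tone, sample_rate)
```

The harmonics of a periodic signal are evenly spaced in the log spectrum, so they show up as a single peak at the pitch period in the cepstrum. Averaging several frames before the logarithm, as the abstract describes, would be the natural next step for noisy input.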
Proceedings Article•10.1049/IC:19971372•
Poincaré maps and pitch detection in speech

I. N. Mann1, S. McLaughlin1•
University of Edinburgh1
1 Dec 1997
TL;DR: In this paper, a nonlinear algorithm is proposed for epoch marking in voiced speech signals, which operates entirely in the state space, by operating on a 3D reconstruction of the speech signal which is formed by embedding.
Abstract: A novel nonlinear algorithm is proposed for epoch marking in voiced speech signals. Epoch detection is necessary in speech coding, synthesis and recognition, as it estimates the moment of glottal closure and the instantaneous pitch. The technique functions entirely in the state space, operating on a three-dimensional reconstruction of the speech signal formed by embedding. Because one revolution of this reconstructed attractor equals one pitch period, a Poincaré section can be used to find points that are pitch synchronous. The epoch pulses are pitch synchronous and can therefore be marked. Results from applying the technique to real speech signals are presented to illustrate its performance. (5 pages)
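The idea of embedding the signal in three dimensions and marking where the trajectory crosses a Poincaré section can be sketched as below. This is an illustrative toy rather than the authors' algorithm: the names `delay_embed` and `poincare_epochs`, the delay `tau`, and the choice of a fixed plane through the centroid of the attractor with a hand-picked normal are all assumptions, and a real speech signal would need a more careful choice of section.

```python
import numpy as np

def delay_embed(x, tau, dim=3):
    """Time-delay embedding: row n is (x[n], x[n+tau], ..., x[n+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def poincare_epochs(x, tau, normal=(1.0, -1.0, 0.0)):
    """Sample indices where the embedded trajectory crosses the section plane.

    Assumption: the plane passes through the centroid of the embedded
    points and has the given normal; only crossings from the negative
    to the non-negative side are kept, so each revolution is marked once.
    """
    points = delay_embed(np.asarray(x, dtype=float), tau)
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)
    side = (points - points.mean(axis=0)) @ normal   # signed distance to plane
    return np.flatnonzero((side[:-1] < 0) & (side[1:] >= 0)) + 1

# Synthetic periodic signal: 100 Hz plus a weaker second harmonic at 8 kHz.
sample_rate = 8000
t = np.arange(800) / sample_rate
x = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 200 * t)
epochs = poincare_epochs(x, tau=20)
periods = np.diff(epochs)   # spacing between successive crossings, in samples
```

For this synthetic signal the spacing between successive crossings corresponds to the pitch period in samples, which is the property the abstract relies on.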
Patent•
Pitch detector for waveform of speech

Terada Takahiko, Fukuda Hiroaki, Higashiyama Mikio, Hirata Takayoshi
18 Mar 1997
TL;DR: A pitch detector for use in speech recognition, speech synthesis, automatic melody writing, karaoke grading, machine diagnosis, etc., that improves detection accuracy while simplifying the processing and the structure.
Abstract: PROBLEM TO BE SOLVED: To obtain a pitch detector for use in speech recognition, speech synthesis, automatic melody writing, karaoke grading, machine diagnosis, etc., in which the detection accuracy is enhanced while the processing and the structure are simplified. SOLUTION: The pitch detector 1, which receives a speech waveform and detects the pitch of the fundamental wave, comprises means 3 for extracting, at each period forming the speech waveform, a plurality of orthogonal function components sequentially in order of their energy contribution and outputting the extracted components, and means 4 for extracting one of the plurality of orthogonal function components as the pitch based on the relative periodic relationship among them.
Proceedings Article•10.1109/TENCON.1997.648529•
A new approach to pitch and voicing detection through spectrum periodicity measurement

S. Ghaemmaghami1, Mohamed Deriche, Boualem Boashash•
University of Queensland1
2 Dec 1997
TL;DR: In this paper, a new method for detecting pitch and voicing information of speech with a high accuracy was proposed based on a novel approach to using the concept of instantaneous frequency (IF), in which an estimation technique in the frequency domain is employed to expose the harmonic structure of the signal using a periodicity measure.
Abstract: A new method for detecting pitch and voicing information of speech with a high accuracy is addressed. The method is based on a novel approach to using the concept of instantaneous frequency (IF). In this method an IF estimation technique in the frequency domain is employed to expose the harmonic structure of the signal using a periodicity measure. This measure is based on the flatness of the IF, which describes the spectrum periodicity within a certain frequency band where the pitch harmonics are most likely found. The flatness measurement also yields voicing information extracted using an auto-thresholding technique. The proposed method was evaluated through comparison with cepstral pitch and voicing detection considering accuracy and reconstructed speech quality.
Proceedings Article•10.1109/TENCON.1997.648271•
Investigation on the spectral envelope estimator (SEEVOC) and refined pitch estimation based on the sinusoidal speech model

Hyun Soo Kim1, H. Nolmes, Weihua Zhang•
University of New South Wales1
2 Dec 1997
TL;DR: This paper investigates the properties and limitations of the SEEVOC algorithm, and analysis of the effect of inaccurate coarse pitch gives a new insight into the spectral envelope estimator, and shows that it is important to start with a reasonably accurate coarse pitch value.
Abstract: The quality of the synthesized speech in most low bit rate speech coders depends greatly on the performance of the spectral coding stage, in which the spectral envelope is encoded. The spectral envelope estimation vocoder (SEEVOC) is a successful spectral envelope coding method, and also plays an important role in speech coding based on the sinusoidal model. This paper investigates the properties and limitations of the SEEVOC algorithm, which requires an estimate of the coarse pitch. Our analysis of the effect of an inaccurate coarse pitch gives new insight into the spectral envelope estimator and shows that it is important to start with a reasonably accurate coarse pitch value. We also optimize and generalize the properties of the SEEVOC algorithm and propose a new method to improve pitch estimation based on the sinusoidal speech model.
Patent•
Method and system for recognition synthesis encoding and decoding of speech

Masami Akamine1, Akinori Koshiba•
Toshiba1
18 Mar 1997
TL;DR: In this paper, a speech encoding/decoding system based upon recognition synthesis is proposed, which can be applied with incomplete speech recognition technology to encode a speech signal at a very low rate of 1kbps or less and transmit even non-linguistic information on a feeling of a speaker.
Abstract: PROBLEM TO BE SOLVED: To provide a speech encoding/decoding system based upon recognition synthesis which can be applied with incomplete speech recognition technology to encode a speech signal at a very low rate of 1 kbps or less and transmit even nonlinguistic information on a feeling, etc., of a speaker. SOLUTION: On the transmission side, input speech data are fed to a pitch detection part 101, a phoneme recognition part 102, and a continuance detection part 103 to detect the pitch period, recognize the syllable, and detect the continuance of the phoneme. Information on the pitch period, syllable, and continuance is encoded by encoding circuits 104, 105, and 106, and the code sequence is then transmitted to a channel through a multiplexer 107. On the reception side, a demultiplexer 110 decodes the code sequence into the information on the pitch period, syllable, and continuance, and on the basis of the decoded information a synthesizer 114 synthesizes the original speech signal.
