Scispace (Formerly Typeset)
Showing papers on "Pitch detection algorithm published in 2004"
Patent•
Pitch detection of speech signals

Kabi Prakash Padhi1, George Sapna1•
STMicroelectronics1
23 Sep 2004
TL;DR: A frequency-domain method for determining the pitch of speech, with applications in karaoke, voice recognition and scoring, in contrast to the time-domain methods that most existing techniques rely on.
Abstract: Pitch detection of speech signals finds numerous applications in karaoke, voice recognition and scoring applications. While most of the existing techniques rely on time domain methods, the invention utilizes frequency domain methods. There is provided a method and system for determining the pitch of speech from a speech signal. The method includes the steps of: producing or obtaining the speech signal; distinguishing the speech signal into voiced, unvoiced or silence sections using speech signal energy levels; applying a Fourier Transform to the speech signal and obtaining speech signal parameters; determining peaks of the Fourier transformed speech signal; tracking the speech signal parameters of the determined peaks to select partials; and determining the pitch from the selected partials using a two-way mismatch error calculation.

46 citations

Musical computer games played by singing

Perttu Hämäläinen1, Teemu Mäki-Patola1, Ville Pulkki1, Matti Airas1•
Helsinki University of Technology1
1 Jan 2004
TL;DR: Pitch-based control for novel games for musical education by mapping pitch to the position of a game character provides visual feedback that helps to learn to control your voice and sing in tune.
Abstract: Although voice has been used as an input modality in various user interfaces, there are no reports of using pitch of the user’s voice for real-time control of computer games. This paper explores pitch-based control for novel games for musical education. Mapping pitch to the position of a game character provides visual feedback that helps you to learn to control your voice and sing in tune. As demonstrated by two example games in this paper, the approach can be applied to both single and two-player games even with just one microphone.

45 citations

Journal Article•10.1016/J.SPECOM.2003.05.001•
Modification of pitch using DCT in the source domain

R. Muralishankar1, A. G. Ramakrishnan1, P. Prathibha1•
Indian Institute of Science1
01 Feb 2004-Speech Communication
TL;DR: Results indicate that the novel algorithm for pitch modification results in acceptable speech in terms of all these parameters for pitch change factors required for speech synthesis work.

44 citations

Proceedings Article•10.1145/1027527.1027588•
Unsupervised soccer video abstraction based on pitch, dominant color and camera motion analysis

F. Coldefy1, Patrick Bouthemy1•
French Institute for Research in Computer Science and Automation1
10 Oct 2004
TL;DR: A deterministic combination of excited speech detection, dominant color identification and camera motion analysis is performed in order to discriminate between excited speech sequences of the game and excited speech sequences in commercials or in studio shots included in the processed TV programs.
Abstract: We present a soccer video abstraction method based on the analysis of the audio and video streams. This method could be applied to other sports such as rugby or American football. The main contribution of this paper is the design of an unsupervised summarization method, and more specifically, the introduction of an efficient detector of excited speech segments. An excited commentary is supposed to correspond to an interesting moment of the game. It is simultaneously characterized by an increase of the pitch (or fundamental frequency) within the voiced segments and an increase of the energy supported by the harmonics of the pitch. The pitch is estimated from the autocorrelation function and its local increases are detected with a multiresolution technique. We introduce a specific energy measure for the voiced segments. A statistical analysis of the energy measures is performed to detect the most excited parts of the speech. A deterministic combination of excited speech detection, dominant color identification and camera motion analysis is then performed in order to discriminate between excited speech sequences of the game and excited speech sequences in commercials or in studio shots included in the processed TV programs. The method presented here does not need any learning stage. It has been tested on seven soccer videos for a total duration of almost 20 hours.

40 citations
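
The abstract above states that pitch is estimated from the autocorrelation function. A minimal sketch of that general idea follows; it is not the authors' implementation, and the function name, the 60 to 400 Hz search range, and the 0.9 peak threshold are all assumptions chosen for illustration. The multiresolution detection of pitch increases described in the paper is not covered.

```python
import math

def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=400.0, rel_threshold=0.9):
    """Estimate the pitch (Hz) of one voiced frame from its autocorrelation.

    Returns 0.0 if no usable peak is found. All parameters are illustrative.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]                  # remove DC offset
    lag_min = max(1, int(sample_rate / fmax))      # shortest period searched
    lag_max = min(int(sample_rate / fmin), n - 2)  # longest period searched

    def norm_corr(lag):
        # Normalised correlation between the frame and itself shifted by `lag`.
        a, b = x[:n - lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        return num / den if den > 0 else 0.0

    lags = list(range(lag_min, lag_max + 1))
    scores = [norm_corr(lag) for lag in lags]
    if not scores or max(scores) <= 0:
        return 0.0
    floor = rel_threshold * max(scores)
    # Take the shortest lag that is a local peak close to the global maximum;
    # this guards against picking a multiple of the true period.
    for k in range(1, len(lags) - 1):
        if scores[k] >= floor and scores[k] >= scores[k - 1] and scores[k] >= scores[k + 1]:
            return sample_rate / lags[k]
    return 0.0

# Usage: a 200 Hz sine sampled at 8 kHz should give an estimate close to 200 Hz.
sr = 8000
frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]
print(autocorr_pitch(frame, sr))
```

Choosing the shortest qualifying peak rather than the global maximum is a common guard against octave errors, since a periodic signal correlates equally well at every multiple of its period.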

Patent•10.1121/1.2434312•
System and method for combined frequency-domain and time-domain pitch extraction for speech signals

Tenkasi V. Ramabadran1, Alexander Sorin1•
IBM1
31 Mar 2004-Journal of the Acoustical Society of America
TL;DR: A system, computer readable medium, and method that sample a speech signal, divide it into overlapped frames, extract pitch candidates with spectral scores from each frame using frequency domain analysis, score those candidates with a correlation score from a time domain analysis, and select one candidate as the pitch estimate of the frame.
Abstract: A system, computer readable medium, and method for sampling a speech signal; dividing the sampled speech signal into overlapped frames; extracting first pitch information from a frame using frequency domain analysis; providing at least one pitch candidate, each being associated with a spectral score, from the first pitch information, each of the at least one pitch candidate representing a possible pitch estimate for the frame; extracting second pitch information from the frame using a time domain analysis; providing a correlation score for the at least one pitch candidate from the second pitch information; and selecting one of the at least one pitch candidate to represent the pitch estimate of the frame. The system, computer readable medium, and method are suitable for speech coding and for distributed speech recognition.

36 citations

Patent•10.1121/1.2748587•
Speech information processing method and apparatus and storage medium using a segment pitch pattern model

Toshiaki Fukada1•
Canon Inc.1
18 Oct 2004-Journal of the Acoustical Society of America
TL;DR: Speech recognition based on a segment pitch pattern model, which may be obtained by modeling time change in a fundamental frequency of a phoneme belonging to a predetermined phonemic environment with a polynomial segment model.
Abstract: A speech information processing apparatus and method performs speech recognition. Speech is input, and feature parameters of the input speech are extracted. The feature parameters are recognized based on a segment pitch pattern model. The segment pitch pattern model may be obtained by modeling time change in a fundamental frequency of a phoneme belonging to a predetermined phonemic environment with a polynomial segment model. The segment pitch pattern model may also be obtained by modeling with at least one of a single mixed distribution and a multiple mixed distribution.

36 citations

Patent•
Pitch detection method and apparatus

Kwang-cheol Oh1•
Samsung1
21 Oct 2004
TL;DR: A pitch detection method and apparatus that includes a data rearrangement unit which rearranges voice data on the basis of a center peak of the voice data included in a single frame.
Abstract: A pitch detection method and apparatus, the pitch detection apparatus includes: a data rearrangement unit which rearranges voice data on the basis of a center peak of the voice data included in a single frame; a decomposition unit which decomposes rearranged voice data into even symmetrical components on the basis of a center peak; a pitch determination unit which obtains a segment correlation value between a reference point and at least one or more local peaks in relation to even symmetrical components, and determines the location of a local peak corresponding to a maximum segment correlation value among the obtained segment correlation values, as a pitch period.

27 citations

Book Chapter•10.1007/978-3-540-30548-4_67•
A novel pitch period detection algorithm based on hilbert-huang transform

Zhihua Yang1, Daren Huang1, Lihua Yang1•
Sun Yat-sen University1
13 Dec 2004
TL;DR: A novel event detection pitch detector is presented. The Hilbert-Huang Transform is employed to locate the instant at which the glottal pulse takes place, and the pitch period is detected accurately by measuring the time interval between two glottal pulses.
Abstract: In this paper, a novel event detection pitch detector is presented. The Hilbert-Huang Transform is employed to locate the instant at which the glottal pulse takes place. Then, the pitch period is detected accurately by measuring the time interval between two glottal pulses. Experiments show encouraging detection results.

26 citations
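
The final step described above, measuring the time between glottal pulses, can be sketched as follows. This assumes the pulse instants have already been located by some event detector; the Hilbert-Huang step itself is not shown, and the function name and the use of the median over several intervals are assumptions of this sketch, not details from the paper.

```python
from statistics import median

def pitch_from_pulse_times(pulse_times):
    """Return (period_seconds, f0_hz) from glottal pulse instants given in seconds."""
    if len(pulse_times) < 2:
        raise ValueError("need at least two pulse instants")
    # Interval between each pair of consecutive pulses.
    intervals = [b - a for a, b in zip(pulse_times, pulse_times[1:])]
    # Median of the intervals (an assumption of this sketch): less sensitive
    # than the mean to a single missed or spurious pulse.
    period = median(intervals)
    return period, 1.0 / period

# Usage: pulses every 8 ms correspond to a pitch of about 125 Hz.
period, f0 = pitch_from_pulse_times([0.000, 0.008, 0.016, 0.024, 0.032])
print(period, f0)
```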

Journal Article•
High accuracy and octave error immune pitch detection algorithms

Marek Dziubinski1, Bozena Kostek1•
Gdańsk University of Technology1
01 Jan 2004-Archives of Acoustics
TL;DR: A method for improving pitch estimation accuracy that shows high performance for both synthetic harmonic signals and musical instrument sounds is presented, and an octave-error-optimized pitch detection algorithm based on spectral analysis is introduced.
Abstract: The aim of this paper is to present a method for improving pitch estimation accuracy that shows high performance for both synthetic harmonic signals and musical instrument sounds. This method employs an Artificial Neural Network of a feed-forward type. In addition, an octave-error-optimized pitch detection algorithm based on spectral analysis is introduced. The proposed algorithm is very effective for signals with strong harmonic content as well as nearly sinusoidal content. Experiments were performed on a variety of musical instrument sounds, and sample results exemplifying the main issues of both engineered algorithms are shown.

16 citations

Proceedings Article•10.1109/ISCAS.2004.1328785•
Multiple pitch estimation of poly-phonic audio signals in a frequency-lag domain using the bispectrum

Saman S. Abeysekera1•
Nanyang Technological University1
23 May 2004
TL;DR: This paper describes a multiple pitch estimation technique based on the bispectrum of the audio signal; removing estimated pitch components from the resulting 2D frequency-lag distribution is a relatively easier task than removing them from the one-dimensional autocorrelation, as conventionally done.
Abstract: This paper describes a multiple pitch estimation technique that is based on the bispectrum of the audio signal. Via the bispectrum a 2D (frequency-lag) distribution is computed, that is used for the pitch estimation. The estimated pitch components are then removed from the frequency-lag distribution, recursively, to estimate the subsequent pitch frequencies. The use of two dimensions makes the filtering a relatively easier task than removing components from the one dimensional autocorrelation, as conventionally done. Excellent multiple pitch estimation results are demonstrated via simulations.

15 citations

Journal Article•10.1109/TNN.2004.832818•
A temporal-analysis-based pitch estimation system for noisy speech with a comparative study of performance of recent systems

A. Khurshid, Susan L. Denham
01 Sep 2004-IEEE Transactions on Neural Networks
TL;DR: The proposed pitch estimation system is designed to be robust to challenging noise conditions by developing a new representation of the speech signal, based on the operation of damped harmonic oscillators and temporal mode analysis of their output.
Abstract: In this paper, a new system of pitch estimation is presented. The system is designed to be robust to challenging noise conditions. This robustness to the presence of noise in the signal is achieved by developing a new representation of the speech signal, based on the operation of damped harmonic oscillators (DHOs), and temporal mode analysis of their output. The resulting representation is shown to possess qualities that are only gradually degraded in the presence of noise. A harmonic grouping based system is used to estimate the pitch frequency. This method is easily extended to simultaneously track the pitch of more than one speaker. In a series of experiments the accuracy and noise robustness of the proposed system was compared with that of a number of prominent pitch estimation and tracking systems. The results show that the proposed system's overall performance is much better than any of the other systems tested, especially in the presence of very large amounts of noise. Furthermore, the proposed system is comparatively inexpensive in terms of processing and memory requirements.
Patent•
Apparatus and method for detecting a pitch for a voice signal in a voice codec

Jeong Wook Seo, Hwan Kim, Lee Yang Hyun, Keun Sung Bae, Siho Kim, Lee Seung Won 
6 Jul 2004
TL;DR: A pitch detection apparatus for use in a vocoder includes a bandwidth expansion unit that performs an inverse-filtering process and a bandwidth expansion process on an input voice signal and generates a bandwidth-expanded residual signal.
Abstract: An apparatus and method for detecting a pitch of a voice signal in a codec. The pitch detection apparatus for use in a vocoder includes a bandwidth expansion unit for performing an inverse-filtering process and a bandwidth expansion process on an input voice signal, and generating a bandwidth-expanded residual signal; a pitch analyzer for calculating a time autocorrelation function and a spectral autocorrelation function of the bandwidth-expanded residual signal, mixing the time autocorrelation function and the spectral autocorrelation function, comparing an autocorrelation function calculated by dividing a pitch acquired from the mixed autocorrelation function by an integer multiple with another autocorrelation function acquired at a predetermined pitch, and determining a point or position having the highest value to be an open-loop pitch; a pitch smoothing unit for smoothing the open-loop pitch using an average pitch value when the detected open-loop pitch is outside of a predetermined range of a previous frame; and a pitch quantizer for quantizing the smoothened open-loop pitch into predetermined levels, and generating the quantized result.
Proceedings Article•10.1145/1027527.1027591•
A robust on-the-fly pitch (OTFP) estimation algorithm

S. Sood1, Ashok K. Krishnamurthy1•
Ohio State University1
10 Oct 2004
TL;DR: Application of the idea to robust On-the-Fly pitch (OTFP) detection is demonstrated, and comparison with the robust YIN pitch detector has yielded encouraging results.
Abstract: Pitch detection or fundamental frequency (f0) estimation is a classical research topic and has been extensively studied for many years. Pitch estimation by embedding the speech signal into multiple state-space dimensions is a relatively recent technique. Also, the YIN pitch detection algorithm [1] has been cited recently as an improvement over other standard pitch estimation algorithms. In this paper an attempt is made to present a unifying view on some of these existing and seemingly disparate techniques. The unified view enables the development of robust formulations of some existing definitions and also helps to interpret the limitations of the classical/existing approaches in use. Application of the idea to robust On-the-Fly pitch (OTFP) detection is demonstrated, and comparison with the robust YIN pitch detector has yielded encouraging results. The On-the-Fly constraint requires that pitch or aperiodicity estimates from past or future speech frames not be used at a post-processing stage, and OTFP outperforms the YIN estimator under this constraint.
Patent•
Method and apparatus to modify pitch estimation function in acoustic signal musical note pitch extraction

Timo Kosonen1•
Nokia1
24 Sep 2004
TL;DR: A method to estimate pitch in an acoustic signal that can be applied to pitch extraction with various input acoustic signal characteristics, such as just intonation, pitch shift in the frequency domain, and non-12-step-equal-temperament tuning.
Abstract: In one aspect thereof this invention provides a method to estimate pitch in an acoustic signal. The method includes initializing a function $f_t$ and a time $t$, where $t=0$, $x'_0 = f_0(F_0)$, $x'_0$ is a pitch estimate at time zero and $F_0$ is a frequency of the acoustic signal at time zero; determining at least one pitch estimate using the function $x'_t = f_t(F_t)$ by an iterative process of creating $f_{t+1}(F_{t+1})$ based at least partly on pitch estimates $x'_t, x'_{t-1}, x'_{t-2}, x'_{t-3}, \ldots$ and functions $f_t(F_t), f_{t-1}(F_{t-1}), f_{t-2}(F_{t-2}), f_{t-3}(F_{t-3}), \ldots$ and incrementing $t$; and calculating at least one final pitch estimate. Embodiments of this invention can be applied to pitch extraction with various different input acoustic signal characteristics, such as just intonation, pitch shift in the frequency domain, and non-12-step-equal-temperament tuning.
Patent•10.1121/1.2372374•
Speech recognition using dual-pass pitch tracking

Eric Chang1, Jian-Lai Zhou1•
Microsoft1
02 Jun 2004-Journal of the Acoustical Society of America
TL;DR: A computationally efficient and robust pitch detection and tracking system and related methods are presented: an initial set of pitch period candidates is identified using a first estimation algorithm, filtered, and passed through a second, more accurate pitch estimation algorithm to generate a final set of candidates from which the most likely pitch value is selected.
Abstract: A computationally efficient and robust pitch detection and tracking system and related methods are presented. According to certain exemplary implementations a method is presented comprising identifying an initial set of pitch period candidates using a first estimation algorithm, filtering the initial set of candidates and passing the filtered candidates through a second, more accurate pitch estimation algorithm to generate a final set of pitch period candidates from which the most likely pitch value is selected.
Patent•
Device and program for imparting sound effect

Rosukosu Alex, Yasuo Yoshioka, ロスコス アレックス, 靖雄 吉岡
24 Jun 2004
TL;DR: A pitch detection part 2 analyzes a sound input from an input part 1 to detect its pitch, and an output pitch calculation part 3 adds together a short-time mean value of the pitch of the input sound and the value obtained by multiplying the difference between the short-time mean values of the current pitch and the said pitch by a coefficient A to calculate a new output pitch.
Abstract: PROBLEM TO BE SOLVED: To impart variation in intonation to an input sound by a simple method. SOLUTION: A pitch detection part 2 analyzes a sound input from an input part 1 to detect its pitch. An output pitch calculation part 3 adds together a short-time mean value of the pitch of the input sound and the value obtained by multiplying the difference between the short-time mean values of the current pitch and the said pitch by a coefficient A to calculate a new output pitch. A pitch converter 5 converts the input sound to the new output pitch calculated by the output pitch calculation part 3 and outputs the result. Not the pitch, but sound volume etc., can be varied as well. A time as an object of the short-time mean calculation and the coefficient A are specified from a parameter specification part 4 to optionally vary the degree of intonation variation. COPYRIGHT: (C)2006,JPO&NCIPI
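
One plausible reading of the output pitch calculation above is: new pitch = short-time mean + A × (current pitch − short-time mean). The sketch below follows that reading, which is an interpretation of the translated abstract and not a confirmed detail of the patent. The function name, the use of a trailing moving average as the short-time mean, and the pitch unit are all assumptions.

```python
from collections import deque

def scale_intonation(pitches, window, coeff):
    """Scale pitch deviations around a trailing short-time mean.

    pitches: per-frame pitch values (unit is an assumption, e.g. Hz or cents)
    window:  number of recent frames in the short-time mean
    coeff:   the coefficient A; >1 exaggerates intonation, <1 flattens it
    """
    recent = deque(maxlen=window)   # the last `window` pitch values
    out = []
    for p in pitches:
        recent.append(p)
        mean = sum(recent) / len(recent)
        out.append(mean + coeff * (p - mean))
    return out

# Usage: with A = 2, a rise above the running mean is doubled.
print(scale_intonation([100.0, 100.0, 120.0], window=3, coeff=2.0))
```

With `coeff` equal to 1 the input is returned essentially unchanged, and with `coeff` equal to 0 the output collapses to the running mean, which matches the abstract's remark that the window length and the coefficient control the degree of intonation variation.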
Quantification of the Tonal Prominence of Complex Tones in Machinery Noise

Kyoung Hoon Lee, Patricia Davies, Aimée M. Surprenant
1 Jan 2004
TL;DR: This paper describes how many of the sounds that are usually heard evoke pitch sensation, and proposes various models to explain the mechanism of the pitch perception of complex tones.
Abstract: This paper describes how many of the sounds that are usually heard evoke pitch sensation. A singing voice, speech and the sounds of various musical instruments are some examples. In general, those sounds are composed of many harmonic or inharmonic tones but they can produce a pitch sensation as a whole, known as “virtual pitch” or “residue pitch”. From an engineering point of view, it is important to measure the strength of the pitch sensation quantitatively. Metrics like Tone-to-Noise Ratio, Prominence Ratio and Aures’ Tonality only measure the tonal prominence of individual tones, so these metrics are not appropriate for harmonic complex signals. Various models have been proposed in this paper to explain the mechanism of the pitch perception of complex tones. Those models can be largely divided into two categories based on pattern recognition theory and temporal theory. In pattern recognition theory it is assumed that the pitch of a complex tone is derived by a central processor which uses neural signals generated by individual tonal components as inputs. On the other hand, in temporal theory it is assumed that the pitch perception is related to the time intervals between nerve firings evoked by the sound stimulus. The spectral-pitch pattern (SP pattern) and the virtual-pitch pattern (VP pattern) are the outputs of the algorithm. The SP pattern represents the pitch sensation directly related to the spectral shape of a complex tone, whereas the VP pattern is derived from the SP pattern by a process called “subharmonic coincidence assessment”. The application of interest is noise in cabs of large earth moving machinery which can contain several harmonic families. A robust signal processing algorithm has been written to automatically detect the families of tones given some basic information on the diesel engine characteristics. 
The next stage is to process this information to predict the strength of perceived pitch or pitches that are associated with these families. As a first step in addressing this challenging problem, some preliminary work examining the utility model for analyzing one harmonic tone complex combined with an additional single tone has been conducted. A series of subjective experiments are described below and the results are compared to predictions.
Patent•
Device and method to analyze sound signal

Funaki Tomoyuki
16 Dec 2004
TL;DR: A filtering process having a prescribed frequency characteristic is applied to input sound signals, the degree of agreement between adjacent waveforms is analyzed from continuously sampled amplitude values of the filtered signals, and pitch is detected within segments whose waveforms agree.
Abstract: PROBLEM TO BE SOLVED: To make analyzable a stationary portion of musical sound other than a fluctuating portion, i.e., a portion equivalent to one note even though pitch or level of inputted sound from a microphone or the like is delicately fluctuated. SOLUTION: Arbitrary sound signals are inputted and a filtering process having a prescribed frequency characteristic is conducted for the sound signals. Then, an analysis is performed to determine the degree of agreement between adjacent waveforms based on continuously sampled amplitude values of the sound signals after the filtering process. A segment, which is made of a plurality of waveforms that are analyzed and determined to be agreed with each other within a range following a prescribed condition in the analysis result, is detected as a same waveform segment and the pitch of the sound signals in the detected same waveform segment is detected. In the same waveform segment, the pitch of the inputted sound signals is stable and the segment is suitable as the objective segment of a pitch detection process. Thus, appropriate pitch detection of the inputted sound signals can be accurately and easily conducted. COPYRIGHT: (C)2005,JPO&NCIPI
Journal Article•
On the Detection of Melodic Pitch in a Percussive Background

Preeti Rao, Saurabh Shandilya
15 Apr 2004-Journal of The Audio Engineering Society
TL;DR: An experimental study of pitch estimation errors on a set of synthetic musical signals examines how percussive interference degrades time-domain pitch detection algorithms, and discusses the effectiveness of the auditory-perception-based modules of the Meddis-Hewitt pitch detection algorithm in improving the robustness of fundamental frequency tracking in the presence of percussive interference.
Abstract: The extraction of pitch (or fundamental frequency) information from polyphonic audio signals remains a challenging problem. The specific case of detecting the pitch of a melodic instrument playing in a percussive background is presented. Time-domain pitch detection algorithms based on a temporal autocorrelation model, including the Meddis–Hewitt algorithm, are considered. The temporal and spectral characteristics of percussive interference degrade the performance of the pitch detection algorithms to various extents. From an experimental study of the pitch estimation errors obtained on a set of synthetic musical signals, the effectiveness of the auditory-perception–based modules of the Meddis–Hewitt pitch detection algorithm in improving the robustness of fundamental frequency tracking in the presence of percussive interference is discussed. The problem of pitch (or fundamental frequency) extraction of periodic signals in the presence of interfering sounds and noise is an important problem in both speech and music applications. Apart from the value of pitch information per se, a knowledge of the time-varying fundamental frequency can be useful in the separation and reconstruction of a harmonic source from a sound mixture. A number of pitch detection algorithms (PDAs) have been proposed over the decades. But while each has had a measure of success in the targeted application, no single PDA is found suitable for all types of signals and conditions. This engineering report presents an investigation of the performance of some well-known PDAs in estimating the fundamental frequency of a melodic instrument playing in the presence of percussive background. This is a restricted case of the larger problem of musical pitch detection in polyphony. Nevertheless it is an important problem. For instance, classical Indian vocal and instrumental music is always accompanied by percussive instruments providing the rhythmic structure. 
The melody itself is strongly characterized by the presence of microtones and continuous pitch variation. Detecting the melodic pitch contour has important applications in music recognition and for generating metadata in audio content retrieval systems.
Multi-pitch Detection Algorithm Using Constrained Gaussian Mixture Model and Information Criterion for Simultaneous Speech

Hirokazu Kameoka1, Takuya Nishimoto1, Shigeki Sagayama1•
University of Tokyo1
1 Jan 2004
TL;DR: A co-channel multi-pitch detection algorithm that operates without a priori information of F0 contours and a restriction of a number of speakers, and it also extracts accurate F0s as continuous values with simple procedures in spectral domain.
Abstract: In this paper, a co-channel multi-pitch detection algorithm is described. We suggest the importance of this when prosodic information needs to be extracted separately from the respective F0 patterns of concurrent utterances. Though temporal continuity of speech prosody should be considered, we discuss a process done independently on each single frame as the first step. A model of multiple harmonic structures is constructed with a mixture of tied Gaussian mixtures, with each of which a single harmonic structure is modeled. Our algorithm enables detection of both the number of concurrent speakers and the spectral envelope of each underlying harmonic structure, based on a maximum likelihood estimation of the model parameters using the EM algorithm and an information criterion. It operates without a priori information about F0 contours and without a restriction on the number of speakers, and it also extracts accurate F0s as continuous values with simple procedures in the spectral domain. Experiments showed our algorithm outperformed the well-known cepstrum method for speech signals of both a single speaker and two simultaneous speakers.
Proceedings Article•10.21437/INTERSPEECH.2004-761•
A study of tone classification for continuous Thai speech recognition.

Tan Li, Montri Karnjanadecha1, Thanate Khaorapapong1•
Prince of Songkla University1
4 Oct 2004
TL;DR: Experimental results showed that the pitch value of a tone with no final consonants has more variation than one with final consonants, and the classification for tones with voiced final consonants gave better performance than for tones with unvoiced final consonants.
Abstract: This paper presents a study of tone classification for continuous Thai speech recognition. A modified autocorrelation algorithm was implemented for pitch detection, and the tone classifier utilized a 3-layer feed-forward neural network with back-propagation. The best performance configuration of tone features was obtained with semi-tone scaling and mean-normalization, producing a classification accuracy of 72.21%. Also, after considering the effects of final consonants, the average performance of the tone classifier improved to 77.13%. Experimental results showed that the pitch value of a tone with no final consonants has more variation than one with final consonants. Also, the classification for tones with voiced final consonants gave better performance than for tones with unvoiced final consonants.
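
The tone features above rely on semi-tone scaling and mean-normalization. The sketch below uses the common definition of a semitone scale, 12·log2(f / f_ref), followed by subtraction of the mean; the abstract does not give the paper's exact formulas, so the function name, the 100 Hz reference, and normalizing over the whole contour are assumptions.

```python
import math

def semitone_normalize(f0_hz, ref_hz=100.0):
    """Convert voiced-frame F0 values (Hz) to semitones and remove their mean.

    Unvoiced frames (F0 of zero) must be excluded before calling this.
    """
    # 12 semitones per octave, i.e. per doubling of frequency.
    semitones = [12.0 * math.log2(f / ref_hz) for f in f0_hz]
    mean = sum(semitones) / len(semitones)
    # Subtracting the mean removes the speaker's overall pitch level.
    return [s - mean for s in semitones]

# Usage: an octave jump becomes a 12-semitone spread centred on zero.
print(semitone_normalize([100.0, 200.0]))
```

Because the mean is subtracted, the choice of reference frequency cancels out; only the shape of the contour relative to the speaker's average pitch remains, which is what a tone classifier needs.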
Proceedings Article•10.1109/ICASSP.2004.1325936•
A novel method for computation of periodicity, aperiodicity and pitch of speech signals

Om D. Deshmukh1, J. Singh1, Carol Y. Espy-Wilson1•
University of Maryland, College Park1
17 May 2004
TL;DR: Improvements to a previously proposed algorithm that computes the proportion of periodic and aperiodic energies in speech signals and estimates the pitch period; in the task of estimating SNRs, the improved method significantly outperforms a method based on cepstral coefficients.
Abstract: The paper presents improvements to our previously proposed algorithm to compute the proportion of periodic and aperiodic energies in speech signals and to estimate the pitch period. Although previously the periodic and aperiodic energies were estimated independently of each other at each frame, a binary decision was made at each of the non-silent channels. We present an extension that replaces the binary decision with a measure of the degree of periodicity and aperiodicity in each channel. Evaluation on synthetic speech-like data shows a better agreement in the estimated SNR and the actual SNR by using this improvement. Moreover, in the task of estimating the SNRs, this method significantly outperforms a method based on cepstral coefficients. When the method is evaluated on a speech database, the periodicity and aperiodicity accuracy increase significantly. The previous pitch detector was prone to committing pitch doubling and pitch halving errors and was unable to detect pitch reliably in weakly periodic regions. Significant changes have reduced the error rate by 28.7%. The pitch detector is also able to detect accurately the pitch of the synthetic speech-like signals and to capture the jitter present in the signals.
Patent•
Systems and methods for speech recognition using frequency domain linear prediction polynomials to form temporal and spectral envelopes from frequency domain representations of signals
Marios Athineos1, Hynek Hermansky1, Daniel P. W. Ellis1•
Columbia University1
1 Dec 2004
TL;DR: A time-to-frequency domain transformation is applied to at least a portion of a received signal, frequency domain linear prediction (FDLP) is performed on the resulting representation to estimate a temporal envelope, and speech features are generated from that envelope.
Abstract: In accordance with the present invention, computer implemented methods and systems are provided for representing and modeling the temporal structure of audio signals. In response to receiving a signal, a time-to-frequency domain transformation is performed on at least a portion of the received signal to generate a frequency domain representation. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.
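The pipeline in the abstract (transform, linear prediction across frequency, temporal envelope) can be illustrated with a small sketch. It assumes a type-II DCT as the time-to-frequency transform and autocorrelation-method linear prediction solved by Levinson-Durbin; the abstract specifies neither, and all function names here are hypothetical. It is an illustration of the general idea, not the patented implementation.

```python
import math

def dct2(x):
    """Type-II DCT computed directly (O(N^2)); adequate for short frames."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t in range(n))
            for k in range(n)]

def levinson(r, order):
    """Levinson-Durbin: autocorrelation r[0..order] -> (a, err), a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # updated prediction error
    return a, err

def fdlp_envelope(x, order):
    """Estimate a temporal envelope by linear prediction over DCT coefficients."""
    n = len(x)
    c = dct2(x)
    # Autocorrelation of the DCT sequence (taken across frequency index).
    r = [sum(c[k] * c[k + m] for k in range(n - m)) for m in range(order + 1)]
    a, err = levinson(r, order)
    env = []
    for t in range(n):
        # The all-pole model evaluated at angle theta maps to time index t.
        theta = math.pi * (t + 0.5) / n
        re = sum(a[j] * math.cos(j * theta) for j in range(order + 1))
        im = -sum(a[j] * math.sin(j * theta) for j in range(order + 1))
        env.append(err / (re * re + im * im))
    return env
```

In this sketch the roles of time and frequency are swapped relative to ordinary linear prediction: the all-pole model that would normally describe a spectral envelope instead describes where in time the signal energy is concentrated. For example, a frame containing a single impulse yields an envelope that peaks near the impulse position.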
Book Chapter•10.1007/978-3-540-30548-4_68•
Noisy speech pitch detection based on mathematical morphology and weighted MACF

Xia Wang1, Hongmei Tang1, Xiaoqun Zhao2•
Hebei University of Technology1, Tongji University2
13 Dec 2004
TL;DR: Experiments show that the combination of these algorithms provides robust performance and yields better results in noisy speech pitch detection.
Abstract: In speech processing, the pitch period is a very important characteristic parameter, but accurate pitch is not easy to detect, especially in noisy environments, because the speech signal is nonstationary and quasiperiodic. This paper describes a new method based upon mathematical morphology and a weighted modified autocorrelation function (MACF). Morphology is a nonlinear method based on set-theoretical algebra; various morphological filters can be formed using different structuring elements. The weighted MACF modifies the traditional autocorrelation method with the reciprocal of the AMDF. Experiments show that the combination of these algorithms provides robust performance and yields better results in noisy speech pitch detection.
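The weighting step described in the abstract, autocorrelation scaled by the reciprocal of the AMDF (average magnitude difference function), can be sketched as follows. This is a minimal illustration under stated assumptions: the "modified" autocorrelation is taken to mean a fixed number of product terms at every lag, the constant `eps` is an assumed stabilizer that keeps the denominator nonzero, and the morphological pre-filtering is omitted. The function names are hypothetical.

```python
import math

def modified_acf(x, lag, n_terms):
    """Autocorrelation at one lag using a fixed number of product terms."""
    return sum(x[i] * x[i + lag] for i in range(n_terms))

def amdf(x, lag, n_terms):
    """Average magnitude difference at one lag over the same terms."""
    return sum(abs(x[i] - x[i + lag]) for i in range(n_terms)) / n_terms

def weighted_acf_pitch_period(x, min_lag, max_lag, eps=1.0):
    """Return the lag in [min_lag, max_lag] maximizing ACF / (AMDF + eps).

    eps is an assumed constant; the abstract does not give the exact weighting.
    """
    n_terms = len(x) - max_lag      # same number of terms at every lag
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = modified_acf(x, lag, n_terms) / (amdf(x, lag, n_terms) + eps)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

At the true pitch period the autocorrelation peaks while the AMDF dips toward zero, so dividing one by the other sharpens the peak relative to either function alone. The search range should exclude multiples of the expected period, since a perfectly periodic signal scores equally at those lags.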
Patent•
Speaker recognition device

Sekine Naoki, Kakino Tomonari, Ikumi Tomonori, Yoshizaki Keisuke
2 Dec 2004
TL;DR: A speaker recognition device in which speech waveform data are trimmed into frames, the presence or absence of pitch in each frame is detected, and the frame width and shift interval are changed according to the pitch detection result.
Abstract: PROBLEM TO BE SOLVED: To precisely recognize a speaker and to easily calculate the distance to a speaker model. SOLUTION: In a frame trimming means 21, speech waveform data are successively trimmed to the frames with a frame width L at a shift interval T for outputting to a feature vector generating means 22. The presence or absence of pitch in a frame trimmed by the frame cutting means is detected by a pitch detection means 23. The frame trimming means changes the shift interval T and the frame width L according to the pitch detection result. More specifically, the shift interval T and the frame width L between sounds, where a sound frequency exists to a silent section, are changed to be shorter. A feature vector generating means converts the sound waveform data of each frame to a feature vector for outputting to a next-stage distance calculation means 24. The distance calculation means calculates the distance between a feature vector series and a speaker model stored in a speaker model storage means 25 for outputting to a next-stage recognition means 26. The recognition means compares the distance data from the distance calculation means and a preset threshold to recognize the speaker. COPYRIGHT: (C)2005,JPO&NCIPI
Book Chapter•10.1007/978-3-540-24768-5_44•
Speech Hiding Based on Auditory Wavelet

Li-Ran Shen1, Xueyao Li1, Huiqiang Wang1, Rubo Zhang1•
Harbin Engineering University1
14 May 2004
TL;DR: A novel method to embed secret speech into open speech is proposed, and experiments show that the method is strongly robust to many attacks, such as compression and filtering.
Abstract: A novel method to embed secret speech into open speech is proposed. The secret speech is coded into binary parameter bits with the Mixed-Excitation Linear Prediction (MELP) algorithm, and the bits are used to form the hidden information sequence. The open speech is automatically divided into voiced and unvoiced frames using the auditory wavelet transform. In each voiced frame, the auditory wavelet transform is used to detect the pitch, and the pitch is used to determine the current embedding position in the open speech. The information hiding procedure is completed by modifying the relevant wavelet coefficients. At the receiver, the embedding position is found with the same pitch detection method and the hidden bit is recovered. The secret speech is then obtained after MELP decoding. The experiments show that the method is strongly robust to many attacks, such as compression and filtering.
Proceedings Article•10.1109/IROS.2004.1389406•
New wavelet-based pitch detection method for human-robot voice interface

T.H. Tran1, Quang Phuc Ha1, Gamini Dissanayake1•
University of Technology, Sydney1
28 Sep 2004
TL;DR: A new method for pitch detection based on the phase of the continuous wavelet transform is presented that can serve not only as an accurate pitch detector, but also can offer an efficient solution to the end-point detection problem.
Abstract: A speech-activated interface between humans and autonomous or semi-autonomous systems requires accurate voice detection and recognition. In this area, pitch and end-point detection are of vital importance. This paper presents a new method for pitch detection based on the phase of the continuous wavelet transform. The advantage of the proposed technique is that it can serve not only as an accurate pitch detector but also as an efficient solution to the end-point detection problem. Experimental results are provided for the detection of pitch periods and end points in a neural-network-based, voice-enabled wheelchair system.
Proceedings Article•10.1109/ICASSP.2004.1326033•
A model-based tone labeling method for Min-Nan/Taiwanese speech

Wei-Chih Kuo1, Yih-Ru Wang1, Sin-Horng Chen1•
National Chiao Tung University1
17 May 2004
TL;DR: A model-based tone labeling method for Min-Nan/Taiwanese speech is proposed that outperforms the VQ classification method which suffers from the interference resulting from neighboring syllables and from the global prosodic phrase patterns.
Abstract: A model-based tone labeling method for Min-Nan/Taiwanese speech is proposed. It takes the mean and shape of syllable pitch contours as two modeling units and considers some major affecting factors that control their variations. By using the EM algorithm to estimate all parameters of the pitch mean and shape models from a speech database, we can decide the best tone sequences pronounced in all utterances of the database. Experimental results show that it outperforms the VQ classification method which suffers from the interference resulting from neighboring syllables and from the global prosodic phrase patterns.
Journal Article•
A new approach of pitch detection based on morphology filter

Wu Rui1•
Hebei University of Technology1
1 Jan 2004 - Journal of China Institute of Communications
TL;DR: The results show that the new method performs better than the conventional autocorrelation algorithm and the cepstrum method, especially in segments where the distinction between unvoiced (surd) and voiced (sonant) sounds is not evident.
Abstract: We propose a new method of pitch detection that combines a morphological filter with a conventional linear band-pass filter. The results show that the new method performs better than the conventional autocorrelation algorithm and the cepstrum method, especially in segments where the distinction between unvoiced (surd) and voiced (sonant) sounds is not evident.
Proceedings Article•10.21437/INTERSPEECH.2004-603•
Implementation of an intonational quality assessment system for a handheld device.

Kisun You1, Hoyoun Kim1, Wonyong Sung1•
Seoul National University1
4 Oct 2004
TL;DR: The Viterbi algorithm is employed to conduct the forced alignments that indicate the boundaries of each phoneme, and a pitch detector is used to extract the intonational features of the segmented syllables.
Abstract: In this paper, we describe an implementation of an intonational quality assessment system for foreign language learning on a handheld portable device. The Viterbi algorithm is employed to conduct the forced alignments that indicate the boundaries of each phoneme, and a pitch detector is used to extract the intonational features. The tonal pitch type of each segmented syllable is classified and the tendency of the pitch movement is measured. The score of the spoken sentence is then generated from this information. We have implemented this system on an ARM7 RISC processor based system. For real-time operation, we applied fixed-point arithmetic to the signal processing kernels and rearranged the algorithm flow of the system. As a result, the system runs in real time at a 60 MHz CPU clock frequency.
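The abstract states that fixed-point arithmetic was applied to the signal processing kernels but gives no number format. As a hedged illustration of what such a conversion involves, the sketch below assumes the common Q15 format (16-bit signed values with 15 fractional bits); the format choice and all function names are assumptions, not details from the paper.

```python
Q = 15                      # assumed number of fractional bits (Q15)
Q15_MAX = (1 << Q) - 1      # 32767
Q15_MIN = -(1 << Q)         # -32768

def saturate(v):
    """Clamp an integer to the 16-bit signed range."""
    return max(Q15_MIN, min(Q15_MAX, v))

def to_q15(x):
    """Quantize a float in roughly [-1, 1) to a Q15 integer."""
    return saturate(int(round(x * (1 << Q))))

def q15_mul(a, b):
    """Multiply two Q15 values with rounding, returning a Q15 value."""
    return saturate((a * b + (1 << (Q - 1))) >> Q)

def q15_dot(xs, ys):
    """Dot product kept in a wide accumulator, scaled back to Q15 at the end."""
    acc = sum(a * b for a, b in zip(xs, ys))
    return saturate((acc + (1 << (Q - 1))) >> Q)
```

Keeping the accumulator wide and rescaling once at the end, as in `q15_dot`, loses less precision than rounding after every product, which matters for kernels such as autocorrelation that sum many terms.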
