TL;DR: In this article, a speech coding method is proposed to reduce error propagation due to voice packet loss, which is achieved by limiting or reducing a pitch gain only for the first subframe or the first two subframes within a speech frame.
Abstract: A speech coding method that reduces error propagation due to voice packet loss is achieved by limiting or reducing a pitch gain only for the first subframe or the first two subframes within a speech frame. The method is used for a voiced speech class. A pitch cycle length is compared to a subframe size to decide whether to reduce the pitch gain for the first subframe or the first two subframes within the frame. A strongly voiced class is decided by checking whether the pitch lags are stable and the pitch gains are high enough within the frame; for the strongly voiced frame, the pitch lags and the pitch gains can be encoded more efficiently than for other speech classes.
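The gain-limiting idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the gain ceiling or the direction of the lag-versus-subframe comparison, so `max_gain=0.8` and the rule "a pitch lag longer than one subframe limits two subframes" are assumptions made only for illustration, and `limit_pitch_gains` is a hypothetical helper name.

```python
def limit_pitch_gains(gains, pitch_lag, subframe_size, max_gain=0.8):
    """Clamp the pitch gain of the leading subframe(s) of a voiced frame.

    Assumptions (not stated in the abstract): the ceiling max_gain=0.8 and
    the rule that a pitch lag longer than one subframe means the first two
    subframes are limited, otherwise only the first one.
    """
    n_limited = 2 if pitch_lag > subframe_size else 1
    return [min(g, max_gain) if i < n_limited else g
            for i, g in enumerate(gains)]


# Four subframes per frame, 64 samples each (illustrative values).
print(limit_pitch_gains([1.0, 0.9, 0.95, 0.7], pitch_lag=80, subframe_size=64))
```

Only the leading subframes are touched, so the later subframes keep their full pitch gain.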
TL;DR: Experimental results show that the new model and the estimator lead to both improved pitch estimates and reconstruction quality, but also that the improvements in pitch are usually quite small, typically in the order of a few Hertz.
Abstract: Recently, parametric methods have proven capable of overcoming the problems of correlation-based methods for pitch estimation. However, the argument against such methods is that the underlying model is wrong, particularly for non-stationary signals, like speech. To investigate whether this is true, we propose a new, non-stationary harmonic chirp model for pitch estimation, and we derive an estimator for determining its parameters. Experimental results show that the new model and the estimator lead to both improved pitch estimates and reconstruction quality, but also that the improvements in pitch are usually quite small, typically in the order of a few Hertz.
TL;DR: Results for different speech signals show that this new method of pitch detection is the best in terms of speech quality and computational complexity.
Abstract: A new pitch detection scheme has been proposed based on the short-time autocorrelation function (ACF) and average magnitude difference function (AMDF). The performance of the proposed scheme has been evaluated, through simulation, in a complete speech analysis-synthesis system. For detection of pitch, local maxima of ACF and local minima of AMDF values are computed. To reduce computational complexity, the original speech signal is converted into a three level signal before computing ACF and AMDF. Synthesized speech quality, computational complexity and time taken during simulation are the parameters that have been considered while comparing this system with the analysis-synthesis systems that use autocorrelation, cepstrum and wavelet based pitch detection methods. Results for different speech signals show that this new method of pitch detection is the best in terms of speech quality and computational complexity.
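The combination described here (three-level conversion, then the ACF maximum and the AMDF minimum) can be sketched as follows. This is a generic illustration, not the authors' implementation: the clipping ratio of 0.3 and the helper names are assumptions, and the abstract does not say how the two lag estimates are merged, so the sketch simply returns both.

```python
import math


def three_level(frame, ratio=0.3):
    """Map each sample to -1, 0 or +1 with a threshold at ratio * peak (ratio is assumed)."""
    thr = ratio * max(abs(s) for s in frame)
    return [1 if s > thr else -1 if s < -thr else 0 for s in frame]


def acf(x, lag):
    """Short-time autocorrelation at one lag."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def amdf(x, lag):
    """Average magnitude difference at one lag."""
    n_terms = len(x) - lag
    return sum(abs(x[n] - x[n + lag]) for n in range(n_terms)) / n_terms


def pitch_lags(frame, min_lag, max_lag):
    """Return (lag at the ACF maximum, lag at the AMDF minimum) of the clipped frame."""
    x = three_level(frame)
    lags = range(min_lag, max_lag + 1)
    return max(lags, key=lambda k: acf(x, k)), min(lags, key=lambda k: amdf(x, k))


# 200 Hz tone sampled at 8 kHz: the period is 40 samples.
frame = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
print(pitch_lags(frame, 20, 60))
```

Because the clipped signal takes only the values -1, 0 and +1, the products and differences reduce to sign comparisons, which is where the reduction in computational complexity comes from.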
TL;DR: A polyphonic pitch detection approach is presented, which is based on the iterative analysis of the autocorrelation function, and yields good results in the range of state of the art systems.
Abstract: In this paper, a polyphonic pitch detection approach is presented, which is based on the iterative analysis of the autocorrelation function. The idea of a two-channel front-end with periodicity estimation by using the autocorrelation is inspired by an algorithm from Tolonen and Karjalainen. However, the analysis of the periodicity in the summary autocorrelation function is enhanced with a more advanced iterative peak picking and pruning procedure. The proposed algorithm is compared to other systems in an evaluation with common data sets and yields good results in the range of state of the art systems.
TL;DR: In this paper, a pitch detection method for content-based music retrieval is presented, which comprises the following steps of: converting a music signal to a frequency domain by virtue of Fourier transform to calculate, carrying out the first step of pitch detection on the signal according to a harmonic peak value method to find five low-frequency harmonic peaks, carried out ascending sort according to the values of frequencies, then calculating the ratio among the frequencies, determining a group of pitch candidate sequences according to data measured by an experiment, then carrying out pitch detecting on the original music signal by a c
Abstract: The invention discloses a pitch detection method that addresses the poor performance of pitch detection in content-based music retrieval. The method comprises the following steps: converting a music signal to the frequency domain by Fourier transform; carrying out a first pitch detection pass on the signal with a harmonic peak method to find five low-frequency harmonic peaks; sorting the peaks in ascending order of frequency; calculating the ratios among the frequencies; determining a group of pitch candidate sequences according to experimentally measured data; carrying out pitch detection on the original music signal with a cepstrum method; combining the pitch sequences obtained by the two methods into a new pitch candidate sequence; and finally finding the pitch with the lowest cost, that is, the standard pitch obtained by the method, by means of a confidence degree and the Viterbi algorithm. The disclosed method is robust and performs well in noise.
TL;DR: A novel hybrid algorithm for blind source separation of three speech signals in a real room environment that exploits an information-theoretic approach, based on higher order statistics, to achieve source separation and is well suited for real-time implementation due to its fast adaptive methodology.
Abstract: In this paper we present a novel hybrid algorithm for blind source separation of three speech signals in a real room environment. The algorithm in addition to using second-order statistics also exploits an information-theoretic approach, based on higher order statistics, to achieve source separation and is well suited for real-time implementation due to its fast adaptive methodology. It does not require any prior information or parameter estimation. The algorithm also uses a novel post-separation speech harmonic alignment that results in an improved performance. Experimental results in simulated and real environments verify the effectiveness of the proposed method, and analysis demonstrates that the algorithm is computationally efficient.
TL;DR: Results of the evaluation show that the enhanced autocorrelation outperforms other state-of-the-art features on the challenge data set, whose complexity lies in between real world data sets showing naturalistic emotional utterances and the widely applied and well-understood acted emotional data sets.
Abstract: Multimodal emotion recognition in real world environments is still a challenging task of affective computing research. Recognizing the affective or physiological state of an individual is difficult for humans as well as for computer systems, and thus finding suitable discriminative features is the most promising approach in multimodal emotion recognition. In the literature numerous features have been developed or adapted from related signal processing tasks. Still, classifying emotional states in real world scenarios is difficult and the performance of automatic classifiers is rather limited. This is mainly because emotional states cannot be distinguished by a well defined set of discriminating features. In this work we present an enhanced autocorrelation feature as a multi pitch detection feature and compare its performance to well-known, state-of-the-art features in signal and speech processing. Results of the evaluation show that the enhanced autocorrelation outperforms other state-of-the-art features on the challenge data set. The complexity of this benchmark data set lies in between real world data sets showing naturalistic emotional utterances and the widely applied and well-understood acted emotional data sets.
TL;DR: This paper describes different algorithms for finding pitch markers in a speech signal and explains how EEMD improves on the EMD algorithm.
Abstract: In this paper we describe different algorithms for finding pitch markers in a speech signal and explain how EEMD improves on the EMD algorithm. One of the major problems of the EMD algorithm is mode mixing. The EEMD algorithm helps to solve the mode mixing problem. EEMD is a noise assisted data analysis (NADA) method for extracting pitch information from the speech signal. In EEMD the signal is decomposed into intrinsic mode functions (IMFs). Using these IMFs, information regarding pitch markers can be evaluated. Keywords: EMD, EEMD, IMF, NADA.
TL;DR: Wavelet transform (WT) provides a way to explore the spectral characteristics of non-stationary speech signals and the tree structure of WP analysis can be customized to match the critical bands of human hearing giving better spectral estimation for speech signal than other methods.
Abstract: Wavelet transform (WT) provides a way to explore the spectral characteristics of non-stationary speech signals. Multiresolution analysis based on the wavelet theory permits the introduction of the concepts of signal filtering with different bandwidths or frequency resolutions. As both time and frequency analysis can be conducted by WT, the tree structure of WP analysis can be customized to match the critical bands of human hearing giving better spectral estimation for speech signal than other methods. Wavelet-based pitch estimation assumes that the glottis closures are correlated with the maxima in the adjacent scales of the WT. This approach ensures more accurate estimation of pitch period.
TL;DR: This work proposes an unsupervised method for obtaining multiple pitch tracks using time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM, which achieves multi-pitch tracking and results in lower root mean squared error in pitch track estimation compared to that by Kalman filtering.
Abstract: Grating Compression Transform (GCT) is a two-dimensional analysis of speech signal which has been shown to be effective in multi-pitch tracking in speech mixtures. Multi-pitch tracking methods using GCT apply Kalman filter framework to obtain pitch tracks which requires training of the filter parameters using true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM. The TVGMM parameters are estimated using multiple pitch values at each frame in a given utterance obtained from different patches of the spectrogram using GCT. We evaluate the performance of the proposed method on all voiced speech mixtures as well as random speech mixtures having well separated and close pitch tracks. TVGMM achieves multi-pitch tracking with 51% and 53% multi-pitch estimates having error <= 20% for random mixtures and all-voiced mixtures respectively. TVGMM also results in lower root mean squared error in pitch track estimation compared to that by Kalman filtering.
TL;DR: This paper introduces a novel melody extraction algorithm based on the Fan Chirp Transform that has the best performance in voicing detection, voicing false alarm, and overall accuracy and is capable of correcting outliers in pitch detection.
Abstract: Current melody extraction approaches perform poorly on the genre of opera [1, 2]. The singer’s formant is defined as a prominent spectral-envelope peak around 3 kHz found in the singing of professional Western opera singers [3]. In this paper we introduce a novel melody extraction algorithm based on this feature for opera signals. At the front end, it automatically detects the singer’s formant according to the Long-Term Average Spectrum (LTAS). This detection function is also applied to the short-term spectrum in each frame to determine the melody. The Fan Chirp Transform (FChT) [4] is used to compute pitch salience as its high time-frequency resolution overcomes the difficulties introduced by vibrato. Subharmonic attenuation is adopted to handle octave errors which are common in opera vocals. We improve the FChT algorithm so that it is capable of correcting outliers in pitch detection. The performance of our method is compared to 5 state-of-the-art melody extraction algorithms on a newly created dataset and parts of the ADC2004 dataset. Our algorithm achieves an accuracy of 87.5% in singer’s formant detection. In the evaluation of melody extraction, it has the best performance in voicing detection (91.6%), voicing false alarm (5.3%) and overall accuracy (82.3%).
TL;DR: A novel approach for the computation of a pitch salience function is presented which does not rely on energy but only on frequency location and is evaluated for a task of multiple-pitch estimation using the MAPS test-set.
Abstract: In this paper, a novel approach for the computation of a pitch salience function is presented. The aim of a pitch (considered here as synonym for fundamental frequency) salience function is to estimate the relevance of the most salient musical pitches that are present in a certain audio excerpt. Such a function is used in numerous Music Information Retrieval (MIR) tasks such as pitch, multiple-pitch estimation, melody extraction and audio features computation (such as chroma or Pitch Class Profiles). In order to compute the salience of a pitch candidate f, the classical approach uses the weighted sum of the energy of the short time spectrum at its integer multiple frequencies hf. In the present work, we propose a different approach which does not rely on energy but only on frequency location. For this, we first estimate the peaks of the short time spectrum. From the frequency location of these peaks, we evaluate the likelihood that each peak is a harmonic of a given fundamental frequency. The specificity of our method is to use as likelihood the deviation of the harmonic frequency locations from the pitch locations of the equal tempered scale. This is used to create a theoretical sequence of deviations which is then compared to an observed one. The proposed method is then evaluated for a task of multiple-pitch estimation using the MAPS test-set.
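For contrast with the proposed frequency-location approach, the classical energy-based salience mentioned in this abstract (a weighted sum of spectral energy at the integer multiples hf) can be sketched as below. The geometric weight `decay ** (h - 1)`, the number of harmonics, and the function name are assumptions chosen for illustration; published salience functions use a variety of weightings.

```python
def harmonic_sum_salience(power, bin_hz, f0, n_harmonics=5, decay=0.8):
    """Classical salience: weighted sum of spectral power at h * f0.

    power  : list of power-spectrum values, one per frequency bin
    bin_hz : spacing between bins in Hz
    decay  : assumed geometric weight applied to harmonic number h
    """
    total = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(h * f0 / bin_hz)      # nearest bin to the h-th harmonic
        if k >= len(power):
            break
        total += decay ** (h - 1) * power[k]
    return total


# Toy spectrum with 10 Hz bins and partials at 220, 440 and 660 Hz.
power = [0.0] * 200
for k in (22, 44, 66):
    power[k] = 1.0
candidates = [110.0, 220.0, 440.0]
print(max(candidates, key=lambda f: harmonic_sum_salience(power, 10.0, f)))
```

The decaying weight is what keeps the subharmonic candidate at 110 Hz from scoring as high as the true fundamental, since it only collects energy at its even harmonics.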
TL;DR: This paper addresses the extraction of the pitch contour of the singing voice in polyphonic recordings of ICM, using novel algorithms developed in the Fourier of Fourier Transform domain.
Abstract: In Indian Classical Music (ICM) the singing voice is accompanied by a continuous drone and percussive instruments. At the onsets of percussion, a smooth pitch contour cannot be obtained by conventional pitch detection algorithms. In this paper, the problem of extracting the pitch contour of the singing voice in polyphonic recordings of ICM is addressed. In this method, frames were classified as monophonic / polyphonic and harmonic / inharmonic using novel algorithms developed in the Fourier of Fourier Transform domain. The pitch was estimated for those frames which were monophonic and harmonic and for those polyphonic frames where the predominant melody was the singing voice. Pitch was estimated using the Fourier of Fourier Transform with parabolic interpolation of spectral peaks. The developed method is immune to octave errors, and its accuracy in pitch estimation is suitable for the microtones of ICM.
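Parabolic interpolation of a spectral peak, as mentioned in this abstract, fits a parabola through the peak bin and its two neighbours and reads off the vertex. The sketch below shows the generic three-point formula only; it is not tied to the Fourier of Fourier Transform domain used by the authors, and the function name is hypothetical.

```python
def parabolic_peak(y_prev, y_peak, y_next):
    """Vertex of the parabola through three equally spaced samples.

    Returns (offset, height): offset is the vertex position in bins
    relative to the centre sample, height is the interpolated peak value.
    """
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0.0:                     # flat or linear: no refinement possible
        return 0.0, y_peak
    offset = 0.5 * (y_prev - y_next) / denom
    height = y_peak - 0.25 * (y_prev - y_next) * offset
    return offset, height


# Samples of y = -(x - 0.25)**2 + 3 taken at x = -1, 0, 1.
print(parabolic_peak(1.4375, 2.9375, 2.4375))
```

With a peak at bin k and a bin spacing of `bin_hz`, the refined frequency is `(k + offset) * bin_hz`, which gives sub-bin resolution of the kind needed for microtonal pitch.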
TL;DR: In this paper, a pitch detection method based on morphological filtering is proposed, which can accurately locate the moment of glottal opening and closing through tracking mutation of instantaneous energy, so that variation of pitch period can be accurately tracked.
Abstract: A new method of pitch detection based on morphological filtering is proposed. The noisy speech signal is filtered by morphological filtering to remove the noise and highlight pitch, and then HHT is employed to get the Hilbert-Huang spectrum and to calculate instantaneous energy and its derivative. The moment of glottal opening and closing can be accurately located by tracking mutation of instantaneous energy, so that variation of the pitch period can be accurately tracked. Compared with other traditional methods of pitch detection, this method not only truly describes the non-stationary and non-linear characteristics of the speech signal, but is also an adaptive process for the analysis of the speech signal. The experiments showed that the method has strong noise immunity and can accurately detect the pitch of speech at low SNR.
TL;DR: In this article, pitch detection is carried out on an audio file to acquire its pitch sequence, the tonic of the audio file is searched according to the pitch sequence, and mode detection is performed according to the tonic to determine the classification of the audio file.
Abstract: The embodiment of the invention provides an audio classifying method and device. The method comprises the steps that Pitch detection is carried out on an audio file to be classified, so as to acquire the Pitch sequence of the audio file; according to the Pitch sequence, the tonic of the audio file is searched; and according to the tonic of the audio file, mode detection is carried out on the audio file to determine the classification of the audio file. According to the invention, the classifying cost of the audio file can be reduced; the classifying efficiency is improved; and the intelligence is enhanced.
TL;DR: Experimental evaluation of the proposed PEA shows that it outperforms some of the existing PEAs for a wide range of SNRs.
Abstract: This paper presents an efficient pitch estimation algorithm (PEA) using dominant harmonic modification (DHM) and ensemble empirical mode decomposition (EEMD). The noisy speech is first low-pass filtered within the ranges of fundamental frequencies (50-500Hz) to obtain the pre-filtered signal (PFS). The pre-processed signal is then modified by enhancing its dominant harmonic and followed by the computation of the normalized autocorrelation function (NACF). Then, an EEMD based data adaptive time domain noise filtering is applied to the NACF. Finally, partial reconstruction is performed in the EEMD domain to determine the pitch period. Experimental evaluation of the proposed PEA shows that it outperforms some of the existing PEAs for a wide range of SNRs.
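One step of this pipeline, the normalized autocorrelation function (NACF) searched over the 50-500 Hz pitch range, can be sketched as follows. The sketch omits the low-pass pre-filter, the dominant harmonic modification and the EEMD-based filtering described in the abstract, so it illustrates only the NACF stage; the function names are hypothetical.

```python
import math


def nacf(x, lag):
    """Normalized autocorrelation of x at one lag, in [-1, 1]."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    e0 = sum(x[i] ** 2 for i in range(n))
    e1 = sum(x[i + lag] ** 2 for i in range(n))
    den = math.sqrt(e0 * e1)
    return num / den if den > 0.0 else 0.0


def pitch_from_nacf(x, fs, f_min=50.0, f_max=500.0):
    """Pitch in Hz from the lag that maximises the NACF within [f_min, f_max]."""
    lo = int(fs / f_max)
    hi = min(int(fs / f_min), len(x) - 1)
    best = max(range(lo, hi + 1), key=lambda k: nacf(x, k))
    return fs / best


# 100 Hz tone with a second harmonic, sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 100 * i / fs) + 0.5 * math.sin(2 * math.pi * 200 * i / fs)
     for i in range(480)]
print(pitch_from_nacf(x, fs, f_min=70.0))
```

For a clean periodic signal the NACF is close to 1 at every multiple of the period, so a plain maximum search can pick a double period; the example narrows the range with `f_min=70.0` to avoid that ambiguity.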
TL;DR: Experimental results of computer simulations on male and female voices in white noise show that the gross pitch errors are lower for the proposed method than for other related methods across different signal-to-noise ratio conditions.
Abstract: This paper proposes a correlation-based method using the autocorrelation function and YIN. The autocorrelation function and YIN are both popular measures for estimating pitch in the time domain. The performance of these two methods, however, is affected by the position of dominant harmonics (usually the first formant) and by spurious peaks introduced in noisy conditions. Experimental results of computer simulations on male and female voices in white noise show that the gross pitch errors are lower for the proposed method than for other related methods across different signal-to-noise ratio conditions.
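As background for the YIN component, the core of the standard YIN estimator (squared difference function, cumulative mean normalization, absolute threshold) can be sketched as below. This follows the commonly published YIN steps rather than the combined method proposed in this paper; the threshold of 0.1 is the usual default and is an assumption here, and parabolic refinement of the lag is omitted.

```python
import math


def yin_period(x, max_lag, threshold=0.1):
    """Estimate the period in samples with the basic YIN steps."""
    w = len(x) - max_lag                 # same number of terms for every lag
    # Squared difference function d(tau); d(0) is 0 by definition.
    d = [0.0] + [sum((x[j] - x[j + tau]) ** 2 for j in range(w))
                 for tau in range(1, max_lag + 1)]
    # Cumulative mean normalised difference: d(tau) over the mean of d(1..tau).
    cmnd = [1.0]
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += d[tau]
        cmnd.append(d[tau] * tau / running if running > 0.0 else 1.0)
    # Take the first dip below the threshold and follow it to its local minimum.
    for tau in range(2, max_lag):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return tau
    # No dip below the threshold: fall back to the global minimum.
    return min(range(1, max_lag + 1), key=lambda t: cmnd[t])


# 200 Hz tone sampled at 8 kHz: the period is 40 samples.
x = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
print(yin_period(x, max_lag=100))
```

Taking the first dip rather than the deepest one is what makes YIN less prone than a plain autocorrelation maximum to picking a multiple of the true period.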
TL;DR: The goal of this paper is to investigate how these algorithms should be adapted to pitched musical instrument sounds analysis and to provide a comparative performance evaluation of the most representative state-of-the-art approaches.
Abstract: Pitch detection of an audio signal is an interesting research topic in the field of speech signal processing. Pitch is one of the most important perceptual features, as it conveys much information about the audio signal. It is closely related to the physical feature of fundamental frequency f0. For musical instrument sounds, the f0 and the measured pitch can be considered equivalent. In this paper four pitch detection algorithms have been proposed for pitched musical instrument sounds. The goal of this paper is to investigate how these algorithms should be adapted to pitched musical instrument sounds analysis and to provide a comparative performance evaluation of the most representative state-of-the-art approaches. This study is carried out on a large database of pitched musical instrument sounds, comprising four types of pitched musical instruments violin, trumpet, guitar and flute. The algorithmic performance is assessed according to the ability to estimate pitch contour accurately.
TL;DR: An improved pitch detection method combined with speech enhancement is proposed, in which an improved average magnitude difference function (AMDF) weighted autocorrelation function (ACF) is used for accurate pitch detection of the voiced part.
Abstract: To address the poor robustness of pitch detection in noisy speech, this paper proposes an improved pitch detection method combined with speech enhancement. First, to reduce background noise and obtain relatively clean speech, multi-band spectral subtraction and the masking properties of the human auditory system are applied to the noisy speech; the product and quotient of energy and zero-crossing rate are then used to identify the voiced part. Finally, an improved average magnitude difference function (AMDF) weighted autocorrelation function (ACF) is used for accurate pitch detection of the voiced part. Theoretical analysis and experimental simulations show that the method detects pitch accurately at low SNR and that robustness improves significantly.
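The last stage, an AMDF-weighted autocorrelation, can be sketched in its basic form: the ACF at each lag is divided by the AMDF at that lag plus a small constant, so that the true period (high ACF, low AMDF) is emphasised. The abstract describes an improved variant whose details are not given, so this is only the basic weighting; the constant `k=1.0` and the function name are assumptions.

```python
import math


def weighted_acf_lag(x, min_lag, max_lag, k=1.0):
    """Lag that maximises ACF(lag) / (AMDF(lag) + k) on one frame."""
    n = len(x) - max_lag                 # same number of terms for every lag

    def score(lag):
        acf = sum(x[i] * x[i + lag] for i in range(n)) / n
        amdf = sum(abs(x[i] - x[i + lag]) for i in range(n)) / n
        return acf / (amdf + k)

    return max(range(min_lag, max_lag + 1), key=score)


# 200 Hz tone sampled at 8 kHz: the period is 40 samples.
x = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
print(weighted_acf_lag(x, 20, 60))
```

The constant `k` keeps the denominator away from zero when the AMDF nearly vanishes at the true period.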
TL;DR: In this method, a novel continuous correlation feature was employed for calculating pitch model that not only represents the harmonicity but also includes the information of spectral continuity, and hence improving the accuracy of the multi-pitch estimate.
Abstract: This paper proposes a new approach for tracking multiple pitches within a single mixed speech signal. In this method, we employ a novel continuous correlation feature for calculating the pitch model. This feature not only represents the harmonicity but also includes the information of spectral continuity, thereby improving the accuracy of the multi-pitch estimate. A hybrid DBN and HMM model is further used to construct pitch models for determining pitch states and to search for the best pitch state sequence. The new approach has been evaluated on mixture speech data and the results demonstrate its efficiency.
TL;DR: This thesis presents a new model for representing the spectral structure of polyphonic signals: Uniform MAx Gaussian Envelope (UMAGE), which precisely approximates the distribution of frequency parts in the spectrum while still being resilient to oscillating rapidly (noise).
TL;DR: In this article, three readily available pitch detection algorithms implemented as unit generators in the SuperCollider programming language are evaluated and compared with regard to their accuracy and latency for a variety of test signals consisting of both harmonic and non-harmonic content.
Abstract: Three readily-available pitch detection algorithms implemented as unit generators in the SuperCollider programming language are evaluated and compared with regard to their accuracy and latency for a variety of test signals consisting of both harmonic and non-harmonic content. Suggestions are made for the type of signal on which each algorithm performs well.
TL;DR: The results showed that the diagnosis made by the tool and specialist is equivalent and therefore the proposed use of the Pitch sustainment as a measure for the recognition of the pathology was effective.
Abstract: Objectives: Develop an automated tool, based on Pitch sustainment, for recognizing segments without voice during the patient's phonation. Method: The procedures for construction and verification of the technique are the acquisition of voice, windowing, application of the Discrete Fourier Transform, Pitch detection, and verification of Pitch. Results: In the analysis of 101 voices, the tool diagnosed 56 voices with laryngeal dystonia and 45 as healthy. The specialist diagnosed 53 voices with laryngeal dystonia and 48 voices as healthy. Conclusion: The results showed that the diagnosis made by the tool and specialist is equivalent and therefore the proposed use of the Pitch sustainment as a measure for the recognition of the pathology was effective.
TL;DR: An improved multi-band summary correlogram (MBSC) algorithm is proposed for pitch estimation and voiced/unvoiced (V/UV) detection and the proposed pitch detection algorithm achieves a lower pitch detection error compared with the reference algorithm.
Abstract: This paper presents a speech enhancement approach based on an analysis-synthesis framework. An improved multi-band summary correlogram (MBSC) algorithm is proposed for pitch estimation and voiced/unvoiced (V/UV) detection. The proposed pitch detection algorithm achieves a lower pitch detection error compared with the reference algorithm. The denoising autoencoder (DAE) is applied to enhance the line spectrum frequencies (LSFs). The reconstruction loss is lower than with the shallow model. The proposed approach is evaluated using the perceptual evaluation of speech quality (PESQ), and the experimental results show that the proposed approach improves the performance of speech enhancement compared with the conventional speech enhancement approach. In addition, it could be applied to parametric speech coding even at low bit rates and in low SNR environments.
TL;DR: This paper focuses on voice and speech feature extraction using advanced signal processing methodology; the generated speech features are submitted to data mining algorithms for classifying deception.
Abstract: Discriminating between deceit and truth is a significant security challenge in a variety of situations, including border crossings, job interviews, flight passenger screenings, and police interviews. Previous research indicates that some features of vocal speech, e.g., fundamental frequency, are related to human emotion and stress levels, making them applicable to deception detection. This paper focuses on voice and speech feature extraction using advanced signal processing methodology. The generated speech features are submitted to data mining algorithms for classifying deception. The result of this paper is expected to be directly applicable to deception detection systems.
TL;DR: Simulation results showed that the frequency estimation performance of this algorithm over the whole frequency band is relatively stable, and the root mean square error (RMSE) of the frequency estimate is smaller than that of the Rife algorithm, the Quinn algorithm and the energy centrobaric correction method.
Abstract: In order to improve the frequency estimation accuracy of a sinusoidal signal in white Gaussian noise, a comprehensive sinusoidal frequency estimation algorithm combining autocorrelation detection with the energy centrobaric correction method was proposed. First, the weak sinusoidal signal embedded in white Gaussian noise was detected by multiple autocorrelation to improve the signal to noise ratio. Then the power spectrum was obtained by the Discrete Fourier Transform, and the signal frequency was roughly estimated by searching for the position of the maximum spectral line. Finally, the sinusoidal signal frequency was accurately estimated by using the energy centrobaric correction method for the discrete spectrum. Simulation results showed that the frequency estimation performance of this algorithm over the whole frequency band is relatively stable, and the root mean square error (RMSE) of the frequency estimate is smaller than that of the Rife algorithm, the Quinn algorithm and the energy centrobaric correction method. The algorithm is easily implemented in hardware and also has some practical value for engineering.
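The last two steps (coarse peak search in the power spectrum, then energy centrobaric correction) can be sketched as below. The sketch assumes a periodic Hann window and a centroid taken over three bins on each side of the peak; neither choice is stated in the abstract. The multiple-autocorrelation denoising step is omitted, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(x):
    """Power spectrum of a Hann-windowed frame via a naive DFT (bins 0..N/2)."""
    n = len(x)
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * win[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def centroid_frequency(x, fs, half_width=3):
    """Refine the peak frequency as the energy centroid of nearby spectral lines.

    half_width (bins on each side of the peak) is an assumed setting.
    """
    p = power_spectrum(x)
    k = max(range(1, len(p) - 1), key=lambda i: p[i])   # coarse estimate
    lo, hi = max(k - half_width, 0), min(k + half_width, len(p) - 1)
    total = sum(p[i] for i in range(lo, hi + 1))
    centroid_bin = sum(i * p[i] for i in range(lo, hi + 1)) / total
    return centroid_bin * fs / len(x)


# 101.3 Hz sinusoid, 256 samples at 1024 Hz (bin spacing 4 Hz).
fs, n = 1024, 256
x = [math.sin(2 * math.pi * 101.3 * i / fs) for i in range(n)]
print(centroid_frequency(x, fs))
```

The coarse peak search alone is limited to the 4 Hz bin spacing in this example; the centroid step recovers the frequency to well within a bin because the main-lobe energy of the Hann window is distributed symmetrically about the true frequency.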
TL;DR: In this article, a pitch detection method based on morphological filtering and Hilbert-Huang transform (HHT) was proposed, which can accurately detect pitch of speech signals in low SNR.
Abstract: A new method of pitch detection based on morphological filtering and the Hilbert-Huang transform (HHT) was proposed. Noisy speech signals were filtered by the morphological filter to remove noise and highlight pitch, and then HHT was employed to get the Hilbert-Huang spectrum and calculate instantaneous energy and its derivative. The moment of glottal opening and closing can be located accurately through mutation of instantaneous energy, so variation of pitch periods can be tracked accurately. Compared with other traditional methods of pitch detection, the proposed method truly describes the non-stationary and non-linear characteristics of speech signals, and its analysis of voice signals is an adaptive process. The experimental results show that the method gives strong noise immunity and can accurately detect the pitch of speech signals at low SNR.
TL;DR: Simulation results indicate that this algorithm can effectively improve the precision of pitch detection and convert speech into the desired form, and a real-time system is implemented on a DSP, which can produce the desired fundamental frequency and duration.
Abstract: In this paper, we present an algorithm for voice speed changing and pitch shifting based on TD-PSOLA. MATLAB simulation results indicate that this algorithm can effectively improve the precision of pitch detection and convert speech into the desired form. After the algorithm is implemented and optimized, a real-time system is built on a DSP, which can produce the desired fundamental frequency and duration.
TL;DR: This dissertation describes robust algorithms for human pitch detection, automatic speech recognition, and birdsong phrase classification, including a multi-band summary correlogram pitch detector and an exemplar-based sparse representation (SR) classifier evaluated on limited data.
Abstract: This dissertation focuses on algorithms for robust speech and bird song processing. Many applications perform well under ideal signal conditions, e.g. noise-free, full bandwidth, sufficient training data. However, a large degradation in performance is generally observed when the input signal condition deviates from these ideal conditions. This dissertation describes robust algorithms for three applications, namely human-pitch detection, automatic speech recognition, and birdsong phrase classification. In the first application, a noise-robust, multi-band summary correlogram (MBSC)-based pitch detector is proposed. Novel signal processing schemes, which include comb-filter channel selection and subband reliability weighting, are designed to enhance the MBSC's peak at the most likely pitch period. In the second application, a feature enhancement scheme using jointly-sparse reference and estimated soft-mask representations is developed for noise-robust automatic speech recognition (ASR). Reference and estimated soft-mask exemplar-pairs are extracted from clean and noisy utterance-pairs in the training data. Using a sparsity-based dictionary learning algorithm, dictionary representations are trained from the exemplar-pairs. The sparse linear combination of estimated soft-mask dictionary representations that best approximates the test utterance's estimated soft-mask is applied to the reference soft-mask dictionary to produce an enhanced soft-mask. This enhanced soft-mask is then used to perform noise suppression on the spectrogram from which features for ASR are extracted. In the third application, a simple exemplar-based sparse representation (SR) classifier is evaluated on limited data for birdsong phrase classification and verification. Song recordings of the Cassin's Vireo are used for performance evaluation.
This study of the SR classifier for bird phrase classification is inspired by a paper that proposed the SR classifier for face recognition and outlier face detection, and reported good performance with only 7 training images per subject. Algorithmic enhancements are subsequently added to the original SR classification framework to improve the classification accuracy of automatically detected and segmented phrases, and of phrases sung by individual birds that are not found in the training set. These algorithmic enhancements include dynamic time warping (DTW) and frame-based feature normalization prior to SR classification. When the class decisions from DTW and first pass SR classification are different, SR classification is repeated with frequency-bin-normalized spectrographic features to resolve the two conflicting decisions.
TL;DR: In this article, a secondary-spectrum pitch detection method for noisy speech is designed: the noisy speech is first passed through an elliptic band-pass filter (EF), the empirical mode decomposition (EMD) of the Hilbert-Huang transform (HHT) then decomposes the signal into a finite number of intrinsic mode functions (IMFs), the IMF components of different scales are correlated with the signal before decomposition, and the two IMFs with the highest correlation are synthesized into the signal used for pitch detection.
Abstract: A new secondary-spectrum method for pitch detection is designed in this paper. The noisy speech is first passed through an elliptic band-pass filter (EF), and then the empirical mode decomposition (EMD) of the Hilbert-Huang transform (HHT) is used to decompose the signal into a finite number of intrinsic mode functions (IMFs). The IMF components of different scales are correlated with the signal before decomposition, and the two IMFs with the highest correlation are synthesized into the signal used for pitch detection. Experimental results show that the method performs better than the traditional autocorrelation and cepstrum methods, especially on segments with pronounced voicing; it detects pitch well in noisy speech and remains robust at low signal-to-noise ratio (SNR). http://dx.doi.org/10.11591/telkomnika.v12i12.6482