Open Access
Learning Prosodic Sequences Using the Fundamental Frequency Variation Spectrum
Kornel Laskowski,Jens Edlund,Mattias Heldner +2 more
- 01 Jan 2008
- pp 151-154
TL;DR: A recently introduced vector-valued representation of fundamental frequency variation, whose properties appear to be well-suited for statistical sequence modeling, is investigated, and hidden Markov models are applied to learn prosodic sequences characteristic of higher-level turn-taking phenomena.
Abstract: We investigate a recently introduced vector-valued representation of fundamental frequency variation, whose properties appear to be well-suited for statistical sequence modeling. We show what the representation looks like, and apply hidden Markov models to learn prosodic sequences characteristic of higher-level turn-taking phenomena. Our analysis shows that the models learn exactly those characteristics which have been reported for the phenomena in the literature. Further refinements to the representation lead to a 12-17% relative improvement in speaker change prediction for conversational spoken dialogue systems.
Citations
Synthetic speech detection using fundamental frequency variation and spectral features
TL;DR: This paper proposes a new approach to detecting synthetic speech using score-level fusion of front-end features, namely constant Q cepstral coefficients (CQCCs), all-pole group delay function (APGDF), and fundamental frequency variation (FFV), which outperforms all existing baseline features for both known and unknown attacks.
66
Proceedings Article
A Snack Implementation and Tcl/Tk Interface to the Fundamental Frequency Variation Spectrum Algorithm
Kornel Laskowski,Jens Edlund +1 more
- 01 May 2010
TL;DR: This work presents a freely available implementation of an alternative to pitch estimation, namely the computation of the fundamental frequency variation (FFV) spectrum, which can be easily employed at any level within a speech processing system.
Modeling instantaneous intonation for speaker identification using the fundamental frequency variation spectrum
Kornel Laskowski,Qin Jin +1 more
- 19 Apr 2009
TL;DR: This work explores a new frame-level vector representation of the instantaneous change in fundamental frequency, known as fundamental frequency variation (FFV), and indicates that FFV features contain useful information for discriminating among speakers, and that model-space combination of FFV and cepstral features outperforms cepstral features alone.
Improving deep neural network acoustic modeling for audio corpus indexing under The IARPA Babel program
Xiaodong Cui,Brian Kingsbury,Jia Cui,Bhuvana Ramabhadran,Andrew Rosenberg,Mohammad Sadegh Rasooli,Owen Rambow,Nizar Habash,Vaibhava Goel +8 more
- 14 Sep 2014
TL;DR: Experimental results on development languages of Babel option period one show that the improved DNN acoustic models can reduce word error rates in ASR and also help the keyword search performance compared to already competitive DNN baseline systems.
14
Computing the fundamental frequency variation spectrum in conversational spoken dialogue systems
TL;DR: This work continues the analysis of the fundamental frequency variation spectrum, a recently proposed instantaneous, continuous, vector‐valued representation of pitch variation, which is obtained by comparing the harmonic structure of the frequency magnitude spectra of the left and right half of an analysis frame.
References
YIN, a fundamental frequency estimator for speech and music
TL;DR: An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds, based on the well-known autocorrelation method with a number of modifications that combine to prevent errors.
Proceedings Article
Is the speaker done yet? faster and more accurate end-of-utterance detection using prosody.
Luciana Ferrer,Elizabeth Shriberg,Andreas Stolcke +2 more
- 01 Jan 2002
TL;DR: This work develops a new approach to EOU detection that uses prosodic features, modeled by decision trees and combined with an event N-gram language model to obtain a score that measures the likelihood that any nonspeech region is an EOU.
123
Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing
E. Shriberg,Andreas Stolcke +1 more
- 01 Jan 2004
TL;DR: This work describes a “direct modeling” approach to using prosody in various speech technology tasks that does not involve any hand-labeling or modeling of prosodic events such as pitch accents or boundary tones, and focuses on spontaneous speech from a variety of contexts.
70
Real-time Handling of Fragmented Utterances
Linda Bell,Johan Boye,Joakim Gustafson +2 more
- 01 Jan 2001
TL;DR: This work discusses an adaptive method of handling fragmented user utterances to a speech-based multimodal dialogue system, which pose the following problem: silent pauses inserted between fragments.
An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems
Kornel Laskowski,Jens Edlund,Mattias Heldner +2 more
- 12 May 2008
TL;DR: This work focuses on system responsiveness, aiming to mimic human-like dialogue flow control by predicting speaker changes as observed in real human-human conversations, and derives an instantaneous vector representation of pitch variation which is amenable to standard acoustic modeling techniques.