Learning Prosodic Sequences Using the Fundamental Frequency Variation Spectrum

Open Access

- 01 Jan 2008

- pp 151-154

11

TL;DR: A recently introduced vector-valued representation of fundamental frequency variation, whose properties appear to be well-suited for statistical sequence modeling, is investigated, and hidden Markov models are applied to learn prosodic sequences characteristic of higher-level turn-taking phenomena.

Citations

Journal Article•10.1016/J.CSL.2017.10.001

Synthetic speech detection using fundamental frequency variation and spectral features

Monisankha Pal, +2 more

- 01 Mar 2018

- Computer Speech & Language

TL;DR: This paper proposes a new approach to detecting synthetic speech using score-level fusion of front-end features, namely constant Q cepstral coefficients (CQCCs), all-pole group delay function (APGDF), and fundamental frequency variation (FFV), which outperforms all existing baseline features for both known and unknown attacks.

66

Proceedings Article

A Snack Implementation and Tcl/Tk Interface to the Fundamental Frequency Variation Spectrum Algorithm

Kornel Laskowski, +1 more

- 01 May 2010

TL;DR: This work presents a freely available implementation of an alternative to pitch estimation, namely the computation of the fundamental frequency variation (FFV) spectrum, which can be easily employed at any level within a speech processing system.

21

Proceedings Article•10.1109/ICASSP.2009.4960640

Modeling instantaneous intonation for speaker identification using the fundamental frequency variation spectrum

Kornel Laskowski, +1 more

- 19 Apr 2009

TL;DR: This work explores a new frame-level vector representation of the instantaneous change in fundamental frequency, known as fundamental frequency variation (FFV), and indicates that FFV features contain useful information for discriminating among speakers, and that model-space combination of FFV and cepstral features outperforms cepstral features alone.

21

Proceedings Article•10.21437/INTERSPEECH.2014-477

Improving deep neural network acoustic modeling for audio corpus indexing under The IARPA Babel program

Xiaodong Cui, +8 more

- 14 Sep 2014

TL;DR: Experimental results on development languages of Babel option period one show that the improved DNN acoustic models can reduce word error rates in ASR and also help the keyword search performance compared to already competitive DNN baseline systems.

14

Journal Article•10.1121/1.2934193

Computing the fundamental frequency variation spectrum in conversational spoken dialogue systems

Kornel Laskowski, +3 more

- 09 May 2008

- Journal of the Acoustical Society of Ame...

TL;DR: This work continues the analysis of the fundamental frequency variation spectrum, a recently proposed instantaneous, continuous, vector‐valued representation of pitch variation, which is obtained by comparing the harmonic structure of the frequency magnitude spectra of the left and right half of an analysis frame.

11

References

Journal Article•10.1121/1.1458024

YIN, a fundamental frequency estimator for speech and music

Alain de Cheveigné, +1 more

- 03 Apr 2002

- Journal of the Acoustical Society of Ame...

TL;DR: An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds, based on the well-known autocorrelation method with a number of modifications that combine to prevent errors.

2.2K

Proceedings Article

Is the speaker done yet? Faster and more accurate end-of-utterance detection using prosody.

Luciana Ferrer, +2 more

- 01 Jan 2002

TL;DR: This work develops a new approach to EOU detection that uses prosodic features, modeled by decision trees and combined with an event N-gram language model to obtain a score that measures the likelihood that any nonspeech region is an EOU.

123

Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing

E. Shriberg, +1 more

- 01 Jan 2004

TL;DR: This work describes a “direct modeling” approach to using prosody in various speech technology tasks that does not involve any hand-labeling or modeling of prosodic events such as pitch accents or boundary tones, and focuses on spontaneous speech from a variety of contexts.

70

Real-time Handling of Fragmented Utterances

Linda Bell, +2 more

- 01 Jan 2001

TL;DR: An adaptive method for handling fragmented user utterances in a speech-based multimodal dialogue system is discussed; such utterances pose the problem of silent pauses inserted between fragments.

55

Proceedings Article•10.1109/ICASSP.2008.4518791

An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems

Kornel Laskowski, +2 more

- 12 May 2008

TL;DR: This work focuses on system responsiveness, aiming to mimic human-like dialogue flow control by predicting speaker changes as observed in real human-human conversations, and derives an instantaneous vector representation of pitch variation which is amenable to standard acoustic modeling techniques.

33