Frequency Modulation Technique for Prosodic Modification
Jinfu Ni,S. Sakai,T. Shimizu,Satoshi Nakamura +3 more
- 30 Dec 2008
- pp 1-4
TL;DR: This paper presents a frequency modulation (FM) technique that provides a mathematical formulation for representing speaking tone and manipulating FM in a unified framework for communicative speech synthesis. In the experiments, native speakers identified 90% of samples with emphases, 78% of "good news" samples, and 94% of "bad news" samples.
Abstract: Modulation of speaking tone in frequency can make speech interesting and convey subtle meaning in communication. We present a frequency modulation (FM) technique for prosodic modification, aimed at communicative speech synthesis. This technique provides a mathematical formulation for representing speaking tone and manipulating FM in a unified framework. Two experiments are conducted with a text-to-speech system to which a module of FM-based prosodic modification is added. One is to enhance emphasis in words when synthesizing Chinese conversational speech. The other is to modify reading-style prosody while conveying good and bad news in Japanese; this is done by using the FM technique to shift the frequency ranges and rescale the fundamental frequency contours jointly. The experimental results indicated that the native speakers identified 90% of samples with emphases, 78% of "good news" samples, and 94% of "bad news" samples. The FM technique is vital for making synthetic speech communicative.
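The joint operation mentioned in the abstract, shifting the frequency range and rescaling the fundamental frequency (F0) contour, can be illustrated with a minimal sketch. This is not the paper's FM formulation (which is not given on this page); it is a plain linear transform in the log-F0 domain, and the function name `shift_and_rescale_f0` and its parameters `shift_semitones` and `range_scale` are hypothetical names chosen for illustration. Unvoiced frames are assumed to be marked with 0.

```python
import math

def shift_and_rescale_f0(f0_hz, shift_semitones=0.0, range_scale=1.0):
    """Shift and rescale an F0 contour in the log-frequency domain.

    Illustrative assumption, not the paper's method: the contour is
    scaled around its mean log-F0 and then shifted by a fixed interval.
    Frames with F0 <= 0 are treated as unvoiced and returned as 0.0.
    """
    voiced_logs = [math.log(f) for f in f0_hz if f > 0]
    if not voiced_logs:
        return [0.0 for _ in f0_hz]

    mean_log = sum(voiced_logs) / len(voiced_logs)
    # One semitone corresponds to a factor of 2**(1/12) in frequency.
    shift = shift_semitones * math.log(2) / 12

    result = []
    for f in f0_hz:
        if f <= 0:
            result.append(0.0)  # keep unvoiced frames unvoiced
        else:
            deviation = math.log(f) - mean_log
            result.append(math.exp(mean_log + shift + range_scale * deviation))
    return result

# Hypothetical usage: raise the register by 2 semitones and widen the range by 30%.
contour = [120.0, 150.0, 0.0, 180.0, 140.0]
modified = shift_and_rescale_f0(contour, shift_semitones=2.0, range_scale=1.3)
```

Working in the log domain keeps the shift and the scaling independent: `range_scale` changes how far the contour moves around its mean, while `shift_semitones` moves the whole contour up or down.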
Figures

Figure 4: Illustration of enhancing emphasis in words with frequency modulation technique for rising and lowering tones. 
Figure 3: Schematic diagram of the basic patterns defined by tags baseline (line AB), cap (CDF/CDEF) and toend (line GH). 
Figure 2: Schematic diagram of performing prosodic modification within the framework of TTS system XIMERA. 
Figure 5: Mean opinion scores (the crosses) and standard deviations (the boxes) on a 7-point scale: -3 (very good "bad news"), 0 (neutral), and +3 (very good "good news").
Figure 1: Resonance curve A(λ, ζ) (the left panel) and the warping functions between normalized logF0 ∈ [0, 1] and λ ∈ [1, 2] at several values of ζ (the right panel).
Citations
Conversational Speech Synthesis (and the need for some laughter)
Nick Campbell
- 12 May 2005
TL;DR: This article reported progress in the synthesis of conversational speech from the viewpoint of work carried out on the analysis of a very large corpus of expressive speech in normal everyday situations, and suggested that this problem may be solved by the use of phrase-sized utterance units taken intact from a large corpus.
6
Prosody Modeling from Tone to Intonation in Chinese using a Functional F0 Model
Jinfu Ni,S. Sakai,T. Shimizu,Satoshi Nakamura +3 more
- 15 Dec 2008
TL;DR: This paper analyzes tonal patterns as sparse target points (tonal F0 peaks and valleys) and models them using classification and regression trees (CART) with contextual linguistic features to form the final F0 contours based on a functional F0 model.
4
Towards a Prosodic Model for Synthesized Speech of Mathematical Expressions in MathML
Adriana Silva Souza,Diamantino Freitas +1 more
- 02 Dec 2020
TL;DR: This article presents a model to improve prosody in the synthesized speech of mathematical expressions based on MathML. The Fujisaki intonation model is adopted for intonation control, accent and phrase commands are extracted from the corpus, and prosodic parameters are adjusted in correlation with the MathML tree; a pattern for controlling pauses is also being created.
3
CART-based modeling of Chinese tonal patterns with a functional model tracing the fundamental frequency trajectories
Jinfu Ni,Shinsuke Sakai,Tohru Shimizu,Satoshi Nakamura +3 more
- 19 Apr 2009
TL;DR: The most important roles in characterizing tonal patterns were played by a few linguistic features, such as lexical tone context and the distinction between voiced and unvoiced initials.
1
Hyperbolic structure of fundamental frequency contour
Jinfu Ni,Shinsuke Sakai,Hisashi Kawai,Satoshi Nakamura +3 more
- 03 Dec 2009
TL;DR: This paper achieves a generalized hyperbolic structure so as to aggressively manipulate F0 contours and proves an equivalent expression of the resonance mechanism capable of dealing with the interaction of tone and intonation.
References
The ATR Multilingual Speech-to-Speech Translation System
Satoshi Nakamura,Konstantin Markov,Hiromi Nakaiwa,Genichiro Kikui,Hisashi Kawai,Takatoshi Jitsuhiro,Jinsong Zhang,H. Yamamoto,Eiichiro Sumita,Seiichi Yamamoto +9 more
TL;DR: The ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages, uses a parallel multilingual database consisting of over 600 000 sentences that cover a broad range of travel-related conversations.
Conversational speech synthesis and the need for some laughter
TL;DR: The problem of expressing paralinguistic information in conversational speech may be solved by the use of phrase-sized utterance units taken intact from a large corpus, the complexity of which may be beyond the capabilities of many current synthesis methods.
55
Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin
Jinfu Ni,Keikichi Hirose +1 more
TL;DR: Analysis of 1044 utterances of various sentences read by eight native speakers revealed that the model could closely approximate the observed F0 contours with a small number of parameters, which are localized and suited to a data-driven fitting process.
28
Constrained tone transformation technique for separation and combination of Mandarin tone and intonation.
TL;DR: The underlying scientific and linguistic principles are explained and the method's capability of separating and combining tone and intonation is evaluated through analysis and re-synthesis of several hundred observed F0 contours.
24
Generation and perception of F0 markedness for communicative speech synthesis
TL;DR: A computational model of conversational F0 control is proposed using lexical information of adjectives showing positiveness or negativeness and adverbs expressing markedness, which shows strong positive or negative correlation between the markedness of adverbs and F0 height.
23