Diphone

Papers published on a yearly basis

Papers

Proceedings Article • DOI: 10.1109/ICSLP.1996.607874

The MBROLA project: towards a set of high quality speech synthesizers free of use for non commercial purposes

Thierry Dutoit, Vincent Pagel¹, Nicolas Pierret, F. Bataille, O. van der Vrecken

Faculté polytechnique de Mons¹

3 Oct 1996

TL;DR: The MBROLA project, initiated by the Faculté Polytechnique de Mons (Belgium), aims to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications.

Abstract: The aim of the MBROLA project, initiated by the Faculté Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges taken up by text-to-speech synthesizers for the years to come. Central to the MBROLA project is MBROLA 2.00, a speech synthesizer based on the concatenation of diphones. Executable files of this synthesizer have been made freely available for many computers/operating systems, as well as a first diphone database for a French male voice. We describe the terms of participation in the project, as a user, as an associated developer, or as a database provider.
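To make "concatenation of diphones" concrete, here is a minimal sketch of how a phoneme sequence might be mapped to the diphone units a synthesizer would look up. The function name `to_diphones`, the `a-b` unit naming, and the use of `_` as a silence symbol are illustrative assumptions, not taken from the MBROLA paper.

```python
def to_diphones(phonemes, silence="_"):
    """Pair adjacent phonemes into diphone names.

    The sequence is padded with a silence symbol on both sides, so an
    utterance of n phonemes yields n + 1 diphones. The "a-b" naming
    scheme is a hypothetical convention chosen for this sketch.
    """
    padded = [silence, *phonemes, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


print(to_diphones(["h", "e", "l", "o"]))
# ['_-h', 'h-e', 'e-l', 'l-o', 'o-_']
```

Each unit spans the transition between two neighboring phonemes, which is why the joins fall in the comparatively stable middle of each phoneme rather than at its edges.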

531 citations

Patent • DOI: 10.1121/1.413554

Method and apparatus for speech synthesis based on prosodic analysis

Sandra E. Hutchins

23 Sep 1992 - Journal of the Acoustical Society of America

TL;DR: A system for synthesizing a speech signal from strings of words, which are themselves strings of characters, includes a memory in which predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags.

Abstract: A system for synthesizing a speech signal from strings of words, which are themselves strings of characters, includes a memory in which predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags. A parser accesses the memory and groups the syntax tags of the entered words into phrases according to a first set of predetermined grammatical rules relating the syntax tags to one another. The parser also verifies the conformance of sequences of the phrases to a second set of predetermined grammatical rules relating the phrases to one another. The system retrieves the phonetic transcriptions associated with the syntax tags that were grouped into phrases conforming to the second set of rules, and also translates predetermined strings of characters into words. The system generates strings of phonetic transcriptions and prosody markers corresponding to respective strings of the words, and adds markers for rhythm and stress to the strings, which are then converted into data arrays having prosody information on a diphone-by-diphone basis. Predetermined diphone waveforms are retrieved from memory that correspond to the entered words, and these retrieved waveforms are adjusted based on the prosody information in the arrays. The adjusted diphone waveforms, which may also be adjusted for coarticulation, are then concatenated to form the speech signal. Methods in a digital computer are also disclosed.

318 citations

Book

Talking Machines: Theories, Models, and Designs

Esca Tutorial Day, Gérard Bailly, Christian Benoît, Thomas R. Sawallis

1 Apr 1992

TL;DR: This edited volume collects contributions on speech synthesis methods, linguistic processing, prosody, system design, and speech assessment, including a framework for morphological and syntactic analysis in a text-to-speech system for German and a structured lexicon for the synthesis of prosody.

Abstract:

Synthesis Methods. Speech synthesis methods: Homage to Dennis Klatt (K.N. Stevens). Synthesis models: A discussion (E. Moulines). From data to rules (L.F.M. ten Bosch). On modelling the phonology phonetics interface for articulatory synthesis (H.J. Cedergren, G. Boulianne, D. Archambault). "Synthesis-by-rule" without segments or rewrite-rules (J. Coleman). A generation model of formant trajectories at variable speaking rates (S. Imaizumi, S. Kiritani). Spectral transitions in rule-based and diphone synthesis (D. O'Shaughnessy). On the basic scheme and algorithms in non-uniform speech synthesis (K. Takeda, K. Abe, Y. Sagisaka).

Linguistic Processing. Text processing for text-to-speech synthesis (D. O'Shaughnessy). Prosodic phrasing in Swedish speech synthesis (G. Bruce, B. Granstrom, D. House). Syntactic neural networks for bi-directional text-phonetics translation (S.M. Lucas, R.I. Damper). Heuristic strategies for the higher-level analysis of unrestricted text (A.I.C. Monaghan). A framework for morphological and syntactic analysis and its application in a text-to-speech system for German (T. Russi). Novel-word pronunciation within a text-to-speech system (K.P.H. Sullivan, R.I. Damper).

Prosody. Prediction of prosody: An overview (D. Hirst). A comment on the prediction of prosody (R. Collier).

Automatic Learning. Syllable-based segmental duration (W.N. Campbell). Prosodic processing in a text-to-speech synthesis system using a database and learning procedures (F. Emerard, L. Mortamet, A. Cozannet). Linguistic properties in the control of segmental duration for speech synthesis (N. Kaiki, K. Takeda, Y. Sagisaka). Tree-based modelling of segmental durations (M.D. Riley). Deriving text-to-speech durations from natural speech (J.P.H. van Santen). F0 generation with a database of natural F0 patterns and with a neural network (C. Traber).

Methodology and Models. Developing a structured lexicon for synthesis of prosody (V. Auberge). Automatic labelling of large prosodic databases: Tools, methodology and links with a text-to-speech system (G. Bailly, T. Barbe, H. Wang). A prosodic model for French text-to-speech synthesis: A psycholinguistic approach (V. Pasdeloup).

Prosody and Discourse. The punctuation and perception of read and spontaneous prosody: An application to speech synthesis (I. Guaitella, S. Santi). Using discourse context to guide pitch accent decisions in synthetic speech (J. Hirschberg).

System Design. Rule compilers and text-to-speech systems (G. Bailly). The MULTIVOX multilingual text-to-speech converter (G. Olaszy, G. Gordos, G. Nemeth). Concept representation for synthetic speech output system (Y. Yamashita, N. Mizutani, R. Mizoguchi).

Speech Assessment. Assessment of synthetic speech (A. Fourcin). On the assessment of synthetic speech (C. Benoit, L.C.W. Pols). Segmental evaluation using the Esprit/SAM test procedures and monosyllabic words (R. Carlson, B. Granstrom, L. Nord). Segmental quality assessment by pseudo-words (A.

241 citations

Patent • DOI: 10.1121/1.418194

Processing device for speech synthesis by addition overlapping of wave forms

Christian Hamon

01 Sep 1989 - Journal of the Acoustical Society of America

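TL;DR: A process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, in which a sequence of phoneme codes and respective prosodic information is supplied, the diphone waveforms of voiced phonemes are windowed around pitch pulses, and the windowed signals are shifted to the synthesis period and added with overlap.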

Abstract: A process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprises supplying a sequence of phoneme codes and respective prosodic information, and, for each phoneme, analyzing and synthesizing each phoneme, and then concatenating the synthesized phonemes. For each phoneme, two diphones are selected among the stored diphones and the presence of voicing is determined. For voiced phonemes, the respective waveforms of the two diphones constituting the phoneme are filtered by a window which is centered on a point of the selected waveform representative of the beginning of a pulse response of vocal cords to excitation thereof. The window has a width substantially equal to twice the greater of the original fundamental period and the fundamental synthesis period and has an amplitude progressively decreasing from the center of the window. The signals resulting from the filtering and obtained for each diphone are time shifted so as to be spaced apart by a time equal to the fundamental synthesis period. Synthesis is achieved by adding the displaced overlapping signals.
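The windowing and overlap-add step for voiced segments can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the pitch-pulse positions (`pitch_marks`) are taken as given, periods are integer sample counts, a Hann shape stands in for the "progressively decreasing" window, and the function names are hypothetical. Voicing detection and unvoiced phonemes are left out.

```python
import math


def hann(width):
    """Symmetric taper: largest at the center, decreasing toward both edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / width)
            for i in range(width)]


def overlap_add(signal, pitch_marks, original_period, synthesis_period):
    """Re-space windowed pitch pulses at the synthesis period and sum them.

    The window width is twice the greater of the original and synthesis
    periods, and each window is centered on one pitch mark.
    """
    half = max(original_period, synthesis_period)
    window = hann(2 * half)
    out = [0.0] * (synthesis_period * (len(pitch_marks) - 1) + 2 * half)
    for k, mark in enumerate(pitch_marks):
        start_out = k * synthesis_period  # new position of the k-th pulse
        for i, w in enumerate(window):
            src = mark - half + i
            if 0 <= src < len(signal):
                out[start_out + i] += w * signal[src]
    return out


# Impulse train with a 100-sample period, resynthesized at an 80-sample period.
marks = [100, 200, 300, 400]
train = [1.0 if n in marks else 0.0 for n in range(500)]
shifted = overlap_add(train, marks, original_period=100, synthesis_period=80)
peaks = [n for n, v in enumerate(shifted) if v > 0.5]
print(peaks)
```

Shortening the spacing between the windowed pulses raises the perceived pitch, and lengthening it lowers the pitch, while the shape of each pulse (and hence the timbre) is largely preserved.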

222 citations

Patent • DOI: 10.1121/1.417271

Speech recognition method and system using triphones, diphones, and phonemes

Jie Yi¹

Oki Electric Industry¹

21 Dec 1992 - Journal of the Acoustical Society of America

TL;DR: Hidden Markov models of a target vocabulary are created by concatenating triphone, diphone, and phoneme models, using triphone models if available, diphone models when triphone models are not available, and phoneme models when neither triphone nor diphone models are available.

Abstract: A speech recognition system starts by training hidden Markov models for all triphones, diphones, and phonemes occurring in a small training vocabulary. Hidden Markov models of a target vocabulary are created by concatenating the triphone, diphone, and phoneme models, using triphone models if available, diphone HMMs when triphone models are not available, and phoneme models when neither triphone nor diphone models are available. Utterances from the target vocabulary are recognized by choosing a model with maximum probability of reproducing quantized utterance features.
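The back-off rule in the abstract (triphone, then diphone, then phoneme) can be sketched as a unit-selection routine. The abstract does not say how units are aligned along a word, so the greedy left-to-right longest match below is an assumption, as are the name `select_units` and the representation of trained models as a set of phoneme tuples.

```python
def select_units(phonemes, trained):
    """Cover a word with the longest trained units available.

    At each position, prefer a triphone, fall back to a diphone, and
    finally to a single phoneme. `trained` is a set of phoneme tuples
    for which a model exists (hypothetical representation).
    """
    units = []
    i = 0
    while i < len(phonemes):
        for n in (3, 2, 1):
            unit = tuple(phonemes[i:i + n])
            if len(unit) == n and unit in trained:
                units.append(unit)
                i += n
                break
        else:
            # No model at all for this phoneme: nothing to back off to.
            raise KeyError(f"no trained model for phoneme {phonemes[i]!r}")
    return units


trained = {("k", "a", "t"), ("d", "o"), ("g",), ("s",)}
print(select_units(["k", "a", "t", "s"], trained))
# [('k', 'a', 't'), ('s',)]
print(select_units(["d", "o", "g"], trained))
# [('d', 'o'), ('g',)]
```

The word model would then be the concatenation of the HMMs for the selected units, which lets a target vocabulary reuse context-dependent models trained on a much smaller vocabulary.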

217 citations

Year	Papers
2021	1
2020	1
2019	6
2018	3
2017	2
2016	7
