TL;DR: In this article, a model of voiced-sound generation is derived in which the detailed acoustic behavior of the human vocal cords and the vocal tract is computed, and the cord-tract system is programmed for interactive study on a DDP-516 computer.
Abstract: A model of voiced-sound generation is derived in which the detailed acoustic behavior of the human vocal cords and the vocal tract is computed. The vocal cords are approximated by a self-oscillating source composed of two stiffness-coupled masses. The vocal tract is represented as a bilateral transmission line. One-dimensional Bernoulli flow through the vocal cords and plane-wave propagation in the tract are used to establish acoustic factors dominant in the generation of voiced speech. A difference-equation description of the continuous system is derived, and the cord-tract system is programmed for interactive study on a DDP-516 computer. Sampled waveforms are calculated for: acoustic volume velocity through the cord opening (glottis); glottal area; and mouth-output sound pressure. Functional relations between fundamental voice frequency, subglottal (lung) pressure, cord tension, glottal area, and duty ratio of cord vibration are also determined. Results show that the two-mass model duplicates principal features of cord behavior in the human. The variation of fundamental frequency with subglottal pressure is found to be 2 to 3 Hz/cm H2O, and is essentially independent of vowel configuration in the programmed tract. Acoustic interaction between tract eigenfrequencies and glottal volume flow is strong. Phase difference in motion of the cord edges is in the range of 0 to 60 degrees, and control of cord tension leads to behavior analogous to chest/falsetto conditions in the human. Phonation-neutral, or rest area of cord opening, is shown to be a critical factor in establishing self-oscillation. Finally, the complete synthesis system suggests an efficient, physiological description of the speech signal, namely, in terms of subglottal pressure, cord tension, rest area of cord opening, and vocal-tract shape.
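The pressure sensitivity reported in this abstract lends itself to a quick back-of-envelope check. The sketch below is not part of the authors' DDP-516 program; it assumes a purely linear relation between fundamental frequency and subglottal pressure, and the helper name `f0_shift_range` is hypothetical. It converts a change in subglottal pressure into the range of fundamental-frequency change implied by the 2 to 3 Hz/cm H2O figure.

```python
def f0_shift_range(delta_ps_cm_h2o, slope_lo=2.0, slope_hi=3.0):
    """Return the (low, high) change in F0, in Hz, for a pressure change.

    Assumes a linear F0-pressure relation (an illustrative simplification).
    The default slopes are the 2 to 3 Hz/cm H2O range quoted in the abstract.
    """
    return (slope_lo * delta_ps_cm_h2o, slope_hi * delta_ps_cm_h2o)


# Hypothetical example: raising subglottal pressure by 4 cm H2O.
low, high = f0_shift_range(4.0)
print(f"F0 rises by roughly {low:.0f} to {high:.0f} Hz")
```

Under these assumptions, a 4 cm H2O increase in subglottal pressure corresponds to a rise of about 8 to 12 Hz in fundamental frequency.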
TL;DR: It is shown that vocal tract inertance reduces the oscillation threshold pressure, whereas vocal tract resistance increases it, and the treatment is harmonized with former treatments based on two-mass models and collapsible tubes.
Abstract: A theory of vocal fold oscillation is developed on the basis of the body‐cover hypothesis. The cover is represented by a distributed surface layer that can propagate a mucosal surface wave. Linearization of the surface‐wave displacement and velocity, and further small‐amplitude approximations, yields closed‐form expressions for conditions of oscillation. The theory predicts that the lung pressure required to sustain oscillation, i.e., the oscillation threshold pressure, is reduced by reducing the mucosal wave velocity, by bringing the vocal folds closer together and by reducing the convergence angle in the glottis. The effect of vocal tract acoustic loading is included. It is shown that vocal tract inertance reduces the oscillation threshold pressure, whereas vocal tract resistance increases it. The treatment, which is applicable to falsetto and breathy voice, as well as onset or release of phonation in the absence of vocal fold collision, is harmonized with former treatments based on two‐mass models and ...
TL;DR: Findings have implications for speech recognition, speech forensics, and the evolution of the human speech production system, and provide a normative standard for future studies of human vocal tract morphology and development.
TL;DR: Comparison is drawn between male and female larynges on the basis of overall size, vocal fold membranous length, elastic properties of tissue, and prephonatory glottal shape, and the simulated vocal fold contact area is used to infer male-female differences in the shape of the glottis.
Abstract: Comparison is drawn between male and female larynges on the basis of overall size, vocal fold membranous length, elastic properties of tissue, and prephonatory glottal shape. Two scale factors are proposed that are useful for explaining differences in fundamental frequency, sound power, mean airflow, and glottal efficiency. Fundamental frequency is scaled primarily according to the membranous length of the vocal folds (scale factor of 1.6), whereas mean airflow, sound power, glottal efficiency, and amplitude of vibration include another scale factor (1.2) that relates to overall larynx size. Some explanations are given for observed sex differences in glottographic waveforms. In particular, the simulated (computer-modeled) vocal fold contact area is used to infer male-female differences in the shape of the glottis. The female glottis appears to converge more linearly (from bottom to top) than the male glottis, primarily because of medial surface bulging of the male vocal folds.
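The length-based scale factor can be illustrated with a small worked example. The sketch below is only an interpretation of the abstract: it assumes the factor of 1.6 is applied as a simple multiplier that raises fundamental frequency for the shorter (female) vocal folds. The function name `scale_f0` and the 125 Hz reference value are hypothetical and chosen purely for illustration.

```python
def scale_f0(reference_f0_hz, length_scale=1.6):
    """Scale a reference F0 by the membranous-length factor.

    Assumption: F0 varies inversely with membranous length, so a larynx
    whose folds are shorter by `length_scale` has an F0 higher by that factor.
    """
    return reference_f0_hz * length_scale


# Hypothetical male reference F0, not a value taken from the paper.
male_f0_hz = 125.0
female_f0_hz = scale_f0(male_f0_hz)
print(f"Scaled F0: about {female_f0_hz:.0f} Hz")
```

The second factor (1.2, tied to overall larynx size) is left out here because the abstract does not state how it combines with the first for airflow, sound power, or efficiency.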
TL;DR: A new voice source model that accounted for certain physiological aspects of vocal fold motion was developed and tested using speech synthesis, and applications include synthesis of natural-sounding speech, synthesis and modeling of vocal disorders, and the development of speaker-independent (or adaptive) speech recognition systems.
Abstract: The purpose of this study was to examine several factors of vocal quality that might be affected by changes in vocal fold vibratory patterns. Four voice types were examined: modal, vocal fry, falsetto, and breathy. Three categories of analysis techniques were developed to extract source‐related features from speech and electroglottographic (EGG) signals. Four factors were found to be important for characterizing the glottal excitations for the four voice types: the glottal pulse width, the glottal pulse skewness, the abruptness of glottal closure, and the turbulent noise component. The significance of these factors for voice synthesis was studied and a new voice source model that accounted for certain physiological aspects of vocal fold motion was developed and tested using speech synthesis. Perceptual listening tests were conducted to evaluate the auditory effects of the source model parameters upon synthesized speech. The effects of the spectral slope of the source excitation, the shape of the glottal excitation pulse, and the characteristics of the turbulent noise source were considered. Applications for these research results include synthesis of natural-sounding speech, synthesis and modeling of vocal disorders, and the development of speaker-independent (or adaptive) speech recognition systems.