TL;DR: A model generates on-off speech patterns representative of those in experimental two-way telephone conversations and yields good fits to all events except “speech before interruption”: when an interruption occurs, a model speaker tends to interrupt the other's talkspurt later than a real speaker does.
Abstract: This paper describes a model that generates on-off speech patterns representative of those in experimental two-way telephone conversations. The model assumes a conversant to occupy one of three speaking or one of three silent states. Transitions among the states are determined by Poisson processes governed by six parameters (one for each state). The validity of the model is tested by comparing the model's computer simulation of 16 conversations with 16 real conversations. Cumulative distribution functions are compared for ten events (such as talkspurts, pauses, mutual silences, and so on) defined on the speech patterns. The model yields good fits to all events except “speech before interruption;” when an interruption occurs, a model speaker tends to interrupt the other's talkspurt later than a real speaker does. Theoretical behavior of the model is also studied. All events consist of concatenations of exponentially distributed “state durations,” even though most events are not themselves exponential. For some purposes, the exponential distribution is a satisfactory empirical fit to talkspurts, but not to pauses. Possible applications of the model include studying people's motivations to talk and fall silent on different circuits, and predicting statistical behavior of voice operated devices on the circuits.
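The remark that events are concatenations of exponentially distributed state durations, yet are not themselves exponential, can be illustrated numerically. The sketch below is not the six-state model from the abstract: it draws a single exponential state duration and compares it with a hypothetical event formed by two consecutive state durations. The rates `rate_a` and `rate_b`, the sample size, and the seed are illustrative assumptions. An exponential distribution has a coefficient of variation of 1, so a clearly smaller value for the concatenated event indicates that it is not exponential.

```python
import random
import statistics


def state_duration(rate, rng):
    """Draw one exponentially distributed state duration (mean 1/rate)."""
    return rng.expovariate(rate)


def coefficient_of_variation(samples):
    """Standard deviation divided by mean; equals 1 for an exponential law."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def simulate(n=20000, rate_a=1.0, rate_b=2.0, seed=0):
    """Compare one state duration with a two-state concatenated event.

    rate_a and rate_b are hypothetical rates chosen for illustration,
    not parameters taken from the paper.
    """
    rng = random.Random(seed)
    single = [state_duration(rate_a, rng) for _ in range(n)]
    # Hypothetical event: the speaker passes through two states in sequence.
    event = [
        state_duration(rate_a, rng) + state_duration(rate_b, rng)
        for _ in range(n)
    ]
    return coefficient_of_variation(single), coefficient_of_variation(event)


if __name__ == "__main__":
    cv_single, cv_event = simulate()
    print(f"CV of a single state duration: {cv_single:.3f}")
    print(f"CV of a two-state event:       {cv_event:.3f}")
```

For these assumed rates the theoretical coefficient of variation of the concatenated event is sqrt(1.25) / 1.5, roughly 0.745, which is well below the exponential value of 1.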
TL;DR: Equilibrium point analysis is used to evaluate system behavior in a packet reservation multiple access (PRMA) protocol based network and the probability of packet dropping given the number of simultaneous conversations is derived.
Abstract: Equilibrium point analysis is used to evaluate system behavior in a packet reservation multiple access (PRMA) protocol based network. The authors derive the probability of packet dropping given the number of simultaneous conversations. The authors establish conditions for system stability and efficiency. Numerical calculations based on the theory show close agreement with computer simulations. They also provide valuable guides to system design. Because PRMA is a statistical multiplexer, the channel becomes congested when too many terminals are active. For a particular example it is shown that speech activity detection permits 37 speech terminals to share a PRMA channel with 20 slots per frame, with a packet dropping probability of less than 1%.
TL;DR: A voice activity detector (VAD) that can operate reliably in SNRs down to 0 dB and detect most speech at −5 dB is described, and how robustness to these signals can be achieved with suitable preprocessing and postprocessing is shown.
Abstract: The paper describes a voice activity detector (VAD) that can operate reliably in SNRs down to 0 dB and detect most speech at −5 dB. The detector applies a least-squares periodicity estimator to the input signal, and triggers when a significant amount of periodicity is found. It does not aim to find the exact talkspurt boundaries and, consequently, is most suited to speech-logging applications where it is easy to include a small margin to allow for any missed speech. The paper discusses the problem of false triggering on nonspeech periodic signals and shows how robustness to these signals can be achieved with suitable preprocessing and postprocessing.
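The idea of triggering on periodicity can be sketched as follows. This is a simplified illustration, not the paper's estimator: for each candidate period it forms the least-squares periodic fit to the frame (the average of all samples sharing the same phase) and reports the fraction of signal energy that fit explains. The function names, the period range, and the threshold are assumptions, and the sketch omits both the bias correction a practical estimator would need and the preprocessing and postprocessing the abstract mentions.

```python
import math
import random


def periodicity(signal, period):
    """Fraction of signal energy captured by the best period-`period` fit.

    The least-squares periodic fit assigns to each phase (index mod period)
    the mean of the samples observed at that phase.
    """
    sums = [0.0] * period
    counts = [0] * period
    for i, x in enumerate(signal):
        sums[i % period] += x
        counts[i % period] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    fit_energy = sum(means[i % period] ** 2 for i in range(len(signal)))
    total_energy = sum(x * x for x in signal)
    return fit_energy / total_energy if total_energy > 0 else 0.0


def detect(signal, min_period=10, max_period=40, threshold=0.5):
    """Trigger when some candidate period explains enough of the energy.

    The period range and threshold are hypothetical values for illustration.
    """
    best = max(periodicity(signal, p) for p in range(min_period, max_period + 1))
    return best >= threshold, best


if __name__ == "__main__":
    rng = random.Random(1)
    tone = [math.sin(2 * math.pi * n / 20) for n in range(400)]  # period 20
    noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
    print("tone: ", detect(tone))
    print("noise:", detect(noise))
```

Note that any perfectly periodic nonspeech signal, such as the test tone here, also triggers this detector, which is the false-triggering problem the abstract says must be handled separately.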
TL;DR: In this paper, a multiplex transmission system is disclosed in which speech activity is detected as a sequence of talkspurts and intervening silent intervals, and the speech information is encoded and accumulated in a buffer store by omitting the silent intervals.
Abstract: A multiplex transmission system is disclosed in which speech activity is detected as a sequence of talkspurts and intervening silent intervals. The speech information is encoded and accumulated in a buffer store by omitting the silent intervals. A time stamp is associated with each talkspurt code burst to permit approximate reconstruction of the talkspurt time structure at the receiver. The talkspurt code bursts are assembled into packets of optimum size for transmission on shared transmission facilities, such as a Time Assignment Speech Interpolation (TASI) system. The assembled talkspurt packets can be encoded by an adaptive technique responsive to the loading on the transmission facilities and can be transmitted at a rate faster than real-time speech generation to accommodate high-level usage on the shared transmission facilities.
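The buffering scheme described above, in which silent intervals are omitted and each talkspurt carries a time stamp, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the disclosed system: frames are plain integers, the per-frame speech/silence decision `active` is given, the time stamp is simply the index of the talkspurt's first frame, and the names `encode_talkspurts` and `reconstruct` are hypothetical. Packet sizing, adaptive encoding, and faster-than-real-time transmission are not modeled.

```python
def encode_talkspurts(frames, active):
    """Group consecutive speech frames into (time_stamp, payload) bursts.

    Silent frames are dropped; the time stamp is the index of the first
    frame of each talkspurt.
    """
    bursts = []
    current = None
    for index, (frame, is_speech) in enumerate(zip(frames, active)):
        if is_speech:
            if current is None:
                # A new talkspurt starts here: record its time stamp.
                current = (index, [])
                bursts.append(current)
            current[1].append(frame)
        else:
            # Silence ends the current talkspurt, if any.
            current = None
    return bursts


def reconstruct(bursts, total_frames, silence=0):
    """Rebuild the time structure at the receiver, filling gaps with silence."""
    out = [silence] * total_frames
    for start, payload in bursts:
        out[start:start + len(payload)] = payload
    return out


if __name__ == "__main__":
    frames = [5, 6, 0, 0, 7, 8, 9, 0]
    active = [True, True, False, False, True, True, True, False]
    bursts = encode_talkspurts(frames, active)
    print(bursts)                            # [(0, [5, 6]), (4, [7, 8, 9])]
    print(reconstruct(bursts, len(frames)))  # [5, 6, 0, 0, 7, 8, 9, 0]
```

In this toy example the reconstruction is exact because the time stamps are frame indices; the abstract only claims approximate reconstruction of the talkspurt time structure.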
TL;DR: The significance of this work lies in the important role that talkspurt hangover plays, for example, in minimizing speech detector induced back-end clipping of talkspurts, reducing exposure to the variable talkspurt delay impairment, and in determining signaling overhead and resource occupancy in various speech interpolation, packet voice, and integrated voice/data systems.
Abstract: This paper deals with the measurement and calculation of various speech temporal parameters of interest in an environment where speech activity detection is employed. In particular it is shown that, based on either a measurement or model of the probability density function (pdf) for silence durations for the case of zero talkspurt "hangover" or "fill-in," the following temporal parameters can be computed for any value of hangover or fill-in: the mean (and pdf) for silence durations, the mean talkspurt duration, the mean talkspurt rate, and the speech activity. Directly measured values of these parameters and those computed from both measured and fitted versions of the pdf for silence durations are compared and are shown to be in reasonable agreement. The illustrated results are based on measurements of about two minutes of taped male monolog source speech. However, the approach to calculating the above parameters is general in the sense that it can be applied to any measured or modeled pdf for silence durations. The significance of this work lies in the important role that talkspurt hangover plays, for example, in minimizing speech detector induced back-end clipping of talkspurts, reducing exposure to the variable talkspurt delay impairment, and in determining signaling overhead and resource occupancy in various speech interpolation, packet voice, and integrated voice/data systems.
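A minimal empirical sketch of how the listed parameters could follow from zero-hangover silence durations is given below. It is not the paper's derivation, which works from the pdf. It assumes that a hangover `H` extends every talkspurt by `H`, so a silence of duration `s > H` shrinks to `s - H` and a silence with `s <= H` is bridged entirely. It also assumes the number of talkspurts equals the number of surviving silences, which is reasonable only for long recordings. Fill-in, as opposed to hangover, is not modeled, and the function name and sample values are hypothetical.

```python
import math


def hangover_statistics(silences, total_time, hangover):
    """Estimate temporal parameters after applying a talkspurt hangover.

    silences   -- silence durations measured with zero hangover (seconds)
    total_time -- total duration of the recording (seconds)
    hangover   -- hangover applied after each talkspurt (seconds)
    """
    # Silences no longer than the hangover vanish; the rest are shortened.
    remaining = [s - hangover for s in silences if s > hangover]
    count = len(remaining)
    silent_time = sum(remaining)
    speech_time = total_time - silent_time
    return {
        "mean_silence": silent_time / count if count else 0.0,
        # Assumes one talkspurt per surviving silence (long-recording limit).
        "mean_talkspurt": speech_time / count if count else total_time,
        "talkspurt_rate": count / total_time,
        "speech_activity": speech_time / total_time,
    }


if __name__ == "__main__":
    # Hypothetical measurements: three silences in a 4-second recording.
    stats = hangover_statistics([0.1, 0.5, 1.0], total_time=4.0, hangover=0.2)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

Increasing `hangover` in this sketch raises the speech activity and lengthens the mean talkspurt while lowering the talkspurt rate, which is the trade-off between clipping protection and resource occupancy that the abstract points to.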