TL;DR: The authors proposed an incremental text-to-speech (TTS) method that uses a pseudo lookahead generated with a language model to take the future contextual information into account without increasing latency.
Abstract: This letter presents an incremental text-to-speech (TTS) method that performs synthesis in small linguistic units while maintaining the naturalness of output speech. Incremental TTS is generally subject to a trade-off between latency and synthetic speech quality. It is challenging to produce high-quality speech with a low-latency setup that does not make much use of an unobserved future sentence (hereafter, “lookahead”). To resolve this issue, we propose an incremental TTS method that uses a pseudo lookahead generated with a language model to take the future contextual information into account without increasing latency. Our method can be regarded as imitating a human's incremental reading and uses pretrained GPT2, which accounts for the large-scale linguistic knowledge, for the lookahead generation. Evaluation results show that our method 1) achieves higher speech quality than the method taking only observed information into account and 2) achieves a speech quality equivalent to waiting for the future context observation.
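The pseudo-lookahead idea in the abstract above can be sketched in a few lines. This is only a toy illustration, not the authors' system: a tiny bigram counter stands in for the pretrained GPT2, and the hypothetical `incremental_inputs` helper merely assembles the context (past words plus predicted future words) that a synthesizer would condition on for each incoming segment. All function names and the corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count next-word frequencies; a toy stand-in for a pretrained language model."""
    table = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            table[prev][nxt] += 1
    return table


def pseudo_lookahead(table, observed, n_words):
    """Greedily predict up to n_words future words from the observed prefix."""
    if not observed:
        return []
    predicted = []
    last = observed[-1].lower()
    for _ in range(n_words):
        followers = table.get(last)
        if not followers:  # unseen word: no prediction available
            break
        last = followers.most_common(1)[0][0]
        predicted.append(last)
    return predicted


def incremental_inputs(table, words, n_lookahead=2):
    """Yield, per incoming word, the context a synthesizer would condition on.

    No real future words are consumed, so the latency stays at one segment.
    """
    for i, word in enumerate(words):
        observed = words[: i + 1]
        yield {
            "segment": word,
            "past": observed[:-1],
            "lookahead": pseudo_lookahead(table, observed, n_lookahead),
        }


if __name__ == "__main__":
    lm = train_bigram(["the cat sat on the mat", "the cat sat on a chair"])
    for step in incremental_inputs(lm, ["the", "cat", "sat"]):
        print(step)
```

The design point the sketch tries to show is that the lookahead is generated from already-observed text, so each segment can be synthesized as soon as it arrives rather than after waiting for the real continuation.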
TL;DR: A comprehension engine is proposed that uses knowledge induction to connect the knowledge space by augmenting the associations within it; the authors consider it the first algorithm-level model for comprehension.
Abstract: Reading is one of the essential practices of modern human learning. Comprehending prose from the available text alone is particularly challenging because prose comprehension generally requires external knowledge or references. Although the processes of reading comprehension have been widely studied in the field of psychology, no algorithm-level models for comprehension have yet been developed. This paper proposes a comprehension engine consisting of knowledge induction, which connects the knowledge space by augmenting associations within it. The connections are achieved through the automatic incremental reading of external references and the capture of high-familiarity knowledge associations between prose concepts. The Ontology Engine is used to find lexical knowledge associations among concept pairs, with the objective of obtaining a knowledge space graph with a single giant component that serves as a base model for prose comprehension. The comprehension engine is evaluated through experiments with various selected prose texts. Akin to human readers, it can mine reference texts from modern knowledge corpora such as Wikipedia and WordNet. The results demonstrate the potential of the comprehension engine to enhance the quality of reading comprehension while reducing reading time. The authors consider this comprehension engine the first algorithm-level model for comprehension among existing works.
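The goal of reaching "a single giant component" can be illustrated with a small connectivity check. The sketch below is an assumption-laden toy, not the paper's Ontology Engine: concepts are plain strings, associations are undirected pairs, and the hypothetical `read_until_connected` function adds the links found in one reference at a time until every concept falls into one component.

```python
def components(concepts, associations):
    """Group concepts into connected components with a simple union-find.

    Every concept named in `associations` must also appear in `concepts`.
    """
    parent = {c: c for c in concepts}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in associations:
        parent[find(a)] = find(b)

    groups = {}
    for c in concepts:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())


def read_until_connected(concepts, associations, references):
    """Add associations one reference at a time until one component remains.

    `references` is a list of link lists, each standing in for the
    associations mined from one external text (a hypothetical input format).
    Returns the accumulated links and the number of references read.
    """
    known = list(associations)
    used = 0
    for new_links in references:
        if len(components(concepts, known)) == 1:
            break
        known.extend(new_links)
        used += 1
    return known, used


if __name__ == "__main__":
    concepts = ["river", "bank", "money", "loan"]
    start = [("river", "bank")]
    refs = [[("money", "loan")], [("bank", "money")], [("river", "loan")]]
    links, n_read = read_until_connected(concepts, start, refs)
    print(n_read, components(concepts, links))
```

Stopping as soon as the graph is connected mirrors the abstract's claim of reduced reading time: references that would add no new connectivity are never read.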
TL;DR: The authors used the term remedial to describe what might otherwise be called basket case reading courses (usually for younger, markedly lagging students), while developmental has traditionally signaled programs and materials designed for older readers who can get about in the printed word, but not so fast or so ably as these readers would like.
Abstract: Educators and publishers have long followed a sensitive semantic practice. They have used the term remedial to describe what might otherwise be called basket case reading courses (usually for younger, markedly lagging students), while developmental has traditionally signaled programs and materials designed for older readers who can get about in the printed word, but not so fast or so ably as these readers would like.
TL;DR: Extensions to the DocQA model are presented that allow incremental reading without loss of accuracy; the model jointly learns to provide the best answer given the text seen so far and to predict whether this best-so-far answer is sufficient.
Abstract: Any system which performs goal-directed continual learning must not only learn incrementally but also process and absorb information incrementally. Such a system also has to understand when its goals have been achieved. In this paper, we consider these issues in the context of question answering. Current state-of-the-art question answering models reason over an entire passage, not incrementally. As we will show, naive approaches to incremental reading, such as restricting the model to unidirectional language models, perform poorly. We present extensions to the DocQA [2] model to allow incremental reading without loss of accuracy. The model also jointly learns to provide the best answer given the text that is seen so far and to predict whether this best-so-far answer is sufficient.
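The control flow described above (keep a best-so-far answer, stop once it is judged sufficient) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the extended DocQA model: the learned answer scorer and sufficiency predictor are replaced by a hand-written word-overlap score and a fixed threshold, and all names below are hypothetical.

```python
import string


def _tokens(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def make_overlap_scorer(question):
    """Toy stand-in for a learned answer scorer: count shared words."""
    q_words = set(_tokens(question))

    def score(chunk):
        return chunk, len(q_words & set(_tokens(chunk)))

    return score


def incremental_answer(chunks, score_answer, is_sufficient):
    """Read chunks in order, tracking the best answer seen so far.

    Returns (best_answer, chunks_read) and stops early once
    `is_sufficient` accepts the current best answer.
    """
    best, best_score = None, float("-inf")
    for n_read, chunk in enumerate(chunks, start=1):
        candidate, score = score_answer(chunk)
        if score > best_score:
            best, best_score = candidate, score
        if is_sufficient(best, best_score):
            return best, n_read
    return best, len(chunks)


if __name__ == "__main__":
    passage = [
        "The war lasted six years.",
        "The treaty was signed in Paris.",
        "Paris later hosted the talks.",
    ]
    scorer = make_overlap_scorer("Where was the treaty signed?")
    answer, n_read = incremental_answer(passage, scorer, lambda a, s: s >= 3)
    print(answer, n_read)
```

In the paper both decisions are learned jointly; the fixed threshold here only marks where a sufficiency prediction would plug in.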