TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.
TL;DR: This paper identifies seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems, and demonstrates that the greater the number of turn-yielding cues that are present, the greater the likelihood that a turn change will occur.
TL;DR: A method for Minimum Bayes Risk (MBR) decoding in speech recognition that has similar functionality to the widely used Consensus method, but has a clearer theoretical basis and appears to give better results both for MBR decoding and for system combination.
TL;DR: A set of acoustic and linguistic features that characterise emotional/emotion-related user states is described and interpreted; the analysis is confined to the one database processed, with four classes in a German corpus of children interacting with a pet robot.
TL;DR: The spectral characteristics of emotional signals are used to group emotions based on acoustic rather than psychological considerations, and the proposed multiple-feature hierarchical method for seven emotions improves performance over the standard classifiers and the fixed features.
TL;DR: SpeechRater presently fails to measure many important aspects of speaking proficiency (such as intonation and appropriateness of topic development), and its agreement with human ratings of proficiency does not yet approach the level of agreement between two human raters.
TL;DR: Experimental results show that lexical information has more discriminative power than acoustic and contextual cues for the detection of politeness, whereas contextual and acoustic features perform best for frustration detection; classification performance also varies with age and gender.
TL;DR: This study is the first to show that dynamically responding to uncertainty can significantly improve learning during computer tutoring, and it highlights the ongoing evaluation of the two uncertainty-adaptive systems with respect to other important performance metrics.
TL;DR: In this article, the authors study the multifunctionality of dialogue utterances, i.e., the phenomenon that utterances in dialogue often have more than one communicative function, by analyzing participation in dialogue as the performance of several types of activity in parallel, relating to different dimensions of communication.
TL;DR: This paper describes how multimodal conversational Companions can be implemented to support their owners in various pervasive and mobile settings, and presents concrete system architectures, virtual, physical and mobile multimodal interfaces, and interaction management techniques for such Companions.
TL;DR: This paper presents an approach to document-level sentiment classification that models the description of topical terms, and shows that its results are comparable to state-of-the-art results on a publicly available movie review corpus and a Chinese digital product review corpus.
TL;DR: This special issue of the Journal is concerned with speech and language processing issues in the overall environment of end-to-end dialogue systems, and in particular with the sorts of techniques deployed in the COMPANIONS project, which most of the contributors to this issue are associated with.
TL;DR: Experiments on artificially corrupted speech show that sparse imputation substantially outperforms a conventional imputation technique when the ideal 'oracle' reliability of features is used; with error-prone estimates of feature reliability, the performance of sparse imputation is comparable to that of the baseline imputation technique in the cleanest conditions and substantially better at lower SNRs.
TL;DR: Results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
TL;DR: A novel approach to enabling reinforcement learning for open dialogue systems by detecting emotion in the speech signal and deploying it as a form of learned dialogue manager (DM), operating at a higher level than the DM virtual machine and able to direct the SC's responses to a more emotionally appropriate part of its repertoire.
TL;DR: This paper examines how multi-layer perceptron (MLP) features and the associated acoustic models can be trained efficiently on large training corpora using discriminative training techniques, and proposes an approach that combines multiple individual MLPs, reducing the time needed to train MLPs on large amounts of data.
TL;DR: The Anatolian and Kurgan hypotheses of Indo-European origins and the 'express train' model of Polynesian origins are discussed in detail, and a fully automated method for constructing language taxonomies is tested.
TL;DR: An approach to classifying linguistic networks of tens of thousands of vertices using a small range of mathematically well-established topological indices is developed, and social ontologies are analyzed as a new resource for automatic language classification.
TL;DR: This paper examines whether additional unlabeled data, which is easy to obtain, can be used to improve supervised algorithms, and proposes a simple yet flexible transductive meta-algorithm, which improves over supervised algorithms on the TREC and OHSUMED tasks from the LETOR dataset.
TL;DR: Automatic feature analysis shows that lexical collocations are the most reliable indicators, followed by prosodic/positional features, while sociolinguistic features are marginally useful for identifying the discourse marker (DM) 'like' and not useful for 'well'.
TL;DR: A novel data-driven user intention simulation method that can integrate diverse user discourse knowledge to simulate various types of user behavior; it successfully generated cooperative, corrective and self-directing user intention patterns.
TL;DR: Demographic fields and turn-taking behavior prove to be statistically dependent; thus, observed speaker activity improves estimates of the demographics of held-out data, and it is also used to estimate speaker influence.
TL;DR: The authors applied complex network analysis techniques to a database of available Indus inscriptions, with the aim of detecting patterns indicative of syntactic organization, e.g., recursive structures in the segmentation trees of the sequences, that suggest the existence of a grammar underlying these inscriptions.
TL;DR: The results inferred from speech in both English and Hebrew indicate that the vocal expressions of complex affective states such as thinking, certainty and interest transcend language boundaries.
TL;DR: A robust feature-extraction method based on normalizing the sub-band temporal modulation envelopes (TMEs) is proposed, and it performs better than other temporal filtering and normalization methods.
TL;DR: A detailed view is presented of the inner workings of the current version of the Vocal Joystick engine, a real-time software library that can be used to map non-linguistic vocalizations into realizable continuous control signals.
TL;DR: In both a laboratory experiment and a web-based experimental paradigm employing the Amazon Mechanical Turk platform, it is shown that the discourse cues in UMSR summaries help users compare options and choose between them, even though they do not improve verbatim recall.
TL;DR: Fictional media are proposed as a way to compensate for the difficulty of collecting speech data containing strong emotions, and a fear-type emotion recognition system based on acoustic models learnt from the fiction corpus has been developed.
TL;DR: Results from the experiments show that participants welcome the added value of text-to-speech (TTS) in providing additional detail on their account transactions, but that TTS should be used minimally in the service.