Top 43 Computer Speech & Language papers published in 2011

TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.

323 citations

Journal Article•10.1016/J.CSL.2010.10.003•

Turn-taking cues in task-oriented dialogue

[...]

Agustín Gravano¹, Julia Hirschberg²•Institutions (2)

University of Buenos Aires¹, Columbia University²

TL;DR: This paper identifies seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems, and demonstrates that the greater the number of turn-yielding cues that are present, the greater the likelihood that a turn change will occur.

273 citations

Journal Article•10.1016/J.CSL.2011.03.001•

Minimum Bayes Risk decoding and system combination based on a recursion for edit distance

[...]

Haihua Xu¹, Daniel Povey², Lidia Mangu³, Jie Zhu¹•Institutions (3)

Shanghai Jiao Tong University¹, Microsoft², IBM³

01 Oct 2011-Computer Speech & Language

TL;DR: A method that can be used for Minimum Bayes Risk decoding for speech recognition that has similar functionality to the widely used Consensus method, but has a clearer theoretical basis and appears to give better results both for MBR decoding and system combination.

179 citations

Journal Article•10.1016/J.CSL.2009.12.003•

Whodunnit - Searching for the most important feature types signalling emotion-related user states in speech

[...]

Anton Batliner¹, Stefan Steidl¹, Björn Schuller², Dino Seppi³, Thurid Vogt⁴, Johannes Wagner⁴, Laurence Devillers⁵, Laurence Vidrascu⁵, Vered Aharonson⁶, Loic Kessous⁷, Noam Amir⁷•Institutions (7)

University of Erlangen-Nuremberg¹, Technische Universität München², fondazione bruno kessler³, University of Augsburg⁴, Centre national de la recherche scientifique⁵, Afeka College of Engineering⁶, Tel Aviv University⁷

TL;DR: Describes and interprets a set of acoustic and linguistic features that characterise emotional/emotion-related user states, confined to the one database processed: four classes in a German corpus of children interacting with a pet robot.

170 citations

Journal Article•10.1016/J.CSL.2010.10.001•

Spoken emotion recognition using hierarchical classifiers

[...]

Enrique Albornoz, Diego H. Milone, Hugo Leonardo Rufiner

TL;DR: The spectral characteristics of emotional signals are used in order to group emotions based on acoustic rather than psychological considerations, and the proposed multiple feature hierarchical method for seven emotions improves the performance over the standard classifiers and the fixed features.

152 citations

Journal Article•10.1016/J.CSL.2010.06.001•

A three-stage approach to the automated scoring of spontaneous spoken responses

[...]

Derrick Higgins¹, Xiaoming Xi¹, Klaus Zechner¹, David M. Williamson¹•Institutions (1)

Princeton University¹

TL;DR: SpeechRater presently fails to measure many important aspects of speaking proficiency (such as intonation and appropriateness of topic development), and its agreement with human ratings of proficiency does not yet approach the level of agreement between two human raters.

120 citations

Journal Article•10.1016/J.CSL.2009.12.004•

Detecting emotional state of a child in a conversational computer game

[...]

Serdar Yildirim¹, Shrikanth S. Narayanan², Alexandros Potamianos³•Institutions (3)

Mustafa Kemal University¹, University of Southern California², Technical University of Crete³

TL;DR: Experimental results show that lexical information has more discriminative power than acoustic and contextual cues for detection of politeness, whereas context and acoustic features perform best for frustration detection; classification performance also varies with age and gender.

101 citations

Journal Article•10.1016/J.CSL.2009.12.002•

Designing and evaluating a wizarded uncertainty-adaptive spoken dialogue tutoring system

[...]

Kate Forbes-Riley¹, Diane J. Litman¹•Institutions (1)

University of Pittsburgh¹

TL;DR: This is the first study to show that dynamically responding to uncertainty can significantly improve learning during computer tutoring, and it highlights the ongoing evaluation of the two uncertainty-adaptive systems with respect to other important performance metrics.

100 citations

Journal Article•10.1016/J.CSL.2010.04.006•

Multifunctionality in dialogue

[...]

Harry Bunt¹•Institutions (1)

Tilburg University¹

TL;DR: In this article, the authors studied the multifunctionality of dialogue utterances, i.e., the phenomenon that utterances in dialogue often have more than one communicative function, by analyzing the participation in dialogue as involving the performance of several types of activity in parallel, relating to different dimensions of communication.

67 citations

Journal Article•10.1016/J.CSL.2010.04.004•

Multimodal and mobile conversational Health and Fitness Companions

[...]

Markku Turunen¹, Jaakko Hakulinen¹, Olov Ståhl, Björn Gambäck², Preben Hansen, Mari C. Rodríguez Gancedo³, Raul Santos de la Camara³, Cameron Smith⁴, Daniel Charlton⁴, Marc Cavazza⁴•Institutions (4)

University of Tampere¹, Norwegian University of Science and Technology², Telefónica³, Teesside University⁴

TL;DR: This paper describes how such multimodal conversational Companions can be implemented to support their owners in various pervasive and mobile settings and presents concrete system architectures, virtual, physical and mobile multimodal interfaces, and interaction management techniques for such Companions.

53 citations

Journal Article•10.1016/J.CSL.2010.07.004•

Document sentiment classification by exploring description model of topical terms

[...]

Yi Hu¹, Wenjie Li¹•Institutions (1)

Hong Kong Polytechnic University¹

TL;DR: This paper presents an approach to document-level sentiment classification by modeling the description of topical terms, and shows that the results are comparable to state-of-the-art results on a publicly available movie review corpus and a Chinese digital product review corpus.

Journal Article•10.1016/J.CSL.2010.03.001•

Review: Some background on dialogue management and conversational speech for dialogue systems

[...]

Yorick Wilks¹, Roberta Catizone², Simon Worgan², Markku Turunen³•Institutions (3)

University of Oxford¹, University of Sheffield², University of Tampere³

TL;DR: This special issue of the Journal is concerned with speech and language processing issues in the overall environment of end-to-end dialogue systems, and in particular with the sorts of techniques deployed in the COMPANIONS project, which most of the contributors to this issue are associated with.

Journal Article•10.1016/J.CSL.2010.06.004•

Sparse imputation for large vocabulary noise robust ASR

[...]

Jort F. Gemmeke¹, Bert Cranen¹, Ulpu Remes²•Institutions (2)

Radboud University Nijmegen¹, Aalto University²

TL;DR: Experiments on artificially corrupted speech show that sparse imputation substantially outperforms a conventional imputation technique when the ideal 'oracle' reliability of features is used; with error-prone estimates of feature reliability, sparse imputation's performance is comparable to the baseline imputation technique in the cleanest conditions, and substantially better at lower SNRs.

Journal Article•10.1016/J.CSL.2010.09.001•

The use of phase in complex spectrum subtraction for robust speech recognition

[...]

Tristan Kleinschmidt¹, Sridha Sridharan¹, Michael Mason¹•Institutions (1)

Queensland University of Technology¹

TL;DR: Results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.

Journal Article•10.1016/J.CSL.2010.04.002•

A prototype for a conversational companion for reminiscing about images

[...]

Yorick Wilks¹, Roberta Catizone¹, Simon Worgan¹, Alexiei Dingli¹, Roger K. Moore¹, Debora Field¹, Weiwei Cheng¹•Institutions (1)

University of Sheffield¹

TL;DR: A novel approach to enabling reinforcement learning for open dialogue systems through the detection of emotion in the speech signal and its deployment as a form of a learned DM, at a higher level than the DM virtual machine, able to direct the SC's responses to a more emotionally appropriate part of its repertoire.

Journal Article•10.1016/J.CSL.2010.07.005•

The efficient incorporation of MLP features into automatic speech recognition systems

[...]

J. Park¹, F. Diehl¹, Mark J. F. Gales¹, Marcus Tomalin¹, Philip C. Woodland¹•Institutions (1)

University of Cambridge¹

TL;DR: This paper examines how MLP features, and the associated acoustic models, can be trained efficiently on large training corpora using discriminative training techniques. It proposes an approach that combines multiple individual MLPs, which reduces the time needed to train MLPs on large amounts of data.

Journal Article•10.1016/J.CSL.2010.05.003•

Geometric representations of language taxonomies

[...]

Ph. Blanchard¹, Filippo Petroni², Maurizio Serva, Dimitry Volchenkov¹•Institutions (2)

Bielefeld University¹, Sapienza University of Rome²

TL;DR: The Anatolian and Kurgan hypotheses of the Indo-European origin and the 'express train' model of the Polynesian origin are thoroughly discussed and the fully automated method for construction of language taxonomy is tested.

Journal Article•10.1016/J.CSL.2010.05.006•

Geography of social ontologies: Testing a variant of the Sapir-Whorf Hypothesis in the context of Wikipedia

[...]

Alexander Mehler¹, Olga Pustylnikov¹, Nils Diewald¹•Institutions (1)

Bielefeld University¹

TL;DR: An approach to classify linguistic networks of tens of thousands of vertices by exploring a small range of mathematically well-established topological indices is developed by analyzing social ontologies as a new resource for automatic language classification.

Journal Article•10.1016/J.CSL.2010.05.002•

Semi-supervised ranking for document retrieval

[...]

Kevin Duh¹, Katrin Kirchhoff¹•Institutions (1)

University of Washington¹

TL;DR: This paper examines whether additional unlabeled data, which is easy to obtain, can be used to improve supervised algorithms, and proposes a simple yet flexible transductive meta-algorithm, which improves over supervised algorithms on the TREC and OHSUMED tasks from the LETOR dataset.

Journal Article•10.1016/J.CSL.2010.12.001•

Automatic identification of discourse markers in dialogues: An in-depth study of like and well

[...]

Andrei Popescu-Belis¹, Sandrine Zufferey²•Institutions (2)

Idiap Research Institute¹, University of Geneva²

TL;DR: Automatic feature analysis shows that lexical collocations are the most reliable indicators, followed by prosodic/positional features, while sociolinguistic features are marginally useful for the identification of DM like and not useful for well.

Journal Article•10.1016/J.CSL.2010.06.002•

Hybrid user intention modeling to diversify dialog simulations

[...]

Sangkeun Jung¹, Cheongjae Lee¹, Kyungduk Kim¹, Donghyeon Lee¹, Gary Geunbae Lee¹•Institutions (1)

Pohang University of Science and Technology¹

Helsinki University of Technology¹

TL;DR: A novel user intention simulation method that is data-driven but can integrate diverse user discourse knowledge to simulate various types of user behavior; it successfully generated cooperative, corrective and self-directing user intention patterns.

Journal Article•10.1016/J.CSL.2011.01.002•

Social correlates of turn-taking style

[...]

John Grothendieck¹, Allen L. Gorin², Nash Borges³•Institutions (3)

BBN Technologies¹, United States Department of Defense², Johns Hopkins University³

01 Oct 2011-Computer Speech & Language

TL;DR: Demographic fields and turn-taking behavior prove to be statistically dependent, thus observed speaker activity improves estimates of the demographics of held-out data and is used to estimate speaker influence.

Journal Article•10.1016/J.CSL.2010.05.007•

Network analysis of a corpus of undeciphered Indus civilization inscriptions indicates syntactic organization

[...]

Sitabhra Sinha, Ashraf Md Izhar, Raj Kumar Pan¹, Bryan Kenneth Wells•Institutions (1)

TL;DR: The authors applied complex network analysis techniques to a database of available Indus inscriptions, with the aim of detecting patterns indicative of syntactic organization, e.g., recursive structures in the segmentation trees of the sequences, that suggest the existence of a grammar underlying these inscriptions.

Journal Article•10.1016/J.CSL.2009.12.005•

Automatic inference of complex affective states

[...]

Tal Sobol-Shikler¹•Institutions (1)

University of Cambridge¹

TL;DR: The results, inferred from speech in both English and Hebrew, indicate that the vocal expressions of complex affective states such as thinking, certainty and interest transcend language boundaries.

Journal Article•10.1016/J.CSL.2010.10.002•

Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments

[...]

Xugang Lu¹, Masashi Unoki², Satoshi Nakamura¹•Institutions (2)

National Institute of Information and Communications Technology¹, Japan Advanced Institute of Science and Technology²

TL;DR: A robust feature-extraction method on the basis of the normalization of the sub-band temporal modulation envelopes (TMEs) was proposed, which was better than using other temporal filtering and normalization methods.

Journal Article•10.1016/J.CSL.2010.07.002•

Editorial: Special issue of computer speech and language on affective speech in real-life interactions

[...]

Laurence Devillers¹, Nick Campbell²•Institutions (2)

Centre national de la recherche scientifique¹, Trinity College, Dublin²

Journal Article•10.1016/J.CSL.2010.03.005•

The Vocal Joystick Engine v1.0

[...]

Jonathan Malkin¹, Xiao Li¹, Susumu Harada¹, James A. Landay¹, Jeff A. Bilmes¹•Institutions (1)

University of Washington¹

TL;DR: A detailed view of the inner workings of the current version of the Vocal Joystick engine is presented, a real-time software library which can be used to map non-linguistic vocalizations into realizable continuous control signals.

Journal Article•10.1016/J.CSL.2010.04.003•

The user model-based summarize and refine approach improves information presentation in spoken dialog systems

[...]

Andi Winterboer¹, Martin I. Tietze², Maria Wolters², Johanna D. Moore²•Institutions (2)

University of Amsterdam¹, University of Edinburgh²

TL;DR: In both a laboratory experiment and a web-based experimental paradigm employing the Amazon Mechanical Turk platform, it is shown that the discourse cues in UMSR summaries help users compare different options and choose between options, even though they do not improve verbatim recall.

Journal Article•10.1016/J.CSL.2010.03.003•

Fiction support for realistic portrayals of fear-type emotional manifestations

[...]

C. Clavel, I. Vasilescu¹, Laurence Devillers¹•Institutions (1)

University of Paris¹

TL;DR: It is proposed here to use fictional media to compensate for the difficulty of collecting strong emotions, and a fear-type emotion recognition system has been developed that is based on acoustic models learnt from the fiction corpus.

Journal Article•10.1016/J.CSL.2010.05.008•

Usability assessment of text-to-speech synthesis for additional detail in an automated telephone banking system

[...]

Hazel Morton¹, Nancie Gunson¹, Diarmid Marshall¹, Fergus McInnes¹, Andrea Ayres², Mervyn Jack¹•Institutions (2)

University of Edinburgh¹, Lloyds Banking Group²