Showing papers in "Computer Speech & Language" in 2011
Journal Article•10.1016/J.CSL.2010.06.003•
The subspace Gaussian mixture model-A structured model for speech recognition

Daniel Povey1, Lukas Burget2, Mohit Agarwal3, Pinar Akyazi4, Feng Kai5, Arnab Ghoshal6, Ondřej Glembek2, Nagendra Kumar Goel, Martin Karafiat2, Ariya Rastrow7, Richard Rose8, Petr Schwarz2, Samuel Thomas7 •
Microsoft1, Brno University of Technology2, Indian Institute of Information Technology, Allahabad3, Boğaziçi University4, Hong Kong University of Science and Technology5, Saarland University6, Johns Hopkins University7, McGill University8
01 Apr 2011-Computer Speech & Language
TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.

323 citations

Journal Article•10.1016/J.CSL.2010.10.003•
Turn-taking cues in task-oriented dialogue

Agustín Gravano1, Julia Hirschberg2•
University of Buenos Aires1, Columbia University2
01 Jul 2011-Computer Speech & Language
TL;DR: This paper identifies seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems, and demonstrates that the greater the number of turn-yielding cues present, the greater the likelihood that a turn change will occur.

273 citations

Journal Article•10.1016/J.CSL.2011.03.001•
Minimum Bayes Risk decoding and system combination based on a recursion for edit distance

Haihua Xu1, Daniel Povey2, Lidia Mangu3, Jie Zhu1•
Shanghai Jiao Tong University1, Microsoft2, IBM3
01 Oct 2011-Computer Speech & Language
TL;DR: A method that can be used for Minimum Bayes Risk decoding for speech recognition that has similar functionality to the widely used Consensus method, but has a clearer theoretical basis and appears to give better results both for MBR decoding and system combination.

179 citations

Journal Article•10.1016/J.CSL.2009.12.003•
Whodunnit - Searching for the most important feature types signalling emotion-related user states in speech

Anton Batliner1, Stefan Steidl1, Björn Schuller2, Dino Seppi3, Thurid Vogt4, Johannes Wagner4, Laurence Devillers5, Laurence Vidrascu5, Vered Aharonson6, Loic Kessous7, Noam Amir7 •
University of Erlangen-Nuremberg1, Technische Universität München2, Fondazione Bruno Kessler3, University of Augsburg4, Centre national de la recherche scientifique5, Afeka College of Engineering6, Tel Aviv University7
01 Jan 2011-Computer Speech & Language
TL;DR: A set of acoustic and linguistic features that characterise emotional/emotion-related user states is described and interpreted, confined to the one database processed: four classes in a German corpus of children interacting with a pet robot.

170 citations

Journal Article•10.1016/J.CSL.2010.10.001•
Spoken emotion recognition using hierarchical classifiers

Enrique Albornoz, Diego H. Milone, Hugo Leonardo Rufiner
01 Jul 2011-Computer Speech & Language
TL;DR: The spectral characteristics of emotional signals are used in order to group emotions based on acoustic rather than psychological considerations, and the proposed multiple feature hierarchical method for seven emotions improves the performance over the standard classifiers and the fixed features.

152 citations

Journal Article•10.1016/J.CSL.2010.06.001•
A three-stage approach to the automated scoring of spontaneous spoken responses

Derrick Higgins1, Xiaoming Xi1, Klaus Zechner1, David M. Williamson1•
Princeton University1
01 Apr 2011-Computer Speech & Language
TL;DR: SpeechRater presently fails to measure many important aspects of speaking proficiency (such as intonation and appropriateness of topic development), and its agreement with human ratings of proficiency does not yet approach the level of agreement between two human raters.

120 citations

Journal Article•10.1016/J.CSL.2009.12.004•
Detecting emotional state of a child in a conversational computer game

Serdar Yildirim1, Shrikanth S. Narayanan2, Alexandros Potamianos3•
Mustafa Kemal University1, University of Southern California2, Technical University of Crete3
01 Jan 2011-Computer Speech & Language
TL;DR: Experimental results show that lexical information has more discriminative power than acoustic and contextual cues for detecting politeness, whereas contextual and acoustic features perform best for detecting frustration; the results also show that classification performance varies with age and gender.

101 citations

Journal Article•10.1016/J.CSL.2009.12.002•
Designing and evaluating a wizarded uncertainty-adaptive spoken dialogue tutoring system

Kate Forbes-Riley1, Diane J. Litman1•
University of Pittsburgh1
01 Jan 2011-Computer Speech & Language
TL;DR: This is the first study to show that dynamically responding to uncertainty can significantly improve learning during computer tutoring; it also highlights the ongoing evaluation of the two uncertainty-adaptive systems with respect to other important performance metrics.

100 citations

Journal Article•10.1016/J.CSL.2010.04.006•
Multifunctionality in dialogue

Harry Bunt1•
Tilburg University1
01 Apr 2011-Computer Speech & Language
TL;DR: In this article, the authors studied the multifunctionality of dialogue utterances, i.e., the phenomenon that utterances in dialogue often have more than one communicative function, by analyzing the participation in dialogue as involving the performance of several types of activity in parallel, relating to different dimensions of communication.

67 citations

Journal Article•10.1016/J.CSL.2010.04.004•
Multimodal and mobile conversational Health and Fitness Companions

Markku Turunen1, Jaakko Hakulinen1, Olov Ståhl, Björn Gambäck2, Preben Hansen, Mari C. Rodríguez Gancedo3, Raul Santos de la Camara3, Cameron Smith4, Daniel Charlton4, Marc Cavazza4 •
University of Tampere1, Norwegian University of Science and Technology2, Telefónica3, Teesside University4
01 Apr 2011-Computer Speech & Language
TL;DR: This paper describes how multimodal conversational Companions can be implemented to support their owners in various pervasive and mobile settings, and presents concrete system architectures, virtual, physical and mobile multimodal interfaces, and interaction management techniques for such Companions.

53 citations

Journal Article•10.1016/J.CSL.2010.07.004•
Document sentiment classification by exploring description model of topical terms

Yi Hu1, Wenjie Li1•
Hong Kong Polytechnic University1
01 Apr 2011-Computer Speech & Language
TL;DR: This paper presents an approach to document-level sentiment classification that models the description of topical terms, and shows that its results are comparable to state-of-the-art results on a publicly available movie review corpus and a Chinese digital product review corpus.
Journal Article•10.1016/J.CSL.2010.03.001•
Review: Some background on dialogue management and conversational speech for dialogue systems

Yorick Wilks1, Roberta Catizone2, Simon Worgan2, Markku Turunen3•
University of Oxford1, University of Sheffield2, University of Tampere3
01 Apr 2011-Computer Speech & Language
TL;DR: This special issue of the Journal is concerned with speech and language processing issues in the overall environment of end-to-end dialogue systems, and in particular with the sorts of techniques deployed in the COMPANIONS project, which most of the contributors to this issue are associated with.
Journal Article•10.1016/J.CSL.2010.06.004•
Sparse imputation for large vocabulary noise robust ASR

Jort F. Gemmeke1, Bert Cranen1, Ulpu Remes2•
Radboud University Nijmegen1, Aalto University2
01 Apr 2011-Computer Speech & Language
TL;DR: Experiments on artificially corrupted speech show that sparse imputation substantially outperforms a conventional imputation technique when the ideal 'oracle' reliability of features is used. With error-prone estimates of feature reliability, the performance of sparse imputation is comparable to that of the baseline imputation technique in the cleanest conditions, and substantially better at lower SNRs.
Journal Article•10.1016/J.CSL.2010.09.001•
The use of phase in complex spectrum subtraction for robust speech recognition

Tristan Kleinschmidt1, Sridha Sridharan1, Michael Mason1•
Queensland University of Technology1
01 Jul 2011-Computer Speech & Language
TL;DR: Results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
Journal Article•10.1016/J.CSL.2010.04.002•
A prototype for a conversational companion for reminiscing about images

Yorick Wilks1, Roberta Catizone1, Simon Worgan1, Alexiei Dingli1, Roger K. Moore1, Debora Field1, Weiwei Cheng1 •
University of Sheffield1
01 Apr 2011-Computer Speech & Language
TL;DR: A novel approach is presented for enabling reinforcement learning in open dialogue systems through the detection of emotion in the speech signal and its deployment as a form of learned dialogue manager (DM), at a higher level than the DM virtual machine, able to direct the SC's responses to a more emotionally appropriate part of its repertoire.
Journal Article•10.1016/J.CSL.2010.07.005•
The efficient incorporation of MLP features into automatic speech recognition systems

J. Park1, F. Diehl1, Mark J. F. Gales1, Marcus Tomalin1, Philip C. Woodland1 •
University of Cambridge1
01 Jul 2011-Computer Speech & Language
TL;DR: This paper examines how MLP features, and the associated acoustic models, can be trained efficiently on large training corpora using discriminative training techniques. It also proposes an approach that combines multiple individual MLPs, which reduces the time needed to train MLPs on large amounts of data.
Journal Article•10.1016/J.CSL.2010.05.003•
Geometric representations of language taxonomies

Ph. Blanchard1, Filippo Petroni2, Maurizio Serva, Dimitry Volchenkov1•
Bielefeld University1, Sapienza University of Rome2
01 Jul 2011-Computer Speech & Language
TL;DR: The Anatolian and Kurgan hypotheses of Indo-European origin and the 'express train' model of Polynesian origin are thoroughly discussed, and the fully automated method for constructing language taxonomies is tested.
Journal Article•10.1016/J.CSL.2010.05.006•
Geography of social ontologies: Testing a variant of the Sapir-Whorf Hypothesis in the context of Wikipedia

Alexander Mehler1, Olga Pustylnikov1, Nils Diewald1•
Bielefeld University1
01 Jul 2011-Computer Speech & Language
TL;DR: Treating social ontologies as a new resource for automatic language classification, the authors develop an approach that classifies linguistic networks of tens of thousands of vertices using a small range of mathematically well-established topological indices.
Journal Article•10.1016/J.CSL.2010.05.002•
Semi-supervised ranking for document retrieval

Kevin Duh1, Katrin Kirchhoff1•
University of Washington1
01 Apr 2011-Computer Speech & Language
TL;DR: This paper examines whether additional unlabeled data, which is easy to obtain, can be used to improve supervised algorithms, and proposes a simple yet flexible transductive meta-algorithm, which improves over supervised algorithms on the TREC and OHSUMED tasks from the LETOR dataset.
Journal Article•10.1016/J.CSL.2010.12.001•
Automatic identification of discourse markers in dialogues: An in-depth study of like and well

Andrei Popescu-Belis1, Sandrine Zufferey2•
Idiap Research Institute1, University of Geneva2
01 Jul 2011-Computer Speech & Language
TL;DR: Automatic feature analysis shows that lexical collocations are the most reliable indicators, followed by prosodic/positional features, while sociolinguistic features are marginally useful for identifying the discourse marker (DM) like and not useful for well.
Journal Article•10.1016/J.CSL.2010.06.002•
Hybrid user intention modeling to diversify dialog simulations

Sangkeun Jung1, Cheongjae Lee1, Kyungduk Kim1, Donghyeon Lee1, Gary Geunbae Lee1 •
Pohang University of Science and Technology1
01 Apr 2011-Computer Speech & Language
TL;DR: A novel user intention simulation method is proposed that is data-driven but can integrate diverse user discourse knowledge to simulate various types of user behavior; it successfully generated cooperative, corrective and self-directing user intention patterns.
Journal Article•10.1016/J.CSL.2011.01.002•
Social correlates of turn-taking style

John Grothendieck1, Allen L. Gorin2, Nash Borges3•
BBN Technologies1, United States Department of Defense2, Johns Hopkins University3
01 Oct 2011-Computer Speech & Language
TL;DR: Demographic fields and turn-taking behavior prove to be statistically dependent; thus, observed speaker activity improves estimates of the demographics of held-out data, and it is also used to estimate speaker influence.
Journal Article•10.1016/J.CSL.2010.05.007•
Network analysis of a corpus of undeciphered Indus civilization inscriptions indicates syntactic organization

Sitabhra Sinha, Ashraf Md Izhar, Raj Kumar Pan1, Bryan Kenneth Wells•
Helsinki University of Technology1
01 Jul 2011-Computer Speech & Language
TL;DR: The authors applied complex network analysis techniques to a database of available Indus inscriptions, with the aim of detecting patterns indicative of syntactic organization, e.g., recursive structures in the segmentation trees of the sequences, that suggest the existence of a grammar underlying these inscriptions.
Journal Article•10.1016/J.CSL.2009.12.005•
Automatic inference of complex affective states

Tal Sobol-Shikler1•
University of Cambridge1
01 Jan 2011-Computer Speech & Language
TL;DR: Results inferred from speech in both English and Hebrew indicate that the vocal expressions of complex affective states such as thinking, certainty and interest transcend language boundaries.
Journal Article•10.1016/J.CSL.2010.10.002•
Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments

Xugang Lu1, Masashi Unoki2, Satoshi Nakamura1•
National Institute of Information and Communications Technology1, Japan Advanced Institute of Science and Technology2
01 Jul 2011-Computer Speech & Language
TL;DR: A robust feature-extraction method based on the normalization of sub-band temporal modulation envelopes (TMEs) was proposed, and it performed better than other temporal filtering and normalization methods.
Journal Article•10.1016/J.CSL.2010.07.002•
Editorial: Special issue of Computer Speech and Language on affective speech in real-life interactions

Laurence Devillers1, Nick Campbell2•
Centre national de la recherche scientifique1, Trinity College, Dublin2
01 Jan 2011-Computer Speech & Language
Journal Article•10.1016/J.CSL.2010.03.005•
The Vocal Joystick Engine v1.0

Jonathan Malkin1, Xiao Li1, Susumu Harada1, James A. Landay1, Jeff A. Bilmes1 •
University of Washington1
01 Jul 2011-Computer Speech & Language
TL;DR: A detailed view is presented of the inner workings of the current version of the Vocal Joystick engine, a real-time software library that can be used to map non-linguistic vocalizations into realizable continuous control signals.
Journal Article•10.1016/J.CSL.2010.04.003•
The user model-based summarize and refine approach improves information presentation in spoken dialog systems

Andi Winterboer1, Martin I. Tietze2, Maria Wolters2, Johanna D. Moore2•
University of Amsterdam1, University of Edinburgh2
01 Apr 2011-Computer Speech & Language
TL;DR: In both a laboratory experiment and a web-based experimental paradigm employing the Amazon Mechanical Turk platform, it is shown that the discourse cues in user model-based summarize and refine (UMSR) summaries help users compare different options and choose among them, even though they do not improve verbatim recall.
Journal Article•10.1016/J.CSL.2010.03.003•
Fiction support for realistic portrayals of fear-type emotional manifestations

C. Clavel, I. Vasilescu1, Laurence Devillers1•
University of Paris1
01 Jan 2011-Computer Speech & Language
TL;DR: It is proposed here to use fictional media to compensate for the difficulty of collecting strong emotions, and a fear-type emotion recognition system has been developed that is based on acoustic models learnt from the fiction corpus.
Journal Article•10.1016/J.CSL.2010.05.008•
Usability assessment of text-to-speech synthesis for additional detail in an automated telephone banking system

Hazel Morton1, Nancie Gunson1, Diarmid Marshall1, Fergus McInnes1, Andrea Ayres2, Mervyn Jack1 •
University of Edinburgh1, Lloyds Banking Group2
01 Apr 2011-Computer Speech & Language
TL;DR: Results from the experiments show that participants welcome the added value of TTS in being able to provide additional detail on their account transactions, but that TTS should be used minimally in the service.
