Top 23 Computer Speech & Language papers published in 2002

TL;DR: WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs, and general transducer operations combine these representations flexibly and efficiently.

1,199 citations

Journal Article•10.1006/CSLA.2001.0182•

Large scale discriminative training of hidden Markov models for speech recognition

[...]

Philip C. Woodland¹, Daniel Povey¹•Institutions (1)

University of Cambridge¹

TL;DR: It is shown that HMMs trained with MMIE benefit as much as MLE-trained HMMs from applying model adaptation using maximum likelihood linear regression (MLLR), which has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for transcription of conversational telephone speech.

396 citations

Journal Article•10.1006/CSLA.2001.0183•

Recognition confidence scoring and its use in speech understanding systems

[...]

Timothy J. Hazen¹, Stephanie Seneff¹, Joseph Polifroni¹•Institutions (1)

Massachusetts Institute of Technology¹

TL;DR: This paper presents an approach to recognition confidence scoring and a set of techniques for integrating confidence scores into the understanding and dialogue components of a speech understanding system and demonstrates a relative reduction in concept error rate.

198 citations

Journal Article•10.1016/S0885-2308(02)00027-X•

Training a sentence planner for spoken dialogue using boosting

[...]

Marilyn A. Walker¹, Owen Rambow², Monica Rogati³•Institutions (3)

AT&T Labs¹, University of Pennsylvania², Carnegie Mellon University³

TL;DR: Presents SPoT, a trainable sentence planner, and a new methodology for automatically training SPoT on the basis of feedback provided by human judges, and shows that SPoT performs better than the rule-based systems and the baselines, and as well as the hand-crafted system.

99 citations

Journal Article•10.1016/S0885-2308(02)00011-6•

Response planning and generation in the Mercury flight reservation system

[...]

Stephanie Seneff¹•Institutions (1)

Massachusetts Institute of Technology¹

TL;DR: There is a direct meaning-to-speech mapping that eliminates the need to analyze linguistic structure for synthesis in the Mercury flight reservation system, a mixed-initiative spoken dialogue system that supports both voice-only interaction and multi-modal interaction augmenting spoken inputs with typing or clicking at a displayed Web page.

90 citations

Journal Article•10.1016/S0885-2308(02)00025-6•

Trainable approaches to surface natural language generation and their application to conversational dialog systems

[...]

Adwait Ratnaparkhi

TL;DR: How decisions for word ordering and word choice in surface natural language generation can be automatically learned from annotated data is studied to find the highest probability word sequence that is consistent with the rules and conditions of the grammar.

67 citations

Journal Article•10.1006/CSLA.2001.0188•

Theory and practice of acoustic confusability

[...]

Harry Printz, Peder A. Olsen¹•Institutions (1)

IBM¹

TL;DR: This paper defines two alternatives to the familiar perplexity statistic, respectively acoustic perplexity and the synthetic acoustic word error rate, and shows how to compute these statistics by effectively synthesizing a large acoustic corpus.

62 citations

Journal Article•10.1006/CSLA.2002.0192•

From within-word model search to across-word model search in large vocabulary continuous speech recognition

[...]

Achim Sixtus¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

TL;DR: This paper reports on the application of across-word context dependent acoustic phoneme models in a single-pass large vocabulary continuous speech recognizer and derives a formal specification of across-word word graphs, which are a good representation of the active search space.

36 citations

Journal Article•10.1016/S0885-2308(02)00009-8•

A conversation acts model for generating spoken dialogue contributions

[...]

Amanda Stent¹•Institutions (1)

Stony Brook University¹

TL;DR: A generation system for spoken dialogue that not only produces coherent, informative and responsive dialogue contributions, but also explicitly models human styles of interaction is described.

35 citations

Journal Article•10.1016/S0885-2308(02)00023-2•

Efficient integrated response generation from multiple targets using weighted finite state transducers

[...]

Ivan Bulyko¹, Mari Ostendorf¹•Institutions (1)

University of Washington¹

TL;DR: This paper describes how language generation and speech synthesis for spoken dialog systems can be efficiently integrated under a weighted finite state transducer architecture and shows that introducing flexible targets in generation leads to more natural sounding synthesis.

34 citations

Journal Article•10.1016/S0885-2308(02)00029-3•

Spoken language generation

[...]

Marilyn A. Walker¹, Owen Rambow²•Institutions (2)

AT&T Labs¹, Columbia University²

TL;DR: This research is motivated by several goals: improving the quality of synthesis by using the generator to provide information about the purpose, meaning, and linguistic structure of the utterance to the synthesis process, and making it possible to customize systems that generate spoken language to individual or sets of users or new domains very quickly.

Journal Article•10.1016/S0885-2308(02)00022-0•

Exploring features from natural language generation for prosody modeling

[...]

Shimei Pan¹, Kathleen R. McKeown², Julia Hirschberg³•Institutions (3)

IBM¹, Columbia University², AT&T Labs³

TL;DR: Three groups of features are investigated: semantic, syntactic, and surface features produced by SURGE, a general-purpose surface natural language generator for English; deep semantic and discourse features that are available during the domain modeling and content planning phases of generation; and information-based measures statistically derived from text.

Journal Article•10.1006/CSLA.2001.0187•

Using phonologically-constrained morphological analysis in continuous speech recognition

[...]

Mark Huckvale¹, Alex Chengyu Fang¹•Institutions (1)

University College London¹

TL;DR: Investigation into the use of phonologically-constrained morphological analysis (PCMA) in language modelling for continuous speech recognition shows that PCMA leads to smaller but more generative pronunciation lexicons, and that it does not weaken the quality of the acoustic decoding measured in terms of recognition lattices.

Journal Article•10.1006/CSLA.2002.0190•

Selection of the most significant parameters for duration modelling in a Spanish text-to-speech system using neural networks

[...]

Ricardo de Córdoba¹, Juan Manuel Montero¹, Juana M. Gutiérrez¹, José A. Vallejo¹, Emilia Enríquez¹, José Manuel Pardo¹•Institutions (1)

Technical University of Madrid¹

TL;DR: A neural network system that predicts duration with very good results (19 ms in RMS) and clearly improves on the previous rule-based system.

Journal Article•10.1016/S0885-2308(02)00010-4•

Contrast in concept-to-speech generation

[...]

Mariët Theune¹•Institutions (1)

University of Twente¹

TL;DR: How information from natural language generation can be used to compute prosody in a concept-to-speech system, focusing on the automatic marking of contrastive accents on the basis of information about the preceding discourse, is discussed and compared.

Journal Article•10.1006/CSLA.2002.0193•

Transformation streams and the HMM error model

[...]

Mark J. F. Gales¹•Institutions (1)

University of Cambridge¹

TL;DR: A new form of factorial HMM that makes use of transformation streams is introduced; it is a generalization of the standard factorial HMM and other related schemes in speech processing.

Journal Article•10.1006/CSLA.2001.0189•

Advances in Large Vocabulary Speech Recognition

[...]

Jean-Luc Gauvain, Renato De Mori, Lori Lamel

Journal Article•10.1016/S0885-2308(02)00012-8•

Stochastic natural language generation for spoken dialog systems

[...]

Alice Oh¹, Alexander I. Rudnicky¹•Institutions (1)

Carnegie Mellon University¹

TL;DR: It is shown that a simple statistical model alone can generate appropriate language for a spoken dialog system, and a promising avenue for using a statistical approach in future NLG systems is described.

Journal Article•10.1006/CSLA.2001.0185•

An overview of decoding techniques for large vocabulary continuous speech recognition

[...]

Xavier L. Aubert¹•Institutions (1)

Philips¹

TL;DR: A number of decoding strategies for large vocabulary continuous speech recognition (LVCSR) are examined from the viewpoint of their search space representation, and the main approaches are compared and some prospective views are formulated regarding possible future avenues.

Journal Article•10.1006/CSLA.2002.0191•

Hidden Markov model training with contaminated speech material for distant-talking speech recognition

[...]

Marco Matassoni, Maurizio Omologo, Diego Giuliani, Piergiorgio Svaizer

TL;DR: Improvements in recognition accuracy due to multiple microphones, HMM training on contaminated speech and incremental adaptation are additive on a connected digits task and the results show that unsupervised incremental adaptation receives the benefits of starting from models trained using contaminated speech.

Journal Article•10.1016/S0885-2308(02)00024-4•

Learning visually grounded words and syntax for a scene description task

[...]

Deb Roy¹•Institutions (1)

Massachusetts Institute of Technology¹

TL;DR: A spoken language generation system is presented that learns to describe objects in computer-generated visual scenes and generates syntactically well-formed compound adjective noun phrases as well as relative spatial clauses; its descriptions were comparable to human-generated descriptions.

Journal Article•10.1006/CSLA.2001.0186•

Lightly supervised and unsupervised acoustic model training

[...]

Lori Lamel¹, Jean-Luc Gauvain¹, Gilles Adda¹•Institutions (1)

Centre national de la recherche scientifique¹

TL;DR: Experiments providing supervision only via the language model training materials show that including texts which are contemporaneous with the audio data is not crucial for success of the approach, and that the acoustic models can be initialized with as little as 10 min of manually annotated data.

Journal Article•10.1006/CSLA.2001.0181•

Structural maximum a posteriori linear regression for fast HMM adaptation

[...]

Olivier Siohan¹, Tor André Myrvoll¹, Chin-Hui Lee¹•Institutions (1)

Alcatel-Lucent¹