Simulating recognition errors in speech user interface prototyping

Open Access

- 01 Jan 2001

TL;DR: The paper presents a Wizard of Oz simulation tool that allows scenario-based simulation of speech systems for empirical studies with future users, with attention to the reliability and validity of the simulation.

Abstract: We have developed a Wizard of Oz simulation tool that allows scenario-based simulation of speech systems for conducting empirical studies with future users. This paper focuses on the adequate integration of recognition errors, as they are an important feature of speech-based applications. The presented solution considers the aspects of reliability and validity. Both are necessary preconditions for the immediate transferability of simulation results to the real system.

1. SPEECH USER INTERFACE PROTOTYPING

In the field of GUI design it has become common practice to test usability in early development stages. By using paper prototypes, important design decisions can be made on the empirical basis of tests with future users. In comparison to the vast number of empirical studies and guidelines concerning the usability of GUIs, we know very little about how to design effective speech user interfaces (SUIs). Moreover, SUI designers face the essential difficulty of getting a sound feeling for the dialogue flow by merely inspecting a written dialogue specification. For these reasons it is even more important to include prototyping and usability testing early in the design process of user-friendly interactive voice response systems (IVR systems).

The speech equivalent of a paper prototype is a Wizard of Oz (WOZ) study (Weinschenk & Barker, 2000), in which a human (the wizard) simulates the role of the computer during testing and starts different recorded system prompts depending on what the user said. Usability testing with the WOZ technique can lead to valuable results regarding the following topics:

Designing a user-oriented grammar: In very early development stages, WOZ studies can pinpoint the utterances that are typically used to control the available functions. Given a sufficient number of subjects, the transcriptions of the test sessions can give a representative picture of what users expect the system to understand.
The most frequently recorded utterances can serve as a valid basis for a user-centred grammar. In this way, the time-consuming procedure of pilot testing, including iterative grammar modifications and recognition tuning, can be shortened or even partially avoided (Pearl, 2000).

Comparison of different systems / system versions: Alternative design decisions can quickly be acted out and tested with future users. In particular, the effects of alternative prompt versions on the users’ performance and attitudes towards the system can be evaluated.

Overall ergonomic evaluation: WOZ experiments can take the traditional role of usability tests in evaluation and troubleshooting. Detecting major problems of use at an early development stage enables iterative redesign and reconceptualisation without the implementation phases that would otherwise be necessary.

A necessary precondition for the validity of a WOZ study is that the interaction between user and “machine” (here the wizard) is as realistic as possible. Otherwise, the results cannot be transferred immediately to the real situation of system use. This means that, on the one hand, the subject in a WOZ study must actually believe that she is interacting with a real system, which is a matter of adequate instruction. On the other hand, the simulation must not differ from the specified system behaviour in essential aspects. Among others, this refers to the reliability of speech recognition, which is treated in detail in the following section, and to the available complexity of the dialogue. With high-complexity applications it is necessary to do scenario-based testing in order to reduce the number of probable user utterances. This supports the wizard’s decision by giving a situation-specific pre-selection of probable options for “system” reactions.

2. PROBLEM

Speech technology is probabilistic in nature and therefore recognition errors are inherent in any speech-based application.
Furthermore, recognition error situations are especially crucial to usability variables such as effectiveness and efficiency in task solving and user acceptance (Yankelovich, Levow & Marx, 1995). Therefore, in most cases it will be indispensable to simulate error situations carefully in WOZ studies in order to obtain data on questions such as:

- How frustrating do users find recognition errors in the application in question?
- Do the mechanisms of error management actually assist the users in correction?
- Do the users recognise the occurrence of an error at all?

How should recognition errors be included in the simulation design? Even with the whole grammar of the recognition system in mind, you would never be able to anticipate the system’s behaviour. This unpredictability of recognition errors increases further if the IVR system is used from a cellular phone. Obviously, a simple rule-based model for the simulation of recognition errors is not applicable. On the other hand, mere arbitrariness or intuition as the basis for the wizard’s decisions will bias the test results. In order to ensure the reliability and validity of a WOZ study, the following aspects have to be taken into account:

Realistic probabilities for recognition errors: A predefined and realistic probability for correct understandings, substitution errors and rejection errors is a precondition for a sound evaluation of the relevant usability criteria. It also allows controlled examination of the consequences of various confidence thresholds. The confidence threshold defines the minimal probability of correct classification needed to execute an action. Probabilities below the threshold lead to rejection, usually accompanied by a prompt like ‘Sorry, I could not understand you. Please repeat.’ The necessary data stem from knowledge of the recogniser used and of the relevant parameters of its classification scheme.
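
The idea of predefined outcome probabilities and a confidence threshold can be illustrated with a minimal Python sketch. It is not the implementation of the tool described in this paper: the names ErrorProfile, simulate_outcome and apply_threshold are hypothetical, and the probability values are illustrative only, not measured recogniser data.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorProfile:
    """Hypothetical predefined probabilities for the three outcomes."""
    p_correct: float
    p_substitution: float
    p_rejection: float

    def __post_init__(self):
        total = self.p_correct + self.p_substitution + self.p_rejection
        if abs(total - 1.0) > 1e-9:
            raise ValueError("outcome probabilities must sum to 1")


def simulate_outcome(profile, rng):
    """Draw one simulated recognition outcome according to the profile."""
    return rng.choices(
        ["correct", "substitution", "rejection"],
        weights=[profile.p_correct, profile.p_substitution, profile.p_rejection],
    )[0]


def apply_threshold(confidence, threshold):
    """Execute the action only if the confidence reaches the threshold;
    anything below the threshold is rejected."""
    return "execute" if confidence >= threshold else "reject"


# Illustrative values only (assumption, not taken from a real recogniser).
profile = ErrorProfile(p_correct=0.8, p_substitution=0.1, p_rejection=0.1)
rng = random.Random(2001)  # fixed seed so the same plan can be reproduced
outcomes = [simulate_outcome(profile, rng) for _ in range(10)]
```

Seeding the generator is one possible way to keep the simulated recognition performance comparable across test sessions; whether that is appropriate depends on the study design.
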
Standardised simulation: Without using automatic speech recognition it will never be possible to completely eliminate influences on simulation performance that arise from the wizard’s decisions. These influences cannot be held constant over time, different persons and situations. It is an important goal to achieve a maximum level of objectivity by reducing the possible options, the consequences and the need for human decisions to a manageable minimum. Only under comparable test conditions can different systems or system versions, and the performance of different user groups, be compared adequately. For the comparison of different prototype versions, the simulated recognition performance should be balanced in order to avoid undesired side effects.

Interactivity: Although standardisation is an important feature, especially in within-subjects designs of system comparison, interactivity is essential for the validity of the results. That means that, despite standardisation of the simulation system, responses must depend on what the user says. Strict balancing (i.e. constant predefined sequences of correct recognition, substitution error and rejection in both conditions) and randomising (i.e. constant predefined frequencies of correct recognition, substitution error and rejection in both conditions) do not account for training effects in the users’ speech performance, which are likely to yield higher recognition rates in the version presented second.

One method to support the simulation of recognition errors is the use of filters, e.g. vocoders which distort the spoken input, in order to help the wizard perform to the system’s expected level (Bernsen, Dybkaer, and Dybkaer, 1998). Filters satisfy the requirements of interactivity and standardisation. But it is questionable whether they can support a realistic simulation of errors.
Firstly, the relationship between the probability of recognition errors and the physical intensity of the filter is not straightforward and has to be investigated empirically beforehand. Secondly, a deterministic filter that constantly distorts the input signal might not be an appropriate model for a highly probabilistic process. Human speech performance is probabilistic. Even if two utterances sound completely identical to another person, the acoustic signals will never be exactly the same. Environmental noise, recording and transmission are also probabilistic factors that make it impossible to anticipate the acoustic quality of the system input signal. Finally, the procedure of recognition itself is probabilistic in nature as it follows a statistical classification scheme.

3. OUR APPROACH

We have developed a software tool that supports WOZ simulations of IVR systems (see figure 1 for the GUI).

3.1 The WOZ-GUI

Each button (except those for the scenario selection) on the simulation GUI stands for a set of user utterances, a specific subset of the grammar. For ease of use, each button is labelled with the corresponding grammar or at least a part of it. The scenario-based approach makes it possible to simulate even highly complex applications. Any scenario consists of one or more pairs of user utterance and system prompt. When a scenario is started, the main frame displays a matrix of buttons, each representing an expected user utterance.

For illustration, let us take a scenario that involves calling John Smith and then changing his number entry in the telephone book. The first target user utterance is something like “I’d like to call John Smith”, which is represented by the first button in the first column. The other buttons in the first row represent expected variations from the target utterance in this first sub-task, e.g. “I want to place a call” or “Go to telephone book” or “John Smith”. These utterances start other actions, e.g.
feed-forward prompts that are meant to obtain the missing data needed to accomplish a transaction (e.g. “Whom would you like to call?”). When feed-forward prompts are played that are not part of the target path (the first column of the main frame), a child window pops up displaying buttons that represent possible user utterances in the current sub-dialogu
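
The scenario structure described in section 3.1 (a matrix of buttons, with the target path in the first column and expected variations in the rest of each row) could be modelled roughly as in the following Python sketch. The class and field names are hypothetical and are not taken from the tool. The utterances come from the example above, while the pairing of each utterance with a prompt is assumed for illustration and unknown prompts are left as placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Button:
    """One button on the wizard's GUI (hypothetical model)."""
    label: str   # grammar excerpt shown on the button
    prompt: str  # recorded system prompt started when the wizard clicks it


@dataclass
class Scenario:
    """A scenario as a matrix of buttons: one row per sub-task."""
    name: str
    rows: List[List[Button]]

    def target_path(self) -> List[Button]:
        """Return the first column, i.e. the target utterance of each sub-task."""
        return [row[0] for row in self.rows]

    def variations(self, sub_task: int) -> List[Button]:
        """Return the expected variations from the target utterance in a sub-task."""
        return self.rows[sub_task][1:]


# Only the first sub-task of the example is shown; the prompt pairings are
# assumptions, and placeholders mark prompts the text does not specify.
call_scenario = Scenario(
    name="Call John Smith",
    rows=[[
        Button("I'd like to call John Smith", "<placeholder: target-path prompt>"),
        Button("I want to place a call", "Whom would you like to call?"),
        Button("Go to telephone book", "<placeholder: telephone book prompt>"),
        Button("John Smith", "<placeholder: feed-forward prompt>"),
    ]],
)
```

Keeping the target utterance at index 0 of each row mirrors the layout in which the first column of the main frame is the target path.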

References

•Book

Designing Interactive Speech Systems: From First Ideas to User Testing

Niels Ole Bernsen, +1 more

- 09 Apr 1998

TL;DR: This book discusses how to develop Intelligent Multimodal Systems Using Advanced Interactive Speech Systems using Wizard of Oz Simulations and Guidelines for Co-operative Interaction Design.

•Book

Designing effective speech interfaces

Susan Weinschenk, +1 more

- 01 Jan 2000

TL;DR: This book discusses speech technology, interface design, and human factors in speech technology from the standpoint of science, UX, and usability.

Journal Article•10.1145/348941.348974

Natural spoken dialogue systems for telephony applications

Susan J. Boyce

- 01 Sep 2000

- Communications of The ACM

TL;DR: If it becomes possible to build computers that respond to fluent natural language and gracefully recover from errors, the users’ view of the computer as a social entity is likely to change because the act of using natural speech as the input mechanism makes the computer seem more human-like.

Proceedings Article•10.1145/223904.223952

Designing SpeechActs: issues in speech user interfaces

Nicole Yankelovich, +2 more

- 01 May 1995

TL;DR: A set of challenging issues facing speech interface designers is examined and approaches to address some of these challenges are described, including adhering to conversational conventions.
