TL;DR: A system and method provide universal access to voice-based documents formatted using MIME and HTML standards, with customized extensions for voice information access and navigation.
Abstract: A system and method provide universal access to voice-based documents containing information formatted using MIME and HTML standards, with customized extensions for voice information access and navigation. These voice documents are linked using HTML hyperlinks that are accessible to subscribers using voice commands, touch-tone inputs, and other selection means. The voice documents and their components are addressable using HTML anchors embedding HTML universal resource locators (URLs), rendering them universally accessible over the Internet. This collection of connected documents forms a voice web. The voice web includes subscriber-specific documents, including speech training files for speaker-dependent speech recognition, voice print files for authenticating the identity of a user, and personal preference and attribute files for customizing other aspects of the system in accordance with a specific subscriber.
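As an illustration of how hyperlinks in a voice document might be exposed to both voice commands and touch-tone inputs, the following is a minimal sketch, not the patented method. It assumes that each anchor's text serves as the spoken command and that touch-tone digits are assigned in document order. The names `LinkCollector`, `build_menu`, and `PAGE`, and the example.com URLs, are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical voice document: two anchors linking to other voice documents.
PAGE = (
    '<p><a href="http://example.com/mail.html">Voice mail</a> '
    '<a href="http://example.com/news.html">News</a></p>'
)


class LinkCollector(HTMLParser):
    """Collect (label, href) pairs from the anchors of an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []    # (lowercased anchor text, href) in document order
        self._href = None  # href of the anchor currently open, if any
        self._text = []    # text fragments seen inside the open anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip().lower()
            self.links.append((label, self._href))
            self._href = None


def build_menu(html):
    """Map each spoken label and each touch-tone digit to a link target.

    Assumption: digits are assigned 1, 2, 3, ... in document order.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    menu = {}
    for digit, (label, href) in enumerate(collector.links, start=1):
        menu[label] = href       # selection by voice command
        menu[str(digit)] = href  # selection by touch-tone key
    return menu


MENU = build_menu(PAGE)
```

Looking up either `MENU["news"]` or `MENU["2"]` yields the same target URL, which mirrors the idea that one link is reachable through several selection means.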
TL;DR: A voice browser processes a markup language document using a network fetcher unit, a parser unit communicatively coupled to the fetcher that parses the retrieved information based on predetermined syntax, an interpreter unit, and a state machine.
Abstract: A voice browser for processing a markup language document includes a network fetcher unit to retrieve information from a destination of an information source. A parser unit is communicatively coupled to the network fetcher unit to parse the retrieved information based on predetermined syntax. The parser unit generates a tree structure representing the hierarchy of the retrieved information. An interpreter unit and a state machine are also used. A corresponding method includes the steps of retrieving and parsing a markup language document to determine at least one user input, determining whether the user input corresponds to a predetermined grammar, and using the predetermined grammar when the user input corresponds to the predetermined grammar. The grammar is determined based upon phonetic rules and pronunciation. The grammar is sent to a speech recognition engine and compared to a user input.
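The parse-then-match flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the markup vocabulary (`form`, `field`, `prompt`, `option`) is hypothetical, and plain case-insensitive text comparison stands in for the phonetic grammar and the speech recognition engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup document, assumed to have been retrieved by the fetcher.
DOC = """<form>
  <field name="topic">
    <prompt>Say weather or news.</prompt>
    <option>weather</option>
    <option>news</option>
  </field>
</form>"""

# Parsing yields a tree that mirrors the hierarchy of the document.
TREE = ET.fromstring(DOC)


def outline(node, depth=0):
    """Return (depth, tag) pairs for the tree in document order."""
    rows = [(depth, node.tag)]
    for child in node:
        rows.extend(outline(child, depth + 1))
    return rows


def grammar_for(field):
    """Build the set of phrases a field accepts (stand-in for a real grammar)."""
    return {opt.text.strip().lower() for opt in field.iter("option")}


def accept(field, utterance):
    """Return the normalized utterance if it matches the grammar, else None."""
    heard = utterance.strip().lower()
    return heard if heard in grammar_for(field) else None


FIELD = TREE.find("field")
```

Calling `accept(FIELD, "News")` returns the matched phrase, while an utterance outside the grammar returns `None`, at which point an interpreter could re-prompt the user.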
TL;DR: A highly distributed, scalable, and efficient voice browser system integrates audio from a variety of sources in a unified manner, such as audio advertisements recorded by sponsors, audio data collected by broadcast groups, and text-to-speech generated audio.
Abstract: A highly distributed, scalable, and efficient voice browser system provides the ability to seamlessly integrate a variety of audio into the system in a unified manner. The audio rendered to the user comes from various sources, such as audio advertisements recorded by sponsors, audio data collected by broadcast groups, and text-to-speech generated audio. In an embodiment, the voice browser architecture integrates a variety of components, including various telephony platforms (e.g., PSTN, VoIP), scalable architecture, rapid context switching, and backend web content integration, and provides access to information audibly.
TL;DR: A distributed voice applications system includes a voice applications rendering agent and at least one voice applications agent that provides voice applications to an individual user; a management system may direct the rendering agent to personalize those applications based on user characteristics, information about the environment in which they will be performed, prior user interactions, and other information.
Abstract: A distributed voice applications system includes a voice applications rendering agent and at least one voice applications agent that is configured to provide voice applications to an individual user. A management system may control and direct the voice applications rendering agent to create voice applications that are personalized for individual users based on user characteristics, information about the environment in which the voice applications will be performed, prior user interactions and other information. The voice applications agent and components of customized voice applications may be resident on a local user device which includes a voice browser and speech recognition capabilities. The local device, voice applications rendering agent and management system may be interconnected via a communications network.
TL;DR: A multimodal browser for rendering a multimodal document on an end system defining a host can include a visual browser component for rendering visual content, if any, of the multimodal document, and a voice browser component, which can determine which of a plurality of speech processing configurations is used by the host in rendering the voice-based content.
Abstract: A multimodal browser for rendering a multimodal document on an end system defining a host can include a visual browser component for rendering visual content, if any, of the multimodal document, and a voice browser component for rendering voice-based content, if any, of the multimodal document. The voice browser component can determine which of a plurality of speech processing configurations is used by the host in rendering the voice-based content. The determination can be based upon the resources of the host running the application. The determination also can be based upon a processing instruction contained in the application.
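One way the configuration choice might look in code is sketched below. The abstract names the two inputs (host resources and a processing instruction) but not how they are combined. The configuration names `"local"` and `"distributed"`, the `Host` fields, the 256 MB threshold, and the rule that a recognized processing instruction takes priority over resource checks are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of speech processing configurations.
CONFIGURATIONS = {"local", "distributed"}


@dataclass
class Host:
    """Hypothetical description of the resources of the end system."""
    memory_mb: int
    has_local_engine: bool


def choose_configuration(host: Host, processing_instruction: Optional[str] = None) -> str:
    """Pick a speech processing configuration for rendering voice content.

    Assumption: a recognized processing instruction in the application
    overrides the resource-based decision.
    """
    if processing_instruction in CONFIGURATIONS:
        return processing_instruction
    # Fall back to host resources: run speech processing locally only when
    # an engine is installed and memory meets the assumed threshold.
    if host.has_local_engine and host.memory_mb >= 256:
        return "local"
    return "distributed"
```

For example, `choose_configuration(Host(memory_mb=64, has_local_engine=True))` falls back to `"distributed"` because the host is below the assumed memory threshold, while passing `processing_instruction="local"` would force local processing regardless.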