Open Access, Proceedings Article
Imperfect transcript driven speech recognition
Benjamin Lecouteux, Georges Linarès, Pascal Nocera, Jean-François Bonastre
- 01 Sep 2006
TL;DR: A method combining a linguistic analysis of the imperfect transcripts and a dynamic synchronization of these transcripts inside the search algorithm to improve the performance of an automatic speech recognition (ASR) system is proposed.
Abstract: In many cases, textual information can be associated with speech signals: movie subtitles, theater scripts, broadcast news summaries, etc. This information can be considered an approximate transcript, and it rarely corresponds to the exact words uttered. The goal of this work is to use this kind of information to improve the performance of an automatic speech recognition (ASR) system. Multiple applications are possible: following a play with closed captions aligned to the voice signal (while respecting performer variations) to help deaf people, watching a movie in another language using aligned and corrected closed captions, etc. We propose in this paper a method combining a linguistic analysis of the imperfect transcripts and a dynamic synchronization of these transcripts inside the search algorithm. The proposed technique is based on language model adaptation and on-line synchronization of the search algorithm. Experiments are carried out on an extract of the ESTER evaluation campaign [4] database, using the LIA Broadcast News system. The results show that the transcript-driven system significantly outperforms both the original recognizer and the imperfect transcript itself.
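To make the idea of synchronizing an imperfect transcript with recognizer output more concrete, here is a minimal sketch of an offline word-level alignment using edit-distance dynamic programming. It is not the method of the paper, which synchronizes the transcript inside the search algorithm itself; the function name `align_words` and the example word lists are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_words(transcript: List[str], hypothesis: List[str]) -> List[Pair]:
    """Align two word sequences with unit-cost edit distance.

    Returns (transcript_word, hypothesis_word) pairs; None marks a word
    that is present on one side only.
    """
    n, m = len(transcript), len(hypothesis)
    # cost[i][j] = edit distance between transcript[:i] and hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(transcript[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # word only in the transcript
                cost[i][j - 1] + 1,             # word only in the hypothesis
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(transcript[i - 1] != hypothesis[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                pairs.append((transcript[i - 1], hypothesis[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((transcript[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hypothesis[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

Calling `align_words(["a", "b", "c"], ["a", "c"])` pairs the matching words and marks `"b"` as present only in the transcript. A decoder-internal synchronization, as described in the abstract, would instead have to track its position in the transcript while hypotheses are still being extended.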
Citations
Distant Speech Recognition in a Smart Home: Comparison of Several Multisource ASRs in Realistic Conditions
Benjamin Lecouteux, Michel Vacher, François Portet
- 27 Aug 2011
TL;DR: Several state-of-the-art and novel ASR techniques were evaluated on realistic data acquired in a multiroom smart home; techniques acting at the decoding stage and using a priori knowledge, such as DDA, gave better results than the baseline.
Sound Environment Analysis in Smart Home
Mohamed El Amine Sehili, Benjamin Lecouteux, Michel Vacher, François Portet, Dan Istrate, Bernadette Dorizzi, Jerome Boudy
- 13 Nov 2012
TL;DR: This study aims at providing audio-based interaction technology that lets the users have full control over their home environment, at detecting distress situations and at easing the social inclusion of the elderly and frail population.
•Proceedings Article
Automatic Transcription of Multi-Genre Media Archives
Pierre Lanchantin, Peter Bell, Mark J. F. Gales, Thomas Hain, Xunying Liu, Yanhua Long, Jennifer Quinnell, Steve Renals, Oscar Saz, Matthew Stephen Seigel, Pawel Swietojanski, Philip C. Woodland
- 01 Jan 2013
TL;DR: Multi-level Adaptive Networks, a novel technique for incorporating information from out-of-domain posterior features using deep neural networks, is described. It provides a substantial reduction in WER over other systems, including a PLP-based baseline, in-domain tandem features, and the best out-of-domain tandem features.
Semi-Supervised DNN Training with Word Selection for ASR.
Karel Veselý, Lukas Burget, Jan Cernocký
- 20 Aug 2017
TL;DR: This paper examines the granularity of confidences (per-sentence, per-word, per-frame) and how the data should be used (data selection by masks, or in mini-batch SGD with confidences as weights).
34
Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems.
Pierre Lanchantin, Mark J. F. Gales, Penny Karanasou, Xunying Liu, Yanmin Qian, Linlin Wang, Philip C. Woodland, Chao Zhang
- 08 Sep 2016
TL;DR: It is shown that for different genres, either the original subtitles or the lightly supervised output should be used for model training and a suitable combination yields further reductions in final WER.
26
References
•Proceedings Article
Using dynamic time warping to find patterns in time series
Donald J. Berndt, James Clifford
- 31 Jul 1994
TL;DR: Preliminary experiments with a dynamic programming approach to pattern detection in databases, based on the dynamic time warping technique used in the speech recognition field, are described.
3.8K
The ESTER Phase II Evaluation Campaign for the Rich Transcription of French Broadcast News
Sylvain Galliano, Edouard Geoffrois, Djamel Mostefa, Khalid Choukri, Jean-François Bonastre, Guillaume Gravier
- 04 Sep 2005
TL;DR: This paper gives the final results of the ESTER evaluation campaign which started in 2003 and ended in January 2005, to evaluate automatic broadcast news rich transcription systems for the French language.
•Proceedings Article
A recursive algorithm for the forced alignment of very long audio segments.
Pedro J. Moreno, Christopher Frank Joerg, Jean-Manuel Van Thong, Oren Glickman
- 01 Jan 1998
TL;DR: The key idea of this algorithm is to turn the forced alignment problem into a recursive speech recognition problem with a gradually restricting dictionary and language model, which is tolerant to acoustic noise and errors or gaps in the text transcript or audio tracks.
183
Phoneme Lattice Based A* Search Algorithm for Speech Recognition
Pascal Nocera, Georges Linarès, D Massonié, Loïc Lefort
- 09 Sep 2002
TL;DR: Speeral uses a modified A* algorithm to find the best path in the search graph, taking into account acoustic and linguistic constraints. Rather than proceeding word by word, the A* used in Speeral is based on a previously generated phoneme lattice.
45
Improving acoustic models with captioned multimedia speech
Photina Jaeyun Jang, Alexander G. Hauptmann
- 07 Jun 1999
TL;DR: A technique to use television broadcasts with closed-captions as a source for large amounts of automatically extracted and accurately transcribed speech for improving acoustic models of highly accurate speech recognition systems.
45