Imperfect transcript driven speech recognition

Open Access•Proceedings Article

- 01 Sep 2006

41

TL;DR: A method combining a linguistic analysis of the imperfect transcripts and a dynamic synchronization of these transcripts inside the search algorithm to improve the performance of an automatic speech recognition (ASR) system is proposed.

Citations

•Proceedings Article•10.21437/INTERSPEECH.2011-604

Distant Speech Recognition in a Smart Home: Comparison of Several Multisource ASRs in Realistic Conditions

Benjamin Lecouteux, +2 more

- 27 Aug 2011

TL;DR: Several state-of-the-art and novel ASR techniques were evaluated on realistic data acquired in a multiroom smart home; techniques acting at the decoding stage and using a priori knowledge, such as DDA, were found to give better results than the baseline.

71

•Book Chapter•10.1007/978-3-642-34898-3_14

Sound Environment Analysis in Smart Home

Mohamed El Amine Sehili, +6 more

- 13 Nov 2012

TL;DR: This study aims at providing audio-based interaction technology that lets the users have full control over their home environment, at detecting distress situations and at easing the social inclusion of the elderly and frail population.

47

•Proceedings Article

Automatic Transcription of Multi-Genre Media Archives

Pierre Lanchantin, +11 more

- 01 Jan 2013

TL;DR: Multi-level Adaptive Networks, a novel technique for incorporating information from out-of-domain posterior features using deep neural networks, is described; it provides a substantial reduction in WER over other systems, including a PLP-based baseline, in-domain tandem features, and the best out-of-domain tandem features.

35

•Proceedings Article•10.21437/INTERSPEECH.2017-1385

Semi-Supervised DNN Training with Word Selection for ASR.

Karel Veselý, +2 more

- 20 Aug 2017

TL;DR: This paper addresses the question of the granularity of confidences (per-sentence, per-word, per-frame) and the question of how the data should be used (data selection by masks, or in mini-batch SGD with confidences as weights).

34

•Proceedings Article•10.21437/INTERSPEECH.2016-462

Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems.

Pierre Lanchantin, +7 more

- 08 Sep 2016

TL;DR: It is shown that for different genres, either the original subtitles or the lightly supervised output should be used for model training and a suitable combination yields further reductions in final WER.

26

...

References

•Proceedings Article

Using dynamic time warping to find patterns in time series

Donald J. Berndt, +1 more

- 31 Jul 1994

TL;DR: Preliminary experiments with a dynamic programming approach to pattern detection in databases, based on the dynamic time warping technique used in the speech recognition field, are described.

3.8K

•Proceedings Article•10.21437/INTERSPEECH.2005-441

The ESTER Phase II Evaluation Campaign for the Rich Transcription of French Broadcast News

Sylvain Galliano, +5 more

- 04 Sep 2005

TL;DR: This paper gives the final results of the ESTER evaluation campaign which started in 2003 and ended in January 2005, to evaluate automatic broadcast news rich transcription systems for the French language.

331

•Proceedings Article

A recursive algorithm for the forced alignment of very long audio segments.

Pedro J. Moreno, +3 more

- 01 Jan 1998

TL;DR: The key idea of this algorithm is to turn the forced alignment problem into a recursive speech recognition problem with a gradually restricting dictionary and language model, which is tolerant to acoustic noise and errors or gaps in the text transcript or audio tracks.

183

•Book Chapter•10.1007/3-540-46154-X_41

Phoneme Lattice Based A* Search Algorithm for Speech Recognition

Pascal Nocera, +3 more

- 09 Sep 2002

TL;DR: Speeral uses a modified A* algorithm to find the best path in the search graph, taking into account acoustic and linguistic constraints. Rather than proceeding word by word, the A* used in Speeral is based on a previously generated phoneme lattice.

45

•Proceedings Article•10.1109/MMCS.1999.778582

Improving acoustic models with captioned multimedia speech

Photina Jaeyun Jang, +1 more

- 07 Jun 1999

TL;DR: A technique is presented that uses television broadcasts with closed captions as a source of large amounts of automatically extracted and accurately transcribed speech for improving the acoustic models of highly accurate speech recognition systems.

45