Shinji Watanabe
Johns Hopkins University
543 Papers
2.9K Citations
Shinji Watanabe is an academic researcher from Johns Hopkins University. The author has contributed to research in the topics Computer science & Engineering. The author has an h-index of 53 and has co-authored 383 publications. Previous affiliations of Shinji Watanabe include Mitsubishi Electric Research Laboratories & Mitsubishi.
Papers
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
TL;DR: In this article, two multi-task learning methods based on the CTC/Attention architecture were introduced to perform aphasia speech recognition and detection simultaneously. They achieved state-of-the-art speaker-level detection accuracy (97.3%) and a relative WER reduction of 11% for patients with moderate aphasia.
Tensor decomposition for minimization of E2E SLU model toward on-device processing
Yosuke Kashiwagi,Siddhant Arora,Hayato Futami,Jessica Huynh,Shih-Lun Wu,Yifan Peng,Brian Yan,Emiru Tsunoo,Shinji Watanabe +8 more
- 02 Jun 2023
TL;DR: In this paper, tensor decomposition was applied to the Conformer and E-Branchformer architectures in an end-to-end (E2E) SLU model to reduce the model size.
Bag Of ARCS: New representation of speech segment features based on finite state machines
Shinji Watanabe,Yotaro Kubo,Takanobu Oba,Takaaki Hori,Atsushi Nakamura +4 more
- 25 Mar 2012
TL;DR: Utterance classification and speaker adaptation experiments show the effectiveness of the proposed approach for some ASR post-processing applications, achieving an absolute 1% improvement in WER over baseline results.
Posted Content
A practical two-stage training strategy for multi-stream end-to-end speech recognition
TL;DR: In this article, a two-stage training scheme is proposed to train a Universal Feature Extractor (UFE), where encoder outputs are produced from a single-stream model trained with all data.
Posted Content
Multi-mode Transformer Transducer with Stochastic Future Context
TL;DR: In this article, the authors propose a multi-mode ASR model with stochastic future context, a simple training procedure that samples one streaming configuration in each iteration, which can fulfill various latency requirements during inference.