Dara Bahri
38 Papers
195 Citations
Dara Bahri is an academic researcher from Google. The author has contributed to research in the topics Computer science and Transformer (machine learning model). The author has an h-index of 11 and has co-authored 38 publications. Previous affiliations of Dara Bahri include the University of California, Berkeley and Lawrence Berkeley National Laboratory.
Papers
•Posted Content
Efficient Transformers: A Survey
TL;DR: This paper characterizes a large and thoughtful selection of recent efficiency-flavored “X-former” models, providing an organized and comprehensive overview of existing work and models across multiple domains.
•Posted Content
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
TL;DR: A systematic and unified benchmark, LRA, specifically focused on evaluating model quality under long-context scenarios, paves the way towards better understanding this class of efficient Transformer models, facilitates more research in this direction, and presents new challenging tasks to tackle.
379
•Posted Content
Synthesizer: Rethinking Self-Attention in Transformer Models
TL;DR: This paper investigates the true importance and contribution of dot product-based self-attention to the performance of Transformer models, and proposes Synthesizer, a model that learns synthetic attention weights without token-token interactions.
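A minimal NumPy sketch of attention without token-token interactions, assuming the Dense and Random variants described in the Synthesizer paper: in the Dense form a per-token MLP predicts a row of attention logits from each token alone, and in the Random form the logits are a free parameter matrix that ignores the input. Function names, shapes, and the random stand-in weights are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along `axis` for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dense_synthesizer(X, W1, b1, W2, b2, Wv):
    """Dense-Synthesizer-style attention (illustrative sketch).

    X:  (seq_len, d_model) token representations.
    W1: (d_model, d_hidden), b1: (d_hidden,)
    W2: (d_hidden, seq_len), b2: (seq_len,)
        A per-token two-layer MLP maps each token, on its own, to one row
        of seq_len attention logits; no query-key dot products are taken.
    Wv: (d_model, d_value) value projection.
    """
    hidden = np.maximum(X @ W1 + b1, 0.0)  # ReLU, (seq_len, d_hidden)
    logits = hidden @ W2 + b2              # (seq_len, seq_len)
    A = softmax(logits, axis=-1)           # each row sums to 1
    return A @ (X @ Wv), A


def random_synthesizer(X, R, Wv):
    """Random-Synthesizer-style attention: the (seq_len, seq_len) logits R
    are parameters that do not depend on the input X at all."""
    A = softmax(R, axis=-1)
    return A @ (X @ Wv), A


# Usage with random stand-ins for learned parameters (hypothetical sizes).
rng = np.random.default_rng(0)
seq_len, d_model, d_hidden, d_value = 4, 8, 16, 8
X = rng.normal(size=(seq_len, d_model))
W1, b1 = rng.normal(size=(d_model, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, seq_len)), np.zeros(seq_len)
Wv = rng.normal(size=(d_model, d_value))
R = rng.normal(size=(seq_len, seq_len))

Y_dense, A_dense = dense_synthesizer(X, W1, b1, W2, b2, Wv)
Y_rand, A_rand = random_synthesizer(X, R, Wv)
print(Y_dense.shape, Y_rand.shape)  # (4, 8) (4, 8)
```

Because `W2` emits exactly `seq_len` logits, this sketch only handles a fixed sequence length. In the random variant the attention matrix is the same for every input, which the sketch makes explicit by never reading `X` when computing `A`.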
•Proceedings Article
Sparse Sinkhorn Attention
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
12 Jul 2020
TL;DR: This work introduces a meta sorting network that learns to generate latent permutations over sequences; on the permuted sequence, the model can compute quasi-global attention using only local windows, improving the memory efficiency of the attention module.
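A rough NumPy sketch of the mechanism summarized above, under several assumptions: the "meta sorting network" is stood in for by a single hypothetical linear map `W_sort` over mean-pooled key blocks, Sinkhorn normalization (alternating row and column normalization) turns its scores into a relaxed permutation of blocks, and each query block then attends to its own block plus the block sorted into its slot. All names are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def sinkhorn(log_alpha, n_iters=20):
    """Alternate row and column normalization in log space. The result
    approaches a doubly stochastic matrix, i.e. a relaxed permutation."""
    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1)  # rows -> 1
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0)  # columns -> 1
    return np.exp(log_alpha)


def sinkhorn_block_attention(Q, K, V, block_size, W_sort, n_iters=20):
    """Illustrative block-sorted local attention.

    Q, K, V: (n, d); n must be divisible by block_size.
    W_sort:  (d, n_blocks), a hypothetical one-layer "sorting network".
    """
    n, d = Q.shape
    assert n % block_size == 0, "sequence length must be a multiple of block_size"
    nb = n // block_size
    Qb = Q.reshape(nb, block_size, d)
    Kb = K.reshape(nb, block_size, d)
    Vb = V.reshape(nb, block_size, d)

    # Score every block (summarized by its mean key) against every target slot.
    logits = Kb.mean(axis=1) @ W_sort  # (nb, nb): [source block, target slot]
    P = sinkhorn(logits, n_iters)      # relaxed permutation matrix

    # Slot i receives a soft mixture of source blocks weighted by P[:, i].
    K_sorted = np.einsum("ji,jbd->ibd", P, Kb)
    V_sorted = np.einsum("ji,jbd->ibd", P, Vb)

    # Each query block attends only to its own block plus the block sorted
    # into its slot: 2 * block_size keys per query instead of n.
    K_cat = np.concatenate([Kb, K_sorted], axis=1)  # (nb, 2*block_size, d)
    V_cat = np.concatenate([Vb, V_sorted], axis=1)
    scores = Qb @ K_cat.transpose(0, 2, 1) / np.sqrt(d)
    out = softmax(scores, axis=-1) @ V_cat          # (nb, block_size, d)
    return out.reshape(n, d), P


rng = np.random.default_rng(0)
n, d, block_size = 16, 8, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
W_sort = rng.normal(size=(d, n // block_size))
out, P = sinkhorn_block_attention(Q, K, V, block_size, W_sort)
print(out.shape)  # (16, 8)
```

The memory saving in this sketch comes from the per-query key count: each query scores `2 * block_size` keys rather than all `n`. The sketch omits details such as noise, temperature, or causal masking that a real implementation may need.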
•Posted Content
Sparse Sinkhorn Attention
TL;DR: The authors propose a meta sorting network that learns to generate latent permutations over sequences, enabling quasi-global attention to be computed with only local windows and improving the memory efficiency of the attention module.
79