Yi-Ting Chen
15 Papers
56 Citations
Yi-Ting Chen is an academic researcher from Google. The author has contributed to research in topics: Computer science & Embedding. The author has an h-index of 4 and has co-authored 6 publications.
Papers
•Posted Content
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia,Yinfei Yang,Ye Xia,Yi-Ting Chen,Zarana Parekh,Hieu Pham,Quoc V. Le,Yun-Hsuan Sung,Zhen Li,Tom Duerig +9 more
TL;DR: In this article, a simple dual-encoder architecture is proposed to align visual and language representations of image and text pairs using a contrastive loss. The authors show that the scale of their corpus can make up for its noise and lead to state-of-the-art representations even with a simple learning scheme.
690
•Posted Content
Graph-RISE: Graph-Regularized Image Semantic Embedding
Aleksei Timofeev,Andrew Tomkins,Chun-Ta Lu,Da-Cheng Juan,Futang Peng,Krishnamurthy Viswanathan,Lucy Gao,Sujith Ravi,Tom Duerig,Yi-Ting Chen,Zhen Li +10 more
TL;DR: Graph-RISE is a large-scale neural graph learning framework that allows embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semantic labels. It outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking.
Shape-aware Text-driven Layered Video Editing
TL;DR: The authors propagate the deformation field between the input and edited keyframe to all frames and leverage a pre-trained text-conditioned diffusion model as guidance for refining shape distortion and completing unseen regions.
30
•Proceedings Article
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia,Yinfei Yang,Ye Xia,Yi-Ting Chen,Zarana Parekh,Hieu Pham,Quoc V. Le,Yun-Hsuan Sung,Zhen Li,Tom Duerig +9 more
- 18 Jul 2021
TL;DR: In this paper, a dual-encoder architecture is proposed to align visual and language representations of the image and text pairs using a contrastive loss, which achieves state-of-the-art results on the Conceptual Captions dataset.
Audio-Visual Speech Enhancement and Separation by Leveraging Multi-Modal Self-Supervised Embeddings
I-Chun Chern,Kuo-Hsuan Hung,Yi-Ting Chen,Tassadaq Hussain,Mandar Gogate,Amir Hussain,Yu Tsao,Jen-Cheng Hou +7 more
TL;DR: In this article, a multi-modal self-supervised learning model is proposed for audio-visual speech enhancement (AVSE) and audio-visual speech separation (AVSS).
6