Youngjae Yu
12 Papers
Youngjae Yu is an academic researcher. The author has contributed to research in the topic of Computer science & Engineering. The author has an h-index of 4 and has co-authored 10 publications.
Papers
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
Wanrong Zhu,Jack Hessel,Anas Awadalla,Samir Yitzhak Gadre,Jesse Dodge,Alex Fang,Youngjae Yu,Ludwig Schmidt,William Yang Wang,Yejin Choi +9 more
TL;DR: Multimodal C4 augments the text-only C4 corpus with interleaved images. It uses a linear assignment algorithm over CLIP features to place images into longer bodies of text, a process that outperforms alternatives.
90
Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step
Liunian Li,Jack Hessel,Youngjae Yu,Xiangyuan Ren,Kai-Wei Chang,Yejin Choi +5 more
- 24 Jun 2023
TL;DR: The authors introduce symbolic chain-of-thought distillation (SCoTD), a method to train a smaller student model on rationalizations sampled from a significantly larger teacher model, which improves the performance of the student model in both supervised and few-shot settings.
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu,Jiwan Chung,Heeseung Yun,Jack Hessel,J. Park,Ximing Lu,Prithviraj Ammanabrolu,Rowan Zellers,Ronan LeBras,Gunhee Kim,Yejin Choi +10 more
TL;DR: This work proposes ESPER, a novel reinforcement learning approach that extends language-only zero-shot models to unseen multimodal tasks, such as image and audio captioning, and demonstrates that it outperforms baselines and prior work on a variety of zero-shot tasks.
Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning
Youngjae Yu,Jiwan Chung,Heeseung Yun,Jack Hessel,J. Park,Ximing Lu,Rowan Zellers,Prithviraj Ammanabrolu,Ronan LeBras,Gunhee Kim,Yejin Choi +10 more
- 01 Jun 2023
TL;DR: This work proposes ESPER (Extending Sensory PErception with Reinforcement learning), which enables text-only pretrained models to address multimodal tasks such as visual commonsense reasoning.
13
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
TL;DR: This paper introduces CHAMPAGNE, a generative model of conversations that can account for visual contexts. Most prior neural conversational models are limited to text and cannot capture body gestures and facial expressions, which contribute to meaning that transcends words alone.