Youngjae Yu

12 Papers

Youngjae Yu is an academic researcher. The author has contributed to research in topics: Computer science & Engineering. The author has an h-index of 4 and has co-authored 10 publications.

Papers

Journal Article•10.48550/arXiv.2304.06939

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Wanrong Zhu, +9 more

- 14 Apr 2023

- arXiv.org

TL;DR: Multimodal C4 is an augmentation of the text-only C4 corpus with images interleaved. A linear assignment algorithm places images into longer bodies of text using CLIP features, a process that outperforms alternatives.

Proceedings Article•10.48550/arXiv.2306.14050

Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step

Liunian Li, +5 more

- 24 Jun 2023

TL;DR: The authors introduce symbolic chain-of-thought distillation (SCoTD), a method to train a smaller student model on rationalizations sampled from a significantly larger teacher model, which improves the performance of the student model in both supervised and few-shot settings.

Journal Article•10.48550/arXiv.2205.12630

Multimodal Knowledge Alignment with Reinforcement Learning

Youngjae Yu, +10 more

- 25 May 2022

- arXiv.org

TL;DR: This work proposes ESPER, an approach that uses reinforcement learning to extend language-only zero-shot models to unseen multimodal tasks, like image and audio captioning, and demonstrates that it outperforms baselines and prior work on a variety of zero-shot tasks.

Journal Article•10.1109/cvpr52729.2023.01044

Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning

Youngjae Yu, +10 more

- 01 Jun 2023

TL;DR: This work proposes ESPER (Extending Sensory PErception with Reinforcement learning), which enables text-only pretrained models to address multimodal tasks such as visual commonsense reasoning.

Journal Article•10.48550/arXiv.2303.09713

CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

Seungju Han, +4 more

- 17 Mar 2023

- arXiv.org

TL;DR: This paper introduces CHAMPAGNE, a generative model of conversations that can account for visual contexts. It is motivated by the observation that most conversational models are limited to text and so miss body gestures and facial expressions, which contribute to meaning that transcends words alone.
