Bicheng Xu
University of British Columbia
6 Papers
Bicheng Xu is an academic researcher from the University of British Columbia. The author has contributed to research in topics: Computer science & Closed captioning. The author has an h-index of 2 and has co-authored 3 publications.
Papers
Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning
Tanzila Rahman,Bicheng Xu,Leonid Sigal +2 more
- 01 Oct 2019
TL;DR: There is evidence that audio signals can carry a surprising amount of information when it comes to high-level visual-lingual tasks. The proposed multi-modal approach outperforms state-of-the-art unimodal methods, and the work validates specific feature representation and architecture design choices.
•Posted Content
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning.
TL;DR: This article shows that audio signals can carry a surprising amount of information when it comes to high-level visual-lingual tasks, such as weakly-supervised dense event captioning in videos.
61
Self-Supervised Relation Alignment for Scene Graph Generation
TL;DR: In this paper, a self-supervised relational alignment regularization is proposed to improve the performance of scene graph generation, where an auxiliary relation prediction branch that mirrors and shares parameters with the supervised counterpart is designed.
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
Apratim Bhattacharyya,Bicheng Xu,Sanjay Haresh,Reza Pourreza,Litian Liu,Sunny Panchal,Pulkit Madan,Leonid Sigal,Roland Memisevic +8 more
TL;DR: This study introduces Qualcomm Interactive Cooking, a benchmark and dataset for evaluating multi-modal LLMs in providing live, interactive step-by-step task guidance, and proposes LiveMamba, a streaming LLM for situated coaching, addressing real-time instruction and mistake detection.
Consistent multiple sequence decoding
Bicheng Xu,Leonid Sigal +1 more
TL;DR: A consistent multiple sequence decoding architecture is introduced, which, while relatively simple, is general and allows for consistent and simultaneous decoding of an arbitrary number of sequences.