Haodong Duan

38 Papers

2 Citations

Haodong Duan is an academic researcher. The author has contributed to research in the topics Computer science and Pattern recognition (psychology). The author has an h-index of 1 and has co-authored 1 publication.

Papers

MMBench: Is Your Multi-modal Model an All-around Player?

Yuan Liu, +11 more

- 12 Jul 2023

TL;DR: MMBench is a multi-modality benchmark designed to evaluate the abilities of large vision-language models using a large number of evaluation questions covering many ability dimensions.

361

Journal Article•10.48550/arxiv.2309.15112

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Pan Zhang, +18 more

- 26 Sep 2023

- arXiv.org

TL;DR: This work proposes InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition, and achieves text-image composition scores competitive with public solutions, including GPT-4V and GPT-3.5.

130

Proceedings Article•10.1145/3503161.3548546

PYSKL: Towards Good Practices for Skeleton Action Recognition

Haodong Duan, +3 more

- 19 May 2022

TL;DR: PYSKL implements six different algorithms under a unified framework, applying both the latest and the original good practices to ease comparison of their efficacy and efficiency. It also provides an original GCN-based skeleton action recognition model, ST-GCN++, which achieves competitive recognition performance without any complicated attention schemes.

126

Journal Article•10.48550/arxiv.2401.16420

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Xiao-wen Dong, +22 more

- 29 Jan 2024

- arXiv.org

TL;DR: Experimental results show that InternLM-XComposer2, built on InternLM2-7B, excels at producing high-quality long-text multi-modal content and delivers strong vision-language understanding across various benchmarks, where it significantly outperforms existing multimodal models and matches or even surpasses GPT-4V and Gemini Pro in certain assessments.

111

Journal Article•10.48550/arxiv.2403.17297

InternLM2 Technical Report

Zheng Cai, +99 more

- 26 Mar 2024

- arXiv.org

TL;DR: InternLM2 is an open-source LLM that outperforms its predecessors in comprehensive evaluations across various tasks, including long-context modeling and open-ended subjective evaluations. It uses innovative pre-training and optimization techniques to capture long-term dependencies, achieving strong performance on the "Needle-in-a-Haystack" test.

...