Journal Article • DOI: 10.48550/arXiv.2304.07854
Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation
TL;DR: This article examined the influence of training data factors, including quantity, quality, and linguistic distribution, on model performance and provided valuable insights for the continued advancement of open-source chat models.
Abstract: Recently, significant public efforts have been directed towards developing low-cost models with capabilities akin to ChatGPT, thereby fostering the growth of open-source conversational models. However, there remains a scarcity of comprehensive and in-depth evaluations of these models' performance. In this study, we examine the influence of training data factors, including quantity, quality, and linguistic distribution, on model performance. Our analysis is grounded in several publicly accessible, high-quality instruction datasets, as well as our own Chinese multi-turn conversations. We assess various models using an evaluation set of 1,000 samples, encompassing nine real-world scenarios. Our goal is to supplement manual evaluations with quantitative analyses, offering valuable insights for the continued advancement of open-source chat models. Furthermore, to improve both the performance and the training and inference efficiency of models in the Chinese domain, we extend the vocabulary of LLaMA, the open-source model whose performance is closest to that of proprietary language models like GPT-3, and conduct secondary pre-training on 3.4B Chinese words. We make our model, data, and code publicly available.
Citations
SoTaNa: The Open-Source Software Development Assistant
Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
TL;DR: SoTaNa utilizes ChatGPT to generate high-quality instruction-based data for the domain of software engineering and employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
How Multilingual is Multilingual LLM?
TL;DR: This study evaluates the multilingual capacity of LLMs by conducting an exhaustive analysis across 101 languages, and classifies languages with similar characteristics into four distinct quadrants, shedding light on the rationale behind their categorization and offering actionable guidelines for tuning these languages.
Token-free LLMs Can Generate Chinese Classical Poetry with More Accurate Format
TL;DR: The authors release a fine-tuned token-free model, based on Qwen-chat-7B, that can generate Chinese classical poetry following complex instructions (such as story paraphrasing) as LLMs do, while also performing well on format.
KwaiYiiMath: Technical Report
Jia-Yi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, Shengnan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai
TL;DR: This report introduces KwaiYiiMath, which enhances the mathematical reasoning abilities of KwaiYiiBase1 by applying Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) on both English and Chinese mathematical tasks.
Flames: Benchmarking Value Alignment of LLMs in Chinese
Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin
- 12 Nov 2023
TL;DR: This paper proposes Flames, a value alignment benchmark for large language models (LLMs) that evaluates their alignment with human values, particularly in the Chinese context, and finds most mainstream LLMs perform poorly on safety and fairness dimensions.
References
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposed the Transformer, a simple network architecture based solely on an attention mechanism that dispenses with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: This paper introduces BERT, a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
•Posted Content
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Training language models to follow instructions with human feedback
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe
- 04 Mar 2022
TL;DR: The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent, yielding improvements in truthfulness and reductions in toxic output generation with minimal performance regressions on public NLP datasets.
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Roziere, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
TL;DR: This article introduced LLaMA, a collection of foundation language models ranging from 7B to 65B parameters and trained on trillions of tokens, and showed that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets.