PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Févry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, Canwen Xu, Gunjan Chhablani, Han Wang, Jason A. Fries, Maged S. Al-shaibani, Shanya Sharma, Urmish Thakker, Khalid Almubarak, Xiangru Tang, Mike Tian-Jian Jiang, Alexander M. Rush +25 more
- 02 Feb 2022
Vol. abs/2202.01279
TL;DR: PromptSource is a system for creating, sharing, and using natural language prompts, built around a templating language for defining data-linked prompts, an interface that lets users iterate quickly on prompt development by observing outputs of their prompts on many examples, and a community-driven set of guidelines for contributing new prompts to a common pool.
Abstract: PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges in this new setting with (1) a templating language for defining data-linked prompts, (2) an interface that lets users quickly iterate on prompt development by observing outputs of their prompts on many examples, and (3) a community-driven set of guidelines for contributing new prompts to a common pool. Over 2,000 prompts for roughly 170 datasets are already available in PromptSource. PromptSource is available at https://github.com/bigscience-workshop/promptsource.
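The idea that a prompt is a function from a dataset example to an (input, target) pair can be sketched in a few lines. This is an illustrative sketch only, not PromptSource's implementation: PromptSource's own templating language is based on Jinja, whereas this example uses Python's standard-library `string.Template` so that it runs without extra dependencies. The helper `make_prompt`, the `SEPARATOR` constant, the template wording, and the example fields are all hypothetical names chosen for illustration.

```python
from string import Template
from typing import Callable, Dict, Tuple

# Hypothetical marker that splits a rendered template into input and target.
SEPARATOR = "|||"


def make_prompt(template: str) -> Callable[[Dict[str, str]], Tuple[str, str]]:
    """Turn a template string into a function: example -> (input, target)."""
    compiled = Template(template)

    def apply(example: Dict[str, str]) -> Tuple[str, str]:
        # Fill in the dataset fields, then split at the first separator.
        rendered = compiled.substitute(example)
        input_text, target_text = rendered.split(SEPARATOR, 1)
        return input_text.strip(), target_text.strip()

    return apply


# A made-up natural language inference prompt and a made-up example.
nli_prompt = make_prompt(
    '$premise\nQuestion: Does this imply that "$hypothesis"? '
    "Yes, no, or maybe? ||| $label"
)

example = {
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person is making music.",
    "label": "Yes",
}

input_text, target_text = nli_prompt(example)
print(input_text)
print("TARGET:", target_text)
```

Applying the same function to many examples and reading the rendered pairs is a rough stand-in for the iteration loop the abstract describes. Note that this sketch would split incorrectly if a field value itself contained the separator.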