Open Access • Posted Content
A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
TL;DR: In this paper, a survey of low-resource natural language processing methods is presented, including data augmentation, distant supervision, and transfer learning settings that reduce the need for target supervision.
Abstract: Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings. Motivated by the recent fundamental changes towards neural models and the popular pre-train and fine-tune paradigm, we survey promising approaches for low-resource natural language processing. After a discussion about the different dimensions of data availability, we give a structured overview of methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data like data augmentation and distant supervision as well as transfer learning settings that reduce the need for target supervision. A goal of our survey is to explain how these methods differ in their requirements as understanding them is essential for choosing a technique suited for a specific low-resource setting. Further key aspects of this work are to highlight open issues and to outline promising directions for future research.
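To make two of the method families named in the abstract concrete, here is a minimal Python sketch, not taken from the survey itself. It assumes a toy synonym table (`SYNONYMS`) and a toy entity gazetteer (`GAZETTEER`); both, along with the function names `augment` and `distant_labels`, are hypothetical and chosen only for illustration. Data augmentation derives a new labeled instance from an existing one through a label-preserving change, while distant supervision assigns labels to unlabeled text by lookup in an external resource.

```python
import random

# Hypothetical synonym table (an assumption for this sketch); a real system
# might draw replacements from a thesaurus or from embedding neighbours.
SYNONYMS = {"good": ["great", "fine"], "movie": ["film"]}


def augment(tokens, label, rng, p=0.5):
    """Data augmentation: swap tokens for synonyms and keep the label unchanged."""
    new_tokens = [
        rng.choice(SYNONYMS[t]) if t in SYNONYMS and rng.random() < p else t
        for t in tokens
    ]
    return new_tokens, label


# Hypothetical gazetteer (an assumption for this sketch) mapping known
# entity names to entity types.
GAZETTEER = {"berlin": "LOC", "nairobi": "LOC", "unesco": "ORG"}


def distant_labels(tokens):
    """Distant supervision: tag unlabeled tokens by lookup in an external resource."""
    return [GAZETTEER.get(t.lower(), "O") for t in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    print(augment(["a", "good", "movie"], "positive", rng))
    print(distant_labels(["UNESCO", "visited", "Berlin"]))
```

The two functions differ in what they require, which is the distinction the abstract emphasizes: `augment` needs existing labeled instances, whereas `distant_labels` needs only unlabeled text plus an external knowledge source, and its automatically assigned labels may be noisy.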
Citations
A Sentiment Analysis Approach to Predict an Individual's Awareness of the Precautionary Procedures to Prevent COVID-19 Outbreaks in Saudi Arabia.
Sumayh S. Aljameel, Dina A. Alabbad, Norah A. Alzahrani, Shouq M. Alqarni, Fatimah A. Alamoudi, Lana M. Babili, Somiah K. Aljaafary, Fatima M. Alshamrani
TL;DR: A model that predicts an individual’s awareness of the precautionary procedures in five main regions of Saudi Arabia can help the medical sector and decision-makers choose appropriate procedures for each region based on attitudes towards the pandemic.
91
A Survey on Recent Named Entity Recognition and Relationship Extraction Techniques on Clinical Texts
Priyankar Bose, Sriram Srinivasan, William C. Sleeman, Jatinder R. Palta, Rishabh Kapoor, Preetam Ghosh
TL;DR: This comprehensive survey on clinical NER and RE encompasses current challenges, state-of-the-art practices, and future directions in information extraction from clinical text.
84
•Posted Content
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey.
TL;DR: The authors focus on domain adaptation for NMT, particularly the case where a system may need to translate sentences from multiple domains, and divide techniques into those relating to data selection, model architecture, parameter adaptation procedure, and inference procedure.
46
•Posted Content
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP
TL;DR: The authors provided an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting, summarizing the landscape of methods and carrying out experiments on 11 datasets covering topics/news classification, inference tasks, paraphrasing tasks, and single-sentence tasks.
16
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
TL;DR: The authors survey approaches to domain adaptation for NMT, particularly where a system may need to translate across multiple domains, and highlight the benefits of domain adaptation and multi-domain adaptation techniques for other lines of NMT research.
References
Data Augmentation for Low-Resource Neural Machine Translation
TL;DR: This article proposes a data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts, which improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU points over back-translation.
Patent classification by fine-tuning BERT language model
Jieh Sheng Lee, Jieh Hsiang
TL;DR: When applied to large datasets of over two million patents, this approach outperforms the previous state of the art, a CNN with word embeddings, and shows that patent claims alone are sufficient to achieve state-of-the-art results for the classification task, contrary to conventional wisdom.
180
Robust Multilingual Part-of-Speech Tagging via Adversarial Training
Michihiro Yasunaga, Jungo Kasai, Dragomir R. Radev
- 01 Jun 2018
TL;DR: Adversarial training (AT) not only improves overall tagging accuracy but also prevents overfitting in low-resource languages and boosts tagging accuracy for rare and unseen words.
Cross-Lingual Word Embeddings for Low-Resource Language Modeling
Oliver Adams, Adam J. Makarucha, Graham Neubig, Steven Bird, Trevor Cohn
- 01 Apr 2017
TL;DR: This work investigates the use of bilingual lexicons to improve language models when textual training data is limited to as few as a thousand sentences, and involves learning cross-lingual word embeddings as a preliminary step in training monolingual language models.
Urdu language processing: a survey
Ali Daud, Wahab Khan, Dunren Che
TL;DR: The goal of this paper is to organize work on Urdu language processing (ULP) so that it can serve as a platform for future ULP research, and to describe in detail the recent increase in interest and progress in the field.
175