Journal Article, DOI: 10.1108/EL-10-2020-0301
Data set entity recognition based on distant supervision
TL;DR: This paper is the first attempt to apply distant supervision to data set entity recognition. It introduces a robust vectorised representation and two data augmentation strategies to address the problems inherent in distantly supervised learning methods.
Abstract: This paper aims to identify data set entities in scientific literature. To address the poor recognition caused by a lack of training corpora in existing studies, a distantly supervised learning approach is proposed to identify data set entities automatically from large-scale, open-domain scientific literature.

Firstly, the authors use a dictionary combined with a bootstrapping strategy to create a labelled corpus on which supervised learning can be applied. Secondly, a neural model based on bidirectional encoder representations from transformers (BERT) is applied to identify data set entities in the scientific literature automatically. Finally, two data augmentation techniques, entity replacement and entity masking, are introduced to enhance the model's generalisability and improve the recognition of data set entities.

In the absence of manually annotated training data, the proposed method can effectively identify data set entities in large-scale scientific papers. The BERT-based vectorised representation and the data augmentation techniques significantly improve the generality and robustness of named entity recognition models, especially for long-tailed data set entities.

This paper provides a practical research method for automatically recognising data set entities in scientific literature. To the best of the authors' knowledge, this is the first attempt to apply distant supervision to the study of data set entity recognition. The authors introduce a robust vectorised representation and two data augmentation strategies (entity replacement and entity masking) to address the problems inherent in distantly supervised learning methods, which existing research has mostly ignored. The experimental results demonstrate that the approach effectively improves the recognition of data set entities, especially long-tailed ones.
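The three steps named in the abstract (dictionary-based labelling, entity replacement and entity masking) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the dictionary contents, the `B-DATASET`/`I-DATASET` tag names, the `[MASK]` token and every function name are assumptions made for the example, and the bootstrapping step and the BERT tagger are omitted.

```python
import random

# Hypothetical dictionary of known data set names (illustrative only).
DATASET_DICT = ["ImageNet", "Penn Treebank", "SQuAD"]


def distant_label(tokens, dictionary):
    """Assign BIO tags by longest-match lookup against a dictionary."""
    entries = sorted((name.split() for name in dictionary), key=len, reverse=True)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for entry in entries:
            n = len(entry)
            if tokens[i:i + n] == entry:
                tags[i] = "B-DATASET"
                for j in range(i + 1, i + n):
                    tags[j] = "I-DATASET"
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return tags


def entity_spans(tags):
    """Return (start, end) token spans of tagged entities, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-DATASET":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def replace_entities(tokens, tags, dictionary, rng):
    """Entity replacement: swap each mention for a random dictionary name."""
    new_tokens, new_tags, prev = [], [], 0
    for start, end in entity_spans(tags):
        new_tokens += tokens[prev:start]
        new_tags += tags[prev:start]
        name = rng.choice(dictionary).split()
        new_tokens += name
        new_tags += ["B-DATASET"] + ["I-DATASET"] * (len(name) - 1)
        prev = end
    new_tokens += tokens[prev:]
    new_tags += tags[prev:]
    return new_tokens, new_tags


def mask_entities(tokens, tags, mask_token="[MASK]"):
    """Entity masking: hide mention tokens so only the context remains."""
    masked = [mask_token if tag != "O" else tok for tok, tag in zip(tokens, tags)]
    return masked, list(tags)
```

Under these assumptions, replacement exposes the tagger to names it has rarely seen in a given context, while masking forces it to rely on contextual cues such as "evaluate on" rather than memorised names, which is one plausible reading of why the abstract reports gains on long-tailed entities.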
Citations
Exploring developments of the AI field from the perspective of methods, datasets, and metrics
TL;DR: In this article, a multi-stage self-paced learning strategy (MSPL) is proposed to address the negative influence of hard and noisy samples on model training, and original papers are traced for AI markers.
A term function-aware keyword citation network method for science mapping analysis
TL;DR: Zhang et al. propose a term function-aware keyword citation network to represent the correlation structure of keywords, and explore the topological characteristics, the question-method bipartite network and the knowledge community structure of the generated network to validate its superiority in science mapping analysis.
Journal Article
Beyond Tasks, Methods, and Metrics: Extracting Metrics-driven Mechanism from the Abstracts of AI Articles
TL;DR: This paper proposes a novel knowledge schema, the metrics-driven mechanism knowledge schema (Operation, Effect, Direction), which depicts knowledge about "How to optimize the quantitative and qualitative metrics of a specific task?"
Low-Resource Multi-Granularity Academic Function Recognition Based on Multiple Prompt Knowledge
TL;DR: In this article, a semi-supervised method called Mix Prompt Tuning (MPT) is proposed to reduce the dependence on annotated data and improve the performance of multi-granularity academic function recognition tasks with a small number of labeled examples.
From "what" to "how": Extracting the Procedural Scientific Information Toward the Metric-optimization in AI
TL;DR: In this article, a metric-driven mechanism schema (Operation, Effect, Direction, Task) is proposed for the applied AI community; it depicts the procedural scientific information about how to optimize the quantitative metrics for a specific task.
References
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
GloVe: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher D. Manning
- 01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
A survey on Image Data Augmentation for Deep Learning
TL;DR: This survey presents existing methods for data augmentation, promising developments, and meta-level decisions for implementing data augmentation, a data-space solution to the problem of limited data.
Neural Architectures for Named Entity Recognition
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer
- 04 Mar 2016
TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), 12 to 17 June 2016.
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
Jason Wei, Kai Zou
- 05 Mar 2019
TL;DR: This paper proposes easy data augmentation (EDA) techniques for boosting performance on text classification tasks. EDA consists of four operations: synonym replacement, random insertion, random swap and random deletion. The paper shows that EDA improves performance for both convolutional and recurrent neural networks.
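Two of the four EDA operations listed above, random swap and random deletion, need no external resources and can be sketched in a few lines. This is an illustrative sketch and not the reference EDA code: the function names and parameters are assumptions, and synonym replacement and random insertion are left out because they depend on a thesaurus.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, repeated n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]
```

Both functions take an explicit `random.Random` instance so that augmented corpora can be regenerated reproducibly from a seed.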