Data set entity recognition based on distant supervision

doi:10.1108/EL-10-2020-0301

Journal Article

Pengcheng Li, +3 more

- 26 Jul 2021

- The Electronic Library

- Vol. 39, Iss: 3, pp 435-449

6

TL;DR: To the best of the authors' knowledge, this paper is the first attempt to apply distant learning to data set entity recognition. It introduces a robust vectorised representation and two data augmentation strategies to address the problem inherent in distant supervised learning methods.

Abstract: This paper aims to identify data set entities in scientific literature. To address poor recognition caused by a lack of training corpora in existing studies, a distant supervised learning-based approach is proposed to identify data set entities automatically from large-scale scientific literature in an open domain.

Firstly, the authors use a dictionary combined with a bootstrapping strategy to create a labelled corpus for supervised learning. Secondly, a bidirectional encoder representations from transformers (BERT)-based neural model is applied to identify data set entities in the scientific literature automatically. Finally, two data augmentation techniques, entity replacement and entity masking, are introduced to enhance the model's generalisability and improve the recognition of data set entities.

In the absence of training data, the proposed method can effectively identify data set entities in large-scale scientific papers. The BERT-based vectorised representation and data augmentation techniques enable significant improvements in the generality and robustness of named entity recognition models, especially in long-tailed data set entity recognition.

This paper provides a practical research method for automatically recognising data set entities in scientific literature. To the best of the authors’ knowledge, this is the first attempt to apply distant learning to the study of data set entity recognition. The authors introduce a robust vectorised representation and two data augmentation strategies (entity replacement and entity masking) to address the problem inherent in distant supervised learning methods, which the existing research has mostly ignored. The experimental results demonstrate that the proposed approach effectively improves the recognition of data set entities, especially long-tailed data set entities.
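The labelling and augmentation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `DATASET` tag, the BIO scheme, the example sentence, and the choice to mask each entity token with `[MASK]` are all assumptions made for illustration, and the bootstrapping step and the BERT-based tagger are omitted.

```python
import random

def distant_label(tokens, dictionary):
    """BIO-tag tokens by longest-match lookup against a data set name dictionary."""
    names = sorted((n.split() for n in dictionary), key=len, reverse=True)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for name in names:
            if tokens[i:i + len(name)] == name:
                tags[i] = "B-DATASET"
                for j in range(i + 1, i + len(name)):
                    tags[j] = "I-DATASET"
                i += len(name) - 1  # skip past the matched name
                break
        i += 1
    return tags

def entity_spans(tags):
    """Return (start, end) pairs, end exclusive, for each tagged entity."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):
        if tag != "I-DATASET" and start is not None:
            spans.append((start, i))
            start = None
        if tag == "B-DATASET":
            start = i
    return spans

def entity_replacement(tokens, tags, dictionary, rng):
    """Swap each tagged entity for a randomly chosen dictionary name (assumed variant)."""
    out_tokens, out_tags, prev = [], [], 0
    for start, end in entity_spans(tags):
        new = rng.choice(sorted(dictionary)).split()
        out_tokens += tokens[prev:start] + new
        out_tags += tags[prev:start] + ["B-DATASET"] + ["I-DATASET"] * (len(new) - 1)
        prev = end
    return out_tokens + tokens[prev:], out_tags + tags[prev:]

def entity_masking(tokens, tags, mask="[MASK]"):
    """Hide every entity token so a tagger has to rely on context (assumed variant)."""
    return [mask if t != "O" else tok for tok, t in zip(tokens, tags)], list(tags)

# Hypothetical example sentence and dictionary.
tokens = "We evaluate on Penn Treebank and ImageNet .".split()
dictionary = ["Penn Treebank", "ImageNet", "CIFAR-10"]
tags = distant_label(tokens, dictionary)
print(list(zip(tokens, tags)))
print(entity_masking(tokens, tags)[0])
print(entity_replacement(tokens, tags, dictionary, random.Random(0))[0])
```

Both augmentation functions keep the original tags aligned with the new tokens, so the augmented sentences could be added directly to a token-classification training set. Whether the paper masks token by token or replaces a whole mention with a single placeholder is not stated in the abstract.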

Citations

Journal Article•10.1016/j.ipm.2022.103157

Exploring developments of the AI field from the perspective of methods, datasets, and metrics

Rujing Yao, +4 more

- 01 Mar 2023

- Information Processing and Management

TL;DR: In this article, a multi-stage self-paced learning strategy (MSPL) is proposed to address the negative influence of hard and noisy samples on model training, and original papers are traced for AI markers.

6

Journal Article•10.1016/j.ipm.2023.103405

A term function-aware keyword citation network method for science mapping analysis

Jiamin Wang, +4 more

- 01 Jul 2023

- Information Processing and Management

TL;DR: This paper proposes a term function-aware keyword citation network to represent the correlation structure of keywords, and explores the topology characteristics, question-method bipartite network, and knowledge community structure of the generated network to validate its superiority in science mapping analysis.

4

Journal Article

Beyond Tasks, Methods, and Metrics: Extracting Metrics-driven Mechanism from the Abstracts of AI Articles

Yongqiang Ma, +3 more

TL;DR: This paper proposes a novel knowledge schema, the metrics-driven mechanism knowledge schema (Operation, Effect, Direction), which depicts knowledge about “How to optimize the quantitative and qualitative metrics of a specific task?”

Journal Article•10.48550/arXiv.2305.03287

Low-Resource Multi-Granularity Academic Function Recognition Based on Multiple Prompt Knowledge

Jiawei Liu, +6 more

- 05 May 2023

- arXiv.org

TL;DR: In this article, a semi-supervised method called Mix Prompt Tuning (MPT) is proposed to alleviate the dependence on annotated data and improve the performance of multi-granularity academic function recognition tasks with a small number of labeled examples.

Journal Article•10.1016/j.ipm.2023.103315

From "what" to "how": Extracting the Procedural Scientific Information Toward the Metric-optimization in AI

Yongqiang Ma, +3 more

- 01 May 2023

- Information Processing and Management

TL;DR: In this article, a metric-driven mechanism schema (Operation, Effect, Direction, Task) is proposed for the applied AI community, which depicts the procedural scientific information concerning how to optimize the quantitative metrics for a specific task.

References

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

Proceedings Article•10.3115/V1/D14-1162

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

- 01 Oct 2014

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

41.6K

•Journal Article•10.1186/S40537-019-0197-0

A survey on Image Data Augmentation for Deep Learning

Connor Shorten, +1 more

- 06 Jul 2019

- Journal of Big Data

TL;DR: This survey presents existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation, a data-space solution to the problem of limited data.

10.6K

•Proceedings Article•10.18653/V1/N16-1030

Neural Architectures for Named Entity Recognition

Guillaume Lample, +4 more

- 04 Mar 2016

TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), 12 to 17 June 2016.

5.3K

•Proceedings Article•10.18653/V1/D19-1670

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

Jason Wei, +1 more

- 05 Mar 2019

TL;DR: This paper proposes easy data augmentation (EDA) techniques for boosting performance on text classification tasks. EDA consists of synonym replacement, random insertion, random swap, and random deletion, and is shown to improve performance for both convolutional and recurrent neural networks.

1.6K