Topic

Biomedical text mining

About: Biomedical text mining is a research topic. Over the lifetime, 1000 publications have been published within this topic receiving 31522 citations.

...read moreread less

Topic Tools

Find unexplored research gaps

Generate a literature review

Explore related concepts

Papers published on a yearly basis

Papers

Journal Article•10.1093/BIOINFORMATICS/BTZ682•

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

[...]

Jinhyuk Lee¹, Wonjin Yoon¹, Sungdong Kim², Donghyeon Kim¹, Sunkyu Kim¹, Chan Ho So¹, Jaewoo Kang¹ - Show less +3 more•Institutions (2)

Korea University¹, Naver Corporation²

25 Jan 2019-Bioinformatics

TL;DR: This article proposed BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora.

...read moreread less

Abstract: Motivation Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. Results We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. Availability and implementation We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.

...read moreread less

4,929 citations

Book•

The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data

[...]

Ronen Feldman¹, James Sanger•Institutions (1)

Hebrew University of Jerusalem¹

1 Dec 2006

TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.

...read moreread less

Abstract: 1. Introduction to text mining 2. Core text mining operations 3. Text mining preprocessing techniques 4. Categorization 5. Clustering 6. Information extraction 7. Probabilistic models for Information extraction 8. Preprocessing applications using probabilistic and hybrid approaches 9. Presentation-layer considerations for browsing and query refinement 10. Visualization approaches 11. Link analysis 12. Text mining applications Appendix Bibliography.

...read moreread less

1,979 citations

Proceedings Article•10.3115/1034678.1034679•

Untangling Text Data Mining

[...]

Marti A. Hearst¹•Institutions (1)

University of California, Berkeley¹

20 Jun 1999

TL;DR: Data mining, information access, and corpus-based computational linguistics are defined and the relationship of these to text data mining is discussed, and the intent behind these contrasts is to draw attention to exciting new kinds of problems for computational linguists.

...read moreread less

Abstract: The possibilities for data mining from large text collections are virtually untapped. Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically. Perhaps for this reason, there has been little work in text data mining to date, and most people who have talked about it have either conflated it with information access or have not made use of text directly to discover heretofore unknown information. In this paper I will first define data mining, information access, and corpus-based computational linguistics, and then discuss the relationship of these to text data mining. The intent behind these contrasts is to draw attention to exciting new kinds of problems for computational linguists. I describe examples of what I consider to be real text data mining efforts and briefly outline recent ideas about how to pursue exploratory data analysis over text.

...read moreread less

971 citations

Book•10.1201/9781420085938•

Handbook of Natural Language Processing

[...]

Nitin Indurkhya, Fred J. Damerau¹•Institutions (1)

University of New South Wales¹

28 Jan 2010

TL;DR: Fully updated with the latest developments in the field, this comprehensive, modern handbook emphasizes how to implement practical language processing tools in computational systems.

...read moreread less

Abstract: The Handbook of Natural Language Processing, Second Edition presents practical tools and techniques for implementing natural language processing in computer systems. Along with removing outdated material, this edition updates every chapter and expands the content to include emerging areas, such as sentiment analysis. New to the Second Edition Greater prominence of statistical approaches New applications section Broader multilingual scope to include Asian and European languages, along with English An actively maintained wiki (http://handbookofnlp.cse.unsw.edu.au) that provides online resources, supplementary information, and up-to-date developments Divided into three sections, the book first surveys classical techniques, including both symbolic and empirical approaches. The second section focuses on statistical approaches in natural language processing. In the final section of the book, each chapter describes a particular class of application, from Chinese machine translation to information visualization to ontology construction to biomedical text mining. Fully updated with the latest developments in the field, this comprehensive, modern handbook emphasizes how to implement practical language processing tools in computational systems.

...read moreread less

961 citations

Journal Article•10.1093/BIB/6.1.57•

A survey of current work in biomedical text mining

[...]

Aaron Cohen¹, William R. Hersh¹•Institutions (1)

Oregon Health & Science University¹

01 Mar 2005-Briefings in Bioinformatics

TL;DR: The major challenge of biomedical text mining over the next 5-10 years will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

...read moreread less

Abstract: The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5–10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

...read moreread less

860 citations

...

Expand

Performance Metrics

1,128

Papers

5,712

Citations

No. of papers in the topic in previous years
Year	Papers
2025	8
2024	14
2023	50
2022	52
2021	39
2020	55

Biomedical text mining

Topic Tools

Papers published on a yearly basis

Papers

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data

Untangling Text Data Mining

Handbook of Natural Language Processing

A survey of current work in biomedical text mining

Related Topics (5)

Performance Metrics