Topic

Truecasing

About: Truecasing is a research topic. Over its lifetime, 22 publications have been published within this topic, receiving 271 citations.

Papers

Journal Article•10.1093/BIOINFORMATICS/BTT580•

Anatomical entity mention recognition at literature scale.

Sampo Pyysalo¹, Sophia Ananiadou¹•Institutions (1)

University of Manchester¹

15 Mar 2014-Bioinformatics

TL;DR: AnatomyTagger is presented, a machine learning-based system for anatomical entity mention recognition that incorporates a broad array of approaches proposed to benefit tagging, including the use of Unified Medical Language System (UMLS)- and Open Biomedical Ontologies (OBO)-based lexical resources, word representations induced from unlabeled text, statistical truecasing and non-local features.

Abstract: Motivation: Anatomical entities ranging from subcellular structures to organ systems are central to biomedical science, and mentions of these entities are essential to understanding the scientific literature. Despite extensive efforts to automatically analyze various aspects of biomedical text, there have been only few studies focusing on anatomical entities, and no dedicated methods for learning to automatically recognize anatomical entity mentions in free-form text have been introduced. Results: We present AnatomyTagger, a machine learning-based system for anatomical entity mention recognition. The system incorporates a broad array of approaches proposed to benefit tagging, including the use of Unified Medical Language System (UMLS)- and Open Biomedical Ontologies (OBO)-based lexical resources, word representations induced from unlabeled text, statistical truecasing and non-local features. We train and evaluate the system on a newly introduced corpus that substantially extends on previously available resources, and apply the resulting tagger to automatically annotate the entire open access scientific domain literature. The resulting analyses have been applied to extend services provided by the Europe PubMed Central literature database. Availability and implementation: All tools and resources introduced in this work are available from http://nactem.ac.uk/anatomytagger. Contact: sophia.ananiadou@manchester.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.

90 citations

Proceedings Article•10.3115/1626431.1626464•

Edinburgh's Submission to all Tracks of the WMT 2009 Shared Task with Reordering and Speed Improvements to Moses

Philipp Koehn¹, Barry Haddow¹•Institutions (1)

University of Edinburgh¹

30 Mar 2009

TL;DR: Edinburgh University's systems for the WMT 2009 shared task, built with the Moses phrase-based statistical machine translation decoder for all language pairs, are described along with novel contributions to Moses including truecasing, more efficient decoding methods, and a framework to specify reordering constraints.

Abstract: Edinburgh University participated in the WMT 2009 shared task using the Moses phrase-based statistical machine translation decoder, building systems for all language pairs. The system configuration was identical for all language pairs (with a few additional components for the German-English language pairs). This paper describes the configuration of the systems, plus novel contributions to Moses including truecasing, more efficient decoding methods, and a framework to specify reordering constraints.
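The abstract names truecasing as a component but does not describe how it works. As a rough illustration only, a minimal frequency-based truecaser can be sketched as below. This is an assumption-laden sketch and not the Moses implementation: the function names `train_truecaser` and `truecase` are hypothetical, and the rule of ignoring sentence-initial tokens during training is a common simplification chosen here for illustration.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent surface form of each word.

    `sentences` is an iterable of token lists. Sentence-initial tokens are
    skipped because their capitalization reflects position, not the word.
    (Hypothetical helper, not part of any library.)
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if i == 0:
                continue  # casing of the first token is uninformative
            counts[tok.lower()][tok] += 1
    # Keep only the single most frequent casing for each lowercased word.
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def truecase(tokens, model):
    """Map each token to its learned casing; unknown tokens pass through."""
    return [model.get(tok.lower(), tok) for tok in tokens]


corpus = [
    ["The", "cat", "saw", "Edinburgh"],
    ["We", "visited", "Edinburgh", "and", "the", "castle"],
]
model = train_truecaser(corpus)
print(truecase(["the", "train", "to", "edinburgh"], model))
```

In a translation pipeline, a model of this kind would typically be applied to training data before alignment, so that "The" and "the" are not treated as unrelated words, with casing restored on the output side.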

44 citations

Improving the Extraction of Clinical Concepts from Clinical Records

X Fu¹, Sophia Ananiadou¹•Institutions (1)

University of Manchester¹

1 May 2014

TL;DR: This work proposed a machine learning-based named entity recognition system to extract clinical concepts from patient discharge summaries and progress notes without the need for any external knowledge resources.

Abstract: Essential information relevant to medical problems, tests, and treatments is often expressed in patient clinical records with natural language, making their processing a daunting task for automated systems. One of the steps towards alleviating this problem is concept extraction. In this work, we proposed a machine learning-based named entity recognition system to extract clinical concepts from patient discharge summaries and progress notes without the need for any external knowledge resources. Three pre- and post-processing methods were investigated, i.e. truecasing, abbreviation disambiguation, and distributional thesaurus lookup, the individual annotation results of which were combined into a final annotation set using two refinement schemes. While truecasing and abbreviation disambiguation capture the inflectional morphology of words, the distributional thesaurus lookup allows for statistics-based similarity matching. We achieved a maximum F-score of 0.7586 and 0.8444 for exact and inexact matching, respectively. Our results show that truecasing and annotation combination are the enhancements which best increase the system performance, whereas abbreviation disambiguation and distributional thesaurus lookup bring about insignificant improvements.

28 citations

Posted Content•

Robust Named Entity Recognition with Truecasing Pretraining

Stephen Mayhew, Nitish Gupta, Dan Roth¹•Institutions (1)

University of Pennsylvania¹

15 Dec 2019-arXiv: Computation and Language

TL;DR: This work addresses the problem of robustness of NER systems in data with noisy or uncertain casing, using a pretraining objective that predicts casing in text, or a truecaser, leveraging unlabeled data.

Abstract: Although modern named entity recognition (NER) systems show impressive performance on standard datasets, they perform poorly when presented with noisy data. In particular, capitalization is a strong signal for entities in many languages, and even state of the art models overfit to this feature, with drastically lower performance on uncapitalized text. In this work, we address the problem of robustness of NER systems in data with noisy or uncertain casing, using a pretraining objective that predicts casing in text, or a truecaser, leveraging unlabeled data. The pretrained truecaser is combined with a standard BiLSTM-CRF model for NER by appending output distributions to character embeddings. In experiments over several datasets of varying domain and casing quality, we show that our new model improves performance in uncased text, even adding value to uncased BERT embeddings. Our method achieves a new state of the art on the WNUT17 shared task dataset.
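A pretraining objective that "predicts casing in text" needs labeled examples, and these can be derived from ordinary cased text without manual annotation. The sketch below shows one plausible way to build such examples at the character level. It is an assumption for illustration and not the authors' code: the helper names `casing_example` and `apply_casing` and the binary per-character label scheme are hypothetical.

```python
def casing_example(text):
    """Turn cased text into (lowercased input, per-character labels).

    Label 1 means the original character was uppercase, 0 otherwise.
    Characters whose lowercase form is not a single character are left
    unchanged so that input and labels stay aligned.
    (Hypothetical helper, not taken from the paper.)
    """
    chars, labels = [], []
    for ch in text:
        low = ch.lower()
        if len(low) != 1:
            low = ch  # keep alignment for unusual Unicode case mappings
        chars.append(low)
        labels.append(1 if ch.isupper() else 0)
    return "".join(chars), labels


def apply_casing(lowered, labels):
    """Restore casing from predicted (here: gold) per-character labels."""
    return "".join(ch.upper() if lab else ch for ch, lab in zip(lowered, labels))


lowered, labels = casing_example("Dan Roth")
print(lowered, labels)
print(apply_casing(lowered, labels))
```

A model trained to predict `labels` from `lowered` over large unlabeled corpora would yield per-character casing distributions, which is the kind of signal the abstract describes appending to character embeddings.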

25 citations

Proceedings Article•10.18653/V1/N18-3015•

From dictations to clinical reports using machine translation

Gregory Finley¹, Wael Salloum², Najmeh Sadoughi, Erik Edwards³, Amanda L. Robinson⁴, Nico Axtmann, Michael Brenndoerfer⁵, Mark Miller, David Suendermann-Oeft⁶•Institutions (6)

University of Minnesota¹, Columbia University², University of California, San Francisco³, Cardiff University⁴, University of California, Berkeley⁵, Educational Testing Service⁶

1 Jun 2018

TL;DR: This work introduces a novel holistic approach to post-processing that relies on machine translation and shows how this technique outperforms an alternative conventional system, even learning to correct speech recognition errors during post-processing, while being much simpler to maintain.

Abstract: A typical workflow to document clinical encounters entails dictating a summary, running speech recognition, and post-processing the resulting text into a formatted letter. Post-processing entails a host of transformations including punctuation restoration, truecasing, marking sections and headers, converting dates and numerical expressions, parsing lists, etc. In conventional implementations, most of these tasks are accomplished by individual modules. We introduce a novel holistic approach to post-processing that relies on machine translation. We show how this technique outperforms an alternative conventional system, even learning to correct speech recognition errors during post-processing, while being much simpler to maintain.

19 citations

Performance Metrics

Papers

101

Citations

No. of papers in the topic in previous years
Year	Papers
2021	3
2020	7
2019	3
2018	2
2015	2
2014	2
