Top 57 papers published in the topic of Clef in 2020

Book Chapter•10.1007/978-3-030-58219-7_26•

Overview of Touché 2020: Argument Retrieval

Alexander Bondarenko¹, Maik Fröbe¹, Meriem Beloucif², Lukas Gienapp³, Yamen Ajjour¹, Alexander Panchenko⁴, Chris Biemann², Benno Stein⁵, Henning Wachsmuth⁶, Martin Potthast³, Matthias Hagen¹•Institutions (6)

Martin Luther University of Halle-Wittenberg¹, University of Hamburg², Leipzig University³, Skolkovo Institute of Science and Technology⁴, Bauhaus University, Weimar⁵, University of Paderborn⁶

22 Sep 2020

TL;DR: This paper is a condensed report on Touché, the first shared task on argument retrieval, held at CLEF 2020. It ran two tasks: (1) supporting individuals in finding arguments on socially important topics and (2) supporting individuals with arguments on everyday personal decisions.

Abstract: This paper is a condensed report on Touché, the first shared task on argument retrieval, which was held at CLEF 2020. With the goal of creating a collaborative platform for research in argument retrieval, we run two tasks: (1) supporting individuals in finding arguments on socially important topics and (2) supporting individuals with arguments on everyday personal decisions.

48 citations

Posted Content•

Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media

Alberto Barrón-Cedeño¹, Tamer Elsayed², Preslav Nakov³, Giovanni Da San Martino³, Maram Hasanain², Reem Suwaileh², Fatima Haouari², Nikolay Babulkov⁴, Bayan Hamdan, Alex Nikolov⁴, Shaden Shaar³, Zien Sheikh Ali²•Institutions (4)

University of Bologna¹, Qatar University², Qatar Computing Research Institute³, Sofia University⁴

15 Jul 2020-arXiv: Computation and Language

TL;DR: This paper gives an overview of the third edition of the CheckThat! Lab, held at CLEF 2020, which featured five tasks in two languages, English and Arabic: check-worthiness estimation, retrieval of previously fact-checked claims, evidence retrieval, and claim verification in social media, plus check-worthiness estimation in political debates and speeches.

Abstract: We present an overview of the third edition of the CheckThat! Lab at CLEF 2020. The lab featured five tasks in two different languages: English and Arabic. The first four tasks compose the full pipeline of claim verification in social media: Task 1 on check-worthiness estimation, Task 2 on retrieving previously fact-checked claims, Task 3 on evidence retrieval, and Task 4 on claim verification. The lab is completed with Task 5 on check-worthiness estimation in political debates and speeches. A total of 67 teams registered to participate in the lab (up from 47 at CLEF 2019), and 23 of them actually submitted runs (compared to 14 at CLEF 2019). Most teams used deep neural networks based on BERT, LSTMs, or CNNs, and achieved sizable improvements over the baselines on all tasks. Here we describe the tasks setup, the evaluation results, and a summary of the approaches used by the participants, and we discuss some lessons learned. Last but not least, we release to the research community all datasets from the lab as well as the evaluation scripts, which should enable further research in the important tasks of check-worthiness estimation and automatic claim verification.

22 citations

Book Chapter•10.1007/978-3-030-45442-5_66•

Shared Tasks on Authorship Analysis at PAN 2020.

Janek Bevendorff¹, Bilal Ghanem², Anastasia Giachanou², Mike Kestemont³, Enrique Manjavacas³, Martin Potthast⁴, Francisco Rangel, Paolo Rosso², Günther Specht⁵, Efstathios Stamatatos⁶, Benno Stein¹, Matti Wiegmann¹, Eva Zangerle⁵•Institutions (6)

Bauhaus University, Weimar¹, Polytechnic University of Valencia², University of Antwerp³, Leipzig University⁴, University of Innsbruck⁵, University of the Aegean⁶

14 Apr 2020

TL;DR: The paper gives a brief overview of the four shared tasks that are to be organized at the PAN 2020 lab on digital text forensics and stylometry, hosted at CLEF conference.

Abstract: The paper gives a brief overview of the four shared tasks that are to be organized at the PAN 2020 lab on digital text forensics and stylometry, hosted at CLEF conference. The tasks include author profiling, celebrity profiling, cross-domain author verification, and style change detection, seeking to advance the state of the art and to evaluate it on new benchmark datasets.

21 citations

An Extended Overview of the CLEF 2020 ChEMU Lab: Information Extraction of Chemical Reactions from Patents.

1 Jan 2020

TL;DR: This paper gives an overview of the ChEMU2020 lab, which focuses on extracting the synthesis processes of new chemical compounds from chemical patents, and describes the resources created for its two tasks, the evaluation methodology adopted, and the participants' results.

Abstract: The discovery of new chemical compounds is perceived as a key driver of the chemistry industry and many other economic sectors. The information about the new discoveries are usually disclosed in scientific literature and in particular, in chemical patents, since patents are often the first venues where the new chemical compounds are publicized. Despite the significance of the information provided in chemical patents, extracting the information from patents is costly due to the large volume of existing patents and its drastic expansion rate. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), provides a platform to advance the state-of-the-arts in automatic information extraction systems over chemical patents. In particular, we focus on extracting synthesis process of new chemical compounds from chemical patents. Using the ChEMU corpus of 1500 “snippets” (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 targets at chemical named entity recognition, i.e., the identification of chemical compounds and their specific roles in chemical reactions. Task 2 targets at event extraction, i.e., the identification of reaction steps, relating the chemical compounds involved in a chemical reaction. In this paper, we provide an overview of our ChEMU2020 lab. Herein, we describe the resources created for the two tasks, the evaluation methodology adopted, and participants results. We also provide a brief summary of the methods employed by participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieve substantially better results than the baseline methods prepared by the organizers.

13 citations

Journal Article•10.1145/3372328•

Exploring Disorder-Aware Attention for Clinical Event Extraction

Shweta Yadav¹, Pralay Ramteke², Asif Ekbal², Sriparna Saha², Pushpak Bhattacharyya²•Institutions (2)

Wright State University¹, Indian Institute of Technology Patna²

13 Apr 2020-ACM Transactions on Multimedia Computing, Communications, and Applications

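TL;DR: This article proposes a deep learning framework, a bi-directional long short-term memory network assisted by a disorder-specific attention mechanism, that identifies the attributes (severity, course, temporal expression, and document creation time) associated with medical concepts extracted from electronic medical records.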

Abstract: Event extraction is one of the crucial tasks in biomedical text mining that aims to extract specific information concerning incidents embedded in the texts. In this article, we propose a deep learning framework that aims to identify the attributes (severity, course, temporal expression, and document creation time) associated with the medical concepts extracted from electronic medical records. The bi-directional long short-term memory network assisted by the attention mechanism is utilized to uncover the important aspects of the patient’s medical conditions. The attention mechanism specific to the medical disorder mention can focus on various parts of the sentence when different disorders are considered as input. The proposed methodology is evaluated on benchmark ShARe/CLEF eHealth Evaluation Lab 2014 shared task 2 datasets. In addition to the CLEF dataset, we also used the social media text, especially the medical blog posts. Experimental results of the proposed approach illustrate that our proposed approach achieves significant performance improvements over the state-of-the-art techniques and the highly competitive deep learning--based baseline methods.

12 citations

Book Chapter•10.1007/978-3-030-45442-5_76•

CLEF eHealth Evaluation Lab 2020.

Hanna Suominen¹,²,³, Liadh Kelly⁴, Lorraine Goeuriot⁵, Martin Krallinger⁶•Institutions (6)

Australian National University¹, University of Turku², Commonwealth Scientific and Industrial Research Organisation³, Maynooth University⁴, University of Grenoble⁵, Barcelona Supercomputing Center⁶

14 Apr 2020

TL;DR: The substantial community interest in the tasks and their resources has led to CLEF eHealth maturing as a primary venue for all interdisciplinary actors of the ecosystem for producing, processing, and consuming electronic health information.

Abstract: Laypeople’s increasing difficulties to retrieve and digest valid and relevant information in their preferred language to make health-centred decisions has motivated CLEF eHealth to organize yearly labs since 2012. These 20 evaluation tasks on Information Extraction (IE), management, and Information Retrieval (IR) in 2013–2019 have been popular—as demonstrated by the large number of team registrations, submissions, papers, their included authors, and citations (748, 177, 184, 741, and 1299, respectively, up to and including 2018)—and achieved statistically significant improvements in the processing quality. In 2020, CLEF eHealth is calling for participants to contribute to the following two tasks: The 2020 Task 1 on IE focuses on term coding for clinical textual data in Spanish. The terms considered are extracted from clinical case records and they are mapped onto the Spanish version of the International Classification of Diseases, the 10th Revision, including also textual evidence spans for the clinical codes. The 2020 Task 2 is a novel extension of the most popular and established task in CLEF eHealth on CHS. This IR task uses the representative web corpus used in the 2018 challenge, but now also spoken queries, as well as textual transcripts of these queries, are offered to the participants. The task is structured into a number of optional subtasks, covering ad-hoc search using the spoken queries, textual transcripts of the spoken queries, or provided automatic speech-to-text conversions of the spoken queries. In this paper we describe the evolution of CLEF eHealth and this year’s tasks. The substantial community interest in the tasks and their resources has led to CLEF eHealth maturing as a primary venue for all interdisciplinary actors of the ecosystem for producing, processing, and consuming electronic health information.

12 citations

Melaxtech: A report for CLEF 2020 - ChEMU Task of Chemical Reaction Extraction from Patent.

Jingqi Wang, Yuankai Ren, Zhi Zhang, Yaoyun Zhang

1 Jan 2020

TL;DR: The Melaxtech team developed hybrid approaches, combining deep learning models with pattern-based rules, for the two subtasks of the CLEF 2020 ChEMU task: (1) named entity recognition to identify compounds and their semantic roles in chemical reactions, and (2) event extraction to identify event triggers of chemical reactions and their relations with the semantic roles recognized in subtask 1.

Abstract: This work describes the participation of the Melaxtech team in the CLEF 2020 – ChEMU Task of Chemical Reaction Extraction from Patent. The task consisted of two subtasks: (1) Named entity recognition to identify compounds and different semantic roles in the chemical reaction. (2) Event extraction to identify event-triggers of chemical reaction and their relations with the semantic roles recognized in subtask 1. We developed hybrid approaches combining both deep learning models and pattern-based rules for this task. Our approaches achieved state-of-art results in both subtasks, with the best F1 of 0.957 for entity recognition and the best F1 of 0.9536 for event extraction, indicating the proposed approaches are promising.

11 citations

Journal Article•10.1007/S10930-020-09925-W•

The Last Secret of Protein Folding: The Real Relationship Between Long-Range Interactions and Local Structures.

Aoneng Cao¹•Institutions (1)

Shanghai University¹

10 Oct 2020-Protein Journal

TL;DR: The CLEF hypothesis provides a simple solution to all protein folding paradoxes, and proposes a “CLEF age” or “Stone Age” for the prebiotic evolution of proteins.

Abstract: The protein folding problem has been extensively studied for decades, and hundreds of thousands of protein structures have been solved. Yet, how proteins fold from a linear peptide chain to their unique 3D structures is not fully understood. With key clues having emerged unexpectedly from the field of nanoscience, a "Confined Lowest Energy Fragment" (CLEF) hypothesis was proposed. The CLEF hypothesis states that a protein chain can be divided into CLEFs, the semi-independent folding units, by a small number of key residues that form key long-range interactions. The native structure of a CLEF is the lowest energy state under the constraints of the key long-range interactions, but the native structure of the whole protein is not necessary the lowest energy state as Anfinsen's thermodynamic hypothesis suggested. The CLEF hypothesis proposes a unified CLEF mechanism for protein folding, basically a two-step process. In the first step, the favorable enthalpy of CLEFs for native structures quickly brings those residues for the key long-range interactions together, forming intermediates corresponding to the so-called hydrophobic collapse. In the second step, those collapsed key residues shuffle for the right combination to form the native key long-range interactions. The CLEF hypothesis provides a simple solution to all protein folding paradoxes, and proposes a "CLEF Age" or "Stone Age" for the prebiotic evolution of proteins.

11 citations

Profiling Fake News Spreaders on Twitter based on TFIDF Features and Morphological Process. Notebook for PAN at CLEF 2020.

Mohamed Lichouri, Mourad Abbas, Besma Benaziz

1 Jan 2020

TL;DR: A comparison study between a set of classifiers has been carried out and the best results were achieved using the model LSVC which yielded an f1-score of 76% and 58.50% for Spanish and English, respectively.

Abstract: In this paper, we present a description of our experiments on Profiling Fake News Spreaders on Twitter based on TFIDF Features and Morphological Processes as stemming, lemmatization and part of speech tagging. A comparison study between a set of classifiers has been carried out. The best results were achieved using the model LSVC which yielded an f1-score of 76% and 58.50% for Spanish and English, respectively.

10 citations

RMIT at PAN-CLEF 2020: Profiling Fake News Spreaders on Twitter.

Xinhuan Duan, Elham Naghizade, Damiano Spina, Xiuzhen Zhang

1 Jan 2020

TL;DR: This paper approaches this challenge through extracting linguistic and sentiment features from users’ tweet feed as well as retrieving the presence of emojis, hashtags and political bias in their tweets, and achieves 72% accuracy, being among the top-4 results obtained by systems for the task in the English language.

Abstract: Automatic detection of fake news in social media has become a prominent research topic due to its widespread, adverse effect on not only the society and public health but also on economy and democracy. The computational approaches towards automatic detection of fake news span from analyzing the source credibility, user credibility, as well as social network structure and the news content. However, the studies on user credibility in this context have largely focused on the frequency and times of engaging in a fake news propagation rather than profiling users based on the content of their tweets. In this paper, we approach this challenge through extracting linguistic and sentiment features from users’ tweet feed as well as retrieving the presence of emojis, hashtags and political bias in their tweets. These features are then used to classify users into spreaders or non-spreaders of fake news. Our proposed approach achieves 72% accuracy, being among the top-4 results obtained by systems for the task in the English language.

9 citations

IAM at CLEF eHealth 2020: Concept Annotation in Spanish Electronic Health Records.

Sébastien Cossin, Vianney Jouhet

1 Jan 2020

TL;DR: This paper tackles the automatic assignment of ICD-10 diagnosis and procedure codes to Spanish electronic health records using a dictionary-based approach, achieving F1-scores of 0.69 for the detection of diagnoses and 0.52 for the detection of procedures on a test set of 250 clinical cases.

Abstract: In this paper, we describe the approach and the results of our participation in task 1 (multilingual information extraction) of the CLEF eHealth 2020 challenge. We tackled the task of automatically assigning ICD-10 diagnosis and procedure codes to Spanish electronic health records. We used a dictionary-based approach using only materials provided by the task organizers. The training set consisted of 750 clinical cases annotated by a medical expert. Our system achieved an F1-score of 0.69 for the detection of diagnoses and 0.52 for the detection of procedures on a test set of 250 clinical cases.

Journal Article•10.1177/0305735618817925•

Effect of processing fluency on metamemory for written music in piano players

Bennett L. Schwartz¹, Zehra F. Peynircioğlu², Joshua R Tatz²•Institutions (2)

Florida International University¹, American University²

01 Sep 2020-Psychology of Music

TL;DR: This paper examines the effects of processing fluency on metamemory for written music; in Experiment 1, piano players studied short sequences notated in either treble or bass clef by playing them on a sile...

Abstract: We examined the effects of processing fluency on metamemory for written music. In Experiment 1, piano players studied short sequences notated in either treble or bass clef by playing them on a sile...

ICB-UMA at CLEF e-Health 2020 Task 1: Automatic ICD-10 coding in Spanish with BERT.

Guillermo López-García, Jose M. Jerez, Francisco J. Veredas

1 Jan 2020

TL;DR: BERT-SciELO, a BERT-Base model pre-trained from scratch on an unlabeled corpus of biomedical articles in Spanish, achieved the best results among three submitted systems, obtaining a final Mean Average Precision (MAP) metric score of 0.482 on the evaluation set.

Abstract: This working notes paper presents our contribution to the CLEF eHealth 2020 Task 1. Our team has participated in the CodiEsp-D subtask, the first shared task consisted in the automatic clinical coding of medical cases in Spanish, annotated with ICD-10-CM codes. We tackled the task as a multi-label classification problem using BERT model [4]. With the aim of leveraging all the language modeling capacities of the deep bidirectional encoder architecture of BERT, we developed a tailored approach to annotate short fragments of text extracted from the long clinical cases present in the CodiEsp corpus and use them as input to the model. Two publicly available Spanish versions of BERT, namely BETO [3] and BERT-SciELO [1], were fine-tuned on the CodiEsp-D corpus extended by a set of abstracts annotated with ICD-10 codes, following our fragment-based classification approach. BERT-SciELO, a BERT-Base model pre-trained from scratch on an unlabeled corpus of biomedical articles in Spanish, achieved the best results among our three submitted systems, obtaining a final Mean Average Precision (MAP) metric score of 0.482 on the evaluation set.
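As a rough illustration of the Mean Average Precision (MAP) metric reported above, the sketch below computes MAP over ranked code predictions. It is a minimal sketch, not the official CodiEsp evaluation script: the function names `average_precision` and `mean_average_precision` and the toy code labels are assumptions made for illustration, and each ranked list is assumed to contain no duplicate codes.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked prediction list against the gold codes."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, code in enumerate(ranked, start=1):
        if code in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant code appears.
            precision_sum += hits / rank
    # Normalise by the number of gold codes, so missed codes lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Sequence[Sequence[str]], gold: Sequence[Set[str]]
) -> float:
    """Mean of the per-document average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r, g) for r, g in zip(runs, gold)) / len(runs)


# Toy example with hypothetical code labels (not real CodiEsp data).
predictions = [["a", "b", "c"], ["x", "y"]]
gold_codes = [{"a", "c"}, {"y"}]
map_score = mean_average_precision(predictions, gold_codes)
```

Normalising by the number of gold codes rather than by the number of hits is one common convention; other evaluation tools may differ in how they treat documents with no gold codes.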

DPRL Systems in the CLEF 2020 ARQMath Lab.

Behrooz Mansouri, Douglas W. Oard, Richard Zanibbi

1 Jan 2020

TL;DR: This paper describes the participation of the Document and Pattern Recognition Lab from the Rochester Institute of Technology in the CLEF 2020 ARQMath lab; the Task 2 (Formula Retrieval) runs yielded strong results, while the Task 1 (Question Answering) results were less competitive.

Abstract: This paper describes the participation of the Document and Pattern Recognition Lab from the Rochester Institute of Technology in the CLEF 2020 ARQMath lab. There are two tasks defined for ARQMath: (1) Question Answering, and (2) Formula Retrieval. Four runs were submitted for Task 1 using systems that take advantage of text and formula embeddings. For Task 2, three runs were submitted: one uses only formula embedding, another uses formula and text embeddings, and the final one uses formula embedding followed by re-ranking results by tree-edit distance. The Task 2 runs yielded strong results, the Task 1 results were less competitive.

Proceedings Article•

Named Entity Recognition and Linking on Historical Newspapers: UvA.ILPS & REL at CLEF HIPE 2020.

Vera Provatorova, Svitlana Vakulenko¹, Evangelos Kanoulas², Koen Dercksen³, Johannes M. van Hulst³•Institutions (3)

Vienna University of Economics and Business¹, University of Amsterdam², Radboud University Nijmegen³

1 Jan 2020

TL;DR: This paper describes a submission to the CLEF HIPE 2020 shared task on identifying named entities in multi-lingual historical newspapers in French, German and English, using an ensemble of fine-tuned BERT models for named entity recognition and three different methods for entity linking.

Abstract: This paper describes our submission to the CLEF HIPE 2020 shared task on identifying named entities in multi-lingual historical newspapers in French, German and English. The subtasks we addressed in our submission include coarse-grained named entity recognition, entity mention detection and entity linking. For the task of named entity recognition we used an ensemble of fine-tuned BERT models; entity linking was approached by three different methods: (1) a simple method relying on ElasticSearch retrieval scores, (2) an approach based on contextualised text embeddings, and (3) REL, a modular entity linking system based on several state-of-the-art components.

PSU at CLEF-2020 ARQMath Track: Unsupervised Re-ranking using Pretraining.

Shaurya Rohatgi, Jian Wu, C. Lee Giles

1 Jan 2020

TL;DR: This paper elaborates on the submission to the ARQMath track at CLEF 2020, using a two-stage retrieval technique in which the first stage is a fusion of traditional BM25 scoring and tf-idf with cosine similarity-based retrieval while the second stage is a re-ranking technique using contextualized embeddings.

Overview of the CLEF eHealth 2020 task 2: Consumer health search with ad hoc and spoken queries

Lorraine Goeuriot, Hanna Suominen¹,², Liadh Kelly³, Zhengyang Liu¹, Gabriella Pasi⁴, Gabriela Gonzalez Saez, Marco Viviani⁴, Chenchen Xu¹•Institutions (4)

Australian National University¹, University of Turku², Maynooth University³, University of Milan⁴

1 Jan 2020

TL;DR: The task was a novel extension of Consumer Health Search, the most popular and established task in CLEF eHealth, covering both ad-hoc and spoken queries; the paper describes the resources created for the task and the evaluation methodology adopted.

Abstract: In this paper, we provide an overview of the CLEF eHealth Task 2 on Information Retrieval (IR), organized as part of the eighth annual edition of the CLEF eHealth evaluation lab by the Conference and Labs of the Evaluation Forum. Its aim was to address laypeople’s difficulties in retrieving and digesting valid and relevant information, in their preferred language, to make health-centred decisions. The task was a novel extension of the most popular and established task in CLEF eHealth on Consumer Health Search (CHS), which makes responses to spoken ad-hoc queries. In total, five submissions were made to its two subtasks; three addressed the ad-hoc IR task on text data and two considered the spoken queries. Herein, we describe the resources created for the task and evaluation methodology adopted. We also summarize lab submissions and results. As in previous years, organizers have made data, methods, and tools associated with the lab tasks available for future research and development. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece. * With equal contribution, LG & HS were the co-leaders of the task. LK, ZL, GP, GGS, MV, and CX were the task co-organizers and contributors to the evaluation conceptualization, dataset creation, assessments, and measurements.

IXA-AAA at CLEF eHealth 2020 CodiEsp. Automatic Classification of Medical Records with Multi-label Classifiers and Similarity Match Coders.

Alberto Blanco, Alicia Pérez, Arantza Casillas

1 Jan 2020

TL;DR: These working notes present the participation of the IXA-AAA team in the CodiEsp track at CLEF 2020; the team developed several systems to cope with the three sub-tasks, including tree-based multi-label classifiers, similarity match strategies, and ensemble models.

Abstract: These working notes present the participation of the IXA-AAA team on the CodiEsp Track, as part of the CLEF 2020. The track is about automatic coding of clinical records according to the International Classification of Diseases 10th revision (ICD-10). There are three sub-tasks: CodiEsp-D, CodiEsp-P and CodiEsp-X. The two main tasks, CodiEsp-D and CodiEsp-P, aim to develop systems able to automatically classify clinical texts according to the ICD-10, respectively for diagnostics and procedures. CodiEsp-X, by contrast, is an exploratory sub-task within the framework of Explainable AI in which the goal is to detect the text fragment that motivates the presence of the ICD code. For the IXA-AAA team participation, we have developed several systems to cope with the three sub-tasks, including tree-based multi-label classifiers, similarity match strategies, and ensemble models. For the similarity match, we have explored several approaches and algorithms, from string edit distances such as Levenshtein to dense representations with Transformer-based BERT models. Our best results overall are achieved by the combination of models, with a MAP of 69.8% for CodiEsp-D and 48.1% for CodiEsp-P. Regarding the exploratory task, CodiEsp-X, our best coder achieves a micro F1-Score of 30.6%.
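One of the similarity match strategies mentioned above is string edit distance. The following is a minimal sketch of Levenshtein-based matching, not the IXA-AAA team's implementation: the helper names `levenshtein` and `best_code` and the two-entry description dictionary are hypothetical, chosen only to illustrate mapping a text fragment to the code whose description is closest.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def best_code(fragment: str, code_descriptions: Dict[str, str]) -> str:
    """Return the code whose description has the smallest edit distance to the fragment."""
    return min(
        code_descriptions,
        key=lambda code: levenshtein(fragment.lower(), code_descriptions[code].lower()),
    )


# Illustrative two-entry dictionary; a real system would use the full code list.
descriptions = {
    "E11": "diabetes mellitus tipo 2",
    "I10": "hipertensión esencial",
}
match = best_code("Diabetes mellitus tipo II", descriptions)
```

A nearest-description lookup like this always returns some code, so a practical coder would presumably also apply a distance threshold before accepting a match.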

FLE at CLEF eHealth 2020: Text Mining and Semantic Knowledge for Automated Clinical Encoding.

Nuria García-Santa, Kendrick Cetina

1 Jan 2020

NLPatVCU CLEF 2020 ChEMU Shared Task System Description.

Darshini Mahendran, Gabrielle Gurdin, Nastassja Lewinski, Christina Tang, Bridget T. McInnes

1 Jan 2020

TL;DR: This paper describes the team’s participation in Tracks 1 & 2 of the Conference and Labs of the Evaluation Forum (CLEF 2020) challenge organized by Cheminformatics Elsevier Melbourne University for extracting information about chemical reactions from patents, and discusses their systems: MedaCy, a Python-based supervised multi-class entity recognition system, and RelEx, a Python-based relation extraction system which includes rule-based and supervised learning pipelines.

Abstract: This paper describes our team’s participation in the Tracks 1 & 2 from Conference and Labs of the Evaluation Forum (CLEF 2020) Challenge organized by Cheminformatics Elsevier Melbourne University for extracting information over chemical reactions from patents. We discuss our systems: MedaCy, a python-based supervised multi-class entity recognition system, and RelEx, a python-based relation extraction system which includes rule-based and supervised learning pipelines. Our best model for Task 1 obtained an overall relaxed precision of 0.95 and exact precision of 0.87; relaxed recall of 0.99 and exact recall of 0.86; and relaxed F1 score of 0.97 and exact F1 score of 0.87. Our best model for Task 2 obtained an overall precision of 0.80; recall of 0.54; and F1 score of 0.65.

CLRG ChemNER: A Chemical Named Entity Recognizer @ ChEMU CLEF 2020.

C. S. Malarkodi, Pattabhi R. K. Rao, Sobha Lalitha Devi

1 Jan 2020

TL;DR: This paper describes the system developed for ChEMU @ CLEF Cheminformatics Elsevier Melbourne University lab, Named Entity Recognition task for identifying chemical compounds as well as their types in context, i.e., to assign the label of a chemical compound according to the role which the compound plays within a chemical reaction from patent documents.

Abstract: This paper describes our system developed for ChEMU @ CLEF Cheminformatics Elsevier Melbourne University lab, Named Entity Recognition (NER) task for identifying chemical compounds as well as their types in context, i.e., to assign the label of a chemical compound according to the role which the compound plays within a chemical reaction from patent documents. We have presented two systems which use Conditional random fields (CRFs) algorithms and Artificial Neural Networks (ANN). In this work we used feature set that includes linguistic, orthographical and lexical clue features. In the development of systems, we have used only the training data provided by the track organizers and no other external resources or embedding models were used. We obtained an F-score of 0.6640 using CRFs and F-Score of 0.3764 using ANN on the test data.

SINAI at CLEF eHealth 2020: Testing Different pre-trained Word Embeddings for Clinical Coding in Spanish.

José M. Perea-Ortega, Pilar López-Úbeda, Manuel Carlos Díaz-Galiano, María Teresa Martín Valdivia, Luis Alfonso Ureña López

1 Jan 2020

TL;DR: The main finding was that combining word embeddings could be a useful strategy to apply for deep learning-based approaches, even though the combined embeddings do not belong to the medical domain.

Abstract: This paper describes the system presented by the SINAI team for the Multilingual Information Extraction task of the CLEF eHealth Lab 2020. This task focuses on the automatic assignment of the International Classification of Diseases (ICD) codes to health-related texts in Spanish. Our proposal follows a deep learning-based approach where we have used the bidirectional variant of a Long Short Term Memory (LSTM) network along with a stacked Conditional Random Fields (CRF) decoding layer (BiLSTM+CRF). The aim of the experiments carried out was to test the performance of different pre-trained word embeddings for recognizing diagnoses and procedures in clinical text. The main finding was that combining word embeddings could be a useful strategy to apply for deep learning-based approaches, even though the combined embeddings do not belong to the medical domain. The best MAP scores achieved were 0.314 and 0.293 for the CodiEsp-D and CodiEsp-P subtasks, respectively.

Overview of ARQMath 2020 (Updated Working Notes Version): CLEF Lab on Answer Retrieval for Questions on Math.

Richard Zanibbi, Douglas W. Oard, Anurag Agarwal, Behrooz Mansouri

1 Jan 2020

TL;DR: The ARQMath Lab at CLEF considers finding answers to new mathematical questions among posted answers on a community question answering site (Math Stack Exchange); the lab also includes a formula retrieval sub-task.

Abstract: The ARQMath Lab at CLEF considers finding answers to new mathematical questions among posted answers on a community question answering site (Math Stack Exchange). Queries are question posts held out from the searched collection, each containing both text and at least one formula. This is a challenging task, as both math and text may be needed to find relevant answer posts. ARQMath also includes a formula retrieval sub-task: individual formulas from question posts are used to locate formulae in earlier question and answer posts, with relevance determined considering the context of the post from which a query formula is taken, and the posts in which retrieved formulae appear.

Will Longformers PAN Out for Authorship Verification? Notebook for PAN at CLEF 2020.

Juanita Ordoñez, Rafael A. Rivera Soto, Barry Y. Chen

1 Jan 2020

Book Chapter•10.1007/978-3-030-53360-1_9•

Author Profiling of Tweets

Jacques Savoy¹•Institutions (1)

University of Neuchâtel¹

1 Jan 2020

TL;DR: In this paper, the authors explore the distinct linguistic characteristics related to Twitter compared to the traditional oral or written form, and discover the linguistic features strongly related to bots, and those associated with men or women.

Abstract: In this second chapter presenting stylometric applications, social networks, and more precisely Twitter, are the source of our dataset. To investigate new forms of communication, this chapter explores the distinct linguistic characteristics of Twitter compared to the traditional oral or written form. For example, the frequency of mentions (e.g., @POTUS44), hyperlinks (e.g., www.nytimes.com), retweets or emojis can be exploited to profile the author of a set of tweets. The dataset, freely available, is provided by the CLEF PAN evaluation campaign in 2019. With this corpus, the first classification task is to discriminate between tweets generated by bots and those generated by humans. In a second application, the computer must identify whether tweets were written by men or women. As a useful additional result, one can discover the linguistic features strongly related to bots, and those associated with men or women.

ICD-10 Coding based on Semantic Distance: LSI_UNED at CLEF eHealth 2020 Task 1.

Mario Almagro, Raquel Martínez-Unanue, Víctor Fresno, Soto Montalvo, Hegler Tissot

1 Jan 2020

TL;DR: The unsupervised component is used to provide code evidence in EHRs, offering greater interpretability, and the mixed approach improves on the strictly supervised proposals by more than 38% and 13% respectively.

Abstract: This paper describes our contribution to the CLEF eHealth 2020 Task 1, consisting of the CIE-10-ES annotation of Spanish Electronic Health Records (EHRs). CIE-10-ES coding is the extended version of the ICD-10 in Spain. One of the sub-tasks is aimed at the interpretability of proposals, which is in line with the latest demands in Natural Language Processing (NLP). Moreover, ICD-10 entries generated by hospitals usually follow an extreme distribution, involving complex annotation challenges. For that reason, an unsupervised semantic similarity-based method has been explored using a representation based on SNOMED-CT clinical terminology. Since example-based learning is able to capture complex patterns, the proposal has been combined with Gradient Boosting methods to model the codes with more instances. mAP scores of 0.517 are achieved for CIE-10-ES codes associated with diagnoses and 0.398 for CIE-10-ES procedure codes. The mixed approach improves on the strictly supervised proposals by more than 38% and 13% respectively. Finally, the unsupervised component is used to provide code evidence in EHRs, offering greater interpretability.

University of Amsterdam at CLEF 2020.

Mahsa S. Shahshahani, Jaap Kamps

1 Jan 2020

TL;DR: The University of Amsterdam’s participation in the CLEF 2020 Touché Track covers two tasks, Conversational Argument Retrieval and Comparative Argument Retrieval, and includes a pipeline to re-rank documents retrieved from ClueWeb using three features: PageRank scores, web domains, and argumentativeness.

MEDIA team: CLEF-2020 eHealth Task 1: Multilingual Information Extraction - CodiEsp.

Iker de la Iglesia, Mikel Martínez-Puente, Alex Platas, Iria San Miguel, Aitziber Atutxa, Koldo Gojenola

1 Jan 2020

UniNE at PAN-CLEF 2020: Profiling Fake News Spreaders on Twitter.

Catherine Ikae, Jacques Savoy

1 Jan 2020

TL;DR: This suggested approach is based on a two-stage method ignoring infrequent terms and ranking the others according to their occurrence differences between the two categories, and a classifier is implemented combining decision tree, random forest, and boosting.

Abstract: In our participation in the “Profiling Fake News Spreaders on Twitter” task (both in English and Spanish), our main objective is to detect Twitter user accounts used to spread disinformation, fake news, and conspiracy theories. To address this task automatically, based only on the tweets' contents, we suggest reducing the number of features (isolated words) to a few hundred. The suggested approach is based on a two-stage method that ignores infrequent terms and ranks the others according to their occurrence differences between the two categories. Finally, a classifier is implemented combining decision tree, random forest, and boosting. Our first evaluation experiments indicate an overall accuracy around 70%.
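
The two-stage feature reduction described in this abstract could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function name `select_features`, the whitespace tokenization, the `min_count` and `top_k` thresholds, and the use of relative-frequency differences as the ranking score are all assumptions, since the abstract does not specify them.

```python
from collections import Counter


def select_features(docs_a, docs_b, min_count=5, top_k=300):
    """Hypothetical two-stage selection: drop infrequent terms, then rank
    the rest by how differently they occur in the two categories."""
    # Count isolated words per category (naive lowercase whitespace split).
    counts_a = Counter(tok for doc in docs_a for tok in doc.lower().split())
    counts_b = Counter(tok for doc in docs_b for tok in doc.lower().split())
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1

    # Stage 1: ignore terms occurring fewer than min_count times overall.
    vocab = [t for t in set(counts_a) | set(counts_b)
             if counts_a[t] + counts_b[t] >= min_count]

    # Stage 2: rank by absolute difference of relative frequencies
    # (an assumed scoring rule; ties are broken alphabetically).
    def score(term):
        return abs(counts_a[term] / total_a - counts_b[term] / total_b)

    return sorted(vocab, key=lambda t: (-score(t), t))[:top_k]


# Toy data, invented purely for illustration.
spreaders = ["shocking secret cure", "shocking hoax exposed", "shocking secret plot"]
others = ["weather today sunny", "match today great", "today lunch great"]
print(select_features(spreaders, others, min_count=2, top_k=3))
```

The selected terms would then serve as the input features for the classifier stage, which is not shown here.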

Convolutional Attention Models with Post-Processing Heuristics at CLEF eHealth 2020.

Elias Moons, Marie-Francine Moens

1 Jan 2020

TL;DR: The presented models use the neural principles of convolution and attention to obtain their results and a hierarchical component is introduced as well as hierarchical post-processing heuristics that leverage the information that is inherently present in the ICD taxonomy.

Abstract: In this paper, we compare state-of-the-art neural network approaches to the 2020 CLEF eHealth task 1. The presented models use the neural principles of convolution and attention to obtain their results. Furthermore, a hierarchical component is introduced as well as hierarchical post-processing heuristics. These additions successfully leverage the information that is inherently present in the ICD taxonomy.
