TL;DR: Task 2 of the 2016 CLEF eHealth lab extended the previous information extraction tasks of the ShARe/CLEF eHealth evaluation labs by introducing a large-scale classification task on French death certificates, which consisted of extracting causes of death as coded in the International Classification of Diseases, tenth revision (ICD10).
Abstract: This paper reports on Task 2 of the 2016 CLEF eHealth evaluation lab which extended the previous information extraction tasks of ShARe/CLEF eHealth evaluation labs. The task continued with named entity recognition and normalization in French narratives, as offered in CLEF eHealth 2015. Named entity recognition involved ten types of entities including disorders that were defined according to Semantic Groups in the Unified Medical Language System® (UMLS®), which was also used for normalizing the entities. In addition, we introduced a large-scale classification task in French death certificates, which consisted of extracting causes of death as coded in the International Classification of Diseases, tenth revision (ICD10). Participant systems were evaluated against a blind reference standard of 832 titles of scientific articles indexed in MEDLINE, 4 drug monographs published by the European Medicines Agency (EMEA) and 27,850 death certificates using Precision, Recall and F-measure. In total, seven teams participated, including five in the entity recognition and normalization task, and five in the death certificate coding task. Three teams submitted their systems to our newly offered reproducibility track. For entity recognition, the highest performance was achieved on the EMEA corpus, with an overall F-measure of 0.702 for plain entity recognition and 0.529 for normalized entity recognition. For entity normalization, the highest performance was achieved on the MEDLINE corpus, with an overall F-measure of 0.552. For death certificate coding, the highest performance was 0.848 F-measure.
TL;DR: This work used Peregrine, the authors' open-source indexing engine, with a dictionary based on French terms in the Unified Medical Language System supplemented with English UMLS terms that were translated into French with automatic translators to address entity recognition and normalization in a corpus of French drug labels and Medline titles.
Abstract: We participated in task 2 of the CLEF eHealth 2016 challenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach. For entity recognition and normalization, we used Peregrine, our open-source indexing engine, with a dictionary based on French terms in the Unified Medical Language System (UMLS) supplemented with English UMLS terms that were translated into French with automatic translators. For ICD-10 coding, we used the Solr text tagger, together with one of two ICD-10 terminologies derived from the task training material. To reduce the number of false-positive detections, we implemented several post-processing steps. On the challenge test set, our best system obtained F-scores of 0.702 and 0.651 for entity recognition in the drug labels and in the Medline titles, respectively. For entity normalization, F-scores were 0.529 and 0.474. On the test set for ICD-10 coding, our system achieved an F-score of 0.848 (precision 0.886, recall 0.813). These scores were substantially higher than the average score of the systems that participated in the challenge.
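The reported ICD-10 coding figures can be checked against each other. The sketch below assumes that "F-score" here means the balanced F1 (harmonic mean of precision and recall); the function name `f_measure` and the `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the balanced harmonic mean (F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as reported in the abstract for ICD-10 coding.
score = f_measure(0.886, 0.813)
print(round(score, 3))  # prints 0.848
```

Under that assumption the computed value matches the F-score of 0.848 given in the abstract.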
TL;DR: An SVM model is trained to perform user profiling, in terms of gender and age, on non-Twitter social media data, on English, Dutch, and Spanish data without any language-specific tuning of features or parameters.
Abstract: We trained an SVM model on tweets to perform user profiling, in terms of gender and age, on non-Twitter social media data. The system exploits features that we deemed appropriate to profile authors on social media, and that do not characterise too closely the specific usage of Twitter. Our system works on English, Dutch, and Spanish data without any language-specific tuning of features or parameters. Results on the cross-validated training set seem to indicate that features contribute rather equally to the model’s performance.
TL;DR: This paper casts the task as a machine learning problem involving the prediction of the ICD10 codes (categorical variable) from the raw text transformed into a bag-of-words matrix and relies on probabilistic topic models that are evaluated against classical classifiers such as SVM and Naive Bayes.
Abstract: This paper describes the participation of ECSTRA-INSERM team at CLEF eHealth 2016, task 2.C. The task involves extracting ICD10 codes from death certificates, mainly described with short plain texts. We cast the task as a machine learning problem involving the prediction of the ICD10 codes (categorical variable) from the raw text transformed into a bag-of-words matrix. We rely on probabilistic topic models that we evaluate against classical classifiers such as SVM and Naive Bayes. We demonstrate the effectiveness of topic models for this task in terms of prediction accuracy and result interpretation.
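As a rough illustration of the classification framing, the following sketch implements a multinomial Naive Bayes classifier over bag-of-words counts, i.e. one of the classical baselines the abstract mentions, not the topic-model approach itself. The training texts, the whitespace tokenizer, and the add-one smoothing are assumptions made for the example; the two ICD-10 codes are used only as toy labels.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Collect class counts and per-class word counts from (text, code) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in samples:
        tokens = text.lower().split()
        class_counts[code] += 1
        word_counts[code].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the code with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_code, best_logp = None, -math.inf
    for code, count in class_counts.items():
        logp = math.log(count / total)
        denom = sum(word_counts[code].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words unseen in training are ignored
                logp += math.log((word_counts[code][tok] + 1) / denom)
        if logp > best_logp:
            best_code, best_logp = code, logp
    return best_code

# Hypothetical toy training data, not drawn from the task corpus.
samples = [
    ("infarctus du myocarde", "I219"),
    ("infarctus aigu du myocarde", "I219"),
    ("cancer du poumon", "C349"),
    ("cancer bronchique", "C349"),
]
model = train_nb(samples)
print(predict_nb(model, "infarctus myocarde"))
```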
TL;DR: The Cultural micro-blog Contextualization Workshop aims to provide the research community with data sets to gather, organize and deliver relevant social data related to events generating a large number of micro-blog posts and web documents.
Abstract: The CLEF Cultural micro-blog Contextualization Workshop aims to provide the research community with data sets to gather, organize and deliver relevant social data related to events generating a large number of micro-blog posts and web documents. It is also devoted to discussing tasks to be run from this data set and that could serve applications.
TL;DR: This paper presents SIBM’s participation in the Multilingual Information Extraction task 2 of the CLEF eHealth 2016 evaluation initiative which focuses on named entity recognition in French written text and reports on the indexing of the provided QUAERO dataset with multiple knowledge organization systems (KOS) partially or totally translated in French.
Abstract: This paper presents SIBM’s participation in the Multilingual Information Extraction task 2 of the CLEF eHealth 2016 evaluation initiative which focuses on named entity recognition in French written text. We report on the indexing of the provided QUAERO dataset with multiple knowledge organization systems (KOS) partially or totally translated in French. The extraction method is available online in the form of a web-based service that requests the KOS to extract clinical concepts from Electronic Health Records. It is also available via a user-friendly interface developed for clinicians. We addressed the identification of relevant clinical entities within the International Classification of Diseases version 10 in the CépiDC dataset with a system based on natural language processing and approximate string matching methods. The results obtained this year were rather satisfactory and attested to significant progress, particularly in exact match recognition, since last year’s participation.
TL;DR: This paper presents the image classification techniques performed by the IPL Group for the subfigure classification subtask of ImageCLEF 2016 Medical Task and presents the results of the runs and the extensive experiments applying early or late fusion on the results obtained from a multi-class linear kernel support vector machine.
Abstract: In this paper we present the image classification techniques performed by the IPL Group for the subfigure classification subtask of ImageCLEF 2016 Medical Task. For the visual representation of images, various state-of-the-art visual features, such as Bag of Visual Words computed with pyramid histogram of visual words descriptors and quadtree bag-of-colors, were adopted. We present the results of our runs and our extensive experiments applying early or late fusion on the results obtained from a multi-class linear kernel support vector machine. Our top run was ranked 3rd among 34 runs.
TL;DR: This paper reports on the participation of the BiTeM/SIB Text Mining team at the CLEF eHealth 2016 evaluation lab, including an ad hoc solution based on simple pattern matching to comply with the constraints of the CépiDC challenge.
Abstract: BiTeM/SIB Text Mining (http://bitem.hesge.ch/) is a university research group carrying out activities in semantic and text analytics applied to health and life sciences. This paper reports on the participation of our team at the CLEF eHealth 2016 evaluation lab. The processing applied to each evaluation corpus (QUAERO and CépiDC) was originally very similar. Our method is based on an Automatic Text Categorization (ATC) system. First, the system is set with a specific input ontology (French UMLS), and ATC assigns a ranked list of related concepts to each input document. Then, a second module relocates all of the positive matches in the text, and normalizes the extracted entities. For the CépiDC corpus, the system was loaded with the Swiss ICD-10 GM thesaurus. However, a last-minute data transformation issue forced us to implement an ad hoc solution based on simple pattern matching to comply with the constraints of the CépiDC challenge. We obtained an average precision of 62% on the QUAERO entity extraction (over MEDLINE/EMEA texts, and exact/inexact), 48% on normalizing these entities, and 59% on the CépiDC subtask. Enhancing the recall by expanding the coverage of the terminologies could be an interesting approach to improve this system at moderate labour costs.
TL;DR: This overview paper reports on how these issues were addressed in a shared task of the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum (CLEF) in 2016, where all participant methods outperformed all 4 baselines.
Abstract: Cascaded speech recognition (SR) and information extraction (IE) could support the best practice for clinical handover and release clinicians’ time from writing documents to patient interaction and education. However, high requirements for processing correctness evoke methodological challenges and hence, processing correctness needs to be carefully evaluated as meeting the requirements. This overview paper reports on how these issues were addressed in a shared task of the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum (CLEF) in 2016. This IE task built on the 2015 CLEF eHealth Task on SR by using its 201 synthetic handover documents for training and validation (approx. 8,500 + 7,700 words) and releasing another 100 documents with over 6,500 expert-annotated words for testing. It attracted 25 team registrations and 3 team submissions with 2 methods each. When using the macro-averaged F1 over the 35 form headings present in the training documents for evaluation on the test documents, all participant methods outperformed all 4 baselines, including the organizers’ method (F1 = 0.25), published in 2015 in a top-tier medical informatics journal and provided to the participants as an option to build on, a random classifier (F1 = 0.02), and majority classifiers for the two most common classes (i.e., NA to filter out text irrelevant to the form and the most common form heading, both with F1 < 0.00). The top-2 methods (F1 = 0.38 and 0.37) had statistically significantly (p < 0.05, Wilcoxon signed-rank test) better performance than the third-best method (F1 = 0.35). In comparison, the top-3 methods and the organizers’ method (7th) had F1 of 0.81, 0.80, 0.81, and 0.75 in the NA class, respectively.
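A minimal sketch of macro-averaged F1, the evaluation measure named above, may help show why a majority classifier scores so poorly: every label contributes equally, however rare it is. The label names and data are hypothetical, and scoring a label with no true or predicted instances as 0 is an assumption of this sketch rather than the organizers' documented convention.

```python
from collections import Counter

def macro_f1(gold, predicted, labels):
    """Average the per-label F1 scores, giving every label equal weight."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical word-level labels; "NA" marks text irrelevant to the form.
gold = ["NA", "NA", "Bed", "Name", "NA", "Bed"]
pred = ["NA", "Bed", "Bed", "NA", "NA", "Bed"]
result = macro_f1(gold, pred, ["NA", "Bed", "Name"])
print(round(result, 3))
```

In this toy example the per-label scores are 2/3 for "NA", 0.8 for "Bed" and 0 for "Name", so the single missed "Name" word pulls the average down as much as a poorly handled frequent class would.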
TL;DR: An effective unsupervised author clustering authorship linking model called SPATIUM-L1 is described and evaluated, which can be adapted without any problem to different languages (such as Dutch, English, and Greek) in different genres.
Abstract: This paper describes and evaluates an effective unsupervised author clustering authorship linking model called SPATIUM-L1. The suggested strategy can be adapted without any problem to different languages (such as Dutch, English, and Greek) in different genres (e.g., newspaper articles and reviews). As features, we suggest using the m most frequent terms of each text (isolated words and punctuation symbols with m at most 200). Applying a simple distance measure, we determine whether there is enough indication that two texts were written by the same author. The evaluations are based on six test collections (PAN AUTHOR CLUSTERING task at CLEF 2016).
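The feature and distance idea can be sketched as follows. The abstract does not give the exact formula, so this example assumes an L1 distance between relative term frequencies, computed over the m most frequent terms of the first text; the tokenizer and function names are illustrative, and the decision threshold is left out.

```python
import re
from collections import Counter

def term_profile(text):
    """Return relative frequencies and raw counts of words and punctuation marks."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}, counts

def l1_distance(text_a, text_b, m=200):
    """Sum absolute frequency differences over the m most frequent terms of text_a."""
    prof_a, counts_a = term_profile(text_a)
    prof_b, _ = term_profile(text_b)
    top_terms = [t for t, _ in counts_a.most_common(m)]
    return sum(abs(prof_a[t] - prof_b.get(t, 0.0)) for t in top_terms)

a = "the cat sat, the cat ran."
b = "a dog barked!"
print(l1_distance(a, a), l1_distance(a, b))
```

Because the term list is taken from the first text only, this distance is not symmetric; a small value would be read as an indication that both texts share an author.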
TL;DR: A systematic series of experiments is run for creating a grid of points where many combinations of retrieval methods and components adopted by MultiLingual Information Access (MLIA) systems are represented to provide insights about the effectiveness of the different components and their interaction.
Abstract: In this paper we run a systematic series of experiments for creating a grid of points where many combinations of retrieval methods and components adopted by MultiLingual Information Access (MLIA) systems are represented. This grid of points has the goal to provide insights about the effectiveness of the different components and their interaction and to identify suitable baselines with respect to which all the comparisons can be made.
TL;DR: This paper describes the participation of master's students (LITL programme, university of Toulouse) and their teachers in the CLEF eHealth 2016 campaign and the system used, a CRF classifier based on a number of different features (POS tagging, generic word lists and syntactic parsing).
Abstract: This paper describes the participation of master's students (LITL programme, university of Toulouse) and their teachers in the CLEF eHealth 2016 campaign. Two runs were submitted for task 2 (multilingual information extraction) which consisted in the recognition and categorization of medical entities in French biomedical documents. The system used consists of a CRF classifier based on a number of different features (POS tagging, generic word lists and syntactic parsing). In addition, several patterns were used on the CRF's output in order to extract more complex entities. The best run achieved high precision (0.64-0.78) but lower recall (0.32-0.40), with an overall F1-measure of 0.43-0.53.
TL;DR: A query expansion module based on word2vec, a word embedding toolkit, is designed; it helps the CYUT CSIE system achieve better performance in the suggestion track.
Abstract: The Social Book Search (SBS) Lab is part of the CLEF 2016 lab series. This is the fourth time that the CYUT CSIE team has participated in the SBS track. The organizers changed the content of the topics slightly; we therefore made the necessary modifications to our system, which is based on keyword searching and ranking by social features. This year, we designed a query expansion module based on word2vec, a word embedding toolkit. The new module helps our system achieve better performance in the suggestion track.
TL;DR: A method is proposed for finding statistical inflectional rules based on minimum edit distance table of word pairs and the likelihoods of the rules in a language to statistically stem words and can be used in different text mining tasks.
Abstract: There have been multiple attempts to resolve various inflection matching problems in information retrieval. Stemming is a common approach to this end. Among many techniques for stemming, statistical stemming has been shown to be effective in a number of languages, particularly highly inflected languages. In this paper we propose a method for finding affixes in different positions of a word. Common statistical techniques heavily rely on string similarity in terms of prefix and suffix matching. Since infixes are common in irregular/informal inflections in morphologically complex texts, it is required to find infixes for stemming. In this paper we propose a method whose aim is to find statistical inflectional rules based on minimum edit distance table of word pairs and the likelihoods of the rules in a language. These rules are used to statistically stem words and can be used in different text mining tasks. Experimental results on CLEF 2008 and CLEF 2009 English-Persian CLIR tasks indicate that the proposed method significantly outperforms all the baselines in terms of MAP.
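The minimum edit distance table that the method starts from can be built with the standard Levenshtein dynamic programme, sketched below with unit costs as an assumption. Deriving inflectional rules from the table and estimating their likelihoods, as the paper proposes, is not shown.

```python
def edit_distance_table(source, target):
    """Fill the (len(source)+1) x (len(target)+1) Levenshtein table."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete all i characters of source
    for j in range(cols):
        table[0][j] = j  # insert all j characters of target
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # match or substitution
            )
    return table

table = edit_distance_table("sing", "sang")
print(table[-1][-1])  # prints 1
```

The pair "sing"/"sang" differs by a single substitution inside the word, which is the kind of infix change that prefix and suffix matching alone would miss.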
TL;DR: The participation of InfoLab in the patient-centred information retrieval task of the CLEF eHealth 2016 lab is described and the performance of several query expansion strategies using different sources of terms and different methods to select the terms to be added to the original query is analysed.
Abstract: In this paper we describe the participation of InfoLab in the patient-centred information retrieval task of the CLEF eHealth 2016 lab. We analyse the performance of several query expansion strategies using different sources of terms and different methods to select the terms to be added to the original query. One of the strategies uses pseudo relevance feedback for term selection. The other strategies use external sources such as Wikipedia articles and definitions from the UMLS Metathesaurus for term selection. In the end, readability metrics such as SMOG, FOG and Flesch-Kincaid were used to re-rank the documents retrieved using the expanded queries. As the relevance and readability assessments are not available we can’t make any conclusion regarding the results of our approaches.
TL;DR: This work uses synonyms and hypernyms from a large medical ontology to generate alternative formulations for a query and results obtained by the reformulated queries are fused using the Borda rank aggregation algorithm.
Abstract: Recent surveys have shown that a growing number of internet users seek medical help online. Yet, recent research [12] has shown that many commercial search engines still struggle to completely satisfy the information needs of users. In this work, we present a study on the use of medical terms for query reformulation. We use synonyms and hypernyms from a large medical ontology to generate alternative formulations for a query. Results obtained by the reformulated queries are fused using the Borda rank aggregation algorithm.
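Borda rank aggregation can be sketched in a few lines. This version assumes that a document at zero-based rank r in a list of n results earns n - r points, that documents missing from a list earn nothing from it, and that ties are broken by document identifier; the abstract does not specify these details, and the document IDs are hypothetical.

```python
from collections import defaultdict

def borda_fuse(rankings):
    """Fuse ranked lists by summing Borda points per document."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for rank, doc in enumerate(ranking):
            scores[doc] += n - rank  # top document of a list earns n points
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# One result list per reformulated query (hypothetical IDs).
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = borda_fuse(rankings)
print(fused)  # prints ['d2', 'd1', 'd3', 'd4']
```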
TL;DR: The presented results illustrate the potential of multi-dimensional evaluation of recommendation algorithms in a living lab and simulation based evaluation setting, and the paper overviews ideas on living lab evaluation that were presented as part of a “New Ideas” track at the conference in Portugal.
Abstract: Running in its third year at CLEF, NewsREEL challenged participants to develop news recommendation algorithms and have them benchmarked in an online (Task 1) and offline setting (Task 2), respectively. This paper provides an overview of the NewsREEL scenario, outlines this year’s campaign, presents results of both tasks, and discusses the approaches of participating teams. Moreover, it overviews ideas on living lab evaluation that have been presented as part of a “New Ideas” track at the conference in Portugal. Presented results illustrate potentials for multi-dimensional evaluation of recommendation algorithms in a living lab and simulation based evaluation setting.
TL;DR: The focus of the Conference is "Experimental IR" as carried out in the CLEF Labs and other evaluation forums; it featured keynotes by Greg Greffenstette, Mounia Lalmas, and Doug Oard, and 43 papers were presented, covering a wide range of topics.
Abstract: This is a report on the sixth edition of the Conference and Labs of the Evaluation Forum (CLEF 2015), held in early September 2015, in Toulouse, France. CLEF was a four-day event combining a Conference and an Evaluation Forum. The focus of the Conference is "Experimental IR" as carried out in the CLEF Labs and other evaluation forums; it featured keynotes by Greg Greffenstette, Mounia Lalmas, and Doug Oard, and 43 papers were presented, covering a wide range of topics. There were a total of eight Labs: eHealth, ImageCLEF, LifeCLEF, Living Labs for IR, NEWSREEL, PAN, QA, and Social Book Search, addressing a wide range of tasks, media, languages, and ways to go beyond standard test collections.
TL;DR: The article also presents the design of the CLEF (Conference and Labs of the Evaluation Forum) evaluation labs, with special attention paid to CHiC (Cultural Heritage in CLEF).
Abstract: We present the genesis and evolution of methods and measures of IR systems evaluation. The design of the Cranfield experiment, a long-term model for evaluation methodology, is described. The evolution of the current methodology of IR systems evaluation, developed at the annual TREC (Text REtrieval Conference), is outlined, and the most popular current measures are described. The article also presents the design of the CLEF (Conference and Labs of the Evaluation Forum) evaluation labs, with special attention paid to CHiC (Cultural Heritage in CLEF). We describe the design of the Polish Task in the CHiC lab and discuss conclusions from running the lab.
TL;DR: A Part-of-Speech (POS) based query term weighting approach, which assigns different weights to the query terms according to their POS categories, is explored and applied with the optimal weights obtained from the TREC 2011 and 2012 Medical Records Track.
Abstract: This paper presents the participation of MayoNLPTeam in the 2016 CLEF eHealth Information Retrieval Task (IR Task 1: ad-hoc search). We explored a Part-of-Speech (POS) based query term weighting approach which assigns different weights to the query terms according to their POS categories. The weights are learned by defining an objective function based on the mean average precision. We applied the proposed approach with the optimal weights obtained from TREC 2011 and 2012 Medical Records Track into the Query Likelihood model (Run 2) and Markov Random Field (MRF) models (Run 3). The conventional Query Likelihood model was implemented as the baseline (Run 1).
TL;DR: Two different approaches using word vectors learnt by Word2Vec with Wikipedia are attempted to deliver useful medical information through pseudo-relevance feedback, each using the word vectors in a different way.
Abstract: Searching for relevant medical information has become very common as the general public uses the Web as a source of health information. As a response to this phenomenon, there have been a number of approaches to find useful information for diagnosing or understanding health conditions from the Web or the medical literature. As an ongoing effort to deliver useful medical information, we attempted two different approaches using word vectors learnt by Word2Vec with Wikipedia. First, initial documents are obtained using a search engine. Based on the retrieved documents, pseudo-relevance feedback is applied with two different uses of the word vectors. In the first approach, a feedback model is constructed from new relevance scores computed with the word vectors, while in the second it is constructed with a new expanded query.
TL;DR: This paper describes participation in CLEF eHealth 2016 task 3 (Patient-Centred Information Retrieval), which focuses on retrieving clinical web documents for user queries from a health forum, and applies an unsupervised re-ranking method based on multiple features to the documents retrieved by a baseline system.
Abstract: In this paper, we describe our participation in the CLEF eHealth 2016 task 3: Patient-Centred Information Retrieval focusing on the clinical web documents based on user queries in the health forum. In our participation, we submitted three runs in ad-hoc search and two runs in query variation search subtasks. In ad-hoc search, the main challenge is to retrieve high quality clinical documents based on the user query. For ad-hoc search, we apply an unsupervised re-ranking method based on multiple features to the documents retrieved by a baseline system. During the query variation search, the main challenge is to generate a ranked list of documents covering the different variations of the query. To tackle the query variation problem, we first formulate a query and a set of information needs from the query variation. Then, we re-rank the documents retrieved for the formulated query by focusing on the set of information needs.
TL;DR: This paper presents the approach used by the LIG-MRIM research group for task 3 (TimeLine illustration based on Microblogs) of the CLEF Cultural Microblog Contextualization track, which deals with the retrieval of tweets related to cultural events (music festivals).
Abstract: This paper presents the approach used by the LIG-MRIM research group for its participation in task 3 (TimeLine illustration based on Microblogs) of the CLEF Cultural Microblog Contextualization track. This task deals with the retrieval of tweets related to cultural events (music festivals). For the content-based elements, we use the classical BM25 model [4]. Then, we diversify the results based on duplicate removal, using tf-based representations of tweets. In a third step, we apply optional re-ranking related to time-line, activity and popularity of authors of tweets.
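For the content-based step, a BM25 scorer can be sketched as below. The parameter values k1 = 1.2 and b = 0.75, the whitespace tokenizer, and the non-negative idf variant are assumptions of this example, since the abstract only names the model; the tweets are invented.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical tweets.
tweets = [
    "great set at the jazz festival tonight",
    "traffic jam on the highway",
    "jazz festival lineup announced for the festival",
]
scores = bm25_scores("jazz festival", tweets)
print(scores)
```

Tweets that share no term with the query score exactly 0, so they can be dropped before the duplicate-removal and re-ranking steps.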
TL;DR: This paper presents work on the 2016 CLEF eHealth Task 3, using CHV to expand queries and a learning-to-rank algorithm to re-rank the results.
Abstract: This paper presents our work on the 2016 CLEF eHealth Task 3. We used Indri to conduct our experiments. We used CHV to expand queries and proposed a learning-to-rank algorithm to re-rank the results.
TL;DR: The resources created for these tasks, evaluation methodology adopted and a brief summary of participants to this year’s challenges are described and some results obtained are provided.
Abstract: In this paper we provide an overview of the fourth edition of the CLEF eHealth evaluation lab. CLEF eHealth 2016 continues our evaluation resource building efforts around the easing and support of patients, their next of kin and clinical staff in understanding, accessing and authoring eHealth information in a multilingual setting. This year’s lab offered three tasks: Task 1 on handover information extraction related to Australian nursing shift changes, Task 2 on information extraction in French corpora, and Task 3 on multilingual patient-centred information retrieval considering query variations. In total 20 teams took part in these tasks (3 in Task 1, 7 in Task 2 and 10 in Task 3). Herein, we describe the resources created for these tasks and the evaluation methodology adopted, and provide a brief summary of participants in this year’s challenges and some results obtained. As in previous years, the organizers have made data and tools associated with the lab tasks available for future research and development.