TL;DR: Five models are described that utilize linguistic meta information extracted from the texts of each evaluated user and combine it with classifiers based on Bag of Words models, Paragraph Vector, Latent Semantic Analysis, and Recurrent Neural Networks using Long Short Term Memory.
Abstract: Methods for automatic early detection of depressed individuals based on written texts can help in research of this disorder and especially offer better assistance to those affected. FHDO Biomedical Computer Science Group (BCSG) has submitted results obtained from five models for the CLEF 2017 eRisk task for early detection of depression that are described in this paper. All models utilize linguistic meta information extracted from the texts of each evaluated user and combine them with classifiers based on Bag of Words (BoW) models, Paragraph Vector, Latent Semantic Analysis (LSA), and Recurrent Neural Networks (RNN) using Long Short Term Memory (LSTM). BCSG has achieved top performance according to ERDE5 and F1 score for this task.
TL;DR: An overview of eRisk 2017, the first year that this lab was organized at CLEF, explores issues of evaluation methodology, effectiveness metrics and other processes related to early risk detection.
Abstract: This paper provides an overview of eRisk 2017. This was the first year that this lab was organized at CLEF. The main purpose of eRisk was to explore issues of evaluation methodology, effectiveness metrics and other processes related to early risk detection. Early detection technologies can be employed in different areas, particularly those related to health and safety. The first edition of eRisk had two possible ways to participate: a pilot task on early risk detection of depression, and a workshop open to the submission of papers related to the topics of the lab.
TL;DR: This work applied the Baseline Model Implementation from the TREC Total Recall Track to the CLEF eHealth 2017 task of screening MEDLINE abstracts to identify articles reporting studies to be considered for inclusion; the results suggest that TAR can substantially improve the efficiency of abstract screening without compromising recall.
TL;DR: The participation of the KFU team in the CLEF eHealth 2017 challenge is described, namely “Multilingual Information Extraction ICD-10 coding”, for which they implemented recurrent neural networks to automatically assign ICD10 codes to fragments of death certificates written in English.
Abstract: This paper describes the participation of the KFU team in the CLEF eHealth 2017 challenge. Specifically, we participated in Task 1, namely “Multilingual Information Extraction ICD-10 coding”, for which we implemented recurrent neural networks to automatically assign ICD10 codes to fragments of death certificates written in English. Our system uses Long Short-Term Memory (LSTM) to map the input sequence into a vector representation, and then another LSTM to decode the target sequence from the vector. We initialize the input representations with word embeddings trained on user posts in social media. The encoder-decoder model obtained an F-measure of 85.01% on a full test set, a significant improvement over the average score of 62.2% for all participants’ approaches. We also obtained a significant improvement from 26.1% to 44.33% on an external test set as compared to the average score of the submitted runs.
TL;DR: A Learning to Rank system that uses a novel set of syntactic and semantic features to improve consumer health search and was evaluated on the 2016 CLEF eHealth dataset, outperforming the best method.
Abstract: For many internet users, searching for health advice online is the first step in seeking treatment. We present a Learning to Rank system that uses a novel set of syntactic and semantic features to improve consumer health search. Our approach was evaluated on the 2016 CLEF eHealth dataset, outperforming the best method by 26.6% in NDCG@10.
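The NDCG@10 figure quoted above can be made concrete with a small sketch. This is a minimal illustration of one common NDCG formulation (linear gains, log2 discount), not the evaluation script used for the CLEF eHealth collection; the function names are hypothetical, and the ideal ranking is assumed to be a re-sorting of the same list of graded relevance values.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k graded relevance values."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(gains: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    Assumption: the ideal ranking is the same list sorted by decreasing gain.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([3, 2, 1])` scores a perfectly ordered list, while `ndcg_at_k([1, 2, 3])` scores the same documents in reverse order and comes out lower.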
TL;DR: The techniques employed for the University of Arizona team's participation in this early risk detection shared task leveraged external information beyond the small training set, including a preexisting depression lexicon and concepts from the Unified Medical Language System as features.
Abstract: The 2017 CLEF eRisk pilot task focuses on automatically detecting depression as early as possible from a user's posts to Reddit. In this paper we present the techniques employed for the University of Arizona team's participation in this early risk detection shared task. We leveraged external information beyond the small training set, including a preexisting depression lexicon and concepts from the Unified Medical Language System as features. For prediction, we used both sequential (recurrent neural network) and non-sequential (support vector machine) models. Our models perform decently on the test data, and the recurrent neural models perform better than the non-sequential support vector machines while using the same feature sets.
TL;DR: It is determined that the TFIDF-based model is the best one for language variety classification and that the Deep-Learning model achieves the highest accuracy on gender classification.
Abstract: This paper describes and evaluates a strategy for author profiling using TF-IDF and a Deep-Learning model based on Convolutional Neural Networks. We applied this strategy to the author profiling task of the PAN17 challenge and show that it can be applied to different languages (English, Spanish, Portuguese and Arabic). As features, we suggest using a simple cleaning method for both models, and for the Deep-Learning model, a matrix of 2-grams of letters with punctuation marks, including beginning and ending 2-grams. Applying this strategy, we determine that the TFIDF-based model is the best one for language variety classification and that the Deep-Learning model achieves the highest accuracy on gender classification. The evaluations are based on four tweet collections (PAN AUTHOR PROFILING task at CLEF 2017).
TL;DR: The participation of the Information Management Systems (IMS) group at CLEF eHealth 2017 Task 2 is described; the task focuses on the problem of systematic reviews, that is, articles that summarise all the evidence published on a certain medical topic.
Abstract: In this paper, we describe the participation of the Information Management Systems (IMS) group at CLEF eHealth 2017 Task 2. This task focuses on the problem of systematic reviews, that is, articles that summarise all the evidence published on a certain medical topic. This task, known in Information Retrieval as the total recall problem, requires long and tedious search sessions by experts in the field of medicine. Automatic (or semi-automatic) approaches are essential to support this type of search when the amount of data exceeds the limits of users, e.g. in terms of attention or patience. We present the two-dimensional probabilistic version of BM25 with explicit relevance feedback together with a query aspect rewriting approach for both the simple evaluation and the cost-effective evaluation.
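As background for the BM25-based approach mentioned above, the following is a minimal sketch of the standard Okapi BM25 scoring function. It is not the two-dimensional probabilistic variant used by the authors and includes no relevance feedback or query rewriting; the function name, the pre-tokenized input format, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: List[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed, non-negative inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In a screening setting, the documents would be ranked by decreasing score so that likely relevant abstracts are read first.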
TL;DR: This paper proposes methods for the author identification task, which is divided into author clustering and style breach detection, using locality-sensitive hashing based clustering of real-valued vectors and a statistical approach based on several tf-idf features that characterize documents.
Abstract: In this paper, we propose methods for the author identification task, which is divided into author clustering and style breach detection. Our solution to the first problem consists of locality-sensitive hashing based clustering of real-valued vectors, which are mixtures of stylometric features and bag of n-grams. For the second problem, we propose a statistical approach based on several tf-idf features that characterize documents. Applying the Wilcoxon Signed Rank test to these features, we determine the style breaches.
TL;DR: The first systematic and large-scale longitudinal study on several CLEF Adhoc-ish tasks is conducted, providing quantitative evidence that CLEF has achieved the objective which led to its establishment, i.e. making multilingual information access a reality for European languages.
Abstract: Multilingual information access and retrieval is a key concern in today's global society and, despite the considerable achievements over the past years, it still presents many challenges. In this context, experimental evaluation represents a key driver of innovation and multilinguality is tackled in several evaluation initiatives worldwide, such as CLEF in Europe, NTCIR in Japan and Asia, and FIRE in India. All these activities have run several evaluation cycles and there is a general consensus about their strong and positive impact on the development of multilingual information access systems. However, a systematic and quantitative assessment of the impact of evaluation initiatives on multilingual information access and retrieval over the long term is still missing. Therefore, in this paper we conduct the first systematic and large-scale longitudinal study on several CLEF Adhoc-ish tasks – namely the Adhoc, Robust, TEL, and GeoCLEF labs – in order to gain insights on the performance trends of monolingual, bilingual and multilingual information access systems, spanning several European and non-European languages, over a range of 10 years. We learned that monolingual retrieval exhibits a stable positive trend for many of the languages analyzed, even though the performance increase is not always steady from year to year due to the varying interests of the participants, who may not always be focused on just increasing performance. Bilingual retrieval demonstrates higher improvements in recent years – probably due to the better language resources now available – and it also outperforms monolingual retrieval in several cases. Multilingual retrieval shows improvements over the years and performance is comparable to that of bilingual and monolingual retrieval, and sometimes even better.
Moreover, we have found evidence that the rule-of-thumb of a 3-year duration for an evaluation task is typically enough since top performances are usually reached by the third year and sometimes even by the second year, which then leaves room for research groups to investigate relevant research issues other than top performances. Overall, this study provides quantitative evidence that CLEF has achieved the objective which led to its establishment, i.e. making multilingual information access a reality for European languages. However, the outcomes of this paper not only indicate that CLEF has steered the community in the right direction, but they also highlight the many open challenges for multilinguality. For instance, multilingual technologies greatly depend on language resources and targeted evaluation cycles help not only in developing and improving them, but also in devising methodologies which are more and more language-independent. Another key aspect concerns multimodality, intended not only as the capability of providing access to information in multiple media, but also as the ability of integrating access and retrieval over different media and languages in a way that best fits with user needs and tasks.
TL;DR: SIBM’s participation in the Task 1: Multilingual Information Extraction ICD10 coding of the CLEF eHealth 2017 evaluation initiative which focuses on named entity recognition in French and English death certificates is presented.
Abstract: This paper presents SIBM’s participation in Task 1: Multilingual Information Extraction ICD10 coding of the CLEF eHealth 2017 evaluation initiative, which focuses on named entity recognition in French and English death certificates. We addressed the identification of relevant clinical entities within the International Classification of Diseases version 10 (ICD10) in the CépiDC and CDC datasets with our CIM-IND system. CIM-IND is a multilingual system designed to recognize named entities in French and English texts using a dictionary-based approach together with natural language processing and fuzzy matching methods. The evaluation was performed for two cases: (i) for all ICD10 codes, the main evaluation for the task, and (ii) for ICD10 codes addressing a particular type of death, called external causes or violent deaths. On the English test set, our system obtained F-scores of 0.81 for all ICD10 codes and 0.4066 for external causes. On the French aligned test set, our system obtained F-scores of 0.8038 for all ICD10 codes and 0.5011 for external causes. On the French raw test set, our system obtained F-scores of 0.7636 for all ICD10 codes and 0.4897 for external causes. These scores were substantially higher than the average score of the systems that participated in the challenge.
TL;DR: The intention of the collection is to allow research groups working on PIR to both experiment with and provide feedback about the proposed PIR evaluation methodology with the aim of launching a more formal PIR Lab at CLEF 2018.
Abstract: The Personalised Information Retrieval Pilot Lab (PIR-CLEF 2017) provides a forum for the exploration of evaluation of personalised approaches to information retrieval (PIR). The Pilot Lab provides a preliminary edition of a Lab task dedicated to personalised search. The PIR-CLEF 2017 Pilot Task is the first evaluation benchmark based on the Cranfield paradigm, with the potential benefits of producing evaluation results that are easily reproducible. The task is based on search sessions over a subset of the ClueWeb12 collection, undertaken by 10 users using a clearly defined and novel methodology. The collection provides data gathered from the activities undertaken during the search sessions by each participant, including details of relevant documents as marked by the searchers. The intention of the collection is to allow research groups working on PIR to both experiment with and provide feedback about our proposed PIR evaluation methodology with the aim of launching a more formal PIR Lab at CLEF 2018.
TL;DR: The CLEF NewsREEL challenge allows researchers to evaluate news recommendation algorithms both online (NewsREEL Live) and offline (NewsREEL Replay). Compared with the previous year, participants faced a higher volume of messages and new news portals.
Abstract: The CLEF NewsREEL challenge allows researchers to evaluate news recommendation algorithms both online (NewsREEL Live) and offline (NewsREEL Replay). Compared with the previous year, NewsREEL challenged participants with a higher volume of messages and new news portals. In the 2017 edition of the CLEF NewsREEL challenge, a wide variety of new approaches have been implemented, ranging from the use of existing machine learning frameworks, to ensemble methods, to the use of deep neural networks. This paper gives an overview of the implemented approaches and discusses the evaluation results. In addition, the main results of the Living Lab and the Replay task are explained.
TL;DR: The evaluation demonstrates that the suggested methods show slightly better performance for full document screening than abstract screening, and the role of convolutional neural networks for classifying medical documents for systematic reviews is examined.
Abstract: Identifying eligible documents for systematic reviews is one of the most time-consuming steps in writing the reviews. The process, from retrieving numerous clinical documents to manually checking the documents against detailed criteria, requires a tremendous amount of time and a skilled workforce. In this paper, to increase the efficiency of the process, we examine the role of convolutional neural networks for classifying medical documents for systematic reviews. The analysis is carried out in the context of our participation in the CLEF 2017 eHealth Task 2. The evaluation demonstrates that the suggested methods show slightly better performance for full document screening than abstract screening.
TL;DR: The approach proposes a two tier, two stage process that uses a rule-based system, based on handcrafted rules and the use of Apache Solr, to perform ICD-10 code Named Entity Recognition (NER), and uses tf-idf weighted character n-gram classification models to normalize and rank a previously generated ICD-10 candidate set.
Abstract: In this paper we present our research efforts and obtained results within the CLEF eHealth challenge 2017, Track 1. The task involves the recognition and mapping of ICD-10 codes to English and French death certificates. Our approach proposes a two tier, two stage process. First, we use a rule-based system, based on handcrafted rules and the use of Apache Solr, to perform ICD-10 code Named Entity Recognition (NER). This step produces a set of possible candidates extracted from the input text. Next, we use tf-idf weighted character n-gram classification models to normalize and rank a previously generated ICD-10 candidate set. Classification models used are generated and follow the hierarchical structure of the given ICD-10 dictionaries, by creating individual classification models for the first two hierarchical levels (chapters and blocks). Finally, the top candidate, generated from the overlap between the list of possible ICD-10 code candidates (input list) and ranked list of final ICD-10 candidates (output list), is taken as the final ICD-10 code. Although the ICD-10 candidate NER is language-dependent, the normalization and ranking of candidates utilizes a language independent approach.
TL;DR: The results suggest that automatic assistance is promising for ranking the DTA literature as it could reduce the screening workload for review writers by 65%.
Abstract: This paper describes the participation of the LIMSI-MIROR team at CLEF eHealth 2017, task 2. The task addresses the automatic ranking of articles in order to assist with the screening process of Diagnostic Test Accuracy (DTA) Systematic Reviews. We used a logistic regression classifier and handled class imbalance using a combination of class reweighting and undersampling. We also experimented with two strategies for relevance feedback. Our best run obtained an overall Average Precision of 0.179 and Work Saved over Sampling @95% Recall of 0.650. This run uses stochastic gradient descent for training but no feature selection or relevance feedback. We observe high performance variation within the queries in the test set. Nonetheless, our results suggest that automatic assistance is promising for ranking the DTA literature as it could reduce the screening workload for review writer by 65% on
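The link between a Work Saved over Sampling @95% Recall of 0.650 and a workload reduction of about 65% follows from how the measure is commonly defined: the fraction of the ranked list that does not need to be screened once the target recall is reached, minus the fraction (1 - recall) that random sampling would already save. The sketch below assumes that common definition and a hypothetical input format (a ranked list of 0/1 relevance labels); it is not the official evaluation script of the task.

```python
import math
from typing import Sequence


def wss_at_recall(ranked_labels: Sequence[int], recall: float = 0.95) -> float:
    """Work Saved over Sampling for a ranked list of 0/1 relevance labels."""
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    total = len(ranked_labels)
    relevant = sum(ranked_labels)
    if relevant == 0:
        raise ValueError("at least one relevant document is required")
    # Number of relevant documents that must be found to reach the target recall.
    needed = math.ceil(recall * relevant)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            # Share of the list left unscreened, minus the random-sampling baseline.
            return (total - screened) / total - (1.0 - recall)
    raise AssertionError("unreachable: the target recall is always reached")
```

A ranking that places all relevant articles at the top yields a value close to the recall level, while a ranking that places them at the bottom yields a negative value.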
TL;DR: The participation of a group of students supervised by two teachers in the CLEF eHealth 2017 campaign, task 1, is described; the task involves the classification of death certificates in French and, more precisely, the labelling of each cause of death with the relevant ICD10 code.
Abstract: This paper describes the participation of a group of students supervised by two teachers in the CLEF eHealth 2017 campaign, task 1. The task involves the classification of death certificates in French and, more precisely, the labelling of each cause of death with the relevant ICD10 code. The system that performs the automatic coding is based on an information retrieval method using the Solr interface. Two runs were submitted according to whether the system distinguishes cases of multiple causes or not. The best performance was obtained with the system which distinguishes multiple causes, with a precision of 0.61 and a recall of 0.55.
TL;DR: This paper presents the participation as the team of IIIT Hyderabad at Task2 Technologically Assisted Reviews in Empirical Medicine as an effort to automate this task and deliver relevant information in medical literature.
Abstract: Observational evidence in clinical practice is critical in healthcare and policy making. Researchers spend a lot of time searching for relevant published articles to write a systematic review of a topic. In this paper, we present our participation as the team of IIIT Hyderabad at Task2 Technologically Assisted Reviews in Empirical Medicine as an effort to automate this task and deliver relevant information in medical literature. We base our approach on query expansion according to relevance feedback. Query expansion is a standard technique in information retrieval tasks with growing use in medical literature [1, 2]. Articles returned from the PubMed query performed during a systematic review are first indexed using Lucene’s inverted index. The query is processed for term boosting and fuzzy search and is used for scoring documents according to TF-IDF similarity. Relevance feedback is used to update the query and make it more pragmatic.
TL;DR: The participation of INSA Lyon and UNI Passau at the PAN 2017 Author Profiling task is described; the features and machine learning algorithm are adapted for each language and each classification task by selecting the configuration that provides the best results in terms of prediction performance.
Abstract: This paper describes the participation of INSA Lyon and UNI Passau at the PAN 2017 Author Profiling task. Given the language and tweets from an author, the goal is to predict his/her gender and language variety. We consider two strategies: a "loose" classification that learns one predictive model for the gender and another one for the variety, and a "successive" classification that first predicts the gender and then learns a predictive model for variety, given the gender. We consider all the languages. We experiment with various feature representations and machine learning algorithms used in previous PAN Author Profiling editions in order to learn the models. We adapt the features and machine learning algorithm used for each language and each classification task by selecting the configuration that provides the best results in terms of prediction performance.
TL;DR: An analysis to assess the variability of the performance measures indicates that the system works stably regardless of the underlying text collection and that the parameter choices did not over-fit to the training data.
Abstract: This paper describes and evaluates an effective unsupervised author clustering and authorship linking model called SPATIUM. The suggested strategy can be adapted without any difficulty to different languages (such as Dutch, English, and Greek) in different text genres (e.g., newspaper articles and reviews). As features, we suggest using the m most frequent terms (isolated words and punctuation symbols) or the m most frequent character n-grams of each text. Applying a simple distance measure, we determine whether there is enough indication that two texts were written by the same author. The evaluations are based on 60 training and 120 test problems (PAN AUTHOR CLUSTERING task at CLEF 2017). Using the most frequent terms results in a higher clustering precision, while using the most frequent character n-grams of letters gives a higher clustering recall. An analysis to assess the variability of the performance measures indicates that our system works stably regardless of the underlying text collection and that our parameter choices did not over-fit to the training data.
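The feature and distance idea described above can be sketched in a few lines. This is a minimal illustration, not the SPATIUM implementation: the abstract only speaks of "a simple distance measure", so the L1 (Manhattan) distance over relative frequencies used here is an assumption, as are the tokenizer, the function names, and the default `m=200`.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Split a text into lowercase words and isolated punctuation symbols."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def profile_distance(query_text: str, other_text: str, m: int = 200) -> float:
    """L1 distance between relative frequencies of the query's m most frequent terms."""
    q_counts = Counter(tokenize(query_text))
    o_counts = Counter(tokenize(other_text))
    # Guard against empty texts to avoid division by zero.
    q_total = sum(q_counts.values()) or 1
    o_total = sum(o_counts.values()) or 1
    top_terms = [term for term, _ in q_counts.most_common(m)]
    return sum(abs(q_counts[t] / q_total - o_counts[t] / o_total) for t in top_terms)
```

A small distance would then be read as an indication of shared authorship; the threshold that counts as "enough indication" is left open here because the abstract does not state it.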
TL;DR: A supervised author profiling model is described and an analysis of the top ranked terms from a feature selection method allows a better understanding of the proposed assignments and presents typical writing styles for each category.
Abstract: This paper describes and evaluates a supervised author profiling model. The suggested strategy can be adapted without any problem to various languages (such as Arabic, English, Spanish, and Portuguese). As features, we suggest using the m most frequent terms of the query text (isolated words and punctuation symbols with m at most 200). Applying a simple distance measure and looking at the nearest text profiles, we can determine the gender (with the nominal values “male” or “female”) and the language variety (e.g., in Spanish the nominal values “Argentina”, “Chile”, “Colombia”, “Mexico”, “Peru”, “Spain”, or “Venezuela”). The training and test data is available for Twitter tweets (PAN AUTHOR PROFILING task at CLEF 2017). An analysis of the top ranked terms from a feature selection method allows a better understanding of the proposed assignments and presents typical writing styles for each category.
TL;DR: The participation of the Language and Reasoning Research Group of UAM Cuajimalpa at eRisk 2017 pilot task: Early Risk Prediction on the Internet is described and results indicate that more experiments are required, as well as a more thorough analysis, regarding the pertinence of the proposed strategy.
Abstract: In this paper we describe the participation of the Language and Reasoning Research Group of UAM Cuajimalpa at eRisk 2017 pilot task: Early Risk Prediction on the Internet. The goal of the eRisk task is to detect cases of depression from texts early enough. For evaluating this task, organizers provided a dataset containing comments from a set of Social Media users. All comments are chronologically ordered and represent writings from depressed and non-depressed users. Our proposed approach addressed this problem by means of graph models. This type of representation makes it possible to capture some inherent characteristics of documents that can be determined through traditional graph measurements and then employed as features in a supervised classification system. The results obtained indicate that more experiments, as well as a more thorough analysis, are required to draw conclusions about the pertinence (or not) of our proposed strategy.
TL;DR: The CLEF 2017 eHealth Evaluation Lab Information Retrieval Task investigated the effectiveness of web search engines in providing access to medical information for common people that have no or little medical knowledge (health consumers).
Abstract: This paper provides an overview of the information retrieval (IR) Task of the CLEF 2017 eHealth Evaluation Lab. This task investigates the effectiveness of web search engines in providing access to medical information for common people that have no or little medical knowledge (health consumers). The task aims to foster advances in the development of search technologies for consumer health search by providing resources and evaluation methods to test and validate search systems. The problem considered in this year's task was to retrieve web pages to support the information needs of health consumers that are faced with a medical condition and that want to seek relevant health information online through a search engine. The task re-used the 2016 topics, to deepen the assessment pool and create a more comprehensive and reusable collection. The task had four sub-tasks: ad-hoc search, personalized search, query variations, and multilingual ad-hoc search. Seven teams participated in the task; relevance assessment is underway and assessments along with the participants' results will be released at the CLEF 2017 conference. Resources for this task, including topics, assessments, evaluation scripts and participant runs are available at the task's GitHub repository: https://github.com/CLEFeHealth/CLEFeHealth2017IRtask/.
TL;DR: The overall objectives of the CLEF 2017 Dynamic Search Lab, the resources created for the pilot task, and the evaluation methodology adopted are described.
Abstract: In this paper we provide an overview of the first edition of the CLEF Dynamic Search Lab. The CLEF Dynamic Search lab ran in the form of a workshop with the goal of approaching one key question: how can we evaluate dynamic search algorithms? Unlike static search algorithms, which essentially consider user requests independently and do not adapt the ranking w.r.t. the user’s sequence of interactions, dynamic search algorithms try to infer the user’s intentions from their interactions and then adapt the ranking accordingly. Personalized session search, contextual search, and dialog systems often adopt such algorithms. This lab provides an opportunity for researchers to discuss the challenges faced when trying to measure and evaluate the performance of dynamic search algorithms, given the context of available corpora, simulation methods, and current evaluation metrics. To seed the discussion, a pilot task was run with the goal of producing search agents that could simulate the process of a user interacting with a search system over the course of a search session. Herein, we describe the overall objectives of the CLEF 2017 Dynamic Search Lab, the resources created for the pilot task and the evaluation methodology adopted.
TL;DR: The IMS group at CLEF eHealth 2017 tackled this task by focusing on the replicability and reproducibility of the experiments and, in particular, on building a basic compact system that produces a clean dataset that can be used to implement more sophisticated approaches.
Abstract: In this paper, we describe the participation of the Information Management Systems (IMS) group at CLEF eHealth 2017 Task 1. In this task, participants are required to extract causes of death from death reports (in French and in English) and label them with the correct International Classification of Diseases (ICD10) code. We tackled this task by focusing on the replicability and reproducibility of the experiments and, in particular, on building a basic compact system that produces a clean dataset that can be used to implement more sophisticated approaches.
TL;DR: The goal of the timeline illustration track is to study approaches that better retrieve microblogs issued during a cultural event, in order to get a glimpse of the attendees’ perception.
Abstract: The MC2 CLEF 2017 lab investigates the relationship between cultural microblogs and their social context. This involves microblog search, classification, filtering, language recognition, localization, entity extraction, linking open data, and summarization. The goal of the timeline illustration track is to study approaches that better retrieve microblogs issued during a cultural event, in order to get a glimpse of the attendees’ perception. Regular Lab participants have access to the private massive multilingual microblog stream of The Festival Galleries project. Festivals have a large presence on social media. The topics were in four languages: Arabic, English, French and Spanish, and results were expected in any language.