TL;DR: This paper presents the results of task 3 of the ShARe/CLEF eHealth Evaluation Lab 2013, investigating the effect of using additional information such as the discharge summaries and external resources such as medical ontologies on the IR effectiveness.
Abstract: This paper presents the results of task 3 of the ShARe/CLEF eHealth Evaluation Lab 2013. This evaluation lab focuses on improving access to medical information on the web. The task objective was to investigate the effect of using additional information such as the discharge summaries and external resources such as medical ontologies on the IR effectiveness. The participants were allowed to submit up to seven runs: one mandatory run using no additional information or external resources, and three each using or not using discharge summaries.
TL;DR: This paper reports on Task 2 of the 2014 ShARe/CLEF eHealth evaluation lab, which extended Task 1 of the 2013 ShARe/CLEF eHealth evaluation lab by focusing on template filling of disorder attributes: participants were instructed to develop a system that kept or updated a default attribute value for each task.
Abstract: In this pilot study, we aimed to generate a reference standard of clinical acronyms and abbreviations normalized to concepts from a standardized, medical vocabulary for the ShARe/CLEF eHealth 2013 challenge. In this paper, we review prior text normalization shared tasks, reference standard generation approaches, and recent clinical acronym and abbreviation normalization research. We report inter-annotator agreement for the reference standard and performance for participant systems.
TL;DR: The first task of the 2013 ShARe/CLEF challenge was to extract disorder mention spans and their associated UMLS (Unified Medical Language System) concept unique identifiers (CUIs).
Abstract: The ShARe/CLEF eHealth Evaluation Lab (SHEL) organized a challenge on natural language processing (NLP) and information retrieval (IR) in the medical domain in 2013. The first task of the 2013 ShARe/CLEF challenge was to extract disorder mention spans and their associated UMLS (Unified Medical Language System) concept unique identifiers (CUIs). We participated in Task 1 and developed a clinical disorder recognition and encoding system. The proposed system consists of two components: a machine learning-based approach to recognize disorder entities and a vector space model-based method to encode disorders to UMLS CUIs. The challenge organizers manually annotated disorder entities and corresponding UMLS CUIs in 298 clinical notes, of which 199 notes were used for training and 99 were for testing. Evaluation on the test data set showed that our system achieved the best F-measure of 0.750 for entity recognition (ranked first) and the highest F-measure of 0.514 for UMLS CUI encoding (ranked third), indicating the promise of the proposed approaches.
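The vector space encoding step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' system: it assumes raw term-frequency vectors and cosine similarity, and the lexicon below is a hypothetical toy dictionary whose identifiers (`CUI_A`, `CUI_B`, `CUI_C`) are placeholders rather than real UMLS CUIs.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector for a lowercased string."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy lexicon; the keys are placeholders, not real UMLS CUIs.
LEXICON = {
    "CUI_A": "myocardial infarction",
    "CUI_B": "atrial fibrillation",
    "CUI_C": "chronic kidney disease",
}

def encode(mention, lexicon=LEXICON):
    """Return the concept ID whose name is most similar to the mention,
    or None when no concept name shares a term with the mention."""
    mention_vec = vectorize(mention)
    scored = [(cosine(mention_vec, vectorize(name)), cid)
              for cid, name in lexicon.items()]
    best_score, best_id = max(scored)
    return best_id if best_score > 0 else None
```

For example, `encode("acute myocardial infarction")` selects `CUI_A`, while a mention with no overlapping terms returns `None`. A real system would likely weight terms (for instance with IDF) and normalize lexical variants before comparison.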
TL;DR: The approach to the author profiling task is summarized: ensemble-based classification on a large feature set, with an evaluation of different methods and classification approaches.
Abstract: This paper summarizes our approach to the author profiling task, a part of the evaluation lab PAN'13. We have used ensemble-based classification on a large feature set. All the features are roughly described and the experimental section provides an evaluation of different methods and classification approaches.
TL;DR: Task 1 of the ShARe/CLEF eHealth evaluation lab pilot focused on identification and normalization of diseases and disorders in clinical reports; the best systems had an F1 score of 0.75 (0.80 Precision, 0.71 Recall) in Task 1a and an accuracy of 0.59 in Task 1b.
Abstract: This report outlines Task 1 of the ShARe/CLEF eHealth evaluation lab pilot. This task focused on identification (1a) and normalization (1b) of diseases and disorders in clinical reports. It used annotations from the ShARe corpus. A total of 22 teams competed in Task 1a and 17 of them also participated in Task 1b. The best systems had an F1 score of 0.75 (0.80 Precision, 0.71 Recall) in Task 1a and an accuracy of 0.59 in Task 1b. The organizers have made the text corpora, annotations, and evaluation tools available for future research and development.
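As a quick consistency check on the figures reported above, the F1 score is the harmonic mean of precision and recall. The short sketch below (the function name `f1_score` is our own) recomputes the Task 1a value from the stated precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Best Task 1a system as reported in the abstract: P = 0.80, R = 0.71.
best_task_1a = f1_score(0.80, 0.71)
print(round(best_task_1a, 2))  # 0.75
```

The unrounded value is about 0.752, which agrees with the reported F1 of 0.75.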
TL;DR: The preparation of the data sets, the definition of the background collections, the metric used for the evaluation of the systems' submissions, and the results are described.
Abstract: This paper describes the Question Answering for Machine Reading (QA4MRE) Main Task at the 2013 Cross Language Evaluation Forum. In the main task, systems answered multiple-choice questions on documents concerned with four different topics. There were also two pilot tasks, Machine Reading on Biomedical Texts about Alzheimer's disease, and Japanese Entrance Exams. This paper describes the preparation of the data sets, the definition of the background collections, the metric used for the evaluation of the systems' submissions, and the results. We introduced two novelties this year: auxiliary questions to evaluate systems' level of inference, and a portion of questions where none of the options were correct. Nineteen groups participated in the task submitting a total of 77 runs in five languages.
TL;DR: This paper describes the approach at the PAN@CLEF2013 plagiarism detection competition: a method combining TF-IDF, PatTree and Weighted TF-IDF extracts the keywords of suspicious documents as queries to retrieve the plagiarism source document, and a method based on sentence similarity performs text alignment.
Abstract: In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. In the Source Retrieval sub-task, we propose a method that combines TF-IDF, PatTree and Weighted TF-IDF to extract the keywords of suspicious documents as queries to retrieve the plagiarism source document. In the Text Alignment sub-task, a method based on sentence similarity is presented. Our text alignment algorithm and our similar-sentence merging algorithm, called the Bilateral Alternating Merging Algorithm, are described in detail.
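To make the keyword-extraction idea concrete, here is a minimal sketch of plain TF-IDF keyword selection for query formulation. It covers only the TF-IDF ingredient; the PatTree and Weighted TF-IDF components named in the abstract are not reproduced, and the function name, the smoothed IDF formula and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, background_docs, k=3):
    """Return the k tokens of doc_tokens with the highest TF-IDF score.

    background_docs is a list of token lists used to estimate document
    frequencies. The IDF is smoothed as log((N + 1) / (df + 1)), an
    assumption of this sketch, so unseen terms do not divide by zero.
    """
    n_docs = len(background_docs)
    doc_freq = Counter()
    for tokens in background_docs:
        doc_freq.update(set(tokens))
    term_freq = Counter(doc_tokens)
    scores = {
        term: count * math.log((n_docs + 1) / (doc_freq[term] + 1))
        for term, count in term_freq.items()
    }
    # Sort by descending score, then alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:k]]

# Toy usage: the selected keywords would be joined into a search query.
background = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
suspicious = ["the", "plagiarism", "plagiarism", "detector", "the"]
query = " ".join(tfidf_keywords(suspicious, background, k=2))
```

Terms that occur in every background document receive a score of zero and drop out of the query, which is the behaviour one wants for stop-word-like tokens.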
TL;DR: A method for profiling the author of an anonymous text, based on learning the author profile with a focus on the age and gender dimensions, which showed a high level of accuracy and effectiveness on the gender dimension and obtained the best accuracy for the entire PAN 2013 competition.
Abstract: In this paper, we present a method for profiling the author of an anonymous text. Our approach is based on learning the author profile with a focus on dimensions age and gender. Our system takes as input a document which is written in English or in Spanish and generates the age and the gender of its author. First, we computed a ranked list of words that occur in the corpus and we grouped them into classes according to their similarities. Then, we calculated the TF * IDF score of each class for each document in order to find the stylistic differences between men and women, on the one hand, and those between different age intervals on the other hand. After that, we applied the learning process on 66% of the English and the Spanish corpora using decision trees through the J48 algorithm. In fact, we got the second place in the competition for the English corpus. Our system has shown a high level of accuracy and effectiveness in treating the gender dimension and we got the best accuracy for the entire PAN 2013 competition.
TL;DR: Comparison to other PAN 2013 submissions for the same task shows the presented plagiarism source retrieval system to be one of the top performers.
Abstract: This paper details the approach of implementing an English plagiarism source retrieval system to be presented at PAN 2013. The system uses the TextTiling algorithm to break a given document into segments that are centered around certain topics within the document. From these segments, keyphrases are generated using the KPMiner keyphrase extraction system. These keyphrases and segments are then used in generating queries indicative of the segment, and consequently the document. The queries are submitted to ChatNoir for finding plagiarism sources in the ClueWeb09 corpus from which the pan13 dataset is plagiarized. The target is to lessen the overall search effort while maximizing the performance by scoring unconsumed queries against the already downloaded candidate sources. Comparison to other PAN 2013 submissions for the same task shows the presented system to be one of the top performers.
TL;DR: The Australian e-Health Research Centre (AEHRC) recently participated in the ShARe/CLEF eHealth Evaluation Lab Task 1, to individuate mentions of disorders in free-text electronic health records and map disorders to SNOMED CT concepts in the UMLS metathesaurus.
Abstract: The Australian e-Health Research Centre (AEHRC) recently participated in the ShARe/CLEF eHealth Evaluation Lab Task 1. The goal of this task is to individuate mentions of disorders in free-text electronic health records and map disorders to SNOMED CT concepts in the UMLS metathesaurus. This paper details our participation in this ShARe/CLEF task. Our approaches are based on using the clinical natural language processing tool Metamap and Conditional Random Fields (CRF) to individuate mentions of disorders and then to map those to SNOMED CT concepts.
Empirical results obtained on the 2013 ShARe/CLEF task highlight that our instance of Metamap (after filtering irrelevant semantic types), although achieving a high level of precision, is only able to identify a small proportion of disorders (about 21% to 28%) from free-text health records. On the other hand, the addition of the CRF models allows for a much higher recall (57% to 79%) of disorders from free-text, without sensible detriment in precision. When evaluating the accuracy of the mapping of disorders to SNOMED CT concepts in the UMLS, we observe that the mapping obtained by our filtered instance of Metamap delivers state-of-the-art effectiveness if only spans individuated by our system are considered ('relaxed' accuracy).
TL;DR: An existing NLP system developed at Kaiser Permanente was modified to output concepts that were close to the disorder definition of the ShARe/CLEF 2013 NLP Challenge, and a post-filter was created to subset the concepts with the source (SNOMED) and semantic types expected by the Challenge.
Abstract: We participated in both tasks 1a and 1b of the ShARe/CLEF 2013 NLP Challenge, where 1a was on detecting disorder concept boundaries and 1b was on assigning concept IDs to the entities from 1a. An existing NLP system developed at Kaiser Permanente was modified to output concepts that were close to the disorder definition of the Challenge. The core pipeline involved deterministic section detection, tokenization, sentence chunking, probabilistic POS tagging, rule-based phrase chunking, terminology look-up (using UMLS 2012AB), rule-based concept disambiguation and post-coordination. The system originally identifies findings (both normal and abnormal), procedures, anatomies, etc., and therefore a post-filter was created to subset the concepts with the source (SNOMED) and semantic types expected by the Challenge. A list of frequency-ranked CUIs was extracted from the training corpus to help break ties when multiple concepts were proposed on a single set of spans. However, no retraining/customization was made to meet the boundary annotation preference specified in the challenge guidelines. Our best settings achieved an F-score of 0.503 (was 0.684 with relaxed boundary penalty) in task 1a, and best accuracy of 0.443 (was 0.865 on relaxed boundaries) in task 1b.
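The frequency-based tie-breaking described in this abstract can be sketched as follows. This is an illustrative reading, not the Kaiser Permanente implementation: the function names and the placeholder concept IDs are assumptions, and ties in frequency are resolved alphabetically only to keep the sketch deterministic.

```python
from collections import Counter

def build_cui_frequencies(training_cuis):
    """Count how often each concept ID was annotated in the training corpus."""
    return Counter(training_cuis)

def break_tie(candidates, frequencies):
    """Pick the candidate seen most often in training.

    Candidates never seen in training count as frequency 0; remaining
    ties fall back to alphabetical order (an assumption of this sketch).
    """
    return min(candidates, key=lambda cui: (-frequencies[cui], cui))

# Placeholder IDs, not real UMLS CUIs.
frequencies = build_cui_frequencies(["X", "X", "Y", "Z", "X", "Y"])
chosen = break_tie(["Y", "Z"], frequencies)
```

Here `chosen` is `"Y"`, because `"Y"` was annotated twice in the toy training list and `"Z"` only once.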
TL;DR: This paper describes the THCIB systems that were used in the ShARe/CLEF eHealth 2013 task 1; two baseline systems and a combination system were implemented using existing technologies.
Abstract: This paper describes the THCIB systems that were used in the ShARe/CLEF eHealth 2013 task 1. We implemented two baseline systems and a combination system using existing technologies. One baseline system is built using MetaMap. We built another baseline system using cTAKES. Furthermore, we developed the combination system with a system combination method. The results of the combination system were submitted because the combined results performed better than either single system. We also report the experimental results on the training set and the test set.
TL;DR: A semantic-based approach concerning the identification of particular author’s traits, such as age and gender, from social media texts is presented.
Abstract: In this article we present a semantic-based approach concerning the identification of particular author’s traits, such as age and gender, from social media texts. The model here described is intended to provide information on different levels of analysis: from textual markers to semantics. Different classifiers were used to assess the performance and scope of the model.
TL;DR: A mobile application is introduced that allows users to learn music in a fun and effective way so as to arouse students' interest in music, and provides a convenient means for students to learn and play music on mobile devices.
Abstract: This paper introduces a mobile application that allows users to learn music in a fun and effective way so as to arouse students’ interest in music, and provides a convenient means for students to learn and play music on mobile devices. The mobile application enables users to know their learning progress. Students can learn music effectively through game-based quizzes and exercises. The mobile application provides an elementary level e-learning platform for music learners. It serves as a stepping stone for them to further develop their interests in this field. The application is divided into three parts: fundamental musical theory, educational games, and practical use of musical instruments. The musical activities include introducing different musical instruments, reading scores, listening to different notes, writing and recognizing the treble clef, calculating the tempo of a song and playing notes from a keyboard with the sounds of different instruments.
TL;DR: This paper presents a meta-modelling framework that automates the labor-intensive, time-consuming and therefore expensive process of manually cataloging and annotating documents in a number of languages.
Abstract: Evaluation and visualization.- Multilinguality and less-resourced languages.- Applications.- Lab overviews.
TL;DR: This paper describes an approach submitted to the 2013 PAN competition for the source retrieval sub-task, which employed tf-idf, noun phrases and named entities in order to submit very different queries and maximize recall.
Abstract: This paper describes an approach submitted to the 2013 PAN competition for the source retrieval sub-task. Three different methods for extracting queries were used, which employed tf-idf, noun phrases and named entities, in order to submit very different queries and maximize recall.
TL;DR: The question answering system for Entrance Exams, which is a pilot task of the Question Answering for Machine Reading Evaluation at Conference and Labs of the Evaluation Forum (CLEF) 2013, developed a component to detect all story characters in the documents and tag all personal pronouns using coreference resolution.
Abstract: This paper describes our question answering system for Entrance Exams, which is a pilot task of the Question Answering for Machine Reading Evaluation at Conference and Labs of the Evaluation Forum (CLEF) 2013. We conducted experiments in which participants were provided with documents and multiple-choice questions. Their goal was to select one answer or leave it unanswered for each question. In our system, we developed a component to detect all story characters in the documents and tag all personal pronouns using coreference resolution. For each question, we extracted related sentences and combined them with candidate answers to create inputs for a Recognizing Textual Entailment (RTE) component. The answers were then selected based on the confidence scores from the Recognizing Textual Entailment component. We submitted five runs in the task and the run that ranked highest obtained a c@1 score of 0.35, which outperformed the baseline c@1 score of 0.25.
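For readers unfamiliar with the c@1 measure quoted above, the sketch below implements its usual definition: unanswered questions are credited at the system's accuracy over all questions, so abstaining is better than answering wrongly. The function name and the example counts are illustrative assumptions, not figures from the paper.

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 = (n_R + n_U * n_R / n) / n.

    n_correct    (n_R): questions answered correctly
    n_unanswered (n_U): questions deliberately left unanswered
    n_total      (n):   all questions in the test set
    """
    if n_total == 0:
        raise ValueError("n_total must be positive")
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

# With no unanswered questions, c@1 reduces to plain accuracy.
accuracy_like = c_at_1(n_correct=25, n_unanswered=0, n_total=100)

# Hypothetical counts: leaving 20 questions blank instead of guessing
# wrongly lifts the score above the raw accuracy of 0.30.
with_abstention = c_at_1(n_correct=30, n_unanswered=20, n_total=100)
```

With the hypothetical counts above, `with_abstention` comes out at about 0.36, compared with a raw accuracy of 0.30.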
TL;DR: Empirical results show that correcting spelling mistakes and expanding acronyms found in queries significantly improves the effectiveness of the language model baseline, and measures of readability are integrated in the language models used for retrieval via prior probabilities.
Abstract: This paper details the participation of the Australian e-Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab, Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated in the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites. Empirical results show that correcting spelling mistakes and expanding acronyms found in queries significantly improves the effectiveness of the language model baseline. Readability priors seem to increase retrieval effectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks and when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline: this is likely to be because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.
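The combination of a Dirichlet smoothed language model with a document prior can be sketched as below. This is a generic query-likelihood formulation under stated assumptions, not the AEHRC system: the smoothing parameter `mu=2000.0` is a commonly used default rather than a value from the paper, and `log_prior` stands in for whatever readability or authoritativeness prior one wishes to plug in.

```python
import math
from collections import Counter

def dirichlet_score(query, doc, collection_tf, collection_len,
                    mu=2000.0, log_prior=0.0):
    """Log query likelihood under a Dirichlet smoothed document model.

    score(d) = log P(d) + sum over query terms of
               log((tf(w, d) + mu * P(w | C)) / (|d| + mu))

    query, doc      : lists of tokens
    collection_tf   : mapping from term to its count in the whole collection
    collection_len  : total number of tokens in the collection
    log_prior       : log P(d), e.g. a readability prior (assumed input)
    """
    doc_tf = Counter(doc)
    score = log_prior
    for term in query:
        p_collection = collection_tf.get(term, 0) / collection_len
        p_term = (doc_tf[term] + mu * p_collection) / (len(doc) + mu)
        if p_term == 0.0:
            # Term unseen in the entire collection: no smoothing mass exists.
            return float("-inf")
        score += math.log(p_term)
    return score

# Toy collection of two documents.
doc_1 = ["heart", "attack", "symptoms"]
doc_2 = ["knee", "pain", "relief"]
collection_tf = Counter(doc_1 + doc_2)
collection_len = len(doc_1) + len(doc_2)
query = ["heart", "attack"]

score_1 = dirichlet_score(query, doc_1, collection_tf, collection_len)
score_2 = dirichlet_score(query, doc_2, collection_tf, collection_len)
```

Because the prior enters as an additive log term, a document-level signal such as readability shifts the whole score without changing the term-matching component, which matches the "via prior probabilities" description in the abstract.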
TL;DR: This paper has applied a traditional approach of topic modeling using Latent Dirichlet Allocation to classify the documents based on gender and age of an author using Maxent and LDA.
Abstract: This paper describes the traditional authorship attribution subtask of the PAN/CLEF 2013 workshop. In our attempt to classify the documents based on gender and age of an author, we have applied a traditional approach of topic modeling using Latent Dirichlet Allocation (LDA). We used content-based features like topics and style-based features like preposition frequencies, which act as efficient markers to demarcate the authorship attributes based on age and gender. We performed tenfold cross-validation and observed that our classification approach using Maxent and LDA gave an accuracy of 53.3% for English and 52% for Spanish.
TL;DR: Use of a cascade of machine learners to automatically extract mentions of named entities about disorders from clinical notes seems to provide a reasonable strategy for automated extraction of disorders.
Abstract: Objective: There are abundant mentions of clinical conditions, anatomical sites, medications and procedures in clinical documents. This paper describes use of a cascade of machine learners to automatically extract mentions of named entities about disorders from clinical notes. Tasks: A Conditional Random Field (CRF) machine learner has been used for named entity recognition and to capture more complex (multiple word) named entities we have used Support Vector Machines (SVM). Firstly, the training data was converted to the CRF format. Different feature sets were applied using 10-fold cross validation to find the best feature set for the machine learning model. Secondly, the identified named entities were passed to the SVM to find any relation among the identified disorder mentions to decide whether they are a part of a complex disorder. Approach: Our approach was based on a novel supervised learning model which incorporates two machine learning algorithms (CRF and SVM). Evaluation of each step included precision, recall and F-score metrics. Resources: We have used several tools which are created in our lab including TTSCT (Text to SNOMED CT) service, Lexical Management System (LMS) and Ring-fencing approach. A set of gazetteers was created from the training data and employed in analysis as well. Results: Evaluation results produced a precision of 0.766, recall of 0.726 and F-score of 0.746 for named entity recognition based on 10-fold cross validation; and precision, recall and F-measure of 0.927 for relation extraction based on 5-fold cross validation on the training data. On the official test data on strict mode a precision of 0.686, recall of 0.539 and F-score of 0.604 was achieved. Based on the results our team was the 11th out of 25 participating teams. In the relaxed mode a precision of 0.912, recall of 0.701 and F-score of 0.793 was recorded and our team was the 12th.
A multi-stage supervised machine learning method with mixed computational strategies seems to provide a reasonable strategy for automated extraction of disorders.
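The abstract above mentions converting the training data to "the CRF format" without spelling it out. One common choice for sequence labelling is BIO tagging, and the sketch below assumes that encoding; the function name, the `DISORDER` label and the token-index span convention are all hypothetical and may differ from what the authors used.

```python
def to_bio(tokens, spans, label="DISORDER"):
    """Convert contiguous token spans into per-token BIO tags.

    tokens : list of token strings
    spans  : list of (start, end) token indices, end exclusive
    Returns a list of (token, tag) pairs, one per token.
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = f"B-{label}"          # first token of the mention
        for index in range(start + 1, end):
            tags[index] = f"I-{label}"      # continuation tokens
    return list(zip(tokens, tags))

sentence = ["Patient", "has", "chest", "pain", "."]
tagged = to_bio(sentence, [(2, 4)])
# tagged pairs "chest" with B-DISORDER and "pain" with I-DISORDER;
# every other token is tagged O.
```

Plain BIO tags can only describe contiguous mentions, which is one plausible reason a second stage (the SVM relation step in the abstract) is needed to join separate mentions into a complex disorder.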
TL;DR: The Oregon Health & Science University team’s participation in task #3 (“addressing patients’ medical questions”) of this year's eHealth CLEF campaign included submissions from two different retrieval systems, including a traditional, Lucene-based system and a novel system that used statistical language modeling techniques to perform text retrieval.
Abstract: The Oregon Health & Science University team’s participation in task #3 (“addressing patients’ medical questions”) of this year’s eHealth CLEF campaign included submissions from two different retrieval systems. The first was a traditional, Lucene-based system modified from one used in previous years’ TREC-med campaigns; the second was a novel system that used statistical language modeling techniques to perform text retrieval. Since 2013 was the first year of our participation in this campaign, our focus was on familiarizing ourselves with working on a corpus of web text, as well as putting together a proof-of-concept implementation of a language-model retrieval system. We submitted three runs in total; one from the novel system, and two from our Lucene-based system, one of which made use of the National Library of Medicine’s MetaMap tool to perform query expansion. In general, our runs did not perform particularly well, although there were several topics for which our language model-based retrieval system produced the best P@10. Future work will focus on pre-indexing text normalization as well as a more sophisticated approach to query parsing.
TL;DR: This paper reports on the team's work in the PAN 2013 author identification task, which is to automatically detect the author of a given text from small training sets with known authors, using a system based on the PPM (Prediction by Partial Matching) compression algorithm and an n-gram statistical model.
Abstract: This paper reports on our work in the PAN 2013 author identification task. The task is to automatically detect the author of the given text having small training sets with known authors. The task was solved by a system that used the PPM (Prediction by Partial Matching) compression algorithm based on an n-gram statistical model. With the emergence of user-generated web content, text author profiling is being increasingly studied by the NLP community. Various works describe experiments aiming to automatically discover hidden attributes of text which reveal author's gender, age, personality and others. Authorship identification is an important problem in many areas including information retrieval and computational linguistics. While a great number of works have presented investigations in this area, there is a need for a common ground to evaluate different author recognition techniques. PAN 2013 as part of the CLEF campaigns aims to provide the common conditions and data for this task. We participated in this shared task with the PPM (Prediction by Partial Matching) compression algorithm based on a character-based n-gram statistical model.
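The intuition behind compression-based attribution is that a model trained on an author's known texts should encode that author's new text in fewer bits. The sketch below is a deliberately simplified stand-in and is not PPM: it uses a fixed-order character n-gram model with add-one smoothing and no escape or back-off mechanism. All function names, the order of 3 and the assumed alphabet size of 256 are illustrative choices.

```python
import math
from collections import Counter

def train_char_ngrams(text, order=3):
    """Count character contexts of length `order` and their continuations."""
    contexts, ngrams = Counter(), Counter()
    padded = " " * order + text
    for i in range(order, len(padded)):
        context = padded[i - order:i]
        contexts[context] += 1
        ngrams[context + padded[i]] += 1
    return contexts, ngrams

def cross_entropy(text, model, order=3, alphabet_size=256):
    """Average bits per character needed to encode `text` under `model`,
    using add-one smoothing over an assumed alphabet of 256 symbols."""
    contexts, ngrams = model
    padded = " " * order + text
    bits = 0.0
    for i in range(order, len(padded)):
        context = padded[i - order:i]
        prob = (ngrams[context + padded[i]] + 1) / (contexts[context] + alphabet_size)
        bits -= math.log2(prob)
    return bits / max(len(text), 1)

def attribute(unknown_text, known_texts, order=3):
    """Return the author whose model encodes the unknown text most cheaply."""
    models = {author: train_char_ngrams(text, order)
              for author, text in known_texts.items()}
    return min(models, key=lambda a: cross_entropy(unknown_text, models[a], order))

known = {
    "A": "the cat sat on the mat. the cat sat on the mat.",
    "B": "colorless green ideas sleep furiously every night.",
}
guess = attribute("the cat sat on a mat", known)
```

In this toy setup `guess` is `"A"`, since the unknown text shares most of its character trigrams with author A's sample. A real PPM model additionally blends several context orders, which matters when training sets are as small as the task describes.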
TL;DR: The Nara Institute of Science and Technology's system for the entrance exam pilot task of CLEF 2013 QA4MRE uses minimum error rate training (MERT) to train the weights of the model and also proposes a novel method for MERT with the addition of a threshold that defines the certainty with which the model must answer questions.
Abstract: This paper describes the Nara Institute of Science and Technology's system for the entrance exam pilot task of CLEF 2013 QA4MRE. The core of the system is similar to the system for the main task of CLEF 2013 QA4MRE. We use minimum error rate training (MERT) to train the weights of the model and also propose a novel method for MERT with the addition of a threshold that defines the certainty with which we must answer questions. The system received a score of 22% c@1.
TL;DR: An experimental evaluation of a refined approach to Latent Semantic Analysis (LSA) for efficiently searching very large image databases is presented, together with the results of extensive experiments applying early data fusion with LSA on several low-level visual and textual features.
Abstract: This article presents an experimental evaluation on using a refined approach to the Latent Semantic Analysis (LSA) for efficiently searching very large image databases. It also describes IPL's participation in the ImageCLEF ad-hoc textual and visual retrieval as well as modality classification for the Medical Task in 2013. We report on our approaches and methods and present the results of our extensive experiments applying early data fusion with LSA on several low-level visual and textual features.
TL;DR: The automatic harmonization method used for building the English Silver Standard annotation supplied as a data source for the multilingual CLEF-ER named entity recognition challenge is described.
Abstract: We describe the automatic harmonization method used for building the English Silver Standard annotation supplied as a data source for the multilingual CLEF-ER named entity recognition challenge. The use of an automatic Silver Standard is designed to remove the need for a costly and time-consuming expert annotation. The final voting threshold of 3 for the harmonization of 6 different annotations from the project partners kept 45% of all available concept centroids. On average, 19% (SD 14%) of the original annotations are removed. 97.8% of the partner annotations that go into the Silver Standard Corpus have exactly the same boundaries as their harmonized representations.
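The voting step described above can be illustrated with a small sketch. It is a simplification under stated assumptions: the harmonization in the paper operates on concept centroids, whereas this version votes on exact `(start, end, concept)` matches, and the annotation tuples below are invented toy data.

```python
from collections import Counter

def harmonize(annotation_sets, threshold=3):
    """Keep annotations proposed by at least `threshold` annotators.

    annotation_sets : one iterable of hashable annotations per annotator,
                      here (start, end, concept) tuples
    Each annotator votes at most once per annotation.
    """
    votes = Counter()
    for annotations in annotation_sets:
        votes.update(set(annotations))
    return {annotation for annotation, count in votes.items() if count >= threshold}

# Six hypothetical partner annotations of the same sentence.
partners = [
    [(0, 7, "DISEASE"), (12, 20, "DRUG")],
    [(0, 7, "DISEASE")],
    [(0, 7, "DISEASE"), (12, 20, "DRUG")],
    [(12, 20, "DRUG"), (25, 30, "GENE")],
    [(25, 30, "GENE")],
    [],
]
silver = harmonize(partners, threshold=3)
```

With a threshold of 3 out of 6, the `DISEASE` and `DRUG` annotations survive (three votes each) while the `GENE` annotation, with two votes, is dropped.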
TL;DR: This paper built a baseline system using open source software and improved its performance by adding dictionaries, showing that adding a dictionary of acronyms/abbreviations can improve the performance significantly.
Abstract: This paper describes the THCIB systems that were used in the ShARe/CLEF eHealth Lab 2013 task 2. We built a baseline system using open source software, and improved the performance by adding dictionaries. The dictionary is built from the training set and web resources using existing technologies. The experimental results show that adding a dictionary of acronyms/abbreviations can improve the performance significantly.
TL;DR: A new collection containing all CLEF working notes including their metadata was created and analysed to take a look back at the developments and trends in different domains like evaluation measures and retrieval models.
Abstract: After seven years of participation in CLEF we take a look back at the developments and trends in different domains like evaluation measures and retrieval models. For that purpose a new collection containing all CLEF working notes including their metadata was created and analysed.
TL;DR: The authors' TopSig open-source indexing and retrieval tool was used to produce runs for the ShARe/CLEF eHealth 2013 track and was able to gain some benefit from utilising the discharge summaries, although the software needed to be modified to support this.
Abstract: We used our TopSig open-source indexing and retrieval tool to produce runs for the ShARe/CLEF eHealth 2013 track. TopSig was used to produce runs using the query fields and provided discharge summaries, where appropriate. Although the improvement was not great, TopSig was able to gain some benefit from utilising the discharge summaries; however, the software needed to be modified to support this. This was part of a larger experiment involving determining the applicability and limits to signature-based approaches.