Top 49 papers published in the topic of Clef in 2013

Showing papers on "Clef published in 2013"

Proceedings Article•

ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information Retrieval to Address Patients' Questions when Reading Clinical Reports

Lorraine Goeuriot¹, Gareth J. F. Jones¹, Liadh Kelly¹, Johannes Leveling¹, Allan Hanbury², Henning Müller, Sanna Salanterä³, Hanna Suominen⁴,⁵, Guido Zuccon⁶

Dublin City University¹, Vienna University of Technology², University of Turku³, NICTA⁴, Australian National University⁵, Commonwealth Scientific and Industrial Research Organisation⁶

23 Sep 2013

TL;DR: This paper presents the results of task 3 of the ShARe/CLEF eHealth Evaluation Lab 2013, investigating the effect of using additional information such as the discharge summaries and external resources such as medical ontologies on the IR effectiveness.

Abstract: This paper presents the results of task 3 of the ShARe/CLEF eHealth Evaluation Lab 2013. This evaluation lab focuses on improving access to medical information on the web. The task objective was to investigate the effect of using additional information such as the discharge summaries and external resources such as medical ontologies on the IR effectiveness. The participants were allowed to submit up to seven runs: one mandatory run using no additional information or external resources, and three each using or not using discharge summaries.

72 citations

Task 2: ShARe/CLEF eHealth Evaluation Lab 2013

Danielle L. Mowery¹, Brett R. South², Lee M. Christensen², Laura-Maria Murtola³, Sanna Salanterä³, Hanna Suominen⁴,⁵, David Martinez⁵,⁶, Noémie Elhadad⁷, Sameer Pradhan⁸, Guergana Savova⁸, Wendy W. Chapman⁹

University of Pittsburgh¹, University of Utah², University of Turku³, Australian National University⁴, NICTA⁵, University of Melbourne⁶, Columbia University⁷, Harvard University⁸, University of California, San Diego⁹

8 Sep 2013

TL;DR: This paper reports on Task 2 of the ShARe/CLEF eHealth 2013 challenge, a pilot study that generated a reference standard of clinical acronyms and abbreviations normalized to concepts from a standardized medical vocabulary, and reports inter-annotator agreement and participant system performance.

Abstract: In this pilot study, we aimed to generate a reference standard of clinical acronyms and abbreviations normalized to concepts from a standardized medical vocabulary for the ShARe/CLEF eHealth 2013 challenge. In this paper, we review prior text normalization shared tasks, reference standard generation approaches, and recent clinical acronym and abbreviation normalization research. We report inter-annotator agreement for the reference standard and performance for participant systems.

51 citations

Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space Model

Buzhou Tang¹,², Yonghui Wu¹, Min Jiang¹, Joshua C. Denny³, Hua Xu

University of Texas Health Science Center at Houston¹, Harbin Institute of Technology², Vanderbilt University³

1 Jan 2013

TL;DR: The first task of the 2013 ShARe/CLEF challenge was to extract disorder mention spans and their associated UMLS (Unified Medical Language System) concept unique identifiers (CUIs); this paper presents a system that pairs machine learning-based disorder recognition with vector space model-based CUI encoding.

Abstract: The ShARe/CLEF eHealth Evaluation Lab (SHEL) organized a challenge on natural language processing (NLP) and information retrieval (IR) in the medical domain in 2013. The first task of the 2013 ShARe/CLEF challenge was to extract disorder mention spans and their associated UMLS (Unified Medical Language System) concept unique identifiers (CUIs). We participated in Task 1 and developed a clinical disorder recognition and encoding system. The proposed system consists of two components: a machine learning-based approach to recognize disorder entities and a vector space model-based method to encode disorders to UMLS CUIs. The challenge organizers manually annotated disorder entities and corresponding UMLS CUIs in 298 clinical notes, of which 199 notes were used for training and 99 were for testing. Evaluation on the test data set showed that our system achieved the best F-measure of 0.750 for entity recognition (ranked first) and the highest F-measure of 0.514 for UMLS CUI encoding (ranked third), indicating the promise of the proposed approaches.

48 citations
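
The abstract above describes encoding recognized disorders to UMLS CUIs with a vector space model. The sketch below illustrates the general idea only: it treats a mention and each candidate concept term as bag-of-words vectors and picks the concept with the highest cosine similarity. The function names, the unweighted term counts, and the concept IDs are all assumptions made for illustration; they are not the authors' implementation, and the IDs are placeholders rather than real CUIs.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def encode_mention(mention, concept_terms):
    """Return the ID of the concept term most similar to the mention, or None."""
    query = Counter(mention.lower().split())
    best_id, best_score = None, 0.0
    for concept_id, term in concept_terms.items():
        score = cosine(query, Counter(term.lower().split()))
        if score > best_score:
            best_id, best_score = concept_id, score
    return best_id


# Hypothetical vocabulary: these IDs are placeholders, not real UMLS CUIs.
concepts = {
    "CONCEPT_1": "chronic kidney disease",
    "CONCEPT_2": "acute kidney injury",
    "CONCEPT_3": "heart failure",
}
print(encode_mention("kidney disease chronic", concepts))
```

A mention that shares no word with any concept term returns `None`, which a real system would presumably map to a "CUI-less" outcome or handle with a fallback.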

Ensemble-based classification for author profiling using various features: Notebook for PAN at CLEF 2013

Michal Meina, Karolina Brodzinska, Bartosz Celmer, Maja Czoków, Martyna Patera, Jakub Pezacki, Mateusz Wilk

1 Jan 2013

TL;DR: Summarizes an approach to the PAN'13 author profiling task that uses ensemble-based classification on a large feature set, and evaluates different methods and classification approaches.

Abstract: This paper summarizes our approach to the author profiling task, a part of the PAN'13 evaluation lab. We have used ensemble-based classification on a large feature set. All the features are roughly described, and the experimental section provides an evaluation of different methods and classification approaches.

45 citations

Task 1: ShARe/CLEF eHealth evaluation lab 2013

Sameer Pradhan¹, Noémie Elhadad², Brett R. South³, David Martinez⁴, Lee M. Christensen³, Amy Vogel², Hanna Suominen⁵,⁶, Wendy W. Chapman⁷, Guergana Savova¹

Harvard University¹, Columbia University², University of Utah³, University of Melbourne⁴, Australian National University⁵, NICTA⁶, University of California, San Diego⁷

23 Sep 2013

TL;DR: Task 1 of the ShARe/CLEF eHealth evaluation lab pilot focused on identification and normalization of diseases and disorders in clinical reports; the best systems had an F1 score of 0.75 (0.80 Precision, 0.71 Recall) in Task 1a and an accuracy of 0.59 in Task 1b.

Abstract: This report outlines Task 1 of the ShARe/CLEF eHealth evaluation lab pilot. This task focused on identification (1a) and normalization (1b) of diseases and disorders in clinical reports. It used annotations from the ShARe corpus. A total of 22 teams competed in Task 1a and 17 of them also participated in Task 1b. The best systems had an F1 score of 0.75 (0.80 Precision, 0.71 Recall) in Task 1a and an accuracy of 0.59 in Task 1b. The organizers have made the text corpora, annotations, and evaluation tools available for future research and development.

44 citations
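
The Task 1a figures quoted above can be checked with the standard definition of F1 as the harmonic mean of precision and recall. The function name below is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Best Task 1a system as reported: 0.80 precision, 0.71 recall.
print(round(f1_score(0.80, 0.71), 2))  # 0.75
```

The result, 2 × 0.80 × 0.71 / 1.51 ≈ 0.752, rounds to the reported 0.75.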

Overview of QA4MRE Main Task at CLEF 2013

Richard F. E. Sutcliffe¹, Anselmo Peñas², Eduard H. Hovy², Pamela Forner³, Álvaro Rodrigo, Corina Forascu⁴, Yassine Benajiba⁵, Petya Osenova

University of Essex¹, National University of Distance Education², Carnegie Mellon University³, Alexandru Ioan Cuza University⁴, Philips⁵

1 Jan 2013

TL;DR: The preparation of the data sets, the definition of the background collections, the metric used for the evaluation of the systems' submissions, and the results are described.

Abstract: This paper describes the Question Answering for Machine Reading (QA4MRE) Main Task at the 2013 Cross Language Evaluation Forum. In the main task, systems answered multiple-choice questions on documents concerned with four different topics. There were also two pilot tasks, Machine Reading on Biomedical Texts about Alzheimer's disease, and Japanese Entrance Exams. This paper describes the preparation of the data sets, the definition of the background collections, the metric used for the evaluation of the systems' submissions, and the results. We introduced two novelties this year: auxiliary questions to evaluate systems' level of inference, and a portion of questions where none of the options were correct. Nineteen groups participated in the task submitting a total of 77 runs in five languages.

33 citations

Approaches for Source Retrieval and Text Alignment of Plagiarism Detection: Notebook for PAN at CLEF 2013

Leilei Kong, Haoliang Qi, Cuixia Du, Mingxing Wang, Zhongyuan Han

1 Jan 2013

TL;DR: This paper describes an approach to the PAN@CLEF2013 plagiarism detection competition: for Source Retrieval, keywords extracted from suspicious documents with TF-IDF, PatTree and Weighted TF-IDF are used as queries to retrieve the plagiarism source documents; for Text Alignment, a method based on sentence similarity is presented.

Abstract: In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. In the Source Retrieval sub-task, we propose a method that combines TF-IDF, PatTree and Weighted TF-IDF to extract the keywords of suspicious documents as queries for retrieving the plagiarism source documents. In the Text Alignment sub-task, we present a method based on sentence similarity. Our text alignment algorithm and similar-sentence merging algorithm, called the Bilateral Alternating Merging Algorithm, are described in detail.

32 citations
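
Several entries on this page use TF-IDF to pick query keywords from a suspicious document. The following is a minimal sketch of plain TF-IDF keyword ranking only; it does not reproduce the PatTree or Weighted TF-IDF components named in the abstract above. The smoothed IDF variant, the whitespace tokenizer, and the toy corpus are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(doc, corpus, k=3):
    """Rank the terms of `doc` by TF-IDF against `corpus` and return the top k."""
    tokens = doc.lower().split()
    tf = Counter(tokens)
    n_docs = len(corpus)
    corpus_tokens = [set(d.lower().split()) for d in corpus]
    scores = {}
    for term, count in tf.items():
        df = sum(1 for toks in corpus_tokens if term in toks)
        # Smoothed IDF so that terms unseen in the corpus do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = (count / len(tokens)) * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy background collection and suspicious document (made up for this example).
background = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the weather is nice today",
]
suspicious = "the plagiarism detector flagged the plagiarism report"
print(tfidf_keywords(suspicious, background, k=3))
```

Frequent function words such as "the" occur in every background document and therefore score low, while terms specific to the suspicious document rise to the top and can be joined into a search query.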

Proceedings Article•

CLEF 2013 Evaluation Labs and Workshop, Online Working Notes

Mette Skov, Toine Bogers¹, Haakon Lund¹, Maj Lauge Ward Jensen, Erik Wistrup, Birger Larsen²

University of Copenhagen¹, Aalborg University²

1 Jan 2013

32 citations

Author Profiling Using Style-based Features: Notebook for PAN at CLEF 2013

Seifeddine Mechti, Maher Jaoua, Lamia Hadrich Belguith

1 Jan 2013

TL;DR: A method for profiling the author of an anonymous English or Spanish text along the age and gender dimensions, using TF*IDF scores of word classes and J48 decision trees; the authors report a high level of accuracy on the gender dimension.

Abstract: In this paper, we present a method for profiling the author of an anonymous text. Our approach is based on learning the author profile with a focus on the age and gender dimensions. Our system takes as input a document written in English or in Spanish and generates the age and the gender of its author. First, we computed a ranked list of words that occur in the corpus and grouped them into classes according to their similarities. Then, we calculated the TF * IDF score of each class for each document in order to find the stylistic differences between men and women, on the one hand, and those between different age intervals on the other hand. After that, we applied the learning process on 66% of the English and the Spanish corpora using decision trees through the J48 algorithm. In fact, we got second place in the competition for the English corpus. Our system has shown a high level of accuracy and effectiveness in treating the gender dimension, and we got the best accuracy for the entire PAN 2013 competition.

19 citations

Plagiarism Candidate Retrieval Using Selective Query Formulation and Discriminative Query Scoring: Notebook for PAN at CLEF 2013

Osama Haggag, Samhaa R. El-Beltagy

1 Jan 2013

TL;DR: Comparison with other PAN 2013 submissions for the same task shows the presented plagiarism source retrieval system to be one of the top performers.

Abstract: This paper details the approach of implementing an English plagiarism source retrieval system to be presented at PAN 2013. The system uses the TextTiling algorithm to break a given document into segments that are centered around certain topics within the document. From these segments, keyphrases are generated using the KPMiner keyphrase extraction system. These keyphrases and segments are then used in generating queries indicative of the segment, and consequently the document. The queries are submitted to ChatNoir for finding plagiarism sources in the ClueWeb09 corpus from which the pan13 dataset is plagiarized. The target is to lessen the overall search effort while maximizing the performance by scoring unconsumed queries against the already downloaded candidate sources. Comparison with other PAN 2013 submissions for the same task shows the presented system to be one of the top performers.

17 citations

Cultural Heritage in CLEF (CHiC)

Vivien Petras¹, Toine Bogers², Elaine G. Toms³, Mark M. Hall³, Jacques Savoy⁴, Piotr Malak⁴, Adam Pawłowski⁵, Nicola Ferro⁶, Masiero Ivano⁶

Humboldt University of Berlin¹, University of Copenhagen², University of Sheffield³, University of Neuchâtel⁴, University of Wrocław⁵, University of Padua⁶

1 Jan 2013

Proceedings Article•

Identify disorders in health records using Conditional Random Fields and Metamap: AEHRC at ShARe/CLEF 2013 eHealth Evaluation Lab Task 1

Guido Zuccon¹, Alexander Holloway², Bevan Koopman³, Anthony Nguyen⁴

Commonwealth Scientific and Industrial Research Organisation¹, University of Queensland², Queensland University of Technology³, Pierre-and-Marie-Curie University⁴

1 Sep 2013

TL;DR: The Australian e-Health Research Centre (AEHRC) recently participated in the ShARe/CLEF eHealth Evaluation Lab Task 1, to individuate mentions of disorders in free-text electronic health records and map disorders to SNOMED CT concepts in the UMLS metathesaurus.

Abstract: The Australian e-Health Research Centre (AEHRC) recently participated in the ShARe/CLEF eHealth Evaluation Lab Task 1. The goal of this task is to individuate mentions of disorders in free-text electronic health records and map disorders to SNOMED CT concepts in the UMLS metathesaurus. This paper details our participation to this ShARe/CLEF task. Our approaches are based on using the clinical natural language processing tool Metamap and Conditional Random Fields (CRF) to individuate mentions of disorders and then to map those to SNOMED CT concepts. Empirical results obtained on the 2013 ShARe/CLEF task highlight that our instance of Metamap (after filtering irrelevant semantic types), although achieving a high level of precision, is only able to identify a small amount of disorders (about 21% to 28%) from free-text health records. On the other hand, the addition of the CRF models allows for a much higher recall (57% to 79%) of disorders from free-text, without sensible detriment in precision. When evaluating the accuracy of the mapping of disorders to SNOMED CT concepts in the UMLS, we observe that the mapping obtained by our filtered instance of Metamap delivers state-of-the-art effectiveness if only spans individuated by our system are considered ('relaxed' accuracy).

Proceedings Article•

Disorder concept identification from clinical notes: an experience with the ShARe/CLEF 2013 challenge

Jungwei Fan, Navdeep Sood, Yang Huang

1 Jan 2013

TL;DR: An existing NLP system developed at Kaiser Permanente was modified to output concepts that were close to the disorder definition of the ShARe/CLEF 2013 NLP Challenge, and a post-filter was created to subset the concepts with the source (SNOMED) and semantic types expected by the Challenge.

Abstract: We participated in both tasks 1a and 1b of the ShARe/CLEF 2013 NLP Challenge, where 1a was on detecting disorder concept boundaries and 1b was on assigning concept IDs to the entities from 1a. An existing NLP system developed at Kaiser Permanente was modified to output concepts that were close to the disorder definition of the Challenge. The core pipeline involved deterministic section detection, tokenization, sentence chunking, probabilistic POS tagging, rule-based phrase chunking, terminology look-up (using UMLS 2012AB), rule-based concept disambiguation and post-coordination. The system originally identifies findings (both normal and abnormal), procedures, anatomies, etc., and therefore a post-filter was created to subset the concepts with the source (SNOMED) and semantic types expected by the Challenge. A list of frequency-ranked CUIs was extracted from the training corpus to help break ties when multiple concepts were proposed on a single span. However, no retraining/customization was made to meet the boundary annotation preference specified in the challenge guidelines. Our best settings achieved an F-score of 0.503 (was 0.684 with relaxed boundary penalty) in task 1a, and best accuracy of 0.443 (was 0.865 on relaxed boundaries) in task 1b.
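
The tie-breaking step in the abstract above (preferring the concept seen most often in training when several are proposed for the same span) could look roughly like the sketch below. The function names and the placeholder IDs are hypothetical; the paper does not give its implementation, and the IDs are not real CUIs.

```python
from collections import Counter


def build_cui_ranking(training_cuis):
    """Count how often each concept ID was annotated in the training corpus."""
    return Counter(training_cuis)


def break_tie(candidates, cui_freq):
    """Pick the candidate seen most often in training.

    Candidates never seen in training count as frequency 0; on equal
    frequency, the earliest candidate in the list wins.
    """
    return max(candidates, key=lambda cui: cui_freq.get(cui, 0))


# Placeholder IDs standing in for CUIs annotated in the training notes.
training = ["CUI_A", "CUI_B", "CUI_A", "CUI_C", "CUI_A", "CUI_B"]
freq = build_cui_ranking(training)
print(break_tie(["CUI_C", "CUI_B"], freq))  # CUI_B
```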

Combining MetaMap and cTAKES in Disorder Recognition: THCIB at CLEF eHealth Lab 2013 Task 1

Yunqing Xia, Xiaoshi Zhong, Peng Liu, Cheng Tan, Sen Na, Qinan Hu, Yaohai Huang

1 Jan 2013

TL;DR: This paper describes the THCIB systems used in the ShARe/CLEF eHealth 2013 task 1: two baseline systems and a combination system built on existing technologies.

Abstract: This paper describes the THCIB systems used in the ShARe/CLEF eHealth 2013 task 1. We implemented two baseline systems and a combination system using the existing technologies. One baseline system is built using MetaMap. We built another baseline system using cTAKES. Furthermore, we developed the combination system with a system combination method. The results of the combination system were submitted because the combined results performed better than either single system. We also report the experimental results on the training set and the test set.

Semantic-based Features for Author Profiling Identification: First insights. Notebook for PAN at CLEF 2013

Delia-Irazú Hernández, Rafael Guzmán-Cabrera, Antonio Reyes, Martha-Alicia Rocha

1 Jan 2013

TL;DR: A semantic-based approach concerning the identification of particular author’s traits, such as age and gender, from social media texts is presented.

Abstract: In this article we present a semantic-based approach concerning the identification of particular author’s traits, such as age and gender, from social media texts. The model here described is intended to provide information on different levels of analysis: from textual markers to semantics. Different classifiers were used to assess the performance and scope of the model.

Book Chapter•10.1007/978-3-642-45272-7_14•

An Interactive Mobile Application for Learning Music Effectively

Sin-Chun Ng¹, Andrew K. Lui¹, W. S. Lo¹

Open University of Hong Kong¹

10 Jul 2013

TL;DR: Introduces a mobile application that lets users learn music in a fun and effective way, so as to arouse students' interest in music and give them a convenient means of learning and playing music on mobile devices.

Abstract: This paper introduces a mobile application that allows users to learn music in a fun and effective way so as to arouse students' interest in music, and provides a convenient means for students to learn and play music on mobile devices. The mobile application enables users to know their learning progress. Students can learn music effectively through game-based quizzes and exercises. The mobile application provides an elementary level e-learning platform for music learners. It serves as a stepping stone for them to further develop their interests in this field. The application is divided into three parts: fundamental musical theory, educational games, and practical use of musical instruments. The musical activities include introducing different musical instruments, reading scores, listening to different notes, writing and recognizing the treble clef, calculating the tempo of a song and playing notes from a keyboard with the sounds of different instruments.

Book•

Information access evaluation : multilinguality, multimodality, and visualization : 4th International Conference of the CLEF Initiative, CLEF 2013, Valencia, Spain, September 23-26, 2013 : proceedings

Cross-Language Evaluation Forum, Pamela Forner

1 Jan 2013

TL;DR: Proceedings of the 4th International Conference of the CLEF Initiative (CLEF 2013), held in Valencia, Spain, on September 23-26, 2013, covering evaluation and visualization, multilinguality and less-resourced languages, applications, and lab overviews.

Abstract: Evaluation and visualization.- Multilinguality and less-resourced languages.- Applications.- Lab overviews.

Using statistic and semantic analysis to detect plagiarism: Notebook for PAN at CLEF 2013

Victoria Elizalde

1 Jan 2013

TL;DR: This paper describes an approach submitted to the 2013 PAN competition for the source retrieval sub-task, which employed tf-idf, noun phrases and named entities in order to submit very different queries and maximize recall.

Abstract: This paper describes an approach submitted to the 2013 PAN competition for the source retrieval sub-task. Three different methods for extracting queries were used, which employed tf-idf, noun phrases and named entities, in order to submit very different queries and maximize recall.

Question Answering System for Entrance Exams in QA4MRE.

Xinjian Li, Ran Tian, Ngan Luu-Thuy Nguyen, Yusuke Miyao, Akiko Aizawa

1 Jan 2013

TL;DR: A question answering system for Entrance Exams, a pilot task of the Question Answering for Machine Reading Evaluation at Conference and Labs of the Evaluation Forum (CLEF) 2013, that detects story characters, tags personal pronouns using coreference resolution, and selects answers with a Recognizing Textual Entailment component.

Abstract: This paper describes our question answering system for Entrance Exams, which is a pilot task of the Question Answering for Machine Reading Evaluation at Conference and Labs of the Evaluation Forum (CLEF) 2013. We conducted experiments in which participants were provided with documents and multiple-choice questions. Their goal was to select one answer or leave it unanswered for each question. In our system, we developed a component to detect all story characters in the documents and tag all personal pronouns using coreference resolution. For each question, we extracted related sentences and combined them with candidate answers to create inputs for a Recognizing Textual Entailment (RTE) component. The answers were then selected based on the confidence scores from the Recognizing Textual Entailment component. We submitted five runs in the task and the run that ranked highest obtained a c@1 score of 0.35, which outperformed the baseline c@1 score of 0.25.
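
The c@1 scores quoted above reward a system for leaving a question unanswered rather than answering it wrongly. A sketch of the measure as it is commonly defined for QA4MRE follows: each unanswered question is credited at the system's overall accuracy rate. The function name and the example counts are hypothetical and are not taken from the paper's runs.

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 = (n_correct + n_unanswered * n_correct / n_total) / n_total."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total


# Hypothetical run over 20 questions with 5 correct answers.
print(c_at_1(5, 0, 20))  # every question answered: plain accuracy
print(c_at_1(5, 4, 20))  # 4 of the wrong answers withheld instead
```

With the same number of correct answers, withholding four answers that would have been wrong raises the score from 0.25 to 0.30, which is the incentive the measure is designed to create.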

Proceedings Article•

Retrieval of health advice on the web: AEHRC at ShARe/CLEF eHealth Evaluation Lab Task 3

Guido Zuccon¹, Bevan Koopman², Anthony Nguyen³

Commonwealth Scientific and Industrial Research Organisation¹, Queensland University of Technology², Pierre-and-Marie-Curie University³

1 Sep 2013

TL;DR: Empirical results show that correcting spelling mistakes and expanding acronyms found in queries significantly improves the effectiveness of the language model baseline, and measures of readability are integrated in the language models used for retrieval via prior probabilities.

Abstract: This paper details the participation of the Australian e-Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab, Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated in the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites. Empirical results show that correcting spelling mistakes and expanding acronyms found in queries significantly improves the effectiveness of the language model baseline. Readability priors seem to increase retrieval effectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks and when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline: this is likely to be because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.
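
The baseline in the abstract above is a Dirichlet smoothed language model, with readability and authoritativeness folded in as document priors. A minimal sketch of that scoring scheme is given below: the query log-likelihood under the smoothed document model plus a log prior. The function name, the value of `mu`, the toy documents, and the way the prior is supplied are assumptions for illustration; the paper's actual priors and parameters are not given in this listing.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, collection_tf, collection_len, mu=2000.0, log_prior=0.0):
    """log P(d) + sum over query terms of log((tf(w,d) + mu * P(w|C)) / (|d| + mu))."""
    doc_tf = Counter(doc)
    doc_len = len(doc)
    score = log_prior
    for term in query:
        p_collection = collection_tf.get(term, 0) / collection_len
        p = (doc_tf[term] + mu * p_collection) / (doc_len + mu)
        if p == 0.0:
            # Term unseen in the whole collection: no probability mass at all.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection, made up for this example.
docs = {
    "d1": "chest pain may signal heart disease".split(),
    "d2": "knee pain after running is common".split(),
}
collection_tf = Counter(t for tokens in docs.values() for t in tokens)
collection_len = sum(collection_tf.values())
query = ["heart", "pain"]

for doc_id, tokens in docs.items():
    print(doc_id, dirichlet_score(query, tokens, collection_tf, collection_len))
```

Passing `log_prior=math.log(p)` for a per-document probability `p` (for example, one derived from a readability measure) shifts that document's score without changing the likelihood term, which is how a prior can re-rank documents that match the query equally well.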

Author profiling using LDA and Maximum Entropy: Notebook for PAN at CLEF 2013

Aditya Pavan¹, Aditya Mogadala¹, Vasudeva Varma¹

International Institute of Information Technology, Hyderabad¹

1 Jan 2013

TL;DR: This paper has applied a traditional approach of topic modeling using Latent Dirichlet Allocation to classify the documents based on gender and age of an author using Maxent and LDA.

Abstract: This paper describes the traditional authorship attribution subtask of the PAN/CLEF 2013 workshop. In our attempt to classify the documents based on gender and age of an author, we have applied a traditional approach of topic modeling using Latent Dirichlet Allocation (LDA). We used content-based features like topics and style-based features like preposition frequencies, which act as efficient markers to demarcate the authorship attributes based on age and gender. We performed tenfold cross validation and observed that our classification approach using Maxent and LDA gave an accuracy of 53.3% for English and 52% for Spanish.

ShARe/CLEF eHealth 2013 Named Entity Recognition and Normalization of Disorders Challenge

Jon Patrick¹, Leila Safari¹, Ying Ou¹

University of Sydney¹

1 Jan 2013

TL;DR: Use of a cascade of machine learners to automatically extract mentions of named entities about disorders from clinical notes seems to provide a reasonable strategy for automated extraction of disorders.

Abstract: Objective: There are abundant mentions of clinical conditions, anatomical sites, medications and procedures in clinical documents. This paper describes use of a cascade of machine learners to automatically extract mentions of named entities about disorders from clinical notes. Tasks: A Conditional Random Field (CRF) machine learner has been used for named entity recognition and to capture more complex (multiple word) named entities we have used Support Vector Machines (SVM). Firstly, the training data was converted to the CRF format. Different feature sets were applied using 10-fold cross validation to find the best feature set for the machine learning model. Secondly, the identified named entities were passed to the SVM to find any relation among the identified disorder mentions to decide whether they are a part of a complex disorder. Approach: Our approach was based on a novel supervised learning model which incorporates two machine learning algorithms (CRF and SVM). Evaluation of each step included precision, recall and F-score metrics. Resources: We have used several tools which are created in our lab including TTSCT (Text to SNOMED CT) service, Lexical Management System (LMS) and Ring-fencing approach. A set of gazetteers was created from the training data and employed in analysis as well. Results: Evaluation results produced a precision of 0.766, recall of 0.726 and F-score of 0.746 for named entity recognition based on 10-fold cross validation; and precision, recall and F-measure of 0.927 for relation extraction based on 5-fold cross validation on the training data. On the official test data on strict mode a precision of 0.686, recall of 0.539 and F-score of 0.604 was achieved. Based on the results our team was the 11th out of 25 participating teams. In the relaxed mode a precision of 0.912, recall of 0.701 and F-score of 0.793 was recorded and our team was the 12th. A multi stage supervised machine learning method with mixed computational strategies seems to provide a reasonable strategy for automated extraction of disorders.

Proceedings Article•

Lucene, MetaMap, and Language Modeling: OHSU at CLEF eHealth 2013.

Steven Bedrick, Golnar Sheikshabbafghi¹

Oregon Health & Science University¹

1 Jan 2013

TL;DR: The Oregon Health & Science University team’s participation in task #3 (“addressing patients’ medical questions”) of this year's eHealth CLEF campaign included submissions from two different retrieval systems, including a traditional, Lucene-based system and a novel system that used statistical language modeling techniques to perform text retrieval.

Abstract: The Oregon Health & Science University team’s participation in task #3 (“addressing patients’ medical questions”) of this year’s eHealth CLEF campaign included submissions from two different retrieval systems. The first was a traditional, Lucene-based system modified from one used in previous years’ TREC-med campaigns; the second was a novel system that used statistical language modeling techniques to perform text retrieval. Since 2013 was the first year of our participation in this campaign, our focus was on familiarizing ourselves with working on a corpus of web text, as well as putting together a proof-of-concept implementation of a language-model retrieval system. We submitted three runs in total; one from the novel system, and two from our Lucene-based system, one of which made use of the National Library of Medicine’s MetaMap tool to perform query expansion. In general, our runs did not perform particularly well, although there were several topics for which our language model-based retrieval system produced the best P@10. Future work will focus on pre-indexing text normalization as well as a more sophisticated approach to query parsing.

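
The abstract above reports results in terms of P@10, the fraction of the ten top-ranked documents that are relevant. A short sketch of the measure follows; the function name, document IDs and relevance judgments are invented for the example.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are relevant.

    Rankings shorter than k are still divided by k, so missing
    results count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


# Hypothetical ranking for one topic and its relevance judgments.
ranking = ["d3", "d1", "d7", "d2", "d9", "d4", "d8", "d5", "d6", "d0"]
relevant = {"d1", "d2", "d5"}
print(precision_at_k(ranking, relevant, k=10))  # 0.3
```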

Authorship Detection with PPM: Notebook for PAN at CLEF 2013

Victoria Bobicev

1 Jan 2013

TL;DR: Reports on a PAN 2013 author identification system that detects the author of a given text from small training sets with known authors, using the PPM (Prediction by Partial Matching) compression algorithm based on an n-gram statistical model.

Abstract: This paper reports on our work in the PAN 2013 author identification task. The task is to automatically detect the author of the given text having small training sets with known authors. The task was solved by a system that used the PPM (Prediction by Partial Matching) compression algorithm based on an n-gram statistical model. With the emergence of user-generated web content, text author profiling is being increasingly studied by the NLP community. Various works describe experiments aiming to automatically discover hidden attributes of text which reveal author's gender, age, personality and others. Authorship identification is an important problem in many areas including information retrieval and computational linguistics. While a great number of works have presented investigations in this area there is need for a common ground to evaluate different author recognition techniques. PAN 2013 as part of the CLEF campaigns aims to provide the common conditions and data for this task. We participated in this shared task with the PPM (Prediction by Partial Matching) compression algorithm based on a character-based n-gram statistical model.

Proceedings Article•

NAIST at the CLEF 2013 QA4MRE pilot task

Philip Arthur¹, Graham Neubig¹, Sakriani Sakti¹, Tomoki Toda¹, Satoshi Nakamura¹•Institutions (1)

Nara Institute of Science and Technology¹

1 Jan 2013

TL;DR: The Nara Institute of Science and Technology's system for the entrance exam pilot task of CLEF 2013 QA4MRE uses minimum error rate training (MERT) to train the weights of the model and also proposes a novel method for MERT with the addition of a threshold that defines the certainty with which the model must answer questions.

Abstract: This paper describes the Nara Institute of Science and Technology's system for the entrance exam pilot task of CLEF 2013 QA4MRE. The core of the system is similar to the system for the main task of CLEF 2013 QA4MRE. We use minimum error rate training (MERT) to train the weights of the model and also propose a novel method for MERT with the addition of a threshold that defines the certainty with which we must answer questions. The system received a score of 22% c@1.
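
For readers unfamiliar with the measure, c@1 as used in QA4MRE gives partial credit for questions left unanswered, in proportion to the overall fraction of correct answers, which is why a certainty threshold for answering can matter. The sketch below is only an illustration of that standard formula; the function name and the example counts are assumptions and are not taken from the paper.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """Compute c@1 = (n_R + n_U * n_R / n) / n.

    n_correct    (n_R): questions answered correctly
    n_unanswered (n_U): questions deliberately left unanswered
    n_total      (n):   all questions in the test set
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0:
        raise ValueError("counts must be non-negative")
    if n_correct + n_unanswered > n_total:
        raise ValueError("correct + unanswered cannot exceed total")
    # Each unanswered question earns credit equal to the overall
    # proportion of correct answers, n_R / n.
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


# Hypothetical counts: answering everything versus abstaining on half.
all_answered = c_at_1(n_correct=2, n_unanswered=0, n_total=10)
half_skipped = c_at_1(n_correct=2, n_unanswered=5, n_total=10)
print(all_answered, half_skipped)
```

With the same number of correct answers, abstaining instead of answering wrongly raises the score, which is the behaviour a certainty threshold is meant to exploit.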

IPL at CLEF 2013 Medical Retrieval Task

Spyridon Stathopoulos, Ismini Lourentzou, Antonia Kyriakopoulou, Theodore Kalamboukis

1 Jan 2013

TL;DR: This paper presents an experimental evaluation of a refined approach to Latent Semantic Analysis (LSA) for efficiently searching very large image databases, together with the results of extensive experiments applying early data fusion with LSA on several low-level visual and textual features.

Abstract: This article presents an experimental evaluation of a refined approach to Latent Semantic Analysis (LSA) for efficiently searching very large image databases. It also describes IPL's participation in the ImageCLEF ad-hoc textual and visual retrieval as well as modality classification for the Medical Task in 2013. We report on our approaches and methods and present the results of our extensive experiments applying early data fusion with LSA on several low-level visual and textual features.

10.5167/UZH-87213•

Deriving an English Biomedical Silver Standard Corpus for CLEF-ER

Ian Lewin, Simon Clematide¹•Institutions (1)

University of Zurich¹

26 Sep 2013

TL;DR: The automatic harmonization method used for building the English Silver Standard annotation supplied as a data source for the multilingual CLEF-ER named entity recognition challenge is described.

Abstract: We describe the automatic harmonization method used for building the English Silver Standard annotation supplied as a data source for the multilingual CLEF-ER named entity recognition challenge. The use of an automatic Silver Standard is designed to remove the need for a costly and time-consuming expert annotation. The final voting threshold of 3 for the harmonization of 6 different annotations from the project partners kept 45% of all available concept centroids. On average, 19% (SD 14%) of the original annotations are removed. 97.8% of the partner annotations that go into the Silver Standard Corpus have exactly the same boundaries as their harmonized representations.

Normalization of Abbreviations/Acronyms: THCIB at CLEF eHealth 2013 Task 2

Yunqing Xia, Xiaoshi Zhong, Peng Liu, Cheng Tan, Sen Na, Qinan Hu, Yaohai Huang

1 Jan 2013

TL;DR: This paper builds a baseline system using open source software and improves its performance by adding dictionaries, showing that adding a dictionary of acronyms/abbreviations can improve performance significantly.

Abstract: This paper describes the THCIB systems used in the ShARe/CLEF eHealth Lab 2013 task 2. We built a baseline system using open source software, and improved the performance by adding dictionaries. The dictionary is built from the training set and web resources using existing technologies. The experimental results show that adding a dictionary of acronyms/abbreviations can improve the performance significantly.

Book Chapter•10.1007/978-3-642-40802-1_2•

A Quantitative Look at the CLEF Working Notes

Thomas Wilhelm-Stein¹, Maximilian Eibl¹•Institutions (1)

Chemnitz University of Technology¹

23 Sep 2013

TL;DR: A new collection containing all CLEF working notes including their metadata was created and analysed to take a look back at the developments and trends in different domains like evaluation measures and retrieval models.

Abstract: After seven years of participation in CLEF we take a look back at the developments and trends in different domains like evaluation measures and retrieval models. For that purpose a new collection containing all CLEF working notes including their metadata was created and analysed.

Working notes for TopSig at ShARe/CLEF eHealth 2013

Timothy Chappell, Shlomo Geva

1 Jan 2013

TL;DR: The authors' TopSig open-source indexing and retrieval tool was used to produce runs for the ShARe/CLEF eHealth 2013 track and was able to gain some benefit from utilising the discharge summaries, although the software needed to be modified to support this.

Abstract: We used our TopSig open-source indexing and retrieval tool to produce runs for the ShARe/CLEF eHealth 2013 track. TopSig was used to produce runs using the query fields and provided discharge summaries, where appropriate. Although the improvement was not great, TopSig was able to gain some benefit from utilising the discharge summaries; the software needed to be modified to support this. This was part of a larger experiment to determine the applicability and limits of signature-based approaches.
