Cross-Language Evaluation Forum

Conference Tools

Papers published on a yearly basis

Papers

Book Chapter•10.1007/3-540-45691-0_34•

The Philosophy of Information Retrieval Evaluation

Ellen M. Voorhees¹•Institutions (1)

National Institute of Standards and Technology¹

3 Sep 2001

TL;DR: The fundamental assumptions and appropriate uses of the Cranfield paradigm, especially as they apply in the context of the evaluation conferences, are reviewed.

Abstract: Evaluation conferences such as TREC, CLEF, and NTCIR are modern examples of the Cranfield evaluation paradigm. In Cranfield, researchers perform experiments on test collections to compare the relative effectiveness of different retrieval approaches. The test collections allow the researchers to control the effects of different system parameters, increasing the power and decreasing the cost of retrieval experiments as compared to user-based evaluations. This paper reviews the fundamental assumptions and appropriate uses of the Cranfield paradigm, especially as they apply in the context of the evaluation conferences.

507 citations

Proceedings Article•

Overview of the 2nd International Competition on Plagiarism Detection

Martin Potthast, Alberto Barrón-Cedeño, Andreas Eiselt, Benno Stein, Paolo Rosso¹, Bauhaus-Universität Weimar•Institutions (1)

Polytechnic University of Valencia¹

1 Jan 2011

TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.

Abstract: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10. We start with a unified retrieval process that summarizes the best practices employed this year. Then, the detectors' performances are evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length. Finally, all results are compared to those of last year's competition.

426 citations

Book Chapter•10.1007/978-3-319-44564-9_3•

A Test Collection for Research on Depression and Language Use

David E. Losada¹, Fabio Crestani²•Institutions (2)

University of Santiago de Compostela¹, University of Lugano²

5 Sep 2016

TL;DR: A novel early detection task is proposed, along with a novel effectiveness measure for systematically comparing early detection algorithms; the measure takes into account both the accuracy of the algorithm's decisions and the delay in detecting positive cases.

Abstract: Several studies in the literature have shown that the words people use are indicative of their psychological states. In particular, depression was found to be associated with distinctive linguistic patterns. However, there is a lack of publicly available data for doing research on the interaction between language and depression. In this paper, we describe our first steps to fill this gap. We outline the methodology we have adopted to build and make publicly available a test collection on depression and language use. The resulting corpus includes a series of textual interactions written by different subjects. The new collection not only encourages research on differences in language between depressed and non-depressed individuals, but also on the evolution of the language use of depressed individuals. Further, we propose a novel early detection task and define a novel effectiveness measure to systematically compare early detection algorithms. This new measure takes into account both the accuracy of the decisions taken by the algorithm and the delay in detecting positive cases. We also present baseline results with novel detection methods that process users’ interactions in different ways.

264 citations

Book Chapter•10.1007/978-3-319-11382-1_17•

Overview of the ShARe/CLEF eHealth evaluation lab 2014

Liadh Kelly¹, Lorraine Goeuriot¹, Hanna Suominen², Tobias Schreck³, Gondy Leroy⁴, Danielle L. Mowery⁵, Sumithra Velupillai⁶, Wendy W. Chapman⁷, David Martinez⁸, Guido Zuccon⁹, Joao Palotti¹⁰•Institutions (10)

Dublin City University¹, Australian National University², University of Konstanz³, University of Arizona⁴, University of Pittsburgh⁵, Stockholm University⁶, University of Utah⁷, University of Melbourne⁸, Queensland University of Technology⁹, Vienna University of Technology¹⁰

15 Sep 2014

TL;DR: The results demonstrate the substantial community interest and capabilities of these systems in making clinical reports easier to understand for patients.

Abstract: This paper reports on the 2nd ShARe/CLEF eHealth evaluation lab, which continues our evaluation resource building activities for the medical domain. In this lab we focus on patients’ information needs as opposed to the more common campaign focus of the specialised information needs of physicians and other healthcare workers. The usage scenario of the lab is to make eHealth information, in particular clinical reports, easier for patients and their next-of-kin to understand. The 1st ShARe/CLEF eHealth evaluation lab was held in 2013. This lab consisted of three tasks. Task 1 focused on named entity recognition and normalization of disorders; Task 2 on normalization of acronyms/abbreviations; and Task 3 on information retrieval to address questions patients may have when reading clinical reports. This year’s lab introduces a new challenge in Task 1 on visual-interactive search and exploration of eHealth data. Its aim is to help patients (or their next-of-kin) with readability issues related to their hospital discharge documents and with related information search on the Internet. Task 2 then continues the information extraction work of the 2013 lab, specifically focusing on disorder attribute identification and normalization from clinical text. Finally, this year’s Task 3 further extends the 2013 information retrieval task, by cleaning the 2013 document collection and introducing a new query generation method and multilingual queries. De-identified clinical reports used by the three tasks were from US intensive care and originated from the MIMIC II database. Other text documents for Tasks 1 and 3 were from the Internet and originated from the Khresmoi project. Task 2 annotations originated from the ShARe annotations. For Tasks 1 and 3, new annotations, queries, and relevance assessments were created. 50, 79, and 91 people registered their interest in Tasks 1, 2, and 3, respectively. 24 unique teams participated with 1, 10, and 14 teams in Tasks 1, 2 and 3, respectively. The teams were from Africa, Asia, Canada, Europe, and North America. The Task 1 submission, reviewed by 5 expert peers, related to the task evaluation category of Effective use of interaction and targeted the needs of both expert and novice users. The best system had an Accuracy of 0.868 in Task 2a, an F1-score of 0.576 in Task 2b, and Precision at 10 (P@10) of 0.756 in Task 3. The results demonstrate the substantial community interest and capabilities of these systems in making clinical reports easier to understand for patients. The organisers have made data and tools available for future research and development.

242 citations

Book•10.1007/B102261•

Comparative Evaluation of Multilingual Information Access Systems

Carol Peters, Julio Gonzalo, Martin Braschler, Michael Kluck

1 Jan 2004

TL;DR: The paper discusses the evaluation approach adopted, describes the tracks and tasks offered and the test collections used, and provides an outline of the guidelines given to the participants.

Abstract: We describe the overall organization of the CLEF 2003 evaluation campaign, with a particular focus on the cross-language ad hoc and domain-specific retrieval tracks. The paper discusses the evaluation approach adopted, describes the tracks and tasks offered and the test collections used, and provides an outline of the guidelines given to the participants. It concludes with an overview of the techniques employed for results calculation and analysis for the monolingual, bilingual, multilingual, and GIRT tasks.

227 citations

Year	Papers
2021	52
2020	37
2019	53
2018	54
2017	53
2016	63
