TL;DR: A general stopword list for various European languages (namely, French, Italian, German and Spanish) and a combined approach that might be implemented in order to facilitate effective access to multilingual collections are suggested.
Abstract: For our first participation in CLEF retrieval tasks, our first objective was to define a general stopword list for various European languages (namely, French, Italian, German and Spanish) and also to suggest simple and efficient stemming procedures for them. Our second aim was to suggest a combined approach that might be implemented in order to facilitate effective access to multilingual collections.
TL;DR: The primary objective was to define a general stopword list for various European languages and also to suggest simple and efficient stemming procedures for these languages.
Abstract: In our first participation in CLEF retrieval tasks, the primary objective was to define a general stopword list for various European languages (namely, French, Italian, German and Spanish) and also to suggest simple and efficient stemming procedures for these languages. Our second aim was to suggest a combined approach that could facilitate effective access to multilingual collections.
TL;DR: This paper describes the shared experiment design used at all three participating sites, summarizes preliminary results from the evaluation, and concludes with observations on lessons learned that can inform the design of subsequent evaluation campaigns.
Abstract: The problem of finding documents written in a language that the searcher cannot read is perhaps the most challenging application of cross-language information retrieval technology. In interactive applications, that task involves at least two steps: (1) the machine locates promising documents in a collection that is larger than the searcher could scan, and (2) the searcher recognizes documents relevant to their intended use from among those nominated by the machine. The goal of the 2001 Cross-Language Evaluation Forum's experimental interactive track was to explore the ability of present technology to support interactive relevance assessment. This paper describes the shared experiment design used at all three participating sites, summarizes preliminary results from the evaluation, and concludes with observations on lessons learned that can inform the design of subsequent evaluation campaigns.
TL;DR: This paper reports on experiments with the IR-n system, an information retrieval system that applies a new method for passage selection that has been tested for the monolingual (Spanish) and bilingual (Spanish-English) tasks at CLEF-2001 with different success rates.
Abstract: Previous work demonstrates that information retrieval system performance is sensibly improved when using document passages as the basic unit of information. However, the IR community has not yet arrived at consensus about the best way of defining text passages for retrieval purposes. This paper reports on experiments with the IR-n system, an information retrieval system that applies a new method for passage selection. Passages are defined as a fixed number of adjoining sentences in a document. This approach has been tested for the monolingual (Spanish) and bilingual (Spanish-English) tasks at CLEF-2001 with different success rates.
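The passage definition in this abstract (a fixed number of adjoining sentences) can be illustrated with a minimal sketch. The naive regex sentence splitter, the function names, and the `size` and `step` parameters are assumptions made for illustration; the abstract does not describe how IR-n detects sentences or whether passages overlap.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_passages(text, size=3, step=1):
    """Return passages made of `size` adjoining sentences.

    `step` controls how far the window moves; step=1 yields overlapping
    passages. Both parameters are hypothetical, not taken from the paper.
    """
    sentences = split_sentences(text)
    if len(sentences) <= size:
        # Short documents become a single passage (or none if empty).
        return [" ".join(sentences)] if sentences else []
    return [
        " ".join(sentences[i:i + size])
        for i in range(0, len(sentences) - size + 1, step)
    ]
```

Each passage, rather than the whole document, would then be scored against the query, and a document could be ranked by its best-scoring passage.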
TL;DR: The CLEF multilingual test collection is examined with respect to the completeness of its relevance assessments; the analysis indicates that the test collection is stable and well suited for use in future evaluations.
Abstract: CLEF, the Cross-Language Evaluation Forum, continued to grow substantially in the second year of its existence. Building on the success of the first CLEF campaign in 2000 and of its predecessors, the TREC cross-language tracks, CLEF 2001 attracted 34 participating groups which submitted nearly 200 different result sets. A description of the various tracks, and a summary of the main results and research directions are given in this overview. In addition, the CLEF multilingual test collection is examined with respect to the completeness of its relevance assessments. The analysis indicates that the test collection is stable and well suited for use in future evaluations.
TL;DR: The Tampere University CLEF research group participated in CLEF2001 with four automated bilingual runs, and test with two dictionaries for the German runs gives an indication that the new features for compound processing work well even with a limited dictionary.
Abstract: The Tampere University CLEF research group participated in CLEF2001 with four automated bilingual runs. Our cross-lingual software, UTACLIR, uses an automated method for query construction for cross-language information retrieval (CLIR). This method seeks to automatically extract topical information from request sentences written in one of the source languages and to create a target language query, based on translations given by a translation dictionary. The new features for the CLIR process from Finnish, Swedish and German to English focus on translating and matching compound words, and a new n-gram based technique for translating and matching proper names and other non-translatable words. Non-translatable words can also be components in compounds. The n-gram based method is clearly efficient in matching inflected proper names and spelling variants. However, using it for all non-identified and non-translatable words adds noise to the query. For German — English we have tested two types of dictionaries (two runs). The first included all translations from the standard dictionary. The second contained the same data, except that all direct translations of compounds were excluded. The test with two dictionaries for the German runs gives an indication that the new features for compound processing work well even with a limited dictionary.
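To make the idea of n-gram matching for proper names and spelling variants concrete, here is a minimal sketch. It assumes character bigrams with boundary padding and the Dice coefficient as the similarity measure; the abstract does not state which n-gram length or similarity measure UTACLIR uses, so these choices and all function names are illustrative only.

```python
def char_ngrams(word, n=2):
    """Set of character n-grams of a lowercased word, padded with '_' at both ends."""
    padded = f"_{word.lower()}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient between the n-gram sets of two words (assumed measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_matches(query_word, index_terms, n=2, threshold=0.5):
    """Index terms whose similarity to query_word reaches the (hypothetical) threshold."""
    scored = [(dice(query_word, term, n), term) for term in index_terms]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
```

An inflected form such as "Helsingin" shares most of its bigrams with "Helsinki" and so scores well above an unrelated name. The threshold also shows why applying the method to every unidentified word can add noise: the lower it is set, the more unrelated terms enter the query.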
TL;DR: The multilingual experiments, the main focus of this year’s work, combine multiple approaches to cross-language retrieval: machine translation, similarity thesauri, and machine-readable dictionaries.
Abstract: Eurospider participated in both the multilingual and monolingual retrieval tasks for CLEF 2001. Our multilingual experiments, the main focus of this year’s work, combine multiple approaches to cross-language retrieval: machine translation, similarity thesauri, and machine-readable dictionaries. We experimented with both query translation and document translation. The monolingual experiments focused on the use of two fundamentally different stemming components: a stemmer based on commercial considerations, and a linguistically motivated stemmer.
TL;DR: More recent experimental results from investigations of the combination of results from alternative machine translation outputs are described, looking at the use of data fusion of the output from individual retrieval runs and the combinations of alternative topic translations.
Abstract: The University of Exeter participated in the CLEF 2001 bilingual task. The main objectives of our experiments were to compare retrieval performance for different topic languages with similar easily available machine translation resources and to explore the application of new pseudo relevance feedback techniques recently developed at Exeter to Cross-Language Information Retrieval (CLIR). This paper also describes more recent experimental results from our investigations of the combination of results from alternative machine translation outputs; specifically we look at the use of data fusion of the output from individual retrieval runs and the combination of alternative topic translations.
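As an illustration of data fusion over individual retrieval runs, the sketch below merges the ranked output of several runs (for example, one per machine translation of the topic). The abstract does not say which fusion operator was used; this sketch assumes CombSUM over min-max normalized scores, a common choice, and all names are hypothetical.

```python
def min_max(run):
    """Rescale a run's scores ({doc_id: score}) to the range [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """Fuse runs by summing normalized scores (CombSUM, an assumed operator).

    Returns (doc_id, fused_score) pairs, best first; ties broken by doc_id.
    """
    fused = {}
    for run in runs:
        if not run:
            continue  # an empty run contributes nothing
        for doc, score in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Documents retrieved by several runs accumulate score, so a document ranked moderately by every translation can overtake one ranked highly by only a single translation.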
TL;DR: The ability to perform solfeggio, i.e. oral reading of musical notes in MP, a 65 year-old female professional musician, who, following a left temporoparietal ischemia, showed a complex pattern of amusia is investigated.
TL;DR: Starting with TREC, the important parameters of topic generation are described and the main focus lies on evaluating multilingual functions.
Abstract: Topic generation is considered as one of the crucial elements in the information retrieval evaluation process. In the context of CLEF, the main focus lies on evaluating multilingual functions. With respect to topic generation this means that topics have to be created in various languages. Starting with TREC, the important parameters of topic generation are described.
TL;DR: Thomson Legal and Regulatory participated in the monolingual track for all five languages and in the bilingual track with Spanish-English runs, and their bilingual runs compared merging strategies for query translation resources.
Abstract: Thomson Legal and Regulatory participated in the monolingual track for all five languages and in the bilingual track with Spanish-English runs. Our monolingual runs for Dutch, Spanish and Italian use settings and rules derived from our runs in French and German last year. Our bilingual runs compared merging strategies for query translation resources.
TL;DR: This paper evaluates the effectiveness of query translation, disambiguation and expansion techniques on the CLEF collections using the SMART Information Retrieval System, combining a dictionary-based method with a statistics-based method to avoid the problem of translation ambiguity.
Abstract: This paper evaluates the effectiveness of query translation and disambiguation as well as expansion techniques on the CLEF Collections, using the SMART Information Retrieval System. We focus on the query translation, disambiguation and methods used to improve the effectiveness of information retrieval. A dictionary-based method in combination with a statistics-based method is used to avoid the problem of translation ambiguity. In addition, two expansion strategies are tested to see whether they improve the effectiveness of information retrieval: expansion via relevance feedback before and after translation as well as expansion via domain feedback after translation. This method achieved 85.30% of the monolingual counterpart, in terms of average precision.
TL;DR: This paper reports on the participation of ITC-irst in the Cross Language Evaluation Forum (CLEF) of 2001, which took part in the monolingual retrieval task, and the bilingual retrieval task.
Abstract: This paper reports on the participation of ITC-irst in the Cross Language Evaluation Forum (CLEF) of 2001. ITC-irst took part in two tracks: the monolingual retrieval task, and the bilingual retrieval task. In both cases, Italian was chosen as the query language, while English was chosen as the document language of the bilingual task. The retrieval engine that was used combines scores computed by an Okapi model and a statistical language model. The cross language system used a statistical query translation model, which is estimated on the target document collection and on a translation dictionary.
TL;DR: Evaluation for CLIR systems, translation resources, Merging Strategies, and Relevance Feedback for Cross-Language Information Retrieval, and more.
Abstract: Evaluation for CLIR Systems.- CLIR Evaluation at TREC.- NTCIR Workshop : Japanese- and Chinese-English Cross-Lingual Information Retrieval and Multi-grade Relevance Judgments.- Language Resources in Cross-Language Text Retrieval: A CLEF Perspective.- The Domain-Specific Task of CLEF - Specific Evaluation Strategies in Cross-Language Information Retrieval.- Evaluating Interactive Cross-Language Information Retrieval: Document Selection.- New Challenges for Cross-Language Information Retrieval: Multimedia Data and the User Experience.- Research to Improve Cross-Language Retrieval - Position Paper for CLEF.- The CLEF-2000 Experiments.- CLEF 2000 - Overview of Results.- Translation Resources, Merging Strategies, and Relevance Feedback for Cross-Language Information Retrieval.- Cross-Language Retrieval for the CLEF Collections - Comparing Multiple Methods of Retrieval.- A Language-Independent Approach to European Text Retrieval.- Experiments with the Eurospider Retrieval System for CLEF 2000.- A Poor Man's Approach to CLEF.- Ambiguity Problem in Multilingual Information Retrieval.- The Use of NLP Techniques in CLIR.- CLEF Experiments at Maryland: Statistical Stemming and Backoff Translation.- Multilingual Information Retrieval Based on Parallel Texts from the Web.- Mercure at CLEF-1.- Bilingual Tests with Swedish, Finnish, and German Queries: Dealing with Morphology, Compound Words, and Query Structure.- A Simple Approach to the Spanish-English Bilingual Retrieval Task.- Cross-Language Information Retrieval Using Dutch Query Translation.- Bilingual Information Retrieval with HyREX and Internet Translation Services.- Sheffield University CLEF 2000 Submission - Bilingual Track: German to English.- West Group at CLEF 2000: Non-english Monolingual Retrieval.- ITC-irst at CLEF 2000: Italian Monolingual Track.- Automatic Language-Specific Stemming in Information Retrieval.
TL;DR: All words from the query were used to find the results, knowing beforehand that undesired results would be found as well; the aim was to check whether this approach could deliver better results than the worst participants.
Abstract: Before we started the experiment, we recognized that our score would not be very high: the queries specified rather precisely what to return and what not to return. Because we used all words from the query to find our results, we knew beforehand that we would find the undesired results as well. However, we felt it was worth checking whether we could deliver better results than the worst participants with this approach.
TL;DR: A simple dictionary-based method is used to translate the French query into a bag of weighted English words, the English query, which is submitted to the SMART retrieval engine.
Abstract: In this paper, we describe our approach to the French English Bilingual Task in CLEF 2001. A simple dictionary-based method is used to translate the French query into a bag of weighted English words, the English query, which is submitted to the SMART retrieval engine. Despite the simplicity of the method, the results happen to be reasonable.
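The translation step described here, turning a French query into a bag of weighted English words, can be sketched as follows. The toy dictionary, the rule of splitting one unit of weight evenly across a term's translations, and the decision to keep untranslatable terms unchanged are all assumptions for illustration; the abstract does not give the actual weighting scheme.

```python
from collections import defaultdict

# Hypothetical toy bilingual dictionary (French -> English), for illustration only.
TOY_DICT = {
    "pollution": ["pollution"],
    "mer": ["sea", "ocean"],
    "pétrole": ["oil", "petroleum"],
}


def translate_query(french_terms, dictionary):
    """Translate a list of French terms into a bag of weighted English words.

    Assumed scheme: each source term carries weight 1.0, shared equally among
    its translations; terms missing from the dictionary are kept as they are.
    """
    weights = defaultdict(float)
    for term in french_terms:
        translations = dictionary.get(term.lower()) or [term]
        share = 1.0 / len(translations)
        for english in translations:
            weights[english] += share
    return dict(weights)
```

The resulting dictionary of word weights is the kind of weighted English query that could then be handed to a retrieval engine such as SMART.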
TL;DR: This experiment compared the effectiveness of several approaches in Chinese-English cross-language information retrieval and proposed five models that augmented restriction terms to the original queries to restrict the use of query terms in the target language.
Abstract: This paper reports the work of NTU in the bilingual-retrieval task at CLEF 2001. In this experiment, we compared the effectiveness of several approaches in Chinese-English cross-language information retrieval. Five models were proposed. Model 1 used co-occurrence information in the target language to disambiguate translation equivalents; Model 2 augmented restriction terms to the original queries to restrict the use of query terms in the target language; Model 3 used a Chinese-English WordNet to translate queries; Model 4 combined Model 3 with Model 2; Model 5 merged the queries constructed by Model 2 and 3.
TL;DR: The objective of this year’s CLEF participation has been to evaluate an improved German morpho-syntactic component, focusing on the impact decomposition information has on performance.
Abstract: The objective of this year’s CLEF participation has been to evaluate an improved German morpho-syntactic component, focusing on the impact decomposition information has on performance.
TL;DR: The tasks in CLEF 2000 involved a multilingual document collection in four core languages and several additional topic languages and the test collection was subsequently analyzed with respect to the completeness of the assessments in order to ensure the validity for future evaluation and benchmarking activities.
Abstract: The Cross-Language Evaluation Forum (CLEF) provides an infrastructure aimed at supporting the development, testing and evaluation of systems for cross-language information retrieval, and for monolingual information retrieval of European languages other than English. Originally started as a track at the TREC-6 conference, CLEF became an independent initiative in 2000 when the coordination moved to Europe. The diversity of languages commonly spoken in Europe has led to a multilingual, distributed setup that was chosen to best accommodate the unique linguistic properties of each language and the implications for topic development and relevance assessments. The tasks in CLEF 2000 involved a multilingual document collection in four core languages and several additional topic languages. Twenty groups participated in the campaign, submitting a wide range of experiments. The test collection was subsequently analyzed with respect to the completeness of the assessments in order to ensure the validity for future evaluation and benchmarking activities. CLEF will continue in 2001, with several additions to the individual tasks.
TL;DR: The work of NTU on the bilingual-retrieval task at CLEF 2001 is reported; the best model is Model 5, whose average precision is 53.06% of that of monolingual information retrieval.
Abstract: This paper reports the work of NTU on the bilingual-retrieval task at CLEF 2001. We proposed five models. Model 1 used co-occurrence information to disambiguate translation equivalents; Model 2 augmented restriction terms to the original queries; Model 3 used C-E WordNet to translate queries; Model 4 combined Model 3 with Model 2; Model 5 merged the queries constructed by Model 2 and 3. The best one is Model 5. The average precision of Model 5 is 0.1135, which is 53.06% of that of monolingual information retrieval.
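As a quick consistency check on the two reported figures, the monolingual average precision they imply can be worked out directly. The resulting value is derived here and is not stated in the abstract:

$$\text{AP}_{\text{mono}} \approx \frac{\text{AP}_{\text{Model 5}}}{0.5306} = \frac{0.1135}{0.5306} \approx 0.2139$$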
TL;DR: The experiments undertaken by the IRIT team in the multilingual, bilingual and monolingual tasks of the CLEF programme are presented and the general CLIR methodology, based on query translation, is described.
Abstract: 1 Summary. This paper presents the experiments undertaken by our team (the IRIT team) in the multilingual, bilingual and monolingual tasks of the CLEF programme. Our approach to CLIR is based on query translation. In the bilingual experiment, a dictionary is used to translate the queries from French to English, and two disambiguation techniques were tested: an aligned corpus and a dictionary strategy. The disambiguation technique is applied to select the best terms from the translated target queries. All these experiments were done using the Mercure system [2], which is presented in Section 2 of this paper. Section 3 describes our general CLIR methodology, and finally, Section 4 describes the experiments and results obtained in the CLEF programme.
2 Mercure model. 2.1 Model description. Mercure is an information retrieval system based on a connectionist approach and modelled by a multi-layered network. The network is composed of a query layer (set of query terms), a term layer representing the indexing terms, and a document layer [1], [2]. Mercure includes the implementation of a retrieval process based on spreading activation forward and backward through the weighted links. Queries and documents can be either inputs or outputs of the network. The links between two layers are symmetric and their weights are based on the tf·idf measure inspired by the OKAPI [3] term weighting formula. The term-document link weights are expressed by:
$$d_{ij} = \frac{tf_{ij}\,\bigl(h_1 + h_2 \log\frac{N}{n_i}\bigr)}{h_3 + h_4\,\frac{dl_j}{\Delta d} + h_5\,tf_{ij}} \tag{1}$$
The query-term links (at stage $s$) are weighted as follows:
$$q_{ui}^{(s)} = \begin{cases} \dfrac{nq_u \cdot qtf_{ui}}{nq_u - qtf_{ui}} & \text{if } nq_u > qtf_{ui} \\ qtf_{ui} & \text{otherwise} \end{cases} \tag{2}$$
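The two Mercure link-weight formulas (1) and (2) quoted in this abstract can be written out as a short sketch. Several readings here are assumptions, since the excerpt does not define its symbols: the denominator term is taken to be the document length divided by the average document length, the logarithm is taken as natural, `nq` is read as the number of terms in the query, and the constants `h1` to `h5` are left as caller-supplied parameters because their values are not given.

```python
import math


def term_doc_weight(tf, n_docs, doc_freq, doc_len, avg_doc_len, h1, h2, h3, h4, h5):
    """Term-document link weight, following formula (1).

    tf          -- frequency of the term in the document
    n_docs      -- number of documents in the collection (N)
    doc_freq    -- number of documents containing the term (n_i)
    doc_len     -- length of the document (dl_j)
    avg_doc_len -- average document length (assumed meaning of the denominator)
    h1..h5      -- tuning constants; values are not given in the excerpt
    """
    idf_part = h1 + h2 * math.log(n_docs / doc_freq)  # natural log is an assumption
    normalizer = h3 + h4 * doc_len / avg_doc_len + h5 * tf
    return tf * idf_part / normalizer


def query_term_weight(nq, qtf):
    """Query-term link weight, following formula (2).

    nq  -- number of terms in the query (assumed reading of nq_u)
    qtf -- frequency of the term in the query
    """
    if nq > qtf:
        return nq * qtf / (nq - qtf)
    return qtf
```

For example, with all five constants set to the placeholder value 1, a term occurring twice in an average-length document and present in every document of the collection gets weight 2 · 1 / (1 + 1 + 2) = 0.5.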
TL;DR: The experiment findings indicate that Okapi, the text retrieval system in use, can successfully be used for non-English text retrieval and there is significant difference between French and English retrieval depending on the adaptation of indexing and search strategies in use.
Abstract: This paper presents work on document retrieval based on our first participation in the CLEF 2001 monolingual retrieval task using French. The experiment findings indicate that Okapi, the text retrieval system in use, can successfully be used for non-English text retrieval. A lot of internal preprocessing is required in the basic search system for conversion into Okapi access formats. Various shell scripts were written to achieve the conversion in a UNIX environment, failure of which would significantly have impeded the overall performance. Based on the experiment findings using Okapi, which was originally designed for English, it was clear that, although most European languages share conventional word boundaries and variant word morphemes formed by the addition of suffixes, there is a significant difference between French and English retrieval depending on the adaptation of indexing and search strategies in use. No sophisticated method for higher recall and precision such as stemming techniques, phrase translation or de-compounding was employed for the experiment and our results were suggestively poor. Future participation would include more refined query translation tools.
TL;DR: The discussion includes an analysis of the first results and proposals for possible developments in the future of the CLEF (Cross-Language Evaluation Forum) series of evaluation campaigns for information retrieval systems operating on European languages.
Abstract: The goals of the CLEF (Cross-Language Evaluation Forum) series of evaluation campaigns for information retrieval systems operating on European languages are described. The difficulties of organizing an activity which aims at an objective evaluation of systems running on and over a number of different languages are examined. The discussion includes an analysis of the first results and proposals for possible developments in the future.
TL;DR: The treble clef motif as discussed by the authors is assembled around the central zinc ion and consists of a zinc knuckle, loop, β-hairpin and an α-helix.
Abstract: Detection of similarity is particularly difficult for small proteins and thus connections between many of them remain unnoticed. Structure and sequence analysis of several metal-binding proteins reveals unexpected similarities in structural domains classified as different protein folds in SCOP and suggests unification of seven folds that belong to two protein classes. The common motif, termed treble clef finger in this study, forms the protein structural core and is 25–45 residues long. The treble clef motif is assembled around the central zinc ion and consists of a zinc knuckle, loop, β-hairpin and an α-helix. The knuckle and the first turn of the helix each incorporate two zinc ligands. Treble clef domains constitute the core of many structures such as ribosomal proteins L24E and S14, RING fingers, protein kinase cysteine-rich domains, nuclear receptor-like fingers, LIM domains, phosphatidylinositol-3-phosphate-binding domains and His-Me finger endonucleases. The treble clef finger is a uniquely versatile motif adaptable for various functions. This small domain with a 25 residue structural core can accommodate eight different metal-binding sites and can have many types of functions from binding of nucleic acids, proteins and small molecules, to catalysis of phosphodiester bond hydrolysis. Treble clef motifs are frequently incorporated in larger structures or occur in doublets. Present analysis suggests that the treble clef motif defines a distinct structural fold found in proteins with diverse functional properties and forms one of the major zinc finger groups.
TL;DR: The organization of the CLEF 2001 evaluation campaign is described, the guidelines given to participants are outlined, and the techniques and measures used in CLEF campaigns for result calculation and analysis are explained.
Abstract: We describe the organization of the CLEF 2001 evaluation campaign, outline the guidelines given to participants, and explain the techniques and measures used in CLEF campaigns for result calculation and analysis.