Proceedings Article · DOI: 10.1145/1242572.1242584
Web object retrieval
Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-Rong Wen, Wei-Ying Ma
- 08 May 2007
- pp 81-90
TL;DR: This paper proposes several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features, and concludes that the hybrid model is superior because it takes extraction errors at varying levels into account.
Abstract: The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval can unfortunately lead to highly inaccurate relevance ranking in answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption is no longer valid in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because of the varying quality of Web sites and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not be able to achieve satisfactory retrieval performance. In this paper, we propose several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes extraction errors at varying levels into account.
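The idea of blending structured and unstructured evidence according to extraction quality can be illustrated with a minimal Python sketch. It is not the paper's formulation: the `Record` structure, the field weights, the accuracy values, and the use of simple linear (Jelinek-Mercer style) smoothing are all assumptions made for illustration. Each extracted copy of an object contributes in proportion to an assumed record-level accuracy, and an assumed attribute-level accuracy decides how much to trust per-field matching versus plain bag-of-words matching.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """One extracted copy of an object (hypothetical structure, not from the paper)."""
    fields: dict               # attribute name -> extracted text
    record_accuracy: float     # assumed trust in this source/extractor, > 0
    attribute_accuracy: float  # assumed chance the attribute labels are right, in [0, 1]


def smoothed_prob(term, tokens, background, lam=0.5, floor=1e-6):
    """Linearly smoothed P(term | text); `background` maps term -> collection prob."""
    p_text = tokens.count(term) / len(tokens) if tokens else 0.0
    return (1 - lam) * p_text + lam * background.get(term, floor)


def term_prob_given_object(term, records, field_weights, background):
    """Mix structured and unstructured estimates per record.

    attribute_accuracy = 1.0 gives a purely structured (field-weighted) model,
    0.0 gives a purely unstructured (bag-of-words) model, values in between
    give a hybrid. `field_weights` is assumed to sum to 1 over the fields used.
    """
    total_accuracy = sum(r.record_accuracy for r in records)
    prob = 0.0
    for r in records:
        alpha = r.record_accuracy / total_accuracy  # normalized record weight
        field_tokens = {name: text.lower().split() for name, text in r.fields.items()}
        all_tokens = [t for toks in field_tokens.values() for t in toks]
        structured = sum(
            field_weights.get(name, 0.0) * smoothed_prob(term, toks, background)
            for name, toks in field_tokens.items()
        )
        unstructured = smoothed_prob(term, all_tokens, background)
        gamma = r.attribute_accuracy
        prob += alpha * (gamma * structured + (1 - gamma) * unstructured)
    return prob


def score(query, records, field_weights, background):
    """Query log-likelihood of an object built from several extracted records."""
    return sum(
        math.log(term_prob_given_object(t, records, field_weights, background))
        for t in query.lower().split()
    )


# Usage: two noisy copies of the same paper object from different sources.
copies = [
    Record({"title": "web object retrieval", "abstract": "language models for objects"}, 0.9, 0.8),
    Record({"title": "object retrieval", "abstract": "web search at the object level"}, 0.6, 0.4),
]
weights = {"title": 0.8, "abstract": 0.2}  # illustrative field weights
print(score("object retrieval", copies, weights, background={}))
```

In this sketch a low `attribute_accuracy` pulls a record toward bag-of-words matching, so a mislabeled field does less damage, which mirrors the motivation given in the abstract for the hybrid model.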
Citations
ArnetMiner: extraction and mining of academic social networks
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, Zhong Su
- 24 Aug 2008
TL;DR: The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.
2.4K
YAGO: A Large Ontology from Wikipedia and WordNet
TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model with a decidable consistency that allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.
1K
•Proceedings Article
EntityRank: searching entities directly and holistically
Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang
- 23 Sep 2007
TL;DR: This work focuses on the core challenge of ranking entities, by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking.
From information to knowledge: harvesting entities and relationships from web sources
Gerhard Weikum, Martin Theobald
- 06 Jun 2010
TL;DR: This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting, to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall.
References
•Book
Data Mining: Concepts and Techniques
Jiawei Han, Micheline Kamber, Jian Pei
- 08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
•Book
Modern Information Retrieval
Ricardo Baeza-Yates, Berthier Ribeiro-Neto
- 15 May 1999
TL;DR: A rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, providing an up-to-date, student-oriented treatment of the subject.
A re-examination of text categorization methods
Yiming Yang, Xin Liu
- 01 Aug 1999
TL;DR: The results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when the categories have over 300 instances.
3K
Simple BM25 extension to multiple weighted fields
Stephen Robertson, Hugo Zaragoza, Michael J. Taylor
- 13 Nov 2004
TL;DR: This paper describes a simple way of adapting the BM25 ranking formula to deal with structured documents and proposes a much more intuitive alternative which weights term frequencies before the non-linear term frequency saturation function is applied.
860
Related Papers (5)
Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum
- 08 May 2007
Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang
- 23 Sep 2007