Web object retrieval

Proceedings Article•10.1145/1242572.1242584

Zaiqing Nie, +4 more

- 08 May 2007

- pp 81-90

147

TL;DR: This paper proposes several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features, and concludes that the hybrid model is superior because it takes extraction errors into account at varying levels.

Abstract: The primary function of current Web search engines is essentially relevance ranking at the document level. However, a great deal of structured information about real-world objects is embedded in static Web pages and online Web databases. Unfortunately, document-level information retrieval can lead to highly inaccurate relevance ranking when answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption no longer holds in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because of the varying quality of Web sites and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not achieve satisfactory retrieval performance. In this paper, we propose several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes extraction errors into account at varying levels.
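The abstract describes the three models only at a high level, so the following is a minimal sketch of how query-likelihood ranking over several noisy copies of one object might look. It is not the paper's formulation. Everything in it is an assumption made for illustration: the hypothetical `Record` schema, a per-record `accuracy` used to down-weight noisy copies, attribute weights `beta`, Jelinek-Mercer smoothing with weight `lam`, an add-one background model, and a "hybrid" model taken as a simple linear interpolation (weight `mu`) of the structured and unstructured estimates.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    """One extracted copy of an object from a single Web source (hypothetical schema)."""
    attributes: dict[str, str]  # attribute name -> extracted text
    accuracy: float             # assumed record-level extraction accuracy, > 0


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def mle(tokens: list[str], w: str) -> float:
    """Maximum-likelihood unigram probability; 0 for an empty field."""
    return tokens.count(w) / len(tokens) if tokens else 0.0


def record_weights(records: list[Record]) -> list[float]:
    """Normalize record accuracies so that noisier copies contribute less."""
    total = sum(r.accuracy for r in records)
    return [r.accuracy / total for r in records]


def p_unstructured(records: list[Record], w: str) -> float:
    # Flatten each record into one bag of words, ignoring attribute boundaries.
    return sum(a * mle(tokenize(" ".join(r.attributes.values())), w)
               for a, r in zip(record_weights(records), records))


def p_structured(records: list[Record], w: str, beta: dict[str, float]) -> float:
    # One model per (record, attribute), mixed with attribute weights beta (summing to 1).
    return sum(a * b * mle(tokenize(r.attributes.get(attr, "")), w)
               for a, r in zip(record_weights(records), records)
               for attr, b in beta.items())


def collection_counts(objects: dict[str, list[Record]]) -> Counter:
    counts: Counter = Counter()
    for records in objects.values():
        for r in records:
            for text in r.attributes.values():
                counts.update(tokenize(text))
    return counts


def score(query: str, records: list[Record], coll: Counter, beta: dict[str, float],
          mode: str = "hybrid", lam: float = 0.2, mu: float = 0.5) -> float:
    """Query log-likelihood log P(Q|O) with Jelinek-Mercer smoothing (weight lam)."""
    n, v = sum(coll.values()), len(coll)
    log_p = 0.0
    for w in tokenize(query):
        p_c = (coll[w] + 1) / (n + v + 1)  # add-one background keeps unseen terms > 0
        if mode == "unstructured":
            p_o = p_unstructured(records, w)
        elif mode == "structured":
            p_o = p_structured(records, w, beta)
        else:  # hybrid: assumed linear interpolation of the two models
            p_o = mu * p_structured(records, w, beta) + (1 - mu) * p_unstructured(records, w)
        log_p += math.log((1 - lam) * p_o + lam * p_c)
    return log_p


objects = {
    "paper1": [
        Record({"title": "Web object retrieval", "venue": "WWW"}, accuracy=0.9),
        Record({"title": "Web object retrival", "venue": "WWW 2007"}, accuracy=0.5),  # noisy copy
    ],
    "paper2": [Record({"title": "text categorization methods", "venue": "SIGIR"}, accuracy=0.8)],
}
coll = collection_counts(objects)
beta = {"title": 0.7, "venue": 0.3}
ranking = sorted(objects, key=lambda o: score("object retrieval", objects[o], coll, beta),
                 reverse=True)
print(ranking)  # ['paper1', 'paper2']
```

The toy data only shows the mechanics: the misspelled copy of `paper1` still contributes to matching "object", but with less weight than the more accurate copy. Real accuracy estimates and attribute weights would have to come from the extraction system; none are given on this page.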

Citations

Proceedings Article•10.1145/1401890.1402008

ArnetMiner: extraction and mining of academic social networks

Jie Tang, +5 more

- 24 Aug 2008

TL;DR: The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.

2.4K

Journal Article•10.1016/j.websem.2008.06.001

YAGO: A Large Ontology from Wikipedia and WordNet

Fabian M. Suchanek, +2 more

- 01 Sep 2008

- Journal of Web Semantics

TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model with a decidable consistency that allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.

1K

Proceedings Article

EntityRank: searching entities directly and holistically

Tao Cheng, +2 more

- 23 Sep 2007

TL;DR: This work focuses on the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking.

210

Proceedings Article•10.1145/1807085.1807097

From information to knowledge: harvesting entities and relationships from web sources

Gerhard Weikum, +1 more

- 06 Jun 2010

TL;DR: This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting, to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall.

177

Related Papers (5)

DBpedia: a nucleus for a web of open data

Yago: a core of semantic knowledge

EntityRank: searching entities directly and holistically

Open information extraction from the web

Keyword searching and browsing in databases using BANKS