Top 804 papers published in the topic of Web query classification in 2009

Showing papers on "Web query classification" published in 2009

Journal Article•10.1145/1459352.1459357•

Web page classification: Features and algorithms

Xiaoguang Qi¹, Brian D. Davison¹•Institutions (1)

23 Feb 2009-ACM Computing Surveys

TL;DR: As work in Web page classification is reviewed, the importance of Web-specific features and algorithms is noted, state-of-the-art practices are described, and the underlying assumptions behind the use of information from neighboring pages are tracked.

Abstract: Classification of Web page content is essential to many tasks in Web information retrieval such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in Web page classification, we note the importance of these Web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.

551 citations

Proceedings Article•10.1109/ICDE.2009.77•

Keyword Search in Spatial Databases: Towards Searching by Document

Dongxiang Zhang¹, Yeow Meng Chee², Anirban Mondal³, Anthony K. H. Tung¹, Masaru Kitsuregawa³•Institutions (3)

National University of Singapore¹, Nanyang Technological University², University of Tokyo³

29 Mar 2009

TL;DR: This work addresses a novel spatial keyword query called the m-closest keywords (mCK) query, which aims to find the spatially closest tuples which match m user-specified keywords, and introduces a new index called the bR*-tree, which is an extension of the R*-tree.

Abstract: This work addresses a novel spatial keyword query called the m-closest keywords (mCK) query. Given a database of spatial objects, each tuple is associated with some descriptive information represented in the form of keywords. The mCK query aims to find the spatially closest tuples which match m user-specified keywords. Given a set of keywords from a document, mCK query can be very useful in geotagging the document by comparing the keywords to other geotagged documents in a database. To answer mCK queries efficiently, we introduce a new index called the bR*-tree, which is an extension of the R*-tree. Based on bR*-tree, we exploit a priori-based search strategies to effectively reduce the search space. We also propose two monotone constraints, namely the distance mutex and keyword mutex, as our a priori properties to facilitate effective pruning. Our performance study demonstrates that our search strategy is indeed efficient in reducing query response time and demonstrates remarkable scalability in terms of the number of query keywords which is essential for our main application of searching by document.

343 citations
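
To make the mCK query in the entry above concrete, here is a minimal brute-force sketch in Python. It is not the bR*-tree approach from the paper: it simply enumerates every combination of tuples that covers the m keywords and keeps the tightest one. The function name `mck_brute_force`, the `(x, y, keyword)` tuple layout, and the use of the largest pairwise distance (the group's diameter) as the closeness measure are all assumptions made for illustration.

```python
from itertools import product
from math import dist


def mck_brute_force(objects, keywords):
    """Return (diameter, group) for the tightest group covering all keywords.

    objects  : list of (x, y, keyword) tuples (hypothetical layout)
    keywords : the m user-specified keywords
    Closeness is assumed to be the largest pairwise distance in the group.
    """
    # One candidate list per query keyword.
    candidates = [[o for o in objects if o[2] == k] for k in keywords]
    if any(not c for c in candidates):
        return None  # some keyword matches no tuple at all

    best = None
    # Exhaustive search: exponential in m, only suitable for tiny inputs.
    for group in product(*candidates):
        diameter = max(dist(a[:2], b[:2]) for a in group for b in group)
        if best is None or diameter < best[0]:
            best = (diameter, group)
    return best


if __name__ == "__main__":
    data = [(0, 0, "cafe"), (10, 10, "cafe"), (1, 0, "bank"),
            (9, 9, "atm"), (0, 1, "atm")]
    print(mck_brute_force(data, ["cafe", "bank", "atm"]))
```

The exhaustive enumeration is exactly the cost that an index with pruning is meant to avoid; the sketch only pins down what a correct answer looks like.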

Proceedings Article•10.1145/1526709.1526764•

Inverted index compression and query processing with optimized document ordering

Hao Yan¹, Shuai Ding¹, Torsten Suel²•Institutions (2)

New York University¹, Yahoo!²

20 Apr 2009

TL;DR: This work performs an extensive study of compression techniques for document IDs and presents new optimizations of existing techniques which can achieve significant improvement in both compression and decompression performances.

Abstract: Web search engines use highly optimized compression schemes to decrease inverted index size and improve query throughput, and many index compression techniques have been studied in the literature. One approach taken by several recent studies first performs a renumbering of the document IDs in the collection that groups similar documents together, and then applies standard compression techniques. It is known that this can significantly improve index compression compared to a random document ordering. We study index compression and query processing techniques for such reordered indexes. Previous work has focused on determining the best possible ordering of documents. In contrast, we assume that such an ordering is already given, and focus on how to optimize compression methods and query processing for this case. We perform an extensive study of compression techniques for document IDs and present new optimizations of existing techniques which can achieve significant improvement in both compression and decompression performances. We also propose and evaluate techniques for compressing frequency values for this case. Finally, we study the effect of this approach on query processing performance. Our experiments show very significant improvements in index size and query processing speed on the TREC GOV2 collection of 25.2 million web pages.

310 citations
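
The abstract above rests on the observation that grouping similar documents under nearby IDs makes the gaps between consecutive IDs in a posting list small, and small gaps compress well. The Python sketch below illustrates that effect with gap encoding followed by plain variable-byte coding. Variable-byte coding is a standard textbook scheme used here as a stand-in; it is not one of the optimized methods the paper proposes, and all function names are hypothetical.

```python
def to_gaps(doc_ids):
    """Turn a sorted posting list into gaps between consecutive document IDs."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: rebuild document IDs by cumulative sum."""
    doc_ids, total = [], 0
    for g in gaps:
        total += g
        doc_ids.append(total)
    return doc_ids


def varbyte_encode(numbers):
    """Encode non-negative ints, 7 bits per byte, low-order bits first.
    The high bit is set on the final byte of each number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def varbyte_decode(data):
    """Decode a byte string produced by varbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:  # terminator byte: finish the current number
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


if __name__ == "__main__":
    clustered = [1000, 1001, 1002, 1005]        # similar documents, nearby IDs
    scattered = [1000, 50000, 900000, 2000000]  # same list length, spread-out IDs
    for ids in (clustered, scattered):
        encoded = varbyte_encode(to_gaps(ids))
        assert from_gaps(varbyte_decode(encoded)) == ids
        print(ids, "->", len(encoded), "bytes")
```

Both lists hold four postings, but the clustered one needs fewer bytes because three of its four gaps fit in a single byte each.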

Proceedings Article•10.3115/1690219.1690290•

Phrase Clustering for Discriminative Learning

Dekang Lin¹, Xiaoyun Wu¹•Institutions (1)

Google¹

2 Aug 2009

TL;DR: A simple and scalable algorithm is presented for clustering tens of millions of phrases and using the resulting clusters as features in discriminative classifiers; its power and generality are demonstrated on named entity recognition and query classification.

Abstract: We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the power and generality of this approach, we apply the method in two very different applications: named entity recognition and query classification. Our results show that phrase clusters offer significant improvements over word clusters. Our NER system achieves the best current result on the widely used CoNLL benchmark. Our query classifier is on par with the best system in KDDCUP 2005 without resorting to labor intensive knowledge engineering efforts.

270 citations

Journal Article•10.1007/S10791-008-9074-8•

Evaluation of query expansion using MeSH in PubMed

Zhiyong Lu¹, Won Kim¹, W. John Wilbur¹•Institutions (1)

National Institutes of Health¹

01 Feb 2009-Information Retrieval

TL;DR: Experimental results suggest that query expansion using MeSH in PubMed can generally improve retrieval performance, but the improvement may not affect end PubMed users in realistic situations.

Abstract: This paper investigates the effectiveness of using MeSH® in PubMed through its automatic query expansion process: Automatic Term Mapping (ATM). We run Boolean searches based on a collection of 55 topics and about 160,000 MEDLINE® citations used in the 2006 and 2007 TREC Genomics Tracks. For each topic, we first automatically construct a query by selecting keywords from the question. Next, each query is expanded by ATM, which assigns different search tags to terms in the query. Three search tags, [MeSH Terms], [Text Words], and [All Fields], are chosen to be studied after expansion because they all make use of the MeSH field of indexed MEDLINE citations. Furthermore, we characterize the two different mechanisms by which the MeSH field is used. Retrieval results using MeSH after expansion are compared to those solely based on the words in MEDLINE titles and abstracts. The aggregate retrieval performance is assessed using both F-measure and mean rank precision. Experimental results suggest that query expansion using MeSH in PubMed can generally improve retrieval performance, but the improvement may not affect end PubMed users in realistic situations.

208 citations

Proceedings Article•10.1145/1631272.1631278•

Visual query suggestion

Zheng-Jun Zha¹, Linjun Yang², Tao Mei², Meng Wang², Zengfu Wang¹•Institutions (2)

University of Science and Technology of China¹, Microsoft²

19 Oct 2009

TL;DR: This paper proposes a new query suggestion scheme named Visual Query Suggestion (VQS), which provides a more effective query interface to formulate an intent-specific query by joint text and image suggestions, and shows that VQS outperforms three popular image search engines in terms of both the quality of query suggestion and search performance.

Abstract: Query suggestion is an effective approach to improve the usability of image search. Most existing search engines are able to automatically suggest a list of textual query terms based on users' current query input, which can be called Textual Query Suggestion. This paper proposes a new query suggestion scheme named Visual Query Suggestion (VQS) which is dedicated to image search. It provides a more effective query interface to formulate an intent-specific query by joint text and image suggestions. We show that VQS is able to more precisely and more quickly help users specify and deliver their search intents. When a user submits a text query, VQS first provides a list of suggestions, each containing a keyword and a collection of representative images in a dropdown menu. If the user selects one of the suggestions, the corresponding keyword will be added to complement the initial text query as the new text query, while the image collection will be formulated as the visual query. VQS then performs image search based on the new text query using text search techniques, as well as content-based visual retrieval to refine the search results by using the corresponding images as query examples. We compare VQS with three popular image search engines, and show that VQS outperforms these engines in terms of both the quality of query suggestion and search performance.

198 citations

Proceedings Article•10.1145/1559845.1559966•

Keyword search on structured and semi-structured data

Yi Chen¹, Wei Wang², Ziyang Liu¹, Xuemin Lin²•Institutions (2)

Arizona State University¹, University of New South Wales²

29 Jun 2009

TL;DR: An overview is given of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation.

Abstract: Empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. Various data models will be discussed, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications that are built upon keyword search, such as keyword based database selection, query generation, and analytical processing. Finally we identify the challenges and opportunities of future research to advance the field.

188 citations

Journal Article•10.1145/1508857.1508858•

The design of a query monitoring system

Chaitanya Mishra¹, Nick Koudas¹•Institutions (1)

University of Toronto¹

23 Apr 2009-ACM Transactions on Database Systems

TL;DR: A query monitoring system from the ground up is presented, describing various new techniques for query monitoring, their implementation inside a real database system, and a novel interface that presents the observed and predicted information in an accessible manner.

Abstract: Query monitoring refers to the problem of observing and predicting various parameters related to the execution of a query in a database system. In addition to being a useful tool for database users and administrators, it can also serve as an information collection service for resource allocation and adaptive query processing techniques. In this article, we present a query monitoring system from the ground up, describing various new techniques for query monitoring, their implementation inside a real database system, and a novel interface that presents the observed and predicted information in an accessible manner. To enable this system, we introduce several lightweight online techniques for progressively estimating and refining the cardinality of different relational operators using information collected at query execution time. These include binary and multiway joins as well as typical grouping operations and combinations thereof. We describe the various algorithms used to efficiently implement estimators and present the results of an evaluation of a prototype implementation of our framework in an open-source data management system. Our results demonstrate the feasibility and practical utility of the approach presented herein.

180 citations

Patent•

Method and apparatus for creating and utilizing information representation of queries

Ian Oliver¹, Jukka Honkola¹, Juha-Pekka Luoma¹•Institutions (1)

Nokia¹

29 Sep 2009

TL;DR: An approach for creating and utilizing information representation of queries is presented, in which a query application receives a query, expresses it as a resource description framework graph, and causes, at least in part, storage of that graph.

Abstract: An approach is provided for creating and utilizing information representation of queries. A query application receives a query. The query application expresses the query as a resource description framework graph. The query application causes at least in part storage of the query resource description framework graph.

178 citations

Journal Article•10.1016/J.WEBSEM.2009.07.005•

From keywords to semantic queries-Incremental query construction on the semantic web

Gideon Zenz, Xuan Zhou¹, Enrico Minack, Wolf Siberski, Wolfgang Nejdl•Institutions (1)

Commonwealth Scientific and Industrial Research Organisation¹

01 Sep 2009-Journal of Web Semantics

TL;DR: The overall design of QUICK is described, the core algorithms to enable efficient query construction are presented, and the effectiveness of the system is demonstrated through an experimental study.

176 citations

Patent•

System and Method for Combining Geographic Metadata in Automatic Speech Recognition Language and Acoustic Models

Enrico Bocchieri¹, Diamantino Caseiro¹•Institutions (1)

AT&T¹

15 Dec 2009

TL;DR: A spoken search query is received by a portable device, which then determines its present location; that location is incorporated into a local language model that is used to process the search query.

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device, and the portable device then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.

Patent•

Accessing media data using metadata repository

Walter Chang¹, Michael J. Welch¹•Institutions (1)

Adobe Systems¹

13 Nov 2009

TL;DR: A computer-implemented method parses a user query to determine whether the query assigns a field to its first term, producing a parsed query that conforms to a predefined format, and then searches a metadata repository, embodied in a computer-readable medium and containing triplets generated from multiple modes of metadata for video content, to identify a set of candidate scenes from the video content.

Abstract: A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.

Proceedings Article•10.1145/1559845.1559902•

Query by output

Quoc Trung Tran¹, Chee-Yong Chan¹, Srinivasan Parthasarathy²•Institutions (2)

National University of Singapore¹, Ohio State University²

29 Jun 2009

TL;DR: This paper presents a novel data-driven approach, called Query By Output (QBO), which can enhance the usability of database systems, designs several optimization techniques to reduce processing overhead, and introduces a set of criteria to rank order output queries by various notions of utility.

Abstract: It has recently been asserted that the usability of a database is as important as its capability. Understanding the database schema and the hidden relationships among attributes in the data both play an important role in this context. Subscribing to this viewpoint, in this paper, we present a novel data-driven approach, called Query By Output (QBO), which can enhance the usability of database systems. The central goal of QBO is as follows: given the output of some query Q on a database D, denoted by Q(D), we wish to construct an alternative query Q′ such that Q(D) and Q′(D) are instance-equivalent. To generate instance-equivalent queries from Q(D), we devise a novel data classification-based technique that can handle the at-least-one semantics that is inherent in the query derivation. In addition to the basic framework, we design several optimization techniques to reduce processing overhead and introduce a set of criteria to rank order output queries by various notions of utility. Our framework is evaluated comprehensively on three real data sets and the results show that the instance-equivalent queries we obtain are interesting and that the approach is scalable and robust to queries of different selectivities.

Book Chapter•10.1007/978-3-642-02279-1_2•

Query Recommendations for Interactive Database Exploration

Gloria Chatzopoulou¹, Magdalini Eirinaki², Neoklis Polyzotis³•Institutions (3)

University of California, Riverside¹, San Jose State University², University of California, Santa Cruz³

2 Jun 2009

TL;DR: The idea is to track the querying behavior of each user, identify which parts of the database may be of interest for the corresponding data analysis task, and recommend queries that retrieve relevant data.

Abstract: Relational database systems are becoming increasingly popular in the scientific community to support the interactive exploration of large volumes of data. In this scenario, users employ a query interface (typically, a web-based client) to issue a series of SQL queries that aim to analyze the data and mine it for interesting information. First-time users, however, may not have the necessary knowledge to know where to start their exploration. Other times, users may simply overlook queries that retrieve important information. To assist users in this context, we draw inspiration from Web recommender systems and propose the use of personalized query recommendations. The idea is to track the querying behavior of each user, identify which parts of the database may be of interest for the corresponding data analysis task, and recommend queries that retrieve relevant data. We discuss the main challenges in this novel application of recommendation systems, and outline a possible solution based on collaborative filtering. Preliminary experimental results on real user traces demonstrate that our framework can generate effective query recommendations.

Proceedings Article•10.1109/ICDE.2009.71•

Web Query Recommendation via Sequential Query Prediction

Qi He¹, Daxin Jiang², Zhen Liao², Steven C. H. Hoi¹, Kuiyu Chang¹, Ee-Peng Lim¹, Hang Li²•Institutions (2)

Nanyang Technological University¹, Microsoft²

29 Mar 2009

TL;DR: A novel "sequential query prediction" approach that tries to grasp a user's search intent based on his/her past query sequence and its resemblance to historical query sequence models mined from massive search engine logs is proposed.

Abstract: Web query recommendation has long been considered a key feature of search engines. Building a good Web query recommendation system, however, is very difficult due to the fundamental challenge of predicting users' search intent, especially given the limited user context information. In this paper, we propose a novel "sequential query prediction" approach that tries to grasp a user's search intent based on his/her past query sequence and its resemblance to historical query sequence models mined from massive search engine logs. Different query sequence models were examined, including the naive variable length N-gram model, Variable Memory Markov (VMM) model, and our proposed Mixture Variable Memory Markov (MVMM) model. Extensive experiments were conducted to benchmark our sequence prediction algorithms against two conventional pairwise approaches on large-scale search logs extracted from a commercial search engine. Results show that the sequence-wise approaches significantly outperform the conventional pair-wise ones in terms of prediction accuracy. In particular, our MVMM approach consistently leads the pack, making it an effective and practical approach towards Web query recommendation.
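
As a rough illustration of the sequence-wise idea in the abstract above, the Python sketch below counts which query follows each recent query context in logged sessions and predicts from the longest context it has seen, backing off to shorter ones. It is loosely modelled on a variable-length N-gram predictor and is not the paper's VMM or MVMM model; the function names, the `max_order` parameter, and the toy sessions are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def train(sessions, max_order=3):
    """Count next-query frequencies for every context of up to max_order queries."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(1, len(session)):
            for n in range(1, max_order + 1):
                if i - n < 0:
                    break  # not enough history for a context this long
                context = tuple(session[i - n:i])
                model[context][session[i]] += 1
    return model


def predict(model, history, max_order=3, k=1):
    """Return the k most frequent follow-up queries for the longest known context."""
    for n in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-n:])
        if context in model:  # membership test does not create an entry
            return [q for q, _ in model[context].most_common(k)]
    return []  # no context in the history was ever observed


if __name__ == "__main__":
    logs = [
        ["jaguar", "jaguar car", "jaguar price"],
        ["jaguar", "jaguar cat"],
        ["jaguar", "jaguar car", "jaguar dealer"],
        ["jaguar", "jaguar car", "jaguar price"],
    ]
    model = train(logs)
    print(predict(model, ["jaguar"]))
    print(predict(model, ["jaguar", "jaguar car"]))
```

A pairwise recommender would look only at the most recent query; conditioning on the longer context is what lets the second call distinguish users who have already narrowed "jaguar" down to the car.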

Journal Article•10.1109/TKDE.2008.113•

A Relation-Based Page Rank Algorithm for Semantic Web Search Engines

Fabrizio Lamberti, Andrea Sanna, Claudio Giovanni Demartini

01 Jan 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper proposes a relation-based page rank algorithm to be used in conjunction with semantic Web search engines that simply relies on information that could be extracted from user queries and on annotated resources.

Abstract: With the tremendous growth of information available to end users through the Web, search engines come to play an ever more critical role. Nevertheless, because of their general-purpose approach, it is increasingly common for the result sets obtained to be burdened with useless pages. The next-generation Web architecture, represented by the Semantic Web, provides a layered architecture that may allow this limitation to be overcome. Several search engines have been proposed, which allow increasing information retrieval accuracy by exploiting a key content of semantic Web resources, that is, relations. However, in order to rank results, most of the existing solutions need to work on the whole annotated knowledge base. In this paper, we propose a relation-based page rank algorithm to be used in conjunction with semantic Web search engines that simply relies on information that could be extracted from user queries and on annotated resources. Relevance is measured as the probability that a retrieved resource actually contains those relations whose existence was assumed by the user at the time of query definition.

Journal Article•10.1007/S00778-008-0117-Y•

Multi-dimensional top-k dominating queries

Man Lung Yiu¹, Nikos Mamoulis²•Institutions (2)

Aalborg University¹, University of Hong Kong²

1 Jun 2009

TL;DR: An extensive study on the evaluation of top-k dominating queries, which proposes a set of algorithms that apply on indexed multi-dimensional data and investigates query evaluation on data that are not indexed.

Abstract: The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply on indexed multi-dimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness of its results on real data.
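
The definition in the abstract above can be stated compactly in code. The Python sketch below is a naive quadratic baseline, not one of the paper's indexed algorithms: it scores each point by how many other points it dominates and returns the k highest scores. It assumes that smaller values are better in every dimension, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one.
    Smaller is assumed to be better."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def top_k_dominating(points, k):
    """Return the k (score, point) pairs with the highest domination scores."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return scored[:k]


if __name__ == "__main__":
    pts = [(1, 1), (2, 2), (3, 3), (1, 4), (4, 1)]
    print(top_k_dominating(pts, 2))
```

Because the score is a count of dominated points, the caller fixes the output size through `k` and never supplies a ranking function, and rescaling any dimension leaves the result unchanged, which matches properties (i) to (iii) listed in the abstract.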

Proceedings Article•10.1145/1516360.1516459•

Interactive query refinement

Chaitanya Mishra¹, Nick Koudas¹•Institutions (1)

University of Toronto¹

24 Mar 2009

TL;DR: This work formalizes the problem of query refinement and proposes a framework to support it in a database system, and introduces an interactive model of refinement that incorporates user feedback to best capture user preferences.

Abstract: We investigate the problem of refining SQL queries to satisfy cardinality constraints on the query result. This has applications to the many/few answers problems often faced by database users. We formalize the problem of query refinement and propose a framework to support it in a database system. We introduce an interactive model of refinement that incorporates user feedback to best capture user preferences. Our techniques are designed to handle queries having range and equality predicates on numerical and categorical attributes. We present an experimental evaluation of our framework implemented in an open source data manager and demonstrate the feasibility and practical utility of our approach.

Proceedings Article•10.1145/1559845.1559918•

Efficient type-ahead search on relational data: a TASTIER approach

Guoliang Li¹, Shengyue Ji², Chen Li², Jianhua Feng¹•Institutions (2)

Tsinghua University¹, University of California, Irvine²

29 Jun 2009

TL;DR: A novel approach to keyword search in the relational world, called Tastier, is proposed, with efficient index structures and algorithms for finding relevant answers on-the-fly by joining tuples in the database and a partition-based method to improve query performance.

Abstract: Existing keyword-search systems in relational databases require users to submit a complete query to compute answers. Often users feel "left in the dark" when they have limited knowledge about the data, and have to use a try-and-see approach for modifying queries and finding answers. In this paper we propose a novel approach to keyword search in the relational world, called Tastier. A Tastier system can bring instant gratification to users by supporting type-ahead search, which finds answers "on the fly" as the user types in query keywords. A main challenge is how to achieve a high interactive speed for large amounts of data in multiple tables, so that a query can be answered efficiently within milliseconds. We propose efficient index structures and algorithms for finding relevant answers on-the-fly by joining tuples in the database. We devise a partition-based method to improve query performance by grouping highly relevant tuples and pruning irrelevant tuples efficiently. We also develop a technique to answer a query efficiently by predicting the highly relevant complete queries for the user. We have conducted a thorough experimental evaluation of the proposed techniques on real data sets to demonstrate the efficiency and practicality of this new search paradigm.

Proceedings Article•10.1145/1498759.1498806•

Query by document

Yin Yang¹, Nilesh Bansal², Wisam Dakka³, Panagiotis G. Ipeirotis⁴, Nick Koudas², Dimitris Papadias¹•Institutions (4)

Hong Kong University of Science and Technology¹, University of Toronto², Google³, New York University⁴

9 Feb 2009

TL;DR: This paper introduces methodologies to extract phrases from a given "query document" to be used as queries to search interfaces with the goal to retrieve content related to the query document and considers two techniques to extract and score key phrases.

Abstract: We are experiencing an unprecedented increase of content contributed by users in forums such as blogs, social networking sites and microblogging services. Such abundance of content complements content on web sites and traditional media forums such as newspapers, news and financial streams, and so on. Given such a plethora of information there is a pressing need to cross reference information across textual services. For example, commonly we read a news item and we wonder if there are any blogs reporting related content or vice versa. In this paper, we present techniques to automate the process of cross referencing online information content. We introduce methodologies to extract phrases from a given "query document" to be used as queries to search interfaces with the goal to retrieve content related to the query document. In particular, we consider two techniques to extract and score key phrases. We also consider techniques to complement extracted phrases with information present in external sources such as Wikipedia and introduce an algorithm called RelevanceRank for this purpose. We discuss both these techniques in detail and provide an experimental study utilizing a large number of human judges from Amazon's Mechanical Turk service. Detailed experiments demonstrate the effectiveness and efficiency of the proposed techniques for the task of automating retrieval of documents related to a query document.

Proceedings Article•10.1145/1557019.1557123•

Exploring social tagging graph for web object classification

Zhijun Yin¹, Rui Li¹, Qiaozhu Mei¹, Jiawei Han¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

28 Jun 2009

TL;DR: An efficient algorithm is proposed which not only utilizes social tags as enriched semantic features for the objects, but also infers the categories of unlabeled objects from both homogeneous and heterogeneous labeled objects, through the implicit connection of social tags.

Abstract: This paper studies the web object classification problem with the novel exploration of social tags. Automatically classifying web objects into manageable semantic categories has long been a fundamental preprocess for indexing, browsing, searching, and mining these objects. The explosive growth of heterogeneous web objects, especially non-textual objects such as products, pictures, and videos, has made the problem of web classification increasingly challenging. Such objects often suffer from a lack of easily extractable features with semantic information, interconnections between each other, as well as training examples with category labels. In this paper, we explore the social tagging data to bridge this gap. We cast the web object classification problem as an optimization problem on a graph of objects and tags. We then propose an efficient algorithm which not only utilizes social tags as enriched semantic features for the objects, but also infers the categories of unlabeled objects from both homogeneous and heterogeneous labeled objects, through the implicit connection of social tags. Experiment results show that the exploration of social tags effectively boosts web object classification. Our algorithm significantly outperforms state-of-the-art general classification methods.

Book Chapter•

Querying the Semantic Web with Ginseng: A Guided Input Natural Language Search Engine

Abraham Bernstein, Esther Kaufmann, Christoph Kiefer, Simon Clematide, Manfred Klenner, Martin Volk

1 Jan 2009

TL;DR: This work presents Ginseng, a quasi natural language guided query interface to the Semantic Web which relies on a simple question grammar which gets dynamically extended by the structure of an ontology to guide users in formulating queries in a language seemingly akin to English.

Abstract: The Semantic Web presents the vision of a distributed, dynamically growing knowledge base founded on formal logic. Common users, however, seem to have problems even with the simplest Boolean expression. As queries from web search engines show, the great majority of users simply do not use Boolean expressions. So how can we help users to query a web of logic that they do not seem to understand? We address this problem by presenting Ginseng, a quasi natural language guided query interface to the Semantic Web. Ginseng relies on a simple question grammar which gets dynamically extended by the structure of an ontology to guide users in formulating queries in a language seemingly akin to English. Based on the grammar Ginseng then translates the queries into a Semantic Web query language (RDQL), which allows their execution. Our evaluation with 20 users shows that Ginseng is extremely simple to use without any training (as opposed to any logic-based querying approach) resulting in very good query performance (precision = 92.8%, recall = 98.4%). We, furthermore, found that even with its simple grammar/approach Ginseng could process over 40% of questions from a query corpus without modification.

Journal Article•10.1016/J.IPM.2008.07.001•

Using query expansion in graph-based approach for query-focused multi-document summarization

Lin Zhao¹, Lide Wu¹, Xuanjing Huang¹•Institutions (1)

Fudan University¹

01 Jan 2009-Information Processing and Management

TL;DR: A novel query expansion method is presented, which is combined in the graph-based algorithm for query-focused multi-document summarization, so as to resolve the problem of information limit in the original query.

Abstract: This paper presents a novel query expansion method, which is combined in the graph-based algorithm for query-focused multi-document summarization, so as to resolve the problem of information limit in the original query. Our approach makes use of both the sentence-to-sentence relations and the sentence-to-word relations to select the query biased informative words from the document set and use them as query expansions to improve the sentence ranking result. Compared to previous query expansion approaches, our approach can capture more relevant information with less noise. We performed experiments on the data of document understanding conference (DUC) 2005 and DUC 2006, and the evaluation results show that the proposed query expansion method can significantly improve the system performance and make our system comparable to the state-of-the-art systems.

Patent•

Suggesting related search queries during web browsing

Ryen W. White¹, Robert L. Rounthwaite¹, Silviu Cucerzan¹•Institutions (1)

Microsoft¹

21 Sep 2009

TL;DR: In this paper, the authors present a method for generating suggested queries for web pages that are not search engine results pages, based upon the URL and/or content of a currently displayed page.

Abstract: Described is the presenting of suggested queries for web pages that are not search engine results pages, based upon the URL and/or content of a currently displayed page. The suggested query set may be dynamically extracted (locally or remotely) based upon the content of the web page, and/or obtained from a data store of per-URL suggested query sets, e.g., generated from historical logs. Also described are various techniques for generating suggested queries, and user interface mechanisms that display and allow interaction with suggested queries.

Book Chapter•10.1007/978-3-642-04930-9_27•

Learning Semantic Query Suggestions

Edgar Meij¹, Marc Bron¹, Laura Hollink², Bouke Huurnink¹, Maarten de Rijke¹•Institutions (2)

University of Amsterdam¹, VU University Amsterdam²

6 Nov 2009

TL;DR: This paper uses a feature-based approach in conjunction with supervised machine learning, augmenting term-based features with search history-based and concept-specific features for semantic query suggestion, and evaluates the utility of different machine learning algorithms, features, and feature types in identifying semantic concepts.

Abstract: An important application of semantic web technology is recognizing human-defined concepts in text. Query transformation is a strategy often used in search engines to derive queries that are able to return more useful search results than the original query, and most popular search engines provide facilities that let users complete, specify, or reformulate their queries. We study the problem of semantic query suggestion, a special type of query transformation based on identifying semantic concepts contained in user queries. We use a feature-based approach in conjunction with supervised machine learning, augmenting term-based features with search history-based and concept-specific features. We apply our method to the task of linking queries from real-world query logs (the transaction logs of the Netherlands Institute for Sound and Vision) to the DBpedia knowledge base. We evaluate the utility of different machine learning algorithms, features, and feature types in identifying semantic concepts using a manually developed test bed and show significant improvements over an already high baseline. The resources developed for this paper, i.e., queries, human assessments, and extracted features, are available for download.

Journal Article•10.1016/J.WEBSEM.2009.08.001•

Semplore: A scalable IR approach to search the Web of Data

Haofen Wang¹, Qiaoling Liu¹, Thomas Penin¹, Linyun Fu¹, Lei Zhang², Thanh Tran³, Yong Yu¹, Yue Pan²•Institutions (3)

Shanghai Jiao Tong University¹, IBM², Karlsruhe Institute of Technology³

01 Sep 2009-Journal of Web Semantics

TL;DR: The experimental results show that Semplore is an efficient and effective system for searching the Web of Data and can be used as a basic infrastructure for Web-scale Semantic Web search engines.

Proceedings Article•10.1109/WI-IAT.2009.34•

From "Dango" to "Japanese Cakes": Query Reformulation Models and Patterns

Paolo Boldi¹, Francesco Bonchi², Carlos Castillo², Sebastiano Vigna¹•Institutions (2)

University of Milan¹, Yahoo!²

15 Sep 2009

TL;DR: An accurate model for classifying user query reformulations into broad classes (generalization, specialization, error correction or parallel move), achieving 92% accuracy, is built and it is demonstrated that the reformulation classifier leads to improved recommendations in a query recommendation system.

Abstract: Understanding query reformulation patterns is a key step towards next generation web search engines: it can help improve users' web-search experience by predicting their intent, and thus help them locate information more effectively. As a step in this direction, we build an accurate model for classifying user query reformulations into broad classes (generalization, specialization, error correction or parallel move), achieving 92% accuracy. We apply the model to automatically label two large query logs, creating annotated query-flow graphs. We study the resulting reformulation patterns, finding results consistent with previous studies done on smaller manually annotated datasets, and discovering new interesting patterns, including connections between reformulation types and topical categories. Finally, applying our findings to a third query log that is publicly available for research purposes, we demonstrate that our reformulation classifier leads to improved recommendations in a query recommendation system.
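
To make the four reformulation classes named in the abstract concrete, the following is a minimal rule-based sketch. It is not the model from the paper, which is a learned classifier whose features are not given in this listing. The function names, the term-set heuristics, the `max_typo_distance` threshold, and the extra `"unchanged"` label are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def classify_reformulation(q1: str, q2: str, max_typo_distance: int = 2) -> str:
    """Label the move from query q1 to query q2 with a crude heuristic.

    Assumed rules (hypothetical, for illustration only):
      - dropping terms broadens the query  -> generalization
      - adding terms narrows the query     -> specialization
      - a small character-level change     -> error correction
      - anything else                      -> parallel move
    """
    terms1, terms2 = set(q1.lower().split()), set(q2.lower().split())
    if terms1 == terms2:
        return "unchanged"  # extra label, not one of the four classes
    if terms2 < terms1:
        return "generalization"
    if terms1 < terms2:
        return "specialization"
    if edit_distance(q1.lower(), q2.lower()) <= max_typo_distance:
        return "error correction"
    return "parallel move"
```

A purely lexical rule like this cannot recognize the reformulation in the paper's title ("dango" to "japanese cakes") as a generalization, which is one reason a learned model over richer features is the more plausible design.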

Patent•

SYSTEM FOR FINDING QUERIES AIMING AT TAIL URLs

Xiaoxin Yin¹, Vijay Ravindran Nair¹, Ryan Stewart¹, Fang Liu¹, Junhua Wang¹, Tiffany Kumi Dohzen¹, Yi-Min Wang¹•Institutions (1)

Microsoft¹

9 Jan 2009

TL;DR: In this article, a query prediction model can be constructed from a set of training data (e.g., diagnostic data obtained from an automatic diagnostic system and/or other suitable data) using a machine learning-based technique.

Abstract: Systems and methodologies for improved query classification and processing are provided herein. As described herein, a query prediction model can be constructed from a set of training data (e.g., diagnostic data obtained from an automatic diagnostic system and/or other suitable data) using a machine learning-based technique. Subsequently upon receiving a query, a set of features corresponding to the query, such as the length and/or frequency of the query, unigram probabilities of respective words and/or groups of words in the query, presence of pre-designated words or phrases in the query, or the like, can be generated. The generated features can then be analyzed in combination with the query prediction model to classify the query by predicting whether the query is aimed at a head Uniform Resource Locator (URL) or a tail URL. Based on this prediction, an appropriate index or combination of indexes can be assigned to answer the query.
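
The kinds of per-query features listed in the abstract (length, frequency, unigram probabilities, presence of designated words) could be computed roughly as follows. This is a sketch under stated assumptions, not the patented system: the function name, the dictionary keys, the add-one smoothing, and the three input tables (`query_log_counts`, `unigram_counts`, `trigger_words`) are all hypothetical.

```python
import math
from collections import Counter


def query_features(query, query_log_counts, unigram_counts, trigger_words):
    """Build a feature dictionary for one query.

    query_log_counts: hypothetical map from query string to past frequency.
    unigram_counts:   hypothetical map from word to corpus count.
    trigger_words:    hypothetical set of pre-designated words.
    """
    normalized = query.lower()
    tokens = normalized.split()

    # Add-one smoothing is an assumption so unseen words get a finite log-probability.
    total = sum(unigram_counts.values())
    vocab = len(unigram_counts)
    log_probs = [
        math.log((unigram_counts.get(tok, 0) + 1) / (total + vocab))
        for tok in tokens
    ]

    return {
        "num_tokens": len(tokens),
        "num_chars": len(query),
        "log_frequency": math.log1p(query_log_counts.get(normalized, 0)),
        "mean_unigram_log_prob": sum(log_probs) / len(log_probs) if log_probs else 0.0,
        "has_trigger_word": any(tok in trigger_words for tok in tokens),
    }


# Toy inputs, invented for the example.
example = query_features(
    "weather seattle",
    query_log_counts={"weather seattle": 5},
    unigram_counts=Counter({"weather": 3, "seattle": 1}),
    trigger_words={"login"},
)
```

A feature dictionary of this shape could then be passed to any standard binary classifier to predict head versus tail; the abstract does not say which learning technique is used.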

Patent•

Visual and Textual Query Suggestion

Linjun Yang¹, Meng Wang¹, Zheng-Jun Zha¹, Tao Mei¹, Xian-Sheng Hua¹•Institutions (1)

Microsoft¹

11 Feb 2009

TL;DR: In this paper, a set of images associated with one of these keywords are clustered into multiple groups and a representative image of each cluster is determined based on user selection of a keyword and representative image.

Abstract: Techniques described herein enable better understanding of the intent of a user that submits a particular search query. These techniques receive a search request for images associated with a particular query. In response, the techniques determine images that are associated with the query, as well as other keywords that are associated with these images. The techniques then cluster, for each set of images associated with one of these keywords, the set of images into multiple groups. The techniques then rank the images and determine a representative image of each cluster. Finally, the tools suggest, to the user that submitted the query, to refine the search based on user selection of a keyword and a representative image. Thus, the techniques better understand the user's intent by allowing the user to refine the search based on another keyword and based on an image on which the user wishes to focus the search.

Proceedings Article•10.1109/AST.2009.24•

Concept Based Query Expansion Using WordNet

[...]

Jiuling Zhang¹, Beixing Deng¹, Xing Li¹•Institutions (1)

Tsinghua University¹

7 Mar 2009

TL;DR: This paper proposed a new query expansion technique using the comprehensive thesaurus WordNet and its semantic relatedness measure modules, demonstrating a 7% precision improvement over retrieval methods not employing query expansion techniques.

Abstract: Query expansion is a widely studied technique for improving information retrieval effectiveness. In this paper we propose a new query expansion technique using the comprehensive thesaurus WordNet and its semantic relatedness measure modules. Word sense disambiguation is performed on the original query sentence, yielding the concept of each term in the query. Based on those recovered concepts, expanded query terms are generated from the WordNet lexical database. The proposed method has been evaluated in document retrieval on the Web using query sentences. Our extensive experimental results demonstrate a 7% precision improvement over retrieval methods not employing query expansion techniques.
