Scispace (Formerly Typeset)
Showing papers on "Web query classification" published in 2009
Journal Article•10.1145/1459352.1459357•
Web page classification: Features and algorithms

Xiaoguang Qi1, Brian D. Davison1•
Lehigh University1
23 Feb 2009-ACM Computing Surveys
TL;DR: As work in Web page classification is reviewed, the importance of these Web-specific features and algorithms is noted, state-of-the-art practices are described, and the underlying assumptions behind the use of information from neighboring pages are tracked.
Abstract: Classification of Web page content is essential to many tasks in Web information retrieval such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in Web page classification, we note the importance of these Web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.

551 citations

Proceedings Article•10.1109/ICDE.2009.77•
Keyword Search in Spatial Databases: Towards Searching by Document

Dongxiang Zhang1, Yeow Meng Chee2, Anirban Mondal3, Anthony K. H. Tung1, Masaru Kitsuregawa3 •
National University of Singapore1, Nanyang Technological University2, University of Tokyo3
29 Mar 2009
TL;DR: This work addresses a novel spatial keyword query called the m-closest keywords (mCK) query, which aims to find the spatially closest tuples which match m user-specified keywords, and introduces a new index called the bR*-tree, which is an extension of the R*-tree.
Abstract: This work addresses a novel spatial keyword query called the m-closest keywords (mCK) query. Given a database of spatial objects, each tuple is associated with some descriptive information represented in the form of keywords. The mCK query aims to find the spatially closest tuples which match m user-specified keywords. Given a set of keywords from a document, mCK query can be very useful in geotagging the document by comparing the keywords to other geotagged documents in a database. To answer mCK queries efficiently, we introduce a new index called the bR*-tree, which is an extension of the R*-tree. Based on bR*-tree, we exploit a priori-based search strategies to effectively reduce the search space. We also propose two monotone constraints, namely the distance mutex and keyword mutex, as our a priori properties to facilitate effective pruning. Our performance study demonstrates that our search strategy is indeed efficient in reducing query response time and demonstrates remarkable scalability in terms of the number of query keywords which is essential for our main application of searching by document.

343 citations
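
To make the mCK query above concrete, here is a minimal brute-force sketch in Python. It is not the bR*-tree method from the paper: it simply enumerates every combination of one matching object per keyword. The function name `mck_bruteforce`, the `(point, keywords)` tuple layout, and the use of the group diameter (largest pairwise Euclidean distance) as the closeness measure are all assumptions made for illustration.

```python
from itertools import product
from math import dist


def mck_bruteforce(objects, keywords):
    """Return (best_group, diameter) for an m-closest-keywords query.

    objects  : list of ((x, y), set_of_keywords) tuples (hypothetical layout)
    keywords : the m query keywords
    Closeness is measured as the group diameter, which is an assumption here.
    """
    # One candidate list per query keyword.
    candidates = [[o for o in objects if kw in o[1]] for kw in keywords]
    if any(not c for c in candidates):
        return None, float("inf")  # some keyword matches no object

    best, best_diam = None, float("inf")
    # Exhaustive enumeration: exponential in m, fine only for tiny inputs.
    for combo in product(*candidates):
        diam = max(dist(a[0], b[0]) for a in combo for b in combo)
        if diam < best_diam:
            best, best_diam = combo, diam
    return best, best_diam
```

The cost grows with the product of the candidate list sizes, which is the search space that an index with pruning, such as the one the abstract describes, is meant to reduce.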

Proceedings Article•10.1145/1526709.1526764•
Inverted index compression and query processing with optimized document ordering

Hao Yan1, Shuai Ding1, Torsten Suel2•
New York University1, Yahoo!2
20 Apr 2009
TL;DR: This work performs an extensive study of compression techniques for document IDs and presents new optimizations of existing techniques which can achieve significant improvement in both compression and decompression performances.
Abstract: Web search engines use highly optimized compression schemes to decrease inverted index size and improve query throughput, and many index compression techniques have been studied in the literature. One approach taken by several recent studies first performs a renumbering of the document IDs in the collection that groups similar documents together, and then applies standard compression techniques. It is known that this can significantly improve index compression compared to a random document ordering. We study index compression and query processing techniques for such reordered indexes. Previous work has focused on determining the best possible ordering of documents. In contrast, we assume that such an ordering is already given, and focus on how to optimize compression methods and query processing for this case. We perform an extensive study of compression techniques for document IDs and present new optimizations of existing techniques which can achieve significant improvement in both compression and decompression performances. We also propose and evaluate techniques for compressing frequency values for this case. Finally, we study the effect of this approach on query processing performance. Our experiments show very significant improvements in index size and query processing speed on the TREC GOV2 collection of 25.2 million web pages.

310 citations
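
The abstract above notes that renumbering document IDs so that similar documents sit together improves compression. A minimal Python sketch of why: posting lists are commonly stored as gaps between consecutive IDs, and a byte-aligned code spends fewer bytes on small gaps. The variable-byte code below is a standard textbook scheme chosen for illustration, not one of the paper's optimized techniques, and all function names are hypothetical.

```python
def to_gaps(doc_ids):
    """Turn a sorted list of document IDs into gaps between neighbours."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: running sum of the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def varbyte_encode(numbers):
    """Encode non-negative ints, 7 payload bits per byte, low bits first.
    The high bit marks the final byte of each number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte, high bit clear
            n >>= 7
        out.append(n | 0x80)      # final byte, high bit set
    return bytes(out)


def varbyte_decode(data):
    """Decode a byte string produced by varbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers
```

With this scheme, a posting list whose IDs are clustered (small gaps) encodes into fewer bytes than a list of the same length whose IDs are spread across the collection.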

Proceedings Article•10.3115/1690219.1690290•
Phrase Clustering for Discriminative Learning

Dekang Lin1, Xiaoyun Wu1•
Google1
2 Aug 2009
TL;DR: A simple and scalable algorithm is presented for clustering tens of millions of phrases and using the resulting clusters as features in discriminative classifiers; its power and generality are demonstrated on named entity recognition and query classification.
Abstract: We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the power and generality of this approach, we apply the method in two very different applications: named entity recognition and query classification. Our results show that phrase clusters offer significant improvements over word clusters. Our NER system achieves the best current result on the widely used CoNLL benchmark. Our query classifier is on par with the best system in KDDCUP 2005 without resorting to labor intensive knowledge engineering efforts.

270 citations

Journal Article•10.1007/S10791-008-9074-8•
Evaluation of query expansion using MeSH in PubMed

Zhiyong Lu1, Won Kim1, W. John Wilbur1•
National Institutes of Health1
01 Feb 2009-Information Retrieval
TL;DR: Experimental results suggest that query expansion using MeSH in PubMed can generally improve retrieval performance, but the improvement may not affect end PubMed users in realistic situations.
Abstract: This paper investigates the effectiveness of using MeSH® in PubMed through its automatic query expansion process: Automatic Term Mapping (ATM). We run Boolean searches based on a collection of 55 topics and about 160,000 MEDLINE® citations used in the 2006 and 2007 TREC Genomics Tracks. For each topic, we first automatically construct a query by selecting keywords from the question. Next, each query is expanded by ATM, which assigns different search tags to terms in the query. Three search tags: [MeSH Terms], [Text Words], and [All Fields] are chosen to be studied after expansion because they all make use of the MeSH field of indexed MEDLINE citations. Furthermore, we characterize the two different mechanisms by which the MeSH field is used. Retrieval results using MeSH after expansion are compared to those solely based on the words in MEDLINE title and abstracts. The aggregate retrieval performance is assessed using both F-measure and mean rank precision. Experimental results suggest that query expansion using MeSH in PubMed can generally improve retrieval performance, but the improvement may not affect end PubMed users in realistic situations.

208 citations

Proceedings Article•10.1145/1631272.1631278•
Visual query suggestion

Zheng-Jun Zha1, Linjun Yang2, Tao Mei2, Meng Wang2, Zengfu Wang1 •
University of Science and Technology of China1, Microsoft2
19 Oct 2009
TL;DR: This paper proposes a new query suggestion scheme named Visual Query Suggestion (VQS), which provides a more effective query interface to formulate an intent-specific query by joint text and image suggestions, and shows that VQS outperforms these engines in terms of both the quality of query suggestion and search performance.
Abstract: Query suggestion is an effective approach to improve the usability of image search. Most existing search engines are able to automatically suggest a list of textual query terms based on users' current query input, which can be called Textual Query Suggestion. This paper proposes a new query suggestion scheme named Visual Query Suggestion (VQS) which is dedicated to image search. It provides a more effective query interface to formulate an intent-specific query by joint text and image suggestions. We show that VQS is able to more precisely and more quickly help users specify and deliver their search intents. When a user submits a text query, VQS first provides a list of suggestions, each containing a keyword and a collection of representative images in a dropdown menu. If the user selects one of the suggestions, the corresponding keyword will be added to complement the initial text query as the new text query, while the image collection will be formulated as the visual query. VQS then performs image search based on the new text query using text search techniques, as well as content-based visual retrieval to refine the search results by using the corresponding images as query examples. We compare VQS with three popular image search engines, and show that VQS outperforms these engines in terms of both the quality of query suggestion and search performance.

198 citations

Proceedings Article•10.1145/1559845.1559966•
Keyword search on structured and semi-structured data

Yi Chen1, Wei Wang2, Ziyang Liu1, Xuemin Lin2•
Arizona State University1, University of New South Wales2
29 Jun 2009
TL;DR: An overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation, is given.
Abstract: Empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. Various data models will be discussed, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications that are built upon keyword search, such as keyword based database selection, query generation, and analytical processing. Finally we identify the challenges and opportunities of future research to advance the field.

188 citations

Journal Article•10.1145/1508857.1508858•
The design of a query monitoring system

Chaitanya Mishra1, Nick Koudas1•
University of Toronto1
23 Apr 2009-ACM Transactions on Database Systems
TL;DR: A query monitoring system from the ground up is presented, describing various new techniques for query monitoring, their implementation inside a real database system, and a novel interface that presents the observed and predicted information in an accessible manner.
Abstract: Query monitoring refers to the problem of observing and predicting various parameters related to the execution of a query in a database system. In addition to being a useful tool for database users and administrators, it can also serve as an information collection service for resource allocation and adaptive query processing techniques. In this article, we present a query monitoring system from the ground up, describing various new techniques for query monitoring, their implementation inside a real database system, and a novel interface that presents the observed and predicted information in an accessible manner. To enable this system, we introduce several lightweight online techniques for progressively estimating and refining the cardinality of different relational operators using information collected at query execution time. These include binary and multiway joins as well as typical grouping operations and combinations thereof. We describe the various algorithms used to efficiently implement estimators and present the results of an evaluation of a prototype implementation of our framework in an open-source data management system. Our results demonstrate the feasibility and practical utility of the approach presented herein.

180 citations

Patent•
Method and apparatus for creating and utilizing information representation of queries

Ian Oliver1, Jukka Honkola1, Juha-Pekka Luoma1•
Nokia1
29 Sep 2009
TL;DR: An approach for creating and utilizing information representation of queries is presented, in which a query application receives a query, expresses the query as a resource description framework graph, and causes, at least in part, storage of the query resource description framework graph.
Abstract: An approach is provided for creating and utilizing information representation of queries. A query application receives a query. The query application expresses the query as a resource description framework graph. The query application causes at least in part storage of the query resource description framework graph.

178 citations

Journal Article•10.1016/J.WEBSEM.2009.07.005•
From keywords to semantic queries-Incremental query construction on the semantic web

Gideon Zenz, Xuan Zhou1, Enrico Minack, Wolf Siberski, Wolfgang Nejdl •
Commonwealth Scientific and Industrial Research Organisation1
01 Sep 2009-Journal of Web Semantics
TL;DR: The overall design of QUICK is described, the core algorithms to enable efficient query construction are presented, and the effectiveness of the system is demonstrated through an experimental study.

176 citations

Patent•
System and Method for Combining Geographic Metadata in Automatic Speech Recognition Language and Acoustic Models

Enrico Bocchieri1, Diamantino Caseiro1•
AT&T1
15 Dec 2009
TL;DR: A spoken search query is received by a portable device, which then determines its present location; that information is incorporated into a local language model that is used to process the search query.
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device and the portable device then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.
Patent•
Accessing media data using metadata repository

Walter Chang1, Michael J. Welch1•
Adobe Systems1
13 Nov 2009
TL;DR: A computer-implemented method includes parsing a user query to determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, and performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content.
Abstract: A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
Proceedings Article•10.1145/1559845.1559902•
Query by output

Quoc Trung Tran1, Chee-Yong Chan1, Srinivasan Parthasarathy2•
National University of Singapore1, Ohio State University2
29 Jun 2009
TL;DR: This paper presents a novel data-driven approach, called Query By Output (QBO), which can enhance the usability of database systems and designs several optimization techniques to reduce processing overhead and introduce a set of criteria to rank order output queries by various notions of utility.
Abstract: It has recently been asserted that the usability of a database is as important as its capability. Understanding the database schema and the hidden relationships among attributes in the data both play an important role in this context. Subscribing to this viewpoint, in this paper, we present a novel data-driven approach, called Query By Output (QBO), which can enhance the usability of database systems. The central goal of QBO is as follows: given the output of some query Q on a database D, denoted by Q(D), we wish to construct an alternative query Q′ such that Q(D) and Q′(D) are instance-equivalent. To generate instance-equivalent queries from Q(D), we devise a novel data classification-based technique that can handle the at-least-one semantics that is inherent in the query derivation. In addition to the basic framework, we design several optimization techniques to reduce processing overhead and introduce a set of criteria to rank order output queries by various notions of utility. Our framework is evaluated comprehensively on three real data sets and the results show that the instance-equivalent queries we obtain are interesting and that the approach is scalable and robust to queries of different selectivities.
Book Chapter•10.1007/978-3-642-02279-1_2•
Query Recommendations for Interactive Database Exploration

Gloria Chatzopoulou1, Magdalini Eirinaki2, Neoklis Polyzotis3•
University of California, Riverside1, San Jose State University2, University of California, Santa Cruz3
2 Jun 2009
TL;DR: The idea is to track the querying behavior of each user, identify which parts of the database may be of interest for the corresponding data analysis task, and recommend queries that retrieve relevant data.
Abstract: Relational database systems are becoming increasingly popular in the scientific community to support the interactive exploration of large volumes of data. In this scenario, users employ a query interface (typically, a web-based client) to issue a series of SQL queries that aim to analyze the data and mine it for interesting information. First-time users, however, may not have the necessary knowledge to know where to start their exploration. Other times, users may simply overlook queries that retrieve important information. To assist users in this context, we draw inspiration from Web recommender systems and propose the use of personalized query recommendations. The idea is to track the querying behavior of each user, identify which parts of the database may be of interest for the corresponding data analysis task, and recommend queries that retrieve relevant data. We discuss the main challenges in this novel application of recommendation systems, and outline a possible solution based on collaborative filtering. Preliminary experimental results on real user traces demonstrate that our framework can generate effective query recommendations.
Proceedings Article•10.1109/ICDE.2009.71•
Web Query Recommendation via Sequential Query Prediction

Qi He1, Daxin Jiang2, Zhen Liao2, Steven C. H. Hoi1, Kuiyu Chang1, Ee-Peng Lim1, Hang Li2 •
Nanyang Technological University1, Microsoft2
29 Mar 2009
TL;DR: A novel "sequential query prediction" approach that tries to grasp a user's search intent based on his/her past query sequence and its resemblance to historical query sequence models mined from massive search engine logs is proposed.
Abstract: Web query recommendation has long been considered a key feature of search engines. Building a good Web query recommendation system, however, is very difficult due to the fundamental challenge of predicting users' search intent, especially given the limited user context information. In this paper, we propose a novel "sequential query prediction" approach that tries to grasp a user's search intent based on his/her past query sequence and its resemblance to historical query sequence models mined from massive search engine logs. Different query sequence models were examined, including the naive variable length N-gram model, Variable Memory Markov (VMM) model, and our proposed Mixture Variable Memory Markov (MVMM) model. Extensive experiments were conducted to benchmark our sequence prediction algorithms against two conventional pairwise approaches on large-scale search logs extracted from a commercial search engine. Results show that the sequence-wise approaches significantly outperform the conventional pair-wise ones in terms of prediction accuracy. In particular, our MVMM approach consistently leads the pack, making it an effective and practical approach towards Web query recommendation.
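
The simplest model named in the abstract above, a variable-length N-gram over query sequences, can be sketched in a few lines of Python. This is an illustrative baseline only, not the VMM or MVMM models from the paper. The function names, the `max_order` parameter, and the back-off rule (use the longest recent context seen in training) are assumptions.

```python
from collections import Counter, defaultdict


def train(sessions, max_order=3):
    """Count which query follows each context of up to max_order queries.

    sessions: list of query lists, one list per search session.
    """
    model = defaultdict(Counter)
    for s in sessions:
        for i in range(1, len(s)):
            for n in range(1, max_order + 1):
                if i - n < 0:
                    break
                model[tuple(s[i - n:i])][s[i]] += 1
    return model


def predict(model, history, max_order=3):
    """Predict the next query, backing off from the longest known context."""
    for n in range(min(max_order, len(history)), 0, -1):
        ctx = tuple(history[-n:])
        if ctx in model:
            return model[ctx].most_common(1)[0][0]
    return None  # no suffix of the history was seen in training
```

A pairwise approach, by comparison, would condition only on the single most recent query; the sketch above falls back to that case when longer contexts are unseen.
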
Journal Article•10.1109/TKDE.2008.113•
A Relation-Based Page Rank Algorithm for Semantic Web Search Engines

Fabrizio Lamberti, Andrea Sanna, Claudio Giovanni Demartini
01 Jan 2009-IEEE Transactions on Knowledge and Data Engineering
TL;DR: This paper proposes a relation-based page rank algorithm to be used in conjunction with semantic Web search engines that simply relies on information that could be extracted from user queries and on annotated resources.
Abstract: With the tremendous growth of information available to end users through the Web, search engines have come to play an ever more critical role. Nevertheless, because of their general-purpose approach, it is increasingly common for result sets to be burdened with useless pages. The next-generation Web architecture, represented by the Semantic Web, provides a layered architecture that may allow this limitation to be overcome. Several search engines have been proposed that increase information retrieval accuracy by exploiting a key content of semantic Web resources, that is, relations. However, in order to rank results, most of the existing solutions need to work on the whole annotated knowledge base. In this paper, we propose a relation-based page rank algorithm to be used in conjunction with semantic Web search engines that simply relies on information that could be extracted from user queries and on annotated resources. Relevance is measured as the probability that a retrieved resource actually contains those relations whose existence was assumed by the user at the time of query definition.
Journal Article•10.1007/S00778-008-0117-Y•
Multi-dimensional top-k dominating queries

Man Lung Yiu1, Nikos Mamoulis2•
Aalborg University1, University of Hong Kong2
1 Jun 2009
TL;DR: An extensive study on the evaluation of top-k dominating queries, which proposes a set of algorithms that apply on indexed multi-dimensional data and investigates query evaluation on data that are not indexed.
Abstract: The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply on indexed multi-dimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness of its results on real data.
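
The definition in the abstract above translates directly into a naive quadratic baseline, sketched here in Python. It is not one of the paper's index-based algorithms. The function names are hypothetical, and the convention that smaller values are better in every dimension is an assumption.

```python
import heapq


def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (smaller is better, by assumption)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def top_k_dominating(points, k):
    """Return the k (score, point) pairs with the highest dominance score,
    where the score is the number of other points that the point dominates."""
    scores = [(sum(dominates(p, q) for q in points), p) for p in points]
    return heapq.nlargest(k, scores, key=lambda s: s[0])
```

Because the score is a count of dominated objects, the result does not change when a dimension is rescaled, and `k` bounds the output size, matching properties (i) and (iii) listed in the abstract.
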
Proceedings Article•10.1145/1516360.1516459•
Interactive query refinement

Chaitanya Mishra1, Nick Koudas1•
University of Toronto1
24 Mar 2009
TL;DR: This work formalizes the problem of query refinement and proposes a framework to support it in a database system, and introduces an interactive model of refinement that incorporates user feedback to best capture user preferences.
Abstract: We investigate the problem of refining SQL queries to satisfy cardinality constraints on the query result. This has applications to the many/few answers problems often faced by database users. We formalize the problem of query refinement and propose a framework to support it in a database system. We introduce an interactive model of refinement that incorporates user feedback to best capture user preferences. Our techniques are designed to handle queries having range and equality predicates on numerical and categorical attributes. We present an experimental evaluation of our framework implemented in an open source data manager and demonstrate the feasibility and practical utility of our approach.
Proceedings Article•10.1145/1559845.1559918•
Efficient type-ahead search on relational data: a TASTIER approach

Guoliang Li1, Shengyue Ji2, Chen Li2, Jianhua Feng1•
Tsinghua University1, University of California, Irvine2
29 Jun 2009
TL;DR: A novel approach to keyword search in the relational world, called Tastier, is proposed, with efficient index structures and algorithms for finding relevant answers on-the-fly by joining tuples in the database and a partition-based method to improve query performance.
Abstract: Existing keyword-search systems in relational databases require users to submit a complete query to compute answers. Often users feel "left in the dark" when they have limited knowledge about the data, and have to use a try-and-see approach for modifying queries and finding answers. In this paper we propose a novel approach to keyword search in the relational world, called Tastier. A Tastier system can bring instant gratification to users by supporting type-ahead search, which finds answers "on the fly" as the user types in query keywords. A main challenge is how to achieve a high interactive speed for large amounts of data in multiple tables, so that a query can be answered efficiently within milliseconds. We propose efficient index structures and algorithms for finding relevant answers on-the-fly by joining tuples in the database. We devise a partition-based method to improve query performance by grouping highly relevant tuples and pruning irrelevant tuples efficiently. We also develop a technique to answer a query efficiently by predicting the highly relevant complete queries for the user. We have conducted a thorough experimental evaluation of the proposed techniques on real data sets to demonstrate the efficiency and practicality of this new search paradigm.
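
The core idea of type-ahead search described above, returning matching tuples for a partially typed keyword, can be illustrated with a much simpler structure than the paper's indexes: a sorted vocabulary searched by binary search. The Python sketch below is an assumption-laden illustration (hypothetical function names, a single table, whitespace tokenization, no joins or ranking), not the Tastier implementation.

```python
from bisect import bisect_left


def build_index(rows):
    """rows: dict mapping a row id to its text.
    Returns (sorted vocabulary, word -> set of row ids)."""
    postings = {}
    for rid, text in rows.items():
        for word in text.lower().split():
            postings.setdefault(word, set()).add(rid)
    return sorted(postings), postings


def prefix_search(index, prefix):
    """Return ids of rows containing any word that starts with prefix."""
    vocab, postings = index
    prefix = prefix.lower()
    i = bisect_left(vocab, prefix)  # first word >= prefix
    hits = set()
    # All words sharing the prefix are contiguous in the sorted vocabulary.
    while i < len(vocab) and vocab[i].startswith(prefix):
        hits |= postings[vocab[i]]
        i += 1
    return hits
```

Calling `prefix_search` again after each keystroke gives the "on the fly" behaviour; a production system would additionally need the multi-table joins and pruning that the abstract highlights.
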
Proceedings Article•10.1145/1498759.1498806•
Query by document

Yin Yang1, Nilesh Bansal2, Wisam Dakka3, Panagiotis G. Ipeirotis4, Nick Koudas2, Dimitris Papadias1 •
Hong Kong University of Science and Technology1, University of Toronto2, Google3, New York University4
9 Feb 2009
TL;DR: This paper introduces methodologies to extract phrases from a given "query document" to be used as queries to search interfaces with the goal to retrieve content related to the query document and considers two techniques to extract and score key phrases.
Abstract: We are experiencing an unprecedented increase of content contributed by users in forums such as blogs, social networking sites and microblogging services. Such abundance of content complements content on web sites and traditional media forums such as newspapers, news and financial streams, and so on. Given such a plethora of information, there is a pressing need to cross reference information across textual services. For example, commonly we read a news item and we wonder if there are any blogs reporting related content or vice versa. In this paper, we present techniques to automate the process of cross referencing online information content. We introduce methodologies to extract phrases from a given "query document" to be used as queries to search interfaces with the goal to retrieve content related to the query document. In particular, we consider two techniques to extract and score key phrases. We also consider techniques to complement extracted phrases with information present in external sources such as Wikipedia and introduce an algorithm called RelevanceRank for this purpose. We discuss both these techniques in detail and provide an experimental study utilizing a large number of human judges from Amazon's Mechanical Turk service. Detailed experiments demonstrate the effectiveness and efficiency of the proposed techniques for the task of automating retrieval of documents related to a query document.
Proceedings Article•10.1145/1557019.1557123•
Exploring social tagging graph for web object classification

Zhijun Yin1, Rui Li1, Qiaozhu Mei1, Jiawei Han1•
University of Illinois at Urbana–Champaign1
28 Jun 2009
TL;DR: An efficient algorithm is proposed which not only utilizes social tags as enriched semantic features for the objects, but also infers the categories of unlabeled objects from both homogeneous and heterogeneous labeled objects, through the implicit connection of social tags.
Abstract: This paper studies the web object classification problem with the novel exploration of social tags. Automatically classifying web objects into manageable semantic categories has long been a fundamental preprocess for indexing, browsing, searching, and mining these objects. The explosive growth of heterogeneous web objects, especially non-textual objects such as products, pictures, and videos, has made the problem of web classification increasingly challenging. Such objects often suffer from a lack of easily extractable features with semantic information, interconnections between each other, as well as training examples with category labels. In this paper, we explore the social tagging data to bridge this gap. We cast the web object classification problem as an optimization problem on a graph of objects and tags. We then propose an efficient algorithm which not only utilizes social tags as enriched semantic features for the objects, but also infers the categories of unlabeled objects from both homogeneous and heterogeneous labeled objects, through the implicit connection of social tags. Experiment results show that the exploration of social tags effectively boosts web object classification. Our algorithm significantly outperforms state-of-the-art general classification methods.
Book Chapter•
Querying the Semantic Web with Ginseng: A Guided Input Natural Language Search Engine

Abraham Bernstein, Esther Kaufmann, Christoph Kiefer, Simon Clematide, Manfred Klenner, Martin Volk 
1 Jan 2009
TL;DR: This work presents Ginseng, a quasi natural language guided query interface to the Semantic Web which relies on a simple question grammar which gets dynamically extended by the structure of an ontology to guide users in formulating queries in a language seemingly akin to English.
Abstract: The Semantic Web presents the vision of a distributed, dynamically growing knowledge base founded on formal logic. Common users, however, seem to have problems even with the simplest Boolean expression. As queries from web search engines show, the great majority of users simply do not use Boolean expressions. So how can we help users to query a web of logic that they do not seem to understand? We address this problem by presenting Ginseng, a quasi natural language guided query interface to the Semantic Web. Ginseng relies on a simple question grammar which gets dynamically extended by the structure of an ontology to guide users in formulating queries in a language seemingly akin to English. Based on the grammar Ginseng then translates the queries into a Semantic Web query language (RDQL), which allows their execution. Our evaluation with 20 users shows that Ginseng is extremely simple to use without any training (as opposed to any logic-based querying approach) resulting in very good query performance (precision = 92.8%, recall = 98.4%). We, furthermore, found that even with its simple grammar/approach Ginseng could process over 40% of questions from a query corpus without modification.
Journal Article•10.1016/J.IPM.2008.07.001•
Using query expansion in graph-based approach for query-focused multi-document summarization

Lin Zhao1, Lide Wu1, Xuanjing Huang1•
Fudan University1
01 Jan 2009-Information Processing and Management
TL;DR: A novel query expansion method is presented, which is combined in the graph-based algorithm for query-focused multi-document summarization, so as to resolve the problem of information limit in the original query.
Abstract: This paper presents a novel query expansion method, which is combined in the graph-based algorithm for query-focused multi-document summarization, so as to resolve the problem of information limit in the original query. Our approach makes use of both the sentence-to-sentence relations and the sentence-to-word relations to select the query biased informative words from the document set and use them as query expansions to improve the sentence ranking result. Compared to previous query expansion approaches, our approach can capture more relevant information with less noise. We performed experiments on the data of the Document Understanding Conference (DUC) 2005 and DUC 2006, and the evaluation results show that the proposed query expansion method can significantly improve the system performance and make our system comparable to the state-of-the-art systems.
Patent•
Suggesting related search queries during web browsing

Ryen W. White1, Robert L. Rounthwaite1, Silviu Cucerzan1•
Microsoft1
21 Sep 2009
TL;DR: In this paper, the authors present a method for generating suggested queries for web pages that are not search engine results pages, based upon the URL and/or content of a currently displayed page.
Abstract: Described is the presenting of suggested queries for web pages that are not search engine results pages, based upon the URL and/or content of a currently displayed page. The suggested query set may be dynamically extracted (locally or remotely) based upon the content of the web page, and/or obtained from a data store of per-URL suggested query sets, e.g., generated from historical logs. Also described are various techniques for generating suggested queries, and user interface mechanisms that display and allow interaction with suggested queries.
Book Chapter•10.1007/978-3-642-04930-9_27•
Learning Semantic Query Suggestions

Edgar Meij1, Marc Bron1, Laura Hollink2, Bouke Huurnink1, Maarten de Rijke1 •
University of Amsterdam1, VU University Amsterdam2
6 Nov 2009
TL;DR: This paper uses a feature-based approach in conjunction with supervised machine learning, augmenting term-based features with search history-based and concept-specific features for semantic query suggestion, and evaluates the utility of different machine learning algorithms, features, and feature types in identifying semantic concepts.
Abstract: An important application of semantic web technology is recognizing human-defined concepts in text. Query transformation is a strategy often used in search engines to derive queries that are able to return more useful search results than the original query and most popular search engines provide facilities that let users complete, specify, or reformulate their queries. We study the problem of semantic query suggestion, a special type of query transformation based on identifying semantic concepts contained in user queries. We use a feature-based approach in conjunction with supervised machine learning, augmenting term-based features with search history-based and concept-specific features. We apply our method to the task of linking queries from real-world query logs (the transaction logs of the Netherlands Institute for Sound and Vision) to the DBpedia knowledge base. We evaluate the utility of different machine learning algorithms, features, and feature types in identifying semantic concepts using a manually developed test bed and show significant improvements over an already high baseline. The resources developed for this paper, i.e., queries, human assessments, and extracted features, are available for download.
Journal Article•10.1016/J.WEBSEM.2009.08.001•
Semplore: A scalable IR approach to search the Web of Data

Haofen Wang1, Qiaoling Liu1, Thomas Penin1, Linyun Fu1, Lei Zhang2, Thanh Tran3, Yong Yu1, Yue Pan2 •
Shanghai Jiao Tong University1, IBM2, Karlsruhe Institute of Technology3
01 Sep 2009-Journal of Web Semantics
TL;DR: The experimental results show that Semplore is an efficient and effective system for searching the Web of Data and can be used as a basic infrastructure for Web-scale Semantic Web search engines.
Proceedings Article•10.1109/WI-IAT.2009.34•
From "Dango" to "Japanese Cakes": Query Reformulation Models and Patterns

Paolo Boldi1, Francesco Bonchi2, Carlos Castillo2, Sebastiano Vigna1•
University of Milan1, Yahoo!2
15 Sep 2009
TL;DR: An accurate model is built for classifying user query reformulations into broad classes (generalization, specialization, error correction or parallel move), achieving 92% accuracy, and it is demonstrated that the reformulation classifier leads to improved recommendations in a query recommendation system.
Abstract: Understanding query reformulation patterns is a key step towards next generation web search engines: it can help improve users' web-search experience by predicting their intent, and thus helping them to locate information more effectively. As a step in this direction, we build an accurate model for classifying user query reformulations into broad classes (generalization, specialization, error correction or parallel move), achieving 92% accuracy. We apply the model to automatically label two large query logs, creating annotated query-flow graphs. We study the resulting reformulation patterns, finding results consistent with previous studies done on smaller manually annotated datasets, and discovering new interesting patterns, including connections between reformulation types and topical categories. Finally, applying our findings to a third query log that is publicly available for research purposes, we demonstrate that our reformulation classifier leads to improved recommendations in a query recommendation system.
Patent•
SYSTEM FOR FINDING QUERIES AIMING AT TAIL URLs

Xiaoxin Yin1, Vijay Ravindran Nair1, Ryan Stewart1, Fang Liu1, Junhua Wang1, Tiffany Kumi Dohzen1, Yi-Min Wang1 •
Microsoft1
9 Jan 2009
TL;DR: In this article, a query prediction model can be constructed from a set of training data (e.g., diagnostic data obtained from an automatic diagnostic system and/or other suitable data) using a machine learning-based technique.
Abstract: Systems and methodologies for improved query classification and processing are provided herein. As described herein, a query prediction model can be constructed from a set of training data (e.g., diagnostic data obtained from an automatic diagnostic system and/or other suitable data) using a machine learning-based technique. Subsequently upon receiving a query, a set of features corresponding to the query, such as the length and/or frequency of the query, unigram probabilities of respective words and/or groups of words in the query, presence of pre-designated words or phrases in the query, or the like, can be generated. The generated features can then be analyzed in combination with the query prediction model to classify the query by predicting whether the query is aimed at a head Uniform Resource Locator (URL) or a tail URL. Based on this prediction, an appropriate index or combination of indexes can be assigned to answer the query.
Patent•
Visual and Textual Query Suggestion

Linjun Yang1, Meng Wang1, Zheng-Jun Zha1, Tao Mei1, Xian-Sheng Hua1 •
Microsoft1
11 Feb 2009
TL;DR: For each keyword associated with the images returned for a query, the associated images are clustered into multiple groups and a representative image of each cluster is determined; the user can then refine the search by selecting a keyword and a representative image.
Abstract: Techniques described herein enable better understanding of the intent of a user that submits a particular search query. These techniques receive a search request for images associated with a particular query. In response, the techniques determine images that are associated with the query, as well as other keywords that are associated with these images. The techniques then cluster, for each set of images associated with one of these keywords, the set of images into multiple groups. The techniques then rank the images and determine a representative image of each cluster. Finally, the tools suggest, to the user that submitted the query, to refine the search based on user selection of a keyword and a representative image. Thus, the techniques better understand the user's intent by allowing the user to refine the search based on another keyword and based on an image on which the user wishes to focus the search.
Proceedings Article•10.1109/AST.2009.24•
Concept Based Query Expansion Using WordNet

Jiuling Zhang1, Beixing Deng1, Xing Li1•
Tsinghua University1
7 Mar 2009
TL;DR: This paper proposes a new query expansion technique using the comprehensive thesaurus WordNet and its semantic relatedness measure modules, demonstrating a 7% precision improvement over retrieval methods not employing query expansion techniques.
Abstract: Query expansion is a widely studied technique for improving information retrieval effectiveness. In this paper we propose a new query expansion technique using the comprehensive thesaurus WordNet and its semantic relatedness measure modules. Word sense disambiguation is performed on the original query sentence, yielding the concept of each term in the query. Based on those recovered concepts, expanded query terms are generated from the WordNet lexical database. The proposed method has been evaluated in document retrieval on the Web using query sentences. Our extensive experimental results demonstrate a 7% precision improvement over retrieval methods not employing query expansion techniques.