Top 848 papers published on the topic of Web query classification in 2010

Showing papers on "Web query classification" published in 2010

Proceedings Article•10.1145/1772690.1772780•

Exploiting query reformulations for web search result diversification

Rodrygo L. T. Santos¹, Craig Macdonald¹, Iadh Ounis¹•Institutions (1)

26 Apr 2010

TL;DR: A novel probabilistic framework for Web search result diversification is introduced that explicitly accounts for the various aspects associated with an underspecified query. It diversifies a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole.

Abstract: When a Web user's underlying information need is not clearly specified from the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated with an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest to the effectiveness of our framework when compared to state-of-the-art diversification approaches in the literature. Additionally, by simulating an upper-bound query reformulation mechanism from official TREC data, we draw useful insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.
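
The idea of scoring a document by how well it satisfies each still-uncovered aspect can be illustrated with a small greedy re-ranker. This is only a sketch under stated assumptions: the linear mixing weight `lam`, the product-form "uncovered" estimate, and all input names are illustrative choices, not details taken from the abstract.

```python
from typing import Dict, List


def diversify(
    relevance: Dict[str, float],            # assumed P(d | q) for each document
    aspect_weight: Dict[str, float],        # assumed P(a | q) for each query aspect
    coverage: Dict[str, Dict[str, float]],  # coverage[a][d]: assumed P(d | a)
    k: int,
    lam: float = 0.5,                       # hypothetical relevance/diversity trade-off
) -> List[str]:
    """Greedily pick k documents, rewarding coverage of uncovered aspects."""
    selected: List[str] = []
    # Probability that each aspect is still NOT satisfied by the selection.
    uncovered = {a: 1.0 for a in aspect_weight}
    candidates = set(relevance)

    while candidates and len(selected) < k:
        def score(d: str) -> float:
            diversity = sum(
                aspect_weight[a] * coverage[a].get(d, 0.0) * uncovered[a]
                for a in aspect_weight
            )
            return (1.0 - lam) * relevance[d] + lam * diversity

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Aspects covered by the chosen document count less in later rounds.
        for a in aspect_weight:
            uncovered[a] *= 1.0 - coverage[a].get(best, 0.0)
    return selected


# Toy ambiguous query with two aspects; d1 and d2 both cover "car".
rel = {"d1": 0.9, "d2": 0.8, "d3": 0.6}
weights = {"car": 0.5, "animal": 0.5}
cov = {"car": {"d1": 0.9, "d2": 0.9}, "animal": {"d3": 0.9}}
print(diversify(rel, weights, cov, k=3))  # ['d1', 'd3', 'd2']
```

With `lam=0.0` the function falls back to a plain relevance ordering, which makes the effect of the diversity term easy to compare.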

506 citations

Patent•

Immediate search feedback

Scott Forstall¹, Donald Melton¹, John William Sullivan¹, Darin B. Adler¹•Institutions (1)

Apple Inc.¹

27 Aug 2010

TL;DR: Search input is received within a search field of a web browser application, and a determination is made whether to automatically submit a query to a search engine based on characteristics of the search input.

Abstract: Providing immediate search feedback is disclosed. Search input is received within a search field of a web browser application. Based on characteristics of the search input, a determination is made whether to automatically submit a query to a search engine. In one aspect, the query is automatically submitted to the search engine. The query is based on the received first search input. Results are displayed within the web browser application, the results web page returned from the query submitted to the search engine.

379 citations

Book•

Estimating the Query Difficulty for Information Retrieval

David Carmel¹, Elad Yom-Tov¹•Institutions (1)

IBM¹

30 Apr 2010

TL;DR: This tutorial exposes participants to the current research on query performance prediction (also known as query difficulty estimation). Participants will become familiar with state-of-the-art performance prediction methods and with common evaluation methodologies for prediction quality.

Abstract: Many information retrieval (IR) systems suffer from a radical variance in performance when responding to users' queries. Even for systems that succeed very well on average, the quality of results returned for some of the queries is poor. Thus, it is desirable that IR systems be able to identify "difficult" queries in order to handle them properly. Understanding why some queries are inherently more difficult than others is essential for IR, and a good answer to this important question will help search engines to reduce the variance in performance, hence better serving their customers' needs. The high variability in query performance has driven a new research direction in the IR field on estimating the expected quality of the search results, i.e. the query difficulty, when no relevance feedback is given. Estimating the query difficulty is a significant challenge due to the numerous factors that impact retrieval performance. Many prediction methods have been proposed recently. However, as many researchers observed, the prediction quality of state-of-the-art predictors is still too low to be widely used by IR applications. The low prediction quality is due to the complexity of the task, which involves factors such as query ambiguity, missing content, and vocabulary mismatch. The goal of this tutorial is to expose participants to the current research on query performance prediction (also known as query difficulty estimation). Participants will become familiar with state-of-the-art performance prediction methods, and with common evaluation methodologies for prediction quality. We will discuss the reasons that cause search engines to fail for some of the queries, and provide an overview of several approaches for estimating query difficulty. We then describe common methodologies for evaluating the prediction quality of those estimators, and some experiments conducted recently with their prediction quality, as measured over several TREC benchmarks. We will cover a few potential applications that can utilize query difficulty estimators by handling each query individually and selectively based on its estimated difficulty. Finally we will summarize with a discussion on open issues and challenges in the field.

240 citations

Patent•

Electronic document classification

Christopher A. McHenry, Scott W. Burt

18 May 2010

TL;DR: An electronic document classification system disclosed in this article classifies electronic documents by analyzing the document and the information attached to the document to generate a set of classification data and comparing the classification data with one or more classification rules.

Abstract: An electronic document classification system disclosed herein classifies electronic documents. The classification of the documents may involve analyzing the document and the information attached to the document to generate a set of classification data and comparing the classification data with one or more classification rules to generate a set of classifying data. The system attaches the set of classifying data to the electronic document and displays the electronic document based on the set of classifying data. The classification data may also be used to prioritize the electronic documents and to assign a retention period to the electronic documents. The system is further adapted to receive user feedback regarding the classification of the electronic document and to update the classification rules.

236 citations

Proceedings Article•

The combined approach to query answering in DL-Lite

Roman Kontchakov¹, Carsten Lutz², David Toman³, Frank Wolter⁴, Michael Zakharyaschev¹•Institutions (4)

Birkbeck, University of London¹, University of Bremen², University of Waterloo³, University of Liverpool⁴

9 May 2010

TL;DR: This paper proposes an alternative approach to implementing ontology-based data access in DL-Lite by allowing rewriting of both the query and the data and demonstrates that query execution in the proposed approach is often more efficient than in existing approaches, especially for large ontologies.

Abstract: Databases and related information systems can benefit from the use of ontologies to enrich the data with general background knowledge. The DL-Lite family of ontology languages was specifically tailored towards such ontology-based data access, enabling an implementation in a relational database management system (RDBMS) based on a query rewriting approach. In this paper, we propose an alternative approach to implementing ontology-based data access in DL-Lite. The distinguishing feature of our approach is to allow rewriting of both the query and the data. We show that, in contrast to the existing approaches, no exponential blowup is produced by the rewritings. Based on experiments with a number of real-world ontologies, we demonstrate that query execution in the proposed approach is often more efficient than in existing approaches, especially for large ontologies. We also show how to seamlessly integrate the data rewriting step of our approach into an RDBMS using views (which solves the update problem) and make an interesting observation regarding the succinctness of queries in the original query rewriting approach.

211 citations

Journal Article•10.1016/J.JAL.2009.09.004•

Tractable query answering and rewriting under description logic constraints

Hector Perez-Urbina¹, Boris Motik¹, Ian Horrocks¹•Institutions (1)

University of Oxford¹

01 Jun 2010-Journal of Applied Logic

TL;DR: A novel query rewriting algorithm is presented that handles constraints modeled in the DL ELHIO¬, and it is used to show that answering conjunctive queries in this setting is PTime-complete w.r.t. data complexity.

204 citations

Journal Article•10.1145/1670679.1670682•

Location-dependent query processing: Where we are and where we are heading

Sergio Ilarri¹, Eduardo Mena¹, Arantza Illarramendi²•Institutions (2)

University of Zaragoza¹, University of the Basque Country²

29 Mar 2010-ACM Computing Surveys

TL;DR: The technological context (mobile computing) and support middleware (such as moving object databases and data stream technology) are described, location-based services and location-dependent queries are defined and classified, and different query processing approaches are reviewed and compared.

Abstract: The continuous development of wireless networks and mobile devices has motivated intense research in mobile data services. Some of these services provide the user with context-aware information. Specifically, location-based services and location-dependent queries have attracted a lot of interest. In this article, the existing literature in the field of location-dependent query processing is reviewed. The technological context (mobile computing) and support middleware (such as moving object databases and data stream technology) are described, location-based services and location-dependent queries are defined and classified, and different query processing approaches are reviewed and compared.

197 citations

Patent•

Query and document topic category transition analysis system and method and query expansion-based information retrieval system and method

Sung-Hyon Myaeng¹, Yuchul Jung¹, Kyung-min Kim¹•Institutions (1)

KAIST¹

17 Feb 2010

TL;DR: A query expansion-based information retrieval method using query/document topic category transition analysis is proposed, in which a query input from a user is expanded using a topic category transition analysis result, and corresponding information or documents are retrieved using the expanded query.

Abstract: An information retrieval system and method, and more particularly, a query/document topic category transition analysis system and method in which a query topic category of a query input from a user as an information retrieval keyword and a document topic category of a document which a user regards as relevant and selects from information retrieval results are classified to analyze transition between the query topic category and the document topic category, and a query expansion-based information retrieval system and method using query/document topic category transition analysis in which a query input from a user is expanded using a topic category transition analysis result, and corresponding information or documents are retrieved using the expanded query are provided. The query expansion-based information retrieval method using query/document topic category transition analysis, includes: in a state in which a topic category transition map is generated as a result of analyzing topic category transition between a user query and a relevant document, and corresponding documents are generated as pseudo documents according to each topic category for the user query and the relevant document, determining a corresponding query topic category based on query/document text information for an input query input from a user; allocating a relevant document topic category for the classified query topic category based on the topic category transition map; ranking representative keywords for the query topic category and the relevant document topic category based on the pseudo documents; expanding the input query using the ranked representative keywords; and retrieving corresponding documents using the expanded query.
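
The retrieval-time steps listed in the abstract (classify the query, look up the related document category in the transition map, rank representative keywords, expand the query) can be sketched roughly as follows. Everything here is a hypothetical simplification: the keyword-overlap classifier, the pre-ranked keyword lists standing in for pseudo documents, and the dictionary-based transition map are assumptions made for illustration, not the patented method.

```python
from typing import Dict, List


def classify_query(query: str, category_keywords: Dict[str, List[str]]) -> str:
    """Pick the category whose keywords overlap most with the query terms."""
    terms = set(query.lower().split())
    return max(
        sorted(category_keywords),
        key=lambda c: len(terms & set(category_keywords[c])),
    )


def expand_query(
    query: str,
    category_keywords: Dict[str, List[str]],  # assumed pre-ranked keywords per category
    transition_map: Dict[str, str],           # assumed query category -> document category
    n_terms: int = 2,                         # hypothetical cap on added terms per category
) -> List[str]:
    """Return the query terms plus top keywords from both categories."""
    query_cat = classify_query(query, category_keywords)
    # Fall back to the query category when no transition is known.
    doc_cat = transition_map.get(query_cat, query_cat)

    terms = query.lower().split()
    for cat in (query_cat, doc_cat):
        added = 0
        for keyword in category_keywords[cat]:
            if added >= n_terms:
                break
            if keyword not in terms:
                terms.append(keyword)
                added += 1
    return terms


keywords = {
    "health": ["flu", "symptoms", "vaccine"],
    "shopping": ["pharmacy", "price", "buy"],
}
transitions = {"health": "shopping"}
print(expand_query("flu symptoms", keywords, transitions))
# ['flu', 'symptoms', 'vaccine', 'pharmacy', 'price']
```

The expanded term list would then be passed to whatever retrieval backend is in use; that step is omitted here.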

172 citations

Patent•

System for and method of providing reusable software service information based on natural language queries

Thomas Kozempel¹•Institutions (1)

Verizon Communications¹

10 May 2010

TL;DR: A system for and method of providing reusable software service information based on natural language queries is presented. The system and method may include receiving, from a user system, query data in a natural language format that indicates a request for a plurality of reusable software service applications that are configured to perform a particular task, processing the query data to generate search criteria that include query values, and searching a database for the plurality of reusable software service applications based on the query values.

Abstract: A system for and method of providing reusable software service information based on natural language queries. The system and method may include receiving, from a user system, query data in a natural language format that indicates a request for a plurality of reusable software service applications that are configured to perform a particular task, processing the query data to generate search criteria that include query values, and searching a database for the plurality of reusable software service applications based on the query values.

170 citations

Patent•

Architecture for responding to a visual query

David Petrou¹•Institutions (1)

Google¹

4 Aug 2010

TL;DR: A visual query such as a photograph, a screen shot, a scanned image, a video frame, or an image created by a content authoring application is submitted to a visual query search system.

Abstract: A visual query such as a photograph, a screen shot, a scanned image, a video frame, or an image created by a content authoring application is submitted to a visual query search system. The search system processes the visual query by sending it to a plurality of parallel search systems, each implementing a distinct visual query search process. These parallel search systems may include but are not limited to optical character recognition (OCR), facial recognition, product recognition, bar code recognition, object-or-object-category recognition, named entity recognition, and color recognition. Then at least one search result is sent to the client system. In some embodiments, when the visual query is an image containing a text element and a non-text element, at least one search result includes an optical character recognition result for the text element and at least one image-match result for the non-text element.

165 citations

Proceedings Article•10.1109/CVPR.2010.5540092•

Improving web image search results using query-relative classifiers

Josip Krapac¹, Moray Allan², Jakob Verbeek¹, Frederic Jurie²•Institutions (2)

French Institute for Research in Computer Science and Automation¹, University of Caen Lower Normandy²

13 Jun 2010

TL;DR: Generic classifiers that are based on query-relative features which can be used for new queries without additional training are introduced, which improve significantly over the raw search engine ranking, and also outperform the query-specific models.

Abstract: Web image search using text queries has received considerable attention. However, current state-of-the-art approaches require training models for every new query, and are therefore unsuitable for real-world web search applications. The key contribution of this paper is to introduce generic classifiers that are based on query-relative features which can be used for new queries without additional training. They combine textual features, based on the occurrence of query terms in web pages and image meta-data, and visual histogram representations of images. The second contribution of the paper is a new database for the evaluation of web image search algorithms. It includes 71478 images returned by a web search engine for 353 different search queries, along with their meta-data and ground-truth annotations. Using this data set, we compared the image ranking performance of our model with that of the search engine, and with an approach that learns a separate classifier for each query. Our generic models that use query-relative features improve significantly over the raw search engine ranking, and also outperform the query-specific models.

Proceedings Article•10.1145/1718487.1718538•

Early exit optimizations for additive machine learned ranking systems

B. Barla Cambazoglu¹, Hugo Zaragoza¹, Olivier Chapelle¹, Jiang Chen¹, Ciya Liao¹, Zhaohui Zheng¹, Jon Degenhardt¹•Institutions (1)

Yahoo!¹

4 Feb 2010

TL;DR: By proposing optimization strategies that allow short-circuiting score computations in additive learning systems, this paper speeds up the score computation process by more than four times with almost no loss in result quality.

Abstract: Some commercial web search engines rely on sophisticated machine learning systems for ranking web documents. Due to very large collection sizes and tight constraints on query response times, online efficiency of these learning systems forms a bottleneck. An important problem in such systems is to speed up the ranking process without sacrificing much from the quality of results. In this paper, we propose optimization strategies that allow short-circuiting score computations in additive learning systems. The strategies are evaluated over a state-of-the-art machine learning system and a large, real-life query log, obtained from Yahoo!. With the proposed strategies, we are able to speed up the score computations by more than four times with almost no loss in result quality.
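
Short-circuiting an additive scorer can be pictured as accumulating partial scores and stopping once a document falls below a threshold at a checkpoint. The sketch below is only one possible reading of that idea: the checkpoint positions, the threshold values, and the feature names are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Scorer = Callable[[Dict[str, float]], float]


def score_with_early_exit(
    features: Dict[str, float],
    scorers: Sequence[Scorer],     # additive components, e.g. trees in an ensemble
    checkpoints: Dict[int, float], # assumed: scorer count -> minimum partial score
) -> Tuple[float, int]:
    """Return (score, number of scorers evaluated) for one document."""
    total = 0.0
    for i, scorer in enumerate(scorers, start=1):
        total += scorer(features)
        threshold = checkpoints.get(i)
        # Stop early when the partial score is too low to be competitive.
        if threshold is not None and total < threshold:
            return total, i
    return total, len(scorers)


# Hypothetical additive components reading precomputed feature values.
scorers = [
    lambda f: f["text_match"],
    lambda f: f["click_rate"],
    lambda f: f["freshness"],
]
checkpoints = {1: 0.25}  # exit after the first scorer if the score is below 0.25

strong = {"text_match": 0.5, "click_rate": 0.25, "freshness": 0.125}
weak = {"text_match": 0.125, "click_rate": 0.25, "freshness": 0.125}
print(score_with_early_exit(strong, scorers, checkpoints))  # (0.875, 3)
print(score_with_early_exit(weak, scorers, checkpoints))    # (0.125, 1)
```

The saving comes from skipping the remaining scorers for documents like `weak`; the trade-off is that an exited document keeps only its partial score.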

Proceedings Article•10.1145/1772690.1772770•

Diversifying web search results

Davood Rafiei¹, Krishna Bharat², Anand Shukla²•Institutions (2)

University of Alberta¹, Google²

26 Apr 2010

TL;DR: On a more selective set of queries that are expected to benefit from diversification, the algorithm improves upon Google in terms of precision and diversity of the results, and significantly outperforms another baseline system for result diversification.

Abstract: Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets in the first page of the results. However, the underlying questions of how 'diversity' interplays with 'quality' and when preference should be given to one or both are not well-understood. In this work, we model the problem as expectation maximization and study the challenges of estimating the model parameters and reaching an equilibrium. One model parameter, for example, is correlations between pages which we estimate using textual contents of pages and click data (when available). We conduct experiments on diversifying randomly selected queries from a query log and the queries chosen from the disambiguation topics of Wikipedia. Our algorithm improves upon Google in terms of the diversity of random queries, retrieving 14% to 38% more aspects of queries in top 5, while maintaining a precision very close to Google. On a more selective set of queries that are expected to benefit from diversification, our algorithm improves upon Google in terms of precision and diversity of the results, and significantly outperforms another baseline system for result diversification.

Patent•

Method and means for data searching and language translation

Mikko Kalervo Väänänen

10 Aug 2010

TL;DR: A computer-implemented method is characterised by the following steps: receiving a search query comprising at least one search term, deriving at least one synonym for at least one search term, expanding the received search query with the at least one synonym, and retrieving the search results obtained with the expanded query.

Abstract: The invention relates to data searching and translation. In particular, the invention relates to searching documents from the Internet or databases. Even further, the invention also relates to translating words in documents, WebPages, images or speech from one language to the next. A computer implemented method comprising at least one computer in accordance with the invention is characterised by the following steps: receiving a search query comprising at least one search term, deriving at least one synonym for at least one search term, expanding the received search query with the at least one synonym, searching at least one document using the said expanded search query, retrieving the search results obtained with the said expanded query, ranking the said search results based on context of occurrence of at least one search term. The best mode of the invention is considered to be an Internet search engine that delivers better search results.

Journal Article•10.1145/1823746.1823747•

Visual query suggestion: Towards capturing user intent in internet image search

Zheng-Jun Zha¹, Linjun Yang², Tao Mei², Meng Wang², Zengfu Wang¹, Tat-Seng Chua³, Xian-Sheng Hua²•Institutions (3)

University of Science and Technology of China¹, Microsoft², National University of Singapore³

27 Aug 2010-ACM Transactions on Multimedia Computing, Communications, and Applications

TL;DR: A new query suggestion scheme named Visual Query Suggestion (VQS) is proposed which is dedicated to image search and provides a more effective query interface to help users to precisely express their search intents by joint text and image suggestions.

Abstract: Query suggestion is an effective approach to bridge the Intention Gap between the users' search intents and queries. Most existing search engines are able to automatically suggest a list of textual query terms based on users' current query input, which can be called Textual Query Suggestion. This article proposes a new query suggestion scheme named Visual Query Suggestion (VQS) which is dedicated to image search. VQS provides a more effective query interface to help users to precisely express their search intents by joint text and image suggestions. When a user submits a textual query, VQS first provides a list of suggestions, each containing a keyword and a collection of representative images in a dropdown menu. Once the user selects one of the suggestions, the corresponding keyword will be added to complement the initial query as the new textual query, while the image collection will be used as the visual query to further represent the search intent. VQS then performs image search based on the new textual query using text search techniques, as well as content-based visual retrieval to refine the search results by using the corresponding images as query examples. We compare VQS against three popular image search engines, and show that VQS outperforms these engines in terms of both the quality of query suggestion and the search performance.

Book Chapter•10.1007/978-3-642-17746-0_29•

Linked data query processing strategies

Günter Ladwig¹, Thanh Tran¹•Institutions (1)

Karlsruhe Institute of Technology¹

7 Nov 2010

TL;DR: This work identifies and systematically discusses three main strategies: a bottom-up strategy that discovers new sources during query processing by following links between sources, a top-down strategy that relies on complete knowledge about the sources to select and process relevant sources, and a mixed strategy that assumes some incomplete knowledge and finds new sources at run-time.

Abstract: Recently, processing of queries on linked data has gained attention. We identify and systematically discuss three main strategies: a bottom-up strategy that discovers new sources during query processing by following links between sources, a top-down strategy that relies on complete knowledge about the sources to select and process relevant sources, and a mixed strategy that assumes some incomplete knowledge and discovers new sources at run-time. To exploit knowledge discovered at run-time, we propose an additional step, explicitly scheduled during query processing, called correct source ranking. Additionally, we propose the adoption of stream-based query processing to deal with the unpredictable nature of data access in the distributed Linked Data environment. In experiments, we show that our implementation of the mixed strategy leads to early reporting of results and thus, more responsive query processing, while not requiring complete knowledge.

Proceedings Article•10.1145/1772690.1772703•

Classification-enhanced ranking

Paul N. Bennett¹, Krysta M. Svore¹, Susan T. Dumais¹•Institutions (1)

Microsoft¹

26 Apr 2010

TL;DR: A simple framework for classification-enhanced ranking that uses clicks in combination with the classification of web pages to derive a class distribution for the query, which can be used to derive query classes for a variety of different taxonomies.

Abstract: Many have speculated that classifying web pages can improve a search engine's ranking of results. Intuitively results should be more relevant when they match the class of a query. We present a simple framework for classification-enhanced ranking that uses clicks in combination with the classification of web pages to derive a class distribution for the query. We then go on to define a variety of features that capture the match between the class distributions of a web page and a query, the ambiguity of a query, and the coverage of a retrieved result relative to a query's set of classes. Experimental results demonstrate that a ranker learned with these features significantly improves ranking over a competitive baseline. Furthermore, our methodology is agnostic with respect to the classification space and can be used to derive query classes for a variety of different taxonomies.
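
Deriving a query's class distribution from clicks, and then measuring how well a page matches it, might look roughly like the sketch below. The click-weighted averaging, the dot-product match feature, and the entropy-based ambiguity feature are illustrative assumptions; the abstract does not specify the exact formulas.

```python
import math
from typing import Dict


def query_class_distribution(
    clicks: Dict[str, int],                    # assumed click counts per URL for one query
    page_classes: Dict[str, Dict[str, float]], # assumed class distribution per URL
) -> Dict[str, float]:
    """Average the clicked pages' class distributions, weighted by clicks."""
    total = sum(clicks.values())
    dist: Dict[str, float] = {}
    if total == 0:
        return dist
    for url, count in clicks.items():
        for cls, prob in page_classes.get(url, {}).items():
            dist[cls] = dist.get(cls, 0.0) + (count / total) * prob
    return dist


def class_match(query_dist: Dict[str, float], page_dist: Dict[str, float]) -> float:
    """Hypothetical match feature: dot product of the two distributions."""
    return sum(p * page_dist.get(cls, 0.0) for cls, p in query_dist.items())


def query_ambiguity(query_dist: Dict[str, float]) -> float:
    """Hypothetical ambiguity feature: entropy (in bits) of the query's classes."""
    return -sum(p * math.log2(p) for p in query_dist.values() if p > 0)


clicks = {"a.example": 3, "b.example": 1}
pages = {"a.example": {"sports": 1.0}, "b.example": {"news": 1.0}}
q_dist = query_class_distribution(clicks, pages)
print(q_dist)                                # {'sports': 0.75, 'news': 0.25}
print(class_match(q_dist, {"sports": 1.0}))  # 0.75
```

Features like these would be fed to a learned ranker alongside its existing inputs rather than used as a ranking on their own.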

Journal Article•10.1145/1842890.1842906•

Predicting the effectiveness of queries and retrieval systems

Claudia Hauff¹•Institutions (1)

University of Twente¹

18 Aug 2010

TL;DR: The thesis investigates how the observed quality of predictors affects the retrieval effectiveness in two adaptive system settings: selective query expansion and meta-search and provides an analysis of its sensitivity towards different variables such as the collection, the query set and the retrieval approach.

Abstract: We consider users' attempts to express their information needs through queries, or search requests and try to predict whether those requests will be of high or low quality. The second type of methods under investigation are those which attempt to estimate the quality of search systems themselves. Given a number of search systems to consider, these methods estimate how well or how poorly the systems will perform in comparison to each other. First, pre-retrieval predictors are investigated, which predict a query's effectiveness before the retrieval step and are thus independent of the ranked list of results. Such predictors base their predictions solely on query terms, collection statistics and possibly external sources. Twenty-two prediction algorithms are categorized and their quality is assessed on three different TREC test collections. A number of newly applied methods for combining various predictors are examined to obtain a better prediction of a query's effectiveness. Building on the analysis of pre-retrieval predictors, post-retrieval approaches are then investigated, which estimate a query's effectiveness on the basis of the retrieved results. The thesis focuses in particular on the Clarity Score approach and provides an analysis of its sensitivity towards different variables such as the collection, the query set and the retrieval approach. Adaptations to Clarity Score are introduced which improve the estimation accuracy of the original algorithm. The utility of query effectiveness prediction methods is commonly evaluated by reporting correlation coefficients, such as Kendall's Tau. Largely unexplored though is the question of the relationship between the current evaluation methodology for query effectiveness prediction and the change in effectiveness of retrieval systems that employ a predictor. We investigate this question by examining how the observed quality of predictors (with respect to Kendall's Tau) affects the retrieval effectiveness in two adaptive system settings: selective query expansion and meta-search. The last part of the thesis is concerned with the task of estimating the ranking of retrieval systems according to their retrieval effectiveness without relying on costly relevance judgments. Five different system ranking estimation approaches are evaluated on a wide range of data sets which cover a variety of retrieval tasks and test collections. It is shown that under certain conditions, automatic methods yield a highly accurate ranking of systems. Available online at http://www.cs.utwente.nl/~hauffc/phd/thesis.pdf.
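
Evaluating a predictor with Kendall's Tau means comparing the ordering of its per-query predictions with the ordering of the measured per-query effectiveness. A minimal sketch of the tau-a variant (no tie correction) is given below; the example values are made up for illustration and are not results from the thesis.

```python
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # Tied pairs count as neither concordant nor discordant.
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-query values: predictor output vs. measured effectiveness.
predicted = [0.9, 0.4, 0.7, 0.1]
measured = [0.8, 0.3, 0.2, 0.1]
print(round(kendall_tau(predicted, measured), 3))  # 0.667
```

A value near 1 means the predictor orders queries much like the measured effectiveness does, while a value near 0 means the two orderings are essentially unrelated.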

Patent•

Translation-based query pattern mining

Xin Li¹, Shi Chen¹•Institutions (1)

Google¹

23 Jun 2010

TL;DR: Technologies relating to search systems are described that can be embodied in methods that include receiving a query pattern in a first language that identifies a particular rule to interpret a particular type of query, identifying a collection of queries in the first language matching the query pattern, annotating each query of the collection with one or more labels, translating the annotated queries into a second language, and aligning the translated collection of queries to extract a translated query pattern.

Abstract: This specification describes technologies relating to search systems. In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a query pattern, the query pattern identifying a particular rule to interpret a particular type of query, the query pattern being in a first language; identifying a collection of queries in the first language matching the query pattern; annotating each query of the collection of queries with one or more labels; translating the collection of annotated queries in the first language into a translated collection of queries in a second language; aligning the translated collection of queries including identifying a most common term in the translated collection of queries and determining the corresponding positions of the annotations relative to the translated query terms; and extracting a translated query pattern from the aligned translated collection of queries.

Patent•

Location based query suggestion

Jussi Myllymaki¹, David Singleton¹, Al Cutter¹, Matt Lewis¹, Scott Eblen¹•Institutions (1)

Google¹

29 Jan 2010

Proceedings Article•10.1145/1867699.1867706•

Towards location-based social networking services

Chi-Yin Chow¹, Jie Bao², Mohamed F. Mokbel²•Institutions (2)

City University of Hong Kong¹, University of Minnesota²

2 Nov 2010

TL;DR: This paper presents GeoSocialDB, a holistic system providing three location-based social networking services (location-based news feed, location-based news ranking, and location-based recommendation), which it aims to implement as query operators inside a database engine.

Abstract: Social networking applications have become very important web services that provide Internet-based platforms for their users to interact with their friends. With the advances in the location-aware hardware and software technologies, location-based social networking applications have been proposed to provide services for their users, taking into account both the spatial and social aspects. Unfortunately, none of existing location-based social networking applications is a holistic system nor equips database management systems to support scalable location-based social networking services. In this paper, we present GeoSocialDB; a holistic system providing three location-based social networking services, namely, location-based news feed, location-based news ranking, and location-based recommendation. In GeoSocialDB, we aim to implement these services as query operators inside a database engine to optimize the query processing performance. Within the GeoSocialDB framework, we discuss research challenges and directions towards the realization of scalable and practical query processing for location-based social networking services. In general, we discuss the challenges in designing location- and/or rank-aware query operators, materializing query answers, supporting continuous query processing, and providing privacy-aware query processing for our three location-based social networking services.

Proceedings Article•10.1109/ICDE.2010.5447837•

K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free

Bin Yao¹, Feifei Li¹, Piyush Kumar¹•Institutions (1)

Florida State University¹

1 Mar 2010

TL;DR: This work designs algorithms that could be implemented by SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan.

Abstract: Finding the k nearest neighbors (kNN) of a query point, or of a set of query points (kNN-Join), is a fundamental problem in many application domains. Many previous efforts to solve these problems focused on spatial databases or stand-alone systems, where changes to the database engine may be required, which may limit their application on large data sets that are stored in a relational database management system. Furthermore, these methods may not automatically optimize kNN queries or kNN-Joins when additional query conditions are specified. In this work, we study both the kNN query and the kNN-Join in a relational database, possibly augmented with additional query conditions. We search for relational algorithms that require no changes to the database engine. The straightforward solution uses a user-defined function (UDF) that a query optimizer cannot optimize. We design algorithms that could be implemented by SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan. Using only a small constant number of random shifts for databases in any fixed dimension, our approach guarantees to find the approximate kNN with only a logarithmic number of page accesses in expectation with a constant approximation ratio, and it could be extended to find the exact kNN efficiently in any fixed dimension. Our design paradigm easily supports the kNN-Join and updates. Extensive experiments on large, real and synthetic, data sets confirm the efficiency and practicality of our approach.

Proceedings Article•10.1145/1772690.1772724•

Do you want to take notes?: identifying research missions in Yahoo! search pad

Debora Donato¹, Francesco Bonchi¹, Tom Chi¹, Yoelle Maarek¹•Institutions (1)

Yahoo!¹

26 Apr 2010

TL;DR: It is demonstrated in this paper that research missions can be automatically identified on-the-fly, as the user interacts with the search engine, through careful runtime analysis of query flows and query sessions.

Abstract: Addressing users' information needs has been one of the main goals of Web search engines since their early days. In some cases, users cannot see their needs immediately answered by search results, simply because these needs are too complex and involve multiple aspects that are not covered by a single Web or search results page. This typically happens when users investigate a certain topic in domains such as education, travel or health, which often require collecting facts and information from many pages. We refer to this type of activity as "research missions". These research missions account for 10% of users' sessions and more than 25% of all query volume, as verified by a manual analysis that was conducted by Yahoo! editors. We demonstrate in this paper that such missions can be automatically identified on-the-fly, as the user interacts with the search engine, through careful runtime analysis of query flows and query sessions. The on-the-fly automatic identification of research missions has been implemented in Search Pad, a novel Yahoo! application that was launched in 2009, and that we present in this paper. Search Pad helps users keep track of results they have consulted. Its novelty, however, is that unlike previous note-taking products, it is automatically triggered only when the system decides, with a fair level of confidence, that the user is undertaking a research mission and thus is in the right context for gathering notes. Beyond the Search Pad specific application, we believe that changing the level of granularity of query modeling, from an isolated query to a list of queries pertaining to the same research mission, so as to better reflect a certain type of information need, can be beneficial in a number of other Web search applications. Session-awareness is growing and it is likely to play, in the near future, a fundamental role in many on-line tasks: this paper presents a first step on this path.

Patent•

User interface for presenting search results for multiple regions of a visual query

David Petrou¹, Theodore Power¹•Institutions (1)

Google¹

4 Aug 2010

TL;DR: A visual query such as a photograph, screen shot, scanned image, or video frame is submitted from a client system to a visual query search system, which processes the query by sending it to a plurality of parallel search systems, each implementing a distinct visual query search process.

Abstract: A visual query such as a photograph, screen shot, scanned image, or video frame is submitted to a visual query search system from a client system. The search system processes the visual query by sending it to a plurality of parallel search systems, each implementing a distinct visual query search process. A plurality of results is received from the parallel search systems. Utilizing the search results, an interactive results document is created and sent to the client system. The interactive results document has at least one visual identifier for a sub-portion of the visual query with a selectable link to at least one search result for that sub-portion. The visual identifier may be a bounding box around the respective sub-portion, or a semi-transparent label over the respective sub-portion. Optionally, the bounding box or label is color coded by type of result.

Patent•

Generating and presenting a suggested search query

Robert J. Williams¹, Nitin Agrawal¹, Farid Hosseini¹, Sanaz Ahari¹, Maxim Stepin¹, Jason A. Bolla¹, Bo-June Hsu¹•Institutions (1)

Microsoft¹

28 Jun 2010

TL;DR: A suggested search query is generated using various techniques, such as applying an n-gram language model; a classification of the suggested search query is determined, and the query is presented together with a visual indicator that signifies the classification.

Abstract: The present invention is directed to presenting a suggested search query. Responsive to receiving a user-devised search parameter, a suggested search query is identified. The user-devised search parameter might have been previously received by a search system, or alternatively, might be a unique query that has not been previously received. A suggested search query might be generated using various techniques, such as by applying an n-gram language model. A classification of the suggested search query is determined, and the suggested search query is presented together with a visual indicator, which signifies the classification.
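
The n-gram technique mentioned in this abstract can be illustrated with a small sketch. It assumes a bigram model trained on a toy query log and extends the user's input by the most frequent next word. The function names and the log are hypothetical, and the patent does not specify this particular procedure.

```python
from collections import Counter, defaultdict


def train_bigrams(queries):
    """Count how often each word follows another across a query log."""
    counts = defaultdict(Counter)
    for query in queries:
        tokens = query.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest(prefix, counts):
    """Extend the input by the most frequent successor of its last word."""
    tokens = prefix.lower().split()
    if not tokens or tokens[-1] not in counts:
        return None  # the model has never seen a continuation of this word
    best, _ = counts[tokens[-1]].most_common(1)[0]
    return " ".join(tokens + [best])


# Hypothetical query log.
log = ["new york weather", "new york hotels", "new york weather today", "weather radar"]
model = train_bigrams(log)
seen = suggest("new york", model)
unseen = suggest("flights to new", model)
```

Because the model conditions only on the last word, it can produce a suggestion even for an input such as "flights to new" that never occurred in the log.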

Proceedings Article•

Diversifying query suggestion results

Hao Ma¹, Michael R. Lyu¹, Irwin King¹•Institutions (1)

The Chinese University of Hong Kong¹

11 Jul 2010

TL;DR: This paper presents a novel unified method to suggest both semantically relevant and diverse queries to Web users based on Markov random walk and hitting time analysis on the query-URL bipartite graph, which can effectively prevent semantically redundant queries from receiving a high rank.

Abstract: In order to improve the user search experience, Query Suggestion, a technique for generating alternative queries to Web users, has become an indispensable feature for commercial search engines. However, previous work mainly focuses on suggesting relevant queries to the original query while ignoring the diversity in the suggestions, which will potentially dissatisfy Web users' information needs. In this paper, we present a novel unified method to suggest both semantically relevant and diverse queries to Web users. The proposed approach is based on Markov random walk and hitting time analysis on the query-URL bipartite graph. It can effectively prevent semantically redundant queries from receiving a high rank, hence encouraging diversities in the results. We evaluate our method on a large commercial clickthrough dataset in terms of relevance measurement and diversity measurement. The experimental results show that our method is very effective in generating both relevant and diverse query suggestions.
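
To make the random-walk idea concrete, here is a simplified sketch, not the authors' algorithm. It assumes a click log of hypothetical (query, URL, count) triples, builds a two-step query to URL to query walk, and computes a truncated hitting time from every query to a target set of queries.

```python
from collections import defaultdict


def query_transitions(clicks):
    """Two-step walk query -> URL -> query built from (query, url, count) triples."""
    q_out = defaultdict(dict)  # query -> {url: clicks}
    u_out = defaultdict(dict)  # url -> {query: clicks}
    for query, url, count in clicks:
        q_out[query][url] = q_out[query].get(url, 0) + count
        u_out[url][query] = u_out[url].get(query, 0) + count
    trans = {}
    for q, urls in q_out.items():
        q_total = sum(urls.values())
        row = defaultdict(float)
        for u, c in urls.items():
            u_total = sum(u_out[u].values())
            for q2, c2 in u_out[u].items():
                row[q2] += (c / q_total) * (c2 / u_total)
        trans[q] = dict(row)
    return trans


def hitting_times(trans, targets, steps=50):
    """Expected steps to reach `targets`, truncated after `steps` iterations."""
    h = {q: 0.0 for q in trans}
    for _ in range(steps):
        # Each pass applies h(q) = 1 + sum_j P(q, j) * h(j), with h = 0 on targets.
        h = {
            q: 0.0 if q in targets
            else 1.0 + sum(p * h[q2] for q2, p in trans[q].items())
            for q in trans
        }
    return h


# Hypothetical click log: A and B share a URL, B and C share another, D is isolated.
clicks = [("A", "u1", 1), ("B", "u1", 1), ("B", "u2", 1), ("C", "u2", 1), ("D", "u3", 1)]
trans = query_transitions(clicks)
h = hitting_times(trans, targets={"A"})
```

One plausible use, assumed here rather than taken from the paper, is to treat a candidate with a large hitting time to the already selected suggestions as less redundant with them.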

Patent•

Generation of refinement terms for search queries

Emil Ismalon

14 Jun 2010

TL;DR: A search system receives a search query comprising terms and, using at least one association graph comprising terms, generates a suggested replacement query that keeps designated anchor terms and replaces one or more non-anchor terms with suggested replacement terms.

Abstract: A computer-implemented method includes receiving from a user, by a search system, a search query comprising terms. Using at least one association graph comprising terms, the search system generates a suggested replacement query by designating one or more of the terms of the search query as anchor terms, and the remaining terms of the search query as non-anchor terms, and replacing one or more of the non-anchor terms of the search query with one or more suggested replacement terms, to generate the suggested replacement query that includes the one or more anchor terms and the one or more suggested replacement terms. The suggested replacement query is presented to the user. Responsively to a selection of the suggested replacement query by the user, the search query received from the user is replaced with the suggested replacement query, and search results are generated responsively to the suggested replacement query and presented.
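
The anchor and non-anchor replacement step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the association graph is a plain dictionary of weighted term neighbors, the anchor terms are supplied by the caller, and every non-anchor term is swapped for its strongest associate. The names and weights are hypothetical, and the patent does not prescribe this selection rule.

```python
def suggest_replacement(query_terms, anchors, association_graph):
    """Keep anchor terms; swap each non-anchor term for its strongest associate."""
    suggestion = []
    for term in query_terms:
        if term in anchors:
            suggestion.append(term)
            continue
        neighbors = association_graph.get(term, {})
        # Ignore associates that already appear in the query.
        candidates = {t: w for t, w in neighbors.items() if t not in query_terms}
        # Fall back to the original term when the graph offers no replacement.
        suggestion.append(max(candidates, key=candidates.get) if candidates else term)
    return suggestion


# Hypothetical association graph: term -> {associated term: weight}.
graph = {"cheap": {"budget": 0.8, "discount": 0.6}, "hotel": {"hostel": 0.7}}
result = suggest_replacement(["cheap", "hotel", "paris"], {"hotel", "paris"}, graph)
```

Here "hotel" and "paris" are kept as anchors, so only "cheap" is eligible for replacement.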

Proceedings Article•10.1145/1871437.1871745•

Towards query log based personalization using topic models

Mark James Carman¹, Fabio Crestani¹, Morgan Harvey², Mark Baillie²•Institutions (2)

University of Lugano¹, University of Strathclyde²

26 Oct 2010

TL;DR: This work defines generative models that take both the user and the clicked document into account when estimating the probability of query terms and can be used to rank documents by their likelihood given a particular query and user pair.

Abstract: We investigate the utility of topic models for the task of personalizing search results based on information present in a large query log. We define generative models that take both the user and the clicked document into account when estimating the probability of query terms. These models can then be used to rank documents by their likelihood given a particular query and user pair.

Patent•

Personalize Search Results for Search Queries with General Implicit Local Intent

Yumao Lu¹, Fuchun Peng¹, Benoit Dumoulin¹•Institutions (1)

Yahoo!¹

27 Jan 2010

TL;DR: One particular embodiment extracts one or more features from a first set of search queries, trains a search-query classifier using the features, and then uses the classifier to determine whether a second, user-provided search query has implicit and general local intent.

Abstract: One particular embodiment accesses a first set of search queries comprising one or more first search queries; extracts one or more features based on the first set of search queries; trains a search-query classifier using the features; accesses a second search query provided by a user; determines whether the second search query has implicit and general local intent using the search-query classifier; if the second search query has implicit and general local intent, then determines a location associated with the user and identifies a search result in response to the second search query based at least in part on the location associated with the user; and presents the search result to the user.

Proceedings Article•10.1109/ICWS.2010.67•

Measuring Similarity of Web Services Based on WSDL

Fangfang Liu¹, Yuliang Shi², Jie Yu¹, Tianhong Wang¹, Jingzhe Wu¹•Institutions (2)

Shanghai University¹, Shandong University²

5 Jul 2010

TL;DR: This work provides the method which tries to reflect the underlying semantics of web services by utilizing the terms within WSDL fully and shows that this method works well on both service classification and query.

Abstract: Web services have become an important paradigm for web applications. As the number of services grows, the desired web services need to be located efficiently. The similarity metric of web services plays an important role in service search and classification. The very small text fragments in the WSDL of web services are unsuitable for applying traditional IR techniques. We describe our approach, which supports the similarity search and classification of service operations. The approach first employs external knowledge to compute the semantic distance of terms from the two compared services. The similarity of services is measured upon these distances. Previous research treats terms within the same WSDL document as isolated words and neglects the semantic association among them, which lowers the accuracy of the similarity metric. We provide a method which tries to reflect the underlying semantics of web services by fully utilizing the terms within WSDL. The experiments show that our method works well on both service classification and query.
