Top 809 papers published on the topic of Web query classification in 2008

Showing papers on "Web query classification" published in 2008

Book Chapter•10.1007/978-3-540-68234-9_39•

Querying distributed RDF data sources with SPARQL


Bastian Quilitz¹, Ulf Leser¹•Institutions (1)

1 Jun 2008

TL;DR: DARQ provides transparent query access to multiple SPARQL services, i.e., it gives the user the impression of querying a single RDF graph even though the real data is distributed on the web, and it uses query rewriting and cost-based query optimization to speed up query execution.


Abstract: Integrated access to multiple distributed and autonomous RDF data sources is a key challenge for many semantic web applications. As a reaction to this challenge, SPARQL, the W3C Recommendation for an RDF query language, supports querying of multiple RDF graphs. However, the current standard does not provide transparent query federation, which makes query formulation hard and lengthy. Furthermore, current implementations of SPARQL load all RDF graphs mentioned in a query to the local machine. This usually incurs a large overhead in network traffic, and sometimes is simply impossible for technical or legal reasons. To overcome these problems we present DARQ, an engine for federated SPARQL queries. DARQ provides transparent query access to multiple SPARQL services, i.e., it gives the user the impression of querying a single RDF graph even though the real data is distributed on the web. A service description language enables the query engine to decompose a query into sub-queries, each of which can be answered by an individual service. DARQ also uses query rewriting and cost-based query optimization to speed up query execution. Experiments show that these optimizations significantly improve query performance even when only a very limited amount of statistical information is available. DARQ is available under the GPL license at http://darq.sf.net/.
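
The decomposition step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each service description lists the predicates a service can answer; the abstract does not spell out the description format, and the endpoint URLs, predicate names, and the `decompose` helper below are hypothetical rather than DARQ's actual API.

```python
from collections import defaultdict

def decompose(triple_patterns, capabilities):
    """Group triple patterns into per-service sub-queries.

    Assumption: a service can answer a pattern if its description
    lists the pattern's predicate.
    """
    sub_queries = defaultdict(list)
    for pattern in triple_patterns:
        _, predicate, _ = pattern
        for service, predicates in capabilities.items():
            if predicate in predicates:
                sub_queries[service].append(pattern)
    return dict(sub_queries)

# Hypothetical service descriptions: endpoint -> predicates it can answer.
capabilities = {
    "http://example.org/people/sparql": {"foaf:name", "foaf:mbox"},
    "http://example.org/papers/sparql": {"dc:creator", "dc:title"},
}

# One user query, written as if all data lived in a single graph.
patterns = [
    ("?person", "foaf:name", "?name"),
    ("?paper", "dc:creator", "?person"),
    ("?paper", "dc:title", "?title"),
]

plan = decompose(patterns, capabilities)
```

In this toy plan the first pattern goes to the people endpoint and the other two to the papers endpoint. A real engine would then still have to join the partial results on `?person`, which is where cost-based optimization matters.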


611 citations

Proceedings Article•10.1145/1458082.1458176•

Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs


Rosie Jones¹, Kristina Lisa Klinkner²•Institutions (2)

Yahoo!¹, Carnegie Mellon University²

26 Oct 2008

TL;DR: This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure, and paves the way for evaluating search engines in terms of user task completion.


Abstract: Most analysis of web search relevance and performance takes a single query as the unit of search engine interaction. When studies attempt to group queries together by task or session, a timeout is typically used to identify the boundary. However, users query search engines in order to accomplish tasks at a variety of granularities, issuing multiple queries as they attempt to accomplish tasks. In this work we study real sessions manually labeled into hierarchical tasks, and show that timeouts, whatever their length, are of limited utility in identifying task boundaries, achieving a maximum precision of only 70%. We report on properties of this search task hierarchy, as seen in a random sample of user interactions from a major web search engine's log, annotated by human editors, learning that 17% of tasks are interleaved, and 20% are hierarchically organized. No previous work has analyzed or addressed automatic identification of interleaved and hierarchically organized search tasks. We propose and evaluate a method for the automated segmentation of users' query streams into hierarchical units. Our classifiers can improve on timeout segmentation, as well as other previously published approaches, bringing the accuracy up to 92% for identifying fine-grained task boundaries, and 89-97% for identifying pairs of queries from the same task when tasks are interleaved hierarchically. This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure. The ability to perform this kind of segmentation paves the way for evaluating search engines in terms of user task completion.
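
For reference, the timeout baseline that the abstract argues against can be sketched in a few lines. The 30-minute threshold, the function name `segment_by_timeout`, and the sample log are assumptions made for illustration; they are not taken from the paper.

```python
from datetime import datetime, timedelta

def segment_by_timeout(events, timeout=timedelta(minutes=30)):
    """Split a time-ordered list of (timestamp, query) pairs into sessions.

    A new session starts whenever the gap to the previous query
    exceeds `timeout`.
    """
    sessions = []
    for ts, query in events:
        if sessions and ts - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((ts, query))
        else:
            sessions.append([(ts, query)])
    return sessions

# Made-up log: one travel task interrupted by an unrelated query.
log = [
    (datetime(2008, 1, 1, 9, 0), "flights to boston"),
    (datetime(2008, 1, 1, 9, 5), "boston hotels"),
    (datetime(2008, 1, 1, 9, 7), "weather tomorrow"),
    (datetime(2008, 1, 1, 14, 0), "boston hotels cheap"),
]

sessions = segment_by_timeout(log)
```

The unrelated weather query ends up in the same session as the travel queries, while the later hotel query is cut off from its task. This is the kind of interleaving that a pure timeout cannot capture.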


472 citations

Journal Article•10.1016/J.IPM.2007.07.015•

Determining the informational, navigational, and transactional intent of Web queries


Bernard J. Jansen¹, Danielle L. Booth¹, Amanda Spink²•Institutions (2)

Penn State College of Information Sciences and Technology¹, Queensland University of Technology²

01 May 2008-Information Processing and Management

TL;DR: In this article, the authors define and present a comprehensive classification of user intent for Web searching, consisting of three hierarchical levels of informational, navigational, and transactional intent, and then develop a software application that automatically classifies queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users.


Abstract: In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
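
A rule-based classifier of this general kind might look like the sketch below. The abstract does not list the attributes the authors derived, so the cue patterns, the rule order, and the function name `classify_intent` are illustrative assumptions only, not the published method.

```python
import re

# Assumed cues, chosen only for illustration.
NAVIGATIONAL_CUES = re.compile(r"(www\.|\.com|\.org|\.net|https?://)")
TRANSACTIONAL_CUES = {"buy", "download", "order", "purchase"}

def classify_intent(query):
    """Return 'navigational', 'transactional', or 'informational'."""
    q = query.lower().strip()
    if NAVIGATIONAL_CUES.search(q):
        return "navigational"
    if TRANSACTIONAL_CUES & set(q.split()):
        return "transactional"
    # Default class: the abstract reports that most queries are informational.
    return "informational"

labels = [classify_intent(q) for q in
          ["www.example.org", "download mp3 player", "how do volcanoes form"]]
```

A hard three-way decision like this cannot represent vague or multi-faceted queries, which is the gap the abstract suggests probabilistic classification could fill.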


414 citations

Proceedings Article•10.1145/1390334.1390364•

To personalize or not to personalize: modeling queries with variation in user intent


Jaime Teevan¹, Susan T. Dumais¹, Daniel J. Liebling¹•Institutions (1)

Microsoft¹

20 Jul 2008

TL;DR: Variation in user intent is examined using both explicit relevance judgments and large-scale log analysis of user behavior patterns to identify queries that can benefit from personalization.


Abstract: In most previous work on personalized search algorithms, the results for all queries are personalized in the same manner. However, as we show in this paper, there is a lot of variation across queries in the benefits that can be achieved through personalization. For some queries, everyone who issues the query is looking for the same thing. For other queries, different people want very different results even though they express their need in the same way. We examine variability in user intent using both explicit relevance judgments and large-scale log analysis of user behavior patterns. While variation in user behavior is correlated with variation in explicit relevance judgments for the same query, there are many other factors, such as result entropy, result quality, and task, that can also affect the variation in behavior. We characterize queries using a variety of features of the query, the results returned for the query, and people's interaction history with the query. Using these features we build predictive models to identify queries that can benefit from personalization.
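
One common way to quantify variation in behavior for a query is the Shannon entropy of the results that different users click. The sketch below assumes this click-based definition and made-up click data; the abstract does not state how its entropy measures are computed, so this may not match the paper's exact features.

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicked-URL distribution for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Everyone clicks the same result: no variation across users.
same_intent = click_entropy(["a.example", "a.example", "a.example", "a.example"])

# Every user clicks a different result: high variation.
mixed_intent = click_entropy(["a.example", "b.example", "c.example", "d.example"])
```

Under this reading, low-entropy queries leave little room for personalization, while high-entropy queries are candidates for it. As the abstract notes, factors such as result quality and task also affect behavior, so entropy alone is not sufficient.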


325 citations

Proceedings Article•10.1145/1367497.1367546•

Spatial variation in search engine queries


Lars Backstrom¹, Jon Kleinberg¹, Ravi Kumar², Jasmine Novak²•Institutions (2)

Cornell University¹, Yahoo!²

21 Apr 2008

TL;DR: A probabilistic framework for quantifying spatial variation in search queries is developed; on complete Yahoo! query logs, this model is able to localize large classes of queries to within a few miles of their natural centers based only on the distribution of activity for the query.


Abstract: Local aspects of Web search - associating Web content and queries with geography - is a topic of growing interest. However, the underlying question of how spatial variation is manifested in search queries is still not well understood. Here we develop a probabilistic framework for quantifying such spatial variation; on complete Yahoo! query logs, we find that our model is able to localize large classes of queries to within a few miles of their natural centers based only on the distribution of activity for the query. Our model provides not only an estimate of a query's geographic center, but also a measure of its spatial dispersion, indicating whether it has highly local interest or broader regional or national appeal. We also show how variations on our model can track geographically shifting topics over time, annotate a map with each location's "distinctive queries", and delineate the "spheres of influence" for competing queries in the same general domain.


253 citations

Journal Article•10.1109/TVCG.2008.175•

VisGets: Coordinated Visualizations for Web-based Information Exploration and Discovery


Marian Dörk¹, Sheelagh Carpendale¹, Christopher Collins², Carey Williamson¹•Institutions (2)

University of Calgary¹, University of Toronto²

01 Nov 2008-IEEE Transactions on Visualization and Computer Graphics

TL;DR: This work introduces VisGets - interactive query visualizations of Web-based information that operate with online information within a Web browser and facilitates the construction of dynamic search queries that combine filters from more than one data dimension.


Abstract: In common Web-based search interfaces, it can be difficult to formulate queries that simultaneously combine temporal, spatial, and topical data filters. We investigate how coordinated visualizations can enhance search and exploration of information on the World Wide Web by easing the formulation of these types of queries. Drawing from visual information seeking and exploratory search, we introduce VisGets - interactive query visualizations of Web-based information that operate with online information within a Web browser. VisGets provide the information seeker with visual overviews of Web resources and offer a way to visually filter the data. Our goal is to facilitate the construction of dynamic search queries that combine filters from more than one data dimension. We present a prototype information exploration system featuring three linked VisGets (temporal, spatial, and topical), and used it to visually explore news items from online RSS feeds.


221 citations

Journal Article•10.1613/JAIR.2372•

Conjunctive query answering for the description logic SHIQ


Birte Glimm¹, Ian Horrocks¹, Carsten Lutz², Uli Sattler³•Institutions (3)

University of Oxford¹, Dresden University of Technology², University of Manchester³

01 Jan 2008-Journal of Artificial Intelligence Research

TL;DR: In this paper, the authors consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the query and the knowledge base, and show decidability of query answering in this setting and establish two tight complexity bounds.


Abstract: Conjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive query answering over DL knowledge bases is only poorly understood if transitive roles are admitted in the query. In this paper, we consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the query and the knowledge base. We show decidability of query answering in this setting and establish two tight complexity bounds: regarding combined complexity, we prove that there is a deterministic algorithm for query answering that needs time single exponential in the size of the KB and double exponential in the size of the query, which is optimal. Regarding data complexity, we prove containment in co-NP.


207 citations

Patent•

System and method to manage and distribute media using a predictive media cache


Amir Ansari, George A. Cowgill, Ramprakash Masina, Jude P. Ramayya, Alvin R. McQuarters, Atousa Raissyan, Leon E. Nicholls, Marshall T. Rose, Robert A. Clavenna II

3 Jul 2008

TL;DR: In this article, the authors propose a system for decreasing the perceived end user latency while interacting with a database, which consists of the database storing metadata associated with one or more of media, files, data, devices and services, a user interface operable to receive a user generated query selected from a plurality of user generated queries, and a processor having a predictive module having a capability to generate at least one background query of a database prior to the user interface receiving the query.


Abstract: A system for decreasing the perceived end user latency while interacting with a database. The system comprises the database storing metadata associated with one or more of media, files, data, devices and services, a user interface operable to receive a user generated query selected from a plurality of user generated query options, the plurality of user generated query options representing at least one of a user selectable object displayed by the user interface, and a processor having a predictive module operable to generate at least one background query of the database prior to the user interface receiving the user generated query, the at least one background query correlating to at least one of the user generated query options. The predictive module compares the user generated query to the at least one background query prior to sending the user generated query to the database such that if the user generated query corresponds to the at least one background query the user interface displays a result to the at least one background query.


206 citations

Patent•

Methods and systems for providing a response to a query


Andy Curtis¹, Alan Levin¹, Apostolos Gerasoulis¹•Institutions (1)

IAC¹

22 Apr 2008

TL;DR: In this article, a response is provided based upon the correlated search engine activity information, which is used to provide relevant search results without the limitations imposed by the key-word-based systems of the prior art.


Abstract: Methods and systems for providing a response to a query. Multiple users' search engine activity in regard to a query is correlated. A response is provided based upon this correlated search engine activity information. For one embodiment of the invention, in the context of search engine result optimization, the user activity and/or user information of multiple users, during a search session, is correlated with queries to effect an evolving association between queries and the organization and presentation of documents. Systems in accordance with such embodiments employ the ability to store users' activity over the entire search session, thus making possible the correlation of a number of different types of user activity and user information. The use of correlated user input allows such systems to provide relevant search results without the limitations imposed by the key-word-based systems of the prior art.


199 citations

Proceedings Article•10.1145/1458082.1458311•

A survey of pre-retrieval query performance predictors


Claudia Hauff¹, Djoerd Hiemstra¹, Franciska de Jong¹•Institutions (1)

University of Twente¹

26 Oct 2008

TL;DR: In this poster, 22 pre-retrieval predictors are categorized and assessed on three different TREC test collections; such predictors base their predictions solely on query terms, collection statistics, and possibly external sources such as WordNet.


Abstract: The focus of research on query performance prediction is to predict the effectiveness of a query given a search system and a collection of documents. If the performance of queries can be estimated in advance of, or during the retrieval stage, specific measures can be taken to improve the overall performance of the system. In particular, pre-retrieval predictors predict the query performance before the retrieval step and are thus independent of the ranked list of results; such predictors base their predictions solely on query terms, the collection statistics and possibly external sources such as WordNet. In this poster, 22 pre-retrieval predictors are categorized and assessed on three different TREC test collections.
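
As an illustration of a predictor that uses only query terms and collection statistics, the sketch below computes the average inverse document frequency (IDF) of the query terms. Average IDF is a widely used pre-retrieval signal, but its use here is an assumption for illustration: the abstract does not name the 22 predictors, and the document frequencies below are made up.

```python
import math

def avg_idf(query_terms, doc_freq, num_docs):
    """Average IDF of the query terms that occur in the collection.

    Higher values mean the query uses rarer, more discriminative terms.
    """
    idfs = [math.log(num_docs / doc_freq[t])
            for t in query_terms if doc_freq.get(t, 0) > 0]
    return sum(idfs) / len(idfs) if idfs else 0.0

# Hypothetical collection statistics: term -> number of documents containing it.
doc_freq = {"the": 1000, "zebra": 10}
num_docs = 1000

specific = avg_idf(["zebra"], doc_freq, num_docs)
generic = avg_idf(["the"], doc_freq, num_docs)
```

Nothing here requires running the query, so the value can be computed from index statistics alone, before any ranked list exists.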


185 citations

Proceedings Article•10.5555/1793274.1793285•

Effective pre-retrieval query performance prediction using similarity and variability evidence


Ying Zhao¹, Falk Scholer¹, Yohannes Tsegay¹•Institutions (1)

RMIT University¹

30 Mar 2008

TL;DR: Experimental evaluation of the proposed new family of pre-retrieval predictors shows that the new predictors give more consistent performance than previously proposed pre-retrieval methods across a variety of data types and search tasks.


Abstract: Query performance prediction aims to estimate the quality of answers that a search system will return in response to a particular query. In this paper we propose a new family of pre-retrieval predictors based on information at both the collection and document level. Pre-retrieval predictors are important because they can be calculated from information that is available at indexing time; they are therefore more efficient than predictors that incorporate information obtained from actual search results. Experimental evaluation of our approach shows that the new predictors give more consistent performance than previously proposed pre-retrieval methods across a variety of data types and search tasks.


Journal Article•10.1109/TKDE.2008.84•

Personalized Concept-Based Clustering of Search Engine Queries


Kenneth Wai-Ting Leung¹, Wilfred Ng¹, Dik Lun Lee¹•Institutions (1)

Hong Kong University of Science and Technology¹

01 Nov 2008-IEEE Transactions on Knowledge and Data Engineering

TL;DR: An effective approach that captures the user's conceptual preferences in order to provide personalized query suggestions is introduced and a new two-phase personalized agglomerative clustering algorithm is proposed that is able to generate personalized query clusters.


Abstract: The exponential growth of information on the Web has introduced new challenges for building effective search engines. A major problem of Web search is that search queries are usually short and ambiguous, and thus are insufficient for specifying the precise user needs. To alleviate this problem, some search engines suggest terms that are semantically related to the submitted queries so that users can choose from the suggestions the ones that reflect their information needs. In this paper, we introduce an effective approach that captures the user's conceptual preferences in order to provide personalized query suggestions. We achieve this goal with two new strategies. First, we develop online techniques that extract concepts from the Web-snippets of the search result returned from a query and use the concepts to identify related queries for that query. Second, we propose a new two-phase personalized agglomerative clustering algorithm that is able to generate personalized query clusters. To the best of the authors' knowledge, no previous work has addressed personalization for query suggestions. To evaluate the effectiveness of our technique, a Google middleware was developed for collecting clickthrough data to conduct experimental evaluation. Experimental results show that our approach has better precision and recall than the existing query clustering methods.


Patent•

Search query transformation using direct manipulation


Ryen W. White¹, Mikhail Bilenko¹, Robert L. Rounthwaite¹, Dan Morris¹•Institutions (1)

Microsoft¹

11 Nov 2008

TL;DR: A search query transformation system and method for transforming and refining a search query are described in this paper, where the searcher is driving the changes in the search queries using a pointing device.


Abstract: A search query transformation system and method for transforming and refining a search query are described. Embodiments of the system and method use various graphical components and controls. Direct manipulation ensures that the searcher is driving the changes in the search queries using a pointing device. Embodiments of the search query transformation system and method include a search query re-weighting user interface (UI) component for graphically adjusting and re-weighting weights of search terms, and a search query term replacement UI component for graphically replacing a search term in a query or adding a synonym to the query. Embodiments of the system and method also include a search query suggestion component, which provides query revision recommendations to a searcher that are tailored to the direct manipulation query refinement interface.


Proceedings Article•10.1145/1357054.1357242•

SearchBar: a search-centric web history for task resumption and information re-finding


Dan Morris¹, Meredith Ringel Morris¹, Gina Venolia¹•Institutions (1)

Microsoft¹

6 Apr 2008

TL;DR: SearchBar is introduced, a system for proactively and persistently storing query histories, browsing histories, and users' notes and ratings in an interrelated fashion, and it is shown that users find SearchBar valuable for task reacquisition.


Abstract: Current user interfaces for Web search, including browsers and search engine sites, typically treat search as a transient activity. However, people often conduct complex, multi-query investigations that may span long durations and may be interrupted by other tasks. In this paper, we first present the results of a survey of users' search habits, which show that many search tasks span long periods of time. We then introduce SearchBar, a system for proactively and persistently storing query histories, browsing histories, and users' notes and ratings in an interrelated fashion. SearchBar supports multi-session investigations by assisting with task context resumption and information re-finding. We describe a user study comparing use of SearchBar to status-quo tools such as browser histories, and discuss our findings, which show that users find SearchBar valuable for task reacquisition. Our study also reveals the strategies employed by users of status-quo tools for handling multi-query, multi-session search tasks.


Proceedings Article•10.1145/1367798.1367806•

Analysis of geographic queries in a search engine log


Qingqing Gan¹, Josh Attenberg¹, Alexander Markowetz², Torsten Suel¹•Institutions (2)

New York University¹, Hong Kong University of Science and Technology²

22 Apr 2008

TL;DR: This paper performs an analysis of 36 million queries of the recently released AOL query trace and proposes a new taxonomy for geographic search queries, i.e., text queries that employ geographical terms in an attempt to restrict results to a particular region or location.


Abstract: Geography is becoming increasingly important in web search. Search engines can often return better results to users by analyzing features such as user location or geographic terms in web pages and user queries. This is also of great commercial value as it enables location specific advertising and improved search for local businesses. As a result, major search companies have invested significant resources into geographic search technologies, also often called local search. This paper studies geographic search queries, i.e., text queries such as "hotel new york" that employ geographical terms in an attempt to restrict results to a particular region or location. Our main motivation is to identify opportunities for improving geographical search and related technologies, and we perform an analysis of 36 million queries of the recently released AOL query trace. First, we identify typical properties of geographic search (geo) queries based on a manual examination of several thousand queries. Based on these observations, we build a classifier that separates the trace into geo and non-geo queries. We then investigate the properties of geo queries in more detail, and relate them to web sites and users associated with such queries. We also propose a new taxonomy for geographic search queries.


Patent•

Friendly search and socially augmented search query assistance layer


Elizabeth F. Churchill¹, Joseph O'Sullivan¹, Anthony D. Thrall¹•Institutions (1)

Yahoo!¹

5 Jun 2008

TL;DR: In this article, the authors proposed a community search query technology that provides a collaborative search engine that utilizes community feedback and personal profiles, which includes personal task, information management, project creation, listing queries by activity categories, setting deadlines for ongoing search needs, setting up search queues and annotation of search sessions.


Abstract: Community search query technology operable to provide users with the means to collaborate on search queries and share their query results with other users in a community is disclosed. The community search query technology provides a collaborative search engine that utilizes community feedback and personal profiles. The community search query technology also comprises personal task, information management, project creation, listing queries by activity categories, setting deadlines for ongoing search needs, setting up search queues, and annotation of search sessions.


Patent•

Intent-aware search


Dragos A. Manolescu¹, Henricus Johannes Maria Meijer¹, Laura Jean Kern¹•Institutions (1)

Microsoft¹

7 Mar 2008

TL;DR: In this article, a system is provided to improve the relevance of information searches; it includes a search component to facilitate information retrieval in response to a user's query, and an inference component that refines the user's query or filters search results associated with the query in view of a determined intent of the user.


Abstract: A system is provided to improve the relevance of information searches. The system includes a search component to facilitate information retrieval in response to a user's query. An inference component refines the user's query or filters search results associated with the query in view of a determined intent of the user. This can also include a “sensor component” that collects the information fed to the inference component.


Proceedings Article•10.1145/1458082.1458147•

Mining term association patterns from search logs for effective query reformulation


Xuanhui Wang¹, ChengXiang Zhai¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

26 Oct 2008

TL;DR: This paper proposes to mine search engine logs for patterns at the level of terms through analyzing the relations of terms inside a query and defines two novel term association patterns (i.e., context-sensitive term substitutions and term additions) that can be used to address the mis-specification and under-specification problems of ineffective queries.


Abstract: Search engine logs are an emerging new type of data that offers interesting opportunities for data mining. Existing work on mining such data has mostly attempted to discover knowledge at the level of queries (e.g., query clusters). In this paper, we propose to mine search engine logs for patterns at the level of terms through analyzing the relations of terms inside a query. We define two novel term association patterns (i.e., context-sensitive term substitutions and term additions) and propose new methods for mining such patterns from search engine logs. These two patterns can be used to address the mis-specification and under-specification problems of ineffective queries. Experiment results on real search engine logs show that the mined context-sensitive term substitutions can be used to effectively reword queries and improve their accuracy, while the mined context-sensitive term addition patterns can be used to support query refinement in a more effective way.


Journal Article•10.1145/1670243.1670244•

Supporting views in data stream management systems


Thanaa M. Ghanem¹, Ahmed K. Elmagarmid², Per-Ake Larson³, Walid G. Aref²•Institutions (3)

University of St. Thomas (Minnesota)¹, Purdue University², Microsoft³

15 Feb 2008-ACM Transactions on Database Systems

TL;DR: The sliding window approach is generalized by introducing the synchronization principle, which empowers SyncSQL with a formal mechanism to express queries with arbitrary refresh conditions, and the Nile-SyncSQL prototype is introduced to support SyncSQL queries.


Abstract: In relational database management systems, views supplement basic query constructs to cope with the demand for “higher-level” views of data. Moreover, in traditional query optimization, answering a query using a set of existing materialized views can yield a more efficient query execution plan. Due to their effectiveness, views are attractive to data stream management systems. In order to support views over streams, a data stream management system should employ a closed (or composable) continuous query language. A closed query language is a language in which query inputs and outputs are interpreted in the same way, hence allowing query composition. This article introduces the Synchronized SQL (or SyncSQL) query language that defines a data stream as a sequence of modify operations against a relation. SyncSQL enables query composition through the unified interpretation of query inputs and outputs. An important issue in continuous queries over data streams is the frequency by which the answer gets refreshed and the conditions that trigger the refresh. Coarser periodic refresh requirements are typically expressed as sliding windows. In this article, the sliding window approach is generalized by introducing the synchronization principle that empowers SyncSQL with a formal mechanism to express queries with arbitrary refresh conditions. After introducing the semantics and syntax, we lay the algebraic foundation for SyncSQL and propose a query-matching algorithm for deciding containment of SyncSQL expressions. Then, the article introduces the Nile-SyncSQL prototype to support SyncSQL queries. Nile-SyncSQL employs a pipelined incremental evaluation paradigm in which the query pipeline consists of a set of differential operators. A cost model is developed to estimate the cost of SyncSQL query execution pipelines and to choose the best execution plan from a set of different plans for the same query. An experimental study is conducted to evaluate the performance of Nile-SyncSQL. The experimental results illustrate the effectiveness of Nile-SyncSQL and the significant performance gains when views are enabled in data stream management systems.


Patent•

Automated healthcare information composition and query enhancement


Steven Linthicum¹, Steven Fors¹, Anthony Ricamato¹, Eric Jester¹, Louis J. Hoebel¹, Gerald Bowden Wise¹, John Michael Lizzi¹•Institutions (1)

General Electric¹

26 Nov 2008

TL;DR: An information composition and query enhancement system is described. It includes a query generation and enhancement engine that queries one or more data sources based on user input and a data context, and an information composition engine that assembles the query results into a bundle of documents.

Abstract: Certain embodiments of the present invention provide systems and methods for information composition and query enhancement. Certain embodiments provide an information composition and query enhancement system. The system includes a query generation and enhancement engine generating and conducting a query of one or more data sources based on user input and a data context to produce query results. The system also includes an information composition engine assembling the query results to provide a bundle of documents meaningful to the particular user. The system further includes a document summarization engine clustering and summarizing the bundle of documents to provide a content summary in addition to the bundle of documents for output in a presentation to a user.

Patent•

Method for calculating score for search query

Sumio Fujita¹, Georges Dupret¹•Institutions (1)

Yahoo!¹

9 Apr 2008

TL;DR: A method and system automatically calculate, for an input search query, a score for evaluating a new query or URL that is a candidate for recommendation according to the user's search intention.

Abstract: A method and system for automatically calculating, regarding an input search query, a score for evaluating a new query or URL which is a candidate for recommendation information according to a user's search intention. To this end, a recommendation server 10 extracts recommended queries or URLs regarding a certain query, and configures a graph structure in which a plurality of queries are sequentially connected via URLs, based on historical data of URLs searched and browsed by the user in the past. The recommendation server 10 then calculates a score for indicating a degree of popularity of each query, by analyzing a relationship between input and output of edges, i.e. a linking relationship of URLs, in which each query is a node in this graph structure.
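
The abstract does not give the scoring formula, so the following Python sketch is only one plausible reading of it: queries become nodes, each history record links a query to the query that followed it via a browsed URL, and a PageRank-style iteration (a stand-in technique chosen here, not taken from the patent) turns the incoming and outgoing edges into a popularity score. The record layout `(query, url, next_query)`, the function names, and the damping value are all assumptions.

```python
def build_query_graph(history):
    """Build a directed query graph from (query, url, next_query) records.

    edges[q][q2] holds the set of URLs through which q led to q2.
    """
    edges = {}
    for query, url, next_query in history:
        edges.setdefault(query, {}).setdefault(next_query, set()).add(url)
        edges.setdefault(next_query, {})  # make sure every node has an entry
    return edges


def popularity_scores(edges, damping=0.85, iterations=50):
    """PageRank-style scores over the query graph (illustrative stand-in)."""
    nodes = sorted(edges)
    n = len(nodes)
    score = {q: 1.0 / n for q in nodes}
    for _ in range(iterations):
        new = {q: (1.0 - damping) / n for q in nodes}
        for q in nodes:
            targets = edges[q]
            if targets:
                share = damping * score[q] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A query with no outgoing edges spreads its score evenly.
                share = damping * score[q] / n
                for t in nodes:
                    new[t] += share
        score = new
    return score
```

With scores of this kind, candidate queries recommended for an input query could be ranked by their score; how the patent combines the score with the user's search intention is not stated in the abstract.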

Patent•

Automatic expanded language search

Johnny Chen¹•Institutions (1)

Google¹

18 Jul 2008

TL;DR: A computer-implemented method can include translating a search query from a first language to a second language, comparing the translated query with content in the second language, and identifying content in the second language relevant to the translated query based on the comparison.

Abstract: A computer-implemented method can include translating a search query from a first language to a second language, comparing the translated query with content in the second language, and identifying content in the second language relevant to the translated query based on the comparing. Also, a computer-implemented method can include translating content in a second language at one or more network locations into a first language, comparing the translated content with a search query written in the first language, and identifying, from the translated content, content relevant to the query based on the comparing.

Proceedings Article•10.1145/1367497.1367628•

Can Chinese web pages be classified with English data source?

Xiao Ling¹, Gui-Rong Xue¹, Wenyuan Dai¹, Yun Jiang¹, Qiang Yang², Yong Yu¹•Institutions (2)

Shanghai Jiao Tong University¹, Hong Kong University of Science and Technology²

21 Apr 2008

TL;DR: This paper proposes an information bottleneck based approach to cross-language classification: Chinese Web pages are translated to English, and then all the Web pages, both Chinese and English, are encoded through an information bottleneck that allows only limited information to pass.

Abstract: As the World Wide Web in China grows rapidly, mining knowledge in Chinese Web pages becomes more and more important. Mining Web information usually relies on the machine learning techniques which require a large amount of labeled data to train credible models. Although the number of Chinese Web pages increases quite fast, it still lacks Chinese labeled data. However, there are relatively sufficient English labeled Web pages. These labeled data, though in different linguistic representations, share a substantial amount of semantic information with Chinese ones, and can be utilized to help classify Chinese Web pages. In this paper, we propose an information bottleneck based approach to address this cross-language classification problem. Our algorithm first translates all the Chinese Web pages to English. Then, all the Web pages, including Chinese and English ones, are encoded through an information bottleneck which can allow only limited information to pass. Therefore, in order to retain as much useful information as possible, the common part between Chinese and English Web pages is inclined to be encoded to the same code (i.e. class label), which makes the cross-language classification accurate. We evaluated our approach using the Web pages collected from Open Directory Project (ODP). The experimental results show that our method significantly improves several existing supervised and semi-supervised classifiers.

Patent•

Using related users data to enhance web search

Meredith June Morris¹, Jaime Teevan¹, James Mickens¹, Saleema Amershi¹•Institutions (1)

Microsoft¹

31 Dec 2008

TL;DR: The claimed subject matter provides a system and/or a method that facilitates generating a personalized query result for a specific user, where an interface can receive at least one of a portion of a text query to be searched or a portion of personalized content related to the user who submits the text query.

Abstract: The claimed subject matter provides a system and/or a method that facilitates generating a personalized query result for a specific user. An interface can receive at least one of a portion of a text query to be searched or a portion of personalized content related to a user that submits the portion of the text query. A personalization component can combine the portion of personalized content related to the user with a portion of personalized content related to one or more disparate users to create group personalized content, wherein the group personalized content is compared with the portion of the text query to identify a relationship therebetween to generate a personalized query result in accordance with the relationship.

Journal Article•10.1080/13658810701626186•

Geographic intention and modification in web search

Rosie Jones¹, Wei V. Zhang¹, Benjamin Rey¹, Pradhuman Jhala¹, Eugene Stipp¹•Institutions (1)

Yahoo!¹

01 Mar 2008-International Journal of Geographical Information Science

TL;DR: In this paper, the authors examine aggregated data of queries with locations, and locations identified from IP addresses, to identify overall distance preferences, as well as distance preferences by search topic, and find that automatically-modified queries are perceived as much more relevant when the geographic component is unchanged.

Abstract: Web searchers signal their geographic intent by using place-names in search queries. They also indicate their flexibility about geographic specificity by reformulating their queries. By examining this data we can learn to understand web searcher flexibility with respect to geographic intent. We examine aggregated data of queries with locations, and locations identified from IP addresses, to identify overall distance preferences, as well as distance preferences by search topic. We also examine query rewriting: both deliberate query rewriting, conducted in web search sessions, and automated query rewriting, with manual relevance judgments of geo-modified queries. We find geo-specification in 12.7% of user query rewrites in search sessions, and show the breakdown into sub-classes such as same-city, same-state, same-country and different-country. We also measure the dependence between US-state-name and distance-of-modified-location-from-original-location, finding that Vermont web searchers modify their locations greater distances than California web searchers. We find that automatically-modified queries are perceived as much more relevant when the geographic component is unchanged. We look at the relationship between the non-location part of a query and the distance from the user. We see that people search for child day-care near their locations and maps far from where they are located. We also give distance profiles for the top topics which cooccur with place-names in queries, which could be used to set document priors based on document proximity and query topic.

Proceedings Article•10.1145/1458082.1458177•

Learning latent semantic relations from clickthrough data for query suggestion

Hao Ma¹, Haixuan Yang¹, Irwin King¹, Michael R. Lyu¹•Institutions (1)

The Chinese University of Hong Kong¹

26 Oct 2008

TL;DR: This paper develops a novel, effective and efficient two-level query suggestion model by mining clickthrough data, in the form of two bipartite graphs (user-query and query-URL bipartite graphs) extracted from the clickthrough data.

Abstract: For a given query raised by a specific user, the Query Suggestion technique aims to recommend relevant queries which potentially suit the information needs of that user. Due to the complexity of the Web structure and the ambiguity of users' inputs, most of the suggestion algorithms suffer from the problem of poor recommendation accuracy. In this paper, aiming at providing semantically relevant queries for users, we develop a novel, effective and efficient two-level query suggestion model by mining clickthrough data, in the form of two bipartite graphs (user-query and query-URL bipartite graphs) extracted from the clickthrough data. Based on this, we first propose a joint matrix factorization method which utilizes two bipartite graphs to learn the low-rank query latent feature space, and then build a query similarity graph based on the features. After that, we design an online ranking algorithm to propagate similarities on the query similarity graph, and finally recommend latent semantically relevant queries to users. Experimental analysis on the clickthrough data of a commercial search engine shows the effectiveness and the efficiency of our method.
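
To make the role of the query-URL bipartite graph concrete, the Python sketch below uses a much simpler stand-in than the paper's method: it skips the user-query graph, the joint matrix factorization, and the similarity propagation, and instead compares queries directly by the cosine similarity of their click-count vectors over URLs. The input layout `(query, url)` and the function names are assumptions made for this example.

```python
import math


def click_vectors(clicks):
    """Turn (query, url) click records into query -> {url: click count}."""
    vectors = {}
    for query, url in clicks:
        counts = vectors.setdefault(query, {})
        counts[url] = counts.get(url, 0) + 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse click-count vectors."""
    dot = sum(weight * b[url] for url, weight in a.items() if url in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest(vectors, query, k=3):
    """Return up to k other queries that share clicked URLs with `query`."""
    target = vectors.get(query, {})
    scored = []
    for other, vec in vectors.items():
        if other == query:
            continue
        similarity = cosine(target, vec)
        if similarity > 0.0:
            scored.append((similarity, other))
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]
```

Direct co-click similarity only relates queries that share at least one URL; learning a low-rank latent feature space, as the abstract describes, is what lets the paper's model relate queries without any common click.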

Proceedings Article•

Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs

Marius Pasca¹, Benjamin Van Durme²•Institutions (2)

Google¹, University of Rochester²

1 Jun 2008

TL;DR: A new approach to large-scale information extraction exploits both Web documents and query logs to acquire thousands of open-domain classes of instances, along with relevant sets of open-domain class attributes at precision levels previously obtained only on small-scale, manually-assembled classes.

Abstract: A new approach to large-scale information extraction exploits both Web documents and query logs to acquire thousands of open-domain classes of instances, along with relevant sets of open-domain class attributes at precision levels previously obtained only on small-scale, manually-assembled classes.

Proceedings Article•10.1145/1367497.1367533•

Deciphering mobile search patterns: a study of Yahoo! mobile search queries

Jeonghee Yi¹, Farzin Maghoul¹, Jan Pedersen¹•Institutions (1)

Yahoo!¹

21 Apr 2008

TL;DR: This paper reports the characteristics of search queries submitted from mobile devices using various Yahoo! one-Search applications during a 2-month period in the second half of 2007, along with the query patterns derived from 20 million English sample queries.

Abstract: In this paper we study the characteristics of search queries submitted from mobile devices using various Yahoo! one-Search applications during a 2-month period in the second half of 2007, and report the query patterns derived from 20 million English sample queries submitted by users in the US, Canada, Europe, and Asia. We examine the query distribution and the topical categories the queries belong to in order to find new trends. We compare and contrast the search patterns between US and international queries, and between queries from various search interfaces (XHTML/WAP, Java widgets, and SMS). We also compare our results with previous studies wherever possible, either to confirm previous findings, or to find interesting differences in the query distribution and pattern.

Journal Article•10.1145/1352582.1352590•

Conjunctive query containment and answering under description logic constraints

Diego Calvanese¹, Giuseppe De Giacomo², Maurizio Lenzerini²•Institutions (2)

Free University of Bozen-Bolzano¹, Sapienza University of Rome²

12 Jun 2008-ACM Transactions on Computational Logic

TL;DR: In this paper, the problem of query containment and query answering under description logic constraints is studied, and it is shown that the problem is undecidable in the case where inequalities in the right-hand-side query are allowed.

Abstract: Query containment and query answering are two important computational tasks in databases. While query answering amounts to computing the result of a query over a database, query containment is the problem of checking whether, for every database, the result of one query is a subset of the result of another query. In this article, we deal with unions of conjunctive queries, and we address query containment and query answering under description logic constraints. Every such constraint is essentially an inclusion dependency between concepts and relations, and their expressive power is due to the possibility of using complex expressions in the specification of the dependencies, for example, intersection and difference of relations, special forms of quantification, regular expressions over binary relations. These types of constraints capture a great variety of data models, including the relational, the entity-relationship, and the object-oriented model, all extended with various forms of constraints. They also capture the basic features of the ontology languages used in the context of the Semantic Web. We present the following results on both query containment and query answering. We provide a method for query containment under description logic constraints, thus showing that the problem is decidable, and analyze its computational complexity. We prove that query containment is undecidable in the case where we allow inequalities in the right-hand-side query, even for very simple constraints and queries. We show that query answering under description logic constraints can be reduced to query containment, and illustrate how such a reduction provides upper-bound results with respect to both combined and data complexity.
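
For readers unfamiliar with the containment problem itself, the Python sketch below shows only the classical base case for two plain conjunctive queries with no constraints at all: the first query is contained in the second exactly when there is a homomorphism from the second query to the first that maps head to head. It does not handle unions, inequalities, or description logic constraints, which are the subject of the article. The query encoding (a head tuple plus a list of `(predicate, arguments)` atoms, with variables written as strings starting with `?`) and the function names are assumptions made for this example.

```python
def is_var(term):
    """Variables are strings that start with '?'; everything else is a constant."""
    return isinstance(term, str) and term.startswith("?")


def unify(term, target, mapping):
    """Extend `mapping` so that `term` maps to `target`; return None on conflict."""
    if is_var(term):
        if term in mapping:
            return mapping if mapping[term] == target else None
        extended = dict(mapping)
        extended[term] = target
        return extended
    return mapping if term == target else None


def unify_all(terms, targets, mapping):
    """Unify two argument tuples position by position."""
    if len(terms) != len(targets):
        return None
    for term, target in zip(terms, targets):
        mapping = unify(term, target, mapping)
        if mapping is None:
            return None
    return mapping


def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (plain conjunctive queries only).

    Searches for a homomorphism from q2 to q1. The terms of q1 are treated
    as fixed values, as in q1's canonical database.
    """
    head1, body1 = q1
    head2, body2 = q2
    start = unify_all(head2, head1, {})
    if start is None:
        return False

    def search(index, mapping):
        if index == len(body2):
            return True
        predicate, args = body2[index]
        for candidate_predicate, candidate_args in body1:
            if candidate_predicate != predicate:
                continue
            extended = unify_all(args, candidate_args, mapping)
            if extended is not None and search(index + 1, extended):
                return True
        return False

    return search(0, start)
```

For example, a query asking for nodes on a two-step cycle is contained in a query asking for nodes with any outgoing edge, but not the other way round. The backtracking search is exponential in the worst case, which is expected since containment of conjunctive queries is NP-complete.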

Patent•

Generating continuous query notifications

Srinivas S. Vemuri¹, Bipul Sinha¹, Amit Ganesh¹, Subramanyam B. Chitti¹•Institutions (1)

Business International Corporation¹

8 Aug 2008

TL;DR: In this article, a query is registered as a persistent stored entity within the database, and notifications are generated as and when the query result changes continuously as long as the query continues to be registered with the database.

Abstract: Techniques are described to allow a query to be registered as a persistent stored entity within the database, and to generate notifications as and when the query result changes continuously as long as the query continues to be registered with the database. According to one aspect, for a table referenced in a query, a filter condition is generated based, at least in part, on a predicate of the query. Then, the database server determines whether the filter condition is satisfied by either a before image of a row, or an after image of the row, that was modified by a transaction. If the filter condition is satisfied by either the before image or the after image, then the query is added to a first set of queries whose result sets may have been affected by the transaction. From among the first set of queries, a second set of queries that have result sets that were actually affected by the transaction is determined. Notifications are then sent based on the second set of queries.
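
The two-stage selection described above can be sketched in Python as follows. The sketch assumes each registered query is a simple selection with a projection over one table, so that "actually affected" can be decided by comparing what the row contributes to the result before and after the change; real queries with joins or aggregates would need a more involved second stage. The `RegisteredQuery` class, its fields, and `queries_to_notify` are hypothetical names introduced for this example, not part of any database API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class RegisteredQuery:
    name: str
    table: str
    predicate: Callable[[Row], bool]   # filter condition derived from the query
    columns: Tuple[str, ...]           # columns the query returns

    def matches(self, image: Optional[Row]) -> bool:
        """True if a row image exists and satisfies the filter condition."""
        return image is not None and self.predicate(image)

    def visible(self, image: Optional[Row]):
        """What the row contributes to the result, or None if filtered out."""
        if not self.matches(image):
            return None
        return tuple(image[column] for column in self.columns)


def queries_to_notify(queries: List[RegisteredQuery], table: str,
                      before: Optional[Row], after: Optional[Row]):
    """Return (candidate names, affected names) for one modified row.

    `before` is None for an inserted row and `after` is None for a deleted one.
    """
    # Stage 1: cheap filter on the before image and the after image.
    candidates = [q for q in queries
                  if q.table == table and (q.matches(before) or q.matches(after))]
    # Stage 2: keep only queries whose result really changed.
    affected = [q for q in candidates if q.visible(before) != q.visible(after)]
    return [q.name for q in candidates], [q.name for q in affected]
```

The first stage may produce false positives: a salary update on a row that a query selects by department passes the filter, yet the query's result is unchanged if it does not return the salary. The second stage removes such queries before any notification is sent.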
