Scispace (Formerly Typeset)
Showing papers on "Web query classification" published in 2015
Proceedings Article•10.1145/2806416.2806493•
A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion

Alessandro Sordoni1, Yoshua Bengio1, Hossein Vahabi2, Christina Lioma3, Jakob Grue Simonsen3, Jian-Yun Nie1 •
Université de Montréal1, Yahoo!2, University of Copenhagen3
17 Oct 2015
TL;DR: This work presents a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary lengths and is sensitive to the order of queries in the context while avoiding data sparsity.
Abstract: Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary lengths. As a result, our suggestions are sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can produce suggestions for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets. Results show that our model outperforms existing context-aware approaches in a next query prediction setting. In addition to query suggestion, our architecture is general enough to be used in a variety of other applications.

552 citations

Patent•
Computer readable electronic records automated classification system

Thomas A. Summerlin, Timothy Shinkle1, Russell E. Stalters2•
EMC Corporation1, Open Text Corporation2
5 Aug 2015
TL;DR: A classification, based on a first classification instance in a plurality of classification instances, is assigned without human intervention to the electronic document if the confidence data associated with that first classification instance exceeds a first threshold.
Abstract: Classifying an electronic document in a computer-based system is disclosed. For each classification instance in a plurality of classification instances, confidence data indicating a degree of confidence that the electronic document is associated with that classification instance is determined. A classification, based on a first classification instance in the plurality of classification instances, is assigned without human intervention to the electronic document if the confidence data associated with the first classification instance exceeds a first threshold.

201 citations
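The threshold rule in this patent abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented system: it assumes confidences arrive as a dict from hypothetical class names to scores, and it takes the "first classification instance" to be the highest-scoring one, which the abstract does not specify. Returning `None` stands for leaving the document to a human reviewer.

```python
from typing import Dict, Optional

def auto_classify(confidences: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the highest-confidence class if its confidence exceeds the threshold,
    otherwise None (meaning: defer the document to a human reviewer)."""
    if not confidences:
        return None
    # Assumption: the candidate instance is the one with the greatest confidence.
    best_class, best_conf = max(confidences.items(), key=lambda kv: kv[1])
    # Assign without human intervention only when the threshold is strictly exceeded.
    return best_class if best_conf > threshold else None

scores = {"invoice": 0.91, "contract": 0.06, "memo": 0.03}  # hypothetical scores
print(auto_classify(scores, threshold=0.85))  # invoice
print(auto_classify(scores, threshold=0.95))  # None
```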

Proceedings Article•10.1109/SANER.2015.7081874•
Query expansion via WordNet for effective code search

Meili Lu1, Xiaobing Sun1, Shaowei Wang2, David Lo2, Yucong Duan3 •
Yangzhou University1, Singapore Management University2, Hainan University3
2 Mar 2015
TL;DR: This work proposes an approach that extends a query with synonyms generated from WordNet that improves the precision and recall of Conquer, a state-of-the-art query expansion/reformulation technique, by 5% and 8% respectively.
Abstract: Source code search plays an important role in software maintenance. The effectiveness of source code search not only relies on the search technique, but also on the quality of the query. In practice, software systems are large, so it is difficult for a developer to formulate an accurate query that expresses what is really on her/his mind, especially when the maintainer and the original developer are not the same person. When a query performs poorly, it has to be reformulated. But the words used in a query may differ from words with similar semantics in the source code (i.e., their synonyms), which affects the accuracy of code search results. To address this issue, we propose an approach that extends a query with synonyms generated from WordNet. Our approach extracts natural language phrases from source code identifiers, matches expanded queries with these phrases, and sorts the search results. It allows developers to explore word usage in a piece of software and helps them quickly identify relevant program elements for investigation or quickly recognize alternative words for query reformulation. Our initial empirical study on search tasks performed on the JavaScript/ECMAScript interpreter and compiler, Rhino, shows that the synonyms used to expand the queries help recommend good alternative queries. Our approach also improves the precision and recall of Conquer, a state-of-the-art query expansion/reformulation technique, by 5% and 8% respectively.

199 citations
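A toy sketch of the general idea (split identifiers into words, expand the query with synonyms, rank identifiers by overlap) follows. It is not the authors' implementation: a small hard-coded `SYNONYMS` table stands in for WordNet, and ranking by the count of shared terms is an assumption made for illustration.

```python
import re
from typing import Dict, List, Set

# Hypothetical stand-in for WordNet: a tiny hand-written synonym table.
SYNONYMS: Dict[str, Set[str]] = {
    "remove": {"delete", "erase"},
    "delete": {"remove", "erase"},
    "get": {"fetch", "retrieve"},
}

def split_identifier(identifier: str) -> List[str]:
    """Split camelCase / snake_case identifiers into lowercase words."""
    words: List[str] = []
    for part in identifier.split("_"):
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [w.lower() for w in words]

def expand_query(query: str) -> Set[str]:
    """Return the query words plus their synonyms."""
    terms = set(query.lower().split())
    for word in list(terms):
        terms |= SYNONYMS.get(word, set())
    return terms

def rank_identifiers(query: str, identifiers: List[str]) -> List[str]:
    """Rank identifiers by how many expanded query terms their words contain."""
    expanded = expand_query(query)
    scored = []
    for ident in identifiers:
        overlap = len(expanded & set(split_identifier(ident)))
        if overlap:
            scored.append((overlap, ident))
    # Higher overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ident for _, ident in scored]

print(rank_identifiers("remove node", ["deleteNode", "addNode", "erase_cache", "parseFile"]))
# ['deleteNode', 'addNode', 'erase_cache']
```

With a real lexical database the synonym table would be replaced by a lookup; the ranking step is where a system such as the one described would use richer phrase matching.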

Book Chapter•10.1007/978-3-319-21768-0_9•
Ontology-Mediated Query Answering with Data-Tractable Description Logics

Meghyn Bienvenu1, Magdalena Ortiz2•
University of Paris-Sud1, Vienna University of Technology2
31 Jul 2015
TL;DR: A brief introduction to ontology-mediated query answering using description logic (DL) ontologies, with a focus on DLs for which query answering scales polynomially in the size of the data, as these are best suited for applications requiring large amounts of data.
Abstract: Recent years have seen an increasing interest in ontology-mediated query answering, in which the semantic knowledge provided by an ontology is exploited when querying data. Adding an ontology has several advantages (e.g. simplifying query formulation, integrating data from different sources, providing more complete answers to queries), but it also makes the query answering task more difficult. In this chapter, we give a brief introduction to ontology-mediated query answering using description logic (DL) ontologies. Our focus will be on DLs for which query answering scales polynomially in the size of the data, as these are best suited for applications requiring large amounts of data. We will describe the challenges that arise when evaluating different natural types of queries in the presence of such ontologies, and we will present algorithmic solutions based upon two key concepts, namely, query rewriting and saturation. We conclude the chapter with an overview of recent results and active areas of ongoing research.

181 citations
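Saturation, one of the two key concepts the chapter names, can be illustrated on a deliberately tiny fragment: atomic concept inclusions `A ⊑ B` and role-domain axioms `∃r ⊑ A`. The sketch below is a toy under those assumptions rather than the chapter's algorithms; richer DLs (for example, axioms with existentials on the right-hand side) need more than this simple forward chaining.

```python
from typing import List, Set, Tuple

ConceptFact = Tuple[str, str]    # (concept, individual), e.g. ("Professor", "ann")
RoleFact = Tuple[str, str, str]  # (role, subject, object), e.g. ("teaches", "bob", "db101")

def saturate(concept_facts: Set[ConceptFact],
             role_facts: Set[RoleFact],
             inclusions: Set[Tuple[str, str]],
             domains: Set[Tuple[str, str]]) -> Set[ConceptFact]:
    """Apply 'A ⊑ B' inclusions and '∃r ⊑ A' domain axioms to the data until a fixpoint."""
    facts = set(concept_facts)
    # Roles are never derived in this fragment, so one pass over role facts suffices.
    for role, concept in domains:
        for r, subject, _ in role_facts:
            if r == role:
                facts.add((concept, subject))
    changed = True
    while changed:
        changed = False
        for sub, sup in inclusions:
            for concept, ind in list(facts):
                if concept == sub and (sup, ind) not in facts:
                    facts.add((sup, ind))
                    changed = True
    return facts

def certain_answers(facts: Set[ConceptFact], concept: str) -> List[str]:
    """Answers to the atomic query concept(x), read directly off the saturated data."""
    return sorted(ind for c, ind in facts if c == concept)

facts = saturate({("Professor", "ann"), ("Staff", "carl")},
                 {("teaches", "bob", "db101")},
                 {("Professor", "Faculty"), ("Faculty", "Staff")},
                 {("teaches", "Teacher")})
print(certain_answers(facts, "Staff"))  # ['ann', 'carl']
```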

Patent•
Systems and methods for highlighting search results

Amit J. Patel1, David L. desJardins1•
Google1
25 Sep 2015
TL;DR: A system highlights search terms in documents distributed over a network by generating a search query that includes a search term and receiving a list of one or more references to documents in the network.
Abstract: A system highlights search terms in documents distributed over a network. The system generates a search query that includes a search term and, in response to the search query, receives a list of one or more references to documents in the network. The system receives selection of one of the references and retrieves a document that corresponds to the selected reference. The system then highlights the search term in the retrieved document.

140 citations
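The final step in this patent abstract, highlighting the search term in the retrieved document, might look like this for an HTML rendering. The `<mark>` tag and the whole-word, case-insensitive matching are assumptions of this sketch, not details taken from the patent.

```python
import html
import re

def highlight(document: str, term: str, tag: str = "mark") -> str:
    """HTML-escape the document, then wrap each case-insensitive, whole-word
    occurrence of the search term in <tag>...</tag>."""
    escaped = html.escape(document)
    pattern = re.compile(r"\b" + re.escape(html.escape(term)) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", escaped)

print(highlight("Query logs help query suggestion.", "query"))
# <mark>Query</mark> logs help <mark>query</mark> suggestion.
```

Escaping first keeps markup already present in the document from being interpreted, and the original casing of each match is preserved.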

Journal Article•10.1109/TKDE.2015.2426696•
Querying Knowledge Graphs by Example Entity Tuples

Nandish Jayaram1, Arijit Khan2, Chengkai Li1, Xifeng Yan3, Ramez Elmasri1 •
University of Texas at Arlington1, ETH Zurich2, University of California, Santa Barbara3
01 Oct 2015-IEEE Transactions on Knowledge and Data Engineering
TL;DR: The system, Graph Query By Example, automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent, and efficiently finds and ranks the top approximate matching answer graphs and answer tuples.
Abstract: We witness an unprecedented proliferation of knowledge graphs that record millions of entities and their relationships. While knowledge graphs are structure-flexible and content-rich, they are difficult to use. The challenge lies in the gap between their overwhelming complexity and the limited database knowledge of non-professional users. If writing structured queries over “simple” tables is difficult, complex graphs are only harder to query. As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. Our system, Graph Query By Example (GQBE), automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent. It then efficiently finds and ranks the top approximate matching answer graphs and answer tuples. We conducted experiments and user studies on the large Freebase and DBpedia datasets and observed appealing accuracy and efficiency. Our system provides a complementary approach to the existing keyword-based methods, facilitating user-friendly graph querying. To the best of our knowledge, there was no such proposal in the past in the context of graphs.

115 citations

Proceedings Article•10.1145/2740908.2742562•
Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study

Lihong Li1, Shunbao Chen1, Jim Kleban2, Ankur Gupta1•
Microsoft1, Facebook2
18 May 2015
TL;DR: This paper proposes to address the problem of estimating online metrics that depend on user feedback using causal inference techniques, under the contextual-bandit framework, and obtains very promising results that suggest the wide applicability of these techniques.
Abstract: Optimizing an interactive system against a predefined online metric is particularly challenging, especially when the metric is computed from user feedback such as clicks and payments. The key challenge is the counterfactual nature: in the case of Web search, any change to a component of the search engine may result in a different search result page for the same query, but we normally cannot infer reliably from search logs how users would react to the new result page. Consequently, it appears impossible to accurately estimate online metrics that depend on user feedback, unless the new engine is actually run to serve live users and compared with a baseline in a controlled experiment. This approach, while valid and successful, is unfortunately expensive and time-consuming. In this paper, we propose to address this problem using causal inference techniques, under the contextual-bandit framework. This approach effectively allows one to run potentially many online experiments offline from search logs, making it possible to estimate and optimize online metrics quickly and inexpensively. Focusing on an important component in a commercial search engine, we show how these ideas can be instantiated and applied, and obtain very promising results that suggest the wide applicability of these techniques.

112 citations
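One standard causal-inference tool in the contextual-bandit setting is the inverse propensity scoring (IPS) estimator, sketched below for a deterministic candidate policy. The paper's exact estimators may differ; this sketch assumes every logged action comes with a recorded, nonzero probability (propensity) under the logging policy, and the contexts and actions are hypothetical.

```python
from typing import Callable, List, Tuple

# One logged impression: (context, action shown, probability the logging policy
# chose that action, observed reward such as 1.0 for a click and 0.0 otherwise).
LogEntry = Tuple[str, str, float, float]

def ips_estimate(log: List[LogEntry], new_policy: Callable[[str], str]) -> float:
    """Inverse propensity scoring: estimate the average reward a deterministic
    new policy would have earned, using only logged data."""
    if not log:
        return 0.0
    total = 0.0
    for context, action, propensity, reward in log:
        if new_policy(context) == action:
            # Reweight by 1/p to undo the logging policy's preference for this action.
            total += reward / propensity
    return total / len(log)

# The logging policy picked action "A" or "B" uniformly at random (p = 0.5).
log = [("q1", "A", 0.5, 1.0), ("q1", "B", 0.5, 0.0),
       ("q2", "A", 0.5, 0.0), ("q2", "B", 0.5, 1.0)]
print(ips_estimate(log, lambda q: "A" if q == "q1" else "B"))  # 1.0
print(ips_estimate(log, lambda q: "A"))                          # 0.5
```

IPS is unbiased under those assumptions but can have high variance when propensities are small.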

Proceedings Article•10.1145/2806416.2806599•
Query Auto-Completion for Rare Prefixes

Bhaskar Mitra1, Nick Craswell1•
Microsoft1
17 Oct 2015
TL;DR: A candidate generation approach using frequently observed query suffixes mined from historical search logs is described, together with a supervised model that ranks these synthetic query suggestions alongside the traditional full-query candidates.
Abstract: Query auto-completion (QAC) systems typically suggest queries that have previously been observed in search logs. Given a partial user query, the system looks up this query prefix against a precomputed set of candidates, then orders them using ranking signals such as popularity. Such systems can only recommend queries for prefixes that have been previously seen by the search engine with adequate frequency. They fail to recommend if the prefix is sufficiently rare such that it has no matches in the precomputed candidate set. We propose a design of a QAC system that can suggest completions for rare query prefixes. In particular, we describe a candidate generation approach using frequently observed query suffixes mined from historical search logs. We then describe a supervised model for ranking these synthetic suggestions alongside the traditional full-query candidates. We further explore ranking signals that are appropriate for both types of candidates based on n-gram statistics and a convolutional latent semantic model (CLSM). Within our supervised framework the new features demonstrate significant improvements in performance over the popularity-based baseline. The synthetic query suggestions complement the existing popularity-based approach, helping users formulate rare queries.

99 citations
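Suffix-based candidate generation for an unseen prefix can be sketched as follows. This is a simplification, not the paper's system: suffixes are ranked here by raw frequency, whereas the paper ranks candidates with a supervised model using n-gram and CLSM features; the query log, `max_words` and `k` are illustrative.

```python
from collections import Counter
from typing import List

def mine_suffixes(query_log: List[str], max_words: int = 3) -> Counter:
    """Count every word-level suffix (up to max_words words long) of each logged query."""
    counts: Counter = Counter()
    for query in query_log:
        words = query.split()
        for i in range(max(0, len(words) - max_words), len(words)):
            counts[" ".join(words[i:])] += 1
    return counts

def synthetic_candidates(prefix: str, suffixes: Counter, k: int = 3) -> List[str]:
    """Keep all but the last (possibly partial) word of the prefix and append the
    k most frequent mined suffixes that start with that last word."""
    words = prefix.split()
    if not words:
        return []
    head, end_term = words[:-1], words[-1]
    matches = [(s, c) for s, c in suffixes.items() if s.startswith(end_term)]
    matches.sort(key=lambda sc: (-sc[1], sc[0]))  # most frequent first, then alphabetical
    return [" ".join(head + [s]) for s, _ in matches[:k]]

log = ["cheap flights to paris", "flights to paris", "hotels in paris",
       "paris weather", "cheap flights to rome"]
suffixes = mine_suffixes(log)
print(synthetic_candidates("budget hotel near par", suffixes))
# ['budget hotel near paris', 'budget hotel near paris weather']
```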

Proceedings Article•10.1109/ICDE.2015.7113349•
Approximate keyword search in semantic trajectory database

Bolong Zheng1, Nicholas Jing Yuan2, Kai Zheng1, Xing Xie2, Shazia Sadiq1, Xiaofang Zhou1 •
University of Queensland1, Microsoft2
13 Apr 2015
TL;DR: An efficient search algorithm and fast evaluation of the minimum value of spatio-textual utility function are proposed and the results of empirical studies based on real check-in datasets demonstrate that the proposed index and algorithms can achieve good scalability.
Abstract: Driven by the advances in location positioning techniques and the popularity of location sharing services, semantically enriched trajectory data have become unprecedentedly available. While finding relevant Points of Interest (POIs) based on users' locations and query keywords has been extensively studied in the past years, keyword queries in the context of semantic trajectory databases remain largely unexplored. In this paper, we study the problem of approximate keyword search in massive semantic trajectories. Given a set of query keywords, an approximate keyword query of semantic trajectory (AKQST) returns k trajectories that contain the most relevant keywords to the query and yield the least travel effort in the meantime. The main difference between AKQST and conventional spatial keyword queries is that there is no query location in AKQST, which means the search area cannot be localized. To capture the travel effort in the context of query keywords, a novel utility function, called the spatio-textual utility function, is first defined. Then we develop a hybrid index structure called GiKi to organize the trajectories hierarchically, which enables pruning the search space by spatial and textual similarity simultaneously. Finally, an efficient search algorithm and fast evaluation of the minimum value of the spatio-textual utility function are proposed. The results of our empirical studies based on real check-in datasets demonstrate that our proposed index and algorithms can achieve good scalability.

94 citations

Journal Article•10.1002/ASI.23308•
Developing a bottom-up, user-based method of web register classification

Jesse Egbert1, Douglas Biber1, Mark Davies2•
Northern Arizona University1, Brigham Young University2
1 Sep 2015
TL;DR: This paper introduces a project to develop a reliable, cost‐effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents.
Abstract: This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.

76 citations

Journal Article•10.1016/J.NEUCOM.2014.06.042•
Active learning via query synthesis and nearest neighbour search

Liantao Wang1, Xuelei Hu1,2, Bo Yuan3, Jianfeng Lu1 •
Nanjing University of Science and Technology1, University of Queensland2, Tsinghua University3
05 Jan 2015-Neurocomputing
TL;DR: New strategies for a novel querying framework that combines query synthesis and pool-based sampling are proposed, which overcomes the limitation of query synthesis, and has the advantage of fast querying.
Patent•
Systems and methods for query evaluation over distributed linked data stores

Achille B. Fokoue-Nkoutche1, Anastasios Kementsietsidis1, Spyros Kotoulas1, M. Mustafa Rafique1•
IBM1
10 Jul 2015
TL;DR: A method for query evaluation comprises receiving a query over a set of distributed data sources, decomposing the query into sub-queries, and evaluating each sub-query with respect to each data source in the set.
Abstract: A method for query evaluation comprises receiving a query over a set of distributed data sources, decomposing the query into a set of sub-queries of the query, evaluating each sub-query in the set of sub-queries with respect to each data source in the set of distributed data sources, wherein evaluating comprises determining which data sources in the set of distributed data sources are capable of answering each sub-query and at what cost, computing a set of distributed plans by composing one or more of the sub-queries in one or more of the data sources, evaluating each plan in the set of distributed plans, selecting a sub-set of plans from the set of distributed plans to be executed for responding to the query, executing the selected sub-set of plans, and returning results of the query.
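The plan-selection step can be sketched as an exhaustive search over sub-query-to-source assignments. The simplifications are assumptions of this sketch, not of the patent: costs are additive per sub-query, shipping and join costs between sources are ignored, and only the single cheapest plan is returned rather than a subset of plans. The sub-queries and store names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# cost[sub_query][source] = estimated cost of answering that sub-query at that source;
# a missing entry means the source cannot answer the sub-query.
CostTable = Dict[str, Dict[str, float]]
Plan = Tuple[Dict[str, str], float]

def cheapest_plan(sub_queries: List[str], cost: CostTable) -> Optional[Plan]:
    """Enumerate every assignment of sub-queries to capable sources and
    return the cheapest one together with its total cost."""
    capable = [list(cost.get(sq, {}).keys()) for sq in sub_queries]
    if any(not sources for sources in capable):
        return None  # some sub-query cannot be answered by any source
    best: Optional[Plan] = None
    for assignment in product(*capable):
        total = sum(cost[sq][src] for sq, src in zip(sub_queries, assignment))
        if best is None or total < best[1]:
            best = (dict(zip(sub_queries, assignment)), total)
    return best

plan = cheapest_plan(
    ["?p worksFor ?org", "?org locatedIn ?city"],
    {"?p worksFor ?org": {"storeA": 5.0, "storeB": 2.0},
     "?org locatedIn ?city": {"storeA": 1.0}},
)
print(plan)  # ({'?p worksFor ?org': 'storeB', '?org locatedIn ?city': 'storeA'}, 3.0)
```

Exhaustive enumeration grows exponentially with the number of sub-queries, so it only suits small examples.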
Journal Article•10.14778/2850583.2850587•
QuERy: a framework for integrating entity resolution with query processing

Hotham Altwaijry1, Sharad Mehrotra1, Dmitri V. Kalashnikov2•
University of California, Irvine1, AT&T Labs2
1 Nov 2015
TL;DR: QuERy, a novel framework for integrating entity resolution (ER) with query processing, is proposed to correctly and efficiently answer complex queries issued on top of dirty data.
Abstract: This paper explores an analysis-aware data cleaning architecture for a large class of SPJ SQL queries. In particular, we propose QuERy, a novel framework for integrating entity resolution (ER) with query processing. The aim of QuERy is to correctly and efficiently answer complex queries issued on top of dirty data. The comprehensive empirical evaluation of the proposed solution demonstrates its significant advantage in terms of efficiency over the traditional techniques for the given problem settings.
Journal Article•10.1016/J.WEBSEM.2014.11.007•
Temporalizing rewritable query languages over knowledge bases

Stefan Borgwardt1, Marcel Lippmann1, Veronika Thost1•
Dresden University of Technology1
01 Aug 2015-Journal of Web Semantics
TL;DR: This paper proposes a generic temporal query language that combines linear temporal logic with queries over ontologies, and shows that if the atemporal queries are rewritable, then the corresponding temporal queries are also rewritable, so that they can be answered over a temporal database.
Journal Article•10.14778/2735703.2735708•
Querying with access patterns and integrity constraints

Michael Benedikt1, Julien Leblay1, Efthymia Tsamoura1•
University of Oxford1
1 Feb 2015
TL;DR: This paper presents a system in which classical cost-based join optimization is extended to support both access-restrictions and constraints, and explores a space of proofs that witness the answering of the query, where each proof has a direct correspondence with a query plan.
Abstract: Traditional query processing involves a search for plans formed by applying algebraic operators on top of primitives representing access to relations in the input query. But many querying scenarios involve two interacting issues that complicate the search. On the one hand, the search space may be limited by access restrictions associated with the interfaces to datasources, which require certain parameters to be given as inputs. On the other hand, the search space may be extended through the presence of integrity constraints that relate sources to each other, allowing for plans that do not match the structure of the user query. In this paper we present the first optimization approach that attacks both these difficulties within a single framework, presenting a system in which classical cost-based join optimization is extended to support both access-restrictions and constraints. Instead of iteratively exploring subqueries of the input query, our optimizer explores a space of proofs that witness the answering of the query, where each proof has a direct correspondence with a query plan.
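The access-restriction half of the problem can be illustrated with a feasibility check: given which argument positions of each relation must be supplied as inputs, find an order in which a conjunctive query's atoms can be accessed. This sketch ignores integrity constraints, costs, and the proof-based search described in the abstract; the relations `Directory` and `Employees` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# (relation, arguments); arguments starting with '?' are variables, others constants.
Atom = Tuple[str, Tuple[str, ...]]

def executable_order(atoms: List[Atom], inputs: Dict[str, Set[int]]) -> Optional[List[Atom]]:
    """Order the atoms so that each one's input positions are bound (by a constant or an
    earlier access) before it is accessed; return None if no such order exists."""
    bound: Set[str] = set()
    remaining = list(atoms)
    order: List[Atom] = []

    def ready(atom: Atom) -> bool:
        relation, args = atom
        return all(not args[i].startswith("?") or args[i] in bound
                   for i in inputs.get(relation, set()))

    while remaining:
        nxt = next((a for a in remaining if ready(a)), None)
        if nxt is None:
            return None  # every remaining atom still needs an unbound input
        remaining.remove(nxt)
        order.append(nxt)
        bound.update(arg for arg in nxt[1] if arg.startswith("?"))  # outputs are now known
    return order

# Directory can only be looked up by name (position 0 is an input); Employees is freely scannable.
inputs = {"Directory": {0}}
query = [("Directory", ("?name", "?phone")), ("Employees", ("?name", "Sales"))]
print(executable_order(query, inputs))
# [('Employees', ('?name', 'Sales')), ('Directory', ('?name', '?phone'))]
```

Greedy selection suffices for feasibility because the set of bound variables only grows, so an atom that becomes accessible stays accessible.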
Patent•
Web-based customer service interface

Yoram Nelken1, Randy Jessee1, Steve Kirshner1•
IBM1
6 Nov 2015
TL;DR: A system and method for processing a web-based query is provided, comprising a web server for transmitting a web form having a text field box for entering a natural language query, and a language analysis server for extracting concepts from the natural language query and classifying it into predefined categories via computed match scores based upon the extracted concepts and information contained within an adaptable knowledge base.
Abstract: A system and method for processing a web-based query is provided. The system comprises a web server for transmitting a web form having a text field box for entering a natural language query, and a language analysis server for extracting concepts from the natural language query and classifying the natural language query into predefined categories via computed match scores based upon the extracted concepts and information contained within an adaptable knowledge base. In various embodiments, the web server selectively transmits either a resource page or a confirmation page to the client, based upon the match scores. The resource page may comprise at least one suggested response corresponding to at least one predefined category. The language analysis server may adapt the knowledge base in accordance with a communicative action received from the client after the resource page is transmitted.
Proceedings Article•10.5220/0005636704870495•
New classification models for detecting Hate and Violence web content

Shuhua Liu1, Thomas Forss1•
Arcada University of Applied Sciences1
1 Nov 2015
TL;DR: Explores new ways to improve and maximize classification performance, especially to enhance precision and reduce false positives, through thorough examination and handling of class imbalance and through incorporation of LDA topic models.
Abstract: Today, the presence of harmful and inappropriate content on the web still remains one of the primary concerns for web users. Web classification models in the early days were limited by the methods and data available. In our research we revisit the web classification problem with the application of new methods and techniques for text content analysis. Our recent studies have indicated the promising potential of combining topic analysis and sentiment analysis in web content classification. In this paper we further explore new ways and methods to improve and maximize classification performance, especially to enhance precision and reduce false positives, through thorough examination and handling of the issues with class imbalance, and through incorporation of LDA topic models.
Book Chapter•10.1007/978-3-319-25639-9_51•
QueryVOWL: A Visual Query Notation for Linked Data

Florian Haag1, Steffen Lohmann1, Stephan Siek1, Thomas Ertl1•
University of Stuttgart1
31 May 2015
TL;DR: This paper presents QueryVOWL, a visual query language that is based upon the ontology visualization VOWL and defines mappings to SPARQL, and aims for a language that is intuitive and easy to use, while remaining flexible and preserving most of the expressiveness of SPARQL.
Abstract: In order to enable users without any knowledge of RDF and SPARQL to query Linked Data, visual approaches can be helpful by providing graphical support for query building. We present QueryVOWL, a visual query language that is based upon the ontology visualization VOWL and defines mappings to SPARQL. We aim for a language that is intuitive and easy to use, while remaining flexible and preserving most of the expressiveness of SPARQL. In contrast to related work, the queries can be created entirely with visual elements, taking into account RDFS and OWL concepts often used to structure Linked Data. This paper is a revised version of a workshop paper where we first introduced QueryVOWL. We present the query notation, some example queries, and two prototypical implementations of QueryVOWL. Also, we report on a qualitative user study that indicates lay users are able to construct and interpret QueryVOWL graphs.
Proceedings Article•10.1145/2702123.2702527•
A Large-Scale Study of User Image Search Behavior on the Web

Jaimie Yejean Park1, Neil O'Hare2, Rossano Schifanella3, Alejandro Jaimes2, Chin-Wan Chung1 •
KAIST1, Yahoo!2, University of Turin3
18 Apr 2015
TL;DR: This study analyzes user image search behavior from a large-scale Yahoo! Image Search query log and identifies important behavioral differences across query types, in particular showing that some query types are more exploratory, while others correspond to focused search.
Abstract: In this study, we analyze user image search behavior from a large-scale Yahoo! Image Search query log, based on the hypothesis that behavior is dependent on query type. We categorize queries using two orthogonal taxonomies (subject-based and facet-based) and identify important query types at the intersection of these taxonomies. We study user search behavior on a large-scale set of search sessions for each query type, examining characteristics of sessions, query reformulation patterns, click patterns, and page view patterns. We identify important behavioral differences across query types, in particular showing that some query types are more exploratory, while others correspond to focused search. We also supplement our study with a survey to link the behavioral differences to users' intent. Our findings shed light on the importance of considering query categories to better understand user behavior on image search platforms.
Journal Article•10.1109/TKDE.2015.2407353•
CrowdOp: Query Optimization for Declarative Crowdsourcing Systems

Ju Fan1, Meihui Zhang2, Stanley Kok2, Meiyu Lu1, Beng Chin Ooi1 •
National University of Singapore1, Singapore University of Technology and Design2
01 Aug 2015-IEEE Transactions on Knowledge and Data Engineering
TL;DR: CrowdOp is proposed, a cost-based query optimization approach for declarative crowdsourcing systems that considers both cost and latency in its optimization objectives and generates query plans that provide a good balance between the cost and latency.
Abstract: We study the query optimization problem in declarative crowdsourcing systems. Declarative crowdsourcing is designed to hide the complexities and relieve the user of the burden of dealing with the crowd. The user is only required to submit an SQL-like query and the system takes the responsibility of compiling the query, generating the execution plan and evaluating it in the crowdsourcing marketplace. A given query can have many alternative execution plans and the difference in crowdsourcing cost between the best and the worst plans may be several orders of magnitude. Therefore, as in relational database systems, query optimization is important to crowdsourcing systems that provide declarative query interfaces. In this paper, we propose CrowdOp, a cost-based query optimization approach for declarative crowdsourcing systems. CrowdOp considers both cost and latency in query optimization objectives and generates query plans that provide a good balance between the cost and latency. We develop efficient algorithms in CrowdOp for optimizing three types of queries: selection queries, join queries, and complex selection-join queries. We validate our approach via extensive experiments by simulation as well as with the real crowd on Amazon Mechanical Turk.
Journal Article•10.1016/J.INS.2015.02.029•
On personalizing Web search using social network analysis

Omair Shafiq1, Reda Alhajj1, John G. Rokne1•
University of Calgary1
01 Sep 2015-Information Sciences
TL;DR: This paper presents a mechanism that extracts information from a user's social network and uses it to re-rank the results from a search engine, based on the proposed trust and relevance matrices.
Proceedings Article•10.1109/ECS.2015.7124749•
Keyword focused web crawler

Gunjan H. Agre, Nikita V. Mahajan
1 Feb 2015
TL;DR: This paper introduces extraction of URLs based on keywords or search criteria; the approach offers high optimality compared with a traditional web crawler and can enhance search efficiency with more accuracy.
Abstract: The users and uses of the internet are growing tremendously, which causes considerable trouble and effort for users trying to find web pages that are relevant to their requirements. Generally, users either browse web pages from a large available hierarchy of concepts or submit a query to a search engine and receive results based on the search pattern, where few of the results are relevant and most are not. The web crawler plays an important role in a search engine and is a key element where performance is concerned. This paper includes domain engineering concepts and keyword-driven crawling with a relevancy decision mechanism, and uses ontology concepts that ensure the best path for improving the crawler's performance. This paper introduces extraction of URLs based on keywords or search criteria. It extracts URLs for web pages that contain the searched keyword in their content, considers only such pages important, and does not download web pages irrelevant to the search. It offers high optimality compared with a traditional web crawler and can enhance search efficiency with more accuracy.
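The keyword-driven crawling policy can be sketched as a breadth-first crawl that keeps and expands only pages containing the keyword. A hypothetical in-memory dictionary stands in for fetching pages over HTTP, and the ontology and domain-engineering components of the paper are not modelled.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical in-memory "web" standing in for real HTTP fetches:
# url -> (page text, outgoing links).
Web = Dict[str, Tuple[str, List[str]]]

def focused_crawl(web: Web, seed: str, keyword: str, max_pages: int = 100) -> List[str]:
    """Breadth-first crawl that keeps only pages whose text contains the keyword
    and follows outgoing links only from those relevant pages."""
    relevant: List[str] = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(relevant) < max_pages:
        url = frontier.popleft()
        if url not in web:
            continue  # dangling link: nothing to fetch
        text, links = web[url]
        if keyword.lower() not in text.lower():
            continue  # irrelevant page: neither kept nor expanded
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return relevant

web: Web = {
    "a": ("Query classification intro", ["b", "c"]),
    "b": ("Cooking recipes", ["d"]),
    "c": ("Web query logs", ["d"]),
    "d": ("More on query suggestion", []),
}
print(focused_crawl(web, "a", "query"))  # ['a', 'c', 'd']
```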
Journal Article•10.1016/J.ASOC.2015.01.026•
Robust heuristic algorithms for exploiting the common tasks of relational cloud database queries

Tansel Dokeroglu1, Murat Ali Bayir2, Ahmet Cosar1•
Middle East Technical University1, Microsoft2
1 May 2015
TL;DR: A set of robust heuristic algorithms, Branch-and-Bound, Genetic, Hill climbing, and Hybrid Genetic-Hill Climbing, are proposed to find (near-) optimal query execution plans and maximize the benefits of cloud computing.
Abstract: Highlights: MQO (multiple-query optimization) is adapted for relational cloud databases with a cost model that includes network expenses. Alternative query plans are intelligently developed and experimentally evaluated. B&B, Genetic, Hill Climbing and Genetic-Hill Climbing algorithms are developed. Cloud computing enables a conventional relational database system's hardware to be adjusted dynamically according to query workload, performance and deadline constraints. One can rent a large amount of resources for a short duration in order to run complex queries efficiently on large-scale data with virtual machine clusters. Complex queries usually contain common subexpressions, either in a single query or among multiple queries that are submitted as a batch. The common subexpressions scan the same relations, compute the same tasks (join, sort, etc.), and/or ship the same data among virtual computers. The total time spent on the queries can be reduced by executing these common tasks only once. In this study, we build and use efficient sets of query execution plans to reduce the total execution time. This is an NP-hard problem; therefore, a set of robust heuristic algorithms, Branch-and-Bound, Genetic, Hill Climbing, and Hybrid Genetic-Hill Climbing, are proposed to find (near-)optimal query execution plans and maximize the benefits. The optimization time of each algorithm for identifying the query execution plans and the quality of these plans are analyzed by extensive experiments.
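Hill climbing, one of the heuristics named in the abstract, can be sketched over a toy model in which each query has alternative plans, each plan is a set of tasks, and a task shared by several chosen plans is paid for only once. The task names and costs are hypothetical, and the authors' cost model (which includes network expenses) is far richer.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Plans = List[List[FrozenSet[str]]]  # plans[q][p] = set of tasks in plan p of query q

def total_cost(choice: List[int], plans: Plans, task_cost: Dict[str, float]) -> float:
    """Cost of a global plan: each distinct task is executed (and paid for) only once."""
    tasks: Set[str] = set()
    for q, p in enumerate(choice):
        tasks |= plans[q][p]
    return sum(task_cost[t] for t in tasks)

def hill_climb(plans: Plans, task_cost: Dict[str, float]) -> Tuple[List[int], float]:
    """Start from every query's first plan and accept any single-query plan switch
    that lowers the total cost, until no switch helps (a local optimum)."""
    choice = [0] * len(plans)
    best = total_cost(choice, plans, task_cost)
    improved = True
    while improved:
        improved = False
        for q in range(len(plans)):
            for p in range(len(plans[q])):
                candidate = choice[:q] + [p] + choice[q + 1:]
                cost = total_cost(candidate, plans, task_cost)
                if cost < best:
                    choice, best, improved = candidate, cost, True
    return choice, best

# Two queries, each with a private plan and a plan that uses a shared task "s".
plans = [[frozenset({"a1"}), frozenset({"s"})],
         [frozenset({"b1"}), frozenset({"s"})]]
print(hill_climb(plans, {"a1": 8.0, "b1": 8.0, "s": 7.0}))  # ([1, 1], 7.0)
```

Like any hill climber it can stop at a local optimum: with `a1` and `b1` costing 5.0 each and `s` costing 7.0, it stays at `[0, 0]` (cost 10.0) even though `[1, 1]` costs 7.0.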
Patent•
Real-time and adaptive data mining
Sharon Gill Chadha, Xin Cheng, Parvinder Chadha
4 Dec 2015
TL;DR: In this article, a method of analyzing data is presented, which includes generating a query based on a topic of interest, expanding search terms of the query, executing the query on one or more data sources, and monitoring a specific data source selected from the one or multiple data sources.
Abstract: A method of analyzing data is presented. The method includes generating a query based on a topic of interest, expanding search terms of the query, executing the query on one or more data sources, and monitoring a specific data source selected from the one or more data sources. The monitoring is performed to monitor for matches to the query.
Journal Article•10.1109/TKDE.2014.2350252•
Authentication of Moving Top-k Spatial Keyword Queries
Dingming Wu1, Byron Choi1, Jianliang Xu1, Christian S. Jensen2•
Hong Kong Baptist University1, Aalborg University2
01 Apr 2015-IEEE Transactions on Knowledge and Data Engineering
TL;DR: New authentication data structures, the MIR-tree and MIR*-tree are proposed that enable the authentication of MkSK queries at low computation and communication costs and are capable of outperforming two baseline algorithms by orders of magnitude.
Abstract: A moving top-k spatial keyword (MkSK) query, which takes into account a continuously moving query location, enables a mobile client to be continuously aware of the top-k spatial web objects that best match a query with respect to location and text relevance. The increasing mobile use of the web and the proliferation of geo-positioning make it of interest to consider a scenario where spatial keyword search is outsourced to a separate service provider capable of handling the voluminous spatial web objects available from various sources. A key challenge is that the service provider may return inaccurate or incorrect query results (intentionally or not), e.g., due to cost considerations or invasion by hackers. Therefore, it is attractive to be able to authenticate the query results at the client side. Existing authentication techniques are either inefficient or inapplicable for the kind of query we consider. We propose new authentication data structures, the MIR-tree and MIR*-tree, that enable the authentication of MkSK queries at low computation and communication costs. We design a verification object for authenticating MkSK queries, and we provide algorithms for constructing verification objects and using these for verifying query results. A thorough experimental study on real data shows that the proposed techniques are capable of outperforming two baseline algorithms by orders of magnitude.
Patent•
Recent interest based relevance scoring
Philip A. McDonnell1, Glen Jeh1, Taher H. Haveliwala1, Yair Kurzion1•
Google1
22 Jan 2015
TL;DR: A computer-implemented method for processing query information includes receiving prior queries followed by a current query, the prior and current queries being received within an activity period and originating with a search requester.
Abstract: A computer-implemented method for processing query information includes receiving prior queries followed by a current query, the prior and current queries being received within an activity period and originating with a search requester. The method also includes receiving a plurality of search results based on the current query. Each search result identifies a search result document, each respective search result document being associated with a query-specific score indicating the relevance of the document to the current query. The method also includes determining a first category based, at least in part, on the prior queries. The method also includes identifying a plurality of prior activity periods of other search requesters, each prior activity period containing a prior activity query, where the prior activity query matches the current query and where the prior activity period indicates the same first category.
Journal Article•10.1016/J.IPM.2014.07.004•
Weighted Word Pairs for query expansion
Francesco Colace1, Massimo De Santo1, Luca Greco1, Paolo Napoletano2•
University of Salerno1, University of Milano-Bicocca2
01 Jan 2015-Information Processing and Management
TL;DR: A novel query expansion method that makes use of a minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words based on the Probabilistic Topic Model is proposed.
Abstract: This paper proposes a novel query expansion method to improve accuracy of text retrieval systems. Our method makes use of a minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. Such a structure is obtained from the relevance feedback through a method for pairs of words selection based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.
Journal Article•10.1109/TKDE.2014.2324597•
Route-Saver: Leveraging Route APIs for Accurate and Efficient Query Processing at Location-Based Services
Yu Li1, Man Lung Yiu1•
Hong Kong Polytechnic University1
01 Jan 2015-IEEE Transactions on Knowledge and Data Engineering
TL;DR: This work proposes to exploit recent routes requested from route APIs to answer queries accurately, designs effective lower/upper bounding techniques and ordering techniques to process queries efficiently, and studies parallel route requests to further reduce the query response time.
Abstract: Location-based services (LBS) enable mobile users to query points-of-interest (e.g., restaurants, cafes) on various features (e.g., price, quality, variety). In addition, users require accurate query results with up-to-date travel times. Lacking the monitoring infrastructure for road traffic, the LBS may obtain live travel times of routes from online route APIs in order to offer accurate results. Our goal is to reduce the number of requests issued by the LBS significantly while preserving accurate query results. First, we propose to exploit recent routes requested from route APIs to answer queries accurately. Then, we design effective lower/upper bounding techniques and ordering techniques to process queries efficiently. Also, we study parallel route requests to further reduce the query response time. Our experimental evaluation shows that our solution is three times more efficient than a competitor, and yet achieves high result accuracy (above 98 percent).
Journal Article•10.1007/S10766-014-0327-4•
A Hardware/Software Approach for Database Query Acceleration with FPGAs
Bharat Sukhwani1, Mathew S. Thoennes1, Hong Min1, Parijat Dube1, Bernard Brezzo1, Sameh W. Asaad1, Donna N. Dillenberger1 •
IBM1
01 Dec 2015-International Journal of Parallel Programming
TL;DR: This paper addresses the needs of real-time analytics by enabling hardware acceleration of complex database query operations such as predicate evaluation, sort and projection, using an FPGA-based composable accelerator to offload analytics queries from the host CPU running the OLTP workload.
Abstract: Complex analytics queries often involve expensive operations that may require long computational runtimes, leading to slow query responsiveness and hampering real-time performance. Moreover, running these expensive analytics queries inside traditional online transaction processing (OLTP) systems for real-time analytics can affect the performance of mission-critical OLTP queries. On the other hand, support for real-time analytics is considered vital for important business insights and improved market responsiveness. In this paper, we try to address the needs of real-time analytics by enabling hardware acceleration of complex database query operations such as predicate evaluation, sort and projection. While projection helps reduce the amount of data being processed by subsequent query operations, sort is central to most database queries, even those not involving an explicit sort operation. Our system involves an FPGA-based composable accelerator for offloading the analytics queries from the host CPU running the OLTP workload. The FPGA-accelerated database system contains accelerator kernels for various database operations and automatically transforms query operations into calls to these hardware kernels for seamless integration of the accelerator into the database system. Based on the query semantics, each accelerator kernel can be tailored by software to execute specific database operations, and different kernels can be fused together to compose a query accelerator. Our query transformation algorithm creates a query-specific control block to customize the accelerator without requiring FPGA reconfiguration.
Journal Article•10.14778/2850583.2850588•
Processing and optimizing main memory spatial-keyword queries
Taesung Lee1, Jin-Woo Park2, Sanghoon Lee2, Seung-won Hwang1, Sameh Elnikety3, Yuxiong He3 •
Yonsei University1, Pohang University of Science and Technology2, Microsoft3
1 Nov 2015
TL;DR: This work employs a cost-based optimizer to process spatial-keyword queries using a spatial index and a keyword index, and introduces five optimization techniques that efficiently reduce the search space and produce a query plan with low cost.
Abstract: Important cloud services rely on spatial-keyword queries, which contain a spatial predicate and arbitrary boolean keyword queries. In particular, we study the processing of such queries in main memory to support short response times. In contrast, current state-of-the-art spatial-keyword indexes and relational engines are designed for different assumptions. Rather than building a new spatial-keyword index, we employ a cost-based optimizer to process these queries using a spatial index and a keyword index. We address several technical challenges to achieve this goal. We introduce three operators as the building blocks to construct plans for main memory query processing. We then develop a cost model for the operators and query plans. We introduce five optimization techniques that efficiently reduce the search space and produce a query plan with low cost. The optimization techniques are computationally efficient, and they identify a query plan with a formal approximation guarantee under the common independence assumption. Furthermore, we extend the framework to exploit interesting orders. We implement the query optimizer to empirically validate our proposed approach using real-life datasets. The evaluation shows that the optimizations provide a significant reduction in the average and tail latency of query processing: a 7- to 11-fold reduction in 99th-percentile response time compared with using a single index. In addition, this approach outperforms existing spatial-keyword indexes and DBMS query optimizers for both average and high-percentile response times.
...

© 2026 | PubGenius Inc. | Suite # 217 691 S Milpitas Blvd Milpitas CA 95035, USA