TL;DR: This work presents a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary length and is sensitive to the order of queries in the context while avoiding data sparsity.
Abstract: Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary length. As a result, our suggestions are sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can produce suggestions for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets. Results show that our model outperforms existing context-aware approaches in a next query prediction setting. In addition to query suggestion, our architecture is general enough to be used in a variety of other applications.
TL;DR: In this article, a classification, based on a first classification instance in a plurality of classification instances, is assigned without human intervention to the electronic document if the confidence data associated with the first classification instance exceeds a first threshold.
Abstract: Classifying an electronic document in a computer-based system is disclosed. For each classification instance in a plurality of classification instances, confidence data indicating a degree of confidence that the electronic document is associated with that classification instance is determined. A classification, based on a first classification instance in the plurality of classification instances, is assigned without human intervention to the electronic document if the confidence data associated with the first classification instance exceeds a first threshold.
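The threshold rule in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed system: the function name `auto_classify`, the label names, and the choice of the highest-confidence instance as the "first classification instance" are all assumptions made for the example.

```python
from typing import Mapping, Optional


def auto_classify(confidences: Mapping[str, float], threshold: float) -> Optional[str]:
    """Return a label only when its confidence exceeds the threshold.

    Assumption: the "first classification instance" is taken to be the
    instance with the highest confidence. Returning None means the document
    is not classified automatically (for example, it goes to a human).
    """
    if not confidences:
        return None
    # Pick the classification instance with the highest confidence value.
    best_label = max(confidences, key=lambda label: confidences[label])
    if confidences[best_label] > threshold:
        return best_label
    return None


# Hypothetical confidence data for one electronic document.
scores = {"invoice": 0.91, "contract": 0.06, "memo": 0.03}
assigned = auto_classify(scores, threshold=0.8)   # confident enough
deferred = auto_classify(scores, threshold=0.95)  # not confident enough
```

With the hypothetical scores shown, the first call assigns a label without human intervention, while the stricter threshold in the second call leaves the document unclassified.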
TL;DR: This work proposes an approach that extends a query with synonyms generated from WordNet that improves the precision and recall of Conquer, a state-of-the-art query expansion/reformulation technique, by 5% and 8% respectively.
Abstract: Source code search plays an important role in software maintenance. The effectiveness of source code search not only relies on the search technique, but also on the quality of the query. In practice, software systems are large, so it is difficult for a developer to formulate an accurate query to express what is really in her/his mind, especially when the maintainer and the original developer are not the same person. When a query performs poorly, it has to be reformulated. But the words used in a query may be different from those that have similar semantics in the source code, i.e., the synonyms, which will affect the accuracy of code search results. To address this issue, we propose an approach that extends a query with synonyms generated from WordNet. Our approach extracts natural language phrases from source code identifiers, matches expanded queries with these phrases, and sorts the search results. It allows developers to explore word usage in a piece of software, and helps them quickly identify relevant program elements for investigation or quickly recognize alternative words for query reformulation. Our initial empirical study on search tasks performed on the JavaScript/ECMAScript interpreter and compiler, Rhino, shows that the synonyms used to expand the queries help recommend good alternative queries. Our approach also improves the precision and recall of Conquer, a state-of-the-art query expansion/reformulation technique, by 5% and 8% respectively.
TL;DR: A brief introduction to ontology-mediated query answering using description logic (DL) ontologies, with a focus on DLs for which query answering scales polynomially in the size of the data, as these are best suited for applications requiring large amounts of data.
Abstract: Recent years have seen an increasing interest in ontology-mediated query answering, in which the semantic knowledge provided by an ontology is exploited when querying data. Adding an ontology has several advantages (e.g. simplifying query formulation, integrating data from different sources, providing more complete answers to queries), but it also makes the query answering task more difficult. In this chapter, we give a brief introduction to ontology-mediated query answering using description logic (DL) ontologies. Our focus will be on DLs for which query answering scales polynomially in the size of the data, as these are best suited for applications requiring large amounts of data. We will describe the challenges that arise when evaluating different natural types of queries in the presence of such ontologies, and we will present algorithmic solutions based upon two key concepts, namely, query rewriting and saturation. We conclude the chapter with an overview of recent results and active areas of ongoing research.
TL;DR: In this article, a system was proposed to highlight search terms in documents distributed over a network by generating a search query that includes a search term and receiving a list of one or more references to documents in the network.
Abstract: A system highlights search terms in documents distributed over a network. The system generates a search query that includes a search term and, in response to the search query, receives a list of one or more references to documents in the network. The system receives selection of one of the references and retrieves a document that corresponds to the selected reference. The system then highlights the search term in the retrieved document.
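The final step of the abstract above, highlighting the search term in the retrieved document, might look like the following Python sketch. It assumes the document is plain text rendered as HTML and that matching is case-insensitive; the function name `highlight` and the `<mark>` tags are illustrative choices, not details from the source.

```python
import html
import re


def highlight(text: str, term: str, open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap every case-insensitive occurrence of `term` in highlight tags.

    The surrounding text is HTML-escaped so the result is safe to embed in a
    page. An empty term highlights nothing.
    """
    if not term:
        return html.escape(text)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text before the match, then wrap the match.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(open_tag + html.escape(match.group(0)) + close_tag)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


snippet = highlight("Query logs help query suggestion.", "query")
```

Escaping the text piece by piece, rather than escaping first and matching afterwards, keeps a term such as `<` or `&` from being matched against its escaped form.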
TL;DR: The system, Graph Query By Example, automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent, and efficiently finds and ranks the top approximate matching answer graphs and answer tuples.
Abstract: We witness an unprecedented proliferation of knowledge graphs that record millions of entities and their relationships. While knowledge graphs are structure-flexible and content-rich, they are difficult to use. The challenge lies in the gap between their overwhelming complexity and the limited database knowledge of non-professional users. If writing structured queries over “simple” tables is difficult, complex graphs are only harder to query. As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. Our system, Graph Query By Example ($\mathsf{GQBE}$), automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent. It then efficiently finds and ranks the top approximate matching answer graphs and answer tuples. We conducted experiments and user studies on the large Freebase and DBpedia datasets and observed appealing accuracy and efficiency. Our system provides a complementary approach to the existing keyword-based methods, facilitating user-friendly graph querying. To the best of our knowledge, there was no such proposal in the past in the context of graphs.
TL;DR: This paper proposes to address the problem of estimating online metrics that depend on user feedback using causal inference techniques, under the contextual-bandit framework, and obtains very promising results that suggest the wide applicability of these techniques.
Abstract: Optimizing an interactive system against a predefined online metric is particularly challenging, especially when the metric is computed from user feedback such as clicks and payments. The key challenge is the counterfactual nature: in the case of Web search, any change to a component of the search engine may result in a different search result page for the same query, but we normally cannot infer reliably from search log how users would react to the new result page. Consequently, it appears impossible to accurately estimate online metrics that depend on user feedback, unless the new engine is actually run to serve live users and compared with a baseline in a controlled experiment. This approach, while valid and successful, is unfortunately expensive and time-consuming. In this paper, we propose to address this problem using causal inference techniques, under the contextual-bandit framework. This approach effectively allows one to run potentially many online experiments offline from search log, making it possible to estimate and optimize online metrics quickly and inexpensively. Focusing on an important component in a commercial search engine, we show how these ideas can be instantiated and applied, and obtain very promising results that suggest the wide applicability of these techniques.
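To make the idea of running an online experiment offline from a search log concrete, here is a minimal Python sketch of inverse propensity scoring (IPS), one standard estimator used in the contextual-bandit setting. The abstract does not say which estimator the authors use, so IPS is an assumption, as are the record layout, the function name `ips_estimate`, and the toy log.

```python
from typing import Callable, Iterable, Tuple

# One logged interaction: (context, action shown, probability with which the
# logging policy chose that action, observed reward such as a click).
LogRecord = Tuple[str, str, float, float]


def ips_estimate(log: Iterable[LogRecord], target_prob: Callable[[str, str], float]) -> float:
    """Estimate the average reward a new policy would obtain, using only logged data.

    Each logged reward is reweighted by target_prob(context, action) / propensity,
    which corrects for the mismatch between the logging policy and the new one.
    """
    total = 0.0
    count = 0
    for context, action, propensity, reward in log:
        if propensity <= 0.0:
            raise ValueError("logged propensities must be positive")
        total += reward * target_prob(context, action) / propensity
        count += 1
    if count == 0:
        raise ValueError("log is empty")
    return total / count


# Toy log collected by a policy that picks A or B uniformly at random.
toy_log = [
    ("q1", "A", 0.5, 1.0),
    ("q1", "B", 0.5, 0.0),
    ("q2", "A", 0.5, 0.0),
    ("q2", "B", 0.5, 1.0),
]


def always_a(context: str, action: str) -> float:
    """Hypothetical new policy: always choose action A."""
    return 1.0 if action == "A" else 0.0


estimate = ips_estimate(toy_log, always_a)
```

In the toy log, action A earns reward 1 on `q1` and 0 on `q2`, so a policy that always shows A has a true average reward of 0.5, which is what the estimator recovers here without serving any live traffic.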
TL;DR: A candidate generation approach using frequently observed query suffixes mined from historical search logs is described, along with a supervised model for ranking these synthetic query suggestions alongside the traditional full-query candidates.
Abstract: Query auto-completion (QAC) systems typically suggest queries that have previously been observed in search logs. Given a partial user query, the system looks up this query prefix against a precomputed set of candidates, then orders them using ranking signals such as popularity. Such systems can only recommend queries for prefixes that have been previously seen by the search engine with adequate frequency. They fail to recommend if the prefix is sufficiently rare such that it has no matches in the precomputed candidate set. We propose a design of a QAC system that can suggest completions for rare query prefixes. In particular, we describe a candidate generation approach using frequently observed query suffixes mined from historical search logs. We then describe a supervised model for ranking these synthetic suggestions alongside the traditional full-query candidates. We further explore ranking signals that are appropriate for both types of candidates based on n-gram statistics and a convolutional latent semantic model (CLSM). Within our supervised framework the new features demonstrate significant improvements in performance over the popularity-based baseline. The synthetic query suggestions complement the existing popularity-based approach, helping users formulate rare queries.
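The candidate generation step described above can be illustrated with a short Python sketch. The abstract does not give the exact procedure, so the details here are assumptions: suffixes are word n-grams taken from the end of logged queries, the last (possibly partial) word of the prefix is matched against the start of each suffix, and candidates are ordered by raw suffix frequency instead of by the supervised ranker. The names `mine_suffixes` and `synthetic_candidates` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def mine_suffixes(queries: Iterable[str], max_words: int = 3) -> Counter:
    """Count word n-gram suffixes (up to max_words long) over logged queries."""
    counts: Counter = Counter()
    for query in queries:
        words = query.split()
        for n in range(1, min(max_words, len(words)) + 1):
            counts[" ".join(words[-n:])] += 1
    return counts


def synthetic_candidates(prefix: str, suffix_counts: Counter, top_k: int = 3) -> List[str]:
    """Complete a prefix by attaching frequent suffixes that extend its last word."""
    # Split the prefix into the finished words and the word being typed.
    head, _, end_term = prefix.rpartition(" ")
    matches = [(s, c) for s, c in suffix_counts.items() if s.startswith(end_term)]
    # Most frequent suffix first; ties broken alphabetically for determinism.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [(head + " " + suffix).strip() for suffix, _ in matches[:top_k]]


history = [
    "cheap flights to paris",
    "hotels in paris",
    "weather in paris",
    "flights to london",
]
suffixes = mine_suffixes(history)
# "museums in p" never appears in the log, yet a completion can be synthesized.
completions = synthetic_candidates("museums in p", suffixes)
```

The point of the sketch is that the full query "museums in paris" never occurs in the log, so a lookup against precomputed full-query candidates would return nothing, whereas the suffix "paris" is frequent enough to produce a synthetic candidate.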
TL;DR: An efficient search algorithm and fast evaluation of the minimum value of spatio-textual utility function are proposed and the results of empirical studies based on real check-in datasets demonstrate that the proposed index and algorithms can achieve good scalability.
Abstract: Driven by the advances in location positioning techniques and the popularity of location sharing services, semantically enriched trajectory data have become unprecedentedly available. While finding relevant Points-of-Interest (POIs) based on users' locations and query keywords has been extensively studied in the past years, keyword queries in the context of semantic trajectory databases remain largely unexplored. In this paper, we study the problem of approximate keyword search in massive semantic trajectories. Given a set of query keywords, an approximate keyword query of semantic trajectory (AKQST) returns k trajectories that contain the most relevant keywords to the query and yield the least travel effort in the meantime. The main difference between AKQST and conventional spatial keyword queries is that there is no query location in AKQST, which means the search area cannot be localized. To capture the travel effort in the context of query keywords, a novel utility function, called the spatio-textual utility function, is first defined. Then we develop a hybrid index structure called GiKi to organize the trajectories hierarchically, which enables pruning the search space by spatial and textual similarity simultaneously. Finally, an efficient search algorithm and fast evaluation of the minimum value of the spatio-textual utility function are proposed. The results of our empirical studies based on real check-in datasets demonstrate that our proposed index and algorithms can achieve good scalability.
TL;DR: This paper introduces a project to develop a reliable, cost‐effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents.
Abstract: This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
TL;DR: New strategies for a novel querying framework that combines query synthesis and pool-based sampling are proposed, which overcomes the limitation of query synthesis, and has the advantage of fast querying.
TL;DR: In this paper, a method for query evaluation comprises receiving a query over a set of distributed data sources, decomposing the query into a set of sub-queries, and evaluating each sub-query in the set of sub-queries with respect to each data source in the set of distributed data sources.
Abstract: A method for query evaluation comprises receiving a query over a set of distributed data sources, decomposing the query into a set of sub-queries of the query, evaluating each sub-query in the set of sub-queries with respect to each data source in the set of distributed data sources, wherein evaluating comprises determining which data sources in the set of distributed data sources are capable of answering each sub-query and at what cost, computing a set of distributed plans by composing one or more of the sub-queries in one or more of the data sources, evaluating each plan in the set of distributed plans, selecting a sub-set of plans from the set of distributed plans to be executed for responding to the query, executing the selected sub-set of plans, and returning results of the query.
TL;DR: QuERy, a novel framework for integrating entity resolution (ER) with query processing, is proposed to correctly and efficiently answer complex queries issued on top of dirty data.
Abstract: This paper explores an analysis-aware data cleaning architecture for a large class of SPJ SQL queries. In particular, we propose QuERy, a novel framework for integrating entity resolution (ER) with query processing. The aim of QuERy is to correctly and efficiently answer complex queries issued on top of dirty data. The comprehensive empirical evaluation of the proposed solution demonstrates its significant advantage in terms of efficiency over the traditional techniques for the given problem settings.
TL;DR: This paper proposes a generic temporal query language that combines linear temporal logic with queries over ontologies and shows that, if atemporal queries are rewritable in the sense described above, then the corresponding temporal queries are also rewritable, so that they can be answered over a temporal database.
TL;DR: This paper presents a system in which classical cost-based join optimization is extended to support both access-restrictions and constraints, and explores a space of proofs that witness the answering of the query, where each proof has a direct correspondence with a query plan.
Abstract: Traditional query processing involves a search for plans formed by applying algebraic operators on top of primitives representing access to relations in the input query. But many querying scenarios involve two interacting issues that complicate the search. On the one hand, the search space may be limited by access restrictions associated with the interfaces to data sources, which require certain parameters to be given as inputs. On the other hand, the search space may be extended through the presence of integrity constraints that relate sources to each other, allowing for plans that do not match the structure of the user query. In this paper we present the first optimization approach that attacks both these difficulties within a single framework, presenting a system in which classical cost-based join optimization is extended to support both access-restrictions and constraints. Instead of iteratively exploring subqueries of the input query, our optimizer explores a space of proofs that witness the answering of the query, where each proof has a direct correspondence with a query plan.
TL;DR: In this paper, a system and method for processing a web-based query is provided, which comprises a web server for transmitting a web form having a text field box for entering a natural language query, and a language analysis server for extracting concepts from the natural language query and classifying the natural language query into predefined categories via computed match scores based upon the extracted concepts and information contained within an adaptable knowledge base.
Abstract: A system and method for processing a web-based query is provided. The system comprises a web server for transmitting a web form having a text field box for entering a natural language query, and a language analysis server for extracting concepts from the natural language query and classifying the natural language query into predefined categories via computed match scores based upon the extracted concepts and information contained within an adaptable knowledge base. In various embodiments, the web server selectively transmits either a resource page or a confirmation page to the client, based upon the match scores. The resource page may comprise at least one suggested response corresponding to at least one predefined category. The language analysis server may adapt the knowledge base in accordance with a communicative action received from the client after the resource page is transmitted.
TL;DR: New ways and methods to improve and maximize classification performance, especially to enhance precision and reduce false positives, are explored through examination and handling of the issues with class imbalance and through incorporation of LDA topic models.
Abstract: Today, the presence of harmful and inappropriate content on the web remains one of the primary concerns for web users. Web classification models in the early days were limited by the methods and data available. In our research we revisit the web classification problem with the application of new methods and techniques for text content analysis. Our recent studies have indicated the promising potential of combining topic analysis and sentiment analysis in web content classification. In this paper we further explore new ways and methods to improve and maximize classification performance, especially to enhance precision and reduce false positives, through examination and handling of the issues with class imbalance, and through incorporation of LDA topic models.
TL;DR: This paper presents QueryVOWL, a visual query language that is based upon the ontology visualization VOWL and defines mappings to SPARQL, and aims for a language that is intuitive and easy to use, while remaining flexible and preserving most of the expressiveness of SPARQL.
Abstract: In order to enable users without any knowledge of RDF and SPARQL to query Linked Data, visual approaches can be helpful by providing graphical support for query building. We present QueryVOWL, a visual query language that is based upon the ontology visualization VOWL and defines mappings to SPARQL. We aim for a language that is intuitive and easy to use, while remaining flexible and preserving most of the expressiveness of SPARQL. In contrast to related work, the queries can be created entirely with visual elements, taking into account RDFS and OWL concepts often used to structure Linked Data. This paper is a revised version of a workshop paper where we first introduced QueryVOWL. We present the query notation, some example queries, and two prototypical implementations of QueryVOWL. Also, we report on a qualitative user study that indicates lay users are able to construct and interpret QueryVOWL graphs.
TL;DR: This study analyzes user image search behavior from a large-scale Yahoo! Image Search query log and identifies important behavioral differences across query types, in particular showing that some query types are more exploratory, while others correspond to focused search.
Abstract: In this study, we analyze user image search behavior from a large-scale Yahoo! Image Search query log, based on the hypothesis that behavior is dependent on query type. We categorize queries using two orthogonal taxonomies (subject-based and facet-based) and identify important query types at the intersection of these taxonomies. We study user search behavior on a large-scale set of search sessions for each query type, examining characteristics of sessions, query reformulation patterns, click patterns, and page view patterns. We identify important behavioral differences across query types, in particular showing that some query types are more exploratory, while others correspond to focused search. We also supplement our study with a survey to link the behavioral differences to users' intent. Our findings shed light on the importance of considering query categories to better understand user behavior on image search platforms.
TL;DR: CrowdOp is proposed, a cost-based query optimization approach for declarative crowdsourcing systems that considers both cost and latency in query optimization objectives and generates query plans that provide a good balance between the cost and latency.
Abstract: We study the query optimization problem in declarative crowdsourcing systems. Declarative crowdsourcing is designed to hide the complexities and relieve the user of the burden of dealing with the crowd. The user is only required to submit an SQL-like query and the system takes the responsibility of compiling the query, generating the execution plan and evaluating it in the crowdsourcing marketplace. A given query can have many alternative execution plans and the difference in crowdsourcing cost between the best and the worst plans may be several orders of magnitude. Therefore, as in relational database systems, query optimization is important to crowdsourcing systems that provide declarative query interfaces. In this paper, we propose CrowdOp, a cost-based query optimization approach for declarative crowdsourcing systems. CrowdOp considers both cost and latency in query optimization objectives and generates query plans that provide a good balance between the cost and latency. We develop efficient algorithms in CrowdOp for optimizing three types of queries: selection queries, join queries, and complex selection-join queries. We validate our approach via extensive experiments by simulation as well as with the real crowd on Amazon Mechanical Turk.
TL;DR: This paper has designed and developed a mechanism that extracts information from a user's social network and uses it to re-rank the results from a search engine, based on the proposed trust and relevance matrices.
TL;DR: This paper introduces extraction of URLs based on keywords or search criteria; the approach offers high optimality compared with a traditional web crawler and can enhance search efficiency with more accuracy.
Abstract: The number of Internet users and uses is growing tremendously, which makes it increasingly laborious for users to find web pages relevant to their needs. Users generally either browse a large hierarchy of concepts or submit a query to a search engine, and they receive results based on the search pattern, of which only a few are relevant and most are not. The web crawler plays an important role in a search engine and is a key element where performance is concerned. This paper combines domain engineering concepts and keyword-driven crawling with a relevancy decision mechanism, and uses ontology concepts to find the best path for improving the crawler's performance. This paper introduces extraction of URLs based on keywords or search criteria. The crawler extracts URLs only for web pages whose content contains the searched keyword, treats only those pages as important, and does not download web pages irrelevant to the search. It offers high optimality compared with a traditional web crawler and can enhance search efficiency with more accuracy.
TL;DR: A set of robust heuristic algorithms, Branch-and-Bound, Genetic, Hill climbing, and Hybrid Genetic-Hill Climbing, are proposed to find (near-) optimal query execution plans and maximize the benefits of cloud computing.
Abstract: Cloud computing enables a conventional relational database system's hardware to be adjusted dynamically according to query workload, performance and deadline constraints. One can rent a large amount of resources for a short duration in order to run complex queries efficiently on large-scale data with virtual machine clusters. Complex queries usually contain common subexpressions, either in a single query or among multiple queries that are submitted as a batch. The common subexpressions scan the same relations, compute the same tasks (join, sort, etc.), and/or ship the same data among virtual computers. The total time spent for the queries can be reduced by executing these common tasks only once. In this study, we build and use efficient sets of query execution plans to reduce the total execution time. This is an NP-hard problem; therefore, a set of robust heuristic algorithms, Branch-and-Bound, Genetic, Hill Climbing, and Hybrid Genetic-Hill Climbing, are proposed to find (near-) optimal query execution plans and maximize the benefits. The optimization time of each algorithm for identifying the query execution plans and the quality of these plans are analyzed by extensive experiments. Highlights: MQO is adapted for relational Cloud DB with a cost model including network expenses. Alternative query plans are intelligently developed and experimentally evaluated. B&B, Genetic, Hill Climbing and Genetic-Hill Climbing algorithms are developed.
TL;DR: In this article, a method of analyzing data is presented, which includes generating a query based on a topic of interest, expanding search terms of the query, executing the query on one or more data sources, and monitoring a specific data source selected from the one or more data sources.
Abstract: A method of analyzing data is presented. The method includes generating a query based on a topic of interest, expanding search terms of the query, executing the query on one or more data sources, and monitoring a specific data source selected from the one or more data sources. The monitoring is performed to monitor for matches to the query.
TL;DR: New authentication data structures, the MIR-tree and MIR*-tree are proposed that enable the authentication of MkSK queries at low computation and communication costs and are capable of outperforming two baseline algorithms by orders of magnitude.
Abstract: A moving top-$k$ spatial keyword (M$k$SK) query, which takes into account a continuously moving query location, enables a mobile client to be continuously aware of the top-$k$ spatial web objects that best match a query with respect to location and text relevance. The increasing mobile use of the web and the proliferation of geo-positioning render it of interest to consider a scenario where spatial keyword search is outsourced to a separate service provider capable of handling the voluminous spatial web objects available from various sources. A key challenge is that the service provider may return inaccurate or incorrect query results (intentionally or not), e.g., due to cost considerations or invasion of hackers. Therefore, it is attractive to be able to authenticate the query results at the client side. Existing authentication techniques are either inefficient or inapplicable for the kind of query we consider. We propose new authentication data structures, the MIR-tree and MIR$^*$-tree, that enable the authentication of M$k$SK queries at low computation and communication costs. We design a verification object for authenticating M$k$SK queries, and we provide algorithms for constructing verification objects and using these for verifying query results. A thorough experimental study on real data shows that the proposed techniques are capable of outperforming two baseline algorithms by orders of magnitude.
TL;DR: A computer-implemented method for processing query information includes receiving prior queries followed by a current query, the prior and current queries being received within an activity period and originating with a search requester.
Abstract: A computer-implemented method for processing query information includes receiving prior queries followed by a current query, the prior and current queries being received within an activity period and originating with a search requester. The method also includes receiving a plurality of search results based on the current query. Each search result identifies a search result document, each respective search result document being associated with a query specific score indicating a relevance of the document to the current query. The method also includes determining a first category based, at least in part, on the prior queries. The method also includes identifying a plurality of prior activity periods of other search requesters, each prior activity period containing a prior activity query where the prior activity query matches the current query, and where the prior activity period indicates the same first category.
TL;DR: A novel query expansion method that makes use of a minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words based on the Probabilistic Topic Model is proposed.
Abstract: This paper proposes a novel query expansion method to improve accuracy of text retrieval systems. Our method makes use of a minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. Such a structure is obtained from the relevance feedback through a method for pairs of words selection based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.
TL;DR: This work proposes to exploit recent routes requested from route APIs to answer queries accurately, and designs effective lower/upper bounding techniques and ordering techniques to process queries efficiently, with parallel route requests to further reduce the query response time.
Abstract: Location-based services (LBS) enable mobile users to query points-of-interest (e.g., restaurants, cafes) on various features (e.g., price, quality, variety). In addition, users require accurate query results with up-to-date travel times. Lacking the monitoring infrastructure for road traffic, the LBS may obtain live travel times of routes from online route APIs in order to offer accurate results. Our goal is to reduce the number of requests issued by the LBS significantly while preserving accurate query results. First, we propose to exploit recent routes requested from route APIs to answer queries accurately. Then, we design effective lower/upper bounding techniques and ordering techniques to process queries efficiently. Also, we study parallel route requests to further reduce the query response time. Our experimental evaluation shows that our solution is three times more efficient than a competitor, and yet achieves high result accuracy (above 98 percent).
TL;DR: This paper addresses the needs of real-time analytics by enabling hardware acceleration of complex database query operations such as predicate evaluation, sort, and projection, using an FPGA-based composable accelerator that offloads analytics queries from the host CPU running the OLTP workload.
Abstract: Complex analytics queries often involve expensive operations that may require long computational runtimes, leading to slow query responsiveness and hampering real-time performance. Moreover, running these expensive analytics queries inside traditional online transaction processing (OLTP) systems for real-time analytics can affect the performance of mission-critical OLTP queries. On the other hand, support for real-time analytics is considered vital for important business insights and improved market responsiveness. In this paper, we try to address the needs of real-time analytics by enabling hardware acceleration of complex database query operations such as predicate evaluation, sort, and projection. While projection helps reduce the amount of data being processed by subsequent query operations, sort is central to most database queries, even those not involving an explicit sort operation. Our system involves an FPGA-based composable accelerator for offloading the analytics queries from the host CPU running the OLTP workload. The FPGA-accelerated database system contains accelerator kernels for various database operations and automatic transformation of query operations into calls to these hardware kernels for seamless integration of the accelerator into the database system. Based on the query semantics, each accelerator kernel can be tailored by software to execute specific database operations, and different kernels can be fused together to compose a query accelerator. Our query transformation algorithm creates a query-specific control block to customize the accelerator without requiring FPGA reconfiguration.
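As a software analogy only, the idea of a query-specific control block that configures a fixed set of kernels can be sketched as below. Nothing here models FPGA hardware or the paper's actual control block format; the kernel names, the `KERNELS` table, and `run_control_block` are hypothetical. The point is that the kernels stay fixed while a per-query list of parameters decides which are chained and how each behaves.

```python
import operator
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Comparison operators a predicate kernel can be configured with.
_COMPARATORS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def predicate_kernel(rows: List[Row], column: str, op: str, value: Any) -> List[Row]:
    """Keep rows whose column satisfies the configured comparison."""
    compare = _COMPARATORS[op]
    return [r for r in rows if compare(r[column], value)]


def projection_kernel(rows: List[Row], columns: List[str]) -> List[Row]:
    """Keep only the listed columns, shrinking the data for later kernels."""
    return [{c: r[c] for c in columns} for r in rows]


def sort_kernel(rows: List[Row], column: str, descending: bool = False) -> List[Row]:
    """Order rows by one column."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


# The fixed kernel library: never rebuilt, only parameterized per query.
KERNELS = {"predicate": predicate_kernel, "project": projection_kernel, "sort": sort_kernel}


def run_control_block(rows: List[Row], control_block: List[Tuple[str, Dict[str, Any]]]) -> List[Row]:
    """Chain kernels in the order given by the query-specific control block."""
    for kernel_name, params in control_block:
        rows = KERNELS[kernel_name](rows, **params)
    return rows
```

A query such as "ids and amounts of orders above 100, largest first" would then be expressed as a three-entry control block (predicate, project, sort) rather than as new kernel code.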
TL;DR: This work employs a cost-based optimizer to process spatial-keyword queries using a spatial index and a keyword index, and introduces five optimization techniques that efficiently reduce the search space and produce a query plan with low cost.
Abstract: Important cloud services rely on spatial-keyword queries, containing a spatial predicate and arbitrary boolean keyword queries. In particular, we study the processing of such queries in main memory to support short response times. In contrast, current state-of-the-art spatial-keyword indexes and relational engines are designed under different assumptions. Rather than building a new spatial-keyword index, we employ a cost-based optimizer to process these queries using a spatial index and a keyword index. We address several technical challenges to achieve this goal. We introduce three operators as the building blocks to construct plans for main memory query processing. We then develop a cost model for the operators and query plans. We introduce five optimization techniques that efficiently reduce the search space and produce a query plan with low cost. The optimization techniques are computationally efficient, and they identify a query plan with a formal approximation guarantee under the common independence assumption. Furthermore, we extend the framework to exploit interesting orders. We implement the query optimizer to empirically validate our proposed approach using real-life datasets. The evaluation shows that the optimizations provide a significant reduction in the average and tail latency of query processing: a 7- to 11-fold reduction over using a single index in terms of 99th percentile response time. In addition, this approach outperforms existing spatial-keyword indexes and DBMS query optimizers for both average and high-percentile response times.
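The independence assumption mentioned above can be made concrete with a small sketch. This is not the paper's cost model or its operators: the toy cost (objects fetched from the driving index) and the names `selectivity` and `choose_plan` are assumptions for illustration. Under independence, the selectivity of `a AND b` is the product of the two selectivities, and that of `a OR b` is `a + b - a*b`.

```python
from typing import Dict, Tuple, Union

# A boolean keyword expression: a keyword, or ("and" | "or", left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def selectivity(expr: Expr, keyword_sel: Dict[str, float]) -> float:
    """Estimate the fraction of objects matching expr, assuming independent keywords."""
    if isinstance(expr, str):
        return keyword_sel[expr]
    op, left, right = expr
    a = selectivity(left, keyword_sel)
    b = selectivity(right, keyword_sel)
    if op == "and":
        return a * b
    if op == "or":
        return a + b - a * b
    raise ValueError(f"unknown operator: {op}")


def choose_plan(
    n_objects: int,
    spatial_sel: float,
    expr: Expr,
    keyword_sel: Dict[str, float],
) -> Tuple[str, float]:
    """Pick which index drives the plan under a toy cost model.

    Toy cost (an assumption): the number of objects fetched from the driving
    index, each of which is then checked against the other predicate.
    """
    costs = {
        "spatial_index_then_keyword_filter": n_objects * spatial_sel,
        "keyword_index_then_spatial_filter": n_objects * selectivity(expr, keyword_sel),
    }
    plan = min(costs, key=costs.get)
    return plan, costs[plan]
```

With a highly selective spatial predicate the sketch drives the plan from the spatial index, and with a rare keyword combination it switches to the keyword index, which is the kind of per-query choice a cost-based optimizer makes.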