Top 802 papers published in the topic of Web query classification in 2014

Showing papers on "Web query classification" published in 2014

Proceedings Article•10.1109/ICDE.2014.6816690•

Secure k-nearest neighbor query over encrypted data in outsourced environments

Yousef Elmehdwi¹, Bharath K. Samanthula¹, Wei Jiang¹•Institutions (1)

Missouri University of Science and Technology¹

19 May 2014

TL;DR: The authors propose a secure kNN protocol that protects the confidentiality of the data, the user's input query, and data access patterns, and empirically analyze the efficiency of their protocols through various experiments.

Abstract: For the past decade, query processing on relational data has been studied extensively, and many theoretical and practical solutions to query processing have been proposed under various scenarios. With the recent popularity of cloud computing, users now have the opportunity to outsource their data as well as the data management tasks to the cloud. However, due to the rise of various privacy issues, sensitive data (e.g., medical records) need to be encrypted before outsourcing to the cloud. In addition, query processing tasks should be handled by the cloud; otherwise, there would be no point in outsourcing the data in the first place. To process queries over encrypted data without the cloud ever decrypting the data is a very challenging task. In this paper, we focus on solving the k-nearest neighbor (kNN) query problem over an encrypted database outsourced to a cloud: a user issues an encrypted query record to the cloud, and the cloud returns the k closest records to the user. We first present a basic scheme and demonstrate that such a naive solution is not secure. To provide better security, we propose a secure kNN protocol that protects the confidentiality of the data, user's input query, and data access patterns. Also, we empirically analyze the efficiency of our protocols through various experiments. These results indicate that our secure protocol is very efficient on the user end, and this lightweight scheme allows a user to use any mobile device to perform the kNN query.

385 citations
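
For orientation, the plaintext functionality that the kNN query computes (return the k records closest to a query record) can be sketched as below. This is only an unencrypted baseline under stated assumptions: the function name `knn_plaintext`, the tuple representation of records, and the use of squared Euclidean distance are illustrative choices, and none of the paper's encryption or access-pattern protection is reproduced here.

```python
import heapq

def knn_plaintext(records, query, k):
    """Return the k records closest to `query` (plaintext baseline, no encryption).

    Records and the query are assumed to be equal-length numeric tuples;
    squared Euclidean distance is an assumed metric for illustration.
    """
    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(record, query))

    # nsmallest returns the k best records ordered from nearest to farthest.
    return heapq.nsmallest(k, records, key=squared_distance)

nearest = knn_plaintext([(0, 0), (5, 5), (1, 1), (9, 9)], (0, 0), 2)
```

A secure protocol has to produce the same answer while the server sees only ciphertexts, which is what makes the problem hard.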

Proceedings Article•10.1109/SP.2014.30•

Blind Seer: A Scalable Private DBMS

Vasilis Pappas¹, Fernando Krell¹, Binh Vo¹, Vladimir Kolesnikov¹, Tal Malkin¹, Seung Geol Choi¹, Wesley George², Angelos D. Keromytis³, Steve Bellovin⁴•Institutions (4)

Columbia University¹, Bell Labs², United States Naval Academy³, University of Toronto⁴

18 May 2014

TL;DR: This work addresses a major open problem in private DB: efficient sublinear search for arbitrary Boolean queries. It allows leakage of some search pattern information, but protects the query and data and provides a high level of privacy for individual terms in the executed search formula.

Abstract: Query privacy in secure DBMS is an important feature, although rarely formally considered outside the theoretical community. Because of the high overheads of guaranteeing privacy in complex queries, almost all previous works addressing practical applications consider limited queries (e.g., just keyword search), or provide a weak guarantee of privacy. In this work, we address a major open problem in private DB: efficient sublinear search for arbitrary Boolean queries. We consider scalable DBMS with provable security for all parties, including protection of the data from both server (who stores encrypted data) and client (who searches it), as well as protection of the query, and access control for the query. We design, build, and evaluate the performance of a rich DBMS system, suitable for real-world deployment on today's medium- to large-scale DBs. On a modern server, we are able to query a formula over 10TB, 100M-record DB, with 70 searchable index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower than MySQL, although some queries are costlier). We support a rich query set, including searching on arbitrary boolean formulas on keywords and ranges, support for stemming, and free keyword searches over text fields. We identify and permit a reasonable and controlled amount of leakage, proving that no further leakage is possible. In particular, we allow leakage of some search pattern information, but protect the query and data, provide a high level of privacy for individual terms in the executed search formula, and hide the difference between a query that returned no results and a query that returned a very small result set. We also support private and complex access policies, integrated in the search process so that a query with empty result set and a query that fails the policy are hard to tell apart.

322 citations

Book•

Semantic Matching in Search

Hang Li¹, Jun Xu¹•Institutions (1)

Huawei¹

20 Jun 2014

TL;DR: This survey gives a systematic and detailed introduction to newly developed machine learning technologies for query document matching (semantic matching) in search, particularly web search, and focuses on the fundamental problems, as well as the state-of-the-art solutions.

Abstract: Relevance is the most important factor to assure users' satisfaction in search and the success of a search engine heavily depends on its performance on relevance. It has been observed that most of the dissatisfaction cases in relevance are due to term mismatch between queries and documents (e.g., query "NY times" does not match well with a document only containing "New York Times"), because term matching, i.e., the bag-of-words approach, still functions as the main mechanism of modern search engines. It is not exaggerated to say, therefore, that mismatch between query and document poses the most critical challenge in search. Ideally, one would like to see query and document match with each other, if they are topically relevant. Recently, researchers have expended significant effort to address the problem. The major approach is to conduct semantic matching, i.e., to perform more query and document understanding to represent the meanings of them, and perform better matching between the enriched query and document representations. With the availability of large amounts of log data and advanced machine learning techniques, this becomes more feasible and significant progress has been made recently. This survey gives a systematic and detailed introduction to newly developed machine learning technologies for query document matching (semantic matching) in search, particularly web search. It focuses on the fundamental problems, as well as the state-of-the-art solutions of query document matching on form aspect, phrase aspect, word sense aspect, topic aspect, and structure aspect. The ideas and solutions explained may motivate industrial practitioners to turn the research results into products. The methods introduced and the discussions made may also stimulate academic researchers to find new research directions and approaches. 
Matching between query and document is not limited to search and similar problems can be found in question answering, online advertising, cross-language information retrieval, machine translation, recommender systems, link prediction, image annotation, drug design, and other applications, as the general task of matching between objects from two different spaces. The technologies introduced can be generalized into more general machine learning techniques, which is referred to as learning to match in this survey.

211 citations
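
The term-mismatch problem described in the abstract can be made concrete with a minimal sketch of pure bag-of-words matching. The function name `term_overlap` and the choice of Jaccard overlap on whitespace tokens are illustrative assumptions, not a method taken from the survey.

```python
def term_overlap(query: str, document: str) -> float:
    """Jaccard overlap between the lowercase token sets of two strings."""
    query_terms = set(query.lower().split())
    document_terms = set(document.lower().split())
    if not query_terms or not document_terms:
        return 0.0
    shared = query_terms & document_terms
    return len(shared) / len(query_terms | document_terms)

# The abstract's example: same newspaper, different surface terms.
score = term_overlap("NY times", "New York Times")
```

Only "times" is shared between the two strings, so exact term matching scores the pair low even though both refer to the same newspaper; semantic matching aims to close that gap.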

Proceedings Article•10.1145/2588555.2593667•

Knowing when you're wrong: building fast and reliable approximate query processing systems

Sameer Agarwal¹, Henry Milner¹, Ariel Kleiner¹, Ameet Talwalkar¹, Michael I. Jordan¹, Samuel Madden², Barzan Mozafari³, Ion Stoica¹•Institutions (3)

University of California, Berkeley¹, Massachusetts Institute of Technology², University of Michigan³

18 Jun 2014

TL;DR: This paper finds that the error bar estimation techniques used in sampling-based approximate query processing often fail on real-world production workloads, shows that such failures can be diagnosed quickly and accurately, and presents a query approximation pipeline that produces approximate answers and reliable error bars at interactive speeds.

Abstract: Modern data analytics applications typically process massive amounts of data on clusters of tens, hundreds, or thousands of machines to support near-real-time decisions. The quantity of data and limitations of disk and memory bandwidth often make it infeasible to deliver answers at interactive speeds. However, it has been widely observed that many applications can tolerate some degree of inaccuracy. This is especially true for exploratory queries on data, where users are satisfied with "close-enough" answers if they can come quickly. A popular technique for speeding up queries at the cost of accuracy is to execute each query on a sample of data, rather than the whole dataset. To ensure that the returned result is not too inaccurate, past work on approximate query processing has used statistical techniques to estimate "error bars" on returned results. However, existing work in the sampling-based approximate query processing (S-AQP) community has not validated whether these techniques actually generate accurate error bars for real query workloads. In fact, we find that error bar estimation often fails on real world production workloads. Fortunately, it is possible to quickly and accurately diagnose the failure of error estimation for a query. In this paper, we show that it is possible to implement a query approximation pipeline that produces approximate answers and reliable error bars at interactive speeds.

204 citations
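
The sampling idea in the abstract (run the query on a sample, then attach an error bar) can be sketched for a simple AVG query as follows. This is a minimal illustration under stated assumptions: the function name `approx_mean`, the uniform sample without replacement, and the closed-form normal-approximation error bar are choices made here, and the paper's own pipeline and its diagnostic for failed error estimation are not reproduced.

```python
import math
import random
import statistics

def approx_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform sample.

    Returns (estimate, half_width), where half_width is a normal-approximation
    error bar (z = 1.96 corresponds to roughly 95% confidence).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * standard_error

estimate, half_width = approx_mean(list(range(1000)), sample_size=100)
```

An error bar of this kind rests on distributional assumptions that may not hold for a given query and dataset, which is the failure mode the paper reports on production workloads.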

Journal Article•10.1007/S00778-013-0331-0•

Personalized trajectory matching in spatial networks

Shuo Shang¹, Ruogu Ding², Kai Zheng³, Christian S. Jensen⁴, Panos Kalnis², Xiaofang Zhou³•Institutions (4)

China University of Petroleum¹, King Abdullah University of Science and Technology², University of Queensland³, Aalborg University⁴

1 Jun 2014

TL;DR: A novel two-phase search algorithm is proposed that carefully selects a set of expansion centers from the query trajectory and exploits upper and lower bounds to prune the search space in the spatial and temporal domains.

Abstract: With the increasing availability of moving-object tracking data, trajectory search and matching is increasingly important. We propose and investigate a novel problem called personalized trajectory matching (PTM). In contrast to conventional trajectory similarity search by spatial distance only, PTM takes into account the significance of each sample point in a query trajectory. A PTM query takes a trajectory with user-specified weights for each sample point in the trajectory as its argument. It returns the trajectory in an argument data set with the highest similarity to the query trajectory. We believe that this type of query may bring significant benefits to users in many popular applications such as route planning, carpooling, friend recommendation, traffic analysis, urban computing, and location-based services in general. PTM query processing faces two challenges: how to prune the search space during the query processing and how to schedule multiple so-called expansion centers effectively. To address these challenges, a novel two-phase search algorithm is proposed that carefully selects a set of expansion centers from the query trajectory and exploits upper and lower bounds to prune the search space in the spatial and temporal domains. An efficiency study reveals that the algorithm explores the minimum search space in both domains. Second, a heuristic search strategy based on priority ranking is developed to schedule the multiple expansion centers, which can further prune the search space and enhance the query efficiency. The performance of the PTM query is studied in extensive experiments based on real and synthetic trajectory data sets.

179 citations

Journal Article•10.1145/2638546•

Query Rewriting and Optimization for Ontological Databases

Georg Gottlob¹, Giorgio Orsi¹, Andreas Pieris¹•Institutions (1)

University of Oxford¹

07 Oct 2014-ACM Transactions on Database Systems

TL;DR: The authors present a query rewriting algorithm for rather general types of ontological constraints, showing how a conjunctive query against a knowledge base expressed using linear and sticky existential rules (members of the recently introduced Datalog± family of ontology languages) can be compiled into a union of conjunctive queries (UCQ) against the underlying database.

Abstract: Ontological queries are evaluated against a knowledge base consisting of an extensional database and an ontology (i.e., a set of logical assertions and constraints that derive new intensional knowledge from the extensional database), rather than directly on the extensional database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this article, we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent first-order query against the underlying extensional database. We present a novel query rewriting algorithm for rather general types of ontological constraints that is well suited for practical implementations. In particular, we show how a conjunctive query against a knowledge base, expressed using linear and sticky existential rules, that is, members of the recently introduced Datalog± family of ontology languages, can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this rewriting process so as to produce possibly small and cost-effective UCQ rewritings for an input query.

104 citations

Journal Article•10.14778/2732977.2732986•

Adaptive query processing on RAW data

Manos Karpathiotakis¹, Miguel Branco¹, Ioannis Alagiannis¹, Anastasia Ailamaki¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

1 Aug 2014

TL;DR: RAW is a prototype query engine that enables querying heterogeneous data sources transparently. It employs Just-In-Time access paths, which efficiently couple heterogeneous raw files to the query engine and reduce the overheads of traditional general-purpose scan operators.

Abstract: Database systems deliver impressive performance for large classes of workloads as the result of decades of research into optimizing database engines. High performance, however, is achieved at the cost of versatility. In particular, database systems only operate efficiently over loaded data, i.e., data converted from its original raw format into the system's internal data format. At the same time, data volume continues to increase exponentially and data varies increasingly, with an escalating number of new formats. The consequence is a growing impedance mismatch between the original structures holding the data in the raw files and the structures used by query engines for efficient processing. In an ideal scenario, the query engine would seamlessly adapt itself to the data and ensure efficient query processing regardless of the input data formats, optimizing itself to each instance of a file and of a query by leveraging information available at query time. Today's systems, however, force data to adapt to the query engine during data loading. This paper proposes adapting the query engine to the formats of raw data. It presents RAW, a prototype query engine which enables querying heterogeneous data sources transparently. RAW employs Just-In-Time access paths, which efficiently couple heterogeneous raw files to the query engine and reduce the overheads of traditional general-purpose scan operators. There are, however, inherent overheads with accessing raw data directly that cannot be eliminated, such as converting the raw values. Therefore, RAW also uses column shreds, ensuring that we pay these costs only for the subsets of raw data strictly needed by a query. We use RAW in a real-world scenario and achieve a two-orders-of-magnitude speedup against the existing hand-written solution.

96 citations

Patent•

Systems and methods for searching for a media asset

Sashikumar Venkataraman, Ahmed Nizam Mohaideen Pathurudeen

30 Sep 2014

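TL;DR: The patent describes a system that identifies media assets related to a user's first search query, determines whether any of those assets is related to a following second search query, and asks the user to repeat the second query when fewer than a threshold number of the assets are related to it.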

Abstract: Systems and methods for searching for a media asset are described. In some aspects, the system includes control circuitry that receives a first search query from a user. The control circuitry identifies media assets related to the first search query from a content database. The control circuitry receives a second search query following the first search query. The control circuitry determines whether a media asset from the media assets is related to the second search query. In response to determining that less than a threshold number of media assets from the media assets are related to the second search query, the control circuitry transmits an instruction requesting the user to repeat the second search query. The control circuitry receives a third search query related to the first search query. The control circuitry determines a media asset from the media assets that is related to the third search query.

84 citations

Proceedings Article•

Exploiting the query structure for efficient join ordering in SPARQL queries

Andrey Gubichev¹, Thomas Neumann¹•Institutions (1)

Technische Universität München¹

1 Jan 2014

TL;DR: This paper introduces a new join ordering algorithm that performs a SPARQL-tailored query simplification and presents a novel RDF statistical synopsis that accurately estimates cardinalities in large SPARQL queries.

Abstract: The join ordering problem is a fundamental challenge that has to be solved by any query optimizer. Since the high-performance RDF systems are often implemented as triple stores (i.e., they represent RDF data as a single table with three attributes, at least conceptually), the query optimization strategies employed by such systems are often adopted from relational query optimization. In this paper we show that the techniques borrowed from traditional SQL query optimization (such as Dynamic Programming algorithm or greedy heuristics) are not immediately capable of handling large SPARQL queries. We introduce a new join ordering algorithm that performs a SPARQL-tailored query simplification. Furthermore, we present a novel RDF statistical synopsis that accurately estimates cardinalities in large SPARQL queries. Our experiments show that this algorithm is highly superior to the state-of-the-art SPARQL optimization approaches, including the RDF-3X’s original Dynamic Programming strategy.

83 citations

Patent•

Method and system of scoring documents based on attributes obtained from a digital document by eye-tracking data analysis

Richard Ross Peters¹, Amit Karmarkar¹•Institutions (1)

AMIT¹

27 Jan 2014

TL;DR: A set of attributes is derived from an element of a digital document, the element being identified from eye-tracking data of a user viewing the document; documents matching a search query are then scored and sorted by attribute score.

Abstract: In one exemplary embodiment, a set of attributes derived from an element of a first digital document is obtained. The element is identified from eye-tracking data of a user viewing the digital document. A search query of a database comprising at least one query term is received. A set of documents in the database is identified according to the search query. An attribute score is determined for each document. The set of documents is sorted according to the attribute score. Optionally, a commonality between the query term and at least one member of the set of attributes may be determined. The search query may be generated by the user. The database may be a hypermedia database.

81 citations

Proceedings Article•10.1145/2600428.2611177•

STICS: searching with strings, things, and cats

Johannes Hoffart¹, Dragan Milchevski¹, Gerhard Weikum¹•Institutions (1)

Max Planck Society¹

3 Jul 2014

TL;DR: An advanced search engine that supports users in querying documents by means of keywords, entities, and categories. Typed words are automatically mapped onto appropriate suggestions for entities and categories, and documents are returned based on named-entity disambiguation.

Abstract: This paper describes an advanced search engine that supports users in querying documents by means of keywords, entities, and categories. Users simply type words, which are automatically mapped onto appropriate suggestions for entities and categories. Based on named-entity disambiguation, the search engine returns documents containing the query's entities and prominent entities from the query's categories.

Proceedings Article•10.1145/2623330.2623679•

Identifying and labeling search tasks via query-based hawkes processes

Liangda Li¹, Hongbo Deng², Anlei Dong², Yi Chang², Hongyuan Zha¹•Institutions (2)

Georgia Institute of Technology¹, Yahoo!²

24 Aug 2014

TL;DR: A probabilistic method for identifying and labeling search tasks based on the following intuitive observations: queries that are issued temporally close by users in many sequences of queries are likely to belong to the same search task, meanwhile, different users having the same information needs tend to submit topically coherent search queries.

Abstract: We consider a search task as a set of queries that serve the same user information need. Analyzing search tasks from user query streams plays an important role in building a set of modern tools to improve search engine performance. In this paper, we propose a probabilistic method for identifying and labeling search tasks based on the following intuitive observations: queries that are issued temporally close by users in many sequences of queries are likely to belong to the same search task, meanwhile, different users having the same information needs tend to submit topically coherent search queries. To capture the above intuitions, we directly model query temporal patterns using a special class of point processes called Hawkes processes, and combine topic models with Hawkes processes for simultaneously identifying and labeling search tasks. Essentially, Hawkes processes utilize their self-exciting properties to identify search tasks if influence exists among a sequence of queries for individual users, while the topic model exploits query co-occurrence across different users to discover the latent information needed for labeling search tasks. More importantly, there is mutual reinforcement between Hawkes processes and the topic model in the unified model that enhances the performance of both. We evaluate our method based on both synthetic data and real-world query log data. In addition, we also apply our model to query clustering and search task identification. By comparing with state-of-the-art methods, the results demonstrate that the improvement in our proposed approach is consistent and promising.
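
The self-exciting property the abstract refers to can be illustrated with the conditional intensity of a univariate Hawkes process. The sketch below assumes the common exponential decay kernel and the hypothetical parameter names `mu` (base rate), `alpha` (excitation per past event), and `beta` (decay rate); the paper's actual kernel, its coupling with the topic model, and its fitting procedure are not reproduced.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Intensity at time t: mu + sum of alpha * exp(-beta * (t - t_i)) over past events.

    Only events strictly before t contribute, so each past query raises the
    expected rate of further queries, and that boost fades as time passes.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in event_times
        if t_i < t
    )
    return mu + excitation

# One query at time 0.0 raises the intensity observed at time 1.0 above the base rate.
rate = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=1.0, beta=1.0)
```

Under this reading, queries issued close together in time excite one another strongly, which matches the abstract's intuition that temporally close queries tend to belong to the same search task.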

Patent•

Method and system for evaluating query suggestions quality

Alyssa Glass¹, Anlei Dong¹, Ted Eiche¹•Institutions (1)

Yahoo!¹

30 Apr 2014

TL;DR: A plurality of query suggestions is provided in a ranking to a user, and a quality measure of the suggestions is calculated based on the user's activity with respect to one of the suggestions and the position of that suggestion in the ranking.

Abstract: Methods, systems and programming for evaluating query suggestions quality. In one example, a plurality of query suggestions are provided in a ranking to a user. A user activity with respect to one of the plurality of query suggestions is detected. A position of the one of the plurality of query suggestions in the ranking is determined. A quality measure of the plurality of query suggestions is calculated based, at least in part, on the user activity and the position of the one of the plurality of query suggestions.

Journal Article•10.1007/S13740-012-0017-6•

Query Extensions and Incremental Query Rewriting for OWL 2 QL Ontologies

Tassos Venetis¹, Giorgos Stoilos¹, Giorgos Stamou¹•Institutions (1)

National Technical University of Athens¹

01 Mar 2014-Journal on Data Semantics

TL;DR: This paper studies the problem of computing the rewriting of an extended query by ‘extending’ a previously computed rewriting of the initial query and avoiding recomputation, and implies a novel algorithm for computing the rewrite of a fixed query.

Abstract: Query rewriting over lightweight ontologies, like DL-Lite ontologies, is a prominent approach for ontology-based data access. It is often the case in realistic scenarios that users ask an initial query which they later refine, e.g., by extending it with new constraints making their initial request more precise. So far, all DL-Lite systems would need to process the new query from scratch. In this paper, we study the problem of computing the rewriting of an extended query by ‘extending’ a previously computed rewriting of the initial query and avoiding recomputation. Interestingly, our approach also implies a novel algorithm for computing the rewriting of a fixed query. More precisely, the query can be ‘decomposed’ into its atoms and then each atom can be processed incrementally. We present detailed algorithms, several optimisations for improving the performance of our query rewriting algorithm, and finally, an experimental evaluation.

Journal Article•10.1016/J.IPM.2013.09.002•

The use of query suggestions during information search

Xi Niu¹, Diane Kelly²•Institutions (2)

Indiana University¹, University of North Carolina at Chapel Hill²

01 Jan 2014-Information Processing and Management

TL;DR: Investigating how and when people integrate query suggestions into their searches and the outcome of this usage shows that query suggestion can provide support in situations where people have less search expertise, greater difficulty searching and at specific times during the search.

Abstract: Query suggestion is a common feature of many information search systems. While much research has been conducted about how to generate suggestions, fewer studies have been conducted about how people interact with and use suggestions. The purpose of this paper is to investigate how and when people integrate query suggestions into their searches and the outcome of this usage. The paper further investigates the relationships between search expertise, topic difficulty, and temporal segment of the search and query suggestion usage. A secondary analysis of data was conducted using data collected in a previous controlled laboratory study. In this previous study, 23 undergraduate research participants used an experimental search system with query suggestions to conduct four topic searches. Results showed that participants integrated the suggestions into their searching fairly quickly and that participants with less search expertise used more suggestions and saved more documents. Participants also used more suggestions towards the end of their searches and when searching for more difficult topics. These results show that query suggestion can provide support in situations where people have less search expertise, greater difficulty searching and at specific times during the search.

Patent•

Blending search results on online social networks

Girish Kumar¹, Yuval Kesten¹, Xiao Li¹, Fabio Lopiano¹•Institutions (1)

Facebook¹

4 Apr 2014

TL;DR: A method receives from a first user of an online social network a search query input including one or more n-grams, generates a number of query commands based on the input, and searches one or more verticals to identify stored objects that match the query commands.

Abstract: In one embodiment, a method includes receiving from a first user of an online social network a search query input including one or more n-grams; generating a number of query commands based on the search query input; and searching one or more verticals to identify one or more objects stored by the vertical that match the query commands. Each vertical stores one or more objects associated with the online social network. The method also includes generating a number of search-result modules. Each search-result module corresponds to a query command of the number of query commands. Each search-result module includes references to one or more of the identified objects matching the query command corresponding to the search-result module. The method also includes scoring the search-result modules; and sending each search-result module having a score greater than a threshold score to the first user for display.

Patent•

Multi-language information retrieval and advertising

Andrey Vladislavovich Kurochkin¹, Ahmed Sobhi Mohamed Kamel¹, Sriram Parameswar¹•Institutions (1)

Microsoft¹

14 Mar 2014

TL;DR: A system for obtaining and presenting search results in a language that differs from the language in which a query is received, with the search results based on the search query and associated with at least one second language (or dialect).

Abstract: Systems, methods, and computer-readable storage media are provided for obtaining and presenting search results in a language that differs from the language in which a query is received. Upon receipt of a search query in a first language, at least one second language (or dialect) to which the search query is directed is determined and one or more search results are retrieved, the search results being based on the search query and associated with the at least one second language (or dialect). Further, embodiments of the present invention relate to generating advertisements including embedded links to landing pages that have been translated into one or more languages (or dialects) associated with a target market. In this way, advertisers are able to more successfully advertise to individuals whose primary language or dialect differs from that of the website and/or the advertiser.

Patent•

Search intent for queries on online social networks

Rajat Raina¹, Kedar Dhamdhere¹, Olivier Chatot¹•Institutions (1)

Facebook¹

30 Apr 2014

TL;DR: A method receives a structured query comprising references to one or more selected objects, generates search results corresponding to the query, determines search intents by checking whether the referenced objects match objects indexed in a pattern-detection model, and scores the search results based on those intents.

Abstract: In one embodiment, a method includes receiving, from a client system of a first user, a structured query comprising references to one or more selected objects accessible by the computing device, generating one or more search results corresponding to the structured query, wherein each search result corresponds to a particular object accessible by the computing device, determining one or more search intents based at least on whether one or more of the selected objects referenced in the structured query match objects corresponding to a search intent indexed in a pattern-detection model, and scoring the search results based on one or more of the search intents.

Patent•

Blending by Query Classification on Online Social Networks

Necip Fazil Ayan¹, Maxime Boucher¹, Xiao Li¹, Alexander Perelygin¹•Institutions (1)

Facebook¹

27 Aug 2014

TL;DR: A method receives a search query from a first user, identifies nodes that match it, and determines the query's search intents, which may be based on topics associated with the identified nodes and on their node types; search results are then generated based on those intents.

Abstract: In one embodiment, a method includes receiving a search query from a first user and identifying one or more second nodes that match the search query. The method includes determining one or more search intents of the search query. Search intent may be based on one or more topics associated with the identified nodes and one or more node-types of the identified nodes. The method includes generating one or more search results corresponding to the search query, the search-results being generated based on the determined search intents. The method includes sending a search-results page to the client system of the first user for display. The search-results page may include one or more of the generated search results.

Proceedings Article•10.1145/2594538.2594551•

On scale independence for querying big data

Wenfei Fan¹, Floris Geerts², Leonid Libkin¹•Institutions (2)

University of Edinburgh¹, University of Antwerp²

18 Jun 2014

TL;DR: This paper defines what it means to be scale-independent, and provides matching upper and lower bounds for checking scale independence, for queries in various languages, and for combined and data complexity.

Abstract: To make query answering feasible in big datasets, practitioners have been looking into the notion of scale independence of queries. Intuitively, such queries require only a relatively small subset of the data, whose size is determined by the query and access methods rather than the size of the dataset itself. This paper aims to formalize this notion and study its properties. We start by defining what it means to be scale-independent, and provide matching upper and lower bounds for checking scale independence, for queries in various languages, and for combined and data complexity. Since the complexity turns out to be rather high, and since scale-independent queries cannot be captured syntactically, we develop sufficient conditions for scale independence. We formulate them based on access schemas, which combine indexing and constraints together with bounds on the sizes of retrieved data sets. We then study two variations of scale-independent query answering, inspired by existing practical systems. One concerns incremental query answering: we check when query answers can be maintained in response to updates scale-independently. The other explores scale-independent query rewriting using views.

Proceedings Article•10.5220/0005170305300537•

Combining N-gram based Similarity Analysis with Sentiment Analysis in Web Content Classification

Shuhua Liu¹, Thomas Forss¹•Institutions (1)

Arcada University of Applied Sciences¹

21 Oct 2014

TL;DR: This research concerns the development of web content detection systems that will be able to automatically classify any web page into pre-defined content categories, and makes use of tf-idf weighted n-grams in building the content classification models.

Abstract: This research concerns the development of web content detection systems that will be able to automatically classify any web page into pre-defined content categories. Our work is motivated by practical experience and observations that certain categories of web pages, such as those that contain hatred and violence, are much harder to classify with good accuracy when both content and structural features are already taken into account. To further improve the performance of detection systems, we bring web sentiment features into classification models. In addition, we incorporate n-gram representation into our classification approach, based on the assumption that n-grams can capture more local context information in text, and thus could help to enhance topic similarity analysis. Different from most studies that only consider presence or frequency count of n-grams in their applications, we make use of tf-idf weighted n-grams in building the content classification models. Our result shows that unigram based models, even though a much simpler approach, show their unique value and effectiveness in web content classification. Higher order n-gram based approaches, especially 5-gram based models that combine topic similarity features with sentiment features, bring significant improvement in precision levels for the Violence and two Racism related web categories.
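
The tf-idf weighted n-gram representation mentioned in this abstract can be illustrated with a minimal sketch. The function names, the whitespace tokenizer, and the particular weighting (relative term frequency times log(N / df)) are assumptions chosen for illustration; the abstract does not state the exact formula or the classifier used.

```python
import math
from collections import Counter

def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_ngrams(docs, n):
    """Map each document to {ngram: weight}.

    Assumed weighting: tf = count / total n-grams in the document,
    idf = log(number of documents / document frequency).
    """
    grams_per_doc = [Counter(word_ngrams(d, n)) for d in docs]
    df = Counter()
    for grams in grams_per_doc:
        df.update(grams.keys())  # each n-gram counted once per document
    n_docs = len(docs)
    weighted = []
    for grams in grams_per_doc:
        total = sum(grams.values())
        weighted.append({
            g: (c / total) * math.log(n_docs / df[g])
            for g, c in grams.items()
        })
    return weighted

def cosine(a, b):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With such vectors, a page could be compared against per-category reference vectors by cosine similarity; n-grams shared by many documents receive lower weights than rare ones.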

Proceedings Article•10.1109/ICDE.2014.6816678•

History-aware query optimization with materialized intermediate views

Luis Perez¹, Chris Jermaine¹•Institutions (1)

Rice University¹

19 May 2014

TL;DR: An architecture called Hawc is introduced that extends a cost-based logical optimizer with the capability to use history information to identify query plans that, if executed, produce intermediate result sets that can be used to create materialized views with the potential to reduce the execution time of future queries.

Abstract: The use of materialized views derived from the intermediate results of frequently executed queries is a popular strategy for improving performance in query workloads. Optimizers capable of matching such views with inbound queries can generate alternative execution plans that read the materialized contents directly instead of re-computing the corresponding subqueries, which tends to result in reduced query execution times. In this paper, we introduce an architecture called Hawc that extends a cost-based logical optimizer with the capability to use history information to identify query plans that, if executed, produce intermediate result sets that can be used to create materialized views with the potential to reduce the execution time of future queries. We present techniques for using knowledge of past queries to assist the query optimizer and match, generate and select useful materialized views. Experimental results indicate that these techniques provide substantial improvements in workload execution time.

Book•

Society of the Query Reader: Reflections on Web Search

René König, Miriam Rasch

22 Apr 2014

Patent•

Rules-Based Generation of Search Results

Liron Shapira, Michael Harris, Jonathan Ben-Tzur

10 Dec 2014

TL;DR: A method determines one or more query parses based on the search query and a knowledge base, where each query parse indicates one or more entity types and each entity type corresponds to a query term or a combination of query terms in the query.

Abstract: A method including receiving a search query containing one or more query terms from a remote device and determining one or more query parses based on the search query and a knowledge base. Each query parse indicates one or more entity types, wherein each entity type corresponds to a query term or a combination of query terms contained in the search query. The method further includes obtaining a set of app-specific rules, each app-specific rule respectively corresponding to a respective software application. The method further includes generating a set of unparameterized function identifiers based on the plurality of app-specific rules and the one or more query parses. For each of the set of unparameterized function identifiers, the method includes parameterizing the function identifier based on the query terms. The method further includes generating search results based on the parameterized function identifiers and transmitting the search results to the remote device.
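
The pipeline in this abstract (query parse, app-specific rules, unparameterized then parameterized function identifiers) can be sketched roughly as below. Everything concrete here is hypothetical: the knowledge base entries, the app names, the rule shape, and the identifier template syntax are invented for illustration and are not taken from the patent.

```python
# Hypothetical knowledge base mapping query terms to entity types.
KNOWLEDGE_BASE = {"thai": "cuisine", "seattle": "city"}

# Hypothetical app-specific rules: required entity types plus an
# unparameterized function identifier template for that application.
APP_RULES = {
    "food_app": (("cuisine", "city"), "food_app::search?cuisine={cuisine}&city={city}"),
    "maps_app": (("city",), "maps_app::show?city={city}"),
}

def parse_query(query, kb):
    """Return one query parse as {entity_type: query_term} for recognised terms."""
    return {kb[t]: t for t in query.lower().split() if t in kb}

def function_ids(query, kb, rules):
    """Select rules whose required entity types appear in the parse,
    then fill each template with the matching query terms."""
    parse = parse_query(query, kb)
    results = []
    for app, (required, template) in rules.items():
        if all(entity_type in parse for entity_type in required):
            results.append(template.format(**parse))
    return results
```

For example, `function_ids("thai seattle", KNOWLEDGE_BASE, APP_RULES)` yields one parameterized identifier per matching rule, while a query lacking a required entity type matches no rule.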

Patent•

Supplementing a high performance analytics store with evaluation of individual events to respond to an event query

David Ryan Marquardt, Stephen Phillip Sorkin, Steve Yu Zhang

31 Jan 2014

TL;DR: Queries directed towards summarizing and reporting on event records are received at a search head, which is associated with one or more indexers containing the event records.

Abstract: Embodiments are directed towards the transparent summarization of events. Queries directed towards summarizing and reporting on event records may be received at a search head. Search heads may be associated with one or more indexers containing event records. The search head may forward the query to the indexers that can resolve the query for concurrent execution. If a query is a collection query, indexers may generate summarization information based on event records located on the indexers. Event record fields included in the summarization information may be determined based on terms included in the collection query. If a query is a stats query, each indexer may generate a partial result set from previously generated summarization information, returning the partial result sets to the search head. Collection queries may be saved and scheduled to run and periodically update the summarization information.
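
The division of work between collection queries, stats queries, and the search head can be sketched as follows. This is a simplified illustration under assumptions: events are plain dictionaries, the summarization information is a per-field value count, and all function names are hypothetical rather than taken from the patent.

```python
from collections import Counter

def collect_summary(events, fields):
    """Collection query on one indexer: pre-count the values of the named fields."""
    summary = {f: Counter() for f in fields}
    for event in events:
        for f in fields:
            if f in event:
                summary[f][event[f]] += 1
    return summary

def partial_stats(summary, field):
    """Stats query on one indexer: answer from the stored summary, not the raw events."""
    return summary.get(field, Counter())

def merge_partials(partials):
    """Search head: combine the partial result sets returned by the indexers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because each indexer answers the stats query from its own summary, the search head only has to add up small partial results instead of scanning event records.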

Proceedings Article•10.1145/2588555.2610531•

Dynamically optimizing queries over large scale data platforms

Konstantinos Karanasos¹, Andrey Balmin, Marcel Kutsch², Fatma Ozcan¹, Vuk Ercegovac³, Chunyang Xia¹, Jesse E. Jackson¹•Institutions (3)

IBM¹, Apple Inc.², Google³

18 Jun 2014

TL;DR: This paper proposes new techniques that take into account UDFs and correlations between relations for optimizing queries running on large scale clusters, and produces plans that are at least as good as, and up to 2x (4x) better for Jaql (Hive) than, the best hand-written left-deep query plans.

Abstract: Enterprises are adapting large-scale data processing platforms, such as Hadoop, to gain actionable insights from their "big data". Query optimization is still an open challenge in this environment due to the volume and heterogeneity of data, comprising both structured and un/semi-structured datasets. Moreover, it has become common practice to push business logic close to the data via user-defined functions (UDFs), which are usually opaque to the optimizer, further complicating cost-based optimization. As a result, classical relational query optimization techniques do not fit well in this setting, while at the same time, suboptimal query plans can be disastrous with large datasets. In this paper, we propose new techniques that take into account UDFs and correlations between relations for optimizing queries running on large scale clusters. We introduce "pilot runs", which execute part of the query over a sample of the data to estimate selectivities, and employ a cost-based optimizer that uses these selectivities to choose an initial query plan. Then, we follow a dynamic optimization approach, in which plans evolve as parts of the queries get executed. Our experimental results show that our techniques produce plans that are at least as good as, and up to 2x (4x) better for Jaql (Hive) than, the best hand-written left-deep query plans.
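
The "pilot run" idea in this abstract, executing part of a query over a sample to estimate selectivities of opaque predicates, can be illustrated with a small sketch. It is a stand-in, not the paper's optimizer: the function names are hypothetical, and ordering simple filters by estimated selectivity is used here only as the simplest decision such estimates could feed.

```python
import random

def pilot_run_selectivity(rows, predicate, sample_size, seed=0):
    """Estimate the fraction of rows that pass an opaque predicate (e.g. a UDF)
    by evaluating it on a random sample only."""
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    if not sample:
        return 0.0
    return sum(1 for r in sample if predicate(r)) / len(sample)

def order_filters_by_selectivity(rows, predicates, sample_size):
    """Return predicate names ordered from most to least selective
    (lowest estimated pass fraction first), plus the estimates."""
    estimates = {
        name: pilot_run_selectivity(rows, pred, sample_size)
        for name, pred in predicates.items()
    }
    return sorted(estimates, key=estimates.get), estimates
```

A cost-based optimizer could use such estimates to pick an initial plan and then revise it as real intermediate results become available, which is the dynamic part the abstract describes.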

Patent•

Priming Search Results on Online Social Networks

Craig S. Campbell¹, Guarav Kulkarni¹•Institutions (1)

Facebook¹

29 Aug 2014

TL;DR: A method determines one or more predicted queries based on a partial query input, generates search results for each predicted query, and sends them to the client device for caching so they can be retrieved when the completed query is received.

Abstract: In one embodiment, a method includes receiving from a client device of a first user of an online social network a partial query input including a first character string. The method may determine one or more predicted queries based on the partial query input. The method may generate one or more search results for each of the predicted queries. The method may send, in response to receiving the partial query input, one or more of the search results to the client device for storage in a cache of the client device. The method may also retrieve, in response to receiving a completed query input from the first user, one or more of the search results from the cache of the client device for display. The completed query input may include a second character string, where the second character string may include at least the first character string.
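
The priming flow in this abstract can be sketched in a few lines. The prefix-matching predictor, the dictionary used as the client cache, and all function names are assumptions made for illustration; the patent does not specify how predicted queries are chosen.

```python
def predict_queries(partial, known_queries, limit=3):
    """Return up to `limit` known queries that start with the partial input."""
    return [q for q in known_queries if q.startswith(partial)][:limit]

def prime_cache(partial, known_queries, search, cache):
    """On partial input: compute results for the predicted queries
    and store them in the client-side cache."""
    for query in predict_queries(partial, known_queries):
        cache[query] = search(query)

def results_for(completed, cache, search):
    """On the completed query: serve from the cache if it was primed,
    otherwise fall back to a normal search. Returns (results, was_cached)."""
    if completed in cache:
        return cache[completed], True
    return search(completed), False
```

The completed query contains the partial input as a prefix, so a good predictor makes the cache hit likely and the results appear without another round trip.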

Journal Article•10.1109/TKDE.2013.137•

Authenticating Location-Based Skyline Queries in Arbitrary Subspaces

Xin Lin¹, Jianliang Xu², Haibo Hu², Wang-Chien Lee³•Institutions (3)

East China Normal University¹, Hong Kong Baptist University², Pennsylvania State University³

01 Jun 2014-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A basic Merkle Skyline R-tree method and a novel Partial S4-tree method are proposed to authenticate one-shot LASQs, and a prefetching-based approach is developed that enables clients to compute new LASQ results locally during movement, without frequently contacting the server for query re-evaluation.

Abstract: With the ever-increasing use of smartphones and tablet devices, location-based services (LBSs) have experienced explosive growth in the past few years. To scale up services, there has been a rising trend of outsourcing data management to Cloud service providers, which provide query services to clients on behalf of data owners. However, in this data-outsourcing model, the service provider can be untrustworthy or compromised, thereby returning incorrect or incomplete query results to clients, intentionally or not. Therefore, empowering clients to authenticate query results is imperative for outsourced databases. In this paper, we study the authentication problem for location-based arbitrary-subspace skyline queries (LASQs), which represent an important class of LBS applications. We propose a basic Merkle Skyline R-tree method and a novel Partial S4-tree method to authenticate one-shot LASQs. For the authentication of continuous LASQs, we develop a prefetching-based approach that enables clients to compute new LASQ results locally during movement, without frequently contacting the server for query re-evaluation. Experimental results demonstrate the efficiency of our proposed methods and algorithms under various system settings.
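
For readers unfamiliar with the query type, the sketch below shows what a location-based skyline query over a chosen subspace of attributes computes. It illustrates only the query semantics (smaller values assumed better, Euclidean distance assumed), not the authentication methods the paper proposes; all names are hypothetical.

```python
import math

def with_distance(points, query_xy):
    """Attach each point's Euclidean distance to the query location as 'dist'."""
    qx, qy = query_xy
    return [dict(p, dist=math.hypot(p["x"] - qx, p["y"] - qy)) for p in points]

def dominates(a, b, dims):
    """a dominates b if a is no worse on every chosen dimension
    and strictly better on at least one (smaller is better)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)

def subspace_skyline(points, dims):
    """Return the points not dominated by any other point on the chosen dimensions."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]
```

Changing `dims` (for example from distance and price to distance and waiting time) changes the result, which is why the paper treats arbitrary subspaces; moving the query location changes `dist` and hence the skyline.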

Patent•

Search query interactions on online social networks

Rajat Raina¹, Kihyuk Hong¹, Sriram Sankar¹, Kittipat Virochsiri¹•Institutions (1)

Facebook¹

30 Apr 2014

TL;DR: A method receives a structured query from a client system of a first user of an online social network, generates a query command comprising an inner query constraint and an outer query constraint, and generates search results from the sets of objects matching those constraints, where each search result corresponds to one of the objects.

Abstract: In one embodiment, a method includes receiving, from a client system of a first user of an online social network, a structured query comprising references to one or more selected objects associated with the online social network, generating a query command based on the structured query, wherein the query command comprises an inner query constraint and an outer query constraint, identifying a first set of objects matching the inner query constraint and at least in part matching the outer query constraint, identifying a second set of objects matching the outer query constraint, and generating one or more search results based on the first and second sets of objects, wherein each search result corresponds to an object of the plurality of objects.

Patent•

Query relationship management

Christian Hengstler, Stefan Hesse, Martin Rosjat, Volodymyr Vasyutynskyy

29 Apr 2014

TL;DR: A query relationship data structure (RELSTRUCT) generator is proposed that selects a plurality of queries, each structured for application against a database to yield a query result, and relates the queries to one another based on their shared query parts.

Abstract: A query relationship data structure (RELSTRUCT) generator configured to select a plurality of queries, each query structured for application against a database to yield a query result. The RELSTRUCT generator includes a query analyzer configured to identify query parts of individual queries, and determine for each query, a relation, if any, of an included query part to any query part of remaining queries of the plurality of queries. The RELSTRUCT generator also may create, for each query, a query relationship data structure in which the query is related to at least one other query of the plurality of queries, based on the determined relation of a query part of the query and a query part of the at least one other query of the plurality of queries.
