Top 859 papers published in the topic of Web query classification in 2013

Showing papers on "Web query classification" published in 2013

Proceedings Article•10.1109/ICDE.2013.6544899•

Predicting query execution time: Are optimizer cost models really unusable?

Wentao Wu¹, Yun Chi, Shenghuo Zhu, Junichi Tatemura, Hakan Hacigumus, J.F. Naughton¹•Institutions (1)

University of Wisconsin-Madison¹

8 Apr 2013

TL;DR: This paper investigates the novel idea of spending extra resources to refine estimates for the query plan after it has been chosen by the optimizer but before execution and finds a well calibrated query optimizer model along with cardinality estimation refinement provides a low overhead way to provide estimates that are always competitive.

Abstract: Predicting query execution time is useful in many database management issues including admission control, query scheduling, progress monitoring, and system sizing. Recently the research community has been exploring the use of statistical machine learning approaches to build predictive models for this task. An implicit assumption behind this work is that the cost models used by query optimizers are insufficient for query execution time prediction. In this paper we challenge this assumption and show that while the simple approach of scaling the optimizer's estimated cost indeed fails, a properly calibrated optimizer cost model is surprisingly effective. However, even a well-tuned optimizer cost model will fail in the presence of errors in cardinality estimates. Accordingly we investigate the novel idea of spending extra resources to refine estimates for the query plan after it has been chosen by the optimizer but before execution. In our experiments we find that a well calibrated query optimizer model along with cardinality estimation refinement provides a low overhead way to provide estimates that are always competitive and often much better than the best reported numbers from the machine learning approaches.

192 citations

Proceedings Article•10.1145/2463664.2465223•

Ontology-based data access: a study through disjunctive datalog, CSP, and MMSNP

Meghyn Bienvenu¹, Balder ten Cate², Carsten Lutz³, Frank Wolter⁴•Institutions (4)

University of Paris-Sud¹, University of California, Santa Cruz², University of Bremen³, University of Liverpool⁴

22 Jun 2013

TL;DR: This paper studies several classes of ontology-mediated queries, where the database queries are given as some form of conjunctive query and the ontologies are formulated in description logics or other relevant fragments of first-order logic, such as the guarded fragment and the unary-negation fragment.

Abstract: Ontology-based data access is concerned with querying incomplete data sources in the presence of domain-specific knowledge provided by an ontology. A central notion in this setting is that of an ontology-mediated query, which is a database query coupled with an ontology. In this paper, we study several classes of ontology-mediated queries, where the database queries are given as some form of conjunctive query and the ontologies are formulated in description logics or other relevant fragments of first-order logic, such as the guarded fragment and the unary-negation fragment. The contributions of the paper are three-fold. First, we characterize the expressive power of ontology-mediated queries in terms of fragments of disjunctive datalog. Second, we establish intimate connections between ontology-mediated queries and constraint satisfaction problems (CSPs) and their logical generalization, MMSNP formulas. Third, we exploit these connections to obtain new results regarding (i) first-order rewritability and datalog-rewritability of ontology-mediated queries, (ii) P/NP dichotomies for ontology-mediated queries, and (iii) the query containment problem for ontology-mediated queries.

188 citations

Proceedings Article•10.1109/ICDE.2013.6544828•

Towards efficient search for activity trajectories

Kai Zheng¹, Shuo Shang², Nicholas Jing Yuan³, Yi Yang⁴•Institutions (4)

University of Queensland¹, Aalborg University², Microsoft³, Carnegie Mellon University⁴

8 Apr 2013

TL;DR: A novel hybrid grid index, GAT, is developed to organize the trajectory segments and activities hierarchically, which enables pruning the search space by location proximity and activity containment simultaneously, and algorithms for efficient computation of the minimum match distance and minimum order-sensitive match distance are proposed.

Abstract: The advances in location positioning and wireless communication technologies have led to a myriad of spatial trajectories representing the mobility of a variety of moving objects. While processing trajectory data with the focus of spatio-temporal features has been widely studied in the last decade, recent proliferation in location-based web applications (e.g., Foursquare, Facebook) has given rise to large amounts of trajectories associated with activity information, called activity trajectory. In this paper, we study the problem of efficient similarity search on activity trajectory database. Given a sequence of query locations, each associated with a set of desired activities, an activity trajectory similarity query (ATSQ) returns k trajectories that cover the query activities and yield the shortest minimum match distance. An order-sensitive activity trajectory similarity query (OATSQ) is also proposed to take into account the order of the query locations. To process the queries efficiently, we firstly develop a novel hybrid grid index, GAT, to organize the trajectory segments and activities hierarchically, which enables us to prune the search space by location proximity and activity containment simultaneously. In addition, we propose algorithms for efficient computation of the minimum match distance and minimum order-sensitive match distance, respectively. The results of our extensive empirical studies based on real online check-in datasets demonstrate that our proposed index and methods are capable of achieving superior performance and good scalability.

188 citations

Patent•

Graph query processing using plurality of engines

Sameh Elnikety¹, Yuxiong He¹, Sherif Sakr¹•Institutions (1)

Microsoft¹

14 Dec 2013

TL;DR: In this paper, a graph query submitted to a graph database which is modeled by an attributed graph is received, and the graph query is decomposed into a plurality of query components.

Abstract: Graph queries are processed using a plurality of independent query execution engines. A graph query submitted to a graph database which is modeled by an attributed graph is received. The graph query is decomposed into a plurality of query components. For each of the query components, one of the query execution engines that is available to process the query component is identified, a sub-query representing the query component is generated, the sub-query is sent to the identified query execution engine for processing, and results for the sub-query are received from the identified query execution engine. The results received are then combined to generate a response to the graph query.

181 citations

Patent•

Distributed high performance analytics store

David Ryan Marquardt, Stephen Phillip Sorkin, Steve Yu Zhang

31 Jan 2013

TL;DR: In this article, a search head is associated with one or more indexers containing event records, and queries directed towards summarizing and reporting on event records may be received at the search head.

Abstract: Embodiments are directed towards the transparent summarization of events. Queries directed towards summarizing and reporting on event records may be received at a search head. Search heads may be associated with one or more indexers containing event records. The search head may forward the query to the indexers that can resolve the query for concurrent execution. If a query is a collection query, indexers may generate summarization information based on event records located on the indexers. Event record fields included in the summarization information may be determined based on terms included in the collection query. If a query is a stats query, each indexer may generate a partial result set from previously generated summarization information, returning the partial result sets to the search head. Collection queries may be saved and scheduled to run and periodically update the summarization information.
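
The collection/stats split described in this abstract can be illustrated with a minimal Python sketch. It assumes events are plain dictionaries and that the summarization information is a per-value count. The function names `summarize` and `merge_partials` and the sample data are hypothetical and are not taken from the patent.

```python
from collections import Counter


def summarize(events, field):
    """Collection step on one indexer: count events per value of `field`."""
    return Counter(event[field] for event in events if field in event)


def merge_partials(partials):
    """Search-head step: combine the partial result sets from all indexers."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return total


# Hypothetical event records held by two indexers.
indexer_a = [{"status": "200"}, {"status": "404"}, {"status": "200"}]
indexer_b = [{"status": "500"}, {"status": "200"}]

# Each indexer answers from its own summary; the search head merges them.
partials = [summarize(indexer_a, "status"), summarize(indexer_b, "status")]
result = merge_partials(partials)
```

Counts are chosen here because they merge by simple addition; other statistics (for example averages) would need partial results that carry both a sum and a count.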

153 citations

Proceedings Article•10.1145/2488388.2488401•

Inferring the demographics of search users: social data meets search queries

Bin Bi¹, Milad Shokouhi², Michal Kosinski³, Thore Graepel²•Institutions (3)

University of California, Los Angeles¹, Microsoft², University of Cambridge³

13 May 2013

TL;DR: It is shown that it is indeed feasible to infer important demographic data of users from their query history based on labelled Likes data and it is believed that this approach could provide valuable information for personalization and monetization even in the absence of demographic data.

Abstract: Knowing users' views and demographic traits offers a great potential for personalizing web search results or related services such as query suggestion and query completion. Such signals however are often only available for a small fraction of search users, namely those who log in with their social network account and allow its use for personalization of search results. In this paper, we offer a solution to this problem by showing how user demographic traits such as age and gender, and even political and religious views can be efficiently and accurately inferred based on their search query histories. This is accomplished in two steps; we first train predictive models based on the publicly available myPersonality dataset containing users' Facebook Likes and their demographic information. We then match Facebook Likes with search queries using Open Directory Project categories. Finally, we apply the model trained on Facebook Likes to large-scale query logs of a commercial search engine while explicitly taking into account the difference between the traits distribution in both datasets. We find that the accuracy of classifying age and gender, expressed by the area under the ROC curve (AUC), are 77% and 84% respectively for predictions based on Facebook Likes, and only degrade to 74% and 80% when based on search queries. On a US state-by-state basis we find a Pearson correlation of 0.72 for political views between the predicted scores and Gallup data, and 0.54 for affiliation with Judaism between predicted scores and data from the US Religious Landscape Survey. We conclude that it is indeed feasible to infer important demographic data of users from their query history based on labelled Likes data and believe that this approach could provide valuable information for personalization and monetization even in the absence of demographic data.
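
The abstract reports accuracy as the area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The function name `auc` and the toy scores are illustrative assumptions, not data from the paper.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])  # 0.75
```

This pairwise form is quadratic in the number of examples; it is meant for illustration rather than for large query logs.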

133 citations

Patent•

Display of Dynamic Interference Graph Results

Thomas M. Annau, Gregory B. Lindahl¹, Samuel Makonnen¹, Michael Markson¹, Keith Peters, Robert Michael Saliba, Al Sary, Rich Skrenta, Dan Swartz, Robert N. Truel, Timothy Walters•Institutions (1)

IBM¹

30 Apr 2013

TL;DR: In this article, a Slashtag server is configured to detect at least a search operator in a search query, the search operator being associated with a category of content from a social network site.

Abstract: A search engine system, including a slashtag server configured to detect at least a search operator in a search query, the search operator being associated with a category of content from a social network site. Also, a web server configured to, in response to detecting the search query, generate a first search result based on at least the category of content associated with the search operator, and display the first search result in a web browser.

116 citations

Proceedings Article•10.1109/ASRU.2013.6707708•

Query understanding enhanced by hierarchical parsing structures

Jingjing Liu¹, Panupong Pasupat¹, Yining Wang², Scott Cyphers¹, James Glass¹•Institutions (2)

Massachusetts Institute of Technology¹, Tsinghua University²

1 Dec 2013

TL;DR: This work extracts a set of syntactic structural features and semantic dependency features from query parse trees to enhance inference model learning and shows that augmenting sequence labeling models with linguistic knowledge can improve query understanding performance in various domains.

Abstract: Query understanding has been well studied in the areas of information retrieval and spoken language understanding (SLU). There are generally three layers of query understanding: domain classification, user intent detection, and semantic tagging. Classifiers can be applied to domain and intent detection in real systems, and semantic tagging (or slot filling) is commonly defined as a sequence-labeling task - mapping a sequence of words to a sequence of labels. Various statistical features (e.g., n-grams) can be extracted from annotated queries for learning label prediction models; however, linguistic characteristics of queries, such as hierarchical structures and semantic relationships, are usually neglected in the feature extraction process. In this work, we propose an approach that leverages linguistic knowledge encoded in hierarchical parse trees for query understanding. Specifically, for natural language queries, we extract a set of syntactic structural features and semantic dependency features from query parse trees to enhance inference model learning. Experiments on real natural language queries show that augmenting sequence labeling models with linguistic knowledge can improve query understanding performance in various domains.

112 citations

Proceedings Article•10.1109/MSR.2013.6624044•

Assisting code search with automatic Query Reformulation for bug localization

Bunyamin Sisman¹, Avinash C. Kak¹•Institutions (1)

Purdue University¹

18 May 2013

TL;DR: This paper demonstrates how the difficulty of designing a proper query can be alleviated through automatic Query Reformulation (QR) - an under-the-hood operation for reformulating a user's query with no additional input from the user.

Abstract: Source code retrieval plays an important role in many software engineering tasks. However, designing a query that can accurately retrieve the relevant software artifacts can be challenging for developers as it requires a certain level of knowledge and experience regarding the code base. This paper demonstrates how the difficulty of designing a proper query can be alleviated through automatic Query Reformulation (QR) - an under-the-hood operation for reformulating a user's query with no additional input from the user. The proposed QR framework works by enriching a user's search query with certain specific additional terms drawn from the highest-ranked artifacts retrieved in response to the initial query. The important point here is that these additional terms injected into a query are those that are deemed to be “close” to the original query terms in the source code on the basis of positional proximity. This similarity metric is based on the notion that terms that deal with the same concepts in source code are usually proximal to one another in the same files. We demonstrate the superiority of our QR framework in relation to the QR frameworks well-known in the natural language document retrieval by showing significant improvements in bug localization performance for two large software projects using more than 4,000 queries.
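
The idea of enriching a query with terms that sit close to the original query terms in the top-ranked artifacts can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' QR framework: artifacts are assumed to be pre-tokenized lists, proximity is a fixed token window, and the names `expand_query`, `window`, and `extra_terms` are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, top_docs, window=2, extra_terms=2):
    """Append the terms that most often occur within `window` tokens of a query term."""
    query = set(query_terms)
    scores = Counter()
    for tokens in top_docs:
        # Positions of the original query terms in this artifact.
        positions = [i for i, tok in enumerate(tokens) if tok in query]
        for i, tok in enumerate(tokens):
            if tok in query:
                continue
            # Count the query-term occurrences that lie near this token.
            scores[tok] += sum(1 for p in positions if abs(p - i) <= window)
    ranked = [term for term, score in scores.most_common() if score > 0]
    return list(query_terms) + ranked[:extra_terms]


# Hypothetical top-ranked artifacts for the initial query ["config"].
top_docs = [
    ["parse", "config", "file", "loader", "error"],
    ["config", "loader", "cache", "init"],
]
expanded = expand_query(["config"], top_docs)
```

In this toy run `loader` is the strongest candidate because it falls inside the window in both artifacts, while `error` and `init` are never close enough to be added.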

99 citations

Patent•

Interactive query completion templates

Yosi Markovich¹, Jack W. Menzel¹, Sean Liu¹•Institutions (1)

Google¹

27 Feb 2013

TL;DR: In this article, a query completion template is provided for display for a category of information associated with one or more terms within the partial query, the template including an interactive field that is user editable.

Abstract: Methods, systems and apparatus are described herein that include identifying a partial query entered into a search field. A query completion template is then provided for display for a category of information associated with one or more terms within the partial query, the query completion template including an interactive field that is user editable. User interaction with the interactive field is then identified. Display of the query completion template is then updated to include the results of the user interaction within the interactive field of the query completion template. User selection of the updated query completion template is then identified, and in response the updated display of the query completion template is transmitted as a search query.

88 citations

Journal Article•10.5120/13895-1227•

Search Query Recommendations using Hybrid User Profile with Query Logs

R. Umagandhi, A. Senthil Kumar

18 Oct 2013-International Journal of Computer Applications

TL;DR: The Query Recommendation technique provides alternative queries to the user to frame a meaningful and relevant query in the future and rapidly satisfies their information needs.

Abstract: The exhaustive information available in the World Wide Web indeed, unfolds the challenge of exploring the apposite, precise and relevant data in every search result. Apparently, in such instances of web-searching, Query Recommendations is the ultimate application in information retrieval. The Query Recommendation technique provides alternative queries to the user to frame a meaningful and relevant query in the future and rapidly satisfies their information needs. Similar query

Book Chapter•10.1007/978-3-642-41335-3_36•

DAW: Duplicate-AWare Federated Query Processing over the Web of Data

Muhammad Saleem¹, Axel-Cyrille Ngonga Ngomo¹, Josiane Xavier Parreira², Helena F. Deus², Manfred Hauswirth³•Institutions (3)

Leipzig University¹, National University of Ireland, Galway², Digital Enterprise Research Institute³

21 Oct 2013

TL;DR: DAW is presented, a novel duplicate-aware approach to federated querying over the Web of Data that can significantly improve the performance of federated query processing engines and provides a source selection mechanism that maximises the query recall, when the query processing is limited to a subset of the sources.

Abstract: Over the last years the Web of Data has developed into a large compendium of interlinked data sets from multiple domains. Due to the decentralised architecture of this compendium, several of these datasets contain duplicated data. Yet, so far, only little attention has been paid to the effect of duplicated data on federated querying. This work presents DAW, a novel duplicate-aware approach to federated querying over the Web of Data. DAW is based on a combination of min-wise independent permutations and compact data summaries. It can be directly combined with existing federated query engines in order to achieve the same query recall values while querying fewer data sources. We extend three well-known federated query processing engines DARQ, SPLENDID, and FedX with DAW and compare our extensions with the original approaches. The comparison shows that DAW can greatly reduce the number of queries sent to the endpoints, while keeping high query recall values. Therefore, it can significantly improve the performance of federated query processing engines. Moreover, DAW provides a source selection mechanism that maximises the query recall, when the query processing is limited to a subset of the sources.
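
Min-wise independent permutations are commonly approximated with MinHash signatures built from seeded hash functions. The sketch below uses that approximation to estimate how much two data sources overlap. It is an assumption-laden illustration, not DAW's implementation: the function names, the choice of `blake2b`, and the example triple identifiers are all hypothetical.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """One minimum per seeded hash function; `items` must be non-empty."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_overlap(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical sources that share half of their triples.
source_a = {f"triple{i}" for i in range(100)}
source_b = {f"triple{i}" for i in range(50, 150)}
sig_a = minhash_signature(source_a)
sig_b = minhash_signature(source_b)
overlap = estimated_overlap(sig_a, sig_b)
```

A source whose signature largely matches one that has already been selected would contribute mostly duplicate results, which is the intuition behind skipping it during source selection. More hash functions give a tighter estimate at the cost of a larger summary.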

Patent•

Determining concepts associated with a query

Digvijay Singh Lamba¹, Wang Chee Lam¹, Michel A. Tourn¹•Institutions (1)

Walmart¹

9 Sep 2013

TL;DR: In this article, a query is received and a list of concepts and associated scores is received, and a density function is used to evaluate the received concepts and one or more concepts are associated with the query based at least in part on the results of the density function.

Abstract: Determining one or more concepts associated with a query is disclosed. A query is received. A list of concepts and associated scores is received. The concepts fit within a concept hierarchy. A density function is used to evaluate the received concepts. One or more concepts are associated with the query based at least in part on the results of the density function.

Journal Article•10.1145/2493175.2493181•

Behavioral dynamics on the web: Learning, modeling, and prediction

Kira Radinsky¹, Krysta M. Svore², Susan T. Dumais², Milad Shokouhi², Jaime Teevan², Alex Bocharov², Eric Horvitz²•Institutions (2)

Technion – Israel Institute of Technology¹, Microsoft²

05 Aug 2013-ACM Transactions on Information Systems

TL;DR: A temporal modeling framework adapted from physics and signal processing is developed and harnessed to predict temporal patterns in search behavior using smoothing, trends, periodicities, and surprises and presents two applications where new methods introduced for the temporal modeling of user behavior significantly improve upon the state of the art.

Abstract: The queries people issue to a search engine and the results clicked following a query change over time. For example, after the earthquake in Japan in March 2011, the query japan spiked in popularity and people issuing the query were more likely to click government-related results than they would prior to the earthquake. We explore the modeling and prediction of such temporal patterns in Web search behavior. We develop a temporal modeling framework adapted from physics and signal processing and harness it to predict temporal patterns in search behavior using smoothing, trends, periodicities, and surprises. Using current and past behavioral data, we develop a learning procedure that can be used to construct models of users' Web search activities. We also develop a novel methodology that learns to select the best prediction model from a family of predictive models for a given query or a class of queries. Experimental results indicate that the predictive models significantly outperform baseline models that weight historical evidence the same for all queries. We present two applications where new methods introduced for the temporal modeling of user behavior significantly improve upon the state of the art. Finally, we discuss opportunities for using models of temporal dynamics to enhance other areas of Web search and information retrieval.

Proceedings Article•10.1145/2484402.2484415•

Secure k-NN computation on encrypted cloud data without sharing key with query users

Youwen Zhu¹, Rui Xu¹, Tsuyoshi Takagi¹•Institutions (1)

Kyushu University¹

8 May 2013

TL;DR: This paper proposes a novel secure and efficient scheme for k-NN query on encrypted cloud data in which the key of data owner to encrypt and decrypt outsourced data will not be completely disclosed to any query user.

Abstract: In cloud computing, secure analysis on outsourced encrypted data is a significant topic. As a frequently used query for online applications, secure k-nearest neighbors (k-NN) computation on encrypted cloud data has received much attention, and several solutions for it have been put forward. However, most existing schemes assume the query users are fully trusted and all query users share the total key which is used to encrypt and decrypt data owner's outsourced data. It is constitutionally not feasible in lots of real-world applications. In this paper, we propose a novel secure and efficient scheme for k-NN query on encrypted cloud data in which the key of data owner to encrypt and decrypt outsourced data will not be completely disclosed to any query user. Therefore, our scheme can efficiently support the secure k-NN query on encrypted cloud data even when query users are not trustworthy enough.

Patent•

Filtering suggested structured queries on online social networks

Xiao Li¹•Institutions (1)

Facebook¹

8 May 2013

TL;DR: In this paper, a method is described for accessing a social graph comprising a plurality of nodes and edges connecting the nodes, receiving from a user an unstructured text query, generating a set of structured queries based on the text query, calculating a quality score based on both the text query and the structured query for each structured query in the set, and filtering the set to remove each structured query having a quality score less than a threshold score.

Abstract: In one embodiment, a method includes accessing a social graph comprising a plurality of nodes and a plurality of edges connecting the nodes, receiving from a user an unstructured text query, generating a set of structured queries based on the text query, calculating a quality score based on the text query and the structured query for each structured query in the set, and filtering the set to remove each structured query having a quality score less than a threshold score.

Proceedings Article•10.1145/2452376.2452441•

Optimizing query rewriting in ontology-based data access

Floriana Di Pinto¹, Domenico Lembo¹, Maurizio Lenzerini¹, Riccardo Mancini¹, Antonella Poggi¹, Riccardo Rosati¹, Marco Ruzzi¹, Domenico Fabio Savo¹•Institutions (1)

Sapienza University of Rome¹

18 Mar 2013

TL;DR: A new approach to the optimization of query rewriting in OBDA, based on the use of inclusion between mapping views and of perfect mappings to drastically lower the combinatorial explosion due to mapping rewriting.

Abstract: In ontology-based data access (OBDA), an ontology is connected to autonomous, and generally pre-existing, data repositories through mappings, so as to provide a high-level, conceptual view over such data. User queries are posed over the ontology, and answers are computed by reasoning both on the ontology and the mappings. Query answering in OBDA systems is typically performed through a query rewriting approach which is divided into two steps: (i) the query is rewritten with respect to the ontology (ontology rewriting of the query); (ii) the query thus obtained is then reformulated over the database schema using the mapping assertions (mapping rewriting of the query). In this paper we present a new approach to the optimization of query rewriting in OBDA. The key ideas of our approach are the usage of inclusion between mapping views and the usage of perfect mappings, which allow us to drastically lower the combinatorial explosion due to mapping rewriting. These ideas are formalized in PerfectMap, an algorithm for OBDA query rewriting. We have experimented with PerfectMap in a real-world OBDA scenario: our experimental results clearly show that, in such a scenario, the optimizations of PerfectMap are crucial to effectively perform query answering.

Proceedings Article•10.1145/2470654.2481325•

The challenges of specifying intervals and absences in temporal queries: a graphical language approach

Megan Monroe¹, Rongjian Lan¹, Juan Morales del Olmo², Ben Shneiderman¹, Catherine Plaisant¹, Jeff Millstein³•Institutions (3)

University of Maryland, College Park¹, Technical University of Madrid², Oracle Corporation³

27 Apr 2013

TL;DR: This paper reports on a two-part user study, as well as a series of early tests and interviews with clinical researchers, that informed the development of two temporal query interfaces: a basic, menu-based interface and an advanced, graphic-based interface.

Abstract: In our burgeoning world of ubiquitous sensors and affordable data storage, records of timestamped events are being produced across nearly every domain of personal and professional computing. The resulting data surge has created an overarching need to search these records for meaningful patterns of events. This paper reports on a two-part user study, as well as a series of early tests and interviews with clinical researchers, that informed the development of two temporal query interfaces: a basic, menu-based interface and an advanced, graphic-based interface. While the scope of temporal query is very broad, this work focuses on two particularly complex and critical facets of temporal event sequences: intervals (events with both a start time and an end time), and the absence of an event. We describe how users encounter a common set of difficulties when specifying such queries, and propose solutions to help overcome them. Finally, we report on two case studies with epidemiologists at the US Army Pharmacovigilance Center, illustrating how both query interfaces were used to study patterns of drug use.

Patent•

Context-based search query formation

Peng Bai¹, Zheng Chen¹, Xuedong David Huang¹, Xiaochuan Ni¹, Jian-Tao Sun¹, Zhimin Zhang¹•Institutions (1)

Microsoft¹

1 Feb 2013

TL;DR: In this article, the user is provided with query suggestions based on the selected text and the query suggestions are ranked based on a context provided by the document, where the user may select the text by using a mouse, drawing a circle around the text on a touch screen, or by other input techniques.

Abstract: Searching is assisted by recognizing a selection of text from a document as an indication that a user wishes to initiate a search based on the selected text. The user is provided with query suggestions based on the selected text and the query suggestions are ranked based on a context provided by the document. The user may select the text by using a mouse, drawing a circle around the text on a touch screen, or by other input techniques. The query suggestions may be based on query reformulation or query expansion techniques applied to the selected text. Context provided by the document is used by a language model and/or an artificial intelligence system to rank the query suggestions in predicted order of relevance based on the selected text and the context.

Patent•

Implicit question query identification

Nitin Gupta¹, Preyas Popat¹, Steven D. Baker¹, Srinivasan Venkatachary¹•Institutions (1)

Google¹

18 Nov 2013

Patent•

Dynamic language model

Pedro J. Moreno Mengibar¹, Michael H. Cohen¹•Institutions (1)

Google¹

18 Jun 2013

TL;DR: In this article, a dynamic language model for speech recognition is presented, where the first word sequence having a base probability value is used to convert a voice search query to a text search query based on one or more probabilities.

Abstract: The invention relates to a dynamic language model. Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition are provided. One of the methods includes receiving a base language model for speech recognition including a first word sequence having a base probability value; receiving a voice search query associated with a query context; determining that a customized language model is to be used when the query context satisfies one or more criteria associated with the customized language model; obtaining the customized language model, the customized language model including the first word sequence having an adjusted probability value being the base probability value adjusted according to the query context; and converting the voice search query to a text search query based on one or more probabilities, each of the probabilities corresponding to a word sequence in a group of one or more word sequences, the group including the first word sequence having the adjusted probability value.

Journal Article•10.1145/2508037.2508041•

Social semantic query expansion

Claudio Biancalana¹, Fabio Gasparetti¹, Alessandro Micarelli¹, Giuseppe Sansonetti¹•Institutions (1)

Roma Tre University¹

08 Oct 2013-ACM Transactions on Intelligent Systems and Technology

TL;DR: The results of an in-depth experimental evaluation show that the proposed weak semantic technique for query expansion outperforms traditional techniques, such as relevance feedback and personalized PageRank, thus confirming the validity and usefulness of categorizing user needs and preferences into semantic classes.

Abstract: Weak semantic techniques rely on the integration of Semantic Web techniques with social annotations and aim to embrace the strengths of both. In this article, we propose a novel weak semantic technique for query expansion. Traditional query expansion techniques are based on the computation of two-dimensional co-occurrence matrices. Our approach proposes the use of three-dimensional matrices, where the added dimension is represented by semantic classes (i.e., categories comprising all the terms that share a semantic property) related to the folksonomy extracted from social bookmarking services, such as Delicious and StumbleUpon. The results of an in-depth experimental evaluation performed on both artificial datasets and real users show that our approach outperforms traditional techniques, such as relevance feedback and personalized PageRank, thus confirming the validity and usefulness of categorizing user needs and preferences into semantic classes. We also present the results of a questionnaire designed to gather users' opinions of the system. As one drawback of several query expansion techniques is their high computational cost, we also provide a complexity analysis of our system, in order to show its capability of operating in real time.
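
To make the idea of a three-dimensional co-occurrence matrix concrete, here is a small Python sketch. It is not the authors' method: it assumes each bookmark arrives as a list of terms plus one semantic class label, and it counts term pairs per class. The names `build_cooccurrence` and `expand` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Tuple

Counts = Dict[Tuple[str, str, str], int]


def build_cooccurrence(bookmarks: List[Tuple[List[str], str]]) -> Counts:
    """Count co-occurrences indexed by (term, other term, semantic class)."""
    counts: Counts = defaultdict(int)
    for terms, semantic_class in bookmarks:
        # Every ordered pair of distinct terms in the same bookmark co-occurs once.
        for t1, t2 in permutations(sorted(set(terms)), 2):
            counts[(t1, t2, semantic_class)] += 1
    return counts


def expand(term: str, counts: Counts, semantic_class: str, k: int = 2) -> List[str]:
    """Return up to k expansion terms for `term` within one semantic class."""
    scored = [
        (n, t2)
        for (t1, t2, cls), n in counts.items()
        if t1 == term and cls == semantic_class
    ]
    # Highest count first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t2 for _, t2 in scored[:k]]
```

In a two-dimensional matrix the term "java" would mix coffee-related and programming-related neighbours. With the class dimension, `expand("java", counts, "programming")` and `expand("java", counts, "food")` can return different expansions for the same term.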

Patent•10.21437/INTERSPEECH.2012-84•

Exploiting the semantic web for unsupervised natural language semantic parsing

Gokhan Tur¹, Dilek Hakkani-Tur¹, Larry Heck¹, Minwoo Jeong¹, Ye-Yi Wang¹•Institutions (1)

Microsoft¹

21 Feb 2013

TL;DR: In this article, search queries that hit structured web pages are automatically mined for information that is used to semantically annotate the queries; the automatically annotated queries may then be used to build statistical unsupervised slot filling models without a semantic annotation guideline.

Abstract: Structured web pages are accessed and parsed to obtain implicit annotation for natural language understanding tasks. Search queries that hit these structured web pages are automatically mined for information that is used to semantically annotate the queries. The automatically annotated queries may be used for automatically building statistical unsupervised slot filling models without using a semantic annotation guideline. For example, tags that are located on a structured web page that are associated with the search query may be used to annotate the query. The mined search queries may be filtered to create a set of queries that is in a form of a natural language query and/or remove queries that are difficult to parse. A natural language model may be trained using the resulting mined queries. Some queries may be set aside for testing and the model may be adapted using in-domain sentences that are not annotated. The models may be tested using these implicitly annotated natural-language-like queries in an unsupervised fashion.

Journal Article•10.14778/2536206.2536207•

Query optimization over crowdsourced data

Hyunjung Park¹, Jennifer Widom¹•Institutions (1)

Stanford University¹

1 Aug 2013

TL;DR: Novel techniques incorporated into Deco's query optimizer include a cost model distinguishing between "free" existing data versus paid new data, a cardinality estimation algorithm coping with changes to the database state during query execution, and a plan enumeration algorithm maximizing reuse of common subplans in a setting that makes reuse challenging.

Abstract: Deco is a comprehensive system for answering declarative queries posed over stored relational data together with data obtained on-demand from the crowd. In this paper we describe Deco's cost-based query optimizer, building on Deco's data model, query language, and query execution engine presented earlier. Deco's objective in query optimization is to find the best query plan to answer a query, in terms of estimated monetary cost. Deco's query semantics and plan execution strategies require several fundamental changes to traditional query optimization. Novel techniques incorporated into Deco's query optimizer include a cost model distinguishing between "free" existing data versus paid new data, a cardinality estimation algorithm coping with changes to the database state during query execution, and a plan enumeration algorithm maximizing reuse of common subplans in a setting that makes reuse challenging. We experimentally evaluate Deco's query optimizer, focusing on the accuracy of cost estimation and the efficiency of plan enumeration.
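
The distinction between "free" existing data and paid new data can be sketched as follows. This is only an illustration of the general idea, not Deco's actual cost model: the function names, the per-answer price, and the assumption that every missing tuple costs a fixed number of crowd answers are all hypothetical.

```python
from typing import Dict, Tuple


def estimated_cost_cents(
    needed_tuples: int,
    existing_tuples: int,
    price_per_answer_cents: int,
    answers_per_tuple: int = 1,
) -> int:
    """Estimate monetary cost: stored tuples are free, missing ones are paid."""
    missing = max(0, needed_tuples - existing_tuples)
    return missing * answers_per_tuple * price_per_answer_cents


def cheapest_plan(plans: Dict[str, Tuple[int, int]], price_per_answer_cents: int) -> str:
    """Pick the plan name with the lowest estimated cost.

    `plans` maps a plan name to (needed_tuples, existing_tuples).
    """
    return min(
        plans,
        key=lambda name: (estimated_cost_cents(*plans[name], price_per_answer_cents), name),
    )
```

Under these assumptions, a plan that needs 10 tuples of which 8 are already stored is cheaper than a plan that needs only 4 tuples with none stored, which is the kind of trade-off a cost model that ignores existing data would miss.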

Patent•

Method of converting query plans to native code

Craig Steven Freedman¹, Erik Ismert¹•Institutions (1)

Microsoft¹

12 Mar 2013

TL;DR: In this article, the authors present a method for performing database queries that receives a particular database query and accesses a query plan based on it; the query plan has operators and specific operational parameters associated with each of the operators.

Abstract: Performing database queries. A method includes receiving a particular database query. The method further includes accessing a query plan based on the particular database query. The query plan has operators and specific operational parameters associated with each of the operators. The association of operators and specific operational parameters is specific to the particular database query. From the query plan, the method further includes instantiating a plurality of compiled code templates. Each code template includes executable code that when executed performs functionality of one of the operators from the query plan with the specific operational parameters applied in the compilation. The method further includes binding the code templates together using programmatic control flow to create a functioning program.
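
The template-instantiation and binding steps can be approximated in Python as a rough sketch. It uses closures in place of compiled native code, so it shows only the structure (one template per operator, parameters fixed at instantiation time, stages bound by control flow) and not the patented compilation technique. The operator names, `TEMPLATES`, and `compile_plan` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, int]
Stage = Callable[[Iterator[Row]], Iterator[Row]]


def filter_template(column: str, threshold: int) -> Stage:
    """Instantiate a filter operator with its parameters fixed up front."""
    def run(rows: Iterator[Row]) -> Iterator[Row]:
        return (r for r in rows if r[column] > threshold)
    return run


def project_template(columns: List[str]) -> Stage:
    """Instantiate a projection operator that keeps only the given columns."""
    def run(rows: Iterator[Row]) -> Iterator[Row]:
        return ({c: r[c] for c in columns} for r in rows)
    return run


# Hypothetical registry: one code template per plan operator.
TEMPLATES: Dict[str, Callable[..., Stage]] = {
    "filter": filter_template,
    "project": project_template,
}


def compile_plan(plan: List[Tuple[str, dict]]) -> Callable[[Iterable[Row]], List[Row]]:
    """Instantiate a template per operator and bind them into one program."""
    stages = [TEMPLATES[op](**params) for op, params in plan]

    def program(rows: Iterable[Row]) -> List[Row]:
        stream: Iterator[Row] = iter(rows)
        for stage in stages:  # control flow that chains the operators
            stream = stage(stream)
        return list(stream)

    return program
```

A plan such as `[("filter", {"column": "age", "threshold": 30}), ("project", {"columns": ["id"]})]` yields a single callable that can be run against any row source without consulting the plan again.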

Proceedings Article•10.1109/ICDE.2013.6544819•

Top-k query processing in probabilistic databases with non-materialized views

Maximilian Dylla¹, Iris Miliaraki¹, Martin Theobald²•Institutions (2)

Max Planck Society¹, University of Antwerp²

8 Apr 2013

TL;DR: This work is the first to address integrated data and confidence computations for intensional query evaluations in the context of probabilistic databases by considering confidence bounds over first-order lineage formulas and extends query processing techniques by a tool-suite of scheduling strategies based on selectivity estimation.

Abstract: We investigate a novel approach of computing confidence bounds for top-k ranking queries in probabilistic databases with non-materialized views. Unlike related approaches, we present an exact pruning algorithm for finding the top-ranked query answers according to their marginal probabilities without the need to first materialize all answer candidates via the views. Specifically, we consider conjunctive queries over multiple levels of select-project-join views, the latter of which are cast into Datalog rules which we ground in a top-down fashion directly at query processing time. To our knowledge, this work is the first to address integrated data and confidence computations for intensional query evaluations in the context of probabilistic databases by considering confidence bounds over first-order lineage formulas. We extend our query processing techniques by a tool-suite of scheduling strategies based on selectivity estimation and the expected impact on confidence bounds. Further extensions to our query processing strategies include improved top-k bounds in the case when sorted relations are available as input, as well as the consideration of recursive rules. Experiments with large datasets demonstrate significant runtime improvements of our approach compared to both exact and sampling-based top-k methods over probabilistic data.
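
A generic form of bound-based top-k pruning can be sketched in Python. This is a simplified illustration and not the paper's algorithm, which derives its bounds from first-order lineage formulas; here the lower and upper bounds on each answer's marginal probability are simply given as input, and the function name `prune_by_bounds` is hypothetical.

```python
from typing import Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]  # answer -> (lower, upper) probability bound


def prune_by_bounds(bounds: Bounds, k: int) -> Bounds:
    """Drop answers that provably cannot be among the top-k.

    An answer is pruned when its upper bound is below the k-th largest
    lower bound, since at least k other answers are then certain to have
    a higher probability.
    """
    if k <= 0:
        return {}
    if len(bounds) <= k:
        return dict(bounds)
    lowers = sorted((lower for lower, _ in bounds.values()), reverse=True)
    kth_lower = lowers[k - 1]
    return {
        answer: (lower, upper)
        for answer, (lower, upper) in bounds.items()
        if upper >= kth_lower
    }
```

Answers that survive pruning may still need their bounds tightened before the final ranking is known; the sketch covers only the elimination step.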

Posted Content•

Oblivious Query Processing

Arvind Arasu¹, Raghav Kaushik¹•Institutions (1)

Microsoft¹

14 Dec 2013-arXiv: Databases

TL;DR: In this paper, the first formal study of secure query processing over encrypted data is presented, where an adversary having full knowledge of the query (text) and observing the query execution learns nothing about the underlying database other than the result size of the query on the database.

Abstract: Motivated by cloud security concerns, there is an increasing interest in database systems that can store and support queries over encrypted data. A common architecture for such systems is to use a trusted component such as a cryptographic co-processor for query processing that is used to securely decrypt data and perform computations in plaintext. The trusted component has limited memory, so most of the (input and intermediate) data is kept encrypted in an untrusted storage and moved to the trusted component on "demand." In this setting, even with strong encryption, the data access pattern from untrusted storage has the potential to reveal sensitive information; indeed, all existing systems that use a trusted component for query processing over encrypted data have this vulnerability. In this paper, we undertake the first formal study of secure query processing, where an adversary having full knowledge of the query (text) and observing the query execution learns nothing about the underlying database other than the result size of the query on the database. We introduce a simpler notion, oblivious query processing, and show formally that a query admits secure query processing iff it admits oblivious query processing. We present oblivious query processing algorithms for a rich class of database queries involving selections, joins, grouping and aggregation. For queries not handled by our algorithms, we provide some initial evidence that designing oblivious (and therefore secure) algorithms would be hard via reductions from two simple, well-studied problems that are generally believed to be hard. Our study of oblivious query processing also reveals interesting connections to database join theory.

Proceedings Article•10.1145/2488388.2488484•

Learning joint query interpretation and response ranking

Uma Sawant¹, Soumen Chakrabarti¹•Institutions (1)

Indian Institute of Technology Bombay¹

13 May 2013

TL;DR: This work proposes two new, natural formulations for joint query interpretation and response ranking that exploit bidirectional flow of information between the knowledge base and the corpus, inspired by probabilistic language models and max-margin discriminative learning.

Abstract: Thanks to information extraction and semantic Web efforts, search on unstructured text is increasingly refined using semantic annotations and structured knowledge bases. However, most users cannot become familiar with the schema of knowledge bases and ask structured queries. Interpreting free-format queries into a more structured representation is of much current interest. The dominant paradigm is to segment or partition query tokens by purpose (references to types, entities, attribute names, attribute values, relations) and then launch the interpreted query on structured knowledge bases. Given that structured knowledge extraction is never complete, here we choose a less trodden path: a data representation that retains the unstructured text corpus, along with structured annotations (mentions of entities and relationships) on it. We propose two new, natural formulations for joint query interpretation and response ranking that exploit bidirectional flow of information between the knowledge base and the corpus. One, inspired by probabilistic language models, computes expected response scores over the uncertainties of query interpretation. The other is based on max-margin discriminative learning, with latent variables representing those uncertainties. In the context of typed entity search, both formulations bridge a considerable part of the accuracy gap between a generic query that does not constrain the type at all, and the upper bound where the "perfect" target entity type of each query is provided by humans. Our formulations are also superior to a two-stage approach of first choosing a target type using recent query type prediction techniques, and then launching a type-restricted entity search query.
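
The first formulation, computing expected response scores over the uncertainty of query interpretation, can be illustrated with a short Python sketch. It assumes the interpretation is reduced to a distribution over target entity types and that a per-type score for each candidate entity is already available; both inputs, the function name `expected_scores`, and the sample values are hypothetical and do not reproduce the paper's model.

```python
from typing import Dict, List, Tuple


def expected_scores(
    type_posterior: Dict[str, float],
    score_given_type: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Rank entities by the sum over types of P(type | query) * score(entity | type)."""
    totals: Dict[str, float] = {}
    for target_type, probability in type_posterior.items():
        for entity, score in score_given_type.get(target_type, {}).items():
            totals[entity] = totals.get(entity, 0.0) + probability * score
    # Highest expected score first; ties broken by entity name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Unlike a two-stage approach that commits to the single most likely type before searching, this keeps every interpretation in play and lets each one contribute in proportion to its probability.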

Patent•

Join operations for continuous queries over archived views

Unmesh Anil Deshmukh¹, Anand Srinivasan¹, Vikram Shukla¹, Prathab Kali¹•Institutions (1)

Business International Corporation¹

25 Sep 2013

TL;DR: In this paper, a continuous query may be received, the continuous query being identified based at least in part on an archived view, the archived view being created and/or identified based on a join query related to two or more archived relations associated with an application.

Abstract: A continuous query may be received, the continuous query being identified based at least in part on an archived view. The archived view may be created and/or identified based at least in part on a join query related to two or more archived relations associated with an application, at least one of the two or more archived relations being identified as a dimension relation. A query plan for the continuous query may be generated. A join operator in the query plan may be identified based at least in part on the dimension relation. A state of an operator corresponding to the dimension relation may be initialized. It may be identified if the state of the operator identifies an event that detects a change to the dimension relation. The continuous query may be re-started based at least in part on the event that detects the change to the dimension relation.

Proceedings Article•10.1145/2536146.2536149•

OptiqueVQS: towards an ontology-based visual query system for big data

Ahmet Soylu¹, Martin Giese¹, Ernesto Jiménez-Ruiz², Evgeny Kharlamov², Dmitry Zheleznyakov², Ian Horrocks²•Institutions (2)

University of Oslo¹, University of Oxford²

28 Oct 2013

TL;DR: This paper focuses on end-user visual query formulation, demonstrates the preliminary ontology-based visual query system (i.e., interface), and discusses initial insights for alleviating the effects of Big Data.

Abstract: A recent EU project, named Optique, with a strong industrial perspective, strives to enable scalable end-user access to Big Data. To this end, Optique employs an ontology-based approach, along with other techniques such as query optimisation and parallelisation, for scalable query formulation and evaluation. In this paper, we specifically focus on end-user visual query formulation, demonstrate our preliminary ontology-based visual query system (i.e., interface), and discuss initial insights for alleviating the effects of Big Data.
