Top 689 papers published on the topic of Web query classification in 2015

Showing papers on "Web query classification" published in 2015

Proceedings Article•10.1145/2806416.2806493•

A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion

Alessandro Sordoni¹, Yoshua Bengio¹, Hossein Vahabi², Christina Lioma³, Jakob Grue Simonsen³, Jian-Yun Nie¹•Institutions (3)

Université de Montréal¹, Yahoo!², University of Copenhagen³

17 Oct 2015

TL;DR: This work presents a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary length and that is sensitive to the order of queries in the context while avoiding data sparsity.

Abstract: Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary lengths. As a result, our suggestions are sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can suggest for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets. Results show that our model outperforms existing context-aware approaches in a next query prediction setting. In addition to query suggestion, our architecture is general enough to be used in a variety of other applications.

552 citations

Patent•

Computer readable electronic records automated classification system

Thomas A. Summerlin, Timothy Shinkle¹, Russell E. Stalters²•Institutions (2)

EMC Corporation¹, Open Text Corporation²

5 Aug 2015

TL;DR: A classification, based on a first classification instance in a plurality of classification instances, is assigned without human intervention to an electronic document if the confidence data associated with the first classification instance exceeds a first threshold.

Abstract: Classifying an electronic document in a computer-based system is disclosed. For each classification instance in a plurality of classification instances, a confidence data indicating a degree of confidence that the electronic document is associated with that classification instance is determined. A classification, based on a first classification instance in the plurality of classification instances, is assigned without human intervention to the electronic document if the confidence data associated with the first classification instance exceeds a first threshold.
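The thresholding rule in this abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the patented system: the function name `auto_classify`, the dictionary of per-class confidences, and the choice of the top-scoring class as the "first classification instance" are all assumptions made for the example.

```python
from typing import Dict, Optional


def auto_classify(confidences: Dict[str, float], threshold: float) -> Optional[str]:
    """Return a class label only when its confidence exceeds the threshold.

    `confidences` maps each candidate class (hypothetical labels) to a
    confidence score. Returning None stands for "do not assign automatically",
    for example leaving the document for manual review.
    """
    if not confidences:
        return None
    # Assumption: the candidate considered first is the highest-scoring one.
    best_label, best_score = max(confidences.items(), key=lambda kv: kv[1])
    return best_label if best_score > threshold else None


if __name__ == "__main__":
    print(auto_classify({"invoice": 0.92, "contract": 0.05}, threshold=0.8))
    print(auto_classify({"invoice": 0.60, "contract": 0.40}, threshold=0.8))
```

Returning `None` rather than a low-confidence label keeps the automatic path separate from whatever fallback a real system would use.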

201 citations

Proceedings Article•10.1109/SANER.2015.7081874•

Query expansion via WordNet for effective code search

Meili Lu¹, Xiaobing Sun¹, Shaowei Wang², David Lo², Yucong Duan³•Institutions (3)

Yangzhou University¹, Singapore Management University², Hainan University³

2 Mar 2015

TL;DR: This work proposes an approach that extends a query with synonyms generated from WordNet; the approach improves the precision and recall of Conquer, a state-of-the-art query expansion/reformulation technique, by 5% and 8% respectively.

Abstract: Source code search plays an important role in software maintenance. The effectiveness of source code search relies not only on the search technique, but also on the quality of the query. In practice, software systems are large, so it is difficult for a developer to formulate an accurate query that expresses what is really in her/his mind, especially when the maintainer and the original developer are not the same person. When a query performs poorly, it has to be reformulated. But the words used in a query may be different from those that have similar semantics in the source code, i.e., the synonyms, which will affect the accuracy of code search results. To address this issue, we propose an approach that extends a query with synonyms generated from WordNet. Our approach extracts natural language phrases from source code identifiers, matches expanded queries with these phrases, and sorts the search results. It allows developers to explore word usage in a piece of software, helps them quickly identify relevant program elements for investigation or quickly recognize alternative words for query reformulation. Our initial empirical study on search tasks performed on the JavaScript/ECMAScript interpreter and compiler, Rhino, shows that the synonyms used to expand the queries help recommend good alternative queries. Our approach also improves the precision and recall of Conquer, a state-of-the-art query expansion/reformulation technique, by 5% and 8% respectively.
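The expand-then-match idea can be illustrated with a small Python sketch. It is not the authors' implementation: the `SYNONYMS` table is a tiny hand-written stand-in for WordNet, and `split_identifier`, `expand_query`, and `rank_identifiers` are hypothetical helper names; ranking by simple word overlap is likewise an assumption.

```python
import re
from typing import Dict, List, Set

# Hypothetical stand-in for WordNet synonym sets.
SYNONYMS: Dict[str, Set[str]] = {
    "remove": {"delete", "erase"},
    "create": {"make", "build"},
}


def split_identifier(identifier: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase words."""
    return [part.lower() for part in re.findall(r"[A-Za-z][a-z]*", identifier)]


def expand_query(query: str, synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the query terms together with their known synonyms."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def rank_identifiers(query: str, identifiers: List[str],
                     synonyms: Dict[str, Set[str]]) -> List[str]:
    """Rank identifiers by how many expanded query terms they contain."""
    expanded = expand_query(query, synonyms)
    scored = []
    for ident in identifiers:
        overlap = len(expanded & set(split_identifier(ident)))
        if overlap:
            scored.append((overlap, ident))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ident for _, ident in scored]


if __name__ == "__main__":
    print(rank_identifiers("remove node", ["deleteNode", "addNode", "parseFile"], SYNONYMS))
```

Here the query "remove node" matches `deleteNode` only because "delete" is added as a synonym of "remove", which is the vocabulary-mismatch case the abstract describes.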

199 citations

Book Chapter•10.1007/978-3-319-21768-0_9•

Ontology-Mediated Query Answering with Data-Tractable Description Logics

Meghyn Bienvenu¹, Magdalena Ortiz²•Institutions (2)

University of Paris-Sud¹, Vienna University of Technology²

31 Jul 2015

TL;DR: A brief introduction to ontology-mediated query answering using description logic (DL) ontologies, with a focus on DLs for which query answering scales polynomially in the size of the data, as these are best suited for applications requiring large amounts of data.

Abstract: Recent years have seen an increasing interest in ontology-mediated query answering, in which the semantic knowledge provided by an ontology is exploited when querying data. Adding an ontology has several advantages (e.g. simplifying query formulation, integrating data from different sources, providing more complete answers to queries), but it also makes the query answering task more difficult. In this chapter, we give a brief introduction to ontology-mediated query answering using description logic (DL) ontologies. Our focus will be on DLs for which query answering scales polynomially in the size of the data, as these are best suited for applications requiring large amounts of data. We will describe the challenges that arise when evaluating different natural types of queries in the presence of such ontologies, and we will present algorithmic solutions based upon two key concepts, namely, query rewriting and saturation. We conclude the chapter with an overview of recent results and active areas of ongoing research.

181 citations

Patent•

Systems and methods for highlighting search results

Amit J. Patel¹, David L. desJardins¹•Institutions (1)

Google¹

25 Sep 2015

TL;DR: A system highlights search terms in documents distributed over a network by generating a search query that includes a search term, receiving a list of one or more references to documents in the network, and highlighting the term in the document retrieved for a selected reference.

Abstract: A system highlights search terms in documents distributed over a network. The system generates a search query that includes a search term and, in response to the search query, receives a list of one or more references to documents in the network. The system receives selection of one of the references and retrieves a document that corresponds to the selected reference. The system then highlights the search term in the retrieved document.
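The final step, highlighting the search term in a retrieved document, might look like the following Python sketch. It covers only that last step and assumes the document text is already in hand; the function name `highlight`, the `<mark>` tags, and case-insensitive matching are illustrative choices, not details from the patent.

```python
import re


def highlight(text: str, term: str,
              open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap every case-insensitive occurrence of `term` in highlight tags."""
    if not term:
        return text
    # re.escape keeps characters such as "+" or "." in the term literal.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)


if __name__ == "__main__":
    print(highlight("Query logs and query intent", "query"))
```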

140 citations

Journal Article•10.1109/TKDE.2015.2426696•

Querying Knowledge Graphs by Example Entity Tuples

Nandish Jayaram¹, Arijit Khan², Chengkai Li¹, Xifeng Yan³, Ramez Elmasri¹•Institutions (3)

University of Texas at Arlington¹, ETH Zurich², University of California, Santa Barbara³

1 Oct 2015 - IEEE Transactions on Knowledge and Data Engineering

TL;DR: The system, Graph Query By Example, automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent, and efficiently finds and ranks the top approximate matching answer graphs and answer tuples.

Abstract: We witness an unprecedented proliferation of knowledge graphs that record millions of entities and their relationships. While knowledge graphs are structure-flexible and content-rich, they are difficult to use. The challenge lies in the gap between their overwhelming complexity and the limited database knowledge of non-professional users. If writing structured queries over “simple” tables is difficult, complex graphs are only harder to query. As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. Our system, Graph Query By Example (GQBE), automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent. It then efficiently finds and ranks the top approximate matching answer graphs and answer tuples. We conducted experiments and user studies on the large Freebase and DBpedia datasets and observed appealing accuracy and efficiency. Our system provides a complementary approach to the existing keyword-based methods, facilitating user-friendly graph querying. To the best of our knowledge, there was no such proposal in the past in the context of graphs.

115 citations

Proceedings Article•10.1145/2740908.2742562•

Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study

Lihong Li¹, Shunbao Chen¹, Jim Kleban², Ankur Gupta¹•Institutions (2)

Microsoft¹, Facebook²

18 May 2015

TL;DR: This paper proposes to address the problem of estimating online metrics that depend on user feedback using causal inference techniques, under the contextual-bandit framework, and obtains very promising results that suggest the wide applicability of these techniques.

Abstract: Optimizing an interactive system against a predefined online metric is particularly challenging, especially when the metric is computed from user feedback such as clicks and payments. The key challenge is the counterfactual nature: in the case of Web search, any change to a component of the search engine may result in a different search result page for the same query, but we normally cannot infer reliably from search log how users would react to the new result page. Consequently, it appears impossible to accurately estimate online metrics that depend on user feedback, unless the new engine is actually run to serve live users and compared with a baseline in a controlled experiment. This approach, while valid and successful, is unfortunately expensive and time-consuming. In this paper, we propose to address this problem using causal inference techniques, under the contextual-bandit framework. This approach effectively allows one to run potentially many online experiments offline from search log, making it possible to estimate and optimize online metrics quickly and inexpensively. Focusing on an important component in a commercial search engine, we show how these ideas can be instantiated and applied, and obtain very promising results that suggest the wide applicability of these techniques.
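One standard estimator in the contextual-bandit setting is inverse propensity scoring (IPS), sketched below in Python. The abstract does not spell out which estimator the authors use, so treat this as an illustration of offline evaluation from logs in general; the record layout, the name `ips_estimate`, and the toy log are assumptions.

```python
from typing import Callable, Hashable, Iterable, Tuple

# One logged interaction: (context, action shown, observed reward,
# probability with which the logging policy chose that action).
LogRecord = Tuple[Hashable, Hashable, float, float]


def ips_estimate(log: Iterable[LogRecord],
                 target_prob: Callable[[Hashable, Hashable], float]) -> float:
    """Estimate the average reward of a new policy from logged data.

    Each logged reward is reweighted by target_prob(context, action)
    divided by the logging propensity, so actions the new policy prefers
    count more and actions it would not take count less.
    """
    total = 0.0
    count = 0
    for context, action, reward, propensity in log:
        if propensity <= 0.0:
            raise ValueError("logging propensity must be positive")
        total += reward * target_prob(context, action) / propensity
        count += 1
    if count == 0:
        raise ValueError("log is empty")
    return total / count


if __name__ == "__main__":
    # Toy log: the logging policy picked "a" or "b" uniformly at random.
    toy_log = [
        ("q1", "a", 1.0, 0.5),
        ("q1", "b", 0.0, 0.5),
        ("q2", "a", 1.0, 0.5),
        ("q2", "b", 1.0, 0.5),
    ]
    # Hypothetical new policy that always shows "a".
    always_a = lambda context, action: 1.0 if action == "a" else 0.0
    print(ips_estimate(toy_log, always_a))
```

The estimate relies on the logging policy being randomized with known, nonzero propensities; actions that were never explored cannot be evaluated this way.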

112 citations

Proceedings Article•10.1145/2806416.2806599•

Query Auto-Completion for Rare Prefixes

Bhaskar Mitra¹, Nick Craswell¹•Institutions (1)

Microsoft¹

17 Oct 2015

TL;DR: This work describes a candidate generation approach using frequently observed query suffixes mined from historical search logs, and a supervised model for ranking these synthetic query suggestions alongside the traditional full-query candidates.

Abstract: Query auto-completion (QAC) systems typically suggest queries that have previously been observed in search logs. Given a partial user query, the system looks up this query prefix against a precomputed set of candidates, then orders them using ranking signals such as popularity. Such systems can only recommend queries for prefixes that have been previously seen by the search engine with adequate frequency. They fail to recommend if the prefix is sufficiently rare such that it has no matches in the precomputed candidate set. We propose a design of a QAC system that can suggest completions for rare query prefixes. In particular, we describe a candidate generation approach using frequently observed query suffixes mined from historical search logs. We then describe a supervised model for ranking these synthetic suggestions alongside the traditional full-query candidates. We further explore ranking signals that are appropriate for both types of candidates based on n-gram statistics and a convolutional latent semantic model (CLSM). Within our supervised framework the new features demonstrate significant improvements in performance over the popularity-based baseline. The synthetic query suggestions complement the existing popularity-based approach, helping users formulate rare queries.
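The candidate generation step can be sketched roughly as follows in Python. This is a simplified reading of the abstract, not the authors' system: the helper names `mine_suffixes` and `synthetic_candidates`, the three-word suffix limit, and ranking by raw suffix frequency (in place of the supervised ranker) are all assumptions.

```python
from collections import Counter
from typing import Iterable, List


def mine_suffixes(queries: Iterable[str], max_words: int = 3) -> Counter:
    """Count word-level suffixes (up to max_words long) of logged queries."""
    counts: Counter = Counter()
    for query in queries:
        words = query.split()
        for n in range(1, min(max_words, len(words)) + 1):
            counts[" ".join(words[-n:])] += 1
    return counts


def synthetic_candidates(prefix: str, suffix_counts: Counter,
                         top_k: int = 3) -> List[str]:
    """Complete the last, partially typed word of `prefix` with frequent suffixes."""
    head, _, last = prefix.rpartition(" ")
    matches = [(count, suffix) for suffix, count in suffix_counts.items()
               if suffix.startswith(last)]
    # Most frequent suffix first; ties broken alphabetically.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [f"{head} {suffix}".strip() for _, suffix in matches[:top_k]]


if __name__ == "__main__":
    history = [
        "cheap flights to new york",
        "hotels in new york",
        "weather new york",
        "new jersey map",
    ]
    print(synthetic_candidates("bistro in ne", mine_suffixes(history)))
```

The prefix "bistro in ne" never occurs in the toy log, yet it still receives completions because only its final partial word has to match a mined suffix.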

99 citations

Proceedings Article•10.1109/ICDE.2015.7113349•

Approximate keyword search in semantic trajectory database

Bolong Zheng¹, Nicholas Jing Yuan², Kai Zheng¹, Xing Xie², Shazia Sadiq¹, Xiaofang Zhou¹•Institutions (2)

University of Queensland¹, Microsoft²

13 Apr 2015

TL;DR: An efficient search algorithm and fast evaluation of the minimum value of spatio-textual utility function are proposed and the results of empirical studies based on real check-in datasets demonstrate that the proposed index and algorithms can achieve good scalability.

Abstract: Driven by the advances in location positioning techniques and the popularity of location sharing services, semantically enriched trajectory data have become unprecedentedly available. While finding relevant Points of Interest (POIs) based on users' locations and query keywords has been extensively studied in the past years, keyword queries in the context of semantic trajectory databases remain largely unexplored. In this paper, we study the problem of approximate keyword search in massive semantic trajectories. Given a set of query keywords, an approximate keyword query of semantic trajectory (AKQST) returns k trajectories that contain the most relevant keywords to the query and yield the least travel effort in the meantime. The main difference between AKQST and conventional spatial keyword queries is that there is no query location in AKQST, which means the search area cannot be localized. To capture the travel effort in the context of query keywords, a novel utility function, called the spatio-textual utility function, is first defined. Then we develop a hybrid index structure called GiKi to organize the trajectories hierarchically, which enables pruning the search space by spatial and textual similarity simultaneously. Finally, an efficient search algorithm and fast evaluation of the minimum value of the spatio-textual utility function are proposed. The results of our empirical studies based on real check-in datasets demonstrate that our proposed index and algorithms can achieve good scalability.

94 citations

Journal Article•10.1002/ASI.23308•

Developing a bottom-up, user-based method of web register classification

Jesse Egbert¹, Douglas Biber¹, Mark Davies²•Institutions (2)

Northern Arizona University¹, Brigham Young University²

1 Sep 2015

TL;DR: This paper introduces a project to develop a reliable, cost‐effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents.

Abstract: This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.

76 citations

Journal Article•10.1016/J.NEUCOM.2014.06.042•

Active learning via query synthesis and nearest neighbour search

Liantao Wang¹, Xuelei Hu¹,², Bo Yuan³, Jianfeng Lu¹•Institutions (3)

Nanjing University of Science and Technology¹, University of Queensland², Tsinghua University³

5 Jan 2015 - Neurocomputing

TL;DR: New strategies for a novel querying framework that combines query synthesis and pool-based sampling are proposed, which overcomes the limitation of query synthesis, and has the advantage of fast querying.

Patent•

Systems and methods for query evaluation over distributed linked data stores

Achille B. Fokoue-Nkoutche¹, Anastasios Kementsietsidis¹, Spyros Kotoulas¹, M. Mustafa Rafique¹•Institutions (1)

IBM¹

10 Jul 2015

TL;DR: A method for query evaluation comprises receiving a query over a set of distributed data sources, decomposing the query into a set of sub-queries, and evaluating each sub-query with respect to each data source in the set.

Abstract: A method for query evaluation comprises receiving a query over a set of distributed data sources, decomposing the query into a set of sub-queries of the query, evaluating each sub-query in the set of sub-queries with respect to each data source in the set of distributed data sources, wherein evaluating comprises determining which data sources in the set of distributed data sources are capable of answering each sub-query and at what cost, computing a set of distributed plans by composing one or more of the sub-queries in one or more of the data sources, evaluating each plan in the set of distributed plans, selecting a sub-set of plans from the set of distributed plans to be executed for responding to the query, executing the selected sub-set of plans, and returning results of the query.

Journal Article•10.14778/2850583.2850587•

QuERy: a framework for integrating entity resolution with query processing

Hotham Altwaijry¹, Sharad Mehrotra¹, Dmitri V. Kalashnikov²•Institutions (2)

University of California, Irvine¹, AT&T Labs²

1 Nov 2015

TL;DR: QuERy, a novel framework for integrating entity resolution (ER) with query processing, is proposed to correctly and efficiently answer complex queries issued on top of dirty data.

Abstract: This paper explores an analysis-aware data cleaning architecture for a large class of SPJ SQL queries. In particular, we propose QuERy, a novel framework for integrating entity resolution (ER) with query processing. The aim of QuERy is to correctly and efficiently answer complex queries issued on top of dirty data. The comprehensive empirical evaluation of the proposed solution demonstrates its significant advantage in terms of efficiency over the traditional techniques for the given problem settings.

Journal Article•10.1016/J.WEBSEM.2014.11.007•

Temporalizing rewritable query languages over knowledge bases

Stefan Borgwardt¹, Marcel Lippmann¹, Veronika Thost¹•Institutions (1)

Dresden University of Technology¹

1 Aug 2015 - Journal of Web Semantics

TL;DR: This paper proposes a generic temporal query language that combines linear temporal logic with queries over ontologies and shows that, if atemporal queries are rewritable, then the corresponding temporal queries are also rewritable, so that they can be answered over a temporal database.

Journal Article•10.14778/2735703.2735708•

Querying with access patterns and integrity constraints

Michael Benedikt¹, Julien Leblay¹, Efthymia Tsamoura¹•Institutions (1)

University of Oxford¹

1 Feb 2015

TL;DR: This paper presents a system in which classical cost-based join optimization is extended to support both access-restrictions and constraints, and explores a space of proofs that witness the answering of the query, where each proof has a direct correspondence with a query plan.

Abstract: Traditional query processing involves a search for plans formed by applying algebraic operators on top of primitives representing access to relations in the input query. But many querying scenarios involve two interacting issues that complicate the search. On the one hand, the search space may be limited by access restrictions associated with the interfaces to data sources, which require certain parameters to be given as inputs. On the other hand, the search space may be extended through the presence of integrity constraints that relate sources to each other, allowing for plans that do not match the structure of the user query. In this paper we present the first optimization approach that attacks both these difficulties within a single framework, presenting a system in which classical cost-based join optimization is extended to support both access-restrictions and constraints. Instead of iteratively exploring subqueries of the input query, our optimizer explores a space of proofs that witness the answering of the query, where each proof has a direct correspondence with a query plan.

Patent•

Web-based customer service interface

Yoram Nelken¹, Randy Jessee¹, Steve Kirshner¹•Institutions (1)

IBM¹

6 Nov 2015

TL;DR: A system and method for processing a web-based query is provided. The system comprises a web server for transmitting a web form having a text field box for entering a natural language query, and a language analysis server for extracting concepts from the natural language query and classifying the query into predefined categories via computed match scores based upon the extracted concepts and information contained within an adaptable knowledge base.

Abstract: A system and method for processing a web-based query is provided. The system comprises a web server for transmitting a web form having a text field box for entering a natural language query, and a language analysis server for extracting concepts from the natural language query and classifying the natural language query into predefined categories via computed match scores based upon the extracted concepts and information contained within an adaptable knowledge base. In various embodiments, the web server selectively transmits either a resource page or a confirmation page to the client, based upon the match scores. The resource page may comprise at least one suggested response corresponding to at least one predefined category. The language analysis server may adapt the knowledge base in accordance with a communicative action received from the client after the resource page is transmitted.

Proceedings Article•10.5220/0005636704870495•

New classification models for detecting Hate and Violence web content

Shuhua Liu¹, Thomas Forss¹•Institutions (1)

Arcada University of Applied Sciences¹

1 Nov 2015

TL;DR: This paper explores new ways and methods to improve and maximize classification performance, especially to enhance precision and reduce false positives, through examination and handling of the issues with class imbalance, and through incorporation of LDA topic models.

Abstract: Today, the presence of harmful and inappropriate content on the web remains one of the primary concerns for web users. Web classification models in the early days were limited by the methods and data available. In our research we revisit the web classification problem with the application of new methods and techniques for text content analysis. Our recent studies have indicated the promising potential of combining topic analysis and sentiment analysis in web content classification. In this paper we further explore new ways and methods to improve and maximize classification performance, especially to enhance precision and reduce false positives, through examination and handling of the issues with class imbalance, and through incorporation of LDA topic models.

Book Chapter•10.1007/978-3-319-25639-9_51•

QueryVOWL: A Visual Query Notation for Linked Data

Florian Haag¹, Steffen Lohmann¹, Stephan Siek¹, Thomas Ertl¹•Institutions (1)

University of Stuttgart¹

31 May 2015

TL;DR: This paper presents QueryVOWL, a visual query language that is based upon the ontology visualization VOWL and defines mappings to SPARQL, and aims for a language that is intuitive and easy to use, while remaining flexible and preserving most of the expressiveness of SPARQL.

Abstract: In order to enable users without any knowledge of RDF and SPARQL to query Linked Data, visual approaches can be helpful by providing graphical support for query building. We present QueryVOWL, a visual query language that is based upon the ontology visualization VOWL and defines mappings to SPARQL. We aim for a language that is intuitive and easy to use, while remaining flexible and preserving most of the expressiveness of SPARQL. In contrast to related work, the queries can be created entirely with visual elements, taking into account RDFS and OWL concepts often used to structure Linked Data. This paper is a revised version of a workshop paper where we first introduced QueryVOWL. We present the query notation, some example queries, and two prototypical implementations of QueryVOWL. Also, we report on a qualitative user study that indicates lay users are able to construct and interpret QueryVOWL graphs.

Proceedings Article•10.1145/2702123.2702527•

A Large-Scale Study of User Image Search Behavior on the Web

Jaimie Yejean Park¹, Neil O'Hare², Rossano Schifanella³, Alejandro Jaimes², Chin-Wan Chung¹•Institutions (3)

KAIST¹, Yahoo!², University of Turin³

18 Apr 2015

TL;DR: This study analyzes user image search behavior from a large-scale Yahoo! Image Search query log and identifies important behavioral differences across query types, in particular showing that some query types are more exploratory, while others correspond to focused search.

Abstract: In this study, we analyze user image search behavior from a large-scale Yahoo! Image Search query log, based on the hypothesis that behavior is dependent on query type. We categorize queries using two orthogonal taxonomies (subject-based and facet-based) and identify important query types at the intersection of these taxonomies. We study user search behavior on a large-scale set of search sessions for each query type, examining characteristics of sessions, query reformulation patterns, click patterns, and page view patterns. We identify important behavioral differences across query types, in particular showing that some query types are more exploratory, while others correspond to focused search. We also supplement our study with a survey to link the behavioral differences to users' intent. Our findings shed light on the importance of considering query categories to better understand user behavior on image search platforms.

Journal Article•10.1109/TKDE.2015.2407353•

CrowdOp: Query Optimization for Declarative Crowdsourcing Systems

Ju Fan¹, Meihui Zhang², Stanley Kok², Meiyu Lu¹, Beng Chin Ooi¹•Institutions (2)

National University of Singapore¹, Singapore University of Technology and Design²

1 Aug 2015 - IEEE Transactions on Knowledge and Data Engineering

TL;DR: CrowdOp is proposed, a cost-based query optimization approach for declarative crowdsourcing systems that considers both cost and latency in query optimization objectives and generates query plans that provide a good balance between the cost and latency.

Abstract: We study the query optimization problem in declarative crowdsourcing systems. Declarative crowdsourcing is designed to hide the complexities and relieve the user of the burden of dealing with the crowd. The user is only required to submit an SQL-like query and the system takes the responsibility of compiling the query, generating the execution plan and evaluating in the crowdsourcing marketplace. A given query can have many alternative execution plans and the difference in crowdsourcing cost between the best and the worst plans may be several orders of magnitude. Therefore, as in relational database systems, query optimization is important to crowdsourcing systems that provide declarative query interfaces. In this paper, we propose CrowdOp, a cost-based query optimization approach for declarative crowdsourcing systems. CrowdOp considers both cost and latency in query optimization objectives and generates query plans that provide a good balance between the cost and latency. We develop efficient algorithms in CrowdOp for optimizing three types of queries: selection queries, join queries, and complex selection-join queries. We validate our approach via extensive experiments by simulation as well as with the real crowd on Amazon Mechanical Turk.

Journal Article•10.1016/J.INS.2015.02.029•

On personalizing Web search using social network analysis

Omair Shafiq¹, Reda Alhajj¹, John G. Rokne¹•Institutions (1)

University of Calgary¹

1 Sep 2015 - Information Sciences

TL;DR: This paper has designed and developed a mechanism that extracts information from a user's social network and uses it to re-rank the results from a search engine, based on the proposed trust and relevance matrices.

Proceedings Article•10.1109/ECS.2015.7124749•

Keyword focused web crawler

Gunjan H. Agre, Nikita V. Mahajan

1 Feb 2015

TL;DR: This paper introduces extraction of URLs based on a keyword or search criteria; the approach offers high optimality compared with a traditional web crawler and can enhance search efficiency with more accuracy.

Abstract: The number of Internet users and uses is growing tremendously, which makes it hard for users to find web pages that are relevant to their requirements. Users generally either browse web pages through a large hierarchy of concepts or submit a query to a search engine and receive results based on the search pattern, of which a few are relevant to the search and most are not. The web crawler plays an important role in a search engine and is a key element where performance is concerned. This paper incorporates a domain engineering concept and keyword-driven crawling with a relevancy decision mechanism, and uses ontology concepts, which ensures the best path for improving the crawler's performance. It introduces extraction of URLs based on a keyword or search criteria: it extracts URLs for web pages that contain the searched keyword in their content, considers only such pages important, and does not download web pages irrelevant to the search. The approach offers high optimality compared with a traditional web crawler and can enhance search efficiency with more accuracy.

Journal Article•10.1016/J.ASOC.2015.01.026•

Robust heuristic algorithms for exploiting the common tasks of relational cloud database queries

Tansel Dokeroglu¹, Murat Ali Bayir², Ahmet Cosar¹•Institutions (2)

Middle East Technical University¹, Microsoft²

1 May 2015

TL;DR: A set of robust heuristic algorithms, Branch-and-Bound, Genetic, Hill climbing, and Hybrid Genetic-Hill Climbing, are proposed to find (near-) optimal query execution plans and maximize the benefits of cloud computing.

Abstract: Highlights: MQO is adapted for relational cloud databases with a cost model that includes network expenses. Alternative query plans are intelligently developed and experimentally evaluated. B&B, Genetic, Hill Climbing and Genetic-Hill Climbing algorithms are developed. Cloud computing enables a conventional relational database system's hardware to be adjusted dynamically according to query workload, performance and deadline constraints. One can rent a large amount of resources for a short duration in order to run complex queries efficiently on large-scale data with virtual machine clusters. Complex queries usually contain common subexpressions, either in a single query or among multiple queries that are submitted as a batch. The common subexpressions scan the same relations, compute the same tasks (join, sort, etc.), and/or ship the same data among virtual computers. The total time spent on the queries can be reduced by executing these common tasks only once. In this study, we build and use efficient sets of query execution plans to reduce the total execution time. This is an NP-hard problem; therefore, a set of robust heuristic algorithms, Branch-and-Bound, Genetic, Hill Climbing, and Hybrid Genetic-Hill Climbing, are proposed to find (near-) optimal query execution plans and maximize the benefits. The optimization time of each algorithm for identifying the query execution plans and the quality of these plans are analyzed by extensive experiments.
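The idea of executing common tasks only once, combined with a hill-climbing search over alternative plans, can be sketched as follows. This is an illustration under strong assumptions rather than the paper's algorithm: the cost model is a plain sum over distinct tasks (no network expenses), and the task names, costs, and plan sets are made up.

```python
def total_cost(selection, plans, task_cost):
    """Cost of one plan choice per query, counting each shared task once."""
    tasks = set()
    for query, idx in selection.items():
        tasks |= plans[query][idx]
    return sum(task_cost[t] for t in tasks)


def hill_climb(plans, task_cost):
    """Start from each query's first plan; switch one query's plan at a time
    whenever the switch lowers the shared total cost."""
    selection = {query: 0 for query in plans}
    best = total_cost(selection, plans, task_cost)
    improved = True
    while improved:
        improved = False
        for query in plans:
            for idx in range(len(plans[query])):
                candidate = {**selection, query: idx}
                cost = total_cost(candidate, plans, task_cost)
                if cost < best:
                    selection, best, improved = candidate, cost, True
    return selection, best


# Hypothetical tasks and costs, for illustration only.
task_cost = {"index_R": 3, "scan_R": 5, "scan_S": 4, "join_RS": 6}
plans = {
    "q1": [{"index_R"}, {"scan_R"}],          # index_R is cheaper in isolation
    "q2": [{"scan_R", "scan_S", "join_RS"}],  # q2 must scan R anyway
}

selection, cost = hill_climb(plans, task_cost)
```

In this toy batch, q1's index-based plan is cheaper on its own (3 versus 5), but choosing the scan lets q1 reuse the scan that q2 already pays for, so the batch cost drops from 18 to 15. Hill climbing can stop at a local optimum, which is one reason a hybrid with a genetic algorithm is attractive.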

Patent•

Real-time and adaptive data mining

Sharon Gill Chadha, Xin Cheng, Parvinder Chadha

4 Dec 2015

TL;DR: A method of analyzing data is presented, which includes generating a query based on a topic of interest, expanding search terms of the query, executing the query on one or more data sources, and monitoring a specific data source selected from the one or more data sources.

Abstract: A method of analyzing data is presented. The method includes generating a query based on a topic of interest, expanding search terms of the query, executing the query on one or more data sources, and monitoring a specific data source selected from the one or more data sources. The monitoring is performed to monitor for matches to the query.

Journal Article•10.1109/TKDE.2014.2350252•

Authentication of Moving Top-k Spatial Keyword Queries

Dingming Wu¹, Byron Choi¹, Jianliang Xu¹, Christian S. Jensen²•Institutions (2)

Hong Kong Baptist University¹, Aalborg University²

01 Apr 2015-IEEE Transactions on Knowledge and Data Engineering

TL;DR: New authentication data structures, the MIR-tree and MIR*-tree, are proposed that enable the authentication of MkSK queries at low computation and communication costs and are capable of outperforming two baseline algorithms by orders of magnitude.

Abstract: A moving top-$k$ spatial keyword (M$k$SK) query, which takes into account a continuously moving query location, enables a mobile client to be continuously aware of the top-$k$ spatial web objects that best match a query with respect to location and text relevance. The increasing mobile use of the web and the proliferation of geo-positioning render it of interest to consider a scenario where spatial keyword search is outsourced to a separate service provider capable of handling the voluminous spatial web objects available from various sources. A key challenge is that the service provider may return inaccurate or incorrect query results (intentionally or not), e.g., due to cost considerations or intrusion by hackers. Therefore, it is attractive to be able to authenticate the query results at the client side. Existing authentication techniques are either inefficient or inapplicable for the kind of query we consider. We propose new authentication data structures, the MIR-tree and MIR$^*$-tree, that enable the authentication of M$k$SK queries at low computation and communication costs. We design a verification object for authenticating M$k$SK queries, and we provide algorithms for constructing verification objects and using these for verifying query results. A thorough experimental study on real data shows that the proposed techniques are capable of outperforming two baseline algorithms by orders of magnitude.

Patent•

Recent interest based relevance scoring

Philip A. McDonnell¹, Glen Jeh¹, Taher H. Haveliwala¹, Yair Kurzion¹•Institutions (1)

Google¹

22 Jan 2015

TL;DR: A computer-implemented method for processing query information includes receiving prior queries followed by a current query, the prior and current queries being received within an activity period and originating with a search requester.

Abstract: A computer-implemented method for processing query information includes receiving prior queries followed by a current query, the prior and current queries being received within an activity period and originating with a search requester. The method also includes receiving a plurality of search results based on the current query. Each search result identifies a search result document, and each respective search result document is associated with a query-specific score indicating the relevance of the document to the current query. The method also includes determining a first category based, at least in part, on the prior queries. The method also includes identifying a plurality of prior activity periods of other search requesters, each prior activity period containing a prior activity query where the prior activity query matches the current query, and where the prior activity period indicates the same first category.

Journal Article•10.1016/J.IPM.2014.07.004•

Weighted Word Pairs for query expansion

Francesco Colace¹, Massimo De Santo¹, Luca Greco¹, Paolo Napoletano²•Institutions (2)

University of Salerno¹, University of Milano-Bicocca²

01 Jan 2015-Information Processing and Management

TL;DR: This paper proposes a novel query expansion method that uses minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words, selected with a method based on the Probabilistic Topic Model.

Abstract: This paper proposes a novel query expansion method to improve the accuracy of text retrieval systems. Our method uses minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. This structure is obtained from the relevance feedback through a word-pair selection method based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.
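To make the notion of a query expanded with weighted word pairs concrete, here is a rough sketch. It deliberately swaps the paper's Probabilistic Topic Model for simple document-level co-occurrence counting, so it shows only the shape of the output (pairs with weights), not the authors' selection method; the function name and sample documents are made up.

```python
from collections import Counter
from itertools import product


def weighted_word_pairs(query_terms, feedback_docs, top_n=3):
    """Return the top (query_term, other_word) pairs with normalized weights.

    Weights are co-occurrence counts over the feedback documents divided by
    the total count. This is a stand-in for topic-model-based selection.
    """
    query_set = set(query_terms)
    pair_counts = Counter()
    for doc in feedback_docs:
        words = set(doc.lower().split())
        present = [t for t in query_terms if t in words]
        for term, other in product(present, words - query_set):
            pair_counts[(term, other)] += 1
    total = sum(pair_counts.values())
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(pair, count / total) for pair, count in ranked[:top_n]]


# Two made-up documents marked relevant by the user for the query "jaguar".
feedback = ["jaguar car speed", "jaguar car engine"]
expansion = weighted_word_pairs(["jaguar"], feedback, top_n=2)
```

The resulting pairs could then be appended to the original query with their weights as boosts, which is one plausible way to feed such a structure to a retrieval engine.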

Journal Article•10.1109/TKDE.2014.2324597•

Route-Saver: Leveraging Route APIs for Accurate and Efficient Query Processing at Location-Based Services

Yu Li¹, Man Lung Yiu¹•Institutions (1)

Hong Kong Polytechnic University¹

01 Jan 2015-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes to exploit recent routes requested from route APIs to answer queries accurately, designs effective lower/upper bounding and ordering techniques to process queries efficiently, and studies parallel route requests to further reduce the query response time.

Abstract: Location-based services (LBS) enable mobile users to query points-of-interest (e.g., restaurants, cafes) on various features (e.g., price, quality, variety). In addition, users require accurate query results with up-to-date travel times. Lacking the monitoring infrastructure for road traffic, the LBS may obtain live travel times of routes from online route APIs in order to offer accurate results. Our goal is to reduce the number of requests issued by the LBS significantly while preserving accurate query results. First, we propose to exploit recent routes requested from route APIs to answer queries accurately. Then, we design effective lower/upper bounding techniques and ordering techniques to process queries efficiently. Also, we study parallel route requests to further reduce the query response time. Our experimental evaluation shows that our solution is three times more efficient than a competitor, and yet achieves high result accuracy (above 98 percent).

Journal Article•10.1007/S10766-014-0327-4•

A Hardware/Software Approach for Database Query Acceleration with FPGAs

Bharat Sukhwani¹, Mathew S. Thoennes¹, Hong Min¹, Parijat Dube¹, Bernard Brezzo¹, Sameh W. Asaad¹, Donna N. Dillenberger¹•Institutions (1)

IBM¹

01 Dec 2015-International Journal of Parallel Programming

TL;DR: This paper addresses the needs of real-time analytics by enabling hardware acceleration of complex database query operations such as predicate evaluation, sort and projection, using an FPGA-based composable accelerator to offload the analytics queries from the host CPU running the OLTP workload.

Abstract: Complex analytics queries often involve expensive operations that may require large computational runtimes, leading to slow query responsiveness and hampering real-time performance. Moreover, running these expensive analytics queries inside traditional online transaction processing (OLTP) systems for real-time analytics can affect the performance of mission-critical OLTP queries. On the other hand, support for real-time analytics is considered vital for important business insights and improved market responsiveness. In this paper, we try to address the needs of real-time analytics by enabling hardware acceleration of complex database query operations such as predicate evaluation, sort and projection. While projection helps reduce the amount of data being processed by subsequent query operations, sort is central to most database queries, even those not involving an explicit sort operation. Our system involves an FPGA-based composable accelerator for offloading the analytics queries from the host CPU running the OLTP workload. The FPGA-accelerated database system contains accelerator kernels for various database operations and automatic transformation of query operations into calls to these hardware kernels for seamless integration of the accelerator into the database system. Based on the query semantics, each accelerator kernel can be tailored by software to execute specific database operations, and different kernels can be fused together to compose a query accelerator. Our query transformation algorithm creates a query-specific control block to customize the accelerator without requiring FPGA reconfiguration.

Journal Article•10.14778/2850583.2850588•

Processing and optimizing main memory spatial-keyword queries

Taesung Lee¹, Jin-Woo Park², Sanghoon Lee², Seung-won Hwang¹, Sameh Elnikety³, Yuxiong He³•Institutions (3)

Yonsei University¹, Pohang University of Science and Technology², Microsoft³

1 Nov 2015

TL;DR: This work employs a cost-based optimizer to process spatial-keyword queries using a spatial index and a keyword index, and introduces five optimization techniques that efficiently reduce the search space and produce a query plan with low cost.

Abstract: Important cloud services rely on spatial-keyword queries, containing a spatial predicate and arbitrary boolean keyword queries. In particular, we study the processing of such queries in main memory to support short response times. In contrast, current state-of-the-art spatial-keyword indexes and relational engines are designed for different assumptions. Rather than building a new spatial-keyword index, we employ a cost-based optimizer to process these queries using a spatial index and a keyword index. We address several technical challenges to achieve this goal. We introduce three operators as the building blocks to construct plans for main memory query processing. We then develop a cost model for the operators and query plans. We introduce five optimization techniques that efficiently reduce the search space and produce a query plan with low cost. The optimization techniques are computationally efficient, and they identify a query plan with a formal approximation guarantee under the common independence assumption. Furthermore, we extend the framework to exploit interesting orders. We implement the query optimizer to empirically validate our proposed approach using real-life datasets. The evaluation shows that the optimizations provide significant reduction in the average and tail latency of query processing: 7- to 11-fold reduction over using a single index in terms of 99th percentile response time. In addition, this approach outperforms existing spatial-keyword indexes, and DBMS query optimizers for both average and high-percentile response times.
