Top 295 papers published on the topic of Web query classification in 2017

Showing papers on "Web query classification" published in 2017

Proceedings Article•10.1145/3035918.3056097•

Approximate Query Processing: No Silver Bullet

Surajit Chaudhuri¹, Bolin Ding¹, Srikanth Kandula¹•Institutions (1)

9 May 2017

TL;DR: This paper reflects on the state of the art of Approximate Query Processing, and discusses two promising avenues to pursue towards integrating Approximate Query Processing into data platforms.

Abstract: In this paper, we reflect on the state of the art of Approximate Query Processing. Although much technical progress has been made in this area of research, we are yet to see its impact on products and services. We discuss two promising avenues to pursue towards integrating Approximate Query Processing into data platforms.

185 citations

Proceedings Article•10.1145/3077136.3080831•

Relevance-based Word Embedding

Hamed Zamani¹, W. Bruce Croft¹•Institutions (1)

University of Massachusetts Amherst¹

7 Aug 2017

TL;DR: Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.

Abstract: Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned based on term proximity in a large corpus. This means that the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks. The primary objective in various IR tasks is to capture relevance instead of term proximity, syntactic, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations based on query-document relevance information. In this paper, we propose two learning models with different objective functions; one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query. To train our models, we used over six million unique queries and the top ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate our learned word representation models using two IR tasks: query expansion and query classification. Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.

182 citations

Proceedings Article•10.1145/3132847.3133010•

Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

Mostafa Dehghani¹, Sascha Rothe², Enrique Alfonseca², Pascal Fleury²•Institutions (2)

University of Amsterdam¹, Google²

6 Nov 2017

TL;DR: A customized sequence-to-sequence model for session-based query suggestion that employs a query-aware attention mechanism to capture the structure of the session context and outperforms the baselines both in terms of the generating queries and scoring candidate queries for the task of query suggestion.

Abstract: Users try to articulate their complex information needs during search sessions by reformulating their queries. To make this process more effective, search engines provide related queries to help users in specifying the information need in their search process. In this paper, we propose a customized sequence-to-sequence model for session-based query suggestion. In our model, we employ a query-aware attention mechanism to capture the structure of the session context. This enables us to control the scope of the session from which we infer the suggested next query, which helps not only handle the noisy data but also automatically detect session boundaries. Furthermore, we observe that, based on the user query reformulation behavior, within a single session a large portion of query terms is retained from the previously submitted queries and consists of mostly infrequent or unseen terms that are usually not included in the vocabulary. We therefore empower the decoder of our model to access the source words from the session context during decoding by incorporating a copy mechanism. Moreover, we propose evaluation metrics to assess the quality of the generative models for query suggestion. We conduct an extensive set of experiments and analysis. The results suggest that our model outperforms the baselines both in terms of the generating queries and scoring candidate queries for the task of query suggestion.

128 citations

Posted Content•

Relevance-based Word Embedding

Hamed Zamani¹, W. Bruce Croft¹•Institutions (1)

University of Massachusetts Amherst¹

09 May 2017-arXiv: Information Retrieval

TL;DR: This article proposed relevance-based word embedding models that learn word representations based on query-document relevance information and classify each term as belonging to the relevant or non-relevant class for each query.

115 citations

Proceedings Article•10.1145/3035918.3064013•

Database Learning: Toward a Database that Becomes Smarter Every Time

Yongjoo Park¹, Ahmad Shahab Tajik¹, Michael Cafarella¹, Barzan Mozafari¹•Institutions (1)

University of Michigan¹

9 May 2017

TL;DR: The principle of maximum entropy is exploited to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations and which lead to increasingly faster response times for future queries.

Abstract: In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea (learning from past query answers) Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that database learning supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems.

87 citations

Journal Article•10.1007/S00521-016-2207-X•

A new fuzzy logic-based query expansion model for efficient information retrieval using relevance feedback approach

Jagendra Singh¹, Aditi Sharan¹•Institutions (1)

Jawaharlal Nehru University¹

01 Sep 2017-Neural Computing and Applications

TL;DR: This paper presents a new method for QE based on fuzzy logic that treats the top-retrieved documents as relevance feedback documents for mining additional QE terms, and that increases the precision and recall rates of information retrieval systems for document retrieval.

Abstract: Efficient query expansion (QE) term selection methods are important for improving the accuracy and efficiency of a retrieval system, because they remove irrelevant and redundant terms from the corpus of top-retrieved feedback documents with respect to a user query. Each individual QE term selection method has its weaknesses and strengths. To overcome the weaknesses and to utilize the strengths of the individual methods, we used multiple term selection methods together. In this paper, we present a new method for QE based on fuzzy logic that treats the top-retrieved documents as relevance feedback documents for mining additional QE terms. Different QE term selection methods calculate the degrees of importance of all unique terms in the collection of top-retrieved documents for mining additional expansion terms. These methods give different relevance scores for each term. The proposed method combines the different weights of each term by using fuzzy rules to infer the weights of the additional query terms. Then, the weights of the additional query terms and the weights of the original query terms are used to form the new query vector, and we use this new query vector to retrieve documents. All the experiments are performed on TREC and FIRE benchmark datasets. The proposed QE method increases the precision rates and the recall rates of information retrieval systems for document retrieval. It achieves a significantly higher average recall rate, average precision rate and F measure on both datasets.
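
The pipeline described in this abstract (score candidate terms with several methods, combine the scores, then merge the best terms into the query vector) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `min_max` and `expand_query` are hypothetical, and a plain average of normalized scores stands in for the paper's fuzzy rules, which the abstract does not spell out.

```python
def min_max(scores):
    """Rescale one method's term scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {term: 0.0 for term in scores}
    return {term: (s - lo) / (hi - lo) for term, s in scores.items()}


def expand_query(query_weights, method_scores, k=2):
    """Add the k best-scoring new terms to the original query vector.

    query_weights: dict term -> weight for the original query.
    method_scores: list of dicts, one per term selection method.
    ASSUMPTION: averaging replaces the fuzzy inference used in the paper.
    """
    normalized = [min_max(scores) for scores in method_scores]
    terms = set().union(*normalized)
    combined = {
        t: sum(n.get(t, 0.0) for n in normalized) / len(normalized)
        for t in terms
    }
    # Rank candidate terms not already in the query; ties break alphabetically.
    candidates = sorted(
        (t for t in combined if t not in query_weights),
        key=lambda t: (-combined[t], t),
    )[:k]
    new_query = dict(query_weights)
    for t in candidates:
        new_query[t] = combined[t]
    return new_query


if __name__ == "__main__":
    scores_a = {"car": 4.0, "cat": 2.0, "speed": 0.0}
    scores_b = {"car": 9.0, "cat": 1.0, "speed": 3.0}
    print(expand_query({"jaguar": 1.0}, [scores_a, scores_b], k=2))
```

Normalizing each method's scores before combining them keeps a method with a large numeric range from dominating the others.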

72 citations

Journal Article•10.1007/S10115-016-0990-4•

A survey of query result diversification

Kaiping Zheng¹, Hongzhi Wang¹, Zhixin Qi¹, Jianzhong Li¹, Hong Gao¹•Institutions (1)

Harbin Institute of Technology¹

01 Apr 2017-Knowledge and Information Systems

TL;DR: This survey aims to provide a thorough review of a wide range of result diversification techniques, including various definitions of diversification, corresponding algorithms, diversification techniques specific to applications such as databases, search engines, recommendation systems, graphs, time series and data streams, as well as result diversification systems.

Abstract: Nowadays, in information systems such as web search engines and databases, diversity is becoming increasingly essential and getting more and more attention for improving users' satisfaction. In this sense, query result diversification is of vital importance and well worth researching. Some issues, such as the definition of diversification and efficient diverse query processing, are more challenging to handle in information systems. Many researchers have focused on various dimensions of the diversification problem. In this survey, we aim to provide a thorough review of a wide range of result diversification techniques, including various definitions of diversification, corresponding algorithms, diversification techniques specific to applications such as databases, search engines, recommendation systems, graphs, time series and data streams, as well as result diversification systems. We also propose some open research directions, which are challenging and have not been explored up till now, to improve the quality of query results.

67 citations

Proceedings Article•10.1145/3035918.3064017•

QIRANA: A Framework for Scalable Query Pricing

Shaleen Deep¹, Paraschos Koutris¹•Institutions (1)

University of Wisconsin-Madison¹

9 May 2017

TL;DR: This work presents a novel pricing system, called QIRANA, that performs query-based data pricing for a large class of SQL queries (including aggregation) in real time, and provides prices with formal guarantees.

Abstract: Users are increasingly engaging in buying and selling data over the web. Facilitated by the proliferation of online marketplaces that bring such users together, data brokers need to serve requests where they provide results for user queries over the underlying datasets, and price them fairly according to the information disclosed by the query. In this work, we present a novel pricing system, called QIRANA, that performs query-based data pricing for a large class of SQL queries (including aggregation) in real time. QIRANA provides prices with formal guarantees: for example, it avoids prices that create arbitrage opportunities. Our framework also allows flexible pricing, by allowing the data seller to choose from a variety of pricing functions, as well as specify relation and attribute-level parameters that control the price of queries and assign different value to different portions of the data. We test QIRANA on a variety of real-world datasets and query workloads, and we show that it can efficiently compute the prices for queries over large-scale data.

60 citations

Journal Article•10.1016/J.COSE.2016.11.013•

Efficient k-NN query over encrypted data in cloud with limited key-disclosure and offline data owner

Lu Zhou¹,², Youwen Zhu¹,³, Aniello Castiglione⁴•Institutions (4)

Nanjing University of Aeronautics and Astronautics¹, Shandong University², Nanjing University³, University of Salerno⁴

01 Aug 2017-Computers & Security

TL;DR: This paper proposes a new scheme to perform k-NN query over encrypted data in cloud while protecting the privacy of both data owner and query users from cloud, and presents a new scalar product protocol for gaining the properties.

58 citations

Journal Article•10.1109/TKDE.2017.2668419•

Query Expansion with Enriched User Profiles for Personalized Search Utilizing Folksonomy Data

Dong Zhou¹, Xuan Wu¹, Wenyu Zhao¹, Séamus Lawless², Jianxun Liu¹•Institutions (2)

Hunan University of Science and Technology¹, Trinity College, Dublin²

01 Jul 2017-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes a novel model to construct enriched user profiles with the help of an external corpus for personalized query expansion, and builds two novel query expansion techniques based on topical weights-enhanced word embeddings and topical relevance between the query and the terms inside a user profile.

Abstract: Query expansion has been widely adopted in Web search as a way of tackling the ambiguity of queries. Personalized search utilizing folksonomy data has demonstrated an extreme vocabulary mismatch problem that requires even more effective query expansion methods. Co-occurrence statistics, tag-tag relationships, and semantic matching approaches are among those favored by previous research. However, user profiles which only contain a user's past annotation information may not be enough to support the selection of expansion terms, especially for users with limited previous activity with the system. We propose a novel model to construct enriched user profiles with the help of an external corpus for personalized query expansion. Our model integrates the current state-of-the-art text representation learning framework, known as word embeddings, with topic models in two groups of pseudo-aligned documents. Based on user profiles, we build two novel query expansion techniques. These two techniques are based on topical weights-enhanced word embeddings, and the topical relevance between the query and the terms inside a user profile, respectively. The results of an in-depth experimental evaluation, performed on two real-world datasets using different external corpora, show that our approach outperforms traditional techniques, including existing non-personalized and personalized query expansion methods.

46 citations

Journal Article•10.1109/TIFS.2017.2721221•

Privacy-Preserving Similarity Joins Over Encrypted Data

Xingliang Yuan¹, Xinyu Wang¹, Cong Wang¹, Chenyun Yu¹, Sarana Nutanong¹•Institutions (1)

City University of Hong Kong¹

28 Jun 2017-IEEE Transactions on Information Forensics and Security

TL;DR: This paper investigates privacy-preserving similarity join queries, a pivotal primitive of similarity search that finds pairwise similar data points across two data sets, and formalizes the leakage functions in the context of similarity joins, and conducts rigorous security analysis.

Abstract: Similarity search on high-dimensional data has been intensively studied for data processing and analytics. Despite its broad applicability, data security and privacy concerns along the trend of data outsourcing have not been fully addressed. In this paper, we investigate privacy-preserving similarity join queries, i.e., a pivotal primitive of similarity search that finds pairwise similar data points across two data sets. We start from locality-sensitive hashing and searchable symmetric encryption, i.e., the most practical techniques for similarity search and encrypted search, respectively. However, the immediate combination of two techniques discloses the distribution of the query set, which is exploitable to compromise the confidentiality of queries. To enhance the security, we propose the frequency hiding query scheme, which allows the server to see the flattened query distribution only. To improve the scalability, we further design the result sharing query scheme, which processes a small portion of query points and shares the results with other nearby points. Besides, we set up a strict constraint to carefully select query points to achieve “as-strong-as-possible” guarantees. We formalize the leakage functions in the context of similarity joins, and conduct rigorous security analysis. We implement and evaluate the proposed query schemes on Azure cloud. Experimental results indicate that they have different tradeoffs on security, efficiency, and accuracy, which can flexibly be used for different deployment scenarios.
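
As a rough illustration of the locality-sensitive hashing starting point mentioned in this abstract, the sketch below finds similarity join candidates by bucketing points on a random-hyperplane signature. It is a plaintext toy under stated assumptions: it contains no encryption, frequency hiding, or result sharing, and the function names (`make_hyperplanes`, `signature`, `lsh_join_candidates`) and the choice of random-hyperplane LSH are assumptions made here, not details taken from the paper.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane vec falls on."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def lsh_join_candidates(left, right, hyperplanes):
    """Return (left_id, right_id) pairs whose signatures collide.

    left, right: dicts mapping an id to a vector (list of floats).
    Colliding pairs are only candidates; a real join would verify them.
    """
    buckets = defaultdict(list)
    for rid, vec in right.items():
        buckets[signature(vec, hyperplanes)].append(rid)
    pairs = []
    for lid, vec in left.items():
        for rid in buckets.get(signature(vec, hyperplanes), []):
            pairs.append((lid, rid))
    return pairs


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=8, seed=1)
    left = {"a": [1.0, 2.0, 3.0]}
    right = {"x": [2.0, 4.0, 6.0], "y": [-1.0, -2.0, -3.0]}
    print(lsh_join_candidates(left, right, planes))
```

Points with the same direction always share a signature, while opposite points never do, which is why bucket collisions serve as a cheap filter before any exact similarity check.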

Journal Article•10.1016/J.JCSS.2016.12.003•

A query privacy-enhanced and secure search scheme over encrypted data in cloud computing

Hui Yin¹,², Zheng Qin², Lu Ou², Keqin Li³•Institutions (3)

Changsha University¹, Hunan University², State University of New York System³

01 Dec 2017-Journal of Computer and System Sciences

TL;DR: This work proposes a privacy-enhanced search scheme by allowing the data user to generate random query trapdoor every time, and uses Bloom filter and bilinear pairing operation to construct secure index for each data file, which enables the cloud to perform search without obtaining any useful information.

Journal Article•10.1007/S00500-015-1881-4•

Query-based multi-documents summarization using linguistic knowledge and content word expansion

Asad Abdi¹, Norisma Idris¹, Rasim M. Alguliyev², Ramiz M. Aliguliyev²•Institutions (2)

Information Technology University¹, Azerbaijan National Academy of Sciences²

1 Apr 2017

TL;DR: A query-based summarization method, which uses a combination of semantic relations between words and their syntactic composition, to extract meaningful sentences from document sets is introduced and demonstrates better performance as compared to other existing techniques on DUC 2005 and DUC 2006 datasets.

Abstract: In this paper, a query-based summarization method, which uses a combination of semantic relations between words and their syntactic composition, to extract meaningful sentences from document sets is introduced. The problem with current statistical methods is that they fail to capture the meaning when comparing a sentence and a user query; hence there is often a conflict between the extracted sentences and users' requirements. However, this particular method can improve the quality of document summaries because it is able to avoid extracting a sentence whose similarity with the query is high but whose meaning is different. The method is executed by computing the semantic and syntactic similarity of the sentence-to-sentence and sentence-to-query. To reduce redundancy in the summary, this method uses a greedy algorithm to impose a diversity penalty on the sentences. In addition, the proposed method expands the words in both the query and the sentences to tackle the problem of information limit. It bridges the lexical gaps for semantically similar contexts that are expressed using different wording. The experimental results show that the proposed method is able to improve performance compared with the participating systems in DUC 2006. The experimental results also show that the proposed method demonstrates better performance as compared to other existing techniques on DUC 2005 and DUC 2006 datasets.
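
The greedy selection with a diversity penalty mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's method: word-overlap (Jaccard) similarity stands in for the semantic and syntactic similarity measures, and the names `jaccard`, `greedy_summary`, and the `penalty` parameter are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def greedy_summary(sentences, query, k=2, penalty=0.5):
    """Greedily pick k sentences that match the query but not each other.

    ASSUMPTION: Jaccard overlap replaces the paper's semantic and
    syntactic similarity; the scoring formula is a generic stand-in.
    """
    selected = []
    remaining = list(sentences)
    while remaining and len(selected) < k:

        def score(sentence):
            # Penalize similarity to the closest already-selected sentence.
            redundancy = max(
                (jaccard(sentence, chosen) for chosen in selected),
                default=0.0,
            )
            return jaccard(sentence, query) - penalty * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    docs = [
        "solar panels convert sunlight into power",
        "solar panels convert sunlight into electricity",
        "wind turbines generate power",
    ]
    print(greedy_summary(docs, "solar power", k=2))
```

In the toy input, the second sentence nearly repeats the first, so the penalty steers the second pick toward the sentence about wind turbines.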

Proceedings Article•10.1109/ICDE.2017.97•

Reverse Top-k Geo-Social Keyword Queries in Road Networks

Jingwen Zhao¹, Yunjun Gao¹, Gang Chen¹, Christian S. Jensen, Rui Chen¹, Deng Cai¹•Institutions (1)

Zhejiang University¹

19 Apr 2017

TL;DR: This paper proposes a hybrid index, the GIM-tree, which indexes locations, keywords, and social information of geo-tagged users and objects, and then presents efficient RkGSK query processing algorithms that exploit several pruning strategies.

Abstract: Identifying prospective customers is an important aspect of marketing research. In this paper, we provide support for a new type of query, the Reverse Top-k Geo-Social Keyword (RkGSK) query. This query takes into account spatial, textual, and social information, and finds prospective customers for geotagged objects. As an example, a restaurant manager might apply the query to find prospective customers. To address this, we propose a hybrid index, the GIM-tree, which indexes locations, keywords, and social information of geo-tagged users and objects, and then, using the GIM-tree, we present efficient RkGSK query processing algorithms that exploit several pruning strategies. The effectiveness of RkGSK retrieval is characterized via a case study, and extensive experiments using real datasets offer insight into the efficiency of the proposed index and algorithms.

Journal Article•10.1109/ACCESS.2017.2712744•

A Web Service Discovery Approach Based on Common Topic Groups Extraction

Jian Wang¹, Panpan Gao¹, Yutao Ma¹, Keqing He¹, Patrick C. K. Hung²•Institutions (2)

Wuhan University¹, University of Ontario Institute of Technology²

06 Jun 2017-IEEE Access

TL;DR: A novel Web service discovery approach based on topic models is presented that can maintain the performance of service discovery at an elevated level by greatly decreasing the number of candidate Web services, thus leading to faster response time.

Abstract: Web services have attracted much attention from distributed application designers and developers because of their roles in abstraction and interoperability among heterogeneous software systems, and a growing number of distributed software applications have been published as Web services on the Internet. Faced with the increasing numbers of Web services and service users, researchers in the services computing field have attempted to address a challenging issue, i.e., how to quickly find the suitable ones according to user queries. Many previous studies have been reported towards this direction. In this paper, a novel Web service discovery approach based on topic models is presented. The proposed approach mines common topic groups from the service-topic distribution matrix generated by topic modeling, and the extracted common topic groups can then be leveraged to match user queries to relevant Web services, so as to make a better trade-off between the accuracy of service discovery and the number of candidate Web services. Experiment results conducted on two publicly-available data sets demonstrate that, compared with several widely used approaches, the proposed approach can maintain the performance of service discovery at an elevated level by greatly decreasing the number of candidate Web services, thus leading to faster response time.

Proceedings Article•10.1145/3018661.3018678•

Semantic-aware Query Processing for Activity Trajectories

Huiwen Liu¹, Jiajie Xu¹, Kai Zheng¹, Chengfei Liu, Lan Du², Xian Wu¹•Institutions (2)

Soochow University (Suzhou)¹, Monash University²

2 Feb 2017

TL;DR: This paper proposes a novel trajectory query that not only considers the spatio-temporal closeness of trajectories but also leverages probabilistic topic modelling to capture the semantic relevance of the activities between data and query.

Abstract: Nowadays, users of social networks like tweets and weibo have generated massive geo-tagged records, and these records reveal their activities in the physical world together with spatio-temporal dynamics. Existing trajectory data management studies mainly focus on analyzing the spatio-temporal properties of trajectories, while leaving the understanding of their activities largely untouched. In this paper, we incorporate the semantic analysis of the activity information embedded in trajectories into query modelling and processing, with the aim of providing end users more accurate and meaningful trip recommendations. To this end, we propose a novel trajectory query that not only considers the spatio-temporal closeness but also, more importantly, leverages probabilistic topic modelling to capture the semantic relevance of the activities between data and query. To support efficient query processing, we design a novel hybrid index structure, namely ST-tree, to organize the trajectory points hierarchically, which enables us to prune the search space in spatial and topic dimensions simultaneously. The experimental results on real datasets demonstrate the efficiency and scalability of the proposed index structure and search algorithms.

Journal Article•10.1002/ASI.23735•

Behavior-based personalization in web search

Fei Cai¹, Shuaiqiang Wang², Maarten de Rijke³•Institutions (3)

National University of Defense Technology¹, University of Jyväskylä², University of Amsterdam³

1 Apr 2017

TL;DR: The experiments show that for personalized ranking, behavioral information helps to improve retrieval effectiveness; and given a query, merging information inferred from behavior of a particular user and from behaviors of other users with a user‐dependent adaptive weight outperforms any combination with a fixed weight.

Abstract: Personalized search approaches tailor search results to users' current interests, so as to help improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a different type of signal: We investigate the use of behavioral information for the purpose of search personalization. That is, we consider clicks and dwell time for reranking an initially retrieved list of documents. In particular, we (i) investigate the impact of distributions of users and queries on document reranking; (ii) estimate the relevance of a document for a query at 2 levels, at the query-level and at the word-level, to alleviate the problem of sparseness; and (iii) perform an experimental evaluation both for users seen during the training period and for users not seen during training. For the latter, we explore the use of information from similar users who have been seen during the training period. We use the dwell time on clicked documents to estimate a document's relevance to a query, and perform Bayesian probabilistic matrix factorization to generate a relevance distribution of a document over queries. Our experiments show that: (i) for personalized ranking, behavioral information helps to improve retrieval effectiveness; and (ii) given a query, merging information inferred from behavior of a particular user and from behaviors of other users with a user-dependent adaptive weight outperforms any combination with a fixed weight.

Posted Content•

Task-Oriented Query Reformulation with Reinforcement Learning

Rodrigo Nogueira¹, Kyunghyun Cho²•Institutions (2)

New York University¹, Microsoft²

15 Apr 2017-arXiv: Information Retrieval

TL;DR: This article introduces a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned.

Abstract: Search engines play an important role in our everyday lives by assisting us in finding the information we need. When we input a complex query, however, results are often far from satisfactory. In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned. We train this neural network with reinforcement learning. The actions correspond to selecting terms to build a reformulated query, and the reward is the document recall. We evaluate our approach on three datasets against strong baselines and show a relative improvement of 5-20% in terms of recall. Furthermore, we present a simple method to estimate a conservative upper-bound performance of a model in a particular environment and verify that there is still large room for improvements.
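
The reward signal described in this abstract, document recall for a reformulated query, can be illustrated with a small sketch. Everything here is a toy under stated assumptions: `search` is a hypothetical stand-in retrieval function over an in-memory index, and no neural network or reinforcement learning training loop is shown.

```python
def search(query_terms, index):
    """Toy retrieval: return ids of documents containing any query term.

    index: dict mapping a document id to its set of terms.
    ASSUMPTION: a stand-in for a real search engine.
    """
    terms = set(query_terms)
    return {doc_id for doc_id, doc_terms in index.items() if terms & doc_terms}


def recall_reward(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


if __name__ == "__main__":
    index = {
        "d1": {"neural", "ranking"},
        "d2": {"query", "rewriting"},
        "d3": {"cooking"},
    }
    relevant = {"d1", "d2"}
    original = ["neural"]
    reformulated = ["neural", "rewriting"]  # one extra term selected
    print(recall_reward(search(original, index), relevant))
    print(recall_reward(search(reformulated, index), relevant))
```

In this framing, each action adds or keeps a term, and the reward for the finished query is the recall returned by `recall_reward`, so a reformulation that surfaces more relevant documents scores higher.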

Journal Article•10.1016/J.IPL.2016.10.008•

Group-based collective keyword querying in road networks

Sen Su¹, Sen Zhao¹, Xiang Cheng¹, Rong Bi¹, Xin Cao, Jie Wang²•Institutions (2)

Beijing University of Posts and Telecommunications¹, University of Massachusetts Lowell²

01 Feb 2017-Information Processing Letters

TL;DR: This paper develops a series of query processing algorithms for answering the GBCK query, which aims to find a region containing a set of POIs that cover all the query keywords, are close to the group of users, and are close to each other.

Journal Article•10.1016/J.INS.2016.10.033•

Level-aware collective spatial keyword queries

Pengfei Zhang¹, Huaizhong Lin¹, Bin Yao², Dongming Lu¹•Institutions (2)

Zhejiang University¹, Shanghai Jiao Tong University²

01 Feb 2017-Information Sciences

TL;DR: The LCSK query is proved to be NP-hard, and an exact algorithm as well as an approximate algorithm with a provable approximation bound are devised for this problem.

Journal Article•10.1016/J.WEBSEM.2016.12.001•

Decomposing federated queries in presence of replicated fragments

Gabriela Montoya, Hala Skaf-Molli, Pascal Molli¹, Maria-Esther Vidal²•Institutions (2)

University of Nantes¹, Simón Bolívar University²

01 Jan 2017-Journal of Web Semantics

TL;DR: A replication-aware framework named LILAC, sparqL query decomposItion against federations of repLicAted data sourCes, that relies on replicated fragment descriptions to accurately identify sources that provide replicated data is proposed.

Journal Article•10.3233/SW-150206•

Flexible Query Processing for SPARQL

Riccardo Frosini¹, Andrea Calì¹,², Alexandra Poulovassilis¹, Peter T. Wood¹•Institutions (2)

Birkbeck, University of London¹, University of Oxford²

01 Jan 2017-Semantic Web

TL;DR: This paper presents query processing algorithms for a fragment of SPARQL 1.1 incorporating regular path queries (property path queries), extended with query approximation and relaxation operators, and formally shows the soundness, completeness and termination properties of the query rewriting algorithm.

Abstract: Flexible querying techniques can enhance users' access to complex, heterogeneous datasets in settings such as Linked Data, where the user may not always know how a query should be formulated in order to retrieve the desired answers. This paper presents query processing algorithms for a fragment of SPARQL 1.1 incorporating regular path queries (property path queries), extended with query approximation and relaxation operators. Our flexible query processing approach is based on query rewriting and returns answers incrementally according to their "distance" from the exact form of the query. We formally show the soundness, completeness and termination properties of our query rewriting algorithm. We also present empirical results that show promising query processing performance for the extended language.

Posted Content•10.7287/PEERJ.PREPRINTS.3186V1•

Improved query reformulation for concept location using CodeRank and document structures

Mohammad Masudur Rahman¹, Chanchal K. Roy¹•Institutions (1)

University of Saskatchewan¹

30 Oct 2017

TL;DR: A novel technique, ACER, is proposed that takes an initial query, identifies appropriate search terms from the source code using a novel term weight, CodeRank, and then suggests an effective reformulation of the initial query by exploiting the source document structures, query quality analysis and machine learning.

Abstract: During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a. concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique, ACER, that takes an initial query, identifies appropriate search terms from the source code using a novel term weight, CodeRank, and then suggests an effective reformulation of the initial query by exploiting the source document structures, query quality analysis and machine learning. Experiments with 1,675 baseline queries from eight subject systems report that our technique can improve 71% of the baseline queries, which is highly promising. Comparison with five closely related existing techniques in query reformulation not only validates our empirical findings but also demonstrates the superiority of our technique.

Book•10.1007/978-3-319-49493-7•

Reasoning Web: Logical Foundation of Knowledge Graph Construction and Query Answering

Jeff Z. Pan, Diego Calvanese, Thomas Eiter, Ian Horrocks, Michael Kifer, Fangzhen Lin, Yuting Zhao

1 Jan 2017

Proceedings Article•10.5441/002/EDBT.2017.04•

Subgraph Querying with Parallel Use of Query Rewritings and Alternative Algorithms.

Foteini Katsarou¹, Nikos Ntarmos¹, Peter Triantafillou¹•Institutions (1)

University of Glasgow¹

21 Mar 2017

TL;DR: The central idea is to employ parallelism in a novel way, whereby parallel matching/decision attempts are initiated, each using a query rewriting and/or an alternate algorithm, which is shown to be highly beneficial across algorithms and datasets.

Abstract: Subgraph queries are central to graph analytics and graph DBs. We analyze this problem and present key novel discoveries and observations on the nature of the problem which hold across query sizes, datasets, and top-performing algorithms. Firstly, we show that algorithms (for both the decision and matching versions of the problem) suffer from straggler queries, which dominate query workload times. As related research caps query times, not reporting results for queries exceeding the cap, this can lead to erroneous conclusions about the methods’ relative performance. Secondly, we study and show the dramatic effect that isomorphic graph queries can have on query times. Thirdly, we show that for each query, isomorphic queries based on proposed query rewritings can introduce large performance benefits. Fourthly, we show that straggler queries are largely algorithm-specific: many queries that are challenging for one algorithm can be executed efficiently by another. Finally, the above discoveries naturally lead to the derivation of a novel framework for subgraph query processing. The central idea is to employ parallelism in a novel way, whereby parallel matching/decision attempts are initiated, each using a query rewriting and/or an alternate algorithm. The framework is shown to be highly beneficial across algorithms and datasets.

Posted Content•

Diversity driven Attention Model for Query-based Abstractive Summarization

Preksha Nema¹, Mitesh M. Khapra¹, Anirban Laha², Balaraman Ravindran¹•Institutions (2)

Indian Institute of Technology Madras¹, IBM²

26 Apr 2017-arXiv: Computation and Language

TL;DR: This work proposes a model for the query-based summarization task based on the encode-attend-decode paradigm with two key additions: a query attention model which learns to focus on different portions of the query at different time steps, and a new diversity-based attention model which aims to alleviate the problem of repeating phrases in the summary.

Abstract: Abstractive summarization aims to generate a shorter version of the document covering all the salient points in a compact and coherent fashion. On the other hand, query-based summarization highlights those points that are relevant in the context of a given query. The encode-attend-decode paradigm has achieved notable success in machine translation, extractive summarization, dialog systems, etc. But it suffers from the drawback of generating repeated phrases. In this work we propose a model for the query-based summarization task based on the encode-attend-decode paradigm with two key additions: (i) a query attention model (in addition to the document attention model) which learns to focus on different portions of the query at different time steps (instead of using a static representation for the query), and (ii) a new diversity-based attention model which aims to alleviate the problem of repeating phrases in the summary. In order to enable the testing of this model, we introduce a new query-based summarization dataset building on Debatepedia. Our experiments show that with these two additions the proposed model clearly outperforms vanilla encode-attend-decode models with a gain of 28% (absolute) in ROUGE-L scores.

Journal Article•10.1007/S10115-016-0952-X•

Context-aware query expansion method using Language Models and Latent Semantic Analyses

Btihal El Ghali¹, Abderrahim El Qadi•Institutions (1)

Mohammed V University¹

01 Mar 2017-Knowledge and Information Systems

TL;DR: This paper used the Language Model to build the query context, which is composed of the queries most similar to the query to be expanded and their top-ranked documents, and applied a query expansion approach based on the query context and the Latent Semantic Analyses method.

Abstract: One of the key difficulties for users in information retrieval is to formulate appropriate queries to submit to the search engine. In this paper, we propose an approach to enrich the user's queries with additional context. We used the Language Model to build the query context, which is composed of the queries most similar to the query to be expanded and their top-ranked documents. Then, we applied a query expansion approach based on the query context and the Latent Semantic Analyses method. Using a web test collection, we tested our approach on short and long queries. We varied the number of recommended queries and the number of expansion terms to specify the appropriate parameters for the proposed approach. Experimental results show that the proposed approach improves the effectiveness of the information retrieval system by 19.23% for short queries and 52.94% for long queries relative to the retrieval results using the original users' queries.

Proceedings Article•10.1145/3077136.3080691•

Translation of Natural Language Query Into Keyword Query Using a RNN Encoder-Decoder

Hyun-Je Song¹, A-Yeong Kim², Seong-Bae Park²•Institutions (2)

Naver Corporation¹, Kyungpook National University²

7 Aug 2017

TL;DR: A novel method is proposed to translate a natural language query into a keyword query relevant to the natural language query, retrieving better search results without changing the engines.

Abstract: The number of natural language queries submitted to search engines is increasing as search environments diversify. However, legacy search engines are still optimized for short keyword queries. Thus, the use of natural language queries with legacy search engines degrades the retrieval performance of the engines. This paper proposes a novel method to translate a natural language query into a keyword query relevant to the natural language query, retrieving better search results without changing the engines. The proposed method formulates the translation as a generation task. That is, the method generates a keyword query from a natural language query while preserving the semantics of the natural language query. A recurrent neural network encoder-decoder architecture is adopted as a generator of keyword queries from natural language queries. In addition, an attention mechanism is used to cope with long natural language queries.

Journal Article•10.1007/S11280-016-0415-Z•

A semantic based Web page classification strategy using multi-layered domain ontology

Ahmed I. Saleh¹, Mohammed F. Al Rahmawy¹, Arwa E. Abulwafa¹•Institutions (1)

Mansoura University¹

01 Sep 2017-World Wide Web

TL;DR: This paper introduces a novel strategy for vertical Web page classification, called Classification using Multi-layered Domain Ontology (CMDO), which employs several Web mining techniques and depends mainly on a proposed multi-layered domain ontology.

Abstract: The World Wide Web is a continuously growing giant, and within the next few years, Web contents will surely increase tremendously. Hence, there is a great need for algorithms that can accurately classify Web pages. Automatic Web page classification is significantly different from traditional text classification because of the presence of additional information provided by the HTML structure. Recently, several techniques have arisen from combinations of artificial intelligence and statistical approaches. However, it is not a simple matter to find an optimal classification technique for Web pages. This paper introduces a novel strategy for vertical Web page classification, which is called Classification using Multi-layered Domain Ontology (CMDO). It employs several Web mining techniques, and depends mainly on a proposed multi-layered domain ontology. In order to improve the classification accuracy, CMDO applies a distiller to reject pages related to other domains. CMDO also employs a novel classification technique, which is called Graph Based Classification (GBC). The proposed GBC has pioneering features that other techniques do not have, such as outlier rejection and pruning. Experimental results have shown that CMDO outperforms recent techniques as it achieves better precision, recall, and classification accuracy.

Journal Article•10.1007/S10916-016-0668-1•

Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval

Ilyes Khennak, Habiba Drias

01 Feb 2017-Journal of Medical Systems

TL;DR: An original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field, using the Bat Algorithm to find the best expanded query among a set of expanded query candidates while maintaining low computational complexity.

Abstract: With the increasing amount of medical data available on the Web, health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the length of the expanded query to be determined empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
