Scispace (Formerly Typeset)
Showing papers on "Web query classification" published in 2017
Proceedings Article•10.1145/3035918.3056097•
Approximate Query Processing: No Silver Bullet

Surajit Chaudhuri1, Bolin Ding1, Srikanth Kandula1•
Microsoft1
9 May 2017
TL;DR: This paper reflects on the state of the art of Approximate Query Processing, and discusses two promising avenues to pursue towards integrating Approximate Query Processing into data platforms.
Abstract: In this paper, we reflect on the state of the art of Approximate Query Processing. Although much technical progress has been made in this area of research, we are yet to see its impact on products and services. We discuss two promising avenues to pursue towards integrating Approximate Query Processing into data platforms.

185 citations

Proceedings Article•10.1145/3077136.3080831•
Relevance-based Word Embedding

Hamed Zamani1, W. Bruce Croft1•
University of Massachusetts Amherst1
7 Aug 2017
TL;DR: Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.
Abstract: Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned based on term proximity in a large corpus. This means that the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks. The primary objective in various IR tasks is to capture relevance instead of term proximity, syntactic, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations based on query-document relevance information. In this paper, we propose two learning models with different objective functions; one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query. To train our models, we used over six million unique queries and the top ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate our learned word representation models using two IR tasks: query expansion and query classification. Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.

182 citations

Proceedings Article•10.1145/3132847.3133010•
Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

Mostafa Dehghani1, Sascha Rothe2, Enrique Alfonseca2, Pascal Fleury2•
University of Amsterdam1, Google2
6 Nov 2017
TL;DR: A customized sequence-to-sequence model for session-based query suggestion that employs a query-aware attention mechanism to capture the structure of the session context and outperforms the baselines both in generating queries and in scoring candidate queries for the task of query suggestion.
Abstract: Users try to articulate their complex information needs during search sessions by reformulating their queries. To make this process more effective, search engines provide related queries to help users in specifying the information need in their search process. In this paper, we propose a customized sequence-to-sequence model for session-based query suggestion. In our model, we employ a query-aware attention mechanism to capture the structure of the session context. This enables us to control the scope of the session from which we infer the suggested next query, which helps not only handle the noisy data but also automatically detect session boundaries. Furthermore, we observe that, based on the user query reformulation behavior, within a single session a large portion of query terms is retained from the previously submitted queries and consists of mostly infrequent or unseen terms that are usually not included in the vocabulary. We therefore empower the decoder of our model to access the source words from the session context during decoding by incorporating a copy mechanism. Moreover, we propose evaluation metrics to assess the quality of the generative models for query suggestion. We conduct an extensive set of experiments and analysis. The results suggest that our model outperforms the baselines both in generating queries and in scoring candidate queries for the task of query suggestion.

128 citations

Posted Content•
Relevance-based Word Embedding

Hamed Zamani1, W. Bruce Croft1•
University of Massachusetts Amherst1
09 May 2017-arXiv: Information Retrieval
TL;DR: This article proposed relevance-based word embedding models that learn word representations based on query-document relevance information and classify each term as belonging to the relevant or non-relevant class for each query.
Abstract: Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned based on term proximity in a large corpus. This means that the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks. The primary objective in various IR tasks is to capture relevance instead of term proximity, syntactic, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations based on query-document relevance information. In this paper, we propose two learning models with different objective functions; one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query. To train our models, we used over six million unique queries and the top ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate our learned word representation models using two IR tasks: query expansion and query classification. Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.

115 citations

Proceedings Article•10.1145/3035918.3064013•
Database Learning: Toward a Database that Becomes Smarter Every Time

Yongjoo Park1, Ahmad Shahab Tajik1, Michael Cafarella1, Barzan Mozafari1•
University of Michigan1
9 May 2017
TL;DR: The principle of maximum entropy is exploited to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations and which lead to increasingly faster response times for future queries.
Abstract: In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea (learning from past query answers) Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that database learning supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems.

87 citations

Journal Article•10.1007/S00521-016-2207-X•
A new fuzzy logic-based query expansion model for efficient information retrieval using relevance feedback approach

Jagendra Singh1, Aditi Sharan1•
Jawaharlal Nehru University1
01 Sep 2017-Neural Computing and Applications
TL;DR: This paper presents a new method for QE based on fuzzy logic, treating the top-retrieved documents as relevance feedback documents for mining additional QE terms, and increases the precision and recall rates of information retrieval systems for document retrieval.
Abstract: Efficient query expansion (QE) term selection methods are very important for improving the accuracy and efficiency of the system by removing the irrelevant and redundant terms from the top-retrieved feedback document corpus with respect to a user query. Each individual QE term selection method has its weaknesses and strengths. To overcome the weaknesses and to utilize the strengths of the individual methods, we use multiple term selection methods together. In this paper, we present a new method for QE based on fuzzy logic, treating the top-retrieved documents as relevance feedback documents for mining additional QE terms. Different QE term selection methods calculate the degrees of importance of all unique terms of the top-retrieved document collection for mining additional expansion terms. These methods give different relevance scores for each term. The proposed method combines the different weights of each term by using fuzzy rules to infer the weights of the additional query terms. Then, the weights of the additional query terms and the weights of the original query terms are used to form the new query vector, and we use this new query vector to retrieve documents. All the experiments are performed on TREC and FIRE benchmark datasets. The proposed QE method increases the precision rates and the recall rates of information retrieval systems for document retrieval. It achieves a significantly higher average recall rate, average precision rate and F-measure on both datasets.

72 citations

Journal Article•10.1007/S10115-016-0990-4•
A survey of query result diversification

Kaiping Zheng1, Hongzhi Wang1, Zhixin Qi1, Jianzhong Li1, Hong Gao1 •
Harbin Institute of Technology1
01 Apr 2017-Knowledge and Information Systems
TL;DR: This survey aims to provide a thorough review of a wide range of result diversification techniques, including various definitions of diversification, corresponding algorithms, diversification techniques tailored to applications such as databases, search engines, recommendation systems, graphs, time series and data streams, as well as result diversification systems.
Abstract: Nowadays, in information systems such as web search engines and databases, diversity is becoming increasingly essential and getting more and more attention for improving users' satisfaction. In this sense, query result diversification is of vital importance and well worth researching. Some issues such as the definition of diversification and efficient diverse query processing are more challenging to handle in information systems. Many researchers have focused on various dimensions of the diversification problem. In this survey, we aim to provide a thorough review of a wide range of result diversification techniques, including various definitions of diversification, corresponding algorithms, diversification techniques tailored to applications such as databases, search engines, recommendation systems, graphs, time series and data streams, as well as result diversification systems. We also propose some open research directions, which are challenging and have not been explored up till now, to improve the quality of query results.

67 citations

Proceedings Article•10.1145/3035918.3064017•
QIRANA: A Framework for Scalable Query Pricing

Shaleen Deep1, Paraschos Koutris1•
University of Wisconsin-Madison1
9 May 2017
TL;DR: This work presents a novel pricing system, called QIRANA, that performs query-based data pricing for a large class of SQL queries (including aggregation) in real time, and provides prices with formal guarantees.
Abstract: Users are increasingly engaging in buying and selling data over the web. Facilitated by the proliferation of online marketplaces that bring such users together, data brokers need to serve requests where they provide results for user queries over the underlying datasets, and price them fairly according to the information disclosed by the query. In this work, we present a novel pricing system, called QIRANA, that performs query-based data pricing for a large class of SQL queries (including aggregation) in real time. QIRANA provides prices with formal guarantees: for example, it avoids prices that create arbitrage opportunities. Our framework also allows flexible pricing, by allowing the data seller to choose from a variety of pricing functions, as well as specify relation and attribute-level parameters that control the price of queries and assign different value to different portions of the data. We test QIRANA on a variety of real-world datasets and query workloads, and we show that it can efficiently compute the prices for queries over large-scale data.

60 citations

Journal Article•10.1016/J.COSE.2016.11.013•
Efficient k-NN query over encrypted data in cloud with limited key-disclosure and offline data owner

Lu Zhou1,2, Youwen Zhu1,3, Aniello Castiglione4 •
Nanjing University of Aeronautics and Astronautics1, Shandong University2, Nanjing University3, University of Salerno4
01 Aug 2017-Computers & Security
TL;DR: This paper proposes a new scheme to perform k-NN queries over encrypted data in the cloud while protecting the privacy of both the data owner and the query users from the cloud, and presents a new scalar product protocol to achieve these properties.

58 citations

Journal Article•10.1109/TKDE.2017.2668419•
Query Expansion with Enriched User Profiles for Personalized Search Utilizing Folksonomy Data

Dong Zhou1, Xuan Wu1, Wenyu Zhao1, Séamus Lawless2, Jianxun Liu1 •
Hunan University of Science and Technology1, Trinity College, Dublin2
01 Jul 2017-IEEE Transactions on Knowledge and Data Engineering
TL;DR: This work proposes a novel model to construct enriched user profiles with the help of an external corpus for personalized query expansion, and builds two novel query expansion techniques based on topical weights-enhanced word embeddings and topical relevance between the query and the terms inside a user profile.
Abstract: Query expansion has been widely adopted in Web search as a way of tackling the ambiguity of queries. Personalized search utilizing folksonomy data has demonstrated an extreme vocabulary mismatch problem that requires even more effective query expansion methods. Co-occurrence statistics, tag-tag relationships, and semantic matching approaches are among those favored by previous research. However, user profiles which only contain a user's past annotation information may not be enough to support the selection of expansion terms, especially for users with limited previous activity with the system. We propose a novel model to construct enriched user profiles with the help of an external corpus for personalized query expansion. Our model integrates the current state-of-the-art text representation learning framework, known as word embeddings, with topic models in two groups of pseudo-aligned documents. Based on user profiles, we build two novel query expansion techniques. These two techniques are based on topical weights-enhanced word embeddings, and the topical relevance between the query and the terms inside a user profile, respectively. The results of an in-depth experimental evaluation, performed on two real-world datasets using different external corpora, show that our approach outperforms traditional techniques, including existing non-personalized and personalized query expansion methods.

46 citations

Journal Article•10.1109/TIFS.2017.2721221•
Privacy-Preserving Similarity Joins Over Encrypted Data

Xingliang Yuan1, Xinyu Wang1, Cong Wang1, Chenyun Yu1, Sarana Nutanong1 •
City University of Hong Kong1
28 Jun 2017-IEEE Transactions on Information Forensics and Security
TL;DR: This paper investigates privacy-preserving similarity join queries, a pivotal primitive of similarity search that finds pairwise similar data points across two data sets, and formalizes the leakage functions in the context of similarity joins, and conducts rigorous security analysis.
Abstract: Similarity search on high-dimensional data has been intensively studied for data processing and analytics. Despite its broad applicability, data security and privacy concerns along the trend of data outsourcing have not been fully addressed. In this paper, we investigate privacy-preserving similarity join queries, i.e., a pivotal primitive of similarity search that finds pairwise similar data points across two data sets. We start from locality-sensitive hashing and searchable symmetric encryption, i.e., the most practical techniques for similarity search and encrypted search, respectively. However, the immediate combination of two techniques discloses the distribution of the query set, which is exploitable to compromise the confidentiality of queries. To enhance the security, we propose the frequency hiding query scheme, which allows the server to see the flattened query distribution only. To improve the scalability, we further design the result sharing query scheme, which processes a small portion of query points and shares the results with other nearby points. Besides, we set up a strict constraint to carefully select query points to achieve “as-strong-as-possible” guarantees. We formalize the leakage functions in the context of similarity joins, and conduct rigorous security analysis. We implement and evaluate the proposed query schemes on Azure cloud. Experimental results indicate that they have different tradeoffs on security, efficiency, and accuracy, which can flexibly be used for different deployment scenarios.
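The abstract above starts from locality-sensitive hashing as the similarity-search building block. The following is a minimal plaintext sketch of an LSH-based similarity join using random-hyperplane hashing. It assumes cosine-style similarity and omits the searchable encryption, frequency hiding, and result sharing described in the paper; the function names `make_hyperplanes`, `lsh_signature`, and `candidate_pairs` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vec, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def candidate_pairs(left, right, hyperplanes):
    """Return (i, j) index pairs whose vectors share an LSH bucket."""
    buckets = defaultdict(list)
    for i, vec in enumerate(left):
        buckets[lsh_signature(vec, hyperplanes)].append(i)
    pairs = []
    for j, vec in enumerate(right):
        for i in buckets.get(lsh_signature(vec, hyperplanes), []):
            pairs.append((i, j))
    return pairs
```

Candidate pairs produced this way would normally be verified with an exact similarity computation, since vectors that are not actually similar can land in the same bucket and similar vectors can be missed.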
Journal Article•10.1016/J.JCSS.2016.12.003•
A query privacy-enhanced and secure search scheme over encrypted data in cloud computing

Hui Yin1,2, Zheng Qin2, Lu Ou2, Keqin Li3 •
Changsha University1, Hunan University2, State University of New York System3
01 Dec 2017-Journal of Computer and System Sciences
TL;DR: This work proposes a privacy-enhanced search scheme that allows the data user to generate a random query trapdoor every time, and uses a Bloom filter and bilinear pairing operations to construct a secure index for each data file, which enables the cloud to perform search without obtaining any useful information.
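The summary above mentions a per-file index built from a Bloom filter. As a rough illustration of that data structure only, here is a minimal plaintext Bloom filter over keywords. It is a sketch under stated assumptions: it leaves out the bilinear pairing and the random trapdoors entirely, so it provides no privacy, and the class name, the sizes, and the SHA-256-based hashing are illustrative choices rather than the paper's construction.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=256, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, keyword):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{keyword}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, keyword):
        for pos in self._positions(keyword):
            self.bits[pos] = True

    def might_contain(self, keyword):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(keyword))
```

One filter per data file, filled with that file's keywords, gives a compact membership test for a query keyword.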
Journal Article•10.1007/S00500-015-1881-4•
Query-based multi-documents summarization using linguistic knowledge and content word expansion

Asad Abdi1, Norisma Idris1, Rasim M. Alguliyev2, Ramiz M. Aliguliyev2•
Information Technology University1, Azerbaijan National Academy of Sciences2
1 Apr 2017
TL;DR: A query-based summarization method, which uses a combination of semantic relations between words and their syntactic composition, to extract meaningful sentences from document sets is introduced and demonstrates better performance as compared to other existing techniques on DUC 2005 and DUC 2006 datasets.
Abstract: In this paper, a query-based summarization method, which uses a combination of semantic relations between words and their syntactic composition, to extract meaningful sentences from document sets is introduced. The problem with current statistical methods is that they fail to capture the meaning when comparing a sentence and a user query; hence there is often a conflict between the extracted sentences and users' requirements. However, this particular method can improve the quality of document summaries because it is able to avoid extracting a sentence whose similarity with the query is high but whose meaning is different. The method is executed by computing the semantic and syntactic similarity of the sentence-to-sentence and sentence-to-query. To reduce redundancy in the summary, this method uses a greedy algorithm to impose a diversity penalty on the sentences. In addition, the proposed method expands the words in both the query and the sentences to tackle the problem of information limit. It bridges the lexical gaps for semantically similar contexts that are expressed using different wording. The experimental results show that the proposed method is able to improve performance compared with the participating systems in DUC 2006. The experimental results also show that the proposed method demonstrates better performance as compared to other existing techniques on DUC 2005 and DUC 2006 datasets.
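The greedy selection with a diversity penalty mentioned in the abstract can be sketched roughly as follows. This is an illustration only: Jaccard word overlap stands in for the paper's combined semantic and syntactic similarity, and the function names and the `penalty` parameter are assumptions introduced here.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token lists."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_summary(sentences, query, k=2, penalty=0.5):
    """Pick k sentences, trading query relevance against redundancy."""
    tokens = [s.lower().split() for s in sentences]
    q = query.lower().split()
    selected = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = jaccard(tokens[i], q)
            # Penalize similarity to the closest already-selected sentence.
            redundancy = max(
                (jaccard(tokens[i], tokens[j]) for j in selected), default=0.0
            )
            return relevance - penalty * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in selected]
```

With `penalty=0.0` the procedure reduces to plain relevance ranking; raising it pushes later picks away from sentences that repeat what is already in the summary.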
Proceedings Article•10.1109/ICDE.2017.97•
Reverse Top-k Geo-Social Keyword Queries in Road Networks

Jingwen Zhao1, Yunjun Gao1, Gang Chen1, Christian S. Jensen, Rui Chen1, Deng Cai1 •
Zhejiang University1
19 Apr 2017
TL;DR: This paper proposes a hybrid index, the GIM-tree, which indexes locations, keywords, and social information of geo-tagged users and objects, and then presents efficient RkGSK query processing algorithms that exploit several pruning strategies.
Abstract: Identifying prospective customers is an important aspect of marketing research. In this paper, we provide support for a new type of query, the Reverse Top-k Geo-Social Keyword (RkGSK) query. This query takes into account spatial, textual, and social information, and finds prospective customers for geo-tagged objects. As an example, a restaurant manager might apply the query to find prospective customers. To address this, we propose a hybrid index, the GIM-tree, which indexes locations, keywords, and social information of geo-tagged users and objects, and then, using the GIM-tree, we present efficient RkGSK query processing algorithms that exploit several pruning strategies. The effectiveness of RkGSK retrieval is characterized via a case study, and extensive experiments using real datasets offer insight into the efficiency of the proposed index and algorithms.
Journal Article•10.1109/ACCESS.2017.2712744•
A Web Service Discovery Approach Based on Common Topic Groups Extraction

Jian Wang1, Panpan Gao1, Yutao Ma1, Keqing He1, Patrick C. K. Hung2 •
Wuhan University1, University of Ontario Institute of Technology2
06 Jun 2017-IEEE Access
TL;DR: A novel Web service discovery approach based on topic models is presented that can maintain the performance of service discovery at an elevated level by greatly decreasing the number of candidate Web services, thus leading to faster response time.
Abstract: Web services have attracted much attention from distributed application designers and developers because of their roles in abstraction and interoperability among heterogeneous software systems, and a growing number of distributed software applications have been published as Web services on the Internet. Faced with the increasing numbers of Web services and service users, researchers in the services computing field have attempted to address a challenging issue, i.e., how to quickly find the suitable ones according to user queries. Many previous studies have been reported towards this direction. In this paper, a novel Web service discovery approach based on topic models is presented. The proposed approach mines common topic groups from the service-topic distribution matrix generated by topic modeling, and the extracted common topic groups can then be leveraged to match user queries to relevant Web services, so as to make a better trade-off between the accuracy of service discovery and the number of candidate Web services. Experiment results conducted on two publicly-available data sets demonstrate that, compared with several widely used approaches, the proposed approach can maintain the performance of service discovery at an elevated level by greatly decreasing the number of candidate Web services, thus leading to faster response time.
Proceedings Article•10.1145/3018661.3018678•
Semantic-aware Query Processing for Activity Trajectories

Huiwen Liu1, Jiajie Xu1, Kai Zheng1, Chengfei Liu, Lan Du2, Xian Wu1 •
Soochow University (Suzhou)1, Monash University2
2 Feb 2017
TL;DR: This paper proposes a novel trajectory query that not only considers the spatio-temporal closeness of trajectories but also leverages probabilistic topic modelling to capture the semantic relevance of the activities between data and query.
Abstract: Nowadays, users of social networks like tweets and weibo have generated massive geo-tagged records, and these records reveal their activities in the physical world together with spatio-temporal dynamics. Existing trajectory data management studies mainly focus on analyzing the spatio-temporal properties of trajectories, while leaving the understanding of their activities largely untouched. In this paper, we incorporate the semantic analysis of the activity information embedded in trajectories into query modelling and processing, with the aim of providing end users more accurate and meaningful trip recommendations. To this end, we propose a novel trajectory query that not only considers the spatio-temporal closeness but also, more importantly, leverages probabilistic topic modelling to capture the semantic relevance of the activities between data and query. To support efficient query processing, we design a novel hybrid index structure, namely ST-tree, to organize the trajectory points hierarchically, which enables us to prune the search space in spatial and topic dimensions simultaneously. The experimental results on real datasets demonstrate the efficiency and scalability of the proposed index structure and search algorithms.
Journal Article•10.1002/ASI.23735•
Behavior-based personalization in web search

Fei Cai1, Shuaiqiang Wang2, Maarten de Rijke3•
National University of Defense Technology1, University of Jyväskylä2, University of Amsterdam3
1 Apr 2017
TL;DR: The experiments show that for personalized ranking, behavioral information helps to improve retrieval effectiveness; and given a query, merging information inferred from behavior of a particular user and from behaviors of other users with a user-dependent adaptive weight outperforms any combination with a fixed weight.
Abstract: Personalized search approaches tailor search results to users' current interests, so as to help improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a different type of signal: We investigate the use of behavioral information for the purpose of search personalization. That is, we consider clicks and dwell time for reranking an initially retrieved list of documents. In particular, we (i) investigate the impact of distributions of users and queries on document reranking; (ii) estimate the relevance of a document for a query at 2 levels, at the query-level and at the word-level, to alleviate the problem of sparseness; and (iii) perform an experimental evaluation both for users seen during the training period and for users not seen during training. For the latter, we explore the use of information from similar users who have been seen during the training period. We use the dwell time on clicked documents to estimate a document's relevance to a query, and perform Bayesian probabilistic matrix factorization to generate a relevance distribution of a document over queries. Our experiments show that: (i) for personalized ranking, behavioral information helps to improve retrieval effectiveness; and (ii) given a query, merging information inferred from behavior of a particular user and from behaviors of other users with a user-dependent adaptive weight outperforms any combination with a fixed weight.
Posted Content•
Task-Oriented Query Reformulation with Reinforcement Learning

Rodrigo Nogueira1, Kyunghyun Cho2•
New York University1, Microsoft2
15 Apr 2017-arXiv: Information Retrieval
TL;DR: This work introduces a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned; the network is trained with reinforcement learning, using document recall as the reward.
Abstract: Search engines play an important role in our everyday lives by assisting us in finding the information we need. When we input a complex query, however, results are often far from satisfactory. In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned. We train this neural network with reinforcement learning. The actions correspond to selecting terms to build a reformulated query, and the reward is the document recall. We evaluate our approach on three datasets against strong baselines and show a relative improvement of 5-20% in terms of recall. Furthermore, we present a simple method to estimate a conservative upper-bound performance of a model in a particular environment and verify that there is still large room for improvements.
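The action and reward described in the abstract (select terms to build a query, score it by document recall) can be illustrated with a small sketch. The neural network and the search engine are left out; `build_query` and `recall_reward` are hypothetical helper names, and the document identifiers are placeholders.

```python
def build_query(candidate_terms, actions):
    """Keep the candidate terms whose action flag is truthy (the 'select' action)."""
    return [term for term, keep in zip(candidate_terms, actions) if keep]


def recall_reward(retrieved, relevant):
    """Reward = fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # no relevant documents known, so no recall signal
    return len(relevant & set(retrieved)) / len(relevant)
```

In a training loop, the reformulated query from `build_query` would be sent to a retrieval system and the returned document list scored with `recall_reward`; that scalar is what a policy-gradient style update would consume.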
Journal Article•10.1016/J.IPL.2016.10.008•
Group-based collective keyword querying in road networks

Sen Su1, Sen Zhao1, Xiang Cheng1, Rong Bi1, Xin Cao, Jie Wang2 •
Beijing University of Posts and Telecommunications1, University of Massachusetts Lowell2
01 Feb 2017-Information Processing Letters
TL;DR: This paper develops a series of query processing algorithms for answering the GBCK query, which aims to find a region containing a set of POIs that cover all the query keywords, are close to the group of users, and are close to each other.
Journal Article•10.1016/J.INS.2016.10.033•
Level-aware collective spatial keyword queries

Pengfei Zhang1, Huaizhong Lin1, Bin Yao2, Dongming Lu1•
Zhejiang University1, Shanghai Jiao Tong University2
01 Feb 2017-Information Sciences
TL;DR: The LCSK query is proved to be NP-hard, and an exact algorithm as well as an approximation algorithm with a provable approximation bound are devised for this problem.
Journal Article•10.1016/J.WEBSEM.2016.12.001•
Decomposing federated queries in presence of replicated fragments

Gabriela Montoya, Hala Skaf-Molli, Pascal Molli1, Maria-Esther Vidal2•
University of Nantes1, Simón Bolívar University2
01 Jan 2017-Journal of Web Semantics
TL;DR: A replication-aware framework named LILAC (sparqL query decomposItion against federations of repLicAted data sourCes) is proposed; it relies on replicated fragment descriptions to accurately identify sources that provide replicated data.
Journal Article•10.3233/SW-150206•
Flexible Query Processing for SPARQL

Riccardo Frosini1, Andrea Calì1,2, Alexandra Poulovassilis1, Peter T. Wood1 •
Birkbeck, University of London1, University of Oxford2
01 Jan 2017-Semantic Web
TL;DR: This paper presents query processing algorithms for a fragment of SPARQL 1.1 incorporating regular path queries (property path queries), extended with query approximation and relaxation operators, and formally shows the soundness, completeness and termination properties of the query rewriting algorithm.
Abstract: Flexible querying techniques can enhance users' access to complex, heterogeneous datasets in settings such as Linked Data, where the user may not always know how a query should be formulated in order to retrieve the desired answers. This paper presents query processing algorithms for a fragment of SPARQL 1.1 incorporating regular path queries (property path queries), extended with query approximation and relaxation operators. Our flexible query processing approach is based on query rewriting and returns answers incrementally according to their "distance" from the exact form of the query. We formally show the soundness, completeness and termination properties of our query rewriting algorithm. We also present empirical results that show promising query processing performance for the extended language.
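The idea of returning answers in order of their distance from the exact query can be sketched as a best-first enumeration of rewritten queries. This is a toy illustration only, not the paper's rewriting algorithm: a query is modeled as a tuple of property labels, "approximation" is reduced to dropping one label, "relaxation" to replacing a label with a superproperty from a hypothetical `superproperty` map, and the costs are arbitrary assumptions.

```python
import heapq


def rewritings(query, superproperty, edit_cost=1, relax_cost=2, max_cost=3):
    """Yield (cost, query) pairs in non-decreasing cost order (Dijkstra-style)."""
    best = {query: 0}
    heap = [(0, query)]
    done = set()
    while heap:
        cost, q = heapq.heappop(heap)
        if q in done:
            continue  # a cheaper copy of this query was already emitted
        done.add(q)
        yield cost, q
        candidates = []
        for i, label in enumerate(q):
            # Toy approximation: drop one label from the path.
            candidates.append((cost + edit_cost, q[:i] + q[i + 1:]))
            # Toy relaxation: replace a label by its superproperty, if any.
            if label in superproperty:
                relaxed = q[:i] + (superproperty[label],) + q[i + 1:]
                candidates.append((cost + relax_cost, relaxed))
        for c, cand in candidates:
            # Skip empty paths, over-budget rewritings, and non-improving costs.
            if cand and c <= max_cost and c < best.get(cand, float("inf")):
                best[cand] = c
                heapq.heappush(heap, (c, cand))


# Hypothetical two-step path query and a one-entry property hierarchy.
results = list(rewritings(("a", "b"), {"a": "p"}, max_cost=2))
```

Evaluating the rewritten queries in the order they are yielded would return exact answers first (cost 0), followed by answers of progressively less exact rewritings.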
Posted Content•10.7287/PEERJ.PREPRINTS.3186V1•
Improved query reformulation for concept location using CodeRank and document structures

Mohammad Masudur Rahman1, Chanchal K. Roy1•
University of Saskatchewan1
30 Oct 2017
TL;DR: A novel technique is proposed --ACER-- that takes an initial query, identifies appropriate search terms from the source code using a novel term weight --CodeRank, and then suggests effective reformulation to the initial query by exploiting the source document structures, query quality analysis and machine learning.
Abstract: During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a., concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique --ACER-- that takes an initial query, identifies appropriate search terms from the source code using a novel term weight --CodeRank, and then suggests effective reformulation to the initial query by exploiting the source document structures, query quality analysis and machine learning. Experiments with 1,675 baseline queries from eight subject systems report that our technique can improve 71% of the baseline queries which is highly promising. Comparison with five closely related existing techniques in query reformulation not only validates our empirical findings but also demonstrates the superiority of our technique.
Book•10.1007/978-3-319-49493-7•
Reasoning Web: Logical Foundation of Knowledge Graph Construction and Query Answering

Jeff Z. Pan, Diego Calvanese, Thomas Eiter, Ian Horrocks, Michael Kifer, Fangzhen Lin, Yuting Zhao 
1 Jan 2017
Proceedings Article•10.5441/002/EDBT.2017.04•
Subgraph Querying with Parallel Use of Query Rewritings and Alternative Algorithms.

Foteini Katsarou1, Nikos Ntarmos1, Peter Triantafillou1•
University of Glasgow1
21 Mar 2017
TL;DR: The central idea is to employ parallelism in a novel way, whereby parallel matching/decision attempts are initiated, each using a query rewriting and/or an alternate algorithm, which is shown to be highly beneficial across algorithms and datasets.
Abstract: Subgraph queries are central to graph analytics and graph DBs. We analyze this problem and present key novel discoveries and observations on the nature of the problem which hold across query sizes, datasets, and top-performing algorithms. Firstly, we show that algorithms (for both the decision and matching versions of the problem) suffer from straggler queries, which dominate query workload times. As related research caps query times, not reporting results for queries exceeding the cap, this can lead to erroneous conclusions about the methods' relative performance. Secondly, we study and show the dramatic effect that isomorphic graph queries can have on query times. Thirdly, we show that for each query, isomorphic queries based on proposed query rewritings can introduce large performance benefits. Fourthly, we show that straggler queries are largely algorithm-specific: many challenging queries to one algorithm can be executed efficiently by another. Finally, the above discoveries naturally lead to the derivation of a novel framework for subgraph query processing. The central idea is to employ parallelism in a novel way, whereby parallel matching/decision attempts are initiated, each using a query rewriting and/or an alternate algorithm. The framework is shown to be highly beneficial across algorithms and datasets.
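The central idea of launching several attempts in parallel and taking whichever finishes first can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's framework: the attempts are hypothetical callables that simulate matchers with `time.sleep`, and the names `race` and `make_attempt` are invented here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def race(attempts):
    """Run all attempts in parallel; return (name, result) of the first to finish."""
    pool = ThreadPoolExecutor(max_workers=len(attempts))
    futures = {pool.submit(fn): name for name, fn in attempts.items()}
    done, _ = wait(list(futures), return_when=FIRST_COMPLETED)
    # Do not block on stragglers; Python threads cannot be killed, so they
    # simply run to completion in the background.
    pool.shutdown(wait=False)
    winner = next(iter(done))
    return futures[winner], winner.result()  # re-raises if the winner failed


def make_attempt(delay, answer):
    """Hypothetical matcher: pretends to work for `delay` seconds."""
    def attempt():
        time.sleep(delay)
        return answer
    return attempt


# One (algorithm, rewriting) pair is a straggler, the other is fast.
attempts = {
    "algoA/original": make_attempt(0.30, True),
    "algoB/rewriting1": make_attempt(0.01, True),
}
name, answer = race(attempts)
```

A straggler for one algorithm or rewriting then costs no more than the fastest alternative, at the price of the extra parallel work.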
Posted Content•
Diversity driven Attention Model for Query-based Abstractive Summarization

Preksha Nema1, Mitesh M. Khapra1, Anirban Laha2, Balaraman Ravindran1•
Indian Institute of Technology Madras1, IBM2
26 Apr 2017-arXiv: Computation and Language
TL;DR: This work proposes a model for the query-based summarization task based on the encode-attend-decode paradigm with two key additions: a query attention model which learns to focus on different portions of the query at different time steps and a new diversity based attention model which aims to alleviate the problem of repeating phrases in the summary.
Abstract: Abstractive summarization aims to generate a shorter version of the document covering all the salient points in a compact and coherent fashion. On the other hand, query-based summarization highlights those points that are relevant in the context of a given query. The encode-attend-decode paradigm has achieved notable success in machine translation, extractive summarization, dialog systems, etc. But it suffers from the drawback of generating repeated phrases. In this work we propose a model for the query-based summarization task based on the encode-attend-decode paradigm with two key additions: (i) a query attention model (in addition to the document attention model) which learns to focus on different portions of the query at different time steps (instead of using a static representation for the query) and (ii) a new diversity based attention model which aims to alleviate the problem of repeating phrases in the summary. In order to enable the testing of this model we introduce a new query-based summarization dataset building on debatepedia. Our experiments show that with these two additions the proposed model clearly outperforms vanilla encode-attend-decode models with a gain of 28% (absolute) in ROUGE-L scores.
Journal Article•10.1007/S10115-016-0952-X•
Context-aware query expansion method using Language Models and Latent Semantic Analyses

Btihal El Ghali1, Abderrahim El Qadi•
Mohammed V University1
01 Mar 2017-Knowledge and Information Systems
TL;DR: This paper used the Language Model to build the query context, which is composed of the queries most similar to the query to expand and their top-ranked documents, and applied a query expansion approach based on the query context and the Latent Semantic Analyses method.
Abstract: One of the key difficulties for users in information retrieval is to formulate appropriate queries to submit to the search engine. In this paper, we propose an approach to enrich the user's queries with additional context. We used the Language Model to build the query context, which is composed of the queries most similar to the query to expand and their top-ranked documents. Then, we applied a query expansion approach based on the query context and the Latent Semantic Analyses method. Using a web test collection, we tested our approach on short and long queries. We varied the number of recommended queries and the number of expansion terms to specify the appropriate parameters for the proposed approach. Experimental results show that the proposed approach improves the effectiveness of the information retrieval system by 19.23% for short queries and 52.94% for long queries relative to the retrieval results obtained with the users' original queries.
Proceedings Article•10.1145/3077136.3080691•
Translation of Natural Language Query Into Keyword Query Using a RNN Encoder-Decoder

Hyun-Je Song1, A-Yeong Kim2, Seong-Bae Park2•
Naver Corporation1, Kyungpook National University2
7 Aug 2017
TL;DR: A novel method to translate a natural language query into a keyword query relevant to the natural language query for retrieving better search results without changing the engines is proposed.
Abstract: The number of natural language queries submitted to search engines is increasing as search environments become more diverse. However, legacy search engines are still optimized for short keyword queries. Thus, the use of natural language queries with legacy search engines degrades the retrieval performance of the engines. This paper proposes a novel method to translate a natural language query into a keyword query relevant to the natural language query for retrieving better search results without changing the engines. The proposed method formulates the translation as a generation task. That is, the method generates a keyword query from a natural language query while preserving the semantics of the natural language query. A recurrent neural network encoder-decoder architecture is adopted as a generator of keyword queries from natural language queries. In addition, an attention mechanism is used to cope with long natural language queries.
Journal Article•10.1007/S11280-016-0415-Z•
A semantic based Web page classification strategy using multi-layered domain ontology

Ahmed I. Saleh1, Mohammed F. Al Rahmawy1, Arwa E. Abulwafa1•
Mansoura University1
01 Sep 2017-World Wide Web
TL;DR: This paper introduces a novel strategy for vertical Web page classification, which is called Classification using Multi-layered Domain Ontology (CMDO), which employs several Web mining techniques, and depends mainly on a proposed multi-layered domain ontology.
Abstract: The World Wide Web is a continuously growing giant, and within the next few years Web content will surely increase tremendously. Hence, there is a great need for algorithms that can accurately classify Web pages. Automatic Web page classification is significantly different from traditional text classification because of the presence of additional information provided by the HTML structure. Recently, several techniques have arisen from combinations of artificial intelligence and statistical approaches. However, it is not a simple matter to find an optimal classification technique for Web pages. This paper introduces a novel strategy for vertical Web page classification, which is called Classification using Multi-layered Domain Ontology (CMDO). It employs several Web mining techniques, and depends mainly on a proposed multi-layered domain ontology. In order to improve the classification accuracy, CMDO employs a distiller to reject pages related to other domains. CMDO also employs a novel classification technique, which is called Graph Based Classification (GBC). The proposed GBC has pioneering features that other techniques do not have, such as outlier rejection and pruning. Experimental results have shown that CMDO outperforms recent techniques as it achieves better precision, recall, and classification accuracy.
Journal Article•10.1007/S10916-016-0668-1•
Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval

Ilyes Khennak, Habiba Drias
01 Feb 2017-Journal of Medical Systems
TL;DR: An original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field; it uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity.
Abstract: With the increasing amount of medical data available on the Web, health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulty forming appropriate queries to articulate their inquiries, which makes their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the length of the expanded query to be determined empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
...
