Top 15 papers published in the topic of Web query classification in 2020

Showing papers on "Web query classification" published in 2020

Journal Article • DOI: 10.1016/J.MICPRO.2020.103097

Efficient fuzzy based K-nearest neighbour technique for web services classification

C. Viji¹, J. Beschi Raja², R.S. Ponmagal³, S. T. Suganthi, P. Parthasarathi¹, Sanjeevi Pandiyan⁴ • Institutions (4)

Akshaya College of Engineering and Technology¹, College of Information Technology², SRM University³, Jiangnan University⁴

01 Jul 2020-Microprocessors and Microsystems

TL;DR: This paper proposes an improved fuzzy k-nearest neighbour (KNN) algorithm for effective web service classification, aiming to improve accuracy and other performance measures.

24 citations
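
The listing does not describe what the paper's improvements are, so the sketch below shows only the classic fuzzy k-nearest-neighbour rule with crisp training labels, as a reference point: each of the k nearest neighbours votes for its class with an inverse-distance weight controlled by a fuzzifier `m`, and the votes are normalized into class memberships. The function name, feature vectors, and service categories are hypothetical.

```python
import math
from collections import defaultdict

def fuzzy_knn(train, query, k=3, m=2.0):
    """Return class memberships for `query` from crisp-labelled training data.

    train: list of (feature_vector, label) pairs (hypothetical service features)
    query: feature vector of the service to classify
    m: fuzzifier (> 1); larger values flatten the distance weighting
    """
    # Keep the k training points closest to the query (Euclidean distance).
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]
    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    total = 0.0
    for d, label in neighbours:
        # Inverse-distance weight; the epsilon avoids division by zero on exact matches.
        w = 1.0 / (max(d, 1e-12) ** exponent)
        weights[label] += w
        total += w
    # Normalize so the memberships sum to 1.
    return {label: w / total for label, w in weights.items()}
```

Returning memberships instead of a single label lets a caller flag services whose highest membership is low rather than forcing a hard decision.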

Journal Article • DOI: 10.1134/S0361768820050072

Improving Efficiency of Web Application Firewall to Detect Code Injection Attacks with Random Forest Method and Analysis Attributes HTTP Request

Nguyen Manh Thang

03 Oct 2020-Programming and Computer Software

TL;DR: In this article, the author proposes a method for detecting application-level network attacks with a web application firewall and for applying effective algorithms that train the firewall automatically to increase its efficiency.

Abstract: In the era of information technology, the use of computer technology for both work and personal purposes is growing rapidly. Unfortunately, as computer networks and systems grow in number and size, their vulnerability also increases. Protecting organizations' web applications is becoming increasingly important, as most transactions are carried out over the Internet. Traditional security devices control attacks at the network level, but modern web attacks occur through the HTTP protocol at the application level. Moreover, attacks often come together: for example, a denial-of-service attack is used to hide code injection attacks. System administrators spend a lot of time keeping the system running and may overlook code injection attacks. Therefore, their main task is to detect application-level network attacks with a web application firewall and to apply effective algorithms that train the firewall automatically to increase its efficiency. The article parameterizes the task to increase the accuracy of query classification with the random forest method, thereby creating a basis for detecting attacks at the application level.

17 citations
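
The abstract says the task is parameterized so that a random forest can classify HTTP queries, but the listing does not give the attributes. As a hypothetical sketch, the function below turns the query string of a GET request into a few numeric attributes of the kind such a classifier could consume; the attribute names and the `SUSPICIOUS_TOKENS` list are assumptions, not the paper's feature set.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical token list; a real WAF would rely on a far richer, curated set.
SUSPICIOUS_TOKENS = ("select", "union", "insert", "drop", "<script", "../", "or 1=1")

def request_features(url: str) -> dict:
    """Turn the query part of an HTTP GET URL into numeric attributes."""
    query = urlsplit(url).query
    # parse_qsl percent-decodes names and values (e.g. %27 -> ').
    params = parse_qsl(query, keep_blank_values=True)
    values = " ".join(v for _, v in params).lower()
    return {
        "query_length": len(query),
        "num_params": len(params),
        "num_special_chars": len(re.findall(r"[<>'\";()=%]", values)),
        "num_suspicious_tokens": sum(values.count(tok) for tok in SUSPICIOUS_TOKENS),
        "max_value_length": max((len(v) for _, v in params), default=0),
    }
```

Feature dictionaries like these could be vectorized and passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`; that training step is not shown.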

Proceedings Article • DOI: 10.1145/3397271.3401320

Query Classification with Multi-objective Backoff Optimization

Hang Yu¹, Lester Litchfield¹•Institutions (1)

Wellington Management Company¹

25 Jul 2020

TL;DR: To the authors' knowledge, this work is the first attempt to enhance QC with multi-objective optimization. Evaluated on real-world search data from Trade Me, the largest e-commerce platform in New Zealand, it delivers superior solutions with flexible tuning to satisfy different users' demands.

Abstract: E-commerce platforms greatly benefit from high-quality search that retrieves relevant results in response to search terms. For the sake of search relevance, Query Classification (QC) has been widely adopted to make search engines robust against low text quality and complex category hierarchies. Generally, QC solutions categorize search queries and direct users to the suggested categories, from which the search results are then retrieved. In this way, the search scope is contextually constrained to increase search relevance. However, such operations risk deteriorating e-commerce metrics when irrelevant categories are suggested, so QC solutions are expected to demonstrate high accuracy. Unfortunately, existing QC methods mainly focus on the intrinsic performance of classifiers but fail to consider post-inference optimization that could further improve reliability. To fill this research gap, we propose Query Classification with Multi-objective Backoff (QCMB). The proposed solution consists of two steps: 1) hierarchical text classification that classifies search queries into multi-level categories; and 2) multi-objective backoff that substitutes potentially misclassified leaf categories with appropriate ancestors that optimize the trade-off between accuracy and depth. QCMB is evaluated using real-world search data from Trade Me, the largest e-commerce platform in New Zealand. Compared with the benchmarks, QCMB delivers superior solutions with flexible tuning to satisfy different users' demands. To the best of our knowledge, this work is the first attempt to enhance QC with multi-objective optimization.

8 citations
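
The backoff step can be illustrated with a simplified, single-objective variant, which is an assumption and not QCMB itself: probability mass from the classifier's leaf predictions is propagated up the category tree, and a low-confidence leaf is replaced by its nearest ancestor whose accumulated probability reaches a threshold. QCMB instead optimizes the accuracy/depth trade-off with multiple objectives; the `threshold` parameter here is a hypothetical stand-in for that tuning.

```python
from collections import defaultdict

def backoff(leaf_probs: dict, parent: dict, threshold: float = 0.6) -> str:
    """Replace a low-confidence leaf prediction with its closest confident ancestor.

    leaf_probs: classifier probability for each leaf category (sums to ~1)
    parent: maps every non-root category to its parent category
    """
    # Propagate each leaf's probability mass to the leaf and all its ancestors.
    mass = defaultdict(float)
    for leaf, p in leaf_probs.items():
        node = leaf
        while node is not None:
            mass[node] += p
            node = parent.get(node)
    # Start at the most probable leaf and climb until the mass is high enough
    # (or the root is reached).
    node = max(leaf_probs, key=leaf_probs.get)
    while mass[node] < threshold and parent.get(node) is not None:
        node = parent[node]
    return node
```

Raising `threshold` favours accuracy by returning shallower categories; lowering it favours depth.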

Proceedings Article • DOI: 10.1145/3398682.3399167

A Framework for DSL-Based Query Classification Using Relational and Graph-Based Data Models

Peter K. Schwab¹, Maximilian S. Langohr¹, Klaus Meyer-Wegener¹•Institutions (1)

University of Erlangen-Nuremberg¹

14 Jun 2020

TL;DR: This paper presents a framework for DSL-based SQL query classification according to data-privacy directives; it automatically derives query meta-information (QMI) and provides interfaces for browsing and filtering queries based on this QMI.

Abstract: In this paper, we demonstrate a framework for DSL-based SQL query classification according to data-privacy directives. Based on query-log analysis, this framework automatically derives query meta-information (QMI) and provides interfaces for browsing and filtering queries based on this QMI. Domain-specific policy rules enable automatic classification of queries with respect to their access to personal data. The generic policy-rule definition based on the QMI covers many syntactic SQL variations. To optimize classification performance, our framework stores the QMI in both relational and graph-based databases (DBs). This case study compares the behavior of a relational DB with that of a graph-based DB on a particular task, namely searching for the policy rules applicable to a given query. Both solutions turned out to have benefits, so a hybrid solution was ultimately chosen.

3 citations

Proceedings Article • DOI: 10.1145/3400903.3401692

We Know What You Did Last Session: Policy-Based Query Classification for Data-Privacy Compliance With the DataEconomist

Peter K. Schwab¹, Maximilian S. Langohr¹, Klaus Meyer-Wegener¹•Institutions (1)

University of Erlangen-Nuremberg¹

7 Jul 2020

TL;DR: This paper explains the DataEconomist, a framework for policy-based SQL query classification according to data-privacy directives that enables privacy officers to define domain-specific compliance policy rules based on graphical filter mechanisms.

Abstract: This paper describes the demonstration of the DataEconomist, a framework for policy-based SQL query classification according to data-privacy directives. Our framework automatically derives query meta-information based on query-log analysis and provides user-friendly, graphical interfaces for browsing and filtering queries based on this meta-information. We aim to complement existing data-privacy approaches and enable privacy officers to define domain-specific compliance policy rules based on the graphical filter mechanisms. Policies automatically classify queries as compliant or non-compliant with respect to their processing of personal data. During our demonstration, conference attendees assess our system in several scenarios. They filter queries based on various query meta-information, learn how to define compliance policies for automatic query classification without deep technical knowledge, and test this classification by formulating non-compliant queries.

2 citations
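
Both framework papers above classify queries with domain-specific policy rules evaluated on query meta-information (QMI) rather than on raw SQL text, which is what lets one rule cover many syntactic SQL variations. The sketch below assumes the QMI (the fully qualified columns a query reads and whether it only exposes aggregates) has already been derived from the query log; the `QueryMetaInfo` type, the column names, and the rule itself are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class QueryMetaInfo:
    """Hypothetical QMI derived from one query-log entry."""
    sql: str
    columns: set = field(default_factory=set)  # fully qualified columns read (lineage)
    aggregated: bool = False                   # result exposes only aggregates

# Hypothetical domain-specific list of personal-data columns.
PERSONAL_DATA = {"customers.email", "customers.birth_date", "customers.name"}

def check_compliance(qmi: QueryMetaInfo) -> str:
    """Example policy rule: raw (non-aggregated) access to personal data is non-compliant."""
    touches_personal = bool(qmi.columns & PERSONAL_DATA)
    if touches_personal and not qmi.aggregated:
        return "non-compliant"
    return "compliant"
```

Because the rule inspects column lineage, a query that reaches `customers.email` through an alias such as `c.email` is treated the same way as one that names the table directly.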

Towards Evolutionary, Domain-Specific Query Classification Based on Policy Rules

Peter K. Schwab, Klaus Meyer-Wegener

1 Jan 2020

TL;DR: This framework enables users to define domain-specific policy rules for automatic query classification based on query metadata. Classification follows domain-specific, contextual attributes that, together with the policy rules, can be defined evolutionarily at runtime.

Abstract: Many devices, such as smart sensors, produce vast amounts of data that are still commonly stored in relational databases and processed using SQL queries. This data is only usable if it is processed in a way that yields applicable information for the users posing these queries. Thus, it can be very helpful for them to assess other queries that have already processed the targeted data. This is not a simple exercise, as SQL allows alias names and various syntactic structures to express equivalent queries. A manual assessment is also hard to accomplish due to the number of qualifying queries. We present a framework for evolutionary SQL query classification. Based on the analysis of query logs, query metadata such as schema lineage and result statistics are automatically derived. Our framework enables users to define domain-specific policy rules for automatic query classification based on the query metadata. Classification is done according to domain-specific, contextual attributes that, together with the policy rules, can be defined evolutionarily at runtime. The classification results enrich the query metadata.

1 citation

Book Chapter • DOI: 10.1007/978-3-030-58334-7_1

An Introduction to Query Understanding

Hongbo Deng¹, Yi Chang²•Institutions (2)

Alibaba Group¹, Jilin University²

1 Jan 2020

TL;DR: This book presents a systematic study of the practices and theories of query understanding in search engines. Its studies fall into three major classes; the first figures out what the searcher wants by extracting semantic meaning from the keywords, as in query classification, query tagging, and query intent understanding.

Abstract: This book aims to present a systematic study of practices and theories for query understanding in search engines. The studies in this book can be categorized into three major classes. One class is to figure out what the searcher wants by extracting semantic meaning from the searcher’s keywords, such as query classification, query tagging, and query intent understanding. Another class is to analyze search queries and then translate them into an enhanced query that can produce better search results, such as query spelling correction and query rewriting. The third class is to assist users in refining their queries, or to suggest queries, so as to reduce users' search effort and satisfy their information needs, such as query auto-completion and query suggestion. This chapter discusses the organization, audience, and further reading for this book.

1 citation

Book Chapter • DOI: 10.1007/978-3-030-60259-8_21

Characterizing Robotic and Organic Query in SPARQL Search Sessions

Xinyue Zhang¹, Meng Wang¹, Bingchen Zhao², Ruyang Liu¹, Jingyuan Zhang¹, Han Yang³ • Institutions (3)

Southeast University¹, Tongji University², Peking University³

12 Aug 2020

TL;DR: This paper defines and partitions SPARQL queries into sessions, designs an algorithm to detect loop patterns (an important characteristic of robotic queries) within a query session, and employs a pipeline method that leverages loop-pattern features and query request frequency to distinguish robotic from organic SPARQL queries.

Abstract: SPARQL, one of the most powerful query languages over knowledge graphs, has gained significant popularity in recent years. A large number of SPARQL query logs have become available, providing new research opportunities to discover user interests, understand query intentions, and model search behaviors. However, a significant portion of the queries to SPARQL endpoints on the Web are robotic queries generated by automated scripts. Detecting and separating these robotic queries from the organic ones issued by human users is crucial to in-depth usage analysis of knowledge graphs. In light of this, we propose a novel method that classifies SPARQL queries as robotic or organic based on session-level query features. Specifically, we define and partition SPARQL queries into sessions. Then, we design an algorithm to detect loop patterns, an important characteristic of robotic queries, in a given query session. Finally, we employ a pipeline method that leverages loop-pattern features and query request frequency to distinguish robotic from organic SPARQL queries. Unlike other machine-learning-based methods, the proposed method can identify query types accurately without labelled data. We conduct extensive experiments on six real-world SPARQL query log datasets. The results demonstrate that our approach distinguishes robotic and organic queries effectively and needs only \(7.63 \times 10^{-4}\) s on average to process a query.

1 citation
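
The abstract names three stages (sessionization, loop-pattern detection, and a pipeline using loop features and request frequency) without defining them, so the sketch below makes simplifying assumptions: sessions are split per client by an idle-time gap, queries are reduced to templates by masking IRIs, string literals and numbers, and a "loop" is a template repeated at least `min_repeats` times in a row. All names and thresholds are hypothetical.

```python
import re

def template(query: str) -> str:
    """Mask IRIs, string literals and numbers so structurally identical queries match."""
    q = re.sub(r"<[^>]*>", "<IRI>", query)
    q = re.sub(r'"[^"]*"', '"LIT"', q)
    q = re.sub(r"\b\d+(\.\d+)?\b", "NUM", q)
    return " ".join(q.split())

def sessions(log, max_gap=60.0):
    """Split (client, timestamp, query) records into sessions by client and idle gap."""
    result, last_seen, current = [], {}, {}
    for client, ts, query in sorted(log, key=lambda r: (r[0], r[1])):
        if client not in current or ts - last_seen[client] > max_gap:
            current[client] = []          # start a new session for this client
            result.append(current[client])
        current[client].append(query)
        last_seen[client] = ts
    return result

def has_loop(session, min_repeats=3):
    """Flag a session in which one template occurs at least min_repeats times in a row."""
    run, prev = 0, None
    for t in map(template, session):
        run = run + 1 if t == prev else 1
        prev = t
        if run >= min_repeats:
            return True
    return False
```

Under these assumptions, a session flagged by `has_loop` that also shows a high request rate would be a candidate robotic session.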

Posted Content

Query Understanding via Intent Description Generation

Ruqing Zhang¹, Jiafeng Guo¹, Yixing Fan¹, Yanyan Lan¹, Xueqi Cheng¹ • Institutions (1)

Chinese Academy of Sciences¹

25 Aug 2020-arXiv: Computation and Language

TL;DR: For query understanding, a novel Contrastive Generation model (CtrsGen) is proposed that generates an intent description for a query by contrasting its relevant documents with its irrelevant ones.

Abstract: Query understanding is a fundamental problem in information retrieval (IR) that has attracted continuous attention over the past decades. Many different tasks have been proposed for understanding users' search queries, e.g., query classification or query clustering. However, understanding a search query at the level of intent classes or clusters is imprecise because much detailed information is lost. As many benchmark datasets (e.g., TREC and SemEval) show, queries are often associated with a detailed description, provided by human annotators, that clearly describes their intent and helps evaluate the relevance of documents. If a system could automatically generate a detailed and precise intent description for a search query, as human annotators do, that would indicate much better query understanding. In this paper, therefore, we propose a novel Query-to-Intent-Description (Q2ID) task for query understanding. Unlike existing ranking tasks, which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task that aims to generate a natural-language intent description based on both the relevant and the irrelevant documents of a given query. To address this new task, we propose a novel Contrastive Generation model, CtrsGen for short, which generates the intent description by contrasting the relevant documents with the irrelevant documents given a query. We demonstrate the effectiveness of our model by comparing it with several state-of-the-art generation models on the Q2ID task. We discuss the potential usage of such a Q2ID technique through an example application.

Posted Content

Scalable Top-k Query on Information Networks with Hierarchical Inheritance Relations

Fubao Wu¹, Lixin Gao•Institutions (1)

University of Massachusetts Amherst¹

01 Jun 2020-arXiv: Databases

TL;DR: This work proposes a graph query search algorithm that decomposes the original query graph into multiple star queries and applies a star query algorithm to each; it obtains more accurate results with competitive performance.

Abstract: Graph query, pattern mining and knowledge discovery become challenging on large-scale heterogeneous information networks (HINs). State-of-the-art techniques involving path propagation mainly focus on inference over node labels and neighborhood structures. However, entity links in the real world also contain rich hierarchical inheritance relations. For example, the vulnerability of a product version is likely to be inherited from its older versions. Taking advantage of these hierarchical inheritances can potentially improve the quality of query results. Motivated by this, we explore hierarchical inheritance relations between entities and formulate the problem of graph query on HINs with hierarchical inheritance relations. We propose a graph query search algorithm that decomposes the original query graph into multiple star queries and applies a star query algorithm to each star query. Candidates from each star query result are then combined to construct the final top-k answers to the original query. To efficiently obtain the graph query result from a large-scale HIN, we design a bound-based pruning technique that uses uniform cost search to prune the search space. We implement our algorithm in GraphX to test its effectiveness and efficiency on synthetic and real-world datasets. Compared with two common graph query algorithms, our algorithm obtains more accurate results with competitive performance.
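
The abstract does not say how the query graph is split into star queries. One simple assumption, sketched below, is a greedy cover: repeatedly choose the vertex incident to the most uncovered edges as a star centre and assign all of its uncovered edges to that star. Matching each star against the HIN, combining candidates into top-k answers, and the bound-based pruning are not shown; the function name and the example graph are hypothetical.

```python
from collections import defaultdict

def star_decomposition(edges):
    """Greedily cover the edges of an undirected query graph with star subqueries.

    Returns a list of (center, [leaf, ...]); every edge lands in exactly one star.
    Assumes the query graph has no self-loops.
    """
    remaining = {frozenset(e) for e in edges}
    stars = []
    while remaining:
        # Count how many uncovered edges touch each vertex.
        degree = defaultdict(int)
        for e in remaining:
            for v in e:
                degree[v] += 1
        # Highest-degree vertex becomes the next centre (ties: alphabetical order).
        center = max(sorted(degree), key=degree.get)
        leaves = sorted(next(iter(e - {center})) for e in remaining if center in e)
        stars.append((center, leaves))
        remaining = {e for e in remaining if center not in e}
    return stars
```

Choosing high-degree centres first tends to produce fewer, larger stars, which reduces the number of partial results that later have to be joined.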

Patent

Query classification method for database intrusion detection

Jung Il Hoon, Cho Sung Bae, Yun Ho Sang

24 Feb 2020

TL;DR: This patent proposes a method for detecting database intrusions caused by insider attacks, in which queries are classified by a hybrid of a convolutional neural network (CNN) and a genetic algorithm (GA) to monitor for intrusions.

Abstract: The present invention relates to a technology for detecting a database intrusion caused by an insider attack. Provided is a technology for classifying queries using intelligent techniques to monitor for intrusions. In one embodiment of the present invention, the queries are classified with a hybrid structure that combines a convolutional neural network (CNN) and a genetic algorithm (GA) to monitor for intrusions.

Patent

System and method for templatizing conversations with an agent and user-originated follow-ups

Gnanasambandam Nathan, Anderson Mark Henry

16 Apr 2020

TL;DR: This patent proposes a method for answering a user's natural-language medical information query with an artificial intelligence-based diagnostic conversation agent. The agent classifies the query into a domain-directed medical query classification, selects the associated diagnostic fact variable set, compiles user-specific values for those variables from a local medical information profile and from follow-up questions, and then generates an answer.

Abstract: A method for answering a user-generated natural language medical information query based on a diagnostic conversational template, including: receiving a medical information query at an artificial intelligence-based diagnostic conversation agent; responsive to content of the query, selecting a diagnostic fact variable set relevant to generating an answer for the query by classifying the query into a domain-directed medical query classification associated with respective diagnostic fact variable sets; compiling user-specific medical fact variable values for respective medical fact variables of the diagnostic fact variable set, where the compiling further includes: extracting a first set of user-specific medical fact variable values from a local user medical information profile associated with the query and requesting a second set of user-specific medical fact variable values through questions; and generating an answer in response to the query.
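
The claimed steps (classify the query, select the matching diagnostic fact variable set, fill the variables from the local profile, and ask questions for the rest) can be sketched as below. The keyword classifier, the query classes and the variable names are hypothetical placeholders for the AI-based agent, and answer generation is omitted.

```python
# Hypothetical query classes and their diagnostic fact variable sets.
FACT_VARIABLE_SETS = {
    "fever": ["age", "temperature_c", "days_with_symptom"],
    "medication": ["age", "current_medications", "allergies"],
}
# Hypothetical keywords standing in for a trained query classifier.
KEYWORDS = {"fever": ("fever", "temperature"), "medication": ("dose", "drug", "pill")}

def classify_query(query):
    """Toy keyword classifier for the domain-directed query classification step."""
    q = query.lower()
    for label, words in KEYWORDS.items():
        if any(w in q for w in words):
            return label
    return None

def compile_facts(query, profile, ask):
    """Fill the selected fact variables from the local profile, asking the user for the rest."""
    label = classify_query(query)
    if label is None:
        return None, {}
    facts = {}
    for var in FACT_VARIABLE_SETS[label]:
        # First set: values already in the local profile; second set: requested via questions.
        facts[var] = profile[var] if var in profile else ask(var)
    return label, facts
```

Passing `ask` in as a callable keeps the question-asking channel (chat, voice, form) separate from the template logic.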

Proceedings Article • DOI: 10.1145/3366424.3382183

Event-Related Query Classification with Deep Neural Networks

Sahaj Gandhi¹, Behrooz Mansouri¹, Ricardo Campos, Adam Jatowt²•Institutions (2)

Rochester Institute of Technology¹, Kyoto University²

20 Apr 2020

TL;DR: This paper proposes a novel model based on deep neural networks to classify event-related queries into four categories: periodic, aperiodic, one-time-only, and non-event, which uses only the time-series data of query frequencies.

Abstract: Users tend to search the Internet for the latest news when an event occurs, so search engines should be capable of effectively retrieving relevant documents for event-related queries. As previous studies have shown, different retrieval models are needed for different types of events. Therefore, the first step toward improving effectiveness is identifying event-related queries and determining their types. In this paper, we propose a novel model based on deep neural networks to classify event-related queries into four categories: periodic, aperiodic, one-time-only, and non-event. The proposed model combines recurrent neural networks (by feeding two LSTM layers with query frequencies) and visual recognition models (by transforming the time-series data from a 1D signal into a 2D image, which is then passed to a CNN model) for effective query type estimation. Notably, our method uses only the time-series data of query frequencies, without resorting to external sources such as contextual data, which makes it language- and domain-independent with regard to the query issued. For evaluation, we build upon previous datasets of event-related queries to create a new dataset that fits the purpose of our experiments. The results show that our proposed model achieves an F1-score of 0.87.
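
The abstract states that the query-frequency series is turned from a 1D signal into a 2D image for the CNN branch but does not name the transform. A Gramian Angular Summation Field is one common choice and is assumed in the sketch below: the series is rescaled to [-1, 1], each value \(x_i\) is mapped to an angle \(\phi_i = \arccos(x_i)\), and image entry \((i, j)\) is \(\cos(\phi_i + \phi_j)\). The LSTM branch and the classifier itself are not shown.

```python
import math

def gramian_angular_field(series):
    """Gramian Angular Summation Field: 1D series -> 2D matrix of cos(phi_i + phi_j)."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # a constant series maps to all -1 instead of dividing by zero
    # Rescale to [-1, 1] so every value is a valid cosine.
    scaled = [2 * (x - lo) / span - 1 for x in series]
    # Clamp guards against tiny floating-point overshoot before arccos.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]
```

For a daily query-frequency series of length n, this yields an n-by-n single-channel image that preserves temporal order along both axes.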

Repository • DOI: 10.6082/uchicago.2736

Thrifty Query Processing

Tang Dixin

18 Dec 2020

Abstract: Database systems have long been designed to take one of two major approaches to processing a dataset under changes (e.g. a data stream). Eager query processing methods, such as continuous query processing or immediate incremental view maintenance (IVM), are optimized to reduce query latency. They eagerly maintain standing queries by consuming all available resources to process new data immediately, which can waste substantial CPU cycles and memory. On the other hand, lazy query processing methods, such as batch processing or deferred IVM, defer query execution to a future point to reduce resource consumption but suffer high query latencies. We find that existing eager and lazy query execution approaches are optimized for applications at the two ends of the resource-latency trade-off, but the middle ground between the two is rarely exploited. This dissertation proposes a new query processing paradigm, Thrifty Query Processing (TQP), for middle-ground applications in which users do not need to see the up-to-date query result as soon as the data is ready and allow some time slackness before the result is returned. TQP exploits this time slackness to reduce resource consumption and allows users to tune the slackness to adjust query latency and resource consumption. Implementing TQP involves redesigning several core database components. First, we introduce a new user model that allows users not just to submit a SQL query but also to specify time-slackness information. Specifically, users can specify a performance goal that represents the maximum time allowed to return the result after the data is complete. Next, we design a new query execution engine that leverages this performance goal to reduce CPU cycles. This execution engine includes optimizations for both single and multiple queries.
For a single query, we consider selectively delaying parts of the query to reduce resource consumption while meeting the performance goal. For multiple queries, we find that shared execution may not decrease resource consumption, because sharing queries with different performance goals requires the whole plan to execute eagerly to meet the highest performance goal (i.e. the lowest query latency). Therefore, we consider selectively sharing queries to avoid the overhead of eager query execution while still exploiting the benefit of eliminating redundant work across queries. Finally, we design a memory management component that releases occupied memory when the query is not active. We find that in many cases the data arrival rate is low (e.g. late data), so the query may be idle for a long time. Therefore, we selectively release the memory resources (e.g. intermediate states) that are least useful for processing new data. We implement TQP in CrocodileDB, a resource-efficient database, and perform extensive experiments to evaluate each of its components. We show that CrocodileDB can significantly reduce CPU and memory consumption while providing query latencies similar to those of existing approaches.
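
The core idea of exploiting time slackness can be illustrated with a toy scheduling rule, which is an assumption and far simpler than CrocodileDB's engine: if the remaining work can be estimated, execution can be deferred until the latest start time that still meets the user's performance goal. The `PendingWork` type and its fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PendingWork:
    data_complete_at: float  # when the last input is expected to arrive (seconds)
    goal: float              # performance goal: max seconds allowed after data is complete
    est_cost: float          # estimated seconds needed to process all pending data

def latest_start(w: PendingWork) -> float:
    """Latest start time that still returns the result within the performance goal."""
    return w.data_complete_at + w.goal - w.est_cost

def should_run_now(w: PendingWork, now: float) -> bool:
    """Lazy-as-possible rule: keep deferring while the goal can still be met."""
    return now >= latest_start(w)
```

If the estimated cost exceeds the goal, the latest start time falls before the data is complete, so this rule begins processing early; a small goal therefore behaves eagerly and a large one lazily.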

Proceedings Article • DOI: 10.1145/3340531.3411999

Query Understanding via Intent Description Generation

Ruqing Zhang¹, Jiafeng Guo¹, Yixing Fan¹, Yanyan Lan¹, Xueqi Cheng¹ • Institutions (1)

Chinese Academy of Sciences¹

19 Oct 2020

TL;DR: In this article, the authors propose the Query-to-Intent-Description (Q2ID) task, a reverse task that aims to generate a natural-language intent description from both the relevant and the irrelevant documents of a given query, together with a contrastive generation model that produces the description by contrasting the relevant documents with the irrelevant ones.
