TL;DR: This paper presents the first formal definition of command injection attacks in the context of web applications, and gives a sound and complete algorithm for preventing them based on context-free grammars and compiler parsing techniques.
Abstract: Web applications typically interact with a back-end database to retrieve persistent data and then present the data to the user as dynamically generated output, such as HTML web pages. However, this interaction is commonly done through a low-level API by dynamically constructing query strings within a general-purpose programming language, such as Java. This low-level interaction is ad hoc because it does not take into account the structure of the output language. Accordingly, user inputs are treated as isolated lexical entities which, if not properly sanitized, can cause the web application to generate unintended output. This is called a command injection attack, which poses a serious threat to web application security. This paper presents the first formal definition of command injection attacks in the context of web applications, and gives a sound and complete algorithm for preventing them based on context-free grammars and compiler parsing techniques. Our key observation is that, for an attack to succeed, the input that gets propagated into the database query or the output document must change the intended syntactic structure of the query or document. Our definition and algorithm are general and apply to many forms of command injection attacks. We validate our approach with SqlCheck, an implementation for the setting of SQL command injection attacks. We evaluated SqlCheck on real-world web applications with systematically compiled real-world attack data as input. SqlCheck produced no false positives or false negatives, incurred low runtime overhead, and applied straightforwardly to web applications written in different languages.
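The key observation above can be illustrated with a small sketch. It is a lexical approximation only, assuming a toy SQL lexer in place of the grammar-based parser the abstract describes; the names `tokenize` and `is_confined` are hypothetical and are not taken from the paper's implementation. The check accepts a query only if the user-supplied text stays inside a single token, so it cannot alter the surrounding syntactic structure.

```python
import re

# Toy SQL lexer (an assumption for illustration, not a full SQL grammar).
TOKEN = re.compile(r"""
      '(?:[^']|'')*'      # single-quoted string literal; a doubled quote is an escape
    | \d+(?:\.\d+)?       # numeric literal
    | [A-Za-z_]\w*        # identifier or keyword
    | .                   # any other single character (operator, punctuation)
    """, re.VERBOSE | re.DOTALL)

def tokenize(sql):
    """Return the (start, end) span of every non-whitespace token in sql."""
    spans, pos = [], 0
    while pos < len(sql):
        if sql[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(sql, pos)   # always matches at least one character
        spans.append(match.span())
        pos = match.end()
    return spans

def is_confined(prefix, user_input, suffix):
    """True if user_input lies entirely within one token of the final query."""
    query = prefix + user_input + suffix
    start, end = len(prefix), len(prefix) + len(user_input)
    return any(s <= start and end <= e for s, e in tokenize(query))

if __name__ == "__main__":
    prefix = "SELECT * FROM users WHERE name = '"
    print(is_confined(prefix, "alice", "'"))         # benign input
    print(is_confined(prefix, "x' OR '1'='1", "'"))  # classic injection attempt
```

A parser-based check would additionally restrict which grammar symbols user input may derive; this sketch only shows the confinement idea.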
TL;DR: Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%, and when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63% to 132% over the use of text-based classification alone.
TL;DR: In this paper, a computer-implemented method of processing a geotext query is proposed, which involves: receiving a first free-text query string from a user; and decomposing the first free-text query into a non-geographic query and a geographic query.
Abstract: A computer-implemented method of processing a geotext query, said method involving: receiving a first free-text query string from a user; and decomposing the first free-text query into a non-geographic query and a geographic query, wherein the non-geographic query is a second free-text query string derived from the first free-text query string and the geographic query is a geographical location description.
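As a rough illustration of the decomposition step, the sketch below splits a free-text query into a non-geographic part and a geographic part. It assumes a tiny hand-written gazetteer and a short preposition list; `GAZETTEER`, `LINK_WORDS`, and `decompose` are hypothetical names, and the abstract does not say how place names are actually recognised.

```python
# Hypothetical gazetteer and linking words, for illustration only.
GAZETTEER = {"boston", "new york", "paris"}
LINK_WORDS = {"in", "near", "at"}

def decompose(query):
    """Split a free-text query into (non_geographic_query, location_or_None)."""
    words = query.lower().split()
    # Prefer the longest phrase that matches a known place name.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            candidate = " ".join(words[start:start + length])
            if candidate in GAZETTEER:
                rest = words[:start] + words[start + length:]
                # Drop a dangling preposition left in front of the place name.
                if start > 0 and rest[start - 1] in LINK_WORDS:
                    del rest[start - 1]
                return " ".join(rest), candidate
    return " ".join(words), None

if __name__ == "__main__":
    print(decompose("pizza near New York"))
    print(decompose("jazz clubs"))
```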
TL;DR: In this article, a filter selects data for inclusion in the data subset based upon occurrence of the data in a database, and the database includes content selected for inclusion by designated users.
Abstract: Filtering Internet content includes receiving a search query message comprising a search query to an Internet search engine. Data is received from the Internet search engine, responsive to the search query message. Filtering of the data produces a data subset. The filter selects data for inclusion in the data subset based upon occurrence of the data in a database. The database includes content selected for inclusion by designated users. The data subset is displayed in a browser.
TL;DR: A search system for searching for electronic documents, and providing a search result in response to a search query is provided in this paper, which includes a search engine that executes a search based on the search query term and the equivalent terms.
Abstract: A search system for searching for electronic documents, and providing a search result in response to a search query is provided. The search system includes a processor, a user interface module adapted to receive a search query from a user, the search query having at least one search query term, and a query processing module that analyzes the search query term to identify candidate synonym words. The query processing module also determines which of the candidate synonym words are equivalent terms to the search query term, and in a same sense as the search query term. In addition, the search system includes a search engine that executes a search based on the search query term and the equivalent terms.
TL;DR: A method of retrieving documents using a search engine includes providing a reverse index including one or more keywords and a list of documents containing the keywords, the reverse index further including a measure of confidence (MOC) value associated with the keywords as mentioned in this paper.
Abstract: A method of retrieving documents using a search engine includes providing a reverse index including one or more keywords and a list of documents containing the one or more keywords, the reverse index further including a measure of confidence (MOC) value associated with the one or more keywords. One or more query terms are input into the search engine. The query terms are disambiguated and a MOC value is associated with each meaning of the disambiguated query term. A list of documents is retrieved containing the query terms wherein the documents are initially ranked based at least in part on the MOC values of the keywords and query terms. The list of documents may be re-ranked based at least in part on the semantic similarity of each document to the disambiguated query terms.
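The initial ranking step can be sketched as follows. This is a minimal example under stated assumptions: the index contents, the sense labels such as `bank/finance`, and the scoring rule (the sum of query MOC times keyword MOC) are all hypothetical, since the abstract says only that ranking is based at least in part on the MOC values. The semantic re-ranking stage is not shown.

```python
from collections import defaultdict

# Hypothetical reverse index: keyword sense -> {document id: MOC value}.
REVERSE_INDEX = {
    "bank/finance": {"doc1": 0.9, "doc2": 0.4},
    "bank/river": {"doc3": 0.8},
    "loan": {"doc1": 0.7},
}

def rank(query_senses):
    """Rank documents given (sense, query MOC) pairs for the disambiguated query.

    Assumed scoring rule: each matching keyword contributes
    query_moc * keyword_moc to the document's score.
    """
    scores = defaultdict(float)
    for sense, query_moc in query_senses:
        for doc, keyword_moc in REVERSE_INDEX.get(sense, {}).items():
            scores[doc] += query_moc * keyword_moc
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))

if __name__ == "__main__":
    # The query "bank loan", with "bank" judged 80% financial and 20% river.
    print(rank([("bank/finance", 0.8), ("bank/river", 0.2), ("loan", 1.0)]))
```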
TL;DR: In this paper, a computer system and method for processing a search query directed to a collection of pages is described, where the winning bid corresponds to an advertiser who may specify a sponsored link or sponsored page that is offered to the user in response to the search query.
Abstract: A computer system and method for processing a search query directed to a collection of pages includes receiving a search query of a user, identifying one or more result pages from the collection of pages in response to the search query, comparing keywords of the search query and a concept hierarchy of the result pages and user features against a set of bids for keywords, concepts, and user features that are submitted by advertisers to identify matching bids. A winning bid is selected from among the matching bids. The winning bid corresponds to an advertiser who may specify a sponsored link or sponsored page that is offered to the user in response to the search query.
TL;DR: This article presented a method for grouping the search results, which presents ranked derived queries together with their search results to the user, in such a way that derived queries with higher ranks and top-ranked documents of each derived query are preferentially presented, and the grouped results are displayed and navigated in independent framed subareas of an output window.
Abstract: Methods and systems are provided to present the search results in response to a search query that is submitted to a document retrieval system, such as a search engine. The search results are presented with a second-retrieval model that constructs multiple derived queries for the search query with a first small-document retrieval process, and then generates and outputs the results based on the retrieval of search results of at least part of the derived queries. One embodiment of the invention provides a method for grouping the search results, which presents ranked derived queries together with their search results to the user, in such a way that derived queries with higher ranks and top-ranked documents of each derived query are preferentially presented, and the grouped results are displayed and navigated in independent framed subareas of an output window. A further embodiment selects the search results from multiple result lists of the derived queries to form the final search results for the user query, wherein the merged results are re-ranked according to pre-determined criteria. The method can also be integrated with the local keyword associated clustering method by rank value adjustment, or result filtering or merging to achieve better technical effects.
TL;DR: A major contribution is the sharing technique that does not require any up-front multiple query optimization, a significant departure from existing techniques that rely on complex static analyses of fixed query workloads.
Abstract: Data streaming systems are becoming essential for monitoring applications such as financial analysis and network intrusion detection. These systems often have to process many similar but different queries over common data. Since executing each query separately can lead to significant scalability and performance problems, it is vital to share resources by exploiting similarities in the queries. In this paper we present ways to efficiently share streaming aggregate queries with differing periodic windows and arbitrary selection predicates. A major contribution is our sharing technique that does not require any up-front multiple query optimization. This is a significant departure from existing techniques that rely on complex static analyses of fixed query workloads. Our approach is particularly vital in streaming systems where queries can join and leave the system at any point. We present a detailed performance study that evaluates our strategies with an implementation and real data. In these experiments, our approach gives us as much as an order of magnitude performance improvement over the state of the art.
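One common way to share work between aggregates with different periodic windows is to compute partial aggregates over small slices once and then combine them per query. The sketch below shows that idea for tumbling SUM windows only. It is an assumption-laden illustration, not the paper's technique: the slice width is taken as the greatest common divisor of the window periods, one value arrives per time unit, and selection predicates are ignored. `shared_tumbling_sums` is a hypothetical name.

```python
from functools import reduce
from math import gcd

def shared_tumbling_sums(values, periods):
    """Compute tumbling-window sums for several periods from shared slices."""
    slice_width = reduce(gcd, periods)
    # Ignore a trailing incomplete slice.
    usable = len(values) - len(values) % slice_width
    # Partial sums are computed once and shared by every query.
    partials = [sum(values[i:i + slice_width])
                for i in range(0, usable, slice_width)]
    results = {}
    for period in periods:
        per_window = period // slice_width   # slices per window for this query
        results[period] = [sum(partials[j:j + per_window])
                           for j in range(0, len(partials) - per_window + 1, per_window)]
    return results

if __name__ == "__main__":
    stream = list(range(1, 13))              # the values 1..12
    print(shared_tumbling_sums(stream, [4, 6]))
```

Each input value is added exactly once when the slices are built; every query then works on the much shorter list of partial sums.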
TL;DR: In this paper, the authors describe improved capabilities for receiving a non-text based information request from a mobile communication facility, transforming the non-text based information request into a text based search query, and presenting a search result to the mobile communication facility based on the text based search query.
Abstract: Improved capabilities are described for receiving a non-text based information request from a mobile communication facility, transforming the non-text based information request into a text based search query, and presenting a search result to the mobile communication facility based on the text based search query.
TL;DR: In this article, potential synonyms for a query term are identified by looking up a simplified form of the term in a synonyms map, which maps each of a plurality of keys to one or more corresponding variants, each variant being associated, for each associated language, with a variant-language score indicating the relative frequency of the variant among all variants for that language and the same key.
Abstract: Methods, systems, and apparatus, including computer program products, operable to perform operations including receiving from a user through a user interface a search query comprising a query term, the search query having attributed to it a query language; deriving a simplified query term from the query term; and identifying one or more potential synonyms for the query term by looking up the simplified query term in a synonyms map, the synonyms map mapping each of a plurality of keys to one or more corresponding variants, each variant being a word associated with one or more document languages, and each variant being associated for each associated language with a variant-language score indicating a relative frequency of the variant among all variants for the associated language for the same key.
TL;DR: In this article, the relevance of the search results for a target query can be judged based on one or more queries in the query log that are related to the target query temporally and/or lexically.
Abstract: A system(s) and/or method(s) that facilitate improving the relevance of search results through utilization of a query log. The relevance of the search results for a target query can be judged based on one or more queries in the log that are related to the target query temporally and/or lexically. The diversity of the top-ranked search results can be increased and/or decreased based on an iterative re-ranking process of the search result set.
TL;DR: It is shown that, despite the difficulty of an abundance of ambiguous queries and lack of training data, the query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework.
Abstract: Web-search queries are typically short and ambiguous. To classify these queries into certain target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (precision, F1 measure, which combines precision and recall, and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this article, we show that, despite the difficulty of an abundance of ambiguous queries and lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best results for F1 and precision are 42.4% and 44.4%, respectively, which are 9.6% and 24.3% higher than those of the runners-up.
TL;DR: In this article, improved capabilities are described for mobile search substring query entry completion, wherein complete search terms are presented to a user in response to a search query that is not a fully formed query.
Abstract: In embodiments of the present invention improved capabilities are described for mobile search substring query entry completion, wherein complete search terms are presented to a user in response to a search query that is not a fully formed query.
TL;DR: The authors report on their experience in developing a query-answering system that integrates multiple knowledge sources, based on a novel architecture in which the sources can produce new subgoals as well as ground facts in the search for answers to existing subgoals.
Abstract: We report on our experience in developing a query-answering system that integrates multiple knowledge sources. The system is based on a novel architecture for combining knowledge sources in which the sources can produce new subgoals as well as ground facts in the search for answers to existing subgoals. The system uses a query planner that takes into account different query-processing capabilities of individual sources and augments them gracefully. A reusable ontology provides a mediated schema that serves as the basis for integration. We have evaluated the system on a suite of test queries in a realistic application to verify the practicality of our approach.
TL;DR: In this article, a platform-independent process for data retrieval from ontology-oriented data systems over computer networks through a flexible system and method of query paraphrasing is presented.
Abstract: A platform-independent process for data retrieval from ontology-oriented data systems over computer networks through a flexible system and method of query paraphrasing. The present invention uses a “common ontology” that is not tied to any particular data system. Thus, each client computer issues queries to a target data system in the common ontology. Of course, the target data system will not be able to directly process the query (as it is not in its local ontology). Instead, the query is first paraphrased back from the common ontology into local ontology by taking the semantic query, passing it through a query paraphraser, and then sending the paraphrased query to the data system. Once it is paraphrased successfully, the target data system can process it and produce a result using local ontology. The result may then be sent from the data system to an answer paraphraser for paraphrasing, and the paraphrased answer may be returned to its original query issuer and on to the client.
TL;DR: In this article, a system is described for discovering query intent based on search queries and concept networks, and the system may construct frequency vectors from log data corresponding to a submitted query and at least one related query submitted to one or more search engines.
Abstract: A system is described for discovering query intent based on search queries and concept networks. The system may construct frequency vectors from log data corresponding to a submitted query and at least one related query submitted to one or more search engines. The system may also construct a query intent vector based on the frequency vectors. The query intent vector may include frequency scores that represent the intent of the query.
TL;DR: A computer system and method for processing a search query result includes identifying a plurality of result pages in response to a search query submitted from a computing device directed to a collection of pages, determining a relevancy ranking of the result pages in accordance with a multiple dimension parameter set that includes metrics relating to the search query itself and also includes metrics unique to a subscriber associated with the search query.
Abstract: A computer system and method for processing a search query result includes identifying a plurality of result pages in response to a search query submitted from a computing device directed to a collection of pages, determining a relevancy ranking of the result pages in accordance with a multiple dimension parameter set that includes metrics relating to the search query itself and also includes metrics unique to a subscriber associated with the search query, and providing the result pages in accordance with the determined relevancy ranking. This provides an active ranking process for the search results before they are provided to a user.
TL;DR: In this paper, refining a user query is disclosed: a list of search concepts associated with the query is displayed, and a preferred search query for the search concepts can be determined.
Abstract: Refining a user query is disclosed. In one method, a query is received from a user, and then mapped to one or more search concepts. A list of search concepts associated with the query is then displayed. Alternatively or additionally, the search concepts associated with the query are used to provide a set of improved search results. In another method, a number of queries from a number of users are analyzed to identify two or more search concepts, and a popularity value is assigned to them based on the queries. Thus, the relative popularity of the respective search concepts can be determined. Alternatively or additionally, a preferred search query for the search concepts can be determined. The popularity and preferred queries can be used to allow automatic or user-initiated refinement.
TL;DR: Novel methods for use of distributional similarity estimated from query logs in learning improved query spelling correction models are described, which can significantly outperform their baseline systems in the web query spelling correction task.
Abstract: A query speller is crucial to a search engine in improving web search relevance. This paper describes novel methods for use of distributional similarity estimated from query logs in learning improved query spelling correction models. The key to our methods is the property of distributional similarity between two terms: it is high between a frequently occurring misspelling and its correction, and low between two irrelevant terms only with similar spellings. We present two models that are able to take advantage of this property. Experimental results demonstrate that the distributional similarity based models can significantly outperform their baseline systems in the web query spelling correction task.
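The property described above can be shown with a small sketch. It assumes that distributional similarity is measured as the cosine between co-occurrence vectors built from a query log. The abstract does not specify the exact measure, the log below is invented for illustration, and `context_vectors` and `cosine` are hypothetical helper names.

```python
from collections import Counter
from math import sqrt

def context_vectors(queries):
    """Map each term to a Counter of the other terms it co-occurs with."""
    vectors = {}
    for query in queries:
        words = query.split()
        for i, word in enumerate(words):
            vectors.setdefault(word, Counter()).update(words[:i] + words[i + 1:])
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    # Invented query log, for illustration only.
    log = [
        "amazon books online",
        "amzon books online",
        "amazon books kindle",
        "amazing race episodes",
    ]
    vectors = context_vectors(log)
    print(cosine(vectors["amazon"], vectors["amzon"]))    # misspelling vs. correction
    print(cosine(vectors["amzon"], vectors["amazing"]))   # similar spelling, unrelated term
```

The misspelling shares contexts with its correction and scores high, while the unrelated but similarly spelled term shares none and scores zero.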
TL;DR: In this paper, a system may automatically identify prior search queries that include the one or more terms of the search query from a history of prior queries, and automatically identify possible spelling corrected search queries based on the search queries.
Abstract: A system may receive one or more terms of a search query. The system may automatically identify prior search queries that include the one or more terms of the search query from a history of prior search queries. The system may automatically identify possible spelling corrected search queries based on the one or more terms of the search queries. The system may automatically receive remote server-based query completion suggestions including the one or more terms of the search query. The system may present query refinement options, the query refinement box being populated with the prior search queries as suggested queries for possible selection by a user, the identified possible spelling corrected search queries, and the received query completion suggestions.
TL;DR: The aim is to create a classification system in which the training model can adapt quickly to changes in the underlying data stream; to that end, the authors propose an on-demand classification process that dynamically selects the appropriate window of past training data to build the classifier.
Abstract: Current models of the classification problem do not effectively handle bursts of particular classes coming in at different times. In fact, the current model of the classification problem simply concentrates on methods for one-pass classification modeling of very large data sets. Our model for data stream classification views the data stream classification problem from the point of view of a dynamic approach in which simultaneous training and test streams are used for dynamic classification of data sets. This model reflects real-life situations effectively, since it is desirable to classify test streams in real time over an evolving training and test stream. The aim here is to create a classification system in which the training model can adapt quickly to the changes of the underlying data stream. In order to achieve this goal, we propose an on-demand classification process which can dynamically select the appropriate window of past training data to build the classifier. The empirical results indicate that the system maintains a high classification accuracy in an evolving data stream, while providing an efficient solution to the classification task.
TL;DR: This paper proposes a lattice based framework to systematically relax queries involving joins and selections, and describes the properties of relaxation at each node and presents algorithms to compute the corresponding answer.
Abstract: Database users can be frustrated by having an empty answer to a query. In this paper, we propose a framework to systematically relax queries involving joins and selections. When considering relaxing a query condition, intuitively one seeks the 'minimal' amount of relaxation that yields an answer. We first characterize the types of answers that we return to relaxed queries. We then propose a lattice based framework in order to aid query relaxation. Nodes in the lattice correspond to different ways to relax queries. We characterize the properties of relaxation at each node and present algorithms to compute the corresponding answer. We then discuss how to traverse this lattice in a way that a non-empty query answer is obtained with the minimum amount of query condition relaxation. We implemented this framework and we present our results of a thorough performance evaluation using real and synthetic data. Our results indicate the practical utility of our framework.
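The lattice traversal can be sketched in a simplified form. The assumption here is that relaxing a condition means dropping it entirely, whereas the paper also characterises finer-grained relaxations. Each lattice node is a set of dropped conditions, and nodes are visited in order of increasing size so that the first non-empty answer uses the fewest relaxations. `relax` and the sample data are hypothetical.

```python
from itertools import combinations

def relax(rows, conditions):
    """Return (dropped condition names, answer) with as few drops as possible.

    conditions maps a name to a predicate over a row (a dict).
    """
    names = sorted(conditions)
    # Level k of the lattice holds every way of dropping exactly k conditions.
    for dropped_count in range(len(names) + 1):
        for dropped in combinations(names, dropped_count):
            kept = [conditions[n] for n in names if n not in dropped]
            answer = [row for row in rows if all(pred(row) for pred in kept)]
            if answer:
                return set(dropped), answer
    return set(names), []   # only reached when rows is empty

if __name__ == "__main__":
    hotels = [
        {"city": "Austin", "price": 120, "stars": 3},
        {"city": "Dallas", "price": 80, "stars": 4},
    ]
    wanted = {
        "city": lambda r: r["city"] == "Austin",
        "price": lambda r: r["price"] <= 100,
        "stars": lambda r: r["stars"] >= 4,
    }
    print(relax(hotels, wanted))
```

This brute-force version evaluates each node from scratch; it shows only the order of traversal, not the paper's algorithms for computing answers at each node.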
TL;DR: In this article, an alternative search query with a spelling suggestion of "britney spears" is provided to the user in response to a search query of "brittany sp".
Abstract: Providing an alternative search query to a predicted search query is disclosed herein. A search query is received from a client node. Prior to receiving an indication from the client node that the search query is completely formed, the following steps are performed: 1) a predicted search query is determined by predicting what the search query will be when completed; and 2) an alternative search query that differs from the predicted search query is determined based on the predicted search query. The alternative search query is provided to the client node. The alternative search query may be something that the user search query is unlikely to complete to. For example, in response to the user entering a search query of “brittany sp”, an alternative search query with a spelling suggestion of “britney spears” is determined and provided to the user.
TL;DR: In this article, the authors propose a caching system to determine the tables, rows, or other partitions of data a received query is dependent upon or modifies by submitting a version of the received query to the database through a native facility provided by the database.
Abstract: Database data is maintained reliably and invalidated based on actual changes to data in the database. Updates or changes to data are detected without parsing queries submitted to the database. The dependencies of a query can be determined by submitting a version of the received query to the database through a native facility provided by the database to analyze how query structures are processed. The caching system can access the results of the facility to determine the tables, rows, or other partitions of data a received query is dependent upon or modifies. An abstracted form of the query can be cached with an indication of the tables, rows, etc. that queries of that structure access or modify. The tables a write or update query modifies can be cached with a time of last modification. When a query is received for which the results are cached, the system can readily determine dependency information for the query, the last time the dependencies were modified, and compare this time with the time indicated for when the cached results were retrieved. By passing versions of write queries to the database, updates to the database can be detected.
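The time comparison at the heart of this scheme can be sketched as below. Two simplifications are assumed: the table dependencies are passed in explicitly rather than obtained from a native database facility, and a logical counter stands in for wall-clock time so the behaviour is deterministic. `QueryCache` and its methods are hypothetical names.

```python
import itertools

class QueryCache:
    """Cache query results and invalidate them when a dependency changes."""

    def __init__(self):
        self._clock = itertools.count(1)   # logical clock instead of wall time
        self._last_modified = {}           # table name -> tick of last write
        self._results = {}                 # query -> (tick, tables, result)

    def record_write(self, tables):
        """Note that a write or update query modified the given tables."""
        tick = next(self._clock)
        for table in tables:
            self._last_modified[table] = tick

    def store(self, query, tables, result):
        """Cache a result together with the tables the query depends on."""
        self._results[query] = (next(self._clock), frozenset(tables), result)

    def lookup(self, query):
        """Return the cached result, or None if it is missing or stale."""
        entry = self._results.get(query)
        if entry is None:
            return None
        cached_at, tables, result = entry
        # Stale if any dependency was modified after the result was cached.
        if any(self._last_modified.get(t, 0) > cached_at for t in tables):
            del self._results[query]
            return None
        return result

if __name__ == "__main__":
    cache = QueryCache()
    cache.store("SELECT * FROM orders", ["orders"], [1, 2])
    cache.record_write(["customers"])
    print(cache.lookup("SELECT * FROM orders"))   # unrelated write, still valid
    cache.record_write(["orders"])
    print(cache.lookup("SELECT * FROM orders"))   # dependency changed, now stale
```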
TL;DR: In this article, a query phrase may be automatically classified to one or more topics of interest (e.g., categories) to assist in routing the query phrase to the appropriate backend databases.
Abstract: A query phrase may be automatically classified to one or more topics of interest (e.g., categories) to assist in routing the query phrase to one or more appropriate backend databases. A selectional preference query classification technique may be used to classify the query phrase based on a comparison between the query phrase and patterns of query phrases. Additionally, or alternatively, a combination of query classification techniques may be used to classify the query phrase. Topical classification of a query phrase also may be used to assist a search system in delivering auxiliary information to a user who entered the query phrase. Advertisements, for instance, may be tailored based on classification rather than query keywords.
TL;DR: In this article, a query rank reviser suggests known highly-ranked queries as revisions to a first query by initially assigning a rank to all queries, and identifying a set of known highly ranked queries (KHRQ).
Abstract: An information retrieval system includes a query revision architecture providing one or more query revisers, each of which implements a query revision strategy. A query rank reviser suggests known highly-ranked queries as revisions to a first query by initially assigning a rank to all queries, and identifying a set of known highly-ranked queries (KHRQ). Queries with a strong probability of being revised to a KHRQ are identified as nearby queries (NQ). Alternative queries that are KHRQs are provided as candidate revisions for a given query. For alternative queries that are NQs, the corresponding known highly-ranked queries are provided as candidate revisions.
TL;DR: In this article, a method and system for dynamically refining and navigating between alternative search query elements are disclosed, which is applicable to searching an information system such as the Internet, an intranet, or any database, lexicon, or collection of documents, disk drive, images or video or audio content.
Abstract: A method and system for dynamically refining and navigating between alternative search query elements are disclosed. The method and system are applicable to searching an information system such as the Internet, an intranet, or any database, lexicon, or collection of documents, disk drive, images or video or audio content. A user enters their search query into a search query receiver. As the user enters their search query, they see, in real-time in a dynamically-generated object, such as a drop-down menu, iFrame, or browser window, possible matches to their search query string, and more specifically, the user receives within the dynamic object alternative semantically- and lexically-related search elements that relate to the search query string and from which the user can either make a selection to further refine their search query, or the user can proceed to view search results based on the selected query element. The relation of alternate lexical elements is based on a controlled or structured vocabulary (for example a thesaurus).
TL;DR: A search engine can be interactively coupled with one or more social networks, and maps individuals and groups within respective social networks to subsets of categories associated with searches.
Abstract: Architecture that monitors interaction data (e.g., search queries, query results and click-through rates), and provides users with links to other users that fall into similar categories with respect to the foregoing monitored activities (e.g., providing links to individuals and groups that share common interests and/or profiles). A search engine can be interactively coupled with one or more social networks, and maps individuals and/or groups within respective social networks to subsets of categories associated with searches. A database stores mapped information which can be continuously updated and reorganized as links within the system mapping become stronger or weaker. The architecture can comprise a social network system that includes a database for mapping search-related information to an entity of a social network, and a search component for processing a search query for search results and returning a link to an entity of a social network based on the search query.
TL;DR: In this article, a method and system for generating clusters of images for a search result of an image query is presented, where the search system considers the image search result for each image query to represent a cluster of related images.
Abstract: A method and system for generating clusters of images for a search result of an image query is provided. When an original image query is received, the search system identifies text associated with the original image query by submitting the original image query to a search engine. The search system identifies phrases from the text of the web page containing the search result. The search system uses each of the identified phrases as an image query and submits the image queries to an image search engine. The search system considers the image search result for each image query to represent a cluster of related images. The search system then presents the clusters of images as the images of the image search result of the original image query.