Top 595 papers published in the topic of Web query classification in 2006

Showing papers on "Web query classification" published in 2006

Proceedings Article•10.1145/1111037.1111070•

The essence of command injection attacks in web applications

Zhendong Su¹, Gary Wassermann¹•Institutions (1)

11 Jan 2006

TL;DR: This paper presents the first formal definition of command injection attacks in the context of web applications, and gives a sound and complete algorithm for preventing them based on context-free grammars and compiler parsing techniques.

Abstract: Web applications typically interact with a back-end database to retrieve persistent data and then present the data to the user as dynamically generated output, such as HTML web pages. However, this interaction is commonly done through a low-level API by dynamically constructing query strings within a general-purpose programming language, such as Java. This low-level interaction is ad hoc because it does not take into account the structure of the output language. Accordingly, user inputs are treated as isolated lexical entities which, if not properly sanitized, can cause the web application to generate unintended output. This is called a command injection attack, which poses a serious threat to web application security. This paper presents the first formal definition of command injection attacks in the context of web applications, and gives a sound and complete algorithm for preventing them based on context-free grammars and compiler parsing techniques. Our key observation is that, for an attack to succeed, the input that gets propagated into the database query or the output document must change the intended syntactic structure of the query or document. Our definition and algorithm are general and apply to many forms of command injection attacks. We validate our approach with SqlCheckS, an implementation for the setting of SQL command injection attacks. We evaluated SqlCheckS on real-world web applications with systematically compiled real-world attack data as input. SqlCheckS produced no false positives or false negatives, incurred low runtime overhead, and applied straightforwardly to web applications written in different languages.

619 citations
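
The key observation in the abstract above, that a successful attack must change the intended syntactic structure of the query, can be illustrated with a deliberately simplified sketch. This is not the paper's grammar-based algorithm: the tiny tokenizer and the names `tokenize`, `build_query`, and `changes_structure` are assumptions made for illustration. The check compares the token structure of a query built with the user input against the structure obtained with a harmless placeholder value.

```python
import re

# Hypothetical, deliberately tiny SQL tokenizer (an assumption for illustration;
# real SQL needs a full grammar and parser).
TOKEN_RE = re.compile(r"""
    (?P<string>'(?:[^']|'')*')        # single-quoted literal, '' is an escaped quote
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<comment>--[^\n]*)
  | (?P<symbol>[^\sA-Za-z_0-9'])
""", re.VERBOSE)


def tokenize(sql):
    """Return the token structure: literals by kind, words and symbols by text."""
    kinds = []
    pos = 0
    while pos < len(sql):
        if sql[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            kinds.append("error")  # e.g. an unterminated string literal
            break
        kind = m.lastgroup
        kinds.append(m.group().upper() if kind in ("word", "symbol") else kind)
        pos = m.end()
    return kinds


def build_query(user_input):
    # The vulnerable pattern described in the abstract: string concatenation.
    return "SELECT * FROM users WHERE name = '" + user_input + "'"


def changes_structure(user_input, placeholder="x"):
    """True if the input alters the token structure relative to a benign value."""
    return tokenize(build_query(user_input)) != tokenize(build_query(placeholder))
```

With this sketch, an input such as `alice` leaves the structure unchanged, while `' OR '1'='1` introduces extra tokens and is flagged.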

Journal Issue•10.1002/ASI.V57:2•

Link-based similarity measures for the classification of Web documents

Pável Calado¹, Marco Cristo¹, Marcos André Gonçalves², Edleno Silva de Moura³, Berthier Ribeiro-Neto¹, Nivio Ziviani¹•Institutions (3)

Universidade Federal de Minas Gerais¹, Virginia Tech², Federal University of Amazonas³

15 Jan 2006-Journal of the Association for Information Science and Technology

TL;DR: Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%, and when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone.

Abstract: Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful to document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents. © 2006 Wiley Periodicals, Inc.

590 citations
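
The abstract above does not name the five similarity measures it evaluates. As a minimal sketch of the general idea, the example below assumes one well-known link-based measure, co-citation, computed as the overlap between the sets of pages that link to two documents, and uses it in a nearest-neighbor vote. The page names, labels, and the functions `cocitation` and `classify` are hypothetical.

```python
from collections import Counter


def cocitation(in_links, a, b):
    """Fraction of pages linking to a or b that link to both (Jaccard overlap)."""
    parents_a = in_links.get(a, set())
    parents_b = in_links.get(b, set())
    union = parents_a | parents_b
    return len(parents_a & parents_b) / len(union) if union else 0.0


def classify(in_links, labels, page, k=3):
    """Label a page by a similarity-weighted vote among its k most similar labeled pages."""
    scored = sorted(
        ((cocitation(in_links, page, other), other) for other in labels if other != page),
        reverse=True,
    )[:k]
    votes = Counter()
    for score, other in scored:
        votes[labels[other]] += score
    if not votes or max(votes.values()) == 0:
        return None  # no link evidence at all
    return votes.most_common(1)[0][0]


# Made-up link data: page -> set of pages that link to it.
in_links = {
    "p1": {"h1", "h2"},
    "p2": {"h1", "h2", "h3"},
    "p3": {"h4"},
    "q": {"h1", "h2"},
}
labels = {"p1": "sports", "p2": "sports", "p3": "finance"}
predicted = classify(in_links, labels, "q")
```

Here the unlabeled page `q` shares its in-links with `p1` and `p2`, so the vote goes to their class.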

Patent•

User interface for geographic search

John R. Frank

28 Jun 2006

TL;DR: In this paper, a computer-implemented method of processing a geotext query is proposed, which involves: receiving a first free-text query string from a user; and decomposing the first free-text query into a non-geographic query and a geographic query.

Abstract: A computer-implemented method of processing a geotext query, said method involving: receiving a first free-text query string from a user; and decomposing the first free-text query into a non-geographic query and a geographic query, wherein the non-geographic query is a second free-text query string derived from the first free-text query string and the geographic query is a geographical location description.

310 citations
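
The decomposition described above can be sketched as follows. The gazetteer, the list of spatial cue words, and the function `decompose` are assumptions made for illustration; the patent abstract does not say how the split is performed.

```python
# Hypothetical gazetteer and cue words (assumptions, not taken from the patent).
GAZETTEER = {"boston", "san francisco", "cambridge ma"}
SPATIAL_CUES = ("in", "near", "around")


def decompose(query):
    """Split a free-text query into (non_geographic_query, geographic_query).

    Returns (query, None) when no known place name ends the query.
    """
    words = query.lower().split()
    # Try the longest trailing phrase first so multi-word places win.
    for start in range(len(words)):
        place = " ".join(words[start:])
        if place in GAZETTEER:
            topic = words[:start]
            if topic and topic[-1] in SPATIAL_CUES:
                topic = topic[:-1]  # drop the connecting preposition
            return " ".join(topic), place
    return " ".join(words), None
```

For example, `decompose("pizza near San Francisco")` separates the topic `pizza` from the location description `san francisco`, while a query with no known place name is returned whole.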

Patent•

Social network-based internet search engine

Kapenda Thomas

21 Jun 2006

TL;DR: In this article, a filter selects data for inclusion in the data subset based upon occurrence of the data in a database, and the database includes content selected for inclusion by designated users.

Abstract: Filtering Internet content includes receiving a search query message comprising a search query to an Internet search engine. Data is received from the Internet search engine, responsive to the search query message. Filtering of the data produces a data subset. The filter selects data for inclusion in the data subset based upon occurrence of the data in a database. The database includes content selected for inclusion by designated users. The data subset is displayed in a browser.

262 citations

Patent•

System and method for searching for a query

Timothy A. Musgrove, Robin Hiroko Walsh

11 Apr 2006

TL;DR: A search system for searching for electronic documents, and providing a search result in response to a search query is provided in this paper, which includes a search engine that executes a search based on the search query term and the equivalent terms.

Abstract: A search system for searching for electronic documents, and providing a search result in response to a search query is provided. The search system includes a processor, a user interface module adapted to receive a search query from a user, the search query having at least one search query term, and a query processing module that analyzes the search query term to identify candidate synonym words. The query processing module also determines which of the candidate synonym words are equivalent terms to the search query term, and in a same sense as the search query term. In addition, the search system includes a search engine that executes a search based on the search query term and the equivalent terms.

238 citations

Patent•

Method For Information Retrieval

Alexandros Ntoulas¹, Gerald Chao¹•Institutions (1)

University of California¹

13 Apr 2006

TL;DR: A method of retrieving documents using a search engine includes providing a reverse index including one or more keywords and a list of documents containing the keywords, the reverse index further including a measure of confidence (MOC) value associated with the keywords.

Abstract: A method of retrieving documents using a search engine includes providing a reverse index including one or more keywords and a list of documents containing the one or more keywords, the reverse index further including a measure of confidence (MOC) value associated with the one or more keywords. One or more query terms are input into the search engine. The query terms are disambiguated and a MOC value is associated with each meaning of the disambiguated query term. A list of documents is retrieved containing the query terms wherein the documents are initially ranked based at least in part on the MOC values of the keywords and query terms. The list of documents may be re-ranked based at least in part on the semantic similarity of each document to the disambiguated query terms.

236 citations

Patent•

Method and system of bidding for advertisement placement on computing devices

Michael Libes, Brian Lent

9 Mar 2006

TL;DR: In this paper, a computer system and method for processing a search query directed to a collection of pages is described, where the winning bid corresponds to an advertiser who may specify a sponsored link or sponsored page that is offered to the user in response to the search query.

Abstract: A computer system and method for processing a search query directed to a collection of pages includes receiving a search query of a user, identifying one or more result pages from the collection of pages in response to the search query, comparing keywords of the search query and a concept hierarchy of the result pages and user features against a set of bids for keywords, concepts, and user features that are submitted by advertisers to identify matching bids. A winning bid is selected from among the matching bids. The winning bid corresponds to an advertiser who may specify a sponsored link or sponsored page that is offered to the user in response to the search query.

231 citations

Patent•

Method for presenting search results

Bing Swen

13 Feb 2006

TL;DR: This article presented a method for grouping the search results, which presents ranked derived queries together with their search results to the user, in such a way that derived queries with higher ranks and top-ranked documents of each derived query are preferentially presented, and the grouped results are displayed and navigated in independent framed subareas of an output window.

Abstract: Methods and systems are provided to present the search results in response to a search query that is submitted to a document retrieval system, such as a search engine. The search results are presented with a second-retrieval model that constructs multiple derived queries for the search query with a first small-document retrieval process, and then generates and outputs the results based on the retrieval of search results of at least part of the derived queries. One embodiment of the invention provides a method for grouping the search results, which presents ranked derived queries together with their search results to the user, in such a way that derived queries with higher ranks and top-ranked documents of each derived query are preferentially presented, and the grouped results are displayed and navigated in independent framed subareas of an output window. A further embodiment selects the search results from multiple result lists of the derived queries to form the final search results for the user query, wherein the merged results are re-ranked according to pre-determined criteria. The method can also be integrated with the local keyword associated clustering method by rank value adjustment, or result filtering or merging to achieve better technical effects.

227 citations

Proceedings Article•10.1145/1142473.1142543•

On-the-fly sharing for streamed aggregation

Sailesh Krishnamurthy¹, Chung Wu², Michael J. Franklin¹•Institutions (2)

University of California, Berkeley¹, Google²

27 Jun 2006

TL;DR: A major contribution is the sharing technique that does not require any up-front multiple query optimization, a significant departure from existing techniques that rely on complex static analyses of fixed query workloads.

Abstract: Data streaming systems are becoming essential for monitoring applications such as financial analysis and network intrusion detection. These systems often have to process many similar but different queries over common data. Since executing each query separately can lead to significant scalability and performance problems, it is vital to share resources by exploiting similarities in the queries. In this paper we present ways to efficiently share streaming aggregate queries with differing periodic windows and arbitrary selection predicates. A major contribution is our sharing technique that does not require any up-front multiple query optimization. This is a significant departure from existing techniques that rely on complex static analyses of fixed query workloads. Our approach is particularly vital in streaming systems where queries can join and leave the system at any point. We present a detailed performance study that evaluates our strategies with an implementation and real data. In these experiments, our approach gives us as much as an order of magnitude performance improvement over the state of the art.

224 citations
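
To make the idea of sharing work across aggregate queries with differing periodic windows concrete, here is a generic slice-sharing sketch. It is an assumption-laden illustration, not the paper's on-the-fly technique: it fixes the set of windows up front, handles only sums, and ignores selection predicates. Partial sums are computed once per slice, and every window query is answered by combining those shared slices. The function name `shared_window_sums` is hypothetical.

```python
from functools import reduce
from math import gcd


def shared_window_sums(values, windows):
    """Answer several sliding-window SUM queries from one set of partial sums.

    values:  one reading per time unit
    windows: list of (range, slide) pairs, both in time units
    Returns {(range, slide): [sum of each complete window, in order]}.
    """
    # A slice width that evenly divides every range and slide.
    slice_width = reduce(gcd, (n for window in windows for n in window))
    # Each reading is added exactly once, no matter how many queries use it.
    slices = [
        sum(values[i:i + slice_width])
        for i in range(0, len(values) - slice_width + 1, slice_width)
    ]
    results = {}
    for rng, slide in windows:
        per_window = rng // slice_width
        step = slide // slice_width
        results[(rng, slide)] = [
            sum(slices[i:i + per_window])
            for i in range(0, len(slices) - per_window + 1, step)
        ]
    return results


readings = list(range(1, 13))  # 1, 2, ..., 12
sums = shared_window_sums(readings, [(4, 2), (6, 6)])
```

Both queries in the usage example read the same six slice sums instead of rescanning the raw readings.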

Patent•

Multimodal search query processing

Jorey Ramer, Adam Soroca, Dennis Doughty

3 Feb 2006

TL;DR: In this paper, the authors describe improved capabilities for receiving a non-text based information request from a mobile communication facility, transforming the non-text based information request into a text based search query, and presenting a search result to the mobile communication facility based on the text based search query.

Abstract: Improved capabilities are described for receiving a non-text based information request from a mobile communication facility, transforming the non-text based information request into a text based search query, and presenting a search result to the mobile communication facility based on the text based search query.

170 citations

Patent•

Augmenting queries with synonyms selected using language statistics

Ruchira S. Datta¹, Fabio Lopiano¹•Institutions (1)

Google¹

19 Apr 2006

TL;DR: In this article, potential synonyms for a query term are identified by looking up a simplified query term in a synonyms map, which maps each of a plurality of keys to one or more corresponding variants; each variant is associated, for each associated language, with a variant-language score indicating a relative frequency of the variant among all variants for the associated language for the same key.

Abstract: Methods, systems, and apparatus, including computer program products, operable to perform operations including receiving from a user through a user interface a search query comprising a query term, the search query having attributed to it a query language; deriving a simplified query term from the query term; and identifying one or more potential synonyms for the query term by looking up the simplified query term in a synonyms map, the synonyms map mapping each of a plurality of keys to one or more corresponding variants, each variant being a word associated with one or more document languages, and each variant being associated for each associated language with a variant-language score indicating a relative frequency of the variant among all variants for the associated language for the same key.
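
The lookup described above can be sketched as a nested map from keys to variants to per-language scores. Everything concrete here is an assumption for illustration: the scores are made up, diacritic stripping is only one possible way to derive a simplified term, and the names `SYNONYMS`, `simplify`, and `potential_synonyms` are hypothetical.

```python
import unicodedata

# Hypothetical synonyms map: key -> {variant: {language: variant-language score}}.
# Scores are made-up relative frequencies of each variant among all variants
# of the same key within that language. "caf\u00e9" is "cafe" with an acute accent.
SYNONYMS = {
    "cafe": {
        "cafe": {"en": 0.9, "fr": 0.1},
        "caf\u00e9": {"en": 0.1, "fr": 0.9},
    },
}


def simplify(term):
    """Lowercase and strip diacritics (one assumed way to derive a simplified term)."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def potential_synonyms(term, query_language, min_score=0.05):
    """Variants under the term's key, ranked by their score in the query language."""
    original = unicodedata.normalize("NFC", term.lower())
    variants = SYNONYMS.get(simplify(term), {})
    scored = [
        (variant, scores.get(query_language, 0.0))
        for variant, scores in variants.items()
        if unicodedata.normalize("NFC", variant) != original
    ]
    return sorted(
        [(variant, score) for variant, score in scored if score >= min_score],
        key=lambda pair: pair[1],
        reverse=True,
    )
```

The threshold `min_score` is also an assumption; it stands in for whatever rule decides that a variant is too rare in the query language to be worth adding.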

Patent•

Re-ranking search results based on query log

Silviu Cucerzan¹, Ziming Zhuang¹•Institutions (1)

Microsoft¹

13 Mar 2006

TL;DR: In this article, the relevance of the search results for a target query can be judged based on one or more queries in the query log that are related to the target query temporally and/or lexically.

Abstract: A system(s) and/or method(s) that facilitate improving the relevance of search results through utilization of a query log. The relevance of the search results for a target query can be judged based on one or more queries in the log that are related to the target query temporally and/or lexically. The diversity of the top-ranked search results can be increased and/or decreased based on an iterative re-ranking process of the search result set.

Journal Article•10.1145/1165774.1165776•

Query enrichment for web-query classification

Dou Shen¹, Rong Pan¹, Jian-Tao Sun², Jeffrey Junfeng Pan¹, Kangheng Wu¹, Jie Yin¹, Qiang Yang¹•Institutions (2)

Hong Kong University of Science and Technology¹, Microsoft²

01 Jul 2006-ACM Transactions on Information Systems

TL;DR: It is shown that, despite the difficulty of an abundance of ambiguous queries and lack of training data, the query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework.

Abstract: Web-search queries are typically short and ambiguous. To classify these queries into certain target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (precision, F1 measure, which combines precision and recall, and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this article, we show that, despite the difficulty of an abundance of ambiguous queries and lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best result for F1 and precision is 42.4% and 44.4%, respectively, which is 9.6% and 24.3% higher than those from the runners-up, respectively.
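
The two mappings described above, from a query to intermediate objects and from intermediate objects to target categories, can be sketched as follows. The stub "search engines", the category names, and the functions `enrich` and `classify` are hypothetical stand-ins; the authors' actual classifiers and mapping functions are not reproduced here.

```python
from collections import Counter

# Phase 1 stand-ins: each "search engine" maps a query to intermediate objects
# (here, directory-style category labels). Hard-coded stubs, purely hypothetical.
ENGINE_A = {"apple": ["Computers/Hardware", "Shopping/Food"]}
ENGINE_B = {"apple": ["Computers/Hardware", "Computers/Software"]}

# Phase 2 stand-in: hypothetical mapping from intermediate objects to target categories.
TO_TARGET = {
    "Computers/Hardware": "Computers",
    "Computers/Software": "Computers",
    "Shopping/Food": "Living",
}


def enrich(query, engines):
    """Phase 1: collect intermediate objects for the query from every engine."""
    objects = []
    for engine in engines:
        objects.extend(engine.get(query, []))
    return objects


def classify(query, engines, top_n=2):
    """Phase 2: map intermediate objects to target categories and rank by votes."""
    votes = Counter(
        TO_TARGET[obj] for obj in enrich(query, engines) if obj in TO_TARGET
    )
    return [category for category, _ in votes.most_common(top_n)]
```

Returning several ranked categories rather than one reflects the ambiguity of short queries: in this toy data, `apple` is mostly about computers but keeps a secondary food-related reading.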

Patent•

Mobile search substring query completion

Jorey Ramer, Adam Soroca, Dennis Doughty

8 May 2006

TL;DR: In this article, improved capabilities are described for mobile search substring query entry completion, wherein complete search terms are presented to a user in response to a search query that is not a fully formed query.

Abstract: In embodiments of the present invention improved capabilities are described for mobile search substring query entry completion, wherein complete search terms are presented to a user in response to a search query that is not a fully formed query.

Proceedings Article•

Design and implementation of the CALO query manager

José Luis Ambite¹, Vinay K. Chaudhri², Richard Fikes³, Jessica Jenkins³, Sunil Mishra², Maria Muslea¹, Tomás E. Uribe², Guizhen Yang²•Institutions (3)

Information Sciences Institute¹, Artificial Intelligence Center², Stanford University³

16 Jul 2006

TL;DR: The authors report on their experience in developing a query-answering system that integrates multiple knowledge sources, based on a novel architecture in which the sources can produce new subgoals as well as ground facts in the search for answers to existing subgoals.

Abstract: We report on our experience in developing a query-answering system that integrates multiple knowledge sources. The system is based on a novel architecture for combining knowledge sources in which the sources can produce new subgoals as well as ground facts in the search for answers to existing subgoals. The system uses a query planner that takes into account different query-processing capabilities of individual sources and augments them gracefully. A reusable ontology provides a mediated schema that serves as the basis for integration. We have evaluated the system on a suite of test queries in a realistic application to verify the practicality of our approach.

Patent•

System and method of query paraphrasing

William Wu

4 Aug 2006

TL;DR: In this article, a platform-independent process for data retrieval from ontology-oriented data systems over computer networks through a flexible system and method of query paraphrasing is presented.

Abstract: A platform-independent process for data retrieval from ontology-oriented data systems over computer networks through a flexible system and method of query paraphrasing. The present invention uses a “common ontology” that is not tied to any particular data system. Thus, each client computer issues queries to a target data system in the common ontology. Of course, the target data system will not be able to directly process the query (as it is not in its local ontology). Instead, the query is first paraphrased back from the common ontology into local ontology by taking the semantic query, passing it through a query paraphraser, and then sending the paraphrased query to the data system. Once it is paraphrased successfully, the target data system can process it and produce a result using local ontology. The result may then be sent from the data system to an answer paraphraser for paraphrasing, and the paraphrased answer may be returned to its original query issuer and on to the client.

Patent•

Discovering query intent from search queries and concept networks

Deepa Joshi¹, John Thrall•Institutions (1)

Yahoo!¹

20 Dec 2006

TL;DR: In this article, a system is described for discovering query intent based on search queries and concept networks, and the system may construct frequency vectors from log data corresponding to a submitted query and at least one related query submitted to one or more search engines.

Abstract: A system is described for discovering query intent based on search queries and concept networks. The system may construct frequency vectors from log data corresponding to a submitted query and at least one related query submitted to one or more search engines. The system may also construct a query intent vector based on the frequency vectors. The query intent vector may include frequency scores that represent the intent of the query.

Patent•

Nathaniel Harward, Andrew Geweke, Alexander Voskoboynik

25 May 2006

TL;DR: In this article, the authors propose a caching system to determine the tables, rows, or other partitions of data a received query is dependent upon or modifies by submitting a version of the received query to the database through a native facility provided by the database.

Abstract: Database data is maintained reliably and invalidated based on actual changes to data in the database. Updates or changes to data are detected without parsing queries submitted to the database. The dependencies of a query can be determined by submitting a version of the received query to the database through a native facility provided by the database to analyze how query structures are processed. The caching system can access the results of the facility to determine the tables, rows, or other partitions of data a received query is dependent upon or modifies. An abstracted form of the query can be cached with an indication of the tables, rows, etc. that queries of that structure access or modify. The tables a write or update query modifies can be cached with a time of last modification. When a query is received for which the results are cached, the system can readily determine dependency information for the query, the last time the dependencies were modified, and compare this time with the time indicated for when the cached results were retrieved. By passing versions of write queries to the database, updates to the database can be detected.
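
The timestamp comparison at the end of the abstract above can be sketched as a small cache class. This is a minimal sketch under stated assumptions: the caller supplies the tables each query depends on or modifies (the patent obtains these from a native database facility, which is not modeled here), and the class `QueryCache` with its methods is hypothetical.

```python
import time


class QueryCache:
    """Caches query results together with their table dependencies and cache time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._results = {}        # query text -> (rows, tables read, time cached)
        self._last_modified = {}  # table name -> time of last write

    def record_write(self, tables):
        """Note that a write or update query modified the given tables."""
        now = self._clock()
        for table in tables:
            self._last_modified[table] = now

    def put(self, query, rows, tables):
        """Store results along with the tables the query depends on."""
        self._results[query] = (rows, frozenset(tables), self._clock())

    def get(self, query):
        """Return cached rows, or None if missing or a dependency changed since caching."""
        entry = self._results.get(query)
        if entry is None:
            return None
        rows, tables, cached_at = entry
        stale = any(
            self._last_modified.get(table, float("-inf")) >= cached_at
            for table in tables
        )
        if stale:
            del self._results[query]
            return None
        return rows
```

A write to an unrelated table leaves a cached entry valid, while a write to any table the query reads invalidates it on the next lookup. Treating equal timestamps as stale is a conservative choice made for this sketch.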

Patent•

Web query classification

Abdur Chowdhury, Steven M. Beitzel, David D. Lewis¹, Aleksander Kolcz•Institutions (1)

Marathon Oil¹

27 Jan 2006

TL;DR: In this article, a query phrase may be automatically classified to one or more topics of interest (e.g., categories) to assist in routing the query phrase to the appropriate backend databases.

Abstract: A query phrase may be automatically classified to one or more topics of interest (e.g., categories) to assist in routing the query phrase to one or more appropriate backend databases. A selectional preference query classification technique may be used to classify the query phrase based on a comparison between the query phrase and patterns of query phrases. Additionally, or alternatively, a combination of query classification techniques may be used to classify the query phrase. Topical classification of a query phrase also may be used to assist a search system in delivering auxiliary information to a user who entered the query phrase. Advertisements, for instance, may be tailored based on classification rather than query keywords.

Patent•

Query revision using known highly-ranked queries

David R. Bailey¹, Alexis Battle¹, David Cohn¹, Barbara Englehardt¹, P. Pandurang Nayak¹•Institutions (1)

Google¹

13 Mar 2006

TL;DR: In this article, a query rank reviser suggests known highly-ranked queries as revisions to a first query by initially assigning a rank to all queries, and identifying a set of known highly ranked queries (KHRQ).

Abstract: An information retrieval system includes a query revision architecture providing one or more query revisers, each of which implements a query revision strategy. A query rank reviser suggests known highly-ranked queries as revisions to a first query by initially assigning a rank to all queries, and identifying a set of known highly-ranked queries (KHRQ). Queries with a strong probability of being revised to a KHRQ are identified as nearby queries (NQ). Alternative queries that are KHRQs are provided as candidate revisions for a given query. For alternative queries that are NQs, the corresponding known highly-ranked queries are provided as candidate revisions.

Patent•

Systems and methods for using lexically-related query elements within a dynamic object for semantic search refinement and navigation

Thomas D. Holt, Larry Stephen Burke

11 Apr 2006

TL;DR: In this article, a method and system for dynamically refining and navigating between alternative search query elements are disclosed, which is applicable to searching an information system such as the Internet, an intranet, or any database, lexicon, or collection of documents, disk drive, images or video or audio content.

Abstract: A method and system for dynamically refining and navigating between alternative search query elements are disclosed. The method and system are applicable to searching an information system such as the Internet, an intranet, or any database, lexicon, or collection of documents, disk drive, images or video or audio content. A user enters their search query into a search query receiver. As the user enters their search query, they see, in real-time in a dynamically-generated object, such as a drop-down menu, iFrame, or browser window, possible matches to their search query string, and more specifically, the user receives within the dynamic object alternative semantically- and lexically-related search elements that relate to the search query string and from which the user can either make a selection to further refine their search query, or the user can proceed to view search results based on the selected query element. The relation of alternate lexical elements is based on a controlled or structured vocabulary (for example a thesaurus).

Patent•

Search engine that identifies and uses social networks in communications, retrieval, and electronic commerce

Christopher A. Meek¹, Eric Horvitz¹, Joshua T. Goodman¹, Gary W. Flake¹, Oliver Hurst-Hiller¹, Anoop Gupta¹, Ramez Naam¹, Kenneth A. Moss¹, William H. Gates¹, John Platt¹, Trenholme J. Griffin¹, Bradly A. Brunell¹•Institutions (1)

Microsoft¹

28 Jun 2006

TL;DR: A search engine can be interactively coupled with one or more social networks, and maps individuals and groups within respective social networks to subsets of categories associated with searches.

Abstract: Architecture that monitors interaction data (e.g., search queries, query results and click-through rates), and provides users with links to other users that fall into similar categories with respect to the foregoing monitored activities (e.g., providing links to individuals and groups that share common interests and/or profiles). A search engine can be interactively coupled with one or more social networks, and maps individuals and/or groups within respective social networks to subsets of categories associated with searches. A database stores mapped information which can be continuously updated and reorganized as links within the system mapping become stronger or weaker. The architecture can comprise a social network system that includes a database for mapping search-related information to an entity of a social network, and a search component for processing a search query for search results and returning a link to an entity of a social network based on the search query.

Patent•

Generating clusters of images for search results

Feng Jing¹, Lei Zhang¹, Mingjing Li¹, Wei-Ying Ma¹, Changhu Wang¹•Institutions (1)

Microsoft¹

23 Jan 2006

TL;DR: In this article, a method and system for generating clusters of images for a search result of an image query is presented, where the search system considers the image search result for each image query to represent a cluster of related images.

Abstract: A method and system for generating clusters of images for a search result of an image query is provided. When an original image query is received, the search system identifies text associated with the original image query by submitting the original image query to a search engine. The search system identifies phrases from the text of the web page containing the search result. The search system uses each of the identified phrases as an image query and submits the image queries to an image search engine. The search system considers the image search result for each image query to represent a cluster of related images. The search system then presents the clusters of images as the images of the image search result of the original image query.
