Top 22 papers published on the topic of URL normalization in 2007

Showing papers on "URL normalization" published in 2007

Patent•

System and method of analyzing web content

Dan Hubbard, Nicholas J. Verenini, Victor L. Baddour

9 Jul 2007

TL;DR: In this paper, a system and method for identifying inappropriate content in websites on a network is presented, where unrecognized uniform resource locators (URLs) or other web content are accessed by workstations and are identified as possibly having malicious content.

Abstract: A system and method are provided for identifying inappropriate content in websites on a network. Unrecognized uniform resource locators (URLs) or other web content are accessed by workstations and are identified as possibly having malicious content. The URLs or web content may be preprocessed within a gateway server module or some other software module to collect additional information related to the URLs. The URLs may be scanned for known attack signatures, and if any are found, they may be tagged as candidate URLs in need of further analysis by a classification module.

362 citations

Patent•

Systems and methods for removing duplicate search engine results

Navin Martin Joy¹, Sally Salas¹•Institutions (1)

Microsoft¹

12 Jan 2007

TL;DR: In this paper, a system and method for removing unnecessary multiple references to a common resource such as redundant listed Uniform Resource Locators (URLs) that reference the same display URLs (and thus the same Web page) as another listed URL is presented.

Abstract: The present invention is directed toward efficiently locating desired information and, more specifically, to providing a system and method for removing unnecessary multiple references to a common resource such as redundant listed Uniform Resource Locators (URLs) that reference the same display URLs (and thus the same Web page) as another listed URL. Consequently, in circumstances where only a smaller, finite number of listed results are immediately used (such as displaying only the twenty most relevant results on the first page presented to a search engine end-user), the finite number of listed results may correspond to a greater number of unique display URLs than would otherwise occur absent this form of filtering.
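
The filtering idea in this abstract can be pictured with the minimal Python sketch below. It is not the patented method. The rule used to derive a display URL (lowercased host plus path, with scheme, query, fragment, and trailing slash dropped) and the names `display_url` and `dedupe_results` are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def display_url(listed_url: str) -> str:
    """Hypothetical display form: host plus path, no scheme, query, or fragment."""
    parts = urlsplit(listed_url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    return host + path


def dedupe_results(listed_urls: list[str], limit: int) -> list[str]:
    """Keep the first listed URL for each display URL, up to `limit` results."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in listed_urls:
        if len(kept) >= limit:
            break
        key = display_url(url)
        if key in seen:
            continue  # same display URL as an earlier result: drop it
        seen.add(key)
        kept.append(url)
    return kept
```

Because duplicates are skipped before the limit is applied, a first page of `limit` results covers `limit` distinct display URLs whenever enough distinct results exist.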

118 citations

Patent•

System and method for detecting malicious mobile program code

Christoph Alme¹•Institutions (1)

McAfee¹

23 Apr 2007

TL;DR: In this paper, a program file is received and analysis performed to identify URLs embedded in the program file and the URLs are categorized as a function of a URL filter database and a malware probability is assigned to each URL identified.

Abstract: A system and method of detecting malware. A program file is received and analysis performed to identify URLs embedded in the program file. The URLs are categorized as a function of a URL filter database and a malware probability is assigned to each URL identified. A decision is made on how to dispose of the program file as a function of the malware probability of one or more of the URLs identified. In one example approach, a malware type is also assigned to the program file as a function of one or more of the URLs identified.

71 citations

Patent•

Business-oriented search

Richard A. Heggem

12 Feb 2007

TL;DR: In this article, a system and architecture for enhancing search results generated by an Internet search engine, so that those search results include enhanced buyer-oriented information, is disclosed, where a user who submitted query terms, based upon which the list of search results was generated, can use the presented rating information to determine which of the search results to investigate further.

Abstract: A system and architecture for enhancing search results generated by an Internet search engine, so that those search results include enhanced buyer-oriented information, is disclosed. According to one aspect, a list of search results generated by an Internet search engine comprises one or more search results that are associated with one or more URLs in a set of URLs. For each such URL, seller-specific information, which may be based on and/or comprise ratings that are associated with registered selling entities that are associated with that URL, is presented in association with that URL's corresponding search result in the list of search results. A user who submitted query terms, based upon which the list of search results was generated, can use the presented rating information to determine which of the search results to investigate further.

50 citations

Patent•

Method for normalizing dynamic urls of web pages through hierarchical organization of urls from a web site

Krishna Prasad Chitrapura¹, Anandsudhakar Kesari¹, Alok S. Kirpal, Mahesh Tiyyagura¹•Institutions (1)

Yahoo!¹

30 Aug 2007

TL;DR: In this article, an information extraction method is used to generate data structures that represent the content or structure of each of the web pages associated with a web site, and these data structures are appended to the corresponding dynamic URLs.

Abstract: Techniques are described for normalizing dynamic URLs using a hierarchical organization of a web site. Given web pages associated with a web site, an information extraction method is used to generate data structures that represent the content or structure of each of the web pages. These data structures are appended to the corresponding dynamic URLs. The modified URLs with the data structures are tokenized with the resulting tokens clustered to create a hierarchical organization. Nodes of the hierarchical organization may be merged based upon occurrence or patterns of content and structure. The merged hierarchical organization may then be pruned to remove irrelevant information and to reduce the memory footprint of the hierarchical organization. When a new dynamic URL is received, the new dynamic URL is matched to the hierarchical organization. Important parameters are taken into account and irrelevant information may be removed. Based upon the matching to the hierarchical organization, a normalized URL is returned.
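
Only the last step of this abstract (keeping important parameters and removing irrelevant ones when a new dynamic URL arrives) is sketched below in Python. The hierarchical clustering that would decide which parameters matter is not reproduced. The `IMPORTANT_PARAMS` table stands in for its output and is entirely hypothetical, as is the function name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical result of the learning phase: for each path, the query
# parameters that actually select different page content.
IMPORTANT_PARAMS = {
    "/product": {"id"},
    "/search": {"q", "page"},
}


def normalize_dynamic_url(url: str) -> str:
    """Drop parameters not known to matter, sort the rest, drop the fragment."""
    parts = urlsplit(url)
    keep = IMPORTANT_PARAMS.get(parts.path)  # None means the path is unknown
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if keep is None or key in keep
    ]
    pairs.sort()
    # Simplification: lowercases the whole netloc, assuming it has no userinfo.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(pairs), "")
    )
```

Under these assumptions, two URLs that differ only in a session identifier or tracking parameter map to the same normalized URL.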

47 citations

Patent•

Method and system for searching using image based tagging

Kenneth Andam, James M. Jensen, Jared Weinman

1 Nov 2007

TL;DR: In this article, a system and method for searching networked electronic media is provided for search result generation in a search engine, which includes the operation of mapping a user selected keyword to a category identifier.

Abstract: A system and method are provided for searching networked electronic media. The method includes the operation of mapping a user selected keyword to a category identifier for search result generation in a search engine. A search result can be produced that contains a listing of URLs for the category identifier. Then the search result can be dynamically populated with images through identity mapping of the image to a URL. The search can determine the display images to a URL based on a user's profile, like age, and the URL's popularity with other users sharing a similar user profile. The URLs and an associated image for the search can be obtained through a community of users.

39 citations

Patent•

Techniques for keyword extraction from urls using statistical analysis

Krishna Leela Poola¹, Arun Ramanujapuram¹•Institutions (1)

Yahoo!¹

8 Nov 2007

TL;DR: In this paper, a technique for keyword extraction from URLs using regular expression patterns and keyword ranking has been described, where the keywords extracted from the URLs are then ranked based on any ranking methodology for better relevance and performance.

Abstract: Techniques are described for keyword extraction from URLs using regular expression patterns and keyword ranking. Tokenization of URLs also generates regular expressions of URLs from a website. The regular expressions are stored in the form of any type of indexing structure. When a new URL is received, the URL is examined to determine whether the URL is from a website that has previously been tokenized. If the URL is not from such a website, then the URL is tokenized using every delimiter and unit change to extract keywords. If the URL is from a website previously processed, the corresponding regular expression is used to extract keywords from the URL. The keywords extracted from the URLs are then ranked based on any ranking methodology for better relevance and performance.
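
The fallback case in this abstract (a URL from a site that has not been tokenized before) might look roughly like the Python sketch below. The abstract does not define "unit change"; treating a boundary between letters and digits as one is an assumption here, and the per-site regular expressions and the ranking step are left out.

```python
import re
from urllib.parse import urlsplit

# Runs of letters or runs of digits; every other character acts as a delimiter.
# Splitting at a letter/digit boundary is an assumed reading of "unit change".
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")


def url_keywords(url: str) -> list[str]:
    """Tokenize the path and query of a URL into lowercase keyword candidates."""
    parts = urlsplit(url)
    text = parts.path + " " + parts.query
    return [token.lower() for token in TOKEN.findall(text)]
```

The returned tokens are only candidates; any ranking method could then be applied to them, as the abstract notes.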

34 citations

Patent•

Method and systems for using community bookmark data to supplement internet search results

Vik Singh¹, Raghu Ramakrishnan¹•Institutions (1)

Yahoo!¹

4 Dec 2007

TL;DR: In this paper, the authors proposed a method for generating overlay data to supplement search results obtained as a result of an internet search for a query provided by a user, which includes accessing a universal resource locator (URL) database having URLs that are processed.

Abstract: Methods and systems for generating overlay data to supplement search results obtained as a result of an internet search for a query provided by a user. The method includes accessing a universal resource locator (URL) database having URLs that are processed. The URL database has information regarding the number of times a URL in the URL database has been bookmarked and any descriptive tags assigned to specific URLs in the URL database. Then, receiving the query provided by the user that generates search results, where each search result is associated with a URL. The method further includes, before displaying the search results, analyzing each URL of a plurality of the search results to identify if the URL is present in the accessed URL database, and applying overlay data to particular ones of the search results. The overlay data includes information regarding the number of times the URL has been bookmarked and includes particular descriptive tags from the URL database. In one embodiment, a detailed sub-query is associated with each overlay descriptive tag that includes the original query and the overlay descriptive tag.

29 citations

Patent•

Extensible framework for managing UI state in a composite AJAX application

William P. Higgins¹, Walter J. Staiger¹•Institutions (1)

IBM¹

5 Nov 2007

TL;DR: In this article, a method, system and computer-usable medium are disclosed for managing the user interface (UI) state of an AJAX application by automatically binding a uniform resource locator (URL) to an application code component.

Abstract: A method, system and computer-usable medium are disclosed for managing the user interface (UI) state of an AJAX application by automatically binding a uniform resource locator (URL) to an application code component. The metadata for controller functions contained in an AJAX Web page are read as it is loaded. Once loaded, the URL of the page is monitored for changes in its value. If the URL's value changes, then the value of the ‘action’ property of the changed URL is compared to the application metadata for validation. If the ‘action’ property of the changed URL does not exist in the application metadata, then the changed URL is considered invalid and its associated actions are ignored. If the ‘action’ property is valid, the function specified by the ‘action’ request parameter is called. A single object parameter is sent, with the properties of the single object parameter derived from the request parameters other than ‘action’. The function is executed and the page is updated to display the value of the object property.

27 citations

Patent•

Visually Emphasizing Query Results Based on Relevance Feedback

Daniel C. Fain¹•Institutions (1)

Yahoo!¹

30 Mar 2007

TL;DR: In this paper, the authors present a process for visually emphasizing the displayed URLs in query results based on implicit relevance feedback, which detects click-through by matching the actual URL in an HTTP request emanating from a browser to an actual URL for a stored URL.

Abstract: An example embodiment of the present invention provides processes for visually emphasizing the displayed URLs in query results based on implicit relevance feedback. In one process, the process identifies a web page which includes results returned by a search engine. Each result might include a displayed URL and an actual URL. The process determines whether the displayed URL matches any stored URLs which were included in previous results returned by the search engine and clicked through by the user. The process detects a click-through by matching the actual URL in an HTTP request emanating from a browser to an actual URL for a stored URL. The process visually emphasizes the displayed URL when presenting the web page to the user, if the displayed URL does not match any stored URL which has been clicked through and other factors indicate a probability the user will click through the displayed URL.

26 citations

Patent•

User segment suggestion for online advertising

Min Wu¹, Chenxi Lin¹, Benyu Zhang¹, Zheng Chen¹, Jian Wang¹•Institutions (1)

Microsoft¹

15 May 2007

TL;DR: In this article, a behavioral targeting technology for online advertising is described, by which an original attribute is uniformly expanded by aggregating users that meet the original attribute into a mid-result used to determine similarity relative to candidate attribute types.

Abstract: Described is a behavioral targeting technology for online advertising, by which an original attribute is uniformly expanded. Users that meet an original attribute are aggregated into a mid-result used to determine similarity relative to candidate attribute types. The most similar candidate attributes are selected for the expanded attribute. A URL/URL pattern suggestion technology is provided, with similarity computed from users/URLs visited by the users. URLs are separated into URL tree nodes, for calculating the number of users who have visited each URL and the number of users who have visited the URL on a sub-tree whose root is the node. URL/URL patterns are output based on similarity. Domains are also suggested based on user-visits. Similarities between pairs of domains may be computed (e.g., offline), with an output for a given domain provided based on its similarity with each other domain.

Patent•

System and method for downloading hypertext markup language formatted web pages

Chung-I Lee¹, Chien-Fa Yeh¹, Chiu-Hua Lu¹, Zhi-Qiang Jiang•Institutions (1)

Foxconn¹

31 May 2007

TL;DR: A method for downloading HTML formatted Web pages is provided, which includes the steps of writing a URL of a Web page to be downloaded to an XQuery script; analyzing the XQuery script to obtain the URL of the HTML Web page and saving the downloaded Web page in a database as the local Web page; analyzing the contents of the local Web page to obtain target contents; converting the relative URLs of all image files to the absolute URLs; and downloading all the image files according to the absolute URLs.

Abstract: A method for downloading HTML formatted Web pages is provided. The method includes the steps of writing a URL of a Web page to be downloaded to an XQuery script; analyzing the XQuery script to obtain the URL of the HTML Web page and saving the downloaded Web page in a database as the local Web page; analyzing the contents of the local Web page to obtain target contents; converting the relative URLs of all image files to the absolute URLs; downloading all the image files according to the absolute URLs; replacing the absolute URLs of the image files with a local image file path; converting the relative URLs of the embedded links to the absolute URLs of the embedded links; saving all the converted absolute URLs in the database, creating identifiers; and replacing the converted absolute URLs of the embedded links with an embedded link local path. A related system is also disclosed.
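
The relative-to-absolute conversion step named in this abstract can be illustrated with Python's standard `urllib.parse.urljoin`. This is a generic sketch, not the patent's implementation; the helper name `absolutize` and the example URLs are assumptions.

```python
from urllib.parse import urljoin


def absolutize(page_url: str, links: list[str]) -> list[str]:
    """Resolve each link found in a page against the URL of that page."""
    return [urljoin(page_url, link) for link in links]


page = "http://example.com/docs/index.html"
links = ["img/logo.png", "../about.html", "/css/site.css", "http://other.example/x.gif"]
absolute_links = absolutize(page, links)
```

Links that are already absolute pass through unchanged, so the same call can be applied to every image and embedded link in the page.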

Patent•

Systems and methods for generating a descriptive uniform resource locator (URL)

Pranav Dandekar¹, Vinit Kalra, Jan Klier•Institutions (1)

Amazon.com¹

30 Aug 2007

TL;DR: In this article, a method for generating a Uniform Resource Locator (URL) is described, which is based on the content of a web page and includes one or more tokens.

Abstract: A method for generating a Uniform Resource Locator (URL) is described. Content associated with a web page is obtained. A URL is generated based on the content of the web page. The URL includes one or more tokens. The URL is limited to a token threshold. The token threshold is defined as a maximum number of words in the URL. One or more tokens are removed from the URL. The URL is associated with the web page.
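
A token-limited descriptive URL of the kind this abstract describes could be sketched in Python as follows. The abstract does not say which tokens are removed when the threshold is exceeded; dropping trailing tokens is an assumption, and `descriptive_url`, its parameters, and the base URL are hypothetical.

```python
import re


def descriptive_url(base: str, content_title: str, token_threshold: int = 5) -> str:
    """Build a URL from words in the page content, capped at `token_threshold` words."""
    tokens = re.findall(r"[a-z0-9]+", content_title.lower())
    tokens = tokens[:token_threshold]  # assumed policy: remove tokens from the end
    return base.rstrip("/") + "/" + "-".join(tokens)
```

For instance, with a threshold of 4 the title "The Art of Computer Programming, Volume 1" is reduced to its first four words before being joined into the path.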

Patent•

Processing omission decision program for similarity analysis of url

Shoko Wada, 昌紘和田

25 Jul 2007

TL;DR: In this paper, when large amounts of URL access permission requests are made from one web page whose access has already been permitted, a similarity decision is made so that the number of access requests is reduced and page display processing is achieved quickly.

Abstract: PROBLEM TO BE SOLVED: To solve the problem that when a PC on which Web filtering by a URL list is mounted performs access to a page using Ajax or the like, huge URL access permission requests are generated, and the display of a page is extremely delayed. SOLUTION: When large amounts of URL access permission requests are made from one Web page whose access has been already permitted, the next similarity decision is made, and the number of items of access requests is reduced to quickly achieve page display processing. Then, URL request transmission circumstances are monitored, and when it is detected that a large amount of requests have been transmitted, the similarity of URL requested in the past from the same access source page with the current page is decided. Access permission decision logic is bypassed to the URL decided as the similar URL. COPYRIGHT: (C)2008,JPO&INPIT

Patent•

Method and system for providing improved url mangling performance using fast re-write

Vineet Dixit¹, Siva S. Jayasenan¹, Mahadev Somasundaram¹•Institutions (1)

Cisco Systems, Inc.¹

4 Apr 2007

TL;DR: A method and system are disclosed for providing improved uniform resource locator (URL) mangling performance using fast re-write, including scanning a web page, detecting an absolute URL in the web page, and modifying the detected absolute URL to a corresponding relative URL.

Abstract: Method and system for providing improved uniform resource locator (URL) mangling performance using fast re-write including scanning a web page, detecting an absolute URL in the web page, and modifying the detected absolute URL to a corresponding relative URL in the web page, is disclosed.
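
The absolute-to-relative rewrite mentioned in this abstract is illustrated below in Python under a simple assumption: a link on the same scheme and host as the page is replaced by its root-relative form, and any other link is left alone. The abstract does not specify the form of the relative URL, so this is only one plausible reading, and `to_relative` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def to_relative(page_url: str, link: str) -> str:
    """Rewrite a same-origin absolute link as a root-relative reference."""
    page, target = urlsplit(page_url), urlsplit(link)
    same_origin = (target.scheme, target.netloc.lower()) == (
        page.scheme,
        page.netloc.lower(),
    )
    if not same_origin:
        return link  # different site, or already relative: leave untouched
    return urlunsplit(("", "", target.path or "/", target.query, target.fragment))
```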

Patent•

Pre-emptive URL filtering technique

Bharath Kumar Chandra Sekhar, Narasimham Kodukula

5 Sep 2007

TL;DR: A pre-emptive URL filtering technique is disclosed that reduces the number of HTTP connections that have to be made by the browser in situations where there is a blocked URL in the original URL set.

Abstract: Disclosed is a technique for pre-emptive URL filtering. A filtering engine may be configured to receive an original set of URLs from a web server along with a main content, the original set of URLs and the main content being intended for a web browser running in a client computer. The filtering engine may be running in a gateway. The filtering engine may check the original set of URLs for blocked URLs. The filtering engine may create a reconstructed set of URLs that suppresses blocked URLs in the original set of URLs. The filtering engine may send the client computer the reconstructed, instead of the original, set of URLs. This advantageously cuts down on the number of HTTP connections that have to be made by the browser, and corresponding URL filtering at the gateway, in situations where there is a blocked URL in the original URL set.
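
The reconstruction step in this abstract can be pictured as a filter over the original URL set, as in the Python sketch below. Matching by host name against a fixed block list is an assumption made for illustration; `BLOCKED_HOSTS` and `reconstruct_url_set` are hypothetical names, and the abstract does not say how blocked URLs are identified.

```python
from urllib.parse import urlsplit

# Hypothetical block list held by the filtering engine at the gateway.
BLOCKED_HOSTS = {"ads.example.net", "tracker.example.org"}


def reconstruct_url_set(original_urls: list[str]) -> list[str]:
    """Return the original URLs with blocked ones suppressed, order preserved."""
    return [
        url
        for url in original_urls
        if (urlsplit(url).hostname or "") not in BLOCKED_HOSTS
    ]
```

Since the browser never sees the suppressed URLs, it opens no connections for them, which is the saving the abstract points to.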

Patent•

Augmenting URL queries

Ryan Stewart¹, Girish Kumar¹•Institutions (1)

Microsoft¹

31 Aug 2007

TL;DR: Computer-readable media, systems, and methods for augmenting URL queries are described, in which an augmented query is created by word-breaking at least a portion of a simple URL query and is associated with one or more ranking preferences.

Abstract: Computer-readable media, systems, and methods for augmenting URL queries are described. In embodiments, a URL query is received from a user and it is determined whether the URL query is a simple URL query. Further, if the URL query is a simple URL query, an augmented query is created by word-breaking at least a portion of the URL query and the augmented query is associated with one or more ranking preferences. In various other embodiments, a URL query is received from a user and it is determined whether the URL query is a complex URL query. Further, if the URL query is a complex URL query, an augmented query is created that is identical to the URL query and the augmented query is associated with one or more ranking preferences.

Patent•

Method for setting url filter, client device, terminal device, client/server system, and program for setting url filter

Ito Tetsushi, Masao Kawane, Tetsuya Komatsu, Kenji Monma, Hiroyoshi Sato, Hiroyuki Tajima, Takezawa Wataru, Takashi Yamada, 哲史伊東, 弘佳佐藤, 哲也小松, 隆山田, 正夫川音, 裕之田島, 弥竹澤, 謙二門間

28 Sep 2007

TL;DR: In this article, a method for setting a URL filter to reduce the workload of a user when registering/editing a white-list in the URL filtering of a white list system is presented.

Abstract: PROBLEM TO BE SOLVED: To provide a method for setting a URL filter to reduce the workload of a user when registering/editing a white list in the URL filtering of a white list system. SOLUTION: This method for setting a URL filter includes: a step of filter setting release instruction for instructing a server device to release URL filter setting to itself as processing to be executed by a representative client; a step of URL information generation for acquiring the URL information of a Web page designated by a user from the Internet accessed through the server device, and for generating additional URL information as information to be added to a white list based on the acquired URL information; and a step of URL information addition instruction for instructing the server device to add the additional URL information to the white list. COPYRIGHT: (C)2009,JPO&INPIT

Journal Article•

Similarity Measurement of Web Page Access Based on URL Structure and Access Time

Li Chao

01 Jan 2007 - Computer Science

Patent•

Url management device and url management system

Masakazu Ikeda, 正和池田

15 Feb 2007

TL;DR: In this paper, a URL management system is characterized in that it has table data including an access URL for accessing data stored externally and a present URL showing the present location of the data and associated with the access URL.

Abstract: PROBLEM TO BE SOLVED: To provide a URL management device and a URL management system for performing access to changed data, and for preventing link disconnection from being generated even when current URL showing the current location of data stored in the outside is changed due to the movement of data. SOLUTION: A URL management system is characterized in that it has table data including an access URL for accessing data stored externally and a present URL showing the present location of the data and associated with the access URL, that when an access request signal containing the access URL is received from outside, the present URL corresponding to the access URL is obtained from the table data to check the link, and that when the present URL is not accessible, a new URL showing the changed location of the data is retrieved to connect to the new URL. COPYRIGHT: (C)2010,JPO&INPIT
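
The table lookup and fallback in this abstract can be sketched in Python as follows. The link check and the search for the moved data are passed in as functions, because the abstract does not say how either is performed; `resolve`, `is_accessible`, and `find_new_location` are hypothetical names.

```python
from typing import Callable, Optional


def resolve(
    access_url: str,
    table: dict[str, str],
    is_accessible: Callable[[str], bool],
    find_new_location: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Map an access URL to the present URL, repairing the table if the data moved."""
    present_url = table[access_url]
    if is_accessible(present_url):
        return present_url
    new_url = find_new_location(present_url)
    if new_url is not None:
        table[access_url] = new_url  # later requests go straight to the new URL
    return new_url
```

The access URL handed out to clients never changes; only the table entry behind it is updated, which is how link disconnection is avoided in this reading.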

A Method to Block Spam Mail Automatically Through the Connection to Link URL

Nam-Cheol Jung

1 Jan 2007

TL;DR: A method whereby spam mail is automatically blocked through the connection to link URL, which blocks the electronic mail if those web pages contain any key word which was defined as a clue to spam mail.

Abstract: In this paper, I developed a method whereby spam mail is automatically blocked through the connection to link URL. The blocking system works as follows. First, the system extracts information of URL linked to electronic mail which was delivered from any server on the internet. Next, the system lets itself be connected to the web pages through this URL. Last, the system blocks the electronic mail if those web pages contain any key word which was defined as a clue to spam mail.

Patent•

Method of generating a web page

David A. Farber, Richard E. Greer, Andrew D. Swart, James A. Balter

30 May 2007

TL;DR: In this article, a method of generating a web page modifies uniform resource locators (URLs) of embedded resources in web pages, including data prepended to information from the original URLs.

Abstract: A method of generating a web page modifies uniform resource locators (URLs) of embedded resources in a web page. The modified URLs include data prepended to information from the original URLs. The prepended data may be a hostname or a network address that is resolvable to a shared network of servers.
