Top 19 papers published on the topic of URL normalization in 2004

Showing papers on "URL normalization" published in 2004.

Patent•

System and method for identifying and filtering junk e-mail messages or spam based on url content

9 Jul 2004

TL;DR: In this article, a method for identifying e-mail messages as being unwanted junk or spam is proposed, which identifies contact and link data, such as URL information, within the content of the received e-mail message.

Abstract: A method for identifying e-mail messages as being unwanted junk or spam. The method includes receiving an e-mail message and then identifying contact and link data, such as URL information, within the content of the received e-mail message. A blacklist including contact information and/or link information previously associated with spam is accessed, and the e-mail message is determined to be spam or to likely be spam based on the contents of the blacklist. The contact or link data from the received e-mail is compared to similar information in the blacklist to find a match, such as by comparing URL information from e-mail content with URLs found previously in spam. If a match is not identified, the URL information from the e-mail message is processed to classify the URL as spam or “bad.” The content indicated by the URL information is accessed and spam classifiers or statistical tools are applied.
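
The abstract compares URLs found in message content against URLs previously seen in spam. Because the same address can be written many ways, such a lookup usually benefits from URL normalization first. The sketch below is a minimal illustration under stated assumptions, not the patented method: the extraction regex and the particular normalization rules (lowercase scheme and host, drop the default port and the fragment, turn an empty path into `/`) are our own choices, and the fallback step of fetching and classifying unmatched URLs is not shown.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Rough pattern for http(s) URLs embedded in message text (an assumption;
# production spam filters use far more robust extraction).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Apply a few common normalizations so trivially different spellings match."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only if it differs from the scheme's default.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def extract_urls(text: str) -> list[str]:
    """Find URLs in the message body and return them in normalized form."""
    return [normalize_url(m.group(0)) for m in URL_PATTERN.finditer(text)]


def is_probable_spam(text: str, blacklist: set[str]) -> bool:
    """True if any URL in the message matches a blacklisted URL after normalization."""
    normalized_blacklist = {normalize_url(u) for u in blacklist}
    return any(u in normalized_blacklist for u in extract_urls(text))
```

With this normalization, `HTTP://Example.COM:80/deals#top` and `http://example.com/deals` are treated as the same blacklist entry.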

152 citations

Patent•

Spell checking URLs in a resource

Mark Joseph Hamzy¹

IBM¹

22 Nov 2004

TL;DR: In this article, spell checking of URLs in a web resource is described, and a method for suggesting an alternative spelling for a misspelled URL is proposed.

Abstract: Methods, systems, and computer program products are provided for spell checking URLs in a resource. Embodiments include identifying within a resource a URL, determining whether the URL is valid, and marking the URL as misspelled if the URL is invalid. In typical embodiments, determining whether the URL is valid is carried out by resolving a domain name contained in the URL. Typical embodiments also include suggesting an alternative spelling for the URL. In some embodiments, suggesting an alternative spelling for the URL is carried out by identifying a keyword in the resource, querying a search engine with the identified keyword, and selecting a URL in dependence upon search results returned by the search engine.
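
A minimal sketch of the validity check described in the abstract, treating a URL as misspelled when its host name does not resolve. The resolver is passed in as a parameter (defaulting to the standard library's `socket.gethostbyname`) so the logic can be exercised offline; the function names are hypothetical, and the search-engine-based spelling suggestion is not sketched.

```python
import socket
from typing import Callable, List
from urllib.parse import urlsplit


def url_is_valid(url: str, resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Treat a URL as valid if its host name resolves (one reading of the abstract)."""
    host = urlsplit(url).hostname
    if not host:
        return False  # no host at all, so nothing to resolve
    try:
        resolve(host)
    except (socket.gaierror, UnicodeError):
        # gaierror: name lookup failed; UnicodeError: host cannot be IDNA-encoded
        return False
    return True


def find_misspelled(urls: List[str], resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return the URLs that would be marked as misspelled."""
    return [u for u in urls if not url_is_valid(u, resolve)]
```

Note that a host that resolves can still be a typo (for example, a typosquatted domain), so resolution failure is a signal of misspelling rather than a complete test.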

141 citations

Patent•

Method and system for presenting links associated with a requested website

Slava Yevdayev

10 Aug 2004

TL;DR: In this article, a dynamic toolbar operates in conjunction with a web server for presenting links associated with a website requested by a web surfer at a client computer, and a Popularity Index is determined by an actual count of redirections from the URL of the source website to the respective URLs of the related websites.

Abstract: A dynamic toolbar operates in conjunction with a web server for presenting links associated with a website requested by a web surfer at a client computer. The web server receives a source URL of a source website requested by the web surfer and compiles a directory of URLs of related websites that may be of interest to the web surfer for selecting therefrom a subset of URLs according to their popularity. Data representative of the subset is uploaded to the client computer for displaying by a web browser thereof. The subset of URLs is selected by accessing the directory to determine a category to which the source URL belongs and extracting from the directory respective URLs of related websites of the category. A Popularity Index is determined by an actual count of redirections from the URL of the source website to the respective URLs of the related websites.
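
The Popularity Index described above is an actual count of redirections from the source URL to related URLs. A toy in-memory version, assuming the directory lookup has already produced the candidate URLs of the source's category (the class and method names are hypothetical):

```python
from collections import Counter, defaultdict
from typing import Dict, List


class PopularityIndex:
    """Counts redirections from a source URL to related URLs (a toy Popularity Index)."""

    def __init__(self) -> None:
        # source URL -> Counter mapping target URL to number of redirections observed
        self.redirects: Dict[str, Counter] = defaultdict(Counter)

    def record_redirect(self, source: str, target: str) -> None:
        self.redirects[source][target] += 1

    def top_related(self, source: str, category_urls: List[str], k: int = 3) -> List[str]:
        """Rank URLs from the source's category by redirect count, ties broken by URL."""
        # .get avoids creating an empty entry for sources never seen before.
        counts = self.redirects.get(source, Counter())
        return sorted(category_urls, key=lambda u: (-counts[u], u))[:k]
```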

120 citations

Patent•

System and method for filtering spam messages utilizing URL filtering module

David Cowings¹, David Hoogstrate¹, Sandy Jensen¹, Art Medlar¹, Ken Schneider¹

Symantec¹

17 Jun 2004

TL;DR: In this article, a method for filtering spam messages utilizing a URL filtering module is described, in which a URL detected in an incoming message is compared with URLs characterizing spam to decide whether the message is spam.

Abstract: Systems and methods for filtering spam messages utilizing a URL filtering module are described. In one embodiment, the method includes detecting, in an incoming message, data indicative of a URL and comparing the URL from the incoming message with URLs characterizing spam. The method further includes determining whether the incoming message is spam based on the comparison of the URL from the incoming message with the URLs characterizing spam.

77 citations

Patent•

Advanced URL and IP features

Joshua T. Goodman¹, Robert L. Rounthwaite¹, Geoffrey J. Hulten¹, John D. Mehr¹, Manav Mishra¹, Anthony P. Penta¹

Microsoft¹

28 May 2004

TL;DR: In this paper, the authors present systems and methods that facilitate spam detection and prevention at least in part by building or training filters using advanced IP address and/or URL features in connection with machine learning techniques.

Abstract: Disclosed are systems and methods that facilitate spam detection and prevention at least in part by building or training filters using advanced IP address and/or URL features in connection with machine learning techniques. A variety of advanced IP address related features can be generated from performing a reverse IP lookup. Similarly, many different advanced URL based features can be created from analyzing at least a portion of any one URL detected in a message.
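
The abstract does not list the advanced URL features. As a hedged illustration of the general idea, the function below derives a few plausible features with the standard library; every feature name here is an assumption, and the reverse-IP-lookup features are omitted. Emitting each dotted suffix of the host lets a learned filter weight whole domains as well as individual subdomains.

```python
import ipaddress
from typing import Dict
from urllib.parse import urlsplit


def url_features(url: str) -> Dict[str, object]:
    """Derive a few simple features from one URL (illustrative choices only)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    labels = host.split(".") if host else []
    features: Dict[str, object] = {
        "host_is_ip": host_is_ip,
        "num_host_labels": len(labels),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "has_query": bool(parts.query),
        "has_userinfo": "@" in parts.netloc,
        "explicit_port": parts.port is not None,
    }
    # Emit every dotted suffix of the host name, including the host itself, e.g.
    # "a.b.example.com", "b.example.com", "example.com", "com".
    if not host_is_ip:
        for i in range(len(labels)):
            features["host_suffix=" + ".".join(labels[i:])] = True
    return features
```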

43 citations

Patent•

System and method for URL filtering in a firewall

Jai Balasubrahmaniyan¹, Kuntal Daftary¹, Venkateswara Rao Yarlagadda¹, Krishna Kumar¹

Cisco Systems, Inc.¹

23 Sep 2004

TL;DR: In this paper, a method, system and a computer program product for managing requests for Uniform Resource Locators (URLs) in a firewall is provided, where the firewall scans for requests for URLs and extracts the URLs from the requests.

Abstract: A method, system and a computer program product for managing requests for Uniform Resource Locators (URLs) in a firewall is provided. The firewall scans for requests for URLs and extracts the URLs from the requests. The firewall then checks for the URLs in an exclusive domains list. If the exclusive domains list allows the requested URLs, the firewall allows the URLs. In case the exclusive domains list disallows the requested URLs, the firewall blocks the requests for the URLs.
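
One possible reading of the exclusive domains list is a table mapping each domain to allow or disallow. The sketch below assumes that representation, lets the most specific (longest) matching domain decide, and applies a configurable default to unlisted hosts; all of these choices are assumptions rather than details from the patent.

```python
from typing import Dict
from urllib.parse import urlsplit


def host_matches(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    host, domain = host.lower().rstrip("."), domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def check_request(url: str, domain_rules: Dict[str, bool], default_allow: bool = False) -> bool:
    """Return True to allow the requested URL, False to block it.

    domain_rules maps a domain to True (allow) or False (disallow); the longest
    matching domain decides. Hosts matching no rule get default_allow.
    """
    host = urlsplit(url).hostname or ""
    matches = [d for d in domain_rules if host_matches(host, d)]
    if not matches:
        return default_allow
    return domain_rules[max(matches, key=len)]
```

Matching on `host == domain or host.endswith("." + domain)`, rather than a bare `endswith(domain)`, keeps `badexample.com` from being treated as part of `example.com`.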

37 citations

Patent•

Page views proxy servers

Eric O'laughlen, Sudheer Agrawal, John D. Robinson

2 Jun 2004

TL;DR: A page view field is included in an HTTP request that contains a requested URL and indicates the URL for the web page or other document from which the requested URL was obtained (either directly or indirectly).

Abstract: A page view field is included in an HTTP request that contains a requested URL and indicates the URL for the web page or other document from which the requested URL was obtained (either directly or indirectly). Certain processes may be used to help ensure that the URL included in the page view field is the URL of the web page or other document that caused the information to be requested (i.e., the web page or other document from which the requested URL was obtained, either directly or indirectly). The page view field may be used by a proxy or other server to perform processing related to a number of applications. The processing, for instance, may relate to access controls (e.g., parentally controlled accounts) or to accurately tracking frequently requested resources such as web pages.

35 citations

Patent•

URL mapping with shadow page support

Walfrey Ng¹, Madeline Fok¹, Barbara Wong¹, Darl Andrew Crick¹, Yong Yuan¹

IBM¹

29 Sep 2004

TL;DR: In this paper, a technique is provided for managing a web page containing at least one URL link so that it supports search-engine-preferred URL links, through URL mapping and shadow page support.

Abstract: A technique for managing a web page having at least one URL supporting search engine preferred Universal Resource Locator (URL) links through URL mapping and shadow page support is provided. Because a search engine crawler typically does not want to crawl through dynamic URLs, a search engine friendly page would typically contain static URLs. Support is provided for obtaining the web page containing the at least one URL link, determining the at least one URL link to be of a dynamic format, and then converting the dynamic format of the at least one URL link into a static format. Next, a shadow page of the web page is created, containing the static format link, and placed in the shadow page repository. A web application server may then be enabled to provide a URL mapping function to convert such a static URL to a desired dynamic format, based on a provided mapping file. Web administrators or developers may then define an entry in such a mapping file for each URL key that needs to be mapped.
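
The round trip between dynamic and static URLs can be illustrated with a small mapping table. In this sketch the mapping file is reduced to a hypothetical dictionary from a URL key (`product`) to a dynamic script path, and query parameters become alternating name/value path segments; the actual static format used in the patent is not specified in the abstract, and shadow-page generation is not shown.

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode

# Hypothetical mapping-file contents: URL key -> dynamic script path.
MAPPING = {"product": "/shop/ProductDisplay.jsp"}
REVERSE = {path: key for key, path in MAPPING.items()}


def to_static(dynamic_url: str) -> str:
    """'/shop/ProductDisplay.jsp?id=42' -> '/product/id/42' (crawler-friendly form)."""
    path, _, query = dynamic_url.partition("?")
    segments = [REVERSE[path]]
    for name, value in parse_qsl(query):
        # Percent-encode everything, including '/', so segments stay unambiguous.
        segments += [quote(name, safe=""), quote(value, safe="")]
    return "/" + "/".join(segments)


def to_dynamic(static_url: str) -> str:
    """Inverse of to_static, as a server-side URL mapping function might do."""
    key, *rest = static_url.strip("/").split("/")
    if len(rest) % 2:
        raise ValueError("expected name/value pairs after the URL key")
    pairs = [(unquote(n), unquote(v)) for n, v in zip(rest[::2], rest[1::2])]
    query = urlencode(pairs)
    return MAPPING[key] + ("?" + query if query else "")
```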

33 citations

Patent•

URL retrieval system, server and URL retrieval method for the same

Madoka Iwama¹

NEC¹

18 Jun 2004

TL;DR: In this paper, a search engine server extracts URLs matching a search expression created by a proxy search server and, when the number of extracted URLs exceeds a predetermined value, sends the user terminal, via the proxy search server, information indicating the hierarchic level to which each URL belongs, for user selection.

Abstract: In a URL retrieval system and a URL retrieval method, a user is not required to assume a keyword for the information to be accessed, and even when many URLs are obtained through a search, the user need not select the desired URLs from them. A proxy search server creates a search expression using information from a user terminal. A search engine server extracts URLs according to the search expression received from the proxy search server. If the number of extracted URLs exceeds a predetermined value, the search engine server sends information indicating, for each URL, the hierarchic level to which it belongs via the proxy search server to the terminal for user selection. From the extracted URLs, the search engine server obtains those belonging to the hierarchic level selected by the user. If the number of URLs does not exceed the predetermined value, the search engine server sends the URLs as a retrieval result via the proxy search server to the user terminal for user selection.
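
The abstract does not define a hierarchic level. Assuming, for illustration only, that it is the depth of the URL path, the threshold-and-select step might look like this, with a callback standing in for the user's choice at the terminal (all names are hypothetical):

```python
from collections import Counter
from typing import Callable, List
from urllib.parse import urlsplit


def hierarchy_level(url: str) -> int:
    """Assumed notion of 'hierarchic level': number of non-empty path segments."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def retrieve(urls: List[str], threshold: int,
             choose_level: Callable[[Counter], int]) -> List[str]:
    """Return URLs directly if there are few enough; otherwise let the user pick a level."""
    if len(urls) <= threshold:
        return urls
    # Summary sent to the terminal: how many URLs sit at each level.
    levels = Counter(hierarchy_level(u) for u in urls)
    chosen = choose_level(levels)  # stands in for the user's selection
    return [u for u in urls if hierarchy_level(u) == chosen]
```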

33 citations

Patent•

Method and system for providing information over a network

Carl Perkins, Duane Brinson

30 Sep 2004

TL;DR: In this paper, a method is proposed for providing information over the Internet by processing erroneous URLs entered into a Web browser for relevant word content that approximates the intended URL, and delivering to the user useful information approximating what would have been provided had the intended URL been entered correctly.

Abstract: A method of providing information over the Internet by processing erroneous URLs entered into a Web browser for relevant word content that approximates the intended URL, and delivering useful information to the user which approximates the information that would have been provided to the user if the intended URL had been correctly entered into the Web browser. Preferably, a search is further performed using that relevant word content, and a search page is delivered to the user which provides useful information related to the information that would have been provided if the intended URL had been correctly entered into the Web browser.

30 citations

Patent•

Filtering cached content based on embedded URLs

John Lyman Ahlander¹, Mikko Valimaki¹

Blue Coat Systems¹

13 Jul 2004

TL;DR: In this paper, the authors present methods, systems, computer program products, and data structures for filtering cached content based on embedded URLs, where the computer system determines whether or not access to the cached content is to be allowed based on the embedded URL.

Abstract: The present invention extends to methods, systems, computer program products, and data structures for filtering cached content based on embedded URLs. A computer system accesses a URL that corresponds to cached content. The computer system identifies an embedded URL included in the accessed URL. The embedded URL corresponds to a site that was accessed to retrieve the cached content. The computer system extracts the embedded URL from the accessed URL. The computer system determines whether or not access to the cached content is to be allowed based on the embedded URL.
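
A minimal sketch, assuming a hypothetical cache URL format in which the original site's URL is carried percent-encoded in a `url` query parameter; real caching proxies use various formats. The access decision is made on the host of the embedded URL, falling back to the cache URL's own host when nothing is embedded.

```python
from typing import Optional, Set
from urllib.parse import parse_qs, urlsplit


def extract_embedded_url(cache_url: str, param: str = "url") -> Optional[str]:
    """Pull the original site URL out of a cache URL (hypothetical ?url=... format)."""
    # parse_qs also percent-decodes the value.
    values = parse_qs(urlsplit(cache_url).query).get(param)
    return values[0] if values else None


def allow_cached_content(cache_url: str, blocked_hosts: Set[str]) -> bool:
    """Judge the site the content came from, not the cache server that serves it."""
    target = extract_embedded_url(cache_url) or cache_url
    host = (urlsplit(target).hostname or "").lower()
    return host not in blocked_hosts
```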

Patent•

Web research tool

Fonda Daniels¹, David Bruce Kumhyr¹, Dustin Kirkland¹

IBM¹

17 May 2004

TL;DR: In this paper, a web browser plug-in supports a team approach to Internet research, where an initial search, preferably by a web robot, generates an initial plurality of potentially relevant URLs, which are stored in a shared URL database.

Abstract: A web browser plug-in supports a team approach to Internet research. An initial search, preferably by a web robot, generates an initial plurality of potentially relevant URLs, which are stored in a shared URL database. Team members are notified when new URLs are added to the database. Team members, optionally through an access control system, evaluate and rank the URLs for relevance. URLs are managed based on their rank, such as ordering their display and deleting non-relevant URLs. The rank of a URL may be indicated visually in a web browser, such as by displaying graphic icons adjacent its title. The method may be iterative, with additional searches conducted, preferably via additional web robots, with the additional URLs returned being evaluated, ranked, and managed in the URL database.

Patent•

Search engine system supporting inclusion of unformatted search string after domain name portion of URL

Andrew R. Jassy, Udi Manber, Jonathan Leblang

23 Aug 2004

TL;DR: In this article, a web site system provides functionality for searching a repository of information, such as the World Wide Web, by including a search string at the end of a URL without any special formatting.

Abstract: A web site system provides functionality for searching a repository of information, such as the World Wide Web, by including a search string at the end of a URL without any special formatting. In one embodiment, when the system receives a request for a URL of the form www.domain_name/char_string, where char_string is a character string that may include spaces and non-alphabetic characters, the system initially determines whether the character string includes a prefix that identifies the URL as a non-search-request URL. If no such prefix is present, the character string is used in its entirety as a search string to execute a search, and the results of the search are returned to the user.
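
The routing decision in the abstract can be sketched as follows. The non-search prefixes are hypothetical placeholders, and the sketch assumes the browser has percent-encoded spaces in the path.

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit

# Hypothetical prefixes that mark a path as an ordinary (non-search) page.
NON_SEARCH_PREFIXES = ("exec/", "gp/", "static/")


def route(url: str) -> Tuple[str, str]:
    """Return ('page', path) for normal URLs or ('search', query) for free-text ones."""
    path = urlsplit(url).path.lstrip("/")
    if not path or path.startswith(NON_SEARCH_PREFIXES):
        return ("page", "/" + path)
    # No recognized prefix: the whole decoded string after the domain is the query.
    return ("search", unquote(path))
```

Characters such as `?` and `#` in a typed search string would also need percent-encoding, since `urlsplit` treats them as delimiters and they would never reach the path.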

Patent•

Method and system to identify a previously visited universal resource locator (url) in results from a search

Fonda Daniels¹, Timothy Figgins¹, David Bruce Kumhyr¹

IBM¹

15 Oct 2004

TL;DR: In this article, a method to identify a previously visited URL in results from a search may include loading a URL personal databook collection object and identifying any matches between results from the search and any URL object references in the URL personal databook collection object.

Abstract: A method to identify a previously visited URL in results from a search may include loading a URL personal databook collection object. The method may also include identifying any matches between results from the search and any URL object references in the URL personal databook collection object.

Patent•

Web site access control system which queries server for URL category which is used to determine access and keeps cache of recent URL categories

John Sinclair, Ian James Pettener, Alistair Nash

9 Sep 2004

TL;DR: In this article, a specific URL is first checked against a local cache of categorised URLs and, if it is not found there, a remote server is queried to obtain a category for that URL; where the server only partially matches a URL, a match length is provided.

Abstract: Categorising URLs during internet access, wherein a specific URL is first checked against a local cache of categorised URLs to see if it is there, and if not, a remote server is queried to obtain a category for that URL. The cache is preferably structured as a hash array comprising one or more index elements, each index element comprising a host tree pointer and a hash key derived from a stored URL. The server searches in a similar manner. The server query including the URL is formed using UDP messages including sequence numbers for message identification and time stamps for retry time-outs in case a query message is lost. Where the server only partially matches a URL, a match length is provided. By keeping only a limited cache, a low-powered device may be used to implement the categorization system. The categorization of URLs may be used for parental locks on web sites and similar access control.
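
A simplified sketch of the local-cache-then-server lookup, including the partial match length: when the server reports that only the first N characters of a URL matched, the client caches that prefix so later URLs sharing it can be answered locally. This is an assumption-laden illustration: an `OrderedDict` LRU with a linear prefix scan stands in for the hash array and host trees, and the UDP protocol with sequence numbers and retries is replaced by a plain callback.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple

# A remote lookup returns (category, match_length): the category applies to
# url[:match_length], which may be only a prefix (e.g. the host) of the query.
RemoteLookup = Callable[[str], Tuple[str, int]]


class CategoryCache:
    """Bounded LRU cache of URL prefix -> category, backed by a remote server."""

    def __init__(self, remote: RemoteLookup, capacity: int = 1024) -> None:
        self.remote = remote
        self.capacity = capacity
        self.entries: "OrderedDict[str, str]" = OrderedDict()

    def _local(self, url: str) -> Optional[str]:
        # Longest cached prefix wins; a linear scan keeps the sketch short.
        best = max((p for p in self.entries if url.startswith(p)), key=len, default=None)
        if best is None:
            return None
        self.entries.move_to_end(best)  # mark as most recently used
        return self.entries[best]

    def categorize(self, url: str) -> str:
        cached = self._local(url)
        if cached is not None:
            return cached
        category, match_length = self.remote(url)
        prefix = url[:match_length] if match_length > 0 else url
        self.entries[prefix] = category
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return category
```

A raw string-prefix test is deliberately simple; a fuller version would compare at host and path-segment boundaries so that a cached `http://example.com` prefix cannot match `http://example.com.evil.test/`.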

Patent•

Categorizing uniform resource locators

John Sinclair, Ian James Pettener, Alistair Nash

9 Sep 2004

TL;DR: In this paper, a method of categorising URLs during internet access is proposed, wherein a specific URL is first checked against a local cache of categorised URLs to see if it is there, if not then a remote server is queried to generate a category for that URL.

Abstract: A method of categorising URLs during internet access, wherein a specific URL is first checked against a local cache of categorised URLs to see if it is there, if not then a remote server is queried to generate a category for that URL. Preferably the categorization of URLs is used in controlling access to the internet. The cache is preferably structured as: a hash array comprising one or more index elements, each index element comprising a host tree pointer and a hash key derived from a stored URL; one or more host trees depending from the index elements, each host tree comprising one or more nodes each holding data (representative of a URL and associated category code) and pointers to a next older and next younger node.

Patent•

System, method, and software to automate and assist web research tasks

Fonda Daniels¹, David Bruce Kumhyr¹, Dustin Kirkland¹

IBM¹

17 May 2004

Patent•

Method for automatically blocking spam mail through connection of link url

Ahn Jae Geun, Ha Jeong Ho, Kang Su Hun

18 Aug 2004

TL;DR: In this paper, a method is provided for automatically blocking spam mail by following a link URL (Uniform Resource Locator): the URL linked in a received e-mail is extracted, the linked web page is accessed, and the mail is blocked if a preset spam keyword is present on that page.

Abstract: PURPOSE: A method for automatically blocking spam mail through the connection of a link URL (Uniform Resource Locator) is provided, which extracts the URL linked in a received e-mail, connects to the corresponding web page, and automatically blocks the mail if a preset spam keyword is present. CONSTITUTION: The linked URL and the sender address are extracted from the original message of the received e-mail (S12). The system checks whether the sender address is registered in a blocking list (S14). If the sender address is in the blocking list, the e-mail is removed without its body being read (S34). If not, the system checks whether the linked URL is in a blocking URL list (S16). If the linked URL is in the blocking URL list, reception of the e-mail is refused. If not, the system checks whether the linked URL is in a pass URL list (S18). If the linked URL is in the pass URL list, the e-mail is stored in the mailbox. Otherwise, the web page is accessed using the linked URL (S20).
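
The numbered steps above form a short decision chain. A minimal sketch, assuming the mail may carry several linked URLs (any blocked URL refuses the mail, and pass-listed URLs are not fetched), that a page fetcher is injected as a callback, and that mail surviving every check is delivered. The abstract stops at step S20, so the keyword check after fetching follows the stated purpose rather than a listed step; all names are hypothetical.

```python
from enum import Enum
from typing import Callable, List, Set


class Verdict(Enum):
    DELETE_UNREAD = "delete without reading body"  # S34
    REFUSE = "refuse reception"                    # linked URL on blocking list
    SPAM = "block as spam"                         # spam keyword found on page
    DELIVER = "store in mailbox"


def classify_mail(
    sender: str,
    linked_urls: List[str],
    blocked_senders: Set[str],
    blocked_urls: Set[str],
    pass_urls: Set[str],
    spam_keywords: List[str],
    fetch_page: Callable[[str], str],
) -> Verdict:
    if sender in blocked_senders:                     # S14
        return Verdict.DELETE_UNREAD                  # S34
    if any(u in blocked_urls for u in linked_urls):   # S16
        return Verdict.REFUSE
    for url in linked_urls:
        if url in pass_urls:                          # S18: trusted, no fetch needed
            continue
        page = fetch_page(url).lower()                # S20: connect to the linked page
        if any(k.lower() in page for k in spam_keywords):
            return Verdict.SPAM
    return Verdict.DELIVER
```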

Patent•

URL retrieval method and system

Madoka Iwama¹

NEC¹

19 Jun 2004

TL;DR: In this article, a proxy search server (200) creates a search expression using information from a user terminal (100), and a search engine server (300) extracts URLs according to that search expression; if the number of extracted URLs exceeds a predetermined value, the search engine server sends the terminal, via the proxy search server, information indicating the hierarchic level to which each URL belongs, for user selection.

Abstract: In a URL retrieval system and a URL retrieval method, a user is not required to assume a keyword for the information to be accessed, and even when many URLs are obtained through a search, the user need not select the desired URLs from them. A proxy search server (200) creates a search expression using information from a user terminal (100). A search engine server (300) extracts URLs according to the search expression received from the proxy search server (200). If the number of extracted URLs exceeds a predetermined value, the search engine server (300) sends information indicating, for each URL, the hierarchic level to which it belongs via the proxy search server (200) to the terminal for user selection. From the extracted URLs, the search engine server (300) obtains those belonging to the hierarchic level selected by the user. If the number of URLs does not exceed the predetermined value, the search engine server (300) sends the URLs as a retrieval result via the proxy search server (200) to the user terminal (100) for user selection.
