TL;DR: In this article, a method for identifying e-mail messages as unwanted junk or spam is proposed, which identifies contact and link data, such as URL information, within the content of the received e-mail message.
Abstract: A method for identifying e-mail messages as being unwanted junk or spam. The method includes receiving an e-mail message and then identifying contact and link data, such as URL information, within the content of the received e-mail message. A blacklist including contact information and/or link information previously associated with spam is accessed, and the e-mail message is determined to be spam or to likely be spam based on the contents of the blacklist. The contact or link data from the received e-mail is compared to similar information in the blacklist to find a match, such as by comparing URL information from e-mail content with URLs found previously in spam. If a match is not identified, the URL information from the e-mail message is processed to classify the URL as spam or “bad.” The content indicated by the URL information is accessed and spam classifiers or statistical tools are applied.
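The blacklist comparison described above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the regular expression, the function names, and the example blacklist entry are all assumptions, and the follow-up step of fetching and classifying unmatched URLs is left out.

```python
import re

# Hypothetical pattern: matches http/https URLs in plain text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def extract_urls(body: str) -> list[str]:
    """Return URLs found in the message body, without trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(body)]

def classify_message(body: str, blacklist: set[str]) -> str:
    """Return 'spam' if any extracted URL is on the blacklist, else 'unknown'."""
    for url in extract_urls(body):
        if url in blacklist:
            return "spam"
    # Assumption: unmatched messages would go on to content-based classification.
    return "unknown"

# Illustrative blacklist of URLs previously seen in spam (made-up entry).
blacklist = {"http://pills.example/buy"}
print(classify_message("Cheap offer: http://pills.example/buy.", blacklist))
```

Exact string matching is the simplest possible comparison; a real filter would likely normalize URLs before comparing them.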
TL;DR: In this article, spell checking of URLs in a web resource is described, and a method for suggesting an alternative spelling for the URL is proposed.
Abstract: Methods, systems, and computer program products are provided for spell checking URLs in a resource. Embodiments include identifying within a resource a URL, determining whether the URL is valid, and marking the URL as misspelled if the URL is invalid. In typical embodiments, determining whether the URL is valid is carried out by resolving a domain name contained in the URL. Typical embodiments also include suggesting an alternative spelling for the URL. In some embodiments, suggesting an alternative spelling for the URL is carried out by identifying a keyword in the resource, querying a search engine with the identified keyword, and selecting a URL in dependence upon search results returned by the search engine.
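As a rough sketch of the validity check described above, a URL can be marked as misspelled when its domain name fails to resolve. The function names and the URL pattern are assumptions for illustration; the resolver is passed in as a parameter so the check can be exercised without network access, and the alternative-spelling step is not shown.

```python
import re
import socket
from urllib.parse import urlsplit

def domain_resolves(host: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the resolver finds an address for the host name."""
    try:
        resolve(host)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False

def misspelled_urls(text: str, resolve=socket.gethostbyname) -> list[str]:
    """Return the URLs in the text whose domain name does not resolve."""
    urls = re.findall(r"https?://[^\s<>\"']+", text)
    return [u for u in urls
            if not domain_resolves(urlsplit(u).hostname or "", resolve)]

def fake_resolver(host: str) -> str:
    """Stand-in resolver for demonstration: only 'good.example' exists."""
    if host != "good.example":
        raise socket.gaierror("no such host")
    return "192.0.2.1"

text = "See http://good.example/a and http://gooood.example/b"
print(misspelled_urls(text, fake_resolver))
```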
TL;DR: In this article, a dynamic toolbar operates in conjunction with a web server for presenting links associated with a website requested by a web surfer at a client computer, and a Popularity Index is determined by an actual count of redirections from the URL of the source website to the respective URLs of the related websites.
Abstract: A dynamic toolbar operates in conjunction with a web server for presenting links associated with a website requested by a web surfer at a client computer. The web server receives a source URL of a source website requested by the web surfer and compiles a directory of URLs of related websites that may be of interest to the web surfer for selecting therefrom a subset of URLs according to their popularity. Data representative of the subset is uploaded to the client computer for displaying by a web browser thereof. The subset of URLs is selected by accessing the directory to determine a category to which the source URL belongs and extracting from the directory respective URLs of related websites of the category. A Popularity Index is determined by an actual count of redirections from the URL of the source website to the respective URLs of the related websites.
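One way to picture the Popularity Index is as a counter of observed redirections from a source URL to each related URL in the same directory category. The sketch below assumes a simple in-memory directory keyed by category; all names and example URLs are hypothetical.

```python
from collections import Counter

def record_redirect(counts: Counter, source_url: str, target_url: str) -> None:
    """Count one redirection from the source website to a related website."""
    counts[(source_url, target_url)] += 1

def popular_related(counts: Counter, source_url: str,
                    directory: dict[str, list[str]], limit: int = 3) -> list[str]:
    """Return the most frequently followed related URLs in the source's category."""
    for urls in directory.values():
        if source_url in urls:
            related = [u for u in urls if u != source_url]
            # Sort by redirect count, highest first; ties keep directory order.
            related.sort(key=lambda u: counts[(source_url, u)], reverse=True)
            return related[:limit]
    return []  # source URL is not in any category

directory = {"travel": ["a.example", "b.example", "c.example", "d.example"]}
counts = Counter()
record_redirect(counts, "a.example", "c.example")
record_redirect(counts, "a.example", "c.example")
record_redirect(counts, "a.example", "b.example")
print(popular_related(counts, "a.example", directory, limit=2))
```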
TL;DR: In this article, a method for filtering spam messages utilizing a URL filtering module is described, which compares a URL detected in an incoming message with URLs characterizing spam.
Abstract: Systems and methods for filtering spam messages utilizing a URL filtering module are described. In one embodiment, the method includes detecting, in an incoming message, data indicative of a URL and comparing the URL from the incoming message with URLs characterizing spam. The method further includes determining whether the incoming message is spam based on the comparison of the URL from the incoming message with the URLs characterizing spam.
TL;DR: In this paper, the authors present systems and methods that facilitate spam detection and prevention at least in part by building or training filters using advanced IP address and/or URL features in connection with machine learning techniques.
Abstract: Disclosed are systems and methods that facilitate spam detection and prevention at least in part by building or training filters using advanced IP address and/or URL features in connection with machine learning techniques. A variety of advanced IP address related features can be generated from performing a reverse IP lookup. Similarly, many different advanced URL based features can be created from analyzing at least a portion of any one URL detected in a message.
TL;DR: In this paper, a method, system and a computer program product for managing requests for Uniform Resource Locators (URLs) in a firewall is provided, where the firewall scans for requests for URLs and extracts the URLs from the requests.
Abstract: A method, system and a computer program product for managing requests for Uniform Resource Locators (URLs) in a firewall is provided. The firewall scans for requests for URLs and extracts the URLs from the requests. The firewall then checks for the URLs in an exclusive domains list. If the exclusive domains list allows the requested URLs, the firewall allows the URLs. In case the exclusive domains list disallows the requested URLs, the firewall blocks the requests for the URLs.
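The allow-or-block decision above might look like the following sketch. It assumes the exclusive domains list is a set of permitted domain names, that subdomains of a listed domain are also permitted, and that requests arrive as proxy-style request lines with an absolute URL; none of these details are specified in the abstract.

```python
from urllib.parse import urlsplit

def extract_url(request_line: str) -> str:
    """Pull the URL out of a request line such as 'GET http://host/path HTTP/1.1'."""
    parts = request_line.split()
    if len(parts) < 2:
        raise ValueError("malformed request line")
    return parts[1]

def is_allowed(url: str, allowed_domains: set[str]) -> bool:
    """Assumption: a host is allowed if it equals or is a subdomain of a listed domain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)

def handle_request(request_line: str, allowed_domains: set[str]) -> str:
    """Return 'allow' or 'block' for a single request."""
    url = extract_url(request_line)
    return "allow" if is_allowed(url, allowed_domains) else "block"

allowed = {"intranet.example"}
print(handle_request("GET http://intranet.example/index.html HTTP/1.1", allowed))
print(handle_request("GET http://other.example/ HTTP/1.1", allowed))
```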
TL;DR: A page view field is included in an HTTP request that contains a requested URL and indicates the URL for the web page or other document from which the requested URL was obtained (either directly or indirectly).
Abstract: A page view field is included in an HTTP request that contains a requested URL and indicates the URL for the web page or other document from which the requested URL was obtained (either directly or indirectly). Certain processes may be used to help insure that the URL included in the page view field is the URL of the web page or other document that caused the information to be requested (i.e., the web page or other document from which the requested URL was obtained, either directly or indirectly). The page view field may be used by a proxy or other server to perform processing related to a number of applications. The processing, for instance, may relate to access controls (e.g., parentally controlled accounts) or to accurately tracking frequently requested resources such as web pages.
TL;DR: In this paper, a technique for managing a web page having at least one URL supporting search engine preferred Universal Resource Locator (URL) links through URL mapping and shadow page support is provided.
Abstract: A technique for managing a web page having at least one URL supporting search engine preferred Universal Resource Locator (URL) links through URL mapping and shadow page support is provided. Because a search engine crawler typically does not want to crawl through dynamic URLs, a search engine friendly page would typically contain static URLs. Support is provided for obtaining the web page containing the at least one URL link and determining the at least one URL link to be of a dynamic format, then converting the dynamic format of the at least one URL link into a static format. Next, a shadow page of the web page is created, containing the static format link, and placed in the shadow page repository. A web application server may then be enabled to provide a URL mapping function to convert such a static URL to a desired dynamic format, based on a provided mapping file. Web administrators or developers may then define an entry in such a mapping file for each URL key that needs to be mapped.
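To make the dynamic-to-static conversion concrete, the sketch below rewrites query-string links into path-style links and records a mapping back to the original form. The path layout, the function names, and the rule that a link is dynamic when it contains "?" are illustrative assumptions; the abstract does not specify the static format or the mapping file layout.

```python
from urllib.parse import urlsplit, parse_qsl

def to_static(dynamic_url: str) -> str:
    """Fold query parameters into path segments (hypothetical static format)."""
    parts = urlsplit(dynamic_url)
    segments = [parts.path.rstrip("/")]
    for key, value in parse_qsl(parts.query):
        segments.extend([key, value])
    return "/".join(segments)

def build_shadow_links(links: list[str]) -> tuple[list[str], dict[str, str]]:
    """Return shadow-page links plus a static-to-dynamic mapping."""
    mapping: dict[str, str] = {}
    shadow: list[str] = []
    for link in links:
        if "?" in link:  # assumption: a query string marks a dynamic link
            static = to_static(link)
            mapping[static] = link
            shadow.append(static)
        else:
            shadow.append(link)
    return shadow, mapping

shadow, mapping = build_shadow_links(["/about.html", "/product.jsp?id=42&cat=7"])
print(shadow)
print(mapping)
```

The mapping dictionary stands in for the mapping file: a server receiving a static URL would look it up to recover the dynamic form.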
TL;DR: In this paper, a search engine server extracts URL according to the search expression from the search server and sends information indicating for each URL a hierarchic level to which the URL belongs via the proxy engine server to the terminal for user selection.
Abstract: In a URL retrieval system and a URL retrieval method, a user is not required to assume a keyword for information to be accessed and in which even when many URL are obtained through a search, the user need not select desired URL therefrom. A proxy search server creates a search expression using information from a user terminal. A search engine server extracts URL according to the search expression from the search server. If the number of the extracted URL exceeds a predetermined value, the search engine server sends information indicating for each URL a hierarchic level to which the URL belongs via the proxy engine server to the terminal for user selection. From the extracted URL, the engine server obtains URL belonging to a hierarchic level selected by the user. If the number of the URL does not exceed a predetermined value, the engine server sends the URL as a retrieval result via the proxy engine server to the user terminal for user selection.
TL;DR: In this paper, a method is described for providing information over the Internet by processing erroneous URLs entered into a Web browser for relevant word content that approximates the intended URL, and delivering useful information to the user that approximates the information that would have been provided if the intended URL had been correctly entered into the Web browser.
Abstract: A method of providing information over the Internet by processing erroneous URLs entered into a Web browser for their relevant word content that approximate the intended URL, and delivering useful information to the user which approximates the information that would have been provided to the user if the intended URL was correctly entered into the Web browser. Preferably, further a search is performed using that relevant word content and a search page is delivered to the user which provides the user with useful information related to the information that would have been provided if the intended URL had been correctly entered into the Web browser.
TL;DR: In this paper, the authors present methods, systems, computer program products, and data structures for filtering cached content based on embedded URLs, where the computer system determines whether or not access to the cached content is to be allowed based on the embedded URL.
Abstract: The present invention extends to methods, systems, computer program products, and data structures for filtering cached content based on embedded URLs. A computer system accesses a URL that corresponds to cached content. The computer system identifies an embedded URL included in the accessed URL. The embedded URL corresponds to a site that was accessed to retrieve the cached content. The computer system extracts the embedded URL from the accessed URL. The computer system determines whether or not access to the cached content is to be allowed based on the embedded URL.
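A small sketch of the embedded-URL check follows. It assumes, purely for illustration, that the cache URL carries the original address in a query parameter named `url` and that filtering is done against a set of blocked host names; the abstract does not specify either detail.

```python
from urllib.parse import urlsplit, parse_qs

def extract_embedded_url(cache_url: str, param: str = "url"):
    """Return the URL embedded in the cache URL's query string, or None."""
    query = parse_qs(urlsplit(cache_url).query)  # parse_qs also percent-decodes
    values = query.get(param)
    return values[0] if values else None

def cached_access_allowed(cache_url: str, blocked_hosts: set[str]) -> bool:
    """Decide access to cached content from the site it was originally fetched from."""
    embedded = extract_embedded_url(cache_url)
    if embedded is None:
        return True  # assumption: nothing embedded means nothing to filter on
    host = (urlsplit(embedded).hostname or "").lower()
    return host not in blocked_hosts

cache_url = "http://cache.example/view?url=http%3A%2F%2Fblocked.example%2Fpage"
print(extract_embedded_url(cache_url))
print(cached_access_allowed(cache_url, {"blocked.example"}))
```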
TL;DR: In this paper, a web browser plug-in supports a team approach to Internet research, where an initial search, preferably by a web robot, generates an initial plurality of potentially relevant URLs, which are stored in a shared URL database.
Abstract: A web browser plug-in supports a team approach to Internet research. An initial search, preferably by a web robot, generates an initial plurality of potentially relevant URLs, which are stored in a shared URL database. Team members are notified when new URLs are added to the database. Team members, optionally through an access control system, evaluate and rank the URLs for relevance. URLs are managed based on their rank, such as ordering their display and deleting non-relevant URLs. The rank of a URL may be indicated visually in a web browser, such as by displaying graphic icons adjacent its title. The method may be iterative, with additional searches conducted, preferably via additional web robots, with the additional URLs returned being evaluated, ranked, and managed in the URL database.
TL;DR: In this article, a web site system provides functionality for searching a repository of information, such as the World Wide Web, by including a search string at the end of a URL without any special formatting.
Abstract: A web site system provides functionality for searching a repository of information, such as the World Wide Web, by including a search string at the end of a URL without any special formatting. In one embodiment, when the system receives a request for a URL of the form www.domain_name/char_string, where char_string is a character string that may include spaces and non-alphabetic characters, the system initially determines whether the character string includes a prefix that identifies the URL as a non-search-request URL. If no such prefix is present, the character string is used in its entirety as a search string to execute a search, and the results of the search are returned to the user.
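The prefix test described in this embodiment can be sketched as a small routing function. The prefix list and the return values are hypothetical; the abstract only says that some prefix marks a URL as a non-search request.

```python
from urllib.parse import unquote

# Hypothetical prefixes that mark ordinary (non-search) pages.
NON_SEARCH_PREFIXES = ("static/", "account/", "help/")

def route(path: str) -> tuple[str, str]:
    """Treat the path as a page request if it has a reserved prefix, else as a search."""
    char_string = unquote(path.lstrip("/"))  # decode %20 and similar escapes
    if char_string.startswith(NON_SEARCH_PREFIXES):
        return ("page", char_string)
    # No reserved prefix: the whole string becomes the search string.
    return ("search", char_string)

print(route("/help/contact"))
print(route("/red%20running%20shoes"))
```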
TL;DR: In this article, a method to identify a previously visited URL in results from a search may include loading a URL personal databook collection object and identifying any matches between results from the search and any URL object references in the URL personal databook collection object.
Abstract: A method to identify a previously visited URL in results from a search may include loading a URL personal databook collection object. The method may also include identifying any matches between results from the search and any URL object references in the URL personal databook collection object.
TL;DR: In this article, a URL is first checked against a local cache of categorised URLs and, if it is not found there, a remote server is queried to obtain a category for that URL; where the server only partially matches a URL, a match length is provided.
Abstract: Categorising URLs during internet access, wherein a specific URL is first checked against a local cache of categorised URLs to see if it is there; if not, a remote server is queried to obtain a category for that URL. The cache is preferably structured as a hash array comprising one or more index elements, each index element comprising a host tree pointer and a hash key derived from a stored URL. The server searches in a similar manner. The server query including the URL is formed using UDP messages including sequence numbers for message identification and time stamps for retry time outs in case a query message is lost. Where the server only partially matches a URL, a match length is provided. By keeping only a limited cache, a low powered device may be used to implement the categorization system. The categorization of URLs may be used for parental locks on web sites and similar access control.
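The cache-then-server lookup can be illustrated with a bounded in-memory cache. This sketch swaps the hash array and host trees of the abstract for Python's `OrderedDict`, and replaces the UDP query with an injected `remote_lookup` callable; the class name, parameters, and eviction policy are assumptions.

```python
from collections import OrderedDict
from typing import Callable

class UrlCategoriser:
    """Check a small local cache first; fall back to a remote lookup on a miss."""

    def __init__(self, remote_lookup: Callable[[str], str], max_entries: int = 1000):
        self._remote_lookup = remote_lookup  # stand-in for the server query
        self._max_entries = max_entries
        self._cache = OrderedDict()

    def categorise(self, url: str) -> str:
        if url in self._cache:
            self._cache.move_to_end(url)  # mark as most recently used
            return self._cache[url]
        category = self._remote_lookup(url)
        self._cache[url] = category
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return category

calls = []

def fake_remote(url: str) -> str:
    """Demonstration lookup that records each query and returns a fixed category."""
    calls.append(url)
    return "news"

categoriser = UrlCategoriser(fake_remote, max_entries=2)
categoriser.categorise("a.example")
categoriser.categorise("a.example")  # served from the cache, no second query
print(calls)
```

Keeping the cache small is what lets a low-powered device run the lookup, at the cost of more remote queries.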
TL;DR: In this paper, a method of categorising URLs during internet access is proposed, wherein a specific URL is first checked against a local cache of categorised URLs to see if it is there, if not then a remote server is queried to generate a category for that URL.
Abstract: A method of categorising URLs during internet access, wherein a specific URL is first checked against a local cache of categorised URLs to see if it is there, if not then a remote server is queried to generate a category for that URL. Preferably the categorization of URLs is used in controlling access to the internet. The cache is preferably structured as: a hash array comprising one or more index elements, each index element comprising a host tree pointer and a hash key derived from a stored URL; one or more host trees depending from the index elements, each host tree comprising one or more nodes each holding data (representative of a URL and associated category code) and pointers to a next older and next younger node.
TL;DR: In this paper, a method for automatically blocking spam mail by connecting to a linked URL (Uniform Resource Locator) is provided: the URL information linked in a received e-mail is extracted, the corresponding web page is accessed, and the mail is blocked automatically if a preset spam keyword is present.
Abstract: PURPOSE: A method for automatically blocking spam mail by connecting to a linked URL (Uniform Resource Locator) is provided. The method extracts the URL information linked in a received e-mail, connects to the corresponding web page, and automatically blocks the mail if a preset spam keyword is present. CONSTITUTION: The linked URL and the sender address are extracted from the original message of the received e-mail (S12). It is checked whether the sender address is registered in a blocking list (S14). If the sender address is in the blocking list, the e-mail is removed without reading the body (S34). If the sender address is not in the blocking list, it is checked whether the linked URL is present in a blocking URL list (S16). If the linked URL is in the blocking URL list, reception of the e-mail is refused. If the linked URL is not in the blocking URL list, it is checked whether the linked URL is present in a pass URL list (S18). If the linked URL is in the pass URL list, the e-mail is stored in a mailbox. If the linked URL is not in the pass URL list, the web page is accessed using the linked URL (S20).
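The decision sequence S12 to S20 can be sketched as a single function. The function name, the return labels, and the injected `fetch_page` callable are hypothetical, and storing the mail when no spam keyword is found is an assumption, since the abstract stops at the page access step.

```python
from typing import Callable, Iterable

def filter_email(sender: str, linked_url: str,
                 blocked_senders: set[str], blocked_urls: set[str],
                 pass_urls: set[str], spam_keywords: Iterable[str],
                 fetch_page: Callable[[str], str]) -> str:
    """Walk the list checks in order, then fall back to a keyword scan of the page."""
    if sender in blocked_senders:   # S14: sender is on the blocking list
        return "delete"
    if linked_url in blocked_urls:  # S16: URL is on the blocking URL list
        return "refuse"
    if linked_url in pass_urls:     # S18: URL is on the pass URL list
        return "store"
    page_text = fetch_page(linked_url).lower()  # S20: access the linked page
    if any(keyword.lower() in page_text for keyword in spam_keywords):
        return "block"
    return "store"  # assumption: no keyword found means the mail is kept

result = filter_email(
    sender="someone@mail.example",
    linked_url="http://shop.example/offer",
    blocked_senders=set(), blocked_urls=set(), pass_urls=set(),
    spam_keywords=["casino"],
    fetch_page=lambda url: "Welcome to the best CASINO online",
)
print(result)
```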
TL;DR: In this article, a proxy search server (200) creates a search expression using information from a user terminal (100), a search engine server (300) extracts URL according to the search expression from the search server (200), and if the number of the extracted URL exceeds a predetermined value, the engine server sends information indicating for each URL a hierarchic level to which the URL belongs via the proxy engine server to the terminal for user selection.
Abstract: In a URL retrieval system and a URL retrieval method, a user is not required to assume a keyword for information to be accessed and in which even when many URL are obtained through a search, the user need not select desired URL therefrom. A proxy search server (200) creates a search expression using information from a user terminal (100). A search engine server (300) extracts URL according to the search expression from the search server (200). If the number of the extracted URL exceeds a predetermined value, the search engine server (300) sends information indicating for each URL a hierarchic level to which the URL belongs via the proxy engine server to the terminal for user selection. From the extracted URL, the engine server (300) obtains URL belonging to a hierarchic level selected by the user. If the number of the URL does not exceed a predetermined value, the engine server (300) sends the URL as a retrieval result via the proxy engine server (200) to the user terminal (100) for user selection.