TL;DR: In this paper, a system and method for identifying inappropriate content in websites on a network is presented, where unrecognized uniform resource locators (URLs) or other web content are accessed by workstations and are identified as possibly having malicious content.
Abstract: A system and method are provided for identifying inappropriate content in websites on a network. Unrecognized uniform resource locators (URLs) or other web content are accessed by workstations and are identified as possibly having malicious content. The URLs or web content may be preprocessed within a gateway server module or some other software module to collect additional information related to the URLs. The URLs may be scanned for known attack signatures, and if any are found, they may be tagged as candidate URLs in need of further analysis by a classification module.
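A minimal sketch of the signature-scanning step, assuming signatures are simple regular expressions matched against the URL string. The signature names and patterns below are hypothetical placeholders. The abstract does not say what the attack signatures look like or how the gateway stores them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical signature set for illustration only; a real gateway would load
# signatures from a maintained database rather than hard-coding them.
ATTACK_SIGNATURES = {
    "script_injection": re.compile(r"<script|%3cscript", re.IGNORECASE),
    "directory_traversal": re.compile(r"\.\./|%2e%2e%2f", re.IGNORECASE),
    "executable_download": re.compile(r"\.(?:exe|scr|pif)(?:\?|$)", re.IGNORECASE),
}


@dataclass
class ScanResult:
    url: str
    matched: list = field(default_factory=list)

    @property
    def is_candidate(self) -> bool:
        # Any signature hit tags the URL for the classification module.
        return bool(self.matched)


def scan_url(url: str) -> ScanResult:
    """Check one unrecognized URL against every known signature."""
    hits = [name for name, pattern in ATTACK_SIGNATURES.items() if pattern.search(url)]
    return ScanResult(url, hits)


def candidate_urls(urls):
    """Return scan results only for URLs that need further analysis."""
    return [result for result in map(scan_url, urls) if result.is_candidate]
```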
TL;DR: In this paper, a system and method for removing unnecessary multiple references to a common resource such as redundant listed Uniform Resource Locators (URLs) that reference the same display URLs (and thus the same Web page) as another listed URL is presented.
Abstract: The present invention is directed toward efficiently locating desired information and, more specifically, to providing a system and method for removing unnecessary multiple references to a common resource, such as redundant listed Uniform Resource Locators (URLs) that reference the same display URLs (and thus the same Web page) as another listed URL. Consequently, in circumstances where only a smaller, finite number of listed results are immediately used (such as displaying only the twenty most relevant results on the first page presented to a search engine end-user), the finite number of listed results may correspond to a greater number of unique display URLs than would otherwise occur absent this form of filtering.
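The filtering can be sketched as a single pass over the ranked list, assuming each result already carries its resolved display URL. How listed URLs are resolved to display URLs (for example by following redirects) is not covered here. The function name and the (listed_url, display_url) pair format are illustrative assumptions.

```python
def dedupe_by_display_url(results, limit):
    """Keep the first (most relevant) result for each display URL, up to `limit`.

    `results` is assumed to be a list of (listed_url, display_url) pairs that is
    already sorted from most to least relevant.
    """
    seen = set()
    unique = []
    for listed_url, display_url in results:
        if display_url in seen:
            continue  # redundant: an earlier listed URL already reaches this page
        seen.add(display_url)
        unique.append((listed_url, display_url))
        if len(unique) == limit:
            break
    return unique
```

Because the loop stops once limit unique pages are collected, a first page of twenty results then holds twenty distinct display URLs whenever enough exist.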
TL;DR: In this paper, a program file is received and analysis performed to identify URLs embedded in the program file and the URLs are categorized as a function of a URL filter database and a malware probability is assigned to each URL identified.
Abstract: A system and method of detecting malware. A program file is received and analysis performed to identify URLs embedded in the program file. The URLs are categorized as a function of a URL filter database and a malware probability is assigned to each URL identified. A decision is made on how to dispose of the program file as a function of the malware probability of one or more of the URLs identified. In one example approach, a malware type is also assigned to the program file as a function of one or more of the URLs identified.
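A rough sketch of this pipeline, assuming URLs are recovered by scanning the program bytes for printable http(s) strings and that the URL filter database maps hosts to categories. The database contents, the per-category probabilities, the thresholds, and the block/quarantine/allow actions are all hypothetical. The abstract only says that the disposition depends on the malware probabilities of the identified URLs.

```python
import re
from urllib.parse import urlsplit

# Printable-ASCII run starting with http:// or https://; embedded strings in a
# binary are usually terminated by a NUL or another non-printable byte.
URL_PATTERN = re.compile(rb"https?://[\x21-\x7e]+")

# Hypothetical stand-in for a URL filter database (host -> category).
URL_FILTER_DB = {
    "update.example.com": "software_updates",
    "prizes.example.org": "phishing",
    "cc.example.net": "botnet_command_and_control",
}

# Hypothetical malware probability assigned to each category.
CATEGORY_RISK = {
    "software_updates": 0.05,
    "uncategorized": 0.30,
    "phishing": 0.70,
    "botnet_command_and_control": 0.95,
}


def embedded_urls(program: bytes):
    return [m.decode("ascii") for m in URL_PATTERN.findall(program)]


def categorize(url: str) -> str:
    host = urlsplit(url).hostname or ""
    return URL_FILTER_DB.get(host, "uncategorized")


def dispose(program: bytes, block_at=0.8, quarantine_at=0.5):
    """Return (action, malware_type) based on the riskiest embedded URL."""
    scored = [(CATEGORY_RISK[categorize(u)], categorize(u)) for u in embedded_urls(program)]
    if not scored:
        return "allow", None
    risk, category = max(scored)
    if risk >= block_at:
        return "block", category  # malware type taken from the riskiest URL's category
    if risk >= quarantine_at:
        return "quarantine", category
    return "allow", None
```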
TL;DR: In this article, a system and architecture for enhancing search results generated by an Internet search engine, so that those search results include enhanced buyer-oriented information, is disclosed, where a user who submitted query terms, based upon which the list of search results was generated, can use the presented rating information to determine which of the search results to investigate further.
Abstract: A system and architecture for enhancing search results generated by an Internet search engine, so that those search results include enhanced buyer-oriented information, is disclosed. According to one aspect, a list of search results generated by an Internet search engine comprises one or more search results that are associated with one or more URLs in a set of URLs. For each such URL, seller-specific information, which may be based on and/or comprise ratings that are associated with registered selling entities that are associated with that URL, is presented in association with that URL's corresponding search result in the list of search results. A user who submitted query terms, based upon which the list of search results was generated, can use the presented rating information to determine which of the search results to investigate further.
TL;DR: In this article, an information extraction method is used to generate data structures that represent the content or structure of each of the web pages associated with a web site, and these data structures are appended to the corresponding dynamic URLs.
Abstract: Techniques are described for normalizing dynamic URLs using a hierarchical organization of a web site. Given web pages associated with a web site, an information extraction method is used to generate data structures that represent the content or structure of each of the web pages. These data structures are appended to the corresponding dynamic URLs. The modified URLs with the data structures are tokenized with the resulting tokens clustered to create a hierarchical organization. Nodes of the hierarchical organization may be merged based upon occurrence or patterns of content and structure. The merged hierarchical organization may then be pruned to remove irrelevant information and to reduce the memory footprint of the hierarchical organization. When a new dynamic URL is received, the new dynamic URL is matched to the hierarchical organization. Important parameters are taken into account and irrelevant information may be removed. Based upon the matching to the hierarchical organization, a normalized URL is returned.
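The full method clusters tokenized URLs into a hierarchy and then merges and prunes its nodes. The sketch below swaps in a much simpler technique for illustration: it learns which query parameters never change page content, compared through a content signature that stands in for the extracted data structures, and drops those parameters during normalization. Judging parameters one at a time across the whole sample set is an assumption, not the abstract's hierarchical matching.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit


def _key(url, drop=frozenset()):
    """(host, path, sorted params) with the parameters in `drop` removed."""
    parts = urlsplit(url)
    params = tuple(sorted((k, v) for k, v in parse_qsl(parts.query) if k not in drop))
    return parts.netloc, parts.path, params


def learn_irrelevant_params(samples):
    """samples: (url, content_signature) pairs, e.g. a hash of the extracted page structure.

    A parameter is treated as irrelevant if removing it never merges two URLs
    whose pages have different signatures.
    """
    names = {k for url, _ in samples for k, _ in parse_qsl(urlsplit(url).query)}
    irrelevant = set()
    for name in names:
        groups = defaultdict(set)
        for url, signature in samples:
            groups[_key(url, drop={name})].add(signature)
        if all(len(sigs) == 1 for sigs in groups.values()):
            irrelevant.add(name)
    return irrelevant


def normalize(url, irrelevant):
    """Drop irrelevant parameters and put the rest in a canonical order."""
    netloc, path, params = _key(url, drop=irrelevant)
    query = urlencode(params)
    return f"{urlsplit(url).scheme}://{netloc}{path}" + (f"?{query}" if query else "")
```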
TL;DR: In this article, a system and method for searching networked electronic media is provided, in which a user-selected keyword is mapped to a category identifier for search result generation in a search engine.
Abstract: A system and method are provided for searching networked electronic media. The method includes the operation of mapping a user-selected keyword to a category identifier for search result generation in a search engine. A search result can be produced that contains a listing of URLs for the category identifier. Then the search result can be dynamically populated with images through identity mapping of the image to a URL. The search can determine which images to display for a URL based on a user's profile, such as age, and on the URL's popularity with other users sharing a similar user profile. The URLs and an associated image for the search can be obtained through a community of users.
TL;DR: In this paper, a technique for keyword extraction from URLs using regular expression patterns and keyword ranking has been described, where the keywords extracted from the URLs are then ranked based on any ranking methodology for better relevance and performance.
Abstract: Techniques are described for keyword extraction from URLs using regular expression patterns and keyword ranking. Tokenization of URLs also generates regular expressions of URLs from a website. The regular expressions are stored in the form of any type of indexing structure. When a new URL is received, the URL is examined to determine whether the URL is from a website that has previously been tokenized. If the URL is not from such a website, then the URL is tokenized using every delimiter and unit change to extract keywords. If the URL is from a website previously processed, the corresponding regular expression is used to extract keywords from the URL. The keywords extracted from the URLs are then ranked based on any ranking methodology for better relevance and performance.
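A sketch of the tokenization fallback for a website that has not been seen before, assuming a "unit change" means a switch between letters and digits. The stop-token list and the frequency-based ranking are hypothetical choices, since the abstract allows any ranking methodology. The per-site regular-expression cache is omitted.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Letters and digits are separate units, so "running2024" splits into
# "running" and "2024"; every other character acts as a delimiter.
_TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")

# Hypothetical tokens with no keyword value.
STOP_TOKENS = frozenset({"com", "htm", "html", "http", "https", "index", "net", "org", "php", "www"})


def extract_keywords(url):
    parts = urlsplit(url)
    text = " ".join((parts.netloc, parts.path, parts.query))
    tokens = [t.lower() for t in _TOKEN.findall(text)]
    return [t for t in tokens if t not in STOP_TOKENS and not t.isdigit()]


def rank_keywords(urls):
    """Rank keywords by how often they occur across `urls` (an assumed ranking)."""
    counts = Counter(k for url in urls for k in extract_keywords(url))
    return [keyword for keyword, _ in counts.most_common()]
```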
TL;DR: In this paper, the authors proposed a method for generating overlay data to supplement search results obtained from an internet search for a query provided by a user; the method includes accessing a uniform resource locator (URL) database containing processed URLs.
Abstract: Methods and systems for generating overlay data to supplement search results obtained as a result of an internet search for a query provided by a user. The method includes accessing a uniform resource locator (URL) database having URLs that are processed. The URL database has information regarding the number of times a URL in the URL database has been bookmarked and any descriptive tags assigned to specific URLs in the URL database. Next, the query provided by the user is received, and it generates search results, each of which is associated with a URL. The method further includes, before displaying the search results, analyzing each URL of a plurality of the search results to identify whether the URL is present in the accessed URL database, and applying overlay data to particular ones of the search results. The overlay data includes information regarding the number of times the URL has been bookmarked and includes particular descriptive tags from the URL database. In one embodiment, a detailed sub-query that includes the original query and the overlay descriptive tag is associated with each overlay descriptive tag.
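A minimal sketch of the overlay step, assuming the URL database is an in-memory mapping from URL to a bookmark count and an ordered tag list. The field names and the limit of three tags per result are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BookmarkInfo:
    count: int   # number of times the URL has been bookmarked
    tags: list   # descriptive tags, most frequent first (assumed ordering)


def apply_overlays(query, result_urls, url_db, max_tags=3):
    """Attach bookmark overlays to search results before they are displayed."""
    overlaid = []
    for url in result_urls:
        info = url_db.get(url)
        if info is None:
            overlaid.append({"url": url})  # not in the URL database: no overlay
            continue
        tags = info.tags[:max_tags]
        overlaid.append({
            "url": url,
            "bookmarks": info.count,
            "tags": tags,
            # Each overlay tag carries a sub-query: the original query plus the tag.
            "tag_queries": {tag: f"{query} {tag}" for tag in tags},
        })
    return overlaid
```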
TL;DR: In this article, a method, system and computer-usable medium are disclosed for managing the user interface (UI) state of an AJAX application by automatically binding a uniform resource locator (URL) to an application code component.
Abstract: A method, system and computer-usable medium are disclosed for managing the user interface (UI) state of an AJAX application by automatically binding a uniform resource locator (URL) to an application code component. The metadata for the controller functions contained in an AJAX Web page is read as the page is loaded. Once the page is loaded, its URL is monitored for changes in value. If the URL's value changes, the value of the ‘action’ property of the changed URL is compared to the application metadata for validation. If the ‘action’ property of the changed URL does not exist in the application metadata, the changed URL is considered invalid and its associated actions are ignored. If the ‘action’ property is valid, the function specified by the ‘action’ request parameter is called. A single object parameter is sent, with the properties of the single object parameter derived from the request parameters other than ‘action’. The function is executed and the page is updated to display the value of the object property.
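The validation and dispatch logic can be sketched outside the browser as follows. This is a Python analogue, not the patented JavaScript implementation: it assumes the UI state is carried in the URL fragment (for example #action=show_item&id=42) and that registering a controller is equivalent to reading its metadata when the page loads. The class and method names are hypothetical.

```python
from types import SimpleNamespace
from urllib.parse import parse_qsl, urlsplit


class UrlStateRouter:
    """Binds URL state to controller functions (a Python analogue of browser-side logic)."""

    def __init__(self):
        self.metadata = {}        # valid 'action' names -> controller function
        self.current_url = None

    def controller(self, fn):
        # Registration stands in for reading controller metadata at page load.
        self.metadata[fn.__name__] = fn
        return fn

    def on_url_change(self, url):
        if url == self.current_url:
            return None           # value unchanged: nothing to do
        self.current_url = url
        # Assumption: state lives in the fragment, e.g. "#action=show_item&id=7".
        params = dict(parse_qsl(urlsplit(url).fragment))
        fn = self.metadata.get(params.pop("action", None))
        if fn is None:
            return None           # 'action' not in metadata: URL invalid, ignore it
        # The remaining request parameters become properties of a single object.
        return fn(SimpleNamespace(**params))
```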
TL;DR: In this paper, the authors present a process for visually emphasizing the displayed URLs in query results based on implicit relevance feedback, which detects click-through by matching the actual URL in an HTTP request emanating from a browser to an actual URL for a stored URL.
Abstract: An example embodiment of the present invention provides processes for visually emphasizing the displayed URLs in query results based on implicit relevance feedback. In one process, the process identifies a web page which includes results returned by a search engine. Each result might include a displayed URL and an actual URL. The process determines whether the displayed URL matches any stored URLs which were included in previous results returned by the search engine and clicked through by the user. The process detects a click-through by matching the actual URL in an HTTP request emanating from a browser to an actual URL for a stored URL. The process visually emphasizes the displayed URL when presenting the web page to the user, if the displayed URL does not match any stored URL which has been clicked through and other factors indicate a probability the user will click through the displayed URL.
TL;DR: In this article, a behavioral targeting technology for online advertising is described, by which an original attribute is uniformly expanded by aggregating users that meet the original attribute into a mid-result used to determine similarity relative to candidate attribute types.
Abstract: Described is a behavioral targeting technology for online advertising, by which an original attribute is uniformly expanded. Users that meet an original attribute are aggregated into a mid-result used to determine similarity relative to candidate attribute types. The most similar candidate attributes are selected for the expanded attribute. A URL/URL pattern suggestion technology is provided, with similarity computed from users/URLs visited by the users. URLs are separated into URL tree nodes for calculating the number of users who have visited each URL and the number of users who have visited any URL in the sub-tree rooted at that node. URL/URL patterns are output based on similarity. Domains are also suggested based on user visits. Similarities between pairs of domains may be computed (e.g., offline), with the output for a given domain provided based on its similarity with each other domain.
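The URL-tree counting step might look like the sketch below, assuming each URL is split into its host followed by successive path prefixes. It counts, per node, the distinct users who visited exactly that URL and the distinct users who visited anything in the subtree rooted there. The similarity computation and the suggestion output are not shown, and the function names are illustrative.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def url_tree_nodes(url):
    """Split a URL into tree nodes: the host, then each successive path prefix."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return [parts.netloc] + [
        parts.netloc + "/" + "/".join(segments[:i]) for i in range(1, len(segments) + 1)
    ]


def count_users(visits):
    """visits: iterable of (user_id, url) pairs.

    Returns two dicts keyed by tree node: distinct users who visited exactly that
    URL, and distinct users who visited any URL in the subtree rooted at it.
    """
    exact, subtree = defaultdict(set), defaultdict(set)
    for user, url in visits:
        nodes = url_tree_nodes(url)
        exact[nodes[-1]].add(user)
        for node in nodes:
            subtree[node].add(user)
    return ({n: len(u) for n, u in exact.items()},
            {n: len(u) for n, u in subtree.items()})
```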
TL;DR: In this paper, a method for downloading HTML-formatted Web pages is provided, which includes the steps of writing the URL of a Web page to be downloaded to an XQuery script; analyzing the XQuery script to obtain the URL of the HTML Web page and saving the downloaded Web page in a database as the local Web page; analyzing the contents of the local Web page to obtain target contents; converting the relative URLs of all image files to absolute URLs; and downloading all the image files according to the absolute URLs.
Abstract: A method for downloading HTML-formatted Web pages is provided. The method includes the steps of writing a URL of a Web page to be downloaded to an XQuery script; analyzing the XQuery script to obtain the URL of the HTML Web page and saving the downloaded Web page in a database as the local Web page; analyzing the contents of the local Web page to obtain target contents; converting the relative URLs of all image files to absolute URLs; downloading all the image files according to the absolute URLs; replacing the absolute URLs of the image files with a local image file path; converting the relative URLs of the embedded links to absolute URLs; saving all the converted absolute URLs in the database and creating identifiers; and replacing the converted absolute URLs of the embedded links with an embedded-link local path. A related system is also disclosed.
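A sketch of the URL-conversion steps using only the Python standard library, assuming images come from img src attributes and embedded links from a href attributes. The local-path scheme (image directory plus file name) is a hypothetical choice, and the XQuery, download, and database steps are left out.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images, self.links = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def plan_localization(html, page_url, image_dir="images"):
    """Map each image src to (absolute URL, local path) and each link href to its absolute URL."""
    collector = _LinkCollector()
    collector.feed(html)
    images = {}
    for src in collector.images:
        absolute = urljoin(page_url, src)  # relative -> absolute against the page URL
        filename = posixpath.basename(urlsplit(absolute).path)
        images[src] = (absolute, posixpath.join(image_dir, filename))
    links = {href: urljoin(page_url, href) for href in collector.links}
    return images, links
```

Using only the file name can collide when two images share a name, so a real implementation would need a collision-free naming rule.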
TL;DR: In this article, a method for generating a Uniform Resource Locator (URL) is described, in which the URL is generated from the content of a web page and includes one or more tokens.
Abstract: A method for generating a Uniform Resource Locator (URL) is described. Content associated with a web page is obtained. A URL is generated based on the content of the web page. The URL includes one or more tokens. The URL is limited to a token threshold. The token threshold is defined as a maximum number of words in the URL. One or more tokens are removed from the URL. The URL is associated with the web page.
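One way to read the token-threshold rule is sketched below. The abstract does not say which tokens are removed; this sketch assumes stop words are dropped first and any tokens beyond the threshold are then cut from the end. The stop-word list and the hyphen-joined path format are hypothetical.

```python
import re

# Hypothetical stop-word list; removing these first keeps the more informative tokens.
STOP_WORDS = frozenset({"a", "an", "and", "how", "in", "of", "the", "to"})


def generate_url(base_url, content, token_threshold=5):
    """Build a URL from page content, keeping at most `token_threshold` tokens."""
    words = re.findall(r"[a-z0-9]+", content.lower())
    tokens = [w for w in words if w not in STOP_WORDS]
    tokens = tokens[:token_threshold]  # remove tokens beyond the threshold
    return base_url.rstrip("/") + "/" + "-".join(tokens)
```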
TL;DR: In this paper, when a large number of URL access permission requests are made from one web page whose access has already been permitted, a similarity decision is made so that fewer access requests must be checked and the page can be displayed quickly.
Abstract: PROBLEM TO BE SOLVED: To solve the problem that, when a PC on which Web filtering by a URL list is installed accesses a page that uses Ajax or the like, a huge number of URL access permission requests are generated and the display of the page is greatly delayed. SOLUTION: When a large number of URL access permission requests are made from one Web page whose access has already been permitted, the following similarity decision is made, reducing the number of access requests that must be judged so that the page is displayed quickly. URL request transmission is monitored, and when it is detected that a large number of requests have been transmitted, the similarity between the URLs previously requested from the same source page and the currently requested URL is determined. The access permission decision logic is bypassed for URLs judged to be similar. COPYRIGHT: (C)2008,JPO&INPIT
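A simplified sketch of the bypass logic, assuming a burst is detected by counting requests per source page and that two URLs are similar when they share a host and directory. Both the burst threshold and the similarity rule are hypothetical, since the abstract defines neither, and a real implementation would also reset the count over a time window.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


class BurstAwareFilter:
    def __init__(self, allowed_hosts, burst_threshold=20):
        self.allowed_hosts = set(allowed_hosts)   # stands in for the URL list (hypothetical)
        self.burst_threshold = burst_threshold
        self.request_counts = defaultdict(int)    # source page -> requests seen so far
        self.permitted_shapes = defaultdict(set)  # source page -> (host, dir) already permitted
        self.full_checks = 0                      # how often the full decision logic ran

    @staticmethod
    def _shape(url):
        parts = urlsplit(url)
        return parts.hostname, posixpath.dirname(parts.path)

    def permit(self, source_page, url):
        self.request_counts[source_page] += 1
        shape = self._shape(url)
        in_burst = self.request_counts[source_page] > self.burst_threshold
        if in_burst and shape in self.permitted_shapes[source_page]:
            return True  # similar to an earlier permitted URL: bypass the list check
        self.full_checks += 1
        allowed = urlsplit(url).hostname in self.allowed_hosts
        if allowed:
            self.permitted_shapes[source_page].add(shape)
        return allowed
```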
TL;DR: In this paper, a method and system for improving uniform resource locator (URL) mangling performance using fast re-write is presented, which includes scanning a web page, detecting an absolute URL in the web page, and modifying the detected absolute URL to a corresponding relative URL.
Abstract: Method and system for providing improved uniform resource locator (URL) mangling performance using fast re-write including scanning a web page, detecting an absolute URL in the web page, and modifying the detected absolute URL to a corresponding relative URL in the web page, is disclosed.
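A minimal sketch of the rewrite, assuming only href, src, and action attribute values are scanned and that absolute URLs on the site's own origin become root-relative references. Regex-based rewriting is a shortcut for illustration; the abstract does not describe how the page is scanned.

```python
import re


def make_relative(html, site_origin):
    """Rewrite href/src/action values that point at `site_origin` into root-relative URLs."""
    origin = re.escape(site_origin.rstrip("/"))
    pattern = re.compile(
        r"""((?:href|src|action)\s*=\s*["'])"""  # attribute name and opening quote
        + origin
        + r"""(?:/|(?=["'?#]))""",               # consume "/" or stop before quote/query/fragment
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(1) + "/", html)
```

The lookahead keeps look-alike hosts such as example.com.evil.net from being rewritten.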
TL;DR: In this article, a pre-emptive URL filtering technique is proposed that reduces the number of HTTP connections the browser has to make in situations where there is a blocked URL in the original URL set.
Abstract: Disclosed is a technique for pre-emptive URL filtering. A filtering engine may be configured to receive an original set of URLs from a web server along with a main content, the original set of URLs and the main content being intended for a web browser running in a client computer. The filtering engine may be running in a gateway. The filtering engine may check the original set of URLs for blocked URLs. The filtering engine may create a reconstructed set of URLs that suppresses blocked URLs in the original set of URLs. The filtering engine may send the client computer the reconstructed, instead of the original, set of URLs. This advantageously cuts down on the number of HTTP connections that have to be made by the browser, and corresponding URL filtering at the gateway, in situations where there is a blocked URL in the original URL set.
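The reconstruction step can be sketched as a partition of the original URL set, assuming the block list is a set of hostnames that also covers their subdomains. The function name and the block-list format are illustrative assumptions.

```python
from urllib.parse import urlsplit


def reconstruct_url_set(original_urls, blocked_hosts):
    """Split the original URL set into (reconstructed set, suppressed URLs), preserving order."""
    allowed, suppressed = [], []
    for url in original_urls:
        host = (urlsplit(url).hostname or "").lower()
        # A URL is blocked if its host is listed or is a subdomain of a listed host.
        blocked = any(host == b or host.endswith("." + b) for b in blocked_hosts)
        (suppressed if blocked else allowed).append(url)
    return allowed, suppressed
```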
TL;DR: In this article, the authors describe computer-readable media, systems, and methods for augmenting URL queries, in which an augmented query is created by word-breaking at least a portion of the URL query and is associated with one or more ranking preferences.
Abstract: Computer-readable media, systems, and methods for augmenting URL queries are described. In embodiments, a URL query is received from a user and it is determined whether the URL query is a simple URL query. Further, if the URL query is a simple URL query, an augmented query is created by word-breaking at least a portion of the URL query and the augmented query is associated with one or more ranking preferences. In various other embodiments, a URL query is received from a user and it is determined whether the URL query is a complex URL query. Further, if the URL query is a complex URL query, an augmented query is created that is identical to the URL query and the augmented query is associated with one or more ranking preferences.
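A sketch under stated assumptions: a simple URL query is taken to be a bare hostname such as weatherchannel.com, word-breaking uses a dynamic-programming split over a supplied vocabulary that prefers the fewest words, and the ranking preferences are placeholder labels. None of these rules are specified in the abstract.

```python
def word_break(text, vocabulary):
    """Split `text` into the fewest vocabulary words, or return None if impossible."""
    best = [None] * (len(text) + 1)  # best[i]: shortest split of text[:i]
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


def is_simple_url_query(query):
    # Hypothetical rule: a bare host with no scheme, path, query string, or spaces.
    return "." in query and not any(c in query for c in "/?: ")


def augment_query(query, vocabulary):
    if is_simple_url_query(query):
        label = query.lower()
        if label.startswith("www."):
            label = label[4:]
        label = label.split(".")[0]
        words = word_break(label, vocabulary) or [label]
        return {"terms": [query] + words, "ranking_preference": "boost_matching_site"}
    # Treated as a complex URL query: the augmented query is identical to the original.
    return {"terms": [query], "ranking_preference": "boost_exact_url"}
```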
TL;DR: In this article, a method for setting a URL filter to reduce the workload of a user when registering/editing a white-list in the URL filtering of a white list system is presented.
Abstract: PROBLEM TO BE SOLVED: To provide a method for setting a URL filter that reduces the workload of a user when registering or editing a white list for URL filtering in a white list system. SOLUTION: The method for setting a URL filter includes the following steps, executed by a representative client: a filter-setting release instruction step of instructing a server device to release the URL filter setting for the representative client itself; a URL information generation step of acquiring, from the Internet accessed through the server device, the URL information of a Web page designated by a user, and generating, based on the acquired URL information, additional URL information to be added to the white list; and a URL information addition instruction step of instructing the server device to add the additional URL information to the white list. COPYRIGHT: (C)2009,JPO&INPIT
TL;DR: In this paper, a URL management system is characterized in that it has table data including an access URL for accessing data stored externally and a present URL showing the present location of the data and associated with the access URL.
Abstract: PROBLEM TO BE SOLVED: To provide a URL management device and a URL management system that allow access to data whose location has changed and that prevent broken links even when the current URL, which shows the current location of externally stored data, changes because the data has moved. SOLUTION: A URL management system is characterized in that it has table data including an access URL for accessing externally stored data and a present URL that shows the present location of the data and is associated with the access URL; that when an access request signal containing the access URL is received from outside, the present URL corresponding to the access URL is obtained from the table data to check the link; and that when the present URL is not accessible, a new URL showing the changed location of the data is retrieved and a connection is made to the new URL. COPYRIGHT: (C)2010,JPO&INPIT
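A minimal sketch of the table lookup, assuming the reachability check and the search for a moved resource are supplied as callables, since the abstract does not say how either is done. Updating the table after a successful lookup is an added assumption.

```python
class UrlManager:
    def __init__(self, is_reachable, locate_moved):
        self.table = {}                   # access URL -> present URL
        self.is_reachable = is_reachable  # hypothetical link check: url -> bool
        self.locate_moved = locate_moved  # hypothetical search: old url -> new url or None

    def register(self, access_url, present_url):
        self.table[access_url] = present_url

    def resolve(self, access_url):
        """Return the URL to connect to for an access request, or None if the data is lost."""
        present = self.table[access_url]
        if self.is_reachable(present):
            return present
        new_url = self.locate_moved(present)
        if new_url is not None:
            self.table[access_url] = new_url  # remember the new location for later requests
        return new_url
```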
TL;DR: A method that automatically blocks spam mail by connecting to the URLs linked in each message and blocking the message if those web pages contain any keyword defined as a clue to spam mail.
Abstract: In this paper, I developed a method whereby spam mail is automatically blocked by connecting to the URLs linked in the mail. The blocking system works as follows. First, the system extracts the URLs linked in an electronic mail delivered from any server on the Internet. Next, the system connects to the web pages at these URLs. Last, the system blocks the electronic mail if those web pages contain any keyword defined as a clue to spam mail.
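The three steps can be sketched as follows, with the page fetcher passed in so the logic runs without network access. The URL regular expression and the treatment of unreachable links (skipped rather than counted as spam) are assumptions.

```python
import re
from typing import Callable, Iterable

# Rough pattern for links in a plain-text body; stops at whitespace, quotes, and angle brackets.
URL_IN_TEXT = re.compile(r"""https?://[^\s"'<>]+""")


def is_spam(body: str, fetch_page: Callable[[str], str], clue_keywords: Iterable[str]) -> bool:
    clues = [k.lower() for k in clue_keywords]
    for url in URL_IN_TEXT.findall(body):        # step 1: extract linked URLs
        try:
            page = fetch_page(url)                # step 2: connect to the page
        except OSError:
            continue                              # unreachable: no evidence either way
        text = page.lower()
        if any(clue in text for clue in clues):   # step 3: keyword check
            return True
    return False
```

In deployment, fetch_page could wrap urllib.request.urlopen and decode the response body.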
TL;DR: In this article, a method of generating a web page modifies uniform resource locators (URLs) of embedded resources in web pages, including data prepended to information from the original URLs.
Abstract: A method of generating a web page modifies uniform resource locators (URLs) of embedded resources in a web page. The modified URLs include data prepended to information from the original URLs. The prepended data may be a hostname or a network address that is resolvable to a shared network of servers.
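One possible shape of the rewrite is sketched below, assuming the shared network's hostname is placed first and the original host, path, and query follow as the path. The edge hostname is hypothetical, and the abstract does not fix the exact layout of the prepended data.

```python
from urllib.parse import urljoin, urlsplit


def rewrite_resource_url(page_url, resource_url, edge_host="edge.cdn.example"):
    """Prepend a shared-network hostname to the original host, path, and query."""
    absolute = urljoin(page_url, resource_url)  # resolve relative references first
    parts = urlsplit(absolute)
    original = parts.netloc + parts.path + (f"?{parts.query}" if parts.query else "")
    return f"{parts.scheme}://{edge_host}/{original}"
```

Keeping the original host inside the path lets a server in the shared network recover where to fetch the resource from.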