Patent
Web crawler system based on grid computing, and method thereof
Song Ji Hwan,Choi Dong Hoon,Lee Yoon Joon +2 more
- 26 Dec 2008
6
TL;DR: In this article, a grid computing based web crawler system and a method thereof are presented that select the grid computing resource with the lowest cost by considering the geographical position of a web page.
Abstract: A grid computing based web crawler system and a method thereof are presented to select the grid computing resource with the lowest cost by considering the geographical position of a web page. According to the grid computing based web crawling method, when the web page is a surface web page, a surface web crawler service instance is dynamically generated by calling a surface web crawler service factory (151) to perform surface web crawling of the web page, and an index of the web page is then generated. When the web page is a deep web page, a deep web crawler service instance is dynamically generated by calling a deep web crawler service factory (152) to search for a deep web search form in the web page, and the deep web search form is then extracted by the deep web crawler service instance. A result page is generated by inputting a query to the deep web search form, and an index of the result page is generated by extracting its keywords and returned to the caller.
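The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `GridResource`, `select_resource`, `surface_crawler_factory`, `deep_crawler_factory`, and `crawl_page` are hypothetical, the cost of a resource is assumed to be its great-circle distance to the page's location (the abstract does not define the cost function), a page is assumed to be "deep" when it carries a search form, and `submit_query` is a caller-supplied stand-in for submitting the form.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GridResource:
    """Hypothetical grid node with a geographic position."""
    name: str
    lat: float
    lon: float


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres."""
    radius = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def select_resource(resources, page_lat, page_lon):
    """Pick the lowest-cost resource; cost is ASSUMED to be distance to the page."""
    return min(
        resources,
        key=lambda res: distance_km(res.lat, res.lon, page_lat, page_lon),
    )


def keyword_index(text):
    """Toy index: the sorted set of lower-cased words."""
    return sorted(set(text.lower().split()))


def surface_crawler_factory():
    """Stand-in for factory (151): creates a crawler that indexes the page text."""
    def crawl(page):
        return {"kind": "surface", "index": keyword_index(page["text"])}
    return crawl


def deep_crawler_factory(submit_query):
    """Stand-in for factory (152): creates a crawler that extracts the search
    form, submits a query to it, and indexes the result page."""
    def crawl(page, query):
        form = page["search_form"]               # extract the search form
        result_text = submit_query(form, query)  # generate the result page
        return {"kind": "deep", "form": form, "index": keyword_index(result_text)}
    return crawl


def crawl_page(page, query, submit_query):
    """Create the matching crawler instance on demand and return its index."""
    if page.get("search_form") is None:
        return surface_crawler_factory()(page)
    return deep_crawler_factory(submit_query)(page, query)
```

In a real grid deployment the selected resource would host the crawler service instance; here `select_resource` and `crawl_page` are kept separate so that the cost model can be replaced without touching the crawling logic.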
Citations
Patent
Deep web space data acquisition method and apparatus
Liu Jiping,An Luo,Wang Yong,Cai Di +3 more
- 09 Dec 2015
TL;DR: In this article, a deep web space data acquisition method and apparatus are proposed. The method includes constructing a distributed system infrastructure, constructing a Web request pool, dynamically calculating the task quantity and elastically allocating tasks to acquisition engines deployed in the distributed system, acquiring deep web data of text space by the acquisition engines according to the allocated acquisition tasks and based on an asynchronous I/O model, and storing the acquired deep web data in a data warehouse of the distributed system.
5
Patent
Webpage presenting method and webpage presenting device
Li Da
- 24 Jul 2013
TL;DR: In this paper, the authors present a webpage presenting method and a web presenting device based on a grid-type manner. But they do not provide a detailed description of the presentation.
2
Patent
Crawling method and system for collecting deep web data complete set
Li Huan,Sun Yang,Zhou Weibin,Wu Jiang,Zhang Yuanming +4 more
- 27 Apr 2016
TL;DR: In this article, the authors propose a crawling method and system for collecting a deep web data complete set. The method comprises performing a deep web search according to a key word and obtaining a query result; if the query result overflows, the query result is segmented to obtain a feature word set, and each feature word in the feature word set is combined with the key word from the last search to get a plurality of new key words.
1
Patent
Method apparatus and computer program for collating data in multi domain
Suh Sang Duk,Yoon Chang Hoon,Lee Seung Hyeon +2 more
- 24 Aug 2020
TL;DR: In this paper, the authors propose a method for collecting data from multiple domains by a data collection apparatus that includes at least one TOR node container. The method comprises a step A of collecting domain information of dark web sites by using a distribution crawler, and a step B of formatting the collected data according to a preset format and generating metadata for the collected data.
1
Patent
Deep web analysis system and method using browser simulator
Nam Kihyo,Jeong Munkweon,Ahn Sangkyu,An Seongho,Lee Heewoong +4 more
- 03 Jun 2020
TL;DR: In this article, a deep web analysis system using browser simulation and an analysis method thereof is presented, where malicious information such as malware, pornography, firearms, drugs, copy card transactions, etc. are shared and distributed in advance.
Related Papers (5)
Xiang Peisu,Tian Ke,Huang Qin-zhen +2 more
- 16 Jul 2008
Guo Weigang,Yong Zhong,Jianqin Xie +2 more
- 26 Aug 2012
Manish Kumar,Rajesh Bhatia +1 more
- 03 Mar 2016