Crawling deep web content through query forms

Open Access•Proceedings Article

- 23 Aug 2018

pp 629-637

18

TL;DR: This paper proposes the concept of a Minimum Executable Pattern (MEP), then presents an MEP generation method and an MEP-based adaptive query method for the Deep Web that outperforms existing methods in query capability and applicability.

Citations

Journal Article•10.4018/JITWE.2011010103

A Novel Architecture for Deep Web Crawler

Dilip Kumar Sharma, +1 more

- 01 Jan 2011

- International Journal of Information Tec...

TL;DR: To minimize the limitations of existing deep Web crawlers, a novel architecture is proposed based on the QIIIEP specifications (Sharma & Sharma, 2009); it is cost-effective and supports both privatized search and general search for deep Web data hidden behind HTML forms.

30

Proceedings Article•10.1109/SERVICES.2019.00016

A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence

Paris Koloveas, +3 more

- 08 Jul 2019

TL;DR: This work focuses on the information gathering task, and presents a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web.

28

Proceedings Article•10.1109/WI-IAT.2009.96

Learning Deep Web Crawling with Diverse Features

Lu Jiang, +3 more

- 15 Sep 2009

TL;DR: This paper proposes a novel Deep Web crawling method that drops the plain-text search assumption made by existing methods and outperforms state-of-the-art methods.

25

Proceedings Article•10.1109/SERVICES.2019.00016

A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence

Paris Koloveas, +3 more

- 14 Sep 2021

- arXiv: Cryptography and Security

TL;DR: In this paper, the authors focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web.

18

Reference Book•10.1201/B19694

Big Data: Storage, Sharing, and Security

Fei Hu

- 03 May 2016

17

References

Book

The Fractal Geometry of Nature

Benoit B. Mandelbrot

- 01 Jan 1982

TL;DR: This book is a blend of erudition, popularization, and exposition, and the illustrations include many superb examples of computer graphics that are works of art in their own right.

26.1K

Book Chapter•10.1016/B978-155860869-6/50042-1

Distributed search over the hidden web: hierarchical database sampling and selection

Panagiotis G. Ipeirotis, +1 more

- 20 Aug 2002

TL;DR: In this paper, the authors present an algorithm to derive content summaries from "uncooperative" databases by using focused query probes, which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases.

213

Journal Article

Siphoning Hidden-Web Data through Keyword-Based Interfaces

Luciano Barbosa, +1 more

- 27 May 2010

- Journal of Information and Data Manageme...

Abstract: In this paper, we study the problem of automating the retrieval of data hidden behind simple search interfaces that accept keyword-based queries. Our goal is to automatically retrieve all available results (or as many as possible). We propose a new approach to siphoning hidden data that automatically generates a small set of representative keywords and builds queries that lead to high coverage. We evaluate our algorithms over several real Web sites. Preliminary results indicate that our approach is effective: coverage of over 90% is obtained for most of the sites considered.

163

Proceedings Article

Searching for Hidden-Web Databases

Luciano Barbosa, +1 more

- 01 Jan 2005

TL;DR: A new crawling strategy to automatically locate hidden-Web databases is proposed, which aims to balance the two conflicting requirements of this problem: the need to perform a broad search while at the same time avoiding the need to crawl a large number of irrelevant pages.

149

Journal Article•10.1145/1132863.1132872

Automatic complex schema matching across Web query interfaces: A correlation mining approach

Bin He, +1 more

- 01 Mar 2006

- ACM Transactions on Database Systems

TL;DR: This article develops the DCM framework, which consists of data preprocessing, dual mining of positive and negative correlations, and finally matching construction, and develops a novel “ensemble” approach, which creates an ensemble of DCM matchers by randomizing the schema data into many trials and aggregating their ranked results by taking majority voting.

125