Open Access • Journal Article
Siphoning Hidden-Web Data through Keyword-Based Interfaces
Luciano Barbosa,Juliana Freire +1 more
Citations
Web Crawling
Christopher Olston,Marc Najork +1 more
TL;DR: The fundamental challenges of web crawling are outlined, the state-of-the-art models and solutions are described, and avenues for future work are highlighted.
420
An adaptive crawler for locating hidden-Web entry points
Luciano Barbosa,Juliana Freire +1 more
- 08 May 2007
TL;DR: A new framework is proposed whereby crawlers automatically learn patterns of promising links and adapt their focus as the crawl progresses, thus greatly reducing the amount of required manual setup and tuning.
Structured data on the web
TL;DR: Fusion Tables is described, a recently launched data-management service that lets users create and visualize structured data easily and emphasizes the ability to collaborate with other data owners.
119
•Proceedings Article
Harnessing the Deep Web: present and future
Jayant Madhavan,Loredana Afanasiev,Lyublena Antova,Alon Halevy +3 more
- 01 Jan 2009
TL;DR: This paper reports some of the key observations in building the system that exposed content from the Deep Web to web-search users of Google.com and discusses the choice of underlying approach in exposing deep-web content in a search engine.
•Posted Content
Harnessing the Deep Web: Present and Future
TL;DR: In this article, the authors report on where they believe the Deep Web provides value and where it does not, and contrast two very different approaches to exposing Deep-Web content: the surfacing approach that they used, and the virtual integration approach that has been pursued in the data management literature.
96
References
Focused crawling: a new approach to topic-specific Web resource discovery
Soumen Chakrabarti,Martin van den Berg,Byron Dom +2 more
- 17 May 1999
TL;DR: A new hypertext resource discovery system called a Focused Crawler is described that is robust against large perturbations in the starting set of URLs and capable of exploring out and discovering valuable resources that are dozens of links away from the start set, while carefully pruning the millions of pages that may lie within this same radius.
Searching the World Wide Web
Steve Lawrence,C. Lee Giles +1 more
TL;DR: The coverage and recency of the major World Wide Web search engines were analyzed, yielding some surprising results, including a lower bound on the size of the indexable Web of 320 million pages.
Keyword searching and browsing in databases using BANKS
G. Bhalotia,Arvind Hulgeri,Charuta Nakhe,Soumen Chakrabarti,Sundararajarao Sudarshan +4 more
- 26 Feb 2002
TL;DR: BANKS is described, a system which enables keyword-based search on relational databases, together with data and schema browsing, and presents an efficient heuristic algorithm for finding and ranking query results.
DBXplorer: a system for keyword-based search over relational databases
Sanjay Agrawal,Surajit Chaudhuri,Gautam Das +2 more
- 07 Aug 2002
TL;DR: DBXplorer is discussed, a system that enables keyword-based search in relational databases using a commercial relational database and Web server and allows users to interact via a browser front-end.
White Paper: The Deep Web: Surfacing Hidden Value
TL;DR: BrightPlanet's search technology automates the process of making dozens of direct queries simultaneously using multiple-thread technology and thus is the only search technology, so far, that is capable of identifying, retrieving, qualifying, classifying, and organizing both "deep" and "surface" content.
691