Web Forum Crawling Techniques
TL;DR: This paper gives an overview of web crawling and web forums, and discusses the techniques used by web forum crawlers and the challenges of crawling.
Abstract: The web holds a vast amount of data across innumerable websites, which are monitored by a tool or program known as a crawler. This paper focuses on web forum crawling techniques: it discusses the various techniques used by web forum crawlers and the challenges of crawling, and it gives an overview of web crawling and web forums.
Citations
Journal of Digital Forensics, Security and Law
Lae
- 14 Nov 2014
TL;DR: This peer-reviewed, multidisciplinary Journal of Digital Forensics, Security and Law (JDFSL) focuses on the advancement of the field by publishing the state of the art in both basic and applied research conducted worldwide.
73
A survey of event detection techniques in online social networks
Anuradha Goswami, Ajey Kumar
TL;DR: A survey is done for event detection techniques in OSN based on social text streams—newswire, web forums, emails, blogs and microblogs, for natural disasters, trending or emerging topics and public opinion-based events.
70
Vigi4Med Scraper: A Framework for Web Forum Structured Data Extraction and Semantic Representation.
TL;DR: Vigi4Med Scraper is presented, a generic open source framework for extracting structured data from web forums that enables efficient manipulation by data analysis algorithms and allows the collected data to be directly linked to any existing semantic resource.
Measure the Similarity of Complaint Document Using Cosine Similarity Based on Class-Based Indexing
TL;DR: A model is proposed that uses cosine similarity to measure how closely a query (incoming) matches a document (archive) when analysing document similarity, and it delivers high accuracy.
4
Profiling and tracking a cyberlocker link sharer in a public web forum
Xiao-Xi Fan, Kam-Pui Chow, Fei Xu
- 26 Jan 2015
TL;DR: This chapter describes a framework for collecting cyberlocker data from web forums and using cyberlocker link sharing behavior to identify users. The experimental results demonstrate that the framework provides valuable insights in investigations of cyberlocker-based piracy.
References
•Book
Mining the Web: Discovering Knowledge from Hypertext Data
Soumen Chakrabarti
- 01 Jan 2002
TL;DR: This chapter discusses the infrastructure of the Web, the future of Web mining, and applications of semi-supervised learning for text and similarity and clustering.
759
Effective web crawling
Carlos Castillo
- 01 Jun 2005
TL;DR: The World Wide Web is a context in which traditional Information Retrieval methods are challenged, and given the volume of the Web and its speed of change, the coverage of modern search engines is relatively small.
Structure-driven crawler generation by example
Márcio L. A. Vidal, Altigran Soares da Silva, Edleno Silva de Moura, João M. B. Cavalcanti
- 06 Aug 2006
TL;DR: A structure-driven approach for generating Web crawlers that requires minimal effort from users is presented. It is based on navigation patterns, that is, sequences of patterns for the links a crawler has to follow to reach pages structurally similar to a sample page.
61
Board Forum Crawling: A Web Crawling Method for Web Forum
Yan Guo, Kui Li, Kai Zhang, Gang Zhang
- 18 Dec 2006
TL;DR: A new method, board forum crawling, is presented for crawling Web forums. It exploits the organized characteristics of Web forum sites and simulates the human behavior of visiting Web forums.
57