Proceedings Article · DOI: 10.1109/DCABES.2011.19
A Web Page Classification Algorithm Based on Link Information
Zhaohui Xu, Fuliang Yan, Jie Qin, Haifeng Zhu +3 more
- 14 Oct 2011
- pp 82-86
19 citations
TL;DR: A web page classification algorithm, Link Information Categorization (LIC), based on the K-nearest-neighbor method, which combines website feature information to classify Web pages according to their link information.
Abstract: Effective classification of web pages can improve the quality of information retrieval. Traditional classification algorithms are based mainly on analysis of Web content, but web page content is complex and filled with a large amount of false and erroneous information, which seriously affects the accuracy of network information classification. To solve this problem, this paper presents a web page classification algorithm, Link Information Categorization (LIC). Based on the K-nearest-neighbor method, LIC combines website feature information to classify Web pages according to their link information. Experiments show that the algorithm achieves higher efficiency and accuracy in Web page classification.
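The abstract does not specify how LIC represents links or measures similarity, so the following is only a minimal sketch of the general idea of K-nearest-neighbor classification over link information, not the authors' method. It assumes each page is represented by the set of host names it links to, that similarity is Jaccard overlap of those sets, and that the k nearest labelled pages cast similarity-weighted votes. The function names `link_features`, `jaccard`, and `knn_classify`, the parameter `k=3`, and the sample data are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def link_features(outlinks):
    """Hypothetical link feature: the set of host names a page links to."""
    hosts = set()
    for url in outlinks:
        host = urlparse(url).netloc.lower()
        if host:  # skip strings that have no host part
            hosts.add(host)
    return hosts


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(outlinks, training, k=3):
    """Assign a label by similarity-weighted vote among the k nearest pages.

    training is a list of (outlinks, label) pairs for already-labelled pages.
    """
    query = link_features(outlinks)
    # Score every labelled page by how much its link profile overlaps the query.
    scored = [(jaccard(query, link_features(links)), label) for links, label in training]
    # Keep the k most similar pages (Python's sort is stable, so ties keep input order).
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for similarity, label in neighbours:
        votes[label] += similarity
    # The label with the highest total similarity wins.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical labelled pages, each described only by its outbound links.
    training = [
        (["https://www.espn.com/nba", "https://www.nba.com/"], "sports"),
        (["https://www.nba.com/scores", "https://www.fifa.com/"], "sports"),
        (["https://www.reuters.com/markets", "https://www.bloomberg.com/"], "finance"),
        (["https://www.bloomberg.com/news", "https://www.wsj.com/"], "finance"),
    ]
    page = ["https://www.espn.com/", "https://www.fifa.com/worldcup"]
    print(knn_classify(page, training))  # → sports
```

Weighting votes by similarity (rather than counting neighbours equally) is one common KNN variant; it keeps a weakly related neighbour from outvoting closely related ones when k is larger than the number of strong matches.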
Citations
A Review of Machine Learning Algorithms for Web Page Classification
Lassri Safae, Benlahmar El Habib, Tragha Abderrahim +2 more
- 01 Oct 2018
TL;DR: The characteristics of web page classification are presented, the different machine learning algorithms used to categorize web pages are reviewed, and some assumptions from the studied methods are tracked.
23 citations
Categorisation of web pages for protection against inappropriate content in the internet
TL;DR: A framework for automated categorisation of web pages to protect against inappropriate content is outlined, and the developed categorisation system is planned to be partially implemented in F-Secure Corporation's mass-production systems that analyse web content.
21 citations
Analysis and Evaluation of Web Pages Classification Techniques for Inappropriate Content Blocking
Igor Kotenko, Andrey Chechulin, Andrey Shorov, Dmitry Komashinsky +3 more
- 16 Jul 2014
TL;DR: The paper considers the problem of automated categorization of web sites for systems used to block web pages that contain inappropriate content, and applies the techniques of analysis of the text, html tags, URL addresses and other information using Machine Learning and Data Mining methods.
19 citations
A Simple Study of Webpage Text Classification Algorithms for Arabic and English Languages
Sumaia Mohammed Al-Ghuribi, Saleh Alshomrani +1 more
- 01 Dec 2013
TL;DR: In this survey, the widely used text classification algorithms are presented and recent research in the classification field for Arabic and English is compared, in order to determine which algorithm is best applied to both Arabic and English.
18 citations
Web Pages Classification with Parliamentary Optimization Algorithm
Soner Kiziloluk, Ahmet Bedri Özer +1 more
TL;DR: This study is the first to propose the parliamentary optimization algorithm (POA) for effective Web page classification, and POA was found to yield promising results compared to the other algorithms.
11 citations
References
Web page classification: Features and algorithms
Xiaoguang Qi, Brian D. Davison +1 more
TL;DR: As work in Web page classification is reviewed, the importance of these Web-specific features and algorithms are noted, state-of-the-art practices are described, and the underlying assumptions behind the use of information from neighboring pages are tracked.
551 citations
Journal Article
Dimension Reduction in Text Classification with Support Vector Machines
TL;DR: Novel dimension reduction methods to reduce the dimension of the document vectors dramatically are adopted and decision functions for the centroid-based classification algorithm and support vector classifiers are introduced to handle the classification problem where a document may belong to multiple classes.
Advances in Machine Learning Based Text Categorization
Su Jinshu, Zhang Bofeng, Xu Xin +2 more
TL;DR: It is pointed out that problems such as nonlinearity, skewed data distribution, labeling bottleneck, hierarchical categorization, scalability of algorithms and categorization of Web pages are the key problems to the study of text categorization.
Dimension Reduction in Text Classification with Support Vector Machines
Hyunsoo Kim, Peg Howland, Haesun Park +2 more
TL;DR: Support vector machines have been recognized as one of the most successful classification methods for many applications including text classification, and the learning ability and co-operation of these machines is improving.
112 citations
Journal Article
Automated text classification model based on projection pursuit regression
TL;DR: This paper presents an automated text classification model based on projection pursuit regression that can describe the external regularities of high-dimensional data and increase the precision of text classification.
4 citations