Journal Article, DOI: 10.4304/JCP.6.3.474-479
Web Page Classification Using Relational Learning Algorithm and Unlabeled Data
Yanjuan Li,Maozu Guo +1 more
7
TL;DR: R-tri-training, a new relational semi-supervised learning algorithm, is well suited to web page classification because its semi-supervised component exploits unlabeled web pages to enhance learning performance.
Abstract: This paper investigates applying relational tri-training (R-tri-training for short) to web page classification. R-tri-training, a new relational semi-supervised learning algorithm, is well suited to learning in web page classification. Its semi-supervised component allows it to exploit unlabeled web pages to enhance learning performance effectively. In addition, its relational component can describe how neighboring web pages are related to each other by hyperlinks. Experiments on the Web-Kb dataset show that: 1) R-tri-training can use a large amount of unlabeled web pages (the unlabeled data) to enhance the performance of the learned hypothesis; 2) R-tri-training performs better than the other algorithms it is compared with.
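The semi-supervised idea behind tri-training can be illustrated with a minimal sketch. It is not the paper's R-tri-training: the relational (hyperlink-aware) learner is replaced by a hypothetical nearest-centroid classifier over plain feature vectors, the error-rate checks that the original tri-training algorithm uses before accepting pseudo-labels are omitted, and the loop runs for a fixed number of rounds. The names `NearestCentroid`, `bootstrap`, `tri_train`, and `vote`, along with the toy data, are assumptions made for illustration only.

```python
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical stand-in base learner: predict the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def bootstrap(X, y, rng):
    """Sample with replacement, redrawing until every class appears (a simplification)."""
    classes = set(y)
    while True:
        idx = [rng.randrange(len(X)) for _ in X]
        if {y[i] for i in idx} == classes:
            return [X[i] for i in idx], [y[i] for i in idx]


def tri_train(X_lab, y_lab, X_unlab, rounds=3, seed=0):
    """Simplified tri-training: three learners teach each other with agreed pseudo-labels."""
    rng = random.Random(seed)
    models = [NearestCentroid().fit(*bootstrap(X_lab, y_lab, rng)) for _ in range(3)]
    for _ in range(rounds):
        updated = []
        for k in range(3):
            first, second = [m for j, m in enumerate(models) if j != k]
            # Pseudo-label an unlabeled example for model k only when the other two agree.
            agreed = [(x, first.predict(x)) for x in X_unlab
                      if first.predict(x) == second.predict(x)]
            X_new = list(X_lab) + [x for x, _ in agreed]
            y_new = list(y_lab) + [label for _, label in agreed]
            updated.append(NearestCentroid().fit(X_new, y_new))
        models = updated
    return models


def vote(models, x):
    """Final prediction: majority vote of the three learners."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]


# Toy data (made up): two well-separated clusters standing in for two page classes.
X_lab = [(0, 0), (1, 0), (10, 10), (9, 10)]
y_lab = ["a", "a", "b", "b"]
X_unlab = [(0, 1), (1, 1), (10, 9), (9, 9)]
models = tri_train(X_lab, y_lab, X_unlab)
print(vote(models, (0.5, 0.5)), vote(models, (9.5, 9.5)))
```

In this sketch each learner is retrained on the original labeled set plus the examples the other two learners agree on, which is how unlabeled data enters training. A relational variant would swap the base learner for one that can also use hyperlink relations between pages.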
Citations
Mining Web Logs with PLSA Based Prediction Model to Improve Web Caching Performance
TL;DR: A PLSA-based prediction model is presented that predicts user access patterns and interests in order to extend the well-known NGRAM-GDSF caching policy; the results show that the approach achieves better web-access performance.
3
The Chinese Duplicate Web Pages Detection Algorithm based on Edit Distance
Junxiu An,Pengsen Cheng +1 more
TL;DR: This paper defines the largest number of common characters as the inverse concept of edit distance; it proposes building the feature string of a web page from the Chinese character preceding each period in the simply processed text; and it uses the largest number of common characters to calculate the overlap factor between the feature strings of web pages.
2
An Inductive Logic Programming Algorithm Based on Artificial Bee Colony
TL;DR: Experimental results show that: 1) the proposed new fitness function measures the quality of a hypothesis more precisely and avoids generating over-specific rules; 2) ABCILP performs better than the other systems it is compared with.
2
Phase transition and New Fitness Function based Genetic Inductive Logic Programming algorithm
Yanjuan Li,Maozu Guo +1 more
- 10 Jun 2012
TL;DR: Experiments show that the new method of generating the initial population can effectively reduce the number of iterations and enhance the predictive accuracy of the GILP algorithm, and that the new fitness function measures the quality of first-order rules more precisely and avoids generating over-specific hypotheses.
1
Connection of the beam width and the learning success rate in the phase transition framework for relational learning
Yanjuan Li,Maozu Guo +1 more
- 29 May 2012
TL;DR: This paper investigates whether the low learning success rate caused by phase transition can be improved by enlarging the beam width, and shows that beam width has almost no effect on the learning success rate under the phase transition framework.
References
Combining labeled and unlabeled data with co-training
Avrim Blum,Tom M. Mitchell +1 more
- 24 Jul 1998
TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.
6.4K
Proceedings Article
Proceedings of the 28th International Conference on Machine Learning, ICML 2011
Darío García-García,Ulrike von Luxburg,Raul Santos-Rodriguez +2 more
- 07 Oct 2011
4.9K
Scene Labeling by Relaxation Operations
Azriel Rosenfeld,Robert A. Hummel,Steven W. Zucker +2 more
- 01 Jun 1976
TL;DR: This paper formulates the ambiguity-reduction process in terms of iterated parallel operations (i.e., relaxation operations) performed on an array of (object, identification) data.
1.5K
Inverse entailment and PROGOL
TL;DR: Mode-Directed Inverse Entailment (MDIE) is introduced as a generalisation and enhancement of previous approaches for inverting deduction and an implementation of MDIE in the Progol system is described.
1.5K
Tri-training: exploiting unlabeled data using three classifiers
Zhi-Hua Zhou,Ming Li +1 more
TL;DR: Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.
1.3K