Web Page Classification Using Relational Learning Algorithm and Unlabeled Data

Journal Article • doi:10.4304/JCP.6.3.474-479

Yanjuan Li, +1 more

- 03 Jan 2011

- Journal of Computers

- Vol. 6, Iss: 3, pp 474-479

TL;DR: R-tri-training, a new relational semi-supervised learning algorithm, is well suited to web page classification because it can exploit unlabeled web pages to effectively improve learning performance.

Citations

Journal Article•10.4304/JCP.8.5.1351-1356

Mining Web Logs with PLSA Based Prediction Model to Improve Web Caching Performance

Chuibi Huang, +3 more

- 05 Jan 2013

- Journal of Computers

TL;DR: A PLSA-based prediction model is presented that predicts user access patterns and interests in order to extend the well-known NGRAM-GDSF caching policy; results show that the approach achieves better web-access performance.

Journal Article•10.4304/JSW.8.7.1666-1670

The Chinese Duplicate Web Pages Detection Algorithm based on Edit Distance

Junxiu An, +1 more

- 07 Jan 2013

- Journal of Software

TL;DR: This paper defines the largest number of common characters as the converse of edit distance; it proposes building a web page's feature string from the Chinese character that precedes each period (full stop) in the simply preprocessed text; and it uses the largest number of common characters to calculate the overlap factor between the feature strings of two web pages.
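A minimal Python sketch of this kind of comparison, under stated assumptions: the feature string is taken to be the character immediately before each full stop, the "largest number of common characters" is read as the longest-common-subsequence (LCS) length (when only insertions and deletions are allowed, edit distance equals m + n - 2·LCS, so LCS is its complement), and the overlap factor is assumed to be that count divided by the length of the longer feature string. The function names are hypothetical, and the paper's exact definitions may differ.

```python
def feature_string(text: str, stops: str = "。.") -> str:
    # Hypothetical reading: keep the character that precedes each full stop.
    return "".join(text[i - 1] for i, ch in enumerate(text) if ch in stops and i > 0)


def common_chars(a: str, b: str) -> int:
    # Longest common subsequence length via a rolling one-row DP table.
    # With insert/delete-only edits: edit_distance = len(a) + len(b) - 2 * LCS.
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def overlap_factor(a: str, b: str) -> float:
    # Assumed normalisation: common characters over the longer feature string.
    if not a or not b:
        return 0.0
    return common_chars(a, b) / max(len(a), len(b))


page1 = feature_string("今天天气很好。我们去公园。")  # "好园"
page2 = feature_string("今天很好。他们去了学校。")    # "好校"
print(page1, page2, overlap_factor(page1, page2))    # → 好园 好校 0.5
```

Using the LCS keeps the measure order-sensitive, so two pages only score highly when their sentence-final characters appear in the same sequence.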

Journal Article•10.4018/JITR.2019010107

An Inductive Logic Programming Algorithm Based on Artificial Bee Colony

Yanjuan Li, +2 more

- 01 Jan 2019

- Journal of Information Technology Resear...

TL;DR: Experimental results show that (1) the proposed fitness function measures hypothesis quality more precisely and avoids generating over-specific rules, and (2) ABCILP outperforms the other systems it is compared with.

Proceedings Article•10.1109/CEC.2012.6256626

Phase transition and New Fitness Function based Genetic Inductive Logic Programming algorithm

Yanjuan Li, +1 more

- 10 Jun 2012

TL;DR: Experiments show that the new method of generating the initial population effectively reduces the number of iterations and improves the predictive accuracy of the GILP algorithm, and that the new fitness function measures the quality of first-order rules more precisely and avoids generating over-specific hypotheses.

Proceedings Article•10.1109/FSKD.2012.6233833

Connection of the beam width and the learning success rate in the phase transition framework for relational learning

Yanjuan Li, +1 more

- 29 May 2012

TL;DR: This paper investigates whether the low learning success rate caused by the phase transition can be raised by enlarging the beam width, and shows that beam width has almost no effect on the learning success rate within the phase transition framework.

References

Proceedings Article•10.1145/279943.279962

Combining labeled and unlabeled data with co-training

Avrim Blum, +1 more

- 24 Jul 1998

TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.
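The two-view idea summarized above can be sketched in a few lines of Python. This is a simplified illustration, not Blum and Mitchell's exact procedure: each example is assumed to be a pair of feature vectors (one per view), a nearest-centroid classifier stands in for the per-view learners, and in each round each view labels its most confident unlabeled example (confidence = distance margin between the two nearest centroids) and adds it to the shared labeled set. The original algorithm instead selects a fixed number of positive and negative examples per round from a randomly drawn pool; all function names here are hypothetical.

```python
import math


def fit_centroids(xs, ys):
    # Mean feature vector per class label.
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(cents, x):
    # Return (label, confidence); confidence is the gap between the two nearest centroids.
    dists = sorted((math.dist(c, x), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def co_train(labeled, labels, unlabeled, rounds=5, per_round=1):
    # labeled/unlabeled items are (view1, view2) pairs of feature tuples.
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled, labels
            # Train this view's classifier on everything labeled so far.
            cents = fit_centroids([ex[view] for ex in labeled], labels)
            ranked = sorted(pool, key=lambda ex: predict(cents, ex[view])[1], reverse=True)
            for ex in ranked[:per_round]:
                # The confident view's prediction becomes a label that both views train on.
                labeled.append(ex)
                labels.append(predict(cents, ex[view])[0])
                pool.remove(ex)
    return labeled, labels
```

Because the pseudo-labels from one view are consumed by the other, each view can correct regions where the other is uncertain, which is the property the PAC analysis relies on when the views are conditionally independent.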

Proceedings Article

Proceedings of the 28th International Conference on Machine Learning, ICML 2011

Darío García-García, +2 more

- 07 Oct 2011

Journal Article•10.1109/TSMC.1976.4309519

Scene Labeling by Relaxation Operations

Azriel Rosenfeld, +2 more

- 01 Jun 1976

TL;DR: This paper formulates the ambiguity-reduction process in terms of iterated parallel operations (i.e., relaxation operations) performed on an array of object-identification data.
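One commonly cited form of the nonlinear relaxation rule associated with this work is p_i(λ) ← p_i(λ)[1 + q_i(λ)] / Σ_μ p_i(μ)[1 + q_i(μ)], with support q_i(λ) = Σ_j c_ij Σ_μ r_ij(λ, μ) p_j(μ), compatibilities r in [-1, 1], and weights c_ij summing to 1 over the neighbours of i. The Python sketch below applies that rule in parallel, so every object is updated from the previous iteration's probabilities. Treating web pages as the objects and hyperlinked pages as neighbours, sharing one compatibility matrix across all pairs, and the function name are assumptions for illustration.

```python
def relax(p, neighbors, r, iterations=10):
    """Parallel relaxation labelling (sketch).

    p[i][lam]     initial probability that object i has label lam
    neighbors[i]  list of (j, c_ij) pairs; weights for each i assumed to sum to 1
    r[lam][mu]    compatibility of label lam at i with label mu at j, in [-1, 1]
                  (one matrix shared by all pairs: a simplifying assumption)
    """
    n_labels = len(r)
    for _ in range(iterations):
        new_p = []
        for i, pi in enumerate(p):
            # Support q_i(lam) collected from the neighbours' current label beliefs.
            q = [
                sum(c * sum(r[lam][mu] * p[j][mu] for mu in range(n_labels))
                    for j, c in neighbors[i])
                for lam in range(n_labels)
            ]
            # With r in [-1, 1] and weights summing to 1, every factor 1 + q is >= 0.
            unnorm = [pi[lam] * (1.0 + q[lam]) for lam in range(n_labels)]
            total = sum(unnorm)
            new_p.append([u / total for u in unnorm])
        p = new_p  # all objects updated from the same previous state
    return p


# Three pages in a chain 0 - 1 - 2; same-label neighbours support each other.
r = [[1.0, -1.0], [-1.0, 1.0]]
neighbors = [[(1, 1.0)], [(0, 0.5), (2, 0.5)], [(1, 1.0)]]
p0 = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
print(relax(p0, neighbors, r, iterations=1)[1])
```

In this toy chain the uncertain middle page is pulled towards label 0 because both of its neighbours favour that label.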

Journal Article•10.1007/BF03037227

Inverse entailment and PROGOL

Stephen Muggleton

- 01 Dec 1995

- New Generation Computing

TL;DR: Mode-Directed Inverse Entailment (MDIE) is introduced as a generalisation and enhancement of previous approaches for inverting deduction and an implementation of MDIE in the Progol system is described.

Journal Article•10.1109/TKDE.2005.186

Tri-training: exploiting unlabeled data using three classifiers

Zhi-Hua Zhou, +1 more

- 01 Nov 2005

- IEEE Transactions on Knowledge and Data ...

TL;DR: Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.
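The core labelling rule of tri-training can be sketched in Python: three classifiers are trained (in the original, on bootstrap samples of the labeled set), and an unlabeled example is added to one classifier's training set when the other two agree on its label. The sketch also shows the error estimate for a pair of classifiers (examples both get wrong among labeled examples where they agree, divided by the number of such examples) and the update test e_t < e_{t-1} and e_t·|L_t| < e_{t-1}·|L_{t-1}|. It omits bootstrap training, retraining, and the subsampling step used when the second condition fails; the classifiers are assumed to be plain callables, and the function names are hypothetical.

```python
def propose_labels(classifiers, unlabeled):
    """For each of three classifiers, return (x, label) pairs on which the other two agree."""
    if len(classifiers) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    preds = [[h(x) for x in unlabeled] for h in classifiers]
    proposals = []
    for i in range(3):
        j, k = (m for m in range(3) if m != i)  # the two "teacher" classifiers
        proposals.append([(x, preds[j][n]) for n, x in enumerate(unlabeled)
                          if preds[j][n] == preds[k][n]])
    return proposals


def joint_error(hj, hk, labeled):
    """Estimated error of the pair (hj, hk): wrong-and-agreeing over agreeing, on labeled data."""
    agree = [(x, y) for x, y in labeled if hj(x) == hk(x)]
    if not agree:
        return 1.0  # no evidence; treated as unreliable (a choice made for this sketch)
    return sum(1 for x, y in agree if hj(x) != y) / len(agree)


def should_update(err, prev_err, n_new, n_prev):
    """Accept a new pseudo-labeled set only if the estimated error and its weighted bound both drop."""
    return err < prev_err and err * n_new < prev_err * n_prev


# Usage: three threshold classifiers that disagree near the boundary.
hs = [lambda x: int(x > 5), lambda x: int(x > 4), lambda x: int(x > 6)]
for i, props in enumerate(propose_labels(hs, [1, 4.5, 5.5, 8])):
    print(i, props)
```

Requiring agreement between two peers, rather than trusting a single classifier's confidence, is what lets tri-training skip the explicit confidence measure that co-training style methods need.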

...

Related Papers (5)

Knowing a web page by the company it keeps

Novel Method for Improving Web Text Classifiers Performance Through Machine Learning

LWCS: A large-scale web page classification system based on anchor graph hashing

Web Mining with Relational Clustering

A Web page classification system based on a genetic algorithm using tagged-terms as features