Distributed Data Mining vs. Sampling Techniques: A Comparison

Book Chapter•10.1007/978-3-540-24840-8_37

Mohamed Aounallah, +2 more

- 17 May 2004

- Lecture Notes in Computer Science

- pp 454-460

8

TL;DR: The paper gives an overview of the most common sampling techniques and presents a new distributed data-mining technique based on rule-set models, in which aggregation relies on a confidence coefficient associated with each rule and on very small samples from each database.

Citations

An Analysis of the Predictive Capability of C5.0 and Chaid Decision Trees and Bayes Net in the Classification of fatal Traffic Accidents in the UK

Aiden O'Connor

- 01 Jan 2015

10

Book Chapter•10.1007/978-3-642-31488-9_3

Research on application of data mining methods to diagnosing gastric cancer

Arnis Kirshners, +2 more

- 13 Jul 2012

TL;DR: This research identifies several ways to apply data mining methods to diagnosing gastric cancer, the fourth most common cancer type by incidence after breast, lung and colorectal cancers.

6

Book Chapter•10.1007/978-3-642-13541-5_11

Distributed data mining system based on multi-agent communication mechanism

Sung Gook Kim, +3 more

- 23 Jun 2010

TL;DR: This paper presents an overview of a distributed data mining system developed according to two approaches: 1) distributed data modeling and 2) distributed decision making.

3

Le forage distribué des données : une approche basée sur l'agrégation et le raffinement de modèles

Mohamed Aoun-Allah

- 01 Jan 2006

TL;DR: This research proposes a distributed data mining approach, aggregating and refining models from geographically dispersed sites to create a metaclassifier, improving efficiency and providing a unified view of the data set.

3

Journal Article•10.1016/J.CSL.2019.101020

Adaptive scheduling for adaptive sampling in pos taggers construction

Manuel Vilares Ferro, +2 more

- 01 Mar 2020

- Computer Speech & Language

TL;DR: Adaptive scheduling for adaptive sampling is introduced as a novel way of machine learning in the construction of part-of-speech taggers. The method analyzes the shape of the learning curve geometrically, in conjunction with a functional model, to increase or decrease it at any time.

3

References

UCI Repository of machine learning databases

Catherine Blake

- 01 Jan 1998

14.1K

•Posted Content

A Sequential Algorithm for Training Text Classifiers

David D. Lewis, +1 more

- 24 Jul 1994

- arXiv: Computation and Language

TL;DR: An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task and reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.

2.7K

•Journal Article•10.1613/JAIR.279

Improved use of continuous attributes in C4.5

J. R. Quinlan

- 01 Jan 1996

- Journal of Artificial Intelligence Resea...

TL;DR: A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes with an MDL-inspired penalty, leading to smaller decision trees with higher predictive accuracies.

1.9K

•Proceedings Article•10.5555/188490.188495

A sequential algorithm for training text classifiers

David D. Lewis, +1 more

- 01 Aug 1994

TL;DR: An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task; it reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.

1.9K

Cancer Diagnosis Via Linear Programming

Olvi L. Mangasarian, +1 more

- 01 Jan 1990

678

Related Papers (5)

Stratified Sampling for Data Mining on the Deep Web

On static and dynamic methods for condensation-based privacy-preserving data mining

Distance functions in dynamic integration of data mining techniques

Distributed Multi-class Rule Based Classification Using RIPPER

A new approach for generating efficient sample from market basket data