Open Access
•Proceedings Article
A visual interactive framework for attribute discretization
Ramesh Subramonian,Ramana Venkata,Joyce Chen +2 more
- 14 Aug 1997
- pp 82-88
TL;DR: An integrated visual framework is presented in which several discretization strategies can be experimented with, and which visually assists the user in intuitively determining the appropriate number and locations of intervals.
Abstract: Discretization is the process of dividing a continuous-valued base attribute into discrete intervals, which highlight distinct patterns in the behavior of a related goal attribute. In this paper, we present an integrated visual framework in which several discretization strategies can be experimented with, and which visually assists the user in intuitively determining the appropriate number and locations of intervals. In addition to featuring methods based on minimizing classification error or entropy, we introduce (i) an optimal algorithm that minimizes the approximation introduced by discretization and (ii) a novel algorithm that uses an unsupervised learning technique, clustering, to identify intervals. We also extend discretization to work with continuous-valued goal attributes.
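To make the idea of choosing interval boundaries concrete, here is a minimal Python sketch of two generic strategies of the kind the abstract mentions: unsupervised equal-width binning and a single supervised cut chosen by minimizing class entropy. It is not the paper's implementation; the function names `equal_width_cuts` and `best_entropy_cut` and the toy data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_cuts(values, k):
    """Hypothetical helper: k - 1 cut points splitting [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def best_entropy_cut(values, labels):
    """Hypothetical helper: the binary cut that minimizes weighted class entropy.

    Returns (cut_point, weighted_entropy). cut_point is None when no
    candidate cut lowers the entropy of the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut = None
    best_score = entropy([lab for _, lab in pairs])
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a cut
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2, score
    return best_cut, best_score


if __name__ == "__main__":
    # Toy data (assumed): the goal attribute changes class between 3 and 10.
    base = [1, 2, 3, 10, 11, 12]
    goal = ["a", "a", "a", "b", "b", "b"]
    print(equal_width_cuts(base, 3))
    print(best_entropy_cut(base, goal))
```

Applying `best_entropy_cut` recursively to each resulting interval would yield multiple intervals; a stopping rule (for example a fixed interval count chosen by the user) would still be needed and is left out of this sketch.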
Citations
A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning
TL;DR: This paper surveys discretization methods, whose main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thereby turning quantitative data into qualitative data.
553
Entropy and MDL Discretization of Continuous Variables for Bayesian Belief Networks
Ellis Clarke,Bruce A. Barton +1 more
TL;DR: The use of the partitioning algorithm resulted in more sparsely connected BBNs than with binary partitioning, with little information loss from mapping continuous variables into discrete ones.
Facilitating data mining on a network of workstations
Srinivasan Parthasarathy,Ramesh Subramonian +1 more
- 01 Jan 2000
TL;DR: This paper proposes the programmable, distributed doall, a generic mechanism, similar to the doall primitive on SMPs, that schedules a set of independent tasks on a NOW and seeks to reduce communication bandwidth requirements by allowing the resource requirements of the tasks to be specified at the application programming level.
17
•Book
Data reduction: discretization of numerical attributes
Jerzy W. Grzymala-Busse
- 01 Jan 2002
TL;DR: This chapter presents the taxonomy of currently developed discretization systems and describes techniques based on equal interval frequency, equal interval width, minimum class entropy, minimum description length, and clustering.
13
•Proceedings Article
Parallel Incremental 2D-Discretization on Dynamic Datasets
Srinivasan Parthasarathy,Arun Ramakrishnan +1 more
- 15 Apr 2002
TL;DR: This paper proposes a time-optimal solution to the problem of 2-dimensional discretization within a multiattribute database, and parallelizes and incrementalizes the algorithm so that it can dynamically maintain the required information even in the presence of data updates without re-executing the algorithm on the entire dataset.
11
References
•Book
Elements of information theory
Thomas M. Cover,Joy A. Thomas +1 more
- 01 Jan 1991
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
•Proceedings Article
Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning
Usama M. Fayyad,Keki B. Irani +1 more
- 01 Sep 1993
TL;DR: This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued attribute into multiple intervals.
Supervised and unsupervised discretization of continuous features
James Dougherty,Ron Kohavi,Mehran Sahami +2 more
- 09 Jul 1995
TL;DR: Binning, an unsupervised discretization method, is compared to entropy-based and purity-based methods, which are supervised algorithms, and it is found that the performance of the Naive-Bayes algorithm significantly improved when features were discretized using an entropy-based method.
•Proceedings Article
Bayesian classification (AutoClass): theory and results
Peter Cheeseman,John Stutz +1 more
- 01 Feb 1996
TL;DR: It is emphasized that no current unsupervised classification system can produce maximally useful results when operated alone and that it is the interaction between domain experts and the machine searching over the model space that generates new knowledge.
1.2K