Journal Article, DOI: 10.1023/A:1009735908398
Mathematical Programming in Data Mining
147
TL;DR: Mathematical programming formulations are described for feature selection, clustering, and robust representation. The resulting k-Median Algorithm is used to discover survival curves for breast cancer patients from a medical database, and a novel approach purposely tolerates a small error in the training process in order to avoid overfitting data that may contain errors.
Abstract: Mathematical programming approaches to three fundamental problems will be described: feature selection, clustering and robust representation. The feature selection problem considered is that of discriminating between two sets while recognizing irrelevant and redundant features and suppressing them. This creates a lean model that often generalizes better to new unseen data. Computational results on real data confirm improved generalization of leaner models. Clustering is exemplified by the unsupervised learning of patterns and clusters that may exist in a given database and is a useful tool for knowledge discovery in databases (KDD). A mathematical programming formulation of this problem is proposed that is theoretically justifiable and computationally implementable in a finite number of steps. A resulting k-Median Algorithm is utilized to discover very useful survival curves for breast cancer patients from a medical database. Robust representation is concerned with minimizing trained model degradation when applied to new problems. A novel approach is proposed that purposely tolerates a small error in the training process in order to avoid overfitting data that may contain errors. Examples of applications of these concepts are given.
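The abstract names a k-Median Algorithm but does not spell out its steps. The block below is a minimal sketch of one common alternating scheme, under the assumption that points are assigned to the nearest center in the 1-norm and each center is then moved to the coordinate-wise median of its cluster. The function names `l1_distance` and `k_median`, the `max_iter` cap, and the handling of empty clusters are illustrative choices, not taken from the paper.

```python
from statistics import median


def l1_distance(p, q):
    """1-norm (Manhattan) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median(points, centers, max_iter=100):
    """Alternate assignment and median-update steps until no center moves.

    points  : list of tuples of numbers
    centers : list of k initial centers (a hypothetical starting guess)
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center in the 1-norm.
        labels = [
            min(range(len(centers)), key=lambda j: l1_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the coordinate-wise median of its cluster.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(old)
            else:
                new_centers.append(tuple(median(col) for col in zip(*members)))
        if new_centers == centers:
            break  # no center moved, so the assignments are stable
        centers = new_centers
    return centers, labels
```

Calling `k_median(points, initial_centers)` returns the final centers together with one cluster label per point. The coordinate-wise median is used in the update step because it minimizes the sum of 1-norm distances within a cluster, which also makes the centers less sensitive to outliers than a mean would be.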
Citations
An efficient k-means clustering algorithm: analysis and implementation
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu
TL;DR: This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.
•Posted Content
Principles of data mining
David J. Hand, Heikki Mannila, Padhraic Smyth
- 01 Jan 2001
TL;DR: This paper gives a lightning overview of data mining and its relation to statistics, with particular emphasis on tools for the detection of adverse drug reactions.
4K
•Book
Advances in Large Margin Classifiers
Alexander J. Smola, Peter L. Bartlett
- 01 Oct 2000
TL;DR: This book provides an overview of recent developments in large margin classifiers, examines connections with other methods, and identifies strengths and weaknesses of the method, as well as directions for future research.
1.9K
References
Nonparametric Estimation from Incomplete Observations
Edward L. Kaplan, Paul Meier
TL;DR: In this article, the product-limit (PL) estimator was proposed to estimate the proportion of items in the population whose lifetimes would exceed t (in the absence of such losses), without making any assumption about the form of the function P(t).
•Book
The Nature of Statistical Learning Theory
Vladimir Vapnik
- 01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
46K
•Book
C4.5: Programs for Machine Learning
J. Ross Quinlan
- 15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
27.2K
•Book
Theory of Games and Economic Behavior
John von Neumann, Oskar Morgenstern
- 01 Jan 1944
TL;DR: Theory of Games and Economic Behavior is the classic work upon which modern-day game theory is based. It has been widely used to analyze a host of real-world phenomena, from arms races to optimal policy choices of presidential candidates, and from vaccination policy to major league baseball salary negotiations.