Journal Article, DOI: 10.1017/S0962492901000058
Data mining techniques
TL;DR: This review provides a discussion of and pointers to efficient algorithms for the common data mining tasks in a mathematical framework.
Abstract: Methods for knowledge discovery in databases (KDD) have been studied for more than a decade. New methods are required owing to the size and complexity of data collections in administration, business and science. These methods include procedures for data querying and extraction, data cleaning and data analysis, as well as methods of knowledge representation. The part of KDD dealing with the analysis of the data has been termed data mining. Common data mining tasks include the induction of association rules, the discovery of functional relationships (classification and regression) and the exploration of groups of similar data objects in clustering. This review provides a discussion of and pointers to efficient algorithms for the common data mining tasks in a mathematical framework. Because of the size and complexity of the data sets, efficient algorithms and often crude approximations play an important role.
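To make the first of these tasks concrete: an association rule X -> Y is commonly scored by its support (the fraction of transactions containing every item of X and Y) and its confidence (support of X and Y together divided by support of X). The sketch below enumerates such rules by brute force over a small hypothetical basket dataset; the data, thresholds and function names are illustrative assumptions, and it is deliberately not one of the efficient algorithms the review points to (Apriori-style methods prune the search instead of enumerating every itemset).

```python
from itertools import combinations

# Hypothetical toy market-basket transactions, for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rules(transactions, min_support=0.4, min_confidence=0.6):
    """Enumerate rules X -> Y (X, Y disjoint, non-empty) meeting both thresholds.

    Brute force over all itemsets: fine for a toy example, not for real data.
    """
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s < min_support:
                continue  # only frequent itemsets can yield rules
            # Split the frequent itemset into antecedent X and consequent Y.
            for k in range(1, size):
                for lhs in combinations(itemset, k):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    conf = s / support(lhs, transactions)
                    if conf >= min_confidence:
                        found.append((lhs, rhs, s, conf))
    return found


for lhs, rhs, s, c in rules(TRANSACTIONS):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With these thresholds every pair rule (support 0.6, confidence 0.75) survives, while from the three-item set only rules with a two-item antecedent pass the confidence cut.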
Citations
Clustering Categorical Data Using Silhouette Coefficient as a Relocating Measure
S. Aranganayagi, K. Thangavel +1 more
- 13 Dec 2007
TL;DR: Experimental results show that the proposed method for clustering categorical data is efficient; objects are grouped into clusters on the basis of the minimum dissimilarity value, using the silhouette coefficient as the relocating measure.
183
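The silhouette coefficient itself is standard: for object i, a(i) is its mean dissimilarity to the other members of its own cluster, b(i) is the smallest mean dissimilarity to any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The sketch below pairs it with a simple-matching dissimilarity for categorical attributes and a hypothetical one-pass rule that moves objects with negative s(i) to their nearest other cluster. The relocation rule, data and function names are assumptions for illustration, not the authors' published procedure.

```python
# Hypothetical categorical records (colour, size, shape) and an initial clustering.
DATA = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("red", "small", "round"),
]
LABELS = [0, 0, 1, 1, 1]  # object 4 starts in the "wrong" cluster


def mismatch(x, y):
    """Simple-matching dissimilarity: number of attributes whose values differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def silhouette(i, data, labels):
    """Return (s_i, nearest_other_cluster) for object i under `labels`."""
    totals, counts = {}, {}
    for j, (row, lab) in enumerate(zip(data, labels)):
        if j != i:
            totals[lab] = totals.get(lab, 0) + mismatch(data[i], row)
            counts[lab] = counts.get(lab, 0) + 1
    mean = {c: totals[c] / counts[c] for c in counts}
    own = labels[i]
    others = {c: d for c, d in mean.items() if c != own}
    if own not in mean or not others:
        # Singleton cluster (or only one cluster): s_i is conventionally 0.
        return 0.0, None
    nearest = min(others, key=others.get)
    a, b = mean[own], others[nearest]
    s = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s, nearest


def relocate(data, labels):
    """Hypothetical relocation pass: move objects with s_i < 0 to their nearest other cluster."""
    labels = list(labels)
    for i in range(len(data)):
        s, nearest = silhouette(i, data, labels)
        if s < 0 and nearest is not None:
            labels[i] = nearest
    return labels


print(relocate(DATA, LABELS))  # [0, 0, 1, 1, 0]
```

A negative s(i) means the object is, on average, closer to another cluster than to its own, which is why it serves as a natural trigger for relocation in this sketch.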
Can the use of cognitive and metacognitive self-regulated learning strategies be predicted by learners' levels of prior knowledge in hypermedia-learning environments?
TL;DR: The results have important implications for designing multi-agent hypermedia environments, in particular pedagogical agents that adapt to students' learning needs based on their prior knowledge levels.
131
Mining Frequent Itemsets Using Genetic Algorithm
TL;DR: The main aim of this paper is to find all frequent itemsets in given data sets using a genetic algorithm (GA). The stated advantages of using a GA to discover frequent itemsets are that it performs a global search and that its time complexity is lower than that of other algorithms, as the genetic algorithm is based on the greedy approach.
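A hedged sketch of the general idea, not the paper's algorithm: each candidate itemset is encoded as a bit vector over a fixed item list, fitness rewards frequent and larger itemsets (support times itemset size, an assumed choice), and binary tournament selection, one-point crossover and bit-flip mutation evolve the population while every frequent itemset visited along the way is recorded. The data, parameters and names are hypothetical. Because a GA is a stochastic heuristic, it may miss frequent itemsets that an exhaustive or Apriori-style search would find.

```python
import random

# Hypothetical toy data; the item list fixes the bit-vector encoding.
ITEMS = ["bread", "butter", "milk", "jam"]
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk", "jam"},
    {"bread", "butter"},
    {"bread", "butter", "milk", "jam"},
]


def decode(bits):
    """Map a chromosome (list of 0/1 genes) to the itemset it encodes."""
    return frozenset(item for item, gene in zip(ITEMS, bits) if gene)


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)


def fitness(bits):
    """Assumed fitness: support weighted by itemset size (empty set scores 0)."""
    itemset = decode(bits)
    return support(itemset) * len(itemset) if itemset else 0.0


def ga_frequent_itemsets(min_support=0.4, pop_size=8, generations=20,
                         p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(ITEMS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    found = set()

    def record(population):
        # Keep every non-empty frequent itemset the search has visited.
        for bits in population:
            itemset = decode(bits)
            if itemset and support(itemset) >= min_support:
                found.add(itemset)

    def tournament():
        # Binary tournament selection: the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        record(pop)
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    record(pop)
    return found


for s in sorted(ga_frequent_itemsets(), key=sorted):
    print(sorted(s), support(s))
```

Fixing the seed makes runs reproducible; the returned itemsets are always genuinely frequent, but completeness is not guaranteed.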
Numerical linear algebra in data mining
TL;DR: The emphasis is on rank reduction as a method of extracting information from a data matrix, low-rank approximation of matrices using the singular value decomposition and clustering, and on eigenvalue methods for network analysis.
56
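Rank reduction in its simplest form keeps only the leading singular triplet: by the Eckart-Young theorem, sigma_1 u_1 v_1^T is the best rank-1 approximation of a matrix A in the Frobenius and spectral norms. As a dependency-free sketch (an assumption for illustration, not the cited paper's method), the code below estimates that triplet by power iteration on A^T A; practical code would call a full SVD routine such as numpy.linalg.svd instead. Power iteration assumes sigma_1 > sigma_2 and a start vector not orthogonal to v_1.

```python
import math


def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def leading_singular_triplet(A, iters=100):
    """Approximate (sigma_1, u_1, v_1) by power iteration on A^T A.

    Assumes A is non-zero and the all-ones start vector is not orthogonal to v_1.
    """
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))  # one multiplication by A^T A
        nw = norm(w)
        v = [x / nw for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)  # ||A v_1|| = sigma_1 for unit v_1
    u = [x / sigma for x in Av]
    return sigma, u, v


def rank1_approx(A):
    """Rank-1 approximation sigma_1 * u_1 * v_1^T as a list of rows."""
    sigma, u, v = leading_singular_triplet(A)
    return [[sigma * ui * vj for vj in v] for ui in u]


print(rank1_approx([[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]))
```

For a matrix that is already rank 1 (as in the usage line) the approximation reproduces it; for diag(3, 1) it keeps the sigma_1 = 3 component, and the Frobenius error equals the discarded singular value 1.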
References
•Book
Data Mining: Concepts and Techniques
Jiawei Han, Micheline Kamber, Jian Pei +2 more
- 08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Some methods for classification and analysis of multivariate observations
James B. MacQueen
- 01 Jan 1967
TL;DR: The k-means procedure described in this paper partitions an N-dimensional population into k sets on the basis of a sample. The procedure is a generalization of the ordinary sample mean, and it is shown to give partitions that are reasonably efficient in the sense of within-class variance.
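MacQueen's k-means is sequential: each new sample point is assigned to its nearest current mean, and that mean is then updated incrementally, which is how the procedure generalizes the ordinary running sample mean. The minimal sketch below assumes Euclidean distance, seeds the k means with the first k points, makes a single pass, and ends with a nearest-mean assignment; the function names and data are hypothetical. The batch (Lloyd-style) variant found in most libraries instead alternates full assignment and recomputation steps until convergence.

```python
import math


def nearest(point, means):
    """Index of the mean closest to `point` (Euclidean distance)."""
    return min(range(len(means)), key=lambda c: math.dist(point, means[c]))


def macqueen_kmeans(points, k):
    """One sequential pass of MacQueen-style k-means; returns (means, labels)."""
    means = [list(p) for p in points[:k]]  # seed with the first k points
    counts = [1] * k
    for p in points[k:]:
        c = nearest(p, means)
        counts[c] += 1
        # Incremental mean update: m <- m + (x - m) / n
        means[c] = [m + (x - m) / counts[c] for m, x in zip(means[c], p)]
    labels = [nearest(p, means) for p in points]
    return means, labels


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
means, labels = macqueen_kmeans(points, 2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Here each mean ends at the ordinary average of the points it absorbed, (1/3, 1/3) and (31/3, 31/3).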
•Book
Classification and regression trees
Leo Breiman
- 01 Jan 1983
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
22.7K
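Tree-structured classifiers of this kind are grown by repeatedly choosing the split that most reduces node impurity, and the Gini index 1 - sum_k p_k^2 is one impurity measure used for classification trees. The sketch below shows only the split-search step for a single numeric feature, scanning every threshold between adjacent distinct values and scoring it by the size-weighted Gini impurity of the two children. Function names, the midpoint threshold choice and the data are illustrative assumptions; the O(n^2) scan favours clarity over speed.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (weighted_child_impurity, threshold) of the best split on one feature."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint split
            best = (score, threshold)
    return best


print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # (0.0, 6.5)
```

A full tree builder would apply this search to every feature at every node and recurse on the two children until a stopping or pruning rule applies.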
•Proceedings Article
A density-based algorithm for discovering clusters in large spatial databases with noise
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu +3 more
- 02 Aug 1996
TL;DR: In this paper, a density-based notion of clusters is proposed for discovering clusters of arbitrary shape; the resulting algorithm can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
20.3K
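In the density-based notion of clusters, a point is a core point if its Eps-neighbourhood (which includes the point itself) contains at least MinPts points; clusters grow outward from core points, border points join the cluster of a core point that reaches them, and points reachable from no core point are noise. The sketch below follows that scheme with a linear-scan region query; the original paper relies on a spatial index (an R*-tree) for region queries, so this version is only suitable for small inputs. Names and data are illustrative assumptions.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) or NOISE (-1) for each point."""
    labels = [None] * len(points)  # None marks "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is core: keep expanding
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not fixed in advance, and clusters may take arbitrary shapes as long as they are connected through core points.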