Proceedings Article, DOI: 10.1145/263661.263684
Data mining, hypergraph transversals, and machine learning (extended abstract)
Dimitrios Gunopulos, Heikki Mannila, Roni Khardon, Hannu Toivonen
- 01 May 1997
- pp 209-216
169
TL;DR: The problem of finding maximally specific interesting sentences in a database is shown to be closely related to the hypergraph transversal problem. Two algorithms previously used in data mining are analyzed, and upper bounds on their complexity are proved.
Abstract: Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with the hypergraph transversal problem. We then analyze two algorithms that have been previously used in data mining, proving upper bounds on their complexity. The first algorithm is useful when the maximally specific interesting sentences are "small". We show that this algorithm can also be used to efficiently solve a special case of the hypergraph transversal problem, improving on previous results. The second algorithm utilizes a subroutine for hypergraph transversals, and is applicable in more general situations, with complexity close to a lower bound for the problem. We also relate these problems to the model of exact learning in computational learning theory, and use the correspondence to derive some corollaries.
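The connection to hypergraph transversals can be illustrated on frequent itemsets, one common instance of "interesting sentences". The sketch below is not taken from the paper: it assumes that "interesting" means "contained in at least `min_support` transactions", and the names `levelwise` and `minimal_transversals`, as well as the toy data, are hypothetical. It runs a simple levelwise (bottom-up) search and then checks, by brute force, that the minimal infrequent itemsets coincide with the minimal transversals of the hypergraph whose edges are the complements of the maximal frequent itemsets.

```python
from itertools import combinations

def is_frequent(itemset, transactions, min_support):
    # Assumed interestingness predicate: support count >= min_support.
    return sum(1 for t in transactions if itemset <= t) >= min_support

def levelwise(items, transactions, min_support):
    """Bottom-up search; returns (maximal frequent sets, minimal infrequent sets).

    Assumes the empty set is frequent, i.e. min_support <= len(transactions).
    """
    frequent, border = [], []
    level = [frozenset([i]) for i in sorted(items)]
    while level:
        k = len(level[0])
        current = {s for s in level if is_frequent(s, transactions, min_support)}
        # Every candidate has only frequent proper subsets, so an infrequent
        # candidate is a minimal infrequent set.
        border += [s for s in level if s not in current]
        frequent += list(current)
        # Next level: sets of size k+1 whose k-subsets are all frequent.
        candidates = set()
        for a in current:
            for b in current:
                u = a | b
                if len(u) == k + 1 and all(
                    frozenset(c) in current for c in combinations(u, k)
                ):
                    candidates.add(u)
        level = list(candidates)
    maximal = [s for s in frequent if not any(s < t for t in frequent)]
    return maximal, border

def minimal_transversals(items, edges):
    """Brute-force minimal hitting sets of a hypergraph (exponential; toy sizes only)."""
    found = []
    for k in range(len(items) + 1):
        for c in combinations(sorted(items), k):
            s = frozenset(c)
            # Sets are enumerated by increasing size, so a hitting set with no
            # previously found hitting set inside it is minimal.
            if all(s & e for e in edges) and not any(h <= s for h in found):
                found.append(s)
    return found

# Hypothetical toy database.
items = frozenset("ABCD")
transactions = [frozenset(t) for t in ("ABC", "ABC", "AB", "CD", "CD", "BD")]

maximal, border = levelwise(items, transactions, min_support=2)
complements = [items - m for m in maximal]
transversals = minimal_transversals(items, complements)

assert set(transversals) == set(border)
print(sorted("".join(sorted(s)) for s in maximal))
print(sorted("".join(sorted(s)) for s in border))
```

The levelwise search here evaluates every frequent set, so it is only practical when the maximal frequent sets are small, which matches the regime the abstract describes for the first algorithm. The brute-force transversal routine merely stands in for whatever transversal subroutine one would actually use.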