Finding Robust Itemsets under Subsampling
TL;DR: In this paper, the robustness of a property is defined as the probability that this property holds on random subsets of the original data, and the measure is computed analytically without actually sampling the data.
Abstract: Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this article we propose a novel theoretical framework for pattern reduction by measuring the robustness of a property of an itemset such as closedness or nonderivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties, namely an itemset being closed, free, non-derivable, or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and, in contrast to noise-tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets.
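To make the idea of computing robustness "analytically without actually sampling" concrete, here is a minimal sketch for one property, closedness. It rests on an assumption the abstract does not state: each transaction is kept independently with probability `alpha`. Under that assumption, inclusion-exclusion over the items outside the itemset gives a closed form, which the sketch checks against an exhaustive enumeration of all subsamples on a toy dataset. The function names and the toy data are hypothetical, the closed form is exponential in the number of outside items, and this is not necessarily the authors' algorithm.

```python
from itertools import combinations, product


def support(data, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in data if itemset <= t)


def is_closed(data, itemset, items):
    """An itemset is closed if adding any outside item lowers its support."""
    sup = support(data, itemset)
    return all(support(data, itemset | {y}) < sup for y in items - itemset)


def robustness_closed_analytic(data, itemset, items, alpha):
    """P(itemset is closed in a subsample), ASSUMING each transaction is
    kept independently with probability alpha (illustrative assumption).

    Inclusion-exclusion over sets Y of outside items: all items of Y fail
    to separate from the itemset exactly when every transaction that
    contains the itemset but not all of Y is dropped.
    """
    rest = sorted(items - itemset)
    sup = support(data, itemset)
    total = 0.0
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            must_drop = sup - support(data, itemset | set(extra))
            total += (-1) ** k * (1 - alpha) ** must_drop
    return total


def robustness_closed_bruteforce(data, itemset, items, alpha):
    """Same probability by enumerating all 2^n subsamples (tiny data only)."""
    total = 0.0
    for mask in product([False, True], repeat=len(data)):
        kept = [t for t, keep in zip(data, mask) if keep]
        prob = 1.0
        for keep in mask:
            prob *= alpha if keep else 1 - alpha
        if is_closed(kept, itemset, items):
            total += prob
    return total


# Hypothetical toy dataset: four transactions over items a, b, c.
data = [frozenset("ab"), frozenset("ab"), frozenset("ac"), frozenset("b")]
items = frozenset("abc")
x = frozenset("a")

analytic = robustness_closed_analytic(data, x, items, alpha=0.5)
brute = robustness_closed_bruteforce(data, x, items, alpha=0.5)
print(analytic, brute)
```

On this toy data the two computations agree, which is the point of the comparison: the closed form depends only on supports in the original data, so no subsample ever needs to be drawn.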
Figures

Table IV. Top-45 free non-singleton itemsets from re0 (τ = 0.05) dataset. 
Table III. Top-45 closed itemsets from re0 (τ = 0.05) dataset. 
Fig. 1. Number of free, non-derivable, and totally shattered itemsets on Zoo (τ = 0.01) dataset as a function of α and ρ. 
Fig. 3. Number of free itemsets as a function of ρ.
Fig. 2. Average number of free, totally shattered, and non-derivable itemsets as a function of ρ, normalized by the number of itemsets for ρ = 0.1. The average is taken over all test datasets.
Fig. 4. Rank compliance of an itemset in noisy data as a function of its robustness in the original data. High compliance values imply that adding noise had little effect on the rank of an itemset. Median and quartiles are computed over all datasets.
Citations
•Journal Article
ACM Transactions on Database Systems
Dan Suciu,Gerhard Weikum +1 more
425
MINI: mining informative non-redundant itemsets
Arianna Gallo,Tijl De Bie,Nello Cristianini +2 more
- 01 Jan 2007
TL;DR: This paper presents and empirically validates a statistically founded approach called MINI to compress the set of frequent itemsets down to a list of informative and non-redundant itemsets.
54
Interactive data exploration with smart drill-down
Manas Joglekar,Hector Garcia-Molina,Aditya Parameswaran +2 more
- 01 May 2016
TL;DR: It is demonstrated that the underlying optimization problems are NP-hard, and an algorithm is described for finding an approximately optimal list of rules to display when the user performs a smart drill-down.
Interactive Data Exploration with Smart Drill-Down
TL;DR: Smart drill-down is an operator for interactively exploring a relational table to discover and summarize "interesting" groups of tuples, each of which is described by a rule.
30
Formal Concept Analysis: From Knowledge Discovery to Knowledge Processing
Sébastien Ferré,Marianne Huchard,Mehdi Kaytoue,Sergei O. Kuznetsov,Amedeo Napoli +4 more
- 08 May 2020
TL;DR: This chapter introduces Formal Concept Analysis (FCA), a formalism based on lattice theory aimed at data analysis and knowledge processing that allows the design of so-called concept lattices from binary and complex data.
References
Mining association rules between sets of items in large databases
Rakesh Agrawal,Tomasz Imielinski,Arun N. Swami +2 more
- 01 Jun 1993
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Discovering Frequent Closed Itemsets for Association Rules
Nicolas Pasquier,Yves Bastide,Rafik Taouil,Lotfi Lakhal +3 more
- 10 Jan 1999
TL;DR: This paper proposes a new algorithm, called A-Close, using a closure mechanism to find frequent closed itemsets, and shows that this approach is very valuable for dense and/or correlated data that represent an important part of existing databases.
Beyond market baskets: generalizing association rules to correlations
Sergey Brin,Rajeev Motwani,Craig Silverstein +2 more
- 01 Jun 1997
TL;DR: This work develops the notion of mining rules that identify correlations (generalizing associations), and proposes measuring significance of associations via the chi-squared test for correlation from classical statistics, enabling the mining problem to reduce to the search for a border between correlated and uncorrelated itemsets in the lattice.
Algorithms for association rule mining — a general survey and comparison
TL;DR: The fundamentals of association rule mining are explained and a general framework is derived; it turns out that the runtime behavior of the algorithms is more similar than might be expected.
1.1K
Related Papers (5)
Nikolaj Tatti,Fabian Moerchen +1 more
- 11 Dec 2011
[...]
Jouni K. Seppänen,Heikki Mannila +1 more
- 22 Aug 2004
L. Greeshma,G. Pradeepini +1 more
- 01 Jan 2016