Finding Robust Itemsets under Subsampling

doi:10.1145/2656261

Open Access, Journal Article

Nikolaj Tatti, +2 more

- 07 Oct 2014

- ACM Transactions on Database Systems

- Vol. 39, Issue 3, Article 20

20

TL;DR: In this paper, the robustness of a property is defined as the probability that this property holds on random subsets of the original data, and the measure is computed analytically without actually sampling the data.

Abstract: Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this article we propose a novel theoretical framework for pattern reduction by measuring the robustness of a property of an itemset such as closedness or nonderivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties, namely an itemset being closed, free, non-derivable, or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and, in contrast to noise-tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets.
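The abstract defines robustness as the probability that a property still holds on a random subset of the data, computed analytically rather than by sampling. The sketch below illustrates this idea for one of the four properties, freeness, under an assumed sampling model in which each transaction is kept independently with probability `alpha`. The closed-form product, the function names (`support`, `is_free`, `free_robustness`, `free_robustness_mc`, `mine_robust_free`), and the level-wise miner are our own hypothetical illustration, not necessarily the paper's exact formulation or algorithm. The Monte Carlo estimator is included only to check the formula against actual subsampling.

```python
import random
from collections.abc import Sequence


def support(data: Sequence[frozenset], itemset: frozenset) -> int:
    """Number of transactions in `data` that contain every item of `itemset`."""
    return sum(1 for t in data if itemset <= t)


def is_free(data: Sequence[frozenset], itemset: frozenset) -> bool:
    """X is free if dropping any single item strictly increases its support."""
    supp = support(data, itemset)
    return all(support(data, itemset - {x}) > supp for x in itemset)


def free_robustness(data: Sequence[frozenset], itemset: frozenset, alpha: float) -> float:
    """Probability that `itemset` is free in a random subsample of `data`.

    Assumed sampling model: each transaction is kept independently with
    probability `alpha`. For x in X, let G_x be the transactions that contain
    X - {x} but not x. The groups G_x are pairwise disjoint, and X stays free
    iff every G_x keeps at least one transaction, so the events are independent:
        r = prod_x (1 - (1 - alpha) ** |G_x|).
    """
    supp = support(data, itemset)
    r = 1.0
    for x in itemset:
        g_x = support(data, itemset - {x}) - supp  # |G_x|
        r *= 1.0 - (1.0 - alpha) ** g_x
    return r


def free_robustness_mc(data, itemset, alpha, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability by actually subsampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [t for t in data if rng.random() < alpha]  # keep with prob. alpha
        hits += is_free(sample, itemset)
    return hits / trials


def mine_robust_free(data, alpha, rho):
    """Level-wise search for itemsets whose freeness robustness is >= rho.

    Every subset of a free itemset is free, so robustness can only drop as
    items are added; a candidate is scored only if all its subsets passed.
    The empty itemset (robustness 1) is included in the output.
    """
    items = sorted(set().union(*data))
    level = [frozenset()]
    result = []
    while level:
        kept = [x for x in level if free_robustness(data, x, alpha) >= rho]
        result.extend(kept)
        kept_set = set(kept)
        candidates = set()
        for x in kept:
            for i in items:
                if i not in x:
                    c = x | {i}
                    # prune: every one-smaller subset must already be robust
                    if all(c - {j} in kept_set for j in c):
                        candidates.add(c)
        level = list(candidates)
    return result


data = [frozenset("ab"), frozenset("a"), frozenset("b"), frozenset("abc"), frozenset("c")]
print(free_robustness(data, frozenset("ab"), 0.5))     # 0.25
print(free_robustness_mc(data, frozenset("ab"), 0.5))  # close to 0.25
print(sorted("".join(sorted(x)) for x in mine_robust_free(data, 0.5, 0.3)))
# ['', 'a', 'ac', 'b', 'bc', 'c']
```

With `alpha = 0.5`, the itemset ab stays free only if both the transaction {a} and the transaction {b} survive, which gives 0.25. The itemset abc is not free in the full data (dropping a does not change its support), so its robustness is 0. This matches the abstract's point that robust itemsets are always a subset of the itemsets that have the property.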

Figures

Table IV. Top-45 free non-singleton itemsets from the re0 dataset (τ = 0.05).

Table III. Top-45 closed itemsets from the re0 dataset (τ = 0.05).

Fig. 1. Number of free, non-derivable, and totally shattered itemsets on the Zoo dataset (τ = 0.01) as a function of α and ρ.

Fig. 3. Number of free itemsets as a function of ρ.

Fig. 2. Average number of free, totally shattered, and non-derivable itemsets as a function of ρ, normalized by the number of itemsets at ρ = 0.1. The average is taken over all test datasets.

Fig. 4. Rank compliance of an itemset in noisy data as a function of its robustness in the original data. A high compliance value implies that adding noise had little effect on the rank of the itemset. Medians and quartiles are computed over all datasets.

Citations

Journal Article

ACM Transactions on Database Systems

Dan Suciu, +1 more

- 01 Jan 2005

- ACM Transactions on Database Systems

425

MINI: mining informative non-redundant itemsets

Arianna Gallo, +2 more

- 01 Jan 2007

TL;DR: This paper presents and empirically validates a statistically founded approach, called MINI, that compresses the set of frequent itemsets down to a list of informative, non-redundant itemsets.

54

Proceedings Article, doi:10.1109/ICDE.2016.7498300

Interactive data exploration with smart drill-down

Manas Joglekar, +2 more

- 01 May 2016

TL;DR: It is demonstrated that the underlying optimization problems are NP-HARD, and an algorithm for finding the approximately optimal list of rules to display when the user uses a smart drill-down is described.

52

Journal Article, doi:10.1109/TKDE.2017.2685998

Interactive Data Exploration with Smart Drill-Down

Manas Joglekar, +2 more

- 01 Jan 2019

- IEEE Transactions on Knowledge and Data ...

TL;DR: Smart drill-down is an operator for interactively exploring a relational table to discover and summarize "interesting" groups of tuples, each of which is described by a rule.

30

Book Chapter, doi:10.1007/978-3-030-06167-8_13

Formal Concept Analysis: From Knowledge Discovery to Knowledge Processing

Sébastien Ferré, +4 more

- 08 May 2020

TL;DR: This chapter introduces Formal Concept Analysis (FCA), a formalism based on lattice theory aimed at data analysis and knowledge processing that allows the design of so-called concept lattices from binary and complex data.

29
