Proceedings Article, DOI: 10.1145/1081870.1081912
Reasoning about sets using redescription mining
Mohammed J. Zaki, Naren Ramakrishnan
- 21 Aug 2005
- pp 364-373
TL;DR: This paper outlines algorithms to mine all minimal (non-redundant) redescriptions underlying a dataset using notions of minimal generators of closed itemsets and showcases a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.
Abstract: Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we present the use of redescription mining as an important tool to reason about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms to mine all minimal (non-redundant) redescriptions underlying a dataset using notions of minimal generators of closed itemsets. We also show the use of these algorithms in an interactive context, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.
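To make the abstract's central idea concrete, here is a minimal brute-force sketch in Python. It assumes that descriptors are conjunctions of items and that a redescription is a pair of distinct minimal generators sharing the same closed itemset, and therefore covering exactly the same objects. The toy dataset and all function names are hypothetical, and the exhaustive enumeration over itemsets is for illustration only; it is not the paper's mining algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each object (e.g., a gene) maps to the descriptors
# (e.g., named sets) it belongs to.
DATA = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "c"},
    "g3": {"b"},
    "g4": {"c"},
}


def tidset(itemset, data):
    """Objects whose descriptors include every item in `itemset`."""
    return frozenset(obj for obj, items in data.items() if itemset <= items)


def minimal_generators(data):
    """Brute force (illustration only): group minimal generators by the objects they cover."""
    universe = sorted(set().union(*data.values()))
    groups = defaultdict(list)
    for k in range(1, len(universe) + 1):
        for combo in combinations(universe, k):
            itemset = frozenset(combo)
            tids = tidset(itemset, data)
            if not tids:
                continue  # skip conjunctions that describe no object
            # Minimal iff dropping any one item changes the covered objects;
            # immediate subsets suffice because coverage shrinks as items are added.
            if all(tidset(itemset - {i}, data) != tids for i in itemset):
                groups[tids].append(combo)
    return groups


def redescriptions(data):
    """Pairs of distinct minimal generators covering exactly the same objects."""
    pairs = []
    for gens in minimal_generators(data).values():
        pairs.extend(combinations(gens, 2))
    return pairs


print(redescriptions(DATA))  # [(('a',), ('b', 'c'))]
```

On this toy data, `a` and `b AND c` hold for exactly the same genes {g1, g2}, so they form the only redescription; `a AND b` covers the same genes but is not reported because it is not a minimal generator.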
Citations
In-Close, a fast algorithm for computing formal concepts
Simon Andrews
- 01 Jan 2009
TL;DR: This paper shows that In-Close, which is small, straightforward, requires no matrix pre-processing and is simple to implement, is of the order of 20 times faster than Krajca.
108
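For readers unfamiliar with formal concepts, the following Python sketch enumerates all (extent, intent) pairs of a small object-attribute context using the classic Close-by-One (CbO) scheme with a canonicity test. It is not In-Close and includes none of its optimizations; the context and function names are hypothetical.

```python
# Hypothetical formal context: object -> set of attributes it has.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "c"},
}


def close_by_one(context):
    """Return every formal concept (extent, intent) via plain Close-by-One."""
    attributes = sorted(set().union(*context.values()))

    def intent(objs):
        # Attributes common to all objects in objs (every attribute if objs is empty).
        common = set(attributes)
        for g in objs:
            common &= context[g]
        return frozenset(common)

    concepts = []

    def generate(objs, attrs, start):
        concepts.append((objs, attrs))
        for j in range(start, len(attributes)):
            m = attributes[j]
            if m in attrs:
                continue
            new_objs = frozenset(g for g in objs if m in context[g])
            new_attrs = intent(new_objs)
            # Canonicity test: the closure must not add any attribute ordered
            # before m; otherwise this concept is generated on another branch.
            earlier = set(attributes[:j])
            if new_attrs & earlier == attrs & earlier:
                generate(new_objs, new_attrs, j + 1)

    top = frozenset(context)
    generate(top, intent(top), 0)
    return concepts


for extent, intent in close_by_one(CONTEXT):
    print(sorted(extent), sorted(intent))
```

On this context the sketch finds four concepts, for example ({g1, g3}, {a, b}); the canonicity test ensures each concept is emitted exactly once.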
Summarizing data succinctly with the most informative itemsets
TL;DR: In this article, a probabilistic maximum entropy model is used to find the itemset that provides the most novel information, that is, the itemset whose frequency in the data surprises us most, and, in turn, to update the model accordingly.
67
Algorithms for storytelling
Deept Kumar, Naren Ramakrishnan, Richard F. Helm, Malcolm Potts
- 20 Aug 2006
TL;DR: An efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, using the former to supply next-move operators on search branches to the latter, and that exploits the structure of partitions imposed by the given vocabulary.
Summarizing Data Succinctly with the Most Informative Itemsets
TL;DR: In this article, a probabilistic maximum entropy model is used to find the itemset that provides the most novel information, and the model is then updated accordingly.
62
BLOSOM: a framework for mining arbitrary boolean expressions
Lizhuang Zhao, Mohammed J. Zaki, Naren Ramakrishnan
- 20 Aug 2006
TL;DR: This work introduces a novel framework, called BLOSOM, for mining (frequent) boolean expressions over binary-valued datasets, and proposes a closure operator for each class that yields closed boolean expressions.
48
References
Genomic expression programs in the response of yeast cells to environmental changes.
Audrey P. Gasch, Paul T. Spellman, Camilla M. Kao, Orna Carmel-Harel, Michael B. Eisen, Gisela Storz, David Botstein, Patrick O. Brown
TL;DR: Analysis of genomic expression patterns in the yeast Saccharomyces cerevisiae implicated the transcription factors Yap1p, as well as Msn2p and Msn4p, in mediating specific features of the transcriptional response, while the identification of novel sequence elements provided clues to novel regulators.
Book
Formal Concept Analysis: Mathematical Foundations
Bernhard Ganter, Rudolf Wille, C. Franzke
- 04 Dec 1998
TL;DR: This is the first textbook on formal concept analysis that gives a systematic presentation of the mathematical foundations and their relation to applications in computer science, especially in data analysis and knowledge processing.
5K
Genesis: cluster analysis of microarray data
TL;DR: Genesis integrates various tools for microarray data analysis such as filters, normalization and visualization tools, distance measures as well as common clustering algorithms including hierarchical clustering, self-organizing maps, k-means, principal component analysis, and support vector machines.
1.9K
Proceedings Article
CHARM: An Efficient Algorithm for Closed Itemset Mining
Mohammed J. Zaki, Ching-Jiu Hsiao
- 01 Jan 2002
TL;DR: CHARM is an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree and an efficient hybrid search that skips many levels, and it uses a technique called diffsets to reduce the memory footprint of intermediate computations.
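The diffset idea mentioned for CHARM can be illustrated with a few set operations. The Python sketch below assumes the diffset definitions as we understand them: d(PX) = t(P) - t(PX), d(PXY) = d(PY) - d(PX), and support(PXY) = support(PX) - |d(PXY)|. The tiny vertical database and the function names are hypothetical, and CHARM's closed-itemset search itself is not shown.

```python
# Hypothetical vertical database: item -> tidset (ids of transactions containing it).
TIDSETS = {
    "A": frozenset({1, 3, 4, 5}),
    "T": frozenset({1, 3, 5, 6}),
    "W": frozenset({1, 2, 3, 4, 5}),
}


def diffset(prefix_tids, item_tids):
    """d(PX) = t(P) - t(PX): transactions of P that do not contain X."""
    return prefix_tids - item_tids


def extend(d_px, d_py, support_px):
    """Join siblings PX and PY using diffsets only: d(PXY) = d(PY) - d(PX)."""
    d_pxy = d_py - d_px
    return d_pxy, support_px - len(d_pxy)


t_a = TIDSETS["A"]                      # prefix P = {A}
d_aw = diffset(t_a, TIDSETS["W"])       # d(AW)
d_at = diffset(t_a, TIDSETS["T"])       # d(AT)
support_aw = len(t_a) - len(d_aw)       # support(AW) = support(A) - |d(AW)|
d_awt, support_awt = extend(d_aw, d_at, support_aw)
print(sorted(d_awt), support_awt)       # [4] 3

# Cross-check against plain tidset intersection.
assert support_awt == len(t_a & TIDSETS["W"] & TIDSETS["T"])
```

Forming AWT touches only the two small diffsets d(AW) and d(AT), never the full tidsets; the motivation for diffsets is that they tend to be much smaller than tidsets on dense data.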
MAFIA: A maximal frequent itemset algorithm for transactional databases
Douglas Burdick, Manuel Calimlim, Johannes Gehrke
- 01 Jan 2001
TL;DR: In this paper, a new algorithm for mining maximal frequent itemsets from a transactional database is presented, which integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms.
747
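As a rough picture of depth-first maximal itemset mining, the Python sketch below walks the itemset lattice depth-first with tidsets, drops infrequent extensions, and skips any subtree whose head plus remaining frequent extensions is already contained in a known maximal itemset. It is a simplified illustration, not MAFIA: MAFIA's own data representation and its full set of pruning rules are not reproduced, and the function name and example transactions are hypothetical.

```python
def maximal_frequent_itemsets(transactions, min_support):
    """Depth-first search for maximal frequent itemsets (absolute support count)."""
    items = sorted({i for t in transactions for i in t})
    tidsets = {i: frozenset(k for k, t in enumerate(transactions) if i in t)
               for i in items}
    maximal = []

    def covered(itemset):
        return any(itemset <= m for m in maximal)

    def dfs(head, head_tids, tail):
        # Single-item extensions of `head` that remain frequent.
        ext = []
        for i in tail:
            tids = head_tids & tidsets[i]
            if len(tids) >= min_support:
                ext.append((i, tids))
        if not ext:
            # Leaf: head is maximal unless an already-found maximal set contains it.
            if head and not covered(head):
                maximal.append(head)
            return
        # Lookahead pruning: if head plus every frequent extension is already
        # covered, no new maximal itemset can appear in this subtree.
        if covered(head | {i for i, _ in ext}):
            return
        for pos, (i, tids) in enumerate(ext):
            dfs(head | {i}, tids, [j for j, _ in ext[pos + 1:]])

    dfs(frozenset(), frozenset(range(len(transactions))), items)
    return maximal


TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
for itemset in maximal_frequent_itemsets(TRANSACTIONS, min_support=3):
    print(sorted(itemset))
```

With a minimum support of 3, every pair of items is frequent but {a, b, c} (support 2) is not, so the three pairs are returned as the maximal frequent itemsets.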