Interactive data exploration with smart drill-down
Manas Joglekar,Hector Garcia-Molina,Aditya Parameswaran +2 more
- 01 May 2016
- Vol. 2016, pp 906-917
TL;DR: The underlying optimization problems are shown to be NP-hard, and an algorithm is described for finding an approximately optimal list of rules to display when the user applies a smart drill-down.
Abstract: We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize “interesting” groups of tuples. Each group of tuples is described by a rule. For instance, the rule (a, b, ★, 1000) tells us that there are a thousand tuples with value a in the first column and b in the second column (and any value in the third column). Smart drill-down presents an analyst with a list of rules that together describe interesting aspects of the table. The analyst can tailor the definition of interesting, and can interactively apply smart drill-down on an existing rule to explore that part of the table. We demonstrate that the underlying optimization problems are NP-hard, and describe an algorithm for finding an approximately optimal list of rules to display when the user applies a smart drill-down, as well as a dynamic sampling scheme for efficiently interacting with large tables. Finally, we run experiments on real datasets with our prototype to demonstrate the usefulness of smart drill-down and to study the performance of our algorithms.
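The rule notation in the abstract can be illustrated with a minimal sketch. It shows only how a rule with ★ wildcards matches tuples and how its count is obtained; it is not the paper's rule-selection algorithm. The names `STAR`, `matches`, `count`, and `drill_down`, and the toy table, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

STAR = "★"  # wildcard: matches any value in that column (assumed encoding)

Row = Tuple[str, ...]


def matches(rule: Sequence[str], row: Sequence[str]) -> bool:
    """Return True if every non-wildcard column of the rule equals the row's value."""
    return all(r == STAR or r == v for r, v in zip(rule, row))


def count(rule: Sequence[str], table: List[Row]) -> int:
    """Number of tuples in the table covered by the rule."""
    return sum(1 for row in table if matches(rule, row))


def drill_down(rule: Sequence[str], table: List[Row]) -> List[Row]:
    """Restrict the table to the tuples covered by the rule (hypothetical helper)."""
    return [row for row in table if matches(rule, row)]


# Toy three-column table, invented for this example.
table: List[Row] = [
    ("a", "b", "x"),
    ("a", "b", "y"),
    ("a", "c", "x"),
    ("d", "b", "x"),
]

rule = ("a", "b", STAR)
# Append the count to the rule, mirroring the (a, b, ★, 1000) notation.
print(rule + (count(rule, table),))
```

Here the rule (a, b, ★) covers the first two tuples, so the sketch reports it with a count of 2; `drill_down` returns exactly those two tuples as the sub-table to explore next.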
Citations
Data Lifecycle Challenges in Production Machine Learning: A Survey
Neoklis Polyzotis,Sudip Roy,Steven Euijong Whang,Martin Zinkevich +3 more
- 11 Dec 2018
TL;DR: Challenges in data understanding, data validation and cleaning, and data preparation are explored, along with how different constraints are imposed on the solutions depending on where in the lifecycle of a model the problems are encountered and who encounters them.
214
Automating Exploratory Data Analysis via Machine Learning: An Overview
Tova Milo,Amit Somech +1 more
- 11 Jun 2020
TL;DR: This tutorial reviews recent lines of work for automating EDA, starting from recommender systems for suggesting a single exploratory action, going through kNN-based classifiers and active-learning methods for predicting users' interestingness preferences, and finally to fully automated EDA using state-of-the-art methods such as deep reinforcement learning and sequence-to-sequence models.
91
Database Learning: Toward a Database that Becomes Smarter Every Time
Yongjoo Park,Ahmad Shahab Tajik,Michael Cafarella,Barzan Mozafari +3 more
- 09 May 2017
TL;DR: The principle of maximum entropy is exploited to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations and which lead to increasingly faster response times for future queries.
87
Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning
Ori Bar El,Tova Milo,Amit Somech +2 more
- 11 Jun 2020
TL;DR: This work presents ATENA, a system that takes an input dataset and auto-generates a compelling exploratory session, presented in an EDA notebook, using a novel Deep Reinforcement Learning (DRL) architecture to effectively optimize the notebook generation.
73
Database Learning: Toward a Database that Becomes Smarter Every Time
TL;DR: Verdict exploits the principle of maximum entropy to produce answers that are, in expectation, guaranteed to be more accurate than existing sample-based approximations; the paper reports extensive experiments on real-world query traces from a large customer of a major database vendor.
59
References
Fast Algorithms for Mining Association Rules in Large Databases
Rakesh Agrawal,Ramakrishnan Srikant +1 more
- 12 Sep 1994
TL;DR: Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Mining frequent patterns without candidate generation
Jiawei Han,Jian Pei,Yiwen Yin +2 more
- 16 May 2000
TL;DR: This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS
Jim Gray,A. Bosworth,A. Layman,Hamid Pirahesh +3 more
- 26 Feb 1996
TL;DR: The data cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.
Random sampling with a reservoir
TL;DR: Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin, and an efficient Pascal-like implementation is given that incorporates these modifications and that is suitable for general use.
Related Papers (5)
Gayatri Sathe,Sunita Sarawagi +1 more
- 11 Sep 2001
Sunita Sarawagi,Rakesh Agrawal,Nimrod Megiddo +2 more
- 23 Mar 1998