Data Makes Better Data Scientists
Jinjin Zhao, Avigdor Gal, Sanjay Krishnan +2 more
- 18 Jun 2023
2
TL;DR: The authors propose a framework for logging and understanding incremental code executions in Jupyter notebooks. It aims to support reasoning about how insights are generated in data science and to distill key observations into best data science practices in the wild.
Abstract: With the goal of identifying common practices in data science projects, this paper proposes a framework for logging and understanding incremental code executions in Jupyter notebooks. This framework aims to allow reasoning about how insights are generated in data science and to extract key observations into best data science practices in the wild. In this paper, we present an early prototype of this framework and report an experiment in which we logged a machine learning project for 25 undergraduate students.
Citations
Growing a FLOWER: Building a Diagram Unifying Flow and ER Notation for Data Science
Carlos Ordóñez, Robin Varghese, Nguyen Hong Phan, Wojciech Macyna +3 more
- 14 Jun 2024
TL;DR: A novel diagram is presented for data integration, cleaning and transformation, targeting machine learning model input data sets. The diagram is built from source code and its associated browser-based GUI, mixing diverse data sources and programming languages.
References
The theory of decision making.
TL;DR: This literature review of decision making (how people make choices among desirable alternatives), culled from the disciplines of psychology, economics, and mathematics, covers the theory of riskless choices, the application of the theory to welfare economics, the theory of risky choices, transitivity of choices, and the theories of games and statistical decision functions.
2.5K
Methods of coping with social desirability bias: A review.
TL;DR: This article distinguishes two main modes of coping with social desirability bias, self-deception and other-deception, and reviews the use of forced-choice items, the randomized response technique, the bogus pipeline, self-administration of the questionnaire, selection of interviewers, and proxy subjects.
2.2K
Data Science and Prediction
TL;DR: Big data promises automated actionable knowledge creation and predictive models for use by both humans and computers and can help improve the quality of knowledge and decision-making in the rapidly changing environment.
716
Enterprise Data Analysis and Visualization: An Interview Study
TL;DR: This work characterizes the process of industrial data analysis and documents how the organizational features of an enterprise affect it. It also describes recurring pain points, outstanding challenges, and barriers to adoption for visual analytic tools.