Documenting Data Production Processes
Milagros Miceli, Tianling Yang, Adriana Alvarado Garcia, Julian Posada, Sonja Mei Wang, Marcin Pohl, Alex Hanna
07 Nov 2022
Vol. 6, Issue CSCW2, pp. 1-34
TL;DR: The paper argues that viewing documentation as a boundary object, i.e., an object that can be used differently across organizations and teams but holds enough immutable content to maintain integrity, can be useful when designing documentation to retrieve heterogeneous, often distributed, contexts of data production.
Abstract: The opacity of machine learning data is a significant threat to ethical data work and intelligible systems. Previous research has addressed this issue by proposing standardized checklists to document datasets. This paper expands that field of inquiry by proposing a shift of perspective: from documenting datasets towards documenting data production. We draw on participatory design and collaborate with data workers at two companies located in Bulgaria and Argentina, where the collection and annotation of data for machine learning are outsourced. Our investigation comprises 2.5 years of research, including 33 semi-structured interviews, five co-design workshops, the development of prototypes, and several feedback instances with participants. We identify key challenges and requirements related to the integration of documentation practices in real-world data production scenarios. Our findings comprise important design considerations and highlight the value of designing data documentation based on the needs of data workers. We argue that a view of documentation as a boundary object, i.e., an object that can be used differently across organizations and teams but holds enough immutable content to maintain integrity, can be useful when designing documentation to retrieve heterogeneous, often distributed, contexts of data production.
Citations
The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice
Fernando Delgado, Stephen Yang, Michael Madaio, Qian Yang
30 Oct 2023
TL;DR: The participatory turn in AI design faces challenges in granting substantive agency to stakeholders. A conceptual framework and empirical findings are presented to guide researchers and practitioners in evaluating and improving participatory approaches.
Ground Truth Or Dare: Factors Affecting The Creation Of Medical Datasets For Training AI
Hubert Dariusz Zajac, Natalia-Rozalia Avlona, Finn Kensing, Tariq Osman Andersen, Irina Shklovski
08 Aug 2023
TL;DR: Focusing on the work that precedes annotation, which the authors define as the design of ground truth schema, this paper explores the challenges involved in creating datasets in the medical domain before any annotations are made, in support of responsible AI design.
A toolbox for surfacing health equity harms and biases in large language models
Stephen Pfohl, Heather Cole-Lewis, Rory Sayres, Darlene Neal, Mercy Asiedu, Awa Dieng, Nenad Tomašev, Qazi Mamunur Rashid, Shekoofeh Azizi, Negar Rostamzadeh, Liam G. McCoy, Leo Anthony Celi, Yun Liu, Mike Schaekermann, Alanna Walton, Alicia Parrish, Chirag Nagpal, Rajesh Singh, Akeiylah Dewitt, P. Mansfield, Sushant Prakash, Katherine Heller, Alan Karthikesalingam, Christopher Semturs, Joëlle Barral, Greg S. Corrado, Yossi Matias, Jamila Smith-Loud, Ivor B. Horn, K. K. Singhal
TL;DR: Researchers develop a toolbox to identify health equity harms and biases in large language models, presenting a multifactorial framework and a dataset collection to surface potential biases in LLM-generated medical answers.
Representation in AI Evaluations
Adam S. Bergman, Lisa Anne Hendricks, Maribeth Rauh, Boxi Wu, William Agnew, Markus Kunesch, Iason Gabriel, William S. Isaac
12 Jun 2023
TL;DR: The authors untangle the benefits of representation in AI evaluations, develop a framework to guide AI practitioners and auditors toward creating representative ML evaluations, and lay out the limitations and tensions of instrumentally representative datasets, such as the necessity of data existence and access, surveillance versus expectations of privacy, implications for foundation models, and power.
Decolonial AI as Disenclosure
Warmhold Jan Thomas Mollema
TL;DR: Decolonizing AI requires the abolition of political, ecological and epistemic borders erected and reinforced in the phases of its design, production, development and deployment.
References
Using thematic analysis in psychology
Virginia Braun, Victoria Clarke
TL;DR: Thematic analysis is a poorly demarcated, rarely acknowledged, yet widely used qualitative analytic method within psychology, and it offers an accessible and theoretically flexible approach to analysing qualitative data.
Thematic Analysis: Striving to Meet the Trustworthiness Criteria
TL;DR: The process of conducting a thematic analysis is illustrated through an auditable decision trail, which guides the interpretation and representation of textual data and explores issues of rigor and trustworthiness.
Constructing Grounded Theory: A Practical Guide through Qualitative Analysis
ชวิตรา ตันติมาลา
20 Jan 2017
TL;DR: This practical guide through qualitative analysis is a good starting point for conducting a grounded theory study.
Constructing grounded theory: A practical guide through qualitative analysis
TL;DR: Charmaz presents a practical guide to constructing grounded theory through qualitative analysis.
Institutional Ecology, 'Translations' and Boundary Objects: Amateurs and Professionals in Berkeley's Museum of Vertebrate Zoology, 1907-39
TL;DR: In this article, the authors present a model of how one group of actors managed the tension between divergent viewpoints and the need for generalizable findings in scientific work, and distinguish four types of boundary objects: repositories, ideal types, coincident boundaries and standardized forms.