TL;DR: It is shown how the combined strength and wisdom of the crowds can be used to generate a large, high‐quality, word–emotion and word–polarity association lexicon quickly and inexpensively.
Abstract: Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. In this paper, we show how the combined strength and wisdom of the crowds can be used to generate a large, high-quality, word–emotion and word–polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help to identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help to obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion-annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher interannotator agreement than that obtained by asking if a term evokes an emotion.
TL;DR: Although micro-task markets have great potential for rapidly collecting user measurements at low costs, it is found that special care is needed in formulating tasks in order to harness the capabilities of the approach.
Abstract: User studies are important for many aspects of the design process and involve techniques ranging from informal surveys to rigorous laboratory studies. However, the costs involved in engaging users often requires practitioners to trade off between sample size, time requirements, and monetary costs. Micro-task markets, such as Amazon's Mechanical Turk, offer a potential paradigm for engaging a large number of users for low time and monetary costs. Here we investigate the utility of a micro-task market for collecting user measurements, and discuss design considerations for developing remote micro user evaluation tasks. Although micro-task markets have great potential for rapidly collecting user measurements at low costs, we found that special care is needed in formulating tasks in order to harness the capabilities of the approach.
TL;DR: An introduction to crowdsourcing is provided, both its theoretical grounding and exemplar cases, taking care to distinguish crowdsourcing from open source production.
Abstract: Crowdsourcing is an online, distributed problem-solving and production model that has emerged in recent years. Notable examples of the model include Threadless, iStockphoto, InnoCentive, the Goldcorp Challenge, and user-generated advertising contests. This article provides an introduction to crowdsourcing, both its theoretical grounding and exemplar cases, taking care to distinguish crowdsourcing from open source production. This article also explores the possibilities for the model, its potential to exploit a crowd of innovators, and its potential for use beyond forprofit sectors. Finally, this article proposes an agenda for research into crowdsourcing.
TL;DR: RAPPOR as discussed by the authors is a system for crowdsourcing statistics from end-user client software, anonymously, with strong privacy guarantees, allowing the forest of client data to be studied, without permitting the possibility of looking at individual trees.
Abstract: Randomized Aggregatable Privacy-Preserving Ordinal Response, or RAPPOR, is a technology for crowdsourcing statistics from end-user client software, anonymously, with strong privacy guarantees. In short, RAPPORs allow the forest of client data to be studied, without permitting the possibility of looking at individual trees. By applying randomized response in a novel manner, RAPPOR provides the mechanisms for such collection as well as for efficient, high-utility analysis of the collected data. In particular, RAPPOR permits statistics to be collected on the population of client-side strings with strong privacy guarantees for each client, and without linkability of their reports. This paper describes and motivates RAPPOR, details its differential-privacy and utility guarantees, discusses its practical deployment and properties in the face of different attack models, and, finally, gives results of its application to both synthetic and real-world data.
TL;DR: In this article, existing definitions of crowdsourcing are analysed to extract common elements and to establish the basic characteristics of any crowdsourcing initiative.
Abstract: 'Crowdsourcing' is a relatively recent concept that encompasses many practices. This diversity leads to the blurring of the limits of crowdsourcing that may be identified virtually with any type of internet-based collaborative activity, such as co-creation or user innovation. Varying definitions of crowdsourcing exist, and therefore some authors present certain specific examples of crowdsourcing as paradigmatic, while others present the same examples as the opposite. In this article, existing definitions of crowdsourcing are analysed to extract common elements and to establish the basic characteristics of any crowdsourcing initiative. Based on these existing definitions, an exhaustive and consistent definition for crowdsourcing is presented and contrasted in 11 cases.