Top 73 papers presented at Web Science in 2019

Proceedings Article•10.1145/3292522.3326034•

Spread of Hate Speech in Online Social Media

Binny Mathew¹, Ritam Dutt¹, Pawan Goyal¹, Animesh Mukherjee¹•Institutions (1)

Indian Institute of Technology Kharagpur¹

26 Jun 2019

TL;DR: This paper studies the diffusion dynamics of posts made by hateful and non-hateful users on Gab (Gab.com). The authors observe that content generated by hateful users tends to spread faster and farther, and to reach a much wider audience, than content generated by normal users.

Abstract: Hate speech is considered to be one of the major issues currently plaguing online social media. With online hate speech culminating in gruesome scenarios like the Rohingya genocide in Myanmar, anti-Muslim mob violence in Sri Lanka, and the Pittsburgh synagogue shooting, there is a dire need to understand the dynamics of user interaction that facilitate the spread of such hateful content. In this paper, we perform the first study that looks into the diffusion dynamics of the posts made by hateful and non-hateful users on Gab (Gab.com). We collect a massive dataset of 341K users with 21M posts and investigate the diffusion of the posts generated by hateful and non-hateful users. We observe that the content generated by the hateful users tends to spread faster, farther and reach a much wider audience as compared to the content generated by normal users. We further analyze the hateful and non-hateful users on the basis of their account and network characteristics. An important finding is that the hateful users are far more densely connected among themselves. Overall, our study provides the first cross-sectional view of how hateful users diffuse hate content in online social media.

291 citations

Proceedings Article•10.1145/3292522.3326045•

Exploring Misogyny across the Manosphere in Reddit

Tracie Farrell¹, Miriam Fernandez¹, Jakub Novotny, Harith Alani¹•Institutions (1)

Open University¹

26 Jun 2019

TL;DR: Investigating the flow of extreme language across seven online communities on Reddit shows increasing patterns of misogynistic content and users as well as violent attitudes, corroborating existing theories from feminist studies that the amount of misogyny, hostility and violence is steadily increasing in the manosphere.

Abstract: The 'manosphere' has been a recent subject of feminist scholarship on the web. Serious accusations have been levied against it for its role in encouraging misogyny and violent threats towards women online, as well as for potentially radicalising lonely or disenfranchised men. Feminist scholars evidence this through a shift in the language and interests of some men's rights activists on the manosphere, away from traditional subjects of family law or mental health and towards more sexually explicit, violent, racist and homophobic language. In this paper, we study this phenomenon by investigating the flow of extreme language across seven online communities on Reddit, with openly misogynistic members (e.g., Men Going Their Own Way, Involuntarily Celibates), and investigate if and how misogynistic ideas spread within and across these communities. Grounded in feminist critiques of language, we created nine lexicons capturing specific misogynistic rhetoric (Physical Violence, Sexual Violence, Hostility, Patriarchy, Stoicism, Racism, Homophobia, Belittling, and Flipped Narrative) and used these lexicons to explore how language evolves within and across misogynistic groups. This analysis was conducted on 6 million posts, from 300K conversations created between 2011 and December 2018. Our results show increasing patterns of misogynistic content and users as well as violent attitudes, corroborating existing theories from feminist studies that the amount of misogyny, hostility and violence is steadily increasing in the manosphere.

223 citations

Proceedings Article•10.1145/3292522.3326028•

A Unified Deep Learning Architecture for Abuse Detection

Antigoni Maria Founta, Despoina Chatzakou, Nicolas Kourtellis¹, Jeremy Blackburn², Athena Vakali, Ilias Leontiadis¹•Institutions (2)

Telefónica¹, University of Alabama²

26 Jun 2019

TL;DR: A deep learning architecture is proposed, which utilizes a wide variety of available metadata, and combines it with automatically-extracted hidden patterns within the text of the tweets, to detect multiple abusive behavioral norms which are highly inter-related.

Abstract: Hate speech, offensive language, sexism, racism, and other types of abusive behavior have become a common phenomenon in many online social media platforms. In recent years, such diverse abusive behaviors have been manifesting with increased frequency and levels of intensity. Despite social media's efforts to combat online abusive behaviors, this problem is still apparent. In fact, up to now, they have entered an arms race with the perpetrators, who constantly change tactics to evade the detection algorithms deployed by these platforms. Such algorithms, not disclosed to the public for obvious reasons, are typically custom-designed and tuned to detect only one specific type of abusive behavior, but usually miss other related behaviors. In the present paper, we study this complex problem by following a more holistic approach, which considers the various aspects of abusive behavior. We focus on Twitter, due to its popularity, and analyze user and textual properties from different angles of abusive posting behavior. We propose a deep learning architecture, which utilizes a wide variety of available metadata, and combines it with automatically-extracted hidden patterns within the text of the tweets, to detect multiple abusive behavioral norms which are highly inter-related. The proposed unified architecture is applied in a seamless and transparent fashion without the need for any change of the architecture but only training a model for each task (i.e., different types of abusive behavior). We test the proposed approach with multiple datasets addressing different abusive behaviors on Twitter. Our results demonstrate high performance across all datasets, with AUC values ranging from 92% to 98%.

202 citations

Proceedings Article•10.1145/3292522.3326015•

RTbust: Exploiting Temporal Patterns for Botnet Detection on Twitter

Michele Mazza, Stefano Cresci, Marco Avvenuti¹, Walter Quattrociocchi², Maurizio Tesconi•Institutions (2)

University of Pisa¹, Ca' Foscari University of Venice²

26 Jun 2019

TL;DR: In this paper, an LSTM autoencoder converts the retweet time series into compact and informative latent feature vectors, which are then clustered with a hierarchical density-based algorithm.

Abstract: Within OSNs, many of our supposedly online friends may instead be fake accounts called social bots, part of large groups that purposely re-share targeted content. Here, we study retweeting behaviors on Twitter, with the ultimate goal of detecting retweeting social bots. We collect a dataset of 10M retweets. We design a novel visualization that we leverage to highlight benign and malicious patterns of retweeting activity. In this way, we uncover a “normal” retweeting pattern that is peculiar to human-operated accounts, and suspicious patterns related to bot activities. Then, we propose a bot detection technique that stems from the previous exploration of retweeting behaviors. Our technique, called Retweet-Buster (RTbust), leverages unsupervised feature extraction and clustering. An LSTM autoencoder converts the retweet time series into compact and informative latent feature vectors, which are then clustered with a hierarchical density-based algorithm. Accounts belonging to large clusters characterized by malicious retweeting patterns are labeled as bots. RTbust obtains excellent detection results, with F1 = 0.87, whereas competitors achieve F1 ≤ 0.76. Finally, we apply RTbust to a large dataset of retweets, uncovering 2 previously unknown active botnets with hundreds of accounts.

194 citations

Proceedings Article•10.1145/3292522.3326032•

Prevalence and Psychological Effects of Hateful Speech in Online College Communities

Koustuv Saha¹, Eshwar Chandrasekharan¹, Munmun De Choudhury¹•Institutions (1)

Georgia Institute of Technology¹

26 Jun 2019

TL;DR: This work lays the foundation for studying the psychological impacts of hateful speech in online communities in general, and situated communities in particular (the ones that have both an offline and an online analog).

Abstract: Background. Hateful speech bears negative repercussions and is particularly damaging in college communities. The efforts to regulate hateful speech on college campuses pose vexing socio-political problems, and the interventions to mitigate the effects require evaluating the pervasiveness of the phenomenon on campuses as well as the impacts on students' psychological state. Data and Methods. Given the growing use of social media among college students, we target the above issues by studying the online aspect of hateful speech in a dataset of 6 million Reddit comments shared in 174 college communities. To quantify the prevalence of hateful speech in an online college community, we devise College Hate Index (CHX). Next, we examine its distribution across the categories of hateful speech: behavior, class, disability, ethnicity, gender, physical appearance, race, religion, and sexual orientation. We then employ a causal-inference framework to study the psychological effects of hateful speech, particularly in the form of individuals' online stress expression. Finally, we characterize their psychological endurance to hateful speech by analyzing their language -- their discriminatory keyword use, and their personality traits. Results. We find that hateful speech is prevalent in college subreddits, and 25% of them show greater hateful speech than non-college subreddits. We also find that the exposure to hate leads to greater stress expression. However, not everybody exposed is equally affected; some show lower psychological endurance than others. Low endurance individuals are more vulnerable to emotional outbursts, and are more neurotic than those with higher endurance. Discussion. Our work bears implications for policy-making and intervention efforts to tackle the damaging effects of online hateful speech in colleges. From a technological perspective, our work caters to mental health support provisions on college campuses, and to moderation efforts in online college communities. In addition, given the charged aspect of speech dilemma, we highlight the ethical implications of our work. Our work lays the foundation for studying the psychological impacts of hateful speech in online communities in general, and situated communities in particular (the ones that have both an offline and an online analog).

164 citations

Proceedings Article•10.1145/3292522.3326016•

Who Let The Trolls Out?: Towards Understanding State-Sponsored Trolls

Savvas Zannettou¹, Tristan Caulfield², William Setzer³, Michael Sirivianos¹, Gianluca Stringhini⁴, Jeremy Blackburn³•Institutions (4)

Cyprus University of Technology¹, University College London², University of Alabama at Birmingham³, Boston University⁴

26 Jun 2019

TL;DR: This article analyzed 10M posts by 5.5k Twitter and Reddit users identified as Russian and Iranian state-sponsored trolls and compared the behavior of each group of trolls with a focus on how their strategies change over time, the different campaigns they embark on and differences between the trolls operated by Russia and Iran.

Abstract: Recent evidence has emerged linking coordinated campaigns by state-sponsored actors to manipulate public opinion on the Web. Campaigns revolving around major political events are enacted via mission-focused “trolls.” While trolls are involved in spreading disinformation on social media, there is little understanding of how they operate, what type of content they disseminate, how their strategies evolve over time, and how they influence the Web's information ecosystem. In this paper, we begin to address this gap by analyzing 10M posts by 5.5K Twitter and Reddit users identified as Russian and Iranian state-sponsored trolls. We compare the behavior of each group of state-sponsored trolls with a focus on how their strategies change over time, the different campaigns they embark on, and differences between the trolls operated by Russia and Iran. Among other things, we find: 1) that Russian trolls were pro-Trump while Iranian trolls were anti-Trump; 2) evidence that campaigns undertaken by such actors are influenced by real-world events; and 3) that the behavior of such actors is not consistent over time, hence detection is not straightforward. Using Hawkes Processes, we quantify the influence these accounts have on pushing URLs on four platforms: Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab. In general, Russian trolls were more influential and efficient in pushing URLs to all the other platforms with the exception of /pol/ where Iranians were more influential. Finally, we release our source code to ensure the reproducibility of our results and to encourage other researchers to work on understanding other emerging kinds of state-sponsored troll accounts on Twitter.

153 citations

Proceedings Article•10.1145/3292522.3326041•

Incentivized Blockchain-based Social Media Platforms: A Case Study of Steemit

Chao Li¹, Balaji Palanisamy¹•Institutions (1)

University of Pittsburgh¹

26 Jun 2019

TL;DR: In this paper, the authors present an empirical analysis of Steemit, a key representative of the emerging incentivized social media platforms over Blockchains, to understand and evaluate the actual level of decentralization and the practical effects of cryptocurrency-driven reward system in these modern social platforms.

Abstract: Advances in Blockchain and distributed ledger technologies are driving the rise of incentivized social media platforms over Blockchains, where no single entity can take control of the information and users can receive cryptocurrency as rewards for creating or curating high-quality contents. This paper presents an empirical analysis of Steemit, a key representative of the emerging incentivized social media platforms over Blockchains, to understand and evaluate the actual level of decentralization and the practical effects of cryptocurrency-driven reward system in these modern social media platforms. Similar to Bitcoin, Steemit is operated by a decentralized community, where 21 members are periodically elected to cooperatively operate the platform through the Delegated Proof-of-Stake (DPoS) consensus protocol. Our study performed on 539 million operations performed by 1.12 million Steemit users during the period 2016/03 to 2018/08 reveals that the actual level of decentralization in Steemit is far lower than the ideal level, indicating that the DPoS consensus protocol may not be a desirable approach for establishing a highly decentralized social media platform. In Steemit, users create contents as posts which get curated based on votes from other users. The platform periodically issues cryptocurrency as rewards to creators and curators of popular posts. Although such a reward system is originally driven by the desire to incentivize users to contribute to high-quality contents, our analysis of the underlying cryptocurrency transfer network on the blockchain reveals that more than 16% of cryptocurrency transfers in Steemit are sent to curators suspected to be bots and also finds the existence of an underlying supply network for the bots, both suggesting a significant misuse of the current reward system in Steemit. Our study is designed to provide insights on the current state of this emerging blockchain-based social media platform including the effectiveness of its design and the operation of the consensus protocols and the reward system.

82 citations

Proceedings Article•10.1145/3292522.3326030•

Better Safe Than Sorry: An Adversarial Approach to Improve Social Bot Detection

Stefano Cresci, Marinella Petrocchi, Angelo Spognardi¹, Stefano Tognazzi²•Institutions (2)

Sapienza University of Rome¹, IMT Institute for Advanced Studies Lucca²

26 Jun 2019

TL;DR: This article proposes and experiments with a novel genetic algorithm for the synthesis of online accounts, which creates synthetic evolved versions of current state-of-the-art social bots.

Abstract: The arms race between spambots and spambot-detectors is made of several cycles (or generations): a new wave of spambots is created (and new spam is spread), new spambot filters are derived and old spambots mutate (or evolve) to new species. Recently, with the diffusion of the adversarial learning approach, a new practice is emerging: to manipulate on purpose target samples in order to make stronger detection models. Here, we manipulate generations of Twitter social bots, to obtain - and study - their possible future evolutions, with the aim of eventually deriving more effective detection techniques. In detail, we propose and experiment with a novel genetic algorithm for the synthesis of online accounts. The algorithm allows the creation of synthetic evolved versions of current state-of-the-art social bots. Results demonstrate that synthetic bots really escape current detection techniques. However, they give all the needed elements to improve such techniques, making possible a proactive approach for the design of social bot detection systems.

71 citations

Proceedings Article•10.1145/3292522.3326018•

Characterizing Attention Cascades in WhatsApp Groups

Josemar Alves Caetano¹, Gabriel Magno¹, Marcos André Gonçalves¹, Jussara M. Almeida¹, Humberto Torres Marques-Neto², Virgilio Almeida¹•Institutions (2)

Universidade Federal de Minas Gerais¹, Pontifícia Universidade Católica de Minas Gerais²

26 Jun 2019

TL;DR: This paper characterizes and analyzes how attention propagates among the participants of a WhatsApp group, and finds specific characteristics in cascades associated with groups that discuss political subjects and false information.

Abstract: An important political and social phenomenon discussed in several countries, like India and Brazil, is the use of WhatsApp to spread false or misleading content. However, little is known about the information dissemination process in WhatsApp groups. Attention affects the dissemination of information in WhatsApp groups, determining what topics or subjects are more attractive to participants of a group. In this paper, we characterize and analyze how attention propagates among the participants of a WhatsApp group. An attention cascade begins when a user asserts a topic in a message to the group, which could include written text, photos, or links to articles online. Others then propagate the information by responding to it. We analyzed attention cascades in more than 1.7 million messages posted in 120 groups over one year. Our analysis focused on the structural and temporal evolution of attention cascades as well as on the behavior of users that participate in them. We found specific characteristics in cascades associated with groups that discuss political subjects and false information. For instance, we observe that cascades with false information tend to be deeper, reach more users, and last longer in political groups than in non-political groups.
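
Editor's illustration: the three cascade properties named in this abstract (depth, users reached, duration) can be computed from a reply log roughly as follows. This is a minimal sketch, not the authors' pipeline. It assumes that a cascade is a reply tree rooted at a message that answers nothing, and the `Message` fields, the `cascade_stats` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_id: str
    user: str
    timestamp: float  # seconds since the start of the log (assumed unit)
    reply_to: Optional[str] = None  # id of the message being answered, if any


def cascade_stats(messages: List[Message]) -> Dict[str, Dict[str, float]]:
    """Group messages into reply trees and summarise each tree.

    Assumes reply chains contain no cycles.
    """
    by_id = {m.msg_id: m for m in messages}
    raw: Dict[str, dict] = {}
    for m in messages:
        # Walk up the reply chain to find the cascade root and this message's depth.
        depth, cur = 0, m
        while cur.reply_to is not None and cur.reply_to in by_id:
            cur = by_id[cur.reply_to]
            depth += 1
        s = raw.setdefault(
            cur.msg_id,
            {"depth": 0, "users": set(), "start": m.timestamp, "end": m.timestamp},
        )
        s["depth"] = max(s["depth"], depth)
        s["users"].add(m.user)
        s["start"] = min(s["start"], m.timestamp)
        s["end"] = max(s["end"], m.timestamp)
    return {
        root: {
            "depth": s["depth"],
            "users": len(s["users"]),
            "duration": s["end"] - s["start"],
        }
        for root, s in raw.items()
    }


# Made-up sample log: one three-user cascade and one unanswered message.
log = [
    Message("m1", "alice", 0.0),
    Message("m2", "bob", 60.0, reply_to="m1"),
    Message("m3", "carol", 180.0, reply_to="m2"),
    Message("m4", "alice", 240.0, reply_to="m1"),
    Message("m5", "dave", 300.0),
]
stats = cascade_stats(log)
```

Comparing such per-cascade summaries between groups is one plausible way to express "deeper, reach more users, and last longer"; the paper's exact definitions may differ.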

44 citations

Proceedings Article•10.1145/3292522.3326047•

Auditing Autocomplete: Suggestion Networks and Recursive Algorithm Interrogation

Ronald E. Robertson¹, Shan Jiang¹, David Lazer¹, Christo Wilson¹•Institutions (1)

Northeastern University¹

26 Jun 2019

TL;DR: Recursive algorithm interrogation (RAI) is introduced, a breadth-first search method for auditing autocomplete by recursively submitting a root query and its child suggestions to create a network of algorithmic associations.

Abstract: Autocomplete algorithms, by design, steer inquiry. When a user provides a root input, such as a search query, these algorithms dynamically retrieve, curate, and present a list of related inputs, such as search suggestions. Although ubiquitous in online platforms, a lack of research addressing the ephemerality of their outputs and the opacity of their functioning raises concerns of transparency and accountability on where inquiry is steered. Here, we introduce recursive algorithm interrogation (RAI), a breadth-first search method for auditing autocomplete by recursively submitting a root query and its child suggestions to create a network of algorithmic associations. We used RAI to conduct a longitudinal audit of autocomplete on Google and Bing using a focused set of root queries -- the names of 38 US governors who were up for reelection -- during the summer of 2018. Comparing across search engines, we found a higher turnover rate among longer and lower ranked suggestions on both search engines, a higher prevalence of social media websites in Google's suggestions, a higher prevalence of words classified as a swear or a negative emotion in Bing's suggestions, and periodic shocks that spanned across most of our root queries. We open source our code for conducting RAI and discuss how it could be applied to other platforms, topics, and settings.
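
Editor's illustration: the breadth-first procedure described in this abstract can be sketched as below. This is a hedged sketch under stated assumptions, not the authors' released code. The `recursive_interrogation` function, the `max_depth` cutoff, and the toy suggestion table are hypothetical; a real audit would replace `toy_suggest` with a call to a search engine's autocomplete service, which is not modelled here.

```python
from collections import deque
from typing import Callable, List, Tuple


def recursive_interrogation(
    root: str,
    get_suggestions: Callable[[str], List[str]],
    max_depth: int = 2,
) -> List[Tuple[str, str, int]]:
    """Breadth-first crawl of suggestions.

    Returns (parent query, suggestion, rank) edges of the suggestion network.
    """
    edges: List[Tuple[str, str, int]] = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        query, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand queries beyond the depth cutoff
        for rank, suggestion in enumerate(get_suggestions(query), start=1):
            edges.append((query, suggestion, rank))
            if suggestion not in seen:  # submit each suggestion at most once
                seen.add(suggestion)
                queue.append((suggestion, depth + 1))
    return edges


# Hypothetical stand-in for an autocomplete endpoint.
TOY_SUGGESTIONS = {
    "jane doe": ["jane doe governor", "jane doe twitter"],
    "jane doe governor": ["jane doe governor race", "jane doe twitter"],
    "jane doe twitter": [],
}


def toy_suggest(query: str) -> List[str]:
    return TOY_SUGGESTIONS.get(query, [])


edges = recursive_interrogation("jane doe", toy_suggest, max_depth=2)
```

Keeping the rank on each edge is a design choice made here so that turnover among lower-ranked suggestions could be compared across repeated crawls.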

32 citations

Proceedings Article•10.1145/3292522.3326055•

How Gullible Are You?: Predicting Susceptibility to Fake News

Tracy Jia Shen¹, Robert Cowell¹, Aditi Gupta¹, Thai Le¹, Amulya Yadav¹, Dongwon Lee¹•Institutions (1)

Pennsylvania State University¹

26 Jun 2019

TL;DR: Building on the crowdsourced annotations of 5 types of susceptible users in Twitter, it is found that susceptible users are correlated with a combination of user, network, and content features, and one can build a reasonably accurate prediction model with 0.82 in AUC-ROC for the multinomial classification task.

Abstract: In this research, we hypothesize that some social users are more gullible to fake news than others, and accordingly investigate the susceptibility of users to fake news, i.e., how to identify susceptible users, what their characteristics are, and whether one can build a prediction model. Building on the crowdsourced annotations of 5 types of susceptible users in Twitter, we found that: (1) susceptible users are correlated with a combination of user, network, and content features; (2) one can build a reasonably accurate prediction model with 0.82 in AUC-ROC for the multinomial classification task; and (3) there exists a correlation between the dominant susceptibility level of center nodes and that of the entire network.

Proceedings Article•10.34962/JWS-70•

Radicalisation Influence in Social Media

Miriam Fernandez¹, Antonio Gonzalez-Pardo², Harith Alani¹•Institutions (2)

Open University¹, Autonomous University of Madrid²

25 Jun 2019

TL;DR: This paper proposes a computational approach for detecting and predicting the radicalisation influence that a user is subjected to, grounded on the notion of ‘roots of radicalisation’ from social science theories, and uses this approach to analyse and compare the radicalisation influence of 112 pro-ISIS and 112 “general” Twitter users.

Abstract: Identifying signs of online extremism is one of the top priorities for counter-extremist agencies. Social media platforms have become prime locations for radicalisation content and behaviour, and therefore much research and practice nowadays are focused on detecting radicalisation material, and accounts that publish such material, on these platforms. However, there is currently a limited understanding of how people on social media platforms are influenced by such content and behaviour, and what are the dynamics of this influence. In this paper, we propose a computational approach for detecting and predicting the radicalisation influence that a user is subjected to. Our approach is grounded on the notion of ‘roots of radicalisation’ from social science theories. We use our approach to analyse and compare the radicalisation influence of 112 pro-ISIS and 112 “general” Twitter users. Our results show the effectiveness of our proposed algorithms in detecting and predicting radicalisation influence, obtaining up to 0.9 F-1 measure for detection and between 0.7 and 0.8 precision for prediction. We have also conducted an in-depth analysis of the social influence received by the 112 pro-ISIS accounts, and reported on the origin, frequency and topical diversity of this influence. While this is an initial attempt towards the effective combination of social and computational perspectives, more work is needed to bridge these disciplines, and to build on their strengths to target the problem of online radicalisation.

Proceedings Article•10.34962/JWS-84•

Partisanship, Propaganda and Post-Truth Politics: Quantifying Impact in Online Debate

Genevieve Gorrell¹, Mehmet E. Bakir¹, Ian Roberts¹, Mark A. Greenwood¹, Benedetta Iavarone, Kalina Bontcheva¹•Institutions (1)

University of Sheffield¹

5 Feb 2019

TL;DR: The authors studied the role of politically-motivated actors and their strategies for influencing and manipulating public opinion online: partisan media, state-backed propaganda, and post-truth politics in the run up to the UK EU membership referendum.

Abstract: The recent past has highlighted the influential role of social networks and online media in shaping public debate on current affairs and political issues. This paper is focused on studying the role of politically-motivated actors and their strategies for influencing and manipulating public opinion online: partisan media, state-backed propaganda, and post-truth politics. In particular, we present quantitative research on the presence and impact of these three 'Ps' in online Twitter debates in two contexts: (i) the run up to the UK EU membership referendum ('Brexit'); and (ii) the information operations of Russia-backed online troll accounts. We first compare the impact of highly partisan versus mainstream media during the Brexit referendum, specifically comparing tweets by half a million 'leave' and 'remain' supporters. Next, online propaganda strategies are examined, specifically left- and right-wing troll accounts. Lastly, we study the impact of misleading claims made by the political leaders of the leave and remain campaigns. This is then compared to the impact of the Russia-backed partisan media and propaganda accounts during the referendum. In particular, just two of the many misleading claims made by politicians during the referendum were found to be cited in 4.6 times more tweets than the 7,103 tweets related to Russia Today and Sputnik and in 10.2 times more tweets than the 3,200 Brexit-related tweets by the Russian troll accounts.

Proceedings Article•10.1145/3328413.3328415•

Understanding Demographic Bias and Representation in Social Media Health Data

Nina Cesare¹, Christan Grant², Elaine O. Nsoesie¹•Institutions (2)

Boston University¹, University of Oklahoma²

26 Jun 2019

TL;DR: It is highlighted that understanding the strengths and limitations of these data sources would enable a rigorous assessment of their usefulness for public health research and provide a means for quantifying uncertainty in research findings.

Abstract: Text, images, geotags and other data from social media sites lend researchers a unique window into population health trends and disease spread. While these data provide the opportunity to track and measure health outcomes across geographic regions, over extended periods of time, and through complex social networks, they also present challenges. Most notably, these data carry significant biases due to demographic differences in who chooses to use each platform, and what they choose to share. While several publications have discussed the limitations of leveraging social media data for public health research, the amount of literature systematically investigating their demographic bias and exploring mitigation strategies is limited and ripe for interdisciplinary contributions. In this discussion paper, we highlight that understanding the strengths and limitations of these data sources would enable a rigorous assessment of their usefulness for public health research and provide a means for quantifying uncertainty in research findings.

Proceedings•10.1145/3292522•

WebSci '19 : Proceedings

Paolo Boldi¹, Brooke Foucault Welles, K. Katharina, W. Christo•Institutions (1)

University of Milan¹

1 Jun 2019

Proceedings Article•10.1145/3292522.3326031•

A Broad Evaluation of the Tor English Content Ecosystem

Mahdieh Zabihimayvan¹, Reza Sadeghi¹, Derek Doran¹, Mehdi Allahyari²•Institutions (2)

Wright State University¹, Georgia Southern University²

26 Jun 2019

TL;DR: The authors performed a comprehensive crawl of the Tor dark web and, through topic and network analysis, characterized the 'types' of information and services hosted across a broad swath of Tor domains and their hyperlink relational structure.

Abstract: Tor is among the most well-known darknets in the world. It has noble uses, including as a platform for free speech and information dissemination under the guise of true anonymity, but may be culturally better known as a conduit for criminal activity and as a platform to market illicit goods and data. Past studies on the content of Tor support this notion, but were carried out by targeting popular domains likely to contain illicit content. A survey of past studies may thus not yield a complete evaluation of the content and use of Tor. This work addresses this gap by presenting a broad evaluation of the content of the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web and, through topic and network analysis, characterize the 'types' of information and services hosted across a broad swath of Tor domains and their hyperlink relational structure. We recover nine domain types defined by the information or service they host and, among other findings, unveil how some types of domains intentionally silo themselves from the rest of Tor. We also present measurements that (regrettably) suggest how marketplaces of illegal drugs and services do emerge as the dominant type of Tor domain. Our study is the product of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a collection of over 150,000 Tor pages. The domain structure is publicly available as a dataset at https://github.com/wsu-wacs/TorEnglishContent.

Proceedings Article•10.1145/3292522.3326057•

How Representative is an Abortion Debate on Twitter

Eduardo Graells-Garrido¹, Ricardo Baeza-Yates², Mounia Lalmas•Institutions (2)

Universidad del Desarrollo¹, Northeastern University²

26 Jun 2019

TL;DR: It is found that Twitter has strong biases in population representation, but when carefully paired with demographic attributes, Twitter-based insights on the characteristics of political discussion match those from national-level surveys.

Abstract: Today, more than ever, social networks and micro-blogging platforms are used as tools for political exchange. However, these platforms are biased in several aspects, from their algorithms to the population participating in them. With respect to the latter, we analyze the discussion on Twitter about an abortion bill in Chile, proposed in January 2015, and approved as law in September 2017. We find that Twitter has strong biases in population representation. Still, when carefully paired with demographic attributes, Twitter-based insights on the characteristics of political discussion match those from national-level surveys.

Proceedings Article•10.1145/3292522.3326024•

Minority Report: Cyberbullying Prediction on Instagram

[...]

Charalampos Chelmis¹, Mengfan Yao¹•Institutions (1)

State University of New York System¹

26 Jun 2019

TL;DR: Beyond cyberbullying prediction, this work is the first to provide insights on the forecasting performance of multi-task regression as a function of the prediction horizon and the length of available historical data.

Abstract: Introduction. Cyberbullying, as a form of abusive online behavior, although not well-defined, is a repetitive process, i.e., a sequence of harassing messages sent from a bully to a victim over a period of time with the intent to harm the victim. Numerous automated, data-driven approaches have been developed for the automatic classification of cyberbullying instances, with emphasis on classification accuracy. While the importance of highly accurate classifiers is undoubted, a key pitfall of existing cyberbullying detection methods is that (i) they disregard the repetitive nature of the harassing process, and (ii) they work retrospectively (i.e., after a cyberbullying incident has occurred), making it difficult to intervene before an interaction escalates. Motivated by the scarcity of methods to anticipate cyberbullying, we focus on cyberbullying prediction with the goal of reducing the time from detection to intervention. Methods. We formulate the prediction of the number of harassing comments a media session will receive over a period of time as a regularized multi-task regression problem. In our formulation, we consider two settings where (i) the progression of cyberbullying behavior from some time point in the near future to subsequent time points further into the future is modeled given limited knowledge of the recent past, and (ii) increasingly more historical data is accumulated to improve prediction accuracy. To validate our approach, we conduct an extensive experimental evaluation on a real-world dataset from Instagram, the online social media platform with the highest percentage of users reporting experiencing cyberbullying. Results. Intuitively, the larger the number of observed comments in the recent past of a media session, the better the predictive power of our approach. The downside to using more historical data is that decisions must be postponed until more comments are collected. Therefore, the trade-off between accuracy and decision speed is examined. In general, our approach outperforms competing approaches by up to 31.4% and 46.2% in Recall and Matthews correlation coefficient, respectively. Discussion. Our approach can be used to effectively prioritize media sessions for increased monitoring as time goes by or for immediate intervention before a conversation escalates. In future work, we plan to incorporate additional features and investigate the generalizability of our approach on other key social networking venues where users frequently become victims of cyberbullying. Beyond cyberbullying prediction, our work is, to the best of our knowledge, the first to provide insights on the forecasting performance of multi-task regression as a function of the prediction horizon and the length of available historical data. We thus believe that our work can serve as a reference point on the forecasting performance of multi-task regression both for researchers and practitioners.
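
For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of how Recall and the Matthews correlation coefficient are computed from a binary confusion matrix. It is not the authors' evaluation code: the function names and the confusion-matrix counts are hypothetical, and the abstract does not say how predicted comment counts are converted into binary labels.

```python
import math


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were predicted positive."""
    total = tp + fn
    return tp / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal is empty, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, chosen only for illustration.
example_recall = recall(tp=8, fn=2)
example_mcc = matthews_corrcoef(tp=8, tn=7, fp=3, fn=2)
```

Unlike Recall, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance.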

Proceedings Article•10.1145/3292522.3326040•

Harnessing Collective Intelligence in P2P Lending

[...]

Henry K. Dambanemuya¹, Emőke-Ágnes Horvát¹•Institutions (1)

Northwestern University¹

26 Jun 2019

TL;DR: It is found that the wisdom of the lending crowd is most prominent in the auto loan category, but it is statistically significant for all other categories except student debt, which contributes new insights on how signals deduced from lending behaviour can improve the efficiency of crowd financing.

Abstract: Crowd financing is a burgeoning phenomenon that promises to improve access to capital by enabling borrowers with limited financial opportunities to receive small contributions from individual lenders towards unsecured loan requests. Faced with information asymmetry about borrowers' credibility, individual lenders bear the entire loss in case of loan default. Predicting loan payment is therefore crucial for lenders and for the sustainability of these platforms. To this end, we examine whether the "wisdom" of the lending crowd can provide reliable decision support with respect to projects' long-term success. Using data from Prosper.com, we investigate the association between the dynamics of lending behaviour and successful loan payment through interpretable classification models. We find evidence for collective intelligence signals in lending behaviour and observe variability in crowd wisdom across loan categories. We find that the wisdom of the lending crowd is most prominent in the auto loan category, but it is statistically significant for all other categories except student debt. Our study contributes new insights on how signals deduced from lending behaviour can improve the efficiency of crowd financing thereby contributing to economic growth and societal development.

Proceedings Article•10.1145/3292522.3326013•

As the Tweet, so the Reply?: Gender Bias in Digital Communication with Politicians

[...]

Armin Mertens, Franziska Pradel, Ayjeren Rozyjumayeva¹, Jens Wäckerle•Institutions (1)

University of Cologne¹

26 Jun 2019

TL;DR: It is found that politicians' communication on Twitter is driven by party identity rather than gender, but female politicians are significantly more likely to be reduced to their gender rather than to their profession compared to male politicians.

Abstract: This study investigates gender bias in political interactions on digital platforms by considering how politicians present themselves on Twitter and how they are approached by others. Incorporating social identity theory, we use dictionary analyses to detect biases in individual tweets connected to the German federal elections in 2017. Besides sentiment analysis, we introduce a new measure of personal- vs. job-related content in text data that is validated with structural topic models. Our results indicate that politicians' communication on Twitter is driven by party identity rather than gender. However, we find systematic gender differences in tweets directed at politicians: female politicians are significantly more likely than male politicians to be reduced to their gender rather than to their profession.

Proceedings Article•10.1145/3292522.3326014•

EAN: Event Attention Network for Stock Price Trend Prediction based on Sentimental Embedding

[...]

Yaowei Wang¹, Qing Li², Zhexue Huang³, Junjie Li³•Institutions (3)

City University of Hong Kong¹, Hong Kong Polytechnic University², Shenzhen University³

26 Jun 2019

TL;DR: An event attention network (EAN) is proposed to exploit sentimental event-embedding for stock price trend prediction and shows that this model performs significantly better in terms of short-term stock trend prediction.

Abstract: It is only natural that events related to a listed company may cause its stock price to move (either up or down), and the trend of the price movement will be very much determined by public opinion towards such events. With the help of the Internet and advanced natural language processing techniques, it becomes possible to predict the stock trend by analyzing a great amount of online textual resources like news from websites and posts on social media. In this paper, we propose an event attention network (EAN) to exploit sentimental event-embedding for stock price trend prediction. Specifically, this model combines the merits of both event-driven prediction and sentiment-driven prediction models, in addition to exploiting sentimental event-embedding. Furthermore, we employ an attention mechanism to figure out which event contributes the most to the result or, in other words, which event is the main cause of the price fluctuation. In our model, a convolutional neural network (CNN) layer is used to extract salient features from transformed event representations, which originate from a bi-directional long short-term memory (BiLSTM) layer. We conduct extensive experiments on a manually collected real-world dataset. Experimental results show that our model performs significantly better in terms of short-term stock trend prediction.

Proceedings Article•10.1145/3292522.3326036•

Characterizing Transport Perception using Social Media: Differences in Mode and Gender

[...]

Paula Vasquez-Henriquez¹, Eduardo Graells-Garrido¹, Diego Caro¹•Institutions (1)

Universidad del Desarrollo¹

26 Jun 2019

TL;DR: This work analyzed 300K tweets about transportation in Santiago, Chile, and estimated the associations between mode of transportation, gender, and the categories of a psycho-linguistic lexicon to provide evidence on which aspects of transportation are relevant in the daily experience.

Abstract: Transport planners face the growing need to understand the behavior of their users, who base their mobility decisions on several factors, including travel time, quality of service, and security. However, transportation is usually designed with an average user in mind, without considering the needs of important groups, such as women. In this context, we analyzed 300K tweets about transportation in Santiago, Chile. We classified users into modes of transportation, and then we estimated the associations between mode of transportation, gender, and the categories of a psycho-linguistic lexicon. Our results include that women express more anger and sadness than expected, and are worried about sexual harassment. Conversely, men focus more on the spatial aspects of transportation, leisure, and work. Thus, our work provides evidence on which aspects of transportation are relevant in the daily experience, enabling the measurement of the travel experience using social media.

Proceedings Article•10.1145/3292522.3326044•

An Automated Cyclic Planning Framework Based on Plan-Do-Check-Act for Web of Things Composition

[...]

Mahda Noura¹, Martin Gaedke¹•Institutions (1)

Chemnitz University of Technology¹

26 Jun 2019

TL;DR: A cyclic planning system is proposed that adopts a PDCA (Plan-Do-Check-Act) process to address the shortcomings of existing planning approaches through continuous improvement, and that enhances the ease of use for end users in the context of the goal-oriented approach GrOWTH.

Abstract: Empowering end users to be directly involved in the development and composition of the smart devices surrounding them in order to achieve their goals is a major challenge for End User Development (EUD) in the context of the Web of Things (WoT). This can be achieved through Artificial Intelligence (AI) planning. Planning is understood as the ability of a WoT system to construct a sequence of actions that, when executed by the smart devices, achieves an effect on the environment in response to an end-user-issued goal. The problem of planning specifically for the WoT domain has not been sufficiently dealt with in the existing literature. The existing planning approaches do not deal with one or more of the following important factors in the context of WoT: (1) random unexpected events, (2) unpredictable device effects leading to side effects at runtime, and (3) durative effects. In this work, we propose a cyclic planning system that adopts a PDCA (Plan-Do-Check-Act) process solution to deal with the existing shortcomings for continuous improvement. The planner employs domain knowledge based on the WoTDL (Web of Things Description Language) ontology. The cyclic planner enables continuous plan monitoring to cope with inconsistencies with user-issued goals. We demonstrate the feasibility of the proposed approach on our smart home testbed. The proposed planner further enhances the ease of use for end users in the context of our goal-oriented approach GrOWTH.

Proceedings Article•10.1145/3328413.3329402•

Institutional Repositories as a Data Trust Infrastructure

[...]

Arwah Alsaad¹, Kieron O'Hara¹, Leslie Carr¹•Institutions (1)

University of Southampton¹

26 Jun 2019

TL;DR: The institutional repository is proposed as a candidate technology for data trust infrastructure and the potential use of data trusts to solve the problem of data sharing in multi-partner research activities is examined.

Abstract: This paper examines the potential use of data trusts to solve the problem of data sharing in multi-partner research activities and proposes the institutional repository as a candidate technology for data trust infrastructure.

Proceedings Article•10.34962/JWS-77•

Network, Text, and Image Analysis of Anti-Muslim Groups on Facebook

[...]

Megan Squire¹•Institutions (1)

Elon University¹

28 Aug 2019

TL;DR: The authors used the public Facebook Graph API to create a large dataset of 700,204 members of 1,870 Facebook groups spanning 10 different far-right ideologies during the time period June 2017 - March 2018 and applied social network analysis techniques to discover which groups and ideologies shared members with anti-Muslim groups during this period.

Abstract: Islamophobic attitudes and overt acts of hostility toward Muslims in the United States are increasingly commonplace. The goal of this research is to begin to understand how anti-Muslim political groups use the Facebook social network to build their online communities and perpetuate their beliefs. We used the public Facebook Graph API to create a large dataset of 700,204 members of 1,870 Facebook groups spanning 10 different far-right ideologies during the time period June 2017 - March 2018. We first applied social network analysis techniques to discover which groups and ideologies shared members with anti-Muslim groups during this period. Our results show that the anti-Muslim Facebook network has unique characteristics when compared to other categories of far-right extremism. We then assessed 202 anti-Muslim Facebook group cover photos and descriptions for evidence of Islamophobic content. Results indicate that these anti-Muslim groups rely on a predictable collection of visual and linguistic cues to propagate negative stereotypes about Muslims, and that the vast majority of these groups rely heavily on symbols and language that portray Islam as a violent enemy which is deserving of violence and hostility in return. By understanding the important role Islamophobia plays in the hate ecosystem on Facebook, social media users and platform providers can be better prepared to confront and condemn anti-Muslim bias.

Proceedings Article•10.1145/3292522.3326037•

Investor Retention in Equity Crowdfunding

[...]

Igor Zakhlebin¹, Emőke-Ágnes Horvát¹•Institutions (1)

Northwestern University¹

26 Jun 2019

TL;DR: The role of past successes and diversity of investment decisions for novice vs. serial investors is uncovered and potential strategies for increasing the retention of investors and improving their decisions on crowdfunding platforms are considered.

Abstract: Crowdfunding platforms promise to disrupt investing as they bypass traditional financial institutions through peer-to-peer transactions. To stay functional, these platforms require a supply of investors who are willing to contribute to campaigns. Yet, little is known about the retention of investors in this setting. Using four years of data from a leading equity crowdfunding platform, we empirically study the length and success of investor activity on the platform. We analyze temporal variations in these outcomes and explain patterns using statistical modeling. Our models are based on information about users' past and current investment decisions, i.e., content-based and structural similarities between the campaigns they invest in. We uncover the role of past successes and diversity of investment decisions for novice vs. serial investors. Our results inform potential strategies for increasing the retention of investors and improving their decisions on crowdfunding platforms.

Proceedings Article•10.1145/3328413.3328416•

On Bias in Social Reviews of University Courses

[...]

Taha Hassan¹•Institutions (1)

Virginia Tech¹

26 Jun 2019

TL;DR: This paper examined the connection between course outcomes (student-reported GPA) and the overall ranking of the primary course instructor, as well as rating disparity by nature of course outcomes, for several hundred courses taught at Virginia Tech based on data collected from a popular academic rating forum.

Abstract: University course ranking forums are a popular means of disseminating information about satisfaction with the quality of course content and instruction, especially with undergraduate students. A variety of policy decisions by university administrators, instructional designers and teaching staff affect how students perceive the efficacy of pedagogies employed in a given course, in class and online. While there is a large body of research on qualitative driving factors behind the use of academic rating sites, there is little investigation of the (potential) implicit student bias on said forums towards desirable course outcomes at the institution level. To that end, we examine the connection between course outcomes (student-reported GPA) and the overall ranking of the primary course instructor, as well as rating disparity by nature of course outcomes, for several hundred courses taught at Virginia Tech based on data collected from a popular academic rating forum. We also replicate our analysis for several public universities across the US. Our experiments indicate that there is a discernible albeit complex bias towards course outcomes in the professor ratings registered by students.

Proceedings Article•10.1145/3292522.3326042•

A Reverse Turing Test for Detecting Machine-Made Texts

[...]

Jialin Shao¹, Adaku Uchendu², Dongwon Lee²•Institutions (2)

Beijing University of Technology¹, Pennsylvania State University²

26 Jun 2019

TL;DR: In this paper, the Reverse Turing Test (RTT) was used to classify man-made vs. machine-made texts in three domains of financial earning reports, research articles, and chatbot dialogues.

Abstract: As AI technologies rapidly advance, the artifacts created by machines will become prevalent. As recent Deepfake incidents illustrate, being able to differentiate man-made vs. machine-made artifacts, especially in the social media space, becomes more important. In this preliminary work, we formulate such a classification task as the Reverse Turing Test (RTT) and investigate how well man-made vs. machine-made texts can currently be classified. Studying real-life machine-made texts in three domains (financial earning reports, research articles, and chatbot dialogues), we found that the classification of man-made vs. machine-made texts can be done with an F1 score of at least 0.84. We also found some differences between man-made and machine-made texts in sentiment, readability, and textual features, which can help differentiate them.

Proceedings Article•10.1145/3292522.3326046•

Pwned: The Risk of Exposure From Data Breaches

[...]

Gaurav Sood¹, Ken Cor²•Institutions (2)

Microsoft¹, University of Alberta²

26 Jun 2019

TL;DR: The better educated, the middle-aged, women, and Whites are more likely to have had their accounts breached than the complementary groups.

Abstract: News about massive data breaches is increasingly common. But what proportion of Americans are exposed in these breaches is still unknown. We combine data from a large, representative sample of American adults (n = 5,000), recruited by YouGov, with data from Have I Been Pwned to estimate the lower bound of the number of times Americans' private information has been exposed. We find that at least 82.84% of Americans have had their private information, such as account credentials, Social Security Number, etc., exposed. On average, Americans' private information has been exposed in at least three breaches. The better educated, the middle-aged, women, and Whites are more likely to have had their accounts breached than the complementary groups.

Proceedings Article•10.1145/3328413.3329405•

Search and Justification Behavior During Multimedia Web Search for Procedural Knowledge

[...]

Georg Pardi¹, Yvonne Kammerer¹, Peter Gerjets¹•Institutions (1)

Leibniz Institute for Neurobiology¹

26 Jun 2019

TL;DR: It is indicated that the modality of information resources at least to some extent plays a role during web search for procedural learning resources.

Abstract: In an eye-tracking study, N = 38 participants performed two procedural-knowledge search tasks by using a mockup multimedia search engine results page (SERP). By presenting both conventional websites and videos as results on the SERP, we aimed at examining the role of the modality of information resources in individuals' retrieval behavior as well as in their final recommendation of the one most suitable information resource. Across both tasks, the results of this study indicate that participants who finally recommended a video resource spent a greater proportion of time inspecting video results on the SERP as well as on the video resources themselves. Furthermore, participants' written justifications for the recommended information resource revealed that in both tasks about one third of the participants mentioned the modality of the information resource to justify their recommendation decision. Our findings indicate that the modality of information resources at least to some extent plays a role during web search for procedural learning resources.
