Top 67 papers presented at Web Science in 2018

Journal Article•10.1007/S10462-016-9516-4•

Categorizing feature selection methods for multi-label classification

Rafael B. Pereira¹, Alexandre Plastino¹, Bianca Zadrozny², Luiz Henrique de Campos Merschmann³•Institutions (3)

Federal Fluminense University¹, IBM², Universidade Federal de Ouro Preto³

1 Jan 2018

TL;DR: This work provides a comprehensive survey and novel categorization of the feature selection techniques that have been created for the multi-label classification setting and concludes with concrete suggestions for future research in multi-label feature selection.

Abstract: In many important application domains such as text categorization, biomolecular analysis, scene classification and medical diagnosis, examples are naturally associated with more than one class label, giving rise to multi-label classification problems. This fact has led, in recent years, to a substantial amount of research on feature selection methods that allow the identification of relevant and informative features for multi-label classification. However, the methods proposed for this task are scattered in the literature, with no common framework to describe them and to allow an objective comparison. Here, we revisit a categorization of existing multi-label classification methods and, as our main contribution, we provide a comprehensive survey and novel categorization of the feature selection techniques that have been created for the multi-label classification setting. We conclude this work with concrete suggestions for future research in multi-label feature selection which have been derived from our categorization and analysis.

191 citations

Proceedings Article•10.1145/3201064.3201100•

Fake News vs Satire: A Dataset and Analysis

Jennifer Golbeck¹, Matthew Louis Mauriello¹, Brooke E. Auxier¹, Keval H. Bhanushali¹, Christopher Bonk¹, Mohamed Amine Bouzaghrane¹, Cody Buntain¹, Riya Chanduka¹, Paul Cheakalos¹, Jennine B. Everett¹, Waleed Falak¹, Carl Gieringer¹, Jack Graney¹, Kelly M. Hoffman¹, Lindsay Huth¹, Zhenya Ma¹, Mayanka Jha¹, Misbah Khan¹, Varsha Kori¹, Elo Lewis¹, George Mirano¹, William T. Mohn¹, Sean Mussenden¹, Tammie M. Nelson¹, Sean Mcwillie¹, Akshat Pant¹, Priya Shetye¹, Rusha Shrestha¹, Alexandra Steinheimer¹, Aditya Subramanian¹, Gina Visnansky¹•Institutions (1)

University of Maryland, College Park¹

15 May 2018

TL;DR: This work presents a dataset of fake news and satire stories that are hand coded, verifiable, and, in the case of fake news, include rebutting stories, and includes a thematic content analysis of the articles, identifying major themes that include hyperbolic support or condemnation of a figure, conspiracy theories, racist themes, and discrediting of reliable sources.

Abstract: Fake news has become a major societal issue and a technical challenge for social media companies to identify. This content is difficult to identify because the term "fake news" covers intentionally false, deceptive stories as well as factual errors, satire, and sometimes, stories that a person just does not like. Addressing the problem requires clear definitions and examples. In this work, we present a dataset of fake news and satire stories that are hand coded, verified, and, in the case of fake news, include rebutting stories. We also include a thematic content analysis of the articles, identifying major themes that include hyperbolic support or condemnation of a figure, conspiracy theories, racist themes, and discrediting of reliable sources. In addition to releasing this dataset for research use, we analyze it and show results based on language that are promising for classification purposes. Overall, our contribution of a dataset and initial analysis are designed to support future work by fake news researchers.

147 citations

Proceedings Article•10.1145/3201064.3201081•

Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination

Raphael Ottoni¹, Evandro Cunha¹, Gabriel Magno¹, Pedro Dalla Bernardina¹, Wagner Meira¹, Virgilio Almeida¹•Institutions (1)

Universidade Federal de Minas Gerais¹

15 May 2018

TL;DR: Analysis of issues related to hate, violence and discriminatory bias in a dataset containing more than 7,000 videos and 17 million comments shows that right-wing channels tend to contain a higher degree of words from "negative" semantic fields.

Abstract: As of 2018, YouTube, the major online video sharing website, hosts multiple channels promoting right-wing content. In this paper, we observe issues related to hate, violence and discriminatory bias in a dataset containing more than 7,000 videos and 17 million comments. We investigate similarities and differences between users' comments and video content in a selection of right-wing channels and compare them to a baseline set using a three-layered approach, in which we analyze (a) lexicon, (b) topics and (c) implicit biases present in the texts. Among other results, our analyses show that right-wing channels tend to (a) contain a higher degree of words from "negative" semantic fields, (b) raise more topics related to war and terrorism, and (c) demonstrate more discriminatory bias against Muslims (in videos) and towards LGBT people (in comments). Our findings shed light not only on the collective conduct of the YouTube community promoting and consuming right-wing content, but also on the general behavior of YouTube users.

87 citations

Proceedings Article•10.1145/3201064.3201082•

Understanding the Roots of Radicalisation on Twitter

Miriam Fernandez¹, Moizzah Asif¹, Harith Alani¹•Institutions (1)

Open University¹

15 May 2018

TL;DR: This paper proposes a computational approach for detecting and predicting the radicalisation influence a user is exposed to, grounded on the notion of 'roots of radicalisation' from social science models, and results show the effectiveness of the proposed algorithms.

Abstract: In an increasingly digital world, identifying signs of online extremism sits at the top of the priority list for counter-extremist agencies. Researchers and governments are investing in the creation of advanced information technologies to identify and counter extremism through intelligent large-scale analysis of online data. However, to the best of our knowledge, these technologies are neither based on, nor do they take advantage of, the existing theories and studies of radicalisation. In this paper we propose a computational approach for detecting and predicting the radicalisation influence a user is exposed to, grounded on the notion of 'roots of radicalisation' from social science models. This approach has been applied to analyse and compare the radicalisation level of 112 pro-ISIS vs. 112 "general" Twitter users. Our results show the effectiveness of our proposed algorithms in detecting and predicting radicalisation influence, obtaining up to 0.9 F-1 measure for detection and between 0.7 and 0.8 precision for prediction. While this is an initial attempt towards the effective combination of social and computational perspectives, more work is needed to bridge these disciplines, and to build on their strengths to target the problem of online radicalisation.

73 citations

Proceedings Article•10.1145/3201064.3201105•

Worth its Weight in Likes: Towards Detecting Fake Likes on Instagram

Indira Sen¹, Anupama Aggarwal¹, Shiven Mian¹, Siddharth Singh¹, Ponnurangam Kumaraguru¹, Anwitaman Datta²•Institutions (2)

Indraprastha Institute of Information Technology¹, Nanyang Technological University²

15 May 2018

TL;DR: This work builds an automated mechanism to detect fake likes on Instagram which achieves a high precision of 83.5% and serves as an important first step in reducing the effect of fake likes on the Instagram influencer market.

Abstract: Instagram is a significant platform for users to share media reflecting their interests. It is used by marketers and brands to reach their potential audience for advertisement. The number of likes on posts serves as a proxy for the social reputation of users, and in some cases, social media influencers with an extensive reach are compensated by marketers to promote products. This emerging market has led to users artificially bolstering the likes they get to project an inflated social worth. In this study, we enumerate the potential factors which contribute towards a genuine like on Instagram. Based on our analysis of liking behaviour, we build an automated mechanism to detect fake likes on Instagram which achieves a high precision of 83.5%. Our work serves as an important first step in reducing the effect of fake likes on the Instagram influencer market.

54 citations

Journal Article•10.1561/106.00000014•

On the Ubiquity of Web Tracking: Insights from a Billion-Page Web Crawl

Sebastian Schelter¹, Jérôme Kunegis²•Institutions (2)

Technical University of Berlin¹, University of Koblenz and Landau²

26 Jan 2018

TL;DR: It is confirmed that trackers are widespread, and that a small number of trackers dominates the web (Google, Facebook and Twitter), and that Google still operates services on Chinese websites, despite its proclaimed retreat from the Chinese market.

Abstract: We perform a large-scale analysis of third-party trackers on the World Wide Web. We extract third-party embeddings from more than 3.5 billion web pages of the CommonCrawl 2012 corpus, and aggregate those to a dataset containing more than 140 million third-party embeddings in over 41 million domains. To the best of our knowledge, this constitutes the largest empirical web tracking dataset collected so far, and exceeds related studies by more than an order of magnitude in the number of domains and web pages analyzed. Due to the enormous size of the dataset, we are able to perform a large-scale study of online tracking, on three levels: (1) On a global level, we give a precise figure for the extent of tracking, give insights into the structural properties of the 'online tracking sphere' and analyse which trackers (and subsequently, which companies) are used by how many websites. (2) On a country-specific level, we analyse which trackers are used by websites in different countries, and identify the countries in which websites choose significantly different trackers than in the rest of the world. (3) We answer the question whether the content of websites influences the choice of trackers they use, leveraging more than ninety thousand categorized domains. In particular, we analyse whether highly privacy-critical websites about health and addiction make different choices of trackers than other websites. Based on the performed analyses, we confirm that trackers are widespread (as expected), and that a small number of trackers dominates the web (Google, Facebook and Twitter). In particular, the three tracking domains with the highest PageRank are all owned by Google. The only exceptions to this pattern are a few countries such as China and Russia. Our results suggest that this dominance is strongly associated with country-specific political factors such as freedom of the press.
Furthermore, our data confirms that Google still operates services on Chinese websites, despite its proclaimed retreat from the Chinese market. We also confirm that websites with highly privacy-critical content are less likely to contain trackers (60% vs 90% for other websites), even though the majority of them still do contain trackers.

45 citations

Journal Article•10.20964/2018.02.58•

Glycerol and Ethanol Oxidation in Alkaline Medium Using PtCu/C Electrocatalysts

Cristiane Angélica Ottoni¹, Carlos Eduardo Domingues Ramos², R. F. B. de Souza³, S. G. da Silva², Estevam V. Spinacé², Almir Oliveira Neto²•Institutions (3)

Sao Paulo State University¹, National Nuclear Energy Commission², McGill University³

1 Feb 2018

TL;DR: Aolivei et al. as discussed by the authors proposed a method to solve the problem of energy-efficient nuclear power plant design in Brazil by using IPEN/CNEN-SP.

Abstract: 1 Bioscience Institute, São Paulo State University, 11380-972 São Vicente, SP, Brazil. 2 Instituto de Pesquisas Energéticas e Nucleares, IPEN/CNEN-SP, Av. Prof. Lineu Prestes, 2242 Cidade Universitária, CEP 05508-900 São Paulo, SP, Brazil. 3 Department of Chemistry, Federal University of Amazonas, Av. General Rodrigo Octávio, 6200, Coroado I, CEP 69080-900, Manaus, AM, Brazil. * E-mail: aolivei@ipen.br

34 citations

Proceedings Article•10.1145/3201064.3201103•

A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research

Mohammadreza Rezvan, Saeedeh Shekarpour¹, Lakshika Balasuriya, Krishnaprasad Thirunarayan, Valerie L. Shalin, Amit P. Sheth•Institutions (1)

University of Dayton¹

15 May 2018

TL;DR: The authors crawled data from Twitter using a content-tailored lexicon and annotated 25,000 tweets for five types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political.

Abstract: A quality annotated corpus is essential to research. Despite the recent focus of the Web science community on cyberbullying research, the community lacks standard benchmarks. This paper provides both a quality annotated corpus and an offensive words lexicon capturing different types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We first crawled data from Twitter using this content-tailored offensive lexicon. As mere presence of an offensive word is not a reliable indicator of harassment, human judges annotated tweets for the presence of harassment. Our corpus consists of 25,000 annotated tweets for the five types of harassment content and is available on the Git repository.

33 citations

Proceedings Article•

Lost in the Dream? Measuring the effects of Operation Bayonet on vendors migrating to Dream Market

R.S. van Wegberg, Thijmen Verburgh¹•Institutions (1)

Netherlands Organisation for Applied Scientific Research¹

1 Jan 2018

TL;DR: In this article, the authors investigate the effects of the operation on all newly registered vendors on Dream Market (n=220) during and shortly after Operation Bayonet by mapping their individual and historic characteristics to discern migration patterns and changes in vendor behavior.

Abstract: In the summer of 2017, an international policing effort - named Operation Bayonet - led by the Federal Bureau of Investigation (FBI) and the Dutch National High Tech Crime Unit (NHTCU) targeted two prominent online anonymous markets. On the one hand, the FBI succeeded in the take-down of AlphaBay; on the other hand, the NHTCU took over, operated and shut down Hansa Market. By coordinating these efforts and planning these actions sequentially, both agencies expected users active on AlphaBay to make their way to Hansa Market - which at that moment was under the complete control of, and operated by, the NHTCU. To assess the effects of Operation Bayonet, we leverage measurements of the user-base of current market leader, and then safe haven: Dream Market. We investigate the effects of the operation on all newly registered vendors on Dream Market (n=220) during and shortly after Operation Bayonet by mapping their individual and historic characteristics to discern migration patterns and changes in vendor behavior. Compared to ‘simple’ take-downs, like the AlphaBay take-down, the effects of the Hansa Market shutdown on vendors seem remarkably different. Vendors do not just simply move on after the Hansa Market shutdown. Few simply migrate, some take precautions like changing their username and/or PGP-key, but many start over with a clean slate - erasing their past reputation completely - and are truly ‘Lost in the Dream’.

28 citations

Proceedings Article•10.1145/3201064.3201085•

Focused Crawl of Web Archives to Build Event Collections

Martin Klein¹, Lyudmila Balakireva¹, Herbert Van de Sompel¹•Institutions (1)

Los Alamos National Laboratory¹

15 May 2018

TL;DR: In this paper, the authors investigate the feasibility of performing focused crawls on the archived web by utilizing the Memento infrastructure, and compare the relevance of their resources to collections built from crawling the live web as well as from a manually curated collection.

Abstract: Event collections are frequently built by crawling the live web on the basis of seed URIs nominated by human experts. Focused web crawling is a technique where the crawler is guided by reference content pertaining to the event. Given the dynamic nature of the web and the pace with which topics evolve, the timing of the crawl is a concern for both approaches. We investigate the feasibility of performing focused crawls on the archived web. By utilizing the Memento infrastructure, we obtain resources from 22 web archives that contribute to building event collections. We create collections on four events and compare the relevance of their resources to collections built from crawling the live web as well as from a manually curated collection. Our results show that focused crawling on the archived web can be done and indeed results in highly relevant collections, especially for events that happened further in the past.

22 citations

Journal Article•10.1590/S1679-45082018AO3856•

Bronchial hygiene techniques in patients on mechanical ventilation: what are used and why?

Isabela Naiara Evangelista Matilde¹, Raquel Afonso Caserta Eid¹, Andréia Ferreira Nunes¹, Alexandre Ricardo Pepe Ambrozin², Renata Henn Moura¹, Denise Carnieli-Cazati¹, Karina T. Timenetsky¹•Institutions (2)

Albert Einstein Hospital¹, Sao Paulo State University²

1 Jan 2018

TL;DR: Physical therapy is mostly based on individual experience acquired in the clinical practice, and not on the scientific literature, according to a prospective multicenter study using a questionnaire.

Abstract: Objective: To analyze and describe the maneuvers most commonly used in clinical practice by physical therapists and the reasons for choosing them. Methods: A prospective multicenter study using a questionnaire. The sample consisted of physical therapists from five hospitals (three private hospitals, a teaching hospital and a public hospital). Results: A total of 185 questionnaires were filled in. Most professionals had graduated 6 to 10 years before and had over 10 years of intensive care unit experience. The most often used maneuvers were vibrocompression, hyperinflation, postural drainage, tracheal suction and motor mobilization. The most frequent reason for choosing these maneuvers was "I notice they are more efficient in clinical practice." Conclusion: Physical therapy is mostly based on individual experience acquired in the clinical practice, and not on the scientific literature.

Journal Article•10.3390/S18072324•

Behavior of an Inductive Loop Sensor in the Measurement of Partial Discharge Pulses with Variations in Its Separation from the Primary Conductor

Jorge Alfredo Ardila-Rey¹, Aldo Barrueto¹, Alvaro Zerene¹, Bruno Albuquerque de Castro², José Alfredo Covolan Ulson², Abdullahi Abubakar Mas'ud³, Patricio Valdivia¹•Institutions (3)

Federico Santa María Technical University¹, Sao Paulo State University², Jubail Industrial College³

18 Jul 2018

TL;DR: The inductive loop sensor has experimentally been demonstrated to be capable of properly measuring different types of partial discharges, but because of its current design, there are several practical limitations on its use in real devices or environments.

Abstract: Ideally, an insulation system must be capable of electrically insulating the active components of a machine or device subjected to high voltages. However, due to the presence of polluting agents or imperfections inside or on the surface of the insulation, small current pulses called partial discharges (PDs) are common, which partially short-circuit the insulation and cause it to lose its insulating properties, and thus its insulation capacity, over time. In some cases, measurements of this phenomenon are limited by the type of sensor used; if it is not adequate, it can distort the obtained results, which can lead to a misdiagnosis of the state of the device. The inductive loop sensor has experimentally been demonstrated to be capable of properly measuring different types of PDs. However, because of its current design, there are several practical limitations on its use in real devices or environments. An example is the presence of a primary conductor located at a fixed distance from the sensor, through which PD pulses must flow for the sensor to capture them. In this article, the sensor's behavior is studied at different separation distances from the line through which the PD pulses flow. In addition, the measuring capacity of the sensor is tested by removing the presence of the primary conductor and placing the sensor directly over the line through which the PD pulses of a real device flow.

Proceedings Article•10.1145/3201064.3201076•

Viewpoint Discovery and Understanding in Social Networks

Mainul Quraishi¹, Pavlos Fafalios¹, Eelco Herder²•Institutions (2)

Leibniz University of Hanover¹, Radboud University Nijmegen²

15 May 2018

TL;DR: In this article, a graph partitioning method is proposed to discover different communities discussing a controversial topic in a social network like Twitter, together with a method for detecting descriptive terms that characterize the different viewpoints and for understanding how a specific term is related to a viewpoint.

Abstract: The Web has evolved into a dominant platform where everyone has the opportunity to express their opinions, to interact with other users, and to debate emerging events happening around the world. On the one hand, this has enabled the presence of different viewpoints and opinions about a - usually controversial - topic (like Brexit), but at the same time, it has led to phenomena like media bias, echo chambers and filter bubbles, where users are exposed to only one point of view on the same topic. Therefore, there is a need for methods that are able to detect and explain the different viewpoints. In this paper, we propose a graph partitioning method that exploits social interactions to enable the discovery of different communities (representing different viewpoints) discussing a controversial topic in a social network like Twitter. To explain the discovered viewpoints, we describe a method, called Iterative Rank Difference (IRD), which allows detecting descriptive terms that characterize the different viewpoints as well as understanding how a specific term is related to a viewpoint (by detecting other related descriptive terms). The results of an experimental evaluation showed that our approach outperforms state-of-the-art methods on viewpoint discovery, while a qualitative analysis of the proposed IRD method on three different controversial topics showed that IRD provides comprehensive and deep representations of the different viewpoints.

Journal Article•10.1108/JOCM-05-2017-0159•

“In sickness and in health, in poverty and in wealth?”: economic crises and CSR change management in difficult times

Bruno Michel Roman Pais Seles¹, Ana Beatriz Lopes de Sousa Jabbour, Charbel José Chiappetta Jabbour, Daniel Jugend¹•Institutions (1)

Sao Paulo State University¹

12 Feb 2018

TL;DR: In this paper, an integrative literature review was conducted, considering: the economic and geographical context in which the research was conducted; the focus of each piece of research; the adopted research methods; organisational theories of analytical support; the sectors analysed; and the effects of economic crises on CSR initiatives and environmental management.

Abstract: “Economic crises” and “corporate social responsibility (CSR) initiatives” are two issues that dominate the modern business agenda. Although related, these issues have been analysed separately, and so a significant gap is perpetuated between the two. What are the effects of economic crises on CSR initiatives? Can organisational social initiatives withstand economic crises? The purpose of this paper is to answer these questions. An integrative literature review was conducted, considering: the economic and geographical context in which the research was conducted; the focus of each piece of research; the adopted research methods; organisational theories of analytical support; the sectors analysed; and the effects of economic crises on CSR initiatives and environmental management. Some of the findings were as follows: most of the studies analysed reported that CSR helps companies to cope with economic crises by increasing the efficiency of investments and establishing better relations with stakeholders and markets; environmental practices are related to negative environmental performance in periods of economic crises; and CSR relates positively to financial performance in periods of economic crises. This is one of the first integrative literature reviews to investigate what happens to the relationship between businesses and sustainable change management in periods of crises. This paper also offers a future research agenda for the issue, with 12 questions still unanswered by the latest research.

Proceedings Article•10.1145/3201064.3201071•

Predicting Email and Article Clickthroughs with Domain-adaptive Language Models

Kokil Jaidka¹, Tanya Goyal², Niyati Chhaya³•Institutions (3)

University of Pennsylvania¹, University of Texas at Austin², Adobe Systems³

15 May 2018

TL;DR: Differences in recipients' preferences for subject lines of marketing emails from different industries, in terms of their clickthrough rates on marketing emails sent by different businesses in Finance, Cosmetics and Television industries, are explored.

Abstract: Marketing practices have adopted the use of computational approaches in order to optimize the performance of their promotional emails and site advertisements. In the case of promotional emails, subject lines have been found to offer a reliable signal of whether the recipient will open an email or not. Clickbait headlines are also known to drive reader engagement. In this study, we explore the differences in recipients' preferences for subject lines of marketing emails from different industries, in terms of their clickthrough rates on marketing emails sent by different businesses in Finance, Cosmetics and Television industries. Different stylistic strategies of subject lines characterize high clickthroughs in different commercial verticals. For instance, words providing insight and signaling cognitive processing lead to more clickthroughs for the Finance industry; on the other hand, social words yield more clickthroughs for the Movies and Television industry. Domain adaptation can further improve predictive performance for unseen businesses by an average of 16.52% over generic industry-specific predictive models. We conclude with a discussion on the implications of our findings and suggestions for future work.

Proceedings Article•10.1145/3201064.3201072•

Web Access Literacy Scale to Evaluate How Critically Users Can Browse and Search for Web Information

Yusuke Yamamoto¹, Takehiro Yamamoto², Hiroaki Ohshima³, Hiroshi Kawakami²•Institutions (3)

Shizuoka University¹, Kyoto University², University of Hyogo³

15 May 2018

TL;DR: An online study with participants recruited through a crowdsourcing service confirmed that the proposed web access literacy scale is reliable and valid and is expected to contribute to the design of information access systems or educational classes to encourage users to reflect on and improve their web access literacy relative to critical information seeking.

Abstract: We propose a web access literacy scale to assess user ability to scrutinize web information and gather accurate information using information access systems, such as web search engines. We conducted an online study with participants recruited through a crowdsourcing service. Analysis of the questionnaire responses confirmed that the proposed web access literacy scale is reliable and valid. We also noted the following pointers: (1) Web users may not pay significant attention to web page authors and their expertise when judging information credibility. (2) Users may have weaknesses relative to the use of web search engines and tolerance for cognitive bias that appears in credibility assessment of web information. The results of this study are expected to contribute to the design of information access systems or educational classes to encourage users to reflect on and improve their web access literacy relative to critical information seeking.

Proceedings Article•10.1145/3201064.3201086•

Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

Adina Nerghes, Peter Kerkhof¹, Iina Hellsten²•Institutions (2)

VU University Amsterdam¹, University of Amsterdam²

15 May 2018

TL;DR: The results show that 12 out of the 35 videos in the data set focused on conspiracy theories, but no statistical differences were found in the level of user activity and sentiment between the two types of videos.

Abstract: In this paper, we analyze the content of the most popular videos posted on YouTube in the first phase of the Zika-virus outbreak in 2016, and the user responses to those videos. More specifically, we examine the extent to which informational and conspiracy theory videos differ in terms of user activity (number of comments, shares, likes and dislikes), and the sentiment and content of the user responses. Our results show that 12 out of the 35 videos in our data set focused on conspiracy theories, but no statistical differences were found in the level of user activity and sentiment between the two types of videos. The content of the user responses shows that users respond differently to sub-topics related to Zika-virus. The implications of the results for future online health promotion campaigns are discussed.

Proceedings Article•10.1145/3201064.3201094•

Social Gamification in Enterprise Crowdsourcing

Gregory Afentoulidis¹, Zoltán Szlávik², Jie Yang³, Alessandro Bozzon¹•Institutions (3)

Delft University of Technology¹, IBM², University of Fribourg³

15 May 2018

TL;DR: This work highlights the importance of the competitive and collaborative social dynamics within the enterprise and underlines the contradictory nature of those dynamics, which combined might lead to detrimental effects on engagement in crowdsourcing activities.

Abstract: Enterprise crowdsourcing capitalises on the availability of employees for in-house data processing. Gamification techniques can help align employees' motivation with the crowdsourcing endeavour. Although research efforts have so far been able to unravel the wide arsenal of gamification techniques to construct engagement loops, little research has shed light on the social game dynamics that those foster and how those impact crowdsourcing activities. This work reports on a study that involved 101 employees from two multinational enterprises. We adopt a user-centric approach to apply and experiment with gamification for enterprise crowdsourcing purposes. Through a qualitative study, we highlight the importance of the competitive and collaborative social dynamics within the enterprise. By engaging the employees with a mobile crowdsourcing application, we showcase the effectiveness of competitiveness towards higher levels of engagement and quality of contributions. Moreover, we underline the contradictory nature of those dynamics, which combined might lead to detrimental effects on engagement in crowdsourcing activities.

Proceedings Article•10.1145/3201064.3201066•

Guidelines for Online Network Crawling: A Study of Data Collection Approaches and Network Properties

Katchaguy Areekijseree¹, Ricky Laishram¹, Sucheta Soundarajan¹•Institutions (1)

Syracuse University¹

15 May 2018

TL;DR: This paper performs a detailed, hypothesis-driven analysis of several online crawling algorithms, ranging from classical crawling methods to modern, state-of-the-art algorithms, with respect to the task of collecting as much data as possible given a fixed query budget, and shows that the performance of these methods depends strongly on the network structure.

Abstract: Over the past two decades, online social networks have attracted a great deal of attention from researchers. However, before one can gain insight into the properties or structure of a network, one must first collect appropriate data. Data collection poses several challenges, such as API or bandwidth limits, which require the data collector to carefully consider which queries to make. Many online network crawling methods have been proposed, but it is not always clear which method should be used for a given network. In this paper, we perform a detailed, hypothesis-driven analysis of several online crawling algorithms, ranging from classical crawling methods to modern, state-of-the-art algorithms, with respect to the task of collecting as much data (nodes or edges) as possible given a fixed query budget. We show that the performance of these methods depends strongly on the network structure. We identify three relevant network characteristics: community separation, average community size, and average node degree. We present experiments on both real and synthetic networks, and provide guidelines to researchers regarding selection of an appropriate sampling method.
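
The fixed-query-budget setting described in this abstract can be illustrated with a minimal sketch of one classical crawler, breadth-first search. Everything here is an assumption made for illustration: the function name `bfs_crawl`, the toy `HIDDEN_GRAPH`, and the rule that one query reveals all neighbors of one node. It is not one of the algorithms evaluated in the paper.

```python
from collections import deque


def bfs_crawl(query_neighbors, start, budget):
    """Crawl breadth-first, spending one query per expanded node.

    query_neighbors: callable returning the neighbors of a node
                     (stands in for a rate-limited API call).
    budget: maximum number of queries the crawler may issue.
    Returns the set of observed nodes and the number of queries used.
    """
    observed = {start}
    frontier = deque([start])
    queries_used = 0
    while frontier and queries_used < budget:
        node = frontier.popleft()
        queries_used += 1
        for neighbor in query_neighbors(node):
            if neighbor not in observed:
                observed.add(neighbor)
                frontier.append(neighbor)
    return observed, queries_used


# Hypothetical hidden network that the crawler can only see via queries.
HIDDEN_GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

observed, used = bfs_crawl(HIDDEN_GRAPH.__getitem__, "a", budget=2)
```

Comparing the size of `observed` across crawlers for the same `budget` is one simple way to express the "as much data as possible" objective; a real comparison would also count edges and vary the network structure.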

Proceedings Article•10.1145/3201064.3201090•

The Shape of Arab Feminism on Facebook

Nada Al Bunni¹, David E. Millard¹, Jeff Vass¹•Institutions (1)

University of Southampton¹

15 May 2018

TL;DR: By crawling Arabic feminist pages on Facebook, this paper builds a dataset that can be analysed using social network analysis tools and reveals the map of influence between the Arabic feminist network and the Western, transnational, and global feminist networks.

Abstract: Much has been said about the influence of Western culture on social movements worldwide, and this claimed influence has caused some to accuse Arabic feminism of being merely an alien import to the Arab world. New waves of feminism have arisen as a reaction to the claimed prevalent Western culture. Global Feminism argues that women worldwide experience similar subjugation in many social constructs because many cultures are based on a patriarchal past, but other waves reject the concept of a universal women's experience, stress the significance of diversity in women's experiences, and see their activities as transnational rather than global. Others expect that the confrontation of secular and Islamist paradigms will dominate. Social media has global reach, and there are signs that Facebook pages are used by feminists worldwide to boost their social and political activism. Facebook gives public pages' owners the ability to associate their pages with pages with similar ideologies. This provides a global space where feminist pages are clustered and exposes clues about their patterns of influence. By crawling Arabic feminist pages on Facebook, this paper builds a dataset that can be analysed using social network analysis tools and reveals the map of influence between the Arabic feminist network and the Western, transnational, and global feminist networks. The map shows that Arabic women's pages are clustered in two segments: Arab feminism and Sect feminism. The latter consists of pages which distance themselves from associating with secular feminism pages, whether they are Arabic or not, and in contrast to the former, they are less likely to restrict themselves to national identity.

Proceedings Article•10.1145/3201064.3201084•

Decay of Relevance in Exponentially Growing Networks

Jun Sun¹, Steffen Staab¹, Fariba Karimi²•Institutions (2)

University of Koblenz and Landau¹, Leibniz Institute for Neurobiology²

15 May 2018

TL;DR: In this article, a new preferential attachment-based network growth model was proposed to explain two properties of growing networks: (1) the power-law growth of node degrees and (2) the decay of node relevance.

Abstract: We propose a new preferential attachment-based network growth model in order to explain two properties of growing networks: (1) the power-law growth of node degrees and (2) the decay of node relevance. In preferential attachment models, the ability of a node to acquire links is affected by its degree, its fitness, as well as its relevance which typically decays over time. After a review of existing models, we argue that they cannot explain the above-mentioned two properties (1) and (2) at the same time. We have found that apart from being empirically observed in many systems, the exponential growth of the network size over time is the key to sustain the power-law growth of node degrees when node relevance decays. We therefore make a clear distinction between the event time and the physical time in our model, and show that under the assumption that the relevance of a node decays with its age τ, there exists an analytical solution of the decay function f_R with the form f_R(τ) = τ^(−1). Other properties of real networks such as power-law alike degree distributions can still be preserved, as supported by our experiments. This makes our model useful in explaining and analysing many real systems such as citation networks.

Journal Article•10.1002/JEMT.22970•

ATR-FTIR spectroscopy and μ-EDXRF spectrometry monitoring of enamel erosion caused by medicaments used in the treatment of respiratory diseases

Raimundo Nonato Silva Gomes¹, Tanmoy Bhattacharjee¹, Luis Felipe das Chagas e Silva de Carvalho¹, Luís Eduardo Silva Soares¹•Institutions (1)

University of Paraíba Valley¹

1 Feb 2018

TL;DR: ATR‐FTIR may be used to rapidly and nondestructively investigate erosive effects of medicaments and shows a detectable shift in the phosphate (PO4) antisymmetric stretching mode (ν3) at ∼985 cm−1 for AM, BR, and SS, indicating erosion.

Abstract: Medicaments essential for alleviation of diseases may sometimes adversely affect dental health by eroding the enamel, owing to their acidic nature. It is therefore highly desirable to be able to detect these effects quickly and reliably. In this study, we evaluated the erosive capacity of the four most commonly prescribed respiratory disease syrup medicaments on enamel using micro-energy-dispersive X-ray fluorescence spectrometry (µ-EDXRF) and attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR). Fifty-five enamel fragments obtained from 30 bovine teeth were treated with artificial saliva (S), acebrofilin hydrochloride (AC), ambroxol hydrochloride (AM), bromhexine hydrochloride (BR), and salbutamol sulfate (SS) by immersing them in 3 mL of the respective solutions for 1 min, three times a day at intervals of 1 hr, for 5 days. µ-EDXRF analysis of the enamel surface did not reveal significant erosion caused by the medications. However, ATR-FTIR showed a detectable shift in the phosphate (PO4) antisymmetric stretching mode (ν3) at ∼985 cm−1 for AM, BR, and SS, indicating erosion. Multivariate statistical analysis showed that AC, AM, SS, and BR could be classified with 70%, 80%, 100%, and 100% efficiency from S (control), further highlighting the ability of ATR-FTIR to identify degree of erosion. This suggests ATR-FTIR may be used to rapidly and nondestructively investigate erosive effects of medicaments.

Proceedings Article•10.1145/3201064.3201083•

DistrustRank: Spotting False News Domains

Vinicius Woloszyn¹, Wolfgang Nejdl•Institutions (1)

Universidade Federal do Rio Grande do Sul¹

15 May 2018

TL;DR: A semi-supervised learning strategy to automatically separate fake News from reliable News sources: DistrustRank, which outperforms the supervised approaches in both ranking and classification tasks.

Abstract: In this paper we propose a semi-supervised learning strategy to automatically separate fake News from reliable News sources: DistrustRank. We first select a small set of unreliable News, manually evaluated and classified by experts on fact checking portals. Once this set is created, DistrustRank constructs a weighted graph where nodes represent websites, connected by edges based on a minimum similarity between a pair of websites. Next it computes the centrality using a biased PageRank, where a bias is applied to the selected set of seeds. As an output of the proposed model we obtain a trust (or distrust) rank that can be used in two ways: a) as a counter-bias to be applied when News about a specific subject is ranked, in order to discount possible boosts achieved by false claims; and b) to assist humans in identifying sources that are likely to be sources of fake News (or that are likely to be reputable), suggesting websites that should be examined more closely or avoided. In our experiments, DistrustRank outperforms the supervised approaches in both ranking and classification tasks.
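
The pipeline in this abstract (a thresholded similarity graph, then a PageRank whose teleportation is biased toward seed sites) can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the function names, the damping factor of 0.85, the threshold of 0.4, the site names, and the similarity values are all assumptions invented for the example.

```python
def build_similarity_graph(similarity, min_similarity):
    """Connect two sites only if their similarity reaches the threshold."""
    graph = {}
    for (a, b), score in similarity.items():
        graph.setdefault(a, {})
        graph.setdefault(b, {})
        if score >= min_similarity:
            graph[a][b] = score
            graph[b][a] = score
    return graph


def biased_pagerank(weights, seeds, damping=0.85, iterations=100):
    """Power iteration for PageRank with teleportation restricted to seeds.

    weights: dict mapping node -> {neighbor: edge weight}.
    seeds: set of nodes that receive all of the teleportation mass.
    """
    nodes = sorted(weights)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            out_total = sum(weights[src].values())
            if out_total == 0:
                # Isolated node: return its mass to the seed set.
                for n in nodes:
                    new_rank[n] += damping * rank[src] * teleport[n]
                continue
            for dst, weight in weights[src].items():
                new_rank[dst] += damping * rank[src] * weight / out_total
        rank = new_rank
    return rank


# Hypothetical pairwise similarities between four made-up sites.
SIMILARITY = {
    ("fake-a.example", "fake-b.example"): 0.9,
    ("fake-b.example", "mixed.example"): 0.5,
    ("mixed.example", "reliable.example"): 0.6,
    ("fake-a.example", "reliable.example"): 0.1,
}

graph = build_similarity_graph(SIMILARITY, min_similarity=0.4)
distrust = biased_pagerank(graph, seeds={"fake-a.example"})
ranking = sorted(distrust, key=distrust.get, reverse=True)
```

In this toy graph, sites that are more steps away from the seed along similarity edges end up with less distrust mass, which is the intuition behind propagating labels from a small expert-labelled seed set.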

Journal Article•10.1590/S0034-89102010005000006•

Brazilian abortion law: the opinion of judges and prosecutors

Graciana Alves Duarte, Mjd Osis, Anibal Faúndes, MH de Sousa

29 Mar 2018

TL;DR: A cross-sectional study was performed with 1,493 judges and 2,614 prosecutors in Brazil between 2005 and 2006; participants completed a structured questionnaire covering sociodemographic characteristics, opinions about abortion law, and circumstances in which abortion is considered lawful.

Abstract: OBJECTIVE: To analyze the opinion of judges and prosecutors concerning Brazilian abortion law and situations in which abortion should be allowed. METHODS: A cross-sectional study was performed with 1,493 judges and 2,614 prosecutors in Brazil between 2005 and 2006. Participants completed a structured questionnaire covering sociodemographic characteristics, opinions about abortion law, and circumstances in which abortion is considered lawful. Bivariate and multivariate analyses of data were carried out through Poisson regression. RESULTS: The majority of participants (78%) found that the circumstances in which abortion is considered lawful should be broadened, or even that abortion should not be criminalized. The highest rates of pro-abortion opinions resulted from: risk to the life of the mother (84%), anencephaly (83%), severe congenital malformation of fetus (82%), and pregnancy resulting from rape (82%). Variables related to religion were strongly associated with the opinion of participants. CONCLUSIONS: There is a trend toward considering the need to change the current abortion law, in the sense of widening the circumstances in which abortion is considered lawful, or even toward decriminalizing abortion, regardless of the circumstances in which it takes place.

Proceedings Article•10.1145/3201064.3201077•

Domain-Independent Detection of Emergency Situations Based on Social Activity Related to Geolocations

Hernan Sarmiento¹, Barbara Poblete¹, Jaime Campos¹•Institutions (1)

University of Chile¹

15 May 2018

TL;DR: Using an off-the-shelf classifier that is independent of domain-specific features, this work studies and describes emergency situations based solely on location-based features in messages, indicating that anomalies in location-related social media user activity indeed provide information for automatically detecting emergency situations independent of their domain.

Abstract: In general, existing methods for automatically detecting emergency situations using Twitter rely on features based on domain-specific keywords found in messages. These keyword-based methods usually require training on domain-specific labeled data, using multiple languages, and for different types of events (e.g., earthquakes, floods, wildfires, etc.). In addition to being costly, these approaches may fail to detect previously unexpected situations, such as uncommon catastrophes or terrorist attacks. However, collective mentions of certain keywords are not the only type of self-organizing phenomena that may arise in social media when a real-world extreme situation occurs. Just as nearby physical sensors become activated when stimulated, localized citizen sensors (i.e., users) will also react in a similar manner. To leverage this information, we propose to use self-organized activity related to geolocations to identify emergency situations. We propose to detect such events by tracking the frequencies and probability distributions of the interarrival time of the messages related to specific locations. Using an off-the-shelf classifier that is independent of domain-specific features, we study and describe emergency situations based solely on location-based features in messages. Our findings indicate that anomalies in location-related social media user activity indeed provide information for automatically detecting emergency situations independent of their domain.
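
The interarrival-time signal mentioned in this abstract can be illustrated with a small sketch. It computes the gaps between consecutive messages that mention one location and flags a window whose mean gap collapses relative to a baseline. The ratio rule, the function names, and the timestamps are assumptions made for illustration; the paper itself feeds location-based features to an off-the-shelf classifier, which is not reproduced here.

```python
from statistics import mean


def interarrival_times(timestamps):
    """Return the gaps (in seconds) between consecutive message timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_bursty(baseline_timestamps, recent_timestamps, ratio=0.5):
    """Flag a window whose mean gap is below `ratio` times the baseline mean gap.

    `ratio` is a hypothetical threshold chosen only for this example.
    """
    baseline = interarrival_times(baseline_timestamps)
    recent = interarrival_times(recent_timestamps)
    if not baseline or not recent:
        return False
    return mean(recent) < ratio * mean(baseline)


# Made-up timestamps (seconds) of messages mentioning a single location.
baseline_window = [0, 60, 125, 180, 240, 300]
burst_window = [300, 305, 309, 316, 320]
quiet_window = [300, 355, 420, 480]

burst_flag = is_bursty(baseline_window, burst_window)
quiet_flag = is_bursty(baseline_window, quiet_window)
```

Because the rule only looks at message timing per location, it needs no keywords or language-specific resources, which mirrors the domain-independence argument in the abstract.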

Proceedings Article•10.1145/3201064.3201070•

Not Every Remix is an Innovation: A Network Perspective on the 3D-Printing Community

Christian Voigt¹•Institutions (1)

Centre for Social Innovation¹

15 May 2018

TL;DR: It is shown that the remixing of existing 3D models is substantially influenced by bots, customizers and self-referential designs, and it is concluded that remixing patterns cannot be taken as direct indicators of innovative behavior on sharing platforms.

Abstract: A better understanding of how information in networks is reused or mixed has the potential to significantly contribute to the way value is exchanged under a market- or commons-based paradigm. Data as collaborative commons, distributed under creative commons licenses, can generate novel business models and significantly spur the continuing development of the knowledge society. However, looking at data reuse in a large 3D-printing community, we show that the remixing of existing 3D models is substantially influenced by bots, customizers and self-referential designs. Linking these phenomena to a more fine-grained understanding of the process and product dimensions of innovations, we conclude that remixing patterns cannot be taken as direct indicators of innovative behavior on sharing platforms. A further exploration of remixing networks in terms of their topological characteristics is suggested as a way forward. For the empirical underpinning of our arguments, we analyzed 893,383 three-dimensional designs shared by 193,254 members.

Proceedings Article•10.1145/3201064.3201092•

Query for Architecture, Click through Military: Comparing the Roles of Search and Navigation on Wikipedia

Dimitar Dimitrov¹, Florian Lemmerich², Fabian Flöck³, Markus Strohmaier²•Institutions (3)

University of Koblenz and Landau¹, RWTH Aachen University², Leibniz Association³

15 May 2018

TL;DR: This paper studies large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by following hyperlinks.

Abstract: As one of the richest sources of encyclopedic information on the Web, Wikipedia generates an enormous amount of traffic. In this paper, we study large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by following hyperlinks. To this end, we propose and employ two main metrics, namely (i) searchshare -- the relative amount of views an article received by search --, and (ii) resistance -- the ability of an article to relay traffic to other Wikipedia articles -- to characterize articles. We demonstrate how articles in distinct topical categories differ substantially in terms of these properties. For example, architecture-related articles are often accessed through search and are simultaneously a "dead end" for traffic, whereas historical articles about military events are mainly navigated. We further link traffic differences to varying network, content, and editing activity features. Lastly, we measure the impact of the article properties by modeling access behavior on articles with a gradient boosting approach. The results of this paper constitute a step towards understanding human information seeking behavior on the Web.

Journal Article•10.1007/S13278-017-0483-9•

Reconstructing news spread networks and studying its dynamics

Elisa Mussumeci¹, Flávio Codeço Coelho¹•Institutions (1)

Fundação Getúlio Vargas¹

17 Jan 2018

TL;DR: From the reconstructed network, it is shown that the dynamics of the news spread can be approximated by a classical SIR epidemiological dynamics upon the network.

Abstract: News spread in internet media outlets can be seen as a contagious process forming temporal networks representing the influence between published articles. In this article we propose a methodology based on the application of natural language analysis of the articles to reconstruct the news spread network. From the reconstructed network, we show that the dynamics of the news spread can be approximated by a classical SIR epidemiological dynamics upon the network.
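
For readers unfamiliar with SIR dynamics on a network, the following is a minimal discrete-time sketch. Each node is susceptible (S), infected (I), or recovered (R); infected nodes pass the state to susceptible neighbors with probability `beta` per step and recover with probability `gamma` per step. The toy graph, the parameter values, and the function name are assumptions for illustration only, and the sketch does not reproduce the authors' reconstructed news network or fitting procedure.

```python
import random


def simulate_sir(adjacency, initially_infected, beta, gamma, rng, max_steps=1000):
    """Run a stochastic SIR process and return per-step S/I/R counts."""
    state = {node: "S" for node in adjacency}
    for node in initially_infected:
        state[node] = "I"
    history = []
    for _ in range(max_steps):
        counts = {label: sum(1 for s in state.values() if s == label)
                  for label in "SIR"}
        history.append(counts)
        if counts["I"] == 0:
            break  # No infected nodes left, so the process has ended.
        next_state = dict(state)
        for node, node_state in state.items():
            if node_state != "I":
                continue
            # Try to infect each susceptible neighbor.
            for neighbor in adjacency[node]:
                if state[neighbor] == "S" and rng.random() < beta:
                    next_state[neighbor] = "I"
            # The infected node may recover at the end of the step.
            if rng.random() < gamma:
                next_state[node] = "R"
        state = next_state
    return history


# Hypothetical influence graph between six articles.
ARTICLES = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3, 5],
    5: [4],
}

history = simulate_sir(ARTICLES, initially_infected=[0], beta=0.4, gamma=0.3,
                       rng=random.Random(42))
```

Plotting the `I` counts in `history` over time gives the familiar rise-and-fall epidemic curve that such a model would be compared against.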

Proceedings Article•10.1145/3201064.3201104•

Public Opinion Spamming: A Model for Content and Users on Sina Weibo

Ziyu Guo¹, Liqiang Wang¹, Yafang Wang¹, Guohua Zeng², Shijun Liu¹, Gerard de Melo³ - Show less +2 more•Institutions (3)

Shandong University¹, Chinese Academy of Social Sciences², Rutgers University³

15 May 2018

TL;DR: This work proposes a model that is unsupervised and adopts a Bayesian framework to distinguish spammers from other classes of users and demonstrates the effectiveness of the proposed approach in detecting public opinion spammers.

Abstract: Microblogs serve hundreds of millions of active users, but have also attracted large numbers of spammers. While traditional spam often seeks to endorse specific products or services, nowadays there are increasingly also paid posters intent on promoting particular views on hot topics and influencing public opinion. In this work, we fill an important research gap by studying how to detect such opinion spammers and their micro-manipulation of public opinion. Our model is unsupervised and adopts a Bayesian framework to distinguish spammers from other classes of users. Experiments on a Sina Weibo hot topic dataset demonstrate the effectiveness of the proposed approach. A further diachronic analysis of the collected data demonstrates that public opinion spammers have developed sophisticated techniques and have seen success in subtly manipulating the public sentiment.

Proceedings Article•10.1145/3201064.3201065•

Under the Shadow of Sunshine: Characterizing Spam Campaigns Abusing Phone Numbers Across Online Social Networks

Srishti Gupta¹, Dhruv Kuchhal², Payas Gupta, Mustaque Ahamad³, Manish Gupta⁴, Ponnurangam Kumaraguru¹ - Show less +2 more•Institutions (4)

Indraprastha Institute of Information Technology¹, Maharaja Agrasen Institute of Technology², Georgia Institute of Technology³, Microsoft⁴

15 May 2018

TL;DR: In this paper, the authors present the first data-driven characterization of cross-platform campaigns that use multiple OSN platforms to reach their victims and use phone numbers for monetization, collecting ~23M posts containing ~1.8M unique phone numbers from Twitter, Facebook, GooglePlus, YouTube, and Flickr over a period of six months.

Abstract: Cybercriminals abuse Online Social Networks (OSNs) to lure victims into a variety of spam. Among different spam types, a less explored area is OSN abuse that leverages the telephony channel to defraud users. Phone numbers are advertised via OSNs, and users are tricked into calling these numbers. To expand the reach of such scam / spam campaigns, phone numbers are advertised across multiple platforms like Facebook, Twitter, GooglePlus, Flickr, and YouTube. In this paper, we present the first data-driven characterization of cross-platform campaigns that use multiple OSN platforms to reach their victims and use phone numbers for monetization. We collect ~23M posts containing ~1.8M unique phone numbers from Twitter, Facebook, GooglePlus, YouTube, and Flickr over a period of six months. Clustering these posts helps us identify 202 campaigns operating across the globe, with Indonesia, United States, India, and United Arab Emirates being the most prominent originators. We find that even though Indonesian campaigns generate the highest volume (~3.2M posts), only 1.6% of the accounts propagating Indonesian campaigns have been suspended so far. By examining campaigns running across multiple OSNs, we discover that Twitter detects and suspends ~93% more accounts than Facebook. Therefore, sharing intelligence about abuse-related user accounts across OSNs can aid in spam detection. According to our dataset, around 35K victims and $8.8M could have been saved if intelligence was shared across the OSNs. By analyzing phone number based spam campaigns running on OSNs, we highlight the unexplored variety of phone-based attacks surfacing on OSNs.
