TL;DR: This work provides a comprehensive survey and novel categorization of the feature selection techniques that have been created for the multi-label classification setting and concludes with concrete suggestions for future research in multi-label feature selection.
Abstract: In many important application domains such as text categorization, biomolecular analysis, scene classification and medical diagnosis, examples are naturally associated with more than one class label, giving rise to multi-label classification problems. This fact has led, in recent years, to a substantial amount of research on feature selection methods that allow the identification of relevant and informative features for multi-label classification. However, the methods proposed for this task are scattered in the literature, with no common framework to describe them and to allow an objective comparison. Here, we revisit a categorization of existing multi-label classification methods and, as our main contribution, we provide a comprehensive survey and novel categorization of the feature selection techniques that have been created for the multi-label classification setting. We conclude this work with concrete suggestions for future research in multi-label feature selection which have been derived from our categorization and analysis.
TL;DR: This work presents a dataset of fake news and satire stories that are hand coded, verifiable, and, in the case of fake news, include rebutting stories, and includes a thematic content analysis of the articles, identifying major themes that include hyperbolic support or condemnation of a figure, conspiracy theories, racist themes, and discrediting of reliable sources.
Abstract: Fake news has become a major societal issue and a technical challenge for social media companies to identify. This content is difficult to identify because the term "fake news" covers intentionally false, deceptive stories as well as factual errors, satire, and sometimes, stories that a person just does not like. Addressing the problem requires clear definitions and examples. In this work, we present a dataset of fake news and satire stories that are hand coded, verified, and, in the case of fake news, include rebutting stories. We also include a thematic content analysis of the articles, identifying major themes that include hyperbolic support or condemnation of a figure, conspiracy theories, racist themes, and discrediting of reliable sources. In addition to releasing this dataset for research use, we analyze it and show results based on language that are promising for classification purposes. Overall, our contribution of a dataset and initial analysis are designed to support future work by fake news researchers.
TL;DR: Analysis of issues related to hate, violence and discriminatory bias in a dataset containing more than 7,000 videos and 17 million comments shows that right-wing channels tend to contain a higher degree of words from "negative" semantic fields.
Abstract: As of 2018, YouTube, the major online video sharing website, hosts multiple channels promoting right-wing content. In this paper, we observe issues related to hate, violence and discriminatory bias in a dataset containing more than 7,000 videos and 17 million comments. We investigate similarities and differences between users' comments and video content in a selection of right-wing channels and compare them to a baseline set using a three-layered approach, in which we analyze (a) lexicon, (b) topics and (c) implicit biases present in the texts. Among other results, our analyses show that right-wing channels tend to (a) contain a higher degree of words from "negative" semantic fields, (b) raise more topics related to war and terrorism, and (c) demonstrate more discriminatory bias against Muslims (in videos) and towards LGBT people (in comments). Our findings shed light not only on the collective conduct of the YouTube community promoting and consuming right-wing content, but also on the general behavior of YouTube users.
TL;DR: This paper proposes a computational approach for detecting and predicting the radicalisation influence a user is exposed to, grounded on the notion of 'roots of radicalisation' from social science models, and results show the effectiveness of the proposed algorithms.
Abstract: In an increasingly digital world, identifying signs of online extremism sits at the top of the priority list for counter-extremist agencies. Researchers and governments are investing in the creation of advanced information technologies to identify and counter extremism through intelligent large-scale analysis of online data. However, to the best of our knowledge, these technologies are neither based on, nor do they take advantage of, the existing theories and studies of radicalisation. In this paper we propose a computational approach for detecting and predicting the radicalisation influence a user is exposed to, grounded on the notion of 'roots of radicalisation' from social science models. This approach has been applied to analyse and compare the radicalisation level of 112 pro-ISIS vs. 112 "general" Twitter users. Our results show the effectiveness of our proposed algorithms in detecting and predicting radicalisation influence, obtaining up to 0.9 F-1 measure for detection and between 0.7 and 0.8 precision for prediction. While this is an initial attempt towards the effective combination of social and computational perspectives, more work is needed to bridge these disciplines, and to build on their strengths to target the problem of online radicalisation.
TL;DR: This work builds an automated mechanism to detect fake likes on Instagram which achieves a high precision of 83.5% and serves as an important first step in reducing the effect of fake likes on the Instagram influencer market.
Abstract: Instagram is a significant platform for users to share media reflecting their interests. It is used by marketers and brands to reach their potential audience for advertisement. The number of likes on posts serves as a proxy for the social reputation of the users, and in some cases, social media influencers with an extensive reach are compensated by marketers to promote products. This emerging market has led to users artificially bolstering the likes they get to project an inflated social worth. In this study, we enumerate the potential factors which contribute towards a genuine like on Instagram. Based on our analysis of liking behaviour, we build an automated mechanism to detect fake likes on Instagram which achieves a high precision of 83.5%. Our work serves as an important first step in reducing the effect of fake likes on the Instagram influencer market.
TL;DR: It is confirmed that trackers are widespread, and that a small number of trackers dominates the web (Google, Facebook and Twitter), and that Google still operates services on Chinese websites, despite its proclaimed retreat from the Chinese market.
Abstract: We perform a large-scale analysis of third-party trackers on the World Wide Web. We extract third-party embeddings from more than 3.5 billion web pages of the CommonCrawl 2012 corpus, and aggregate those to a dataset containing more than 140 million third-party embeddings in over 41 million domains. To the best of our knowledge, this constitutes the largest empirical web tracking dataset collected so far, and exceeds related studies by more than an order of magnitude in the number of domains and web pages analyzed. Due to the enormous size of the dataset, we are able to perform a large-scale study of online tracking, on three levels: (1) On a global level, we give a precise figure for the extent of tracking, give insights into the structural properties of the 'online tracking sphere' and analyse which trackers (and subsequently, which companies) are used by how many websites. (2) On a country-specific level, we analyse which trackers are used by websites in different countries, and identify the countries in which websites choose significantly different trackers than in the rest of the world. (3) We answer the question whether the content of websites influences the choice of trackers they use, leveraging more than ninety thousand categorized domains. In particular, we analyse whether highly privacy-critical websites about health and addiction make different choices of trackers than other websites. Based on the performed analyses, we confirm that trackers are widespread (as expected), and that a small number of trackers dominates the web (Google, Facebook and Twitter). In particular, the three tracking domains with the highest PageRank are all owned by Google. The only exceptions to this pattern are a few countries such as China and Russia. Our results suggest that this dominance is strongly associated with country-specific political factors such as freedom of the press.
Furthermore, our data confirms that Google still operates services on Chinese websites, despite its proclaimed retreat from the Chinese market. We also confirm that websites with highly privacy-critical content are less likely to contain trackers (60% vs 90% for other websites), even though the majority of them still do contain trackers.
TL;DR: Aolivei et al. as discussed by the authors proposed a method to solve the problem of energy-efficient nuclear power plant design in Brazil by using IPEN/CNEN-SP.
Abstract: 1 Bioscience Institute, São Paulo State University, 11380-972 São Vicente, SP, Brazil. 2 Instituto de Pesquisas Energéticas e Nucleares, IPEN/CNEN-SP, Av. Prof. Lineu Prestes, 2242 Cidade Universitária, CEP 05508-900 São Paulo, SP, Brazil. 3 Department of Chemistry, Federal University of Amazonas, Av. General Rodrigo Octávio, 6200, Coroado I CEP: 69080-900, Manaus, AM. Brazil. * E-mail: aolivei@ipen.br
TL;DR: This article crawled data from Twitter using a content-tailored lexicon and annotated 25,000 tweets for the different types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political.
Abstract: A quality annotated corpus is essential to research. Despite the recent focus of the Web science community on cyberbullying research, the community lacks standard benchmarks. This paper provides both a quality annotated corpus and an offensive words lexicon capturing different types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We first crawled data from Twitter using this content-tailored offensive lexicon. As mere presence of an offensive word is not a reliable indicator of harassment, human judges annotated tweets for the presence of harassment. Our corpus consists of 25,000 annotated tweets for the five types of harassment content and is available on the Git repository.
TL;DR: In this article, the authors investigate the effects of the operation on all newly registered vendors on Dream Market (n=220) during and shortly after Operation Bayonet by mapping their individual and historic characteristics to discern migration patterns and changes in vendor behavior.
Abstract: In the summer of 2017, an international policing effort - named Operation Bayonet - led by the Federal Bureau of Investigation (FBI) and the Dutch National High Tech Crime Unit (NHTCU) targeted two prominent online anonymous markets. On the one hand, the FBI succeeded in the take-down of AlphaBay; on the other hand, the NHTCU took over, operated and shut down Hansa Market. By coordinating these efforts and planning these actions sequentially, both agencies expected users active on AlphaBay to make their way to Hansa Market - which at that moment was completely controlled and operated by the NHTCU. To assess the effects of Operation Bayonet, we leverage measurements of the user-base of the current market leader, and then safe haven: Dream Market. We investigate the effects of the operation on all newly registered vendors on Dream Market (n=220) during and shortly after Operation Bayonet by mapping their individual and historic characteristics to discern migration patterns and changes in vendor behavior. Compared to 'simple' take-downs, like the AlphaBay take-down, the effects of the Hansa Market shutdown on vendors seem remarkably different. Vendors do not just simply move on after the Hansa Market shutdown. Few simply migrate, some take precautions like changing their username and/or PGP-key, but many start over with a clean slate - erasing their past reputation completely - and are truly 'Lost in the Dream'.
TL;DR: In this paper, the authors investigate the feasibility of performing focused crawls on the archived web by utilizing the Memento infrastructure, and compare the relevance of their resources to collections built from crawling the live web as well as from a manually curated collection.
Abstract: Event collections are frequently built by crawling the live web on the basis of seed URIs nominated by human experts. Focused web crawling is a technique where the crawler is guided by reference content pertaining to the event. Given the dynamic nature of the web and the pace with which topics evolve, the timing of the crawl is a concern for both approaches. We investigate the feasibility of performing focused crawls on the archived web. By utilizing the Memento infrastructure, we obtain resources from 22 web archives that contribute to building event collections. We create collections on four events and compare the relevance of their resources to collections built from crawling the live web as well as from a manually curated collection. Our results show that focused crawling on the archived web can be done and indeed results in highly relevant collections, especially for events that happened further in the past.
TL;DR: Physical therapy is mostly based on individual experience acquired in the clinical practice, and not on the scientific literature, according to a prospective multicenter study using a questionnaire.
Abstract: Objective: To analyze and describe the maneuvers most commonly used in clinical practice by physical therapists and the reasons for choosing them. Methods: A prospective multicenter study using a questionnaire. The sample consisted of physical therapists from five hospitals (three private hospitals, a teaching hospital and a public hospital). Results: A total of 185 questionnaires were filled in. Most professionals had graduated 6 to 10 years before and had over 10 years of intensive care unit experience. The most often used maneuvers were vibrocompression, hyperinflation, postural drainage, tracheal suction and motor mobilization. The most frequent reason for choosing these maneuvers was "I notice they are more efficient in clinical practice." Conclusion: Physical therapy is mostly based on individual experience acquired in clinical practice, and not on the scientific literature.
TL;DR: The inductive loop sensor has experimentally been demonstrated to be capable of properly measuring different types of partial discharges, but because of its current design, there are several practical limitations on its use in real devices or environments.
Abstract: Ideally, an insulation system must be capable of electrically insulating the active components of a machine or device subjected to high voltages. However, due to the presence of polluting agents or imperfections inside or on the surface of the insulation, small current pulses called partial discharges (PDs) are common, which partially short-circuit the insulation and cause it to lose its insulating properties, and thus its insulation capacity, over time. In some cases, measurements of this phenomenon are limited by the type of sensor used; if it is not adequate, it can distort the obtained results, which can lead to a misdiagnosis of the state of the device. The inductive loop sensor has experimentally been demonstrated to be capable of properly measuring different types of PDs. However, because of its current design, there are several practical limitations on its use in real devices or environments. An example is the presence of a primary conductor located at a fixed distance from the sensor, through which PD pulses must flow for the sensor to capture them. In this article, the sensor's behavior is studied at different separation distances from the line through which the PD pulses flow. In addition, the measuring capacity of the sensor is tested by removing the presence of the primary conductor and placing the sensor directly over the line through which the PD pulses of a real device flow.
TL;DR: In this article, a graph partitioning method is proposed to discover different communities discussing a controversial topic in a social network like Twitter, which allows detecting descriptive terms that characterize the different viewpoints as well as understanding how a specific term is related to a viewpoint.
Abstract: The Web has evolved into a dominant platform where everyone has the opportunity to express their opinions, to interact with other users, and to debate emerging events happening around the world. On the one hand, this has enabled the presence of different viewpoints and opinions about a - usually controversial - topic (like Brexit), but at the same time, it has led to phenomena like media bias, echo chambers and filter bubbles, where users are exposed to only one point of view on the same topic. Therefore, there is a need for methods that are able to detect and explain the different viewpoints. In this paper, we propose a graph partitioning method that exploits social interactions to enable the discovery of different communities (representing different viewpoints) discussing a controversial topic in a social network like Twitter. To explain the discovered viewpoints, we describe a method, called Iterative Rank Difference (IRD), which allows detecting descriptive terms that characterize the different viewpoints as well as understanding how a specific term is related to a viewpoint (by detecting other related descriptive terms). The results of an experimental evaluation showed that our approach outperforms state-of-the-art methods on viewpoint discovery, while a qualitative analysis of the proposed IRD method on three different controversial topics showed that IRD provides comprehensive and deep representations of the different viewpoints.
TL;DR: In this paper, an integrative literature review was conducted, considering: the economic and geographical context in which the research was conducted; the focus of each piece of research; the adopted research methods; organisational theories of analytical support; the sectors analysed; and the effects of economic crises on CSR initiatives and environmental management.
Abstract: “Economic crises” and “corporate social responsibility (CSR) initiatives” are two issues that dominate the modern business agenda. Although related, these issues have been analysed separately, and so a significant gap is perpetuated between the two. What are the effects of economic crises on CSR initiatives? Can organisational social initiatives withstand economic crises? The purpose of this paper is to answer these questions. An integrative literature review was conducted, considering: the economic and geographical context in which the research was conducted; the focus of each piece of research; the adopted research methods; organisational theories of analytical support; the sectors analysed; and the effects of economic crises on CSR initiatives and environmental management. Some of the findings were as follows: most of the studies analysed reported that CSR helps companies to cope with economic crises by increasing the efficiency of investments and establishing better relations with stakeholders and markets; environmental practices are related to negative environmental performance in periods of economic crises; and CSR relates positively to financial performance in periods of economic crises. This is one of the first integrative literature reviews to investigate what happens to the relationship between businesses and sustainable change management in periods of crises. This paper also offers a future research agenda for the issue, with 12 questions still unanswered by the latest research.
TL;DR: Differences in recipients' preferences for subject lines of marketing emails from different industries, in terms of their clickthrough rates on marketing emails sent by different businesses in Finance, Cosmetics and Television industries, are explored.
Abstract: Marketing practices have adopted the use of computational approaches in order to optimize the performance of their promotional emails and site advertisements. In the case of promotional emails, subject lines have been found to offer a reliable signal of whether the recipient will open an email or not. Clickbait headlines are also known to drive reader engagement. In this study, we explore the differences in recipients' preferences for subject lines of marketing emails from different industries, in terms of their clickthrough rates on marketing emails sent by different businesses in Finance, Cosmetics and Television industries. Different stylistic strategies of subject lines characterize high clickthroughs in different commercial verticals. For instance, words providing insight and signaling cognitive processing lead to more clickthroughs for the Finance industry; on the other hand, social words yield more clickthroughs for the Movies and Television industry. Domain adaptation can further improve predictive performance for unseen businesses by an average of 16.52% over generic industry-specific predictive models. We conclude with a discussion on the implications of our findings and suggestions for future work.
TL;DR: An online study with participants recruited through a crowdsourcing service confirmed that the proposed web access literacy scale is reliable and valid and is expected to contribute to the design of information access systems or educational classes to encourage users to reflect on and improve their web access literacy relative to critical information seeking.
Abstract: We propose a web access literacy scale to assess user ability to scrutinize web information and gather accurate information using information access systems, such as web search engines. We conducted an online study with participants recruited through a crowdsourcing service. Analysis of the questionnaire responses confirmed that the proposed web access literacy scale is reliable and valid. We also noted the following pointers: (1) Web users may not pay significant attention to web page authors and their expertise when judging information credibility. (2) Users may have weaknesses relative to the use of web search engines and tolerance for cognitive bias that appears in credibility assessment of web information. The results of this study are expected to contribute to the design of information access systems or educational classes to encourage users to reflect on and improve their web access literacy relative to critical information seeking.
TL;DR: The results show that 12 out of the 35 videos in the data set focused on conspiracy theories, but no statistical differences were found in the amount of user activity and sentiment between the two types of videos.
Abstract: In this paper, we analyze the content of the most popular videos posted on YouTube in the first phase of the Zika-virus outbreak in 2016, and the user responses to those videos. More specifically, we examine the extent to which informational and conspiracy theory videos differ in terms of user activity (number of comments, shares, likes and dislikes), and the sentiment and content of the user responses. Our results show that 12 out of the 35 videos in our data set focused on conspiracy theories, but no statistical differences were found in the amount of user activity and sentiment between the two types of videos. The content of the user responses shows that users respond differently to sub-topics related to Zika-virus. The implications of the results for future online health promotion campaigns are discussed.
TL;DR: This work highlights the importance of the competitive and collaborative social dynamics within the enterprise and underline the contradictory nature of those dynamics, which combined might lead to detrimental effects towards the engagement to crowdsourcing activities.
Abstract: Enterprise crowdsourcing capitalises on the availability of employees for in-house data processing. Gamification techniques can help align employees' motivation with the crowdsourcing endeavour. Although research efforts have so far unravelled the wide arsenal of gamification techniques to construct engagement loops, little research has shed light on the social game dynamics that those foster and how those impact crowdsourcing activities. This work reports on a study that involved 101 employees from two multinational enterprises. We adopt a user-centric approach to apply and experiment with gamification for enterprise crowdsourcing purposes. Through a qualitative study, we highlight the importance of the competitive and collaborative social dynamics within the enterprise. By engaging the employees with a mobile crowdsourcing application, we showcase the effectiveness of competitiveness towards higher levels of engagement and quality of contributions. Moreover, we underline the contradictory nature of those dynamics, which combined might lead to detrimental effects on engagement with crowdsourcing activities.
TL;DR: This paper performs a detailed, hypothesis-driven analysis of several online crawling algorithms, ranging from classical crawling methods to modern, state-of-the-art algorithms, with respect to the task of collecting as much data as possible given a fixed query budget, and shows that the performance of these methods depends strongly on the network structure.
Abstract: Over the past two decades, online social networks have attracted a great deal of attention from researchers. However, before one can gain insight into the properties or structure of a network, one must first collect appropriate data. Data collection poses several challenges, such as API or bandwidth limits, which require the data collector to carefully consider which queries to make. Many online network crawling methods have been proposed, but it is not always clear which method should be used for a given network. In this paper, we perform a detailed, hypothesis-driven analysis of several online crawling algorithms, ranging from classical crawling methods to modern, state-of-the-art algorithms, with respect to the task of collecting as much data (nodes or edges) as possible given a fixed query budget. We show that the performance of these methods depends strongly on the network structure. We identify three relevant network characteristics: community separation, average community size, and average node degree. We present experiments on both real and synthetic networks, and provide guidelines to researchers regarding selection of an appropriate sampling method.
TL;DR: By crawling Arabic feminist pages over Facebook, this paper builds a dataset that can be analysed using social network analysis tools and reveals the map of influence between Arabic feminist network and the western, transnational, and Global feminist networks.
Abstract: Much has been said about the influence of Western culture on social movements worldwide, and this claimed influence has caused some to accuse Arabic feminism of being merely an alien import to the Arab world. New waves of feminism have arisen as a reaction to the claimed prevalent Western culture. Global Feminism argues that women worldwide experience similar subjugation in many social constructs because many cultures are based on a patriarchal past, but other waves reject the concept of a universal women's experience, stress the significance of diversity in women's experiences, and see their activities as transnational rather than global. Others expect that the confrontation of secular and Islamist paradigms will dominate. Social media has global reach, and there are signs that Facebook pages are used by feminists worldwide to boost their social and political activism. Facebook gives public pages' owners the ability to associate their pages with pages with similar ideologies. This provides a global space where feminist pages are clustered and exposes clues about their patterns of influence. By crawling Arabic feminist pages over Facebook, this paper builds a dataset that can be analysed using social network analysis tools and reveals the map of influence between the Arabic feminist network and the Western, transnational, and Global feminist networks. The map shows that Arabic women's pages are clustered in two segments: Arab feminism and Sect feminism. The latter consists of pages which distance themselves from associating with secular feminism pages whether they are Arabic or not, and in contrast to the former, they are less likely to restrict themselves to a national identity.
TL;DR: In this article, a new preferential attachment-based network growth model was proposed to explain two properties of growing networks: (1) the power-law growth of node degrees and (2) the decay of node relevance.
Abstract: We propose a new preferential attachment-based network growth model in order to explain two properties of growing networks: (1) the power-law growth of node degrees and (2) the decay of node relevance. In preferential attachment models, the ability of a node to acquire links is affected by its degree, its fitness, as well as its relevance which typically decays over time. After a review of existing models, we argue that they cannot explain the above-mentioned two properties (1) and (2) at the same time. We have found that apart from being empirically observed in many systems, the exponential growth of the network size over time is the key to sustain the power-law growth of node degrees when node relevance decays. We therefore make a clear distinction between the event time and the physical time in our model, and show that under the assumption that the relevance of a node decays with its age τ, there exists an analytical solution of the decay function f_R with the form f_R(τ) = τ^{-1}. Other properties of real networks such as power-law-like degree distributions can still be preserved, as supported by our experiments. This makes our model useful in explaining and analysing many real systems such as citation networks.
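As a rough illustration of the kind of model this abstract describes, the sketch below grows a network in which each new node links to an existing node with probability proportional to degree × fitness × relevance. It is not the authors' model: the function name `grow_network`, the uniform fitness distribution, the single link per new node, the use of event time only (one node per step), and the default decay of 1/age are all assumptions made for illustration.

```python
import random


def grow_network(num_nodes, decay=lambda age: 1.0 / age, seed=0):
    """Grow a tree-like network by relevance-weighted preferential attachment.

    Hypothetical sketch: attachment weight = degree * fitness * decay(age).
    Time is measured in events (one new node per step), not physical time.
    """
    rng = random.Random(seed)
    degree = [1, 1]        # start from two nodes joined by a single edge
    fitness = [1.0, 1.0]   # assumed fitness values for the initial nodes
    birth = [0, 0]         # event time at which each node appeared
    edges = [(0, 1)]
    for t in range(1, num_nodes - 1):
        new = len(degree)
        # Every existing node was born at time <= t - 1, so its age is >= 1.
        weights = [
            degree[i] * fitness[i] * decay(t - birth[i]) for i in range(new)
        ]
        target = rng.choices(range(new), weights=weights, k=1)[0]
        edges.append((new, target))
        degree[target] += 1
        degree.append(1)
        fitness.append(rng.uniform(0.5, 1.5))  # assumed fitness distribution
        birth.append(t)
    return degree, edges


degree, edges = grow_network(200)
print(max(degree), len(edges))
```

Passing a different `decay` callable (for example a constant function) makes it easy to compare degree growth with and without relevance decay in this toy setting.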
TL;DR: ATR‐FTIR may be used to rapidly and nondestructively investigate erosive effects of medicaments and shows a detectable shift in the phosphate (PO4) antisymmetric stretching mode (ν3) at ∼985 cm−1 for AM, BR, and SS, indicating erosion.
Abstract: Medicaments essential for alleviation of diseases may sometimes adversely affect dental health by eroding the enamel, owing to their acidic nature. It is therefore highly desirable to be able to detect these effects quickly and reliably. In this study, we evaluated the erosive capacity of the four most commonly prescribed respiratory disease syrup medicaments on enamel using micro-energy-dispersive X-ray fluorescence spectrometry (µ-EDXRF) and attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR). Fifty-five enamel fragments obtained from 30 bovine teeth were treated with artificial saliva (S), acebrofilin hydrochloride (AC), ambroxol hydrochloride (AM), bromhexine hydrochloride (BR), and salbutamol sulfate (SS) by immersing them in 3 mL of the respective solutions for 1 min, three times a day at intervals of 1 hr, for 5 days. µ-EDXRF analysis of the enamel surface did not reveal significant erosion caused by the medications. However, ATR-FTIR showed a detectable shift in the phosphate (PO4) antisymmetric stretching mode (ν3) at ∼985 cm−1 for AM, BR, and SS, indicating erosion. Multivariate statistical analysis showed that AC, AM, SS, and BR could be classified with 70%, 80%, 100%, and 100% efficiency from S (control), further highlighting the ability of ATR-FTIR to identify degree of erosion. This suggests ATR-FTIR may be used to rapidly and nondestructively investigate erosive effects of medicaments.
TL;DR: A semi-supervised learning strategy to automatically separate fake News from reliable News sources: DistrustRank, which outperforms the supervised approaches in both ranking and classification tasks.
Abstract: In this paper we propose a semi-supervised learning strategy to automatically separate fake News from reliable News sources: DistrustRank. We first select a small set of unreliable News, manually evaluated and classified by experts on fact checking portals. Once this set is created, DistrustRank constructs a weighted graph where nodes represent websites, connected by edges based on a minimum similarity between a pair of websites. Next it computes the centrality using a biased PageRank, where a bias is applied to the selected set of seeds. As an output of the proposed model we obtain a trust (or distrust) rank that can be used in two ways: a) as a counter-bias to be applied when News about a specific subject is ranked, in order to discount possible boosts achieved by false claims; and b) to assist humans in identifying sources that are likely to be sources of fake News (or that are likely to be reputable), suggesting websites that should be examined more closely or avoided. In our experiments, DistrustRank outperforms the supervised approaches in both ranking and classification tasks.
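The biased PageRank step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `biased_pagerank`, the damping factor, the toy site names, and the similarity weights are all assumptions made for illustration. The only idea taken from the abstract is that the random-walk restart is concentrated on the seed set of known unreliable sites.

```python
def biased_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Personalized PageRank on a weighted graph (illustrative sketch).

    edges: dict mapping node -> {neighbor: similarity weight}, symmetric.
    seeds: set of nodes that receive all of the restart probability.
    """
    nodes = sorted(edges)
    # Restart (teleport) distribution is concentrated on the seed set.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for u in nodes:
            total = sum(edges[u].values())
            if total == 0:
                # Isolated node: send its mass back to the seeds.
                for n in nodes:
                    new_rank[n] += damping * rank[u] * teleport[n]
                continue
            for v, w in edges[u].items():
                new_rank[v] += damping * rank[u] * w / total
        rank = new_rank
    return rank


# Hypothetical similarity graph over four websites; "a" is the only seed.
graph = {
    "a": {"b": 0.9},
    "b": {"a": 0.9, "c": 0.5},
    "c": {"b": 0.5, "d": 0.2},
    "d": {"c": 0.2},
}
ranks = biased_pagerank(graph, seeds={"a"})
for site in sorted(ranks, key=ranks.get, reverse=True):
    print(site, round(ranks[site], 3))
```

In this toy graph the score falls off with distance from the seed, so the site that is only weakly similar to the rest ("d") receives the lowest distrust score.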
TL;DR: A cross-sectional study was performed with 1,493 judges and 2,614 prosecutors in Brazil between 2005 and 2006, in which participants completed a structured questionnaire covering sociodemographic characteristics, opinions about abortion law, and circumstances in which abortion is considered lawful.
Abstract: OBJECTIVE: To analyze the opinion of judges and prosecutors concerning Brazilian abortion law and situations in which abortion should be allowed. METHODS: A cross-sectional study was performed with 1,493 judges and 2,614 prosecutors in Brazil between 2005 and 2006. Participants completed a structured questionnaire covering sociodemographic characteristics, opinions about abortion law, and circumstances in which abortion is considered lawful. Bivariate and multivariate analyses of data were carried out through Poisson regression. RESULTS: The majority of participants (78%) found that the circumstances in which abortion is considered lawful should be broadened, or even that abortion should not be criminalized. The highest rates of pro-abortion opinions resulted from: risk to the life of the mother (84%), anencephaly (83%), severe congenital malformation of fetus (82%), and pregnancy resulting from rape (82%). Variables related to religion were strongly associated with the opinion of participants. CONCLUSIONS: There is a trend in considering the need of changing the current abortion law, in the sense of widening the circumstances in which abortion is considered lawful, or even toward decriminalizing abortion, regardless of the circumstances in which it takes place.
TL;DR: Using an off-the-shelf classifier that is independent of domain-specific features, this work studies and describes emergency situations based solely on location-based features in messages, indicating that anomalies in location-related social media user activity indeed provide information for automatically detecting emergency situations independent of their domain.
Abstract: In general, existing methods for automatically detecting emergency situations using Twitter rely on features based on domain-specific keywords found in messages. These keyword-based methods usually require training on domain-specific labeled data, using multiple languages, and for different types of events (e.g., earthquakes, floods, wildfires, etc.). In addition to being costly, these approaches may fail to detect previously unexpected situations, such as uncommon catastrophes or terrorist attacks. However, collective mentions of certain keywords are not the only type of self-organizing phenomena that may arise in social media when a real-world extreme situation occurs. Just as nearby physical sensors become activated when stimulated, localized citizen sensors (i.e., users) will also react in a similar manner. To leverage this information, we propose to use self-organized activity related to geolocations to identify emergency situations. We propose to detect such events by tracking the frequencies and probability distributions of the interarrival times of the messages related to specific locations. Using an off-the-shelf classifier that is independent of domain-specific features, we study and describe emergency situations based solely on location-based features in messages. Our findings indicate that anomalies in location-related social media user activity indeed provide information for automatically detecting emergency situations independent of their domain.
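As a rough illustration of the location-based features mentioned above, the sketch below groups messages by location and derives message counts, interarrival times, and an empirical interarrival distribution. The function names, the `(timestamp, location)` input format, the bin width, and the sample data are assumptions for illustration; the abstract does not specify the exact feature definitions.

```python
from collections import Counter, defaultdict
from statistics import mean


def interarrival_features(messages):
    """messages: iterable of (timestamp_in_seconds, location) pairs."""
    by_location = defaultdict(list)
    for timestamp, location in messages:
        by_location[location].append(timestamp)

    features = {}
    for location, stamps in by_location.items():
        stamps.sort()
        # Interarrival time = gap between consecutive messages for a location.
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        features[location] = {
            "count": len(stamps),
            "gaps": gaps,
            "mean_gap": mean(gaps) if gaps else None,
        }
    return features


def gap_histogram(gaps, bin_width):
    """Empirical distribution of gaps: bin start -> fraction of gaps."""
    bins = Counter((gap // bin_width) * bin_width for gap in gaps)
    return {start: n / len(gaps) for start, n in sorted(bins.items())}


# Hypothetical stream: location "A" shows a burst of closely spaced messages.
sample = [(0, "A"), (10, "A"), (12, "A"), (13, "A"), (0, "B"), (100, "B")]
features = interarrival_features(sample)
histogram_a = gap_histogram(features["A"]["gaps"], bin_width=5)
print(features["A"]["mean_gap"], histogram_a)
```

A sudden shift of the histogram toward short gaps for one location is the kind of anomaly such features are meant to expose; how it is fed to a classifier is left open here.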
TL;DR: It is shown that the remixing of existing 3d models is substantially influenced by bots, customizers and self-referential designs, and it is concluded that remixing patterns cannot be taken as direct indicators of innovative behavior on sharing platforms.
Abstract: A better understanding of how information in networks is reused or mixed has the potential to significantly contribute to the way value is exchanged under a market- or commons-based paradigm. Data as collaborative commons, distributed under creative commons licenses, can generate novel business models and significantly spur the continuing development of the knowledge society. However, looking at data reuse in a large 3d-printing community, we show that the remixing of existing 3d models is substantially influenced by bots, customizers and self-referential designs. Linking these phenomena to a more fine-grained understanding of the process and product dimensions of innovations, we conclude that remixing patterns cannot be taken as direct indicators of innovative behavior on sharing platforms. A further exploration of remixing networks in terms of their topological characteristics is suggested as a way forward. For the empirical underpinning of our arguments, we analyzed 893,383 three-dimensional designs shared by 193,254 members.
TL;DR: This paper studies large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by following hyperlinks.
Abstract: As one of the richest sources of encyclopedic information on the Web, Wikipedia generates an enormous amount of traffic. In this paper, we study large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by following hyperlinks. To this end, we propose and employ two main metrics, namely (i) searchshare -- the relative amount of views an article received by search --, and (ii) resistance -- the ability of an article to relay traffic to other Wikipedia articles -- to characterize articles. We demonstrate how articles in distinct topical categories differ substantially in terms of these properties. For example, architecture-related articles are often accessed through search and are simultaneously a "dead end" for traffic, whereas historical articles about military events are mainly navigated. We further link traffic differences to varying network, content, and editing activity features. Lastly, we measure the impact of the article properties by modeling access behavior on articles with a gradient boosting approach. The results of this paper constitute a step towards understanding human information seeking behavior on the Web.
TL;DR: From the reconstructed network, it is shown that the dynamics of the news spread can be approximated by a classical SIR epidemiological dynamics upon the network.
Abstract: News spread in internet media outlets can be seen as a contagious process forming temporal networks representing the influence between published articles. In this article we propose a methodology based on the application of natural language analysis of the articles to reconstruct the news spread network. From the reconstructed network, we show that the dynamics of the news spread can be approximated by a classical SIR epidemiological dynamics upon the network.
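For readers unfamiliar with SIR dynamics on a network, the following is a generic discrete-time sketch, not the model fitted in the paper. The function name `simulate_sir`, the per-step infection probability `beta`, the recovery probability `gamma`, and the toy chain of articles are all assumptions; the abstract does not give the network or the parameter values.

```python
import random


def count_states(state):
    """Return (susceptible, infected, recovered) counts."""
    values = list(state.values())
    return values.count("S"), values.count("I"), values.count("R")


def simulate_sir(adjacency, initial_infected, beta, gamma, rng, max_steps=1000):
    """Discrete-time SIR on a graph given as node -> list of neighbors.

    At each step every infected node infects each susceptible neighbor
    with probability beta, then recovers with probability gamma.
    """
    state = {node: "S" for node in adjacency}
    for node in initial_infected:
        state[node] = "I"
    history = [count_states(state)]
    for _ in range(max_steps):
        infected = [node for node, s in state.items() if s == "I"]
        if not infected:
            break
        next_state = dict(state)
        for u in infected:
            for v in adjacency[u]:
                if state[v] == "S" and rng.random() < beta:
                    next_state[v] = "I"
            if rng.random() < gamma:
                next_state[u] = "R"
        state = next_state
        history.append(count_states(state))
    return history


# Hypothetical chain of four articles, each influencing the next.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
# With beta = gamma = 1 the process is deterministic: the "infection"
# moves one article per step and each article recovers right after.
history = simulate_sir(chain, ["a"], beta=1.0, gamma=1.0, rng=random.Random(0))
print(history)
```

With probabilities below 1 the same function yields stochastic trajectories, which would have to be averaged over many runs before being compared with observed spread data.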
TL;DR: This work proposes a model that is unsupervised and adopts a Bayesian framework to distinguish spammers from other classes of users and demonstrates the effectiveness of the proposed approach in detecting public opinion spammers.
Abstract: Microblogs serve hundreds of millions of active users, but have also attracted large numbers of spammers. While traditional spam often seeks to endorse specific products or services, nowadays there are increasingly also paid posters intent on promoting particular views on hot topics and influencing public opinion. In this work, we fill an important research gap by studying how to detect such opinion spammers and their micro-manipulation of public opinion. Our model is unsupervised and adopts a Bayesian framework to distinguish spammers from other classes of users. Experiments on a Sina Weibo hot topic dataset demonstrate the effectiveness of the proposed approach. A further diachronic analysis of the collected data demonstrates that public opinion spammers have developed sophisticated techniques and have seen success in subtly manipulating the public sentiment.
TL;DR: In this paper, the authors present the first data-driven characterization of cross-platform campaigns that use multiple OSN platforms to reach their victims and use phone numbers for monetization, collecting ~23M posts containing ~1.8M unique phone numbers from Twitter, Facebook, GooglePlus, YouTube, and Flickr over a period of six months.
Abstract: Cybercriminals abuse Online Social Networks (OSNs) to lure victims into a variety of spam. Among different spam types, a less explored area is OSN abuse that leverages the telephony channel to defraud users. Phone numbers are advertised via OSNs, and users are tricked into calling these numbers. To expand the reach of such scam / spam campaigns, phone numbers are advertised across multiple platforms like Facebook, Twitter, GooglePlus, Flickr, and YouTube. In this paper, we present the first data-driven characterization of cross-platform campaigns that use multiple OSN platforms to reach their victims and use phone numbers for monetization. We collect ~23M posts containing ~1.8M unique phone numbers from Twitter, Facebook, GooglePlus, YouTube, and Flickr over a period of six months. Clustering these posts helps us identify 202 campaigns operating across the globe with Indonesia, United States, India, and United Arab Emirates being the most prominent originators. We find that even though Indonesian campaigns generate the highest volume (~3.2M posts), only 1.6% of the accounts propagating Indonesian campaigns have been suspended so far. By examining campaigns running across multiple OSNs, we discover that Twitter detects and suspends ~93% more accounts than Facebook. Therefore, sharing intelligence about abuse-related user accounts across OSNs can aid in spam detection. According to our dataset, ~35K victims and ~$8.8M could have been saved if intelligence had been shared across the OSNs. By analyzing phone number based spam campaigns running on OSNs, we highlight the unexplored variety of phone-based attacks surfacing on OSNs.