On Scalable and Robust Truth Discovery in Big Data Social Media Sensing Applications
91
TL;DR: A Scalable and Robust Truth Discovery (SRTD) scheme is developed that jointly quantifies both the reliability of sources and the credibility of claims using a principled approach and significantly outperforms the state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
Abstract: Identifying trustworthy information in the presence of noisy data contributed by numerous unvetted sources from online social media (e.g., Twitter, Facebook, and Instagram) has been a crucial task in the era of big data. This task, referred to as truth discovery, aims to identify the reliability of the sources and the truthfulness of the claims they make without knowing either a priori. In this work, we identify three important challenges that have not been well addressed in the current truth discovery literature. The first is “misinformation spread,” where a significant number of sources contribute to false claims, making the identification of truthful claims difficult. For example, on Twitter, rumors, scams, and influence bots are common cases in which sources collude, either intentionally or unintentionally, to spread misinformation and obscure the truth. The second challenge is “data sparsity,” or the “long-tail phenomenon,” where a majority of sources contribute only a small number of claims, providing insufficient evidence to determine those sources’ trustworthiness. For example, in the Twitter datasets that we collected during real-world events, more than 90 percent of sources contributed to only a single claim. Third, many current solutions do not scale to large social sensing events because of the centralized nature of their truth discovery algorithms. In this paper, we develop a Scalable and Robust Truth Discovery (SRTD) scheme to address these three challenges. In particular, the SRTD scheme jointly quantifies both the reliability of sources and the credibility of claims using a principled approach. We further develop a distributed framework to implement the proposed truth discovery scheme using Work Queue in an HTCondor system. The evaluation results on three real-world datasets show that the SRTD scheme significantly outperforms state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
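To make the idea of jointly estimating source reliability and claim credibility concrete, here is a minimal Python sketch of a generic iterative truth discovery loop. It is not the SRTD algorithm, whose details are not given in the abstract. The function name `truth_discovery`, the `(source, claim, stance)` report format, the update rules, and the toy reports are all assumptions made for illustration.

```python
from collections import defaultdict


def truth_discovery(reports, iterations=20):
    """Generic iterative truth discovery (illustrative, NOT the SRTD scheme).

    reports: iterable of (source, claim, stance) tuples, where stance is
    +1 if the source supports the claim and -1 if it denies it.
    Returns (reliability, credibility) dictionaries.
    """
    reports = list(reports)
    sources = {s for s, _, _ in reports}
    # Assumption: every source starts with the same neutral reliability.
    reliability = {s: 0.5 for s in sources}
    credibility = {}

    for _ in range(iterations):
        # Claim credibility: reliability-weighted sum of stances.
        score = defaultdict(float)
        for s, c, stance in reports:
            score[c] += reliability[s] * stance
        credibility = dict(score)

        # Source reliability: smoothed fraction of the source's reports
        # that agree with the current sign of each claim's credibility.
        agree = defaultdict(int)
        total = defaultdict(int)
        for s, c, stance in reports:
            total[s] += 1
            if credibility[c] * stance > 0:
                agree[s] += 1
        # Add-one smoothing keeps a source with a single report from
        # jumping straight to a reliability of 0 or 1.
        reliability = {s: (agree[s] + 1) / (total[s] + 2) for s in sources}

    return reliability, credibility


# Hypothetical toy reports: three sources support a claim, one denies it.
reports = [
    ("a", "bridge_closed", +1),
    ("b", "bridge_closed", +1),
    ("c", "bridge_closed", +1),
    ("d", "bridge_closed", -1),
    ("a", "power_out", +1),
    ("b", "power_out", +1),
    ("d", "power_out", -1),
]
reliability, credibility = truth_discovery(reports)
```

In this toy run, source `c` contributes only one report, so the smoothing term dominates its reliability estimate. That is a small-scale picture of the data sparsity problem the abstract describes. The loop is also centralized: it holds all reports in one process, which is the kind of design the abstract says limits scalability.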
Citations
A Real-Time and Non-Cooperative Task Allocation Framework for Social Sensing Applications in Edge Computing Systems
Daniel Zhang,Yue Ma,Yang Zhang,Suwen Lin,X. Sharon Hu,Dong Wang +5 more
- 11 Apr 2018
TL;DR: This work develops a Bottom-up Game-theoretic Task Allocation (BGTA) framework to solve the critical problem of allocating real-time social sensing tasks to self-aware and non-cooperative edge computing nodes and develops a decentralized Fictitious Play scheme to allow each edge node to make its own decision on which task to execute in a non-cooperative context.
75
EdgeBatch: Towards AI-Empowered Optimal Task Batching in Intelligent Edge Systems
Daniel Zhang,Nathan Vance,Yang Zhang,Tahmid Rashid,Dong Wang +4 more
- 01 Dec 2019
TL;DR: EdgeBatch is proposed, a collaborative intelligent edge computing framework that minimizes the delay and energy consumption of executing DNN tasks at the edge by sharing idle GPU resources among privately owned IoT devices.
46
Research Progress and Development Trend of Social Media Big Data (SMBD): Knowledge Mapping Analysis Based on CiteSpace
TL;DR: Web of Science core collection was taken as the data source, and traditional statistical methods and CiteSpace software were used to carry out the scientometrics analysis of SMBD, which showed the research status, hotspots and trends in this field.
46
Towards Reliability in Online High-Churn Edge Computing: A Deviceless Pipelining Approach
Nathan Vance,Tahmid Rashid,Daniel Zhang,Dong Wang +3 more
- 12 Jun 2019
TL;DR: A deviceless pipeline-based approach (DPA) is proposed to establish workflows in which stages of the analysis pipeline are completed on edge devices, and any devices that leave the system can be replaced without data loss.
39
RiskSens: A Multi-view Learning Approach to Identifying Risky Traffic Locations in Intelligent Transportation Systems Using Social and Remote Sensing
Yang Zhang,Yiwen Lu,Daniel Zhang,Lanyu Shang,Dong Wang +4 more
- 01 Dec 2018
TL;DR: RiskSens, a multi-view learning approach to identifying risky traffic locations in a city by jointly exploring social and remote sensing data, is developed and evaluated using a real-world dataset from New York.
38
References
The web as a graph: measurements, models, and methods
Jon Kleinberg,Ravi Kumar,Prabhakar Raghavan,Sridhar Rajagopalan,Andrew Tomkins +4 more
- 26 Jul 1999
TL;DR: This paper describes two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery, and proposes a new family of random graph models that point to a rich new sub-field of the study of random graphs, and raises questions about the analysis of graph algorithms on the Internet.
Using Humans As Sensors: an Estimation-Theoretic Perspective
Dong Wang,Md Tanvir Amin,Shen Li,Tarek Abdelzaher,Lance Kaplan,Siyu Gu,Chenji Pan,Hengchang Liu,Charu C. Aggarwal,Raghu Ganti,Xinlei Wang,Prasant Mohapatra,Boleslaw Szymanski,Hieu Le +13 more
TL;DR: This paper explores using social networks as sensor networks, developing a model to determine the correctness of reported observations, considering uncertain data provenance and information sharing, and evaluates its effectiveness using Twitter-based case-studies.
Reliable social sensing with physical constraints: analytic bounds and performance evaluation
TL;DR: This paper tackles the emerging topic of assessing correctness of input data in social sensing by adopting a cyber-physical approach, where assessment of correctness of individual observations is aided by knowledge of physical constraints on sources and observed variables to compensate for the lack of information on source reliability.
Vocal Minority Versus Silent Majority: Discovering the Opionions of the Long Tail
Eni Mustafaraj,Samantha Finn,Carolyn Whitlock,Panagiotis Takis Metaxas +3 more
- 01 Oct 2011
TL;DR: This paper presents results of a data analysis that compares two groups of users: the vocal minority (users who tweet very often) and the silent majority (users who tweeted only once), and discovers that the content generated by these two groups is significantly different.
Multimedia Summarization for Social Events in Microblog Stream
TL;DR: This paper proposes a multimedia social event summarization framework to automatically generate visualized summaries from the microblog stream of multiple media types and conducts extensive experiments on two real-world microblog datasets to demonstrate the superiority of the proposed framework as compared to the state-of-the-art approaches.