On Scalable and Robust Truth Discovery in Big Data Social Media Sensing Applications

doi:10.1109/TBDATA.2018.2824812

Open Access•Journal Article•10.1109/TBDATA.2018.2824812

Daniel Zhang, +4 more

- 01 Jun 2019

- IEEE Transactions on Big Data

- Vol. 5, Iss: 2, pp 195-208

Abstract: Identifying trustworthy information in noisy data contributed by numerous unvetted sources on online social media (e.g., Twitter, Facebook, and Instagram) is a crucial task in the era of big data. This task, referred to as truth discovery, aims to identify the reliability of sources and the truthfulness of the claims they make without knowing either a priori. In this work, we identify three important challenges that have not been well addressed in the current truth discovery literature. The first is “misinformation spread,” where a significant number of sources contribute to false claims, making truthful claims difficult to identify. For example, on Twitter, rumors, scams, and influence bots are common cases of sources colluding, either intentionally or unintentionally, to spread misinformation and obscure the truth. The second is “data sparsity,” or the “long-tail phenomenon,” where a majority of sources contribute only a small number of claims, providing insufficient evidence to determine those sources’ trustworthiness. For example, in the Twitter datasets that we collected during real-world events, more than 90 percent of sources contributed to only a single claim. The third is scalability: many current solutions do not scale to large social sensing events because their truth discovery algorithms are centralized. In this paper, we develop a Scalable and Robust Truth Discovery (SRTD) scheme to address these three challenges. In particular, the SRTD scheme jointly quantifies both the reliability of sources and the credibility of claims using a principled approach. We further develop a distributed framework that implements the proposed scheme using Work Queue in an HTCondor system. Evaluation results on three real-world datasets show that the SRTD scheme significantly outperforms state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
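
The idea of jointly estimating source reliability and claim credibility can be illustrated with a minimal sketch. This is a generic iterative voting baseline, not the SRTD scheme itself, whose details the abstract does not give. The function name `discover_truth`, the `(source, claim, vote)` report format, and the Laplace smoothing used to keep single-claim (long-tail) sources near a neutral score are all assumptions made for illustration.

```python
from collections import defaultdict


def discover_truth(reports, iterations=20):
    """Toy truth discovery (illustrative baseline, NOT the paper's SRTD scheme).

    reports: list of (source, claim, vote), where vote is +1 if the source
    asserts the claim and -1 if the source denies it.
    Returns (reliability, credibility) dictionaries.
    """
    sources = {s for s, _, _ in reports}
    # Assumption: every source starts with a neutral reliability of 0.5.
    reliability = {s: 0.5 for s in sources}
    credibility = {}

    for _ in range(iterations):
        # Step 1: claim credibility = reliability-weighted average vote, in [-1, 1].
        totals = defaultdict(float)
        weights = defaultdict(float)
        for s, c, v in reports:
            totals[c] += reliability[s] * v
            weights[c] += reliability[s]
        credibility = {c: totals[c] / weights[c] for c in totals}

        # Step 2: source reliability = smoothed share of its votes that agree
        # with the current claim estimates (a zero score counts as disagreement).
        agree = defaultdict(int)
        count = defaultdict(int)
        for s, c, v in reports:
            agree[s] += 1 if v * credibility[c] > 0 else 0
            count[s] += 1
        # Laplace smoothing (an assumption): sparse sources stay close to 0.5
        # and no reliability ever reaches exactly 0, so weights stay positive.
        reliability = {s: (agree[s] + 1) / (count[s] + 2) for s in sources}

    return reliability, credibility


# Hypothetical data: A, B, C mostly agree; D contradicts them; E made one claim.
reports = [
    ("A", "c1", +1), ("A", "c2", +1), ("A", "c3", -1),
    ("B", "c1", +1), ("B", "c2", +1), ("B", "c3", -1),
    ("C", "c1", +1), ("C", "c3", -1),
    ("D", "c1", -1), ("D", "c2", -1), ("D", "c3", +1),
    ("E", "c3", +1),
]
reliability, credibility = discover_truth(reports)
```

In this toy run, claims `c1` and `c2` end with positive scores and `c3` with a negative one, while source `D` is ranked below `A`. A majority-style baseline like this fails when most sources repeat a false claim, which is the "misinformation spread" case the abstract describes.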

Citations

Proceedings Article•10.1109/RTAS.2018.00039

A Real-Time and Non-Cooperative Task Allocation Framework for Social Sensing Applications in Edge Computing Systems

Daniel Zhang, +5 more

- 11 Apr 2018

TL;DR: This work develops a Bottom-up Game-theoretic Task Allocation (BGTA) framework to solve the critical problem of allocating real-time social sensing tasks to self-aware and non-cooperative edge computing nodes and develops a decentralized Fictitious Play scheme to allow each edge node to make its own decision on which task to execute in a non-cooperative context.

Proceedings Article•10.1109/RTSS46320.2019.00040

EdgeBatch: Towards AI-Empowered Optimal Task Batching in Intelligent Edge Systems

Daniel Zhang, +4 more

- 01 Dec 2019

TL;DR: This work proposes EdgeBatch, a collaborative intelligent edge computing framework that minimizes the delay and energy consumption of executing DNN tasks at the edge by sharing idle GPU resources among privately owned IoT devices.

Journal Article•10.3390/IJGI9110632

Research Progress and Development Trend of Social Media Big Data (SMBD): Knowledge Mapping Analysis Based on CiteSpace

Ziyi Wang, +5 more

- 26 Oct 2020

- ISPRS international journal of geo-infor...

TL;DR: The Web of Science Core Collection was taken as the data source, and traditional statistical methods and CiteSpace software were used to carry out a scientometric analysis of SMBD, which showed the research status, hotspots, and trends in this field.

Proceedings Article•10.1109/SMARTCOMP.2019.00066

Towards Reliability in Online High-Churn Edge Computing: A Deviceless Pipelining Approach

Nathan Vance, +3 more

- 12 Jun 2019

TL;DR: This work proposes a deviceless pipeline-based approach (DPA) to establish workflows in which stages of the analysis pipeline are completed on edge devices, and any devices that leave the system can be replaced without data loss.

Proceedings Article•10.1109/BIGDATA.2018.8621996

RiskSens: A Multi-view Learning Approach to Identifying Risky Traffic Locations in Intelligent Transportation Systems Using Social and Remote Sensing

Yang Zhang, +4 more

- 01 Dec 2018

TL;DR: RiskSens, a multi-view learning approach to identifying risky traffic locations in a city by jointly exploring social and remote sensing data, is developed and evaluated using a real-world dataset from New York.

References

Book Chapter•10.1007/3-540-48686-0_1

The web as a graph: measurements, models, and methods

Jon Kleinberg, +4 more

- 26 Jul 1999

TL;DR: This paper describes two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery, and proposes a new family of random graph models that point to a rich new sub-field of the study of random graphs, and raises questions about the analysis of graph algorithms on the Internet.

10.1109/ipsn.2014.6846739

Using Humans As Sensors: an Estimation-Theoretic Perspective

Dong Wang, +13 more

TL;DR: This paper explores using social networks as sensor networks, developing a model to determine the correctness of reported observations, considering uncertain data provenance and information sharing, and evaluates its effectiveness using Twitter-based case-studies.

Journal Article•10.1007/S11241-015-9238-8

Reliable social sensing with physical constraints: analytic bounds and performance evaluation

Dong Wang, +5 more

- 01 Nov 2015

- Real-time Systems

TL;DR: This paper tackles the emerging topic of assessing correctness of input data in social sensing by adopting a cyber-physical approach, where assessment of correctness of individual observations is aided by knowledge of physical constraints on sources and observed variables to compensate for the lack of information on source reliability.

Proceedings Article•10.1109/PASSAT/SOCIALCOM.2011.188

Vocal Minority Versus Silent Majority: Discovering the Opinions of the Long Tail

Eni Mustafaraj, +3 more

- 01 Oct 2011

TL;DR: This paper presents results of a data analysis that compares two groups of users: the vocal minority (users who tweet very often) and the silent majority (users who tweeted only once), and discovers that the content generated by these two groups is significantly different.

Journal Article•10.1109/TMM.2014.2384912

Multimedia Summarization for Social Events in Microblog Stream

Jingwen Bian, +3 more

- 01 Feb 2015

- IEEE Transactions on Multimedia

TL;DR: This paper proposes a multimedia social event summarization framework to automatically generate visualized summaries from the microblog stream of multiple media types and conducts extensive experiments on two real-world microblog datasets to demonstrate the superiority of the proposed framework as compared to the state-of-the-art approaches.
