Estimating Group Properties in Online Social Networks with a Classifier
George Berry, Antonio D. Sirianni, Nathan High, Agrippa Kellum, Ingmar Weber, Michael W. Macy
- 25 Sep 2018
- pp 67-85
Abstract: We consider the problem of obtaining unbiased estimates of group properties in social networks when node labels come from a classifier. Inference for this problem is complicated by two factors: the network is not known in advance and must be crawled, and even high-performance classifiers give biased estimates of group proportions. We propose and evaluate AdjustedWalk, a three-step procedure for this problem: (1) walk the graph starting from an arbitrary node; (2) train a classifier on the nodes in the walk; and (3) apply a post-hoc adjustment to the classification labels. The walk step provides the information needed to make inferences about nodes and edges, while the adjustment step corrects for classifier bias in estimating group proportions. The result is de-biased estimates at the cost of additional variance. We evaluate AdjustedWalk on four tasks: estimating the proportion of nodes belonging to a minority group, the proportion of the minority group among high-degree nodes, the proportion of within-group edges, and Coleman’s homophily index. Results on simulated and empirical graphs show that the procedure performs well compared to optimal baselines in a variety of circumstances, while indicating that the increase in variance can be large for low-recall classifiers.
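The walk step and the four target quantities can be illustrated with a minimal sketch. It assumes an undirected, connected, non-bipartite graph stored as adjacency lists, a simple random walk, and a group label already known for every visited node (the classifier and its adjustment are left out here). Under these assumptions the walk visits nodes in proportion to their degree, so node-level shares are re-weighted by 1/degree, while edges are traversed roughly uniformly, so edge-level shares need no re-weighting. The function names, the degree threshold used for "high-degree" nodes, and the (H - w)/(1 - w) form of Coleman's index are our assumptions, not necessarily the estimators used in the paper.

```python
import random
from collections.abc import Hashable
from typing import Dict, List

Graph = Dict[Hashable, List[Hashable]]  # undirected graph as adjacency lists
Labels = Dict[Hashable, int]            # node -> group label (1 = minority, assumed)


def random_walk(graph: Graph, start: Hashable, steps: int, seed: int = 0) -> List[Hashable]:
    """Simple random walk: at each step move to a uniformly chosen neighbour."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(steps):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def group_share(walk: List[Hashable], graph: Graph, labels: Labels,
                group: int = 1, min_degree: int = 0) -> float:
    """Share of nodes in `group`. Each visit is weighted by 1/degree because a
    simple random walk visits nodes in proportion to their degree; min_degree > 0
    restricts the estimate to high-degree nodes (hypothetical threshold)."""
    num = den = 0.0
    for v in walk:
        d = len(graph[v])
        if d < min_degree:
            continue
        num += (labels[v] == group) / d
        den += 1.0 / d
    return num / den


def within_group_edge_share(walk: List[Hashable], labels: Labels) -> float:
    """Share of traversed edges whose endpoints share a label. A simple random walk
    traverses edges roughly uniformly, so no re-weighting is applied."""
    steps = list(zip(walk, walk[1:]))
    return sum(labels[u] == labels[v] for u, v in steps) / len(steps)


def coleman_index(walk: List[Hashable], graph: Graph, labels: Labels, group: int = 1) -> float:
    """Assumed form (H - w) / (1 - w): H is the share of the group's edge ends
    that lead back into the group, w is the group's share of nodes."""
    leaving = [(u, v) for u, v in zip(walk, walk[1:]) if labels[u] == group]
    h = sum(labels[v] == group for _, v in leaving) / len(leaving)
    w = group_share(walk, graph, labels, group)
    return (h - w) / (1 - w)


# Hypothetical toy graph: a triangle 0-1-2 plus node 3 hanging off node 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = {0: 0, 1: 0, 2: 1, 3: 1}
walk = random_walk(graph, start=0, steps=100_000)
print(group_share(walk, graph, labels))                # true value 0.5
print(group_share(walk, graph, labels, min_degree=2))  # true value 1/3
print(within_group_edge_share(walk, labels))           # true value 0.5
print(coleman_index(walk, graph, labels))              # true value 0.0
```

Re-weighting by 1/degree is the usual correction for the degree bias of a simple random walk; a short walk started at an arbitrary node will also carry some start-point bias until it mixes.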
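The abstract does not spell out the post-hoc adjustment. One standard correction of this kind, assumed here for illustration and often called adjusted classify-and-count, uses the classifier's true-positive rate (recall) and false-positive rate estimated on labelled nodes. If p is the true minority share, the expected share of positive predictions is q = p * tpr + (1 - p) * fpr, so p = (q - fpr) / (tpr - fpr). Dividing by tpr - fpr also multiplies the variance of q by 1 / (tpr - fpr)^2, which is consistent with the observation that variance increases can be large for low-recall classifiers. The function names and example rates are hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """True- and false-positive rates of a 0/1 classifier on labelled nodes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


def adjusted_share(q: float, tpr: float, fpr: float) -> float:
    """Solve q = p * tpr + (1 - p) * fpr for the true share p, clipped to [0, 1]."""
    if tpr <= fpr:
        raise ValueError("adjustment requires tpr > fpr")
    return min(1.0, max(0.0, (q - fpr) / (tpr - fpr)))


def variance_inflation(tpr: float, fpr: float) -> float:
    """Factor multiplying Var(q) in the adjusted estimate (tpr, fpr treated as known)."""
    return 1.0 / (tpr - fpr) ** 2


# Hypothetical classifier with recall 0.6 and false-positive rate 0.05.
# With a true minority share of 0.2, the raw share of positive predictions is
# q = 0.2 * 0.6 + 0.8 * 0.05 = 0.16, which understates the minority.
print(adjusted_share(0.16, 0.6, 0.05))   # approximately 0.2
print(variance_inflation(0.6, 0.05))     # about 3.3
print(variance_inflation(0.3, 0.05))     # about 16: lower recall, much noisier
```

In a walk-based setting, q would itself be a degree-re-weighted share of positive predictions; and because tpr and fpr are estimated rather than known, the actual variance is somewhat larger than this inflation factor suggests.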
Citations
FITNet: Identifying Fashion Influencers on Twitter
Jinda Han, Qinglin Chen, Xilun Jin, Weikai Xu, Wanxian Yang, Suhansanu Kumar, Li Zhao, Hari Sundaram, Ranjitha Kumar
- 22 Apr 2021
TL;DR: The authors construct a network of the top 10k influencers within the larger Twitter fashion graph, using a content-based classifier to identify fashion-relevant Twitter accounts.
Posted Content
Going beyond accuracy: Estimating homophily in social networks using predictions
TL;DR: It is shown that estimating homophily in a network can be viewed as a dyadic prediction problem, and that homophily estimates are unbiased when dyad-level residuals sum to zero in the network.
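The residual condition can be checked with simple arithmetic. A minimal sketch, assuming (our assumption about the estimator) that the within-group edge share is estimated as the mean of dyad-level predicted probabilities that each edge is within-group: the estimate then misses the true share by exactly minus the mean residual, so it is exact whenever the residuals sum to zero, however inaccurate individual predictions are. The function name and data are hypothetical.

```python
from typing import Sequence, Tuple


def share_error(truth: Sequence[int], pred: Sequence[float]) -> Tuple[float, float]:
    """Return (estimate - true share, mean residual), with residual = truth - pred.
    The estimate is the mean predicted probability over dyads, so the two values
    always satisfy error == -mean_residual."""
    n = len(truth)
    error = sum(pred) / n - sum(truth) / n
    mean_residual = sum(t - p for t, p in zip(truth, pred)) / n
    return error, mean_residual


# Individually poor predictions whose residuals cancel: the error is 0 up to rounding.
print(share_error([1, 1, 0, 0], [0.6, 0.7, 0.4, 0.3]))
# Residuals that do not cancel: the estimate overshoots by 0.2.
print(share_error([1, 0, 0, 0], [0.9, 0.5, 0.2, 0.2]))
```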
Estimating Distributions of Large Graphs from Incomplete Sampled Data
Shiju Li, Xin Huang, Chul-Ho Lee
- 21 Jun 2021
TL;DR: In this article, the problem of estimating the latent in-degree distribution of large directed graphs from random samples was formulated as a maximum-likelihood estimation problem, and the expectation-maximization algorithm was employed to solve it.
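The summary above does not state the sampling model, so the following is only a toy illustration of how expectation-maximization can fit a latent degree distribution. It assumes (our assumption, not necessarily the paper's model) that each node's true in-degree k is drawn from an unknown distribution pi over 0..K and that each in-edge is observed independently with a known probability p, so the observed count j follows Binomial(k, p). EM alternates between each node's posterior over k (E-step) and averaging those posteriors into a new pi (M-step). Function names and data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def em_degree_distribution(observed: Sequence[int], p: float, max_degree: int,
                           iters: int = 200) -> List[float]:
    """EM estimate of pi[k] = P(true in-degree = k), k = 0..max_degree, when each
    in-edge is observed independently with known probability p (toy model)."""
    K = max_degree
    # like[j][k] = P(j observed edges | true degree k) = Binomial(k, p) pmf at j
    like = [[math.comb(k, j) * p**j * (1 - p) ** (k - j) if j <= k else 0.0
             for k in range(K + 1)] for j in range(K + 1)]
    counts = Counter(observed)  # observed counts must lie in 0..max_degree
    n = len(observed)
    pi = [1.0 / (K + 1)] * (K + 1)  # uniform starting point
    for _ in range(iters):
        new = [0.0] * (K + 1)
        for j, c in counts.items():
            # E-step: posterior over the true degree of a node with j observed edges
            weights = [pi[k] * like[j][k] for k in range(K + 1)]
            total = sum(weights)
            for k in range(K + 1):
                new[k] += c * weights[k] / total
        # M-step: the new distribution is the average posterior over all nodes
        pi = [x / n for x in new]
    return pi


def log_likelihood(observed: Sequence[int], pi: Sequence[float], p: float) -> float:
    """Log-likelihood of the observed counts under the thinned mixture."""
    return sum(
        math.log(sum(pi[k] * math.comb(k, j) * p**j * (1 - p) ** (k - j)
                     for k in range(j, len(pi))))
        for j in observed)


# Hypothetical sample of 128 nodes whose observed counts exactly match a 50/50
# mix of true in-degrees 2 and 6 thinned with p = 0.5.
observed = [0] * 17 + [1] * 38 + [2] * 31 + [3] * 20 + [4] * 15 + [5] * 6 + [6]
pi_hat = em_degree_distribution(observed, p=0.5, max_degree=6, iters=500)
print([round(x, 3) for x in pi_hat])
```

Each EM iteration cannot decrease the log-likelihood, which gives a simple sanity check on any implementation.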
Estimating homophily in social networks using dyadic predictions
TL;DR: In this paper, the authors examined three methods for estimating homophily: predicting node categories, predicting dyad categories, and a hybrid "ego-alter" approach, concluding that only the dyadic prediction approach is unbiased, whereas the node-level approach produces both high bias and high overall error.
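Why a node-level approach can be biased is easy to see in a simplified model of our own, not necessarily the one analysed in that paper. Assume binary group labels, a node classifier with accuracy a, and errors that are independent across the two endpoints of an edge and symmetric across groups. A truly within-group edge still looks within-group if both endpoints are right or both are wrong, with probability a^2 + (1 - a)^2; a between-group edge looks within-group if exactly one endpoint is wrong, with probability 2a(1 - a). The expected estimate of a true within-group share s is therefore pulled toward 0.5 unless a = 1. The function name is hypothetical.

```python
def expected_node_level_share(s: float, a: float) -> float:
    """Expected within-group edge share when both endpoint labels of every edge are
    predicted by a binary classifier with accuracy a (independent, symmetric errors)."""
    stays_within = a * a + (1 - a) ** 2  # both right, or both wrong (binary labels)
    looks_within = 2 * a * (1 - a)       # exactly one endpoint mislabelled
    return s * stays_within + (1 - s) * looks_within


# A 90%-accurate node classifier on a network whose true within-group share is 0.8:
print(expected_node_level_share(0.8, 0.9))  # about 0.692, well below 0.8
```

Real classifier errors are usually correlated with network position, so this independence assumption is optimistic; it only shows the direction of the bias.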