TL;DR: This comment summarizes the development of the RDS method, distinguishing among seven forms of the estimator, and offers a clarification of a related set of issues.
Abstract: Leo Goodman (2011) provided a useful service with his clarification of the differences among snowball sampling as originally introduced by Coleman (1958–1959) and Goodman (1961) as a means for studying the structure of social networks; snowball sampling as a convenience method for studying hard-to-reach populations (Biernacki and Waldorf 1981); and respondent-driven sampling (RDS), a sampling method with good estimability for studying hard-to-reach populations (Heckathorn 1997, 2002, 2007; Salganik and Heckathorn 2004; Volz and Heckathorn 2008).
This comment offers a clarification of a related set of issues. One is confusion between the latter form of snowball sampling, and RDS. A second is confusion resulting from multiple forms of the RDS estimator that derives from the incremental manner in which the method was developed. This comment summarizes the development of the method, distinguishing among seven forms of the estimator.
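To make the idea of an estimator "form" concrete, here is a minimal Python sketch of one widely cited form. It is the degree-weighted estimator of Volz and Heckathorn (2008), often called RDS-II, in which each respondent is weighted by the inverse of their reported personal network size. It is not claimed to match any particular one of the seven forms the comment distinguishes. The function and argument names are hypothetical, and the sketch assumes every respondent reports a positive degree.

```python
def rds_ii_proportion(degrees, in_group):
    """Degree-weighted (Volz-Heckathorn, RDS-II style) estimate of a group's population share.

    degrees: reported personal network sizes d_i (assumed positive).
    in_group: booleans, True if respondent i belongs to the group of interest.
    """
    if len(degrees) != len(in_group):
        raise ValueError("degrees and in_group must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    # High-degree people are more likely to be recruited, so each respondent gets weight 1/d_i.
    total_weight = sum(1.0 / d for d in degrees)
    group_weight = sum(1.0 / d for d, g in zip(degrees, in_group) if g)
    return group_weight / total_weight
```

Suppose the degrees are [2, 4, 4, 8] and the first and third respondents are group members. The naive sample share is then 0.5, but the degree-weighted estimate is 2/3, because the group members report smaller networks.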
TL;DR: The term "snowball sampling" has likely been in informal use for a long time and certainly predates Coleman (1958) and Trow (1957); the earliest systematic work dates to the 1940s at the Columbia Bureau of Applied Social Research, led by Paul Lazarsfeld.
Abstract: Comment: On the Concept of Snowball Sampling. Mark S. Handcock (University of California, Los Angeles) and Krista J. Gile (University of Massachusetts, Amherst). The need for notes by Goodman (2011) and Heckathorn (2011) reflects a phenomenon in the sociology of science: that multidisciplinary fields tend to produce a plethora of inconsistent terminology. Often the meaning of a term evolves over time, or different terms are used for the same concept. More confusing is the use of the same term for different concepts. As the two notes point out, the term “snowball sampling” suffers from this treatment. The term “snowball sampling” has likely been in informal use for a long time, but it certainly predates Coleman (1958) and Trow (1957). The earliest systematic work dates to the 1940s from the Columbia Bureau of Applied Social Research, led by Paul Lazarsfeld. The bureau became interested in the empirical study of personal influence via media (Barton 2001). This led to the consideration of interpersonal environments and to the identification of opinion leaders and followers. However, standard sampling of individuals was regarded as ineffective in studying the relations between opinion leaders and followers, as pairs related in this way were seldom both selected in the sample (Lazarsfeld et al. 1944:49–50). To address this, Robert Merton asked individuals in an initial diverse sample to name the people who influenced them. From these, a second wave of influential people were interviewed as a...
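Merton's procedure works in waves: interview an initial sample, ask each person to name others, and interview the named people as the next wave. The sketch below is a generic multi-wave snowball draw over a nomination graph. It is a simplified illustration, not a reconstruction of the Bureau's or Goodman's design. The nomination graph, the cap of k names per respondent, and all function names are hypothetical.

```python
import random


def snowball_sample(names, seeds, k, waves, rng=None):
    """Sketch of a multi-wave, k-name snowball sample over a nomination graph.

    names: dict mapping each person to the list of people they would name.
    seeds: the initial sample (wave 0).
    k: maximum number of nominations taken from each respondent.
    waves: number of waves to follow after the seeds.
    Returns a list of sets, one per wave; nobody is sampled twice.
    """
    rng = rng or random.Random(0)
    sampled = set(seeds)
    frontier = sorted(seeds)
    result = [set(seeds)]
    for _ in range(waves):
        next_wave = set()
        for person in frontier:
            nominees = list(names.get(person, []))
            # Keep at most k nominees, chosen at random, as if the respondent named only k people.
            for nominee in rng.sample(nominees, min(k, len(nominees))):
                if nominee not in sampled:
                    next_wave.add(nominee)
        sampled |= next_wave
        result.append(next_wave)
        frontier = sorted(next_wave)  # sorted for a reproducible iteration order
    return result
```

When k is at least as large as everyone's list of nominees, the draw is deterministic and reduces to a breadth-first expansion from the seeds, wave by wave.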
TL;DR: This commentary points out and discusses the difference between snowball sampling in populations that are not hard to reach, on the one hand, and snowball sampling and respondent-driven sampling in hard-to-reach populations, on the other.
Abstract: In this commentary attention is drawn to the difference between snowball sampling not in hard-to-reach populations and snowball sampling and respondent-driven sampling in hard-to-reach populations. The approach to sampling design and inference called snowball sampling (not in hard-to-reach populations) was introduced in Coleman (1958–1959) and Goodman (1961); and respondent-driven sampling in hard-to-reach populations was introduced more recently in Heckathorn (1997, 2002, 2007). Still more recently, Gile and Handcock (2010) sounded a cautionary note for the users of respondent-driven sampling in hard-to-reach populations. Coleman (1958–1959) notes that snowball sampling in survey research is amenable to the same scientific procedures as ordinary random sampling, and Goodman (1961) introduces statistical methods with snowball sampling for the estimation...
TL;DR: The paper introduces the *H index, a normalized entropy index that captures the notion of segregation as departures from evenness. It also shows that applied researchers may do better using the M index than either H or *H in two circumstances: (i) if they are interested in the decomposability of segregation measures for any partition of organizational units into larger clusters and of demographic groups into supergroups, and (ii) if they are interested in the invariance properties of segregation measures to changes in the marginal distributions by demographic groups and by organizational units.
Abstract: Recent research has shown that two entropy-based segregation indices possess an appealing mixture of basic and subsidiary but useful properties. It would appear that the only fundamental difference between the mutual information, or M index, and the Entropy, Information or H index, is that the second is a normalized version of the first. This paper introduces another normalized index in that family, the *H index that, contrary to what is often asserted in the literature, is the normalized entropy index that captures the notion of segregation as departures from evenness. More importantly, the paper shows that applied researchers may do better using the M index than using either H or *H in two circumstances: (i) if they are interested in the decomposability of segregation measures for any partition of organizational units into larger clusters and of demographic groups into supergroups, and (ii) if they are interested in the invariance properties of segregation measures to changes in the marginal distributions by demographic groups and by organizational units.
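Two of the indices compared above can be written down directly. The sketch below assumes the standard definitions. M is the mutual information between organizational-unit membership and demographic-group membership, M = Σ_j Σ_g p_jg log(p_jg / (p_j p_g)), and H is M divided by the entropy of the overall group distribution. It does not implement the *H index, whose definition is not given here. It also illustrates the decomposability property the abstract mentions: when units are grouped into clusters, M splits exactly into a between-cluster term plus a weighted sum of within-cluster terms. Function names are hypothetical; natural logarithms are used, which only rescales M and cancels out of H.

```python
import math


def mutual_information_index(counts):
    """M index. counts[j][g] = number of people of group g in organizational unit j."""
    total = sum(sum(row) for row in counts)
    unit_share = [sum(row) / total for row in counts]
    group_share = [sum(row[g] for row in counts) / total for g in range(len(counts[0]))]
    m = 0.0
    for j, row in enumerate(counts):
        for g, n in enumerate(row):
            if n > 0:  # empty cells contribute 0 (the 0 * log 0 = 0 convention)
                p = n / total
                m += p * math.log(p / (unit_share[j] * group_share[g]))
    return m


def entropy(shares):
    return -sum(p * math.log(p) for p in shares if p > 0)


def h_index(counts):
    """Theil's H: M normalized by the entropy of the overall group distribution."""
    total = sum(sum(row) for row in counts)
    group_share = [sum(row[g] for row in counts) / total for g in range(len(counts[0]))]
    return mutual_information_index(counts) / entropy(group_share)


def between_within(counts, clusters):
    """Split M for a partition of units into clusters (lists of unit indices)."""
    total = sum(sum(row) for row in counts)
    n_groups = len(counts[0])
    # Between component: treat each cluster as a single merged unit.
    merged = [[sum(counts[j][g] for j in c) for g in range(n_groups)] for c in clusters]
    between = mutual_information_index(merged)
    # Within component: M inside each cluster, weighted by the cluster's population share.
    within = 0.0
    for c in clusters:
        sub = [counts[j] for j in c]
        weight = sum(sum(row) for row in sub) / total
        within += weight * mutual_information_index(sub)
    return between, within
```

Complete segregation of two equal groups, for example counts [[10, 0], [0, 10]], gives M = log 2 and H = 1. Perfect evenness, [[5, 5], [5, 5]], gives M = 0.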
TL;DR: In this article, the latent class factor analysis (LCFA) approach is used to detect and correct extreme response style (ERS), which is one of the well-documented response styles.
TL;DR: This article addresses the problem of quantifying the degree to which parameter estimates in a structural equation model can be biased when the researcher has not specified the structural relationships correctly.
Abstract: We deal with the problem of quantifying the degree to which parameter estimates in a structural equation model can be biased when structural relationships were not specified correctly by the researcher. We propose a framework to relate moment residuals to biases of parameter estimates and the overall noncentrality of the model. For each parameter in the model, the impact of either a particular moment residual or the overall model noncentrality can be evaluated, although the latter tends to give error bounds that are rather conservative. We provide illustrative analytical and empirical examples to demonstrate the steps in application of the proposed procedures. The first example is a mildly misspecified model with causal indicators mistaken to be effect indicators. The resulting biases can be approximated very accurately by accounting for the effect of a single misfitted residual moment. The second example is a grossly misspecified model in which a mediating latent variable was erroneously omitted. In this ca...
TL;DR: This paper introduces a distinction between two sequence types (common ancestors and unfolding processes), presents a new way of coding sequences as an extension to conventional OM analyses, and demonstrates its usefulness in simulated and empirical examples.
Abstract: Optimal matching (OM) is a method that assesses sequence similarity. It was originally developed to study protein and DNA sequences and was later transferred to the social sciences where it was applied accordingly. However, there is an ongoing debate on the adequacy of its use in the social sciences, as a superficial transfer might not respond to the significant differences between typical sequences in biological and social settings. In this paper, I elaborate on these differences and introduce a distinction between two sequence types—namely, common ancestors and unfolding processes. While the first sequence type is typically found in biological settings (e.g., DNA sequences), the latter applies to most sequences studied in the social sciences (e.g., careers). Based on this distinction, I present a new way of coding sequences as an extension to conventional OM analyses and demonstrate its usefulness in simulated and empirical examples. The paper concludes with a discussion of this new approach and its integration into previous extensions of OM.
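The conventional OM distance that the paper extends is an edit distance. It is the cheapest sequence of insertions, deletions (indels), and substitutions that turns one sequence into another, computed by dynamic programming. The sketch below assumes an indel cost of 1 and a constant substitution cost of 2, both hypothetical choices. It shows only conventional OM, not the paper's new way of coding sequences.

```python
def _default_sub_cost(x, y):
    # Hypothetical cost scheme: matching states are free, any mismatch costs 2.
    return 0.0 if x == y else 2.0


def om_distance(a, b, indel=1.0, sub_cost=_default_sub_cost):
    """Optimal matching distance between sequences a and b.

    d[i][j] holds the minimum cost of turning a[:i] into b[:j].
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete every element of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert every element of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # delete a[i-1]
                d[i][j - 1] + indel,  # insert b[j-1]
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitute
            )
    return d[n][m]
```

Take two short career sequences with states S (school) and W (work). Here om_distance("SSWW", "SWWW") is 2.0, reachable either by one substitution or by one deletion plus one insertion. The relative size of indel and substitution costs is the main design lever in OM. With substitution at twice the indel cost, as here, a substitution is never cheaper than a deletion plus an insertion.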
TL;DR: The paper decomposes group differences in the mean of a variable into various within-group and between-group components with respect to group categories of intermediary variables. It compares two mechanisms for realizing the counterfactual outcomes involved and demonstrates that, despite large differences between the mechanisms, they yield highly congruent outcomes.
Abstract: This paper introduces a new method for decomposing group differences in the mean of a variable into various within-group and between-group components with respect to group categories of intermediary variables. This is accomplished by considering counterfactual outcomes that would be realized by social interventions that change the relationship among variables. Because such a change does not by itself determine the counterfactual outcome, the paper introduces and juxtaposes two different mechanisms—the mechanism of realizing the counterfactual state that deviates least from the existing state, and the mechanism of holding relations among variables other than those that are modified by a given intervention unchanged—and demonstrates that despite the large difference in the mechanisms, they yield highly congruent outcomes. As an illustrative example, the paper analyzes gender inequality in hourly wages in Japan and thereby demonstrates the usefulness of the new method for deriving policy implications.
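For orientation, the simplest decomposition of this kind is the classic two-fold (Kitagawa-style) split. It is shown here as background, not as the paper's method, and it implements neither of the two mechanisms the paper introduces. Given the category shares p_k and category means m_k of an intermediary variable for groups A and B, the gap in means is split exactly into a between-category (composition) term, Σ_k (pA_k − pB_k) mB_k, and a within-category term, Σ_k pA_k (mA_k − mB_k). The function name and all numbers below are hypothetical and are not taken from the paper's Japanese wage data.

```python
def two_fold_decomposition(shares_a, means_a, shares_b, means_b):
    """Split mean(A) - mean(B) into between-category and within-category parts.

    shares_*: category shares of each group (each summing to 1).
    means_*: mean outcome of each group within each category.
    """
    # Composition: how much of the gap comes from A and B being distributed differently across categories.
    between = sum((pa - pb) * mb for pa, pb, mb in zip(shares_a, shares_b, means_b))
    # Within: how much comes from different means inside the same category, at A's composition.
    within = sum(pa * (ma - mb) for pa, ma, mb in zip(shares_a, means_a, means_b))
    return between, within
```

Suppose, hypothetically, that group A has shares [0.8, 0.2] and means [2000, 1200], while group B has shares [0.5, 0.5] and means [1800, 1100]. The overall means are then 1840 and 1450, and the gap of 390 splits into 210 between and 180 within. This split depends on which group's shares and means serve as the reference. That arbitrariness is part of what motivates more careful counterfactual treatments such as the one the paper proposes.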
TL;DR: A family of techniques is introduced that combines an existing approach to the identification of structural biases in network data (the use of conditional uniform graph quantiles) with strategies drawn from nonparametric Bayesian analysis, making them well-suited to meta-analytic applications for which complete network data is often unavailable.
Abstract: Many basic questions in the social network literature center on the distribution of aggregate structural properties within and across populations of networks. Such questions are of increasing relevance given the growing availability of network data suitable for meta-analytic studies, as well as the rise of study designs that involve the collection of data on multiple networks drawn from a larger population. Despite this, little work has been done on model-based inference for the properties of graph populations, or on methods for comparing such populations. Here, we attempt to rectify this gap by introducing a family of techniques that combines an existing approach to the identification of structural biases in network data (the use of conditional uniform graph quantiles) with strategies drawn from nonparametric Bayesian analysis. Conditional uniform graph quantiles are the quantiles of an observed structural property in the reference distribution produced by evaluating that property over all graphs with certain fixed characteristics (e.g., size or density). These quantiles have long been used to measure the extent to which a property of interest on a single network deviates from what would be expected given that network’s other characteristics. The methods introduced here employ such quantile information to allow for principled inference regarding the distribution of structural biases within (and comparison across) populations of networks, given data sampled at the network level. The data requirements of these methods are minimal, thus making them well-suited to meta-analytic applications for which complete network data (as opposed to summary statistics) are often unavailable. The structural biases inferred using these methods can be expressed in terms of posterior predictives for familiar and easily communicated quantities, such as p-values. 
In addition to the methods themselves, we present algorithms for posterior simulation from this model class, illustrating their use with applications to the analysis of social structure within urban communes and radio communications among emergency personnel. We also discuss how this approach may be applied to quantiles arising from other reference distributions, such as those obtained using general exponential-family random graph models.
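The single-network building block of these methods is the conditional uniform graph quantile. The sketch below approximates it by Monte Carlo rather than exhaustive enumeration. Two choices are assumptions made for illustration: conditioning on size and edge count (undirected graphs), and using the triangle count as the structural property. The quantile is taken as the share of reference graphs whose statistic is at or below the observed value, and conventions for ties differ. The Bayesian population-level model built on top of these quantiles is not shown. Function names are hypothetical.

```python
import itertools
import random


def triangle_count(n, edges):
    """Number of triangles in an undirected graph on nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(
        1
        for u, v, w in itertools.combinations(range(n), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def cug_quantile(n, edges, statistic, draws=2000, seed=0):
    """Monte Carlo quantile of statistic(observed graph) among uniform random
    undirected graphs with n nodes and the same number of (distinct) edges."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(n), 2))
    observed = statistic(n, edges)
    at_or_below = 0
    for _ in range(draws):
        # Sampling m distinct pairs uniformly gives a uniform draw from graphs with m edges.
        simulated = rng.sample(all_pairs, len(edges))
        if statistic(n, simulated) <= observed:
            at_or_below += 1
    return at_or_below / draws
```

A quantile near 1 means the observed graph has more triangles than almost any graph of the same size and density. A quantile near 0 means it has fewer.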
TL;DR: In this article, a method of correcting for misclassification bias that relies solely on the primary survey data is presented, which is particularly suited to analyses of surveys where external validation of survey responses is unavailable but where there is strong reason to suspect contaminated data.
Abstract: The theoretical consequences of measurement error in outcome variables that are continuous are widely known by practitioners, at least for the classical model: purely random errors will lead to a loss of efficiency but not to bias in regression coefficients. When the outcome variable is binary, however, regression coefficients, both linear and nonlinear, will contain bias, even if the measurement error (in this setting more commonly referred to as classification error) is purely random. This paper illustrates a method of correcting for misclassification bias that relies solely on the primary survey data. It is particularly suited to analyses of surveys where external validation of survey responses is unavailable but where there is strong reason to suspect contaminated data. This situation is common in observational studies of the health of populations. The technique is applied to a model of the antecedents of post-traumatic stress disorder (PTSD) using data from a large-scale cross-sectional survey of Viet...
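The claim that purely random classification error biases coefficients can be checked with a small simulation. Let a0 be the probability that a true 0 is recorded as 1, and a1 the probability that a true 1 is recorded as 0. Then E[Y*|X] = a0 + (1 − a0 − a1) E[Y|X], so in a linear probability model the slope shrinks by the factor (1 − a0 − a1). The sketch below assumes hypothetical values: a true model P(Y = 1 | X) = 0.2 + 0.6X, with a0 = 0.1 and a1 = 0.2. It only demonstrates the bias. It does not implement the paper's correction, which works without external validation data.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, a0=0.1, a1=0.2, seed=1):
    """Linear-probability slopes for a true binary outcome and a randomly misclassified copy.

    a0: probability a true 0 is recorded as 1; a1: probability a true 1 is recorded as 0.
    """
    rng = random.Random(seed)
    x, y_true, y_obs = [], [], []
    for _ in range(n):
        xi = rng.random()
        yi = 1 if rng.random() < 0.2 + 0.6 * xi else 0
        # Misclassification depends only on the true value, not on x (purely random error).
        flip = rng.random() < (a1 if yi == 1 else a0)
        x.append(xi)
        y_true.append(yi)
        y_obs.append(1 - yi if flip else yi)
    return ols_slope(x, y_true), ols_slope(x, y_obs)
```

With these settings the true slope should come out near 0.6 and the observed slope near 0.7 × 0.6 = 0.42. If a0 and a1 were known, dividing the observed slope by (1 − a0 − a1) would undo the attenuation in this linear case. Estimating the rates from the survey itself is the harder problem the paper addresses.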
TL;DR: In this paper, the authors examined several approaches for inferring logit models from empirical margins of predictor covariates and conditional margins containing the means of a binary response for each covariate margin.
Abstract: We examine several approaches for inferring logit models from empirical margins of predictor covariates and conditional margins containing the means of a binary response for each covariate margin. One method is to fit proxy data to the conditional response using the beta distribution, a process we call “margin analysis.” Proxy data can be obtained using three approaches: (1) implementing the iterative proportional fitting (IPF) procedure on the margin totals, (2) sampling from a larger relevant data source such as the census, and (3) enumerating, or sampling from, the combinatoric space of all possible tables constrained by the margins. The first procedure is a well-studied approach for estimating contingency tables from margins, but it does not necessarily maintain the associations between the covariates unless seeded with an initial table containing those associations. In the second approach, which is appropriate for analyzing sociodemographic covariates, we can use a large census sample adjusting for sampling biases observed in the empirical margins. However, the appropriateness of using a census proxy depends substantially on how similar the sampling pools are. Our third approach entails exploring the combinatoric space of all contingency tables constrained by the margins while considering the associations among the covariates. We aggregate the logit models estimated from each table in that space into a single model. This approach is more robust than the first two as it considers multiple proxies. While the estimated logit models from each approach are generally similar to one another, for the low-dimensional tables we explore in this paper, the combinatoric approach incurs wider standard errors, which renders potentially significant coefficients insignificant. Finally, we suggest weighting the combinatoric models with evidence-relevant probabilities obtained using the multivariate Polya distribution.
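The first approach relies on iterative proportional fitting, which alternately rescales the rows and columns of a seed table until both sets of margins match. The sketch below covers the two-way case only and assumes the row and column targets have the same grand total. It also illustrates the caveat in the abstract. A uniform seed yields the independence table, while a seed that carries an association keeps its odds ratio, since row and column rescaling never changes cross-product ratios. The function name and parameters are hypothetical.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Two-way iterative proportional fitting.

    seed: initial table (list of rows) whose associations are to be preserved.
    row_targets, col_targets: desired margins; both must sum to the same total.
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Rescale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Rescale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table
```

As an example, fitting a uniform seed [[1, 1], [1, 1]] to row totals [30, 70] and column totals [40, 60] gives [[12, 18], [28, 42]], the table of statistical independence. Fitting the seed [[2, 1], [1, 2]] to the same kind of margins keeps its odds ratio of 4.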
TL;DR: General random graphs (i.e., stochastic models for networks incorporating heterogeneity and/or dependence among edges) are increasingly used in the study of social and other networks.
Abstract: General random graphs (i.e., stochastic models for networks incorporating heterogeneity and/or dependence among edges) are in increasingly wide use in the study of social and other networks, but fe...
TL;DR: This article describes a simple strategy for doing more reliable ethnography: after fieldwork has commenced, investigators can use thought experiments, such as the "ethnographic trial" and the "inconvenience sample," to recognize inconvenient phenomena.
Abstract: This paper describes a simple strategy for doing more reliable ethnography: after fieldwork has commenced, investigators can use thought experiments to recognize inconvenient phenomena. Two examples are discussed: “the ethnographic trial” and the “inconvenience sample.” The paper uses Clifford Geertz's classic “Notes on the Balinese Cockfight” as a case of how work could be made more reliable with such strategies. It highlights the value of systematically identifying aspects of the situation under study that have been excluded from the analysis.