TL;DR: A Monte Carlo study compared 14 methods to test the statistical significance of the intervening variable effect and found two methods based on the distribution of the product and 2 difference-in-coefficients methods have the most accurate Type I error rates and greatest statistical power.
Abstract: A Monte Carlo study compared 14 methods to test the statistical significance of the intervening variable effect. An intervening variable (mediator) transmits the effect of an independent variable to a dependent variable. The commonly used R. M. Baron and D. A. Kenny (1986) approach has low statistical power. Two methods based on the distribution of the product and 2 difference-in-coefficients methods have the most accurate Type I error rates and greatest statistical power except in 1 important case in which Type I error rates are too high. The best balance of Type I error and statistical power across all cases is the test of the joint significance of the two effects comprising the intervening variable effect.
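A minimal sketch of the joint significance test named in the abstract above, assuming the two path estimates (here called `a` for independent variable to mediator and `b` for mediator to dependent variable) and their standard errors are already available from two regressions. The function names, the normal approximation for p-values, and the example numbers are all illustrative assumptions, not taken from the study. A product-of-coefficients z test with a first-order standard error is included only for comparison.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def joint_significance(a, se_a, b, se_b, alpha=0.05):
    """Declare mediation only if BOTH paths are individually significant."""
    return two_sided_p(a / se_a) < alpha and two_sided_p(b / se_b) < alpha


def product_z(a, se_a, b, se_b):
    """z statistic for the product a*b using a first-order standard error."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


# Hypothetical estimates, chosen only to illustrate the two decisions.
a, se_a, b, se_b = 0.30, 0.10, 0.25, 0.10
joint = joint_significance(a, se_a, b, se_b)
z_prod = product_z(a, se_a, b, se_b)
print(joint, round(z_prod, 3), round(two_sided_p(z_prod), 3))
```

With these made-up inputs both paths pass their own tests while the product z statistic falls just short of the 0.05 threshold, which is one way the two kinds of test can disagree.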
TL;DR: The authors present the case that dichotomization is rarely defensible and often will yield misleading results.
Abstract: The authors examine the practice of dichotomization of quantitative measures, wherein relationships among variables are examined after 1 or more variables have been converted to dichotomous variables by splitting the sample at some point on the scale(s) of measurement. A common form of dichotomization is the median split, where the independent variable is split at the median to form high and low groups, which are then compared with respect to their means on the dependent variable. The consequences of dichotomization for measurement and statistical analyses are illustrated and discussed. The use of dichotomization in practice is described, and justifications that are offered for such usage are examined. The authors present the case that dichotomization is rarely defensible and often will yield misleading results. We consider here some simple statistical procedures for studying relationships of one or more independent variables to one dependent variable, where all variables are quantitative in nature and are measured on meaningful numerical scales. Such measures are often referred to as individual-differences measures, meaning that observed values of such measures are interpretable as reflecting individual differences on the attribute of interest. It is of course straightforward to analyze such data using correlational methods. In the case of a single independent variable, one can use simple linear regression and/or obtain a simple correlation coefficient. In the case of multiple independent variables, one can use multiple regression, possibly including interaction terms. Such methods are routinely used in practice. However, another approach to analysis of such data is also rather widely used. Considering the case of one independent variable, many investigators begin by converting that variable into a dichotomous variable by splitting the scale at some point and designating individuals above and below that point as defining
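The loss of information from a median split can be illustrated with a small simulation. This is a sketch under assumed conditions (bivariate normal data, a true correlation of 0.5, a fixed random seed); the helper names `pearson` and `median_split` are hypothetical and none of the numbers come from the article.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_split(values):
    """Recode a quantitative variable as 0/1 (low/high) at its median."""
    cut = sorted(values)[len(values) // 2]
    return [1.0 if v >= cut else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n, rho = 5000, 0.5      # assumed sample size and true correlation
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]

r_cont = pearson(x, y)                 # correlation using the full scale of x
r_split = pearson(median_split(x), y)  # correlation after the median split
print(round(r_cont, 3), round(r_split, 3))
```

For normally distributed data the split correlation is expected to shrink to roughly 0.8 of the continuous one, so the dichotomized analysis understates the relationship.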
TL;DR: By extending randomization approaches to ANNs, the “black box” mechanics of ANNs can be greatly illuminated and by coupling this new explanatory power of neural networks with its strong predictive abilities, ANNs promise to be a valuable quantitative tool to evaluate, understand, and predict ecological phenomena.
TL;DR: Analysis of variance (ANOVA) extends the independent groups t-test to categorical independent variables that have more than two levels.
Abstract: ANalysis Of VAriance (ANOVA), or F-test, is an extension of the independent groups t-test and a more general statistical procedure. You will remember that the t-test was used when we had two levels of the independent variable (males and females) and we wanted to see how the groups differed on an interval/ratio variable. However, often there are categorical variables which have more than two levels. In the Crime dataset, for instance, these include social class (rrgclass), religion (religion) and educational qualifications (hedqual). Analysis of variance is similar to the independent groups t-test, except that it is used when the independent variable has more than two levels.
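As a rough illustration of how the F-test generalizes the two-group comparison, the sketch below computes a one-way ANOVA F statistic by hand. The function name `one_way_anova_f` and the toy scores are assumptions made for this example; they are not from the Crime dataset.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; `groups` is a list of lists of scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy data: three levels of a categorical independent variable.
f_three = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
# With only two levels, F equals the square of the pooled-variance t statistic.
f_two = one_way_anova_f([[1, 2, 3], [2, 3, 4]])
print(f_three, f_two)
```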
TL;DR: This book discusses the logic of theory-based data analysis, with an overview of the connection between analysis, theory, and statistics, the elements of theory-based analysis, and the inherent subjectivity of analysis.
Abstract: Chapter 1: Introduction to Theory-Based Data Analysis The Connection Between Analysis, Theory and Statistics Elements of Theory-Based Analysis The Inherent Subjectivity of Analysis Looking Ahead Chapter 2: The Logic of Theory-Based Data Analysis Inductive and Deductive Processes Operationalization and The Assessment of Fit The Roundabout Route of Failing to Reject Summary Chapter 3: Associations and Relationships Association: The Basic Building Block Establishing Relatedness: The "Third Variable" Association and Causality Summary Chapter 4: The Focal Relationship: Demonstrating Internal Validity Coincident Associations: The Exclusionary "Third Variable" Causal Connections: The Inclusive "Third Variable" An Example of Exclusionary and Inclusive "Third Variables" Explaining Y Versus the Focal Relationship Summary Chapter 5: Ruling Out Alternative Explanations: Spuriousness and Control Variables Spuriousness: The Illusion of Relationship The Analysis of Simple Spuriousness Complex Sources of Spuriousness The Analysis of Complex Spuriousness Death Looms on the Horizon: An Example of Partial Spuriousness Summary Chapter 6: Ruling Out Alternative Theoretical Explanations: Additional Independent Variables Redundancy: Alternative Theories Analytic Models For Redundancy Control Versus Independent Variable Summary Chapter 7: Elaborating an Explanation: Antecedent, Intervening, and Consequent Variables Intervening Variables: The Causal Mechanism The Analysis of Intervening Variables Mediation Illustrated: Explaining the Intergenerational Transmission of Divorce Antecedent and Consequent Variables Antecedent and Consequent Variables Illustrated: Divorce and Intergenerational Family Relations Summary Chapter 8: Specifying Conditions of Influence: Effect Modification and Subgroup Variation Conditional Relationships Conditional Relationships as Interactions Subgroup Analysis of Conditional Relationships Subgroup Versus Interaction Analysis Considerations in the Selection of Moderating Variables Summary Chapter 9: Synthesis and Commentary A Recap of Theory-Based Data Analysis Informative Comparisons Imperfect Knowledge
TL;DR: Latent variable interaction modeling with continuous observed variables is presented using two approaches: a LISREL 8.30 program that multiplies pairs of observed variables, and PRELIS2 and SIMPLIS programs that multiply latent variable scores.
Abstract: Latent variable interaction modeling with continuous observed variables is presented using 2 different approaches. The 1st approach analyzes data using a LISREL 8.30 program where the latent interaction variable is defined by multiplying pairs of observed variables. The 2nd approach analyzes data using PRELIS2 and SIMPLIS programs where the latent interaction variable is defined by multiplying the latent variable scores of the exogenous latent independent variables. The programs used to create the multivariate normal observed variables and conduct the analyses for the 2 different approaches are given in the appendixes. The product indicant and latent variable score approaches produced similar gamma coefficients in their hypothesized models but differed in their standard errors for the gamma coefficients. The latent variable score approach holds the promise of being easier to implement and can be applied to more complex latent variable interaction models.
TL;DR: This paper proposes a trimming procedure that yields the tightest bounds on average treatment effects consistent with the observed data, assuming a monotonicity restriction on how assignment to treatment affects selection.
Abstract: Empirical researchers routinely encounter sample selection bias whereby 1) the regressor of interest is assumed to be exogenous, 2) the dependent variable is missing in a potentially non-random manner, 3) the dependent variable is characterized by an unbounded (or very large) support, and 4) it is unknown which variables directly affect sample selection but not the outcome. This paper proposes a simple and intuitive bounding procedure that can be used in this context. The proposed trimming procedure yields the tightest bounds on average treatment effects consistent with the observed data. The key assumption is a monotonicity restriction on how the assignment to treatment affects selection -- a restriction that is implicitly assumed in standard formulations of the sample selection problem.
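One common reading of a trimming procedure of this kind can be sketched as follows. The abstract does not give the formula, so everything here is an assumption made for illustration: treatment is taken to weakly increase the chance that the outcome is observed, the trimming share is the excess selection rate in the treated group, trimming is rounded to whole observations, and the function name `trimming_bounds` and the toy data are hypothetical.

```python
def trimming_bounds(treated, control):
    """Bounds on the treatment effect when some outcomes are missing.

    Each argument is a list of outcomes with None marking a missing outcome.
    Assumes treatment can only increase the chance of being observed.
    """
    obs_t = sorted(y for y in treated if y is not None)
    obs_c = [y for y in control if y is not None]

    q_t = len(obs_t) / len(treated)  # share observed among treated
    q_c = len(obs_c) / len(control)  # share observed among controls
    if q_t < q_c:
        raise ValueError("sketch assumes treatment increases selection")

    # Share of observed treated outcomes attributed to extra selection.
    p = (q_t - q_c) / q_t
    k = round(p * len(obs_t))        # number of treated outcomes to trim
    keep = len(obs_t) - k

    mean_c = sum(obs_c) / len(obs_c)
    lower = sum(obs_t[:keep]) / keep - mean_c  # trim the k largest outcomes
    upper = sum(obs_t[k:]) / keep - mean_c     # trim the k smallest outcomes
    return lower, upper


# Toy data: 8 of 10 treated and 4 of 10 control outcomes are observed.
treated = [1, 2, 3, 4, 5, 6, 7, 8, None, None]
control = [2, 3, 4, 5, None, None, None, None, None, None]
bounds = trimming_bounds(treated, control)
print(bounds)
```

The two extremes correspond to assuming that the extra observations induced by treatment are either all at the top or all at the bottom of the treated outcome distribution.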
TL;DR: In this paper, the authors examined the influence of collision scenario random variables on the extent of predicted damage in ship collisions and used a simplified collision model to assess the sensitivity of probabilistic damage extent to these variables.
TL;DR: In this article, the authors investigated the effect of dichotomizing or categorizing one variable on the estimate of the coefficient of the other continuous variable and on prediction from the models and showed that the predictive relative efficiency is always higher for the categorized model than that for the dichotomized model.
TL;DR: In this paper, the authors proposed a nonlinear dynamic panel data model, which includes individual specific effects and lagged dependent variables in the model to allow for individual heterogeneity and dynamic feedback.
Abstract: Panel data models have a long history in econometrics and have become increasingly popular in empirical economics over the past two decades. Panel data expand our opportunities to study more complex economic relationships by, for example, allowing for individual heterogeneity and dynamic feedback. These goals are often achieved by including in the model individual specific effects and lagged dependent variables. These two features of the dynamic panel data model, however, often create difficulties in estimation. Although much progress has been made in the linear panel data model (see, among others, Hsiao (1986), Baltagi (1995), and Arellano and Honoré (2001) for review), our knowledge of general nonlinear dynamic panel data models is very limited. In most nonlinear models, strict exogeneity of the explanatory variables is still the key assumption. The main difficulty is that with nonlinearity, it is not obvious how to "difference away" the individual specific effects and how to use instrumental variable type techniques. Despite these difficulties, some developments have been made on estimating certain nonlinear dynamic models using the "fixed-effects" approach, for example, the censored regression models (Honoré (1993), Honoré and Hu (2001)), the sample selection models (Kyriazidou (2001)), the discrete choice models (Honoré and Kyriazidou (2000)), and the models with multiplicative individual effects (Chamberlain (1992), Wooldridge (1997)).
TL;DR: This paper showed that sampling from all cases in the relevant population produces greater confidence in the hypothesis than sampling only from cases that experience the outcome, and thus, an analysis that samples from the entire population is logically defensible.
Abstract: Previous researchers have argued that necessary and/or sufficient causes should be tested through research designs that consider only cases with limited combinations of scores on the independent and the dependent variables. I explore the utility for causal inference of the design proposed by these authors, as compared to an “All Cases Design.” I find that, if researchers define the population carefully and appropriately, each case in the population contributes to causal inference and is therefore useful. Previous authors reject this claim on the basis of a view that holds constant the marginal distribution of either the dependent or the independent variable across the working and the alternate hypotheses. I argue that this restriction is not generally appropriate, and hence, an analysis that samples from the entire population is logically defensible. I also argue that this design is more statistically efficient. A reanalysis of two well-known studies demonstrates that sampling from all cases in the relevant population produces greater confidence in the hypothesis than sampling only from cases that experience the outcome.
TL;DR: In this paper, a multilevel analysis is used to study individual choices of time allocation to maintenance, subsistence, leisure, and travel time exploiting the nested data hierarchy of households, persons, and occasions of measurement.
Abstract: In this paper multilevel analysis is used to study individual choices of time allocation to maintenance, subsistence, leisure, and travel time exploiting the nested data hierarchy of households, persons, and occasions of measurement. The multilevel models in this paper examine the joint and multivariate correlation structure of four dependent variables in a cross-sectional and longitudinal way. In this way, observed and unobserved heterogeneity are estimated using random effects at the household, person, and temporal levels. In addition, random coefficients associated with explanatory variables are also estimated and correlated with these random effects. Using the wide spectrum of options offered by multilevel models to account for individual and group heterogeneity, complex interdependencies among individuals within their households, within themselves over time, and within themselves but across different indicators of behavior, are analyzed. Findings in this analysis include large variance contribution by each level considered, clear evidence of non-linear dynamic behavior in time-allocation, different trajectories of change in time allocation for each of the four dependent variables used, and lack of symmetry in change over time characterized by different trajectories in the longitudinal evolution of each dependent variable. In addition, the multivariate correlation structure among the four dependent variables is different at each of the three levels of analysis.
TL;DR: In this paper, a method is proposed for the calibration of a continuous random variable when the dependent variables are a combination of continuous and categorical, and the model between the controlling variables and calibrated variable is empirically derived.
Abstract: A method is proposed for the calibration of a continuous random variable when the dependent variables are a combination of continuous and categorical, and the model between the controlling variables and calibrated variable is empirically derived. The various probability distributions are estimated from training data by using kernel density procedures with bi-variate normal kernels for continuous variables and uniform smoothing for discrete variables. Bayes's theorem is then used to produce the posterior distribution from which point estimates and estimates of confidence may be made. Individual posterior densities allow each case to be considered separately and cases with conflicting evidence can easily be identified for further investigation. This approach is illustrated by using part of a data set of human adult teeth from individuals of known age. Estimates from the method proposed show less bias than those from the widely used multiple regression. This allows a more accurate reconstruction of the age distributions of ancient populations. In particular bias reduction is most notable at the extreme ages, which also tend to be the least frequent, thereby widening the age distribution. This will allow a more reliable consideration of archaeological and anthropological questions relating to, for example, the maximum lifespan, age-related social structure and the development of age-related disease.
TL;DR: In this paper, the authors employ logistic regression analysis to test a model that predicts the implementation or not of Environmental Management Systems Standards (EMSS) by considering various factors as explanatory variables.
Abstract: This paper employs logistic regression analysis to test a model that predicts the implementation or not of Environmental Management Systems Standards (EMSS) by considering various factors as explanatory variables. The dependent variable is dichotomous: whether or not an industrial firm implements EMSS. From past experience we identify 15 major variables contributing to implementation of EMSS. A sample of 259 respondents (84 implementing and 175 not) is used to estimate the parameters of the logistic regression model employing maximum likelihood. The results show an overall significant model with 4 of the 15 variables significant. The significance of management perception of environmental issues on their decision to implement EMSS was confirmed with regard to their perception of win-win possibilities. Pressure on companies to improve their environmental performance does not result in higher uptake of the standards. A company’s image and size are important factors in its decision to implement EMSS.
TL;DR: An artificial neural network-based approach for static-security assessment using radial basis function networks to predict the system severity level following a given list of contingencies and a method based on mutual information for selecting the input features of the networks.
TL;DR: Introduction to Panel Data Issues; Methods-of-Moments for Panel Linear Models; Topics for Panel Linear Models; Panel Data Estimators for Binary Responses; Panel Data Estimators for Limited Responses; Panel Data Sample Selection Models
Abstract: Introduction to Panel Data Issues; Methods-of-Moments for Panel Linear Models; Topics for Panel Linear Models; Panel Data Estimators for Binary Responses; Panel Data Estimators for Limited Responses; Panel Data Sample Selection Models
TL;DR: In this paper, the authors modeled dollar values of foreign direct investment (FDI) inflows to conditions in seven Latin American countries (Argentina, Brazil, Chile, Colombia, Mexico, Peru and Venezuela) during the 1988-1992 period.
Abstract: This study models dollar values of foreign direct investment (FDI) inflows to conditions in seven Latin American countries (Argentina, Brazil, Chile, Colombia, Mexico, Peru, and Venezuela) during the 1988-1992 period. Although much research on FDI has used time series data to explain inward or outward flows, two things set this study apart. First, this study includes market reforms as independent variables. Second, this study uses newer time series econometric tools (unit root test and cointegration analysis) to correct for a spurious regression. Our model is robust, explaining 79.4 percent of variation. We found three independent variables (size of current account deficit, size of GDP, and value of privatization less FDI in privatized companies) to be significant. Although we found directional support for three other independent variables (degree of capital market liberalization, low inflation rate, and depreciation of the real exchange rate), none of these proved significant.
TL;DR: In this paper, a mathematical model developed to predict automobile ownership for individual households residing in New York City is presented, which is distinguished from previous disaggregate household-level automobile ownership models primarily by the use of ordered probit models rather than the commonly used multinomial logit and sequential logit (SL) models.
Abstract: A mathematical model developed to predict automobile ownership for individual households residing in New York City is presented. This effort is distinguished from previous disaggregate household-level automobile ownership models primarily by the use of ordered probit models rather than the commonly used multinomial logit (MNL) and sequential logit (SL) models. When the dependent variable involves ordinal categorical data (in this case, automobile ownership level--zero automobiles, one automobile, two automobiles, and three or more automobiles), the ordered probit model will discern unequal differences between ordinal categories in the dependent variable, the MNL model will treat categories as independent choice alternatives, and the SL model (a product of binary logits) will assume independence of the error terms across all binary choices. The modeling approach was based on a behavioral analysis that explained the factors influencing household automobile ownership decisions in a highly urbanized environment. In addition to socioeconomic variables, transportation and land use-related measures were developed and used to test the sensitivity of household automobile ownership choice to transit accessibility, traffic congestion, parking cost and availability, and levels of access to opportunity sites through nonmotorized transportation. The estimation results uncover important interactions between socioeconomic- and location-related elements and automobile ownership. Findings provide exploratory methodological and empirical evidence that could lead to an approach to predicting the change in household automobile ownership as a result of changes in future socioeconomic conditions and transportation and land use scenarios.
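To make the ordered probit idea concrete, the sketch below turns a single linear index and a set of ordered cutpoints into probabilities for the four ownership levels. The function names, the index value, and the cutpoints are assumptions chosen for illustration; they are not estimates from the New York City model.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb, cutpoints):
    """Category probabilities for an ordered probit.

    xb        : linear index (covariates times coefficients) for one household
    cutpoints : increasing thresholds; K cutpoints give K + 1 ordered categories
    """
    edges = [0.0] + [norm_cdf(c - xb) for c in cutpoints] + [1.0]
    return [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]


# Hypothetical household: index 0.0, cutpoints separating 0, 1, 2, and 3+ cars.
probs = ordered_probit_probs(0.0, [-1.0, 0.0, 1.0])
print([round(p, 4) for p in probs])
```

Because the cutpoints need not be evenly spaced, the gaps between adjacent ownership levels are allowed to differ, which is the property the abstract contrasts with the MNL and SL models.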
TL;DR: In this article, the authors extend LaFrance's (1985, 1986, 1990) previous work by deriving the necessary parameter restrictions for two additional classes of incomplete demand system models to be integrable.
Abstract: This study extends LaFrance's (1985, 1986, 1990) previous research by deriving the necessary parameter restrictions for two additional classes of incomplete demand system models to be integrable. In contrast to LaFrance's earlier work, this analysis considers models that treat expenditures and expenditure shares as the dependent variables in the specified incomplete demand systems. With environmental economists increasingly turning to demand system approaches to value changes in environmental quality, these new results significantly expand the menu of empirical specifications which can be used to fit a given data set. Moreover, the alternative specifications considered in this study, in combination with LaFrance's original work, represent a complete characterization of the linear, log-linear, and semi-log incomplete demand system models.
TL;DR: The survey has established three principal findings: distance shows a strong inverse relationship with the utilisation of health services in the metropolis, the vulnerable groups of women, the aged, the sickly, the illiterate and the poor are not affected by distance decay, and independent variables that are statistically significant in influencing utilisation are education, service cost, quality of service and health status.
Abstract: The research primarily aims at testing a model, adapted from existing models, on the influence of distance on the use of health services in the Kumasi metropolis, an expanding urban centre in Ghana. Primary data, collected between August 2000 and February 2001, were used for the study. The data were analysed using a multiple regression model and compound bar graphs. A sample of 250, drawn through systematic random and stratified procedures, was used for the cross-sectional retrospective survey. Data were collected through formal interview schedules, after a preliminary observational survey. The survey has established three principal findings. First, distance shows a strong inverse relationship with the utilisation of health services in the metropolis. Second, travel time and transport cost, variables that are related to distance, exhibit weak negative and positive associations, respectively, with the use of health services. Third, the vulnerable groups of women, the aged, the sickly, the illiterate and the poor are not affected by distance decay in the utilisation of health services. Finally, independent variables that are statistically significant in influencing utilisation, alongside distance, are education, service cost, quality of service and health status. Recommendations for locational modelling of health services at the deprived periphery, an introduction of an insurance scheme to facilitate health care use, and recommendations for further research have been made.
TL;DR: In this paper, a unified method for the estimation in linear mixed models with errors-in-variables, based upon the corrected score function of Nakamura (1990, Biometrika, 77, 127-137), is presented.
Abstract: The independent variables of linear mixed models are subject to measurement errors in practice. In this paper, we present a unified method for the estimation in linear mixed models with errors-in-variables, based upon the corrected score function of Nakamura (1990, Biometrika, 77, 127–137). Asymptotic normality properties of the estimators are obtained. The estimators are shown to be consistent and convergent at the order of $n^{-1/2}$. The performance of the proposed method is studied via simulation and the analysis of a data set on hedonic housing prices.
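The effect of measurement error on a regression slope, and the flavor of a corrected-score fix, can be sketched in the simplest possible setting. This is not the paper's mixed-model estimator: it assumes a single predictor, no random effects, additive measurement error with known variance, and simulated data with a fixed seed. The names `slopes` and `sigma_u2` are hypothetical.

```python
import random


def slopes(w, y, sigma_u2):
    """Naive and corrected slopes when x is observed as w = x + u.

    sigma_u2 is the (assumed known) variance of the measurement error u.
    """
    n = len(w)
    mw, my = sum(w) / n, sum(y) / n
    s_wy = sum((wi - mw) * (yi - my) for wi, yi in zip(w, y))
    s_ww = sum((wi - mw) ** 2 for wi in w)
    naive = s_wy / s_ww                         # attenuated toward zero
    corrected = s_wy / (s_ww - n * sigma_u2)    # removes the error variance
    return naive, corrected


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n, beta, sigma_u2 = 20000, 2.0, 1.0
x = [rng.gauss(0, 1) for _ in range(n)]                     # true predictor
w = [xi + rng.gauss(0, sigma_u2 ** 0.5) for xi in x]        # error-prone copy
y = [beta * xi + rng.gauss(0, 1) for xi in x]               # outcome

naive, corrected = slopes(w, y, sigma_u2)
print(round(naive, 2), round(corrected, 2))
```

With equal variances for the true predictor and the error, the naive slope is expected to land near half the true value, while the corrected slope should land near the true value of 2.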
TL;DR: This article proposed a nonlinear model that permits changes over time in the effect of unobservables (e.g., there may be a time trend in the level of wages as well as the returns to skill in the labor market).
Abstract: This paper develops an alternative approach to the widely used Difference-In-Difference (DID) method for evaluating the effects of policy changes. In contrast to the standard approach, we introduce a nonlinear model that permits changes over time in the effect of unobservables (e.g., there may be a time trend in the level of wages as well as the returns to skill in the labor market). Further, our assumptions are independent of the scaling of the outcome. Our approach provides an estimate of the entire counterfactual distribution of outcomes that would have been experienced by the treatment group in the absence of the treatment, and likewise for the untreated group in the presence of the treatment. Thus, it enables the evaluation of policy interventions according to criteria such as a mean-variance tradeoff. We provide conditions under which the model is nonparametrically identified and propose an estimator. We consider extensions to allow for covariates and discrete dependent variables. We also analyze inference, showing that our estimator is root-N consistent and asymptotically normal. Finally, we consider an application.
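For orientation, the standard difference-in-differences calculation that the abstract contrasts its approach with can be sketched in a few lines. This is the conventional two-group, two-period estimator, not the paper's nonlinear model; the function name `did` and the toy outcomes are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Standard two-group, two-period difference-in-differences estimate."""
    change_treated = mean(treat_post) - mean(treat_pre)
    change_control = mean(ctrl_post) - mean(ctrl_pre)
    # The control group's change stands in for the treated group's
    # counterfactual change in the absence of treatment.
    return change_treated - change_control


# Hypothetical outcomes for a treated and a control group, before and after.
estimate = did(
    treat_pre=[10.0, 12.0],
    treat_post=[15.0, 17.0],
    ctrl_pre=[8.0, 10.0],
    ctrl_post=[9.0, 11.0],
)
print(estimate)
```

This estimator yields a single mean effect and its identifying assumption depends on the scale of the outcome, whereas the abstract describes recovering the entire counterfactual distribution under assumptions that do not depend on that scale.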
TL;DR: This work shows how reduced gradients and Hessians of the response(s) with respect to the independent variables can be obtained via algorithmic, or automatic, differentiation (AD), and how the resulting approximate Lagrange multipliers give estimates of the reduced function values that converge twice as fast as the underlying state space iteration.
Abstract: In design optimization and parameter identification, the objective, or response function(s) are typically linked to the actually independent variables through equality constraints, which we will refer to as state equations. Our key assumption is that it is impossible to form and factor the corresponding constraint Jacobian, but one has instead some fixed-point algorithm for computing a feasible state, given any reasonable value of the independent variables. Assuming that this iteration is eventually contractive, we will show how reduced gradients (Jacobians) and Hessians (in other words, the total derivatives) of the response(s) with respect to the independent variables can be obtained via algorithmic, or automatic, differentiation (AD). In our approach the actual application of the so-called reverse, or adjoint differentiation mode is kept local to each iteration step. Consequently, the memory requirement is typically not unduly enlarged. The resulting approximating Lagrange multipliers are used to compute estimates of the reduced function values that can be shown to converge twice as fast as the underlying state space iteration. By a combination with the forward mode of AD, one can also obtain extra-accurate directional derivatives of the reduced functions as well as feasible state space directions and the corresponding reduced or projected Hessians of the Lagrangian. Our approach is verified by test calculations on an aircraft wing with two responses, namely, the lift and drag coefficient, and two variables, namely, the angle of attack and the Mach number. The state is a 2-dimensional flow field defined as solution of the discretized Euler equation under transonic conditions.
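The idea of carrying an adjoint (Lagrange multiplier) iteration alongside a fixed-point state iteration can be sketched on a scalar toy problem. Everything here is an assumption for illustration: the state equation x = G(x, u) = 0.5x + u, the response f(x) = x², and hand-written partial derivatives standing in for what AD would supply. It is not the paper's aircraft wing computation.

```python
def reduced_gradient(u, steps=60):
    """Iterate state and adjoint together for x = 0.5*x + u, f = x**2.

    Returns the converged state and the total derivative df/du.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Partial derivatives at the current iterate (hand-coded, not AD).
        f_x = 2.0 * x   # df/dx
        g_x = 0.5       # dG/dx, the contraction factor
        # Adjoint fixed-point update, local to this iteration step.
        lam = f_x + lam * g_x
        # State fixed-point update.
        x = 0.5 * x + u
    f_u, g_u = 0.0, 1.0  # df/du (direct) and dG/du
    return x, f_u + lam * g_u


u = 1.5
state, grad = reduced_gradient(u)
# Closed form for this toy problem: x* = 2u, f = 4u**2, df/du = 8u.
print(state, grad)
```

Only the current state and multiplier are stored at each step, which reflects the abstract's point that keeping the adjoint mode local to each iteration avoids a large memory footprint.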
TL;DR: This study presents program code written in MATLAB for the two-stage error-in-variable model; four estimation problems are solved with the program.
TL;DR: In this article, the estimation of a stochastic cointegrating regression with OLS estimation is considered and a new instrumental variables (IVs) estimator is proposed and shown to be consistent under a suitable exogeneity assumption.
TL;DR: Getting Started Dataset CD Full Version SPSS and Student Version SPSS: What Is the Difference?
Abstract: Getting Started Dataset CD Full Version SPSS and Student Version SPSS: What Is the Difference? Installing Student Version Notes Introduction to SPSS The Data Editor A Must-Do: Setting Options for Variable Lists The Viewer Selecting, Printing, and Saving Output Exercises Notes Descriptive Statistics Interpreting Measures of Central Tendency and Variation Describing Nominal Variables Describing Ordinal Variables Describing Interval Variables Obtaining Case-level Information with Case Summaries Exercises Notes Transforming Variables Using Recode Recoding a Nominal-level Variable Recoding an Interval-level Variable Using Visual Binning Collapsing an Interval-level Variable with Visual Binning Using Compute Exercises Notes Making Comparisons Using Crosstabs Using Compare Means Graphing Relationships Using Line Chart Using Bar Chart Using the Chart Editor Exercises Notes Making Controlled Comparisons Using Crosstabs with Layers Obtaining and Editing Clustered Bar Charts Using Compare Means with Layers and Obtaining Multiple Line Charts Example of an Interaction Relationship Example of an Additive Relationship Exercises Notes Making Inferences about Sample Means Using Descriptives and One-Sample T Test Using Independent-Samples T Test Exercises Notes Chi-square and Measures of Association Analyzing an Ordinal-Level Relationship Summary Analyzing an Ordinal-Level Relationship with a Control Variable Analyzing a Nominal-Level Relationship with a Control Variable A Problem with Lambda Exercises Notes Correlation and Linear Regression Using Correlate and Regression --> Linear Producing and Editing a Scatterplot Exploring Multivariate Relationships with Regression --> Linear Exercises Notes Dummy Variables and Interaction Effects Regression with Dummy Variables Interaction Effects in Multiple Regression Using Compute for Interaction Variables Exercises Notes Logistic Regression Using Regression --> Binary Logistic Logistic Regression with Multiple Independent Variables Working with Predicted Probabilities: Models with One Independent Variable Working with Predicted Probabilities: Models with Multiple Independent Variables The Sample Averages Method The Probability Profile Method Exercises Notes Doing Your Own Political Analysis Five Doable Ideas Political Knowledge Economic Performance and Election Outcomes Judicial Selection and Public Opinion Electoral Turnout in Comparative Perspective Congress Doing Research on the U.S. Senate Finding Raw Data How to Code Raw Data Two Possible Coding Shortcuts Using the SPSS Text Import Wizard Writing It Up The Research Question Previous Research Data, Hypotheses, and Analysis Conclusions and Implications Notes Appendix Table A-1 Descriptions of Constructed Variables in GSS2008 Table A-2 Descriptions of Variables in States Table A-3 Description of Variables in World Table A-4 Description of Variables and Constructed Scales in NES2008
TL;DR: In this article, the relationship between vocational rehabilitation acceptance and race, gender, education, work status at application, and primary source of support at application was examined using a binary logistic regression.
Abstract: The relationship between vocational rehabilitation (VR) acceptance and race, gender, education, work status at application, and primary source of support at application was examined using a binary logistic regression. Based on the use of a stepwise entry method, race, primary source of support at application, and education were found to be statistically significant. Moreover, after all variables were entered into the binary logistic regression equation, the total amount of variance explained in the dependent variable by the independent variables was 5.3%. Compared to European Americans, African Americans were 2.12 times more likely to be accepted for VR services, when controlling for all other variables in the study. Compared to customers who reported only their own income at application, individuals who reported income from family and friends and other sources were negatively associated with VR acceptance, after controlling for all other variables in the study. Implications for VR counselors are discussed.
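Results from a binary logistic regression are usually reported as odds ratios, so a figure such as 2.12 may describe a ratio of odds and not a ratio of probabilities. The sketch below assumes that reading (the abstract does not say which is meant) and shows what such an odds ratio implies at a few baseline acceptance rates. The baseline rates and function names are hypothetical.

```python
import math


def odds_ratio_from_coefficient(beta: float) -> float:
    """A logistic regression coefficient maps to an odds ratio via exp."""
    return math.exp(beta)


def shifted_probability(baseline_p: float, odds_ratio: float) -> float:
    """Probability for the comparison group given a baseline probability."""
    baseline_odds = baseline_p / (1.0 - baseline_p)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Coefficient that would correspond to an odds ratio of 2.12.
beta = math.log(2.12)
or_value = odds_ratio_from_coefficient(beta)

# Hypothetical baseline acceptance rates, chosen only for illustration.
for p0 in (0.2, 0.5, 0.8):
    p1 = shifted_probability(p0, or_value)
    print(p0, round(p1, 3), round(p1 / p0, 2))
```

The printed ratio of probabilities varies with the baseline and stays below 2.12, which is why an odds ratio should not be read directly as "times more likely" without checking the baseline rate.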
TL;DR: In this paper, a general framework is provided to deal with misclassification of the response variable in choice-based samples, where the contaminated data sampling distribution is written as a function of the error-free conditional distribution of the dependent variable given the covariates and the conditional misclassification probabilities of the observable variable of interest given its latent values.