TL;DR: A tutorial is provided illustrating an approach to estimation of and inference about direct, indirect, and total effects in statistical mediation analysis with a multicategorical independent variable that reproduces the observed and adjusted group means while also generating effects having simple interpretations.
Abstract: Virtually all discussions and applications of statistical mediation analysis have been based on the condition that the independent variable is dichotomous or continuous, even though investigators frequently are interested in testing mediation hypotheses involving a multicategorical independent variable (such as two or more experimental conditions relative to a control group). We provide a tutorial illustrating an approach to estimation of and inference about direct, indirect, and total effects in statistical mediation analysis with a multicategorical independent variable. The approach is mathematically equivalent to analysis of (co)variance and reproduces the observed and adjusted group means while also generating effects having simple interpretations. Supplementary material available online includes extensions to this approach and Mplus, SPSS, and SAS code that implements it.
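As a rough illustration of the effects named in this abstract, the sketch below fits three ordinary least squares models to a small made-up data set with a control group and two treatment groups. It assumes indicator (dummy) coding relative to the control group, which the abstract does not spell out, and every data value and helper name (`solve`, `ols`) is hypothetical. It covers point estimates only, not the inference the tutorial addresses.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: group 0 is the control, groups 1 and 2 are treatments.
group = [0, 0, 0, 1, 1, 1, 2, 2, 2]
med = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 5.0, 7.0, 6.0]     # mediator M
out = [2.0, 3.0, 5.0, 5.0, 6.0, 9.0, 8.0, 12.0, 10.0]   # outcome Y

# Indicator coding relative to the control group (an assumption of this sketch).
d1 = [1.0 if g == 1 else 0.0 for g in group]
d2 = [1.0 if g == 2 else 0.0 for g in group]

# M on the indicators: a_k is the mean difference in M versus the control group.
_, a1, a2 = ols([[1.0, u, v] for u, v in zip(d1, d2)], med)
# Y on the indicators: relative total effects c_k.
_, c1, c2 = ols([[1.0, u, v] for u, v in zip(d1, d2)], out)
# Y on the indicators and M: relative direct effects c'_k and the M slope b.
_, c1_direct, c2_direct, b = ols([[1.0, u, v, w] for u, v, w in zip(d1, d2, med)], out)

# Relative indirect effects are the products a_k * b.
indirect1, indirect2 = a1 * b, a2 * b
```

In this linear setup each relative total effect equals the sum of the corresponding relative direct and indirect effects, which gives a convenient sanity check on the fitted coefficients.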
TL;DR: The authors discuss limitations of two approaches commonly used to control for unobserved group-level heterogeneity in finance research, i.e., demeaning the dependent variable with respect to the group and adding the mean of the group's dependent variable as a control.
Abstract: Controlling for unobserved heterogeneity (or “common errors”), such as industry-specific shocks, is a fundamental challenge in empirical research, as failing to do so can introduce omitted variables biases and preclude causal inference. This paper discusses limitations of two approaches commonly used to control for unobserved group-level heterogeneity in finance research—demeaning the dependent variable with respect to the group (e.g., “industry-adjusting”) and adding the mean of the group’s dependent variable as a control. We show that these techniques, which are used widely in both asset pricing and corporate finance research, typically provide inconsistent coefficients and can lead researchers to incorrect inferences. In contrast, the fixed effects estimator is consistent and should be used instead. We also explain how to estimate the fixed effects model when traditional methods are computationally infeasible. (JEL G12, G2, G3, C01, C13)
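The contrast between demeaning only the dependent variable and the fixed effects (within) estimator can be sketched on a tiny, noise-free data set. Everything below is a hypothetical illustration rather than the paper's own code: the data are constructed so that the true slope is 2 and the regressor is correlated with the group effect.

```python
from collections import defaultdict


def group_demean(values, groups):
    """Subtract each observation's group mean from its value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit that includes an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: y = 2 * x + group effect, with x higher in the group
# that also has the larger group effect.
groups = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
effect = {"A": 10.0, "B": 20.0}
y = [2.0 * xi + effect[g] for xi, g in zip(x, groups)]

# Fixed effects (within) estimator: demean both x and y by group.
fe_slope = ols_slope(group_demean(x, groups), group_demean(y, groups))

# "Industry-adjusting": demean only y and regress it on the raw x.
adj_slope = ols_slope(x, group_demean(y, groups))
```

On these data the within estimator recovers the slope of 2, while the regression that demeans only the dependent variable does not, because the group mean of `x` is left out of the adjustment.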
TL;DR: This work aims to perform a simulation study with various scenarios of different collinearity structures to investigate the effects of collinearity under various correlation structures amongst predictive and explanatory variables and to compare these results with existing guidelines for identifying harmful collinearity.
Abstract: A multivariable analysis is the most popular approach when investigating associations between risk factors and disease. However, the efficiency of multivariable analysis depends heavily on the correlation structure among predictive variables. When the covariates in the model are not independent of one another, collinearity/multicollinearity problems arise in the analysis, which leads to biased estimation. This work aims to perform a simulation study with various scenarios of different collinearity structures to investigate the effects of collinearity under various correlation structures amongst predictive and explanatory variables and to compare these results with existing guidelines for identifying harmful collinearity. Three correlation scenarios among predictor variables are considered: (1) a bivariate collinear structure as the simplest collinearity case, (2) a multivariate collinear structure where an explanatory variable is correlated with two other covariates, and (3) a more realistic scenario in which an independent variable can be expressed by various functions including the other variables.
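For the bivariate scenario, one widely used diagnostic is the variance inflation factor (VIF), which for two predictors reduces to 1 / (1 - r^2). The abstract does not say which guidelines it compares against, so treating VIF as one of them is an assumption of this sketch, and the data values are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_bivariate(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors: x2 nearly duplicates x1, x3 is only moderately related to x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.0]
x3 = [2.0, -1.0, 0.0, 1.0, -2.0]

vif_strong = vif_bivariate(x1, x2)
vif_moderate = vif_bivariate(x1, x3)
```

A common rule of thumb flags VIF values above 10 as problematic; the near-duplicate pair above exceeds that cutoff, while the moderately correlated pair (r = -0.6) stays well below it.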
TL;DR: In this paper, a questionnaire was designed to collect the data on the factors related to compensation like salary, rewards, indirect compensation and employee performance and different analytical and descriptive techniques were used to analyze the data.
Abstract: Purpose: Compensation is very important for the performance of employees and is therefore very important for the organization too. The purpose of this research is to measure the impact of compensation on employee performance. Methodology: A questionnaire was designed to collect data on factors related to compensation, such as salary, rewards, indirect compensation, and employee performance. The data were collected from different banks in Pakistan and analyzed in SPSS version 17.0. Different analytical and descriptive techniques were used to analyze the data. Findings: The results indicate that compensation has a positive impact on employee performance. Correlation analysis shows that all the independent variables have weak or moderate positive relationships with each other. Regression analysis shows that all the independent variables have an insignificant and positive impact on employee performance. Descriptive analysis also reveals that all the independent variables have a positive impact on employee performance. ANOVA results reveal that education does not have the same impact on employee performance. Limitations/implications of the research: The major limitation of this research is that the study covers only the banking sector of Punjab. Another limitation is that it excludes many compensation variables because of time constraints. Funding was a further limitation. Apart from these limitations, this research may provide insights that help managers enhance the performance of their subordinates.
TL;DR: In this paper, the authors used a large-scale travel survey to compare commuter satisfaction across six modes of transportation (walking, bicycle, automobile, bus, metro, commuter train) and investigated how the determinants of commuter satisfaction differ across modes.
Abstract: Understanding satisfaction across different modes of transportation is essential to encourage the use of active modes of transport as well as public transportation. This study uses a large-scale travel survey to compare commuter satisfaction across six modes of transportation (walking, bicycle, automobile, bus, metro, commuter train) and investigates how the determinants of commuter satisfaction differ across modes. The framework guiding this research assumes that external and internal factors influence satisfaction: personal, social, and attitudinal variables must be considered in addition to objective trip characteristics. Using ordinary least squares regression, the authors develop six mode-specific models of trip satisfaction that include the same independent variables (trip and travel characteristics, personal characteristics, and travel and mode preferences). They find that pedestrians, train commuters and cyclists are significantly more satisfied than drivers, metro and bus users. They also establish that determinants of satisfaction vary considerably by mode, with modes that are more affected by external factors generally displaying lower levels of satisfaction. Mode preference (need/desire to use other modes) affects satisfaction, particularly for transit users. Perceptions that the commute has value other than arriving at a destination significantly increase satisfaction for all modes. Findings from this study provide a better understanding of determinants of trip satisfaction to transport professionals who are interested in this topic and working on increasing satisfaction among different mode users.
TL;DR: In this article, the authors provide guidance on how best to explain the interaction effects theoretically within and across levels of analysis, and suggest that authors pay particular attention to nesting in order to theoretically rule out reverse interactions.
Abstract: Many manuscripts submitted to the Journal of International Business Studies propose an interaction effect in their models in an effort to explain the complexity and contingency of relationships across borders. In this article, we provide guidance on how best to explain the interaction effects theoretically within and across levels of analysis. First, in the case of interactions within the same level of analysis, we suggest that authors provide an explanation of the mechanisms that link the main independent variable to the dependent variable, and then explain how the interaction variable modifies these mechanisms. Moreover, to ensure that the arguments are theoretically complete, we suggest that authors theoretically rule out the potential reverse interaction effect between the main variable and moderating variable. Second, in the case of interactions across levels of analysis, we suggest that authors identify the cross-level nature of the moderating relationships, specify the level of analysis of the main relationship and the nested nature of the cross-level influences, and theoretically explain these cross-level influences. Additionally, we suggest that authors pay particular attention to nesting in order to theoretically rule out reverse interactions.
TL;DR: Managers had negative attitudes to use, but top-level managers reported more use than other respondents, and access to social network sites at the workplace was positively related to both dependent variables, whereas policies prohibiting such use showed the opposite relationship.
Abstract: A total of 11,018 employees participated in a survey investigating whether demographic, personality, and work-related variables could explain variance in attitudes towards and actual use of social network sites for personal purposes during working hours. Age was negatively related to both dependent variables. Male gender, single status, and education were positively associated with both dependent variables. Managers had negative attitudes to use, but top-level managers reported more use than other respondents. Access to social network sites at the workplace was positively related to both dependent variables, whereas policies prohibiting such use showed the opposite relationship. Extraversion and Neuroticism were positively related to both dependent variables. Conscientiousness, positive challenge at work, and quantitative demands were all negatively related to both dependent variables.
TL;DR: In this article, the most important determinants of capital structure of 870 listed Indian firms comprising both private sector companies and government companies for the period 2001-2010 were identified using regression analysis, and it was concluded that factors such as profitability, growth, asset tangibility, size, cost of debt, tax rate, and debt servicing capacity have a significant impact on the leverage structure chosen by firms in the Indian context.
Abstract: The paper identifies the most important determinants of capital structure of 870 listed Indian firms comprising both private sector companies and government companies for the period 2001–2010. Ten independent variables and three dependent variables have been tested using regression analysis. It has been concluded that factors such as profitability, growth, asset tangibility, size, cost of debt, tax rate, and debt servicing capacity have a significant impact on the leverage structure chosen by firms in the Indian context.
TL;DR: In this article, a spatial Difference-in-Differences (SDID) estimator is proposed to account for possible spatial spillover effects, and an empirical application of the SDID estimator based on the development of a new commuter rail transit system for the suburban agglomeration of Montreal (Canada) is presented and compared to the usual DID estimator.
Abstract: Evaluating the impact of public mass transit systems on real-estate values is an important application of the hedonic price model (HPM). Recently, a mathematical transformation of this approach has been proposed to account for the potential omission of latent spatial variables that may overestimate the impact of accessibility to mass transit systems on values. The development of a Difference-in-Differences (DID) estimator, based on the repeat-sales approach, is a move in the right direction. However, such an estimator neglects the possibility that specification of the price equation may follow a spatial autoregressive process with respect to the dependent variable. The objective of this paper is to propose a spatial Difference-in-Differences (SDID) estimator accounting for possible spatial spillover effects. Particular emphasis is placed on the development of a suitable weights matrix accounting for spatial links between observations. Finally, an empirical application of the SDID estimator based on the development of a new commuter rail transit system for the suburban agglomeration of Montreal (Canada) is presented and compared to the usual DID estimator.
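The non-spatial baseline mentioned in this abstract, a DID estimator built on repeat sales, can be sketched as the difference in mean price changes between treated and control properties. This is only the textbook DID under simplifying assumptions; it does not implement the spatial weights matrix or the SDID estimator the paper proposes, and the function name and prices are hypothetical.

```python
def did_repeat_sales(sales):
    """Difference-in-differences from repeat sales.

    `sales` is a list of (price_before, price_after, treated) tuples, one per
    property sold both before and after the transit system opened.
    """
    treated = [after - before for before, after, is_treated in sales if is_treated]
    control = [after - before for before, after, is_treated in sales if not is_treated]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical repeat sales (prices in thousands).
sales = [
    (100.0, 130.0, True),
    (200.0, 220.0, True),
    (100.0, 110.0, False),
    (150.0, 160.0, False),
]
effect = did_repeat_sales(sales)
```

Here treated properties gain 25 on average and control properties gain 10, so the estimated effect is 15. Applied work typically uses log prices and regression controls, which this sketch omits.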
TL;DR: In this article, an estimator for the regression parameters of a spatially lagged dependent variable in terms of an endogenous weighting matrix is proposed, and the results suggest that there is a bootlegging effect in which buyers or agents cross state borders to purchase cigarettes.
TL;DR: An approach to confounder‐selection based on the use of causal diagrams (often called directed acyclic graphs) is discussed, illustrated by constructing a causal diagram for the research question: ‘Does personal smoking affect the risk of subsequent asthma?’.
Abstract: In respiratory health research, interest often lies in estimating the effect of an exposure on a health outcome. If randomization of the exposure of interest is not possible, estimating its effect is typically complicated by confounding bias. This can often be dealt with by controlling for the variables causing the confounding, if measured, in the statistical analysis. Common statistical methods used to achieve this include multivariable regression models adjusting for selected confounding variables or stratification on those variables. Therefore, a key question is which measured variables need to be controlled for in order to remove confounding. An approach to confounder-selection based on the use of causal diagrams (often called directed acyclic graphs) is discussed. A causal diagram is a visual representation of the causal relationships believed to exist between the variables of interest, including the exposure, outcome and potential confounding variables. After creating a causal diagram for the research question, an intuitive and easy-to-use set of rules can be applied, based on a foundation of rigorous mathematics, to decide which measured variables must be controlled for in the statistical analysis in order to remove confounding, to the extent that is possible using the available data. This approach is illustrated by constructing a causal diagram for the research question: 'Does personal smoking affect the risk of subsequent asthma?'. Using data taken from the Tasmanian Longitudinal Health Study, the statistical analysis suggested by the causal diagram approach was performed.
TL;DR: In this article, the authors discuss four different models that have been used in economic geography to explain the spatial context of network structures and their dynamics, and discuss the strengths and weaknesses of the different approaches together with their domains of applicability in the geography of innovation studies.
Abstract: The importance of network structures for the transmission of knowledge and the diffusion of technological change has been recently emphasized in economic geography. Since network structures drive the innovative and economic performance of actors in regional contexts, it is crucial to explain how networks form and evolve over time and how they facilitate inter-organizational learning and knowledge transfer. The analysis of relational dependent variables, however, requires specific statistical procedures. In this paper, we discuss four different models that have been used in economic geography to explain the spatial context of network structures and their dynamics. First, we review gravity models and their recent extensions and modifications to deal with the specific characteristics of networked (individual level) relations. Second, we discuss the quadratic assignment procedure that has been developed in mathematical sociology for diminishing the bias induced by network dependencies. Third, we present exponential random graph models that not only allow dependence between observations, but also model such network dependencies explicitly. Finally, we deal with dynamic networks, by introducing stochastic actor-oriented models. Strengths and weaknesses of the different approaches are discussed together with their domains of applicability in the geography of innovation studies.
TL;DR: It is argued here that many medical applications of machine learning models in genetic disease risk prediction rely essentially on two factors: effective model regularization and rigorous model validation.
Abstract: Supervised machine learning aims at constructing a genotype–phenotype model by learning such genetic patterns from a labeled set of training examples that will also provide accurate phenotypic predictions in new cases with similar genetic background. Such predictive models are increasingly being applied to the mining of panels of genetic variants, environmental, or other nongenetic factors in the prediction of various complex traits and disease phenotypes [1]–[8]. These studies are providing increasing evidence in support of the idea that machine learning provides a complementary view into the analysis of high-dimensional genetic datasets as compared to standard statistical association testing approaches. In contrast to identifying variants explaining most of the phenotypic variation at the population level, supervised machine learning models aim to maximize the predictive (or generalization) power at the level of individuals, hence providing exciting opportunities for e.g., individualized risk prediction based on personal genetic profiles [9]–[11]. Machine learning models can also deal with genetic interactions, which are known to play an important role in the development and treatment of many complex diseases [12]–[16], but are often missed by single-locus association tests [17]. Even in the absence of significant single-loci marginal effects, multilocus panels from distinct molecular pathways may provide synergistic contribution to the prediction power, thereby revealing part of such hidden heritability component that has remained missing because of too small marginal effects to pass the stringent genome-wide significance filters [18]. 
Multivariate modeling approaches have already been shown to provide improved insights into genetic mechanisms and the interaction networks behind many complex traits, including atherosclerosis, coronary heart disease, and lipid levels, which would have gone undetected using the standard univariate modeling [2], [19]–[22]. However, machine learning models also come with inherent pitfalls, such as increased computational complexity and the risk for model overfitting, which must be understood in order to avoid reporting unrealistic prediction models or over-optimistic prediction results.
We argue here that many medical applications of machine learning models in genetic disease risk prediction rely essentially on two factors: effective model regularization and rigorous model validation. We demonstrate the effects of these factors using representative examples from the literature as well as illustrative case examples. This review is not meant to be a comprehensive survey of all predictive modeling approaches; instead, we focus on regularized machine learning models, which enforce constraints on the complexity of the learned models so that they ignore irrelevant patterns in the training examples. Simple risk allele counting or other multilocus risk models that do not incorporate any model parameters to be learned are outside the scope of this review; in fact, such simplistic models that assume independent variants may lead to suboptimal prediction performance in the presence of either direct or indirect interactions through epistasis effects or linkage disequilibrium, respectively [23], [24]. Perhaps the simplest models considered here as learning approaches are those based on weighted risk allele summaries [23], [25]. However, even with such basic risk models intended for predictive purposes, it is important to learn the model parameters (e.g., select the variants and determine their weights) based on training data only; otherwise there is a severe risk of model overfitting, i.e., models not being capable of generalizing to new samples [5]. Representative examples of how model learning and regularization approaches address the overfitting problem are briefly summarized in Box 1, while those readers interested in their implementation details are referred to the accompanying Text S1. We specifically promote here the use of such regularized machine learning models that are scalable to the entire genome-wide scale, often based on linear models, which are easy to interpret and also enable straightforward variable selection. 
Genome-scale approaches avoid the need of relying on two-stage approaches [26], which apply standard statistical procedures to reduce the number of variants, since such prefiltering may miss predictive interactions across loci and therefore lead to reduced predictive performance [8], [24], [25], [27], [28].
Box 1. Synthesis of Learning Models for Genetic Risk Prediction
The aim of risk models is to capture in a mathematical form the patterns in the genetic and non-genetic data most important for the prediction of disease susceptibility. The first step in model building involves choosing the functional form of the model (e.g., linear or nonlinear), and then making use of a given training data to determine the adjustable parameters of the model (e.g., a subset of variants, their weights, and other model parameters). While it is often sufficient for a statistical model to enable high enough explanatory power in the discovery material, without being overly complicated, a predictive model is also required to generalize to unseen cases.
One consideration in the model construction is how to encode the genotypic measurements using genotype models, such as the dominant, recessive, multiplicative, or additive model, each implying different assumptions about the genetic effects in the data [79]. Categorical variables 0, 1, and 2 are typically used for treating genetic predictor variables (e.g., minor allele dosage), while numeric values are required for continuous risk factors (e.g., blood pressure). Expected posterior probabilities of the genotypes can also be used, especially for imputed genotypes. Transforming the genotype categories into three binary features is an alternative way to deal with missing values without imputation (used in the T1D example; see Text S1 for details).
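The genotype encodings described above can be sketched as follows. The function names are hypothetical, the multiplicative model is left out, and representing a missing genotype as an all-zero feature vector is an assumption of this sketch rather than a detail taken from Text S1.

```python
def encode_genotype(minor_allele_count, model="additive"):
    """Encode a genotype given as a minor allele count (0, 1, or 2)."""
    if model == "additive":
        # Each copy of the minor allele contributes equally.
        return minor_allele_count
    if model == "dominant":
        # One copy of the minor allele is enough to carry the effect.
        return 1 if minor_allele_count >= 1 else 0
    if model == "recessive":
        # Two copies of the minor allele are needed.
        return 1 if minor_allele_count == 2 else 0
    raise ValueError(f"unknown genotype model: {model}")


def three_binary_features(minor_allele_count):
    """One binary feature per genotype category.

    A missing genotype (None) maps to all zeros, so no imputation is needed
    (an assumption of this sketch).
    """
    return [1 if minor_allele_count == k else 0 for k in (0, 1, 2)]


heterozygote_dominant = encode_genotype(1, "dominant")
heterozygote_recessive = encode_genotype(1, "recessive")
missing_features = three_binary_features(None)
```

The binary representation lets a linear model assign a separate weight to each genotype category, at the cost of tripling the number of features per variant.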
Statistical or machine learning models identify statistical or predictive interactions, respectively, rather than biological interactions between or within variants [12], [80]. While nonlinear models may better capture complex genetic interactions [7], [81], linear models are easier to interpret and provide a scalable option for performing supervised selection of multilocus variant panels at the genome-wide scale [3]. In linear models, genetic interactions are modeled implicitly by selecting such variant combinations that together are predictive of the phenotype, rather than considering pairwise gene–gene relationships explicitly. Formally, trait yi to be predicted for an individual i is modeled as a linear combination of the individual's predictor variables xij:
$$y_i = w_0 + \sum_{j=1}^{p} w_j x_{ij} \qquad (1)$$
Here, the weights wj are assumed constant across the n individuals, w0 is the bias offset term, and p indicates the number of predictors discovered in the training data. In its basic form, Eq. 1 can be used for modeling continuous traits y (linear regression). For case-control classification, the binary dependent variable y is often transformed using a logistic loss function, which models the probability of a case class given a genotype profile and other risk factor covariates x (logistic regression). It has been shown that the logistic regression and naive Bayes risk models are mathematically very closely related in the context of genetic risk prediction [81].
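A minimal sketch of the linear risk model just described: a bias offset plus a weighted sum of an individual's predictors, optionally passed through the logistic function to give a case probability. The function names, weights, and genotype values are hypothetical, and fitting the weights (with or without regularization) is not shown.

```python
import math


def linear_score(x, w, w0):
    """Bias offset w0 plus the weighted sum of predictors x with weights w."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


def case_probability(x, w, w0):
    """Logistic regression: probability of the case class given predictors x."""
    return 1.0 / (1.0 + math.exp(-linear_score(x, w, w0)))


# Hypothetical weights for three variants coded as minor allele counts.
weights = [0.5, -1.0, 0.25]
bias = 0.1
individual = [2, 0, 1]

score = linear_score(individual, weights, bias)
probability = case_probability(individual, weights, bias)
```

For a continuous trait the score itself is the prediction, while for case-control data the probability can be thresholded to produce a class label.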
TL;DR: In this paper, the authors explored the validity of the Environmental Kuznets Curve (EKC) hypothesis for Pakistan with yearly time series data for 1980-2011, using the ARDL bounds testing approach to cointegration and the VECM-Granger causality test.
Abstract: This study explores the validity of the Environmental Kuznets Curve (EKC) hypothesis for Pakistan using yearly time series data for 1980-2011. We have taken deforestation as the dependent variable for environmental degradation and four independent variables, i.e., income, energy consumption, trade openness, and population, to test the link between these underlying variables. We employed the ARDL bounds testing approach to cointegration and the VECM-Granger causality test. The results confirmed cointegration among the variables in both the short-run and long-run paths. However, the diminishing negative impact of income on deforestation in the long-run path confirms the EKC hypothesis for deforestation in Pakistan. There is unidirectional causality from income and energy consumption to deforestation, and a bidirectional causal effect is detected between income and energy consumption, whereas in the long run income and trade openness Granger-cause energy consumption. The diagnostic tests also supported the results, and the model was found to be stable during sensitivity analysis. This study is uniquely designed, with a number of significant tests that ensure the reliability of results for policy use, and contributes future research directions on the environment-growth-energy nexus.
TL;DR: In this article, a nonparametric mutual information (MI) estimator based on k-nearest-neighbor graphs is proposed, which is robust to local non-uniformity and works well with limited data.
Abstract: We demonstrate that a popular class of nonparametric mutual information (MI) estimators based on k-nearest-neighbor graphs requires a number of samples that scales exponentially with the true MI. Consequently, accurate estimation of MI between two strongly dependent variables is possible only for prohibitively large sample sizes. This important yet overlooked shortcoming of the existing estimators is due to their implicit reliance on local uniformity of the underlying joint distribution. We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. We demonstrate the superior performance of the proposed estimator on both synthetic and real-world data.
TL;DR: In this paper, the authors investigate the relationship between stock return variation and several aspects of information and governance structures, in both a cross-country setting and a cross-firm setting within the U.S. They show that higher idiosyncratic volatility (or equivalently, lower R2) resembles noise.
Abstract: A growing literature investigates the association between stock return variation and several aspects of information and governance structures, in both a cross-country setting and a cross-firm setting within the U.S. Papers use either idiosyncratic stock return volatility or R2 as interchangeable measures of firm-specific return variation but report inconsistent results. An important reason for the differing interpretations is the assumption about whether lower R2 (or higher idiosyncratic volatility) captures firm-specific news or noise. We document that higher idiosyncratic volatility (or equivalently, lower R2) resembles noise. In addition, we show, analytically and empirically, that different results obtain when using R2 or idiosyncratic volatility because the systematic risk inherent in the R2 metric is also correlated with the independent variable of interest. Therefore, we recommend that when assessing the association between R2 (or idiosyncratic volatility) and some independent variable, researchers (1) control for elements of systematic risk and (2) triangulate their findings with oth...
TL;DR: A special case of spatial cross-validation, spatial leave-one-out (SLOO), giving a criterion equivalent to the AIC in the absence of spatial autocorrelation is proposed, which appears to be a promising solution for selecting relevant variables from most ecological spatial datasets.
Abstract: Aim: Processes and variables measured in ecology are almost always spatially autocorrelated, potentially leading to the choice of overly complex models when performing variable selection. One way to solve this problem is to account for residual spatial autocorrelation (RSA) for each subset of variables considered and then use a classical model selection criterion such as the Akaike information criterion (AIC). However, this method can be laborious and it raises other concerns such as which spatial model to use or how to compare different spatial models. To improve the accuracy of variable selection in ecology, this study evaluates an alternative method based on a spatial cross-validation procedure. Such a procedure is usually used for model evaluation but can also provide interesting outcomes for variable selection in the presence of spatial autocorrelation. Innovation: We propose to use a special case of spatial cross-validation, spatial leave-one-out (SLOO), giving a criterion equivalent to the AIC in the absence of spatial autocorrelation. SLOO only computes non-spatial models and uses a threshold distance (equal to the range of RSA) to keep each point left out spatially independent from the others. We first provide some simulations to evaluate how SLOO performs compared with AIC. We then assess the robustness of SLOO on a large-scale dataset. R software codes are provided for generalized linear models. Main conclusions: The AIC was relevant for variable selection in the presence of RSA if the independent variables considered were not spatially autocorrelated. It otherwise failed because highly spatially autocorrelated variables were more often selected than others. Conversely, SLOO had similar performances whether the variables were themselves spatially autocorrelated or not. It was particularly useful when the range of RSA was small, which is a common property of spatial tools. SLOO appears to be a promising solution for selecting relevant variables from most ecological spatial datasets.
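The splitting step behind spatial leave-one-out can be sketched as follows: each point is held out in turn, and every other point within the threshold distance is dropped from the training set. The function name and coordinates are hypothetical, and fitting the non-spatial model and accumulating the selection criterion are left out; this is not the R code supplied with the paper.

```python
import math


def sloo_splits(coords, threshold):
    """Yield (test_index, train_indices) pairs for spatial leave-one-out.

    Training points within `threshold` of the held-out point are excluded so
    that the held-out point stays spatially independent of the training set.
    """
    for i, point in enumerate(coords):
        train = [
            j
            for j, other in enumerate(coords)
            if j != i and math.dist(point, other) > threshold
        ]
        yield i, train


# Hypothetical sampling sites along a transect, one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
splits = dict(sloo_splits(coords, threshold=1.5))
```

With a threshold of 1.5, holding out the middle site leaves only the two end sites for training. In practice the threshold would be set to the estimated range of residual spatial autocorrelation, as the abstract describes.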
TL;DR: In this paper, a Bayesian approach to determining an appropriate local or global specification, SDEM versus SDM, for static panel variants of these two models is set forth, and the logic of the Bayesian view of model uncertainty suggests these are the only two specifications that need to be considered.
Abstract: Taking a Bayesian perspective on model uncertainty for static panel data models proposed in the spatial econometrics literature considerably simplifies the task of selecting an appropriate model. A wide variety of alternative specifications that include various combinations of spatial dependence in lagged values of the dependent variable, spatial lags of the explanatory variables, as well as dependence in the model disturbances have been the focus of a literature on various statistical tests for distinguishing between these numerous specifications. A Bayesian model uncertainty argument is advanced that logically implies we can simplify this task by focusing on only two model specifications. One of these, labeled the spatial Durbin model (SDM), implies global spatial spillovers, while the second, labeled a spatial Durbin error model (SDEM), leads to local spatial spillovers. A Bayesian approach to determining an appropriate local or global specification, SDEM versus SDM, is set forth here for static panel variants of these two models. The logic of the Bayesian view of model uncertainty suggests these are the only two specifications that need to be considered. This greatly simplifies the task confronting practitioners when using static panel data models.
TL;DR: In this article, the authors identify three research gaps: few studies treat speed as an independent variable; most studies analyze speed only until internationalization starts; and, finally, studies have paid little attention to the multidimensionality of the speed concept.
Abstract: This paper studies the performance consequences of the speed of SME internationalization. The authors identify three research gaps: few studies treat speed as an independent variable; most studies analyze speed only until internationalization starts; and, finally, studies have paid little attention to the multidimensionality of the speed concept. The authors seek to address these gaps and to contribute to the literature on the dynamics of internationalization by developing three measures of internationalization speed, which capture its multidimensionality. Building on the theories of learning advantage of newness and time compression diseconomies, the study presents three hypotheses on speed’s effect on performance, and the theoretically derived research model is tested on a sample of 183 SMEs visited on site. The analysis demonstrates that the speed of a firm’s increase in the breadth of its international markets has a positive but curvilinear effect on firm performance. It also demonstrates that the speed of a firm’s increase in commitment of foreign resources has a negative but curvilinear effect on the performance of the firm. These results have implications both for scholars interested in the dynamics of firm internationalization and for SME managers.
TL;DR: In this paper, the authors present the results of a survey covering 13 main activities related to human factors that are executed during the kaizen implementation process and were integrated into four independent latent variables (management commitment, education, communication and motivation), which are associated with 14 benefits obtained after its implementation.
Abstract: This article presents the results of a survey covering 13 main activities related to human factors that are executed during the kaizen implementation process and were integrated into four independent latent variables (management commitment, education, communication and motivation), which are associated with 14 benefits obtained after its implementation that were grouped into three dependent latent variables (process, workers and customers). The survey was administered to people with responsibilities in continuous improvement programs and projects in companies located in Mexico. Independent and dependent variables were integrated into a structural equation model that was evaluated using the partial least squares algorithms in WarpPLS® to find causal relations among them. Results indicate that management commitment and education are the main factors that guarantee the success of kaizen implementation programs, but that this is moderated by good communication in achieving good operational process performance and better worker and customer satisfaction.
TL;DR: The results show that, when each method is applied independently, the variable importance rankings are similar and, in addition, coincide with the hierarchy established by researchers who have applied other techniques.
Abstract: One of the main limitations of artificial neural networks (ANN) is that the relations established between explanatory variables (input) and dependent variables (output) cannot be known explicitly. This is a major reason why they are usually called "black boxes." In the last few years, several methods have been proposed to assess the relative importance of each explanatory variable. Nevertheless, it has not been possible to reach a consensus on which is the best-performing method. This is largely due to the different relative importance obtained for each variable depending on the method used. This importance also varies with the designed network architecture and/or with the initial random weights used to train the ANN. This paper proposes a procedure that seeks to minimize these problems and provides consistency in the results obtained from different methods. Essentially, the idea is to work with a set of neural networks instead of a single one. The proposed procedure is validated using a database collected from a customer satisfaction survey, which was conducted on the public transport system of Granada (Spain) in 2007. The results show that, when each method is applied independently, the variable importance rankings are similar and, in addition, coincide with the hierarchy established by researchers who have applied other techniques.
TL;DR: The combination of employment and family demands is largely unassociated with health status in countries with dual-earner family policy models, but is associated with poorer health outcomes in countries with market-oriented models, mainly among men.
Abstract: Objectives: The objectives of this study were: (i) to analyse the relationship between health status and paid working hours and household composition in the EU-27, and (ii) to examine whether patterns of association differ as a function of family policy typologies and gender. Methods: Cross-sectional study based on data from the 5th European Working Conditions Survey of 2010. The sample included married or cohabiting employees aged 25-64 years from the EU-27 (10,482 men and 8,882 women). The dependent variables were self-perceived health status and psychological well-being. Results: Irrespective of differences in family policy typologies between countries, working long hours was more common among men, and part-time work was more common among women. In Continental and Southern European countries, employment and family demands were associated with poor health status in both sexes, but more consistently among women. In Anglo-Saxon countries, the association was mainly limited to men. Finally, in Nordic and Eastern European countries, employment and family demands were largely unassociated with poor health outcomes in both sexes. Conclusions: The combination of employment and family demands is largely unassociated with health status in countries with dual-earner family policy models, but is associated with poorer health outcomes in countries with market-oriented models, mainly among men. This association is more consistent among women in countries with traditional models, where males are the breadwinners and females are responsible for domestic and care work.
TL;DR: In this article, the authors identify the determinants of the geographical mobility of skilled individuals such as inventors across European regions, and highlight the importance of physical proximity, job opportunities, social networks, as well as other relational variables in mediating this phenomenon.
Abstract: The aim of this paper is to identify the determinants of the geographical mobility of skilled individuals, such as inventors, across European regions. Among a large number of variables, we focus on the role of social proximity between inventors’ communities. We use a control function approach to address the endogenous nature of networks, and zero-inflated negative binomial models to accommodate our estimations to the count nature of the dependent variable and the high number of zeros it contains. Our results highlight the importance of physical proximity, job opportunities, social networks, as well as other relational variables in mediating this phenomenon.
TL;DR: This work provides formulae that account for between-study variation and suggests that researchers set sample sizes with respect to the authors' generally more conservative formulae, which generalize to settings in which there are multiple effects of interest.
Abstract: Statistical power depends on the size of the effect of interest. However, effect sizes are rarely fixed in psychological research: Study design choices, such as the operationalization of the dependent variable or the treatment manipulation, the social context, the subject pool, or the time of day, typically cause systematic variation in the effect size. Ignoring this between-study variation, as standard power formulae do, results in assessments of power that are too optimistic. Consequently, when researchers attempting replication set sample sizes using these formulae, their studies will be underpowered and will thus fail at a greater than expected rate. We illustrate this with both hypothetical examples and data on several well-studied phenomena in psychology. We provide formulae that account for between-study variation and suggest that researchers set sample sizes with respect to our generally more conservative formulae. Our formulae generalize to settings in which there are multiple effects of interest. We also introduce an easy-to-use website that implements our approach to setting sample sizes. Finally, we conclude with recommendations for quantifying between-study variation.
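The claim that ignoring between-study variation makes power assessments too optimistic can be illustrated with a short calculation. The sketch below is not the authors' formulae; it is a textbook normal-normal calculation under stated assumptions: a one-sided z-test, an estimate with known standard error se, and a true effect drawn from a normal distribution with mean mean_effect and standard deviation tau. Under those assumptions the estimate has variance se² + tau², so the power is Φ((mean_effect − z·se) / √(se² + tau²)). The function names and numeric inputs are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_fixed(effect, se, alpha=0.05):
    """One-sided z-test power when the true effect is a fixed constant."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(effect / se - z_crit)

def power_heterogeneous(mean_effect, tau, se, alpha=0.05):
    """One-sided z-test power when the true effect varies as N(mean_effect, tau^2).

    The estimate then has total variance se^2 + tau^2, while the rejection
    threshold z_crit * se is still computed from se alone.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf((mean_effect - z_crit * se) / (se ** 2 + tau ** 2) ** 0.5)

# Illustrative inputs (assumptions, not values from the paper).
standard = power_fixed(effect=0.4, se=0.15)
adjusted = power_heterogeneous(mean_effect=0.4, tau=0.2, se=0.15)
print(f"standard power: {standard:.3f}, with between-study variation: {adjusted:.3f}")
```

With tau set to 0 the two functions agree. Whenever the standard calculation gives power above 50 percent, adding between-study variation lowers it, which is the direction of the bias the abstract describes.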
TL;DR: Structural equation modeling (SEM) is a comprehensive statistical modeling tool for analyzing multivariate data involving complex relationships between and among variables (Hoyle, 1995). SEM surpasses traditional regression models by including multiple independent and dependent variables to test associated hypotheses about relationships among observed and latent variables.
Abstract: Structural equation modeling (SEM) is a comprehensive statistical modeling tool for analyzing multivariate data involving complex relationships between and among variables (Hoyle, 1995). SEM surpasses traditional regression models by including multiple independent and dependent variables to test associated hypotheses about relationships among observed and latent variables. SEM explains why results occur while reducing misleading results by submitting all variables in the model to measurement error or uncontrolled variation of the measured variables. The purpose of this article is to provide basic knowledge of structural equation modeling methodology for testing relationships between indicator variables and latent constructs where SEM is the analysis technique of the research statistical design. Notably, SEM provides a way to test the specified set of relationships among observed and latent variables as a whole, and allows theory testing even when experiments are not possible. Consequently, these methodological approaches have become ubiquitous in the scientific research process of all disciplines.
TL;DR: In this paper, multiple linear regression was used to determine the factors that affect poverty in Indonesia, examining the effect of the independent variables (income per capita, the rate of inflation, the amount of household consumption, level of education, and the human development index (HDI)) on the poverty level as the dependent variable.
Abstract: The purpose of this study is to determine the factors that affect poverty in Indonesia. The method used is multiple linear regression, an analysis tool used to see the effect of the independent variables (income per capita, the rate of inflation, the level of household consumption, level of education, and the human development index (HDI)) on the poverty level as the dependent variable. The data used are secondary data for 33 provinces in Indonesia in 2012. The study concludes that income per capita, inflation, education level, HDI, and consumption simultaneously affect the poverty rate, as shown by an F-test significance level below 0.05. The R-squared indicates that the independent variables explain 56 percent of the variation in the poverty rate; the remaining 44 percent would be explained by other variables not examined in this study. DOI: 10.15408/ess.v4i1.1966
TL;DR: Three methods for phylogenetic regression analyses designed for binary dependent variables (traits with two discrete states) both with each other and with “standard” methods that either ignore phylogenetic relationships or ignore the binary character of the dependent variable are compared.
Abstract: We compare three methods for phylogenetic regression analyses designed for binary dependent variables (traits with two discrete states) both with each other and with “standard” methods that either ignore phylogenetic relationships or ignore the binary character of the dependent variable. In simulations designed to reveal statistical problems arising in different methods, PLogReg (Ives and Garland 2010) performed better than PGLMM (Ives and Helmus 2011) and MCMCglmm (Hadfield 2010) to identify phylogenetic signal in the absence of independent variables; PLogReg also outperformed a standard method for detecting phylogenetic signal in binary data, ancestral character estimation (Schluter et al. 1997; Pagel 1994). All three phylogenetic methods performed similarly for identifying relationships with a continuously valued independent variable x, with all methods having at most moderately inflated Type I error rates, and MCMCglmm having slightly greater power. In contrast, standard logistic regression that ignores phylogeny had seriously inflated Type I errors when x had phylogenetic signal. Perhaps surprisingly, phylogenetic regression that ignored the binary nature of the dependent variable, RegOU (Lavin et al. 2008), performed as well or better than the other methods, at least for larger sample sizes (≥64 species), although this approach does not result in a model that can be used to simulate data (e.g., for bootstrapping). We also apply the methods to a data set describing whether antelopes fight or flee versus hide from predators as a function of their group size (Brashares et al. 2000). We end with rough guidelines for analyzing binary dependent variables, with the main recommendation being that multiple methods and simulations should be used to give confidence in the statistical results.
TL;DR: In this paper, two types of numerical energy models were developed to predict the United States' future industrial energy demand, using an ANN (artificial neural network) technique and an MLR (multiple linear regression) technique.
TL;DR: The authors show that, in most cases, the perceived multicollinearity is merely an illusion that arises from misinterpreting high correlations between independent variables and interaction terms.
Abstract: Numerous papers in the fields of marketing and consumer behavior that utilize moderated multiple regression express concerns regarding multicollinearity issues. In most cases, however, as we show in this paper, the perceived multicollinearity is merely an illusion that arises from misinterpreting high correlations between independent variables and interaction terms.
TL;DR: In this paper, the authors compare the relative efficiency of OLS vs. TS in cross-sectional valuation settings and show that TS is more efficient than OLS under non-ideal conditions.
Abstract: OLS-based archival accounting research encounters two well-known problems. First, outliers tend to influence results excessively. Second, heteroscedastic error terms raise the spectre of inefficient estimation and the need to scale variables. This paper applies a robust estimation approach due to Theil (1950) and Sen (1968) (TS henceforth). The TS method is easily understood and it circumvents the two problems in an elegant, direct way. Because TS and OLS are roughly equally efficient under OLS-ideal conditions (Wilcox 2010), one naturally hypothesizes that TS should be more efficient than OLS under non-ideal conditions. This research compares the relative efficiency of OLS vs. TS in cross-sectional valuation settings. There are two dependent variables, market value and subsequent year earnings; basic accounting variables appear on the equations’ right-hand side. Two criteria are used to compare the estimation methods’ performance: (i) the inter-temporal stability of estimated coefficients and (ii) the goodness-of-fit as measured by the fitted values’ ability to explain actual values. TS dominates OLS on both criteria, and often materially so. Differences in inter-temporal stability of estimated coefficients are particularly apparent, partially due to OLS estimates occasionally resulting in “incorrect” signs. Conclusions remain even if winsorization and the scaling of variables modify OLS.
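The Theil-Sen (TS) estimator mentioned in this abstract can be sketched in a few lines for the single-regressor case: the slope is the median of all pairwise slopes. The sketch below is a minimal illustration, not the paper's implementation. Taking the intercept as the median of y − slope·x is one common convention and is an assumption here, and the function names and toy data are hypothetical. An OLS fit is included only to show how a single outlier affects each estimate.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Theil-Sen fit: median of pairwise slopes, then median-based intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]  # skip pairs with equal x, whose slope is undefined
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

def ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Toy data: y = 2x + 1 with one gross outlier at the last point (illustrative only).
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[-1] = 100

ts_slope, ts_intercept = theil_sen(xs, ys)
ols_slope, ols_intercept = ols(xs, ys)
print(ts_slope, ts_intercept)   # 2.0 1.0
print(ols_slope > ts_slope)     # True: the outlier pulls the OLS slope upward
```

Because 36 of the 45 pairwise slopes exclude the outlier, the median recovers the underlying slope exactly in this toy case, while the OLS slope is pulled well above it. The pairwise construction costs O(n²) slopes, so large samples may call for a subsampled or library implementation.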