TL;DR: The statistical similarities among mediation, confounding, and suppression are described and methods to determine the confidence intervals for confounding and suppression effects are proposed based on methods developed for mediated effects.
Abstract: This paper describes the statistical similarities among mediation, confounding, and suppression. Each is quantified by measuring the change in the relationship between an independent and a dependent variable after adding a third variable to the analysis. Mediation and confounding are identical statistically and can be distinguished only on conceptual grounds. Methods to determine the confidence intervals for confounding and suppression effects are proposed based on methods developed for mediated effects. Although the statistical estimation of effects and standard errors is the same, there are important conceptual differences among the three types of effects.
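The "change in the relationship after adding a third variable" can be illustrated with a small sketch. It assumes ordinary least squares with a single third variable; the function name `third_variable_effect` and the simulated coefficients are hypothetical and are not taken from the paper. It computes the unadjusted slope of Y on X, the slope after adjusting for Z, and their difference, which in OLS equals the product of the X-to-Z slope and the adjusted Z-to-Y slope.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def third_variable_effect(x, z, y):
    """Compare the X->Y slope before and after adjusting for Z (OLS)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    total = sxy / sxx                          # slope of Y on X alone
    adjusted = (sxy * szz - szy * sxz) / det   # slope of Y on X, given Z
    b = (szy * sxx - sxy * sxz) / det          # slope of Y on Z, given X
    a = sxz / sxx                              # slope of Z on X
    return {"total": total, "adjusted": adjusted,
            "change": total - adjusted, "a_times_b": a * b}


# Simulated data (hypothetical coefficients chosen only for illustration).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
z = [0.6 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
result = third_variable_effect(x, z, y)
print(result)
```

The same arithmetic applies whether Z is interpreted as a mediator or a confounder; only the interpretation differs, which is the point the abstract makes.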
TL;DR: In this article, it is suggested that if the two approaches do not agree upon which of the independent variables are likely to be 'significant', then the deductions must be subject to doubt.
Abstract: In many large-scale conservation or ecological problems where experiments are intractable or unethical, regression methods are used to attempt to gauge the impact of a set of nominally independent variables (X) upon a dependent variable (Y). Workers often want to assert that a given X has a major influence on Y and, by this indirection, to infer a probable causal relationship. There are two difficulties apart from the demonstrability issue itself: (1) multiple regression is plagued by collinear relationships in X; and (2) any regression is designed to produce a function that in some way minimizes the overall difference between the observed and ‘predicted’ Ys, which does not necessarily equate to determining probable influence in a multivariate setting. Problem (1) may be explored by comparing two avenues, one in which a single ‘best’ regression model is sought and the other where all possible regression models are considered contemporaneously. It is suggested that if the two approaches do not agree upon which of the independent variables are likely to be ‘significant’, then the deductions must be subject to doubt.
TL;DR: In this paper, the identification and estimation in panel data discrete choice models when the explanatory variable set includes strictly exogenous variables, lags of the endogenous dependent variable as well as unobservable individual-specific effects are considered.
Abstract: In this paper, we consider identification and estimation in panel data discrete choice models when the explanatory variable set includes strictly exogenous variables, lags of the endogenous dependent variable as well as unobservable individual-specific effects. For the binary logit model with the dependent variable lagged only once, Chamberlain (1993) gave conditions under which the model is not identified. We present a stronger set of conditions under which the parameters of the model are identified. The identification result suggests estimators of the model, and we show that these are consistent and asymptotically normal, although their rate of convergence is slower than the inverse of the square root of the sample size. We also consider identification in the semiparametric case where the logit assumption is relaxed. We propose an estimator in the spirit of the conditional maximum score estimator (Manski (1987)) and we show that it is consistent. In addition, we discuss an extension of the identification result to multinomial discrete choice models, and to the case where the dependent variable is lagged twice. Finally, we present some Monte Carlo evidence on the small sample performance of the proposed estimators for the binary response model.
TL;DR: In this article, an empirical methodology is presented for modelling and mapping air temperature (mean maximum, mean, and mean minimum) and total precipitation, both monthly and annual, using geographical information systems (GIS) techniques.
TL;DR: Loess as discussed by the authors is a nonparametric method for fitting smooth curves to empirical data, which does not require a priori specification of the relationship between the dependent and independent variables.
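A minimal sketch of how a loess-style fit can be computed at a single point follows. It assumes a local linear fit with tricube weights over a nearest-neighbour window, which is a common formulation but is not spelled out in the summary above; the function name `loess_point` and the `span` default are hypothetical.

```python
def loess_point(x0, xs, ys, span=0.5):
    """Estimate the smooth curve at x0 with a tricube-weighted local line."""
    n = len(xs)
    k = max(2, int(round(span * n)))            # points in the local window
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # window half-width
    if h == 0:                                  # all neighbours sit on x0
        near = [y for x, y in zip(xs, ys) if x == x0]
        return sum(near) / len(near)
    # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# No functional form is specified in advance: the curve is traced out by
# repeating the local fit at each x0 of interest.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
fitted = loess_point(7.5, xs, ys)
print(fitted)
```

Because each local fit is a straight line, data that are exactly linear are reproduced exactly; curvature is captured by letting the local line change from one `x0` to the next.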
TL;DR: The authors explored the effects of a number of affective and social variables on foreign language learners' engagement in oral argumentative tasks, and suggested a multi-level construct whereby some independent variables only come into force when certain conditions have been met.
Abstract: This paper reports on a data-based study in which we explored - as part of a larger-scale British-Hungarian research project - the effects of a number of affective and social variables on foreign language (L2) learners’ engagement in oral argumentative tasks. The assumption underlying the investigation was that students’ verbal behaviour in oral task situations is partly determined by a number of non-linguistic and non-cognitive factors whose examination may constitute a potentially fruitful extension of existing task-based research paradigms. The independent variables in the study included various aspects of L2 motivation and several factors characterizing the learner groups the participating students were members of (such as group cohesiveness and intermember relations), as well as the learners’ L2 proficiency and ‘willingness to communicate’ in their L1. The dependent variables involved objective measures of the students’ language output in two oral argumentative tasks (one in the learners’ L1, the other in their L2): the quantity of speech and the number of turns produced by the speakers. The results provide insights into the interrelationship of the multiple variables determining the learners’ task engagement, and suggest a multi-level construct whereby some independent variables only come into force when certain conditions have been met.
TL;DR: In this article, the authors provide the asymptotic distribution for the maximum of the normalized deviations of the estimated coefficient functions away from the true coefficient functions. Using this result and the pre-asymptotic substitution idea for estimating biases and variances, simultaneous confidence bands for the underlying coefficient functions are constructed.
Abstract: Regression analysis is one of the most commonly used techniques in statistics. When the dimension of independent variables is high, it is difficult to conduct efficient non-parametric analysis straightforwardly from the data. As an important alternative to the additive and other non-parametric models, varying-coefficient models can reduce the modelling bias and avoid the "curse of dimensionality" significantly. In addition, the coefficient functions can easily be estimated via a simple local regression. Based on local polynomial techniques, we provide the asymptotic distribution for the maximum of the normalized deviations of the estimated coefficient functions away from the true coefficient functions. Using this result and the pre-asymptotic substitution idea for estimating biases and variances, simultaneous confidence bands for the underlying coefficient functions are constructed. An important question in the varying coefficient models is whether an estimated coefficient function is statistically significantly different from zero or a constant. Based on newly derived asymptotic theory, a formal procedure is proposed for testing whether a particular parametric form fits a given data set. Simulated and real-data examples are used to illustrate our techniques.
TL;DR: In this article, the authors consider the case of a noisily measured variable with a negative covariance between the measurement error and the true value of the variable and show that the parameter in a univariate regression is bounded between the OLS estimator and an instrumental variables estimator.
Abstract: The bias introduced by errors in the measurement of independent variables has increasingly been a topic of interest among researchers estimating economic parameters. However, studies typically use the assumption of classical measurement error; that is, the variable of interest and its measurement error are uncorrelated, and the expected value of the mismeasured variable is equal to the expected value of the true measure. These assumptions often arise from convenience rather than conviction. When a variable is bounded, it is likely that the measurement error and the true value of the variable are negatively correlated. We consider the case of a noisily measured variable with a negative covariance between the measurement error and the true value of the variable. We show that, asymptotically, the parameter in a univariate regression is bounded between the ordinary least squares (OLS) estimator and an instrumental variables (IV) estimator. Further, we demonstrate that the OLS bound can be improved in...
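The effect of a correlation between the measurement error and the true value can be made concrete with a short derivation. The notation is an assumption introduced here, not the paper's: $x$ is the true regressor, $x^{*} = x + u$ the observed one, $y = \beta x + \varepsilon$, and $\varepsilon$ is taken to be uncorrelated with both $x$ and $u$.

```latex
\begin{aligned}
\operatorname{Cov}(x^{*}, y) &= \beta\,(\sigma_x^{2} + \sigma_{xu}), \\
\operatorname{Var}(x^{*})    &= \sigma_x^{2} + \sigma_u^{2} + 2\,\sigma_{xu}, \\
\operatorname{plim}\,\hat{\beta}_{\mathrm{OLS}}
  &= \beta\,\frac{\sigma_x^{2} + \sigma_{xu}}
                 {\sigma_x^{2} + \sigma_u^{2} + 2\,\sigma_{xu}} .
\end{aligned}
```

Setting $\sigma_{xu} = 0$ recovers the classical attenuation factor $\sigma_x^{2} / (\sigma_x^{2} + \sigma_u^{2})$; a negative $\sigma_{xu}$ changes both the numerator and the denominator, so the classical result no longer applies directly.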
TL;DR: In this paper, a method is presented for forecasting a value of a dependent variable, such as product demand, in a future time period later than the next, upcoming future time period.
Abstract: A method for forecasting a value of a dependent variable, such as product demand, in a future time period later than the next, upcoming future time period. The method includes selecting a dependent variable for which a value is to be forecast, gathering historical data on values of the dependent variable and explanatory variables in prior time periods, and determining a forecasting equation based on the gathered historical data. The method includes selecting a future time period that is a number of time periods beyond the next, upcoming time period. The forecasting method continues with calculating a forecasted value of the dependent variable for the selected future time period, then determining an error value by comparing the forecasted value with the historical data and based on the error value, modifying the forecasting equation to reduce the error value. The forecasting equation may be a time series forecasting equation and the determining of the forecasting equation includes initial setting values for included time series forecasting parameters. The modifying of forecast equation then includes adjusting these forecasting parameters to lower or otherwise optimize the error value. Particularly, the method includes selecting an error metric for optimization for the forecasting equation and the adjusting of the parameters is performed as a function of the selected error metric to move it toward an optimal value.
Abstract: The chapters demonstrate two SEM programs with distinct user interfaces and capabilities (Amos and Mplus) with enough specificity that readers can conduct their own analyses without consulting additional resources. Examples from social work literature highlight best practices for the specification, estimation, interpretation, and modification of structural equation models. Oftentimes, confirmatory factor analysis and general structure modeling are the most flexible, powerful, and appropriate choices for social work data.
TL;DR: In this paper, the authors present an improved computational method and system of empirical induction that can be used to arrive at generalized conclusions and make predictions involving longitudinal associations between and among variables and events.
Abstract: The present invention is an improved computational method and system of empirical induction that can be used to arrive at generalized conclusions and make predictions involving longitudinal associations between and among variables and events. Empirical induction is used to gain scientific knowledge, to develop and evaluate treatments and other interventions, and to help make predictions and decisions. The invention, which is distinct from and often complementary to the statistical method, is applied to repeated measures and multiple time-series data and can be used to quantify, discover, analyze, and describe longitudinal associations for individual real and conceptual entities. Major improvements include provisions to define Boolean independent events and Boolean dependent events and to apply analysis parameters such as episode length and episode criterion for both independent and dependent variables, persistence after independent events, and delay and persistence after Boolean independent events.
TL;DR: In this paper, the authors show that using the logarithm of positively skewed dependent variables severely decreases estimates of true latent moderator effects using moderated regression procedures in a Monte Carlo simulation.
Abstract: When gross deviations from parametric assumptions are observed, conventional data transformations are often applied with little regard for substantive theoretical implications. One such transformation involves using the logarithm of positively skewed dependent variables. Log transformations were shown to severely decrease estimates of true moderator effects using moderated regression procedures in a Monte Carlo simulation. Estimates of moderator effect sizes were substantially better estimates of the true latent moderator effect (i.e., larger by a multiple of 2.6 to 534) when estimated using a simple percentile bootstrap procedure in the original, positively skewed data. Conclusions with regard to the presence or absence of a true moderator effect using a simple bootstrap procedure were unaffected by the violation of parametric assumptions in the original, positively skewed data. In contrast, moderated regression analysis performed on a log-transformed dependent variable severely increased Type-II error. ...
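A simple percentile bootstrap of the kind mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the function name `percentile_bootstrap_ci` is hypothetical, and the sample mean stands in for the statistic of interest, which in a moderated regression would be the interaction coefficient.

```python
import random


def percentile_bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recompute the statistic, sort the results.
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Positively skewed sample, analysed on its original (untransformed) scale.
rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(200)]
low, high = percentile_bootstrap_ci(sample, mean)
print(low, high)
```

The interval is read directly from the empirical distribution of the resampled statistic, so no normality assumption or transformation of the data is required.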
TL;DR: Several variations of a new genetic learning algorithm (GLOWER) are described and evaluated, showing that GLOWER has the ability to uncover effective patterns for difficult problems that have weak structure and significant nonlinearities.
Abstract: Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule learning algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorporates successful ideas from tree induction and rule learning. We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard financial prediction problem (S&P500 stock returns), using the results to identify one of the better variants for further comparisons. We introduce a new (to KDD) financial prediction problem (predicting positive and negative earnings surprises), and experiment with GLOWER, contrasting it with tree- and rule-induction approaches. Our results are encouraging, showing that GLOWER has the ability to uncover effective patterns for difficult problems that have weak structure and significant nonlinearities.
TL;DR: In this article, a method and computer-based apparatus for monitoring the degradation of, predicting the remaining service life of, and/or planning maintenance for, an operating system are disclosed.
Abstract: A method and computer-based apparatus for monitoring the degradation of, predicting the remaining service life of, and/or planning maintenance for, an operating system are disclosed. Diagnostic information on degradation of the operating system is obtained through measurement of one or more performance characteristics by one or more sensors onboard and/or proximate the operating system. Though not required, it is preferred that the sensor data are validated to improve the accuracy and reliability of the service life predictions. The condition or degree of degradation of the operating system is presented to a user by way of one or more calculated, numeric degradation figures of merit that are trended against one or more independent variables using one or more mathematical techniques. Furthermore, more than one trendline and uncertainty interval may be generated for a given degradation figure of merit/independent variable data set. The trendline(s) and uncertainty interval(s) are subsequently compared to one or more degradation figure of merit thresholds to predict the remaining service life of the operating system. The present invention enables multiple mathematical approaches in determining which trendline(s) to use to provide the best estimate of the remaining service life.
TL;DR: In this paper, a general approach to defining variance explained in latent dependent variables of non-recursive linear structural equation models is presented, which can be easily implemented in EQS or LISREL.
Abstract: Whereas measures of explained variance in a regression and an equation of a recursive structural equation model can be simply summarized by a standard R2 measure, this is not possible in nonrecursive models in which there are reciprocal interdependencies among variables. This article provides a general approach to defining variance explained in latent dependent variables of nonrecursive linear structural equation models. A new method of its estimation, easily implemented in EQS or LISREL and available in EQS 6, is described and illustrated.
TL;DR: In this article, a central limit theorem for a triangular array of m-dependent random variables is presented, where m may tend to infinity with the row index at a certain rate.
TL;DR: This paper developed and tested a model in which the Compensation of Outside Directors is significantly related to Director Effort, External Monitoring, Internal Referents and Firm Performance, after controlling for Firm Size and Inside Ownership.
Abstract: Using data on 200 large U.S. corporations in 1996, this study develops and tests a model in which the Compensation of Outside Directors is significantly related to Director Effort, External Monitoring, Internal Referents and Firm Performance, after controlling for Firm Size and Inside Ownership. There is some support for each set of hypotheses relating to the different independent variables in the model.
TL;DR: In this article, a new inequality of the Ostrowski type in three independent variables is established, and the discrete analogue of the main result is also given.
TL;DR: In this paper, the ability of econometric land use models to accurately forecast forest area is evaluated using a panel data set for Alabama consisting of county and time-series observation for the period 1964 to 1992.
Abstract: Predictions of future forestland area are an important component of forest policy analyses. In this article, we test the ability of econometric land use models to accurately forecast forest area. We construct a panel data set for Alabama consisting of county and time-series observation for the period 1964 to 1992. We estimate models using restricted data sets-namely, data from early periods-and use out-of-sample values of dependent and independent variables to construct precise tests of the model's forecasting accuracy. Three model specifications are examined: ordinary least squares, dummy variables (fixed effects), and error components (random effects). We find that the dummy variables model produces more accurate forecasts at the county and state level than the other model specifications. This result is related to the ability of the dummy variables model to more completely control for cross-sectional variation in the dependent variables. This suggests that the estimated model parameters better capture the temporal relationship between forest area and economic variables.
TL;DR: In this paper, three dependent variables (the number of parties, the electoral support for the leading party, and the effective number of parties) are used to test the assumption that size affects party system fragmentation.
Abstract: The present study tests the assumption that size affects party system fragmentation. Three dependent variables are used: the number of parties, the electoral support for the leading party, and the effective number of parties. The study operates on two levels. On the macro level, the research population consists of 77 countries with free party systems. On the micro level, local units in Great Britain and Finland constitute the object of research. The impact of the following intervening variables is controlled for: the effective threshold, presidentialism, socioeconomic diversification, and ethnic and religious diversity. On the macro level, the results show that size contains far more explanatory power than any other variable. This holds true for countries using a plurality electoral system as well as those using a proportional electoral system. On the micro level, there is a strong association between the size of the population and the number of parties, whereas the other dependent variables are insensiti...
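For readers unfamiliar with the third dependent variable, the sketch below computes an effective number of parties. It assumes the widely used Laakso-Taagepera index, the reciprocal of the sum of squared shares; the abstract does not state which formula the study uses, and the function name is hypothetical.

```python
def effective_number_of_parties(shares):
    """Laakso-Taagepera index: 1 / sum of squared (normalized) shares."""
    total = sum(shares)
    proportions = [s / total for s in shares]
    return 1.0 / sum(p * p for p in proportions)


# Four equally sized parties count as four; one dominant party pulls the
# index well below the raw count of three.
equal = effective_number_of_parties([25, 25, 25, 25])
dominant = effective_number_of_parties([70, 20, 10])
print(equal, dominant)
```

Unlike the raw number of parties, this measure weights each party by its size, which is why the two dependent variables can respond differently to the same explanatory factor.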
TL;DR: In this paper, it is suggested that in typical experimental circumstances in which systematic errors are significant, the common practice of organizing the set point order of independent variables to maximize data acquisition rate results in a test matrix that fails to produce the highest quality research result, and it is possible to accept a lower rate of data acquisition and still produce results of higher technical quality with less cost and in less time than conventional test procedures, simply by optimizing the sequence in which independent variable levels are set.
Abstract: This paper illustrates how, in the presence of systematic error, the quality of an experimental result can be influenced by the order in which the independent variables are set. It is suggested that in typical experimental circumstances in which systematic errors are significant, the common practice of organizing the set point order of independent variables to maximize data acquisition rate results in a test matrix that fails to produce the highest quality research result. With some care to match the volume of data required to satisfy inference error risk tolerances, it is possible to accept a lower rate of data acquisition and still produce results of higher technical quality (lower experimental error) with less cost and in less time than conventional test procedures, simply by optimizing the sequence in which independent variable levels are set.
TL;DR: In this article, the authors proposed a new class of wavelet-based tests for serial correlation of unknown form in the estimated residuals of an error component model, where the error components can be one-way or two-way, the individual and time effects can be fixed or random, the regressors may contain lagged dependent variables or deterministic/stochastic trending variables.
Abstract: Wavelet analysis is a new mathematical tool developed as a unified field of science over the last decade. As spatially adaptive analytic tools, wavelets are useful for capturing serial correlation where the spectrum has peaks or kinks, as can arise from persistent/strong dependence, seasonality or use of seasonal data such as quarterly and monthly data, business cycles, and other kinds of periodicity. This paper proposes a new class of wavelet-based tests for serial correlation of unknown form in the estimated residuals of an error component model, where the error components can be one-way or two-way, the individual and time effects can be fixed or random, and the regressors may contain lagged dependent variables or deterministic/stochastic trending variables. The proposed tests are applicable to unbalanced heterogeneous panel data. They have a convenient null limit N(0,1) distribution. No formulation of an alternative is required, and the tests are consistent against serial correlation of unknown form. We propose and justify a data-driven finest scale that, in an automatic manner, converges to zero under the null hypothesis of no serial correlation and grows to infinity as the sample size increases under the alternative, ensuring the consistency of the proposed tests. Simulation studies show that the new tests perform rather well in small and finite samples in comparison with some existing popular tests for panel models and can be used as an effective evaluation procedure for panel models.
TL;DR: This paper analyzed the sensitivity of the results to specification of the dependent variable and found that misspecification bias from modeling discrete data with continuous distributions is important, and that the results are sensitive across specifications.
Abstract: Previous studies have drawn a theoretical and empirical connection between foreign direct investment (FDI) and exchange rates using continuous measures of FDI. However, FDI data are often in discrete count form. I take a representative study of the FDI/exchange rate relationship by Jose M. Campa (1993), and I analyze the sensitivity of the results to specification of the dependent variable. Whereas Campa uses a Tobit specification, I use a count data specification to model counts of FDI occurrences. Using data on FDI in the United States from 1982 to 1993, controlling for the traditional determinants of FDI, I find that the results are sensitive across specifications. Significance levels and the magnitude of the coefficients change when going from a continuous Tobit specification to a zero inflated Poisson (ZIP) model designed for count data. Formal statistical testing finds that the ZIP specification likely models the data most properly. Thus, I indicate that misspecification bias from modeling discrete data with continuous distributions is important.
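The zero-inflated Poisson (ZIP) specification mentioned above mixes a point mass at zero with an ordinary Poisson count. The sketch below writes out that probability mass function; the parameter names `lam` (Poisson mean) and `pi` (probability of a structural zero) are assumptions for illustration, and no covariates or estimation are included.

```python
import math


def zip_pmf(k, lam, pi):
    """P(count = k) under a zero-inflated Poisson model."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero is either structural (prob. pi) or an ordinary Poisson zero.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# With pi = 0 the model reduces to a plain Poisson; a positive pi shifts
# probability toward zero counts.
plain_zero = zip_pmf(0, 2.0, 0.0)
inflated_zero = zip_pmf(0, 2.0, 0.3)
total_mass = sum(zip_pmf(k, 2.0, 0.3) for k in range(60))
print(plain_zero, inflated_zero, total_mass)
```

A continuous specification such as Tobit has no counterpart to this discrete mass on each count, which is one way to see why the two approaches can yield different coefficient estimates on the same data.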
TL;DR: The Problem Design of the Model and the Contributing-factor Diagram The Risk Model Code Results from Model Execution So, So What?
Abstract: INTRODUCTION Scope Realism Models, Validation, and Precision Value TWO APPROACHES TO SOLVING DECISION TREES-A CLASS-ACTION SUIT EXAMPLE Introduction Building the Decision Tree What is the Question? Interpretation of the Probabilistic-Branching Model So, So What? TERRORISM RISK MODELS-RELATIVE AND ABSOLUTE RISK Terrorism Relative-Risk Model What is the Question? Building the Contributing-Factor Diagram for the Relative-Ranking Terrorist-Threat Risk Model Category Weights Relative-Risk Model Equations Relative-Risk Model Applied to Terrorist Organization #1 Relative Risk Model Results from Evaluation of Terrorist Organization #1 Relative-Risk Model Applied to Terrorist Organization #2 Relative-Risk Model Results from Evaluation of Terrorist Organization #2 Comparison of the Two Terrorist Organizations Building the Terrorism Absolute-Cost Risk Model Absolute-Cost Risk Model Equations Application of the Absolute-Cost Risk Model to Terrorist Organization #2 Absolute-Cost Risk Model Results for Evaluation of Terrorist Organization #2 So, So What? GATHERING INFORMATION CONSISTENTLY IN AN INCONSISTENT WORLD Introduction The Problem The Solution So, So What? NEW MANUFACTURING FACILITY- BUSINESS-JUSTIFICATION MODEL Introduction What is the Question? Construction of the Contributing-Factor Diagram Populating the Model with Data Risk Model Equations Results from Model Execution So, So What? OIL-FIELD-DEVELOPMENT INVESTMENT-OPPORTUNITY RISK MODEL Introduction What is the Question? Categories and Variables Field-Development Risk Model Equations Populating the Model with Data Results from Model Execution So, So What? 
USING CHANCE OF FAILURE AND RISK-WEIGHTED VALUES TO REFLECT THE EFFECT OF "SOFT" ISSUES ON THE VALUE OF AN OPPORTUNITY Introduction Accurate Estimates of Value are Essential Types of Chance of Failure How to Express and use Chance of Failure Risk-Weighted Values and the Value of a Portfolio Element Value of a Portfolio Composed of Dissimilar Elements So, So What? PRODUCTION-SHARING AGREEMENT RISK MODEL Introduction What is the Question? Building the Contributing-Factor Diagram Risk Model Equations Populating the Model with Technical Data Chances of Abject Failure Populating the Model with Financial Data Results from the Model So, So What? SCHEDULING AND OPTIMIZATION RISK MODEL Introduction The Problem Design of the Model and the Contributing-factor Diagram The Risk Model Code Results from Model Execution So, So What? DECISION/OPTION-SELECTION RISK MODEL Introduction The Current Situation The Problem Results from Model Execution So, So What? RISK PROCESS TO IDENTIFY BUSINESS DRIVERS, MAXIMIZE VALUE, AND DETERMINE THE VALUE OF POTENTIAL EXPENDITURES Introduction The Problem The Risk/Uncertainty Model Populating the Model with Data Results from Model Execution Determining Business Drivers and Maximizing Value Determining the Value of New Information/Services So, So What? SUMMARY Other Applications It is Mostly the Process - Not the Technology Accomplishment of Vision Generates Real Returns Exploration Example Maintenance/Construction Example BUILDING A CONSENSUS MODEL What is the Question? 
- Most of the Time and Effort Consensus Model Group Dynamics Write it Down Sort it Out Group Dynamics Again Units Overarching Categories BUILDING A CONTRIBUTING-FACTOR DIAGRAM The Contributing-Factor Diagram - Getting Started Identify and Define Variables Ask the Right Question Double-Dipping Double-Dipping and Counting the Chickens Fixing the Double-Dipping and Counting of Chickens Problem CFD-Building Example Short List of Hints for Building a CFD MONTE CARLO ANALYSIS A Bit of History For What is it Good? Simple Monte Carlo Example How Many Random Comparisons are Enough? Output from Monte Carlo Analysis - The Frequency and Cumulative Frequency Plots Interpreting Cumulative Frequency Plots Combining Monte Carlo-Output Curves DECISIONS AND DISTRIBUTIONS Decisions Just what is a Distribution? Distributions - How to Approach Them Symmetrical Distributions Skewed Distributions Spike Distributions Flat Distributions Truncated Distributions Discrete Distributions Bimodal Distributions Reading Data from a File Peakedness "Specific" Distribution Types CHANCE OF FAILURE Chance of Failure - What is It? Failure of a Risk Component Chance of Failure that does not Affect an Input Distribution Incorporating Chance of Failure in a Plot of Cumulative Frequency Another Reason for Chance of Failure The "Inserting 0s Work Around" COF and Multiple Output Variables TIME SERIES ANALYSIS AND DEPENDENCE Introduction to Time-Series Analysis and Dependence Time-Series Analysis - Why? Time-Series Analysis - How? Time-Series Analysis - Results Some Things to Consider Dependence - What is It? Independent and Dependent Variables Degree of Dependence Multiple Dependencies and Circular Dependence Effect of Dependence on Monte Carlo Output Dependence - It's Ubiquitous RISK-WEIGHTED VALUES AND SENSITIVITY ANALYSIS Introduction to Risk-Weighted Values and Sensitivity Analysis Risk-Weighted Values - Why? Risk-Weighted Values - How? 
The Net Risk-Weighted Value The Economic Risk-Weighted Resource (ERWR) Value Risk-Weighted Values - The Answer Sensitivity Analysis - Why? Sensitivity Analysis - How? Sensitivity Analysis - Results
TL;DR: This paper used logistic regression to construct a one-quarter ahead prediction model for classical business cycle regimes in the UK, using simple mechanical rules to date turning points in quarterly real GDP data from 1963 to 1999.
Abstract: This paper uses logistic regression to construct a one-quarter ahead prediction model for classical business cycle regimes in the UK. The binary dependent variable is obtained by applying simple mechanical rules to date turning points in quarterly real GDP data from 1963 to 1999. Using a range of real and financial leading indicators, several parsimonious one-quarter-ahead models are developed for the GDP regimes, with model selection based on the SIC criterion. A real M4 variable is consistently found to have predictive content. One model that performs well combines this with nominal UK and German short-term interest rates. The role of the latter emphasises the open nature of the UK economy.
TL;DR: A new approach applies rough set theory to form a forecasting model for sightseeing expenditures in Hong Kong; in experiments, the model classified 94.1% of the testing cases, and 87.5% of the classified cases were identical to their actual counterparts.
Abstract: Existing tourism demand forecasting models are unable to capture useful information from a database with numeric and nonnumeric data. This article presents a new approach that applies rough set theory to form a forecasting model for sightseeing expenditures in Hong Kong. Rough set theory deals with the classificatory analysis of imprecise, uncertain, or incomplete knowledge (data) by incorporating classical set theory. Based on officially published tourist sightseeing data, decision rules are generated to represent the relationships between the independent variables and the dependent variable. Experimental results revealed that the forecasting model can classify 94.1% of the testing cases, and that 87.5% of the classified cases were identical to their actual counterparts. There was no significant difference between the actual values and the forecast values. The advantages of using decision rules induced by rough sets to forecast sightseeing expenditure are also discussed.
TL;DR: In this paper, the authors argue that the results of earlier studies should be used according to structured and not overly arbitrary criteria for selecting which variables to test as well as for their subsequent acceptance or rejection, and an explanatory analysis of Singapore's condominium market is used as an empirical illustration of the decision rule for variable selection proposed in the methodological part of the paper.
Abstract: Various approaches to hypothesis testing have been used in the past for the purpose of estimating hedonic price equations. The criteria for testing and rejecting explanatory variables have, however, rarely been made explicit. This paper argues that the results of earlier studies should be used according to structured and not overly arbitrary criteria for selecting which variables to test as well as for their subsequent acceptance or rejection. An explanatory analysis of Singapore's condominium market is used as an empirical illustration of the decision rule for variable selection proposed in the methodological part of the paper.
TL;DR: In this paper, the authors state four realities of econometric model building, show that an econometric model can be causal only if the interpretations given to its coefficients are consistent with these realities, and present a numerically stable algorithm for estimating such a model subject to equality and inequality constraints on the model parameters.
Abstract: This paper states four realities of econometric model building and shows that an econometric model can be causal only if the interpretations given to its coefficients are consistent with these realities. A numerically stable algorithm for estimating such a model subject to equality and inequality constraints on the model parameters is presented. This algorithm is designed in such a way that it can be applied even when the matrix of observations on the model's independent variables and the covariance matrix of the model's errors are deficient in rank.
TL;DR: Sliced inverse regression (SIR) identifies appropriate factors through simple statistical tests for determining the number of factors to retain and for assessing the significance of factor-loading coefficients.
Abstract: In data-rich marketing environments (e.g., direct marketing or new product design), managers face an ever-growing need to reduce the number of variables effectively. To accomplish this goal, the authors introduce a new method called sliced inverse regression (SIR), which finds factors by taking into account the information contained in both the dependent and independent variables. Sliced inverse regression objectively identifies appropriate factors through simple statistical tests for determining the number of factors to retain and for assessing the significance of factor-loading coefficients. The authors make conceptual connections between SIR and several existing approaches, including principal components regression (PCR) and partial least squares regression (PLSR). Using Monte Carlo experiments, the authors demonstrate that SIR performs better than these approaches. Two empirical examples—designing a new executive business program and direct marketing by a catalog company—are presented to illu...
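The idea of finding factors from both the dependent and independent variables can be sketched in Python. This is a minimal illustration of the standard textbook form of SIR (whiten the predictors, average them within slices of the sorted response, and take the leading eigenvectors of the weighted covariance of the slice means), not the authors' implementation. The function name, the number of slices, and the synthetic data are assumptions, and the statistical tests described in the abstract are not included.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Estimate SIR directions; returns (directions, eigenvalues descending)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Whiten the predictors: Z has (approximately) identity covariance.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Slice the observations by the sorted response and average Z per slice.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)  # weighted covariance of slice means

    # Leading eigenvectors of M, mapped back to the original predictor scale.
    w, v = np.linalg.eigh(M)                  # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:n_directions]
    directions = inv_sqrt @ v[:, top]
    directions /= np.linalg.norm(directions, axis=0)
    return directions, w[::-1]

# Synthetic illustration (assumption): y depends on X only through X @ beta.
rng = np.random.default_rng(0)
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
X = rng.normal(size=(2000, 5))
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=2000)

directions, eigenvalues = sliced_inverse_regression(X, y)
print(abs(directions[:, 0] @ beta))  # close to 1 when the direction is recovered
```

Unlike principal components regression, which uses only the predictors, the slicing step lets the response determine which directions are kept. The sign of each estimated direction is arbitrary, which is why the check above uses an absolute value.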
TL;DR: In this article, the authors used log-multiplicative association models as latent variable models for discrete variables and showed that these models have desirable properties, including having schematic or graphical representations of the system of observed and unobserved variables, the log-multi...
Abstract: Associations between multiple discrete measures are often due to collapsing over other variables. When the variables collapsed over are unobserved and continuous, log-multiplicative association models, including log-linear models with linear-by-linear interactions for ordinal categorical data and extensions of Goodman's (1979, 1985) RC(M) association model for multiple nominal and/or ordinal categorical variables, can be used to study the relationship between the observed discrete variables and the unobserved continuous ones, and to study the unobserved variables. The derivation and use of log-multiplicative association models as latent variable models for discrete variables are presented in this paper. The models are based on graphical models for discrete and continuous variables where the variables follow a conditional Gaussian distribution. The models have many desirable properties, including having schematic or graphical representations of the system of observed and unobserved variables, the log-multi...