Aggregation Among Binary, Count, and Duration Models: Estimating the Same Quantities from Different Levels of Data
TL;DR: Only a single theoretical process exists for which known statistical methods can estimate the same parameters from binary, count, and duration data, and it is generally used only for count and duration data.
Abstract: Binary, count, and duration data all code discrete events occurring at points in time. Although a single data generation process can produce all of these three data types, the statistical literature is not very helpful in providing methods to estimate parameters of the same process from each. In fact, only a single theoretical process exists for which known statistical methods can estimate the same parameters—and it is generally used only for count and duration data. The result is that seemingly trivial decisions about which level of data to use can have important consequences for substantive interpretations. We describe the theoretical event process for which results exist, based on time independence. We also derive a set of models for a time-dependent process and compare their predictions to those of a commonly used model. Any hope of understanding and avoiding the more serious problems of aggregation bias in events data is contingent on first deriving a much wider arsenal of statistical models and theoretical processes that are not constrained by the particular forms of data that happen to be available. We discuss these issues and suggest an agenda for political methodologists interested in this very large class of aggregation problems.
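The claim that one time-independent process can generate all three data types can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the time-independent process is a homogeneous Poisson process with constant rate, observed over unit-length periods, and every function name is hypothetical. Under those assumptions the gaps between events are exponential, the events per period are Poisson, and the probability of at least one event in a period is 1 - exp(-rate), so the same rate can be recovered from each level of aggregation.

```python
import math
import random


def simulate_event_times(rate, horizon, rng):
    """Draw event times of a homogeneous Poisson process on [0, horizon)."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gaps: no time dependence
        if t >= horizon:
            return times
        times.append(t)


def rate_from_durations(times, horizon):
    """Exponential MLE with the last spell right-censored at the horizon."""
    gaps = [b - a for a, b in zip([0.0] + times[:-1], times)]
    censored_spell = horizon - (times[-1] if times else 0.0)
    return len(times) / (sum(gaps) + censored_spell)


def rate_from_counts(counts):
    """Poisson MLE: mean number of events per unit-length period."""
    return sum(counts) / len(counts)


def rate_from_binary(indicators):
    """Invert P(at least one event) = 1 - exp(-rate) at the sample share."""
    share = sum(indicators) / len(indicators)
    return -math.log(1.0 - share)  # requires at least one period with no event


def estimate_all(rate, horizon, seed):
    """Simulate once, aggregate to each data type, and estimate the rate."""
    rng = random.Random(seed)
    times = simulate_event_times(rate, horizon, rng)
    counts = [0] * horizon          # events per unit-length period
    for t in times:
        counts[int(t)] += 1
    indicators = [1 if c > 0 else 0 for c in counts]  # any event in period?
    return {
        "duration": rate_from_durations(times, horizon),
        "count": rate_from_counts(counts),
        "binary": rate_from_binary(indicators),
    }


estimates = estimate_all(rate=0.7, horizon=20000, seed=1)
for level, value in estimates.items():
    print(f"{level:>8}: {value:.4f}")
```

In this setup the duration and count estimates coincide, because both reduce to total events divided by total exposure. The binary estimate discards the number of events within each period, so it is noisier even though it targets the same rate.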
Citations
Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference
TL;DR: A unified approach is proposed that lets researchers preprocess data with matching and then apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.
Back to the Future: Modeling Time Dependence in Binary Data
TL;DR: Monte Carlo analysis demonstrates that, for the types of hazards one often sees in substantive research, the polynomial approximation always outperforms time dummies and generally performs as well as splines or even more flexible autosmoothing procedures.
Diffusion and the International Context of Democratization
TL;DR: The authors argue that democracy often comes about as a result of changes in the relative power of important actors and groups as well as their evaluations of particular institutions, both of which are often influenced by forces outside the country in question.
A Lot More to Do: The Sensitivity of Time-Series Cross-Section Analyses to Simple Alternative Specifications
Sven E. Wilson, Daniel M. Butler
TL;DR: The 195 papers reviewed show a widespread failure to diagnose and treat common problems of time-series cross-section (TSCS) data analysis, and there is a lot more to do with TSCS data than many researchers have apparently assumed.
427
Throwing Out the Baby with the Bath Water: A Comment on Green, Kim, and Yoon
Nathaniel Beck, Jonathan N. Katz
TL;DR: In this comment on Green, Kim, and Yoon, the authors argue that fixed effects are pernicious for IR time-series cross-section models with a binary dependent variable and are often problematic for IR models with a continuous dependent variable.