A Unified Inference Framework for Multiple Imputation Using Martingales

Q: What are different causal estimands?

Causal estimands are the different population quantities used to measure a causal effect. They include average causal effects over a subset of the population, such as the average causal effect on the treated. Nonlinear causal estimands are also considered, such as the log of the causal risk ratio and the log of the causal odds ratio. These can be estimated by plugging in commonly used estimators of $E\{Y(a)\}$ for a = 0, 1. Linearizing the resulting estimators yields a linear form similar to (2.1), which serves as the basis for constructing the weighted bootstrap inference.
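As an illustration of the linearization step, here is a delta-method sketch for the log causal risk ratio. The symbols $\mu_a = E\{Y(a)\}$, $\hat\mu_a$ and $\psi_a$ are notation assumed here for the target, its estimator and that estimator's influence function; the sketch further assumes each $\hat\mu_a$ is asymptotically linear and $\mu_a > 0$.

```latex
\[
\log\frac{\hat\mu_1}{\hat\mu_0} - \log\frac{\mu_1}{\mu_0}
  = \frac{\hat\mu_1 - \mu_1}{\mu_1} - \frac{\hat\mu_0 - \mu_0}{\mu_0} + o_P(n^{-1/2}),
\qquad
\psi_{\mathrm{RR}}(L_i) = \frac{\psi_1(L_i)}{\mu_1} - \frac{\psi_0(L_i)}{\mu_0}.
\]
```

The log causal odds ratio can be handled in the same way by applying the delta method to $\log\{\mu_a / (1 - \mu_a)\}$.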

Q: What is missingness not at random (MNAR)?

Missingness not at random (MNAR) occurs when the missingness pattern depends on the missing values themselves, even after controlling for the observed data. This is common for sensitive questions such as alcohol consumption and income. Causal inference with MNAR data is challenging because the full data distribution and the ACE are not identifiable without further conditions, so identification conditions are required. For example, Yang et al. (2019) proposed an outcome-independent missingness mechanism, which assumes that the missingness pattern is independent of the outcome given treatment and confounders. Under such a mechanism, the posterior predictive distribution of the imputed values $X^{*(j)}_{\bar{R}_i,i}$ and the wild bootstrap steps after imputation are modified accordingly.
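In symbols, the outcome-independent missingness condition described above might be written as follows. Here R denotes the missingness pattern of the confounders X, A the treatment and Y the outcome; this compact notation is an assumption made for illustration and is not quoted from Yang et al. (2019).

```latex
\[
R \perp\!\!\!\perp Y \mid (A, X)
\]
```

The condition still allows R to depend on the missing components of X, which is why it falls under MNAR rather than MAR.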

Q: How does the framework handle missingness in outcome and confounders?

The framework accommodates missingness in both the outcome and the confounders by adding an outcome imputation step to the MI procedure. An additional missingness indicator $R_Y$ records whether Y is observed. In Step MI-1, $\theta^*$ is generated from the posterior distribution $p(\theta \mid Z_{\mathrm{obs}})$. For units with $R_{Yi} = 1$, $X^*$ is generated from $f(X_{\bar{R}_{Xi},i} \mid A_i, X_{R_{Xi},i}, Y_i, R_{Xi}, R_{Yi} = 1; \theta^*)$; for units with $R_{Yi} = 0$, $X^*$ and $Y^*$ are generated jointly from $f(X_{\bar{R}_{Xi},i}, Y_i \mid A_i, X_{R_{Xi},i}, R_{Xi}, R_{Yi} = 0; \theta^*)$. The MI estimator is then written in a general form involving both imputed outcomes and imputed confounders, and the martingale difference arrays in the wild bootstrap procedure are adjusted accordingly. In this way the framework handles missingness in both the outcome and the confounders under MAR and MNAR assumptions.

Q: What are the simulation study results for different full sample estimators in MI inference?

The simulation study evaluates the finite-sample performance of the proposed inference when MI is paired with different full sample estimators: outcome regression, IPW, AIPW, and matching. The treatment indicator A is generated from a Bernoulli distribution, and X[2] is missing at random with a missingness rate of about 45%. The inference procedure assumes the correct missingness mechanism, and standard MI inference is compared with the proposed bootstrap inference, which uses B = 1,000 replicates and weights $u_k$ drawn from Mammen's two-point distribution. The proposed variance estimator is unbiased for all four ACE estimators and is not sensitive to the number of imputations m or to the choice between quantile-based and Wald-type confidence intervals. However, when the true missing data mechanism is missingness not at random, the MI point estimator has large biases and all the confidence intervals have poor coverage rates. The study also compares the proposed method with other methods developed for multiple imputation inference, such as the doubling variance approach and the likelihood ratio based procedure. The proposed method outperforms Rubin's method in terms of accuracy and confidence interval coverage.
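The bootstrap weights mentioned above can be drawn as in the following minimal sketch of Mammen's two-point distribution. The function and constant names are hypothetical; only the two support points and their probabilities are standard.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)
# Support points and probabilities of Mammen's two-point distribution.
MAMMEN_VALUES = np.array([-(SQRT5 - 1) / 2, (SQRT5 + 1) / 2])
MAMMEN_PROBS = np.array([(SQRT5 + 1) / (2 * SQRT5), (SQRT5 - 1) / (2 * SQRT5)])

def mammen_weights(size, rng):
    """Draw i.i.d. weights with mean 0 and variance 1 (hypothetical helper)."""
    return rng.choice(MAMMEN_VALUES, size=size, p=MAMMEN_PROBS)

rng = np.random.default_rng(0)
u = mammen_weights(100_000, rng)
print(u.mean(), u.var())  # both should be close to 0 and 1 respectively
```

The weights have mean zero and unit variance, which are the moment conditions the wild bootstrap places on $u_k$.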

Q: What is the purpose of introducing X* in the dataset?

X* is introduced to link the recorded, truncated family poverty ratio values to the full continuous space. It is a latent variable underlying the truncated family poverty ratio, and in the j-th imputation its unobserved part is drawn from $f(X^*_{\bar{R}_X} \mid A, X_{R_X}, Y, R_X; \theta^{*(j)})$. This lets the analysis use the full range of the continuous variable rather than only the truncated values.

Q: What is the influence function in causal inference?

The influence function, denoted $\psi(L_i)$, captures the first-order asymptotic behavior of an estimator. Let $\hat\tau_n$ be a generic estimator of the target parameter $\tau$. It is asymptotically linear with influence function $\psi$ if $\hat\tau_n - \tau = n^{-1} \sum_{i=1}^{n} \psi(L_i) + o_P(n^{-1/2})$. The influence function therefore characterizes the asymptotic distribution of the estimator and is the basis for constructing confidence intervals for the target parameter.
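As a small illustration of how an influence function leads to a confidence interval, here is a sketch for the simplest case, the sample mean, whose influence function is $Y_i - \tau$. The function name and the simulated data are assumptions made for the example, not part of the paper.

```python
import numpy as np

def influence_function_ci(psi_values, estimate, z=1.96):
    """Wald-type interval from estimated influence-function values psi(L_i)."""
    psi = np.asarray(psi_values, dtype=float)
    n = psi.size
    # Asymptotic linearity implies Var(estimator) is approximately E(psi^2) / n.
    se = np.sqrt(np.mean(psi ** 2) / n)
    return estimate - z * se, estimate + z * se

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=500)
tau_hat = y.mean()
psi_hat = y - tau_hat  # estimated influence function of the sample mean
lower, upper = influence_function_ci(psi_hat, tau_hat)
print(lower, upper)
```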

Q: What is the ACE and how is it estimated in the context of estimating average causal effects?

The average causal effect (ACE) is the expected difference between the potential outcomes under treatment and under control, $E\{Y(1) - Y(0)\}$. It is commonly estimated by outcome regression, inverse probability weighting (IPW), augmented inverse probability weighting (AIPW), or matching. The outcome regression estimator averages the difference between the fitted outcome means under treatment and control, and relies on a correctly specified outcome model. The IPW estimator takes a weighted average of the outcomes, with weights based on the propensity score, and relies on a correctly specified propensity score model. The AIPW estimator combines the two and remains consistent if either model is correctly specified. Matching estimators impute each unit's missing potential outcome from its nearest neighbors in the opposite treatment group. These estimators are asymptotically linear, and their influence functions are given in the supplementary material.
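The following sketch computes the outcome regression, IPW and AIPW estimators on simulated data, to make the descriptions above concrete. The data-generating process, the linear and logistic working models, and all function names are assumptions chosen for illustration; the matching estimator is omitted for brevity.

```python
import numpy as np

def fit_logistic(D, a, n_iter=25):
    """Logistic regression by Newton-Raphson; D must include an intercept column."""
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ beta))
        grad = D.T @ (a - p)
        hess = (D * (p * (1 - p))[:, None]).T @ D
        beta += np.linalg.solve(hess, grad)
    return beta

def ace_estimators(X, a, y):
    """Return outcome regression, IPW and AIPW estimates of the ACE."""
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    # Estimated propensity score e(X) = P(A = 1 | X).
    e = 1.0 / (1.0 + np.exp(-D @ fit_logistic(D, a)))
    # Linear outcome models fitted separately in each treatment arm.
    b1 = np.linalg.lstsq(D[a == 1], y[a == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(D[a == 0], y[a == 0], rcond=None)[0]
    mu1, mu0 = D @ b1, D @ b0
    reg = np.mean(mu1 - mu0)
    ipw = np.mean(a * y / e - (1 - a) * y / (1 - e))
    aipw = np.mean(mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e))
    return {"reg": reg, "ipw": ipw, "aipw": aipw}

# Simulated example in which the true ACE equals 2.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
e_true = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
a = rng.binomial(1, e_true)
y = 1.0 + X[:, 0] + X[:, 1] + 2.0 * a + rng.normal(size=n)
est = ace_estimators(X, a, y)
print(est)
```

Because both working models are correctly specified in this simulation, all three estimates should land near 2.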

Q: What is the purpose of multiple imputation in handling missing data?

Multiple imputation (MI) creates multiple complete data sets by filling in the missing values with draws from the posterior predictive distribution. A full sample estimator can then be applied to each imputed data set as if the data were complete. Rubin's combining rule summarizes the results across the imputed data sets, giving an MI point estimator and a variance estimator. This lets researchers estimate parameters and carry out standard analyses despite the presence of missing data.
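Rubin's combining rule can be sketched as follows. The function name and the toy inputs are hypothetical; the formula is the standard one, with total variance equal to the within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine m per-imputation estimates and their variance estimates."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()             # MI point estimate
    within = u.mean()            # average within-imputation variance
    between = q.var(ddof=1)      # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

point, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(point, total_var)
```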

Q: How does missingness at random (MAR) affect ACE estimation?

When the confounders X contain missing values, the full sample estimators (2.3)-(2.6) are not feasible to calculate, so estimating the ACE requires a further assumption about the missingness. Following the empirical literature, the framework imposes missingness at random (MAR). Write $X_{R_X}$ and $X_{\bar{R}_X}$ for the observed and missing parts of X. Assumption 3 states that $X_{\bar{R}_X} \perp R_X \mid Z_{\mathrm{obs}}$: given the observed data, the missingness pattern carries no further information about the missing values. Under this assumption the missing values can be imputed from their conditional distribution given the observed data, and the ACE can then be estimated from the imputed data sets.
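To see what MAR means in practice, the toy simulation below makes the chance of observing one variable depend only on another, always-observed variable. Everything in it (variable names, the logistic missingness model, single regression imputation in place of full MI) is an assumption made for illustration and is not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x1 = rng.normal(size=n)          # always observed
x2 = x1 + rng.normal(size=n)     # subject to missingness; its true mean is 0

# MAR: the chance that x2 is observed depends only on the observed x1.
p_observed = 1.0 / (1.0 + np.exp(-(0.2 + x1)))
r = rng.binomial(1, p_observed).astype(bool)

# Complete-case analysis ignores the missingness mechanism and is biased here.
complete_case_mean = x2[r].mean()

# Regression imputation uses the observed x1 to fill in the missing x2.
slope, intercept = np.polyfit(x1[r], x2[r], deg=1)
x2_filled = np.where(r, x2, intercept + slope * x1)
imputed_mean = x2_filled.mean()
print(complete_case_mean, imputed_mean)
```

Under this mechanism the complete-case mean is pulled away from zero, while the imputation-based mean stays close to it, because the relationship between x2 and x1 is the same among observed and missing units.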

Q: What is the wild bootstrap procedure for estimating the variance of $\hat\tau_{\mathrm{MI}}$?

The wild bootstrap, proposed by Wu (1986) and Liu (1988), is used to estimate the variance of $\hat\tau_{\mathrm{MI}}$. It involves two steps.

Step 1. Sample $u_k$ for $k = 1, \ldots, n + nm$ such that $E(u_k \mid Z_{\mathrm{obs}}) = 0$, $E(u_k^2 \mid Z_{\mathrm{obs}}) = 1$, and $E(u_k^4 \mid Z_{\mathrm{obs}}) < \infty$.

Step 2. Compute the bootstrap replicate $T^* = n^{-1/2} \sum_{k=1}^{n+nm} \xi_{n,k} u_k$, where $\xi_{n,k} = n^{-1/2} [E\{\psi(L_i) \mid Z_{\mathrm{obs}}, \theta\} + \Gamma I_{\mathrm{obs}}^{-1} S(\theta; Z_{\mathrm{obs},i})]$ if $k = i$, and $\xi_{n,k} = n^{-1/2} m^{-1} [\psi(L_i^{*(j)}) - E\{\psi(L_i) \mid Z_{\mathrm{obs}}, \theta\}]$ if $k = n + (i - 1)m + j$.

The procedure is not sensitive to the choice of the sampling distribution of $u_k$; the standard normal distribution, Mammen's two-point distribution, or the Poisson distribution can be used. Its validity for estimating the variance of $\hat\tau_{\mathrm{MI}}$ is established in Theorem 1 under regularity assumptions.
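The two steps can be sketched as follows, assuming the terms $\xi_{n,k}$ have already been computed and stored in an array of length n + nm. The function name, the use of standard normal weights, and the placeholder values of xi are assumptions for illustration; computing the actual $\xi_{n,k}$ requires the model-specific quantities described above.

```python
import numpy as np

def wild_bootstrap_variance(xi, n, n_boot=1000, seed=0):
    """Variance of bootstrap replicates T* = n^{-1/2} * sum_k xi_k * u_k."""
    rng = np.random.default_rng(seed)
    xi = np.asarray(xi, dtype=float)
    # Step 1: standard normal weights have mean 0, variance 1, finite 4th moment.
    u = rng.standard_normal((n_boot, xi.size))
    # Step 2: one replicate per row of weights.
    t_star = n ** -0.5 * (u @ xi)
    return t_star.var(ddof=1), t_star

# Placeholder xi values standing in for the estimated martingale differences.
n, m = 100, 2
xi = np.random.default_rng(3).normal(size=n + n * m)
var_hat, replicates = wild_bootstrap_variance(xi, n=n, n_boot=2000)
print(var_hat, np.sum(xi ** 2) / n)  # the two numbers should be close
```

Quantile-based intervals can be read off the empirical distribution of the replicates, while Wald-type intervals use the variance estimate directly.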

Q: How can multiple imputation be extended for complex sampling?

Multiple imputation can be extended to complex sampling by incorporating approximate Bayesian computation techniques, as proposed by Kim and Yang (2017) and Wang et al. (2018). Such an extension would require adapting the martingale representation to the complex sampling design, and is left as a direction for future research.
