Doubly robust nonparametric multiple imputation for ignorable missing data

Qi Long, Chiu-Hsieh Hsu, Yisheng Li

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Missing data are common in medical and social science studies and often pose a serious challenge in data analysis. Multiple imputation methods are popular and natural tools for handling missing data, replacing each missing value with a set of plausible values that represent the uncertainty about the underlying values. We consider a case of missing at random (MAR) and investigate the estimation of the marginal mean of an outcome variable in the presence of missing values when a set of fully observed covariates is available. We propose a new nonparametric multiple imputation (MI) approach that uses two working models to achieve dimension reduction and define the imputing sets for the missing observations. Compared with existing nonparametric imputation procedures, our approach can better handle covariates of high dimension, and is doubly robust in the sense that the resulting estimator remains consistent if either of the working models is correctly specified. Compared with existing doubly robust methods, our nonparametric MI approach is more robust to the misspecification of both working models; it also avoids the use of inverse-weighting and hence is less sensitive to missing probabilities that are close to 1. We propose a sensitivity analysis for evaluating the validity of the working models, allowing investigators to choose the optimal weights so that the resulting estimator relies either completely or more heavily on the working model that is likely to be correctly specified and achieves improved efficiency. We investigate the asymptotic properties of the proposed estimator, and perform simulation studies to show that the proposed method compares favorably with some existing methods in finite samples. The proposed method is further illustrated using data from a colorectal adenoma study.
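The two-working-model idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. The linear outcome model, the logistic missingness model, the standardized scores, the weighted squared distance, the donor count `k`, and every function and parameter name below are assumptions made for illustration. The sketch also omits any bootstrap step and any variance estimation. It only shows how two one-dimensional scores can define nearest-neighbor imputing sets for estimating a marginal mean.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares coefficients for y ~ 1 + X."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def fit_logistic(X, r, iters=25):
    """Newton-Raphson coefficients for a logistic model of P(r = 1 | X)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible (illustrative choice).
        H = A.T @ (A * w[:, None]) + 1e-8 * np.eye(A.shape[1])
        beta += np.linalg.solve(H, A.T @ (r - p))
    return beta

def standardize(s):
    return (s - s.mean()) / s.std()

def nn_multiple_imputation(X, y, observed, w_outcome=0.5, k=5,
                           n_imputations=10, seed=0):
    """Estimate E[Y] by nearest-neighbor MI on two working-model scores."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(X)), X])
    # Working model 1 (assumed linear): outcome regression on complete cases.
    s1 = standardize(A @ fit_linear(X[observed], y[observed]))
    # Working model 2 (assumed logistic): probability of being observed.
    s2 = standardize(A @ fit_logistic(X, observed.astype(float)))
    obs_idx = np.flatnonzero(observed)
    mis_idx = np.flatnonzero(~observed)
    # Weighted squared distance from each missing case to each observed case.
    d = (w_outcome * (s1[mis_idx, None] - s1[None, obs_idx]) ** 2
         + (1.0 - w_outcome) * (s2[mis_idx, None] - s2[None, obs_idx]) ** 2)
    # Imputing set: the k nearest observed cases for every missing case.
    donors = obs_idx[np.argsort(d, axis=1)[:, :k]]
    estimates = []
    for _ in range(n_imputations):
        pick = rng.integers(0, k, size=len(mis_idx))
        y_imp = y.copy()
        y_imp[mis_idx] = y[donors[np.arange(len(mis_idx)), pick]]
        estimates.append(y_imp.mean())
    # Average the per-imputation means into one point estimate.
    return float(np.mean(estimates))

def simulate_mar(n=2000, seed=1):
    """Toy MAR data: true mean of y is 1.0; missingness depends only on X."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    y = 1.0 + X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
    observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0])))
    return X, y, observed

if __name__ == "__main__":
    X, y, observed = simulate_mar()
    print("complete-case mean:", y[observed].mean())
    print("nearest-neighbor MI estimate:", nn_multiple_imputation(X, y, observed))
```

Setting `w_outcome` to 1 or 0 makes the distance rely entirely on one working model, which loosely mirrors the weighting idea behind the sensitivity analysis described in the abstract.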

Original language: English (US)
Pages (from-to): 149-172
Number of pages: 24
Journal: Statistica Sinica
Volume: 22
Issue number: 1
DOI: 10.5705/ss.2010.069
State: Published - Jan 2012

Keywords

  • Doubly robust
  • Missing at random
  • Multiple imputation
  • Nearest neighbor
  • Nonparametric imputation
  • Sensitivity analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Doubly robust nonparametric multiple imputation for ignorable missing data. / Long, Qi; Hsu, Chiu-Hsieh; Li, Yisheng.

In: Statistica Sinica, Vol. 22, No. 1, 01.2012, p. 149-172.

@article{e053c06e5200450faa831ee82dcb60db,
title = "Doubly robust nonparametric multiple imputation for ignorable missing data",
abstract = "Missing data are common in medical and social science studies and often pose a serious challenge in data analysis. Multiple imputation methods are popular and natural tools for handling missing data, replacing each missing value with a set of plausible values that represent the uncertainty about the underlying values. We consider a case of missing at random (MAR) and investigate the estimation of the marginal mean of an outcome variable in the presence of missing values when a set of fully observed covariates is available. We propose a new nonparametric multiple imputation (MI) approach that uses two working models to achieve dimension reduction and define the imputing sets for the missing observations. Compared with existing nonparametric imputation procedures, our approach can better handle covariates of high dimension, and is doubly robust in the sense that the resulting estimator remains consistent if either of the working models is correctly specified. Compared with existing doubly robust methods, our nonparametric MI approach is more robust to the misspecification of both working models; it also avoids the use of inverse-weighting and hence is less sensitive to missing probabilities that are close to 1. We propose a sensitivity analysis for evaluating the validity of the working models, allowing investigators to choose the optimal weights so that the resulting estimator relies either completely or more heavily on the working model that is likely to be correctly specified and achieves improved efficiency. We investigate the asymptotic properties of the proposed estimator, and perform simulation studies to show that the proposed method compares favorably with some existing methods in finite samples. The proposed method is further illustrated using data from a colorectal adenoma study.",
keywords = "Doubly robust, Missing at random, Multiple imputation, Nearest neighbor, Nonparametric imputation, Sensitivity analysis",
author = "Qi Long and Chiu-Hsieh Hsu and Yisheng Li",
year = "2012",
month = "1",
doi = "10.5705/ss.2010.069",
language = "English (US)",
volume = "22",
pages = "149--172",
journal = "Statistica Sinica",
issn = "1017-0405",
publisher = "Institute of Statistical Science",
number = "1"
}

TY - JOUR
T1 - Doubly robust nonparametric multiple imputation for ignorable missing data
AU - Long, Qi
AU - Hsu, Chiu-Hsieh
AU - Li, Yisheng
PY - 2012/1
Y1 - 2012/1
AB - Missing data are common in medical and social science studies and often pose a serious challenge in data analysis. Multiple imputation methods are popular and natural tools for handling missing data, replacing each missing value with a set of plausible values that represent the uncertainty about the underlying values. We consider a case of missing at random (MAR) and investigate the estimation of the marginal mean of an outcome variable in the presence of missing values when a set of fully observed covariates is available. We propose a new nonparametric multiple imputation (MI) approach that uses two working models to achieve dimension reduction and define the imputing sets for the missing observations. Compared with existing nonparametric imputation procedures, our approach can better handle covariates of high dimension, and is doubly robust in the sense that the resulting estimator remains consistent if either of the working models is correctly specified. Compared with existing doubly robust methods, our nonparametric MI approach is more robust to the misspecification of both working models; it also avoids the use of inverse-weighting and hence is less sensitive to missing probabilities that are close to 1. We propose a sensitivity analysis for evaluating the validity of the working models, allowing investigators to choose the optimal weights so that the resulting estimator relies either completely or more heavily on the working model that is likely to be correctly specified and achieves improved efficiency. We investigate the asymptotic properties of the proposed estimator, and perform simulation studies to show that the proposed method compares favorably with some existing methods in finite samples. The proposed method is further illustrated using data from a colorectal adenoma study.

KW - Doubly robust
KW - Missing at random
KW - Multiple imputation
KW - Nearest neighbor
KW - Nonparametric imputation
KW - Sensitivity analysis
UR - http://www.scopus.com/inward/record.url?scp=84863381809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863381809&partnerID=8YFLogxK
U2 - 10.5705/ss.2010.069
DO - 10.5705/ss.2010.069
M3 - Article
VL - 22
SP - 149
EP - 172
JO - Statistica Sinica
JF - Statistica Sinica
SN - 1017-0405
IS - 1
ER -