Cox regression analysis with missing covariates via nonparametric multiple imputation

Chiu-Hsieh Hsu, Mandi Yu

Research output: Contribution to journal (Article)

Abstract

We consider the situation of estimating Cox regression in which some covariates are subject to missingness and there is additional information (including the observed event time, the censoring indicator, and fully observed covariates) that may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the probabilities of missingness. For each missing covariate observation, these two working models are used to define a nearest-neighbor imputing set. This set is then used to nonparametrically impute covariate values for the missing observation. Upon completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and with the predictive mean matching (PMM) imputation method. We show that all approaches can reduce bias due to a non-ignorable missingness mechanism. The proposed nonparametric imputation method is robust to misspecification of either one of the two working models and to misspecification of the link functions of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation, and the AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology, and End Results (SEER) Program.
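The imputation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It takes as given the fitted scores from the two working models (a predicted covariate value and a predicted missingness score for every subject, assumed to come from models fit on the event time, censoring indicator and fully observed covariates), standardizes both, and forms each missing unit's imputing set from the `k` observed units closest in a weighted Euclidean distance; the imputed value is copied from a randomly chosen member of that set. The helper names (`standardize`, `nn_impute`, `rubin_pool`), the weight `w_pred`, and the set size `k` are hypothetical choices, and the bootstrap step that proper multiple imputation would normally use to carry working-model uncertainty is omitted. Fitting the Cox model to each completed dataset needs a survival package and is left out; `rubin_pool` shows the standard Rubin's rules combination of the resulting coefficient estimates.

```python
import math
import random
import statistics


def standardize(values):
    """Center and scale scores to mean 0 and SD 1 (an SD of 0 is treated as 1)."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values) or 1.0
    return [(v - mu) / sd for v in values]


def nn_impute(x, pred_score, miss_score, k=5, w_pred=0.5, rng=None):
    """Return a copy of x with each None replaced by a value drawn from its
    imputing set: the k observed units nearest on the two working-model scores.

    pred_score: fitted covariate values from the covariate working model (assumed).
    miss_score: fitted missingness scores from the missingness working model (assumed).
    w_pred:     hypothetical weight on the covariate score; 1 - w_pred goes to
                the missingness score.
    """
    rng = rng or random.Random()
    zp = standardize(pred_score)
    zm = standardize(miss_score)
    donors = [j for j, v in enumerate(x) if v is not None]
    out = list(x)
    for i, v in enumerate(x):
        if v is not None:
            continue
        # Weighted Euclidean distance from missing unit i to every observed donor.
        dists = sorted(
            (math.sqrt(w_pred * (zp[i] - zp[j]) ** 2
                       + (1 - w_pred) * (zm[i] - zm[j]) ** 2), j)
            for j in donors
        )
        imputing_set = [j for _, j in dists[:k]]
        # Nonparametric draw: copy the observed covariate of a random neighbor.
        out[i] = x[rng.choice(imputing_set)]
    return out


def rubin_pool(estimates, variances):
    """Pool M completed-data estimates of one coefficient with Rubin's rules."""
    m = len(estimates)
    qbar = statistics.mean(estimates)    # pooled point estimate
    ubar = statistics.mean(variances)    # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (m >= 2)
    return qbar, ubar + (1 + 1 / m) * b  # estimate and total variance


if __name__ == "__main__":
    rng = random.Random(1)
    x = [1.0, 2.0, None, 4.0, 5.0, None]       # toy covariate with two gaps
    pred = [1.1, 2.0, 2.9, 4.1, 5.0, 4.8]      # toy covariate-model scores
    miss = [0.2, 0.1, 0.3, 0.5, 0.4, 0.6]      # toy missingness-model scores
    for _ in range(5):                         # five completed datasets
        print(nn_impute(x, pred, miss, k=2, w_pred=0.8, rng=rng))
```

Setting `w_pred=1.0` drops the missingness score, so matching uses only the predicted covariate, which resembles predictive mean matching. Keeping both scores in the distance reflects the abstract's idea that either working model can carry the matching.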

Original language: English (US)
Pages (from-to): 1676-1688
Number of pages: 13
Journal: Statistical Methods in Medical Research
Volume: 28
Issue number: 6
DOI: 10.1177/0962280218772592
State: Published - Jun 1 2019

Keywords

  • Augmented inverse probability weighted method
  • Cox regression
  • missing covariates
  • multiple imputation
  • predictive mean matching

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management

Cite this

Cox regression analysis with missing covariates via nonparametric multiple imputation. / Hsu, Chiu-Hsieh; Yu, Mandi.

In: Statistical Methods in Medical Research, Vol. 28, No. 6, 01.06.2019, p. 1676-1688.

@article{970fabf6207b425eb2cd0a8c53273226,
title = "Cox regression analysis with missing covariates via nonparametric multiple imputation",
keywords = "Augmented inverse probability weighted method, Cox regression, missing covariates, multiple imputation, predictive mean matching",
author = "Chiu-Hsieh Hsu and Mandi Yu",
year = "2019",
month = "6",
day = "1",
doi = "10.1177/0962280218772592",
language = "English (US)",
volume = "28",
pages = "1676--1688",
journal = "Statistical Methods in Medical Research",
issn = "0962-2802",
publisher = "SAGE Publications Ltd",
number = "6",

}

TY - JOUR

T1 - Cox regression analysis with missing covariates via nonparametric multiple imputation

AU - Hsu, Chiu-Hsieh

AU - Yu, Mandi

PY - 2019/6/1

Y1 - 2019/6/1

KW - Augmented inverse probability weighted method

KW - Cox regression

KW - missing covariates

KW - multiple imputation

KW - predictive mean matching

UR - http://www.scopus.com/inward/record.url?scp=85067348774&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067348774&partnerID=8YFLogxK

U2 - 10.1177/0962280218772592

DO - 10.1177/0962280218772592

M3 - Article

C2 - 29717943

AN - SCOPUS:85067348774

VL - 28

SP - 1676

EP - 1688

JO - Statistical Methods in Medical Research

JF - Statistical Methods in Medical Research

SN - 0962-2802

IS - 6

ER -