Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random

Qi Long, Xiaoxi Zhang, Chiu-Hsieh Hsu

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

The receiver operating characteristics (ROC) curve is a widely used tool for evaluating the discriminative and diagnostic power of a biomarker. When the biomarker value is missing for some observations, the ROC analysis based solely on complete cases loses efficiency because of the reduced sample size, and more importantly, it is subject to potential bias. In this paper, we investigate nonparametric multiple imputation methods for ROC analysis when some biomarker values are missing at random and there are auxiliary variables that are fully observed and predictive of biomarker values and/or missingness of biomarker values. Although a direct application of standard nonparametric imputation is robust to model misspecification, its finite sample performance suffers from the curse of dimensionality as the number of auxiliary variables increases. To address this problem, we propose new nonparametric imputation methods, which achieve dimension reduction through the use of one or two working models, namely, models for prediction and propensity scores. The proposed imputation methods provide a platform for a full range of ROC analysis and hence are more flexible than existing methods that primarily focus on estimating the area under the ROC curve. We conduct simulation studies to evaluate the finite sample performance of the proposed methods and find that the proposed methods are robust to various types of model misspecification and outperform the standard nonparametric approach even when the number of auxiliary variables is moderate. We further illustrate the proposed methods by using an observational study of maternal depression during pregnancy.
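
The general idea in the abstract, reducing many auxiliary variables to two working-model scores and imputing from nearest neighbors on those scores, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the linear and logistic working models, the weight w_pred, the neighborhood size K, and the simple averaging of AUC over M imputations are all assumptions made for illustration, and the sketch omits any bootstrap step or variance estimation.

```python
import numpy as np

def fit_logistic(Z, r, iters=25):
    """Logistic regression by Newton-Raphson; Z must include an intercept column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible (illustrative safeguard).
        hess = Z.T @ (Z * w[:, None]) + 1e-8 * np.eye(Z.shape[1])
        beta += np.linalg.solve(hess, Z.T @ (r - p))
    return beta

def standardize(s):
    """Center and scale a score so both scores are on a comparable scale."""
    return (s - s.mean()) / s.std()

def auc(y, d):
    """Empirical AUC (Mann-Whitney form): P(case > control) + 0.5 * P(tie)."""
    diff = y[d == 1][:, None] - y[d == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def nn_multiple_imputation_auc(y, d, X, M=10, K=5, w_pred=0.5, seed=0):
    """Hypothetical sketch: y has NaN where the biomarker is missing,
    d is the 0/1 disease status, X holds fully observed auxiliary variables."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    Z = np.column_stack([np.ones(len(y)), d, X])
    # Working model 1 (assumed linear): predictive score for the biomarker.
    coef, *_ = np.linalg.lstsq(Z[obs], y[obs], rcond=None)
    s_pred = standardize(Z @ coef)
    # Working model 2 (assumed logistic): propensity of being observed,
    # kept on the linear-predictor (logit) scale.
    s_prop = standardize(Z @ fit_logistic(Z, obs.astype(float)))
    donors = np.flatnonzero(obs)
    aucs = []
    for _ in range(M):
        y_imp = y.copy()
        for i in np.flatnonzero(~obs):
            # Weighted distance in the two-dimensional score space.
            dist = np.sqrt(w_pred * (s_pred[donors] - s_pred[i]) ** 2
                           + (1.0 - w_pred) * (s_prop[donors] - s_prop[i]) ** 2)
            nearest = donors[np.argsort(dist)[:K]]
            # Impute with the observed value of one randomly drawn neighbor.
            y_imp[i] = y[rng.choice(nearest)]
        aucs.append(auc(y_imp, d))
    # Point estimate: average of the AUCs across the M imputed data sets.
    return float(np.mean(aucs))
```

Because each imputed data set is complete, any ROC summary (not only the AUC) could be computed on it in place of the auc call; the AUC is used here only to keep the sketch short.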

Original language: English (US)
Pages (from-to): 3149-3161
Number of pages: 13
Journal: Statistics in Medicine
Volume: 30
Issue number: 26
DOI: 10.1002/sim.4338
State: Published - Nov 20 2011

Fingerprint

Missing at Random
Multiple Imputation
Operating Characteristics
Biomarkers
ROC Curve
Receiver
Auxiliary Variables
Imputation
Receiver Operating Characteristic Curve
Propensity Score
Model Misspecification
Observational Study
Curse of Dimensionality
Pregnancy
Dimension Reduction
Sample Size
Observational Studies
Diagnostics
Mothers
Simulation Study

Keywords

  • Area under curve
  • Bootstrap methods
  • Dimension reduction
  • Multiple imputation
  • Nearest neighbor methods
  • Nonparametric imputation
  • Receiver operating characteristics curve

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random. / Long, Qi; Zhang, Xiaoxi; Hsu, Chiu-Hsieh.

In: Statistics in Medicine, Vol. 30, No. 26, 20.11.2011, p. 3149-3161.

@article{604eb88f39904c5db67521e6fa3a43bd,
title = "Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random",
abstract = "The receiver operating characteristics (ROC) curve is a widely used tool for evaluating the discriminative and diagnostic power of a biomarker. When the biomarker value is missing for some observations, the ROC analysis based solely on complete cases loses efficiency because of the reduced sample size, and more importantly, it is subject to potential bias. In this paper, we investigate nonparametric multiple imputation methods for ROC analysis when some biomarker values are missing at random and there are auxiliary variables that are fully observed and predictive of biomarker values and/or missingness of biomarker values. Although a direct application of standard nonparametric imputation is robust to model misspecification, its finite sample performance suffers from the curse of dimensionality as the number of auxiliary variables increases. To address this problem, we propose new nonparametric imputation methods, which achieve dimension reduction through the use of one or two working models, namely, models for prediction and propensity scores. The proposed imputation methods provide a platform for a full range of ROC analysis and hence are more flexible than existing methods that primarily focus on estimating the area under the ROC curve. We conduct simulation studies to evaluate the finite sample performance of the proposed methods and find that the proposed methods are robust to various types of model misspecification and outperform the standard nonparametric approach even when the number of auxiliary variables is moderate. We further illustrate the proposed methods by using an observational study of maternal depression during pregnancy.",
keywords = "Area under curve, Bootstrap methods, Dimension reduction, Multiple imputation, Nearest neighbor methods, Nonparametric imputation, Receiver operating characteristics curve",
author = "Qi Long and Xiaoxi Zhang and Chiu-Hsieh Hsu",
year = "2011",
month = "11",
day = "20",
doi = "10.1002/sim.4338",
language = "English (US)",
volume = "30",
pages = "3149--3161",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "26"
}

TY - JOUR

T1 - Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random

AU - Long, Qi

AU - Zhang, Xiaoxi

AU - Hsu, Chiu-Hsieh

PY - 2011/11/20

Y1 - 2011/11/20

AB - The receiver operating characteristics (ROC) curve is a widely used tool for evaluating the discriminative and diagnostic power of a biomarker. When the biomarker value is missing for some observations, the ROC analysis based solely on complete cases loses efficiency because of the reduced sample size, and more importantly, it is subject to potential bias. In this paper, we investigate nonparametric multiple imputation methods for ROC analysis when some biomarker values are missing at random and there are auxiliary variables that are fully observed and predictive of biomarker values and/or missingness of biomarker values. Although a direct application of standard nonparametric imputation is robust to model misspecification, its finite sample performance suffers from the curse of dimensionality as the number of auxiliary variables increases. To address this problem, we propose new nonparametric imputation methods, which achieve dimension reduction through the use of one or two working models, namely, models for prediction and propensity scores. The proposed imputation methods provide a platform for a full range of ROC analysis and hence are more flexible than existing methods that primarily focus on estimating the area under the ROC curve. We conduct simulation studies to evaluate the finite sample performance of the proposed methods and find that the proposed methods are robust to various types of model misspecification and outperform the standard nonparametric approach even when the number of auxiliary variables is moderate. We further illustrate the proposed methods by using an observational study of maternal depression during pregnancy.

KW - Area under curve

KW - Bootstrap methods

KW - Dimension reduction

KW - Multiple imputation

KW - Nearest neighbor methods

KW - Nonparametric imputation

KW - Receiver operating characteristics curve

UR - http://www.scopus.com/inward/record.url?scp=80055065425&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80055065425&partnerID=8YFLogxK

U2 - 10.1002/sim.4338

DO - 10.1002/sim.4338

M3 - Article

C2 - 22025311

AN - SCOPUS:80055065425

VL - 30

SP - 3149

EP - 3161

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 26

ER -