Statistical analysis and handling of missing data in cluster randomized trials: A systematic review

Mallorie H. Fiero, Shuang Huang, Eyal Oren, Melanie L. Bell

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

Background: Cluster randomized trials (CRTs) randomize participants in groups rather than as individuals, and are key tools for assessing interventions in health research when treatment contamination is likely or individual randomization is not feasible. Two major potential pitfalls in CRTs are inadequate handling of missing data and failure to account for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs.

Methods: We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and the method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level.

Results: Of the 86 included CRTs, 80 (93%) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19% (range 0.5 to 90%). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55%), whereas 18 (22%) used mixed models, six (8%) used single imputation, four (5%) used unweighted generalized estimating equations, and two (2%) used multiple imputation. Fourteen (16%) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78%) trials accounted for clustering in the primary analysis.

Conclusions: High rates of missing outcome data are present in the majority of CRTs, yet handling of missing data in practice remains suboptimal. Researchers and applied statisticians should use appropriate missing data methods that are valid under plausible assumptions, in order to increase statistical power in trials and reduce the possibility of bias. Sensitivity analyses should be performed with weakened assumptions regarding the missing data mechanism to explore the robustness of the results reported in the primary analysis.
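The percentages in the Results can be cross-checked against the reported counts. The sketch below is illustrative only: the counts and percentages are taken from the abstract, but the denominator assigned to each item (86 included trials versus the 80 trials reporting missing data) is an assumption inferred from the wording, and the helper name `is_consistent` is hypothetical.

```python
# Counts and percentages are copied from the abstract. The denominator for
# each item is an ASSUMPTION inferred from the wording, not stated explicitly.
TOTAL_TRIALS = 86          # all included CRTs
TRIALS_WITH_MISSING = 80   # CRTs that reported some missing outcome data

# (label, count, assumed denominator, reported percent)
REPORTED = [
    ("reported missing outcome data", 80, TOTAL_TRIALS, 93),
    ("complete case analysis", 44, TRIALS_WITH_MISSING, 55),
    ("mixed models", 18, TRIALS_WITH_MISSING, 22),
    ("single imputation", 6, TRIALS_WITH_MISSING, 8),
    ("unweighted GEE", 4, TRIALS_WITH_MISSING, 5),
    ("multiple imputation", 2, TRIALS_WITH_MISSING, 2),
    ("sensitivity analysis reported", 14, TOTAL_TRIALS, 16),
    ("accounted for clustering", 67, TOTAL_TRIALS, 78),
]


def is_consistent(count, denominator, reported_pct, tolerance=0.5):
    """Return True if count/denominator matches the reported whole-number
    percentage to within half a percentage point (i.e. within rounding)."""
    return abs(100 * count / denominator - reported_pct) <= tolerance


results = {
    label: is_consistent(count, denom, pct)
    for label, count, denom, pct in REPORTED
}

for label, count, denom, pct in REPORTED:
    computed = 100 * count / denom
    print(f"{label}: {count}/{denom} = {computed:.1f}% (reported {pct}%)")
```

Under these assumed denominators, every reported percentage falls within rounding of the computed value, which suggests that the handling-method percentages are relative to the 80 trials with missing data and the remaining percentages are relative to all 86 trials.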

Original language: English (US)
Article number: 72
Journal: Trials
Volume: 17
Issue number: 1
DOI: 10.1186/s13063-016-1201-z
State: Published - Feb 9 2016

Keywords

  • Cluster randomized trials
  • Dropout
  • Missing data
  • Sensitivity analysis

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Pharmacology (medical)

Cite this

Statistical analysis and handling of missing data in cluster randomized trials: A systematic review. / Fiero, Mallorie H.; Huang, Shuang; Oren, Eyal; Bell, Melanie L.

In: Trials, Vol. 17, No. 1, 72, 09.02.2016.

@article{ad701f8cc02245edbcfa2c5aca894937,
title = "Statistical analysis and handling of missing data in cluster randomized trials: A systematic review",
abstract = "Background: Cluster randomized trials (CRTs) randomize participants in groups, rather than as individuals and are key tools used to assess interventions in health research where treatment contamination is likely or if individual randomization is not feasible. Two potential major pitfalls exist regarding CRTs, namely handling missing data and not accounting for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs. Methods: We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level. Results: Of the 86 included CRTs, 80 (93{\%}) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19{\%} (range 0.5 to 90{\%}). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55{\%}), whereas 18 (22{\%}) used mixed models, six (8{\%}) used single imputation, four (5{\%}) used unweighted generalized estimating equations, and two (2{\%}) used multiple imputation. Fourteen (16{\%}) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78{\%}) trials accounted for clustering in the primary analysis. Conclusions: High rates of missing outcome data are present in the majority of CRTs, yet handling missing data in practice remains suboptimal. 
Researchers and applied statisticians should carry out appropriate missing data methods, which are valid under plausible assumptions in order to increase statistical power in trials and reduce the possibility of bias. Sensitivity analysis should be performed, with weakened assumptions regarding the missing data mechanism to explore the robustness of results reported in the primary analysis.",
keywords = "Cluster randomized trials, Dropout, Missing data, Sensitivity analysis",
author = "Fiero, {Mallorie H.} and Huang, Shuang and Oren, Eyal and Bell, {Melanie L.}",
year = "2016",
month = "2",
day = "9",
doi = "10.1186/s13063-016-1201-z",
language = "English (US)",
volume = "17",
journal = "Trials",
issn = "1745-6215",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Statistical analysis and handling of missing data in cluster randomized trials

T2 - A systematic review

AU - Fiero, Mallorie H.

AU - Huang, Shuang

AU - Oren, Eyal

AU - Bell, Melanie L.

PY - 2016/2/9

Y1 - 2016/2/9

AB - Background: Cluster randomized trials (CRTs) randomize participants in groups, rather than as individuals and are key tools used to assess interventions in health research where treatment contamination is likely or if individual randomization is not feasible. Two potential major pitfalls exist regarding CRTs, namely handling missing data and not accounting for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs. Methods: We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level. Results: Of the 86 included CRTs, 80 (93%) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19% (range 0.5 to 90%). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55%), whereas 18 (22%) used mixed models, six (8%) used single imputation, four (5%) used unweighted generalized estimating equations, and two (2%) used multiple imputation. Fourteen (16%) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78%) trials accounted for clustering in the primary analysis. Conclusions: High rates of missing outcome data are present in the majority of CRTs, yet handling missing data in practice remains suboptimal. 
Researchers and applied statisticians should carry out appropriate missing data methods, which are valid under plausible assumptions in order to increase statistical power in trials and reduce the possibility of bias. Sensitivity analysis should be performed, with weakened assumptions regarding the missing data mechanism to explore the robustness of results reported in the primary analysis.

KW - Cluster randomized trials

KW - Dropout

KW - Missing data

KW - Sensitivity analysis

UR - http://www.scopus.com/inward/record.url?scp=84957882755&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84957882755&partnerID=8YFLogxK

U2 - 10.1186/s13063-016-1201-z

DO - 10.1186/s13063-016-1201-z

M3 - Article

C2 - 26862034

AN - SCOPUS:84957882755

VL - 17

JO - Trials

JF - Trials

SN - 1745-6215

IS - 1

M1 - 72

ER -