A myriad of methods: Calculated sample size for two proportions was dependent on the choice of sample size formula and software

Melanie L. Bell, Armando Teixeira-Pinto, Joanne E. McKenzie, Jake Olivier

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Objectives: Several methods exist to calculate the sample size for a difference of proportions (risk difference). Researchers are often unaware that different formulae exist, that they rest on different underlying assumptions, and that the choice of formula affects the calculated sample size. The aim of this study was to discuss and compare different sample size formulae for the risk difference.

Study Design and Setting: Four sample size formulae were used to calculate sample sizes for nine scenarios. Software documentation for SAS, Stata, G*Power, PASS, StatXact, and several R libraries was searched for default assumptions, and each package was used to calculate sample sizes for two scenarios.

Results: For a given set of parameters, the calculated sample size varied by as much as 60% depending on the formula used. Varying the software and its assumptions yielded discrepancies of 78% and 7% between the smallest and largest calculated sizes for the two scenarios, respectively. Discrepancies were most pronounced when powering for large risk differences. Default assumptions varied considerably between software packages and were not clearly documented.

Conclusion: Researchers should be aware of the assumptions underlying the power calculations made by different statistical software packages. These assumptions should be stated explicitly in grant proposals and manuscripts and should match the proposed analyses.
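
To illustrate the kind of variation the abstract describes, the sketch below computes per-group sample sizes for a risk difference using three common textbook approximations: the unpooled-variance normal-approximation formula, the pooled-variance formula, and the Fleiss continuity-corrected formula. These are standard formulas chosen for illustration and are not necessarily the four formulae compared in the paper; the scenario (p1 = 0.6, p2 = 0.4, two-sided alpha = 0.05, 80% power) is likewise an assumed example, not one of the paper's nine scenarios.

# Illustrative sketch (not the paper's code): per-group sample size for a
# two-proportion (risk difference) comparison under three common
# approximations. The scenario values are assumptions for demonstration only.
from math import ceil, sqrt
from scipy.stats import norm

def n_unpooled(p1, p2, alpha=0.05, power=0.80):
    """Unpooled-variance normal-approximation formula."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)

def n_pooled(p1, p2, alpha=0.05, power=0.80):
    """Pooled-variance formula (pools the proportions under the null)."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

def n_continuity_corrected(p1, p2, alpha=0.05, power=0.80):
    """Fleiss continuity correction applied to the pooled-variance n."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    return ceil(n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2)

if __name__ == "__main__":
    p1, p2 = 0.6, 0.4  # assumed example proportions
    for label, f in [("unpooled", n_unpooled),
                     ("pooled", n_pooled),
                     ("continuity-corrected", n_continuity_corrected)]:
        print(f"{label}: {f(p1, p2)} per group")

Even in this simple assumed scenario the three formulas disagree (roughly 95, 97, and 107 participants per group), which is the practical point of the abstract: the formula used, and the default a given software package applies, should be reported alongside the power calculation.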

Original language: English (US)
Pages (from-to): 601-605
Number of pages: 5
Journal: Journal of Clinical Epidemiology
Volume: 67
Issue number: 5
DOI: 10.1016/j.jclinepi.2013.10.008
State: Published - 2014

Keywords

  • Binary
  • Continuity correction
  • Difference in proportions
  • Power
  • Risk difference
  • Sample size
  • Statistical software

ASJC Scopus subject areas

  • Epidemiology

Cite this

Bell, M. L., Teixeira-Pinto, A., McKenzie, J. E., & Olivier, J. (2014). A myriad of methods: Calculated sample size for two proportions was dependent on the choice of sample size formula and software. Journal of Clinical Epidemiology, 67(5), 601-605. https://doi.org/10.1016/j.jclinepi.2013.10.008
