Identification and quantitation of clinically relevant microbes in patient samples: Comparison of three k-mer based classifiers for speed, accuracy, and sensitivity

George S. Watts, James E. Thornton, Ken Youens-Clark, Alise J. Ponsero, Marvin J. Slepian, Emmanuel Menashi, Charles Hu, Wuquan Deng, David G. Armstrong, Spenser Reed, Lee D. Cranmer, Bonnie L. Hurwitz

Research output: Contribution to journal (Article)

Abstract

Infections are a serious health concern worldwide, particularly in vulnerable populations such as the immunocompromised, the elderly, and the young. Advances in the availability and speed of metagenomic sequencing, together with its decreasing cost, offer the opportunity to supplement or even replace culture-based identification of pathogens with DNA sequence-based diagnostics. Adopting metagenomic analysis for clinical use requires that all aspects of the workflow be optimized and tested, including data analysis and its computational time and resources. We tested the accuracy, sensitivity, and resource requirements of three leading metagenomic taxonomic classifiers that use fast k-mer based algorithms: Centrifuge, CLARK, and KrakenUniq. In binary mixtures of bacteria, all three reliably identified organisms down to 1% relative abundance, but only Centrifuge and CLARK produced accurate relative abundance estimates. In a mock bacterial community of 20 organisms, all three classifiers identified the organisms present in their default databases, but only Centrifuge had no false positives. In addition, Centrifuge required far fewer computational resources and less time for analysis. In metagenomes obtained from samples of ventilator-associated pneumonia (VAP), infected diabetic foot ulcers (DFUs), and febrile neutropenia (FN), Centrifuge identified pathogenic bacteria and one virus that were corroborated by culture or a clinical PCR assay. Importantly, in both diabetic foot ulcer patients, metagenomic sequencing identified pathogens 4-6 weeks before culture. Finally, we show that Centrifuge results were minimally affected by eliminating the time-consuming read quality control and host screening steps.

Original language: English (US)
Article number: e1006863
Journal: PLoS Computational Biology
Volume: 15
Issue number: 11
DOI: 10.1371/journal.pcbi.1006863
State: Published - Jan 1 2019

Fingerprint

Metagenomics
Centrifuges
Classifiers
microorganisms
Bacteria
Sequencing
Resources
Binary Mixtures
Metagenome
Quality Control
False Positive
sampling
DNA Sequence
Pathogens
Diabetic Foot
Work Flow
Virus

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics

Cite this

Identification and quantitation of clinically relevant microbes in patient samples: Comparison of three k-mer based classifiers for speed, accuracy, and sensitivity. / Watts, George S.; Thornton, James E.; Youens-Clark, Ken; Ponsero, Alise J.; Slepian, Marvin J.; Menashi, Emmanuel; Hu, Charles; Deng, Wuquan; Armstrong, David G.; Reed, Spenser; Cranmer, Lee D.; Hurwitz, Bonnie L.

In: PLoS computational biology, Vol. 15, No. 11, e1006863, 01.01.2019.

@article{dd344cc98bd94d82a4fa99cb702a4559,
title = "Identification and quantitation of clinically relevant microbes in patient samples: Comparison of three k-mer based classifiers for speed, accuracy, and sensitivity",
abstract = "Infections are a serious health concern worldwide, particularly in vulnerable populations such as the immunocompromised, elderly, and young. Advances in metagenomic sequencing availability, speed, and decreased cost offer the opportunity to supplement or even replace culture-based identification of pathogens with DNA sequence-based diagnostics. Adopting metagenomic analysis for clinical use requires that all aspects of the workflow are optimized and tested, including data analysis and computational time and resources. We tested the accuracy, sensitivity, and resource requirements of three top metagenomic taxonomic classifiers that use fast k-mer based algorithms: Centrifuge, CLARK, and KrakenUniq. Binary mixtures of bacteria showed all three reliably identified organisms down to 1{\%} relative abundance, while only the relative abundance estimates of Centrifuge and CLARK were accurate. All three classifiers identified the organisms present in their default databases from a mock bacterial community of 20 organisms, but only Centrifuge had no false positives. In addition, Centrifuge required far less computational resources and time for analysis. Centrifuge analysis of metagenomes obtained from samples of VAP, infected DFUs, and FN showed Centrifuge identified pathogenic bacteria and one virus that were corroborated by culture or a clinical PCR assay. Importantly, in both diabetic foot ulcer patients, metagenomic sequencing identified pathogens 4-6 weeks before culture. Finally, we show that Centrifuge results were minimally affected by elimination of time-consuming read quality control and host screening steps.",
author = "Watts, {George S.} and Thornton, {James E.} and Ken Youens-Clark and Ponsero, {Alise J.} and Slepian, {Marvin J.} and Emmanuel Menashi and Charles Hu and Wuquan Deng and Armstrong, {David G.} and Spenser Reed and Cranmer, {Lee D.} and Hurwitz, {Bonnie L.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1371/journal.pcbi.1006863",
language = "English (US)",
volume = "15",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "11",

}

TY - JOUR

T1 - Identification and quantitation of clinically relevant microbes in patient samples

T2 - Comparison of three k-mer based classifiers for speed, accuracy, and sensitivity

AU - Watts, George S.

AU - Thornton, James E.

AU - Youens-Clark, Ken

AU - Ponsero, Alise J.

AU - Slepian, Marvin J.

AU - Menashi, Emmanuel

AU - Hu, Charles

AU - Deng, Wuquan

AU - Armstrong, David G.

AU - Reed, Spenser

AU - Cranmer, Lee D.

AU - Hurwitz, Bonnie L.

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Infections are a serious health concern worldwide, particularly in vulnerable populations such as the immunocompromised, elderly, and young. Advances in metagenomic sequencing availability, speed, and decreased cost offer the opportunity to supplement or even replace culture-based identification of pathogens with DNA sequence-based diagnostics. Adopting metagenomic analysis for clinical use requires that all aspects of the workflow are optimized and tested, including data analysis and computational time and resources. We tested the accuracy, sensitivity, and resource requirements of three top metagenomic taxonomic classifiers that use fast k-mer based algorithms: Centrifuge, CLARK, and KrakenUniq. Binary mixtures of bacteria showed all three reliably identified organisms down to 1% relative abundance, while only the relative abundance estimates of Centrifuge and CLARK were accurate. All three classifiers identified the organisms present in their default databases from a mock bacterial community of 20 organisms, but only Centrifuge had no false positives. In addition, Centrifuge required far less computational resources and time for analysis. Centrifuge analysis of metagenomes obtained from samples of VAP, infected DFUs, and FN showed Centrifuge identified pathogenic bacteria and one virus that were corroborated by culture or a clinical PCR assay. Importantly, in both diabetic foot ulcer patients, metagenomic sequencing identified pathogens 4-6 weeks before culture. Finally, we show that Centrifuge results were minimally affected by elimination of time-consuming read quality control and host screening steps.

UR - http://www.scopus.com/inward/record.url?scp=85076108681&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85076108681&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1006863

DO - 10.1371/journal.pcbi.1006863

M3 - Article

C2 - 31756192

AN - SCOPUS:85076108681

VL - 15

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 11

M1 - e1006863

ER -