Accurate genome relative abundance estimation for closely related species in a metagenomic sample

Michael B. Sohn, Lingling An, Naruekamol Pookhao, Qike Li

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

Background: Metagenomics has great potential to reveal previously unattainable information about microbial communities. An important prerequisite for such discoveries is an accurate estimate of the composition of those communities. Most prevalent homology-based approaches rely solely on the results of an alignment tool such as BLAST, which limits their estimation accuracy to high ranks of the taxonomy tree.

Results: We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which uses similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets that differ in the complexity of their microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantifying the genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn's disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimates of taxonomic composition at the species/strain level, narrowing down which species/strains need more attention in the study of the oral cavity and Crohn's disease.

Conclusions: By taking account of similarity in the genomic sequence, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.

Original language: English (US)
Article number: 242
Journal: BMC Bioinformatics
Volume: 15
Issue number: 1
DOIs: https://doi.org/10.1186/1471-2105-15-242
State: Published - Jul 16 2014

Keywords

  • Alignment similarity
  • Closely related species
  • Genomic similarity
  • Metagenomics

ASJC Scopus subject areas

  • Applied Mathematics
  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications

Cite this

Accurate genome relative abundance estimation for closely related species in a metagenomic sample. / Sohn, Michael B.; An, Lingling; Pookhao, Naruekamol; Li, Qike.

In: BMC Bioinformatics, Vol. 15, No. 1, 242, 16.07.2014.


@article{d2a12c4d91a540649eccba4f06766562,
title = "Accurate genome relative abundance estimation for closely related species in a metagenomic sample",
abstract = "Background: Metagenomics has a great potential to discover previously unattainable information about microbial communities. An important prerequisite for such discoveries is to accurately estimate the composition of microbial communities. Most of prevalent homology-based approaches utilize solely the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree. Results: We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets of diverse complexity of microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantification of genomes in a given microbial sample. We also applied TAEC on two real metagenomic datasets, oral cavity dataset and Crohn's disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimation of taxonomic compositions at the species/strain level, narrowing down which species/strains need more attention in the study of oral cavity and the Crohn's disease. Conclusions: By taking account of the similarity in the genomic sequence TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.",
keywords = "Alignment similarity, Closely related species, Genomic similarity, Metagenomics",
author = "Sohn, {Michael B.} and Lingling An and Naruekamol Pookhao and Qike Li",
year = "2014",
month = "7",
day = "16",
doi = "10.1186/1471-2105-15-242",
language = "English (US)",
volume = "15",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Accurate genome relative abundance estimation for closely related species in a metagenomic sample

AU - Sohn, Michael B.

AU - An, Lingling

AU - Pookhao, Naruekamol

AU - Li, Qike

PY - 2014/7/16

Y1 - 2014/7/16

AB - Background: Metagenomics has a great potential to discover previously unattainable information about microbial communities. An important prerequisite for such discoveries is to accurately estimate the composition of microbial communities. Most of prevalent homology-based approaches utilize solely the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree. Results: We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets of diverse complexity of microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantification of genomes in a given microbial sample. We also applied TAEC on two real metagenomic datasets, oral cavity dataset and Crohn's disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimation of taxonomic compositions at the species/strain level, narrowing down which species/strains need more attention in the study of oral cavity and the Crohn's disease. Conclusions: By taking account of the similarity in the genomic sequence TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.

KW - Alignment similarity

KW - Closely related species

KW - Genomic similarity

KW - Metagenomics

UR - http://www.scopus.com/inward/record.url?scp=84904271039&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904271039&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-15-242

DO - 10.1186/1471-2105-15-242

M3 - Article

C2 - 25027647

AN - SCOPUS:84904271039

VL - 15

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 242

ER -