The impact of automated filtering of BLAST-determined homologs in the phylogenetic detection of horizontal gene transfer from a transcriptome assembly

Jennifer H. Wisecaver, Jeremiah Hackett

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Phylomes (comprehensive sets of gene phylogenies for organisms) are built to investigate fundamental questions in genomics and evolutionary biology, such as those pertaining to the detection and characterization of horizontal gene transfer in microbes. To address these questions, phylome construction demands rigorous yet efficient phylogenetic methods. Currently, many sequence alignment and tree-building methods can analyze several thousand genes in a high-throughput manner. However, the phylogenetics is complicated by variability in sequence divergence and by differences in taxon sampling among genes. In addition, homolog selection for automated approaches often relies on arbitrary sequence similarity thresholds that are unlikely to be appropriate for every gene in a genome. To investigate the effects of automated homolog selection on the detection of horizontal gene transfer using phylogenomics, we constructed the phylome of a transcriptome assembly of Alexandrium tamarense, a microbial eukaryote with a history of horizontal and endosymbiotic gene transfer, using seven sequence similarity thresholds for selecting putative homologs to be included in phylogenetic analyses. We show that no single threshold recovered informative trees for the majority of A. tamarense unigenes compared to the pooled results from all pipeline iterations. As many as 29% of the trees built could have misleading phylogenetic relationships that appear biased in favor of those otherwise indicative of horizontal gene transfer. Perhaps worse, nearly half of the unigenes were represented by a single tree built at just one threshold, making it difficult to assess the validity of the phylogenetic relationships recovered in these cases. However, combining the results from several pipeline iterations maximizes the number of informative phylogenies. Moreover, when the same phylogenetic relationship for a given unigene is recovered in multiple pipeline iterations, conclusions regarding gene origin are more robust to methodological artifact. Using these methods, the majority of A. tamarense unigenes showed evolutionary relationships indicative of vertical inheritance. Nevertheless, many other unigenes revealed diverse phylogenetic associations, suggestive of possible gene transfer. This analysis suggests that caution should be used when interpreting the results from phylogenetic pipelines implementing a single similarity threshold. Our approach is a practical method to mitigate the problems associated with automated sequence selection in phylogenomics.
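The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `pool_iterations`, the threshold labels (`T1`, `T2`, ...), the unigene identifiers, and the relationship labels are all hypothetical, and the input is assumed to be one record per informative tree. The sketch groups trees by unigene across pipeline iterations and marks whether the inferred relationship was recovered at a single threshold, consistently at several thresholds, or inconsistently.

```python
from collections import defaultdict


def pool_iterations(results):
    """Pool per-threshold tree results by unigene.

    `results` is an iterable of (unigene, threshold, relationship) tuples,
    one per informative tree. All three fields are hypothetical labels.
    """
    by_unigene = defaultdict(dict)
    for unigene, threshold, relationship in results:
        by_unigene[unigene][threshold] = relationship

    summary = {}
    for unigene, calls in by_unigene.items():
        distinct = set(calls.values())
        if len(calls) == 1:
            # Only one tree exists, so the relationship cannot be cross-checked.
            status = "single-threshold"
        elif len(distinct) == 1:
            # Same relationship recovered in every iteration that built a tree.
            status = "consistent"
        else:
            status = "conflicting"
        summary[unigene] = {
            "thresholds": sorted(calls),
            "relationships": sorted(distinct),
            "status": status,
        }
    return summary


# Hypothetical input: labels and values are invented for illustration only.
example_results = [
    ("unigene_1", "T1", "vertical"),
    ("unigene_1", "T2", "vertical"),
    ("unigene_2", "T3", "bacterial"),
    ("unigene_3", "T1", "vertical"),
    ("unigene_3", "T4", "bacterial"),
]
summary = pool_iterations(example_results)
for unigene, info in sorted(summary.items()):
    print(unigene, info["status"], info["relationships"])
```

Under these assumptions, a unigene flagged "single-threshold" corresponds to the case the abstract describes as difficult to validate, while "consistent" corresponds to a relationship recovered in multiple iterations.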

Original language: English (US)
Pages (from-to): 184-192
Number of pages: 9
Journal: Molecular Phylogenetics and Evolution
Volume: 71
Issue number: 1
DOI: 10.1016/j.ympev.2013.11.016
State: Published - Feb 2014

Fingerprint

Horizontal Gene Transfer
gene transfer
Transcriptome
phylogenetics
unigenes
phylogeny
Genes
gene
Sequence Alignment
Genomics
Eukaryota
Artifacts
detection
evolutionary biology
eukaryote
Genome

Keywords

  • Dinoflagellates
  • Gene trees
  • Homolog selection
  • Horizontal gene transfer
  • Phylogenomics
  • Taxon sampling

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

Cite this

@article{3112628114bb4f9bbeabf6b1fab39006,
title = "The impact of automated filtering of BLAST-determined homologs in the phylogenetic detection of horizontal gene transfer from a transcriptome assembly",
keywords = "Dinoflagellates, Gene trees, Homolog selection, Horizontal gene transfer, Phylogenomics, Taxon sampling",
author = "Wisecaver, {Jennifer H.} and Jeremiah Hackett",
year = "2014",
month = "2",
doi = "10.1016/j.ympev.2013.11.016",
language = "English (US)",
volume = "71",
pages = "184--192",
journal = "Molecular Phylogenetics and Evolution",
issn = "1055-7903",
publisher = "Academic Press Inc.",
number = "1",

}

TY - JOUR

T1 - The impact of automated filtering of BLAST-determined homologs in the phylogenetic detection of horizontal gene transfer from a transcriptome assembly

AU - Wisecaver, Jennifer H.

AU - Hackett, Jeremiah

PY - 2014/2

Y1 - 2014/2

KW - Dinoflagellates

KW - Gene trees

KW - Homolog selection

KW - Horizontal gene transfer

KW - Phylogenomics

KW - Taxon sampling

UR - http://www.scopus.com/inward/record.url?scp=84890819896&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890819896&partnerID=8YFLogxK

U2 - 10.1016/j.ympev.2013.11.016

DO - 10.1016/j.ympev.2013.11.016

M3 - Article

C2 - 24321593

AN - SCOPUS:84890819896

VL - 71

SP - 184

EP - 192

JO - Molecular Phylogenetics and Evolution

JF - Molecular Phylogenetics and Evolution

SN - 1055-7903

IS - 1

ER -