Ensemble multiple sequence alignment via advising

Dan DeBlasio, John D Kececioglu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The multiple sequence alignments computed by an aligner for different settings of its parameters, as well as the alignments computed by different aligners using their default settings, can differ markedly in accuracy. Parameter advising is the task of choosing a parameter setting for an aligner to maximize the accuracy of the resulting alignment. We extend parameter advising to aligner advising, which in contrast chooses among a set of aligners to maximize accuracy. In the context of aligner advising, default advising selects from a set of aligners that are using their default settings, while general advising selects both the aligner and its parameter setting. In this paper, we apply aligner advising for the first time to create a true ensemble aligner. Through cross-validation experiments on benchmark protein sequence alignments, we show that parameter advising boosts an aligner's accuracy beyond its default setting for virtually all of the standard aligners currently used in practice. Furthermore, aligner advising with a collection of aligners improves upon parameter advising with any single aligner, though, surprisingly, default advising outperforms general advising on testing data because it overfits the training data less. The new ensemble aligner that results from aligner advising is significantly more accurate than the best single default aligner, especially on hard-to-align sequences. This demonstrates how to construct a more accurate ensemble aligner from a collection of individual aligners.
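The three kinds of advising described in the abstract can be pictured as one selection loop run over different candidate sets. The following is a minimal sketch under stated assumptions, not the authors' implementation: since true accuracy is unknown without a reference alignment, it assumes the advisor ranks candidates by an accuracy estimate (compare the "Accuracy estimation" keyword). The names advise, run_toy, toy_estimate, and TOY_OUTPUTS, the aligner names, and the gap-free-column score are all hypothetical placeholders invented for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Alignment = List[str]  # one row per sequence, '-' marks a gap


def advise(candidates: Iterable[Hashable],
           align: Callable[[Hashable], Alignment],
           estimate: Callable[[Alignment], float]) -> Tuple[Hashable, Alignment]:
    """Run every candidate and return the one whose alignment scores highest."""
    best_choice, best_alignment, best_score = None, None, float("-inf")
    for choice in candidates:
        alignment = align(choice)
        score = estimate(alignment)
        if score > best_score:
            best_choice, best_alignment, best_score = choice, alignment, score
    if best_alignment is None:
        raise ValueError("no candidates given")
    return best_choice, best_alignment


def toy_estimate(alignment: Alignment) -> float:
    """Placeholder estimator (an assumption): fraction of gap-free columns."""
    columns = list(zip(*alignment))
    return sum("-" not in column for column in columns) / len(columns)


# Hypothetical precomputed outputs for the sequences ACGT, ACGGT, AGT.
# Keys are (aligner, parameter setting) pairs; no real tool is invoked.
TOY_OUTPUTS: Dict[Tuple[str, str], Alignment] = {
    ("aligner_A", "default"): ["AC-GT", "ACGGT", "-AGT-"],
    ("aligner_A", "alt"): ["AC-GT", "ACGGT", "A--GT"],
    ("aligner_B", "default"): ["AC-G-T", "A-CGGT", "A--G-T"],
    ("aligner_B", "alt"): ["ACG-T-", "ACG-GT", "A--G-T"],
}


def run_toy(choice: Tuple[str, str]) -> Alignment:
    """Stand-in for running an aligner with a given parameter setting."""
    return TOY_OUTPUTS[choice]


if __name__ == "__main__":
    # Parameter advising: one aligner, several parameter settings.
    parameter_set = [("aligner_A", "default"), ("aligner_A", "alt")]
    # Default advising: several aligners, each at its default setting.
    default_set = [("aligner_A", "default"), ("aligner_B", "default")]
    # General advising: choose both the aligner and its parameter setting.
    general_set = list(TOY_OUTPUTS)

    for name, candidates in [("parameter", parameter_set),
                             ("default", default_set),
                             ("general", general_set)]:
        choice, _ = advise(candidates, run_toy, toy_estimate)
        print(name, "advising picks", choice)
```

In this toy data the three candidate sets lead to different picks, which is only meant to show how the sets differ; it says nothing about the relative accuracy of the approaches reported in the paper.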

Original language: English (US)
Title of host publication: BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
Pages: 452-461
Number of pages: 10
ISBN (Print): 9781450338530
DOI: https://doi.org/10.1145/2808719.2808766
State: Published - Sep 9 2015
Event: 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015 - Atlanta, United States
Duration: Sep 9 2015 → Sep 12 2015

Other

Other: 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015
Country: United States
City: Atlanta
Period: 9/9/15 → 9/12/15

Keywords

  • Accuracy estimation
  • Aligner advising
  • Ensemble methods
  • Multiple sequence alignment
  • Parameter advising

ASJC Scopus subject areas

  • Software
  • Health Informatics
  • Computer Science Applications
  • Biomedical Engineering

Cite this

DeBlasio, D., & Kececioglu, J. D. (2015). Ensemble multiple sequence alignment via advising. In BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 452-461). Association for Computing Machinery, Inc. https://doi.org/10.1145/2808719.2808766

@inproceedings{c09d11c3fa67436487c41afe46db6d0b,
title = "Ensemble multiple sequence alignment via advising",
keywords = "Accuracy estimation, Aligner advising, Ensemble methods, Multiple sequence alignment, Parameter advising",
author = "DeBlasio, Dan and Kececioglu, {John D.}",
year = "2015",
month = "9",
day = "9",
doi = "10.1145/2808719.2808766",
language = "English (US)",
isbn = "9781450338530",
pages = "452--461",
booktitle = "BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - Ensemble multiple sequence alignment via advising

AU - DeBlasio, Dan

AU - Kececioglu, John D

PY - 2015/9/9

Y1 - 2015/9/9

KW - Accuracy estimation

KW - Aligner advising

KW - Ensemble methods

KW - Multiple sequence alignment

KW - Parameter advising

UR - http://www.scopus.com/inward/record.url?scp=84963556566&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963556566&partnerID=8YFLogxK

U2 - 10.1145/2808719.2808766

DO - 10.1145/2808719.2808766

M3 - Conference contribution

AN - SCOPUS:84963556566

SN - 9781450338530

SP - 452

EP - 461

BT - BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

PB - Association for Computing Machinery, Inc

ER -