Learning Parameter-Advising Sets for Multiple Sequence Alignment

Dan Deblasio, John D Kececioglu

Research output: Contribution to journalArticle

3 Citations (Scopus)

Abstract

While the multiple sequence alignment output by an aligner strongly depends on the parameter values used for the alignment scoring function (such as the choice of gap penalties and substitution scores), most users rely on the single default parameter setting provided by the aligner. A different parameter setting, however, might yield a much higher-quality alignment for the specific set of input sequences. The problem of picking a good choice of parameter values for specific input sequences is called parameter advising. A parameter advisor has two ingredients: (i) a set of parameter choices to select from, and (ii) an estimator that provides an estimate of the accuracy of the alignment computed by the aligner using a parameter choice. The parameter advisor picks the parameter choice from the set whose resulting alignment has highest estimated accuracy. In this paper, we consider for the first time the problem of learning the optimal set of parameter choices for a parameter advisor that uses a given accuracy estimator. The optimal set is one that maximizes the expected true accuracy of the resulting parameter advisor, averaged over a collection of training data. While we prove that learning an optimal set for an advisor is NP-complete, we show there is a natural approximation algorithm for this problem, and prove a tight bound on its approximation ratio. Experiments with an implementation of this approximation algorithm on biological benchmarks, using various accuracy estimators from the literature, show it finds sets for advisors that are surprisingly close to optimal. Furthermore, the resulting parameter advisors are significantly more accurate in practice than simply aligning with a single default parameter choice.

Original languageEnglish (US)
Article number7102700
Pages (from-to)1028-1041
Number of pages14
JournalIEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume14
Issue number5
DOIs
StatePublished - Sep 1 2017

Fingerprint

Parameter Learning
Multiple Sequence Alignment
Sequence Alignment
Learning
Benchmarking
Approximation algorithms
Alignment
Substitution reactions
Estimator
Approximation Algorithms
Experiments
Scoring

Keywords

  • accuracy estimation
  • alignment scoring functions
  • Multiple sequence alignment
  • parameter advising
  • parameter values

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

Learning Parameter-Advising Sets for Multiple Sequence Alignment. / Deblasio, Dan; Kececioglu, John D.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 14, No. 5, 7102700, 01.09.2017, p. 1028-1041.

Research output: Contribution to journalArticle

@article{4aa669e4c0e64413ae18d9e0693dbb92,
title = "Learning Parameter-Advising Sets for Multiple Sequence Alignment",
abstract = "While the multiple sequence alignment output by an aligner strongly depends on the parameter values used for the alignment scoring function (such as the choice of gap penalties and substitution scores), most users rely on the single default parameter setting provided by the aligner. A different parameter setting, however, might yield a much higher-quality alignment for the specific set of input sequences. The problem of picking a good choice of parameter values for specific input sequences is called parameter advising. A parameter advisor has two ingredients: (i) a set of parameter choices to select from, and (ii) an estimator that provides an estimate of the accuracy of the alignment computed by the aligner using a parameter choice. The parameter advisor picks the parameter choice from the set whose resulting alignment has highest estimated accuracy. In this paper, we consider for the first time the problem of learning the optimal set of parameter choices for a parameter advisor that uses a given accuracy estimator. The optimal set is one that maximizes the expected true accuracy of the resulting parameter advisor, averaged over a collection of training data. While we prove that learning an optimal set for an advisor is NP-complete, we show there is a natural approximation algorithm for this problem, and prove a tight bound on its approximation ratio. Experiments with an implementation of this approximation algorithm on biological benchmarks, using various accuracy estimators from the literature, show it finds sets for advisors that are surprisingly close to optimal. Furthermore, the resulting parameter advisors are significantly more accurate in practice than simply aligning with a single default parameter choice.",
keywords = "accuracy estimation, alignment scoring functions, Multiple sequence alignment, parameter advising, parameter values",
author = "Dan Deblasio and Kececioglu, {John D}",
year = "2017",
month = "9",
day = "1",
doi = "10.1109/TCBB.2015.2430323",
language = "English (US)",
volume = "14",
pages = "1028--1041",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "5",

}

TY - JOUR

T1 - Learning Parameter-Advising Sets for Multiple Sequence Alignment

AU - Deblasio, Dan

AU - Kececioglu, John D

PY - 2017/9/1

Y1 - 2017/9/1

N2 - While the multiple sequence alignment output by an aligner strongly depends on the parameter values used for the alignment scoring function (such as the choice of gap penalties and substitution scores), most users rely on the single default parameter setting provided by the aligner. A different parameter setting, however, might yield a much higher-quality alignment for the specific set of input sequences. The problem of picking a good choice of parameter values for specific input sequences is called parameter advising. A parameter advisor has two ingredients: (i) a set of parameter choices to select from, and (ii) an estimator that provides an estimate of the accuracy of the alignment computed by the aligner using a parameter choice. The parameter advisor picks the parameter choice from the set whose resulting alignment has highest estimated accuracy. In this paper, we consider for the first time the problem of learning the optimal set of parameter choices for a parameter advisor that uses a given accuracy estimator. The optimal set is one that maximizes the expected true accuracy of the resulting parameter advisor, averaged over a collection of training data. While we prove that learning an optimal set for an advisor is NP-complete, we show there is a natural approximation algorithm for this problem, and prove a tight bound on its approximation ratio. Experiments with an implementation of this approximation algorithm on biological benchmarks, using various accuracy estimators from the literature, show it finds sets for advisors that are surprisingly close to optimal. Furthermore, the resulting parameter advisors are significantly more accurate in practice than simply aligning with a single default parameter choice.

AB - While the multiple sequence alignment output by an aligner strongly depends on the parameter values used for the alignment scoring function (such as the choice of gap penalties and substitution scores), most users rely on the single default parameter setting provided by the aligner. A different parameter setting, however, might yield a much higher-quality alignment for the specific set of input sequences. The problem of picking a good choice of parameter values for specific input sequences is called parameter advising. A parameter advisor has two ingredients: (i) a set of parameter choices to select from, and (ii) an estimator that provides an estimate of the accuracy of the alignment computed by the aligner using a parameter choice. The parameter advisor picks the parameter choice from the set whose resulting alignment has highest estimated accuracy. In this paper, we consider for the first time the problem of learning the optimal set of parameter choices for a parameter advisor that uses a given accuracy estimator. The optimal set is one that maximizes the expected true accuracy of the resulting parameter advisor, averaged over a collection of training data. While we prove that learning an optimal set for an advisor is NP-complete, we show there is a natural approximation algorithm for this problem, and prove a tight bound on its approximation ratio. Experiments with an implementation of this approximation algorithm on biological benchmarks, using various accuracy estimators from the literature, show it finds sets for advisors that are surprisingly close to optimal. Furthermore, the resulting parameter advisors are significantly more accurate in practice than simply aligning with a single default parameter choice.

KW - accuracy estimation

KW - alignment scoring functions

KW - Multiple sequence alignment

KW - parameter advising

KW - parameter values

UR - http://www.scopus.com/inward/record.url?scp=85018521836&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018521836&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2015.2430323

DO - 10.1109/TCBB.2015.2430323

M3 - Article

C2 - 28991725

AN - SCOPUS:85018521836

VL - 14

SP - 1028

EP - 1041

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 5

M1 - 7102700

ER -