Simple and fast inverse alignment

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Citations (Scopus)

Abstract

For as long as biologists have been computing alignments of sequences, the question of what values to use for scoring substitutions and gaps has persisted. While some choices for substitution scores are now common, largely due to convention, there is no standard for choosing gap penalties. An objective way to resolve this question is to learn the appropriate values by solving the Inverse String Alignment Problem: given examples of correct alignments, find parameter values that make the examples be optimal-scoring alignments of their strings. We present a new polynomial-time algorithm for Inverse String Alignment that is simple to implement, fast in practice, and for the first time can learn hundreds of parameters simultaneously. The approach is also flexible: minor modifications allow us to solve inverse unique alignment (find parameter values that make the examples be the unique optimal alignments of their strings), and inverse near-optimal alignment (find parameter values that make the example alignments be as close to optimal as possible). Computational results with an implementation for global alignment show that, for the first time, we can find best-possible values for all 212 parameters of the standard protein-sequence scoring-model from hundreds of alignments in a few minutes of computation.
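
As a rough illustration of the condition that Inverse String Alignment asks for, the sketch below checks whether a given parameter choice makes an example alignment optimal. It is not the authors' algorithm (the keywords point to linear programming with cutting planes). It assumes a toy cost model with only two parameters, a mismatch cost and a per-column gap cost, rather than the 212-parameter protein model with affine gap penalties. All function names are hypothetical.

```python
def alignment_cost(row_a, row_b, mismatch, gap):
    """Cost of a given alignment, written as two equal-length rows with '-' for gaps."""
    assert len(row_a) == len(row_b)
    cost = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            cost += gap          # column containing a gap
        elif x != y:
            cost += mismatch     # substitution of two different letters
    return cost


def optimal_cost(s, t, mismatch, gap):
    """Minimum global-alignment cost of s and t via the standard dynamic program."""
    n = len(t)
    prev = [j * gap for j in range(n + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0.0] * n
        for j in range(1, n + 1):
            sub = prev[j - 1] + (0.0 if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = min(sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[n]


def makes_example_optimal(row_a, row_b, mismatch, gap, tol=1e-9):
    """True if the example alignment costs no more than the best alignment of its strings."""
    s = row_a.replace("-", "")
    t = row_b.replace("-", "")
    example = alignment_cost(row_a, row_b, mismatch, gap)
    return example <= optimal_cost(s, t, mismatch, gap) + tol


# Example alignment: AC over AG (one substitution column).
print(makes_example_optimal("AC", "AG", mismatch=1.0, gap=1.0))  # True
print(makes_example_optimal("AC", "AG", mismatch=3.0, gap=1.0))  # False
```

In this toy model the example AC/AG is optimal only when the mismatch cost is at most twice the gap cost, since the competing alignment replaces the substitution with two gap columns. The inverse problem searches for parameter values that satisfy this kind of condition for every example at once.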

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 441-455
Number of pages: 15
Volume: 3909 LNBI
DOIs: https://doi.org/10.1007/11732990_37
State: Published - 2006
Event: 10th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2006 - Venice, Italy
Duration: Apr 2 2006 to Apr 5 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3909 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2006
Country: Italy
City: Venice
Period: 4/2/06 to 4/5/06

Keywords

  • Affine gap penalties
  • Cutting plane algorithms
  • Linear programming
  • Parametric sequence alignment
  • Sequence analysis
  • Substitution score matrices
  • Supervised learning

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Kececioglu, J. D., & Kim, E. (2006). Simple and fast inverse alignment. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3909 LNBI, pp. 441-455). https://doi.org/10.1007/11732990_37

@inproceedings{69889aa8df9f4cd080fd7a8cdc3ca06f,
title = "Simple and fast inverse alignment",
keywords = "Affine gap penalties, Cutting plane algorithms, Linear programming, Parametric sequence alignment, Sequence analysis, Substitution score matrices, Supervised learning",
author = "Kececioglu, {John D} and Kim, Eagu",
year = "2006",
doi = "10.1007/11732990_37",
language = "English (US)",
isbn = "3540332952",
volume = "3909 LNBI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "441--455",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Simple and fast inverse alignment

AU - Kececioglu, John D

AU - Kim, Eagu

PY - 2006

Y1 - 2006

KW - Affine gap penalties

KW - Cutting plane algorithms

KW - Linear programming

KW - Parametric sequence alignment

KW - Sequence analysis

KW - Substitution score matrices

KW - Supervised learning

UR - http://www.scopus.com/inward/record.url?scp=33745804251&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745804251&partnerID=8YFLogxK

U2 - 10.1007/11732990_37

DO - 10.1007/11732990_37

M3 - Conference contribution

SN - 3540332952

SN - 9783540332954

VL - 3909 LNBI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 441

EP - 455

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -