Aligning alignments exactly

John D Kececioglu, Dean Starrett

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

26 Citations (Scopus)

Abstract

A basic computational problem that arises in both the construction and local-search phases of the best heuristics for multiple sequence alignment is that of aligning the columns of two multiple alignments. When the scoring function is the sum-of-pairs objective and induced pairwise alignments are evaluated using linear gap-costs, we call this problem Aligning Alignments. While seemingly a straightforward extension of two-sequence alignment, we prove it is actually NP-complete. As explained in the paper, this provides the first demonstration that minimizing linear gap-costs, in the context of multiple sequence alignment, is inherently hard. We also develop an exact algorithm for Aligning Alignments that is remarkably efficient in practice, both in time and space. Even though the problem is NP-complete, computational experiments on both biological and simulated data show we can compute optimal alignments for all benchmark instances in two standard datasets, and solve very large random instances with highly gapped sequences.
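To make the objective in the abstract concrete, the following is a minimal sketch of how a sum-of-pairs cost with gap costs that are linear in gap length might be evaluated on a fixed multiple alignment. It is not the authors' algorithm: it only scores an alignment and does not search for an optimal one. The function names, the gap symbol `-`, and the cost parameters (`mismatch`, `gap_open`, `gap_extend`) are all illustrative assumptions, as is the reading of "linear gap-costs" as `gap_open + gap_extend * length` per gap.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def induced_pair_cost(row_a, row_b, mismatch=1, gap_open=2, gap_extend=1):
    """Cost of the pairwise alignment induced by two rows of a multiple alignment.

    Columns where both rows hold a gap are dropped before scoring, so a gap
    that straddles such a column counts as a single gap. Parameter values
    are hypothetical, chosen only for illustration.
    """
    cost = 0
    prev = None  # kind of the previous surviving column
    for x, y in zip(row_a, row_b):
        if x == GAP and y == GAP:
            continue  # column vanishes in the induced alignment
        if x == GAP:
            state = "gap_in_a"
        elif y == GAP:
            state = "gap_in_b"
        else:
            state = "sub"
        if state == "sub":
            cost += 0 if x == y else mismatch
        else:
            if state != prev:
                cost += gap_open  # a new gap starts here
            cost += gap_extend    # every gap position pays the per-length cost
        prev = state
    return cost


def sum_of_pairs_cost(rows, **params):
    """Sum the induced pairwise costs over all pairs of rows."""
    return sum(induced_pair_cost(a, b, **params) for a, b in combinations(rows, 2))


rows = ["AC-T", "A-GT", "ACGT"]
total = sum_of_pairs_cost(rows)
```

The point of the sketch is that the number of gaps in an induced pairwise alignment depends on which columns collapse, which is what distinguishes this objective from a simple column-by-column score.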

Original language: English (US)
Title of host publication: Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB
Pages: 85-96
Number of pages: 12
Volume: 8
State: Published - 2004
Event: RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology - San Diego, CA, United States
Duration: Mar 27 2004 to Mar 31 2004

Keywords

  • Exact algorithms
  • Linear gap costs
  • Multiple sequence alignment
  • Sum of pairs

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Computer Science (all)

Cite this

Kececioglu, J. D., & Starrett, D. (2004). Aligning alignments exactly. In Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB (Vol. 8, pp. 85-96)

@inproceedings{43b0f7be508348f683bbf5d68f7372b4,
title = "Aligning alignments exactly",
keywords = "Exact algorithms, Linear gap costs, Multiple sequence alignment, Sum of pairs",
author = "Kececioglu, {John D} and Dean Starrett",
year = "2004",
language = "English (US)",
volume = "8",
pages = "85--96",
booktitle = "Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB",
}

TY - GEN

T1 - Aligning alignments exactly

AU - Kececioglu, John D

AU - Starrett, Dean

PY - 2004

Y1 - 2004

KW - Exact algorithms

KW - Linear gap costs

KW - Multiple sequence alignment

KW - Sum of pairs

UR - http://www.scopus.com/inward/record.url?scp=2442608622&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442608622&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:2442608622

VL - 8

SP - 85

EP - 96

BT - Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB

ER -