Aligning alignments exactly

John Kececioglu, Dean Starrett

Research output: Contribution to conference › Paper › peer-review

28 Scopus citations

Abstract

A basic computational problem that arises in both the construction and local-search phases of the best heuristics for multiple sequence alignment is that of aligning the columns of two multiple alignments. When the scoring function is the sum-of-pairs objective and induced pairwise alignments are evaluated using linear gap-costs, we call this problem Aligning Alignments. While seemingly a straightforward extension of two-sequence alignment, we prove it is actually NP-complete. As explained in the paper, this provides the first demonstration that minimizing linear gap-costs, in the context of multiple sequence alignment, is inherently hard. We also develop an exact algorithm for Aligning Alignments that is remarkably efficient in practice, both in time and space. Even though the problem is NP-complete, computational experiments on both biological and simulated data show we can compute optimal alignments for all benchmark instances in two standard datasets, and solve very-large random instances with highly-gapped sequences.
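To make the objective in the abstract concrete, the following is a minimal sketch of how a sum-of-pairs cost with linear gap costs might be evaluated on a given multiple alignment. It is not the authors' algorithm, which solves the harder optimization problem of aligning two alignments. The function names, the unit mismatch cost, and the parameters `gap_open` and `gap_extend` are illustrative assumptions, as is the reading of "linear gap costs" as a per-gap charge of `gap_open + gap_extend * length`.

```python
from itertools import combinations

GAP = "-"

def induced_pairwise(row_a, row_b):
    """Return the pairwise alignment induced by two rows of a multiple
    alignment: columns where both rows hold a gap are dropped."""
    return [(a, b) for a, b in zip(row_a, row_b)
            if not (a == GAP and b == GAP)]

def pairwise_cost(columns, mismatch=1, gap_open=2, gap_extend=1):
    """Cost of one induced pairwise alignment.

    Assumption: each maximal run of gaps in one row costs
    gap_open + gap_extend * (run length); the values are hypothetical."""
    cost = 0
    prev_state = None  # one of "sub", "gap_a", "gap_b"
    for a, b in columns:
        if a == GAP:
            state = "gap_a"
        elif b == GAP:
            state = "gap_b"
        else:
            state = "sub"
        if state == "sub":
            cost += 0 if a == b else mismatch
        else:
            if state != prev_state:
                cost += gap_open  # a new gap starts here
            cost += gap_extend
        prev_state = state
    return cost

def sum_of_pairs_cost(alignment, **params):
    """Sum the induced pairwise costs over all pairs of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same number of columns")
    return sum(pairwise_cost(induced_pairwise(r, s), **params)
               for r, s in combinations(alignment, 2))

if __name__ == "__main__":
    print(sum_of_pairs_cost(["AC-GT", "A--GT", "ACCGA"]))
```

Because shared gap columns are dropped before scoring, two gap runs that are separated in the multiple alignment can merge into a single gap in an induced pairwise alignment. For example, `pairwise_cost(induced_pairwise("A---A", "AC-CA"))` charges one gap of length 2 rather than two gaps of length 1.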

Original language: English (US)
Pages: 85-96
Number of pages: 12
State: Published - 2004
Event: RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology - San Diego, CA, United States
Duration: Mar 27 2004 to Mar 31 2004

Keywords

  • Exact algorithms
  • Linear gap costs
  • Multiple sequence alignment
  • Sum of pairs

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
