### Abstract

A basic computational problem that arises in both the construction and local-search phases of the best heuristics for multiple sequence alignment is that of aligning the columns of two multiple alignments. When the scoring function is the sum-of-pairs objective and induced pairwise alignments are evaluated using linear gap-costs, we call this problem Aligning Alignments. While the problem seems a straightforward extension of two-sequence alignment, we prove it is actually NP-complete. As explained in the paper, this provides the first demonstration that minimizing linear gap-costs, in the context of multiple sequence alignment, is inherently hard. We also develop an exact algorithm for Aligning Alignments that is remarkably efficient in practice, both in time and space. Even though the problem is NP-complete, computational experiments on both biological and simulated data show we can compute optimal alignments for all benchmark instances in two standard datasets, and solve very large random instances with highly gapped sequences.
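To make the objective named in the abstract concrete, the following is a minimal sketch of how a sum-of-pairs cost with linear gap-costs could be evaluated for a given multiple alignment. It is not the paper's exact algorithm, which solves the harder problem of finding an optimal alignment of two alignments. It assumes that a gap of length ℓ costs `gap_open + gap_extend * ℓ`, and the function names, the unit mismatch cost, and the default parameter values are all hypothetical choices made for illustration.

```python
from itertools import combinations

GAP = "-"

def induced_pair_cost(a, b, mismatch=1, gap_open=2, gap_extend=1):
    """Cost of the pairwise alignment induced by two rows of a multiple alignment.

    Columns where both rows hold a gap vanish in the induced alignment, so a gap
    that spans such a column is counted as one gap, not two. Parameter values
    are illustrative assumptions, not taken from the paper.
    """
    cost = 0
    state = None  # None after a letter/letter column, else the row holding the open gap
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue  # column disappears from the induced pairwise alignment
        if x == GAP or y == GAP:
            side = "a" if x == GAP else "b"
            if side != state:
                cost += gap_open  # a new gap starts in this row
            cost += gap_extend    # every gap position pays the per-position cost
            state = side
        else:
            cost += 0 if x == y else mismatch
            state = None
    return cost

def sum_of_pairs_cost(rows, **params):
    """Sum the induced pairwise costs over all pairs of rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(induced_pair_cost(a, b, **params) for a, b in combinations(rows, 2))

if __name__ == "__main__":
    print(induced_pair_cost("AC-GT", "A---T"))            # 4
    print(sum_of_pairs_cost(["AC-GT", "A--GT", "ACCG-"]))  # 16
```

The first call shows why gap counting depends on the pair being inspected: the middle column is dropped, so the two gap characters on either side of it form a single gap of length 2 in the induced alignment. This kind of pair-dependent gap bookkeeping is what separates the sum-of-pairs setting from ordinary two-sequence alignment.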

Original language | English (US) |
---|---|
Title of host publication | Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB |
Pages | 85-96 |
Number of pages | 12 |
Volume | 8 |
State | Published - 2004 |
Event | RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology - San Diego, CA, United States. Duration: Mar 27 2004 → Mar 31 2004 |

### Other

Other | RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology |
---|---|
Country | United States |
City | San Diego, CA |
Period | 3/27/04 → 3/31/04 |

### Keywords

- Exact algorithms
- Linear gap costs
- Multiple sequence alignment
- Sum of pairs

### ASJC Scopus subject areas

- Biochemistry, Genetics and Molecular Biology (all)
- Computer Science (all)

### Cite this

Kececioglu, J. D., & Starrett, D. (2004). Aligning alignments exactly. In *Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB* (Vol. 8, pp. 85-96).

**Aligning alignments exactly.** / Kececioglu, John D; Starrett, Dean.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB.* vol. 8, pp. 85-96, RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology, San Diego, CA, United States, 3/27/04.

TY - GEN

T1 - Aligning alignments exactly

AU - Kececioglu, John D

AU - Starrett, Dean

PY - 2004

Y1 - 2004

KW - Exact algorithms

KW - Linear gap costs

KW - Multiple sequence alignment

KW - Sum of pairs

UR - http://www.scopus.com/inward/record.url?scp=2442608622&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442608622&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:2442608622

VL - 8

SP - 85

EP - 96

BT - Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB

ER -