Aligning alignments

John D Kececioglu, Weiqing Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Citations (Scopus)

Abstract

While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even the alignment of multiple sequences, there is little known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We begin a thorough investigation of aligning two alignments under the sum-of-pairs objective with general linear gap costs when either of the two alignments is given in the form of a sequence (a degenerate alignment containing a single sequence), a multiple alignment (containing two or more sequences), or a profile (a representation of a multiple alignment often used in computational biology). This leads to five problem variations, some of which arise in widely used heuristics for multiple sequence alignment, and in assessing the relatedness of a sequence to a sequence family. For variations in which exact gap counts are computationally difficult to determine, we offer a framework in terms of optimistic and pessimistic gap counts. For optimistic and pessimistic gap counts we give efficient algorithms for the sequence vs. alignment, sequence vs. profile, alignment vs. alignment, and profile vs. profile variations, all of which run in essentially O(mn) time for two input alignments of lengths m and n. For exact gap counts, we give the first provably efficient algorithm for the sequence vs. alignment variation, which runs in essentially O(mn log n) time using the candidate-list technique developed for convex gap costs, and we conjecture that the alignment vs. alignment variation is NP-complete.
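
As a rough illustration of why gap counting matters under the sum-of-pairs objective, the sketch below scores a multiple alignment by summing the costs of the pairwise alignments it induces. The function names, the cost parameters (`mismatch`, `gap_open`, `gap_extend`), and the open-plus-extend gap model are assumptions chosen for illustration; this page does not give the paper's exact cost model, and this is not the authors' algorithm.

```python
from itertools import combinations


def pairwise_cost(a, b, mismatch=1, gap_open=2, gap_extend=1):
    """Cost of the pairwise alignment induced by two rows of a multiple alignment.

    Columns in which both rows hold '-' vanish from the induced alignment,
    so a gap can continue across them without paying gap_open again.
    (Cost values are illustrative assumptions.)
    """
    if len(a) != len(b):
        raise ValueError("alignment rows must have equal length")
    cost = 0
    prev = None  # kind of the previous retained column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column disappears in the induced pairwise alignment
        if x == "-":
            state = "gap_in_a"
        elif y == "-":
            state = "gap_in_b"
        else:
            state = "substitution"
        if state == "substitution":
            cost += 0 if x == y else mismatch
        else:
            if state != prev:
                cost += gap_open  # a new gap starts here
            cost += gap_extend
        prev = state
    return cost


def sum_of_pairs_cost(rows, **params):
    """Sum the induced pairwise costs over all pairs of rows."""
    return sum(pairwise_cost(a, b, **params) for a, b in combinations(rows, 2))


if __name__ == "__main__":
    print(sum_of_pairs_cost(["AC-GT", "A--GT", "ACCGT"]))
```

Because an all-gap column drops out of an induced pairwise alignment, whether a gap is newly opened in one pair depends on columns contributed by other rows. That dependence is the reason the number of gaps is not a simple per-column quantity when two alignments are merged.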

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 189-208
Number of pages: 20
Volume: 1448 LNCS
State: Published - 1998
Externally published: Yes
Event: 9th Annual Symposium on Combinatorial Pattern Matching, CPM 1998 - Piscataway, NJ, United States
Duration: Jul 20 1998 to Jul 22 1998

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1448 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

Alignment
Count
Efficient Algorithms
Sequence Comparison
Multiple Sequence Alignment
Sequence Alignment
Computational Biology
Costs
NP-complete problem
Objective function
Heuristics

Keywords

  • Affine gap costs
  • Profiles
  • Quasi-natural gap costs
  • Sequence comparison
  • Sum-of-pairs alignment

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Kececioglu, J. D., & Zhang, W. (1998). Aligning alignments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1448 LNCS, pp. 189-208).

@inproceedings{78c059590730410898dd57b003978240,
title = "Aligning alignments",
keywords = "Affine gap costs, Profiles, Quasi-natural gap costs, Sequence comparison, Sum-of-pairs alignment",
author = "Kececioglu, {John D} and Weiqing Zhang",
year = "1998",
language = "English (US)",
isbn = "3540647392",
volume = "1448 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "189--208",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN
T1 - Aligning alignments
AU - Kececioglu, John D
AU - Zhang, Weiqing
PY - 1998
Y1 - 1998
KW - Affine gap costs
KW - Profiles
KW - Quasi-natural gap costs
KW - Sequence comparison
KW - Sum-of-pairs alignment
UR - http://www.scopus.com/inward/record.url?scp=84877316776&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877316776&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84877316776
SN - 3540647392
SN - 9783540647393
VL - 1448 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 189
EP - 208
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -