### Abstract

We consider the problem of multiple sequence alignment under a fixed evolutionary tree: given a tree whose leaves are labeled by sequences, find ancestral sequences to label its internal nodes so as to minimize the total length of the tree, where the length of an edge is the edit distance between the sequences labeling its endpoints. We present a new polynomial-time approximation algorithm for this problem, and analyze its performance on regular $d$-ary trees with $d$ a constant. On such a tree, the algorithm finds a solution within a factor $(d + 1)/(d - 1)$ of the minimum in $O(k^{d} T(d, n) + k^{2d} n^{2})$ time, where $k$ is the number of leaves in the tree, $n$ is the length of the longest sequence labeling a leaf, and $T(d, n)$ is the time to compute a Steiner point for $d$ sequences of length at most $n$. (A Steiner point for a set $\mathcal{L}$ of sequences is a sequence $P$ that minimizes the sum of the edit distances from $P$ to each sequence in $\mathcal{L}$. The time $T(d, n)$ is $O(d 2^{d} n^{d})$, given $O(d s^{d+1})$-time preprocessing for an alphabet of size $s$.) The approximation algorithm is conceptually simple and easy to implement, and applies to any metric space in which a Steiner point for any fixed-size set can be computed in polynomial time. We also introduce a new problem, bottleneck tree alignment, in which the objective is to label the internal nodes of the tree so as to minimize the length of the longest edge. We describe an exponential-time exact algorithm for the case of unit-cost edit operations, and show that there is a simple linear-time approximation algorithm for the general case that finds a solution within a factor $O(\log k)$ of the minimum. © 1998 Published by Elsevier Science B.V. All rights reserved.
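The two objectives named in the abstract can be made concrete with a small sketch. It is not the paper's approximation algorithm; it only evaluates a given labeling. It assumes unit-cost edit operations, and the edge-list tree representation, the function names, and the toy sequences are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute), row-by-row DP."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def edge_lengths(edges: List[Edge], labels: Dict[str, str]) -> List[int]:
    """Length of each edge: edit distance between its endpoint labels."""
    return [edit_distance(labels[u], labels[v]) for u, v in edges]


def tree_length(edges: List[Edge], labels: Dict[str, str]) -> int:
    """Tree-alignment objective: sum of all edge lengths."""
    return sum(edge_lengths(edges, labels))


def bottleneck_length(edges: List[Edge], labels: Dict[str, str]) -> int:
    """Bottleneck tree-alignment objective: length of the longest edge."""
    return max(edge_lengths(edges, labels))


# Hypothetical toy instance: a binary tree with leaves a, b, c, d.
# The labels on root, x, and y are one candidate ancestral labeling,
# not necessarily an optimal one.
edges = [("root", "x"), ("root", "y"),
         ("x", "a"), ("x", "b"), ("y", "c"), ("y", "d")]
labels = {
    "a": "ACGT", "b": "ACGA", "c": "TCGT", "d": "TCGG",
    "x": "ACGT", "y": "TCGT", "root": "ACGT",
}

print(tree_length(edges, labels))        # 3
print(bottleneck_length(edges, labels))  # 1
```

Both problems search over the internal labels (`root`, `x`, `y` here) while the leaf labels stay fixed; the sketch only scores one such choice under each objective.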

| Original language | English (US) |
|---|---|
| Pages (from-to) | 355-366 |
| Number of pages | 12 |
| Journal | Discrete Applied Mathematics |
| Volume | 88 |
| Issue number | 1-3 |
| State | Published - Nov 9 1998 |
| Externally published | Yes |

### Keywords

- Approximation algorithms
- Computational biology
- Evolutionary trees
- Multiple sequence alignment

### ASJC Scopus subject areas

- Discrete Mathematics and Combinatorics
- Applied Mathematics