Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement

John D Kececioglu, D. Sankoff

Research output: Contribution to journal (Article)

172 Citations (Scopus)

Abstract

Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements, and reverses their order. For this problem, we develop two algorithms: a greedy approximation algorithm, that finds a solution provably close to optimal in O(n^2) time and O(n) space for n-element permutations, and a branch-and-bound exact algorithm, that finds an optimal solution in O(mL(n, n)) time and O(n^2) space, where m is the size of the branch-and-bound search tree, and L(n, n) is the time to solve a linear program of n variables and n constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch-and-bound algorithm are a novel application of maximum-weight matchings, shortest paths, and linear programming. In a series of experiments, we study the performance of an implementation on random permutations, and permutations generated by random reversals. For permutations differing by k random reversals, we find that the average upper bound on reversal distance estimates k to within one reversal for k < n/2 and n < 100. For the difficult case of random permutations, we find that the average difference between the upper and lower bounds is less than three reversals for n < 50. Due to the tightness of these bounds, we can solve, to optimality, problems on 30 elements in a few minutes of computer time. This approaches the scale of mitochondrial genomes.
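
To make the reversal operation in the abstract concrete, the following is a minimal Python sketch. It is not the paper's greedy or branch-and-bound algorithm. The function names (`reverse`, `breakpoints`, `reversal_distance`) are hypothetical, and the exact distance is found by plain breadth-first search, which is only feasible for very small n. The breakpoint count illustrates a standard lower bound: a single reversal changes at most two adjacencies, so at least ceil(b/2) reversals are needed to remove b breakpoints.

```python
from collections import deque


def reverse(perm, i, j):
    """Return a copy of perm with the substring perm[i..j] (inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def breakpoints(perm):
    """Count adjacent pairs that are not consecutive integers.

    The permutation of 1..n is framed with 0 on the left and n+1 on the
    right, so a misplaced first or last element also counts.
    """
    framed = (0,) + tuple(perm) + (len(perm) + 1,)
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reversal_distance(perm):
    """Exact reversal distance to the identity by brute-force BFS.

    This is an illustrative stand-in, not the branch-and-bound method:
    it visits up to n! permutations, so use it only for tiny n.
    """
    start = tuple(perm)
    n = len(start)
    target = tuple(range(1, n + 1))
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        for i in range(n):
            for j in range(i + 1, n):
                nxt = reverse(current, i, j)
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("input is not a permutation of 1..n")


if __name__ == "__main__":
    p = (3, 1, 2)
    b = breakpoints(p)
    print("breakpoints:", b)
    print("lower bound:", (b + 1) // 2)
    print("exact distance:", reversal_distance(p))
```

For the permutation (3, 1, 2) the lower bound and the exact distance coincide, which is a small instance of the tightness the abstract reports on larger inputs.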

Original language: English (US)
Pages (from-to): 180-210
Number of pages: 31
Journal: Algorithmica
Volume: 13
Issue number: 1-2
DOI: 10.1007/BF01188586
State: Published - Feb 1995
Externally published: Yes

Fingerprint

Genome Rearrangement
Approximation algorithms
Exact Algorithms
Reversal
Sorting
Approximation Algorithms
Genes
Chromosomes
Permutation
Random Permutation
Branch and Bound Algorithm
Greedy Algorithm
Linear programming
Chromosome
Series
Upper and Lower Bounds
Tightness
Computational Biology
Search Trees
Branch-and-bound

Keywords

  • Approximation algorithms
  • Branch-and-bound algorithms
  • Chromosome inversions
  • Computational biology
  • Edit distance
  • Experimental analysis of algorithms
  • Genome rearrangements
  • Permutations
  • Sorting by reversals

ASJC Scopus subject areas

  • Applied Mathematics
  • Safety, Risk, Reliability and Quality
  • Software
  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Computer Science (all)

Cite this

Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. / Kececioglu, John D; Sankoff, D.

In: Algorithmica, Vol. 13, No. 1-2, 02.1995, p. 180-210.

@article{bd16c810848d47daaea40b02c6a05886,
title = "Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement",
abstract = "Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements, and reverses their order. For this problem, we develop two algorithms: a greedy approximation algorithm, that finds a solution provably close to optimal in $O(n^2)$ time and $O(n)$ space for $n$-element permutations, and a branch-and-bound exact algorithm, that finds an optimal solution in $O(mL(n, n))$ time and $O(n^2)$ space, where $m$ is the size of the branch-and-bound search tree, and $L(n, n)$ is the time to solve a linear program of $n$ variables and $n$ constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch-and-bound algorithm are a novel application of maximum-weight matchings, shortest paths, and linear programming. In a series of experiments, we study the performance of an implementation on random permutations, and permutations generated by random reversals. For permutations differing by $k$ random reversals, we find that the average upper bound on reversal distance estimates $k$ to within one reversal for $k < n/2$ and $n < 100$. For the difficult case of random permutations, we find that the average difference between the upper and lower bounds is less than three reversals for $n < 50$. Due to the tightness of these bounds, we can solve, to optimality, problems on 30 elements in a few minutes of computer time. This approaches the scale of mitochondrial genomes.",
keywords = "Approximation algorithms, Branch-and-bound algorithms, Chromosome inversions, Computational biology, Edit distance, Experimental analysis of algorithms, Genome rearrangements, Permutations, Sorting by reversals",
author = "Kececioglu, John D. and Sankoff, D.",
year = "1995",
month = "2",
doi = "10.1007/BF01188586",
language = "English (US)",
volume = "13",
pages = "180--210",
journal = "Algorithmica",
issn = "0178-4617",
publisher = "Springer New York",
number = "1-2",

}

TY - JOUR

T1 - Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement

AU - Kececioglu, John D

AU - Sankoff, D.

PY - 1995/2

Y1 - 1995/2

AB - Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements, and reverses their order. For this problem, we develop two algorithms: a greedy approximation algorithm, that finds a solution provably close to optimal in O(n^2) time and O(n) space for n-element permutations, and a branch-and-bound exact algorithm, that finds an optimal solution in O(mL(n, n)) time and O(n^2) space, where m is the size of the branch-and-bound search tree, and L(n, n) is the time to solve a linear program of n variables and n constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch-and-bound algorithm are a novel application of maximum-weight matchings, shortest paths, and linear programming. In a series of experiments, we study the performance of an implementation on random permutations, and permutations generated by random reversals. For permutations differing by k random reversals, we find that the average upper bound on reversal distance estimates k to within one reversal for k < n/2 and n < 100. For the difficult case of random permutations, we find that the average difference between the upper and lower bounds is less than three reversals for n < 50. Due to the tightness of these bounds, we can solve, to optimality, problems on 30 elements in a few minutes of computer time. This approaches the scale of mitochondrial genomes.

KW - Approximation algorithms

KW - Branch-and-bound algorithms

KW - Chromosome inversions

KW - Computational biology

KW - Edit distance

KW - Experimental analysis of algorithms

KW - Genome rearrangements

KW - Permutations

KW - Sorting by reversals

UR - http://www.scopus.com/inward/record.url?scp=0029185212&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029185212&partnerID=8YFLogxK

U2 - 10.1007/BF01188586

DO - 10.1007/BF01188586

M3 - Article

AN - SCOPUS:0029185212

VL - 13

SP - 180

EP - 210

JO - Algorithmica

JF - Algorithmica

SN - 0178-4617

IS - 1-2

ER -