Combinatorial algorithms for DNA sequence assembly

J. D. Kececioglu, E. W. Myers

Research output: Contribution to journalArticle

134 Scopus citations

Abstract

The trend toward very large DNA sequencing projects, such as those being undertaken as part of the Human Genome Program, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four-phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates, and list a series of alternate solutions in the event that several appear equally good. Moreover, it uses a limited form of multiple sequence alignment to detect, and often correct, errors in the data. Our combined algorithm has successfully reconstructed nonrepetitive sequences of length 50,000 sampled at error rates of as high as 10%.

Original languageEnglish (US)
Pages (from-to)7-51
Number of pages45
JournalAlgorithmica
Volume13
Issue number1-2
DOIs
StatePublished - Feb 1 1995
Externally publishedYes

Keywords

  • Approximation algorithms
  • Branch- and-bound algorithms
  • Computational biology
  • Fragment assembly
  • Sequence reconstruction

ASJC Scopus subject areas

  • Computer Science(all)
  • Computer Science Applications
  • Applied Mathematics

Fingerprint Dive into the research topics of 'Combinatorial algorithms for DNA sequence assembly'. Together they form a unique fingerprint.

  • Cite this