Extending assembly of short DNA sequences to handle error

William R. Jeck, Josephine A. Reinhardt, David A Baltrus, Matthew T. Hickenbotham, Vincent Magrini, Elaine R. Mardis, Jeffery L. Dangl, Corbin D. Jones

Research output: Contribution to journal › Article

168 Citations (Scopus)

Abstract

Inexpensive de novo genome sequencing, particularly in organisms with small genomes, is now possible using several new sequencing technologies. Some of these technologies, such as Illumina's Solexa sequencing, produce high genomic coverage by generating a very large number of short reads (∼30 bp). While prior work shows that partial assembly can be performed by k-mer extension in error-free reads, this algorithm is unsuccessful at the sequencing error rates found in practice. We present VCAKE (Verified Consensus Assembly by K-mer Extension), a modification of simple k-mer extension that overcomes error by exploiting high depth of coverage. Though it is a simple modification of a previous approach, we show significant improvements in assembly results on simulated and experimental datasets that include error.
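
The idea of extending a contig by k-mer overlap while letting deep coverage outvote sequencing errors can be illustrated with a small sketch. This is not the VCAKE implementation: the function name `extend_right`, the parameters `min_votes` and `min_fraction`, and the toy genome and reads are all assumptions made for illustration, and the sketch only extends to the right on the forward strand.

```python
from collections import Counter


def extend_right(seed, reads, k=6, min_votes=2, min_fraction=0.6, max_len=10_000):
    """Greedily extend `seed` to the right, one base per step, by majority vote.

    Hypothetical helper for illustration only. A read votes when it contains
    the last k bases of the contig and has at least one base after that match.
    """
    contig = seed
    while len(contig) < max_len:
        tail = contig[-k:]
        votes = Counter()
        for read in reads:
            pos = read.find(tail)  # first occurrence only; enough for a sketch
            if pos != -1 and pos + k < len(read):
                votes[read[pos + k]] += 1
        total = sum(votes.values())
        if total < min_votes:
            break  # too little coverage to trust any extension
        base, count = votes.most_common(1)[0]
        if count / total < min_fraction:
            break  # no clear consensus (for example, at a repeat boundary)
        contig += base
    return contig


# Toy data (assumed): every 12-base window of a 30-base genome, three copies
# each, plus one read whose last base is a substitution error (C -> T).
genome = "ATGGCGTACCTTAGACGGATTCAACTGCTA"
reads = [genome[i:i + 12] for i in range(len(genome) - 11)] * 3
reads.append("CGTACCTTAGAT")

assembled = extend_right(genome[:6], reads)
print(assembled == genome)
```

In this toy setting the single erroneous read is outvoted by the correct reads covering the same position. Requiring unanimous agreement instead (`min_fraction=1.0`) halts the extension at the position where the erroneous read disagrees, which mirrors why error-intolerant extension breaks down on real data.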

Original language: English (US)
Pages (from-to): 2942-2944
Number of pages: 3
Journal: Bioinformatics
Volume: 23
Issue number: 21
DOI: https://doi.org/10.1093/bioinformatics/btm451
State: Published - Nov 2007
Externally published: Yes

Fingerprint

DNA sequences
DNA Sequence
Sequencing
Genome
Technology
Coverage
Genes
Genomics
Error Rate
Partial
Datasets

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Jeck, W. R., Reinhardt, J. A., Baltrus, D. A., Hickenbotham, M. T., Magrini, V., Mardis, E. R., ... Jones, C. D. (2007). Extending assembly of short DNA sequences to handle error. Bioinformatics, 23(21), 2942-2944. https://doi.org/10.1093/bioinformatics/btm451

@article{59a0994f433b4f56a8691e7c90f9ca9e,
title = "Extending assembly of short DNA sequences to handle error",
author = "Jeck, {William R.} and Reinhardt, {Josephine A.} and Baltrus, {David A} and Hickenbotham, {Matthew T.} and Vincent Magrini and Mardis, {Elaine R.} and Dangl, {Jeffery L.} and Jones, {Corbin D.}",
year = "2007",
month = "11",
doi = "10.1093/bioinformatics/btm451",
language = "English (US)",
volume = "23",
pages = "2942--2944",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "21",

}

TY - JOUR

T1 - Extending assembly of short DNA sequences to handle error

AU - Jeck, William R.

AU - Reinhardt, Josephine A.

AU - Baltrus, David A

AU - Hickenbotham, Matthew T.

AU - Magrini, Vincent

AU - Mardis, Elaine R.

AU - Dangl, Jeffery L.

AU - Jones, Corbin D.

PY - 2007/11

Y1 - 2007/11

UR - http://www.scopus.com/inward/record.url?scp=36448948250&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36448948250&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btm451

DO - 10.1093/bioinformatics/btm451

M3 - Article

C2 - 17893086

AN - SCOPUS:36448948250

VL - 23

SP - 2942

EP - 2944

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 21

ER -