Recombination-filtered genomic datasets by information maximization

August E. Woerner, Murray P. Cox, Michael F. Hammer

Research output: Contribution to journal › Article

156 Citations (Scopus)

Abstract

With the increasing amount of DNA sequence data available from natural populations, new computational methods are needed to efficiently process raw sequences into formats that are applicable to a variety of analytical methods. One highly successful approach to inferring aspects of demographic history is grounded in coalescent theory. Many of these methods restrict themselves to perfectly tree-like genealogies (i.e. regions with no observed recombination), because theoretical difficulties prevent ready statistical evaluation of recombining regions. However, determining which recombination-filtered dataset to analyze from a larger recombination-rich genomic region is a non-trivial problem. Current applications primarily aim to quantify recombination rates (rather than produce optimal recombination-filtered blocks), require significant manual intervention, and are impractical for multiple genomic datasets in high-throughput, automated research environments. Here, we present a fast, simple and automatable command-line program that extracts optimal recombination-filtered blocks (no four-gamete violations) from recombination-rich genomic re-sequence data.
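
The four-gamete criterion mentioned in the abstract can be illustrated with a short Python sketch. It assumes biallelic sites coded as single characters, one string per sequence. It flags site pairs that show all four allele combinations and finds the longest run of consecutive sites with no such pair. The function names and toy alignment are hypothetical. This is not the program described in the paper, and its information-maximization criterion is not reproduced here.

```python
from itertools import combinations


def four_gamete_violation(site_a, site_b):
    """True if two biallelic sites show all four allele combinations.

    Under the infinite-sites assumption (no recurrent mutation), observing all
    four gametes implies at least one recombination event between the sites.
    """
    return len(set(zip(site_a, site_b))) == 4


def violating_pairs(alignment):
    """List (i, j) column pairs that fail the four-gamete test."""
    columns = list(zip(*alignment))  # one tuple of alleles per site
    return [
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if four_gamete_violation(columns[i], columns[j])
    ]


def longest_compatible_block(alignment):
    """Half-open (start, end) site indices of the longest run of consecutive
    sites containing no pairwise four-gamete violation (first one on ties)."""
    columns = list(zip(*alignment))
    best = (0, 0)
    left = 0
    for right in range(len(columns)):
        # Shrink the window past the nearest earlier site that conflicts with
        # the newly added site; pairs already inside the window are compatible.
        for i in range(right - 1, left - 1, -1):
            if four_gamete_violation(columns[i], columns[right]):
                left = i + 1
                break
        if right + 1 - left > best[1] - best[0]:
            best = (left, right + 1)
    return best


# Hypothetical toy alignment: 4 sequences x 4 biallelic sites.
seqs = ["0000", "0011", "0101", "1100"]
print(violating_pairs(seqs))           # [(1, 3)]
print(longest_compatible_block(seqs))  # (0, 3)
```

Any sub-interval of a violation-free interval is also violation-free. A single left-to-right sweep is therefore enough, although the pairwise checks make it quadratic in the number of sites in the worst case. A full tool might also drop sequences rather than only trimming sites. This sketch does not attempt that.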

Original language: English (US)
Pages (from-to): 1851-1853
Number of pages: 3
Journal: Bioinformatics
Volume: 23
Issue number: 14
DOI: 10.1093/bioinformatics/btm253
State: Published - Jul 15 2007

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Recombination-filtered genomic datasets by information maximization. / Woerner, August E.; Cox, Murray P.; Hammer, Michael F.

In: Bioinformatics, Vol. 23, No. 14, 15.07.2007, p. 1851-1853.

Woerner, August E. ; Cox, Murray P. ; Hammer, Michael F. / Recombination-filtered genomic datasets by information maximization. In: Bioinformatics. 2007 ; Vol. 23, No. 14. pp. 1851-1853.
@article{d295515c230f44358c4ea538e6c58e26,
title = "Recombination-filtered genomic datasets by information maximization",
abstract = "With the increasing amount of DNA sequence data available from natural populations, new computational methods are needed to efficiently process raw sequences into formats that are applicable to a variety of analytical methods. One highly successful approach to inferring aspects of demographic history is grounded in coalescent theory. Many of these methods restrict themselves to perfectly tree-like genealogies (i.e. regions with no observed recombination), because theoretical difficulties prevent ready statistical evaluation of recombining regions. However, determining which recombination-filtered dataset to analyze from a larger recombination-rich genomic region is a non-trivial problem. Current applications primarily aim to quantify recombination rates (rather than produce optimal recombination-filtered blocks), require significant manual intervention, and are impractical for multiple genomic datasets in high-throughput, automated research environments. Here, we present a fast, simple and automatable command-line program that extracts optimal recombination-filtered blocks (no four-gamete violations) from recombination-rich genomic re-sequence data.",
author = "Woerner, {August E.} and Cox, {Murray P.} and Hammer, {Michael F}",
year = "2007",
month = "7",
day = "15",
doi = "10.1093/bioinformatics/btm253",
language = "English (US)",
volume = "23",
pages = "1851--1853",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "14",

}

TY  - JOUR
T1  - Recombination-filtered genomic datasets by information maximization
AU  - Woerner, August E.
AU  - Cox, Murray P.
AU  - Hammer, Michael F
PY  - 2007/7/15
Y1  - 2007/7/15
N2  - With the increasing amount of DNA sequence data available from natural populations, new computational methods are needed to efficiently process raw sequences into formats that are applicable to a variety of analytical methods. One highly successful approach to inferring aspects of demographic history is grounded in coalescent theory. Many of these methods restrict themselves to perfectly tree-like genealogies (i.e. regions with no observed recombination), because theoretical difficulties prevent ready statistical evaluation of recombining regions. However, determining which recombination-filtered dataset to analyze from a larger recombination-rich genomic region is a non-trivial problem. Current applications primarily aim to quantify recombination rates (rather than produce optimal recombination-filtered blocks), require significant manual intervention, and are impractical for multiple genomic datasets in high-throughput, automated research environments. Here, we present a fast, simple and automatable command-line program that extracts optimal recombination-filtered blocks (no four-gamete violations) from recombination-rich genomic re-sequence data.
AB  - With the increasing amount of DNA sequence data available from natural populations, new computational methods are needed to efficiently process raw sequences into formats that are applicable to a variety of analytical methods. One highly successful approach to inferring aspects of demographic history is grounded in coalescent theory. Many of these methods restrict themselves to perfectly tree-like genealogies (i.e. regions with no observed recombination), because theoretical difficulties prevent ready statistical evaluation of recombining regions. However, determining which recombination-filtered dataset to analyze from a larger recombination-rich genomic region is a non-trivial problem. Current applications primarily aim to quantify recombination rates (rather than produce optimal recombination-filtered blocks), require significant manual intervention, and are impractical for multiple genomic datasets in high-throughput, automated research environments. Here, we present a fast, simple and automatable command-line program that extracts optimal recombination-filtered blocks (no four-gamete violations) from recombination-rich genomic re-sequence data.
UR  - http://www.scopus.com/inward/record.url?scp=34547863856&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34547863856&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btm253
DO  - 10.1093/bioinformatics/btm253
M3  - Article
C2  - 17519249
AN  - SCOPUS:34547863856
VL  - 23
SP  - 1851
EP  - 1853
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 14
ER  -