With the increasing amount of DNA sequence data available from natural populations, new computational methods are needed to efficiently process raw sequences into formats that are applicable to a variety of analytical methods. One highly successful approach to inferring aspects of demographic history is grounded in coalescent theory. Many of these methods restrict themselves to perfectly tree-like genealogies (i.e. regions with no observed recombination), because theoretical difficulties prevent ready statistical evaluation of recombining regions. However, determining which recombination-filtered dataset to analyze from a larger recombination-rich genomic region is a non-trivial problem. Current applications primarily aim to quantify recombination rates (rather than produce optimal recombination-filtered blocks), require significant manual intervention, and are impractical for multiple genomic datasets in high-throughput, automated research environments. Here, we present a fast, simple and automatable command-line program that extracts optimal recombination-filtered blocks (no four-gamete violations) from recombination-rich genomic re-sequence data.
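To illustrate the criterion named above, the following is a minimal sketch of the four-gamete test and of one simple way to find a block free of violations. It is not the program's actual algorithm, and its notion of "optimal" (the longest contiguous run of sites) is an assumption made here for illustration. The function names `violates_four_gamete` and `longest_compatible_block` are hypothetical, and the sketch assumes phased haplotypes of equal length, biallelic sites coded as characters, and no missing data.

```python
def violates_four_gamete(site_a, site_b):
    """Return True if two biallelic sites show all four two-locus haplotypes.

    Under an infinite-sites model without recombination, at most three of
    the four possible allele combinations can occur at a pair of sites.
    """
    return len(set(zip(site_a, site_b))) == 4


def longest_compatible_block(haplotypes):
    """Return (start, end) of the longest contiguous run of sites, end
    exclusive, in which no pair of sites violates the four-gamete test.

    `haplotypes` is a list of equal-length strings, one per sampled
    chromosome (an assumed input format, not the program's own).
    """
    n_sites = len(haplotypes[0])
    # Transpose so that each entry holds the alleles at one site.
    columns = [[h[i] for h in haplotypes] for i in range(n_sites)]

    best = (0, 0)
    start = 0  # sites in [start, end) are pairwise compatible
    for end in range(n_sites):
        # Scan backwards for the nearest site that conflicts with the new
        # one; the window must begin just after it.
        for j in range(end - 1, start - 1, -1):
            if violates_four_gamete(columns[j], columns[end]):
                start = j + 1
                break
        if end + 1 - start > best[1] - best[0]:
            best = (start, end + 1)
    return best


if __name__ == "__main__":
    sample = ["0000", "1001", "0101", "1110"]
    print(longest_compatible_block(sample))  # (1, 3)
```

In this toy sample, sites 0 and 1 conflict and sites 1 and 3 conflict, so the longest violation-free run covers sites 1 and 2. Other optimality criteria, such as maximizing the number of retained sequences or total information content, would require a different search.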
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics