An online copy number variant detection method for short sequencing reads

Ayten Yiğiter, Jie Chen, Lingling An, Nazan Danacioğlu

Research output: Contribution to journal › Article

Abstract

Next-generation sequencing (NGS) technology has opened new opportunities for discovering genetic information in biomedical research. High-throughput NGS, especially DNA-seq, is particularly useful for profiling a genome to analyze DNA copy number variants (CNVs). The read count (RC) data produced by NGS are massive and information-rich, and exploiting them for accurate CNV detection has become a computational and statistical challenge. In this paper, we provide a statistical online change-point method to help detect CNVs in sequencing RC data. The method searches for change points (or breakpoints) online, under a Markov chain assumption on the breakpoint loci, using an iterative computing process within a Bayesian framework. We illustrate that an online change-point detection method is particularly suitable for identifying CNVs in RC data. The algorithm is applied to publicly available sequencing read data from the NCI-H2347 lung cancer cell line to locate breakpoints. Extensive simulation studies show that the proposed algorithm behaves well. The algorithm is implemented in R, and the code is available upon request.
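The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of Bayesian online change-point detection on read counts, not the authors' algorithm (which is implemented in R). It assumes Poisson-distributed counts with a conjugate Gamma prior on the rate within each segment and a constant per-position breakpoint probability, which makes the "run length" since the last breakpoint a Markov chain. All names and default values (`online_run_lengths`, `segment_starts`, `hazard`, `alpha0`, `beta0`) and the toy counts are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_predictive(x, alpha, beta):
    """Log predictive probability of count x when the Poisson rate has a
    Gamma(alpha, beta) posterior (this is a negative binomial pmf)."""
    return (math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
            + alpha * math.log(beta / (beta + 1.0))
            - x * math.log(beta + 1.0))


def online_run_lengths(counts, hazard=0.01, alpha0=1.0, beta0=0.1):
    """Process counts one at a time and return, for each position, the most
    probable number of observations in the current segment.

    hazard, alpha0 and beta0 are illustrative assumptions, not values taken
    from the paper.
    """
    log_r = [0.0]                     # log P(run length); starts at run length 0
    alphas, betas = [alpha0], [beta0] # Gamma parameters per run length
    log_h, log_1mh = math.log(hazard), math.log(1.0 - hazard)
    map_run = []
    for x in counts:
        pred = [log_predictive(x, a, b) for a, b in zip(alphas, betas)]
        # Either the current segment continues (run length grows by one) ...
        growth = [lr + lp + log_1mh for lr, lp in zip(log_r, pred)]
        # ... or a breakpoint follows this observation (run length resets to 0).
        reset = logsumexp([lr + lp + log_h for lr, lp in zip(log_r, pred)])
        joint = [reset] + growth
        norm = logsumexp(joint)
        log_r = [v - norm for v in joint]
        # Conjugate update: fresh prior for run length 0, updated posteriors otherwise.
        alphas = [alpha0] + [a + x for a in alphas]
        betas = [beta0] + [b + 1.0 for b in betas]
        map_run.append(max(range(len(log_r)), key=log_r.__getitem__))
    return map_run


def segment_starts(map_run):
    """Report the start index of a new segment whenever the most probable
    run length drops."""
    return [t - map_run[t] + 1
            for t in range(1, len(map_run))
            if map_run[t] < map_run[t - 1]]


# Toy read counts: 30 windows near 20 reads, then 30 windows near 60 reads.
counts = ([20 + (i % 3) - 1 for i in range(30)]
          + [60 + 2 * ((i % 3) - 1) for i in range(30)])
map_run = online_run_lengths(counts)
starts = segment_starts(map_run)
print(starts)
```

Because each count updates the run-length distribution in a single pass, the data never need to be revisited, which is what makes this kind of scheme attractive for very long read-count sequences. A real analysis would also need to handle overdispersion and normalization of read counts, which this sketch ignores.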

Original language: English (US)
Pages (from-to): 1556-1571
Number of pages: 16
Journal: Journal of Applied Statistics
Volume: 42
Issue number: 7
DOI: 10.1080/02664763.2014.1001330
State: Published - Jul 3 2015

Fingerprint

Count Data
Sequencing
Change Point
Change-point Detection
Lung Cancer
Profiling
High Throughput
Locus
Markov chain
Genome
Availability
Simulation Study
Computing
Line
Cell

Keywords

  • Bayesian estimate
  • change point (or breakpoint)
  • DNA copy number variation
  • next generation sequencing
  • online change-point detection method

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

An online copy number variant detection method for short sequencing reads. / Yiğiter, Ayten; Chen, Jie; An, Lingling; Danacioğlu, Nazan.

In: Journal of Applied Statistics, Vol. 42, No. 7, 03.07.2015, p. 1556-1571.

@article{2c2d8043e81b4cd7ab7afc477ce4e986,
title = "An online copy number variant detection method for short sequencing reads",
keywords = "Bayesian estimate, change point (or breakpoint), DNA copy number variation, next generation sequencing, online change-point detection method",
author = "Ayten Yiğiter and Jie Chen and Lingling An and Nazan Danacioğlu",
year = "2015",
month = "7",
day = "3",
doi = "10.1080/02664763.2014.1001330",
language = "English (US)",
volume = "42",
pages = "1556--1571",
journal = "Journal of Applied Statistics",
issn = "0266-4763",
publisher = "Routledge",
number = "7",

}

TY - JOUR

T1 - An online copy number variant detection method for short sequencing reads

AU - Yiğiter, Ayten

AU - Chen, Jie

AU - An, Lingling

AU - Danacioğlu, Nazan

PY - 2015/7/3

Y1 - 2015/7/3

KW - Bayesian estimate

KW - change point (or breakpoint)

KW - DNA copy number variation

KW - next generation sequencing

KW - online change-point detection method

UR - http://www.scopus.com/inward/record.url?scp=84928618340&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84928618340&partnerID=8YFLogxK

U2 - 10.1080/02664763.2014.1001330

DO - 10.1080/02664763.2014.1001330

M3 - Article

VL - 42

SP - 1556

EP - 1571

JO - Journal of Applied Statistics

JF - Journal of Applied Statistics

SN - 0266-4763

IS - 7

ER -