An accurate and powerful method for copy number variation detection

Feifei Xiao, Xizhi Luo, Ning Hao, Yue Niu, Xiangjun Xiao, Guoshuai Cai, Christopher I. Amos, Heping Zhang

Research output: Contribution to journal › Article

Abstract

MOTIVATION: Integrating multiple genetic sources for copy number variation (CNV) detection is a powerful approach to improve the identification of variants associated with complex traits. Although widely used change-point-based methods have been shown to increase statistical power to identify variants, it remains challenging to detect CNVs with weak signals because genotyping intensity data are noisy. We previously developed modSaRa, a normal mean-based model built on a screening and ranking algorithm for CNV identification, which showed desirable sensitivity with high computational efficiency. To boost statistical power for the identification of variants, here we present a novel improvement, which we call modSaRa2, that integrates the relative allelic intensity with external information from empirical statistics with modeling.

RESULTS: Simulation studies showed that modSaRa2 markedly improved both sensitivity and specificity over existing methods for analyzing array-based data. The improvement was most substantial for weak CNV signals, and the method also improved stability when CNV size varies. Applied to a whole-genome melanoma dataset, the new method identified novel candidate melanoma-risk-associated deletions on chromosome band 1p22.2 and duplications in the 6p22, 6q25 and 19p13 regions, which may facilitate understanding of the possible roles of germline copy number variants in the etiology of melanoma.

AVAILABILITY AND IMPLEMENTATION: http://c2s2.yale.edu/software/modSaRa2 or https://github.com/FeifeiXiaoUSC/modSaRa2.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
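
The screening and ranking idea behind change-point CNV callers can be sketched roughly as follows. This is a generic illustration of a local-diagnostic scan over a one-dimensional intensity sequence, not the modSaRa or modSaRa2 implementation: the function names, the bandwidth `h`, and the fixed `threshold` are assumptions made for this example, and the integration of relative allelic intensity and empirical statistics described in the abstract is not modeled here.

```python
import numpy as np


def local_diagnostic(y, h):
    """Difference of right and left window means at each position.

    D[i] = mean(y[i:i+h]) - mean(y[i-h:i]) for h <= i <= n - h,
    and 0 where a full window does not fit.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums let every window mean be computed in O(1).
    c = np.concatenate(([0.0], np.cumsum(y)))
    d = np.zeros(n)
    idx = np.arange(h, n - h + 1)
    right = (c[idx + h] - c[idx]) / h
    left = (c[idx] - c[idx - h]) / h
    d[idx] = right - left
    return d


def screen_change_points(y, h, threshold):
    """Keep positions where |D| is a local maximum within +/- h
    and is at least `threshold` (a hypothetical fixed cutoff)."""
    d = np.abs(local_diagnostic(y, h))
    n = len(d)
    picks = []
    for i in range(h, n - h + 1):
        lo, hi = max(0, i - h), min(n, i + h + 1)
        if d[i] >= threshold and d[i] == d[lo:hi].max():
            picks.append(i)
    return picks


if __name__ == "__main__":
    # Simulated intensity sequence: baseline 0 with one raised segment.
    rng = np.random.default_rng(0)
    signal = np.concatenate([np.zeros(50), np.ones(20), np.zeros(50)])
    noisy = signal + rng.normal(scale=0.2, size=signal.size)
    print(screen_change_points(noisy, h=10, threshold=0.6))
```

The window means make the scan linear in the sequence length, which is in keeping with the computational efficiency the abstract attributes to the screening and ranking approach; a weak signal corresponds to a small jump in mean relative to the noise, which is where a fixed cutoff like the one assumed here tends to struggle.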

Original language: English (US)
Pages (from-to): 2891-2898
Number of pages: 8
Journal: Bioinformatics (Oxford, England)
Volume: 35
Issue number: 17
DOIs: 10.1093/bioinformatics/bty1041
State: Published - Sep 1 2019

Fingerprint

Signal detection
Bioinformatics
Chromosomes
Computational efficiency
Melanoma
Screening
Genes
Statistics
Availability
Statistical Power
Chromosome Deletion
Computational Biology
Software
Genome
Change Point
Sensitivity and Specificity
Duplication
Deletion

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

An accurate and powerful method for copy number variation detection. / Xiao, Feifei; Luo, Xizhi; Hao, Ning; Niu, Yue; Xiao, Xiangjun; Cai, Guoshuai; Amos, Christopher I.; Zhang, Heping.

In: Bioinformatics (Oxford, England), Vol. 35, No. 17, 01.09.2019, p. 2891-2898.

@article{9f32898648f24df29885230b0118b42d,
title = "An accurate and powerful method for copy number variation detection",
author = "Feifei Xiao and Xizhi Luo and Ning Hao and Yue Niu and Xiangjun Xiao and Guoshuai Cai and Amos, {Christopher I.} and Heping Zhang",
year = "2019",
month = "9",
day = "1",
doi = "10.1093/bioinformatics/bty1041",
language = "English (US)",
volume = "35",
pages = "2891--2898",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "17",

}

TY - JOUR

T1 - An accurate and powerful method for copy number variation detection

AU - Xiao, Feifei

AU - Luo, Xizhi

AU - Hao, Ning

AU - Niu, Yue

AU - Xiao, Xiangjun

AU - Cai, Guoshuai

AU - Amos, Christopher I.

AU - Zhang, Heping

PY - 2019/9/1

Y1 - 2019/9/1

UR - http://www.scopus.com/inward/record.url?scp=85072051497&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072051497&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty1041

DO - 10.1093/bioinformatics/bty1041

M3 - Article

C2 - 30649252

AN - SCOPUS:85072051497

VL - 35

SP - 2891

EP - 2898

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 17

ER -