A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data

Miao Zhang, Yiwen Liu, Hua Zhou, Joseph Watkins, Jin Zhou

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate uncertainties while maintaining statistical power, we introduce MCPCA_PopGen to analyze population structure of low-depth sequencing data.

RESULTS: The method optimizes the choice of nonlinear transformations of dosages to maximize the Ky Fan norm of the covariance matrix. The transformation incorporates the uncertainty in calling between heterozygotes and the common homozygotes for loci having a rare allele and is more linear when both variants are common.

CONCLUSIONS: We apply MCPCA_PopGen to samples from two indigenous Siberian populations and reveal hidden population structure accurately using only a single chromosome. The MCPCA_PopGen package is available at https://github.com/yiwenstat/MCPCA_PopGen.
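The objective named in the abstract, the Ky Fan norm of a covariance matrix, can be illustrated with a minimal sketch. The Ky Fan k-norm is the sum of the k largest singular values, so for a covariance matrix it measures how much variance the top k principal components capture. The code below is not the MCPCA_PopGen implementation: the helper names (`ky_fan_norm`, `correlation_of`), the simulated hard-call dosages, and the fixed "carrier" transformation are all assumptions made for illustration. It only evaluates the objective for two hand-picked transformations, whereas the method described above optimizes over transformations.

```python
import numpy as np


def ky_fan_norm(matrix, k):
    """Return the Ky Fan k-norm: the sum of the k largest singular values."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)  # descending order
    return float(singular_values[:k].sum())


def correlation_of(features):
    """Standardize each non-constant column, then return the sample covariance."""
    centered = features - features.mean(axis=0)
    spread = centered.std(axis=0)
    keep = spread > 0  # constant columns cannot be scaled to unit variance
    scaled = centered[:, keep] / spread[keep]
    return scaled.T @ scaled / scaled.shape[0]


# Hypothetical toy data: 200 samples by 50 loci, hard-call dosages in {0, 1, 2}.
rng = np.random.default_rng(0)
allele_freq = rng.uniform(0.05, 0.5, size=50)
dosages = rng.binomial(2, allele_freq, size=(200, 50)).astype(float)

# Two fixed candidate transformations (illustrative, not learned by any method).
identity = dosages                      # linear: the usual PCA input
carrier = (dosages >= 1).astype(float)  # nonlinear: maps dosages 1 and 2 to 1

k = 5
for name, transformed in [("identity", identity), ("carrier", carrier)]:
    print(name, ky_fan_norm(correlation_of(transformed), k))
```

Because each transformed column is standardized before the covariance is formed, the matrix has unit diagonal, so the comparison between transformations reflects how variance concentrates in the leading components rather than differences in scale.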

Original language: English (US)
Pages (from-to): 348
Number of pages: 1
Journal: BMC Bioinformatics
Volume: 22
Issue number: 1
State: Published - Jun 26 2021

Keywords

  • Data-adaptive
  • Dimension reduction
  • Low-coverage
  • Non-linear kernel
  • Population structure

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
