Data Descriptor: Building two indica rice reference genomes with PacBio long-read and Illumina paired-end sequencing data

Jianwei Zhang, Ling Ling Chen, Shuai Sun, David A Kudrna, Dario Copetti, Weiming Li, Ting Mu, Wen Biao Jiao, Feng Xing, Seunghee Lee, Jayson Talag, Jia Ming Song, Bogu Du, Weibo Xie, Meizhong Luo, Carlos Ernesto Maldonado, Jose Luis Goicoechea, Lizhong Xiong, Changyin Wu, Yongzhong Xing, Dao Xiu Zhou, Sibin Yu, Yu Zhao, Gongwei Wang, Yeisoo Yu, Yijie Luo, Beatriz Elena Padilla Hurtado, Ann Danowitz, Rod A Wing, Qifa Zhang

Research output: Contribution to journal, Article

9 Citations (Scopus)

Abstract

Over the past 30 years, we have performed many fundamental studies on two Oryza sativa subsp. indica varieties, Zhenshan 97 (ZS97) and Minghui 63 (MH63). To improve the resolution of many of these investigations, we generated two reference-quality genome assemblies using the most advanced sequencing technologies. Using PacBio SMRT technology, we produced over 108 (ZS97) and 174 (MH63) Gb of raw sequence data from 166 (ZS97) and 209 (MH63) pools of BAC clones, and generated ∼97 (ZS97) and ∼74 (MH63) Gb of paired-end whole-genome shotgun (WGS) sequence data with Illumina sequencing technology. With these data, we successfully assembled two platinum-standard reference genomes that have been publicly released. Here we provide the full sets of raw data used to generate these two reference genome assemblies. These data sets can be used to test new programs for better genome assembly and annotation, aid in the discovery of new insights into genome structure, function, and evolution, and help to provide essential support to biological research in general.

Original language: English (US)
Article number: 160076
Journal: Scientific data
Volume: 3
DOI: https://doi.org/10.1038/sdata.2016.76
State: Published - Sep 13 2016

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Computer Science Applications
  • Information Systems
  • Statistics, Probability and Uncertainty
  • Statistics and Probability
  • Medicine(all)

Cite this

Data Descriptor: Building two indica rice reference genomes with PacBio long-read and Illumina paired-end sequencing data. / Zhang, Jianwei; Chen, Ling Ling; Sun, Shuai; Kudrna, David A; Copetti, Dario; Li, Weiming; Mu, Ting; Jiao, Wen Biao; Xing, Feng; Lee, Seunghee; Talag, Jayson; Song, Jia Ming; Du, Bogu; Xie, Weibo; Luo, Meizhong; Maldonado, Carlos Ernesto; Goicoechea, Jose Luis; Xiong, Lizhong; Wu, Changyin; Xing, Yongzhong; Zhou, Dao Xiu; Yu, Sibin; Zhao, Yu; Wang, Gongwei; Yu, Yeisoo; Luo, Yijie; Hurtado, Beatriz Elena Padilla; Danowitz, Ann; Wing, Rod A; Zhang, Qifa.

In: Scientific data, Vol. 3, 160076, 13.09.2016.

Zhang, J, Chen, LL, Sun, S, Kudrna, DA, Copetti, D, Li, W, Mu, T, Jiao, WB, Xing, F, Lee, S, Talag, J, Song, JM, Du, B, Xie, W, Luo, M, Maldonado, CE, Goicoechea, JL, Xiong, L, Wu, C, Xing, Y, Zhou, DX, Yu, S, Zhao, Y, Wang, G, Yu, Y, Luo, Y, Hurtado, BEP, Danowitz, A, Wing, RA & Zhang, Q 2016, 'Data Descriptor: Building two indica rice reference genomes with PacBio long-read and Illumina paired-end sequencing data', Scientific data, vol. 3, 160076. https://doi.org/10.1038/sdata.2016.76
@article{94e09ef861a347539442dada8b3eec8c,
title = "Data Descriptor: Building two indica rice reference genomes with PacBio long-read and Illumina paired-end sequencing data",
author = "Jianwei Zhang and Chen, {Ling Ling} and Shuai Sun and Kudrna, {David A} and Dario Copetti and Weiming Li and Ting Mu and Jiao, {Wen Biao} and Feng Xing and Seunghee Lee and Jayson Talag and Song, {Jia Ming} and Bogu Du and Weibo Xie and Meizhong Luo and Maldonado, {Carlos Ernesto} and Goicoechea, {Jose Luis} and Lizhong Xiong and Changyin Wu and Yongzhong Xing and Zhou, {Dao Xiu} and Sibin Yu and Yu Zhao and Gongwei Wang and Yeisoo Yu and Yijie Luo and Hurtado, {Beatriz Elena Padilla} and Ann Danowitz and Wing, {Rod A} and Qifa Zhang",
year = "2016",
month = "9",
day = "13",
doi = "10.1038/sdata.2016.76",
language = "English (US)",
volume = "3",
journal = "Scientific data",
issn = "2052-4463",
publisher = "Nature Publishing Group",

}

TY - JOUR
T1 - Data Descriptor: Building two indica rice reference genomes with PacBio long-read and Illumina paired-end sequencing data
AU - Zhang, Jianwei
AU - Chen, Ling Ling
AU - Sun, Shuai
AU - Kudrna, David A
AU - Copetti, Dario
AU - Li, Weiming
AU - Mu, Ting
AU - Jiao, Wen Biao
AU - Xing, Feng
AU - Lee, Seunghee
AU - Talag, Jayson
AU - Song, Jia Ming
AU - Du, Bogu
AU - Xie, Weibo
AU - Luo, Meizhong
AU - Maldonado, Carlos Ernesto
AU - Goicoechea, Jose Luis
AU - Xiong, Lizhong
AU - Wu, Changyin
AU - Xing, Yongzhong
AU - Zhou, Dao Xiu
AU - Yu, Sibin
AU - Zhao, Yu
AU - Wang, Gongwei
AU - Yu, Yeisoo
AU - Luo, Yijie
AU - Hurtado, Beatriz Elena Padilla
AU - Danowitz, Ann
AU - Wing, Rod A
AU - Zhang, Qifa
PY - 2016/9/13
Y1 - 2016/9/13
UR - http://www.scopus.com/inward/record.url?scp=84987925169&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84987925169&partnerID=8YFLogxK
U2 - 10.1038/sdata.2016.76
DO - 10.1038/sdata.2016.76
M3 - Article
C2 - 27622467
AN - SCOPUS:84987925169
VL - 3
JO - Scientific data
JF - Scientific data
SN - 2052-4463
M1 - 160076
ER -