Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs

Carol Soderlund, Anne Descour, David A Kudrna, Matthew Bomhoff, Lomax Boyd, Jennifer Currie, Angelina Angelova, Kristi Collura, Marina Wissotski, Elizabeth Ashley, Darren Morrow, John Fernandes, Virginia Walbot, Yeisoo Yu

Research output: Contribution to journal › Article

111 Citations (Scopus)

Abstract

Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5′ and 3′ UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).

Original language: English (US)
Article number: e1000740
Journal: PLoS Genetics
Volume: 5
Issue number: 11
DOIs: https://doi.org/10.1371/journal.pgen.1000740
State: Published - Nov 2009

Fingerprint

Zea mays
Complementary DNA
maize
protein
corn
genome
Untranslated Regions
DNA Transposable Elements
gene
Expressed Sequence Tags
sorghum
Libraries
transposons
rice
amino acid
Genome
Proteins
Sorghum
Nucleic Acid Databases
resource

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Ecology, Evolution, Behavior and Systematics
  • Cancer Research
  • Genetics (clinical)

Cite this

Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs. / Soderlund, Carol; Descour, Anne; Kudrna, David A; Bomhoff, Matthew; Boyd, Lomax; Currie, Jennifer; Angelova, Angelina; Collura, Kristi; Wissotski, Marina; Ashley, Elizabeth; Morrow, Darren; Fernandes, John; Walbot, Virginia; Yu, Yeisoo.

In: PLoS Genetics, Vol. 5, No. 11, e1000740, 11.2009.

Soderlund, C, Descour, A, Kudrna, DA, Bomhoff, M, Boyd, L, Currie, J, Angelova, A, Collura, K, Wissotski, M, Ashley, E, Morrow, D, Fernandes, J, Walbot, V & Yu, Y 2009, 'Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs', PLoS Genetics, vol. 5, no. 11, e1000740. https://doi.org/10.1371/journal.pgen.1000740
Soderlund, Carol ; Descour, Anne ; Kudrna, David A ; Bomhoff, Matthew ; Boyd, Lomax ; Currie, Jennifer ; Angelova, Angelina ; Collura, Kristi ; Wissotski, Marina ; Ashley, Elizabeth ; Morrow, Darren ; Fernandes, John ; Walbot, Virginia ; Yu, Yeisoo. / Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs. In: PLoS Genetics. 2009 ; Vol. 5, No. 11.
@article{58c419155e8e438f8d57db0a011db3fd,
title = "Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs",
abstract = "Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5' and 3' UTR, respectively, with 8.6{\%} of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94{\%} of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6{\%} of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2{\%} of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88{\%} have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).",
author = "Carol Soderlund and Anne Descour and Kudrna, {David A} and Matthew Bomhoff and Lomax Boyd and Jennifer Currie and Angelina Angelova and Kristi Collura and Marina Wissotski and Elizabeth Ashley and Darren Morrow and John Fernandes and Virginia Walbot and Yeisoo Yu",
year = "2009",
month = "11",
doi = "10.1371/journal.pgen.1000740",
language = "English (US)",
volume = "5",
journal = "PLoS Genetics",
issn = "1553-7390",
publisher = "Public Library of Science",
number = "11",

}

TY - JOUR

T1 - Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs

AU - Soderlund, Carol

AU - Descour, Anne

AU - Kudrna, David A

AU - Bomhoff, Matthew

AU - Boyd, Lomax

AU - Currie, Jennifer

AU - Angelova, Angelina

AU - Collura, Kristi

AU - Wissotski, Marina

AU - Ashley, Elizabeth

AU - Morrow, Darren

AU - Fernandes, John

AU - Walbot, Virginia

AU - Yu, Yeisoo

PY - 2009/11

Y1 - 2009/11

N2 - Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5′ and 3′ UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).

UR - http://www.scopus.com/inward/record.url?scp=73649094751&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=73649094751&partnerID=8YFLogxK

U2 - 10.1371/journal.pgen.1000740

DO - 10.1371/journal.pgen.1000740

M3 - Article

C2 - 19936069

AN - SCOPUS:73649094751

VL - 5

JO - PLoS Genetics

JF - PLoS Genetics

SN - 1553-7390

IS - 11

M1 - e1000740

ER -