Full-length messenger RNA sequences greatly improve genome annotation.

Brian J. Haas, Natalia Volfovsky, Christopher D. Town, Maxim Troukhan, Nickolai Alexandrov, Kenneth A. Feldmann, Richard B. Flavell, Owen White, Steven L. Salzberg

Research output: Contribution to journal (Article)

138 Citations (Scopus)

Abstract

BACKGROUND: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome sequence data now available, methods for accurate identification of large numbers of genes have become urgently needed. In an effort to create a set of very high-quality gene models, we used the sequence of 5,000 full-length gene transcripts from Arabidopsis to re-annotate its genome. We have mapped these transcripts to their exact chromosomal locations and, using alignment programs, have created gene models that provide a reference set for this organism.

RESULTS: Approximately 35% of the transcripts indicated that previously annotated genes needed modification, and 5% of the transcripts represented newly discovered genes. We also discovered that multiple transcription initiation sites appear to be much more common than previously known, and we report numerous cases of alternative mRNA splicing. We include a comparison of different alignment software and an analysis of how the transcript data improved the previously published annotation.

CONCLUSIONS: Our results demonstrate that sequencing of large numbers of full-length transcripts followed by computational mapping greatly improves identification of the complete exon structures of eukaryotic genes. In addition, we are able to find numerous introns in the untranslated regions of the genes.

Original language: English (US)
Journal: Genome Biology
Volume: 3
Issue number: 6
State: Published - 2002
Externally published: Yes

Fingerprint

messenger RNA
RNA
genome
nucleotide sequences
genes
Untranslated Regions
Transcription Initiation Site
Alternative Splicing
Arabidopsis
introns
exons
Software
transcription (genetics)
case studies

ASJC Scopus subject areas

  • Genetics

Cite this

Haas, B. J., Volfovsky, N., Town, C. D., Troukhan, M., Alexandrov, N., Feldmann, K. A., ... Salzberg, S. L. (2002). Full-length messenger RNA sequences greatly improve genome annotation. Genome Biology, 3(6).

@article{603a2fa96de04bc4950db3eb49a6f6fc,
title = "Full-length messenger RNA sequences greatly improve genome annotation.",
author = "Haas, {Brian J.} and Natalia Volfovsky and Town, {Christopher D.} and Maxim Troukhan and Nickolai Alexandrov and Feldmann, {Kenneth A} and Flavell, {Richard B.} and Owen White and Salzberg, {Steven L.}",
year = "2002",
language = "English (US)",
volume = "3",
journal = "Genome Biology",
issn = "1474-7596",
publisher = "BioMed Central",
number = "6",

}

TY - JOUR

T1 - Full-length messenger RNA sequences greatly improve genome annotation.

AU - Haas, Brian J.

AU - Volfovsky, Natalia

AU - Town, Christopher D.

AU - Troukhan, Maxim

AU - Alexandrov, Nickolai

AU - Feldmann, Kenneth A

AU - Flavell, Richard B.

AU - White, Owen

AU - Salzberg, Steven L.

PY - 2002

Y1 - 2002

UR - http://www.scopus.com/inward/record.url?scp=17344374428&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=17344374428&partnerID=8YFLogxK

M3 - Article

C2 - 12093376

AN - SCOPUS:17344374428

VL - 3

JO - Genome Biology

JF - Genome Biology

SN - 1474-7596

IS - 6

ER -