Arabidopsis intragenomic conserved noncoding sequence

Brian C. Thomas, Lakshmi Rapaka, Eric H Lyons, Brent Pedersen, Michael Freeling

Research output: Contribution to journal › Article

42 Citations (Scopus)

Abstract

After the most recent tetraploidy in the Arabidopsis lineage, most gene pairs lost one, but not both, of their duplicates. We manually inspected the 3,179 retained gene pairs and their surrounding gene space still present in the genome using a custom-made viewer application. The display of these pairs allowed us to define intragenic conserved noncoding sequences (CNSs), identify exon annotation errors, and discover potentially new genes. Using a strict algorithm to sort high-scoring pair sequences from the bl2seq data, we created a database of 14,944 intragenomic Arabidopsis CNSs. The mean CNS length is 31 bp, ranging from 15 to 285 bp. There are ≈1.7 CNSs associated with a typical gene, and Arabidopsis CNSs are found in all areas around exons, most frequently in the 5′ upstream region. Gene ontology classifications related to transcription, regulation, or "response to..." external or endogenous stimuli, especially hormones, tend to be significantly overrepresented among genes containing a large number of CNSs, whereas protein localization, transport, and metabolism are common among genes with no CNSs. There is a 1.5% overlap between these CNSs and the 218,982 putative RNAs in the Arabidopsis Small RNA Project database, allowing for two mismatches. These CNSs provide a unique set of noncoding sequences enriched for function. CNS function is implied by evolutionary conservation and independently supported because CNS-richness predicts regulatory gene ontology categories.
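The abstract reports CNS overlap with small RNAs "allowing for two mismatches" but does not say how that matching was done. It also leaves open whether the 1.5% is a fraction of CNSs or of RNAs. The following is a minimal sketch of one way to express such a criterion, not the authors' procedure. It assumes ungapped alignment, mismatches counted as Hamming distance, and both DNA strands checked. All function names are hypothetical.

```python
# Hypothetical sketch: ungapped small-RNA-to-CNS matching with a mismatch budget.
# Assumptions (not stated in the source): no gaps, both strands checked,
# mismatches counted position by position (Hamming distance).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def within_mismatches(a: str, b: str, max_mm: int) -> bool:
    """True if equal-length strings a and b differ at no more than max_mm positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mm:
                return False  # stop early once the budget is exceeded
    return True


def small_rna_hits_cns(cns: str, srna: str, max_mm: int = 2,
                       both_strands: bool = True) -> bool:
    """True if srna aligns ungapped somewhere inside cns with <= max_mm mismatches."""
    cns, srna = cns.upper(), srna.upper()
    k = len(srna)
    if k == 0 or k > len(cns):
        return False
    queries = [srna, revcomp(srna)] if both_strands else [srna]
    # Slide every query across every window of length k in the CNS.
    return any(
        within_mismatches(cns[i:i + k], q, max_mm)
        for q in queries
        for i in range(len(cns) - k + 1)
    )


def cnss_with_srna_hit(cnss, srnas, max_mm: int = 2):
    """Indices of CNSs matched by at least one small RNA (naive all-pairs scan)."""
    return [i for i, c in enumerate(cnss)
            if any(small_rna_hits_cns(c, s, max_mm) for s in srnas)]


if __name__ == "__main__":
    print(small_rna_hits_cns("AAAAAAAAAA", "AAAAAAAACC"))  # two mismatches: True
    print(small_rna_hits_cns("AAAAAAAAAA", "AAAAAAACCC"))  # three mismatches: False
```

This all-pairs scan is only meant to make the "two mismatches" rule concrete. For roughly 15,000 CNSs against roughly 219,000 small RNAs, an indexed short-read aligner would be the practical choice.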

Original language: English (US)
Pages (from-to): 3348-3353
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 104
Issue number: 9
DOI: 10.1073/pnas.0611574104
State: Published (Feb 27 2007)
Externally published: Yes

Fingerprint

Conserved Sequence
Arabidopsis
Genes
Gene Ontology
Exons
Tetraploidy
Nucleic Acid Databases
Protein Transport
Regulator Genes
Genome
Databases
Hormones
RNA

Keywords

  • Gene regulation
  • Small RNA
  • Transcription factor

ASJC Scopus subject areas

  • Genetics
  • General

Cite this

Arabidopsis intragenomic conserved noncoding sequence. / Thomas, Brian C.; Rapaka, Lakshmi; Lyons, Eric H; Pedersen, Brent; Freeling, Michael.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 104, No. 9, 27.02.2007, p. 3348-3353.


Thomas, Brian C. ; Rapaka, Lakshmi ; Lyons, Eric H ; Pedersen, Brent ; Freeling, Michael. / Arabidopsis intragenomic conserved noncoding sequence. In: Proceedings of the National Academy of Sciences of the United States of America. 2007 ; Vol. 104, No. 9. pp. 3348-3353.
@article{fd2a87b54e7048f09453d2a8b6466d6c,
title = "Arabidopsis intragenomic conserved noncoding sequence",
abstract = "After the most recent tetraploidy in the Arabidopsis lineage, most gene pairs lost one, but not both, of their duplicates. We manually inspected the 3,179 retained gene pairs and their surrounding gene space still present in the genome using a custom-made viewer application. The display of these pairs allowed us to define intragenic conserved noncoding sequences (CNSs), identify exon annotation errors, and discover potentially new genes. Using a strict algorithm to sort high-scoring pair sequences from the bl2seq data, we created a database of 14,944 intragenomic Arabidopsis CNSs. The mean CNS length is 31 bp, ranging from 15 to 285 bp. There are ≈1.7 CNSs associated with a typical gene, and Arabidopsis CNSs are found in all areas around exons, most frequently in the 5′ upstream region. Gene ontology classifications related to transcription, regulation, or {"}response to...{"} external or endogenous stimuli, especially hormones, tend to be significantly overrepresented among genes containing a large number of CNSs, whereas protein localization, transport, and metabolism are common among genes with no CNSs. There is a 1.5{\%} overlap between these CNSs and the 218,982 putative RNAs in the Arabidopsis Small RNA Project database, allowing for two mismatches. These CNSs provide a unique set of noncoding sequences enriched for function. CNS function is implied by evolutionary conservation and independently supported because CNS-richness predicts regulatory gene ontology categories.",
keywords = "Gene regulation, Small RNA, Transcription factor",
author = "Thomas, {Brian C.} and Lakshmi Rapaka and Lyons, {Eric H} and Brent Pedersen and Michael Freeling",
year = "2007",
month = "2",
day = "27",
doi = "10.1073/pnas.0611574104",
language = "English (US)",
volume = "104",
pages = "3348--3353",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "9",

}

TY - JOUR

T1 - Arabidopsis intragenomic conserved noncoding sequence

AU - Thomas, Brian C.

AU - Rapaka, Lakshmi

AU - Lyons, Eric H

AU - Pedersen, Brent

AU - Freeling, Michael

PY - 2007/2/27

Y1 - 2007/2/27

N2 - After the most recent tetraploidy in the Arabidopsis lineage, most gene pairs lost one, but not both, of their duplicates. We manually inspected the 3,179 retained gene pairs and their surrounding gene space still present in the genome using a custom-made viewer application. The display of these pairs allowed us to define intragenic conserved noncoding sequences (CNSs), identify exon annotation errors, and discover potentially new genes. Using a strict algorithm to sort high-scoring pair sequences from the bl2seq data, we created a database of 14,944 intragenomic Arabidopsis CNSs. The mean CNS length is 31 bp, ranging from 15 to 285 bp. There are ≈1.7 CNSs associated with a typical gene, and Arabidopsis CNSs are found in all areas around exons, most frequently in the 5′ upstream region. Gene ontology classifications related to transcription, regulation, or "response to..." external or endogenous stimuli, especially hormones, tend to be significantly overrepresented among genes containing a large number of CNSs, whereas protein localization, transport, and metabolism are common among genes with no CNSs. There is a 1.5% overlap between these CNSs and the 218,982 putative RNAs in the Arabidopsis Small RNA Project database, allowing for two mismatches. These CNSs provide a unique set of noncoding sequences enriched for function. CNS function is implied by evolutionary conservation and independently supported because CNS-richness predicts regulatory gene ontology categories.

AB - After the most recent tetraploidy in the Arabidopsis lineage, most gene pairs lost one, but not both, of their duplicates. We manually inspected the 3,179 retained gene pairs and their surrounding gene space still present in the genome using a custom-made viewer application. The display of these pairs allowed us to define intragenic conserved noncoding sequences (CNSs), identify exon annotation errors, and discover potentially new genes. Using a strict algorithm to sort high-scoring pair sequences from the bl2seq data, we created a database of 14,944 intragenomic Arabidopsis CNSs. The mean CNS length is 31 bp, ranging from 15 to 285 bp. There are ≈1.7 CNSs associated with a typical gene, and Arabidopsis CNSs are found in all areas around exons, most frequently in the 5′ upstream region. Gene ontology classifications related to transcription, regulation, or "response to..." external or endogenous stimuli, especially hormones, tend to be significantly overrepresented among genes containing a large number of CNSs, whereas protein localization, transport, and metabolism are common among genes with no CNSs. There is a 1.5% overlap between these CNSs and the 218,982 putative RNAs in the Arabidopsis Small RNA Project database, allowing for two mismatches. These CNSs provide a unique set of noncoding sequences enriched for function. CNS function is implied by evolutionary conservation and independently supported because CNS-richness predicts regulatory gene ontology categories.

KW - Gene regulation

KW - Small RNA

KW - Transcription factor

UR - http://www.scopus.com/inward/record.url?scp=33847643636&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847643636&partnerID=8YFLogxK

U2 - 10.1073/pnas.0611574104

DO - 10.1073/pnas.0611574104

M3 - Article

C2 - 17301222

AN - SCOPUS:33847643636

VL - 104

SP - 3348

EP - 3353

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 9

ER -