The First Plant Genome Sequence-Arabidopsis thaliana

Research output: Contribution to journal › Article



The Arabidopsis thaliana genome was the first plant genome to be sequenced. The substrates for sequencing consisted of a minimum tiling path of BAC, P1, YAC, TAC and cosmid clones, anchored to the genetic map. Using these substrates, 10 contigs were developed from 1569 clones. Annotation at the time the sequence was finished identified 25,498 protein-coding genes. With the continued development of software trained on Arabidopsis genes, along with the availability of large numbers of ESTs and additional plant genome sequences, the number of annotated genes has increased. The final TAIR (TAIR10) genome annotation release contains 27,202 nuclear protein-coding genes, 4827 pseudogenes and transposable element genes and 1359 noncoding RNAs. Gene density (kb/gene) is 4.35, with 5.89 exons/gene, an average exon length of 296 nt and an average intron length of 165 nt. Gene density decreases and transposon density increases near the centromeres. Multiple splice variants have been identified for >60% of intron-containing genes. Arabidopsis has experienced a genome triplication and two duplication events during its evolution, giving rise to multiple segmental duplications. These polyploidizations, along with tandem and dispersed single-gene duplications, have contributed to the expansion of gene families and provided raw material for functional divergence.
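As a rough illustration of how the summary statistics above relate to one another, the following sketch estimates the mean transcribed span of a gene from the reported exons per gene, average exon length and average intron length, and compares it with the reported kb/gene figure. It is not part of the article. The function names are hypothetical, and the calculation assumes that each gene has one fewer intron than exons and that the reported averages can be combined in this simple way.

```python
def mean_gene_span_nt(exons_per_gene, mean_exon_nt, mean_intron_nt):
    """Estimate the mean exon-plus-intron span of a gene in nucleotides.

    Assumption (not stated in the abstract): a gene model with n exons
    has n - 1 introns, so the mean intron count is exons_per_gene - 1.
    """
    introns_per_gene = exons_per_gene - 1
    return exons_per_gene * mean_exon_nt + introns_per_gene * mean_intron_nt


def genic_fraction(gene_span_nt, kb_per_gene):
    """Fraction of the per-gene genomic interval covered by the gene span."""
    return gene_span_nt / (kb_per_gene * 1000)


if __name__ == "__main__":
    # Values taken from the abstract: 5.89 exons/gene, 296 nt exons,
    # 165 nt introns, 4.35 kb/gene.
    span = mean_gene_span_nt(5.89, 296, 165)
    print(f"Estimated mean gene span: {span:.0f} nt")
    print(f"Share of the 4.35 kb interval: {genic_fraction(span, 4.35):.1%}")
```

Under these assumptions the estimate comes to about 2.55 kb per gene, or a little under 60% of the 4.35 kb average interval. This is a back-of-the-envelope reading of the abstract's figures, not a value reported by the authors.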

Original language: English (US)
Pages (from-to): 91-117
Number of pages: 27
Journal: Advances in Botanical Research
Publication status: Published - 2014

Keywords

  • Arabidopsis thaliana
  • cDNAs
  • Gene number
  • Genome sequence
  • Protein families
  • Segmental duplications
  • Tandem duplications

ASJC Scopus subject areas

  • Plant Science
