The utility of a genome sequence in biological research depends entirely on a comprehensive description of all of its functional elements. Genome analysis is still predominantly gene-centric, focusing on identifying gene models or open reading frames (ORFs). In this article, we describe a proteomics-based method for identifying ORFs that computational gene-prediction algorithms miss. Mass spectrometry-based identification of peptides and proteins from biological samples provides evidence that the genome sequence is expressed at the protein level. This proteogenomic annotation method combines computationally predicted ORFs and the genome sequence with proteomics data to identify novel gene models. We also describe our proteogenomic mapping pipeline, a set of computational tools that automate the proteogenomic annotation workflow. This pipeline is available for download at www.agbase.msstate.edu/tools/ .
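The central mapping step can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the AgBase pipeline: it assumes a single nucleotide sequence, the standard genetic code, exact peptide matches, and predicted ORFs supplied as 0-based, half-open genome coordinates. The names `map_peptides`, `Orf` and `PeptideHit` are hypothetical. Each peptide is searched in all six reading frames. A hit counts as annotated when a predicted ORF on the same strand and in the same frame contains it; otherwise the surrounding stop-to-stop region is reported as a candidate novel gene model.

```python
# Hypothetical proteogenomic mapping sketch (not the published pipeline).
from dataclasses import dataclass

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


@dataclass(frozen=True)
class Orf:
    """A predicted ORF in 0-based, half-open forward-strand coordinates."""
    strand: str  # "+" or "-"
    start: int
    end: int


@dataclass(frozen=True)
class PeptideHit:
    peptide: str
    strand: str
    start: int         # genomic span of the peptide (0-based, half-open)
    end: int
    region_start: int  # stop-to-stop open region around the peptide
    region_end: int
    annotated: bool    # True if a predicted ORF already explains the hit


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    # Codons containing ambiguous bases (e.g. N) translate to 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def to_genome(strand: str, n: int, s: int, e: int) -> tuple[int, int]:
    # Map a [s, e) interval on the given strand to forward-strand coordinates.
    return (s, e) if strand == "+" else (n - e, n - s)


def explained_by(orfs, strand, start, end) -> bool:
    for orf in orfs:
        if orf.strand != strand or not (orf.start <= start and end <= orf.end):
            continue
        # Also require the peptide to lie in the ORF's reading frame.
        offset = start - orf.start if strand == "+" else orf.end - end
        if offset % 3 == 0:
            return True
    return False


def map_peptides(genome: str, peptides, orfs) -> list[PeptideHit]:
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for pep in peptides:
                p = protein.find(pep)
                while p != -1:
                    q = p + len(pep)
                    # Extend to the nearest upstream and downstream stop codons.
                    left = protein.rfind("*", 0, p) + 1
                    right = protein.find("*", q)
                    if right == -1:
                        right = len(protein)
                    start, end = to_genome(strand, n, frame + 3 * p, frame + 3 * q)
                    r_start, r_end = to_genome(strand, n, frame + 3 * left,
                                               frame + 3 * right)
                    hits.append(PeptideHit(pep, strand, start, end, r_start, r_end,
                                           explained_by(orfs, strand, start, end)))
                    p = protein.find(pep, p + 1)
    return hits
```

For example, `map_peptides(genome, ["AW", "HH"], [Orf("+", 0, 12)])` returns one hit per peptide occurrence; hits with `annotated=False` mark regions a gene predictor missed. Using stop-to-stop regions rather than searching for a start codon keeps the sketch short; a real pipeline would also need to handle details such as the isobaric leucine/isoleucine ambiguity in mass spectrometry data.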
| Original language | English (US) |
| --- | --- |
| Number of pages | 8 |
| Journal | Methods in molecular biology (Clifton, N.J.) |
| State | Published - 2010 |
ASJC Scopus subject areas
- Molecular Biology