Proteogenomic mapping for structural annotation of prokaryote genomes

Nan Wang, Shane C Burgess, Mark Lawrence, Susan Bridges

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Structural annotation of genomes is one of the major goals of genomics research. Most structural annotations are produced by computational pipelines. It is well known that these computational methods have a number of shortcomings, including false identifications and incorrectly identified gene boundaries. Proteomic data can be used to confirm genes identified by computational methods and to correct mistakes. A proteogenomic mapping method has been developed that uses peptides identified by mass spectrometry for structural annotation of genomes. Spectra are matched against both a protein database and the genome translated in all six reading frames. Peptides that match the genome but not the protein database potentially represent novel protein-coding genes or annotation errors. These short, experimentally derived peptides are used to discover potential novel protein-coding genes, called expressed Protein Sequence Tags (ePSTs), by aligning the peptides to the genomic DNA and extending the translation in the 3' and 5' directions. This paper presents an enhanced pipeline for discovering and evaluating potential novel protein-coding genes. It comprises: 1) a distance-based outlier detection method for validating peptides identified from MS/MS; 2) proteogenomic mapping for discovery of potential novel protein-coding genes; and 3) collection of evidence from a number of sources and automatic evaluation of potential novel protein-coding genes using machine learning techniques such as neural networks, support vector machines, and naïve Bayes.
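The mapping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes exact string matching of peptides (a real search engine scores spectra), the standard genetic code, and an extension rule that grows each hit to the nearest in-frame stop codon on both sides; the abstract does not state the exact extension rules. The function names (map_peptide, novel_peptides) and the toy genome are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table layout).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from position 0; stops become '*', unknown codons 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(genome):
    """Yield (strand, frame offset, protein) for all six reading frames."""
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def map_peptide(genome, peptide):
    """Locate a peptide in the six-frame translation and extend each hit
    to the flanking in-frame stop codons (assumed ePST extension rule).
    Coordinates are 0-based, half-open, on the forward strand."""
    n = len(genome)
    hits = []
    for strand, offset, protein in six_frame_translations(genome):
        pos = protein.find(peptide)
        while pos != -1:
            left = protein.rfind("*", 0, pos) + 1        # residue after upstream stop
            right = protein.find("*", pos + len(peptide))  # downstream stop
            if right == -1:
                right = len(protein)
            nt_start, nt_end = offset + 3 * left, offset + 3 * right
            if strand == "-":
                # Convert reverse-strand coordinates back to the forward strand.
                nt_start, nt_end = n - nt_end, n - nt_start
            hits.append({"strand": strand, "frame": offset,
                         "start": nt_start, "end": nt_end,
                         "epst": protein[left:right]})
            pos = protein.find(peptide, pos + 1)
    return hits


def novel_peptides(peptides, known_proteins, genome):
    """Keep peptides that match the genome but no known protein."""
    return [p for p in peptides
            if not any(p in prot for prot in known_proteins)
            and map_peptide(genome, p)]


if __name__ == "__main__":
    toy_genome = "CCTAAATGAAATGGGATGAACTGTGACC"  # hypothetical example sequence
    for hit in map_peptide(toy_genome, "KWDE"):
        print(hit)
```

Extending to flanking stop codons yields the longest open reading frame consistent with the peptide. A stricter variant could trim the 5' end to the first plausible start codon, which matters when refining gene boundaries.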

Original language: English (US)
Title of host publication: Proceedings - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009
Pages: 103-106
Number of pages: 4
DOIs: https://doi.org/10.1109/IJCBS.2009.126
State: Published - 2009
Externally published: Yes
Event: 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009 - Shanghai, China
Duration: Aug 3 2009 to Aug 5 2009

Other

Other: 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009
Country: China
City: Shanghai
Period: 8/3/09 to 8/5/09

Fingerprint

Genes
Proteins
Peptides
Computational methods
Proteomics
Pipelines
Mass spectrometry
Support vector machines
Learning systems
DNA
Neural networks

Keywords

  • Bayesian network
  • Expressed protein sequence tags
  • Naïve bayes
  • Neural network
  • Outlier detection
  • Peptide validation
  • Potential genes
  • Proteogenomic mapping
  • PST
  • Support vector machine
  • Target decoy strategy

ASJC Scopus subject areas

  • Software
  • Biomedical Engineering

Cite this

Wang, N., Burgess, S. C., Lawrence, M., & Bridges, S. (2009). Proteogenomic mapping for structural annotation of prokaryote genomes. In Proceedings - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009 (pp. 103-106). [5260732] https://doi.org/10.1109/IJCBS.2009.126

@inproceedings{43c53aa4f6984a72beb1c41b617d7365,
title = "Proteogenomic mapping for structural annotation of prokaryote genomes",
keywords = "Bayesian network, Expressed protein sequence tags, Na{\"i}ve bayes, Neural network, Outlier detection, Peptide validation, Potential genes, Proteogenomic mapping, PST, Support vector machine, Target decoy strategy",
author = "Nan Wang and Burgess, {Shane C} and Mark Lawrence and Susan Bridges",
year = "2009",
doi = "10.1109/IJCBS.2009.126",
language = "English (US)",
isbn = "9780769537399",
pages = "103--106",
booktitle = "Proceedings - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009",

}

TY - GEN

T1 - Proteogenomic mapping for structural annotation of prokaryote genomes

AU - Wang, Nan

AU - Burgess, Shane C

AU - Lawrence, Mark

AU - Bridges, Susan

PY - 2009

Y1 - 2009

KW - Bayesian network

KW - Expressed protein sequence tags

KW - Naïve bayes

KW - Neural network

KW - Outlier detection

KW - Peptide validation

KW - Potential genes

KW - Proteogenomic mapping

KW - PST

KW - Support vector machine

KW - Target decoy strategy

UR - http://www.scopus.com/inward/record.url?scp=70450158420&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70450158420&partnerID=8YFLogxK

U2 - 10.1109/IJCBS.2009.126

DO - 10.1109/IJCBS.2009.126

M3 - Conference contribution

AN - SCOPUS:70450158420

SN - 9780769537399

SP - 103

EP - 106

BT - Proceedings - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009

ER -