A nearest neighbor approach for automated transporter prediction and categorization from protein sequences

Haiquan Li, Xinbin Dai, Xuechun Zhao

Research output: Contribution to journal (Article)

29 Citations (Scopus)

Abstract

Motivation: Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homologous search and topological analysis into a machine-learning framework.

Results: Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8% of our predictions on the 11 organisms, including 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.
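
Illustrative sketch (not the authors' implementation): the abstract describes nearest neighbor classification evaluated with five-fold cross-validation. The Python below shows that general idea on invented toy data. The k-mer count features, the Manhattan distance, the family labels, and every function name here are assumptions made for illustration; the paper's method instead integrates homology search and topological analysis, which this sketch does not attempt to reproduce.

```python
import random


def kmer_profile(seq, k=2):
    """Count overlapping k-mers in a sequence (a stand-in for real features)."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def distance(p, q):
    """Manhattan distance between two sparse k-mer count profiles."""
    keys = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in keys)


def nearest_label(query, training):
    """Return the label of the training sequence closest to the query."""
    profile = kmer_profile(query)
    best = min(training, key=lambda item: distance(profile, kmer_profile(item[0])))
    return best[1]


def five_fold_accuracy(data, folds=5, seed=0):
    """Average held-out accuracy over `folds` disjoint test splits."""
    items = list(data)
    random.Random(seed).shuffle(items)
    scores = []
    for f in range(folds):
        test = items[f::folds]  # every folds-th item, starting at offset f
        train = [x for i, x in enumerate(items) if i % folds != f]
        correct = sum(nearest_label(seq, train) == label for seq, label in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


# Hypothetical toy "families"; these are not real transporter sequences.
TOY = [
    ("ALALALAL", "fam1"), ("ALALLALA", "fam1"), ("LALALALA", "fam1"),
    ("ALAALALL", "fam1"), ("LLALALAL", "fam1"),
    ("GKGKGKGK", "fam2"), ("GKKGKGKG", "fam2"), ("KGKGKGKG", "fam2"),
    ("GKGGKGKK", "fam2"), ("KKGKGKGK", "fam2"),
]

if __name__ == "__main__":
    print(nearest_label("ALALAL", TOY))
    print(five_fold_accuracy(TOY))
```

Each fold holds out one fifth of the labeled sequences, assigns every held-out sequence the label of its nearest training neighbor, and the per-fold accuracies are averaged, which mirrors how an average cross-validation rate such as the one quoted above is typically computed.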

Original language: English (US)
Pages (from-to): 1129-1136
Number of pages: 8
Journal: Bioinformatics
Volume: 24
Issue number: 9
DOI: 10.1093/bioinformatics/btn099
State: Published - May 2008
Externally published: Yes

Fingerprint

Protein Sequence
Categorization
Nearest Neighbor
Databases
Proteins
Prediction
Membrane
Biological membranes
Nearest Neighbor Method
Membrane Transport Proteins
Experimental Validation
Macromolecules
Cross-validation
Overlapping
Learning systems
Machine Learning
Fold
Integrate
Molecules
Ions

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

A nearest neighbor approach for automated transporter prediction and categorization from protein sequences. / Li, Haiquan; Dai, Xinbin; Zhao, Xuechun.

In: Bioinformatics, Vol. 24, No. 9, 05.2008, p. 1129-1136.

@article{fc14a27fdcf546cfab8688548ca0d138,
title = "A nearest neighbor approach for automated transporter prediction and categorization from protein sequences",
abstract = "Motivation: Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homologous search and topological analysis into a machine-learning framework. Results: Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3{\%} on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8{\%} of our predictions on the 11 organisms, including 55.9{\%} of our predictions overlapping with 83.6{\%} of the predicted transporters in TransportDB.",
author = "Haiquan Li and Xinbin Dai and Xuechun Zhao",
year = "2008",
month = "5",
doi = "10.1093/bioinformatics/btn099",
language = "English (US)",
volume = "24",
pages = "1129--1136",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "9",

}

TY - JOUR

T1 - A nearest neighbor approach for automated transporter prediction and categorization from protein sequences

AU - Li, Haiquan

AU - Dai, Xinbin

AU - Zhao, Xuechun

PY - 2008/5

Y1 - 2008/5

AB - Motivation: Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homologous search and topological analysis into a machine-learning framework. Results: Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8% of our predictions on the 11 organisms, including 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.

UR - http://www.scopus.com/inward/record.url?scp=42649092537&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=42649092537&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btn099

DO - 10.1093/bioinformatics/btn099

M3 - Article

C2 - 18337257

AN - SCOPUS:42649092537

VL - 24

SP - 1129

EP - 1136

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 9

ER -