Support vector machine for prediction of horizontal gene transfers in bacteria genomes

Jian Sheng Wu, Jian Ming Xie, Tong Zhou, Jian Hong Weng, Xiao Sun

Research output: Contribution to journal › Article

1 Scopus citations

Abstract

Horizontal gene transfer (HGT), also called lateral gene transfer (LGT), is any process in which an organism transfers genetic material to another organism that is not its offspring. As more genomic data become available, it has become easier to study how to detect the genes in a given genome that are products of horizontal transfer. Few horizontal gene transfers are known in the three bacterial genomes under consideration, so experiments were carried out that simulated gene transfer by artificially inserting phage genes. By combining feature analysis of gene sequences with a support vector machine (SVM), a novel method was developed for identifying horizontally transferred genes in three fully sequenced bacterial genomes (Escherichia coli K12, Borrelia burgdorferi, Bacillus cereus ZK). Based on our previous work, codon use frequency (FCU) was selected as the sequence feature because it inherently fuses codon usage bias and amino acid composition signals. In addition, a second computational method was proposed that takes strand asymmetry into account and predicts horizontal gene transfers separately for the leading and lagging strands of each genome. To reduce the effect of chance in simulating gene transfer with artificially inserted phage genes, the transfer-and-recover experiment was repeated 100 times, and the arithmetic mean of each measure was reported for each genome to evaluate the algorithm's performance. Ten-fold cross-validation was used for both parameter selection and accuracy estimation. For C-Support Vector Classification (C-SVC), the best results were obtained with the radial basis function kernel with γ=100, while for the one-class SVM the best performance was obtained with a polynomial kernel of degree three.
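As an illustration of the sequence feature, the following minimal Python sketch computes a codon frequency vector for one gene. It assumes that FCU means the relative frequency of each of the 64 codons among all codons of the gene; the abstract does not give the exact definition, and the function name `codon_use_frequency` and the example sequence are hypothetical.

```python
from itertools import product

# The 64 codons in a fixed order, so every gene maps to a vector of equal length.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_use_frequency(gene: str) -> list[float]:
    """Return a 64-dimensional codon frequency vector for one coding sequence.

    Assumption: frequency = codon count / total codons counted in the gene.
    """
    seq = gene.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Read the sequence in frame, three bases at a time; drop a trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Hypothetical toy gene: ATG GCT GCT AAA TAA (five codons).
vector = codon_use_frequency("ATGGCTGCTAAATAA")
```

Vectors of this kind could then serve as input to an SVM classifier; the kernel settings reported in the abstract (radial basis function with γ=100 for C-SVC, degree-three polynomial for the one-class SVM) would be set in whichever SVM library is used.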
The performance of the approach was compared with that of Tsirigos' method, one of the best predictive approaches to date for detecting horizontally transferred genes. First, for the original method, which did not consider strand asymmetry, the C-SVC type achieved a relative improvement (RI) of 31.47% in hit ratio for Escherichia coli K12, while the one-class SVM type achieved an RI of 11.61% for Borrelia burgdorferi. Moreover, as theoretically expected, the method that considers strand asymmetry yielded a higher RI than the original method. To examine the approach's performance in detecting actual gene transfer events, it was applied to the genome of Enterococcus faecalis V583. It not only succeeded in recovering all seven known horizontally transferred genes, but also indicated that the whole segment from 7 kb upstream of gene EF2293 to 38 kb downstream of gene EF2299 was probably transferred into the E. faecalis V583 genome together with these seven genes.
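The evaluation measures can be sketched as follows. The abstract does not define hit ratio or RI, so this sketch assumes that hit ratio is the fraction of artificially inserted genes flagged by the classifier and that RI is the percentage gain over a baseline hit ratio. All gene identifiers and values below are hypothetical and are not results from the paper.

```python
from statistics import mean


def hit_ratio(inserted: set[str], predicted: set[str]) -> float:
    """Fraction of inserted genes that were predicted as transferred (assumed definition)."""
    return len(inserted & predicted) / len(inserted)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline` (assumed definition of RI)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical transfer-and-recover runs: the same inserted genes,
# with a different predicted set in each repetition.
inserted_genes = {"g1", "g2", "g3", "g4"}
predictions_per_run = [{"g1", "g2", "x9"}, {"g1", "g2", "g3"}]

# Average the hit ratio over repetitions, as the abstract does over 100 runs.
average_hit_ratio = mean(hit_ratio(inserted_genes, p) for p in predictions_per_run)

# Compare against a hypothetical baseline hit ratio of 0.5.
ri_percent = relative_improvement(average_hit_ratio, 0.5)
```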

Original language: English (US)
Pages (from-to): 724-731
Number of pages: 8
Journal: Progress in Biochemistry and Biophysics
Volume: 34
Issue number: 7
State: Published - Jul 1 2007

Keywords

  • Bacteria genomes
  • Codon use frequency (FCU)
  • Horizontal gene transfer (HGT)
  • Support vector machine (SVM)

ASJC Scopus subject areas

  • Biophysics
  • Biochemistry