Support vector machine for prediction of horizontal gene transfers in bacteria genomes

Jian Sheng Wu, Jian Ming Xie, Tong Zhou, Jian Hong Weng, Xiao Sun

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Horizontal gene transfer (HGT), also called lateral gene transfer (LGT), is any process in which an organism transfers genetic material to another organism that is not its offspring. As more genomic data become available, it has become more practical to study how to detect the genes in a given genome that are products of horizontal transfer. Because few horizontal gene transfers are known in the three bacterial genomes under consideration, experiments were carried out that simulated gene transfer by artificially inserting phage genes. By combining feature analysis of gene sequences with the support vector machine (SVM), a novel method was developed for identifying horizontally transferred genes in three fully sequenced bacterial genomes (Escherichia coli K12, Borrelia burgdorferi, Bacillus cereus ZK). Following our previous work, codon use frequency (FCU) was selected as the sequence feature because it inherently fuses the codon usage bias and amino acid composition signals. In addition, a second computational method was proposed that takes strand asymmetry into account and predicts horizontal gene transfers separately for the leading and lagging strands of each genome. To reduce the effect of chance in simulating gene transfer by artificial insertion of phage genes, the transfer-and-recover experiment was repeated 100 times, and the arithmetic mean of each measurement was reported for each genome to evaluate the algorithm's performance. Ten-fold cross-validation was used for both parameter estimation and accuracy estimation. For C-Support Vector Classification (C-SVC), the best results were obtained with the radial basis function kernel with γ=100, while for the one-class SVM the best performance was obtained with a polynomial kernel of degree three.
The performance of the approach was compared with that of Tsirigos' method, one of the best predictive approaches to date for detecting horizontally transferred genes. First, for the original method, which does not consider strand asymmetry, the C-SVC type achieved a high relative improvement (RI) of 31.47% in hit ratio for Escherichia coli K12, while the one-class SVM type achieved an RI of 11.61% for Borrelia burgdorferi. Moreover, as theoretically expected, the method that considers strand asymmetry yielded a higher RI than the original method. To examine the approach's performance in detecting actual gene transfer events, it was applied to the genome of Enterococcus faecalis V583. It not only recovered all seven known horizontally transferred genes but also indicated that the whole segment from 7 kb upstream of gene EF2293 to 38 kb downstream of gene EF2299 was probably transferred into the E. faecalis V583 genome together with those seven genes.
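
The feature, kernel, and comparison metric named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so two things here are assumptions: that FCU is each codon's share of all codons in a gene's coding sequence, and that relative improvement is the percentage gain of a score over a baseline score. The function names are hypothetical and are not taken from the paper. The radial basis function kernel is the standard form exp(-γ‖x − y‖²), shown with the γ=100 reported for C-SVC.

```python
import math
from itertools import product

# All 64 codons in a fixed order, so every gene maps to the same vector layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Assumed FCU: each codon's share of all codons in a coding sequence."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Read the sequence in frame, ignoring any incomplete trailing codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def rbf_kernel(x, y, gamma=100.0):
    """Standard RBF kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def relative_improvement(score, baseline):
    """Assumed RI: percentage gain of `score` over `baseline`."""
    return 100.0 * (score - baseline) / baseline


if __name__ == "__main__":
    host_gene = codon_frequencies("ATGAAAAAATAA")
    phage_gene = codon_frequencies("ATGGGCGGCTAA")
    print(rbf_kernel(host_gene, phage_gene))
    print(relative_improvement(0.6, 0.5))
```

Because frequency vectors sum to one, squared distances between genes are small, which is one plausible reason a large γ is needed for the kernel to separate them.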

Original language: English (US)
Pages (from-to): 724-731
Number of pages: 8
Journal: Progress in Biochemistry and Biophysics
Volume: 34
Issue number: 7
State: Published - Jul 2007
Externally published: Yes

Fingerprint

Gene transfer
Horizontal Gene Transfer
Support vector machines
Bacteria
Genes
Genome
Escherichia coli K12
Borrelia burgdorferi
Enterococcus faecalis
Bacteriophages
Codon
Support Vector Machine
Escherichia coli
Bacillus cereus
Computational methods

Keywords

  • Bacteria genomes
  • Codon use frequency (FCU)
  • Horizontal gene transfer (HGT)
  • Support vector machine (SVM)

ASJC Scopus subject areas

  • Biochemistry
  • Biophysics

Cite this

Support vector machine for prediction of horizontal gene transfers in bacteria genomes. / Wu, Jian Sheng; Xie, Jian Ming; Zhou, Tong; Weng, Jian Hong; Sun, Xiao.

In: Progress in Biochemistry and Biophysics, Vol. 34, No. 7, 07.2007, p. 724-731.

@article{c35bd8bd4e4a458a904674c11d6567a1,
title = "Support vector machine for prediction of horizontal gene transfers in bacteria genomes",
keywords = "Bacteria genomes, Codon use frequency (FCU), Horizontal gene transfer (HGT), Support vector machine (SVM)",
author = "Wu, {Jian Sheng} and Xie, {Jian Ming} and Tong Zhou and Weng, {Jian Hong} and Xiao Sun",
year = "2007",
month = "7",
language = "English (US)",
volume = "34",
pages = "724--731",
journal = "Progress in Biochemistry and Biophysics",
issn = "1000-3282",
publisher = "Science Press",
number = "7",

}

TY - JOUR

T1 - Support vector machine for prediction of horizontal gene transfers in bacteria genomes

AU - Wu, Jian Sheng

AU - Xie, Jian Ming

AU - Zhou, Tong

AU - Weng, Jian Hong

AU - Sun, Xiao

PY - 2007/7

Y1 - 2007/7

KW - Bacteria genomes

KW - Codon use frequency (FCU)

KW - Horizontal gene transfer (HGT)

KW - Support vector machine (SVM)

UR - http://www.scopus.com/inward/record.url?scp=35348943456&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35348943456&partnerID=8YFLogxK

M3 - Article

VL - 34

SP - 724

EP - 731

JO - Progress in Biochemistry and Biophysics

JF - Progress in Biochemistry and Biophysics

SN - 1000-3282

IS - 7

ER -