A new method for EST clustering

Li da Zhang, De jun Yuan, Jianwei Zhang, Shi Ping Wang, Qi Fa Zhang

Research output: Contribution to journal (Article)

8 Citations (Scopus)

Abstract

We developed an EST (expressed sequence tag) clustering method, ESTClustering, to generate high-quality unique expressed sequences from large-scale EST sequencing data. During clustering, the method compares consensus sequences using megablast and assembles each cluster with phrap. This strategy can efficiently identify gene families and alternative splicing forms of expressed sequences, and it can also reduce the adverse effects of sequence errors. Compared with the UniGene clustering method of the National Center for Biotechnology Information, ESTClustering tends to yield more expressed gene forms. Analysis of 112 256 Arabidopsis ESTs with ESTClustering produced 23 581 EST clusters. Of these clusters, 13 597 have corresponding genome coding sequences, a number close to the number of genes predicted with Arabidopsis ESTs. Using the same method, a total of 147 191 rice ESTs were clustered into 33 896 groups.
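The abstract gives only the outline of the procedure, so the following is a minimal sketch of the general idea of consensus-based clustering, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`kmers`, `similarity`, `cluster_ests`), the shared k-mer score that stands in for a megablast comparison, the "keep the longest member" rule that stands in for a phrap assembly, and the threshold value.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 8) -> float:
    """Fraction of the smaller k-mer set that is shared by both sequences.

    Toy stand-in for a megablast comparison (an assumption, not the
    scoring used in the paper).
    """
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def cluster_ests(ests: Dict[str, str], threshold: float = 0.5) -> List[dict]:
    """Greedily assign each EST to the cluster whose consensus it matches best."""
    clusters: List[dict] = []
    for name, seq in ests.items():
        best, best_score = None, 0.0
        for cluster in clusters:
            # Compare against the cluster's consensus, not every member.
            score = similarity(seq, cluster["consensus"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["members"].append(name)
            # Toy stand-in for re-assembly with phrap: keep the longest
            # member as the cluster representative.
            if len(seq) > len(best["consensus"]):
                best["consensus"] = seq
        else:
            clusters.append({"members": [name], "consensus": seq})
    return clusters


if __name__ == "__main__":
    # Hypothetical sequences, invented purely for this example.
    gene_a = "ATGGCGTACGTTAGCCTAGGATCCGATTACGCTAGCTAGGCTTAACGGATCCGTA"
    gene_b = "TTTTTTTTCCCCCCCCAAAAAAAAGGGGGGGGTTTTCCCCAAAAGGGG"
    example = {"est1": gene_a[:40], "est2": gene_a[10:50], "est3": gene_b}
    for cluster in cluster_ests(example):
        print(cluster["members"])
```

Comparing each new sequence against one consensus per cluster, rather than against every member, keeps the number of comparisons proportional to the number of clusters. That is the property the sketch is meant to illustrate; a real pipeline would replace both stand-ins with actual alignment and assembly tools.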

Original language: English (US)
Pages (from-to): 147-153
Number of pages: 7
Journal: Acta Genetica Sinica
Volume: 30
Issue number: 2
State: Published - Feb 1 2003
Externally published: Yes

Keywords

  • Consensus sequence
  • EST clustering
  • Non-redundant cDNA library

ASJC Scopus subject areas

  • Genetics

Cite this

Zhang, L. D., Yuan, D. J., Zhang, J., Wang, S. P., & Zhang, Q. F. (2003). A new method for EST clustering. Acta Genetica Sinica, 30(2), 147-153.

@article{a10e25a1e9264afa913c2f2aeb248957,
title = "A new method for EST clustering",
keywords = "Consensus sequence, EST clustering, Non-redundant cDNA library",
author = "Zhang, {Li da} and Yuan, {De jun} and Zhang, Jianwei and Wang, {Shi Ping} and Zhang, {Qi Fa}",
year = "2003",
month = "2",
day = "1",
language = "English (US)",
volume = "30",
pages = "147--153",
journal = "Journal of Genetics and Genomics",
issn = "1673-8527",
publisher = "Institute of Genetics and Developmental Biology",
number = "2",

}

TY - JOUR

T1 - A new method for EST clustering

AU - Zhang, Li da

AU - Yuan, De jun

AU - Zhang, Jianwei

AU - Wang, Shi Ping

AU - Zhang, Qi Fa

PY - 2003/2/1

Y1 - 2003/2/1

KW - Consensus sequence

KW - EST clustering

KW - Non-redundant cDNA library

UR - http://www.scopus.com/inward/record.url?scp=0037331774&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037331774&partnerID=8YFLogxK

M3 - Article

VL - 30

SP - 147

EP - 153

JO - Journal of Genetics and Genomics

JF - Journal of Genetics and Genomics

SN - 1673-8527

IS - 2

ER -