Quantifying circular RNA expression from RNA-seq data using model-based framework

Musheng Li, Xueying Xie, Jing Zhou, Mengying Sheng, Xiaofeng Yin, Eun A. Ko, Tong Zhou, Wanjun Gu

Research output: Contribution to journal (Article)

13 Citations (Scopus)

Abstract

Motivation: Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs is largely unknown, their cell type- and tissue-specific expression implies crucial functions in many biological processes. Hence, quantifying circRNA expression from high-throughput RNA-seq data is becoming increasingly important. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification.

Results: Here, we proposed a novel strategy that transforms circular transcripts to pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate the expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, amount of expression and the ratio of circular to linear transcripts, affected the quantification performance for circular transcripts. In comparison to count-based tools, the new computational framework had superior performance in estimating the amount of circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. Moreover, considering circular transcripts in expression quantification from rRNA-depleted RNA-seq data substantially increased the accuracy of linear transcript expression estimates. Our proposed strategy was implemented in a program named Sailfish-cir.

Original language: English (US)
Pages (from-to): 2131-2139
Number of pages: 9
Journal: Bioinformatics
Volume: 33
Issue number: 14
DOI: https://doi.org/10.1093/bioinformatics/btx129
State: Published - Jul 15 2017
Externally published: Yes

Fingerprint

RNA
Data Model
Model-based
Ribosomal RNA
Quantification
Biological Phenomena
Untranslated RNA
Tissue
Framework
circular RNA
Cell Line
Genes
Cell
Cells
Throughput
Estimate
High Throughput
Count
Quantify
Transform

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Li, M., Xie, X., Zhou, J., Sheng, M., Yin, X., Ko, E. A., ... Gu, W. (2017). Quantifying circular RNA expression from RNA-seq data using model-based framework. Bioinformatics, 33(14), 2131-2139. https://doi.org/10.1093/bioinformatics/btx129

Quantifying circular RNA expression from RNA-seq data using model-based framework. / Li, Musheng; Xie, Xueying; Zhou, Jing; Sheng, Mengying; Yin, Xiaofeng; Ko, Eun A.; Zhou, Tong; Gu, Wanjun.

In: Bioinformatics, Vol. 33, No. 14, 15.07.2017, p. 2131-2139.

Li, M, Xie, X, Zhou, J, Sheng, M, Yin, X, Ko, EA, Zhou, T & Gu, W 2017, 'Quantifying circular RNA expression from RNA-seq data using model-based framework', Bioinformatics, vol. 33, no. 14, pp. 2131-2139. https://doi.org/10.1093/bioinformatics/btx129
Li, Musheng ; Xie, Xueying ; Zhou, Jing ; Sheng, Mengying ; Yin, Xiaofeng ; Ko, Eun A. ; Zhou, Tong ; Gu, Wanjun. / Quantifying circular RNA expression from RNA-seq data using model-based framework. In: Bioinformatics. 2017 ; Vol. 33, No. 14. pp. 2131-2139.
@article{e0b216596e274d56a02f1b2149a969c9,
title = "Quantifying circular RNA expression from RNA-seq data using model-based framework",
author = "Musheng Li and Xueying Xie and Jing Zhou and Mengying Sheng and Xiaofeng Yin and Ko, {Eun A.} and Tong Zhou and Wanjun Gu",
year = "2017",
month = "7",
day = "15",
doi = "10.1093/bioinformatics/btx129",
language = "English (US)",
volume = "33",
pages = "2131--2139",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "14",
}

TY - JOUR

T1 - Quantifying circular RNA expression from RNA-seq data using model-based framework

AU - Li, Musheng

AU - Xie, Xueying

AU - Zhou, Jing

AU - Sheng, Mengying

AU - Yin, Xiaofeng

AU - Ko, Eun A.

AU - Zhou, Tong

AU - Gu, Wanjun

PY - 2017/7/15

Y1 - 2017/7/15

UR - http://www.scopus.com/inward/record.url?scp=85020877053&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020877053&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btx129

DO - 10.1093/bioinformatics/btx129

M3 - Article

C2 - 28334396

AN - SCOPUS:85020877053

VL - 33

SP - 2131

EP - 2139

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 14

ER -