Evolinc

A tool for the identification and evolutionary comparison of long intergenic non-coding RNAs

Andrew D.L. Nelson, Upendra K. Devisetty, Kyle Palos, Asher K. Haug-Baltzell, Eric H Lyons, Mark A Beilstein

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Long intergenic non-coding RNAs (lincRNAs) are an abundant and functionally diverse class of eukaryotic transcripts. Reported lincRNA repertoires in mammals vary, but are commonly in the thousands to tens of thousands of transcripts, covering ~90% of the genome. In addition to elucidating function, there is particular interest in understanding the origin and evolution of lincRNAs. Aside from mammals, lincRNA populations have been sparsely sampled, precluding evolutionary analyses focused on their emergence and persistence. Here we present Evolinc, a two-module pipeline designed to facilitate lincRNA discovery and characterize aspects of lincRNA evolution. The first module (Evolinc-I) is a lincRNA identification workflow that also facilitates downstream differential expression analysis and genome browser visualization of identified lincRNAs. The second module (Evolinc-II) is a genomic and transcriptomic comparative analysis workflow that determines the phylogenetic depth to which a lincRNA locus is conserved within a user-defined group of related species. Here we validate lincRNA catalogs generated with Evolinc-I against previously annotated Arabidopsis and human lincRNA data. Evolinc-I recapitulated earlier findings and uncovered an additional 70 Arabidopsis and 43 human lincRNAs. We demonstrate the usefulness of Evolinc-II by examining the evolutionary histories of a public dataset of 5,361 Arabidopsis lincRNAs. We used Evolinc-II to winnow this dataset to 40 lincRNAs conserved across species in Brassicaceae. Finally, we show how Evolinc-II can be used to recover the evolutionary history of a known lincRNA, the human telomerase RNA (TERC). These latter analyses revealed unexpected duplication events as well as the loss and subsequent acquisition of a novel TERC locus in the lineage leading to mice and rats. The Evolinc pipeline is currently integrated in CyVerse's Discovery Environment and is free for use by researchers.

Original language: English (US)
Article number: 52
Journal: Frontiers in Genetics
Volume: 8
Issue number: MAY
DOI: 10.3389/fgene.2017.00052
State: Published - May 15 2017

Fingerprint

Long Noncoding RNA
Arabidopsis
Workflow
Mammals
Genome
Brassicaceae

Keywords

  • Comparative genomics
  • Comparative transcriptomics
  • Evolution
  • LincRNAs
  • Molecular
  • Pipeline

ASJC Scopus subject areas

  • Molecular Medicine
  • Genetics
  • Genetics (clinical)

Cite this

Evolinc: A tool for the identification and evolutionary comparison of long intergenic non-coding RNAs. / Nelson, Andrew D.L.; Devisetty, Upendra K.; Palos, Kyle; Haug-Baltzell, Asher K.; Lyons, Eric H; Beilstein, Mark A.

In: Frontiers in Genetics, Vol. 8, No. MAY, 52, 15.05.2017.

Nelson, Andrew D.L.; Devisetty, Upendra K.; Palos, Kyle; Haug-Baltzell, Asher K.; Lyons, Eric H; Beilstein, Mark A. / Evolinc: A tool for the identification and evolutionary comparison of long intergenic non-coding RNAs. In: Frontiers in Genetics. 2017; Vol. 8, No. MAY.

@article{71894f7863d24aadab5baeae70a4904f,
title = "Evolinc: A tool for the identification and evolutionary comparison of long intergenic non-coding RNAs",
abstract = "Long intergenic non-coding RNAs (lincRNAs) are an abundant and functionally diverse class of eukaryotic transcripts. Reported lincRNA repertoires in mammals vary, but are commonly in the thousands to tens of thousands of transcripts, covering ~90{\%} of the genome. In addition to elucidating function, there is particular interest in understanding the origin and evolution of lincRNAs. Aside from mammals, lincRNA populations have been sparsely sampled, precluding evolutionary analyses focused on their emergence and persistence. Here we present Evolinc, a two-module pipeline designed to facilitate lincRNA discovery and characterize aspects of lincRNA evolution. The first module (Evolinc-I) is a lincRNA identification workflow that also facilitates downstream differential expression analysis and genome browser visualization of identified lincRNAs. The second module (Evolinc-II) is a genomic and transcriptomic comparative analysis workflow that determines the phylogenetic depth to which a lincRNA locus is conserved within a user-defined group of related species. Here we validate lincRNA catalogs generated with Evolinc-I against previously annotated Arabidopsis and human lincRNA data. Evolinc-I recapitulated earlier findings and uncovered an additional 70 Arabidopsis and 43 human lincRNAs. We demonstrate the usefulness of Evolinc-II by examining the evolutionary histories of a public dataset of 5,361 Arabidopsis lincRNAs. We used Evolinc-II to winnow this dataset to 40 lincRNAs conserved across species in Brassicaceae. Finally, we show how Evolinc-II can be used to recover the evolutionary history of a known lincRNA, the human telomerase RNA (TERC). These latter analyses revealed unexpected duplication events as well as the loss and subsequent acquisition of a novel TERC locus in the lineage leading to mice and rats. The Evolinc pipeline is currently integrated in CyVerse's Discovery Environment and is free for use by researchers.",
keywords = "Comparative genomics, Comparative transcriptomics, Evolution, LincRNAs, Molecular, Pipeline",
author = "Nelson, {Andrew D.L.} and Devisetty, {Upendra K.} and Palos, Kyle and Haug-Baltzell, {Asher K.} and Lyons, {Eric H} and Beilstein, {Mark A}",
year = "2017",
month = "5",
day = "15",
doi = "10.3389/fgene.2017.00052",
language = "English (US)",
volume = "8",
journal = "Frontiers in Genetics",
issn = "1664-8021",
publisher = "Frontiers Media S. A.",
number = "MAY",

}

TY - JOUR

T1 - Evolinc

T2 - A tool for the identification and evolutionary comparison of long intergenic non-coding RNAs

AU - Nelson, Andrew D.L.

AU - Devisetty, Upendra K.

AU - Palos, Kyle

AU - Haug-Baltzell, Asher K.

AU - Lyons, Eric H

AU - Beilstein, Mark A

PY - 2017/5/15

Y1 - 2017/5/15

AB - Long intergenic non-coding RNAs (lincRNAs) are an abundant and functionally diverse class of eukaryotic transcripts. Reported lincRNA repertoires in mammals vary, but are commonly in the thousands to tens of thousands of transcripts, covering ~90% of the genome. In addition to elucidating function, there is particular interest in understanding the origin and evolution of lincRNAs. Aside from mammals, lincRNA populations have been sparsely sampled, precluding evolutionary analyses focused on their emergence and persistence. Here we present Evolinc, a two-module pipeline designed to facilitate lincRNA discovery and characterize aspects of lincRNA evolution. The first module (Evolinc-I) is a lincRNA identification workflow that also facilitates downstream differential expression analysis and genome browser visualization of identified lincRNAs. The second module (Evolinc-II) is a genomic and transcriptomic comparative analysis workflow that determines the phylogenetic depth to which a lincRNA locus is conserved within a user-defined group of related species. Here we validate lincRNA catalogs generated with Evolinc-I against previously annotated Arabidopsis and human lincRNA data. Evolinc-I recapitulated earlier findings and uncovered an additional 70 Arabidopsis and 43 human lincRNAs. We demonstrate the usefulness of Evolinc-II by examining the evolutionary histories of a public dataset of 5,361 Arabidopsis lincRNAs. We used Evolinc-II to winnow this dataset to 40 lincRNAs conserved across species in Brassicaceae. Finally, we show how Evolinc-II can be used to recover the evolutionary history of a known lincRNA, the human telomerase RNA (TERC). These latter analyses revealed unexpected duplication events as well as the loss and subsequent acquisition of a novel TERC locus in the lineage leading to mice and rats. The Evolinc pipeline is currently integrated in CyVerse's Discovery Environment and is free for use by researchers.

KW - Comparative genomics

KW - Comparative transcriptomics

KW - Evolution

KW - LincRNAs

KW - Molecular

KW - Pipeline

UR - http://www.scopus.com/inward/record.url?scp=85019843508&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019843508&partnerID=8YFLogxK

U2 - 10.3389/fgene.2017.00052

DO - 10.3389/fgene.2017.00052

M3 - Article

VL - 8

JO - Frontiers in Genetics

JF - Frontiers in Genetics

SN - 1664-8021

IS - MAY

M1 - 52

ER -