The recent de Novo origin of protein C-termini

Matthew E. Andreatta, Joshua A. Levine, Scott G. Foy, Lynette D. Guzman, Luke J. Kosinski, Matthew Hj Cordes, Joanna Masel

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure.

Original language: English (US)
Pages (from-to): 1686-1701
Number of pages: 16
Journal: Genome Biology and Evolution
Volume: 7
Issue number: 6
DOIs: https://doi.org/10.1093/gbe/evv098
State: Published - 2015

Keywords

  • Gene birth
  • Origin of novelty
  • Protein structure
  • Stop codon readthrough

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Medicine(all)

Cite this

Andreatta, M. E., Levine, J. A., Foy, S. G., Guzman, L. D., Kosinski, L. J., Cordes, M. H., & Masel, J. (2015). The recent de Novo origin of protein C-termini. Genome Biology and Evolution, 7(6), 1686-1701. https://doi.org/10.1093/gbe/evv098

The recent de Novo origin of protein C-termini. / Andreatta, Matthew E.; Levine, Joshua A.; Foy, Scott G.; Guzman, Lynette D.; Kosinski, Luke J.; Cordes, Matthew Hj; Masel, Joanna.

In: Genome Biology and Evolution, Vol. 7, No. 6, 2015, p. 1686-1701.

Andreatta, ME, Levine, JA, Foy, SG, Guzman, LD, Kosinski, LJ, Cordes, MH & Masel, J 2015, 'The recent de Novo origin of protein C-termini', Genome Biology and Evolution, vol. 7, no. 6, pp. 1686-1701. https://doi.org/10.1093/gbe/evv098
Andreatta ME, Levine JA, Foy SG, Guzman LD, Kosinski LJ, Cordes MH et al. The recent de Novo origin of protein C-termini. Genome Biology and Evolution. 2015;7(6):1686-1701. https://doi.org/10.1093/gbe/evv098
Andreatta, Matthew E. ; Levine, Joshua A. ; Foy, Scott G. ; Guzman, Lynette D. ; Kosinski, Luke J. ; Cordes, Matthew Hj ; Masel, Joanna. / The recent de Novo origin of protein C-termini. In: Genome Biology and Evolution. 2015 ; Vol. 7, No. 6. pp. 1686-1701.
@article{9bdbc77bf1a24d0590359b6f0030cb8f,
title = "The recent de Novo origin of protein C-termini",
abstract = "Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure.",
keywords = "Gene birth, Origin of novelty, Protein structure, Stop codon readthrough",
author = "Andreatta, {Matthew E.} and Levine, {Joshua A.} and Foy, {Scott G.} and Guzman, {Lynette D.} and Kosinski, {Luke J.} and Cordes, {Matthew Hj} and Joanna Masel",
year = "2015",
doi = "10.1093/gbe/evv098",
language = "English (US)",
volume = "7",
pages = "1686--1701",
journal = "Genome Biology and Evolution",
issn = "1759-6653",
publisher = "Oxford University Press",
number = "6",

}

TY - JOUR

T1 - The recent de Novo origin of protein C-termini

AU - Andreatta, Matthew E.

AU - Levine, Joshua A.

AU - Foy, Scott G.

AU - Guzman, Lynette D.

AU - Kosinski, Luke J.

AU - Cordes, Matthew Hj

AU - Masel, Joanna

PY - 2015

Y1 - 2015

N2 - Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure.

AB - Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure.

KW - Gene birth

KW - Origin of novelty

KW - Protein structure

KW - Stop codon readthrough

UR - http://www.scopus.com/inward/record.url?scp=84979854784&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979854784&partnerID=8YFLogxK

U2 - 10.1093/gbe/evv098

DO - 10.1093/gbe/evv098

M3 - Article

C2 - 26002864

AN - SCOPUS:84979854784

VL - 7

SP - 1686

EP - 1701

JO - Genome Biology and Evolution

JF - Genome Biology and Evolution

SN - 1759-6653

IS - 6

ER -