Gene families as soft cliques with backbones

Amborella contrasted with other flowering plants

Chunfang Zheng, Alexey Kononenko, Jim Leebens-Mack, Eric H Lyons, David Sankoff

Research output: Contribution to journal › Article

Abstract

Background: Chaining is a major problem in constructing gene families. Results: We define a new kind of cluster on graphs with strong and weak edges: soft cliques with backbones (SCWiB). This differs from other definitions in how it controls the "chaining effect", by ensuring clusters satisfy a tolerant edge density criterion that takes into account cluster size. We implement algorithms for decomposing a graph of similarities into SCWiBs. We compare examples of output from SCWiB and the Markov Cluster Algorithm (MCL), and also compare some curated Arabidopsis thaliana gene families with the results of automatic clustering. We apply our method to 44 published angiosperm genomes with annotation, and discover that Amborella trichopoda is distinct from all the others in having substantially and systematically smaller proportions of moderate- and large-size gene families. Conclusions: We offer several possible evolutionary explanations for this result.
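The "chaining effect" mentioned in the abstract can be illustrated with a small sketch. It is not the paper's SCWiB definition or algorithm, which the abstract does not spell out. It only contrasts single-linkage grouping (connected components) with two generic density checks: plain edge density, and the standard s-plex condition that every member is adjacent to at least |S| - s members of the set S (the record lists "S-plex" as a keyword). All function names and the toy gene labels are hypothetical.

```python
def connected_components(nodes, edges):
    """Single-linkage grouping: any path of similarities joins two genes."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def edge_density(cluster, edges):
    """Fraction of possible within-cluster pairs that are actually linked."""
    cluster = set(cluster)
    possible = len(cluster) * (len(cluster) - 1) // 2
    present = sum(1 for u, v in edges if u in cluster and v in cluster)
    return present / possible if possible else 1.0


def is_s_plex(cluster, edges, s):
    """True if every member is adjacent to at least |cluster| - s members."""
    cluster = set(cluster)
    degree = {n: 0 for n in cluster}
    for u, v in edges:
        if u in cluster and v in cluster:
            degree[u] += 1
            degree[v] += 1
    return all(d >= len(cluster) - s for d in degree.values())


# A chain g1-g2-g3-g4-g5: each gene resembles only its neighbours.
chain_nodes = ["g1", "g2", "g3", "g4", "g5"]
chain_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]

# A near-clique on four genes: all pairs linked except (h3, h4).
dense_nodes = ["h1", "h2", "h3", "h4"]
dense_edges = [("h1", "h2"), ("h1", "h3"), ("h1", "h4"),
               ("h2", "h3"), ("h2", "h4")]

if __name__ == "__main__":
    print(connected_components(chain_nodes, chain_edges))
    print(edge_density(chain_nodes, chain_edges))
    print(is_s_plex(chain_nodes, chain_edges, s=2))
    print(is_s_plex(dense_nodes, dense_edges, s=2))
```

Single linkage places all five chain genes in one family even though g1 and g5 share no direct similarity, while either density check rejects the chain and accepts the near-clique. A criterion that tolerates a few missing edges, scaled to cluster size, is the general idea the abstract attributes to SCWiB.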

Original language: English (US)
Article number: S8
Journal: BMC Genomics
Volume: 15
Issue number: 6
DOI: 10.1186/1471-2164-15-S6-S8
State: Published - Oct 17 2014

Fingerprint

Genes
Angiosperms
Arabidopsis
Cluster Analysis
Genome

Keywords

  • Amborella trichopoda
  • Angiosperms
  • Clustering
  • Gene families
  • S-plex

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Gene families as soft cliques with backbones: Amborella contrasted with other flowering plants. / Zheng, Chunfang; Kononenko, Alexey; Leebens-Mack, Jim; Lyons, Eric H; Sankoff, David.

In: BMC Genomics, Vol. 15, No. 6, S8, 17.10.2014.

Zheng, Chunfang; Kononenko, Alexey; Leebens-Mack, Jim; Lyons, Eric H; Sankoff, David. / Gene families as soft cliques with backbones: Amborella contrasted with other flowering plants. In: BMC Genomics. 2014; Vol. 15, No. 6.
@article{aede8e31ac84442bba356217cb2ee63f,
title = "Gene families as soft cliques with backbones: Amborella contrasted with other flowering plants",
abstract = "Background: Chaining is a major problem in constructing gene families. Results: We define a new kind of cluster on graphs with strong and weak edges: soft cliques with backbones (SCWiB). This differs from other definitions in how it controls the {"}chaining effect{"}, by ensuring clusters satisfy a tolerant edge density criterion that takes into account cluster size. We implement algorithms for decomposing a graph of similarities into SCWiBs. We compare examples of output from SCWiB and the Markov Cluster Algorithm (MCL), and also compare some curated Arabidopsis thaliana gene families with the results of automatic clustering. We apply our method to 44 published angiosperm genomes with annotation, and discover that Amborella trichopoda is distinct from all the others in having substantially and systematically smaller proportions of moderate- and large-size gene families. Conclusions: We offer several possible evolutionary explanations for this result.",
keywords = "Amborella trichopoda, Angiosperms, Clustering, Gene families, S-plex",
author = "Chunfang Zheng and Alexey Kononenko and Jim Leebens-Mack and Lyons, {Eric H} and David Sankoff",
year = "2014",
month = "10",
day = "17",
doi = "10.1186/1471-2164-15-S6-S8",
language = "English (US)",
volume = "15",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "6"
}

TY - JOUR
T1 - Gene families as soft cliques with backbones
T2 - Amborella contrasted with other flowering plants
AU - Zheng, Chunfang
AU - Kononenko, Alexey
AU - Leebens-Mack, Jim
AU - Lyons, Eric H
AU - Sankoff, David
PY - 2014/10/17
Y1 - 2014/10/17
AB - Background: Chaining is a major problem in constructing gene families. Results: We define a new kind of cluster on graphs with strong and weak edges: soft cliques with backbones (SCWiB). This differs from other definitions in how it controls the "chaining effect", by ensuring clusters satisfy a tolerant edge density criterion that takes into account cluster size. We implement algorithms for decomposing a graph of similarities into SCWiBs. We compare examples of output from SCWiB and the Markov Cluster Algorithm (MCL), and also compare some curated Arabidopsis thaliana gene families with the results of automatic clustering. We apply our method to 44 published angiosperm genomes with annotation, and discover that Amborella trichopoda is distinct from all the others in having substantially and systematically smaller proportions of moderate- and large-size gene families. Conclusions: We offer several possible evolutionary explanations for this result.
KW - Amborella trichopoda
KW - Angiosperms
KW - Clustering
KW - Gene families
KW - S-plex
UR - http://www.scopus.com/inward/record.url?scp=84971229222&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84971229222&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-15-S6-S8
DO - 10.1186/1471-2164-15-S6-S8
M3 - Article
VL - 15
JO - BMC Genomics
JF - BMC Genomics
SN - 1471-2164
IS - 6
M1 - S8
ER -