Extracting conflict-free information from multi-labeled trees

Akshay Deepak, David Fernández-Baca, Michelle M Mcmahon

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Background: A multi-labeled tree, or MUL-tree, is a phylogenetic tree where two or more leaves share a label, e.g., a species name. A MUL-tree can imply multiple conflicting phylogenetic relationships for the same set of taxa, but can also contain conflict-free information that is of interest and yet is not obvious.

Results: We define the information content of a MUL-tree T as the set of all conflict-free quartet topologies implied by T, and define the maximally reduced form of T as the smallest tree that can be obtained from T by pruning leaves and contracting edges while retaining the same information content. We show that any two MUL-trees with the same information content exhibit the same reduced form. This introduces an equivalence relation among MUL-trees with potential applications to comparing MUL-trees. We present an efficient algorithm to reduce a MUL-tree to its maximally reduced form and evaluate its performance on empirical datasets in terms of both the quality of the reduced tree and the degree of data reduction achieved.

Conclusions: Our measure of conflict-free information content based on quartets is simple and topologically appealing. In the experiments, the maximally reduced form is often much smaller than the original tree, yet retains most of the taxa. The reduction algorithm is quadratic in the number of leaves and its complexity is unaffected by the multiplicity of leaf labels or the degree of the nodes.
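The definition of information content given in the abstract can be illustrated with a small brute-force sketch. This is not the authors' quadratic reduction algorithm. It assumes one plausible reading of the definition: a quartet topology on four labels is conflict-free when every choice of leaves carrying those labels that yields a resolved quartet yields the same one. The nested-tuple tree encoding and all function names are hypothetical, chosen only for illustration.

```python
from collections import deque
from itertools import combinations, product


def build_graph(nested):
    """Convert a nested-tuple tree into (adjacency dict, {leaf_id: label})."""
    adj, labels = {}, {}
    counter = [0]

    def visit(node):
        nid = counter[0]
        counter[0] += 1
        adj[nid] = []
        if isinstance(node, str):      # a leaf carries a (possibly repeated) label
            labels[nid] = node
        else:                          # an internal node is a tuple of children
            for child in node:
                cid = visit(child)
                adj[nid].append(cid)
                adj[cid].append(nid)
        return nid

    visit(nested)
    return adj, labels


def distances_from(adj, src):
    """Breadth-first search: edge-count distance from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def quartet_topology(dist, labels, leaves):
    """Split induced on four leaves (four-point condition), or None if unresolved."""
    a, b, c, d = leaves
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    sums = [dist[p[0]][p[1]] + dist[q[0]][q[1]] for p, q in pairings]
    best = min(sums)
    if sums.count(best) != 1:          # all three sums equal: star-shaped quartet
        return None
    p, q = pairings[sums.index(best)]
    return frozenset({frozenset({labels[p[0]], labels[p[1]]}),
                      frozenset({labels[q[0]], labels[q[1]]})})


def conflict_free_quartets(nested):
    """Quartet topologies on which all leaf choices for the four labels agree."""
    adj, labels = build_graph(nested)
    dist = {leaf: distances_from(adj, leaf) for leaf in labels}
    by_label = {}
    for leaf, lab in labels.items():
        by_label.setdefault(lab, []).append(leaf)
    result = set()
    for four in combinations(sorted(by_label), 4):
        seen = set()
        for leaves in product(*(by_label[lab] for lab in four)):
            seen.add(quartet_topology(dist, labels, leaves))
        seen.discard(None)             # assumption: unresolved quartets do not conflict
        if len(seen) == 1:
            result.add(next(iter(seen)))
    return result


# Hypothetical MUL-tree in which the label "a" appears on two leaves.
example = (("a", "b"), (("c", "d"), ("a", "e")))
print(len(conflict_free_quartets(example)))
```

In this example, the label sets {a, b, c, e} and {a, b, d, e} each imply two different topologies depending on which copy of "a" is used, so they are excluded, while ab|cd, ae|cd and be|cd remain. The enumeration grows quickly with label multiplicity, which is the cost the paper's algorithm is reported to avoid.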

Original language: English (US)
Article number: 18
Journal: Algorithms for Molecular Biology
Volume: 8
Issue number: 1
DOI: 10.1186/1748-7188-8-18
State: Published - Jul 9 2013

Keywords

  • Evolutionary trees
  • Multi-labeled trees
  • Phylogenetic trees
  • Reduction
  • Singly-labeled trees

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Applied Mathematics
  • Molecular Biology
  • Structural Biology

Cite this

Extracting conflict-free information from multi-labeled trees. / Deepak, Akshay; Fernández-Baca, David; Mcmahon, Michelle M.

In: Algorithms for Molecular Biology, Vol. 8, No. 1, 18, 09.07.2013.

@article{54ae86ee4b0c4f98a5270dbe59414e22,
title = "Extracting conflict-free information from multi-labeled trees",
keywords = "Evolutionary trees, Multi-labeled trees, Phylogenetic trees, Reduction, Singly-labeled trees",
author = "Akshay Deepak and David Fern{\'a}ndez-Baca and Mcmahon, {Michelle M}",
year = "2013",
month = "7",
day = "9",
doi = "10.1186/1748-7188-8-18",
language = "English (US)",
volume = "8",
journal = "Algorithms for Molecular Biology",
issn = "1748-7188",
publisher = "BioMed Central",
number = "1",
}

TY  - JOUR
T1  - Extracting conflict-free information from multi-labeled trees
AU  - Deepak, Akshay
AU  - Fernández-Baca, David
AU  - Mcmahon, Michelle M
PY  - 2013/7/9
Y1  - 2013/7/9
KW  - Evolutionary trees
KW  - Multi-labeled trees
KW  - Phylogenetic trees
KW  - Reduction
KW  - Singly-labeled trees
UR  - http://www.scopus.com/inward/record.url?scp=84880001000&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84880001000&partnerID=8YFLogxK
U2  - 10.1186/1748-7188-8-18
DO  - 10.1186/1748-7188-8-18
M3  - Article
VL  - 8
JO  - Algorithms for Molecular Biology
JF  - Algorithms for Molecular Biology
SN  - 1748-7188
IS  - 1
M1  - 18
ER  -