EvoMiner: frequent subtree mining in phylogenetic databases

Akshay Deepak, David Fernández-Baca, Srikanta Tirthapura, Michael Sanderson, Michelle M Mcmahon

Research output: Contribution to journal, Article

9 Citations (Scopus)

Abstract

The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to interpret the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like levelwise method, which uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for downward closure, and a lowest-common-ancestor-based support counting step that requires neither costly subtree operations nor database traversal. Our algorithm achieves speedups of up to 100 times or more over Phylominer, the current state-of-the-art algorithm for mining phylogenetic trees. EvoMiner can also work in depth-first enumeration mode to use less memory at the expense of speed. We demonstrate the utility of FST mining as a way to extract meaningful phylogenetic information from collections of trees when compared to maximum agreement subtrees and majority-rule trees—two commonly used approaches in phylogenetic analysis for extracting consensus information from a collection of trees over a common leaf set.
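The lowest-common-ancestor idea behind the support counting step can be illustrated on the smallest informative patterns, rooted triples (three-leaf subtrees). The sketch below is not EvoMiner's actual procedure, which the abstract does not spell out. It assumes trees are given as nested Python tuples with unique string leaf labels, finds each LCA by walking parent pointers rather than with a constant-time LCA structure, and skips candidate generation and fingerprinting entirely. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def index_tree(tree):
    """Number the nodes of a nested-tuple tree; return parent, depth, leaf->id."""
    parent, depth, leaf_id = [], [], {}

    def visit(node, par, d):
        nid = len(parent)
        parent.append(par)
        depth.append(d)
        if isinstance(node, str):      # leaves are strings (assumption)
            leaf_id[node] = nid
        else:                          # internal nodes are tuples of children
            for child in node:
                visit(child, nid, d + 1)

    visit(tree, -1, 0)
    return parent, depth, leaf_id


def lca_depth(parent, depth, u, v):
    """Depth of the LCA of nodes u and v, found by climbing parent pointers."""
    while u != v:
        if depth[u] >= depth[v]:
            u = parent[u]
        else:
            v = parent[v]
    return depth[u]


def triple_topology(parent, depth, leaf_id, a, b, c):
    """Return the two leaves that group together, or None if unresolved."""
    x, y, z = leaf_id[a], leaf_id[b], leaf_id[c]
    dab = lca_depth(parent, depth, x, y)
    dac = lca_depth(parent, depth, x, z)
    dbc = lca_depth(parent, depth, y, z)
    # The pair whose LCA is strictly deepest forms the cherry.
    if dab > dac:
        return (a, b)
    if dac > dab:
        return (a, c)
    if dbc > dab:
        return (b, c)
    return None


def frequent_triples(trees, min_support):
    """Count each (leaf triple, topology) pattern; keep those seen often enough."""
    counts = Counter()
    for tree in trees:
        parent, depth, leaf_id = index_tree(tree)
        for a, b, c in combinations(sorted(leaf_id), 3):
            cherry = triple_topology(parent, depth, leaf_id, a, b, c)
            counts[((a, b, c), cherry)] += 1
    return {pat: n for pat, n in counts.items() if n >= min_support}


# Three small example trees over the leaf set {A, B, C, D} (made-up data).
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
result = frequent_triples(trees, min_support=3)
# Only the triple {A, B, D} with A and B grouped together occurs in all three trees.
```

Because a triple's topology depends only on the three pairwise LCA depths, support for such a pattern can be checked without comparing subtrees explicitly. A levelwise miner would extend frequent patterns of this kind to larger leaf sets.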

Original language: English (US)
Pages (from-to): 559-590
Number of pages: 32
Journal: Knowledge and Information Systems
Volume: 41
Issue number: 3
DOI: 10.1007/s10115-013-0676-0
State: Published - Nov 7 2014

Fingerprint

Data storage equipment
Phylogeny

Keywords

  • Data mining
  • Evolutionary bioinformatics
  • Maximum agreement subtree
  • Pattern discovery
  • Phylogenetics

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Information Systems
  • Hardware and Architecture
  • Human-Computer Interaction

Cite this

EvoMiner: frequent subtree mining in phylogenetic databases. / Deepak, Akshay; Fernández-Baca, David; Tirthapura, Srikanta; Sanderson, Michael; Mcmahon, Michelle M.

In: Knowledge and Information Systems, Vol. 41, No. 3, 07.11.2014, p. 559-590.

@article{0a7c589f433b4e4286204b91907a2581,
title = "EvoMiner: frequent subtree mining in phylogenetic databases",
keywords = "Data mining, Evolutionary bioinformatics, Maximum agreement subtree, Pattern discovery, Phylogenetics",
author = "Akshay Deepak and David Fern{\'a}ndez-Baca and Srikanta Tirthapura and Michael Sanderson and Mcmahon, {Michelle M}",
year = "2014",
month = "11",
day = "7",
doi = "10.1007/s10115-013-0676-0",
language = "English (US)",
volume = "41",
pages = "559--590",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer London",
number = "3",

}

TY - JOUR

T1 - EvoMiner

T2 - frequent subtree mining in phylogenetic databases

AU - Deepak, Akshay

AU - Fernández-Baca, David

AU - Tirthapura, Srikanta

AU - Sanderson, Michael

AU - Mcmahon, Michelle M

PY - 2014/11/7

Y1 - 2014/11/7

KW - Data mining

KW - Evolutionary bioinformatics

KW - Maximum agreement subtree

KW - Pattern discovery

KW - Phylogenetics

UR - http://www.scopus.com/inward/record.url?scp=84911978030&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84911978030&partnerID=8YFLogxK

U2 - 10.1007/s10115-013-0676-0

DO - 10.1007/s10115-013-0676-0

M3 - Article

VL - 41

SP - 559

EP - 590

JO - Knowledge and Information Systems

JF - Knowledge and Information Systems

SN - 0219-1377

IS - 3

ER -