The small-world dynamics of tree networks and data mining in phyloinformatics

William H. Piel, Michael Sanderson, Michael J. Donoghue

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Motivation: A noble and ultimate objective of phyloinformatic research is to assemble, synthesize, and explore the evolutionary history of life on earth. Data mining methods for performing these tasks are not yet well developed, but one avenue of research suggests that network connectivity dynamics will play an important role in future methods. Analysis of disordered networks, such as small-world networks, has applications as diverse as disease propagation, collaborative networks, and power grids. Here we apply similar analyses to networks of phylogenetic trees in order to understand how synthetic information can emerge from a database of phylogenies.

Results: Analyses of tree network connectivity in TreeBASE show that a collection of phylogenetic trees behaves as a small-world network: while on the one hand the trees are clustered, like a non-random lattice, on the other hand they have short characteristic path lengths, like a random graph. Tree connectivities follow a dual-scale power-law distribution (first power-law exponent ≈1.87; second ≈4.82). This unusual pattern is due, in part, to the presence of alternative tree topologies that enter the database with each published study. As expected, small collections of trees decrease connectivity as new trees are added, while large collections of trees increase connectivity. However, the inflection point is surprisingly low: after about 600 trees the network suddenly jumps to a higher level of coherence. More stringent definitions of 'neighbour' greatly delay the threshold whence a database achieves sufficient maturity for a coherent network to emerge. However, more stringent definitions of 'neighbour' would also likely show improved focus in data mining.
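The two quantities the abstract contrasts, clustering and characteristic path length, can be illustrated with a minimal sketch. It assumes, purely for illustration, that two trees count as neighbours when they share at least `min_shared` taxa; the abstract does not spell out the neighbour rule, so this criterion, the function names, and the toy taxon sets below are hypothetical and are not the authors' method or TreeBASE data.

```python
from collections import deque
from itertools import combinations


def build_tree_network(trees, min_shared=1):
    """Link two trees when they share at least `min_shared` taxa (assumed rule)."""
    graph = {name: set() for name in trees}
    for a, b in combinations(trees, 2):
        if len(trees[a] & trees[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph):
    """Mean over all nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += links / (k * (k - 1) / 2)
    return total / len(graph)


def characteristic_path_length(graph):
    """Mean shortest-path length over connected pairs, via one BFS per node."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than the source
    return total / pairs if pairs else float("inf")


# Toy, made-up taxon sets keyed by hypothetical tree IDs.
toy_trees = {
    "T1": {"A", "B", "C"},
    "T2": {"B", "C", "D"},
    "T3": {"C", "D", "E"},
    "T4": {"E", "F"},
    "T5": {"F", "G"},
}

loose = build_tree_network(toy_trees, min_shared=1)
strict = build_tree_network(toy_trees, min_shared=2)

print("loose :", clustering_coefficient(loose), characteristic_path_length(loose))
print("strict:", clustering_coefficient(strict), characteristic_path_length(strict))
```

Raising `min_shared` in this toy example disconnects T4 and T5 from the rest, which mirrors in miniature the abstract's point that a more stringent definition of 'neighbour' delays the emergence of a coherent network.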

Original language: English (US)
Pages (from-to): 1162-1168
Number of pages: 7
Journal: Bioinformatics
Volume: 19
Issue number: 9
DOIs: 10.1093/bioinformatics/btg131
State: Published - Jun 12 2003
Externally published: Yes

Fingerprint

Tree Networks
Data Mining
Small World
Small-world networks
Data mining
Connectivity
Databases
Network Connectivity
Small-world Network
Phylogenetic Tree
Earth (planet)
Topology
Point of inflection
Phylogeny
Research
Power-law Distribution
Path Length
Random Graphs
History
Power Law

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

The small-world dynamics of tree networks and data mining in phyloinformatics. / Piel, William H.; Sanderson, Michael; Donoghue, Michael J.

In: Bioinformatics, Vol. 19, No. 9, 12.06.2003, p. 1162-1168.

Piel, William H. ; Sanderson, Michael ; Donoghue, Michael J. / The small-world dynamics of tree networks and data mining in phyloinformatics. In: Bioinformatics. 2003 ; Vol. 19, No. 9. pp. 1162-1168.
@article{37fc39aec6514ce2a86ef3ce13e11bc4,
title = "The small-world dynamics of tree networks and data mining in phyloinformatics",
author = "Piel, {William H.} and Sanderson, Michael and Donoghue, {Michael J.}",
year = "2003",
month = "6",
day = "12",
doi = "10.1093/bioinformatics/btg131",
language = "English (US)",
volume = "19",
pages = "1162--1168",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "9",

}

TY - JOUR

T1 - The small-world dynamics of tree networks and data mining in phyloinformatics

AU - Piel, William H.

AU - Sanderson, Michael

AU - Donoghue, Michael J.

PY - 2003/6/12

Y1 - 2003/6/12

UR - http://www.scopus.com/inward/record.url?scp=0038729534&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0038729534&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btg131

DO - 10.1093/bioinformatics/btg131

M3 - Article

C2 - 12801879

AN - SCOPUS:0038729534

VL - 19

SP - 1162

EP - 1168

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 9

ER -