Latent space inference of internet-scale networks

Qirong Ho, Junming Yin, Eric P. Xing

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

The rise of Internet-scale networks, such as web graphs and social media with hundreds of millions to billions of nodes, presents new scientific opportunities, such as overlapping community detection to discover the structure of the Internet, or to analyze trends in on-line social behavior. However, many existing probabilistic network models are difficult or impossible to deploy at these massive scales. We propose a scalable approach for modeling and inferring latent spaces in Internet-scale networks, with an eye towards overlapping community detection as a key application. By applying a succinct representation of networks as a bag of triangular motifs, developing a parsimonious statistical model, deriving an efficient stochastic variational inference algorithm, and implementing it as a distributed cluster program via the Petuum parameter server system, we demonstrate overlapping community detection on real networks with up to 100 million nodes and 1000 communities on 5 machines in under 40 hours. Compared to other state-of-the-art probabilistic network approaches, our method is several orders of magnitude faster, with competitive or improved accuracy at overlapping community detection.
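As a rough illustration of the "bag of triangular motifs" representation mentioned in the abstract, the sketch below enumerates node triples from an undirected edge list. The abstract does not define a motif, so this assumes a motif is a triple of nodes joined by either three edges (a closed triangle) or two edges (an open wedge around a center node). The function name `triangular_motifs` and the toy edge list are hypothetical, and the sketch is not the authors' implementation: it makes no attempt at the subsampling or distribution that Internet-scale graphs would need.

```python
from itertools import combinations


def triangular_motifs(edges):
    """Return (closed, open_) sets of node triples for an undirected graph.

    Assumption (not stated in the abstract): a motif is a node triple
    connected by 3 edges (closed triangle) or 2 edges (open wedge).
    """
    # Build an adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    closed = set()  # triangles, stored as sorted triples
    open_ = set()   # wedges, stored as (center, a, b) with a < b
    for center, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                # All three edges exist; sorting deduplicates the triangle,
                # which is visited once per corner.
                closed.add(tuple(sorted((center, a, b))))
            else:
                # Only the two edges through the center exist.
                open_.add((center, a, b))
    return closed, open_


# Toy graph: one triangle (1, 2, 3) plus a pendant edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
closed, open_ = triangular_motifs(edges)
print("closed triangles:", sorted(closed))
print("open wedges:", sorted(open_))
```

Under this assumption the graph is summarized by its collection of motifs rather than by individual edges, which is the kind of representation the abstract describes as succinct.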

Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 17
State: Published - Apr 1 2016

Fingerprint

Community Detection
Internet
Overlapping
Web Graph
Computer systems
Servers
Social Behavior
Social Media
Vertex of a graph
Probabilistic Model
Statistical Model
Network Model
Triangular
Server
Modeling
Demonstrate

Keywords

  • Big data
  • Distributed computation
  • Probabilistic network models
  • Stochastic variational inference
  • Triangular modeling

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Latent space inference of internet-scale networks. / Ho, Qirong; Yin, Junming; Xing, Eric P.

In: Journal of Machine Learning Research, Vol. 17, 01.04.2016.

@article{9888243d9873431a81d036f5f5acc3ec,
title = "Latent space inference of internet-scale networks",
abstract = "The rise of Internet-scale networks, such as web graphs and social media with hundreds of millions to billions of nodes, presents new scientific opportunities, such as overlapping community detection to discover the structure of the Internet, or to analyze trends in on-line social behavior. However, many existing probabilistic network models are difficult or impossible to deploy at these massive scales. We propose a scalable approach for modeling and inferring latent spaces in Internet-scale networks, with an eye towards overlapping community detection as a key application. By applying a succinct representation of networks as a bag of triangular motifs, developing a parsimonious statistical model, deriving an efficient stochastic variational inference algorithm, and implementing it as a distributed cluster program via the Petuum parameter server system, we demonstrate overlapping community detection on real networks with up to 100 million nodes and 1000 communities on 5 machines in under 40 hours. Compared to other state-of-the-art probabilistic network approaches, our method is several orders of magnitude faster, with competitive or improved accuracy at overlapping community detection.",
keywords = "Big data, Distributed computation, Probabilistic network models, Stochastic variational inference, Triangular modeling",
author = "Ho, Qirong and Yin, Junming and Xing, {Eric P.}",
year = "2016",
month = "4",
day = "1",
language = "English (US)",
volume = "17",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",

}

TY - JOUR

T1 - Latent space inference of internet-scale networks

AU - Ho, Qirong

AU - Yin, Junming

AU - Xing, Eric P.

PY - 2016/4/1

Y1 - 2016/4/1

N2 - The rise of Internet-scale networks, such as web graphs and social media with hundreds of millions to billions of nodes, presents new scientific opportunities, such as overlapping community detection to discover the structure of the Internet, or to analyze trends in on-line social behavior. However, many existing probabilistic network models are difficult or impossible to deploy at these massive scales. We propose a scalable approach for modeling and inferring latent spaces in Internet-scale networks, with an eye towards overlapping community detection as a key application. By applying a succinct representation of networks as a bag of triangular motifs, developing a parsimonious statistical model, deriving an efficient stochastic variational inference algorithm, and implementing it as a distributed cluster program via the Petuum parameter server system, we demonstrate overlapping community detection on real networks with up to 100 million nodes and 1000 communities on 5 machines in under 40 hours. Compared to other state-of-the-art probabilistic network approaches, our method is several orders of magnitude faster, with competitive or improved accuracy at overlapping community detection.

AB - The rise of Internet-scale networks, such as web graphs and social media with hundreds of millions to billions of nodes, presents new scientific opportunities, such as overlapping community detection to discover the structure of the Internet, or to analyze trends in on-line social behavior. However, many existing probabilistic network models are difficult or impossible to deploy at these massive scales. We propose a scalable approach for modeling and inferring latent spaces in Internet-scale networks, with an eye towards overlapping community detection as a key application. By applying a succinct representation of networks as a bag of triangular motifs, developing a parsimonious statistical model, deriving an efficient stochastic variational inference algorithm, and implementing it as a distributed cluster program via the Petuum parameter server system, we demonstrate overlapping community detection on real networks with up to 100 million nodes and 1000 communities on 5 machines in under 40 hours. Compared to other state-of-the-art probabilistic network approaches, our method is several orders of magnitude faster, with competitive or improved accuracy at overlapping community detection.

KW - Big data

KW - Distributed computation

KW - Probabilistic network models

KW - Stochastic variational inference

KW - Triangular modeling

UR - http://www.scopus.com/inward/record.url?scp=84979872981&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979872981&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84979872981

VL - 17

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -