HiWalk: Learning node embeddings from heterogeneous networks

Jie Bai, Linjing Li, Dajun Zeng

Research output: Contribution to journal › Article

Abstract

Heterogeneous networks, such as bibliographical networks and online business networks, are ubiquitous in everyday life. Nevertheless, analyzing them for high-level semantic understanding still poses a great challenge for modern information systems. In this paper, we propose HiWalk to learn distributed vector representations of the nodes in heterogeneous networks. HiWalk is inspired by the state-of-the-art representation learning algorithms, based on word embedding learning models, that are employed for both homogeneous and heterogeneous networks. Unlike existing methods in the literature, HiWalk aims to learn vector representations of a targeted set of nodes by leveraging the other nodes as “background knowledge”, which maximizes the structural correlations of contiguous nodes. HiWalk decomposes the adjacent probabilities of the nodes and adopts a hierarchical random walk strategy, which makes it more effective, efficient and concentrated when applied to practical large-scale heterogeneous networks. HiWalk can be widely applied in heterogeneous network environments to analyze targeted types of nodes. We further validate the effectiveness of the proposed HiWalk through multiple tasks conducted on two real-world datasets.
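The idea of a hierarchical random walk over typed nodes can be illustrated with a toy sketch. The abstract does not spell out the procedure, so everything below is an assumption made for illustration: the two-level step (first pick a neighbor type, then pick a neighbor of that type, both uniformly), the toy graph, and the names `build_adjacency`, `hierarchical_step`, `random_walk` and `target_type` are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict

# Toy heterogeneous graph (hypothetical data): authors (a*), papers (p*), venues (v*).
NODE_TYPE = {"a1": "author", "a2": "author", "a3": "author",
             "p1": "paper", "p2": "paper", "v1": "venue"}
EDGES = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"),
         ("p1", "v1"), ("p2", "v1")]


def build_adjacency(edges, node_type):
    """Group each node's neighbors by neighbor type: adj[node][type] -> list."""
    adj = defaultdict(lambda: defaultdict(list))
    for u, v in edges:
        adj[u][node_type[v]].append(v)
        adj[v][node_type[u]].append(u)
    return adj


def hierarchical_step(adj, node, rng):
    """Two-level transition (an assumed decomposition, not the paper's):
    P(next | node) = P(type | node) * P(next | type, node), both uniform here."""
    types = sorted(adj[node])            # neighbor types available at this node
    chosen_type = rng.choice(types)      # level 1: pick a neighbor type
    return rng.choice(adj[node][chosen_type])  # level 2: pick a neighbor of that type


def random_walk(adj, node_type, start, length, target_type, rng):
    """Walk `length` steps from `start` and keep only nodes of `target_type`,
    so the other node types serve purely as connecting context."""
    walk = [start]
    node = start
    for _ in range(length):
        node = hierarchical_step(adj, node, rng)
        walk.append(node)
    return [n for n in walk if node_type[n] == target_type]


rng = random.Random(0)
adj = build_adjacency(EDGES, NODE_TYPE)
walks = [random_walk(adj, NODE_TYPE, start, 10, "author", rng)
         for start in ("a1", "a2", "a3")]
```

Under these assumptions, each entry of `walks` is a sequence of author nodes only. Such sequences could then be passed as "sentences" to a word embedding model such as skip-gram to obtain vectors for the targeted node type.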

Original language: English (US)
Pages (from-to): 82-91
Number of pages: 10
Journal: Information Systems
Volume: 81
DOI: 10.1016/j.is.2018.11.008
State: Published - Mar 1 2019

Fingerprint

Heterogeneous networks
Learning algorithms
Information systems
Semantics
Industry

Keywords

  • Behavioral analysis
  • Heterogeneous network
  • Network analysis
  • Random walk
  • Representation learning

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this

HiWalk: Learning node embeddings from heterogeneous networks. / Bai, Jie; Li, Linjing; Zeng, Dajun.

In: Information Systems, Vol. 81, 01.03.2019, p. 82-91.


@article{defa3c32d6084058a4328975164b5ec6,
title = "{HiWalk}: Learning node embeddings from heterogeneous networks",
abstract = "Heterogeneous networks, such as bibliographical networks and online business networks, are ubiquitous in everyday life. Nevertheless, analyzing them for high-level semantic understanding still poses a great challenge for modern information systems. In this paper, we propose HiWalk to learn distributed vector representations of the nodes in heterogeneous networks. HiWalk is inspired by the state-of-the-art representation learning algorithms employed in the context of both homogeneous networks and heterogeneous networks, based on word embedding learning models. Different from existing methods in the literature, the purpose of HiWalk is to learn vector representations of the targeted set of nodes by leveraging the other nodes as “background knowledge” which maximizes the structural correlations of contiguous nodes. HiWalk decomposes the adjacent probabilities of the nodes and adopts a hierarchical random walk strategy, which makes it more effective, efficient and concentrated when applied to practical large-scale heterogeneous networks. HiWalk can be widely applied in heterogeneous networks environments to analyze targeted types of nodes. We further validate the effectiveness of the proposed HiWalk through multiple tasks conducted on two real-world datasets.",
keywords = "Behavioral analysis, Heterogeneous network, Network analysis, Random walk, Representation learning",
author = "Jie Bai and Linjing Li and Dajun Zeng",
year = "2019",
month = "3",
day = "1",
doi = "10.1016/j.is.2018.11.008",
language = "English (US)",
volume = "81",
pages = "82--91",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Elsevier Limited",
}

TY - JOUR

T1 - HiWalk: Learning node embeddings from heterogeneous networks

AU - Bai, Jie

AU - Li, Linjing

AU - Zeng, Dajun

PY - 2019/3/1

Y1 - 2019/3/1


AB - Heterogeneous networks, such as bibliographical networks and online business networks, are ubiquitous in everyday life. Nevertheless, analyzing them for high-level semantic understanding still poses a great challenge for modern information systems. In this paper, we propose HiWalk to learn distributed vector representations of the nodes in heterogeneous networks. HiWalk is inspired by the state-of-the-art representation learning algorithms employed in the context of both homogeneous networks and heterogeneous networks, based on word embedding learning models. Different from existing methods in the literature, the purpose of HiWalk is to learn vector representations of the targeted set of nodes by leveraging the other nodes as “background knowledge” which maximizes the structural correlations of contiguous nodes. HiWalk decomposes the adjacent probabilities of the nodes and adopts a hierarchical random walk strategy, which makes it more effective, efficient and concentrated when applied to practical large-scale heterogeneous networks. HiWalk can be widely applied in heterogeneous networks environments to analyze targeted types of nodes. We further validate the effectiveness of the proposed HiWalk through multiple tasks conducted on two real-world datasets.

KW - Behavioral analysis

KW - Heterogeneous network

KW - Network analysis

KW - Random walk

KW - Representation learning

UR - http://www.scopus.com/inward/record.url?scp=85058025085&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058025085&partnerID=8YFLogxK

U2 - 10.1016/j.is.2018.11.008

DO - 10.1016/j.is.2018.11.008

M3 - Article

AN - SCOPUS:85058025085

VL - 81

SP - 82

EP - 91

JO - Information Systems

JF - Information Systems

SN - 0306-4379

ER -