A hierarchical Naïve Bayes model for approximate identity matching

G. Alan Wang, Homa Atabakhsh, Hsinchun Chen

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

Organizations often manage identity information for their customers, vendors, and employees. Identity management is critical to various organizational practices ranging from customer relationship management to crime investigation. The task of searching for a specific identity is difficult because disparate identity information may exist due to the issues related to unintentional errors and intentional deception. In this paper we propose a hierarchical Naïve Bayes model that improves existing identity matching techniques in terms of searching effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based matching technique. With 50% training instances labeled, the proposed semi-supervised learning achieves a performance comparable to the fully supervised record comparison algorithm. The semi-supervised learning greatly reduces the efforts of manually labeling training instances without significant performance degradation.

Original language: English (US)
Pages (from-to): 413-423
Number of pages: 11
Journal: Decision Support Systems
Volume: 51
Issue number: 3
DOI: 10.1016/j.dss.2011.01.007
State: Published - Jun 2011

Fingerprint

Supervised learning
Crime
Deception
Labeling
Organizations
Personnel
Degradation
Experiments
Supervised Machine Learning
Bayes Model
Naïve
Hierarchical Bayes model
Semi-supervised learning

Keywords

  • EM algorithm
  • Entity matching
  • Hierarchical Naïve Bayes model
  • Identity management
  • Semi-supervised learning

ASJC Scopus subject areas

  • Management Information Systems
  • Information Systems
  • Information Systems and Management

Cite this

A hierarchical Naïve Bayes model for approximate identity matching. / Wang, G. Alan; Atabakhsh, Homa; Chen, Hsinchun.

In: Decision Support Systems, Vol. 51, No. 3, 06.2011, p. 413-423.

Wang, G. Alan; Atabakhsh, Homa; Chen, Hsinchun. / A hierarchical Naïve Bayes model for approximate identity matching. In: Decision Support Systems. 2011; Vol. 51, No. 3. pp. 413-423.
@article{0b4578000dd44531b109ea61bbc6967b,
title = "A hierarchical Na{\"i}ve Bayes model for approximate identity matching",
abstract = "Organizations often manage identity information for their customers, vendors, and employees. Identity management is critical to various organizational practices ranging from customer relationship management to crime investigation. The task of searching for a specific identity is difficult because disparate identity information may exist due to the issues related to unintentional errors and intentional deception. In this paper we propose a hierarchical Na{\"i}ve Bayes model that improves existing identity matching techniques in terms of searching effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based matching technique. With 50{\%} training instances labeled, the proposed semi-supervised learning achieves a performance comparable to the fully supervised record comparison algorithm. The semi-supervised learning greatly reduces the efforts of manually labeling training instances without significant performance degradation.",
keywords = "EM algorithm, Entity matching, Hierarchical Na{\"i}ve Bayes model, Identity management, Semi-supervised learning",
author = "Wang, {G. Alan} and Homa Atabakhsh and Hsinchun Chen",
year = "2011",
month = "6",
doi = "10.1016/j.dss.2011.01.007",
language = "English (US)",
volume = "51",
pages = "413--423",
journal = "Decision Support Systems",
issn = "0167-9236",
publisher = "Elsevier",
number = "3",

}

TY - JOUR

T1 - A hierarchical Naïve Bayes model for approximate identity matching

AU - Wang, G. Alan

AU - Atabakhsh, Homa

AU - Chen, Hsinchun

PY - 2011/6

Y1 - 2011/6

N2 - Organizations often manage identity information for their customers, vendors, and employees. Identity management is critical to various organizational practices ranging from customer relationship management to crime investigation. The task of searching for a specific identity is difficult because disparate identity information may exist due to the issues related to unintentional errors and intentional deception. In this paper we propose a hierarchical Naïve Bayes model that improves existing identity matching techniques in terms of searching effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based matching technique. With 50% training instances labeled, the proposed semi-supervised learning achieves a performance comparable to the fully supervised record comparison algorithm. The semi-supervised learning greatly reduces the efforts of manually labeling training instances without significant performance degradation.

AB - Organizations often manage identity information for their customers, vendors, and employees. Identity management is critical to various organizational practices ranging from customer relationship management to crime investigation. The task of searching for a specific identity is difficult because disparate identity information may exist due to the issues related to unintentional errors and intentional deception. In this paper we propose a hierarchical Naïve Bayes model that improves existing identity matching techniques in terms of searching effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based matching technique. With 50% training instances labeled, the proposed semi-supervised learning achieves a performance comparable to the fully supervised record comparison algorithm. The semi-supervised learning greatly reduces the efforts of manually labeling training instances without significant performance degradation.

KW - EM algorithm

KW - Entity matching

KW - Hierarchical Naïve Bayes model

KW - Identity management

KW - Semi-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=79955913156&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79955913156&partnerID=8YFLogxK

U2 - 10.1016/j.dss.2011.01.007

DO - 10.1016/j.dss.2011.01.007

M3 - Article

AN - SCOPUS:79955913156

VL - 51

SP - 413

EP - 423

JO - Decision Support Systems

JF - Decision Support Systems

SN - 0167-9236

IS - 3

ER -