A probabilistic model for approximate identity matching

G. Alan Wang, Hsinchun Chen, Homa Atabakhsh

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Identity management is critical to various governmental practices, ranging from providing citizen services to enforcing homeland security. Searching for a specific identity is difficult because multiple representations of that identity may exist as a result of unintentional errors and intentional deception. We propose a probabilistic Naïve Bayes model that improves on the effectiveness of existing identity matching techniques. Experiments show that our proposed model performs significantly better than both the exact-match technique and the approximate-match record comparison algorithm. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset in which only 10% of the instances are labeled, our model achieves performance comparable to that of fully supervised learning.
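The abstract does not say how the semi-supervised training is carried out. One common way to train a Naïve Bayes classifier on partially labeled data is expectation-maximization (EM), and the following is a minimal sketch of that idea, not the authors' implementation. It assumes each candidate pair of identity records has already been reduced to a tuple of binary agreement flags; the four fields implied by the example tuples, the function names, and the toy data are all hypothetical.

```python
import math

def m_step(rows, resp, alpha=1.0):
    """Re-estimate the match prior and per-field agreement probabilities.

    rows: tuples of 0/1 agreement flags, one tuple per record pair.
    resp: P(match) for each row (exactly 0.0 or 1.0 for labeled rows).
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    n_feat = len(rows[0])
    w1 = sum(resp)                # soft count of matching pairs
    w0 = len(rows) - w1           # soft count of non-matching pairs
    prior = (w1 + alpha) / (len(rows) + 2 * alpha)
    theta = []                    # theta[c][j] = P(field j agrees | class c)
    for c, wc in ((0, w0), (1, w1)):
        theta.append([
            (sum((r if c else 1.0 - r) * x[j] for x, r in zip(rows, resp)) + alpha)
            / (wc + 2 * alpha)
            for j in range(n_feat)
        ])
    return prior, theta

def posterior(x, prior, theta):
    """P(match | x) under the Naive Bayes independence assumption."""
    log1, log0 = math.log(prior), math.log(1.0 - prior)
    for j, agrees in enumerate(x):
        log1 += math.log(theta[1][j] if agrees else 1.0 - theta[1][j])
        log0 += math.log(theta[0][j] if agrees else 1.0 - theta[0][j])
    return 1.0 / (1.0 + math.exp(log0 - log1))

def fit(labeled, unlabeled, n_iter=20, alpha=1.0):
    """EM: labeled pairs keep their labels, unlabeled pairs get soft labels."""
    lab_rows = [x for x, _ in labeled]
    lab_resp = [float(y) for _, y in labeled]
    # Initialise from the labeled pairs only.
    prior, theta = m_step(lab_rows, lab_resp, alpha)
    rows = lab_rows + list(unlabeled)
    for _ in range(n_iter):
        # E-step: score the unlabeled pairs with the current model.
        resp = lab_resp + [posterior(x, prior, theta) for x in unlabeled]
        # M-step: refit on labeled and soft-labeled pairs together.
        prior, theta = m_step(rows, resp, alpha)
    return prior, theta

# Toy data (hypothetical): each tuple holds four binary field-agreement flags.
labeled = [((1, 1, 1, 1), 1), ((0, 0, 0, 0), 0),
           ((1, 1, 0, 1), 1), ((0, 0, 1, 0), 0)]
unlabeled = [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0),
             (0, 1, 0, 0), (1, 0, 1, 1), (0, 0, 0, 1)]

prior, theta = fit(labeled, unlabeled)
print(posterior((1, 1, 0, 1), prior, theta))
```

In this sketch the labeled pairs anchor the two classes, and the unlabeled pairs refine the probability estimates through their soft labels. A real system would more likely use graded similarity scores per field, such as an edit-distance ratio for names, rather than binary flags.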

Original language: English (US)
Title of host publication: ACM International Conference Proceeding Series
Pages: 462-463
Number of pages: 2
Volume: 151
DOIs: https://doi.org/10.1145/1146598.1146750
State: Published - 2006
Event: 7th Annual International Conference on Digital Government Research, Dg.o 2006 - San Diego, CA, United States
Duration: May 21 2006 to May 24 2006

Other

Other: 7th Annual International Conference on Digital Government Research, Dg.o 2006
Country: United States
City: San Diego, CA
Period: 5/21/06 to 5/24/06

Fingerprint

Supervised learning
Unsupervised learning
National security
Labeling
Statistical Models
Experiments

Keywords

  • Identity matching
  • Naïve Bayes model
  • Semi-supervised learning

ASJC Scopus subject areas

  • Human-Computer Interaction

Cite this

Wang, G. A., Chen, H., & Atabakhsh, H. (2006). A probabilistic model for approximate identity matching. In ACM International Conference Proceeding Series (Vol. 151, pp. 462-463) https://doi.org/10.1145/1146598.1146750

@inproceedings{34e9b17fa3ca4153ae587d5937a739e8,
title = "A probabilistic model for approximate identity matching",
abstract = "Identity management is critical to various governmental practices, ranging from providing citizen services to enforcing homeland security. Searching for a specific identity is difficult because multiple representations of that identity may exist as a result of unintentional errors and intentional deception. We propose a probabilistic Na{\"i}ve Bayes model that improves on the effectiveness of existing identity matching techniques. Experiments show that our proposed model performs significantly better than both the exact-match technique and the approximate-match record comparison algorithm. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset in which only 10{\%} of the instances are labeled, our model achieves performance comparable to that of fully supervised learning.",
keywords = "Identity matching, Na{\"i}ve Bayes model, Semi-supervised learning",
author = "Wang, {G. Alan} and Hsinchun Chen and Homa Atabakhsh",
year = "2006",
doi = "10.1145/1146598.1146750",
language = "English (US)",
volume = "151",
pages = "462--463",
booktitle = "ACM International Conference Proceeding Series",

}

TY - GEN

T1 - A probabilistic model for approximate identity matching

AU - Wang, G. Alan

AU - Chen, Hsinchun

AU - Atabakhsh, Homa

PY - 2006

Y1 - 2006

AB - Identity management is critical to various governmental practices, ranging from providing citizen services to enforcing homeland security. Searching for a specific identity is difficult because multiple representations of that identity may exist as a result of unintentional errors and intentional deception. We propose a probabilistic Naïve Bayes model that improves on the effectiveness of existing identity matching techniques. Experiments show that our proposed model performs significantly better than both the exact-match technique and the approximate-match record comparison algorithm. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset in which only 10% of the instances are labeled, our model achieves performance comparable to that of fully supervised learning.

KW - Identity matching

KW - Naïve Bayes model

KW - Semi-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=34250716412&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34250716412&partnerID=8YFLogxK

U2 - 10.1145/1146598.1146750

DO - 10.1145/1146598.1146750

M3 - Conference contribution

AN - SCOPUS:34250716412

VL - 151

SP - 462

EP - 463

BT - ACM International Conference Proceeding Series

ER -