A multi-layer Naïve Bayes model for approximate identity matching

G. Alan Wang, Hsinchun Chen, Homa Atabakhsh

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

Identity management is critical to various governmental practices ranging from providing citizen services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to unintentional errors and intentional deception. We propose a Naïve Bayes identity matching model that improves on existing techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique and achieves higher precision than the record comparison technique. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that contains only 30% labeled instances, our model achieves performance comparable to that of fully supervised learning.
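
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of semi-supervised Naïve Bayes training for pairwise identity matching. It is a generic single-layer Bernoulli Naïve Bayes trained with expectation-maximization, not the authors' multi-layer model. The four binary features (whether name, date of birth, address, and ID number agree for a pair of records), the function name fit_semi_supervised_nb, and the toy data are all assumptions made for illustration.

```python
import math


def fit_semi_supervised_nb(labeled, unlabeled, n_iter=20, alpha=1.0):
    """Fit a Bernoulli Naive Bayes match classifier with EM.

    labeled:   list of (features, label); features is a tuple of 0/1 values,
               label is 1 for "same identity" and 0 for "different identity".
    unlabeled: list of feature tuples without labels.
    Returns a function mapping a feature tuple to P(match | features).
    All names here are hypothetical; this is not the paper's implementation.
    """
    n_feat = len(labeled[0][0])
    xs_l = [x for x, _ in labeled]
    resp_l = [float(y) for _, y in labeled]  # labeled pairs keep hard labels

    def m_step(xs, resp):
        # Re-estimate the class prior and per-feature agreement rates,
        # weighting each pair by its (soft) match probability.
        n1 = sum(resp)
        n0 = len(resp) - n1
        prior1 = (n1 + alpha) / (len(resp) + 2 * alpha)
        theta = {0: [], 1: []}  # theta[c][j] = P(feature j agrees | class c)
        for j in range(n_feat):
            w1 = sum(r * x[j] for x, r in zip(xs, resp))
            w0 = sum((1 - r) * x[j] for x, r in zip(xs, resp))
            theta[1].append((w1 + alpha) / (n1 + 2 * alpha))
            theta[0].append((w0 + alpha) / (n0 + 2 * alpha))
        return prior1, theta

    def posterior(x, prior1, theta):
        # Naive Bayes posterior computed in log space for stability.
        log1, log0 = math.log(prior1), math.log(1 - prior1)
        for j, v in enumerate(x):
            log1 += math.log(theta[1][j] if v else 1 - theta[1][j])
            log0 += math.log(theta[0][j] if v else 1 - theta[0][j])
        m = max(log0, log1)
        p1, p0 = math.exp(log1 - m), math.exp(log0 - m)
        return p1 / (p0 + p1)

    # Initialize from the labeled pairs only, then alternate E and M steps.
    prior1, theta = m_step(xs_l, resp_l)
    for _ in range(n_iter):
        resp_u = [posterior(x, prior1, theta) for x in unlabeled]  # E-step
        prior1, theta = m_step(xs_l + list(unlabeled), resp_l + resp_u)  # M-step

    return lambda x: posterior(x, prior1, theta)


# Toy data: (name, date of birth, address, ID number) agreement flags.
labeled = [((1, 1, 1, 1), 1), ((1, 1, 0, 1), 1),
           ((0, 0, 0, 0), 0), ((0, 1, 0, 0), 0)]
unlabeled = [(1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 1, 0),
             (0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 0)]

match_prob = fit_semi_supervised_nb(labeled, unlabeled)
print(round(match_prob((1, 1, 1, 1)), 3), round(match_prob((0, 0, 0, 0)), 3))
```

In this sketch the labeled pairs keep their labels throughout training, while the unlabeled pairs contribute fractional counts according to their current match probability. That is one common way to reduce the amount of manual labeling, but whether it matches the training procedure used in the paper is not stated in the abstract.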

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 479-484
Number of pages: 6
Volume: 3975 LNCS
State: Published - 2006
Event: IEEE International Conference on Intelligence and Security Informatics, ISI 2006 - San Diego, CA, United States
Duration: May 23 2006 to May 24 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3975 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: IEEE International Conference on Intelligence and Security Informatics, ISI 2006
Country: United States
City: San Diego, CA
Period: 5/23/06 to 5/24/06

Fingerprint

Approximate Identity
Bayes
Multilayer
Supervised learning
Learning
Deception
Identity Management
Homeland Security
Model Matching
Semi-supervised Learning
Unsupervised Learning
Labeling
National security
Model
Experiment
Training
Experiments
Supervised Machine Learning

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Wang, G. A., Chen, H., & Atabakhsh, H. (2006). A multi-layer Naïve Bayes model for approximate identity matching. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3975 LNCS, pp. 479-484). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3975 LNCS).

A multi-layer Naïve Bayes model for approximate identity matching. / Wang, G. Alan; Chen, Hsinchun; Atabakhsh, Homa.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3975 LNCS 2006. p. 479-484 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3975 LNCS).

Wang, GA, Chen, H & Atabakhsh, H 2006, A multi-layer Naïve Bayes model for approximate identity matching. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 3975 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3975 LNCS, pp. 479-484, IEEE International Conference on Intelligence and Security Informatics, ISI 2006, San Diego, CA, United States, 5/23/06.
Wang GA, Chen H, Atabakhsh H. A multi-layer Naïve Bayes model for approximate identity matching. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3975 LNCS. 2006. p. 479-484. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Wang, G. Alan ; Chen, Hsinchun ; Atabakhsh, Homa. / A multi-layer Naïve Bayes model for approximate identity matching. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3975 LNCS 2006. pp. 479-484 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{b126861e09e6473d8f98a5f17cb76af5,
title = "A multi-layer Na{\"i}ve Bayes model for approximate identity matching",
abstract = "Identity management is critical to various governmental practices ranging from providing citizen services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to unintentional errors and intentional deception. We propose a Na{\"i}ve Bayes identity matching model that improves on existing techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique and achieves higher precision than the record comparison technique. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that contains only 30{\%} labeled instances, our model achieves performance comparable to that of fully supervised learning.",
author = "Wang, {G. Alan} and Hsinchun Chen and Homa Atabakhsh",
year = "2006",
language = "English (US)",
isbn = "3540344780",
volume = "3975 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "479--484",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - A multi-layer Naïve Bayes model for approximate identity matching

AU - Wang, G. Alan

AU - Chen, Hsinchun

AU - Atabakhsh, Homa

PY - 2006

Y1 - 2006

AB - Identity management is critical to various governmental practices ranging from providing citizen services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to unintentional errors and intentional deception. We propose a Naïve Bayes identity matching model that improves on existing techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique and achieves higher precision than the record comparison technique. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that contains only 30% labeled instances, our model achieves performance comparable to that of fully supervised learning.

UR - http://www.scopus.com/inward/record.url?scp=33745875307&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745875307&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33745875307

SN - 3540344780

SN - 9783540344780

VL - 3975 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 479

EP - 484

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -