Entity identification for heterogeneous database integration - A multiple classifier system approach and empirical evaluation

Huimin Zhao, Sudha Ram

Research output: Contribution to journal › Article

33 Citations (Scopus)

Abstract

Entity identification, i.e., detecting semantically corresponding records from heterogeneous data sources, is a critical step in integrating the data sources. The objective of this research is to develop and evaluate a novel multiple classifier system approach that improves entity identification accuracy. We apply various classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks to determine whether two records from different data sources represent the same real-world entity. We further employ a variety of ways to combine multiple classifiers for improved classification accuracy. In this paper, we report on some promising empirical results that demonstrate performance improvement by combining multiple classifiers.
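
The abstract does not say which classifiers or combination schemes were used, so the following is only a minimal sketch of the general idea of combining several classifiers by majority voting to decide whether two records match. The similarity measure, the three threshold rules, their cutoff values, and all function names are hypothetical and chosen for illustration; they are not the authors' method.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Return a similarity score in [0, 1] for two field values, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare(rec_a, rec_b, fields):
    """Build a feature vector of per-field similarities for a record pair."""
    return [field_similarity(rec_a[f], rec_b[f]) for f in fields]


# Three hypothetical base classifiers. Each maps a similarity vector to
# True (same entity) or False (different entities). Thresholds are assumptions.
def mean_rule(sims):
    return sum(sims) / len(sims) >= 0.8


def min_rule(sims):
    return min(sims) >= 0.6


def first_field_rule(sims):
    return sims[0] >= 0.9


def majority_vote(classifiers, sims):
    """Combine base classifiers: declare a match if more than half vote True."""
    votes = sum(1 for clf in classifiers if clf(sims))
    return votes * 2 > len(classifiers)


CLASSIFIERS = [mean_rule, min_rule, first_field_rule]


def same_entity(rec_a, rec_b, fields):
    """Decide whether two records from different sources denote one entity."""
    return majority_vote(CLASSIFIERS, compare(rec_a, rec_b, fields))
```

For instance, `same_entity({"name": "Smith", "city": "Tucson"}, {"name": "SMITH", "city": "tucson"}, ["name", "city"])` compares the two records field by field and lets the three rules vote. In practice the base classifiers would be trained models rather than fixed thresholds, and voting is only one of several possible combination schemes.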

Original language: English (US)
Pages (from-to): 119-132
Number of pages: 14
Journal: Information Systems
Volume: 30
Issue number: 2
DOIs: 10.1016/j.is.2003.11.001
State: Published - Apr 2005

Keywords

  • Entity identification
  • Heterogeneous database integration
  • Multiple classifier system

ASJC Scopus subject areas

  • Management Information Systems
  • Management of Technology and Innovation
  • Hardware and Architecture
  • Information Systems
  • Software

Cite this

Entity identification for heterogeneous database integration - A multiple classifier system approach and empirical evaluation. / Zhao, Huimin; Ram, Sudha.

In: Information Systems, Vol. 30, No. 2, 04.2005, p. 119-132.

@article{c461cf5093714b1a84b66cffe789ae13,
title = "Entity identification for heterogeneous database integration - A multiple classifier system approach and empirical evaluation",
abstract = "Entity identification, i.e., detecting semantically corresponding records from heterogeneous data sources, is a critical step in integrating the data sources. The objective of this research is to develop and evaluate a novel multiple classifier system approach that improves entity identification accuracy. We apply various classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks to determine whether two records from different data sources represent the same real-world entity. We further employ a variety of ways to combine multiple classifiers for improved classification accuracy. In this paper, we report on some promising empirical results that demonstrate performance improvement by combining multiple classifiers.",
keywords = "Entity identification, Heterogeneous database integration, Multiple classifier system",
author = "Huimin Zhao and Sudha Ram",
year = "2005",
month = "4",
doi = "10.1016/j.is.2003.11.001",
language = "English (US)",
volume = "30",
pages = "119--132",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Elsevier Limited",
number = "2",

}

TY - JOUR

T1 - Entity identification for heterogeneous database integration - A multiple classifier system approach and empirical evaluation

AU - Zhao, Huimin

AU - Ram, Sudha

PY - 2005/4

Y1 - 2005/4

N2 - Entity identification, i.e., detecting semantically corresponding records from heterogeneous data sources, is a critical step in integrating the data sources. The objective of this research is to develop and evaluate a novel multiple classifier system approach that improves entity identification accuracy. We apply various classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks to determine whether two records from different data sources represent the same real-world entity. We further employ a variety of ways to combine multiple classifiers for improved classification accuracy. In this paper, we report on some promising empirical results that demonstrate performance improvement by combining multiple classifiers.

KW - Entity identification

KW - Heterogeneous database integration

KW - Multiple classifier system

UR - http://www.scopus.com/inward/record.url?scp=5644287747&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=5644287747&partnerID=8YFLogxK

U2 - 10.1016/j.is.2003.11.001

DO - 10.1016/j.is.2003.11.001

M3 - Article

VL - 30

SP - 119

EP - 132

JO - Information Systems

JF - Information Systems

SN - 0306-4379

IS - 2

ER -