Entity identification for heterogeneous database integration - A multiple classifier system approach and empirical evaluation

Huimin Zhao, Sudha Ram

Research output: Contribution to journal › Article

35 Scopus citations

Abstract

Entity identification, i.e., detecting semantically corresponding records from heterogeneous data sources, is a critical step in integrating the data sources. The objective of this research is to develop and evaluate a novel multiple classifier system approach that improves entity identification accuracy. We apply various classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks to determine whether two records from different data sources represent the same real-world entity. We further employ a variety of ways to combine multiple classifiers for improved classification accuracy. In this paper, we report on some promising empirical results that demonstrate performance improvement by combining multiple classifiers.
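
The idea of combining several classifiers to decide whether two records refer to the same entity can be illustrated with a minimal sketch. This is not the authors' system: the abstract names classifier families (statistical pattern recognition, machine learning, neural networks) and "a variety of ways" to combine them, without giving details. The sketch below assumes simple majority voting over three hypothetical threshold rules, and the field names, thresholds, and function names are all illustrative assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (standard library)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare(rec_a, rec_b, fields):
    """Turn a pair of records into a vector of per-field similarities."""
    return [similarity(rec_a[f], rec_b[f]) for f in fields]


# Hypothetical base classifiers (assumptions, not from the paper).
# Each maps a similarity vector to True (match) or False (non-match).
def mean_rule(v):
    return sum(v) / len(v) >= 0.8


def min_rule(v):
    return min(v) >= 0.6


def first_field_rule(v):
    return v[0] >= 0.9


def majority_vote(classifiers, v):
    """Combine base classifiers: declare a match if most of them say so."""
    votes = Counter(clf(v) for clf in classifiers)
    return votes[True] > votes[False]


FIELDS = ["name", "city"]  # illustrative schema
CLASSIFIERS = [mean_rule, min_rule, first_field_rule]


def same_entity(rec_a, rec_b):
    """Decide whether two records from different sources match."""
    return majority_vote(CLASSIFIERS, compare(rec_a, rec_b, FIELDS))
```

In practice the base classifiers would be trained on labeled record pairs rather than hand-set thresholds, and voting is only one of several possible combination schemes.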

Original language: English (US)
Pages (from-to): 119-132
Number of pages: 14
Journal: Information Systems
Volume: 30
Issue number: 2
State: Published - Apr 1 2005

Keywords

  • Entity identification
  • Heterogeneous database integration
  • Multiple classifier system

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
