Clustering similar schema elements across heterogeneous databases: A first step in database integration

Huimin Zhao, Sudha Ram

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

3 Citations (Scopus)

Abstract

Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important first step in integrating the data sources. This chapter proposes a cluster analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple clustering techniques, including K-means, hierarchical clustering, and self-organizing map (SOM) neural network, to identify similar schema elements from heterogeneous data sources, based on multiple types of features, such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. We describe an SOM prototype we have developed that provides users with a visualization tool for displaying clustering results and for incremental evaluation of potentially similar elements. We also report on some empirical results demonstrating the utility of the proposed approach.
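
As a rough illustration of the idea in the abstract, the following minimal sketch groups schema element names by naming similarity alone, using a simple single-linkage form of hierarchical clustering. It is not the authors' implementation: the trigram-based similarity measure, the 0.3 threshold, the function names, and the example column names are all assumptions made here for illustration, and the chapter's approach also draws on other feature types (document similarity, schema specification, data patterns, usage patterns) and on K-means and SOM.

```python
from itertools import combinations


def trigrams(name):
    """Return the set of character trigrams of a lower-cased, space-padded name."""
    padded = f"  {name.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def name_similarity(a, b):
    """Jaccard similarity of the two names' trigram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def cluster(elements, threshold=0.3):
    """Single-linkage agglomerative clustering over element names.

    Two clusters are merged whenever at least one cross-cluster pair of
    names has similarity >= threshold (an assumed, hypothetical cutoff).
    """
    clusters = [[e] for e in elements]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if any(name_similarity(a, b) >= threshold
                   for a in clusters[i] for b in clusters[j]):
                clusters[i].extend(clusters[j])
                del clusters[j]
                merged = True
                break  # indices changed, so restart the pair scan
    return clusters


# Hypothetical column names drawn from two different data sources.
columns = ["cust_name", "zip", "customer_name", "zipcode"]
groups = cluster(columns)
```

Each resulting group is only a candidate set of similar elements; in a semi-automated process such as the one the abstract describes, a human analyst would still review the groups before asserting any interschema relationship.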

Original language: English (US)
Title of host publication: Advanced Topics in Database Research
Publisher: IGI Global
Pages: 227-248
Number of pages: 22
Volume: 5
ISBN (Print): 9781591409359
DOI: 10.4018/978-1-59140-935-9.ch013
State: Published - 2006

Fingerprint

cluster analysis
neural network
visualization
interaction
evaluation
time

ASJC Scopus subject areas

  • Social Sciences (all)

Cite this

Clustering similar schema elements across heterogeneous databases: A first step in database integration. / Zhao, Huimin; Ram, Sudha.

Advanced Topics in Database Research. Vol. 5 IGI Global, 2006. p. 227-248.

@inbook{417be60686384696948dd48a7f4ea0f3,
title = "Clustering similar schema elements across heterogeneous databases: A first step in database integration",
abstract = "Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important first step in integrating the data sources. This chapter proposes a cluster analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple clustering techniques, including K-means, hierarchical clustering, and self-organizing map (SOM) neural network, to identify similar schema elements from heterogeneous data sources, based on multiple types of features, such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. We describe an SOM prototype we have developed that provides users with a visualization tool for displaying clustering results and for incremental evaluation of potentially similar elements. We also report on some empirical results demonstrating the utility of the proposed approach.",
author = "Huimin Zhao and Sudha Ram",
year = "2006",
doi = "10.4018/978-1-59140-935-9.ch013",
language = "English (US)",
isbn = "9781591409359",
volume = "5",
pages = "227--248",
booktitle = "Advanced Topics in Database Research",
publisher = "IGI Global",

}

TY - CHAP

T1 - Clustering similar schema elements across heterogeneous databases

T2 - A first step in database integration

AU - Zhao, Huimin

AU - Ram, Sudha

PY - 2006

Y1 - 2006

N2 - Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important first step in integrating the data sources. This chapter proposes a cluster analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple clustering techniques, including K-means, hierarchical clustering, and self-organizing map (SOM) neural network, to identify similar schema elements from heterogeneous data sources, based on multiple types of features, such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. We describe an SOM prototype we have developed that provides users with a visualization tool for displaying clustering results and for incremental evaluation of potentially similar elements. We also report on some empirical results demonstrating the utility of the proposed approach.

UR - http://www.scopus.com/inward/record.url?scp=33947183147&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33947183147&partnerID=8YFLogxK

U2 - 10.4018/978-1-59140-935-9.ch013

DO - 10.4018/978-1-59140-935-9.ch013

M3 - Chapter

AN - SCOPUS:33947183147

SN - 9781591409359

VL - 5

SP - 227

EP - 248

BT - Advanced Topics in Database Research

PB - IGI Global

ER -