Schema-less, semantics-based change detection for XML documents

Shuohao Zhang, Curtis Dyreson, Richard T. Snodgrass

Research output: Contribution to journalArticle

4 Scopus citations

Abstract

Schema-less change detection is the processes of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in an historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions. But techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.

Original languageEnglish (US)
Pages (from-to)279-290
Number of pages12
JournalLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume3306
StatePublished - Dec 1 2004

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Schema-less, semantics-based change detection for XML documents'. Together they form a unique fingerprint.

  • Cite this