Schema-less, semantics-based change detection for XML documents

Shuohao Zhang, Curtis Dyreson, Richard Thomas Snodgrass

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in an historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions. But techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.
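The two steps named in the abstract (isolate identifiers, then use them to associate elements across versions) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a hypothetical heuristic: an element's identifier is any attribute or leaf-child text that every element with that tag carries and whose values are distinct within each version. The helper names (candidate_fields, find_identifiers, match_elements) and the sample documents are invented for illustration; only the standard-library xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def candidate_fields(elem):
    """Hypothetical identifier candidates: attribute values and leaf-child text."""
    fields = {}
    for name, value in elem.attrib.items():
        fields["@" + name] = value.strip()
    for child in elem:
        # Only childless subelements with non-blank text count as candidates.
        if len(child) == 0 and child.text and child.text.strip():
            fields[child.tag] = child.text.strip()
    return fields


def find_identifiers(*versions):
    """Step 1 (assumed heuristic): for each tag, choose a field that every element
    with that tag has and whose values are distinct within each version."""
    by_tag = defaultdict(list)  # tag -> [(version index, candidate fields)]
    for i, root in enumerate(versions):
        for elem in root.iter():
            by_tag[elem.tag].append((i, candidate_fields(elem)))
    identifiers = {}
    for tag, entries in by_tag.items():
        shared = set.intersection(*(set(f) for _, f in entries))
        for field in sorted(shared):  # attribute fields ("@...") sort first
            per_version = [[f[field] for v, f in entries if v == i]
                           for i in range(len(versions))]
            if all(len(vals) == len(set(vals)) for vals in per_version):
                identifiers[tag] = field
                break
    return identifiers


def match_elements(old_root, new_root):
    """Step 2: associate elements across versions by (tag, identifier value)."""
    ids = find_identifiers(old_root, new_root)

    def index(root):
        table = {}
        for elem in root.iter():
            field = ids.get(elem.tag)
            if field is not None:  # tags without an identifier are skipped
                table[(elem.tag, candidate_fields(elem)[field])] = elem
        return table

    old_idx, new_idx = index(old_root), index(new_root)
    matched = sorted(old_idx.keys() & new_idx.keys())
    deleted = sorted(old_idx.keys() - new_idx.keys())
    inserted = sorted(new_idx.keys() - old_idx.keys())
    changes = {}  # matched key -> names of candidate fields whose values differ
    for key in matched:
        old_f = candidate_fields(old_idx[key])
        new_f = candidate_fields(new_idx[key])
        diff = sorted(f for f in old_f.keys() | new_f.keys()
                      if old_f.get(f) != new_f.get(f))
        if diff:
            changes[key] = diff
    return matched, deleted, inserted, changes


OLD = """<library>
  <book isbn="111"><title>XML Basics</title><year>2001</year></book>
  <book isbn="222"><title>Temporal DBs</title><year>1999</year></book>
</library>"""

# The new version restructures the tree: each book now sits under a <shelf>.
NEW = """<library>
  <shelf><book isbn="222"><title>Temporal Databases</title><year>1999</year></book></shelf>
  <shelf><book isbn="333"><title>Change Detection</title><year>2004</year></book></shelf>
</library>"""

if __name__ == "__main__":
    print(match_elements(ET.fromstring(OLD), ET.fromstring(NEW)))
    # ([('book', '222')], [('book', '111')], [('book', '333')], {('book', '222'): ['title']})
```

Because elements are matched on identifier values rather than tree positions, moving the books under new shelf elements does not stop book 222 from being associated with its earlier version, and its title edit is reported as a change. A diff keyed on position could instead treat that move as a deletion plus an insertion. The sketch ignores ties between several qualifying fields, composite identifiers, and elements with no identifying field, all of which a real identifier-isolation step would have to handle.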

Original language: English (US)
Pages (from-to): 279-290
Number of pages: 12
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3306
State: Published - 2004

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

@article{b38240fd8cda40e69f7d5eb3219deb50,
title = "Schema-less, semantics-based change detection for XML documents",
abstract = "Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in an historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions. But techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.",
author = "Shuohao Zhang and Curtis Dyreson and Snodgrass, {Richard Thomas}",
year = "2004",
language = "English (US)",
volume = "3306",
pages = "279--290",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY  - JOUR
T1  - Schema-less, semantics-based change detection for XML documents
AU  - Zhang, Shuohao
AU  - Dyreson, Curtis
AU  - Snodgrass, Richard Thomas
PY  - 2004
Y1  - 2004
N2  - Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in an historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions. But techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.
AB  - Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in an historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions. But techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.
UR  - http://www.scopus.com/inward/record.url?scp=33845196117&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33845196117&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:33845196117
VL  - 3306
SP  - 279
EP  - 290
JO  - Lecture Notes in Computer Science
JF  - Lecture Notes in Computer Science
SN  - 0302-9743
ER  - 