Sequenced subset operators: Definition and implementation

Joseph Dunn, Sean Davey, Anne Descour, Richard Thomas Snodgrass

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Difference, intersection, semi-join and anti-semi-join may be considered binary subset operators, in that they all return a subset of their left-hand argument. These operators are useful for implementing SQL’s EXCEPT, INTERSECT, NOT IN and NOT EXISTS, distributed queries and referential integrity. Difference-all and intersection-all operate on multi-sets and track the number of duplicates in both argument relations; they are used to implement SQL’s EXCEPT ALL and INTERSECT ALL. Their temporally sequenced analogues, which effectively apply the subset operator at each point in time, are needed for implementing these constructs in temporal databases. These SQL expressions are complex; most necessitate at least a three-way join, with nested NOT EXISTS clauses. We consider how to implement these operators directly in a DBMS. These operators are interesting in that they can fragment the left-hand validity periods (sequenced difference-all also fragments the right-hand periods) and thus introduce memory complications found neither in their nontemporal counterparts nor in temporal joins and semi-joins. This paper introduces novel algorithms for implementing these operators by ordering the computation so that fragments need not be retained in main memory. We evaluate these algorithms and demonstrate that they are no more expensive than a single conventional join.
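To make "apply the subset operator at each point in time" concrete, here is a minimal Python sketch of what sequenced difference and sequenced difference-all compute. It assumes a hypothetical row layout of (value, start, end) with a half-open integer period [start, end). For set difference, it also assumes the left argument has no two rows with the same value and overlapping periods. The function names `sequenced_difference` and `sequenced_difference_all` are illustrative only. This is a naive in-memory illustration of the semantics, not the paper's algorithms. The paper's algorithms are designed so that fragments need not be retained in main memory, whereas this sketch materializes them.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical row layout: (value, start, end), valid over the half-open period [start, end).
Row = Tuple[str, int, int]


def sequenced_difference(left: List[Row], right: List[Row]) -> List[Row]:
    """Sequenced EXCEPT: a left row survives at time t only if no right row with
    the same value is valid at t. Assumes left has no value-equivalent overlaps."""
    covers: DefaultDict[str, List[Tuple[int, int]]] = defaultdict(list)
    for value, start, end in right:
        covers[value].append((start, end))
    for periods in covers.values():
        periods.sort()  # by start time, then end time

    result: List[Row] = []
    for value, start, end in left:
        cursor = start  # first time point of this left row not yet decided
        for r_start, r_end in covers.get(value, []):
            if r_end <= cursor:
                continue  # right period lies entirely before the undecided part
            if r_start >= end:
                break  # sorted by start, so no later right period overlaps either
            if r_start > cursor:
                result.append((value, cursor, r_start))  # uncovered fragment
            cursor = r_end  # everything up to r_end is now decided
            if cursor >= end:
                break
        if cursor < end:
            result.append((value, cursor, end))  # trailing uncovered fragment
    return result


def sequenced_difference_all(left: List[Row], right: List[Row]) -> List[Row]:
    """Sequenced EXCEPT ALL: at each time t, value v appears
    max(0, count_left(v, t) - count_right(v, t)) times."""
    # deltas[value][t] = [change in left count, change in right count] at time t
    deltas: DefaultDict[str, Dict[int, List[int]]] = defaultdict(dict)
    for side, rows in ((0, left), (1, right)):
        for value, start, end in rows:
            deltas[value].setdefault(start, [0, 0])[side] += 1
            deltas[value].setdefault(end, [0, 0])[side] -= 1

    result: List[Row] = []
    for value in sorted(deltas):
        points = sorted(deltas[value])
        count_left = count_right = 0
        # Between consecutive change points both counts are constant.
        for t, t_next in zip(points, points[1:]):
            d_left, d_right = deltas[value][t]
            count_left += d_left
            count_right += d_right
            result.extend([(value, t, t_next)] * max(0, count_left - count_right))
    return result
```

For instance, with left = [("alice", 0, 10)] and right = [("alice", 3, 5), ("alice", 7, 8)], `sequenced_difference` returns ("alice", 0, 3), ("alice", 5, 7) and ("alice", 8, 10). This shows a single left-hand period being fragmented. Neither function coalesces adjacent value-equivalent fragments. Coalescing would be a separate step.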

Original language: English (US)
Pages (from-to): 81-92
Number of pages: 12
Journal: Proceedings - International Conference on Data Engineering
DOI: 10.1109/ICDE.2002.994699
State: Published - 2002

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Sequenced subset operators: Definition and implementation. / Dunn, Joseph; Davey, Sean; Descour, Anne; Snodgrass, Richard Thomas.

In: Proceedings - International Conference on Data Engineering, 2002, p. 81-92.

@article{92a797e31e274089913c639747ed7037,
title = "Sequenced subset operators: Definition and implementation",
abstract = "Difference, intersection, semi-join and anti-semi-join may be considered binary subset operators, in that they all return a subset of their left-hand argument. These operators are useful for implementing SQL’s EXCEPT, INTERSECT, NOT IN and NOT EXISTS, distributed queries and referential integrity. Difference-all and intersection-all operate on multi-sets and track the number of duplicates in both argument relations; they are used to implement SQL’s EXCEPT ALL and INTERSECT ALL. Their temporally sequenced analogues, which effectively apply the subset operator at each point in time, are needed for implementing these constructs in temporal databases. These SQL expressions are complex; most necessitate at least a three-way join, with nested NOT EXISTS clauses. We consider how to implement these operators directly in a DBMS. These operators are interesting in that they can fragment the left-hand validity periods (sequenced difference-all also fragments the right-hand periods) and thus introduce memory complications found neither in their nontemporal counterparts nor in temporal joins and semi-joins. This paper introduces novel algorithms for implementing these operators by ordering the computation so that fragments need not be retained in main memory. We evaluate these algorithms and demonstrate that they are no more expensive than a single conventional join.",
author = "Joseph Dunn and Sean Davey and Anne Descour and Snodgrass, {Richard Thomas}",
year = "2002",
doi = "10.1109/ICDE.2002.994699",
language = "English (US)",
pages = "81--92",
journal = "Proceedings - International Conference on Data Engineering",
issn = "1084-4627",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Sequenced subset operators: Definition and implementation

AU - Dunn, Joseph

AU - Davey, Sean

AU - Descour, Anne

AU - Snodgrass, Richard Thomas

PY - 2002

Y1 - 2002

N2 - Difference, intersection, semi-join and anti-semi-join may be considered binary subset operators, in that they all return a subset of their left-hand argument. These operators are useful for implementing SQL’s EXCEPT, INTERSECT, NOT IN and NOT EXISTS, distributed queries and referential integrity. Difference-all and intersection-all operate on multi-sets and track the number of duplicates in both argument relations; they are used to implement SQL’s EXCEPT ALL and INTERSECT ALL. Their temporally sequenced analogues, which effectively apply the subset operator at each point in time, are needed for implementing these constructs in temporal databases. These SQL expressions are complex; most necessitate at least a three-way join, with nested NOT EXISTS clauses. We consider how to implement these operators directly in a DBMS. These operators are interesting in that they can fragment the left-hand validity periods (sequenced difference-all also fragments the right-hand periods) and thus introduce memory complications found neither in their nontemporal counterparts nor in temporal joins and semi-joins. This paper introduces novel algorithms for implementing these operators by ordering the computation so that fragments need not be retained in main memory. We evaluate these algorithms and demonstrate that they are no more expensive than a single conventional join.

AB - Difference, intersection, semi-join and anti-semi-join may be considered binary subset operators, in that they all return a subset of their left-hand argument. These operators are useful for implementing SQL’s EXCEPT, INTERSECT, NOT IN and NOT EXISTS, distributed queries and referential integrity. Difference-all and intersection-all operate on multi-sets and track the number of duplicates in both argument relations; they are used to implement SQL’s EXCEPT ALL and INTERSECT ALL. Their temporally sequenced analogues, which effectively apply the subset operator at each point in time, are needed for implementing these constructs in temporal databases. These SQL expressions are complex; most necessitate at least a three-way join, with nested NOT EXISTS clauses. We consider how to implement these operators directly in a DBMS. These operators are interesting in that they can fragment the left-hand validity periods (sequenced difference-all also fragments the right-hand periods) and thus introduce memory complications found neither in their nontemporal counterparts nor in temporal joins and semi-joins. This paper introduces novel algorithms for implementing these operators by ordering the computation so that fragments need not be retained in main memory. We evaluate these algorithms and demonstrate that they are no more expensive than a single conventional join.

UR - http://www.scopus.com/inward/record.url?scp=0036204133&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036204133&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2002.994699

DO - 10.1109/ICDE.2002.994699

M3 - Article

AN - SCOPUS:0036204133

SP - 81

EP - 92

JO - Proceedings - International Conference on Data Engineering

JF - Proceedings - International Conference on Data Engineering

SN - 1084-4627

ER -