Skew handling techniques in sort-merge join

Wei Li, Dengfeng Gao, Richard Thomas Snodgrass

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. However, for sort-merge join, even a small amount of skew present in realistic data can result in a significant performance hit on a commercial DBMS. This paper examines the negative ramifications of skew in sort-merge join and proposes several refinements that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew and are thus suitable for replacing existing sort-merge implementations. We also show how sort-merge band join performance is significantly enhanced with these refinements.
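To make the effect of skew concrete, the following is a minimal sketch of a textbook in-memory sort-merge equijoin. It is an illustration only and is not one of the refinements proposed in the paper; the function name `sort_merge_join` and the `(key, payload)` tuple layout are assumptions made for this example. When the same join value occurs many times in both inputs, every tuple in the left run must be paired with every tuple in the right run, so the merge phase has to revisit the right run once per matching left tuple.

```python
def sort_merge_join(left, right):
    """Textbook sort-merge equijoin over lists of (key, payload) tuples.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Skew cost: the right run is rescanned for every left tuple in the run.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out


# Six tuples per input, one tuple per key: each key matches exactly once.
uniform_left = [(k, "l%d" % k) for k in range(6)]
uniform_right = [(k, "r%d" % k) for k in range(6)]

# Six tuples per input again, but key 1 occurs three times on each side.
skewed_left = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (4, "f")]
skewed_right = [(1, "u"), (1, "v"), (1, "w"), (2, "x"), (3, "y"), (5, "z")]

print(len(sort_merge_join(uniform_left, uniform_right)))
print(len(sort_merge_join(skewed_left, skewed_right)))
```

In this in-memory sketch the rescan is only a nested loop. In a disk-based implementation, a run of equal values that does not fit in the buffer forces pages to be reread, which is one way skew can turn into extra I/O.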

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Editors: M.F.B. Moon, A. Ailamaki
Pages: 169-180
Number of pages: 12
State: Published - 2002
Event: ACM SIGMOD 2002 Proceedings of the ACM SIGMOD International Conference on Management of Data - Madison, WI, United States
Duration: Jun 3 2002 - Jun 6 2002

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Li, W., Gao, D., & Snodgrass, R. T. (2002). Skew handling techniques in sort-merge join. In M. F. B. Moon, & A. Ailamaki (Eds.), Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 169-180)

@inproceedings{0809d6a27a594edc90995d670294d120,
title = "Skew handling techniques in sort-merge join",
abstract = "Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. However, for sort-merge join, even a small amount of skew present in realistic data can result in a significant performance hit on a commercial DBMS. This paper examines the negative ramifications of skew in sort-merge join and proposes several refinements that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew and are thus suitable for replacing existing sort-merge implementations. We also show how sort-merge band join performance is significantly enhanced with these refinements.",
author = "Wei Li and Dengfeng Gao and Snodgrass, {Richard Thomas}",
year = "2002",
language = "English (US)",
pages = "169--180",
editor = "M.F.B. Moon and A. Ailamaki",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

}

TY - GEN

T1 - Skew handling techniques in sort-merge join

AU - Li, Wei

AU - Gao, Dengfeng

AU - Snodgrass, Richard Thomas

PY - 2002

Y1 - 2002

N2 - Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. However, for sort-merge join, even a small amount of skew present in realistic data can result in a significant performance hit on a commercial DBMS. This paper examines the negative ramifications of skew in sort-merge join and proposes several refinements that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew and are thus suitable for replacing existing sort-merge implementations. We also show how sort-merge band join performance is significantly enhanced with these refinements.

UR - http://www.scopus.com/inward/record.url?scp=0036361163&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036361163&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0036361163

SP - 169

EP - 180

BT - Proceedings of the ACM SIGMOD International Conference on Management of Data

A2 - Moon, M.F.B.

A2 - Ailamaki, A.

ER -