Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on Amazon EC2

Aniruddha Marathe, Rachel Harris, David K Lowenthal, Bronis R. De Supinski, Barry Rountree, Martin Schulz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

30 Citations (Scopus)

Abstract

The use of clouds to execute high-performance computing (HPC) applications has greatly increased recently. Clouds provide several potential advantages over traditional supercomputers and in-house clusters. The most popular cloud is currently Amazon EC2, which provides a fixed-cost option (called on-demand) and a variable-cost, auction-based option (called the spot market). The spot market trades lower cost for potential interruptions that necessitate checkpointing; if the market price exceeds the bid price, a node is taken away from the user without warning. We explore techniques to maximize performance per dollar given a time constraint within which an application must complete. Specifically, we design and implement multiple techniques to reduce expected cost by exploiting redundancy in the EC2 spot market. We then design an adaptive algorithm that selects a scheduling algorithm and determines the bid price. We show that our adaptive algorithm executes programs up to 7x cheaper than using the on-demand market and up to 44% cheaper than the best non-redundant, spot-market algorithm.
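The trade-off the abstract describes (a lower spot price against the risk of losing a node when the market price exceeds the bid) can be illustrated with a toy expected-cost model. This is a minimal sketch and not the authors' algorithm. It assumes an hourly price trace, replicas that fail independently (for example, because they run in separate spot markets), and a fallback in which the job is rerun on on-demand nodes if every replica is lost. All function names and parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    replicas: int
    bid: float
    expected_cost: float


def survival_probability(price_trace, bid, hours):
    """Fraction of `hours`-long windows in the trace whose price never exceeds the bid."""
    windows = len(price_trace) - hours + 1
    if windows <= 0:
        raise ValueError("price trace is shorter than the job")
    survived = sum(
        1
        for start in range(windows)
        if max(price_trace[start:start + hours]) <= bid
    )
    return survived / windows


def expected_cost(price_trace, bid, hours, replicas, on_demand_price):
    """Toy model: spot cost of all replicas plus the expected on-demand fallback cost."""
    p_survive = survival_probability(price_trace, bid, hours)
    # Assumption: replicas are interrupted independently of one another.
    p_all_fail = (1.0 - p_survive) ** replicas
    mean_spot = sum(price_trace) / len(price_trace)
    # Rough upper estimate: every replica is billed at the mean price for the full run.
    spot_cost = replicas * mean_spot * hours
    # Assumption: if every replica is lost, the job is rerun on on-demand nodes.
    fallback_cost = p_all_fail * on_demand_price * hours
    return spot_cost + fallback_cost


def cheapest_plan(price_trace, bids, hours, max_replicas, on_demand_price):
    """Exhaustively search candidate bids and replica counts for the lowest expected cost."""
    best = None
    for bid in bids:
        for replicas in range(1, max_replicas + 1):
            cost = expected_cost(price_trace, bid, hours, replicas, on_demand_price)
            if best is None or cost < best.expected_cost:
                best = Plan(replicas, bid, cost)
    return best
```

In this model, an extra replica pays off only when the reduction in expected fallback cost outweighs the added spot cost, which tends to happen when the on-demand price is much higher than the spot price. A real scheduler would also need to account for checkpoint overhead and the deadline itself, both of which this sketch ignores.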

Original language: English (US)
Title of host publication: HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery
Pages: 279-290
Number of pages: 12
ISBN (Print): 9781450327480
DOIs: https://doi.org/10.1145/2600212.2600226
State: Published - 2014
Event: 23rd ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC 2014 - Vancouver, BC, Canada
Duration: Jun 23 2014 → Jun 27 2014

Other

Other: 23rd ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC 2014
Country: Canada
City: Vancouver, BC
Period: 6/23/14 → 6/27/14

Fingerprint

Redundancy
Adaptive algorithms
Costs
Supercomputers
Scheduling algorithms

Keywords

  • Cloud
  • Cost
  • Fault-tolerance
  • Resource provisioning

ASJC Scopus subject areas

  • Software

Cite this

Marathe, A., Harris, R., Lowenthal, D. K., De Supinski, B. R., Rountree, B., & Schulz, M. (2014). Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on Amazon EC2. In HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (pp. 279-290). Association for Computing Machinery. https://doi.org/10.1145/2600212.2600226

@inproceedings{8264d4bb3ace455692e2a5a8330f41e0,
title = "Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on Amazon EC2",
abstract = "The use of clouds to execute high-performance computing (HPC) applications has greatly increased recently. Clouds provide several potential advantages over traditional supercomputers and in-house clusters. The most popular cloud is currently Amazon EC2, which provides a fixed-cost option (called on-demand) and a variable-cost, auction-based option (called the spot market). The spot market trades lower cost for potential interruptions that necessitate checkpointing; if the market price exceeds the bid price, a node is taken away from the user without warning. We explore techniques to maximize performance per dollar given a time constraint within which an application must complete. Specifically, we design and implement multiple techniques to reduce expected cost by exploiting redundancy in the EC2 spot market. We then design an adaptive algorithm that selects a scheduling algorithm and determines the bid price. We show that our adaptive algorithm executes programs up to 7x cheaper than using the on-demand market and up to 44{\%} cheaper than the best non-redundant, spot-market algorithm.",
keywords = "Cloud, Cost, Fault-tolerance, Resource provisioning",
author = "Aniruddha Marathe and Rachel Harris and Lowenthal, {David K} and {De Supinski}, {Bronis R.} and Barry Rountree and Martin Schulz",
year = "2014",
doi = "10.1145/2600212.2600226",
language = "English (US)",
isbn = "9781450327480",
pages = "279--290",
booktitle = "HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing",
publisher = "Association for Computing Machinery",

}
