Adaptive power reallocation for value-oriented schedulers in power-constrained HPC

Nirmal Kumbhare, Aniruddha Marathe, Ali Akoglu, Salim Hariri, Ghaleb Abdulla

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

In the exascale era, HPC systems are expected to operate under varying system-wide power constraints. For such power-constrained systems, improving per-job flops-per-watt may not be sufficient to improve total HPC productivity, as a growing number of scientific applications with different compute intensities migrate to HPC systems. To measure HPC productivity for such applications, we associate a monotonically decreasing, time-dependent value function, called job-value, with each application. A job-value function represents the value to an organization of completing a job. We begin by exploring the trade-off between two commonly used static power allocation strategies (uniform and greedy) in a power-constrained, oversubscribed system. We simulate a large-scale system and demonstrate that, at the tightest power constraint, the greedy allocation can lead to 30% higher productivity than the uniform allocation, whereas the uniform allocation can gain up to 6% higher productivity at the relaxed power constraint. We then propose a new dynamic power allocation strategy that utilizes power-performance models derived from offline data. We use these models to reallocate power from running jobs to newly arrived jobs in order to increase overall system utilization and productivity. In our simulation study, we show that, compared to static allocation, the dynamic power allocation policy improves node utilization and job completion rates by 20% and 9%, respectively, at the tightest power constraint. Our dynamic approach consistently earns up to 8% higher productivity than the best-performing static strategy under different power constraints.
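The abstract does not define the job-value function or the two static strategies in detail, so the following is only a minimal Python sketch under stated assumptions: a linearly decaying job-value (one of many monotonically decreasing shapes), a "uniform" policy that splits the power budget evenly over all nodes, and a "greedy" policy that gives each admitted job the per-node maximum until power or nodes run out. The names `Job`, `job_value`, `uniform_allocation`, and `greedy_allocation`, and all parameters, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int          # number of nodes the job requests
    max_value: float    # value earned if the job completed instantly (assumed)
    decay_per_s: float  # value lost per second of completion time (assumed)


def job_value(job: Job, completion_time_s: float) -> float:
    """Assumed linear decay, floored at zero, so value never increases with time."""
    return max(0.0, job.max_value - job.decay_per_s * completion_time_s)


def uniform_allocation(jobs, total_nodes, power_budget_w, p_min_w, p_max_w):
    """Assumed 'uniform' policy: every node gets the same share of the budget.

    Returns a mapping from job name to per-node power in watts.
    """
    per_node = min(p_max_w, power_budget_w / total_nodes)
    if per_node < p_min_w:
        return {}  # budget too tight to power every node at its minimum
    alloc, free_nodes = {}, total_nodes
    for job in jobs:  # admit jobs in queue order while nodes remain
        if job.nodes <= free_nodes:
            alloc[job.name] = per_node
            free_nodes -= job.nodes
    return alloc


def greedy_allocation(jobs, total_nodes, power_budget_w, p_max_w):
    """Assumed 'greedy' policy: admitted jobs run at the per-node maximum."""
    alloc, free_nodes, free_power = {}, total_nodes, power_budget_w
    for job in jobs:  # admit jobs in queue order while nodes and power remain
        need = job.nodes * p_max_w
        if job.nodes <= free_nodes and need <= free_power:
            alloc[job.name] = p_max_w
            free_nodes -= job.nodes
            free_power -= need
    return alloc


queue = [Job("a", 4, 100.0, 0.5), Job("b", 4, 80.0, 0.1)]
uniform = uniform_allocation(queue, 8, 600.0, 50.0, 115.0)
greedy = greedy_allocation(queue, 8, 600.0, 115.0)
```

With these made-up numbers (8 nodes, a 600 W budget, 50 W to 115 W per node), the uniform sketch runs both jobs at 75 W per node, while the greedy sketch runs only job `a` at 115 W per node and leaves job `b` waiting. Which outcome earns more value depends on how each job's runtime responds to its power cap, which is the trade-off the abstract describes.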

Original language: English (US)
Title of host publication: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Editors: Hui Tian, Hong Shen, Wee Lum Tan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 133-139
Number of pages: 7
ISBN (Electronic): 9781728126166
DOIs
State: Published - Dec 2019
Event: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 - Gold Coast, Australia
Duration: Dec 5 2019 to Dec 7 2019

Publication series

Name: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019

Conference

Conference: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Country: Australia
City: Gold Coast
Period: 12/5/19 to 12/7/19

Keywords

  • Cloud computing
  • HPC productivity
  • High performance computing
  • Power-aware scheduling
  • Power-constrained computing
  • Value heuristics

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
