A Value-Oriented Job Scheduling Approach for Power-Constrained and Oversubscribed HPC Systems

Nirmal Kumbhare, Aniruddha Marathe, Ali Akoglu, Howard Jay Siegel, Ghaleb Abdulla, Salim Hariri

Research output: Contribution to journal › Article

Abstract

In this article, we investigate the limitations of traditional value-based algorithms on a power-constrained HPC system and evaluate their impact on HPC productivity. We expose the trade-off between allocating the system-wide power budget uniformly and allocating it greedily under different system-wide power constraints in an oversubscribed system. We experimentally demonstrate that, under the tightest power constraint, the mean productivity of the greedy allocation is 38 percent higher than that of the uniform allocation, whereas under the intermediate power constraint, the mean productivity of the uniform allocation is 6 percent higher than that of the greedy allocation. We then propose a new algorithm that adapts its behavior to deliver the combined benefits of the two allocation strategies. We design a methodology with online retraining capability to create application-specific power-execution time models for a class of HPC applications. These models are used by the power-aware algorithms to predict the execution time of an application on the available resources when scheduling decisions are made. We evaluate the proposed algorithm in emulation and simulation environments and show that our adaptive strategy improves HPC resource utilization while delivering a mean productivity that is almost the same as that of the best-performing algorithm across various system-wide power constraints.
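The trade-off between uniform and greedy allocation described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the `Job` fields, the `p_min`/`p_max` thresholds, and both function names are hypothetical, and the sketch assumes a job cannot run at all below `p_min` and gains nothing above `p_max`. It only shows why a very tight budget may favor a greedy split while a looser one may favor a uniform split; it does not reproduce the reported percentages.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    value: float  # hypothetical value earned if the job runs
    p_min: float  # assumed minimum power (W) needed to run at all
    p_max: float  # assumed power (W) beyond which there is no speedup


def uniform_allocation(jobs, budget):
    """Split the budget equally across all queued jobs.

    A job whose equal share falls below its p_min is left out.
    """
    if not jobs:
        return {}
    share = budget / len(jobs)
    return {j.name: min(share, j.p_max) for j in jobs if share >= j.p_min}


def greedy_allocation(jobs, budget):
    """Serve jobs in decreasing value order, granting up to p_max each.

    A job is skipped when the remaining budget is below its p_min.
    """
    alloc = {}
    remaining = budget
    for j in sorted(jobs, key=lambda job: job.value, reverse=True):
        grant = min(j.p_max, remaining)
        if grant >= j.p_min:
            alloc[j.name] = grant
            remaining -= grant
    return alloc


if __name__ == "__main__":
    queue = [Job("a", 10, 50, 100), Job("b", 5, 50, 100), Job("c", 1, 50, 100)]
    for budget in (120, 240):
        print(budget, uniform_allocation(queue, budget), greedy_allocation(queue, budget))
```

In this toy setup, a 120 W budget leaves every job below its minimum under the uniform split, while the greedy split still runs the highest-value job. At 240 W, the uniform split runs all three jobs at reduced power, whereas the greedy split runs only two at full power.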

Original language: English (US)
Article number: 8961147
Pages (from-to): 1419-1433
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 31
Issue number: 6
State: Published - Jun 1 2020

Keywords

  • High performance computing
  • HPC productivity
  • power-aware scheduling
  • power-constrained computing
  • value heuristics

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
