There goes the neighborhood: Performance degradation due to nearby jobs

Abhinav Bhatele, Kathryn Mohror, Steven H. Langer, Katherine E. Isaacs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

82 Citations (Scopus)

Abstract

Predictable performance is important for understanding and alleviating application performance issues; quantifying the effects of source code, compiler, or system software changes; estimating the time required for batch jobs; and determining the allocation requests for proposals. Our experiments show that on a Cray XE system, the execution time of a communication-heavy parallel application ranges from 28% faster to 41% slower than the average observed performance. Blue Gene systems, on the other hand, demonstrate no noticeable run-to-run variability. In this paper, we focus on Cray machines and investigate potential causes for performance variability such as OS jitter, shape of the allocated partition, and interference from other jobs sharing the same network links. Reducing such variability could improve overall throughput at a computer center and save energy costs.
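The "28% faster to 41% slower than the average" figure can be read as the percent deviation of each run's execution time from the mean over all runs. The sketch below illustrates that reading only; it is not the authors' analysis code. The function name `relative_to_mean` and the timings are hypothetical, chosen so that the spread matches the range quoted in the abstract.

```python
from statistics import mean


def relative_to_mean(times):
    """Return each run's execution time as a percent deviation from the mean.

    Negative values are runs faster than average, positive values are slower.
    """
    avg = mean(times)
    return [100.0 * (t - avg) / avg for t in times]


# Hypothetical wall-clock times in seconds (illustrative, not data from the paper).
times = [72.0, 90.0, 100.0, 97.0, 141.0]

deviations = relative_to_mean(times)
fastest, slowest = min(deviations), max(deviations)

print(f"fastest run: {-fastest:.0f}% faster than average")
print(f"slowest run: {slowest:.0f}% slower than average")
```

Under this reading, a system with no noticeable run-to-run variability would give deviations close to zero for every run.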

Original language: English (US)
Title of host publication: Proceedings of SC 2013
Subtitle of host publication: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Print): 9781450323789
DOI: https://doi.org/10.1145/2503210.2503247
State: Published - Jan 1 2013
Externally published: Yes
Event: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013 - Denver, CO, United States
Duration: Nov 17 2013 to Nov 22 2013

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
Country: United States
City: Denver, CO
Period: 11/17/13 to 11/22/13

Keywords

  • Communication performance
  • Interference
  • Resource management
  • System noise
  • Torus networks

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Bhatele, A., Mohror, K., Langer, S. H., & Isaacs, K. E. (2013). There goes the neighborhood: Performance degradation due to nearby jobs. In Proceedings of SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis [41] (International Conference for High Performance Computing, Networking, Storage and Analysis, SC). IEEE Computer Society. https://doi.org/10.1145/2503210.2503247

@inproceedings{853957f694f04223b75a15d0285365d8,
title = "There goes the neighborhood: Performance degradation due to nearby jobs",
abstract = "Predictable performance is important for understanding and alleviating application performance issues; quantifying the effects of source code, compiler, or system software changes; estimating the time required for batch jobs; and determining the allocation requests for proposals. Our experiments show that on a Cray XE system, the execution time of a communication-heavy parallel application ranges from 28{\%} faster to 41{\%} slower than the average observed performance. Blue Gene systems, on the other hand, demonstrate no noticeable run-to-run variability. In this paper, we focus on Cray machines and investigate potential causes for performance variability such as OS jitter, shape of the allocated partition, and interference from other jobs sharing the same network links. Reducing such variability could improve overall throughput at a computer center and save energy costs.",
keywords = "Communication performance, Interference, Resource management, System noise, Torus networks",
author = "Abhinav Bhatele and Kathryn Mohror and Langer, {Steven H.} and Isaacs, {Katherine E.}",
year = "2013",
month = "1",
day = "1",
doi = "10.1145/2503210.2503247",
language = "English (US)",
isbn = "9781450323789",
series = "International Conference for High Performance Computing, Networking, Storage and Analysis, SC",
publisher = "IEEE Computer Society",
booktitle = "Proceedings of SC 2013",
}

TY - GEN

T1 - There goes the neighborhood

T2 - Performance degradation due to nearby jobs

AU - Bhatele, Abhinav

AU - Mohror, Kathryn

AU - Langer, Steven H.

AU - Isaacs, Katherine E.

PY - 2013/1/1

Y1 - 2013/1/1

N2 - Predictable performance is important for understanding and alleviating application performance issues; quantifying the effects of source code, compiler, or system software changes; estimating the time required for batch jobs; and determining the allocation requests for proposals. Our experiments show that on a Cray XE system, the execution time of a communication-heavy parallel application ranges from 28% faster to 41% slower than the average observed performance. Blue Gene systems, on the other hand, demonstrate no noticeable run-to-run variability. In this paper, we focus on Cray machines and investigate potential causes for performance variability such as OS jitter, shape of the allocated partition, and interference from other jobs sharing the same network links. Reducing such variability could improve overall throughput at a computer center and save energy costs.

KW - Communication performance

KW - Interference

KW - Resource management

KW - System noise

KW - Torus networks

UR - http://www.scopus.com/inward/record.url?scp=84899698707&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84899698707&partnerID=8YFLogxK

U2 - 10.1145/2503210.2503247

DO - 10.1145/2503210.2503247

M3 - Conference contribution

AN - SCOPUS:84899698707

SN - 9781450323789

T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC

BT - Proceedings of SC 2013

PB - IEEE Computer Society

ER -