Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model

Yeliang Zhang, Vinod Tipparaju, Jarek Nieplocha, Salim A Hariri

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and compare characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared-memory programming paradigm and gives the programmer control over data distribution and locality, both of which are important for optimizing performance on scalable architectures. In this paper, we describe and compare two parallelization strategies for the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. Performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. With GA, however, these benefits extend to distributed-memory systems, as demonstrated on a Linux cluster with Myrinet.
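
For readers unfamiliar with the kernel the abstract refers to, the following is a minimal serial sketch of a conjugate gradient solve over a sparse matrix in CSR form. It is not the NPB reference code and not the GA implementation described in the paper; the function names (`csr_matvec`, `dot`, `conjugate_gradient`) and the tiny test matrix are illustrative assumptions. The indirect read `x[indices[k]]` in the matrix-vector product is the kind of irregular access the abstract calls "irregular/sparse".

```python
# Illustrative sketch only: a serial conjugate gradient solver with a CSR
# sparse matrix-vector product. All names here are hypothetical and do not
# come from the paper, the NAS Parallel Benchmarks, or the GA toolkit.
from typing import List, Sequence


def csr_matvec(indptr: Sequence[int], indices: Sequence[int],
               data: Sequence[float], x: Sequence[float]) -> List[float]:
    """Return y = A @ x for a square matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # Indirect access: which entry of x is read depends on the
            # sparsity pattern, not on the loop index.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr: Sequence[int], indices: Sequence[int],
                       data: Sequence[float], b: Sequence[float],
                       tol: float = 1e-10, max_iter: int = 100) -> List[float]:
    """Solve A x = b for a symmetric positive definite A in CSR form."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # search direction
    rho = dot(r, r)
    for _ in range(max_iter):
        if rho ** 0.5 < tol:
            break
        q = csr_matvec(indptr, indices, data, p)
        alpha = rho / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x


# Small SPD example: A = [[4, 1], [1, 3]], b = [1, 2].
solution = conjugate_gradient([0, 2, 4], [0, 1, 0, 1],
                              [4.0, 1.0, 1.0, 3.0], [1.0, 2.0])
```

In a parallel version, the matrix-vector product and the dot products are the steps that require data owned by other processes, which is where the choice of programming model (MPI, OpenMP, or GA) matters.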

Original language: English (US)
Title of host publication: Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005
Volume: 2005
DOI: https://doi.org/10.1109/IPDPS.2005.331
State: Published - 2005
Event: 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005 - Denver, CO, United States
Duration: Apr 4 2005 to Apr 8 2005

Other

Other: 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005
Country: United States
City: Denver, CO
Period: 4/4/05 to 4/8/05

Fingerprint

Computer programming
Data storage equipment

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Zhang, Y., Tipparaju, V., Nieplocha, J., & Hariri, S. A. (2005). Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model. In Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005 (Vol. 2005). [1420054] https://doi.org/10.1109/IPDPS.2005.331

Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model. / Zhang, Yeliang; Tipparaju, Vinod; Nieplocha, Jarek; Hariri, Salim A.

Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005. Vol. 2005 2005. 1420054.

Zhang, Y, Tipparaju, V, Nieplocha, J & Hariri, SA 2005, Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model. in Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005. vol. 2005, 1420054, 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005, Denver, CO, United States, 4/4/05. https://doi.org/10.1109/IPDPS.2005.331
Zhang Y, Tipparaju V, Nieplocha J, Hariri SA. Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model. In Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005. Vol. 2005. 2005. 1420054 https://doi.org/10.1109/IPDPS.2005.331
Zhang, Yeliang ; Tipparaju, Vinod ; Nieplocha, Jarek ; Hariri, Salim A. / Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model. Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005. Vol. 2005 2005.
@inproceedings{a8b7729fe6834abb9a3b31e5a576b12f,
title = "Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model",
abstract = "The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and compare characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared-memory programming paradigm and gives the programmer control over data distribution and locality, both of which are important for optimizing performance on scalable architectures. In this paper, we describe and compare two parallelization strategies for the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. Performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. With GA, however, these benefits extend to distributed-memory systems, as demonstrated on a Linux cluster with Myrinet.",
author = "Zhang, Yeliang and Tipparaju, Vinod and Nieplocha, Jarek and Hariri, {Salim A.}",
year = "2005",
doi = "10.1109/IPDPS.2005.331",
language = "English (US)",
isbn = "0769523129",
volume = "2005",
booktitle = "Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005",
}

TY - GEN

T1 - Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model

AU - Zhang, Yeliang

AU - Tipparaju, Vinod

AU - Nieplocha, Jarek

AU - Hariri, Salim A

PY - 2005

Y1 - 2005

N2 - The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and compare characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared-memory programming paradigm and gives the programmer control over data distribution and locality, both of which are important for optimizing performance on scalable architectures. In this paper, we describe and compare two parallelization strategies for the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. Performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. With GA, however, these benefits extend to distributed-memory systems, as demonstrated on a Linux cluster with Myrinet.

UR - http://www.scopus.com/inward/record.url?scp=33746318153&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746318153&partnerID=8YFLogxK

U2 - 10.1109/IPDPS.2005.331

DO - 10.1109/IPDPS.2005.331

M3 - Conference contribution

AN - SCOPUS:33746318153

SN - 0769523129

SN - 9780769523125

VL - 2005

BT - Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005

ER -