Run-time selection of block size in pipelined parallel programs

David K Lowenthal, Michael James

Research output: Chapter in Book/Report/Conference proceeding › Chapter

6 Citations (Scopus)

Abstract

Parallelizing compiler technology has improved in recent years. One area in which compilers have made progress is in handling DOACROSS loops, where cross-processor data dependencies can inhibit efficient parallelization. In regular DOACROSS loops, where dependencies can be determined at compile time, a useful parallelization technique is pipelining, where each processor (node) performs its computation in blocks; after each, it sends data to the next processor in the pipeline. The amount of computation before sending a message is called the block size; its choice, although difficult for a compiler to make, is critical to the efficiency of the program. Compilers typically use a static estimation of workload, which cannot always produce an effective block size. This paper describes a flexible run-time approach to choosing the block size. Our system takes measurements during the first iteration of the program and then uses the results to build an execution model and choose an appropriate block size which, unlike those chosen by compiler analysis, may be nonuniform. Performance on a network of workstations shows that programs using our run-time analysis outperform those that use static block sizes when the workload is either unbalanced or unanalyzable. On more regular programs, our programs are competitive with their static counterparts.
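
The trade-off the abstract describes can be made concrete with a simple analytic sketch (this is not the authors' system; the cost model, parameter names, and brute-force search below are illustrative assumptions): a small block size keeps the pipeline full but pays the per-message overhead often, while a large block size amortizes messages but delays downstream processors.

```python
import math

def pipeline_time(n_rows, n_procs, per_row, per_msg, block):
    """Modeled completion time of a uniform pipelined DOACROSS loop.

    Each processor computes `block` rows, then sends boundary data to
    the next processor. Total time = (number of blocks + pipeline-fill
    stages) * time per stage, under the simplifying assumption that
    every row costs the same.
    """
    n_blocks = math.ceil(n_rows / block)
    stage = block * per_row + per_msg  # compute one block, then one message
    return (n_blocks + n_procs - 1) * stage

def best_block(n_rows, n_procs, per_row, per_msg):
    """Brute-force the block size that minimizes the model above."""
    return min(range(1, n_rows + 1),
               key=lambda b: pipeline_time(n_rows, n_procs,
                                           per_row, per_msg, b))
```

In this uniform model the optimum lands near sqrt(n_rows * per_msg / ((n_procs - 1) * per_row)); the paper's point is precisely that real workloads are often not uniform or not analyzable at compile time, which is why it measures actual costs during the first iteration and permits nonuniform block sizes.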

Original language: English (US)
Title of host publication: Proceedings of the International Parallel Processing Symposium, IPPS
Publisher: IEEE
Pages: 82-87
Number of pages: 6
State: Published - 1999
Externally published: Yes
Event: Proceedings of the 1999 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing - San Juan
Duration: Apr 12 1999 – Apr 16 1999

Other

Other: Proceedings of the 1999 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
City: San Juan
Period: 4/12/99 – 4/16/99


ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Lowenthal, D. K., & James, M. (1999). Run-time selection of block size in pipelined parallel programs. In Proceedings of the International Parallel Processing Symposium, IPPS (pp. 82-87). IEEE.

@inbook{ddfb3d5ee36e46929f20ee7e221e2f71,
title = "Run-time selection of block size in pipelined parallel programs",
abstract = "Parallelizing compiler technology has improved in recent years. One area in which compilers have made progress is in handling DOACROSS loops, where cross-processor data dependencies can inhibit efficient parallelization. In regular DOACROSS loops, where dependencies can be determined at compile time, a useful parallelization technique is pipelining, where each processor (node) performs its computation in blocks; after each, it sends data to the next processor in the pipeline. The amount of computation before sending a message is called the block size; its choice, although difficult for a compiler to make, is critical to the efficiency of the program. Compilers typically use a static estimation of workload, which cannot always produce an effective block size. This paper describes a flexible run-time approach to choosing the block size. Our system takes measurements during the first iteration of the program and then uses the results to build an execution model and choose an appropriate block size which, unlike those chosen by compiler analysis, may be nonuniform. Performance on a network of workstations shows that programs using our run-time analysis outperform those that use static block sizes when the workload is either unbalanced or unanalyzable. On more regular programs, our programs are competitive with their static counterparts.",
author = "Lowenthal, {David K} and Michael James",
year = "1999",
language = "English (US)",
pages = "82--87",
booktitle = "Proceedings of the International Parallel Processing Symposium, IPPS",
publisher = "IEEE",

}

TY - CHAP

T1 - Run-time selection of block size in pipelined parallel programs

AU - Lowenthal, David K

AU - James, Michael

PY - 1999

Y1 - 1999

N2 - Parallelizing compiler technology has improved in recent years. One area in which compilers have made progress is in handling DOACROSS loops, where cross-processor data dependencies can inhibit efficient parallelization. In regular DOACROSS loops, where dependencies can be determined at compile time, a useful parallelization technique is pipelining, where each processor (node) performs its computation in blocks; after each, it sends data to the next processor in the pipeline. The amount of computation before sending a message is called the block size; its choice, although difficult for a compiler to make, is critical to the efficiency of the program. Compilers typically use a static estimation of workload, which cannot always produce an effective block size. This paper describes a flexible run-time approach to choosing the block size. Our system takes measurements during the first iteration of the program and then uses the results to build an execution model and choose an appropriate block size which, unlike those chosen by compiler analysis, may be nonuniform. Performance on a network of workstations shows that programs using our run-time analysis outperform those that use static block sizes when the workload is either unbalanced or unanalyzable. On more regular programs, our programs are competitive with their static counterparts.

UR - http://www.scopus.com/inward/record.url?scp=0032676515&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032676515&partnerID=8YFLogxK

M3 - Chapter

SP - 82

EP - 87

BT - Proceedings of the International Parallel Processing Symposium, IPPS

PB - IEEE

ER -