Using fine-grain threads and run-time decision making in parallel computing

David K. Lowenthal, Vincent W. Freeh, Gregory R. Andrews

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Programming distributed-memory multiprocessors and networks of workstations requires deciding what can execute concurrently, how processes communicate, and where data is placed. These decisions can be made statically by a programmer or compiler, or they can be made dynamically at run time. Using run-time decisions leads to a simpler interface, because decisions are implicit, and it can lead to better decisions, because more information is available. This paper examines the costs, benefits, and details of making decisions at run time. The starting point is explicit fine-grain parallelism with any number (even thousands) of threads. Five specific techniques are considered: (1) implicitly coarsening the granularity of parallelism, (2) using implicit communication implemented by a distributed shared memory, (3) overlapping computation and communication, (4) adaptively moving threads and data between nodes to minimize communication and balance load, and (5) dynamically remapping data to pages to avoid false sharing. Details are given on the performance of each of these techniques, as well as on their overall performance for several scientific applications.
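To make technique (3) concrete, below is a minimal sketch of overlapping computation and communication. It is not from the paper, which achieves the overlap implicitly in its run-time system over a distributed shared memory; the sketch instead uses explicit MPI nonblocking calls on an assumed block-row decomposition of a Jacobi-style grid, and all names and sizes (N, grid, next) are illustrative.

    /* Illustrative sketch (not the paper's code): overlap a halo
     * exchange with computation on the interior of a block-row
     * decomposed grid. Post the exchange, compute the rows that
     * need no remote data, then finish the boundary rows. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024  /* local rows (and columns) per process; illustrative */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Rows 1..N are local data; rows 0 and N+1 are halo rows. */
        double *grid = calloc((N + 2) * N, sizeof(double));
        double *next = calloc((N + 2) * N, sizeof(double));
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Post the halo exchange first, so it proceeds in the
         * background while the interior sweep runs. */
        MPI_Request reqs[4];
        MPI_Irecv(&grid[0 * N],       N, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&grid[(N + 1) * N], N, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(&grid[1 * N],       N, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(&grid[N * N],       N, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &reqs[3]);

        /* Interior rows 2..N-1 depend only on local rows 1..N. */
        for (int i = 2; i < N; i++)
            for (int j = 1; j < N - 1; j++)
                next[i * N + j] = 0.25 * (grid[(i - 1) * N + j] + grid[(i + 1) * N + j]
                                        + grid[i * N + j - 1]   + grid[i * N + j + 1]);

        /* Wait for the halos, then finish rows 1 and N, the only
         * rows that read remote data. */
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
        for (int i = 1; i <= N; i += N - 1)
            for (int j = 1; j < N - 1; j++)
                next[i * N + j] = 0.25 * (grid[(i - 1) * N + j] + grid[(i + 1) * N + j]
                                        + grid[i * N + j - 1]   + grid[i * N + j + 1]);

        free(grid);
        free(next);
        MPI_Finalize();
        return 0;
    }

Posting the exchange before the interior sweep lets the message latency hide behind the bulk of the computation, which is the effect this technique aims for; the paper's contribution is obtaining that overlap automatically at run time rather than by hand-coding it as above.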

Original language: English (US)
Pages (from-to): 41-54
Number of pages: 14
Journal: Journal of Parallel and Distributed Computing
Volume: 37
Issue number: 1
DOI: 10.1006/jpdc.1996.0106
ISSN: 0743-7315
Publisher: Academic Press Inc.
State: Published - Aug 25 1996

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Control and Systems Engineering
