Quantitative trait locus analysis using a partitioned linear model on a GPU cluster

Peter E. Bailey, Tapasya Patki, Gregory M. Striemer, Ali Akoglu, David K Lowenthal, Peter Bradbury, Matt Vaughn, Liya Wang, Stephen A Goff

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Quantitative Trait Locus (QTL) analysis is a statistical technique for understanding the relationship between plant genotypes and the resultant continuous phenotypes in non-constant environments. This requires generation and processing of large datasets, which makes analysis challenging and slow. One approach, Partitioned Linear Modeling (PLM), which is the subject of this paper, lends itself well to parallelization, both by MPI between nodes and by GPU within nodes. Large input datasets make this parallelization on the GPU non-trivial. This paper compares several candidate integrated MPI/GPU parallel implementations of PLM on a cluster of GPUs for varied data sets. We compare them to a naive implementation and show that while that implementation is quite efficient on small data sets, when the data set is large, data-transfer overhead dominates an all-GPU implementation of PLM. We show that an MPI implementation that selectively uses the GPU for a relative minority of the code performs best and results in a 64× improvement over the MPI/CPU version. As a first implementation of PLM on GPUs, our work serves as a reminder that different GPU implementations are needed depending on the size of the working set, and that data-intensive applications are not necessarily trivially parallelizable with GPUs.
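
To make the parallelism concrete, the sketch below shows the kind of per-marker scan a QTL analysis exposes and that maps naturally onto a GPU: one thread fits a single-marker linear model against the phenotype vector and records its model sum of squares. This is not the paper's PLM implementation; the kernel name, the toy data sizes, and the use of simple single-marker regression in place of the full partitioned linear model are illustrative assumptions. In the paper's setting, MPI would first distribute blocks of markers across cluster nodes, and each node would launch a kernel like this over its block; the host-device copies in main() are exactly the data-transfer cost the abstract identifies as dominating an all-GPU implementation on large data sets.

// Hypothetical sketch only: per-marker regression scan on the GPU.
// This is not the paper's PLM code; it illustrates the per-marker
// data parallelism that makes QTL scans attractive for GPU offload.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One thread per marker: regress the phenotype y on genotype column x_m
// and report the regression (model) sum of squares, a building block of
// the F-statistics used in QTL scans.
__global__ void markerScan(const float *geno,   // nMarkers x nInd, row-major
                           const float *pheno,  // nInd phenotypes
                           float *ssModel,      // per-marker model SS
                           int nMarkers, int nInd)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= nMarkers) return;

    const float *x = geno + (size_t)m * nInd;
    float sx = 0.f, sy = 0.f, sxx = 0.f, sxy = 0.f;
    for (int i = 0; i < nInd; ++i) {
        sx  += x[i];
        sy  += pheno[i];
        sxx += x[i] * x[i];
        sxy += x[i] * pheno[i];
    }
    float n    = (float)nInd;
    float sxxc = sxx - sx * sx / n;   // centered sum of squares of x
    float sxyc = sxy - sx * sy / n;   // centered cross-product
    // Model SS for a single-marker fit; 0 if the marker is monomorphic.
    ssModel[m] = (sxxc > 0.f) ? (sxyc * sxyc) / sxxc : 0.f;
}

int main()
{
    // Toy sizes; the paper's data sets are far larger, which is exactly
    // why host-device transfer cost matters.
    const int nMarkers = 1 << 14, nInd = 256;
    size_t gBytes = (size_t)nMarkers * nInd * sizeof(float);

    float *hGeno  = (float *)malloc(gBytes);
    float *hPheno = (float *)malloc(nInd * sizeof(float));
    float *hSS    = (float *)malloc(nMarkers * sizeof(float));
    for (size_t i = 0; i < (size_t)nMarkers * nInd; ++i) hGeno[i] = (float)(rand() % 3); // 0/1/2 genotypes
    for (int i = 0; i < nInd; ++i) hPheno[i] = (float)(rand() % 100);

    float *dGeno, *dPheno, *dSS;
    cudaMalloc(&dGeno, gBytes);
    cudaMalloc(&dPheno, nInd * sizeof(float));
    cudaMalloc(&dSS, nMarkers * sizeof(float));
    cudaMemcpy(dGeno, hGeno, gBytes, cudaMemcpyHostToDevice);           // dominant cost for large inputs
    cudaMemcpy(dPheno, hPheno, nInd * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256, blocks = (nMarkers + threads - 1) / threads;
    markerScan<<<blocks, threads>>>(dGeno, dPheno, dSS, nMarkers, nInd);
    cudaMemcpy(hSS, dSS, nMarkers * sizeof(float), cudaMemcpyDeviceToHost);

    printf("model SS for marker 0: %f\n", hSS[0]);
    cudaFree(dGeno); cudaFree(dPheno); cudaFree(dSS);
    free(hGeno); free(hPheno); free(hSS);
    return 0;
}

The sketch compiles with nvcc as a standalone program; on a cluster, each MPI rank would hold a disjoint slice of the genotype matrix, which is the node-level decomposition the paper combines with GPU offload inside a node.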

Original language: English (US)
Title of host publication: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Pages: 752-760
Number of pages: 9
DOIs: 10.1109/IPDPSW.2012.93
State: Published - 2012
Event: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 - Shanghai, China
Duration: May 21 2012 – May 25 2012

Other

Other: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Country: China
City: Shanghai
Period: 5/21/12 – 5/25/12

Fingerprint

  • Graphics processing unit
  • Data transfer
  • Program processors
  • Processing

ASJC Scopus subject areas

  • Software

Cite this

Bailey, P. E., Patki, T., Striemer, G. M., Akoglu, A., Lowenthal, D. K., Bradbury, P., ... Goff, S. A. (2012). Quantitative trait locus analysis using a partitioned linear model on a GPU cluster. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 (pp. 752-760). [6270715] https://doi.org/10.1109/IPDPSW.2012.93

@inproceedings{ddfd77658b734fb3857b301748fe0324,
title = "Quantitative trait locus analysis using a partitioned linear model on a GPU cluster",
abstract = "Quantitative Trait Locus (QTL) analysis is a statistical technique that allows understanding of the relationship between plant genotypes and the resultant continuous phenotypes in non-constant environments. This requires generation and processing of large datasets, which makes analysis challenging and slow. One approach, which is the subject of this paper, is Partitioned Linear Modeling (PLM), lends itself well to parallelization, both by MPI between nodes and by GPU within nodes. Large input datasets make this parallelization on the GPU non-trivial. This paper compares several candidate integrated MPI/GPU parallel implementations of PLM on a cluster of GPUs for varied data sets. We compare them to a naive implementation and show that while that implementation is quite efficient on small data sets, when the data set is large, data-transfer overhead dominates an all-GPU implementation of PLM. We show that an MPI implementation that selectively uses the GPU for a relative minority of the code performs best and results in a 64 improvement over the MPI/CPU version. As a first implementation of PLM on GPUs, our work serves as a reminder that different GPU implementations are needed, depending on the size of the working set, and that data intensive applications are not necessarily trivially parallelizable with GPUs.",
author = "Bailey, {Peter E.} and Tapasya Patki and Striemer, {Gregory M.} and Ali Akoglu and Lowenthal, {David K} and Peter Bradbury and Matt Vaughn and Liya Wang and Goff, {Stephen A}",
year = "2012",
doi = "10.1109/IPDPSW.2012.93",
language = "English (US)",
isbn = "9780769546766",
pages = "752--760",
booktitle = "Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012",

}

TY - GEN

T1 - Quantitative trait locus analysis using a partitioned linear model on a GPU cluster

AU - Bailey, Peter E.

AU - Patki, Tapasya

AU - Striemer, Gregory M.

AU - Akoglu, Ali

AU - Lowenthal, David K

AU - Bradbury, Peter

AU - Vaughn, Matt

AU - Wang, Liya

AU - Goff, Stephen A

PY - 2012

Y1 - 2012

N2 - Quantitative Trait Locus (QTL) analysis is a statistical technique for understanding the relationship between plant genotypes and the resultant continuous phenotypes in non-constant environments. This requires generation and processing of large datasets, which makes analysis challenging and slow. One approach, Partitioned Linear Modeling (PLM), which is the subject of this paper, lends itself well to parallelization, both by MPI between nodes and by GPU within nodes. Large input datasets make this parallelization on the GPU non-trivial. This paper compares several candidate integrated MPI/GPU parallel implementations of PLM on a cluster of GPUs for varied data sets. We compare them to a naive implementation and show that while that implementation is quite efficient on small data sets, when the data set is large, data-transfer overhead dominates an all-GPU implementation of PLM. We show that an MPI implementation that selectively uses the GPU for a relative minority of the code performs best and results in a 64× improvement over the MPI/CPU version. As a first implementation of PLM on GPUs, our work serves as a reminder that different GPU implementations are needed depending on the size of the working set, and that data-intensive applications are not necessarily trivially parallelizable with GPUs.

AB - Quantitative Trait Locus (QTL) analysis is a statistical technique for understanding the relationship between plant genotypes and the resultant continuous phenotypes in non-constant environments. This requires generation and processing of large datasets, which makes analysis challenging and slow. One approach, Partitioned Linear Modeling (PLM), which is the subject of this paper, lends itself well to parallelization, both by MPI between nodes and by GPU within nodes. Large input datasets make this parallelization on the GPU non-trivial. This paper compares several candidate integrated MPI/GPU parallel implementations of PLM on a cluster of GPUs for varied data sets. We compare them to a naive implementation and show that while that implementation is quite efficient on small data sets, when the data set is large, data-transfer overhead dominates an all-GPU implementation of PLM. We show that an MPI implementation that selectively uses the GPU for a relative minority of the code performs best and results in a 64× improvement over the MPI/CPU version. As a first implementation of PLM on GPUs, our work serves as a reminder that different GPU implementations are needed depending on the size of the working set, and that data-intensive applications are not necessarily trivially parallelizable with GPUs.

UR - http://www.scopus.com/inward/record.url?scp=84867411359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867411359&partnerID=8YFLogxK

U2 - 10.1109/IPDPSW.2012.93

DO - 10.1109/IPDPSW.2012.93

M3 - Conference contribution

AN - SCOPUS:84867411359

SN - 9780769546766

SP - 752

EP - 760

BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012

ER -