Performance analysis of IBM Cell Broadband Engine on sequence alignment

Yang Song, Gregory M. Striemer, Ali Akoglu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The Smith-Waterman (SW) algorithm is the most accurate sequence alignment approach used by computational biologists for DNA matching. However, its computational complexity makes SW impractical to use in clinical environments compared to much faster but less accurate sequence alignment techniques such as BLAST. The high-performance computing community is examining alternative multi-core architectures, such as the IBM Cell Broadband Engine (BE) and Graphics Processing Units (GPUs), that address the limitations of modern cache-based designs. In this paper we investigate the performance of the IBM Cell BE architecture in the context of SW. We present an analysis of the architectural features of the Cell BE and study the architecture's fitness for accelerating sequence alignment based on its parallel processing power, interconnect structure, and communication protocols among the processing cores. We then evaluate the performance of the Cell BE against the state-of-the-art implementation of SW on NVIDIA's Tesla GPU. Results show that, based on the memory architecture of the SW algorithm, the Cell BE performs much better than the Tesla GPU in terms of both cycle count and execution time. Compared to a purely serial implementation, in terms of cycle count, the state-of-the-art GPU implementation delivers a 15x speedup, while our solution achieves a 64x speedup.
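
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal serial sketch of Smith-Waterman local alignment scoring. It is not the authors' Cell BE or GPU implementation. The function name and the scoring parameters (match, mismatch, and a linear gap penalty) are illustrative assumptions; the abstract does not state which scoring scheme the paper uses. The nested loop over both sequences shows where the quadratic cost comes from.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    Uses a linear gap penalty and keeps only two rows of the score matrix.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamp at 0 so an alignment may start anywhere (local alignment).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another; parallel implementations in general tend to exploit this property.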

Original language: English (US)
Title of host publication: Proceedings - 2009 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2009
Pages: 439-446
Number of pages: 8
DOIs: https://doi.org/10.1109/AHS.2009.16
State: Published - 2009
Event: 2009 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2009 - San Francisco, CA, United States
Duration: Jul 29 2009 to Aug 1 2009

Other

Other: 2009 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2009
Country: United States
City: San Francisco, CA
Period: 7/29/09 to 8/1/09

Fingerprint

Engines
Memory architecture
Processing
Computational complexity
DNA
Network protocols
Graphics processing unit

ASJC Scopus subject areas

  • Hardware and Architecture
  • Control and Systems Engineering

Cite this

Song, Y., Striemer, G. M., & Akoglu, A. (2009). Performance analysis of IBM Cell Broadband Engine on sequence alignment. In Proceedings - 2009 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2009 (pp. 439-446). [5325421] https://doi.org/10.1109/AHS.2009.16

@inproceedings{8972a61c9ddf4065b1ca043d5e13e4c0,
title = "Performance analysis of IBM Cell Broadband Engine on sequence alignment",
abstract = "The Smith-Waterman (SW) algorithm is the most accurate sequence alignment approach used by computational biologists for DNA matching. However, its computational complexity makes SW impractical to use in clinical environments compared to much faster but less accurate sequence alignment techniques such as BLAST. The high-performance computing community is examining alternative multi-core architectures, such as the IBM Cell Broadband Engine (BE) and Graphics Processing Units (GPUs), that address the limitations of modern cache-based designs. In this paper we investigate the performance of the IBM Cell BE architecture in the context of SW. We present an analysis of the architectural features of the Cell BE and study the architecture's fitness for accelerating sequence alignment based on its parallel processing power, interconnect structure, and communication protocols among the processing cores. We then evaluate the performance of the Cell BE against the state-of-the-art implementation of SW on NVIDIA's Tesla GPU. Results show that, based on the memory architecture of the SW algorithm, the Cell BE performs much better than the Tesla GPU in terms of both cycle count and execution time. Compared to a purely serial implementation, in terms of cycle count, the state-of-the-art GPU implementation delivers a 15x speedup, while our solution achieves a 64x speedup.",
author = "Yang Song and Striemer, {Gregory M.} and Ali Akoglu",
year = "2009",
doi = "10.1109/AHS.2009.16",
language = "English (US)",
isbn = "9780769537146",
pages = "439--446",
booktitle = "Proceedings - 2009 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2009",

}

TY - GEN

T1 - Performance analysis of IBM Cell Broadband Engine on sequence alignment

AU - Song, Yang

AU - Striemer, Gregory M.

AU - Akoglu, Ali

PY - 2009

Y1 - 2009

N2 - The Smith-Waterman (SW) algorithm is the most accurate sequence alignment approach used by computational biologists for DNA matching. However, its computational complexity makes SW impractical to use in clinical environments compared to much faster but less accurate sequence alignment techniques such as BLAST. The high-performance computing community is examining alternative multi-core architectures, such as the IBM Cell Broadband Engine (BE) and Graphics Processing Units (GPUs), that address the limitations of modern cache-based designs. In this paper we investigate the performance of the IBM Cell BE architecture in the context of SW. We present an analysis of the architectural features of the Cell BE and study the architecture's fitness for accelerating sequence alignment based on its parallel processing power, interconnect structure, and communication protocols among the processing cores. We then evaluate the performance of the Cell BE against the state-of-the-art implementation of SW on NVIDIA's Tesla GPU. Results show that, based on the memory architecture of the SW algorithm, the Cell BE performs much better than the Tesla GPU in terms of both cycle count and execution time. Compared to a purely serial implementation, in terms of cycle count, the state-of-the-art GPU implementation delivers a 15x speedup, while our solution achieves a 64x speedup.

UR - http://www.scopus.com/inward/record.url?scp=72849150749&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72849150749&partnerID=8YFLogxK

U2 - 10.1109/AHS.2009.16

DO - 10.1109/AHS.2009.16

M3 - Conference contribution

SN - 9780769537146

SP - 439

EP - 446

BT - Proceedings - 2009 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2009

ER -