Shield: A Reliable Network-on-Chip Router Architecture for Chip Multiprocessors

Pavan Poluri, Ahmed Louri

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

The increasing number of cores on a chip has made the network-on-chip (NoC) the standard communication paradigm for chip multiprocessors. A fault in an NoC leads to undesirable ramifications that can severely impact the performance of a chip. Therefore, it is vital to design fault-tolerant NoCs. In this paper, we present Shield, a reliable NoC router architecture that has the unique ability to tolerate both hard and soft errors in the routing pipeline using techniques such as spatial redundancy, exploitation of idle cycles, bypassing of faulty resources, and selective hardening. Using the Mean Time to Failure and Silicon Protection Factor metrics, we show that Shield is six times more reliable than the unprotected baseline router and at least 1.5 times more reliable than existing fault-tolerant router architectures. We introduce a new metric, the Soft Error Improvement Factor, and show that Shield's soft error tolerance is three times that of the unprotected baseline router. This reliability improvement comes at an area and power overhead of 34 and 31 percent, respectively. Latency analysis using SPLASH-2 and PARSEC reveals that, in the presence of faults, latency increases by a modest 13 and 10 percent, respectively.
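The abstract relies on two standard reliability metrics, Mean Time to Failure (MTTF) and Silicon Protection Factor (SPF), without defining them. The Python sketch below illustrates how such metrics are commonly computed; it is not the authors' evaluation model. It assumes independent, exponentially distributed component lifetimes, and it takes SPF to be the mean number of faults tolerated before failure divided by area normalized to the unprotected baseline (a common definition, assumed here). All function names and input numbers are hypothetical.

```python
"""Sketch of MTTF and SPF calculations (illustrative assumptions only).

Assumptions, not taken from the paper: component failures are independent
and exponentially distributed, and SPF = mean faults to failure divided by
area normalized to the unprotected baseline.
"""


def series_mttf(failure_rates: list[float]) -> float:
    """MTTF of stages in series: any single stage failure kills the router.

    With exponential lifetimes the system failure rate is the sum of the
    stage rates, so MTTF = 1 / sum(lambda_i).
    """
    total = sum(failure_rates)
    if total <= 0:
        raise ValueError("failure rates must sum to a positive value")
    return 1.0 / total


def duplicated_mttf(rate: float) -> float:
    """MTTF of a stage protected by one identical spare (spatial redundancy).

    The pair fails only when both copies fail, so for exponential lifetimes
    E[max(T1, T2)] = 1/rate + 1/rate - 1/(2*rate) = 3 / (2*rate).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.5 / rate


def silicon_protection_factor(mean_faults_to_failure: float,
                              normalized_area: float) -> float:
    """Assumed SPF: faults tolerated before failure per unit of relative area."""
    if normalized_area <= 0:
        raise ValueError("normalized_area must be positive")
    return mean_faults_to_failure / normalized_area


if __name__ == "__main__":
    # Hypothetical per-stage failure rates (arbitrary units), illustration only.
    stages = [2.0, 1.0, 1.0, 4.0]
    print(series_mttf(stages))     # 1 / 8 = 0.125
    print(duplicated_mttf(4.0))    # 1.5 / 4 = 0.375
    # A 34 percent area overhead corresponds to a normalized area of 1.34;
    # the 6.7 faults-to-failure figure is a made-up input.
    print(silicon_protection_factor(6.7, 1.34))
```

Under these assumptions, duplicating a stage raises its MTTF by only 50 percent while doubling that stage's area, which is why an area-normalized metric such as SPF, rather than MTTF alone, is useful when comparing protected router designs.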

Original language: English (US)
Article number: 7390298
Pages (from-to): 3058-3070
Number of pages: 13
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 27
Issue number: 10
DOI: 10.1109/TPDS.2016.2521641
State: Published - Oct 1 2016

Fingerprint

Routers
Redundancy
Hardening
Pipelines
Silicon
Network-on-chip
Communication

Keywords

  • hard faults
  • mean time to failure
  • Network-on-chip
  • router architecture
  • soft errors

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

Shield: A Reliable Network-on-Chip Router Architecture for Chip Multiprocessors. / Poluri, Pavan; Louri, Ahmed.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 27, No. 10, 7390298, 01.10.2016, p. 3058-3070.


@article{5d55d8fa96a844dbbc51462bdaa97937,
title = "{Shield}: A Reliable Network-on-Chip Router Architecture for Chip Multiprocessors",
abstract = "The increasing number of cores on a chip has made the network on chip (NoC) concept the standard communication paradigm for chip multiprocessors. A fault in an NoC leads to undesirable ramifications that can severely impact the performance of a chip. Therefore, it is vital to design fault tolerant NoCs. In this paper, we present Shield, a reliable NoC router architecture that has the unique ability to tolerate both hard and soft errors in the routing pipeline using techniques such as spatial redundancy, exploitation of idle cycles, bypassing of faulty resources and selective hardening. Using Mean Time to Failure and Silicon Protection Factor metrics, we show that Shield is six times more reliable than the baseline-unprotected router and is at least 1.5 times more reliable than existing fault tolerant router architectures. We introduce a new metric called Soft Error Improvement Factor and show that the soft error tolerance of Shield has improved by three times in comparison to the baseline-unprotected router. This reliability improvement is accomplished by incurring an area and power overhead of 34 and 31 percent respectively. Latency analysis using SPLASH-2 and PARSEC reveals that in the presence of faults, latency increases by a modest 13 and 10 percent respectively.",
keywords = "hard faults, mean time to failure, Network-on-chip, router architecture, soft errors",
author = "Pavan Poluri and Ahmed Louri",
year = "2016",
month = "10",
day = "1",
doi = "10.1109/TPDS.2016.2521641",
language = "English (US)",
volume = "27",
pages = "3058--3070",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "10",

}

TY  - JOUR
T1  - Shield: A Reliable Network-on-Chip Router Architecture for Chip Multiprocessors
AU  - Poluri, Pavan
AU  - Louri, Ahmed
PY  - 2016/10/1
Y1  - 2016/10/1
N2  - The increasing number of cores on a chip has made the network on chip (NoC) concept the standard communication paradigm for chip multiprocessors. A fault in an NoC leads to undesirable ramifications that can severely impact the performance of a chip. Therefore, it is vital to design fault tolerant NoCs. In this paper, we present Shield, a reliable NoC router architecture that has the unique ability to tolerate both hard and soft errors in the routing pipeline using techniques such as spatial redundancy, exploitation of idle cycles, bypassing of faulty resources and selective hardening. Using Mean Time to Failure and Silicon Protection Factor metrics, we show that Shield is six times more reliable than the baseline-unprotected router and is at least 1.5 times more reliable than existing fault tolerant router architectures. We introduce a new metric called Soft Error Improvement Factor and show that the soft error tolerance of Shield has improved by three times in comparison to the baseline-unprotected router. This reliability improvement is accomplished by incurring an area and power overhead of 34 and 31 percent respectively. Latency analysis using SPLASH-2 and PARSEC reveals that in the presence of faults, latency increases by a modest 13 and 10 percent respectively.
AB  - The increasing number of cores on a chip has made the network on chip (NoC) concept the standard communication paradigm for chip multiprocessors. A fault in an NoC leads to undesirable ramifications that can severely impact the performance of a chip. Therefore, it is vital to design fault tolerant NoCs. In this paper, we present Shield, a reliable NoC router architecture that has the unique ability to tolerate both hard and soft errors in the routing pipeline using techniques such as spatial redundancy, exploitation of idle cycles, bypassing of faulty resources and selective hardening. Using Mean Time to Failure and Silicon Protection Factor metrics, we show that Shield is six times more reliable than the baseline-unprotected router and is at least 1.5 times more reliable than existing fault tolerant router architectures. We introduce a new metric called Soft Error Improvement Factor and show that the soft error tolerance of Shield has improved by three times in comparison to the baseline-unprotected router. This reliability improvement is accomplished by incurring an area and power overhead of 34 and 31 percent respectively. Latency analysis using SPLASH-2 and PARSEC reveals that in the presence of faults, latency increases by a modest 13 and 10 percent respectively.
KW  - hard faults
KW  - mean time to failure
KW  - Network-on-chip
KW  - router architecture
KW  - soft errors
UR  - http://www.scopus.com/inward/record.url?scp=84987895209&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84987895209&partnerID=8YFLogxK
U2  - 10.1109/TPDS.2016.2521641
DO  - 10.1109/TPDS.2016.2521641
M3  - Article
VL  - 27
SP  - 3058
EP  - 3070
JO  - IEEE Transactions on Parallel and Distributed Systems
JF  - IEEE Transactions on Parallel and Distributed Systems
SN  - 1045-9219
IS  - 10
M1  - 7390298
ER  - 