Dynamic error mitigation in NoCs using intelligent prediction techniques

Dominic DiTomaso, Travis Boraten, Avinash Kodi, Ahmed Louri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Networks-on-chip (NoCs) are quickly becoming the standard communication fabric for multi-core systems. As technology continues to scale down into the nanometer regime, device behavior will become increasingly unreliable due to a combination of aging, soft errors, aggressive transistor design, and process-voltage-temperature (PVT) variations. Further, NoCs are designed with stringent timing constraints so that data can be pushed faster. The net result is an increase in errors that must be mitigated by the NoC. Typical fault-handling techniques are reactive: they respond only after the error has occurred, which makes recovery inefficient in both energy and time. In this paper, we take a different approach and propose proactive, fault-tolerant schemes that are employed before the fault affects the system. We use machine learning techniques to train a decision tree that predicts faults in the network efficiently. Based on the prediction model, we dynamically mitigate the predicted faults through error correction codes (ECC) and relaxed timing transmission. Our results indicate that, on average, we can predict timing errors 60.6% more accurately than a static single error correction and double error detection (SECDED) technique, resulting in an average 26.8% reduction in retransmitted packets, an average net speedup of 3.31×, and an average energy savings of 60.0% over other designs for real traffic patterns.
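The predict-then-mitigate flow described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two features (temperature and link utilization), the synthetic training samples, the three error classes, and the mapping from predicted class to mitigation mode are all hypothetical. The sketch trains a small Gini-based decision tree in plain Python and uses its prediction to pick a mode before a packet is sent.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def train(rows, labels, depth=3):
    """Build a tree as nested tuples; a leaf is the majority class label."""
    split = best_split(rows, labels) if depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            train([r for r, _ in left], [y for _, y in left], depth - 1),
            train([r for r, _ in right], [y for _, y in right], depth - 1))

def predict(tree, row):
    """Walk the tree until a leaf (an int class label) is reached."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree

# Hypothetical synthetic samples: (temperature in C, link utilization 0..1).
# Classes (assumed): 0 = no error expected, 1 = correctable bit error likely,
# 2 = timing error likely.
ROWS = [(55, 0.10), (60, 0.30), (62, 0.20), (58, 0.50),
        (75, 0.40), (78, 0.55), (80, 0.35), (77, 0.60),
        (90, 0.70), (95, 0.80), (92, 0.90), (97, 0.75)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

# Assumed mapping from predicted class to mitigation mode.
MODES = {0: "no_extra_protection", 1: "secded_ecc", 2: "relaxed_timing"}

TREE = train(ROWS, LABELS)

def choose_mode(temperature_c, utilization):
    """Pick a mitigation mode proactively, before transmission."""
    return MODES[predict(TREE, (temperature_c, utilization))]

if __name__ == "__main__":
    for sample in [(50, 0.9), (70, 0.5), (99, 0.1)]:
        print(sample, "->", choose_mode(*sample))
```

The point of the sketch is the ordering of steps: the mode is chosen from a prediction made before the packet is sent, rather than from an error detected afterwards. A hardware predictor would evaluate a fixed, pre-trained tree with a handful of comparators instead of training at run time.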

Original language: English (US)
Title of host publication: MICRO 2016 - 49th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
Volume: 2016-December
ISBN (Electronic): 9781509035083
DOI: https://doi.org/10.1109/MICRO.2016.7783734
State: Published - Dec 14 2016
Externally published: Yes
Event: 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016 - Taipei, Taiwan, Province of China
Duration: Oct 15 2016 to Oct 19 2016

Fingerprint

Error correction
Error detection
Decision trees
Learning systems
Energy conservation
Transistors
Aging of materials
Recovery
Network-on-chip
Communication
Electric potential
Temperature

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

DiTomaso, D., Boraten, T., Kodi, A., & Louri, A. (2016). Dynamic error mitigation in NoCs using intelligent prediction techniques. In MICRO 2016 - 49th Annual IEEE/ACM International Symposium on Microarchitecture (Vol. 2016-December). [7783734] IEEE Computer Society. https://doi.org/10.1109/MICRO.2016.7783734

@inproceedings{2bbb729c0cb24764a69648724580d89c,
title = "Dynamic error mitigation in NoCs using intelligent prediction techniques",
abstract = "Networks-on-chip (NoCs) are quickly becoming the standard communication fabric for multi-core systems. As technology continues to scale down into the nanometer regime, device behavior will become increasingly unreliable due to a combination of aging, soft errors, aggressive transistor design, and process-voltage-temperature (PVT) variations. Further, NoCs are designed with stringent timing constraints so that data can be pushed faster. The net result is an increase in errors that must be mitigated by the NoC. Typical fault-handling techniques are reactive: they respond only after the error has occurred, which makes recovery inefficient in both energy and time. In this paper, we take a different approach and propose proactive, fault-tolerant schemes that are employed before the fault affects the system. We use machine learning techniques to train a decision tree that predicts faults in the network efficiently. Based on the prediction model, we dynamically mitigate the predicted faults through error correction codes (ECC) and relaxed timing transmission. Our results indicate that, on average, we can predict timing errors 60.6{\%} more accurately than a static single error correction and double error detection (SECDED) technique, resulting in an average 26.8{\%} reduction in retransmitted packets, an average net speedup of 3.31x, and an average energy savings of 60.0{\%} over other designs for real traffic patterns.",
author = "Dominic DiTomaso and Travis Boraten and Avinash Kodi and Ahmed Louri",
year = "2016",
month = "12",
day = "14",
doi = "10.1109/MICRO.2016.7783734",
language = "English (US)",
volume = "2016-December",
booktitle = "MICRO 2016 - 49th Annual IEEE/ACM International Symposium on Microarchitecture",
publisher = "IEEE Computer Society",

}
