Collision array based workload assignment for Network-on-Chip concurrency

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

To improve Network-on-Chip (NoC) parallelism, this paper proposes a new collision-array-based workload assignment that increases data request cancellation. Through a task flow partitioning algorithm, we minimize sequential data access and then dynamically schedule tasks to minimize router execution time. Experimental results show that this method provides an average system throughput improvement of 87.7% and a router execution time reduction of 41.4%. This throughput improvement is a direct consequence of the collision array. A 7x improvement was reported in [10, Fig. 7] when 32 threads are employed on a single core; our system achieves a 2.7x speedup. By investigating the performance-overhead tradeoff across different collision array sizes, we demonstrate a maximum of 42.9% energy and area overhead savings at a cost of only 23.6% performance degradation in terms of router execution time.
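The abstract only summarizes the mechanism, so the following is a minimal, hypothetical sketch of the core idea: a small array that detects when a data request collides with an identical request already in flight, and cancels the duplicate instead of reissuing it. All names here (`CollisionArray`, `request`, `complete`) are invented for illustration; the paper's actual design is a hardware structure inside the NoC routers, not software.

```python
class CollisionArray:
    """Hypothetical software model of a collision array: one slot per
    hash bucket, each holding at most one pending request address."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # pending request address per slot

    def request(self, addr):
        """Return True if the request is issued, False if it is cancelled
        because the same request is already pending (a collision)."""
        slot = addr % self.size
        if self.slots[slot] == addr:
            return False          # identical request in flight: cancel
        self.slots[slot] = addr   # record this request as pending
        return True

    def complete(self, addr):
        """Clear the slot once the request's data has returned."""
        slot = addr % self.size
        if self.slots[slot] == addr:
            self.slots[slot] = None


ca = CollisionArray(8)
issued = [ca.request(a) for a in [3, 7, 3, 3, 12]]
print(issued)  # [True, True, False, False, True] -- two duplicates cancelled
```

A larger array catches more duplicates (higher throughput) but costs more energy and area, which is the size tradeoff the abstract quantifies.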

Original language: English (US)
Title of host publication: International System on Chip Conference
Publisher: IEEE Computer Society
Pages: 188-191
Number of pages: 4
ISBN (Print): 9781479933785
DOI: 10.1109/SOCC.2014.6948924
State: Published - Nov 5 2014
Event: 27th IEEE International System on Chip Conference, SOCC 2014 - Las Vegas, United States
Duration: Sep 2 2014 - Sep 5 2014

Other

Other: 27th IEEE International System on Chip Conference, SOCC 2014
Country: United States
City: Las Vegas
Period: 9/2/14 - 9/5/14

Keywords

  • collision array
  • Network-on-Chip system
  • parallelism
  • workload assignment

ASJC Scopus subject areas

  • Hardware and Architecture
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Zhou, H., Powers, L. S., & Wang, M. (2014). Collision array based workload assignment for Network-on-Chip concurrency. In International System on Chip Conference (pp. 188-191). [6948924] IEEE Computer Society. https://doi.org/10.1109/SOCC.2014.6948924
