Repair Strategies for Mobile Storage Systems

Gokhan Calis, Swetha Shivaramaiah, Onur Ozan Koyluoglu, Loukas Lazos

Research output: Contribution to journal › Article

Abstract

We study the data reliability problem for devices forming a dynamic distributed storage system. Such systems are commonplace in traditional cloud storage applications where storage node failures and updates are frequent. We consider the application of regenerating codes for file maintenance. Such codes require lower bandwidth to regenerate lost data fragments compared to file replication or reconstruction. We investigate threshold-based repair strategies where data repair is initiated after a threshold number of data fragments have been lost. We show that in a low departure-to-repair rate regime, initiating repairs after several nodes have left the system outperforms initiating repairs after a single node departure. This ordering is reversed when the node turnover is high. We further compare distributed and centralized repair strategies and derive the optimal repair threshold for minimizing the average repair cost per unit of time. In addition, we examine cooperative repair strategies and show performance improvements. We investigate several models for the time needed for node repair, including a simple fixed-time model and a more realistic model that takes into account the number of repaired nodes. Finally, an extended model where additional failures are allowed during the repair process is investigated.
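
The idea of choosing a repair threshold to minimize the average repair cost per unit of time can be illustrated with a toy calculation. The sketch below is not the model from the article. It assumes, purely for illustration, that departures arrive at a constant aggregate rate, that no departures occur while a repair is in progress, that a repair round takes a fixed time, and that each round costs a fixed overhead plus a per-node amount. All function and parameter names are hypothetical.

```python
def average_cost_rate(threshold: int, departure_rate: float, repair_time: float,
                      fixed_cost: float, per_node_cost: float) -> float:
    """Long-run repair cost per unit time under a toy threshold policy.

    Assumptions (illustrative only, not taken from the article):
      * departures arrive at a constant aggregate rate `departure_rate`;
      * repair starts once `threshold` nodes are lost and takes `repair_time`;
      * no departures occur during repair;
      * one repair round costs `fixed_cost + per_node_cost * threshold`.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if departure_rate <= 0:
        raise ValueError("departure_rate must be positive")
    cycle_cost = fixed_cost + per_node_cost * threshold
    # Expected time to lose `threshold` nodes, plus the fixed repair time.
    cycle_length = threshold / departure_rate + repair_time
    return cycle_cost / cycle_length


def best_threshold(max_threshold: int, departure_rate: float, repair_time: float,
                   fixed_cost: float, per_node_cost: float) -> int:
    """Return the threshold in 1..max_threshold with the lowest cost rate."""
    return min(
        range(1, max_threshold + 1),
        key=lambda t: average_cost_rate(
            t, departure_rate, repair_time, fixed_cost, per_node_cost
        ),
    )


if __name__ == "__main__":
    # Slow departures relative to repair: waiting for several losses is cheaper.
    print(best_threshold(5, departure_rate=0.1, repair_time=1.0,
                         fixed_cost=10.0, per_node_cost=1.0))    # 5
    # Fast departures relative to repair: repairing after one loss is cheaper.
    print(best_threshold(5, departure_rate=100.0, repair_time=1.0,
                         fixed_cost=10.0, per_node_cost=1.0))    # 1
```

In this toy setting the preferred threshold flips between the two regimes, which mirrors the qualitative statement in the abstract. The actual cost expressions for regenerating codes, and the distributed, centralized, and cooperative variants, are in the article itself.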

Original language: English (US)
Journal: IEEE Transactions on Cloud Computing
DOI: 10.1109/TCC.2019.2914436
State: Published - Jan 1 2019

Fingerprint

Repair
Reconstruction (structural)
Bandwidth

Keywords

  • data reliability
  • Distributed storage
  • dynamic cloud
  • mobile cloud
  • regenerating codes

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Repair Strategies for Mobile Storage Systems. / Calis, Gokhan; Shivaramaiah, Swetha; Koyluoglu, Onur Ozan; Lazos, Loukas.

In: IEEE Transactions on Cloud Computing, 01.01.2019.

@article{04c4a3c399c842f18c97277d2a74f71e,
title = "Repair Strategies for Mobile Storage Systems",
abstract = "We study the data reliability problem for devices forming a dynamic distributed storage system. Such systems are commonplace in traditional cloud storage applications where storage node failures and updates are frequent. We consider the application of regenerating codes for file maintenance. Such codes require lower bandwidth to regenerate lost data fragments compared to file replication or reconstruction. We investigate threshold-based repair strategies where data repair is initiated after a threshold number of data fragments have been lost. We show that in a low departure-to-repair rate regime, initiating repairs after several nodes have left the system outperforms initiating repairs after a single node departure. This ordering is reversed when the node turnover is high. We further compare distributed and centralized repair strategies and derive the optimal repair threshold for minimizing the average repair cost per unit of time. In addition, we examine cooperative repair strategies and show performance improvements. We investigate several models for the time needed for node repair, including a simple fixed-time model and a more realistic model that takes into account the number of repaired nodes. Finally, an extended model where additional failures are allowed during the repair process is investigated.",
keywords = "data reliability, Distributed storage, dynamic cloud, mobile cloud, regenerating codes",
author = "Gokhan Calis and Swetha Shivaramaiah and Koyluoglu, {Onur Ozan} and Loukas Lazos",
year = "2019",
month = "1",
day = "1",
doi = "10.1109/TCC.2019.2914436",
language = "English (US)",
journal = "IEEE Transactions on Cloud Computing",
issn = "2168-7161",
publisher = "Institute of Electrical and Electronics Engineers Inc."
}

TY - JOUR

T1 - Repair Strategies for Mobile Storage Systems

AU - Calis, Gokhan

AU - Shivaramaiah, Swetha

AU - Koyluoglu, Onur Ozan

AU - Lazos, Loukas

PY - 2019/1/1

Y1 - 2019/1/1

N2 - We study the data reliability problem for devices forming a dynamic distributed storage system. Such systems are commonplace in traditional cloud storage applications where storage node failures and updates are frequent. We consider the application of regenerating codes for file maintenance. Such codes require lower bandwidth to regenerate lost data fragments compared to file replication or reconstruction. We investigate threshold-based repair strategies where data repair is initiated after a threshold number of data fragments have been lost. We show that in a low departure-to-repair rate regime, initiating repairs after several nodes have left the system outperforms initiating repairs after a single node departure. This ordering is reversed when the node turnover is high. We further compare distributed and centralized repair strategies and derive the optimal repair threshold for minimizing the average repair cost per unit of time. In addition, we examine cooperative repair strategies and show performance improvements. We investigate several models for the time needed for node repair, including a simple fixed-time model and a more realistic model that takes into account the number of repaired nodes. Finally, an extended model where additional failures are allowed during the repair process is investigated.

AB - We study the data reliability problem for devices forming a dynamic distributed storage system. Such systems are commonplace in traditional cloud storage applications where storage node failures and updates are frequent. We consider the application of regenerating codes for file maintenance. Such codes require lower bandwidth to regenerate lost data fragments compared to file replication or reconstruction. We investigate threshold-based repair strategies where data repair is initiated after a threshold number of data fragments have been lost. We show that in a low departure-to-repair rate regime, initiating repairs after several nodes have left the system outperforms initiating repairs after a single node departure. This ordering is reversed when the node turnover is high. We further compare distributed and centralized repair strategies and derive the optimal repair threshold for minimizing the average repair cost per unit of time. In addition, we examine cooperative repair strategies and show performance improvements. We investigate several models for the time needed for node repair, including a simple fixed-time model and a more realistic model that takes into account the number of repaired nodes. Finally, an extended model where additional failures are allowed during the repair process is investigated.

KW - data reliability

KW - Distributed storage

KW - dynamic cloud

KW - mobile cloud

KW - regenerating codes

UR - http://www.scopus.com/inward/record.url?scp=85065408440&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065408440&partnerID=8YFLogxK

U2 - 10.1109/TCC.2019.2914436

DO - 10.1109/TCC.2019.2914436

M3 - Article

AN - SCOPUS:85065408440

JO - IEEE Transactions on Cloud Computing

JF - IEEE Transactions on Cloud Computing

SN - 2168-7161

ER -