Stargate: Remote data access between Hadoop clusters

Illyoung Choi, John H. Hartman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The transfer of large-scale datasets between geographically separated systems is a challenge in scientific computing, made even more complicated when the systems are clusters of computers. In this paper, we present Stargate, a file system that enables efficient on-demand remote data access for Hadoop-based scientific computations. Stargate uses a content-addressable protocol, on-demand access, and multi-tier caching to address the challenges of large data transfers over a WAN. Stargate also uses a novel approach that co-locates computations and transfers to achieve efficient data access in cluster computing. Unlike other approaches, Stargate is implemented as an independent file system service that works with any computation framework. In our experiments on heavy I/O workloads, Stargate was 7% faster than WebHDFS and only 8% slower than HDFS. In addition, Stargate's caches effectively trade high-cost WAN traffic for low-cost LAN traffic. Stargate's performance, on-demand data access, and reduction in WAN traffic make it a good platform for providing remote dataset access to scientific computations on clusters.

Original language: English (US)
Title of host publication: Proceedings of the 36th Annual ACM Symposium on Applied Computing, SAC 2021
Publisher: Association for Computing Machinery
Pages: 32-39
Number of pages: 8
ISBN (Electronic): 9781450381048
State: Published - Mar 22 2021
Event: 36th Annual ACM Symposium on Applied Computing, SAC 2021 - Virtual, Online, Korea, Republic of
Duration: Mar 22 2021 to Mar 26 2021

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing

Conference

Conference: 36th Annual ACM Symposium on Applied Computing, SAC 2021
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 3/22/21 to 3/26/21

Keywords

  • cluster-to-cluster data transfer
  • file system
  • on-demand remote data access
  • remote data access
  • WAN
  • WAN file system
  • wide-area network

ASJC Scopus subject areas

  • Software
