Evaluating Distributed Computing Infrastructures: An Empirical Study Comparing Hadoop Deployments on Cloud and Local Systems

Devipsita Bhattacharya, Faiz Currim, Sudha Ram

Research output: Contribution to journal › Article

1 Scopus citation

Abstract

The popularity of distributed computing platforms (e.g., Hadoop) is largely due to their ability to address scalability issues that arise from the data storage and processing limitations of standard computing systems. However, the decision to dedicate organizational resources and capital to such systems requires careful consideration of several factors, including an evaluation of cloud-based distributed computing options. We propose a framework of metrics, which we used to conduct an in-depth performance and cost-benefit analysis of two standard Hadoop infrastructural choices, i.e., a Platform as a Service (PaaS) on-demand cloud setup and a local organizational setup. We evaluated the framework with an exploratory data analysis use case for a large-scale graph processing research problem. Our analysis considered highly granular aspects of distributed computing performance and studied how utilization rates and infrastructure amortization times affect break-even times. We identified that virtual memory management adversely affects the performance of a cloud cluster during the reduce phase, with the magnitude of degradation dependent on the type of MapReduce operation. Our study is intended not only as an evaluation of infrastructural choices but also as the development of a metric framework that can serve as a baseline for researchers examining distributed infrastructures.

Original language: English (US)
Journal: IEEE Transactions on Cloud Computing
State: Accepted/In press - Jan 1 2019

Keywords

  • Cloud computing
  • Computer architecture
  • Computers and information processing
  • Cost benefit analysis
  • Data processing
  • Distributed computing
  • Measurement
  • Parallel processing
  • Performance evaluation
  • Platform-as-a-Service
  • Task analysis

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Science Applications
  • Computer Networks and Communications
