Accordion: Multi-scale recipes for adaptive detection of duplication

Russell Lewis, John H. Hartman

Research output: Contribution to conference › Paper

Abstract

A recipe is metadata that describes the contents of a file as a sequence of blocks, each identified by its hash. Using recipes, one can rapidly compare the contents of two files without reading the files themselves. Unfortunately, recipes present a space/precision tradeoff: small block sizes maximize the duplication that can be discovered, but large block sizes produce small recipes that can be compared more quickly. In this paper, we present Accordion, a toolset for creating and using multi-scale recipes, that is, recipes that include blocks at several different scales. We demonstrate two duplication-detection algorithms: one optimized for situations where substantial duplication is expected, and another for situations where the existence of duplication is uncertain.
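To make the recipe idea and the space/precision tradeoff concrete, here is a minimal sketch, not Accordion's actual code or either of its two algorithms. It assumes fixed-size, aligned blocks hashed with SHA-256 (the paper may chunk differently), hypothetical block sizes of 4096, 1024 and 256 bytes, and hypothetical names (`MultiScaleRecipe`, `make_recipe`, `make_multiscale_recipe`, `duplicated_bytes`). The comparison works coarse-to-fine: a coarse block whose hash appears in the other file's recipe is counted whole, and only unmatched regions are re-examined at finer scales.

```python
import hashlib
import random
from dataclasses import dataclass

# Hypothetical scales, coarsest first; each size must be a multiple of the next.
SCALES = (4096, 1024, 256)


@dataclass
class MultiScaleRecipe:
    length: int                   # file length in bytes
    blocks: dict[int, list[str]]  # block size -> block hashes in file order


def make_recipe(data: bytes, block_size: int) -> list[str]:
    """Single-scale recipe: SHA-256 hash of each fixed-size block, in file order."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def make_multiscale_recipe(data: bytes, scales=SCALES) -> MultiScaleRecipe:
    """Multi-scale recipe: one single-scale recipe per block size."""
    return MultiScaleRecipe(len(data), {s: make_recipe(data, s) for s in scales})


def duplicated_bytes(a: MultiScaleRecipe, b: MultiScaleRecipe, scales=SCALES) -> int:
    """Count bytes of file A covered by blocks whose hash also appears in B.

    Coarse-to-fine: a matching coarse block is counted whole, and only the
    unmatched regions are re-examined at the next finer scale. Only the
    recipes are read, never the file contents.
    """
    pending = [(0, a.length)]  # byte ranges of A not yet matched
    found = 0
    for size in scales:
        b_hashes = set(b.blocks[size])
        next_pending = []
        for start, end in pending:
            # start is aligned to the previous (coarser) size, hence to this one
            for offset in range(start, end, size):
                block_end = min(offset + size, a.length)
                if a.blocks[size][offset // size] in b_hashes:
                    found += block_end - offset
                else:
                    next_pending.append((offset, block_end))
        pending = next_pending
    return found


if __name__ == "__main__":
    rng = random.Random(42)
    S, T = rng.randbytes(8192), rng.randbytes(1024)
    U, V = rng.randbytes(3072), rng.randbytes(3072)
    file_a, file_b = S + T + U, V + T + S
    ra, rb = make_multiscale_recipe(file_a), make_multiscale_recipe(file_b)
    print(duplicated_bytes(ra, rb))           # 9216: S at the 4 KiB scale, T at 1 KiB
    print(duplicated_bytes(ra, rb, (4096,)))  # 8192: a 4 KiB-only comparison misses T
```

In the usage example, the 4 KiB recipe alone finds the 8 KiB shared region but misses the 1 KiB region `T`, which only the finer scale exposes; this mirrors the tradeoff described above. Fixed-size blocks also miss duplication shifted by a non-multiple of the block size, which this sketch does not attempt to handle.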

Original language: English (US)
State: Published - Jan 1 2020
Event: 7th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2015 - Santa Clara, United States
Duration: Jul 6 2015 to Jul 7 2015

Conference

Conference: 7th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2015
Country: United States
City: Santa Clara
Period: 7/6/15 to 7/7/15


ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software

Cite this

Lewis, R., & Hartman, J. H. (2020). Accordion: Multi-scale recipes for adaptive detection of duplication. Paper presented at 7th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2015, Santa Clara, United States.