Using evolutive summary counters for efficient cooperative caching in search engines

David Dominguez-Sal, Josep Aguilar-Saborit, Mihai Surdeanu, Josep Lluis Larriba-Pey

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximated record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy, one for the distribution of the cache contents, ESC-placement, and another one for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large ratio of data accesses local, the latter reduces the network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.
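The abstract does not give the internals of the ESC, so the following is only a rough sketch of the general idea it describes: per-node approximate access counters that age over time, a compact per-node summary, and a lookup that queries only the nodes whose summary suggests they hold a document. The class `AgingCountFilter`, the functions `slots` and `nodes_to_query`, the halving-based decay, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import hashlib


def slots(key, num_counters, num_hashes):
    """Counter positions for a key, derived from salted SHA-256 digests."""
    return {
        int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big")
        % num_counters
        for i in range(num_hashes)
    }


class AgingCountFilter:
    """Counting-filter sketch with decay (hypothetical, not the paper's ESC)."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def record_access(self, key):
        # Each access increments every counter the key hashes to.
        for s in slots(key, self.num_counters, self.num_hashes):
            self.counters[s] += 1

    def estimate(self, key):
        # Minimum over the key's counters; hash collisions may inflate it.
        return min(self.counters[s] for s in slots(key, self.num_counters, self.num_hashes))

    def age(self):
        # Halve all counters so that older accesses weigh less than recent ones.
        self.counters = [c // 2 for c in self.counters]

    def summary(self, threshold):
        # One bit per counter: True where the counter reaches the threshold.
        return [c >= threshold for c in self.counters]


def nodes_to_query(key, summaries, num_hashes=3):
    """Return the nodes whose summary marks every counter position of the key."""
    return [
        node
        for node, bits in summaries.items()
        if all(bits[s] for s in slots(key, len(bits), num_hashes))
    ]
```

In this sketch a node would call `record_access` on every local access, call `age` periodically, and share `summary(threshold)` with its peers; a peer would then use `nodes_to_query` to contact only the likely holders of a document rather than every node.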

Original language: English (US)
Article number: 5871600
Pages (from-to): 776-784
Number of pages: 9
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 23
Issue number: 4
DOI: 10.1109/TPDS.2011.162
State: Published - 2012
Externally published: Yes

Fingerprint

Search engines
Data structures
Throughput
Statistics

Keywords

  • count filter
  • distributed caching
  • Distributed systems
  • resource intensive applications

ASJC Scopus subject areas

  • Hardware and Architecture
  • Signal Processing
  • Computational Theory and Mathematics

Cite this

Using evolutive summary counters for efficient cooperative caching in search engines. / Dominguez-Sal, David; Aguilar-Saborit, Josep; Surdeanu, Mihai; Larriba-Pey, Josep Lluis.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 23, No. 4, 5871600, 2012, p. 776-784.

@article{d4d9a61573a0476caad0540fbee45738,
title = "Using evolutive summary counters for efficient cooperative caching in search engines",
abstract = "We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximated record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy, one for the distribution of the cache contents, ESC-placement, and another one for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large ratio of data accesses local, the latter reduces the network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.",
keywords = "count filter, distributed caching, Distributed systems, resource intensive applications",
author = "Dominguez-Sal, David and Aguilar-Saborit, Josep and Surdeanu, Mihai and Larriba-Pey, {Josep Lluis}",
year = "2012",
doi = "10.1109/TPDS.2011.162",
language = "English (US)",
volume = "23",
pages = "776--784",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "4",
}

TY - JOUR

T1 - Using evolutive summary counters for efficient cooperative caching in search engines

AU - Dominguez-Sal, David

AU - Aguilar-Saborit, Josep

AU - Surdeanu, Mihai

AU - Larriba-Pey, Josep Lluis

PY - 2012

Y1 - 2012

AB - We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximated record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy, one for the distribution of the cache contents, ESC-placement, and another one for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large ratio of data accesses local, the latter reduces the network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.

KW - count filter

KW - distributed caching

KW - Distributed systems

KW - resource intensive applications

UR - http://www.scopus.com/inward/record.url?scp=84858075067&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858075067&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2011.162

DO - 10.1109/TPDS.2011.162

M3 - Article

AN - SCOPUS:84858075067

VL - 23

SP - 776

EP - 784

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 4

M1 - 5871600

ER -