Nanocubes for real-time exploration of spatiotemporal datasets

Lauro Lins, James T. Klosowski, Carlos Eduardo Scheidegger

Research output: Contribution to journal (Article)

118 Citations (Scopus)

Abstract

Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.

Original language: English (US)
Article number: 6634137
Pages (from-to): 2456-2465
Number of pages: 10
Journal: IEEE Transactions on Visualization and Computer Graphics
Volume: 19
Issue number: 12
DOI: 10.1109/TVCG.2013.179
State: Published - 2013
Externally published: Yes

Fingerprint

Agglomeration
Data storage equipment
Databases
Data structures
Visualization
Scanning
Bandwidth
Datasets

Keywords

  • Data cube
  • data structures
  • interactive exploration

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Medicine(all)

Cite this

Nanocubes for real-time exploration of spatiotemporal datasets. / Lins, Lauro; Klosowski, James T.; Scheidegger, Carlos Eduardo.

In: IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 12, 6634137, 2013, p. 2456-2465.

@article{b8a3d7248d544b30933e02370786a74d,
title = "Nanocubes for real-time exploration of spatiotemporal datasets",
abstract = "Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.",
keywords = "Data cube, data structures, interactive exploration",
author = "Lins, Lauro and Klosowski, {James T.} and Scheidegger, {Carlos Eduardo}",
year = "2013",
doi = "10.1109/TVCG.2013.179",
language = "English (US)",
volume = "19",
pages = "2456--2465",
journal = "IEEE Transactions on Visualization and Computer Graphics",
issn = "1077-2626",
publisher = "IEEE Computer Society",
number = "12",

}

TY - JOUR

T1 - Nanocubes for real-time exploration of spatiotemporal datasets

AU - Lins, Lauro

AU - Klosowski, James T.

AU - Scheidegger, Carlos Eduardo

PY - 2013

Y1 - 2013

N2 - Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.

AB - Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.

KW - Data cube

KW - data structures

KW - interactive exploration

UR - http://www.scopus.com/inward/record.url?scp=84886695056&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84886695056&partnerID=8YFLogxK

U2 - 10.1109/TVCG.2013.179

DO - 10.1109/TVCG.2013.179

M3 - Article

C2 - 24051812

AN - SCOPUS:84886695056

VL - 19

SP - 2456

EP - 2465

JO - IEEE Transactions on Visualization and Computer Graphics

JF - IEEE Transactions on Visualization and Computer Graphics

SN - 1077-2626

IS - 12

M1 - 6634137

ER -