Gaussian Cubes: Real-Time Modeling for Visual Exploration of Large Multidimensional Datasets

Zhe Wang, Nivan Ferreira, Youhao Wei, Aarthy Sankari Bhaskar, Carlos Eduardo Scheidegger

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

Recently proposed techniques have finally made it possible for analysts to interactively explore very large datasets in real time. However powerful, the class of analyses these systems enable is somewhat limited: specifically, one can only quickly obtain plots such as histograms and heatmaps. In this paper, we contribute Gaussian Cubes, which significantly improves on state-of-the-art systems by providing interactive modeling capabilities, which include but are not limited to linear least squares and principal components analysis (PCA). The fundamental insight in Gaussian Cubes is that instead of precomputing counts of many data subsets (as state-of-the-art systems do), Gaussian Cubes precomputes the best multivariate Gaussian for the respective data subsets. As an example, Gaussian Cubes can fit hundreds of models over millions of data points in well under a second, enabling novel types of visual exploration of such large datasets. We present three case studies that highlight the visualization and analysis capabilities in Gaussian Cubes, using earthquake safety simulations, astronomical catalogs, and transportation statistics. The dataset sizes range around one hundred million elements and 5 to 10 dimensions. We present extensive performance results, a discussion of the limitations in Gaussian Cubes, and future research directions.
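
The idea of precomputing a Gaussian per data subset, rather than a count, can be illustrated with a small sketch. It is not the authors' implementation: this record does not describe the paper's data structure, so the `Moments` class, its field names, and the restriction to two dimensions are all assumptions made for illustration. The sketch stores the count, sums, and sums of products for each subset. These summaries add up when subsets are combined, and the mean, covariance, and a least-squares line follow from the merged summary without revisiting the raw points.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Summary of one data subset for a 2-D Gaussian (hypothetical layout)."""
    n: float = 0.0    # number of points
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x*x
    sxy: float = 0.0  # sum of x*y
    syy: float = 0.0  # sum of y*y

    @classmethod
    def from_points(cls, points):
        """Precompute the summary for one subset of (x, y) points."""
        m = cls()
        for x, y in points:
            m.n += 1
            m.sx += x
            m.sy += y
            m.sxx += x * x
            m.sxy += x * y
            m.syy += y * y
        return m

    def merge(self, other):
        """Combine two subsets by adding their summaries field by field."""
        return Moments(self.n + other.n,
                       self.sx + other.sx, self.sy + other.sy,
                       self.sxx + other.sxx, self.sxy + other.sxy,
                       self.syy + other.syy)

    def mean(self):
        return self.sx / self.n, self.sy / self.n

    def covariance(self):
        """Return (var_x, cov_xy, var_y) of the maximum-likelihood Gaussian."""
        mx, my = self.mean()
        return (self.sxx / self.n - mx * mx,
                self.sxy / self.n - mx * my,
                self.syy / self.n - my * my)

    def least_squares(self):
        """Slope and intercept of the least-squares line y = intercept + slope * x."""
        var_x, cov_xy, _ = self.covariance()
        mx, my = self.mean()
        slope = cov_xy / var_x
        return slope, my - slope * mx


# Two precomputed subsets whose points all lie on y = 2x + 1.
cell_a = Moments.from_points([(0, 1), (1, 3)])
cell_b = Moments.from_points([(2, 5), (3, 7)])

# A query over both subsets only touches the two summaries.
slope, intercept = cell_a.merge(cell_b).least_squares()
print(slope, intercept)  # prints 2.0 1.0
```

Because a query over many subsets reduces to adding a handful of numbers per subset, the cost of fitting a model depends on the number of subsets touched rather than on the number of underlying points. PCA would follow the same pattern by taking the eigenvectors of the merged covariance.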

Original language: English (US)
Article number: 7536648
Pages (from-to): 681-690
Number of pages: 10
Journal: IEEE Transactions on Visualization and Computer Graphics
Volume: 23
Issue number: 1
DOI: 10.1109/TVCG.2016.2598694
State: Published - Jan 1 2017

Fingerprint

Principal component analysis
Earthquakes
Visualization
Statistics

Keywords

  • data cubes
  • Data modeling
  • dimensionality reduction
  • interactive visualization

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design

Cite this

Gaussian Cubes: Real-Time Modeling for Visual Exploration of Large Multidimensional Datasets. / Wang, Zhe; Ferreira, Nivan; Wei, Youhao; Bhaskar, Aarthy Sankari; Scheidegger, Carlos Eduardo.

In: IEEE Transactions on Visualization and Computer Graphics, Vol. 23, No. 1, 7536648, 01.01.2017, p. 681-690.

@article{1e3a9fe980d64af49cd573f2dc3686c5,
title = "Gaussian Cubes: Real-Time Modeling for Visual Exploration of Large Multidimensional Datasets",
abstract = "Recently proposed techniques have finally made it possible for analysts to interactively explore very large datasets in real time. However powerful, the class of analyses these systems enable is somewhat limited: specifically, one can only quickly obtain plots such as histograms and heatmaps. In this paper, we contribute Gaussian Cubes, which significantly improves on state-of-the-art systems by providing interactive modeling capabilities, which include but are not limited to linear least squares and principal components analysis (PCA). The fundamental insight in Gaussian Cubes is that instead of precomputing counts of many data subsets (as state-of-the-art systems do), Gaussian Cubes precomputes the best multivariate Gaussian for the respective data subsets. As an example, Gaussian Cubes can fit hundreds of models over millions of data points in well under a second, enabling novel types of visual exploration of such large datasets. We present three case studies that highlight the visualization and analysis capabilities in Gaussian Cubes, using earthquake safety simulations, astronomical catalogs, and transportation statistics. The dataset sizes range around one hundred million elements and 5 to 10 dimensions. We present extensive performance results, a discussion of the limitations in Gaussian Cubes, and future research directions.",
keywords = "data cubes, Data modeling, dimensionality reduction, interactive visualization",
author = "Zhe Wang and Nivan Ferreira and Youhao Wei and Bhaskar, {Aarthy Sankari} and Scheidegger, {Carlos Eduardo}",
year = "2017",
month = "1",
day = "1",
doi = "10.1109/TVCG.2016.2598694",
language = "English (US)",
volume = "23",
pages = "681--690",
journal = "IEEE Transactions on Visualization and Computer Graphics",
issn = "1077-2626",
publisher = "IEEE Computer Society",
number = "1",
}

TY - JOUR

T1 - Gaussian Cubes

T2 - Real-Time Modeling for Visual Exploration of Large Multidimensional Datasets

AU - Wang, Zhe

AU - Ferreira, Nivan

AU - Wei, Youhao

AU - Bhaskar, Aarthy Sankari

AU - Scheidegger, Carlos Eduardo

PY - 2017/1/1

Y1 - 2017/1/1

AB - Recently proposed techniques have finally made it possible for analysts to interactively explore very large datasets in real time. However powerful, the class of analyses these systems enable is somewhat limited: specifically, one can only quickly obtain plots such as histograms and heatmaps. In this paper, we contribute Gaussian Cubes, which significantly improves on state-of-the-art systems by providing interactive modeling capabilities, which include but are not limited to linear least squares and principal components analysis (PCA). The fundamental insight in Gaussian Cubes is that instead of precomputing counts of many data subsets (as state-of-the-art systems do), Gaussian Cubes precomputes the best multivariate Gaussian for the respective data subsets. As an example, Gaussian Cubes can fit hundreds of models over millions of data points in well under a second, enabling novel types of visual exploration of such large datasets. We present three case studies that highlight the visualization and analysis capabilities in Gaussian Cubes, using earthquake safety simulations, astronomical catalogs, and transportation statistics. The dataset sizes range around one hundred million elements and 5 to 10 dimensions. We present extensive performance results, a discussion of the limitations in Gaussian Cubes, and future research directions.

KW - data cubes

KW - Data modeling

KW - dimensionality reduction

KW - interactive visualization

UR - http://www.scopus.com/inward/record.url?scp=84999114943&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84999114943&partnerID=8YFLogxK

U2 - 10.1109/TVCG.2016.2598694

DO - 10.1109/TVCG.2016.2598694

M3 - Article

AN - SCOPUS:84999114943

VL - 23

SP - 681

EP - 690

JO - IEEE Transactions on Visualization and Computer Graphics

JF - IEEE Transactions on Visualization and Computer Graphics

SN - 1077-2626

IS - 1

M1 - 7536648

ER -