Looks Good To Me: Visualizations As Sanity Checks

Michael Correll, Mingwei Li, Gordon Kindlmann, Carlos Eduardo Scheidegger

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Famous examples such as Anscombe's Quartet highlight that one of the core benefits of visualizations is allowing people to discover visual patterns that might otherwise be hidden by summary statistics. This visual inspection is particularly important in exploratory data analysis, where analysts can use visualizations such as histograms and dot plots to identify data quality issues. Yet, these visualizations are driven by parameters such as histogram bin size or mark opacity that have a great deal of impact on the final visual appearance of the chart, but are rarely optimized to make important features visible. In this paper, we show that data flaws have varying impact on the visual features of visualizations, and that the adversarial or merely uncritical setting of design parameters of visualizations can obscure the visual signatures of these flaws. Drawing on the framework of Algebraic Visualization Design, we present the results of a crowdsourced study showing that common visualization types can appear to reasonably summarize distributional data while hiding large and important flaws such as missing data and extraneous modes. We make use of these results to propose additional best practices for visualizations of distributions for data quality tasks.
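The abstract's central point, that a design parameter such as histogram bin size can hide a data flaw, can be illustrated with a small sketch. The data are hypothetical: uniform values on [0, 10) with every value in [4.5, 4.75) removed to mimic missing records. The `histogram` helper is also written only for illustration. Neither comes from the paper or its crowdsourced study.

```python
import random


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins covering [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            # Clamp guards against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


# Hypothetical data: 10,000 uniform draws on [0, 10), with every value in
# [4.5, 4.75) dropped to stand in for a block of missing records.
random.seed(0)
data = [x for x in (random.uniform(0, 10) for _ in range(10_000))
        if not 4.5 <= x < 4.75]

coarse = histogram(data, 0, 10, bins=5)   # width 2.0: gap hides inside [4, 6)
fine = histogram(data, 0, 10, bins=40)    # width 0.25: gap is a whole empty bin

print("coarse counts:", coarse)
print("empty fine bins:", [i for i, c in enumerate(fine) if c == 0])
```

With 5 bins, the gap falls inside the bin [4, 6). That bin is only about 12% shorter than its neighbours in expectation, a dip that is easy to dismiss as noise. With 40 bins, the bin [4.5, 4.75) is empty and the missing data are plain to see.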

Original language: English (US)
Journal: IEEE Transactions on Visualization and Computer Graphics
DOI: 10.1109/TVCG.2018.2864907
State: Accepted/In press - Aug 19 2018

Keywords

  • Bandwidth
  • Data analysis
  • Data integrity
  • data quality
  • Data visualization
  • Graphical perception
  • Histograms
  • Kernel
  • univariate visualizations
  • Visualization

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design

Cite this

Looks Good To Me: Visualizations As Sanity Checks. / Correll, Michael; Li, Mingwei; Kindlmann, Gordon; Scheidegger, Carlos Eduardo.

In: IEEE Transactions on Visualization and Computer Graphics, 19.08.2018.

@article{d85e243c789442a0863713bd4855e019,
title = "Looks Good To Me: Visualizations As Sanity Checks",
abstract = "Famous examples such as Anscombe's Quartet highlight that one of the core benefits of visualizations is allowing people to discover visual patterns that might otherwise be hidden by summary statistics. This visual inspection is particularly important in exploratory data analysis, where analysts can use visualizations such as histograms and dot plots to identify data quality issues. Yet, these visualizations are driven by parameters such as histogram bin size or mark opacity that have a great deal of impact on the final visual appearance of the chart, but are rarely optimized to make important features visible. In this paper, we show that data flaws have varying impact on the visual features of visualizations, and that the adversarial or merely uncritical setting of design parameters of visualizations can obscure the visual signatures of these flaws. Drawing on the framework of Algebraic Visualization Design, we present the results of a crowdsourced study showing that common visualization types can appear to reasonably summarize distributional data while hiding large and important flaws such as missing data and extraneous modes. We make use of these results to propose additional best practices for visualizations of distributions for data quality tasks.",
keywords = "Bandwidth, Data analysis, Data integrity, data quality, Data visualization, Graphical perception, Histograms, Kernel, univariate visualizations, Visualization",
author = "Michael Correll and Mingwei Li and Gordon Kindlmann and Scheidegger, {Carlos Eduardo}",
year = "2018",
month = "8",
day = "19",
doi = "10.1109/TVCG.2018.2864907",
language = "English (US)",
journal = "IEEE Transactions on Visualization and Computer Graphics",
issn = "1077-2626",
publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Looks Good To Me: Visualizations As Sanity Checks

AU - Correll, Michael

AU - Li, Mingwei

AU - Kindlmann, Gordon

AU - Scheidegger, Carlos Eduardo

PY - 2018/8/19

Y1 - 2018/8/19

N2 - Famous examples such as Anscombe's Quartet highlight that one of the core benefits of visualizations is allowing people to discover visual patterns that might otherwise be hidden by summary statistics. This visual inspection is particularly important in exploratory data analysis, where analysts can use visualizations such as histograms and dot plots to identify data quality issues. Yet, these visualizations are driven by parameters such as histogram bin size or mark opacity that have a great deal of impact on the final visual appearance of the chart, but are rarely optimized to make important features visible. In this paper, we show that data flaws have varying impact on the visual features of visualizations, and that the adversarial or merely uncritical setting of design parameters of visualizations can obscure the visual signatures of these flaws. Drawing on the framework of Algebraic Visualization Design, we present the results of a crowdsourced study showing that common visualization types can appear to reasonably summarize distributional data while hiding large and important flaws such as missing data and extraneous modes. We make use of these results to propose additional best practices for visualizations of distributions for data quality tasks.

AB - Famous examples such as Anscombe's Quartet highlight that one of the core benefits of visualizations is allowing people to discover visual patterns that might otherwise be hidden by summary statistics. This visual inspection is particularly important in exploratory data analysis, where analysts can use visualizations such as histograms and dot plots to identify data quality issues. Yet, these visualizations are driven by parameters such as histogram bin size or mark opacity that have a great deal of impact on the final visual appearance of the chart, but are rarely optimized to make important features visible. In this paper, we show that data flaws have varying impact on the visual features of visualizations, and that the adversarial or merely uncritical setting of design parameters of visualizations can obscure the visual signatures of these flaws. Drawing on the framework of Algebraic Visualization Design, we present the results of a crowdsourced study showing that common visualization types can appear to reasonably summarize distributional data while hiding large and important flaws such as missing data and extraneous modes. We make use of these results to propose additional best practices for visualizations of distributions for data quality tasks.

KW - Bandwidth

KW - Data analysis

KW - Data integrity

KW - data quality

KW - Data visualization

KW - Graphical perception

KW - Histograms

KW - Kernel

KW - univariate visualizations

KW - Visualization

UR - http://www.scopus.com/inward/record.url?scp=85052643856&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052643856&partnerID=8YFLogxK

U2 - 10.1109/TVCG.2018.2864907

DO - 10.1109/TVCG.2018.2864907

M3 - Article

AN - SCOPUS:85052643856

JO - IEEE Transactions on Visualization and Computer Graphics

JF - IEEE Transactions on Visualization and Computer Graphics

SN - 1077-2626

ER -