[Data Visualization] Validation


Vis Design

Your boss: “Develop a vis interface for this data.”
What you do:
- Get the data
- Get some examples and libraries on the Web
- Implement and customize your visualization
- You in the next meeting: “This is our vis interface.”
This is what many data scientists actually do in their company.

Problems:
- You cannot answer “Why is yours effective?”
- You cannot answer “Why is yours better than other designs?”
- You cannot evaluate your vis either quantitatively or qualitatively.
Why did this happen? Because you didn’t define the problem to solve using vis.
How can we design and validate a vis correctly?

Four Levels of Vis Design

First part of the question: how do we design a vis?
Our framework: splitting the vis design process into four cascading levels

Domain situation: you consider the details of a particular application domain for vis.
Domain: a particular field of interest of the target users of a vis tool.
Example questions:
- Who are the users?
- What are their ultimate goals?
- Why can’t their tasks be automated?

What-why abstraction: you map those domain-specific problems and data into forms that are independent of the domain.
Example questions:
- What are the dataset types (tables/networks/geometry, $\dots$)?
- What are the attribute types (categorical/ordinal, $\dots$)?
- What are the tasks?
Use the terminology we learned!

How abstraction: you design idioms that specify the approach to visual encoding and interaction.
Idioms: vis and interaction techniques (a bar chart, a scatterplot, $\dots$)
Example questions:
- What visual marks and channels can be used?
- What interactions can be adopted?
Use the terminology we learned!

Algorithm implementation: you design and implement algorithms!
Example questions:
- What algorithms should be used for computation?
- Into what components can the system be modularized?
This is what “programmers” do.

A block is the outcome of the design process at one level.
The outcome from an upstream level is input to the downstream level.
Choosing the wrong block at an upstream level inevitably cascades to all downstream levels!
- e.g., even if your implementation is good, a misdesigned visual encoding means your system will not solve the intended problem.

Iterative Design Process

Although the blocks cascade, this doesn’t mean that your design should follow the waterfall model.
- The waterfall model is usually bad!
You can do it iteratively!

Even the actual users have difficulty articulating their goals and needs clearly.
- They are often not familiar with technical terms, software development, or vis.
Iterative design process in practice:
- 1st meeting: 20% of domain situation/goals/data/tasks identified.
- Prototype a vis
- 2nd meeting: bring the prototype and talk about it. 40% identified.
- Improve the prototype
- 3rd meeting: 60% identified
- $\dots$

Domain Situation

Goal: Identify situation blocks
Working with users to iteratively refine a design (user-centered design or human-centered design)
Observe what they actually do! Don’t just listen to what they say.
e.g., a computational biologist working in the field of comparative genomics, using genomic sequence data to ask questions about the genetic source of adaptivity in a species.
- What are the differences between individual nucleotides of feature pairs?
- What is the density of coverage and where are the gaps across a chromosome?

Task and Data Abstraction

Goal: Identify task blocks and design data blocks
Identify task blocks:
- Abstract domain-specific vocabulary into the domain-independent vocabulary.
- Abstract tasks (browsing, comparing, and summarizing)
- Help future vis designers interested in the same domain!
Design data blocks:
- Design data blocks; don’t just select them.
- Choose the right form for the data and the transformations between forms (e.g., even though users call it a table, it may in fact be a tree).
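To make the table-versus-tree point concrete, here is a minimal sketch. It assumes the users hand over a flat table in which each row names its parent; the column names `id` and `parent` and the sample rows are made up for illustration only.

```python
from collections import defaultdict

# Hypothetical flat table: each row names its parent (None marks the root).
rows = [
    {"id": "root", "parent": None},
    {"id": "a", "parent": "root"},
    {"id": "b", "parent": "root"},
    {"id": "a1", "parent": "a"},
]

def table_to_tree(rows):
    """Return (root_id, children), where children maps a node id to its child ids."""
    children = defaultdict(list)
    root = None
    for row in rows:
        if row["parent"] is None:
            root = row["id"]
        else:
            children[row["parent"]].append(row["id"])
    return root, dict(children)

def depth(node, children):
    """Depth of the subtree rooted at node (a single node has depth 1)."""
    return 1 + max((depth(c, children) for c in children.get(node, [])), default=0)

root, children = table_to_tree(rows)
print(root, children, depth(root, children))
```

Once the data is treated as a tree, tree-specific tasks (e.g., finding paths or comparing subtrees) and tree idioms become available, which a plain table abstraction would hide.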

Visual Encoding and Interaction Idiom

Goal: Design idiom blocks
The visual encoding idiom controls what users see (marks and channels).
The interaction idiom controls how users change what they see.
The design space of static visual encoding idioms is already huge, and it grows even bigger when you consider the manipulation between them.

Algorithm

Goal: Implement algorithms
Implement algorithms that efficiently handle visual encoding and interaction idioms.
- Knowledge of computer graphics can be a plus.
Consider the speed of computation and the memory footprint
Latency is very important in vis [Niel94].
- 0.1 s for continuous feedback (animation)
- 1 s for maintaining the user’s flow of thought
- 10 s for keeping the user’s attention
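The three limits above can be turned into a simple check on measured response times. The following is only a sketch: the function name `latency_band` and the band labels are hypothetical, and the thresholds are the ones listed above.

```python
def latency_band(seconds):
    """Classify a response time against the 0.1 s / 1 s / 10 s limits."""
    if seconds <= 0.1:
        return "continuous feedback"
    if seconds <= 1.0:
        return "flow of thought maintained"
    if seconds <= 10.0:
        return "attention kept"
    return "attention likely lost"

# Made-up response times (in seconds) for three interactions.
for name, t in [("hover highlight", 0.03), ("filter", 0.6), ("re-layout", 12.0)]:
    print(name, "->", latency_band(t))
```

A check like this could be run over interaction logs to spot operations that need a faster algorithm or a progress indicator.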

Perceptual Fusion

Two stimuli within a perceptual processor cycle appear fused → the first event appears to cause the other.

Threats to Validity

Each of the four levels has a different set of threats to validity.
Threats to validity: Reasons why you might have made the wrong choices.

Immediate validation approaches take place before you enter the next level.
- There are not many, since most validation requires results from the downstream levels nested within the current one.
- Prevent you from making poor choices
Downstream validation approaches happen at the end of each level.
- Necessary for papers to be published

Domain Validation

You can conduct a field study to validate the domain situation block.
- You observe how people act in real-world settings, rather than bringing them into a laboratory setting.
e.g., contextual inquiry
- Observe users working in their real-world context and interrupt them to ask questions when clarification is needed
- Better than silent observation
- [Holtzblatt and Jones 93]

Idiom Validation

As immediate validation for the idiom level, you should justify your idiom choices.
- Why were specific idioms chosen (and others not)?
- Justify the design with respect to known perceptual and cognitive principles
- Ensure that your design does not violate known guidelines (e.g., scalability).

“We chose vis idiom X since it was found that idiom X outperforms others for ABC tasks [ref].”
“We limited # of color bands in vis to 4 due to the perceptual limitation in color judgement.”
“We adopted a filtering interaction since the scatterplot is not effective for 10 K data points.”

Algorithm Validation

To validate the algorithm level immediately, you can analyze computational complexity.
- “The time complexity of our algorithm is $O(N \log N)$, which is better than the baseline.”
After implementation, you can conduct benchmarks.
- “Our algorithm was faster than the baseline.”
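A benchmark of this kind can be sketched with the standard `timeit` module. The two functions below are stand-ins chosen for illustration, not algorithms from any particular vis system: an insertion sort plays the $O(N^2)$ baseline and Python's built-in sort plays the $O(N \log N)$ proposal. The input sizes are arbitrary.

```python
import random
import timeit

def baseline_sort(values):
    """Insertion sort, an O(N^2) stand-in for 'the baseline'."""
    result = []
    for v in values:
        i = len(result)
        while i > 0 and result[i - 1] > v:
            i -= 1
        result.insert(i, v)
    return result

def proposed_sort(values):
    """Python's built-in sort, an O(N log N) stand-in for 'our algorithm'."""
    return sorted(values)

def benchmark(fn, data, repeats=3):
    """Best wall-clock time in seconds over `repeats` single runs."""
    return min(timeit.repeat(lambda: fn(data), number=1, repeat=repeats))

random.seed(0)  # fixed seed so the inputs are reproducible
for n in (500, 1000, 2000):
    data = [random.random() for _ in range(n)]
    # Check that both produce the same result before comparing speed.
    assert baseline_sort(data) == proposed_sort(data)
    print(n, benchmark(baseline_sort, data), benchmark(proposed_sort, data))
```

Reporting timings over several input sizes, rather than a single size, lets readers see whether the measured growth matches the claimed complexity.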

Idiom Validation

You can conduct a lab study to validate idiom abstraction.
- A controlled experiment in a lab setting
- Sometimes called a user study
Quantitative: time spent, # of errors made, logging actions, tracking eye movements, $\dots$
Qualitative: questionnaires, interviews, $\dots$
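As a sketch of how the quantitative measures might be summarized, the snippet below aggregates completion time and error counts per idiom. The trial records are made up for illustration and are not results from any study; the idiom labels "A" and "B" are placeholders.

```python
from statistics import mean, stdev

# Made-up trial logs: (participant, idiom, seconds to finish, errors made).
trials = [
    ("p1", "A", 12.0, 0), ("p1", "B", 15.0, 1),
    ("p2", "A", 10.0, 1), ("p2", "B", 14.0, 2),
    ("p3", "A", 14.0, 0), ("p3", "B", 13.0, 0),
]

def summarize(trials, idiom):
    """Mean and sample standard deviation of time, and mean errors, for one idiom."""
    times = [t for _, i, t, _ in trials if i == idiom]
    errors = [e for _, i, _, e in trials if i == idiom]
    return {
        "mean_time": mean(times),
        "sd_time": stdev(times),
        "mean_errors": mean(errors),
    }

for idiom in ("A", "B"):
    print(idiom, summarize(trials, idiom))
```

In a real lab study these descriptive statistics would normally be followed by an appropriate significance test, which is omitted here.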

Attach images and videos that demonstrate your work
- Useful especially when there is an explicit discussion pointing out the desirable properties in the results
- “The cluttering problem is alleviated in Area A in Figure B $\dots$”
Use quality metrics if they exist: edge crossings, edge bends, structural similarity (SSIM), $\dots$
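As an example of such a metric, the number of edge crossings in a straight-line node-link layout can be counted with a pairwise segment test. This is a minimal $O(E^2)$ sketch under simplifying assumptions: edges are straight segments, edges sharing a node are skipped, and collinear overlaps are not counted. The node positions and edges are made up.

```python
from itertools import combinations

def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (touching does not count)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def count_edge_crossings(pos, edges):
    """Count crossing pairs of edges; pairs that share a node are skipped."""
    count = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            count += 1
    return count

# Made-up layout: the two diagonals of a unit square cross once.
pos = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}
edges = [("a", "b"), ("c", "d"), ("a", "c")]
print(count_edge_crossings(pos, edges))
```

Comparing this count before and after applying a new layout gives a number to report alongside the result images.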

Abstraction Validation

You can conduct a case study to validate task/data abstraction
- You invite members of the target user community, ask them to use the tool, and collect anecdotal evidence of utility.
- “The experts found the system useful in $\dots$”
- Qualitative (but try to be quantitative!)
Field studies are also possible.

Domain Validation

For downstream validation, you can investigate how your vis tool has been adopted by the target audience.
i.e., adoption rates
It does not tell the whole story but can be a Key Performance Indicator (KPI).

Validate Everything?

It is impossible to address all four levels in detail in a single research paper.
- Limited time and space
In practice, we use a small subset of validation methods focusing on validating what we claim.

Two Angles of Attack for Vis Design

With problem-driven work, you start at the top domain situation level and walk your way down through abstraction, idiom, and algorithm decisions.
- Top-down approach
- Called a design paper, an application paper, or a design study
In technique-driven work, you work at one of the bottom two levels, idiom or algorithm design.
- Bottom-up approach
- Called a technique paper or an algorithm paper
- Many people think InfoVis is about creating new visualizations, but only a few new idioms are reported every year!

https://ieeevis.org/year/2023/info/call-participation/call-for-participation

Validate What You Claimed

To validate problem-driven work (an application paper), you should involve the actual users in the design.
- Field study, case study, domain expert feedback, $\dots$
- Lab studies or performance benchmarks are not necessary.
To validate technique-driven work, you should report some quantitative results.
- Lab studies for new visualizations and interactions
- Benchmarks for new algorithms
If there is a mismatch, your paper is very unlikely to be published.

Validation Example

Henry and Fekete, “MatrixExplorer: a Dual-Representation System to Explore Social Networks” (TVCG, 2006)

Eliciting requirements

Don’t make your paper a manual
Justify encoding/interaction design based on the observation

At the algorithm level, the focus is on the reordering algorithm.
Downstream benchmark timings are mentioned very briefly.

Qualitative result image analysis

Summary: Validation


LEE CHANWOO
