[Data Visualization] Why - Task Abstraction
Review: The Big Picture
- What data does the user see? (data abstraction)
- Why does the user use the system? (task abstraction)
- How are the visual encoding and interaction idioms constructed? (idiom abstraction)
Why - Task Abstraction
- Why data abstraction?
- There are an infinite number of datasets.
- Thus, it would be inefficient to design a visualization system for every dataset.
- Why task abstraction?
- There are an infinite number of tasks that users want to perform on the system.
- Thus, it would be inefficient to design a visualization system for every task.
- Consider tasks in abstract form, rather than in a domain-specific way.
- Otherwise, hard to make useful comparisons between domain situations
- Actually, below are two instances of “compare values between two
- The analysis framework has a small set of carefully chosen words to describe why people are using vis
- Action: analyze, search, query, $\dots$
- Targets: trends, outliers, distribution, $\dots$
- The same vis tool might be usable for many different goals.
- To describe complex activities, you can specify a chained sequence of tasks, where the output of one becomes the input to the next.
Who : Designer or User
- Although "who" is not part of the what-why-how framework, it is sometimes useful to specify who has a goal or makes a design choice.
Actions
- Three levels of actions:
- High level choice: Analyze
- Q) How is the vis tool used to analyze data?
- A) Consume existing data or produce additional data.
- Mid level choice: Search
- Q) What kind of search is involved?
- A) Lookup, browse, locate, or explore
- Low level choice: Query
- Q) Does the user need to identify one target, compare some targets, or summarize all targets?
- Choices at the three levels are independent.
- Usually, we describe actions at all three levels.
High level Choice: Analyze
- What are the possible goals of users who want to analyze data using a vis tool?
- Consume : The most common case for vis is for the user to consume information that has already been generated as data stored in a format amenable to computation.
- This is the most common “why”.
- Produce : However, sometimes, we use vis to produce new materials! We will see examples later.
High level Choice: Analyze - Consume
- Three consume goals:
- Discover (= explore): to find new knowledge that was not previously known
- Present (= explain): to communicate with others about the knowledge that is known
- Enjoy: visualization in casual encounters, e.g., infographics
High level Choice: Analyze - Consume - Discover
- The discover goal refers to using vis to find new knowledge that was not previously known.
- You want to find some insights from an unseen dataset. What will you do?
- Usually, investigation is driven by existing theories, models, hypotheses, or hunches.
- Generate a new hypothesis, or verify or disconfirm an existing hypothesis.
High level Choice: Analyze - Consume - Discover Example
- You have periodic table data.
- What can you discover?
- You may want to explore the data using a vis tool.
- Plot a scatterplot of melting point vs. first ionization energy
- The distribution was bell shaped with an outlier (Carbon).
- Plot a scatterplot of melting point vs. boiling point
- My hypothesis was (boiling point) > (melting point), but there were a few outliers (e.g., Californium).
High level Choice: Analyze - Consume - Present
- The present goal refers to the use of vis for the succinct communication of information (= explain).
- e.g., telling a story with data, or guiding an audience through a series of cognitive operations.
- One classic example: a diagram in a newspaper
- The knowledge communicated is already known to the presenter in advance.
High level Choice: Analyze - Consume - Present Example
High level Choice: Analyze - Consume - Enjoy
- The enjoy goal refers to casual encounters with vis.
- Vis for fun!
- Sometimes, the goals of the eventual vis user might not be a match with the user goals conjectured by the vis designer!
High level Choice: Analyze - Consume - Enjoy Example
- Top 15 Best Global Brands Ranking (2000-2018)
- Source: https://www.youtube.com/watch?v=BQovQUga0VE&ab_channel=TheRankings
High level Choice: Analyze - Produce
- In the produce goal, the intent of the user is to generate new material.
- Sometimes, the user intends to use the new material for some other vis-related tasks, such as discovery or presentation.
- Annotate (~tag): adding graphical or textual annotations associated with one or more visualization elements
- Record : saving or capturing visualization elements as persistent artifacts
- Derive (= transform): producing new data elements based on existing data elements.
High level Choice: Analyze - Produce - Annotate & Record
- The difference between annotate and record
- The annotate choice attaches information temporarily (it can be subsequently recorded)
- The record choice saves a persistent artifact (e.g., screen shots, videos, etc.)
- But, it seems that these two are interchangeable in most contexts.
High level Choice: Analyze - Produce - Derive
- The derive goal is to produce new data elements based on existing data elements.
- What can be derived? An attribute or a new dataset.
- Let’s recall the InfoVis Reference Model
- How would you derive a new attribute (creating derived attributes)?
- By changing the type of an attribute (can lose some information)
- Grade (O) to score (Q): A+ 98, A0 93, B+ 88, …
- Temperature (Q) to category (N): 30 hot, 20 warm, 0 cold
- By augmenting external data (adding information)
- City name (N) to latitude (Q) and longitude (Q)
- By applying mathematical operations
- Computing the difference between two attributes
- Log scale
- Min-max normalization
- One hot vector encoding
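The attribute derivations listed above can be sketched in plain Python. This is only an illustrative sketch: the grade-to-score table reuses the values from the notes, while the temperature thresholds (30 and above is hot, 20 and above is warm, otherwise cold), the helper names, and the sample values are assumptions made for the example. The city-to-coordinates case is left out because it needs an external data source.

```python
import math

# Ordinal -> quantitative: mapping taken from the notes (A+ 98, A0 93, B+ 88).
GRADE_TO_SCORE = {"A+": 98, "A0": 93, "B+": 88}


def temperature_category(celsius):
    """Quantitative -> categorical; the cut points are assumed, and detail is lost."""
    if celsius >= 30:
        return "hot"
    if celsius >= 20:
        return "warm"
    return "cold"


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode each categorical value as a 0/1 vector over the sorted categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical sample values, used only to exercise the helpers.
grades = ["A+", "B+", "A0"]
scores = [GRADE_TO_SCORE[g] for g in grades]
labels = [temperature_category(t) for t in [30, 20, 0]]
normalized = min_max(scores)
encoded = one_hot(labels)

# Mathematical operations: difference of two attributes and a log scale.
melting = [10.0, 50.0]
boiling = [110.0, 90.0]
difference = [b - m for m, b in zip(melting, boiling)]
log_scaled = [math.log10(v) for v in [1, 10, 1000]]
```

Note that the type-changing derivations are lossy in one direction: once a temperature has been mapped to "warm", the original number cannot be recovered from the category alone.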
High level Choice: Analyze - Produce - Derive Example
High level Choice: Analyze - Produce - Derive
- Sometimes, we derive an entirely new dataset.
- e.g., reshaping operations in pandas
- e.g., group by operations in pandas
- Another example of deriving a new dataset is to change the dataset type.
- e.g., building K Nearest Neighbor (KNN) Graph (a table to a network)
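As a rough illustration of the three dataset derivations mentioned above, the sketch below reshapes a small table with pandas, aggregates it with a group-by, and then turns the table into a network by linking each row to its nearest neighbor. The table contents, the column names, the helper `knn_edges`, and the choice of Euclidean distance with k = 1 are all assumptions made for the example.

```python
from math import dist

import pandas as pd

# Hypothetical wide table: one row per student, one column per exam.
wide = pd.DataFrame(
    {
        "student": ["a", "b", "c"],
        "midterm": [80, 60, 90],
        "final": [85, 70, 95],
    }
)

# Reshape: wide -> long, producing one row per (student, exam) pair.
long_df = wide.melt(id_vars="student", var_name="exam", value_name="score")

# Group by: derive a new, smaller dataset with one mean score per exam.
mean_by_exam = long_df.groupby("exam")["score"].mean()


def knn_edges(points, k):
    """Return directed edges (node, neighbor) to each node's k nearest nodes."""
    edges = []
    for name, position in points.items():
        others = sorted(
            (other for other in points if other != name),
            key=lambda other: dist(position, points[other]),
        )
        edges.extend((name, other) for other in others[:k])
    return edges


# Change of dataset type: table rows become nodes, KNN links become edges.
points = {
    row.student: (float(row.midterm), float(row.final))
    for row in wide.itertuples()
}
edges = knn_edges(points, k=1)
```

The edge list is a different dataset type from the table it came from, so network idioms such as node-link diagrams become applicable to it.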
Mid-level Choice: Search
- All of the high level analyze cases require users to search for elements of interest within the vis as the mid level goal.
- Search can be classified into four cases depending on
- whether the identity of the search target is known or not and
- whether the location of the search target is known or not.
- Lookup: looking up humans (target), knowing that they belong to mammals (location)
- Locate: locating rabbits (target), not knowing where they belong
- Commonly, we call just this specific task a “search”.
- e.g., search “abc.txt” on your disk
- Browse: browsing all leaves of the mammal subtree (location)
- Explore: exploring for a family having the largest number of species
- Note: “explore” was introduced earlier as a synonym of “discover”.
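The four search cases follow from two yes/no questions: is the target known, and is its location known? A minimal sketch of that classification, with the function name `classify_search` chosen only for this example:

```python
def classify_search(target_known: bool, location_known: bool) -> str:
    """Map the two yes/no questions onto the four mid-level search actions."""
    if target_known:
        return "lookup" if location_known else "locate"
    return "browse" if location_known else "explore"


# Enumerate all four combinations.
cases = {
    (target, location): classify_search(target, location)
    for target in (True, False)
    for location in (True, False)
}
```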
Low-level Choice: Query
- After searching, you will find a target or set of targets.
- Then, you may want to investigate the targets by querying some information.
- Identify : returns the characteristic of a single target
- Compare : returns the characteristics of multiple targets
- Summarize (= overview): returns a comprehensive view of everything
- Extremely common in vis systems as a startup view!
Low-level Choice: Query Example
- Identify : identifying the election result of one state
- Compare : comparing the election result of one state to another
- Summarize: summarizing the election results across all states to determine how many favored one candidate
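The three query actions differ mainly in how many targets they cover: one, some, or all. The sketch below shows this on a made-up election table. The state names, candidate names, and vote shares are placeholders invented for illustration, not real results.

```python
from collections import Counter

# Hypothetical vote shares (percent) per state; placeholder data only.
results = {
    "State A": {"Candidate X": 52, "Candidate Y": 48},
    "State B": {"Candidate X": 45, "Candidate Y": 55},
    "State C": {"Candidate X": 61, "Candidate Y": 39},
}


def identify(state):
    """Identify: return the characteristics of a single target."""
    return dict(results[state])


def compare(first, second):
    """Compare: return the characteristics of several targets side by side."""
    return {first: results[first], second: results[second]}


def summarize():
    """Summarize: count how many states favored each candidate."""
    return Counter(max(shares, key=shares.get) for shares in results.values())


one_state = identify("State A")
two_states = compare("State A", "State B")
overview = summarize()
```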
Targets
- So far, we have learned actions (verbs) that users want to perform on vis.
- Targets (nouns) mean some aspect of data that is of interest to users.
- What does your vis do?
- Action + Target
- Discover Trends
- Present Distribution
- Compare Topology
- All data level
- Trends : a high level characterization of data
- e.g., increase, peaks, troughs, plateaus
- Outliers: elements that do not fit well with trends
- Features : particular structures of interest
- e.g., clique in graph theory
- Attribute level
- Distribution: the distribution of study time
- Extremes: the student who studies the longest per week
- Dependency: does grade depend on study time?
- In general, such a causal relationship is really hard to prove!
- Correlation: are grade and study time positively correlated?
- Similarity: similarity between students in terms of study time and grade
- Network datasets: topology and paths
- Spatial datasets: shapes
- The last two pertain to specific types of datasets.
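Several of the attribute-level targets correspond to simple computations. The sketch below works through distribution, extremes, correlation, and similarity for the study-time example. The student records, the bin width of 5 hours, and the use of Pearson correlation and Euclidean distance are assumptions made for illustration.

```python
from collections import Counter
from math import dist, sqrt

# Hypothetical records: student -> (study hours per week, grade).
students = {"s1": (2, 55), "s2": (4, 70), "s3": (6, 65), "s4": (8, 90)}
hours = [h for h, _ in students.values()]
grades = [g for _, g in students.values()]

# Distribution: count students per 5-hour bin, keyed by the bin's lower bound.
distribution = Counter(h // 5 * 5 for h in hours)

# Extremes: the student who studies the longest per week.
longest = max(students, key=lambda name: students[name][0])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Correlation: a positive value suggests association, not dependency.
r = pearson(hours, grades)

# Similarity: a smaller distance means two students are more alike.
distance_s1_s2 = dist(students["s1"], students["s2"])
```

A positive correlation here would still say nothing about whether studying longer causes higher grades, which is why dependency is listed as a separate and harder target.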
So, How?
- So far, we have learned the terms for describing what is to be visualized (data abstraction) and why we visualize (task abstraction).
- Then, how do we visualize data?
- “visualization
- There are many useful idioms and studying those idioms is the main goal of this course!
- Let me give you a brief overview first…
How - Idiom Abstraction
Visualization Analysis Example
- SpaceTree vs. TreeJuxtaposer
- SpaceTree : https://www.youtube.com/watch?v=B4vuSLVCJtw&t=112s&ab_channel=HCILUMD
- TreeJuxtaposer : https://www.youtube.com/watch?v=eIK3ItXyMi0&ab_channel=HCILUMD
Deriving Attribute Example
- When a tree is too big, we need a way to summarize the tree.
- The Strahler Number : a measure for node importance
- Original: 500,000 nodes, Simplified: 5,000 nodes
- Can be described by a chained sequence of two instances
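To make the derive-then-simplify chain concrete, here is a minimal sketch that computes Strahler numbers on a small tree and then keeps only the nodes at or above a threshold. It uses the standard tree definition (a leaf gets 1; a parent takes the largest child value, plus 1 if that value is shared by at least two children). The toy tree, the threshold, and the function name are assumptions, and the tool in the example may compute or apply the measure differently.

```python
def strahler_numbers(children, root):
    """Return {node: Strahler number} for the tree rooted at `root`.

    `children` maps each internal node to a list of its child nodes.
    """
    numbers = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            numbers[node] = 1  # leaves have Strahler number 1
        else:
            ranked = sorted((visit(kid) for kid in kids), reverse=True)
            tied_at_top = len(ranked) > 1 and ranked[0] == ranked[1]
            numbers[node] = ranked[0] + 1 if tied_at_top else ranked[0]
        return numbers[node]

    visit(root)
    return numbers


# Step 1 (derive): compute a new attribute for every node of a toy tree.
tree = {"root": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
numbers = strahler_numbers(tree, "root")

# Step 2 (simplify): keep only nodes whose derived attribute meets a threshold.
threshold = 2
important = {node for node, value in numbers.items() if value >= threshold}
```

The output of the first step (the derived attribute) is the input of the second step (the filter), which is the chaining idea described earlier. The recursive form is fine for a toy tree, but a tree with hundreds of thousands of nodes would likely need an iterative traversal.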