
image


The Big Picture

  • What data does the user see? (data abstraction)
  • Why does the user use the system? (task abstraction)
  • How are the visual encoding and interaction idioms constructed? (idiom abstraction)


image


image


What - Data Abstraction

image


Why Data Abstraction?

  • There are an infinite number of datasets.
  • Thus, it would be inefficient to design a visualization system for every dataset.
  • We will categorize and characterize data types that a visualization system aims to visualize.
  • Insights gained from such analysis can be used to design a visualization system in the future.


Why Data Semantics and Types?


1, 4.5, -3, 10001, 2, 0

  • What does this sequence of six numbers mean?
  • Multiple interpretations are possible.


Basil, 7, S, Pear

  • The same is true for this record: its meaning is ambiguous without more information.


  • The type of the data is its structural or mathematical interpretation.
  • The semantics of the data is its real-world meaning.
  • Metadata: additional data required for correctly interpreting data
    • Types
    • Semantics
    • Syntax of a data file (e.g., CSV, TSV, or JSON)
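To make the role of metadata concrete, here is a minimal sketch that attaches types and semantics to the record above. The attribute names (`name`, `age`, `shirt_size`, `favorite_fruit`) are one assumed interpretation, not something the raw record states, and `AttributeMeta` and `interpret` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class AttributeMeta:
    name: str  # semantics: what the value means in the real world
    kind: str  # type: "categorical", "ordinal", or "quantitative"


# Assumed metadata for the record "Basil, 7, S, Pear".
schema = [
    AttributeMeta("name", "categorical"),
    AttributeMeta("age", "quantitative"),
    AttributeMeta("shirt_size", "ordinal"),
    AttributeMeta("favorite_fruit", "categorical"),
]


def interpret(raw, schema):
    """Split a comma-separated record and label each field using the schema."""
    fields = [field.strip() for field in raw.split(",")]
    record = {}
    for meta, text in zip(schema, fields):
        # Only quantitative fields are parsed as numbers.
        record[meta.name] = float(text) if meta.kind == "quantitative" else text
    return record


record = interpret("Basil, 7, S, Pear", schema)
print(record)
```

With a different schema, the same raw text would yield a different interpretation, which is why the metadata has to travel with the data.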


Data, Dataset, and Attributes

  • We will learn three concepts: data types, dataset types, and attribute types.
  • Data types: what kind of thing is the data?
    • e.g., an item, a link, or an attribute
  • Dataset types: how are these data types combined into a larger structure?
    • e.g., a table, a tree, or a field of sampled values
  • Attribute types: what kinds of mathematical operations are meaningful for an attribute?
    • e.g., quantity, category, $\dots$


Data Types

  • Five Basic Data Types: Items, Attributes, Links, Positions, and Grids
  • An item is an individual entity that is discrete.
    • e.g., a row in a simple table or a node in a network.
  • An attribute is some specific property that can be measured, observed, or logged.
    • Sometimes called a variable or a dimension
    • e.g., salary, price, or number of sales
  • A link is a relationship between items typically within a network.
    • e.g., marriage relationship
  • A grid specifies the strategy for sampling continuous data in terms of both geometric and topological relationships between its cells.
  • A position is spatial data, providing a location in two-dimensional (2D) or three-dimensional (3D) space.
    • e.g., a latitude–longitude pair describing a location on the Earth’s surface
    • e.g., three numbers specifying a location within the region of space measured by a medical scanner


Dataset Types

  • Let’s combine these five basic data types.
  • One of the most common dataset types is a table.
  • A table dataset type includes two data types: items (rows) and attributes (columns).
  • A network dataset type consists of three data types: items (nodes), links, and attributes (of nodes and links).


  • Four dataset types: tables, networks and trees, fields, and geometry
    • Other possible collections of items: clusters, sets, and lists

image


Tables

  • A table is made up of rows and columns.
    • Usually, 2D
  • A row represents an item.
  • A column represents an attribute.
  • A cell is specified by the combination of a row and a column.
    • i.e., it stores the value specified by an item and an attribute.
  • A multidimensional table has a more complex structure for indexing into a cell, with multiple keys.
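As a rough illustration of the difference between a flat table and a multidimensional table, the sketch below looks up cells in both. The data, the attribute names, and the `cell` helper are all made up for this example.

```python
# Flat table: each row is an item, each column name is an attribute.
people = [
    {"id": 1, "name": "Amy", "age": 8},
    {"id": 2, "name": "Basil", "age": 7},
]


def cell(table, row_index, attribute):
    """Return the value stored at the given (item, attribute) combination."""
    return table[row_index][attribute]


# Multidimensional table: a cell is indexed by several keys at once.
# Here the two keys are (year, city) and the stored value is a sales count.
sales = {
    (2023, "Seoul"): 120,
    (2023, "Busan"): 80,
    (2024, "Seoul"): 150,
    (2024, "Busan"): 95,
}

print(cell(people, 1, "age"))   # 7
print(sales[(2024, "Busan")])   # 95
```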


image


Networks and Trees

  • A network is made up of nodes and links and specifies the relationship between two or more nodes.
  • Nodes and links can have attributes independently.
  • Example: social network on Facebook
    • Node: accounts (people, organizations, or pages)
    • Link: friendship (or subscription)
    • Node attributes: name, photo, website_url, …
    • Link attributes: last interaction time, …
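A network like this one could be stored as a collection of nodes and a collection of links, each carrying its own attributes. The sketch below is one possible layout; the accounts, dates, and the `neighbors` helper are hypothetical.

```python
# Nodes: accounts, keyed by a node id, each with its own attributes.
nodes = {
    "amy": {"name": "Amy"},
    "basil": {"name": "Basil"},
    "club": {"name": "Chess Club"},
}

# Links: relationships between two nodes, each with its own attributes.
links = [
    {"source": "amy", "target": "basil", "kind": "friendship",
     "last_interaction": "2024-01-05"},
    {"source": "amy", "target": "club", "kind": "subscription",
     "last_interaction": "2024-02-10"},
]


def neighbors(node_id):
    """Return the ids of all nodes connected to node_id, ignoring direction."""
    result = set()
    for link in links:
        if link["source"] == node_id:
            result.add(link["target"])
        elif link["target"] == node_id:
            result.add(link["source"])
    return sorted(result)


print(neighbors("amy"))   # ['basil', 'club']
```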


image


  • Networks with hierarchical structure are more specifically called trees.
  • In contrast to a general network, trees do not have cycles.
    • Each child node has only one parent node pointing to it.
  • Networks are sometimes called graphs.
    • e.g., graph drawing and graph theory
  • However, the term graph is also used for charts.
    • e.g., bar graph and line graph
    • To avoid confusion, we will use the term chart for these.
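The tree properties mentioned above (no cycles, and only one parent per child) can be checked mechanically. The following is a minimal sketch for a rooted tree given as (parent, child) pairs; `is_tree` is a hypothetical helper that also assumes a single root.

```python
def is_tree(nodes, edges):
    """Check that (parent, child) edges over the given nodes form a rooted tree."""
    parent = {}
    for p, c in edges:
        if c in parent:
            return False  # a child with two parents is not allowed in a tree
        parent[c] = p

    # Assumption: a tree has exactly one root (a node without a parent).
    roots = [n for n in nodes if n not in parent]
    if len(roots) != 1:
        return False

    # Walk upward from every node; revisiting a node means there is a cycle.
    for start in nodes:
        seen = set()
        node = start
        while node in parent:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    return True


print(is_tree(["a", "b", "c"], [("a", "b"), ("a", "c")]))   # True
print(is_tree(["a", "b", "c"], [("a", "c"), ("b", "c")]))   # False
```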


  • Two popular visualizations for networks: a node-link diagram and an adjacency matrix.
  • There are a lot of network visualizations!
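The two representations carry the same information. As a sketch, an adjacency matrix can be derived from a node-link list as follows, assuming undirected, unweighted links; `adjacency_matrix` is a hypothetical helper.

```python
def adjacency_matrix(node_ids, links):
    """Build a symmetric 0/1 matrix from a list of undirected (a, b) links."""
    index = {node: i for i, node in enumerate(node_ids)}
    size = len(node_ids)
    matrix = [[0] * size for _ in range(size)]
    for a, b in links:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected: mirror the entry
    return matrix


matrix = adjacency_matrix(["a", "b", "c"], [("a", "b"), ("b", "c")])
for row in matrix:
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 0]
```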


image


Fields

  • The field dataset type contains attribute values associated with cells.
  • What is the difference between 2D tables and 2D fields?
  • In a 2D field, each cell contains measurements or calculations from a continuous domain.
    • So, in principle, you could take an infinite number of measurements!
    • In a table, rows and columns are discrete.


  • Consider a field dataset representing a medical scan of a human body.
    • This is a 3D field, because the body occupies a continuous region of 3D space.
  • We can determine the resolution of the scan (i.e., granularity)
    • A low resolution (a coarser grid): 64 * 64 * 64 cells
    • A high resolution (a finer grid): 256 * 256 * 256 cells


image


  • Since it is impossible to measure an infinite number of cells, sampling and interpolation techniques are important in the field dataset type.
  • Sampling: how frequently to take the measurements?
  • Interpolation: how to show values in between the sampled points in a way that does not mislead.
  • Interpolating appropriately between the measurements allows you to reconstruct a new view of the data.
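As one simple example of interpolation, the sketch below reconstructs values between regularly spaced 1D samples with linear interpolation. Linear interpolation is only one of many possible schemes, and the sample values and the `interpolate` helper are made up for illustration.

```python
def interpolate(samples, spacing, x):
    """Linearly interpolate a 1D field sampled at positions 0, spacing, 2*spacing, ..."""
    if x <= 0:
        return samples[0]        # clamp to the first sample
    position = x / spacing
    i = int(position)            # index of the sample to the left of x
    if i >= len(samples) - 1:
        return samples[-1]       # clamp to the last sample
    t = position - i             # fractional distance toward the next sample
    return (1 - t) * samples[i] + t * samples[i + 1]


samples = [10.0, 20.0, 40.0]     # measured at x = 0, 2, and 4
print(interpolate(samples, 2.0, 1.0))   # 15.0
print(interpolate(samples, 2.0, 3.0))   # 30.0
```

A smaller `spacing` corresponds to a higher sampling frequency, so less of the reconstructed field depends on interpolation.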


  • Grid geometry: the location of cells in space
  • Grid topology: how each cell connects with its neighboring cells
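For a uniform 2D grid, both geometry and topology can be computed from the cell indices alone. The sketch below assumes square cells and 4-connectivity; `cell_center` and `cell_neighbors` are hypothetical helpers.

```python
def cell_center(i, j, origin=(0.0, 0.0), spacing=1.0):
    """Grid geometry: the location of the center of cell (i, j) in space."""
    return (origin[0] + (i + 0.5) * spacing, origin[1] + (j + 0.5) * spacing)


def cell_neighbors(i, j, n_cols, n_rows):
    """Grid topology: cells that share an edge with cell (i, j)."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n_cols and 0 <= b < n_rows]


print(cell_center(0, 0))            # (0.5, 0.5)
print(cell_neighbors(0, 0, 3, 3))   # [(1, 0), (0, 1)]
```

Less regular grids cannot derive these from indices and need to store geometry, topology, or both explicitly.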


image


SciVis vs InfoVis

  • If we want to visualize a 2D field, an obvious choice for visual encoding would be to keep the spatialization of the data in the visualization.
    • e.g., longitude -> horizontal position, latitude -> vertical position
  • Scientific visualization (SciVis) is concerned with situations where spatial position is given with the dataset.
  • Information visualization (InfoVis) is concerned with situations where the use of space in a visual encoding is chosen by the designer.


Geometry

  • The geometry dataset type specifies information about the shape of items with explicit spatial positions.
    • Items + positions
    • e.g., points, one-dimensional lines or curves, or 2D surfaces or regions, or 3D volumes
  • e.g., cartography

image


Other Dataset Types

  • Set: an unordered group of items
  • List: an ordered group of items
  • Cluster: grouping based on attribute similarity, where items within a cluster are more similar to each other than to ones in another cluster


InfoVis Subfields

  • (Dataset type) + “visualization”
  • Table visualization
  • Network visualization
  • Field visualization (usually, vector or tensor visualization in SciVis)
  • Set visualization
  • Cluster visualization
  • $\dots$


Dataset Availability

  • Any dataset type can be static or dynamic.
  • The default approach to visualization assumes that the entire dataset is available all at once, as a static file (static datasets, offline).
  • Recently, it has become more common to visualize datasets that change over the course of the visualization session (dynamic datasets, online).
  • e.g., monitoring, …


Attribute Type

  • The type of an attribute

image


  • Categorical data do not have an implicit ordering (sometimes called nominal or qualitative).
    • But they often have a hierarchical structure.
    • e.g., movie genres, file types, and city names
    • Operators: == and !=
  • Ordered data have an implicit ordering.
    • Ordinal data have an ordering, but arithmetic is not meaningful (e.g., shirt sizes, ranks)
    • Quantitative data have an ordering, and arithmetic makes sense (e.g., length, stock prices)
    • Operators: ==, !=, >, and <, plus + and - for quantitative data only
  • Quantitative data can be further divided into two types: interval and ratio.
  • In interval data, distances are meaningful but there is no absolute zero.
    • e.g., temperature in Celsius or Fahrenheit
    • Multiplication and division do not make sense: 60°C is not twice as hot as 30°C.
  • In ratio data, distances are meaningful and there is an absolute zero.
    • e.g., temperature in Kelvin
    • 60 K is twice as hot as 30 K.


Summary: Attribute Types

  • Categorical (sometimes nominal or qualitative): movie genres, file types, …
    • Operators: ==, !=
  • Ordinal: shirt sizes, ranks, …
    • Operators: ==, !=, <, >
  • Interval: temperature in Celsius, …
    • Operators: ==, !=, <, >, +, -
  • Ratio: temperature in Kelvin, number of people, …
    • Operators: ==, !=, <, >, +, -, *, /
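The summary above can be written down as a small lookup table, and the temperature example can be checked numerically. This is only an illustrative sketch; `ALLOWED`, `is_meaningful`, and `celsius_to_kelvin` are names invented here.

```python
# Operators that are meaningful for each attribute type, as summarized above.
ALLOWED = {
    "categorical": {"==", "!="},
    "ordinal": {"==", "!=", "<", ">"},
    "interval": {"==", "!=", "<", ">", "+", "-"},
    "ratio": {"==", "!=", "<", ">", "+", "-", "*", "/"},
}


def is_meaningful(attribute_type, operator):
    """Return True if the operator is meaningful for the attribute type."""
    return operator in ALLOWED[attribute_type]


def celsius_to_kelvin(celsius):
    return celsius + 273.15


print(is_meaningful("ordinal", "+"))   # False
print(is_meaningful("ratio", "/"))     # True

# 60°C looks like "twice" 30°C, but on the Kelvin scale, which has an
# absolute zero, the ratio is only about 1.099.
ratio_kelvin = celsius_to_kelvin(60) / celsius_to_kelvin(30)
print(round(ratio_kelvin, 3))          # 1.099
```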


  • For ordered data, we can consider the ordering direction.
    • i.e., where is the origin?
  • Sequential: there is a homogeneous range from a minimum to a maximum value, such as height.
  • Diverging: data can be deconstructed into two sequences pointing in opposite directions that meet at a common zero point, such as elevation.
  • Cyclic: the values wrap around back to a starting point rather than continuing to increase indefinitely, such as the day of the week.
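Cyclic attributes need wrap-around arithmetic. As a small sketch using the day-of-week example, with days encoded as 0 (Monday) through 6 (Sunday) by assumption, the distance between two days can be computed as follows; `cyclic_distance` is a hypothetical helper.

```python
def cyclic_distance(a, b, period=7):
    """Shortest distance between two values on a cycle of the given period."""
    difference = abs(a - b) % period
    return min(difference, period - difference)


MONDAY, TUESDAY, FRIDAY, SUNDAY = 0, 1, 4, 6

# On a sequential scale Monday and Sunday would be 6 apart,
# but on a cycle they are adjacent.
print(cyclic_distance(MONDAY, SUNDAY))    # 1
print(cyclic_distance(TUESDAY, FRIDAY))   # 3
```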


  • Color schemes for ordering directions

image


Semantics

  • Two types of attribute semantics: key and value
  • A key attribute acts as an index that is used to look up value attributes.
    • Key: your student ID
    • Value: your name
    • Keys are sometimes called independent attributes or dimensions, and values are sometimes called dependent attributes or measures.
  • Types and semantics are cross-cutting.
    • A categorical attribute can be a value attribute, and a quantitative attribute can be a key attribute.


image


  • Name is a categorical attribute that might appear to be a reasonable key at first.
  • But it is not a good choice, since two people (Amy) have the same name.
  • The quantitative attribute Age and the ordinal attribute Shirt Size contain many duplicates, so they are not good choices either.
  • ID can serve as a key attribute.
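The reasoning above amounts to a uniqueness test on each column. In the sketch below, the rows are hypothetical stand-ins for the table in the figure (only the duplicated name Amy is taken from the text), and `can_be_key` is a made-up helper.

```python
# Hypothetical rows; the real values are in the figure above.
people = [
    {"ID": 1, "Name": "Amy", "Age": 8, "Shirt Size": "S"},
    {"ID": 2, "Name": "Basil", "Age": 7, "Shirt Size": "S"},
    {"ID": 3, "Name": "Clara", "Age": 9, "Shirt Size": "M"},
    {"ID": 4, "Name": "Amy", "Age": 7, "Shirt Size": "L"},
]


def can_be_key(table, attribute):
    """An attribute can serve as a key only if no value appears twice."""
    values = [row[attribute] for row in table]
    return len(values) == len(set(values))


for attribute in ["Name", "Age", "Shirt Size", "ID"]:
    print(attribute, can_be_key(people, attribute))
# Name False
# Age False
# Shirt Size False
# ID True
```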


  • For multidimensional tables or fields, multiple keys are required to look up an item.
    • The combination of all keys must be unique for each item, even though an individual key attribute may contain duplicates!
    • e.g., the pair (ID, Name)
  • Multidimensional: data have multiple keys.
    • one-dimensional, two-dimensional, …
  • Multivariate: data have multiple values.
    • univariate, bivariate, …
  • Many people do not distinguish these two terms, but they are DIFFERENT!
    • multidimensional (다차원) vs multivariate (다변량)
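The sketch below illustrates both ideas on a made-up grade table with two keys (`student_id`, `course`) and two values (`grade`, `hours`), which makes it two-dimensional and bivariate. Neither key is unique on its own, but the combination is. `is_unique_key` is a hypothetical helper.

```python
def is_unique_key(rows, key_attributes):
    """Check that the combination of key attributes is unique for every row."""
    seen = set()
    for row in rows:
        key = tuple(row[attribute] for attribute in key_attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


grades = [
    {"student_id": 1, "course": "vis", "grade": 90, "hours": 30},
    {"student_id": 1, "course": "db", "grade": 85, "hours": 25},
    {"student_id": 2, "course": "vis", "grade": 70, "hours": 20},
]

print(is_unique_key(grades, ["student_id"]))             # False
print(is_unique_key(grades, ["course"]))                 # False
print(is_unique_key(grades, ["student_id", "course"]))   # True
```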


  • Suppose you measured the temperature of a 3D space.
  • You have three keys: x, y, and z
  • For each cell, you have one value (temperature)
  • So, you have three key attributes (dimensions) and one value attribute for each cell!
    • Three-dimensional univariate dataset!
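A tiny version of such a dataset can be sketched as a mapping from three keys to one value. The 2 × 2 × 2 grid and the temperature formula below are made up purely for illustration.

```python
# Keys: (x, y, z) cell indices. Value: one temperature per cell.
temperature = {
    (x, y, z): 20.0 + x
    for x in range(2)
    for y in range(2)
    for z in range(2)
}

number_of_keys = len(next(iter(temperature)))   # three keys: three-dimensional
print(number_of_keys)          # 3
print(len(temperature))        # 8 cells
print(temperature[(1, 0, 1)])  # 21.0, a single value: univariate
```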


image


  • Keys: one-dimensional, two-dimensional, three-dimensional, …, multidimensional
  • Values: univariate (scalar), bivariate (vector), trivariate (tensor), …, multivariate
  • Measuring the wind direction at some locations in a region: a two-dimensional (lat and long) vector field (the direction of the wind)
  • Can you imagine a 4-dimensional univariate dataset?


Summary: Data Abstraction

image
