
image


Bayesian Networks


Introduction


Boolean Random Variables

  • We deal with the simplest type of random variables – Boolean ones
  • Take the values true or false
  • Think of the event as occurring or not occurring
  • Examples (Let A be a Boolean random variable):

    • A = Getting heads on a coin flip
    • A = It will rain today
    • A = There is a typo in these slides


Probability: Random Variables

  • A random variable is the basic element of probability
  • It refers to an event whose outcome carries some degree of uncertainty
  • For example, the random variable A could be the event of getting heads on a coin flip


Bayesian Networks

  • Bayesian networks help us reason with uncertainty
  • In the opinion of many AI researchers, Bayesian networks are the most significant contribution in AI in the last 10 years
  • They are used in many applications, e.g.:
    • Spam filtering / text mining
    • Speech recognition
    • Robotics
    • Diagnostic systems


Probability

  • We will write P(A = true) to mean the probability that A = true.
  • What is probability? It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions

image

The sum of the red and blue areas is 1


Conditional Probability

  • $P(A = true \mid B = true)$: out of all the outcomes in which B = true, the fraction that also have A = true
  • Read this as: “Probability of A conditioned on B” or “Probability of A given B”


image


Joint Probability

  • We will write P(A = true, B = true) to mean “the probability of A = true and B = true”
  • Notice that:


image


The prior (unconditional) probability of an event a is the degree of belief accorded to it in the absence of any other information: P(Suit = hearts) = P(hearts) = 0.25
The posterior (conditional) probability of a is the degree of belief we have in a, given that we know b: P(a|b). For example, P(hearts | redCard) = 0.5
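Both card probabilities can be checked by brute-force enumeration of a deck. This is a sketch; the suit/rank encoding below is my own, not from the slides:

```python
from itertools import product

# Enumerate a standard 52-card deck: 4 suits x 13 ranks.
suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = list(range(1, 14))
deck = list(product(suits, ranks))

# Prior: P(hearts), with no other information.
p_hearts = sum(1 for s, _ in deck if s == "hearts") / len(deck)

# Posterior: P(hearts | redCard) -- restrict to red cards, then count hearts.
red = [(s, r) for s, r in deck if s in ("hearts", "diamonds")]
p_hearts_given_red = sum(1 for s, _ in red if s == "hearts") / len(red)

print(p_hearts)            # 0.25
print(p_hearts_given_red)  # 0.5
```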


  • Chain rule:
    • X1 (student ID): 2015
    • X2 (major): IT_Consulting
    • X3 (took the AI course): yes -> P(X1,X2,X3) = P(X3|X1,X2) P(X2|X1) P(X1)
  • Marginalization:
    • X1 (took the AI course): yes
    • X2 (major) -> P(X1) = P(X1, X2=Information_Security) + P(X1, X2=IT_Consulting) + P(X1, X2=Big_Data)
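The marginalization step above can be sketched on a toy joint table over X1 (took the AI course) and X2 (major). The probabilities below are invented for illustration and only need to sum to 1:

```python
# Hypothetical joint distribution P(X1, X2); values are made up.
joint = {
    ("yes", "Information_Security"): 0.20,
    ("yes", "IT_Consulting"):        0.30,
    ("yes", "Big_Data"):             0.10,
    ("no",  "Information_Security"): 0.10,
    ("no",  "IT_Consulting"):        0.20,
    ("no",  "Big_Data"):             0.10,
}

# Marginalization: P(X1=yes) = sum over all values of X2.
p_x1_yes = sum(p for (x1, _), p in joint.items() if x1 == "yes")
print(round(p_x1_yes, 2))  # 0.6

# Chain rule check: P(X1, X2) = P(X2 | X1) * P(X1).
p_x2_given_x1 = joint[("yes", "IT_Consulting")] / p_x1_yes
assert abs(p_x2_given_x1 * p_x1_yes - joint[("yes", "IT_Consulting")]) < 1e-12
```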


Joint Probability Distribution

  • Joint probabilities can be between any number of variables

e.g. $P(A = true, B = true, C = true)$

  • For each combination of variables, we need to say how probable that combination is
  • The probabilities of these combinations need to sum to 1


image


  • Once you have the joint probability distribution, you can calculate any probability involving A, B, and C
  • Note: May need to use marginalization and Bayes rule

  • Examples of things you can compute:
    • P(A=true) = sum of P(A,B,C) in rows with A=true
    • P(A=true, B=true | C=true) = P(A=true, B=true, C=true) / P(C=true)
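Both computations fall out of a literal reading of the table. A sketch with an invented eight-entry joint distribution (the numbers are illustrative and sum to 1):

```python
from itertools import product

# Hypothetical joint distribution over three Boolean variables A, B, C.
probs = [0.10, 0.05, 0.20, 0.05, 0.15, 0.10, 0.25, 0.10]
joint = dict(zip(product([True, False], repeat=3), probs))

# Marginalization: P(A=true) = sum of the rows with A=true.
p_a = sum(p for (a, b, c), p in joint.items() if a)

# Conditioning: P(A=true, B=true | C=true) = P(A,B,C all true) / P(C=true).
p_c = sum(p for (a, b, c), p in joint.items() if c)
p_ab_given_c = joint[(True, True, True)] / p_c

print(round(p_a, 2))  # 0.4
print(round(p_c, 2))  # 0.7
```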


Problem

  • Lots of entries in the table to fill up!
  • For k Boolean random variables, you need a table of size $2^k$
  • How do we use fewer numbers? Need the concept of independence


Bayesian Network

  • A BN is the marriage between probability theory and graph theory and consists of two parts:
  • Qualitative part
    • Random variables (constituting the nodes in the graph).
    • Directed edges (showing dependencies between nodes).
  • Quantitative part
    • Conditional probability tables (CPTs) quantifying the effect that any parent nodes have on the node in question.

image

  • A Bayesian network is made up of:
      1. A Directed Acyclic Graph
      2. A set of tables, one for each node in the graph

image


image

Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
The parameters of the network are the probabilities in these conditional probability tables (CPTs).


  • Conditional Probability Distribution for C given B

image

  • If you have a Boolean variable with k Boolean parents, this table has $2^{k+1}$ probabilities (but only $2^k$ need to be stored, since each row sums to 1)


Conditional Independence

  • The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2)

image


  • Two important properties:
    1. Encodes the conditional independence relationships between the variables in the graph structure
    2. Is a compact representation of the joint probability distribution over the variables

image
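Property 2 is the factorization P(X1, ..., Xn) = Π P(Xi | Parents(Xi)). A minimal sketch for a three-node chain A -> B -> C, with hypothetical CPT values:

```python
# Chain A -> B -> C, so P(A,B,C) = P(A) * P(B|A) * P(C|B).
# All CPT values below are made up for illustration.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.5, False: 0.5}, False: {True: 0.4, False: 0.6}}

def joint(a, b, c):
    """Joint probability assembled from the CPTs via the chain factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Sanity check: the eight joint entries must sum to 1 (up to float rounding).
total = sum(joint(a, b, c)
            for a in (True, False) for b in (True, False) for c in (True, False))
print(round(total, 6))  # 1.0
```

Note that the chain needs only 2 + 4 + 4 = 10 CPT entries instead of the $2^3$ entries of the full joint table; the savings grow quickly with more variables.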


Inference

  • Using a Bayesian network to compute probabilities is called inference
  • In general, inference involves queries of the form:

image


  • An example of a query would be: P( HasPneumonia = true | HasFever = true, HasCough = true)
  • Note: Even though HasDifficultyBreathing and ChestXrayPositive are in the Bayesian network, they are not given values in the query (i.e. they do not appear either as query variables or evidence variables)
  • They are treated as unobserved variables

image


  • Propositions
  • B: the battery is charged
  • L: the block is liftable
  • M: the arm moves
  • G: the gauge indicates that the battery is charged


image


Causal (top-down) inference

  • L: evidence, M: query node
  • L is a cause of M → causal reasoning
  • P(M|L) = P(M,B|L) + P(M,~B|L) (marginalization)
  • P(M|L) = P(M|B,L)P(B|L) + P(M|~B,L)P(~B|L) (chain rule)
  • P(B|L) = P(B) and P(~B|L) = P(~B), from the structure (Markov condition)
  • P(M|L) = P(M|B,L)P(B) + P(M|~B,L)P(~B) = 0.855
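This derivation can be checked numerically. The CPT values below are assumptions (the actual table is only in the figure), chosen as values that reproduce the stated 0.855 result: P(B) = 0.95, P(M|B,L) = 0.9, P(M|~B,L) = 0.

```python
# Assumed CPT values for the battery/lifting network; the original table is in
# the figure, so treat these numbers as illustrative.
p_b = 0.95                       # P(B): battery is charged
p_m = {(True, True): 0.9,        # P(M | B, L)
       (False, True): 0.0,       # P(M | ~B, L)
       (True, False): 0.05,      # P(M | B, ~L)
       (False, False): 0.0}      # P(M | ~B, ~L)

# Causal inference: condition on L, marginalize out B.
# P(M|L) = P(M|B,L) P(B) + P(M|~B,L) P(~B), using P(B|L) = P(B).
p_m_given_l = p_m[(True, True)] * p_b + p_m[(False, True)] * (1 - p_b)
print(round(p_m_given_l, 3))  # 0.855
```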


How is the Bayesian network created?

  • Get an expert to design it
    • The expert must determine the structure of the Bayesian network; this is best done by modeling the direct causes of a variable as its parents
    • The expert must also determine the values of the CPT entries; these could come from the expert’s informed opinion, an external source (e.g. census information), estimates from data, or a combination of the above
    • This is the hardest part of constructing a BN for a domain expert!
  • Learn it from data
    • This is a much better option but it usually requires a large amount of data
    • This is where Bayesian statistics comes in!


Learning Bayesian Networks from Data

  • Given a data set, can you learn what a Bayesian network with variables A, B, C and D would look like?


image


image


  • Each possible structure contains information about the conditional independence relationships between A, B, C and D
  • We would like a structure that contains conditional independence relationships that are supported by the data
  • Note that we also need to learn the values in the CPTs from data
