[Artificial Intelligence] Bayesian Networks
Introduction
Boolean Random Variables
- We deal with the simplest type of random variables – Boolean ones
- Take the values true or false
- Think of the event as occurring or not occurring
- Examples (Let A be a Boolean random variable):
- A = Getting heads on a coin flip
- A = It will rain today
- A = There is a typo in these slides
Probability: Random Variables
- A random variable is the basic element of probability
- It refers to an event whose outcome is uncertain
- For example, the random variable A could be the event of getting heads on a coin flip
Bayesian Networks
- Bayesian networks help us reason with uncertainty
- In the opinion of many AI researchers, Bayesian networks are the most significant contribution in AI in the last 10 years
- They are used in many applications, e.g.:
- Spam filtering / text mining
- Speech recognition
- Robotics
- Diagnostic systems
Probability
- We will write P(A = true) to mean the probability that A = true.
- What is probability? It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions
(Figure omitted: a diagram in which the red and blue areas, representing the two outcomes, sum to 1.)
Conditional Probability
- $P(A = true | B = true)$ = out of all the outcomes in which B is true, the fraction in which A is also true
- Read this as: “Probability of A conditioned on B” or “Probability of A given B”
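For example, with a fair six-sided die: P(roll = 2 | roll is even) = P(roll = 2 and even) / P(even) = (1/6) / (1/2) = 1/3, since out of the three equally likely even outcomes {2, 4, 6}, exactly one is a 2.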
Joint Probability
- We will write P(A = true, B = true) to mean “the probability of A = true and B = true”
- Notice that: P(A = true, B = true) = P(A = true | B = true) P(B = true)
- The prior (unconditional) probability of an event a is the degree of belief accorded to it in the absence of any other information, e.g. P(Suit = hearts) = P(hearts) = 0.25
- The posterior (conditional) probability of a is the degree of belief we have in a, given that we know b: P(a | b), e.g. P(hearts | redCard) = 0.5
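As a quick check, the card numbers above can be verified by counting, here in a small Python sketch (the deck representation is ours, not from the slides):

```python
from fractions import Fraction

# A standard 52-card deck: 13 ranks in each of 4 suits.
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in range(1, 14) for suit in suits]

def prob(event):
    """Relative frequency of an event over the equally likely cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

is_hearts = lambda card: card[1] == "hearts"
is_red = lambda card: card[1] in ("hearts", "diamonds")

print(prob(is_hearts))  # prior P(hearts) = 1/4
# posterior P(hearts | redCard) = P(hearts, redCard) / P(redCard)
print(prob(lambda c: is_hearts(c) and is_red(c)) / prob(is_red))  # 1/2
```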
- Chain rule: P(X1, ..., Xn) = P(Xn | X1, ..., Xn-1) ... P(X2 | X1) P(X1)
- Example: X1 (student ID) = 2015, X2 (major) = IT_Consulting, X3 (took the AI course) = yes
- P(X1, X2, X3) = P(X3 | X1, X2) P(X2 | X1) P(X1)
- Marginalization: the marginal P(X1) is obtained by summing the joint over all values of X2
- Example: X1 (took the AI course) = yes, X2 (major) with values Information_Security, IT_Consulting, Big_Data
- P(X1) = P(X1, X2 = Information_Security) + P(X1, X2 = IT_Consulting) + P(X1, X2 = Big_Data)
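A minimal Python sketch of both identities; all probability values below are made up for illustration and are not from the slides:

```python
# Chain rule on the student example (hypothetical numbers).
P_X1 = 0.6               # P(X1 = 2015)
P_X2_given_X1 = 0.5      # P(X2 = IT_Consulting | X1 = 2015)
P_X3_given_X1_X2 = 0.8   # P(X3 = yes | X1 = 2015, X2 = IT_Consulting)

# P(X1, X2, X3) = P(X3 | X1, X2) * P(X2 | X1) * P(X1)
print(P_X3_given_X1_X2 * P_X2_given_X1 * P_X1)   # 0.24

# Marginalization: sum the joint over every value of the major X2.
joint_with_major = {"Information_Security": 0.10,  # hypothetical P(X1, X2=...)
                    "IT_Consulting": 0.24,
                    "Big_Data": 0.16}
print(sum(joint_with_major.values()))             # P(X1) = 0.5
```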
Joint Probability Distribution
- Joint probabilities can be between any number of variables
e.g. $P(A = true, B = true, C = true)$
- For each combination of variables, we need to say how probable that combination is
- The probabilities of these combinations need to sum to 1
- Once you have the joint probability distribution, you can calculate any probability involving A, B, and C
- Note: you may need to use marginalization and Bayes' rule
- Examples of things you can compute:
- P(A = true) = the sum of the joint entries P(A, B, C) over all rows where A = true
- P(A = true, B = true | C = true) = P(A = true, B = true, C = true) / P(C = true)
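The sketch below stores a full joint distribution over A, B, C as a table and computes both quantities; the eight probabilities are made up, but any assignment that sums to 1 works:

```python
# Full joint distribution over three Boolean variables, keyed by (A, B, C).
joint = {
    (True,  True,  True):  0.10, (True,  True,  False): 0.05,
    (True,  False, True):  0.15, (True,  False, False): 0.10,
    (False, True,  True):  0.20, (False, True,  False): 0.05,
    (False, False, True):  0.25, (False, False, False): 0.10,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # must sum to 1

# Marginalization: P(A = true) = sum of rows where A is true.
print(sum(p for (a, b, c), p in joint.items() if a))   # 0.40

# Conditioning: P(A = true, B = true | C = true)
#             = P(A = true, B = true, C = true) / P(C = true)
p_c = sum(p for (a, b, c), p in joint.items() if c)
print(joint[(True, True, True)] / p_c)                 # 0.10 / 0.70
```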
Problem
- Lots of entries in the table to fill in!
- For k Boolean random variables, you need a table of size $2^k$ (for k = 30, that is already more than a billion entries)
- How do we use fewer numbers? We need the concept of independence
Bayesian Network
- A BN is the marriage between probability theory and graph theory and consists of two parts:
- Qualitative part
- Random variables (constituting the nodes in the graph).
- Directed edges (showing dependencies between nodes).
- Quantitative part
- Conditional probability tables (CPTs) quantifying the effect that any parent nodes have on the node in question.
- A Bayesian network is made up of:
- A directed acyclic graph (DAG)
- A set of tables, one for each node in the graph
- Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node
- The parameters are the probabilities in these conditional probability tables (CPTs)
- Conditional Probability Distribution for C given B
- If you have a Boolean variable with k Boolean parents, this table has $2^{k+1}$ probabilities (but only $2^k$ need to be stored)
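As a sketch of how the two parts fit together, here is a toy chain A → B → C in plain Python (the structure and CPT numbers are hypothetical); each CPT stores only P(node = true | parents), since the false case is the complement:

```python
# Qualitative part: the DAG, as a parent list per node.
parents = {"A": [], "B": ["A"], "C": ["B"]}

# Quantitative part: one CPT per node, mapping parent values to
# P(node = true | parents). A node with k Boolean parents needs 2^k entries.
cpt = {
    "A": {(): 0.3},                         # P(A = true)
    "B": {(True,): 0.8, (False,): 0.1},     # P(B = true | A)
    "C": {(True,): 0.9, (False,): 0.2},     # P(C = true | B)
}

def p_node(node, value, assignment):
    """P(node = value | parent values read from a full assignment dict)."""
    key = tuple(assignment[p] for p in parents[node])
    return cpt[node][key] if value else 1.0 - cpt[node][key]

# Example: P(C = true | B = false) under the hypothetical CPTs above.
print(p_node("C", True, {"A": True, "B": False}))   # 0.2
```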
Conditional Independence
- The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2)
- Two important properties:
- Encodes the conditional independence relationships between the variables in the graph structure
- Is a compact representation of the joint probability distribution over the variables
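Concretely, the second property is the factorization $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i | Parents(X_i))$: a full joint over n Boolean variables needs $2^n - 1$ independent numbers, while the factorized form needs only $\sum_i 2^{k_i}$, where $k_i$ is the number of parents of $X_i$.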
Inference
- Using a Bayesian network to compute probabilities is called inference
- In general, inference involves queries of the form P(X | E), where X is the set of query variables and E is the set of observed evidence variables
- An example of a query would be: P( HasPneumonia = true | HasFever = true, HasCough = true)
- Note: Even though HasDifficultyBreathing and ChestXrayPositive are in the Bayesian network, they are not given values in the query (i.e. they do not appear either as query variables or evidence variables)
- They are treated as unobserved variables
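One way to answer such a query exactly is inference by enumeration: sum the joint distribution over every unobserved variable. A minimal generic sketch (representing the joint as a table is our choice, not the slides'):

```python
def query(joint, var_names, query_var, evidence):
    """P(query_var = true | evidence), summing out unobserved variables.

    joint: dict mapping tuples of Boolean values (ordered as var_names)
    to probabilities; evidence: dict of observed variable -> value.
    """
    q = var_names.index(query_var)
    num = den = 0.0
    for values, p in joint.items():
        if all(values[var_names.index(v)] == val for v, val in evidence.items()):
            den += p                 # rows consistent with the evidence
            if values[q]:
                num += p             # ...in which the query variable is true
    return num / den

# Example: query(joint, ["A", "B", "C"], "A", {"C": True}) computes
# P(A = true | C = true) by summing out the unobserved B.
```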
- Propositions
- B: the battery is charged
- L: the block is liftable
- M: the arm moves
- G: the gauge indicates that the battery is charged
Causal (top down) inference
- L: evidence, M: query node
- L: cause of M → causal reasoning
- P(M | L) = P(M, B | L) + P(M, ~B | L)  (marginalization)
- P(M | L) = P(M | B, L) P(B | L) + P(M | ~B, L) P(~B | L)  (chain rule)
- P(B | L) = P(B), from the network structure (Markov condition)
- P(~B | L) = P(~B)
- P(M | L) = P(M | B, L) P(B) + P(M | ~B, L) P(~B) = 0.855
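The slide states only the result 0.855; the sketch below reproduces it, assuming the CPT values from the classic lifting-robot example (P(B) = 0.95, P(M | B, L) = 0.9, P(M | ~B, L) = 0), which are not all given above but are consistent with that result:

```python
P_B = 0.95              # P(B): battery is charged (assumed CPT value)
P_M_given_B_L = 0.90    # P(M | B, L) (assumed CPT value)
P_M_given_nB_L = 0.0    # P(M | ~B, L) (assumed CPT value)

# P(M | L) = P(M | B, L) P(B) + P(M | ~B, L) P(~B)
print(P_M_given_B_L * P_B + P_M_given_nB_L * (1 - P_B))   # 0.855
```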
How is the Bayesian network created?
- Get an expert to design it
- The expert must determine the structure of the Bayesian network
- This is best done by modeling the direct causes of a variable as its parents
- The expert must also determine the values of the CPT entries
- These values could come from the expert's informed opinion
- Or from an external source, e.g. census information
- Or they are estimated from data
- Or a combination of the above
- This is the hardest part for a domain expert when it comes to construction of a BN!
- Learn it from data
- This is a much better option but it usually requires a large amount of data
- This is where Bayesian statistics comes in!
Learning Bayesian Networks from Data
- Given a data set, can you learn what a Bayesian network with variables A, B, C and D would look like?
- Each possible structure contains information about the conditional independence relationships between A, B, C and D
- We would like a structure that contains conditional independence relationships that are supported by the data
- Note that we also need to learn the values in the CPTs from data
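For the CPT values, the simplest estimator is maximum likelihood by counting: P(B = true | A = a) is just the frequency of B = true among records where A = a. A minimal sketch on made-up data:

```python
records = [  # hypothetical (A, B) observations
    (True, True), (True, True), (True, False),
    (False, False), (False, True), (False, False),
]

def cpt_entry(records, a_value):
    """Estimate P(B = true | A = a_value) by relative frequency."""
    matching = [b for a, b in records if a == a_value]
    return sum(matching) / len(matching)

print(cpt_entry(records, True))   # 2/3: P(B = true | A = true)
print(cpt_entry(records, False))  # 1/3: P(B = true | A = false)
```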