[Artificial Intelligence] Bayesian Networks
Introduction
Boolean Random Variables
- We deal with the simplest type of random variables – Boolean ones
- Take the values true or false
- Think of the event as occurring or not occurring
- Examples (Let A be a Boolean random variable):
- A = Getting heads on a coin flip
- A = It will rain today
- A = There is a typo in these slides
Probability: Random Variables
- A random variable is the basic element of probability
- It refers to an event whose outcome is uncertain
- For example, the random variable A could be the event of getting heads on a coin flip
Bayesian Networks
- Bayesian networks help us reason with uncertainty
- In the opinion of many AI researchers, Bayesian networks are the most significant contribution in AI in the last 10 years
- They are used in many applications, e.g.:
- Spam filtering / text mining
- Speech recognition
- Robotics
- Diagnostic systems
Probability
- We will write P(A = true) to mean the probability that A = true.
- What is probability? It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions
(Figure omitted: a diagram in which the red and blue areas, representing the two outcomes, sum to 1.)
Conditional Probability
- $P(A = true | B = true)$ = out of all the outcomes in which B is true, the fraction in which A is also true
- Read this as: “Probability of A conditioned on B” or “Probability of A given B”
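For example, with a fair six-sided die: P(roll = 2 | roll is even) = P(roll = 2 and even) / P(even) = (1/6) / (1/2) = 1/3, since out of the three equally likely even outcomes {2, 4, 6}, exactly one is a 2.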
Joint Probability
- We will write P(A = true, B = true) to mean “the probability of A = true and B = true”
- Notice that: P(A = true, B = true) = P(A = true | B = true) P(B = true)
- The prior (unconditional) probability of an event a is the degree of belief accorded to it in the absence of any other information, e.g. P(Suit = hearts) = P(hearts) = 0.25
- The posterior (conditional) probability of a is the degree of belief we have in a, given that we know b: P(a | b), e.g. P(hearts | redCard) = 0.5
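As a quick check, the card numbers above can be verified by counting, here in a small Python sketch (the deck representation is ours, not from the slides):

```python
from fractions import Fraction

# A standard 52-card deck: 13 ranks in each of 4 suits.
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in range(1, 14) for suit in suits]

def prob(event):
    """Relative frequency of an event over the equally likely cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

is_hearts = lambda card: card[1] == "hearts"
is_red = lambda card: card[1] in ("hearts", "diamonds")

print(prob(is_hearts))  # prior P(hearts) = 1/4
# posterior P(hearts | redCard) = P(hearts, redCard) / P(redCard)
print(prob(lambda c: is_hearts(c) and is_red(c)) / prob(is_red))  # 1/2
```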
- Chain rule: P(X1, ..., Xn) = P(Xn | X1, ..., Xn-1) ... P(X2 | X1) P(X1)
- Example: X1 (student ID) = 2015, X2 (major) = IT_Consulting, X3 (took the AI course) = yes
- P(X1, X2, X3) = P(X3 | X1, X2) P(X2 | X1) P(X1)
- Marginalization: the marginal P(X1) is obtained by summing the joint over all values of X2
- Example: X1 (took the AI course) = yes, X2 (major) with values Information_Security, IT_Consulting, Big_Data
- P(X1) = P(X1, X2 = Information_Security) + P(X1, X2 = IT_Consulting) + P(X1, X2 = Big_Data)
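A minimal Python sketch of both identities; all probability values below are made up for illustration and are not from the slides:

```python
# Chain rule on the student example (hypothetical numbers).
P_X1 = 0.6               # P(X1 = 2015)
P_X2_given_X1 = 0.5      # P(X2 = IT_Consulting | X1 = 2015)
P_X3_given_X1_X2 = 0.8   # P(X3 = yes | X1 = 2015, X2 = IT_Consulting)

# P(X1, X2, X3) = P(X3 | X1, X2) * P(X2 | X1) * P(X1)
print(P_X3_given_X1_X2 * P_X2_given_X1 * P_X1)   # 0.24

# Marginalization: sum the joint over every value of the major X2.
joint_with_major = {"Information_Security": 0.10,  # hypothetical P(X1, X2=...)
                    "IT_Consulting": 0.24,
                    "Big_Data": 0.16}
print(sum(joint_with_major.values()))             # P(X1) = 0.5
```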
Joint Probability Distribution
- Joint probabilities can be between any number of variables
e.g. $P(A = true, B = true, C = true)$
- For each combination of variables, we need to say how probable that combination is
- The probabilities of these combinations need to sum to 1
- Once you have the joint probability distribution, you can calculate any probability involving A, B, and C
- Note: you may need to use marginalization and Bayes' rule
- Examples of things you can compute:
- P(A = true) = the sum of the joint entries P(A, B, C) over all rows where A = true
- P(A = true, B = true | C = true) = P(A = true, B = true, C = true) / P(C = true)
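The sketch below stores a full joint distribution over A, B, C as a table and computes both quantities; the eight probabilities are made up, but any assignment that sums to 1 works:

```python
# Full joint distribution over three Boolean variables, keyed by (A, B, C).
joint = {
    (True,  True,  True):  0.10, (True,  True,  False): 0.05,
    (True,  False, True):  0.15, (True,  False, False): 0.10,
    (False, True,  True):  0.20, (False, True,  False): 0.05,
    (False, False, True):  0.25, (False, False, False): 0.10,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # must sum to 1

# Marginalization: P(A = true) = sum of rows where A is true.
print(sum(p for (a, b, c), p in joint.items() if a))   # 0.40

# Conditioning: P(A = true, B = true | C = true)
#             = P(A = true, B = true, C = true) / P(C = true)
p_c = sum(p for (a, b, c), p in joint.items() if c)
print(joint[(True, True, True)] / p_c)                 # 0.10 / 0.70
```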
Problem
- Lots of entries in the table to fill in!
- For k Boolean random variables, you need a table of size $2^k$ (for k = 30, that is already more than a billion entries)
- How do we use fewer numbers? We need the concept of independence
Bayesian Network
- A BN is the marriage between probability theory and graph theory and consists of two parts:
- Qualitative part
- Random variables (constituting the nodes in the graph).
- Directed edges (showing dependencies between nodes).
- Quantitative part
- Conditional probability tables (CPTs) quantifying the effect that any parent nodes have on the node in question.
- A Bayesian network is made up of:
- A directed acyclic graph (DAG)
- A set of tables, one for each node in the graph
- Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node
- The parameters are the probabilities in these conditional probability tables (CPTs)
- Conditional Probability Distribution for C given B
- If you have a Boolean variable with k Boolean parents, this table has $2^{k+1}$ probabilities (but only $2^k$ need to be stored)
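As a sketch of how the two parts fit together, here is a toy chain A → B → C in plain Python (the structure and CPT numbers are hypothetical); each CPT stores only P(node = true | parents), since the false case is the complement:

```python
# Qualitative part: the DAG, as a parent list per node.
parents = {"A": [], "B": ["A"], "C": ["B"]}

# Quantitative part: one CPT per node, mapping parent values to
# P(node = true | parents). A node with k Boolean parents needs 2^k entries.
cpt = {
    "A": {(): 0.3},                         # P(A = true)
    "B": {(True,): 0.8, (False,): 0.1},     # P(B = true | A)
    "C": {(True,): 0.9, (False,): 0.2},     # P(C = true | B)
}

def p_node(node, value, assignment):
    """P(node = value | parent values read from a full assignment dict)."""
    key = tuple(assignment[p] for p in parents[node])
    return cpt[node][key] if value else 1.0 - cpt[node][key]

# Example: P(C = true | B = false) under the hypothetical CPTs above.
print(p_node("C", True, {"A": True, "B": False}))   # 0.2
```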
Conditional Independence
- The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2)
- Two important properties:
- Encodes the conditional independence relationships between the variables in the graph structure
- Is a compact representation of the joint probability distribution over the variables
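Concretely, the second property is the factorization $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i | Parents(X_i))$: a full joint over n Boolean variables needs $2^n - 1$ independent numbers, while the factorized form needs only $\sum_i 2^{k_i}$, where $k_i$ is the number of parents of $X_i$.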
Inference
- Using a Bayesian network to compute probabilities is called inference
- In general, inference involves queries of the form P(X | E), where X is the set of query variables and E is the set of observed evidence variables
- An example of a query would be: P( HasPneumonia = true | HasFever = true, HasCough = true)
- Note: Even though HasDifficultyBreathing and ChestXrayPositive are in the Bayesian network, they are not given values in the query (i.e. they do not appear either as query variables or evidence variables)
- They are treated as unobserved variables
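One way to answer such a query exactly is inference by enumeration: sum the joint distribution over every unobserved variable. A minimal generic sketch (representing the joint as a table is our choice, not the slides'):

```python
def query(joint, var_names, query_var, evidence):
    """P(query_var = true | evidence), summing out unobserved variables.

    joint: dict mapping tuples of Boolean values (ordered as var_names)
    to probabilities; evidence: dict of observed variable -> value.
    """
    q = var_names.index(query_var)
    num = den = 0.0
    for values, p in joint.items():
        if all(values[var_names.index(v)] == val for v, val in evidence.items()):
            den += p                 # rows consistent with the evidence
            if values[q]:
                num += p             # ...in which the query variable is true
    return num / den

# Example: query(joint, ["A", "B", "C"], "A", {"C": True}) computes
# P(A = true | C = true) by summing out the unobserved B.
```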
- Propositions
- B: the battery is charged
- L: the block is liftable
- M: the arm moves
- G: the gauge indicates that the battery is charged
Causal (top down) inference
- L: evidence, M: query node
- L: cause of M → causal reasoning
- P(M | L) = P(M, B | L) + P(M, ~B | L)  (marginalization)
- P(M | L) = P(M | B, L) P(B | L) + P(M | ~B, L) P(~B | L)  (chain rule)
- P(B | L) = P(B), from the network structure (Markov condition)
- P(~B | L) = P(~B)
- P(M | L) = P(M | B, L) P(B) + P(M | ~B, L) P(~B) = 0.855
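The slide states only the result 0.855; the sketch below reproduces it, assuming the CPT values from the classic lifting-robot example (P(B) = 0.95, P(M | B, L) = 0.9, P(M | ~B, L) = 0), which are not all given above but are consistent with that result:

```python
P_B = 0.95              # P(B): battery is charged (assumed CPT value)
P_M_given_B_L = 0.90    # P(M | B, L) (assumed CPT value)
P_M_given_nB_L = 0.0    # P(M | ~B, L) (assumed CPT value)

# P(M | L) = P(M | B, L) P(B) + P(M | ~B, L) P(~B)
print(P_M_given_B_L * P_B + P_M_given_nB_L * (1 - P_B))   # 0.855
```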
How is the Bayesian network created?
- Get an expert to design it
- The expert must determine the structure of the Bayesian network
- This is best done by modeling the direct causes of a variable as its parents
- The expert must also determine the values of the CPT entries
- These values could come from the expert's informed opinion
- Or from an external source, e.g. census information
- Or they are estimated from data
- Or a combination of the above
- This is the hardest part for a domain expert when it comes to construction of a BN!
- Learn it from data
- This is a much better option but it usually requires a large amount of data
- This is where Bayesian statistics comes in!
Learning Bayesian Networks from Data
- Given a data set, can you learn what a Bayesian network with variables A, B, C and D would look like?
- Each possible structure contains information about the conditional independence relationships between A, B, C and D
- We would like a structure that contains conditional independence relationships that are supported by the data
- Note that we also need to learn the values in the CPTs from data
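For the CPT values, the simplest estimator is maximum likelihood by counting: P(B = true | A = a) is just the frequency of B = true among records where A = a. A minimal sketch on made-up data:

```python
records = [  # hypothetical (A, B) observations
    (True, True), (True, True), (True, False),
    (False, False), (False, True), (False, False),
]

def cpt_entry(records, a_value):
    """Estimate P(B = true | A = a_value) by relative frequency."""
    matching = [b for a, b in records if a == a_value]
    return sum(matching) / len(matching)

print(cpt_entry(records, True))   # 2/3: P(B = true | A = true)
print(cpt_entry(records, False))  # 1/3: P(B = true | A = false)
```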