Overall, Bayesian networks represent a probability distribution as a product of smaller, local conditional probability distributions. This compact representation is only possible because some variables are assumed to be (conditionally) independent of others.
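As a worked instance of this factorization, written in standard notation (the symbols \( x_i \) for a variable and \( \mathrm{pa}(x_i) \) for its parents in the graph are assumptions introduced here for illustration):

\[ p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}(x_i)) \]

For three binary variables arranged as \( A \rightarrow B \rightarrow C \), this gives \( p(a, b, c) = p(a)\, p(b \mid a)\, p(c \mid b) \), which needs \( 1 + 2 + 2 = 5 \) free parameters, compared with \( 2^3 - 1 = 7 \) for an unrestricted joint table.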
These independencies can be read off the directed graph by examining three types of structure among three nodes:
- Common parent
- Cascade
- V-structure
Consider a Bayesian network G with three nodes: A, B, and C. G then has essentially three possible structures, each of which leads to different independence assumptions. In the common parent case, G has the form \( A \leftarrow B \rightarrow C \). If B is observed, then \( A \perp C \mid B \). If B is unobserved, then in general \( A \not\perp C \). This is because B carries all the information that A and C share, so once B is observed, learning A tells us nothing further about C.
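In the cascade case, G has the form \( A \rightarrow B \rightarrow C \). If B is observed, then \( A \perp C \mid B \). If B is unobserved, then in general \( A \not\perp C \). The explanation mirrors the common parent case: B holds all the information that determines the outcome of C, so once B is observed, A has no further influence on C.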
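The common parent and cascade claims can be checked numerically on small tables. The following is a minimal sketch, not a general-purpose tool: the helper names (`bern`, `marginal`, `independent`) and all probability values are hypothetical choices made for illustration, and every variable is assumed to be binary.

```python
from itertools import product

def bern(p, x):
    """Probability that a binary variable with P(x = 1) = p takes the value x."""
    return p if x == 1 else 1.0 - p

def marginal(joint, keep):
    """Sum a joint table {(a, b, c): prob} down to the index positions in `keep`."""
    out = {}
    for assignment, prob in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out

def independent(joint, i, j, given=(), tol=1e-9):
    """Check X_i independent of X_j given X_given.

    Tests p(i, j, g) * p(g) == p(i, g) * p(j, g) for every assignment.
    """
    g = tuple(given)
    p_ijg = marginal(joint, (i, j) + g)
    p_ig = marginal(joint, (i,) + g)
    p_jg = marginal(joint, (j,) + g)
    p_g = marginal(joint, g)
    for key, p in p_ijg.items():
        xi, xj, xg = key[0], key[1], key[2:]
        if abs(p * p_g[xg] - p_ig[(xi,) + xg] * p_jg[(xj,) + xg]) > tol:
            return False
    return True

# Index order in every table: 0 = A, 1 = B, 2 = C. All numbers are made up.
# Common parent A <- B -> C: p(a, b, c) = p(b) p(a | b) p(c | b)
common_parent = {
    (a, b, c): bern(0.3, b) * bern(0.9 if b else 0.2, a) * bern(0.7 if b else 0.1, c)
    for a, b, c in product([0, 1], repeat=3)
}
# Cascade A -> B -> C: p(a, b, c) = p(a) p(b | a) p(c | b)
cascade = {
    (a, b, c): bern(0.6, a) * bern(0.8 if a else 0.3, b) * bern(0.7 if b else 0.1, c)
    for a, b, c in product([0, 1], repeat=3)
}

for name, joint in [("common parent", common_parent), ("cascade", cascade)]:
    # First flag: A independent of C given B. Second flag: A independent of C marginally.
    print(name, independent(joint, 0, 2, given=(1,)), independent(joint, 0, 2))
```

With these particular numbers, both structures report independence of A and C once B is given, and dependence when B is left unobserved. Marginal dependence is the typical case rather than a guarantee: special parameter choices (for instance, A not actually depending on B) can make A and C independent even without observing B.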
In the V-structure case (also known as explaining away), G has the form \( A \rightarrow C \leftarrow B \). Observing C couples A and B: \( A \perp B \) if C is unobserved, but \( A \not\perp B \mid C \) if C is observed. For example, let C be a Boolean variable indicating whether the grass is wet, A whether it has rained, and B whether the sprinkler was turned on. If we observe that the grass is wet (C is true) and that the sprinkler was not turned on (B is false), then, assuming rain and the sprinkler are the only possible causes of wet grass, we can deduce that the probability that it has rained (A is true) is one.
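The wet-grass example can be reproduced by brute-force enumeration. This is a sketch under stated assumptions: the priors `P_RAIN` and `P_SPRINKLER` are hypothetical numbers, and the grass is assumed to be wet exactly when it rained or the sprinkler ran (a deterministic OR), which is the assumption under which the deduction in the text holds.

```python
from itertools import product

P_RAIN = 0.2       # hypothetical prior probability of rain (A)
P_SPRINKLER = 0.3  # hypothetical prior probability that the sprinkler ran (B)

def p_wet(rain, sprinkler):
    """Assumed model for C: the grass is wet iff it rained or the sprinkler ran."""
    return 1.0 if (rain or sprinkler) else 0.0

def joint(rain, sprinkler, wet):
    """p(rain, sprinkler, wet) = p(rain) p(sprinkler) p(wet | rain, sprinkler)."""
    pr = P_RAIN if rain else 1 - P_RAIN
    ps = P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    pw = p_wet(rain, sprinkler) if wet else 1 - p_wet(rain, sprinkler)
    return pr * ps * pw

def prob_rain(**evidence):
    """P(rain = True | evidence) by enumeration; evidence keys: 'sprinkler', 'wet'."""
    num = den = 0.0
    for rain, sprinkler, wet in product([False, True], repeat=3):
        world = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # this world contradicts the evidence
        p = joint(rain, sprinkler, wet)
        den += p
        if rain:
            num += p
    return num / den

print(prob_rain())                             # prior belief in rain
print(prob_rain(sprinkler=True))               # unchanged: A and B independent, C unobserved
print(prob_rain(wet=True))                     # wet grass raises the belief in rain
print(prob_rain(wet=True, sprinkler=False))    # rain is the only remaining explanation
print(prob_rain(wet=True, sprinkler=True))     # sprinkler "explains away" the wet grass
```

Once the grass is known to be wet, learning the sprinkler's state changes the belief in rain, which is exactly the dependence \( A \not\perp B \mid C \). With a noisier model for C (for example, grass that can be wet for other reasons), the probability of rain given wet grass and no sprinkler would be high but below one.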
What is d-separation?
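Q and W are d-separated given a set of observed variables O if they are not connected by an active path. An undirected path is active given O if every consecutive triple of variables X, Y, Z on the path takes one of the following forms:
- \( X \leftarrow Y \leftarrow Z \) or \( X \rightarrow Y \rightarrow Z \), with Y unobserved (cascade)
- \( X \leftarrow Y \rightarrow Z \), with Y unobserved (common parent)
- \( X \rightarrow Y \leftarrow Z \), with Y or any of its descendants observed (V-structure)
The accompanying figures give examples.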
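The definition can be turned into a small checker. The sketch below is illustrative only: the function name `d_separated` and the edge-list representation are assumptions made here, and it enumerates every simple path between the two nodes, so it is suitable only for small graphs. It also assumes that both query nodes appear in the edge list and are not themselves observed.

```python
def d_separated(edges, x, y, observed):
    """Return True if x and y are d-separated given `observed` in the DAG `edges`.

    edges: iterable of (parent, child) pairs describing a directed acyclic graph.
    """
    edges = set(edges)
    observed = set(observed)
    nodes = {n for e in edges for n in e}
    children = {n: {c for p, c in edges if p == n} for n in nodes}
    neighbours = {
        n: {c for p, c in edges if p == n} | {p for p, c in edges if c == n}
        for n in nodes
    }

    def descendants(n):
        """All nodes reachable from n by following edges forward."""
        seen, stack = set(), [n]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def triple_active(a, b, c):
        if (a, b) in edges and (c, b) in edges:
            # V-structure a -> b <- c: active iff b or one of its descendants is observed
            return b in observed or bool(descendants(b) & observed)
        # Cascade or common parent: active iff the middle node is unobserved
        return b not in observed

    def path_active(path):
        return all(triple_active(*path[i:i + 3]) for i in range(len(path) - 2))

    def simple_paths(current, path):
        """Yield every undirected path from `current` to y without repeated nodes."""
        if current == y:
            yield path
            return
        for n in neighbours[current]:
            if n not in path:
                yield from simple_paths(n, path + [n])

    return not any(path_active(p) for p in simple_paths(x, [x]))


v_structure = [("A", "C"), ("B", "C")]
print(d_separated(v_structure, "A", "B", set()))   # C unobserved
print(d_separated(v_structure, "A", "B", {"C"}))   # C observed
```

Path enumeration keeps the code close to the definition; for larger graphs a reachability-based algorithm would be the more usual choice.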
Representational power of directed graphs
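For any distribution p, we can build a directed graph G whose independencies all hold in p, i.e. \( I(G) \subseteq I(p) \); a fully connected DAG, which asserts no independencies at all, is a trivial example. But can a directed graph express ALL the independencies of p? That is, given a distribution p, can we always construct a graph G such that \( I(G) = I(p) \), where \( I(G) \) is the set of independence statements implied by d-separation in G and \( I(p) \) is the set of all independencies that hold in p? Unfortunately, no: some distributions have independencies that no directed graph captures exactly.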
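One standard illustration of this negative answer, offered here as an assumption-labelled sketch rather than as the example the text has in mind, is an XOR distribution: X and Y are independent fair coins and Z = X xor Y. The helper names `prob` and `indep` are hypothetical.

```python
from itertools import product

# Joint table over (X, Y, Z): X and Y are fair coins, Z = X xor Y.
joint = {
    (x, y, z): (0.25 if z == (x ^ y) else 0.0)
    for x, y, z in product([0, 1], repeat=3)
}
NAMES = ("x", "y", "z")

def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(
        p for vals, p in joint.items()
        if all(vals[NAMES.index(k)] == v for k, v in fixed.items())
    )

def indep(a, b, given=None):
    """Test whether a is independent of b (optionally given a third variable)."""
    for va, vb, vg in product([0, 1], repeat=3):
        g = {given: vg} if given else {}
        lhs = prob(**{a: va, b: vb}, **g) * prob(**g)
        rhs = prob(**{a: va}, **g) * prob(**{b: vb}, **g)
        if abs(lhs - rhs) > 1e-12:
            return False
    return True

for a, b, c in [("x", "y", "z"), ("x", "z", "y"), ("y", "z", "x")]:
    # First flag: marginal independence. Second flag: independence given the third variable.
    print(a, b, indep(a, b), indep(a, b, given=c))
```

Every pair is marginally independent, yet no pair is independent given the third variable. In a DAG, adjacent nodes are never d-separated, so making all three pairs marginally d-separated would require a graph with no edges; that graph also asserts \( X \perp Y \mid Z \), which does not hold here. Hence no directed graph G satisfies \( I(G) = I(p) \) for this p.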