What is probabilistic graphical modelling?
It is a branch of machine learning that studies the use of probability distributions to describe the real world and to make useful predictions about it. It combines probability theory and graph theory. Probabilistic modelling is widely used to solve problems in a variety of fields such as medicine, natural language processing, vision, and others. My motivation for studying probabilistic graphical modelling is to apply it to the field of natural language processing.
We often use mathematics (in the form of equations) to model the world. The simplest model would be a linear equation of the form y = βᵀx, where y is the outcome variable, x is the vector of known variables that we believe could affect the outcome, and β is the vector of weights. However, the real world is very complicated and there is a significant amount of uncertainty. To deal with this uncertainty, we can instead model the world as a joint probability distribution, p(x, y).
Given the probabilistic model, we can attempt to answer questions like “given that the house costs $100,000, what is the probability that it has three bedrooms?” Probabilistic modelling is important because:
We can’t perfectly predict the future, as we don’t have enough knowledge of the world and the world itself is stochastic.
We need to assess the confidence of our predictions.
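As a rough illustration of the house-price question above, here is a minimal sketch that answers it from a small joint table. The table, its price buckets, and all of its numbers are made up for this example; only the rule p(bedrooms | price) = p(price, bedrooms) / p(price) is standard.

```python
# Hypothetical joint distribution p(price, bedrooms).
# The numbers are invented for illustration and sum to 1.
joint = {
    ("100k", 2): 0.20,
    ("100k", 3): 0.15,
    ("100k", 4): 0.05,
    ("200k", 2): 0.10,
    ("200k", 3): 0.25,
    ("200k", 4): 0.25,
}


def conditional(joint, price, bedrooms):
    """Return p(bedrooms | price) = p(price, bedrooms) / p(price)."""
    # Marginalise out the number of bedrooms to get p(price).
    p_price = sum(p for (pr, _), p in joint.items() if pr == price)
    return joint[(price, bedrooms)] / p_price


p_three_given_100k = conditional(joint, "100k", 3)
print(round(p_three_given_100k, 3))
```

The answer is a probability rather than a single guess, which is what lets us attach a level of confidence to the prediction.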
Essentially, probabilistic modelling uses probability and graph theory to give us a systematic way to model uncertainty and answer interesting questions such as:
What are the trade-offs between computational complexity and the richness of a probabilistic model?
What is the best model for inferring facts about the future, given a fixed dataset and computational budget?
How does one combine prior knowledge with observed evidence in a principled way to make predictions?
How can we rigorously analyse whether A is the cause of B, or vice versa?
Why is probabilistic modelling difficult?
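One of the main challenges is that probability distributions are inherently exponentially sized objects: a joint distribution over n binary variables has to assign a probability to each of the 2^n possible configurations. This makes even a simple application such as spam classification, where each variable might indicate whether a particular word appears in an email, very difficult from both a computational and a statistical point of view.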
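To get a feel for how quickly this grows, the short sketch below counts the free parameters of an unrestricted joint distribution over n binary variables. The function name is hypothetical; the count 2^n - 1 follows from having 2^n configurations whose probabilities must sum to 1.

```python
def full_joint_parameters(n):
    """Free parameters of an unrestricted joint over n binary variables.

    There are 2**n configurations, and the last probability is fixed
    by the constraint that they all sum to 1.
    """
    return 2 ** n - 1


for n in (10, 20, 30):
    print(n, full_joint_parameters(n))
```

Even at n = 30 the table already has over a billion entries, and a realistic vocabulary contains thousands of words.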
The only way we can manipulate such distributions is by making simplifying assumptions about their structure. The main simplifying assumption that we make is that of conditional independence among the variables. A well-known instance is the Naïve Bayes assumption, which treats the input variables as independent of each other given the outcome y. It is called naïve because this rarely holds exactly in practice! However, given this assumption, we can write the joint probability as a product of factors: p(y, x1, ..., xn) = p(y) p(x1 | y) ... p(xn | y). The entire distribution is then parametrised by O(n) parameters, which we can tractably estimate from data and use to make predictions.
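The factorisation can be sketched on a toy spam example. Everything below is an assumption made for illustration: the three-word vocabulary, the probabilities, and the function names are invented, and each word is treated as a binary present/absent variable.

```python
# Hypothetical parameters, invented for illustration.
p_spam = 0.4  # prior p(y = spam)

# p(word present | class) for each of the n = 3 words in the vocabulary.
p_word_given_spam = {"free": 0.8, "meeting": 0.1, "winner": 0.6}
p_word_given_ham = {"free": 0.2, "meeting": 0.5, "winner": 0.05}


def joint(present, prior, p_word):
    """p(y, x_1..x_n) = p(y) * prod_i p(x_i | y) under Naive Bayes."""
    prob = prior
    for word, p in p_word.items():
        # Use p if the word is present in the email, 1 - p if it is absent.
        prob *= p if word in present else (1.0 - p)
    return prob


def posterior_spam(present):
    """p(spam | words) via Bayes' rule over the two classes."""
    spam = joint(present, p_spam, p_word_given_spam)
    ham = joint(present, 1.0 - p_spam, p_word_given_ham)
    return spam / (spam + ham)


email_words = {"free", "winner"}
p = posterior_spam(email_words)

# One prior plus one parameter per word per class: 1 + 2n.
n_params = 1 + 2 * len(p_word_given_spam)
print(round(p, 3), n_params)
```

With n words the model needs only 1 + 2n numbers, compared with the exponentially many entries of an unrestricted joint table over the same variables.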
The coolest part is that we can use directed and undirected graphs to represent these probabilities and independence assumptions!