 #### Likelihood Computation: The Forward Algorithm

###### What’s the likelihood computation?

It’s the computation of the probability of a particular observation sequence given the HMM. In a markov chain, the surface observations are the same as the hidden and so the probability computation are easier to compute by multiplying the probabilities of the arcs. In HMM, it’s more challenging because we don’t know what the hidden sequence is!

But let’s assume we know the hidden event of the weather states (for simplicity), it’s now easier to compute the output likelihood as each hidden state can be use to produce a single observation. The sequence of the hidden states and the observation sequence have the same length. The computation of a possible hidden state sequence is shown below. In reality, we don’t know the actual hidden state (weather) sequence. Therefore, we would need to model the probability of the observation sequence by summing over all the possible weather sequences, weighted by their probability. This involves:

1. Compute the joint probability of a particular weather sequence Q and generating a particular sequence O. This is shown in the figure below.

2. Compute the total probability of the observations by summing over all possible hidden state sequences. If we have 3-event sequences, then there are eight possible hidden sequences and so we would sum up all the event sequences for a particular observation sequence. There are N^T possible sequences with an HMM with N hidden states and T observations! When N and T becomes very large, it’s practically infeasible to compute the total observation likelihood by summing all the different hidden state sequences. Instead we can use forward algorithm.

###### What is forward algorithm?

It’s a dynamic programming algorithm that uses previous predicted values to generate future values to build up the whole probability of the observation sequence. The forward algorithm computes the observation probability by summing over all the probabilities of possible hidden state paths (relevant to the observation sequence) using the forward trellis. Figure below is an example. Each cell of the forward algorithm represents the probability of being in that state after seeing the first t observations, given the automaton lambda. There are three factors to compute each cell:

1. Previous forward path probability from previous time step

2. Transition probability from previous state to current state

3. State observation likelihood of observation given current state

Below is the pseudocode of the forward algorithm. ##### Source: https://web.stanford.edu/~jurafsky/slp3/A.pdf Data Scientist