 #### Decoding: The Viterbi Algorithm

###### What is the decoding task?

The task of determining which sequence of variables is the cause of some observation sequences. For example, given a sequence of ice-cream observations and an HMM, the decoding task is to determine the best hidden weather sequence.

###### What is a naive idea of finding the most probable sequence of states?

A naive approach is where for each possible hidden state sequence, we could run the forward algorithm and compute the likelihood of observation sequence given that hidden state sequence. We would then select the highest likelihood. This approach is non-scalable as the run time would grow exponentially as the number of state increases.

###### What’s the Viterbi algorithm?

Viterbi algorithm is the most common decoding algorithm for HMMs. It’s a dynamic programming algorithm and strongly resembles the minimum edit distance algorithm. Below is an example of the Viterbi trellis for computing the best hidden state sequence. The idea here is to process the observation sequence left to right, filling out the trellis. Each cell represents the probability that the HMM is in state j after seeing the first t observations. The value of each cell is computed by recursively taking the most probable path that could lead us to that cell. The three factors that are used to compute the Viterbi probability at time t are:

1. previous Viterbi path probability

2. transition probability from previous state to current state

3. state observation likelihood of observation symbol given the current state The Viterbi algorithm is identical to the forward algorithm except it takes the max over the previous path probabilities whereas the forward algorithm takes the sum. The Viterbi also has backpointers that the forward algorithm doesn’t have.

#### HMM Training: The Forward-Backward Algorithm

###### What does learning mean in HMM?

Learning means given an observation sequence and the set of possible states in HMM, learn the HMM parameters A and B. The input would be unlabelled sequence of observations and a vocabulary of potential hidden states.

###### What’s the standard algorithm for HMM training?

The standard algorithm is the Baum-Welch algorithm, which it’s a special case of Expectation-Maximisation (EM) algorithm. This algorithm will train both the transition probabilities and the emission probabilities of the HMM. EM is an iterative algorithm that computes the initial probabilities estimates and use those estimates to compute better estimate, therefore iteratively improving the probabilities.

##### Source: https://web.stanford.edu/~jurafsky/slp3/A.pdf Data Scientist