What is bptt?
It stands for backpropagation through time. In practice, the bptt hyperparameter sets how many steps of history we backpropagate through: the sequence is split into chunks of that length and gradients only flow within a chunk (truncated BPTT).
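A minimal sketch of how a bptt value is typically used: a long token sequence is split into chunks of at most `bptt` steps, with targets shifted one step ahead for next-token prediction. The function name and setup here are illustrative, not from any particular library.

```python
def bptt_chunks(sequence, bptt):
    """Yield (input, target) chunks of at most `bptt` steps.

    Targets are the inputs shifted by one position, the usual
    next-token prediction setup. Gradients would only flow
    within each chunk, so `bptt` bounds the history considered.
    """
    for start in range(0, len(sequence) - 1, bptt):
        chunk = sequence[start:start + bptt]
        target = sequence[start + 1:start + 1 + bptt]
        # Trim the last chunk so inputs and targets stay aligned
        yield chunk[:len(target)], target

tokens = list(range(10))
for x, y in bptt_chunks(tokens, bptt=4):
    print(x, y)
# → [0, 1, 2, 3] [1, 2, 3, 4]
#   [4, 5, 6, 7] [5, 6, 7, 8]
#   [8] [9]
```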
What is batch normalisation?
It normalises the activations of each layer so the numbers do not become very large or very small (unstable) during training. In today's neural networks, batch normalisation is commonly inserted in each layer between the linear transformation and the non-linearity, as in the original paper, though some practitioners place it after the non-linearity instead.
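A minimal NumPy sketch of the batch normalisation computation for one layer: normalise each feature to zero mean and unit variance over the batch, then apply the learnable scale (gamma) and shift (beta). The names and `eps` default are illustrative.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalise a (batch, features) array."""
    mean = x.mean(axis=0)   # per-feature mean over the batch
    var = x.var(axis=0)     # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids divide-by-zero
    return gamma * x_hat + beta

# Features on very different scales come out comparable:
x = np.array([[1.0, 100.0],
              [3.0, 300.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(out)  # each column now has mean ~0 and variance ~1
```

At inference time, real implementations use running estimates of the mean and variance collected during training rather than batch statistics.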
How to deal with seq2seq when the input and targets are in different lengths?
Within a batch, we use padding to make all sequences the same length. More fundamentally, the encoder-decoder architecture decouples input and target lengths: the decoder keeps generating tokens until it emits an end-of-sequence token, so the output can be longer or shorter than the input.
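A minimal sketch of padding a batch of variable-length sequences to a common length with a PAD token (the token name is illustrative):

```python
PAD = "<pad>"

def pad_batch(batch):
    """Pad every sequence in the batch to the longest length."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [PAD] * (max_len - len(seq)) for seq in batch]

batch = [["I", "like", "cats"], ["hello"]]
print(pad_batch(batch))
# → [['I', 'like', 'cats'], ['hello', '<pad>', '<pad>']]
```

The loss is usually masked so that PAD positions do not contribute to training.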
What are the strengths and weaknesses of BLEU?
It’s fast and easy to compute
It’s widely used, allowing you to compare your model to other baseline models easily
It doesn’t capture meaning. Sentences with different meanings can have the same BLEU score
It doesn’t consider sentence structure
It doesn’t handle morphologically rich languages like Turkish well
It doesn’t correlate well with human judgements
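A simplified sketch of BLEU's core idea, clipped unigram precision (real BLEU combines 1- to 4-gram precisions with a brevity penalty). It illustrates the weakness above: swapping in a word that reverses the meaning barely changes the score.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate words that appear in the reference,
    with counts clipped to the reference counts (as in BLEU)."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / max(len(candidate), 1)

reference = "the cat is on the mat".split()
print(unigram_precision("the cat is on the mat".split(), reference))      # → 1.0
print(unigram_precision("the cat is not on the mat".split(), reference))  # ~0.86, despite the opposite meaning
```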
What is teacher forcing?
Teacher forcing is the technique of feeding the decoder the ground-truth targets instead of its own previous predictions during training. This speeds up and stabilises training: early on, the model's predictions are mostly wrong, and conditioning future predictions on those wrong tokens would rarely lead to the right translation. A common schedule is to always use teacher forcing at the beginning and gradually reduce the level of teacher forcing as we train the model for more epochs.
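A minimal sketch of a decoder loop with a teacher forcing ratio. `decoder_step` is a stand-in for a real model (here it returns a fixed token so the loop is runnable): with probability `ratio` the next input is the ground-truth token, otherwise the model's own previous prediction. All names are illustrative.

```python
import random

def decode(targets, ratio, decoder_step, start_token="<sos>"):
    """Run one training-time decoding pass, returning the inputs
    the decoder actually saw at each step."""
    inputs_used = []
    prev = start_token
    for gold in targets:
        inputs_used.append(prev)
        pred = decoder_step(prev)           # model's prediction for this step
        use_gold = random.random() < ratio  # teacher forcing decision
        prev = gold if use_gold else pred
    return inputs_used

targets = ["I", "like", "cats"]
# ratio=1.0: the decoder always sees the gold tokens as input
print(decode(targets, ratio=1.0, decoder_step=lambda x: "<guess>"))
# → ['<sos>', 'I', 'like']
# ratio=0.0: the decoder always sees its own predictions
print(decode(targets, ratio=0.0, decoder_step=lambda x: "<guess>"))
# → ['<sos>', '<guess>', '<guess>']
```

Annealing the ratio toward zero over epochs (scheduled sampling) gradually exposes the model to its own mistakes before inference, where no gold tokens are available.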