What is zero-shot learning?
Zero-shot learning means getting a model to perform a task it wasn't explicitly trained to do, i.e. the model has not seen any task-specific data.
What is the latent embedding approach to zero-shot learning?
This involves classifying sequences using the cosine similarity between the embedding of the sequence and the embeddings of all the possible class names. We could use sentence-BERT to encode both the sequence and the labels! The problem with using sentence-BERT is that it is designed for effective sentence-level representations, so our (often single-word) label embeddings might not be captured as well as they would be with word-level embeddings such as word2vec.
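The latent embedding approach can be sketched as follows with NumPy. The toy 3-d vectors below are hypothetical stand-ins for real S-BERT embeddings; only the cosine-similarity-and-argmax logic is the point.

```python
import numpy as np

def cosine_sim(vec, mat):
    # Cosine similarity between one vector and each row of a matrix.
    return (mat @ vec) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(vec))

def classify(seq_emb, label_embs, labels):
    # Pick the class name whose embedding is most similar to the sequence.
    sims = cosine_sim(seq_emb, label_embs)
    return labels[int(np.argmax(sims))]

# Toy "embeddings" standing in for S-BERT outputs.
seq = np.array([1.0, 0.2, 0.0])
label_vecs = np.array([[0.9, 0.1, 0.0],   # "sports"
                       [0.0, 0.0, 1.0]])  # "politics"
print(classify(seq, label_vecs, ["sports", "politics"]))  # → sports
```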
How can we potentially fix the issue above?
Learn a least-squares linear projection, used as an additional transformation on top of the S-BERT embeddings of both sequences and labels. We can train the linear projection as follows:
Take the top K most frequent words in the word2vec vocab
Obtain embeddings for each word using word2vec, model_word
Obtain embeddings for each word using S-BERT, model_sent
Learn the linear projection matrix Z with L2 regularisation from model_sent to model_word
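The steps above can be sketched with the closed-form ridge-regression solution. The random matrices below are placeholders for the actual model_sent and model_word embeddings of the top-K words; the dimensions (768 for S-BERT, 300 for word2vec) and the regularisation strength are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two embedding spaces: rows are the top-K vocab words.
K, d_sent, d_word = 1000, 768, 300
X = rng.normal(size=(K, d_sent))   # S-BERT embeddings (model_sent)
Y = rng.normal(size=(K, d_word))   # word2vec embeddings (model_word)

# Closed-form L2-regularised least squares (ridge) solution:
#   Z = (X^T X + lam * I)^(-1) X^T Y
lam = 1.0
Z = np.linalg.solve(X.T @ X + lam * np.eye(d_sent), X.T @ Y)

# At inference time, project any S-BERT embedding into word2vec space.
projected = X @ Z
print(Z.shape, projected.shape)  # (768, 300) (1000, 300)
```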
What is classification as natural language inference?
Natural language inference (NLI) is the task where, given two sentences, a premise and a hypothesis, we want to classify whether the hypothesis is true (entailment) or false (contradiction) given the premise. With transformers, NLI is modelled as sequence-pair classification: we feed both the premise and hypothesis through the model and classify the pair as contradiction, neutral, or entailment. This can easily be achieved using the HuggingFace library.
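The idea can be sketched as follows. A real implementation would get the entailment probability from an NLI model (e.g. HuggingFace's zero-shot-classification pipeline); here a toy word-overlap scorer stands in so the label-to-hypothesis logic is self-contained.

```python
def entailment_score(premise: str, hypothesis: str) -> float:
    # Stand-in for P(entailment) from an NLI model: word-overlap heuristic.
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h)

def zero_shot_classify(sequence: str, labels: list[str]) -> str:
    # Turn each candidate label into a hypothesis; the label whose
    # hypothesis the sequence (premise) most strongly entails wins.
    scores = {lab: entailment_score(sequence, f"this text is about {lab}")
              for lab in labels}
    return max(scores, key=scores.get)

print(zero_shot_classify("this sports match ended with a late goal",
                         ["sports", "politics"]))  # → sports
```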
If you have annotated data for only some of the classes, you can still train the model by passing each sequence through it twice: once with the correct label and once with a randomly selected wrong label.
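Constructing those training pairs might look like the sketch below. The label set, example texts, and hypothesis template are all hypothetical; each annotated sequence yields one entailed pair and one contradicted pair.

```python
import random

random.seed(0)

labels = ["sports", "politics", "business"]
annotated = [("the team won the cup final", "sports"),
             ("parliament passed the new bill", "politics")]

def make_nli_pairs(examples, all_labels):
    # Each annotated sequence yields two NLI examples: the true label as
    # an entailed hypothesis, and a random wrong label as a contradiction.
    pairs = []
    for text, true_label in examples:
        wrong = random.choice([l for l in all_labels if l != true_label])
        pairs.append((text, f"This text is about {true_label}.", "entailment"))
        pairs.append((text, f"This text is about {wrong}.", "contradiction"))
    return pairs

for premise, hypothesis, target in make_nli_pairs(annotated, labels):
    print(premise, "|", hypothesis, "|", target)
```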
Describe classification as a cloze task.
A cloze task is the task of filling in the missing values (blanks) in a sequence of text. A pre-trained language model is used to choose the most likely value for the blank among the possible class names. More on this in a future blog post!
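A minimal sketch of the idea: a real implementation would score each candidate fill with a masked language model (e.g. BERT's mask token); here a hypothetical keyword-based scorer stands in so the fill-the-blank-and-argmax logic is self-contained.

```python
# Hypothetical stand-in for a masked LM's score of a candidate in the blank
# of a template such as "{text} This article is about ____."
def blank_score(text: str, candidate: str) -> float:
    keyword_affinity = {"goal": {"sports": 0.9, "politics": 0.1}}
    for keyword, scores in keyword_affinity.items():
        if keyword in text:
            return scores.get(candidate, 0.0)
    return 0.0

def cloze_classify(text: str, class_names: list[str]) -> str:
    # Fill the blank with each class name and keep the most likely one.
    return max(class_names, key=lambda c: blank_score(text, c))

print(cloze_classify("a late goal decided the match",
                     ["sports", "politics"]))  # → sports
```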