The take-away of this article is that the sentence embeddings from BERT tend to lead to underperformance. To alleviate this problem, we can fine-tune BERT to create sentence-BERT!

Generally, BERT is good at generating contextualised word embeddings. However, there are many NLP tasks that require us to generate sentence embeddings. Using BERT contextualised word embeddings, we can generate sentence embeddings using two main methods:

  1. Average word embeddings

  2. Use the [CLS] vector (at the beginning of the sentence) as the sentence embedding

However, these two methods of generating sentence embeddings have been shown to underperform in the textual similarity tasks. The solution to this is Sentence-BERT!


We can fine-tune BERT sentence embeddings on a dataset that trains BERT to generate sentence embeddings that are semantically similar to other sentences it is close to. We can use the siamese neural network to accomplish this as shown in the figure below.

During inference, we would compute the sentence embeddings for two sentences we are comparing and compute the cosine similarity to capture the semantic textual similarity.



Data Scientist

Leave a Reply