Jay Alammar… He writes some of the best online materials on transformers! I learned a lot from his blog posts, and I hope to one day write blog posts of the same high quality. BERT has been very popular in the NLP space and, to be honest, although I was looking into it when it first came out, I didn’t pay much attention to it. I guess I didn’t want to fall into the hype like everyone else, haha. Besides, I was more fascinated by what BERT is built on, the transformer itself, which I used extensively during my master’s degree.

I wanted to start playing around with BERT, which is why today’s blog post is all about how to get started with it. And who better to learn BERT from than Jay Alammar! Today’s post covers my main takeaways from learning how to use a pre-trained BERT model for sentiment analysis.

BERT on Sentiment Analysis

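The tutorial uses DistilBert, a smaller and faster version of BERT that reportedly achieves results close to BERT’s (according to its authors, it is about 40% smaller and 60% faster while keeping about 97% of BERT’s language-understanding performance). Here are the steps we need to take to use BERT for sentiment analysis: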

  1. Generate sentence embeddings using DistilBert

  2. Train / Test split

  3. Train our classifier (Logistic Regression in the tutorial)

  4. Evaluate our Logistic Regression classifier

Steps 2 to 4 are the typical machine learning process, so I won’t be making notes on them. The only additional step is step 1, which uses DistilBert to generate the sentence embeddings that we train our logistic regression classifier on.

Generate sentence embeddings using DistilBert

In the whole process, we are actually using two models:

  1. DistilBert (HuggingFace)

  2. Logistic Regression (sklearn)

Our text data first goes through DistilBert, which processes the sentences in two sub-steps. First, the input sentences are tokenised using the BERT tokeniser, a special tokeniser that converts sentences into the format BERT (or DistilBert) accepts. This involves adding special tokens such as [CLS] and [SEP] and replacing each token with its ID in the model’s vocabulary, which is used to look up its word embedding. Once the sentences have passed through the BERT tokeniser, we can pass the token IDs into DistilBert to get our final sentence embeddings.
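To make the two sub-steps a bit more concrete, here is a toy sketch of what the tokeniser hands to the model. It is not the real BERT tokeniser: it just lowercases and splits on whitespace instead of producing WordPiece sub-words, and the word IDs in the hypothetical `TOY_VOCAB` are made up. Only the special-token IDs ([PAD] = 0, [UNK] = 100, [CLS] = 101, [SEP] = 102) match the bert-base-uncased vocabulary. Since sentences in a batch have different lengths, the sketch also pads them with [PAD] and builds an attention mask so the model can ignore the padding.

```python
# Toy illustration of BERT-style input preparation (NOT the real tokeniser).
# Special-token IDs follow the bert-base-uncased vocabulary;
# the word IDs in TOY_VOCAB are made up for this example.
PAD, UNK, CLS, SEP = 0, 100, 101, 102

TOY_VOCAB = {"a": 1000, "great": 1001, "movie": 1002, "boring": 1003, "plot": 1004}


def toy_tokenise(sentence):
    """Lowercase, split on whitespace, map words to IDs, wrap in [CLS] ... [SEP]."""
    ids = [TOY_VOCAB.get(word, UNK) for word in sentence.lower().split()]
    return [CLS] + ids + [SEP]


def pad_batch(batch_ids):
    """Right-pad every sequence with [PAD] and build the matching attention mask."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [PAD] * (max_len - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch_ids]
    return input_ids, attention_mask


batch = [toy_tokenise("A great movie"), toy_tokenise("Boring plot")]
input_ids, attention_mask = pad_batch(batch)
print(input_ids)       # [[101, 1000, 1001, 1002, 102], [101, 1003, 1004, 102, 0]]
print(attention_mask)  # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
```

With the real HuggingFace library, calling a `DistilBertTokenizer` as `tokenizer(sentences, padding=True, return_tensors="pt")` returns the same two things, `input_ids` and `attention_mask`, but with genuine WordPiece IDs.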

Once we have our sentence embeddings, we can proceed with the usual machine learning process: split the sentence embeddings into training and test sets, train our logistic regression on the training set and evaluate it on the test set.
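Steps 2 to 4 are standard, but for completeness, here is a minimal sketch with scikit-learn. The random `features` array is a stand-in I made up so the code runs without downloading DistilBert; in the real pipeline it would be the (number of sentences, 768) matrix of sentence embeddings. The helper name `train_and_evaluate`, the 25% test split and the seed are my own choices, not necessarily what the tutorial uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_and_evaluate(features, labels, test_size=0.25, seed=42):
    """Steps 2-4: split the embeddings, fit logistic regression, score on held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # A higher max_iter helps the default solver converge on 768-d features.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)  # score() returns mean accuracy


# Stand-in data (hypothetical): random 768-d vectors with a class-dependent
# shift, only so the sketch runs on its own. In the real pipeline these
# would be the [CLS] vectors produced by DistilBert in step 1.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
features = rng.normal(size=(200, 768)) + labels[:, None] * 1.0

clf, accuracy = train_and_evaluate(features, labels)
print(f"test accuracy: {accuracy:.2f}")
```

On real sentence embeddings the accuracy will of course depend on the dataset; the synthetic data here is deliberately easy to separate.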

Note that DistilBert outputs one vector per input token, and each vector has 768 dimensions. Because this is a sentence classification task, we only use the first output vector (the one at the position of the [CLS] token) as the sentence embedding. You can find the original tutorial and code from the source link below 🙂
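Picking out the [CLS] vector is just a slice on the model output. Below is a sketch of step 1 using the HuggingFace `transformers` library, assuming the `distilbert-base-uncased` checkpoint (my assumption; the weights are downloaded on first use). `last_hidden_state` has shape (batch, sequence length, 768), and `[:, 0, :]` keeps position 0 for every sentence. The `torch` and `transformers` imports sit inside the function so that the slicing helper can be tried with a stand-in array even without those libraries installed.

```python
import numpy as np


def cls_vectors(last_hidden_state):
    """Keep only position 0 ([CLS]) of a (batch, seq_len, hidden) array -> (batch, hidden)."""
    return last_hidden_state[:, 0, :]


def distilbert_cls_embeddings(sentences, model_name="distilbert-base-uncased"):
    """Step 1: tokenise, run DistilBert, return one 768-d vector per sentence.

    Imports are local so cls_vectors() works without torch/transformers installed.
    """
    import torch
    from transformers import DistilBertModel, DistilBertTokenizer

    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    model = DistilBertModel.from_pretrained(model_name)
    model.eval()  # inference mode (disables dropout)

    # Adds [CLS]/[SEP], pads to the longest sentence and builds the attention mask.
    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for feature extraction
        outputs = model(**encoded)
    return cls_vectors(outputs.last_hidden_state.numpy())


# Shape check with a stand-in array (2 sentences, 7 tokens, 768 dims):
fake_hidden = np.random.default_rng(0).normal(size=(2, 7, 768))
print(cls_vectors(fake_hidden).shape)  # (2, 768)
```

The resulting (number of sentences, 768) array is what gets passed to the logistic regression in steps 2 to 4.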



Data Scientist
