Problem context

This multi-part series walks through a small, practical NLP data science project built on the Spotify customer support messages dataset. As with any data science project, the first step is to understand the business context and objective. Spotify, like any other B2C business, receives hundreds or even thousands of customer support messages every single day. Sifting through these messages and categorising and prioritising them takes a huge amount of human labour before Spotify can derive any actionable plans. Our goal is to cluster these messages and summarise what customers are broadly saying, so that Spotify knows what to prioritise in product development. For example, if there are two groups of messages, one complaining that the ads between songs are too long and the other suggesting nice-to-have features, then Spotify should prioritise tackling the first group in its product development plan.

Action plan

We will use topic modelling and text similarity to cluster the messages. Once the messages are clustered, we will use an out-of-the-box sentiment model to compute a sentiment score for every message. This also gives us the sentiment of each cluster of messages, which allows Spotify to quickly identify which group to focus on. With that in mind, the action plan for this project is as follows:

  1. Exploratory Data Analysis

  2. Text Processing

  3. Text Clustering

  4. Topic Modelling Analysis

  5. Text Similarity

  6. Visualisation
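To make the end goal concrete, here is a minimal sketch of the final prioritisation idea: score each message, average the scores per cluster, and rank clusters from most negative to most positive. Everything in it is an assumption for illustration only: the messages and cluster ids are made up, the tiny word lexicon is a stand-in for a real out-of-the-box sentiment model, and the names `sentiment_score` and `rank_clusters` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (cluster_id, message). In the real project the
# cluster ids would come from the clustering step, not be hand-assigned.
messages = [
    (0, "the ads are too long and annoying"),
    (0, "too many ads between songs, terrible"),
    (1, "would love a sleep timer feature"),
    (1, "great app, a lyrics view would be nice"),
]

# Stand-in for an out-of-the-box sentiment model: a tiny word lexicon.
LEXICON = {"annoying": -1, "terrible": -1, "love": 1, "great": 1, "nice": 1}


def sentiment_score(text):
    """Sum the lexicon scores of the words in text (toy scorer)."""
    words = text.lower().replace(",", " ").split()
    return sum(LEXICON.get(word, 0) for word in words)


def rank_clusters(labelled_messages):
    """Return (cluster_id, mean sentiment) pairs, most negative first."""
    scores = defaultdict(list)
    for cluster_id, text in labelled_messages:
        scores[cluster_id].append(sentiment_score(text))
    return sorted(
        ((cluster_id, mean(values)) for cluster_id, values in scores.items()),
        key=lambda pair: pair[1],
    )


ranking = rank_clusters(messages)
print(ranking)
```

The cluster at the top of the ranking is the one with the most negative average sentiment, which in this toy data is the group complaining about ads.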

Each of these steps will be covered in the next few blog posts. Stay tuned! 🙂



