With the TopicModelling class complete, we can now use it together with VaderSentiment to derive new features for our dataset and categorise Spotify’s support messages. The code for our topic modelling analysis follows, along with the output it generates.

The output of our code will showcase the following:

  1. New features such as Sentiment, Dominant Topic, Topic Contribution, and related Topic Keywords

  2. Topic breakdown analysis of the number of support messages belonging to each topic
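As a rough illustration of how the per-message features in item 1 could be derived, here is a minimal, dependency-free sketch. It assumes the topic model yields a list of (topic_id, probability) pairs for each message, as gensim's LDA models do, and that Sentiment is labelled from VADER's compound score using the commonly cited ±0.05 thresholds. The function names and example values are hypothetical and are not taken from the TopicModelling class.

```python
from typing import List, Tuple


def label_sentiment(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a label.

    The +/-0.05 cut-offs are an assumption (the conventional VADER thresholds).
    """
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def dominant_topic(topic_probs: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the topic with the highest probability and that probability."""
    topic_id, prob = max(topic_probs, key=lambda pair: pair[1])
    return topic_id, round(prob, 4)


# Hypothetical topic distribution for a single support message
example_probs = [(0, 0.12), (1, 0.71), (2, 0.17)]
example_topic = dominant_topic(example_probs)    # topic 1 dominates
example_sentiment = label_sentiment(-0.42)       # clearly negative score
```

In this sketch, the second element returned by dominant_topic plays the role of the per-message Topic Contribution.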

Import Dependencies

import pickle
import pandas as pd
from gensim import corpora, models
from text_clustering import TopicModelling

Topic Modelling Analysis

# Read dataset
clustered_dataset = pd.read_csv('clustered_data.csv')
print(clustered_dataset.head())

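# Topic analysis: count the support messages assigned to each dominant topic.
# rename_axis + reset_index(name = ...) yields the same column names on pandas 1.x and 2.x
topic_analysis_df = (
    clustered_dataset['dominant_topic']
    .value_counts()
    .rename_axis('dominant_topic')
    .reset_index(name = 'topic_counts')
)

# Share of all support messages that falls under each topic
topic_analysis_df['topic_contribution'] = (topic_analysis_df['topic_counts'] / topic_analysis_df['topic_counts'].sum()).round(4)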

# Keep one row of keywords per topic (the keywords are the same for every message in a topic)
unique_topic_keywords_df = clustered_dataset[['dominant_topic', 'topic_keywords']].drop_duplicates(subset = 'dominant_topic', keep = 'first')

# Attach the keywords to the per-topic counts and order the columns for display
topic_breakdown_df = pd.merge(topic_analysis_df, unique_topic_keywords_df, on = 'dominant_topic')
topic_breakdown_df = topic_breakdown_df[['dominant_topic', 'topic_keywords', 'topic_counts', 'topic_contribution']]
print(topic_breakdown_df)
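To make the arithmetic behind topic_counts and topic_contribution concrete, the same breakdown can be sketched on toy data without pandas. The topic labels below are made up for illustration and do not come from clustered_data.csv.

```python
from collections import Counter

# Hypothetical dominant topics for six support messages
toy_topics = [0, 2, 0, 1, 0, 2]


def topic_breakdown(topics):
    """Count messages per topic and compute each topic's share of the total."""
    counts = Counter(topics)
    total = sum(counts.values())
    return [
        {
            "dominant_topic": topic,
            "topic_counts": n,
            "topic_contribution": round(n / total, 4),
        }
        for topic, n in counts.most_common()  # most frequent topic first
    ]


breakdown = topic_breakdown(toy_topics)
for row in breakdown:
    print(row)
```

Here topic 0 accounts for three of the six messages, so its contribution is 0.5. The contributions sum to 1 up to rounding.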

Output of Topic Modelling Analysis

Ryan

Data Scientist