21.5 Supervised Learning of Word Sentiment

How to use online reviews for supervised learning of word sentiment?

Online reviews pair texts with ratings, which may be binary (positive / negative) or on a scale. We can use these review scores as supervision, because positive words are more likely to appear in positive reviews and negative words are more likely to appear in negative ones. This lets us build, for each word, a distribution over the ratings. With a ten-star system, the sentiment of each word can be represented as a 10-tuple with one value per rating. Each value could be a simple raw count of the word in reviews with that rating, or a likelihood.
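A minimal sketch of building these per-rating 10-tuples. It assumes the reviews arrive as hypothetical `(text, rating)` pairs with integer ratings from 1 to 10, and that whitespace tokenisation is good enough. It returns both the raw counts and the likelihood P(w | rating), computed as the word's count divided by the total number of tokens in reviews with that rating. The function name `word_rating_profiles` and the toy reviews are illustrative only.

```python
from collections import Counter


def word_rating_profiles(reviews, num_ratings=10):
    """Build per-rating raw counts and likelihoods for every word.

    `reviews` is an iterable of (text, rating) pairs with rating in 1..num_ratings
    (an assumed input format). Returns two dicts mapping word -> tuple of length
    num_ratings: raw counts, and likelihoods P(word | rating).
    """
    # One Counter of token frequencies per rating bucket
    counts_by_rating = [Counter() for _ in range(num_ratings)]
    for text, rating in reviews:
        counts_by_rating[rating - 1].update(text.lower().split())

    # Total tokens per rating, used to turn counts into likelihoods
    totals = [sum(c.values()) for c in counts_by_rating]
    vocab = set().union(*counts_by_rating)

    counts = {w: tuple(c[w] for c in counts_by_rating) for w in vocab}
    likelihoods = {
        w: tuple(c[w] / t if t else 0.0 for c, t in zip(counts_by_rating, totals))
        for w in vocab
    }
    return counts, likelihoods


reviews = [
    ("great fun", 10),
    ("great acting", 9),
    ("awful plot", 1),
    ("boring and awful", 2),
]
counts, likelihoods = word_rating_profiles(reviews)
print(counts["great"])           # (0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
print(likelihoods["awful"][:2])  # (0.5, 0.3333333333333333)
```

Raw counts favour ratings that simply have more text; dividing by each rating's token total makes the ten values comparable across ratings.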

What is a Potts diagram?

It is a visualisation that plots a word's scores across the ratings of the rating system. In the figure below, for each word, we plot its word scores across the ten ratings of the ten-star system. These plots are useful because the shape of the curve lets us quickly identify strongly positive or negative words. A strongly positive word tends to have low scores at the lower ratings and high scores at the higher ratings, forming a J-shape. The same kind of diagram can be applied to other affective meanings: instead of positive and negative groups you might compare, for example, emphatics and attenuators, and see which words behave most strongly as emphatics.
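Without a plotting library, a rough text-only stand-in for a Potts diagram can be sketched by drawing one bar per rating, so the J-shape becomes visible. The 10-tuple below is invented purely for illustration (it is not real review data) and is shaped like a strongly positive word; `potts_rows` is a hypothetical helper.

```python
def potts_rows(scores, width=20):
    """Render one word's per-rating scores as text bars, one row per rating.

    Bars are scaled so the rating with the highest score gets `width` characters.
    """
    peak = max(scores)
    rows = []
    for rating, score in enumerate(scores, start=1):
        length = round(width * score / peak) if peak > 0 else 0
        rows.append(f"{rating:>2} | {'#' * length}")
    return rows


# Invented J-shaped profile: low at the bottom ratings, rising sharply at the top
j_shaped = (0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.08, 0.15, 0.30)
for row in potts_rows(j_shaped):
    print(row)
```

Each row corresponds to one star rating, so reading the bars from top to bottom traces the curve a Potts diagram would draw from left to right.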

How to find words that are more likely to appear in one type of document over the other?

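The traditional method is the log likelihood ratio, which compares the likelihood of the word appearing in corpus x with its likelihood in corpus y. This does not work well for very rare or very frequent words, because the raw ratio ignores how much evidence each estimate rests on. A rare word seen a few times in one corpus and never in the other gets an extreme (even infinite) ratio, while for frequent words the ratio says nothing about whether a small difference is meaningful.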
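A minimal sketch of the plain log likelihood ratio, log(P_x(w) / P_y(w)), using unsmoothed relative frequencies from two word-count `Counter`s. The helper name and the two toy corpora are hypothetical; the point is to show the instability with rare words.

```python
import math
from collections import Counter


def log_likelihood_ratio(word, counts_x, counts_y):
    """log( P_x(word) / P_y(word) ) using unsmoothed relative frequencies."""
    n_x = sum(counts_x.values())
    n_y = sum(counts_y.values())
    c_x, c_y = counts_x[word], counts_y[word]  # Counter returns 0 for unseen words
    if c_x == 0 and c_y == 0:
        raise ValueError(f"{word!r} does not occur in either corpus")
    # Unsmoothed estimates: a zero count in one corpus makes the ratio infinite
    if c_y == 0:
        return math.inf
    if c_x == 0:
        return -math.inf
    return math.log((c_x / n_x) / (c_y / n_y))


corpus_x = Counter("great great great fun movie movie".split())
corpus_y = Counter("awful awful boring movie movie".split())
print(log_likelihood_ratio("great", corpus_x, corpus_y))  # inf
print(log_likelihood_ratio("fun", corpus_x, corpus_y))    # inf
print(log_likelihood_ratio("movie", corpus_x, corpus_y))  # log(5/6), about -0.18
```

Both "great" (three occurrences) and "fun" (a single occurrence) receive the same infinite score, so the ratio cannot tell a well-supported association from a chance one.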

Instead, we can use the log odds ratio informative Dirichlet prior method, which asks whether a particular word has higher odds in corpus x or in corpus y. The idea is to use a large background corpus to get a prior estimate of how frequent we expect each word to be. The final statistic for a word is the z-score of its log odds ratio. The z-score takes into account the variance in the word's frequency, and the background corpus supplies a prior count for each word, which smooths the estimates for rare words.
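A minimal sketch of the statistic, following the formulation of Monroe et al. (2008). For a word w with count y_w in each corpus, corpus size n, prior count α_w and total prior count α_0, the log odds ratio δ_w is the difference of the two smoothed log odds, its variance is approximated by 1/(y_w^x + α_w) + 1/(y_w^y + α_w), and the z-score is δ_w divided by the square root of that variance. As an assumption, the prior counts here are simply each word's raw count in the background corpus (the prior can also be rescaled), and only words that occur in the background corpus are scored. The helper name and toy corpora are hypothetical.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_x, counts_y, background):
    """Return {word: z-score}; positive z means the word leans towards corpus x.

    Prior counts alpha_w are taken directly from the background corpus counts
    (an assumption of this sketch); words absent from it are skipped.
    """
    n_x = sum(counts_x.values())
    n_y = sum(counts_y.values())
    alpha_0 = sum(background.values())
    scores = {}
    for w, alpha_w in background.items():
        if alpha_w <= 0:
            continue
        y_x, y_y = counts_x[w], counts_y[w]
        # Smoothed log odds of w in each corpus, using alpha_w as pseudo-counts
        log_odds_x = math.log((y_x + alpha_w) / (n_x + alpha_0 - y_x - alpha_w))
        log_odds_y = math.log((y_y + alpha_w) / (n_y + alpha_0 - y_y - alpha_w))
        delta = log_odds_x - log_odds_y
        # Approximate variance of delta; rare words get larger variance
        variance = 1.0 / (y_x + alpha_w) + 1.0 / (y_y + alpha_w)
        scores[w] = delta / math.sqrt(variance)
    return scores


corpus_x = Counter("great great great fun movie movie".split())
corpus_y = Counter("awful awful boring movie movie".split())
background = Counter("great fun awful boring movie movie the the the the".split())
z = log_odds_dirichlet(corpus_x, corpus_y, background)
for word, score in sorted(z.items(), key=lambda kv: -kv[1]):
    print(f"{word:>7} {score:+.3f}")
```

On the same toy data, "great" now scores higher than the rarer "fun", instead of both tying at infinity as with the raw ratio.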

21.6 Using Lexicons for Sentiment Recognition

Why would you use lexicons for sentiment detection?

Building enough training data for a supervised sentiment classifier can be costly. Instead, we can use lexicons in a rule-based algorithm. The simplest version computes the ratio of positive to negative words in a document and classifies the document as positive if it contains more positive words. You can also set a threshold, so that a document is only classified as positive if the ratio is greater than that threshold.
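A minimal sketch of this rule-based classifier. The text only describes the positive case; as an assumption, the sketch applies the same rule symmetrically for negative documents and returns "neutral" otherwise. The two tiny lexicons and the function name are hypothetical.

```python
import math


def lexicon_sentiment(tokens, positive, negative, threshold=1.0):
    """Rule-based polarity from positive/negative lexicon hits.

    Returns "positive" if pos/neg > threshold, "negative" if neg/pos > threshold
    (the symmetric negative rule is an assumption), and "neutral" otherwise.
    Use threshold >= 1 so that both rules cannot fire at once.
    """
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)

    def ratio(a, b):
        # Treat a nonzero count over a zero count as infinitely large
        if b == 0:
            return math.inf if a > 0 else 0.0
        return a / b

    if ratio(pos, neg) > threshold:
        return "positive"
    if ratio(neg, pos) > threshold:
        return "negative"
    return "neutral"


POSITIVE = {"great", "fun", "excellent"}
NEGATIVE = {"awful", "boring", "bad"}
print(lexicon_sentiment("a great and fun movie".split(), POSITIVE, NEGATIVE))
print(lexicon_sentiment("great cast but boring and awful plot".split(), POSITIVE, NEGATIVE))
print(lexicon_sentiment("great cast but a boring plot".split(), POSITIVE, NEGATIVE))
```

With the default threshold of 1.0 these print "positive", "negative" and "neutral"; raising the threshold makes the classifier more conservative about assigning a polarity.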

In addition, even if you do have training data, these sentiment lexicons can be used as additional features for your classifier.
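A minimal sketch of combining lexicon information with ordinary bag-of-words features. The feature names (`__lex_pos__` and so on) and the helper are hypothetical; the double underscores only make clashes with ordinary words unlikely.

```python
from collections import Counter


def features_with_lexicon(tokens, positive, negative):
    """Bag-of-words counts plus lexicon-based features for a classifier."""
    features = dict(Counter(tokens))
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    # Extra lexicon features appended alongside the word counts
    features["__lex_pos__"] = pos
    features["__lex_neg__"] = neg
    features["__lex_diff__"] = pos - neg
    return features


example = features_with_lexicon("great fun but boring".split(), {"great", "fun"}, {"boring"})
print(example)
```

Feature dicts like this can be turned into vectors with, for example, scikit-learn's `DictVectorizer` before training a classifier.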
