Objective and Contribution

The objective is to improve the discovery of coherent aspects, as existing approaches do not produce highly coherent aspects. The paper presents a novel neural model, attention-based aspect extraction (ABAE), that improves coherence by exploiting the distribution of word co-occurrences through word embeddings. The model also uses an attention mechanism to de-emphasise irrelevant words during training, which further improves the coherence of the aspects.

Weaknesses of conventional LDA
  • LDA models do not directly encode word co-occurrence statistics, which are important for preserving topic coherence

  • LDA models need to estimate a distribution of topics for each document. However, review documents tend to be short, which makes estimating the topic distribution of each document more difficult


Two real-world datasets: Citysearch (restaurant) and BeerAdvocate (beer).

Citysearch corpus

This is a restaurant review corpus. A subset of 3,400 sentences is manually labelled with aspects. These annotations are used to evaluate aspect extraction. There are 6 defined aspect labels: Food, Staff, Ambience, Price, Anecdotes, and Miscellaneous.


BeerAdvocate corpus

This is a beer review corpus. Around 1,000 reviews (9,245 sentences) are annotated with 5 aspect labels: Feel, Look, Smell, Taste, and Overall.

Attention-based Aspect Extraction (ABAE)

The proposed model, attention-based aspect extraction (ABAE), explicitly encodes word co-occurrence statistics using word embeddings, uses dimension reduction to extract the most important aspects, and uses an attention mechanism to de-emphasise irrelevant words, which further improves the coherence of the aspects.

The ultimate goal is to learn aspect embeddings that live in the same embedding space as the words. Below is the figure of the ABAE architecture, together with the main steps:

  1. Map each word in our vocabulary to its word embedding. The aspect embeddings are used to approximate the aspect words in our vocabulary

  2. Filter away non-aspect words using the attention mechanism and construct a sentence embedding \(z_s = \sum_{i = 1}^{n} a_i e_{w_i}\). \(a_i\) tells the model how much it should focus on word \(i\) in order to capture the main aspect of the sentence

  3. Reconstruct the sentence embedding as a linear combination of the aspect embeddings in T (the aspect embedding matrix): \(r_s = T^T p_t\). \(p_t\) is the weight vector over the K aspect embeddings and tells the model how relevant the input sentence is to each aspect. \(p_t\) is obtained by reducing \(z_s\) from d dimensions to K dimensions (the number of aspects) and applying a softmax non-linearity. This process of dimension reduction and reconstruction preserves most of the information of the aspect words in the aspect embeddings

The training objective is to minimise the reconstruction loss. In other words, the model will aim to minimise the difference between \(r_s\) and \(z_s\).
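
The three steps and the reconstruction idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the attention score \(d_i = e_{w_i}^T M y_s\) with \(y_s\) the average word embedding, the parameter names `M`, `W` and `b`, and the squared-distance `reconstruction_gap` are all assumptions that this summary does not spell out. In particular, the squared distance is only a stand-in for "the difference between \(r_s\) and \(z_s\)"; the paper's exact loss is not reproduced here.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(mat, v):
    return [dot(row, v) for row in mat]

def abae_forward(E, M, W, b, T):
    """E: n word embeddings of size d; M: d x d; W: K x d; b: size K; T: K x d."""
    n, d = len(E), len(E[0])
    # y_s: average word embedding (assumed context vector for the attention)
    y = [sum(e[j] for e in E) / n for j in range(d)]
    My = matvec(M, y)
    # a_i: softmax over the assumed scores d_i = e_i^T M y_s
    a = softmax([dot(e, My) for e in E])
    # Step 2: z_s = sum_i a_i * e_{w_i}
    z = [sum(a[i] * E[i][j] for i in range(n)) for j in range(d)]
    # Step 3a: p_t = softmax(W z_s + b), reducing d dimensions to K
    p = softmax([wz + bk for wz, bk in zip(matvec(W, z), b)])
    # Step 3b: r_s = T^T p_t, a weighted sum of the K aspect embeddings
    r = [sum(p[k] * T[k][j] for k in range(len(T))) for j in range(d)]
    return a, z, p, r

def reconstruction_gap(z, r):
    # Squared distance, used here only as a simple stand-in for the loss
    return sum((zi - ri) ** 2 for zi, ri in zip(z, r))

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Toy run with random parameters: 5 words, d = 4, K = 3 aspects
random.seed(0)
n, d, K = 5, 4, 3
E = rand_matrix(n, d)
a, z, p, r = abae_forward(E, rand_matrix(d, d), rand_matrix(K, d),
                          [0.0] * K, rand_matrix(K, d))
gap = reconstruction_gap(z, r)
print(a, p, gap)
```

Both `a` and `p` are probability distributions: `a` spreads attention over the words of the sentence, while `p` spreads the sentence over the K aspects.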


Baseline models
  • LocLDA: Standard LDA. Each sentence is treated as a separate document

  • K-means: Initialise aspect matrix using k-means centroids of the word embeddings

  • SAS: A hybrid model that extracts both aspects and aspect-specific opinions

  • BTM: A biterm topic model that is designed for short texts. It alleviates the data sparsity problem in short documents by directly modelling the generation of unordered word-pair co-occurrences over the corpus
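
To make the K-means baseline above concrete, here is a minimal sketch of clustering word embeddings and reading the centroids as rows of an aspect matrix. The toy 2-dimensional embeddings, the deterministic initialisation from the first k points, and the function name `kmeans` are all hypothetical choices made for illustration; a real setup would use trained embeddings and a library implementation.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20):
    # Assumption: initialise from the first k points to keep the run deterministic
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Hypothetical toy embeddings: two food words and two staff words
toy_embeddings = {
    "pizza": [1.0, 0.0],
    "waiter": [0.0, 1.0],
    "pasta": [0.9, 0.1],
    "staff": [0.1, 0.9],
}
aspect_matrix = kmeans(list(toy_embeddings.values()), k=2)
print(aspect_matrix)
```

Each row of `aspect_matrix` is one centroid, which plays the role of one aspect embedding in this baseline.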


The authors set the number of aspects for both the restaurant and beer corpora to 14, based on existing work. They evaluate ABAE on two criteria:

  1. Is it able to find meaningful and semantically coherent aspects?

  2. Is it able to improve aspect detection performance on real-world review datasets?


Aspect Quality

The inferred aspects are more fine-grained than the gold aspects. For example, the model can distinguish main dishes from desserts. To evaluate the quality of the aspects, the authors use a coherence score. A higher coherence score indicates better aspect interpretability, that is, aspects that are more meaningful and semantically coherent. Below is the figure of the average coherence score of each model on both the restaurant and beer corpora. There are two findings: 1) ABAE outperforms the previous models, and 2) k-means on word embeddings is enough to perform better than all the topic models, which shows that word embeddings are a stronger model for capturing co-occurrence than LDA.
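
As a rough illustration of what a coherence score measures, the sketch below scores an aspect's top words by how often they co-occur in the same documents. The exact formula is an assumption, since this summary does not define it: the code uses a document co-occurrence form that is common for topic models, summing \(\log \frac{D_2(w_n, w_l) + 1}{D_1(w_l)}\) over word pairs, where \(D_1\) is a document frequency and \(D_2\) a co-document frequency. The toy documents are hypothetical.

```python
import math

def coherence(top_words, documents):
    docs = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all the given words
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for n in range(1, len(top_words)):
        for l in range(n):
            d1 = doc_freq(top_words[l])
            if d1 == 0:
                continue  # skip words that never occur, to avoid dividing by zero
            d2 = doc_freq(top_words[n], top_words[l])
            score += math.log((d2 + 1) / d1)
    return score

# Hypothetical toy corpus: each document is a list of tokens
toy_docs = [
    ["pizza", "pasta", "wine"],
    ["pizza", "pasta"],
    ["waiter", "rude"],
    ["waiter", "pizza"],
]
coherent_score = coherence(["pizza", "pasta"], toy_docs)
mixed_score = coherence(["pizza", "rude"], toy_docs)
print(coherent_score, mixed_score)
```

Words that frequently appear together ("pizza", "pasta") score higher than words that never share a document ("pizza", "rude"), which matches the intuition that a coherent aspect groups words used in the same contexts.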

The authors also conducted a human evaluation. First, the human judges assess how many of the aspects are coherent. An aspect is coherent if most of its top 50 terms coherently represent the aspect. The results are as follows: ABAE discovers the largest number of coherent aspects.

Secondly, the human judges evaluate whether the top terms of an aspect are correct. A top term is labelled as correct only if the majority of judges agree that it reflects the related aspect. This is shown in the figure below.

Aspect Detection

Given a review sentence, ABAE first assigns an inferred aspect label, which is then mapped to the appropriate gold-standard label. The results for the restaurant corpus are shown in the table above. The SERBM model in the table had reported state-of-the-art results for aspect detection on the restaurant corpus, and ABAE outperforms it on the Staff and Ambience aspects.
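
The assign-then-map procedure can be sketched as follows. This is an illustration under assumptions: the inferred aspect is taken to be the one with the highest weight in \(p_t\), the mapping `aspect_to_gold` is a hypothetical hand-written dictionary, and precision, recall and F1 are assumed as the per-label metrics. The numbers are made up for the example and are not results from the paper.

```python
def assign_gold_labels(aspect_probs, aspect_to_gold):
    labels = []
    for p in aspect_probs:
        # Inferred aspect: index of the largest weight in p_t (assumed rule)
        inferred = max(range(len(p)), key=p.__getitem__)
        labels.append(aspect_to_gold[inferred])
    return labels

def precision_recall_f1(predicted, gold, label):
    tp = sum(1 for p, g in zip(predicted, gold) if p == label and g == label)
    n_pred = sum(1 for p in predicted if p == label)
    n_gold = sum(1 for g in gold if g == label)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical mapping: two fine-grained inferred aspects both map to Food
aspect_to_gold = {0: "Food", 1: "Food", 2: "Staff"}
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.1, 0.7],
    [0.5, 0.1, 0.4],
]
toy_gold = ["Food", "Food", "Staff", "Staff"]

predicted = assign_gold_labels(toy_probs, aspect_to_gold)
staff_precision, staff_recall, staff_f1 = precision_recall_f1(predicted, toy_gold, "Staff")
print(predicted, staff_precision, staff_recall, staff_f1)
```

Because the inferred aspects are more fine-grained than the gold aspects, several inferred aspects can map to the same gold label, as the two Food entries in the mapping show.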

The results for the beer corpus are shown below. Note that the authors combined the Taste and Smell aspects, as the two are highly correlated. ABAE outperforms all models on all aspects except Taste.

Another finding is that the attention mechanism is the key factor driving the performance of ABAE. The table below compares the performance of ABAE and ABAE-, where ABAE- is the ABAE model without the attention mechanism. ABAE outperforms ABAE- on all aspects and on all metrics.

Conclusion and Future Work

In contrast to LDA models, ABAE explicitly captures word co-occurrence and overcomes the problem of data sparsity. The experimental results show that ABAE learns higher-quality aspects and captures the aspects of reviews more effectively than previous models. According to the paper, this is the first unsupervised neural technique for aspect extraction. ABAE is a simple and effective neural attention model, and it scales well.


