Objective and Contribution

Introduce an attention-over-attention (AOA) neural network for aspect-based sentiment analysis. The AOA module jointly learns representations for aspects and sentences and explicitly captures the interaction between an aspect and its context sentence. The results on the laptop and restaurant datasets outperform previous LSTM-based architectures.


Datasets

Experimented with two domain-specific datasets from SemEval 2014 Task 4: laptop and restaurant. Accuracy is the evaluation metric. A dataset summary is shown in the figure below:


Architecture

In this task, we are given a sentence and an aspect target, and the goal is to classify the sentiment polarity of the aspect target in the sentence. There are four main components in the architecture shown below: word embedding, bi-LSTM, attention-over-attention (AOA), and final prediction.

Word embedding and bi-LSTM

Word embedding is a standard step where we convert the sentence and the aspect target into numerical representations. Nothing special here. Once we have the word vectors, we feed them into two bi-LSTMs, one for the sentence and one for the aspect target, to learn the hidden semantics of the words in each.
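As a rough illustration of the lookup step, here is a minimal sketch; the toy vocabulary, the random embedding matrix, and the `embed` helper are hypothetical stand-ins (in practice pre-trained word vectors would be loaded, and the resulting matrices would then feed the two bi-LSTMs):

```python
import numpy as np

# Hypothetical toy vocabulary and a randomly initialised embedding matrix;
# in practice pre-trained word vectors would be loaded instead.
rng = np.random.default_rng(0)
vocab = {"the": 0, "battery": 1, "life": 2, "is": 3, "great": 4}
emb = rng.standard_normal((len(vocab), 50))  # 50-dim word vectors

def embed(tokens):
    """Look up each token and stack into a (seq_len, emb_dim) matrix."""
    return np.stack([emb[vocab[t]] for t in tokens])

sentence_vecs = embed(["the", "battery", "life", "is", "great"])  # shape (5, 50)
target_vecs = embed(["battery", "life"])                          # shape (2, 50)
```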

Attention-over-Attention (AOA)

The next step is to calculate the attention weights for the text using the AOA module. Here are the steps:

  1. Calculate a pair-wise interaction matrix between the two hidden states, where the value of each entry represents the correlation of a word pair between sentence and target

  2. Perform column-wise softmax to get \(\alpha\), target-to-sentence attention

  3. Perform row-wise softmax to get \(\beta\), sentence-to-target attention

  4. Calculate the column-wise average of \(\beta\) to get a target-level attention \(\bar{\beta}\), which tells us the important parts in an aspect target

  5. The final sentence-level attention \(\gamma\) is the weighted sum of each individual target-to-sentence attention \(\alpha\) as follows: \(\gamma = \alpha\bar{\beta}^T\).
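The five steps above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming sentence hidden states `h_s` of shape `(n, 2d)` and target hidden states `h_t` of shape `(m, 2d)` coming out of the two bi-LSTMs:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aoa_attention(h_s, h_t):
    """h_s: (n, 2d) sentence hidden states; h_t: (m, 2d) target hidden states."""
    I = h_s @ h_t.T                  # (n, m) pair-wise interaction matrix
    alpha = softmax(I, axis=0)       # column-wise softmax: target-to-sentence
    beta = softmax(I, axis=1)        # row-wise softmax: sentence-to-target
    beta_bar = beta.mean(axis=0)     # (m,) target-level attention (column average)
    gamma = alpha @ beta_bar         # (n,) final sentence-level attention
    return gamma
```

Note that `gamma` is a proper distribution over sentence positions: each column of \(\alpha\) sums to 1 and the entries of \(\bar{\beta}\) sum to 1, so \(\gamma\) sums to 1.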

Final prediction

The final sentence representation is a weighted sum of the sentence hidden states, using the sentence attention from the AOA module: \(r = h_s^T\gamma\). This representation is fed into a linear layer with a softmax function to output probabilities over the sentiment classes. The class with the highest probability is the predicted label for the sentence, given the aspect target.
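The prediction step can be sketched as follows; `W` and `b` below are hypothetical parameters of the linear layer, and three sentiment classes (negative, neutral, positive) are assumed:

```python
import numpy as np

def predict(h_s, gamma, W, b):
    """Weighted sum r = h_s^T gamma, then a linear layer with softmax."""
    r = h_s.T @ gamma                 # (2d,) final sentence representation
    logits = W @ r + b                # (3,) one score per sentiment class
    e = np.exp(logits - logits.max())
    probs = e / e.sum()               # class probabilities
    return probs, int(np.argmax(probs))
```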

Experiments and Results

Model comparisons
  • Majority
    • Simple baseline that assigns the largest sentiment polarity in the training set to each sample in the test set

  • TD-LSTM
    • Uses two LSTMs to model the preceding and following contexts surrounding the aspect term

  • AT-LSTM
    • Models the sentence via LSTM and combines the output hidden states with the aspect term embedding to generate the attention vector

  • ATAE-LSTM
    • Extends AT-LSTM by appending the aspect embedding to each word vector

  • IAN
    • Uses two LSTMs to model the sentence and the aspect term respectively. The hidden states of the sentence generate an attention vector for the target, and vice versa


  • AOA-LSTM performed the best in comparison with the other baseline methods, according to the results table

  • We also include a table that showcases which words contribute the most to the aspect sentiment polarity, by visualising the sentence attention vectors \(\gamma\).

Conclusion and Future Work

The error analysis highlights cases that the model cannot handle well. One is complex sentiment expressions; another is uncommon idioms. In future work, we could incorporate sentences' grammar structures or feed prior language knowledge into the AOA neural network.


