Objective and Contribution

The paper introduces a new aspect-based sentiment analysis (ABSA) dataset, Multi-Aspect Multi-Sentiment (MAMS), in which each sentence contains at least two different aspects with different sentiment polarities. MAMS addresses a common weakness of existing ABSA datasets, where most sentences express the same sentiment towards every aspect, so that ABSA degenerates into sentence-level sentiment analysis. The paper also proposes a simple baseline model for the dataset, CapsNet-BERT.

Dataset Construction

The construction of MAMS is broken down into three steps:

  1. Data Collection

  2. Data Annotation

  3. Dataset Analysis

Data Collection

The Citysearch New York dataset is annotated in the same way as the SemEval-2014 dataset. Sentences with more than 70 words are removed.

Data Annotation

Two versions of the MAMS dataset are created, one for each subtask of aspect-based sentiment analysis: aspect-term sentiment analysis (ATSA) and aspect-category sentiment analysis (ACSA). For ATSA, we extract the aspect terms in each sentence, map each one to its sentiment, and remove any sentence that has only one aspect or whose aspects all share the same sentiment. The dataset also records the start and end positions of each aspect term. For ACSA, we pre-define eight aspect categories: food, service, staff, price, ambience, menu, place, and miscellaneous. Each sentence is mapped to the aspect categories it mentions, along with the sentiment towards each category. Only sentences with at least two distinct aspect categories that carry different sentiments are kept.
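The filtering rule can be illustrated with a minimal sketch. The record layout (a list of (aspect, sentiment) pairs per sentence), the function names, and the whitespace-based word count are all assumptions made for illustration; the paper's actual tooling is not described here.

```python
from typing import List, Tuple

# Hypothetical record layout: one (aspect, sentiment) pair per annotated aspect.
Annotation = Tuple[str, str]


def is_multi_aspect_multi_sentiment(annotations: List[Annotation]) -> bool:
    """True if the sentence has at least two aspects and at least two sentiments."""
    aspects = {aspect for aspect, _ in annotations}
    sentiments = {sentiment for _, sentiment in annotations}
    return len(aspects) >= 2 and len(sentiments) >= 2


def keep_sentence(text: str, annotations: List[Annotation], max_words: int = 70) -> bool:
    """Apply the length cut-off and the multi-aspect multi-sentiment rule.

    Word count is approximated by whitespace splitting (an assumption).
    """
    return len(text.split()) <= max_words and is_multi_aspect_multi_sentiment(annotations)


# Made-up example sentences, not taken from the dataset.
examples = [
    ("The food was great but the service was slow.",
     [("food", "positive"), ("service", "negative")]),
    ("The food and the drinks were both great.",
     [("food", "positive"), ("drinks", "positive")]),
    ("Loved the pasta.", [("pasta", "positive")]),
]

kept = [text for text, anns in examples if keep_sentence(text, anns)]
print(kept)
```

Only the first sentence survives: the second has two aspects with the same sentiment, and the third has a single aspect.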

Dataset Analysis

ATSA contains 13,854 sentences with an average of 2.62 aspect terms per sentence. ACSA has 8,879 sentences with an average of 2.25 aspect categories per sentence. Every sentence in MAMS contains multiple aspects with different sentiments. In existing ABSA datasets (SemEval-2014 and Twitter), no more than 30% of sentences are multi-aspect multi-sentiment, and in some the share is below 1%.


Given a sentence and an aspect term or an aspect category, we want the model to predict the sentiment of the sentence with respect to that aspect. The proposed capsule network (CapsNet), on which CapsNet-BERT is built, consists of four layers:

  1. Embedding layer

  2. Encoding layer

  3. Primary capsule layer

  4. Category capsule layer

Embedding layer

In this layer, we convert the input sentence and the aspect into embeddings. An aspect term embedding is computed as the average of the embeddings of the aspect's words. An aspect category embedding is initialised randomly and learned during training. The output of the embedding layer is the aspect-aware sentence embedding, obtained by concatenating the aspect embedding with each word embedding in the sentence.
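As a rough illustration of the averaging and concatenation steps, here is a small sketch using plain Python lists. The 2-dimensional toy vectors and the function names are made up for the example; a real model would use learned embeddings in a tensor library.

```python
from typing import List

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def aspect_aware_embedding(sentence_vecs: List[Vector], aspect_vec: Vector) -> List[Vector]:
    """Concatenate the aspect embedding onto every word embedding."""
    return [word_vec + aspect_vec for word_vec in sentence_vecs]


# Toy 2-d embeddings (illustrative values only).
sentence = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
aspect_words = [[1.0, 0.0], [0.0, 1.0]]  # a two-word aspect term

aspect = average(aspect_words)
aware = aspect_aware_embedding(sentence, aspect)
print(aspect)
print(aware[0])
```

Each word vector grows from 2 to 4 dimensions, since the same aspect vector is appended to every position.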

Encoding layer

We feed the aspect-aware sentence embedding into a bidirectional GRU with a residual connection to obtain the contextualised representation.

Primary capsule layer

Using a linear transformation followed by a squashing activation, we obtain the primary capsules P from the contextualised representation and the aspect capsule from the aspect embedding produced by the embedding layer. There are two further mechanisms in this layer:

  1. Aspect Aware Normalisation. Variation in sentence length makes training unstable, so we use the aspect capsule to normalise the primary capsule weights and select the important primary capsules.

  2. Capsule Guided Routing. This leverages prior knowledge of the sentiment categories to improve the routing process. During training, a sentiment matrix is initialised and fed into a squash activation to obtain the sentiment capsules. The routing weights are then computed by measuring the similarity between the primary capsules and the sentiment capsules.
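The two mechanisms above can be sketched as follows. This is a simplified illustration, not the paper's exact formulation: it assumes the usual capsule-network squash function, uses a plain dot product as the similarity measure, and applies a softmax for both normalisations. Any learned projection matrices are omitted, and all names and toy values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def squash(s: Vector) -> Vector:
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = dot(s, s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aspect_aware_weights(primary: List[Vector], aspect_capsule: Vector) -> List[float]:
    """One weight per primary capsule, normalised over the whole sentence."""
    return softmax([dot(p, aspect_capsule) for p in primary])


def guided_routing_weights(primary: List[Vector],
                           sentiment_capsules: List[Vector]) -> List[List[float]]:
    """For each primary capsule, a distribution over sentiment categories."""
    return [softmax([dot(p, z) for z in sentiment_capsules]) for p in primary]


# Toy values: three primary capsules, three sentiment categories.
primary = [squash(v) for v in [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]]
aspect_capsule = squash([0.0, 1.0])
sentiment_capsules = [squash(row) for row in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]

u = aspect_aware_weights(primary, aspect_capsule)
w = guided_routing_weights(primary, sentiment_capsules)
print(u)
print(w)
```

Because the weights are normalised over the sentence, they sum to 1 regardless of sentence length, which is the property the normalisation step relies on.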

Category capsule layer

Using the primary capsules, the aspect-aware normalised weights, and the capsule-guided routing weights, we can compute the final category capsules. Note that for CapsNet-BERT, the embedding and encoding layers are replaced with a pre-trained BERT.
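One plausible way to combine the three ingredients is a weighted sum of primary capsules followed by a squash, sketched below. The exact combination, the `scale` parameter, and the convention of predicting the category with the longest capsule are assumptions borrowed from common capsule-network practice rather than details stated in this summary.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def category_capsules(primary: List[Vector],
                      norm_weights: List[float],
                      routing_weights: List[List[float]],
                      scale: float = 1.0) -> List[Vector]:
    """Weighted sum of primary capsules per category, then squash.

    norm_weights[i] is the aspect-aware weight of primary capsule i;
    routing_weights[i][j] routes primary capsule i to category j.
    """
    num_categories = len(routing_weights[0])
    dim = len(primary[0])
    capsules = []
    for j in range(num_categories):
        total = [0.0] * dim
        for p, u, w_row in zip(primary, norm_weights, routing_weights):
            coeff = scale * u * w_row[j]
            total = [t + coeff * x for t, x in zip(total, p)]
        capsules.append(squash(total))
    return capsules


def predict(capsules: List[Vector]) -> int:
    """Index of the longest capsule (assumed to mark the predicted category)."""
    lengths = [math.sqrt(sum(x * x for x in v)) for v in capsules]
    return lengths.index(max(lengths))


# Toy inputs: two primary capsules, three sentiment categories.
primary = [[0.5, 0.0], [0.0, 0.8]]
norm_weights = [0.2, 0.8]
routing_weights = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]

caps = category_capsules(primary, norm_weights, routing_weights)
print(predict(caps))
```

In this toy case the second primary capsule has the larger aspect-aware weight and is routed mostly to category 1, so that category's capsule ends up longest.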

Experiments and Results

There are three evaluation datasets: ATSA, ACSA, and SemEval-2014 restaurant review.

Models Comparison

Models are divided into 4 categories:

  1. LSTM-based

  2. CNN-based

  3. Attention-based

  4. Ablation study to compare the effectiveness of combining CapsNet and BERT and the effect of the proposed mechanisms


  • As mentioned, sentence-level sentiment classifiers (TextCNN and LSTM) performed competitively on SemEval-2014 but poorly on the MAMS datasets

  • The SOTA ABSA methods on SemEval-2014 performed poorly or only moderately well on the MAMS datasets, indicating how difficult MAMS is

  • Attention-based models that do not properly model word order performed badly on MAMS: they lose the sequential information in a sentence and so fail to connect the context with the aspect

  • CapsNet outperformed BERT on 4 out of 6 datasets, showing the strength of CapsNet. The combined CapsNet-BERT outperformed all other models on all datasets

  • CapsNet-DR and CapsNet-BERT-DR are included to measure the effectiveness of capsule guided routing. Replacing it with standard dynamic routing (DR) reduced performance, and both variants underperformed CapsNet-BERT
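For context on the DR baseline in the last bullet, below is a minimal sketch of standard dynamic routing by agreement as commonly described for capsule networks. It is not taken from the paper's code. The `votes` layout (one prediction vector from each input capsule for each output capsule), the iteration count, and the toy values are assumptions.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(votes: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """votes[i][j] is input capsule i's prediction vector for output capsule j."""
    n_in, n_out, dim = len(votes), len(votes[0]), len(votes[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]  # routing logits start at zero
    outputs: List[Vector] = []
    for _ in range(iterations):
        # Each input capsule distributes itself over the output capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_out):
            s = [0.0] * dim
            for i in range(n_in):
                s = [acc + coupling[i][j] * x for acc, x in zip(s, votes[i][j])]
            outputs.append(squash(s))
        # Raise the logit where a vote agrees with the resulting output capsule.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(votes[i][j], outputs[j]))
    return outputs


# Toy votes: both inputs agree on output 0 and disagree on output 1.
votes = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]
out = dynamic_routing(votes)
lengths = [math.sqrt(sum(x * x for x in v)) for v in out]
print(lengths)
```

Unlike capsule guided routing, the weights here start uniform and are refined iteratively from agreement alone, with no prior knowledge of the sentiment categories.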


