Objective and Contribution
We present OPINIONDIGEST, an abstractive opinion summarisation framework that uses Aspect-based Sentiment Analysis (ABSA) to extract opinion phrases from reviews and trains a Transformer to reconstruct the original reviews from these phrases. At summarisation time, we merge opinion phrases from multiple reviews and select the most popular ones. OPINIONDIGEST can also tailor summaries to different user needs by filtering the selected opinions by aspect and/or sentiment. We show that our model outperforms the baseline models and produces informative, customisable summaries.
For each entity, our goal is to generate an abstractive summary of the most salient opinions in the relevant reviews. OPINIONDIGEST has three main components: opinion extraction, opinion selection, and summary generation.
Controllable Opinion Selection
Here, we use a pre-trained tagging model to obtain an opinion set that contains the extracted opinion phrases together with their respective polarity and aspect categories.
For each review that mentions the entity, we extract the entity’s opinion set. Summarising the opinion sets of an entity means selecting its most salient opinions, which involves three operations: opinion merging, opinion ranking, and (optionally) opinion filtering. In opinion merging, we use a greedy algorithm over word embeddings to group similar opinions, measured by cosine similarity, into clusters. In opinion ranking, we rank these clusters by size, on the assumption that the larger a cluster is, the more popular its opinions are. In the optional opinion filtering step, we control the selection by keeping only the opinions that match a given aspect category or sentiment.
Here, our goal is to reconstruct reviews from the extracted opinion phrases and to generate opinion summaries from the selected opinions. We train the Transformer by encoding the opinion phrases extracted from a single review and learning to reconstruct that review’s full text. At summarisation time, we concatenate the selected opinions and feed them to the trained Transformer, which generates the corresponding summary.
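The merging, ranking, and filtering steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Opinion` class, the toy three-dimensional vectors, the similarity threshold of 0.8, and the choice to compare each opinion against the first member of a cluster and to filter on that member's labels are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Opinion:
    phrase: str
    aspect: str
    sentiment: str
    vector: tuple  # stand-in for a phrase embedding (hypothetical toy values)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_opinions(opinions, threshold=0.8):
    """Greedy merging: an opinion joins the first cluster whose seed is similar enough."""
    clusters = []
    for op in opinions:
        for cluster in clusters:
            if cosine(op.vector, cluster[0].vector) >= threshold:
                cluster.append(op)
                break
        else:
            clusters.append([op])  # no similar cluster found, start a new one
    return clusters


def select_opinions(opinions, top_k=2, aspect=None, sentiment=None, threshold=0.8):
    """Merge, rank clusters by size, optionally filter, and return top_k seed phrases."""
    ranked = sorted(merge_opinions(opinions, threshold), key=len, reverse=True)
    selected = []
    for cluster in ranked:
        seed = cluster[0]
        if aspect is not None and seed.aspect != aspect:
            continue
        if sentiment is not None and seed.sentiment != sentiment:
            continue
        selected.append(seed.phrase)
    return selected[:top_k]


opinions = [
    Opinion("great food", "food", "positive", (1.0, 0.0, 0.1)),
    Opinion("delicious food", "food", "positive", (0.9, 0.1, 0.1)),
    Opinion("tasty dishes", "food", "positive", (0.95, 0.05, 0.0)),
    Opinion("rude staff", "staff", "negative", (0.0, 1.0, 0.0)),
    Opinion("unfriendly waiter", "staff", "negative", (0.1, 0.9, 0.0)),
    Opinion("cozy atmosphere", "ambience", "positive", (0.0, 0.1, 1.0)),
]

print(select_opinions(opinions))                        # largest two clusters
print(select_opinions(opinions, sentiment="positive"))  # positive opinions only
print(select_opinions(opinions, aspect="staff"))        # staff opinions only
```

With real data the vectors would come from a word-embedding model, and the threshold would need tuning; the cluster size is what stands in for popularity here.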
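A small sketch of how the training pairs and the summarisation input relate to each other may help. It only shows the data preparation, not the Transformer itself, and the separator string, the lower-casing, and the function names are hypothetical choices for illustration rather than details given in the text above.

```python
SEP = " [SEP] "  # hypothetical separator between opinion phrases


def encode_opinions(phrases):
    """Concatenate opinion phrases into a single source string."""
    return SEP.join(p.strip().lower() for p in phrases if p.strip())


def make_training_pair(review_text, extracted_phrases):
    """Training: the source is one review's phrases, the target is that review's text."""
    return {"source": encode_opinions(extracted_phrases), "target": review_text}


def make_summary_input(selected_phrases):
    """Summarisation: the source is the selected opinions from many reviews."""
    return encode_opinions(selected_phrases)


pair = make_training_pair(
    "The food was great but the staff were rude.",
    ["Great food", "rude staff"],
)
print(pair["source"])
print(make_summary_input(["great food", "cozy atmosphere"]))
```

The point of the design is that both stages share one input format: the model only ever sees a list of opinion phrases, so at summarisation time it can be given phrases drawn from any number of reviews.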
Experiments and Results
We have two evaluation datasets: YELP restaurant reviews and HOTEL reviews. The YELP dataset has 624K training reviews, whereas the HOTEL dataset has 688K reviews. Our evaluation metrics are ROUGE-1, ROUGE-2, and ROUGE-L. We compare against three baselines: LexRank, MeanSum, and the Best/Worst Review (the review with the highest/lowest average word overlap with the input reviews).
The results above on the YELP dataset show that our framework outperforms all the baseline models. Our framework is not fully unsupervised, but it requires labelled data only for the opinion extraction model. To evaluate the framework further, we conducted a human evaluation of the quality of the generated summaries. Judges were asked to evaluate each summary on three criteria: informativeness, coherence, and non-redundancy.
The results are shown in Table 2 below. OPINIONDIGEST’s summaries scored the highest on both informativeness and coherence, although they still leave room for improvement on redundancy. We also performed a content support study in which judges were given 8 input reviews from YELP and asked to evaluate the extent to which each summary sentence is supported by those reviews. Table 3 below shows the proportion of summary sentences that are fully, partially, and not supported by the input reviews. OPINIONDIGEST generates summaries with a higher proportion of fully and partially supported sentences.
Lastly, we want to measure how well our framework can generate summaries for different aspects. We produced aspect-specific summaries and asked participants to evaluate whether each summary covers the specified aspect exclusively, partially, or not at all. The results are displayed below and show that OPINIONDIGEST generates summaries that are exclusively or partially related to the specified aspect 89.7% of the time.
Below are example summaries that showcase OPINIONDIGEST’s ability to summarise 100+ reviews and to control the summaries by filtering opinions by aspect and sentiment. The first two examples show summaries generated from 8 and 128 reviews. Our framework does not aggregate review representations, so its scalability is not tied to the input size. Examples 3 to 6 show how the generated summaries can be controlled: Examples 3 and 4 use sentiment, whereas Examples 5 and 6 use different aspects (staff vs food).
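To make the Best/Worst Review baseline concrete, here is a minimal sketch. It assumes that "word overlap" is measured with a unigram F1 score in the spirit of ROUGE-1, computed on whitespace tokens; the exact overlap measure and tokenisation used in the evaluation are not specified above, so treat these as assumptions.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 on lower-cased whitespace tokens (a simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def best_and_worst_review(reviews):
    """Return the reviews with the highest and lowest average overlap with the others.

    Expects at least two reviews.
    """
    def average_overlap(i):
        others = [r for j, r in enumerate(reviews) if j != i]
        return sum(rouge1_f1(reviews[i], other) for other in others) / len(others)

    scores = [average_overlap(i) for i in range(len(reviews))]
    best = max(range(len(reviews)), key=scores.__getitem__)
    worst = min(range(len(reviews)), key=scores.__getitem__)
    return reviews[best], reviews[worst]


reviews = [
    "the food was great",
    "great food and great service",
    "the food was great and cheap",
    "parking was terrible",
]
best, worst = best_and_worst_review(reviews)
print("best:", best)
print("worst:", worst)
```

Both baselines are extractive: they return one of the input reviews unchanged, which is why they serve as reference points for how much an abstractive summary gains over simply picking a representative review.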