Objective and Contribution
Proposed TANR, a neural news recommendation system with topic-aware news embeddings. TANR consists of a topic-aware news encoder and a user encoder. The news encoder applies a CNN and an attention mechanism to the news title to select important words, and we jointly train it with an auxiliary topic classification task. The user encoder learns a user representation from the news the user has historically read, using an attention mechanism to select the news that are most informative about the user. The results show that our approach improves news recommendation performance.
The model architecture consists of three main modules:
The objective of the news encoder is to learn news representations from titles. It has three layers. The first is a word embedding layer, which converts the words of the title into word embeddings. The second is a CNN layer, which takes in the word embeddings and outputs contextual word embeddings by capturing local context information. The final layer is an attention layer, which allows the model to attend to the more important words in the title. This layer produces the final news representation, which is the weighted sum of all the contextual word embeddings.
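The three layers can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, weights, and the `tanh` nonlinearity are illustrative placeholders, and real weights would be learned rather than random.

```python
import numpy as np

# Hypothetical dimensions, not the paper's actual configuration
VOCAB, EMB_DIM, N_FILTERS, WIN = 100, 8, 6, 3

rng = np.random.default_rng(0)
word_emb = rng.normal(size=(VOCAB, EMB_DIM))           # layer 1: word embeddings
conv_w = rng.normal(size=(N_FILTERS, WIN * EMB_DIM))   # layer 2: CNN filters
attn_q = rng.normal(size=N_FILTERS)                    # layer 3: attention query

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def encode_news(title_ids):
    """Map a title (array of word ids) to a news representation."""
    embs = word_emb[title_ids]                         # (T, EMB_DIM)
    pad = np.zeros((WIN // 2, EMB_DIM))
    padded = np.vstack([pad, embs, pad])               # keep output length T
    # CNN layer: each contextual embedding summarises a local window of words
    ctx = np.stack([np.tanh(conv_w @ padded[i:i + WIN].ravel())
                    for i in range(len(title_ids))])   # (T, N_FILTERS)
    # word-level attention: weight each word, then take the weighted sum
    weights = softmax(ctx @ attn_q)                    # (T,)
    return weights @ ctx                               # (N_FILTERS,)

news_rep = encode_news(np.array([4, 17, 42, 8]))       # a 4-word title
```

The attention weights sum to one, so the news representation stays on the same scale as the contextual word embeddings it averages.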
The objective of the user encoder is to learn user representations from historically browsed news. The idea is that a user's browsing history captures different information about, and preferences of, that particular user. We use the news encoder to encode all the browsed news and obtain their representations. The user encoder applies an attention mechanism over these news representations to select the key news that tell us the most about the user. The final user representation is the weighted sum of the representations of the user's browsed news.
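A minimal sketch of the user encoder, assuming the browsed news have already been encoded into vectors; the dimension and the attention query here are illustrative placeholders (learned parameters in the real model).

```python
import numpy as np

D = 6                                    # hypothetical news-representation size
rng = np.random.default_rng(1)
attn_q = rng.normal(size=D)              # news-level attention query (learned in practice)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def encode_user(browsed_reps):
    """(N, D) stack of browsed-news representations -> (D,) user representation."""
    weights = softmax(browsed_reps @ attn_q)   # importance of each browsed news
    return weights @ browsed_reps              # weighted sum over the history

history = rng.normal(size=(5, D))        # stand-in for 5 encoded browsed news
user_rep = encode_user(history)
```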
The objective of the click predictor is to predict the probability that the user will click on a candidate news article. The click predictor takes in the candidate news representation and the user representation and computes the click probability score as the inner product of the two representations.
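The click predictor is just an inner product, so scoring and ranking a set of candidates reduces to a matrix–vector product. A sketch with illustrative random vectors:

```python
import numpy as np

D = 6                                    # hypothetical representation size
rng = np.random.default_rng(2)
user_rep = rng.normal(size=D)            # output of the user encoder
candidates = rng.normal(size=(3, D))     # 3 candidate-news representations

scores = candidates @ user_rep           # one inner-product click score per candidate
ranking = np.argsort(-scores)            # candidates ordered by predicted click score
```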
Topic-aware news encoder
The topic of a news article is important for news recommendation, so including topic information should improve the news and user representations. However, we have limited topic information, so we jointly train our news encoder with a news topic classification model, as shown below. This gives us a topic-aware news encoder. The topic classification model consists of a news encoder and a topic predictor module. The news encoder is shared with the news recommendation model, and the topic predictor predicts the topic distribution (using softmax) from the news representation. Because the news encoder is shared, it learns to encode topic information, which the news recommendation model can then exploit. Jointly training the news recommendation and topic classification tasks means we have two losses to optimise; the total loss is the weighted sum of the two.
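The joint objective can be sketched as below: the recommendation loss plus the topic-classification loss, weighted by a coefficient lambda (the variable `lam`; its 0.2 default and the function names here are illustrative).

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the correct class."""
    return -np.log(probs[label] + 1e-12)

def joint_loss(rec_probs, clicked_idx, topic_probs, topic_idx, lam=0.2):
    rec_loss = cross_entropy(rec_probs, clicked_idx)     # recommendation task
    topic_loss = cross_entropy(topic_probs, topic_idx)   # auxiliary topic task
    return rec_loss + lam * topic_loss                   # weighted sum of losses

# Toy predicted distributions: candidate 0 was clicked, true topic is class 1
loss = joint_loss(np.array([0.7, 0.2, 0.1]), 0,
                  np.array([0.1, 0.8, 0.1]), 1)
```

Setting `lam=0` recovers the recommendation-only objective (TANR-basic), while larger values shift the encoder's capacity toward topic classification.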
Experimental Setup and Results
The real-world dataset is one month of MSN News logs. The statistics of the dataset and the topic distribution are shown below. The evaluation metrics are AUC, MRR, nDCG@5, and nDCG@10.
LibFM. A matrix factorisation technique for recommendation
DSSM. Uses the historically browsed news as the query to retrieve candidate news
Wide&Deep. A wide linear channel combined with a deep neural network
DeepFM. Combines factorisation machines and deep neural networks
DFM. Combines dense layers at different levels and uses an attention mechanism
DKN. Uses entity information in knowledge graphs
TANR-basic. TANR without the topic-aware news encoder (no joint topic classification)
Neural network models outperform the traditional matrix factorisation technique, as expected, since neural networks can learn better news and user representations. Both TANR-basic and TANR outperformed all baseline models, and TANR consistently outperformed TANR-basic, demonstrating the benefit of incorporating news topics into news recommendation and the effectiveness of our joint training strategy.
In terms of the topic classifier's performance, the F1 results are shown below. Classification is good across topics except for the "kids" class, which might be due to its limited training data. Overall, the results show that our news encoder has encoded topic information, and this improves the results of our news recommendation model.
In Figure 6, we show the results of using different attention networks. The results show that both news-level and word-level attention are useful, as each outperforms the baseline with no attention network. This supports the hypothesis that different news contain different information about the user, and different words differ in how well they represent the news; the attention networks allow the model to pick the most informative news and the most important words. Combining both attention networks yields the best results.
Lastly, we investigated the influence of the hyperparameter lambda, which controls the relative importance of the topic classification task by determining how much the model focuses on optimising the topic classification loss. The results shown below tell us that if lambda is too low, the model's performance is suboptimal because the news encoder hasn't learned enough topic information. If lambda is too high, the model focuses too much on the topic classification task and neglects the news recommendation task. The optimal lambda appears to be 0.2.