#### Objective and Contribution

Proposed A2N, a novel attention-based method for the knowledge graph (KG) completion task that attends to the relevant graph neighbourhood of an entity to compute query-dependent entity embeddings. The proposed method performs competitively with, or outperforms, current SOTA models on two evaluation datasets, and through qualitative probing we can explore how the model jumps around the knowledge graph to derive its final predictions.

The task of KG completion involves inferring and filling in missing entity relationships in the KG. This is often formulated as a target entity prediction task: given a source entity and a relation, what is the target entity? A KG consists of many tuples (s, r, t), where s is the source entity, r is the relation, and t is the target entity. Our objective is to predict the target entity given s and r, such that the predicted tuple does not already exist in the graph.

Most embedding-based methods for KG completion define a scoring function over every tuple in the KG. Scoring functions differ, but each takes in the embeddings of the source entity, relation, and target entity. In this paper, we use the DistMult scoring function.
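For concreteness, DistMult scores a tuple as the trilinear dot product of the three embeddings (the dimensionality and vectors below are purely illustrative):

```python
import numpy as np

def distmult_score(s_emb, r_emb, t_emb):
    """DistMult: element-wise product of the three embeddings, summed."""
    return np.sum(s_emb * r_emb * t_emb)

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)
s, r, t = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
score = distmult_score(s, r, t)
```

One consequence of this form is that DistMult is symmetric in the source and target entities, which is a known limitation of the scoring function itself.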

#### A2N Model

Our proposed A2N takes in the query and uses bi-linear attention over the graph neighbourhood of the source entity to generate a query-dependent entity embedding. This embedding is then used to score target entities for the query. The figure below shows an example of how the model scores the neighbouring nodes of the same entity differently given two different queries.

Here’s the breakdown of each step in A2N:

1. Each graph entity has an initial embedding $$\tilde{e}^0$$ and each relation r has an embedding $$\tilde{r}$$

2. Given the embeddings of entities and relations, we can now encode the neighbouring entities and relations into embeddings. The embedding of a neighbour ($$\tilde{n}_i$$) of entity s is computed by (a) concatenating the initial entity embedding and the relation embedding, and (b) applying a linear transformation to the result

3. The model uses the scoring function to compute an attention score $$a_i$$ for each neighbour embedding and normalises the scores to obtain probabilities $$p_i$$

4. Step 3 gives us the probability of how relevant each neighbour embedding is to answering the query. We aggregate the neighbour embeddings, weighted by these probabilities, to generate the query-dependent embedding of entity s, $$\hat{s}$$

5. Lastly, we concatenate the query-dependent embedding with the initial source embedding to obtain the final source embedding $$\tilde{s}$$

Now that we have obtained the final source embedding, we can use it together with the relation embedding and the scoring function to score all possible target entities in the KG. This gives us a ranked list of candidate entities for the query.
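The steps above can be sketched roughly as follows. This is a simplified reading of the model, not the authors' code: the softmax normalisation, DistMult-style attention scoring, and all helper names and shapes here are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def a2n_source_embedding(s0, neighbours, q_rel, W):
    """Query-dependent source embedding (steps 2-5 of the summary).

    s0         : initial source entity embedding, shape (d,)
    neighbours : list of (entity_emb, relation_emb) pairs, each shape (d,)
    q_rel      : embedding of the query relation, shape (d,)
    W          : linear projection for neighbour encoding, shape (d, 2d)
    """
    # Step 2: encode each neighbour by concatenation + linear projection
    N = np.stack([W @ np.concatenate([e, r]) for e, r in neighbours])
    # Step 3: attention score of each neighbour against the query
    # (DistMult-style score, an assumption here), then normalise
    a = N @ (s0 * q_rel)
    p = softmax(a)
    # Step 4: probability-weighted aggregation of neighbour embeddings
    s_hat = p @ N
    # Step 5: concatenate with the initial source embedding
    return np.concatenate([s_hat, s0])
```

The returned vector would then be fed into the scoring function against each candidate target embedding to produce the ranked list.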

#### Experimental Setup and Results

The models are evaluated on two KG completion datasets: FB15k-237 and WN18RR. The evaluation metrics are the Mean Reciprocal Rank (MRR) of the correct entity and Hits@N, which measures accuracy within the top N predictions.
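Both metrics follow directly from the rank of the correct entity in each query's sorted prediction list; a minimal self-contained sketch (the example ranks are made up):

```python
def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank of the correct entity."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n):
    """Fraction of queries whose correct entity appears in the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

# ranks (1-indexed) of the correct target entity for five example queries
ranks = [1, 3, 2, 10, 1]
print(mrr(ranks))           # (1 + 1/3 + 1/2 + 1/10 + 1) / 5
print(hits_at_n(ranks, 3))  # 4 of 5 queries rank the answer in the top 3
```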

###### Results

For target-only prediction (table 1), our A2N model significantly outperformed the previous SOTA on both datasets across all evaluation metrics. For source and target prediction (table 2), we obtained mixed results. The A2N model outperformed all models on the WN18RR dataset on all metrics except Hits@10. On FB15k-237, however, our model underperformed ConvE, though it still achieved competitive performance close to the SOTA.

As shown in the figure above, the model attends to different neighbouring nodes of the same entity depending on the query, and can perform multi-hop reasoning. For example, using the neighbour "places_lived", the entity is mapped into the relevant embedding subspace; then, using the scoring function with the relation "nationality", we obtain a high score for the target entity US, which is the model's final prediction. This example amounts to two-hop reasoning: first about the places lived, and second about the country of those places. See the figure below for more examples.

#### Conclusion and Future Work

The proposed A2N model is interpretable, and its size does not depend on the size of an entity's neighbourhood. Potential future work could involve applying the method to attend to textual mentions of entities in addition to the graph, to reason jointly over text and the knowledge graph.
