Objective and Contribution

We utilised a simple Transformer-based model with relative position representations and a copy attention mechanism to achieve state-of-the-art (SOTA) results for source code summarisation. We found that encoding the absolute positions of source code tokens hinders summarisation performance, whereas encoding their relative positions significantly improves it.

What is source code summarisation?

The objective is to encode source code and generate a readable summary that describes the functionality of the program.


We used two evaluation datasets, a Java dataset and a Python dataset, both collected from GitHub, as shown below. Our evaluation metrics are BLEU, METEOR, and ROUGE-L.
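Of the three metrics, ROUGE-L is the easiest to illustrate: it scores a generated summary by the longest common subsequence (LCS) it shares with the reference summary. The sketch below is a minimal sentence-level version on whitespace tokens. The function names are hypothetical, and the default `beta=1.2` weighting is an assumption (toolkits differ), so this is not the exact scorer behind our reported numbers.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """Sentence-level ROUGE-L F-score on whitespace-separated tokens.

    beta weights recall relative to precision; 1.2 is an assumed default.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return ((1 + beta ** 2) * precision * recall) / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    reference = "returns the sum of two numbers"
    candidate = "return sum of numbers"
    print(rouge_l(candidate, reference, beta=1.0))
```

In this toy pair the LCS is "sum of numbers" (3 tokens), so precision is 3/4 and recall is 3/6. Note that "return" and "returns" do not match, which is one reason token-level metrics can undervalue otherwise reasonable summaries.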


Our proposed model is based on the vanilla Transformer. We encoded the code and the summary as sequences of embeddings. The vanilla Transformer consists of stacked multi-head attention and linear transformation layers in both the encoder and the decoder. We also included copy attention in the Transformer, which gives the model the ability to copy rare tokens from the source code.
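To make the copy idea concrete, here is a minimal sketch of one common way to mix a generation distribution with a copy distribution (the pointer-generator formulation). It assumes the vocabulary logits, the attention scores over source positions, and a scalar gate `p_gen` have already been computed by the decoder; all names are hypothetical, and this is an illustration rather than the exact formulation in our model.

```python
import math
from typing import Dict, List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def copy_mixture(
    vocab: List[str],
    vocab_logits: List[float],
    source_tokens: List[str],
    attn_scores: List[float],
    p_gen: float,
) -> Dict[str, float]:
    """Final output distribution for one decoding step.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)
    A rare source token missing from `vocab` still gets probability through copying.
    """
    p_vocab = softmax(vocab_logits)
    attn = softmax(attn_scores)
    # Generation part: scaled vocabulary distribution
    probs: Dict[str, float] = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
    # Copy part: add attention mass to whichever token sits at each source position
    for tok, a in zip(source_tokens, attn):
        probs[tok] = probs.get(tok, 0.0) + (1.0 - p_gen) * a
    return probs


if __name__ == "__main__":
    dist = copy_mixture(
        vocab=["return", "the", "value"],
        vocab_logits=[1.0, 0.5, 0.2],
        source_tokens=["get", "userId", "return"],
        attn_scores=[0.1, 2.0, 0.3],
        p_gen=0.6,
    )
    print(sorted(dist.items(), key=lambda kv: -kv[1]))
```

Here the identifier `userId` is not in the toy vocabulary, yet it can still be emitted because the decoder attends strongly to it in the source.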

Position representations

Here, we explored two options in the Transformer: absolute position encoding of the sequential order of source code tokens, and pairwise relationship encoding. The absolute position encoding aims to capture the order of the source tokens; however, we show that this order information is not actually helpful in learning source code representations and leads to poorer summaries. We found that it is the mutual interactions between tokens that influence the meaning of the source code, which is why we explored pairwise relationship encoding. To capture these pairwise relationships between input tokens, we learn relative position representations for each pair of positions i and j.
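A minimal single-head sketch of self-attention with relative position representations, in the style of Shaw et al. (2018), is shown below. We assume the query, key, and value vectors are already computed and that relative distances j - i are clipped to a maximum distance k, giving 2k + 1 learned key and value embeddings; the function names are hypothetical, and a real model would batch this with tensors and use multiple heads.

```python
import math
from typing import List, Tuple

Vector = List[float]


def clip(distance: int, k: int) -> int:
    """Clip a relative distance j - i to the range [-k, k]."""
    return max(-k, min(k, distance))


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def relative_self_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    rel_key_emb: List[Vector],
    rel_value_emb: List[Vector],
    k: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Self-attention where position enters only through clipped j - i.

    rel_key_emb and rel_value_emb each hold 2k + 1 vectors; the pair (i, j)
    uses index clip(j - i, k) + k. No absolute position encoding is added.
    Returns (attention weights, output vectors).
    """
    n, d = len(queries), len(queries[0])
    weights: List[Vector] = []
    outputs: List[Vector] = []
    for i in range(n):
        scores = []
        for j in range(n):
            r = clip(j - i, k) + k
            # Key of token j, shifted by the embedding of its distance from i
            shifted_key = [kv + rv for kv, rv in zip(keys[j], rel_key_emb[r])]
            scores.append(dot(queries[i], shifted_key) / math.sqrt(d))
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alpha = [e / z for e in exps]
        weights.append(alpha)
        # Weighted sum of values, each shifted by its relative value embedding
        out = [0.0] * len(values[0])
        for j in range(n):
            r = clip(j - i, k) + k
            for c in range(len(out)):
                out[c] += alpha[j] * (values[j][c] + rel_value_emb[r][c])
        outputs.append(out)
    return weights, outputs
```

Because of the clipping, every token farther than k positions away in the same direction shares a single embedding, so the number of position parameters does not grow with input length.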


As shown below, our full model outperformed all the baseline models. In fact, our base model, trained on the dataset without CamelCase and snake_case splitting of code tokens, outperformed all baseline models except on the ROUGE-L metric. Our base model didn't incorporate the copy attention mechanism, and we showed that copy attention does improve the performance of our full model.
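For readers unfamiliar with CamelCase and snake_case token processing, the sketch below shows one way to split identifiers into sub-tokens, so that `getUserId` becomes `get`, `User`, `Id`. The regular expression and the function name `split_identifier` are our own illustrative assumptions, not the exact preprocessing used to build the datasets.

```python
import re
from typing import List

# Alternatives, tried left to right at each position:
#   an acronym followed by a capitalised word (HTTP in HTTPResponse),
#   a capitalised or lowercase word, a trailing all-caps run, or a digit run.
_SUBTOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(token: str) -> List[str]:
    """Split one code token on underscores (snake_case), then on case changes (CamelCase)."""
    parts: List[str] = []
    for chunk in token.split("_"):
        if chunk:  # skip empty pieces from leading/trailing/double underscores
            parts.extend(_SUBTOKEN.findall(chunk))
    return parts


def split_code_tokens(tokens: List[str]) -> List[str]:
    """Apply identifier splitting to a whole sequence of code tokens."""
    out: List[str] = []
    for tok in tokens:
        pieces = split_identifier(tok)
        # Keep punctuation and other tokens the pattern cannot split unchanged
        out.extend(pieces if pieces else [tok])
    return out


if __name__ == "__main__":
    print(split_code_tokens(["public", "int", "getUserId", "(", ")", "max_value"]))
```

Splitting this way shrinks the vocabulary and lets the model share statistics between, say, `userId` and `getUserId`; the result above is notable because the base model was competitive even without it.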

Ablation studies

Impact of position representation

Table 3 below shows the performance of applying absolute position encoding to the source and target sequences. It shows a decrease in performance when the absolute position encoding is included. Table 4 shows the benefit of learning pairwise relationships between source code tokens. We experimented with different clipping distances and with whether or not to include directional information. The performance across the different clipping distances is very similar to that of our full model, and the models that include directional information outperformed the ones that didn't.
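The two ablation knobs, the clipping distance k and the use of directional information, can be pictured as different ways of mapping each token pair (i, j) to a learned relative embedding. The sketch below builds that index matrix; treating the non-directional variant as using |j - i| is our assumption about how direction is dropped, and `relative_index_matrix` is a hypothetical name.

```python
from typing import List


def relative_index_matrix(n: int, k: int, directional: bool = True) -> List[List[int]]:
    """Embedding index for every (i, j) pair in a sequence of length n.

    directional=True:  j - i is clipped to [-k, k] and shifted to [0, 2k],
                       so there are 2k + 1 embeddings and "before" differs from "after".
    directional=False: |j - i| is clipped to k, giving k + 1 embeddings, so a token
                       d steps before i and one d steps after i share an embedding.
    """
    matrix: List[List[int]] = []
    for i in range(n):
        row: List[int] = []
        for j in range(n):
            d = j - i
            if directional:
                row.append(max(-k, min(k, d)) + k)
            else:
                row.append(min(abs(d), k))
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for label, flag in (("directional", True), ("non-directional", False)):
        print(label)
        for row in relative_index_matrix(4, 2, directional=flag):
            print(row)
```

With n = 4 and k = 2, the first row is [2, 3, 4, 4] in the directional case (distances 2 and 3 collapse into the same clipped embedding) and [0, 1, 2, 2] in the non-directional case.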

Varying model size and number of layers

Our results below show that a deeper model (more layers) performs better than a wider model (more neurons per layer). We suspect that depth is more beneficial for source code summarisation because the task depends more on semantic information than on syntactic information.

Qualitative Analysis

Our qualitative example below shows that the copy attention mechanism enables the model to generate shorter summaries with more appropriate keywords. We also observed that frequent tokens in the source code have a higher copy probability when relative position representations are used.

Conclusion and Future Work

A potential direction for future work is to incorporate the code structure into the Transformer and to apply it to other code sequence generation tasks, such as generating commit messages for source code changes.



Data Scientist
