Objective and Contribution
Proposed an ontology-aware pointer-generator model for radiology report summarisation. Radiology report consists of two main sections: FINDINGS and IMPRESSION (summary). We use domain-specific ontology to improve content selection and this augmentation has shown to outperform SOTA results in terms of ROUGE scores. We also perform human evaluation on how good our generated summaries are in retaining salient information.
What’s the problem with existing abstractive summarisation model on clinical notes?
They suffer from content selection and completeness. It is crucial for the summary of clinical notes to capture all the main diagnoses.
We extend the original pointer-generator network with domain-specific ontology. Here, we link ontologies (UMLS or RadLex) to input clinical text to from a new encoded sequence. We do this by using a mapping function to output concepts if the input token appears in the ontology, otherwise we skip it. We also have a second biLSTM to encode the additional ontology terms and an additional context vector that uses the domain-ontology sequence. This additional context vector acts as additional global information to help the decoding process, which we have modified the decoder to accept the new ontological-aware context vector.
We use two ontologies: UMLS and RadLex. UMLS is a medical ontology that includes different procedures, conditions, symptoms, and many more. We specifically use QuickUMLS to extract UMLS concepts from the FINDINGS section. RadLex contains widely-used ontology of radiology terms and it’s organised in a hierarchical structure. We use exact n-gram matching to identify important radiology entities.
Experiments and Results
Our evaluation dataset is 41066 radiology reports from MedStar Georgetown University Hospital. Our evaluation metric is ROUGE scores. Our baseline models are LSA, LexRank, vanilla pointer-generator model (with copy mechanism), and background-aware PG (encodes BACKGROUND section of the radiology report to improve summarisation).
Our ontology-aware models outperformed the other baseline models on both the dev and test set, with the RadLex ontology model achieving the highest performance. Both LexRank and LSA significantly underperformed, indicating that the most central sentences might not be important for the IMPRESSION summary. We also examine the attention weights to assess if our ontology-aware decoder was able to attend more radiology terms. Figure 2 below showcase the attention weights of two samples. Relative to vanilla PG, our model was able to attend more radiological terms within the FINDINGS section, improving the summary generation process.
We have a expert radiologist to manually evaluate 100 reports where each report has the radiology FINDINGS, the manually-written summary, PG-generated summary, and RadLex-ontology-regenerated summary. The radiologist score each IMPRESSION according to the following factors from 1 (worst) – 5 (best):
The results are displayed below and show that our RadLex-generated summary improves completeness and accuracy while maintaining readability. There is a net gain of 10 reports that shown improvement in completeness between score 3 and 4. Our approach also has significantly less critical errors where only 5% of our generated summaries were scored 1 and 2. The radiologist also evaluated our RadLex-generated summaries independently to identify how the summaries could improve further. Our RadLex-generated summaries was able to select important points that are related to the RadLex terms and in some occasions, it was able to identify important points that the radiologist has missed. One problem is that our approach still generates repetitive sentences sometimes which we could mitigate by doing further pre-processing such as removing repetitive n-grams. Another problem is that our approach occasionally mix up the details leading to factual inconsistent, indicating that we could spend more time further developing our attention mechanism.