Objective and Contribution

Released four clinical BERT models: Clinical BERT and Discharge Summary BERT, trained on all clinical notes and on discharge summaries respectively. For each type of BERT, we initialised the model either from BERT-Base or from BioBERT. We show that the domain-specific models improve performance on two clinical NER tasks and one medical NLI task when compared to general BERT and/or BioBERT embeddings; however, Clinical BERT and Discharge Summary BERT underperformed on the de-identification (de-ID) tasks.

Note: BioBERT is trained on the PubMed corpus, which consists of biomedical research articles.


We used the MIMIC-III dataset for our clinical text, which contains approximately 2 million notes. As mentioned above, Clinical BERT was trained on all note types whereas Discharge Summary BERT was trained on discharge summaries only.
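As a rough sketch of how the two pre-training corpora can be separated: MIMIC-III stores notes in a NOTEEVENTS table with a CATEGORY column, and filtering on it yields the discharge-summary subset. The DataFrame below uses toy rows (an illustration, not real MIMIC data); in practice you would read the actual NOTEEVENTS.csv.

```python
import pandas as pd

# Toy stand-in for MIMIC-III NOTEEVENTS (real usage: pd.read_csv("NOTEEVENTS.csv")).
# Only the CATEGORY and TEXT columns are shown; the note texts are invented.
notes = pd.DataFrame({
    "CATEGORY": ["Discharge summary", "Nursing", "Radiology", "Discharge summary"],
    "TEXT": ["pt admitted with ...", "vitals stable", "cxr clear", "discharged home ..."],
})

# Clinical BERT corpus: every note type.
all_notes = notes["TEXT"].tolist()

# Discharge Summary BERT corpus: discharge summaries only.
discharge = notes.loc[notes["CATEGORY"] == "Discharge summary", "TEXT"].tolist()
```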

In terms of pre-training, we trained two BERT variants: Clinical BERT and Clinical BioBERT. Clinical BERT was initialised from BERT-Base, whereas Clinical BioBERT was initialised from BioBERT. We applied all four of our clinical BERT models to five different tasks and compared their performance to BERT and BioBERT:

  1. MedNLI (medical natural language inference)

  2. i2b2 2006 (de-ID NER)

  3. i2b2 2010 (clinical concept NER)

  4. i2b2 2012 (clinical entity NER)

  5. i2b2 2014 (de-ID NER)


Clinical BioBERT outperformed the other embedding models on three of the five tasks. In particular, on MedNLI, Clinical BioBERT achieved SOTA results by a wide margin. However, on the two de-ID tasks (i2b2 2006 and 2014), vanilla BioBERT outperformed all our clinical BERTs. This matched our expectations, as the de-ID data distribution differs from that of MIMIC: the de-ID challenge corpora replace real protected health information with synthetic placeholders, which models pre-trained on raw MIMIC notes never encounter.

The table below showcases the nearest neighbours for words in three different categories: Disease, Operations, and Generic. Clinical BERT captures more cohesive terms around medical and clinical operations than BioBERT. For example, for the word “admitted”, BioBERT lists “sinking” as a similar word, which is irrelevant, whereas with Clinical BERT all three nearest-neighbour terms are relevant.
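A minimal sketch of how such nearest-neighbour lists can be computed by ranking cosine similarity over word vectors. The vectors below are random toy embeddings and the vocabulary is invented for illustration; the paper derives its representations from the BERT models themselves, which this sketch does not reproduce.

```python
import numpy as np

# Toy embedding table: word -> vector. Real vectors would come from a
# trained model (e.g. a BERT embedding layer), not random draws.
vocab = ["admitted", "discharged", "transferred", "hospitalized", "sinking"]
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=8) for w in vocab}

def nearest_neighbours(word, emb, k=3):
    """Rank the other vocabulary words by cosine similarity to `word`."""
    q = emb[word]
    sims = {
        w: float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))
        for w, v in emb.items()
        if w != word
    }
    return sorted(sims, key=sims.get, reverse=True)[:k]

print(nearest_neighbours("admitted", emb))
```

With embeddings from Clinical BERT, a query such as "admitted" would surface clinically related terms; with toy random vectors the ranking is of course meaningless.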

Conclusion and Future Work

MIMIC only contains notes from a single healthcare institution. Care practices differ substantially across institutions, so it would be beneficial to pre-train on notes drawn from multiple institutions.


