There are two things to focus on when training your own NER model:

  1. Improving accuracy with transfer learning

  2. Getting training data

We will focus on the latter. The two biggest challenges in extending an existing NER model are finding good, representative examples and extracting the relevant candidates from those examples. There are two potential solutions:

  1. Using SpaCy to help identify incorrect labels and adding those as training examples

  2. Rule-based matcher

If you already know what kind of candidates belong to your new entity type, you can use a rule-based matcher to extract those candidates and assign them to the new entity type. The SpaCy rule-based matcher is useful here because it gives you the start and end position of each candidate within the document.
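As a rough illustration of the matcher approach, here is a minimal sketch. It assumes spaCy v3 (where patterns are passed to `matcher.add` as a list), and the `GADGET` label, the pattern, and the `find_candidates` helper are made up for the example. The matcher reports token positions, so the sketch converts them to character offsets, which is what the training format expects.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough: the Matcher only needs tokenization.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical new entity type and pattern, chosen only for illustration.
NEW_LABEL = "GADGET"
pattern = [{"LOWER": "iphone"}, {"IS_DIGIT": True}]

# spaCy v3 signature. In v2 this was matcher.add(NEW_LABEL, None, pattern).
matcher.add(NEW_LABEL, [pattern])


def find_candidates(text):
    """Return (start_char, end_char, label) for every pattern match in text."""
    doc = nlp(text)
    candidates = []
    for _match_id, start, end in matcher(doc):
        span = doc[start:end]  # start/end are token indices
        candidates.append((span.start_char, span.end_char, NEW_LABEL))
    return candidates
```

Calling `find_candidates("I bought an iPhone 11 yesterday.")` returns `[(12, 21, 'GADGET')]`, the character span covering "iPhone 11".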

Check out SpaCy’s annotation scheme here:

Preparing the SpaCy format once you have the annotated data
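For reference, SpaCy's simple training format pairs each text with a dictionary of character-offset entity spans. The sketch below builds that structure in plain Python. The `to_training_example` helper is hypothetical, and it assumes each annotated phrase appears once in the text (it uses the first occurrence).

```python
def to_training_example(text, annotations):
    """Build (text, {"entities": [(start, end, label), ...]}) from phrases.

    annotations is a list of (phrase, label) pairs. Offsets are taken from
    the first occurrence of each phrase, which is an assumption of this sketch.
    """
    entities = []
    for phrase, label in annotations:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        entities.append((start, start + len(phrase), label))
    return (text, {"entities": sorted(entities)})


example = to_training_example(
    "Uber is hiring engineers in Toronto",
    [("Uber", "ORG"), ("Toronto", "GPE")],
)
print(example)
```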

SpaCy provides scripts for both updating an existing NER model and training it on an additional entity type, which you can find in the links below. Here are the steps for updating an existing NER model:

  1. Load the NER model. This is your starting point. It could be a pretrained NER model or an empty model. If you are using an existing NER model, remember to disable the other components in the SpaCy pipeline so that they are not trained as well

  2. Load, shuffle, and loop over your training data to update the model

  3. Save the trained model to disk

  4. Test your new model
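The four steps above might look roughly like the sketch below. It assumes spaCy v3 (`Example`, `nlp.initialize`, `nlp.select_pipes`) and starts from an empty model so that it runs without downloading anything. The tiny `TRAIN_DATA` set, the `train_ner` helper, and the hyperparameters are illustrative only, not the official script. For a pretrained model you would call `spacy.load(...)` and `nlp.resume_training()` instead of `spacy.blank` and `nlp.initialize`.

```python
import random
import tempfile

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Toy training data in (text, {"entities": [(start, end, label)]}) format.
TRAIN_DATA = [
    ("Uber is hiring engineers in Toronto",
     {"entities": [(0, 4, "ORG"), (28, 35, "GPE")]}),
    ("Google opened an office in Berlin",
     {"entities": [(0, 6, "ORG"), (27, 33, "GPE")]}),
]


def train_ner(train_data, output_dir, n_iter=10):
    # Step 1: load the starting point (an empty English model in this sketch).
    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")
    for _text, annotations in train_data:
        for _start, _end, label in annotations["entities"]:
            ner.add_label(label)

    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]
    optimizer = nlp.initialize(lambda: examples)

    # Step 2: shuffle and loop over the data, updating only the NER component.
    with nlp.select_pipes(enable="ner"):
        for _ in range(n_iter):
            random.shuffle(examples)
            losses = {}
            for batch in minibatch(examples, size=2):
                nlp.update(batch, sgd=optimizer, drop=0.3, losses=losses)

    # Step 3: save the trained model to disk, then reload it for testing.
    nlp.to_disk(output_dir)
    return spacy.load(output_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        trained = train_ner(TRAIN_DATA, tmp_dir)
        # Step 4: test the reloaded model on some text.
        doc = trained("Uber opened an office in Berlin")
        print([(ent.text, ent.label_) for ent in doc.ents])
```

With only two training sentences the predictions are not meaningful. The point is the shape of the loop, not the accuracy.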

And here are the steps for adding a new entity type:

  1. Load the NER model

  2. Add the new entity label

  3. Loop over the examples of the new entity label and update the model

  4. Save and test the new model
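A minimal sketch of adding a new entity type, again assuming spaCy v3. The `add_entity_type` helper and the `ANIMAL` label in the usage comment are hypothetical, and `nlp` is assumed to be an already trained pipeline that contains an `ner` component.

```python
import random

import spacy
from spacy.training import Example


def add_entity_type(nlp, new_label, train_data, n_iter=10):
    """Add new_label to an existing NER component and train on train_data."""
    # Steps 1 and 2: take the loaded model's NER component, add the new label.
    ner = nlp.get_pipe("ner")
    ner.add_label(new_label)

    # resume_training keeps the existing weights instead of re-initializing.
    optimizer = nlp.resume_training()
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]

    # Step 3: loop over the examples, updating only the NER component.
    with nlp.select_pipes(enable="ner"):
        for _ in range(n_iter):
            random.shuffle(examples)
            losses = {}
            nlp.update(examples, sgd=optimizer, drop=0.3, losses=losses)
    return nlp


# Usage (assumes a pretrained model such as en_core_web_sm is installed):
# nlp = spacy.load("en_core_web_sm")
# data = [("Horses are tall", {"entities": [(0, 6, "ANIMAL")]})]
# nlp = add_entity_type(nlp, "ANIMAL", data)
# nlp.to_disk("path/to/output")  # Step 4: save, then test
```

In practice it usually helps to mix in examples of the entity types the model already knows, so that it does not forget them while learning the new one.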



