What is Prodigy?

It’s an annotation tool for AI, machine learning, and NLP. For example, you can provide seed words, and it will suggest similar words that you can add to your training labels.

What’s the process for building a custom NER model?
  1. Create match patterns

  2. Label the data using the match patterns

  3. Train a temporary NER model

  4. Do more labelling by correcting the NER model

  5. Train a new and better NER model

  6. Perform inference

How does creating match patterns work?

Using Prodigy, you can feed in seed words, and Prodigy will return a set of similar words based on their word vectors. These new words, along with your seed words, will be used as match patterns to label the dataset.
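To make the idea of a match pattern concrete, here is a minimal Python sketch that turns a list of words into pattern entries, one JSON object per line. The layout with `label` and `pattern` keys follows the token-pattern format used by spaCy and Prodigy; the seed words, the `FRUIT` label, and the helper names `build_patterns` and `to_jsonl` are hypothetical and only for illustration.

```python
import json


def build_patterns(words, label):
    """Turn each word or phrase into one token-based match pattern."""
    patterns = []
    for word in words:
        # One {"lower": ...} entry per whitespace-separated token,
        # so matching ignores the case of the original text.
        tokens = [{"lower": tok.lower()} for tok in word.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


def to_jsonl(patterns):
    """Serialise the patterns as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns)


# Hypothetical seed words plus similar words suggested for them.
seed_words = ["apple", "banana"]
similar_words = ["Dragon fruit"]

patterns = build_patterns(seed_words + similar_words, "FRUIT")
print(to_jsonl(patterns))
```

Saving the printed lines to a `.jsonl` file would give a patterns file of the kind used in the labelling step.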

What’s required to label the dataset?

You need a tokeniser, the dataset to be labelled, the label itself, and the patterns from the previous step. Prodigy will label any words in the dataset that match your patterns. You are also free to add more annotations to improve the dataset.
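The sketch below illustrates how the four ingredients fit together: a tokeniser, a text to label, a label, and patterns. It is not Prodigy's implementation, only a simplified stand-in. The regex tokeniser, the sample sentence, the `FRUIT` label, and the function names are all assumptions made for the example.

```python
import re


def tokenise(text):
    """Split text into (token, start, end) triples; a stand-in for a real tokeniser."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def label_text(text, patterns):
    """Return character-offset spans for every token sequence that matches a pattern."""
    tokens = tokenise(text)
    spans = []
    for entry in patterns:
        wanted = [tok["lower"] for tok in entry["pattern"]]
        n = len(wanted)
        # Slide a window of n tokens over the text and compare lowercased forms.
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if [t[0].lower() for t in window] == wanted:
                spans.append({
                    "start": window[0][1],
                    "end": window[-1][2],
                    "label": entry["label"],
                })
    return sorted(spans, key=lambda s: s["start"])


# Hypothetical patterns and text.
patterns = [
    {"label": "FRUIT", "pattern": [{"lower": "apple"}]},
    {"label": "FRUIT", "pattern": [{"lower": "dragon"}, {"lower": "fruit"}]},
]
text = "I bought an Apple and a dragon fruit."

spans = label_text(text, patterns)
for span in spans:
    print(span["label"], text[span["start"]:span["end"]])
```

Each span records where the match starts and ends in the text, which is the information an annotator would then accept, reject, or extend with further annotations.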

What’s the command line for doing corrections on labelling?




Data Scientist
