NER Fundamentals

What is NER?

NER stands for Named Entity Recognition. It is an NLP task that involves identifying entities such as people, organisations, countries, and so on in text. For general entity types such as names, locations, and organisations, we can use pre-trained NER models such as Stanford NER or spaCy. For domain-specific data, you would need to fine-tune your NER model.

What are the two main approaches to NER?
  1. Knowledge-based
  2. Machine learning (for example, Conditional Random Fields and Hidden Markov Models)

Describe the knowledge-based approach.

Examples of knowledge-based resources are WordNet and general-purpose lexicons, and UMLS and MedLEE for specific domains (here, medicine). The advantage of this approach is that it requires little training data. The disadvantage is that creating these lexicons manually can be time-consuming and expensive, and a lexicon built for one domain does not transfer well to another.
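
As a rough illustration of the knowledge-based idea, here is a minimal sketch of a lexicon lookup in Python. The lexicon entries, the `lexicon_ner` function, and the `max_len` parameter are all made up for this example; real resources such as UMLS are far larger and come with their own tooling.

```python
import re

# Hypothetical toy lexicon: lowercased surface form -> entity type.
LEXICON = {
    "new york": "LOCATION",
    "london": "LOCATION",
    "stanford university": "ORGANISATION",
    "ada lovelace": "PERSON",
}

def lexicon_ner(text, lexicon=LEXICON, max_len=3):
    """Greedy longest-match lookup of token n-grams against the lexicon."""
    tokens = re.findall(r"\w+", text)
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest phrase first so "New York" is preferred over "New".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            label = lexicon.get(phrase.lower())
            if label is not None:
                match = (phrase, label, n)
                break
        if match:
            entities.append((match[0], match[1]))
            i += match[2]  # skip past the matched phrase
        else:
            i += 1
    return entities

print(lexicon_ner("Ada Lovelace visited Stanford University in New York."))
# [('Ada Lovelace', 'PERSON'), ('Stanford University', 'ORGANISATION'), ('New York', 'LOCATION')]
```

No training data is needed here, but anything missing from the lexicon is simply not found, which is why such lexicons are costly to build and maintain.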

Describe the machine learning approach.

The ML approach includes tools such as Stanford NER, CRF++, MALLET, and NLTK. The good thing about the ML approach is that it reduces the human effort needed to maintain rules and dictionaries, but it requires a large amount of high-quality annotated training data.
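
To make the ML approach more concrete, here is a minimal sketch of a Hidden Markov Model tagger in plain Python: it learns tag transitions and word emissions by counting over annotated sentences, then decodes with the Viterbi algorithm. The training sentences, tag names, and function names are invented for illustration, and add-one smoothing is a simplifying assumption. None of the tools listed above is implemented this way.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: sentences of (token, tag) pairs.
TRAIN = [
    [("Alice", "PER"), ("lives", "O"), ("in", "O"), ("Paris", "LOC")],
    [("Bob", "PER"), ("works", "O"), ("in", "O"), ("London", "LOC")],
    [("Alice", "PER"), ("visited", "O"), ("London", "LOC")],
]

def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for `words`."""
    tags = sorted(emit)
    v = len(vocab) + 1  # one extra slot for unknown words
    # Each column maps tag -> (best log score, previous tag on that path).
    best = [{t: (log_prob(trans["<s>"], t, len(tags))
                 + log_prob(emit[t], words[0], v), None) for t in tags}]
    for word in words[1:]:
        col = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + log_prob(trans[p], t, len(tags)), p)
                for p in tags
            )
            col[t] = (score + log_prob(emit[t], word, v), prev)
        best.append(col)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

trans, emit, vocab = train_hmm(TRAIN)
print(viterbi(["Bob", "visited", "Paris"], trans, emit, vocab))
# ['PER', 'O', 'LOC']
```

Nobody wrote a rule saying "a word after 'in' is probably a location"; the model picked that up from the annotations. That is the reduced human effort, and also why the quality and quantity of annotated data matter so much.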

What’s the difference between Stanford NER and spaCy NER?

Stanford NER uses a CRF sequence modelling algorithm and requires Java to be installed, whereas spaCy NER uses a word-embedding strategy with subword features and a CNN with residual connections, a design aimed at making the system efficient and accurate.
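
A CRF tagger typically works on hand-crafted features for each token, whereas an embedding-based model learns its representations from data. The sketch below shows the kind of per-token feature dictionary a CRF might consume. The feature names and the `token_features` function are assumptions for illustration and are not Stanford NER's actual feature set.

```python
def token_features(tokens, i):
    """Build an illustrative feature dictionary for the token at position i."""
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalised words are often names
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        "BOS": i == 0,                    # beginning of sentence
        "EOS": i == len(tokens) - 1,      # end of sentence
    }
    # Neighbouring words give the model local context.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    return feats

sentence = ["Alice", "lives", "in", "Paris"]
for i in range(len(sentence)):
    print(token_features(sentence, i))
```

Each dictionary would be paired with a gold label and passed to a CRF trainer. With an embedding-based model such as spaCy's, this manual feature design step is largely replaced by learned vectors.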


