Lecture 1 – What is NLP?

This course will cover the following NLP applications:

  • Topic Modelling
  • Sentiment Classification
  • Language Modelling
  • Translation

This course follows a top-down teaching approach. In short, it focuses on what things DO first, before diving into what they ARE.

Top NLP debates

  • Norvig vs Chomsky
    • Should you model the underlying mechanism of a phenomenon (Chomsky),
    • or use machine learning to predict outputs (Norvig)?
  • Yann LeCun vs Chris Manning
    • How much linguistic structure should be incorporated into NLP models?
      • Chris Manning – Incorporating more linguistic structure into DL systems is good!
      • Yann LeCun – Believes that simple, powerful NNs can perform sophisticated tasks without extensive task-specific feature engineering

Popular NLP Python libraries

  • NLTK: very broad and popular NLP library
  • spaCy: great tokeniser, creates parse trees
  • Gensim: topic modelling and similarity detection
  • PyText: deep learning NLP framework built on PyTorch
  • fastText: library for word embeddings and text classification
  • Sklearn: popular Python ML library
  • Fastai: fast NN using modern best practices (on top of PyTorch)

Highlights of NLP applications

  • Identify company descriptions that are low quality (too much generic marketing language)
  • Classify legal documents into categories (civil, criminal, contract, family,…)
  • Twitter sentiment of politicians
  • Classify quotes from articles
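As a rough illustration of the first application above, here is a minimal sketch that scores a company description by the share of its words that are generic marketing terms. The word list `GENERIC_TERMS` and the helper names `tokenize` and `generic_share` are hypothetical, invented for this example. A real system would more likely learn such signals from labelled data than rely on a hand-written list.

```python
import re
from collections import Counter

# Hypothetical list of generic marketing words; an assumption for illustration only.
GENERIC_TERMS = {"innovative", "leading", "solutions", "world-class", "synergy", "cutting-edge"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphens kept inside words)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def generic_share(text):
    """Return the fraction of tokens found in GENERIC_TERMS (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    generic = sum(counts[term] for term in GENERIC_TERMS)
    return generic / len(tokens)


descriptions = [
    "We are a leading provider of innovative world-class solutions.",
    "We manufacture steel bolts for wind turbines in Ohio.",
]

# A higher share suggests more generic marketing language.
for description in descriptions:
    print(round(generic_share(description), 2), description)
```

In this sketch the first description scores well above the second, because four of its nine tokens are in the hypothetical list and none of the second's are. Any cut-off for "low quality" would have to be chosen by inspecting real descriptions.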

Ethics issues in NLP

  • Bias
  • Fakery

Course structure

  1. What is NLP?
  2. Topic Modelling with SVD & NMF
  3. Topic Modelling & SVD revisited
  4. Sentiment Classification with Naïve Bayes
  5. Sentiment Classification with Naïve Bayes & Logistic Regression, contd.
  6. Derivation of Naïve Bayes & Numerical Stability
  7. Revisiting Naïve Bayes & Regex
  8. Intro to Language Modelling
  9. Transfer Learning
  10. ULMFIT for non-English Languages
  11. Understanding RNNs
  12. Seq2Seq Translation
  13. Word embeddings quantify 100 years of gender & ethnic stereotypes
  14. Text generation algorithms
  15. Implementing a GRU
  16. Algorithmic Bias
  17. Introduction to Transformer
  18. The Transformer for language translation
  19. What you need to know about Disinformation

