Continuing on from yesterday, I went through another 30 NLP interview questions. Below are the questions I got wrong and what I learned / consolidated from them 🙂

What is true about Topic Modelling?

Topic Modelling is an unsupervised learning technique that groups terms into different topics. Latent Dirichlet Allocation (LDA, not to be confused with Linear Discriminant Analysis) is a common model for it. The number of topics is a chosen parameter, and a sensible choice depends heavily on the size of the data. The number of terms per topic is not directly proportional to the size of the data.

In the Latent Dirichlet Allocation model for text classification purposes, what do the alpha and beta hyperparameters represent?

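Alpha represents the density of topics generated within documents. Beta represents the density of terms generated within topics. A higher alpha means each document is a mix of more topics, and a higher beta means each topic is spread over more terms.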
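
To get a feel for what "density" means here, below is a minimal sketch (not a full LDA implementation) that samples from a symmetric Dirichlet distribution using only the standard library. The helper names `sample_dirichlet` and `mean_largest_share` are my own, and the choice of 5 topics and 2000 trials is arbitrary. The same idea applies to both hyperparameters: alpha controls the document-topic mix, beta controls the topic-term mix.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(concentration, size=5, trials=2000, seed=0):
    """Average share taken by the single largest component (hypothetical helper).

    A value close to 1 means one topic dominates (sparse mix);
    a value close to 1/size means the mix is spread out (dense mix).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(sample_dirichlet(concentration, size, rng))
    return total / trials


# Low concentration: each document is dominated by very few topics.
sparse = mean_largest_share(0.1)
# High concentration: each document mixes many topics fairly evenly.
dense = mean_largest_share(10.0)

print(f"alpha=0.1  -> largest topic share on average: {sparse:.2f}")
print(f"alpha=10.0 -> largest topic share on average: {dense:.2f}")
```

With a small concentration the largest share should come out much bigger than with a large one, which is the "few topics per document" versus "many topics per document" behaviour.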

Which of the following techniques is not a part of flexible text matching?

The flexible text matching techniques are Soundex, Metaphone, and Edit Distance. The non-flexible text matching technique is Keyword Hashing.

Soundex matches similar-sounding names by converting each string to its Soundex code. With the conversion, you can fuzzy match two strings by comparing their codes. Metaphone indexes words by their English pronunciation.
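
Here is a small sketch of two of these techniques in plain Python, mostly so I remember how they work. The Soundex version follows the common American Soundex rules as I understand them (keep the first letter, map consonants to digits, collapse repeats, pad to four characters), and the edit distance is the standard Levenshtein distance. The function names are my own.

```python
# Consonant groups and their Soundex digits.
SOUNDEX_CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of a name (empty string if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        # Skip vowels (no digit) and repeats of the previous digit.
        if digit and digit != previous:
            code += digit
        # 'h' and 'w' do not break a run of identical digits; vowels do.
        if ch not in "hw":
            previous = digit
    return (code + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current_row = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current_row.append(min(
                previous_row[j] + 1,         # delete ca
                current_row[j - 1] + 1,      # insert cb
                previous_row[j - 1] + cost,  # substitute (or match)
            ))
        previous_row = current_row
    return previous_row[-1]


print(soundex("Robert"), soundex("Rupert"))  # same code, so they fuzzy match
print(edit_distance("kitten", "sitting"))
```

Two names fuzzy match under Soundex when their codes are equal, while edit distance gives a number you can threshold.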

Polysemy is defined as the coexistence of multiple meanings for a word or phrase in a text object. Which model is likely the best choice to correct this problem?

A Convolutional Neural Network (CNN), since it considers both the left and right contexts of a word, and those contexts heavily determine the meaning of the central word.

How can we tackle the problem of word sense disambiguation?

We can compare the dictionary definitions of an ambiguous word with its surrounding context words and pick the sense whose definition overlaps the most. This is also known as the Lesk algorithm.
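
A rough sketch of the simplified version of this idea is below. The glosses in `TOY_GLOSSES` are made up by me for illustration (a real system would pull them from a dictionary such as WordNet), and the stopword list and function names are also my own assumptions.

```python
import re

# Tiny hand-picked stopword list, for illustration only.
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "and", "or",
    "that", "is", "i", "my", "for", "by", "with",
}

# Hypothetical glosses for the ambiguous word "bank".
TOY_GLOSSES = {
    "bank": {
        "financial": "an institution that accepts deposits of money and lends money to customers",
        "river": "the sloping land beside a river or lake",
    },
}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence, glosses=TOY_GLOSSES):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses[word].items():
        overlap = len(context & content_words(gloss))
        # On a tie, the sense listed first is kept.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I went to the bank to deposit money"))
print(simplified_lesk("bank", "We sat on the bank of the river"))
```

The overlap here is exact word matching, so "deposit" and "deposits" do not count as a match. Stemming or lemmatising first would likely help.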

What is the major difference between CRF (Conditional Random Field) and HMM (Hidden Markov Model)?

HMM is a generative model whereas CRF is a discriminative model. A generative model learns the joint probability distribution p(x, y) and uses Bayes' theorem to predict the conditional probability. A discriminative model learns the conditional probability distribution p(y | x) directly.
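
To make the two definitions concrete, here is a toy sketch with made-up (word, tag) counts. It is not a sequence model, so it is neither a real HMM nor a real CRF. It only contrasts the two routes to p(y | x): going through the joint distribution with Bayes' theorem, or estimating the conditional directly. The data and function names are my own.

```python
from collections import Counter

# Made-up labelled examples: (word x, tag y).
PAIRS = [
    ("book", "NOUN"), ("book", "NOUN"), ("book", "VERB"),
    ("flight", "NOUN"),
    ("run", "VERB"), ("run", "VERB"), ("run", "NOUN"),
]


def generative_posterior(pairs, x):
    """Model the joint p(x, y) = p(y) * p(x | y), then apply Bayes' theorem.

    Assumes x was seen at least once in the data.
    """
    n = len(pairs)
    label_counts = Counter(y for _, y in pairs)
    pair_counts = Counter(pairs)
    joint = {
        y: (label_counts[y] / n) * (pair_counts[(x, y)] / label_counts[y])
        for y in label_counts
    }
    evidence = sum(joint.values())  # p(x)
    return {y: p / evidence for y, p in joint.items()}


def discriminative_posterior(pairs, x):
    """Estimate p(y | x) directly from the examples whose input is x.

    Assumes x was seen at least once in the data.
    """
    all_labels = {y for _, y in pairs}
    labels_for_x = Counter(y for xi, y in pairs if xi == x)
    total = sum(labels_for_x.values())
    return {y: labels_for_x[y] / total for y in all_labels}


gen = generative_posterior(PAIRS, "book")
disc = discriminative_posterior(PAIRS, "book")
print(gen)
print(disc)
```

With plain count tables both routes give the same answer. The difference is in what gets modelled: the generative route also has to describe how the inputs x are distributed, while the discriminative route only cares about the labels given x.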


