How can we train word embeddings to measure biases about groups in society?

  1. Create two lists of words. The first list contains words that represent the groups of interest (for example, common last names within a racial group). The second list contains words that represent characteristics of people, such as actions, occupations, and emotions.

  2. Calculate the distance (e.g., cosine distance) between the embedding vectors of the two lists of words.

  3. Compare the distances across groups to measure the biases.

  4. Compare the measured bias level with outside data to validate it.

One example of social interest could be occupations versus gender. Our first list of words would group words representing men and words representing women, and our second list would group a set of occupations. The data would be used to train Word2Vec, and we would then measure distances in the resulting embedding space.
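The steps above can be sketched in code. This is a minimal illustration using hand-made 3-dimensional vectors in place of trained Word2Vec embeddings; all the words and vector values here are hypothetical toy data, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt

# Toy vectors standing in for trained Word2Vec embeddings.
# All values are hypothetical and chosen for illustration only.
vectors = {
    "he":       [0.9, 0.1, 0.0],
    "she":      [0.1, 0.9, 0.0],
    "engineer": [0.8, 0.2, 0.1],
    "nurse":    [0.2, 0.8, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sqrt(sum(a * a for a in w))
    return dot / (norm(u) * norm(v))

def bias(occupation, group_a, group_b):
    """Mean similarity of `occupation` to group A minus to group B.

    Positive: the occupation sits closer to group A in the
    embedding space; negative: closer to group B.
    """
    mean_sim = lambda group: sum(
        cosine(vectors[occupation], vectors[w]) for w in group
    ) / len(group)
    return mean_sim(group_a) - mean_sim(group_b)

for job in ["engineer", "nurse"]:
    print(job, round(bias(job, ["he"], ["she"]), 3))
```

With real embeddings, the word lists would each contain many words (names, pronouns, occupation titles), and the resulting bias scores could then be compared against outside data such as occupation statistics, as in step 4.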

What are some of the limitations and challenges of measuring stereotypes using word embeddings?

  1. The data we use to train our embeddings might not accurately capture the characteristics of society as a whole

  2. Embeddings are black-box algorithms

  3. We have to predefine the definition of each group

  4. Words might not accurately capture each category. For example, the method above was not able to distinguish between white and black Americans

  5. It is unclear how to debias the embeddings once a bias has been found

Why is beam search better than greedy search in natural language generation?

Greedy search generates text by selecting the highest-probability word at each time step, which can lead to incorrect or less fluent sentences. Beam search, on the other hand, keeps the K highest-probability partial sequences at each step: each candidate is scored by the cumulative probability of the previously chosen words combined with the current word, and the K best candidates are carried forward to the next step.
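The difference can be shown on a tiny example. The next-token probability table below is hypothetical toy data; greedy search commits to a locally best first word and misses the sequence with the highest overall probability, which beam search with K=2 recovers.

```python
from math import log

# Toy next-token model: P(next token | previous token).
# All tokens and probabilities are hypothetical.
probs = {
    "<s>":  {"the": 0.5, "a": 0.5},
    "the":  {"dog": 0.4, "nice": 0.6},
    "a":    {"dog": 0.9, "nice": 0.1},
    "dog":  {"</s>": 1.0},
    "nice": {"dog": 1.0},
}

def greedy(steps=4):
    """Pick the single highest-probability word at each step."""
    seq, logp = ["<s>"], 0.0
    for _ in range(steps):
        if seq[-1] == "</s>":
            break
        word, p = max(probs[seq[-1]].items(), key=lambda kv: kv[1])
        seq.append(word)
        logp += log(p)
    return seq, logp

def beam(k=2, steps=4):
    """Keep the k best partial sequences, scored by cumulative log-probability."""
    beams = [(["<s>"], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            if seq[-1] == "</s>":          # finished sequences are kept as-is
                candidates.append((seq, logp))
                continue
            for word, p in probs[seq[-1]].items():
                candidates.append((seq + [word], logp + log(p)))
        beams = sorted(candidates, key=lambda sl: sl[1], reverse=True)[:k]
    return beams[0]

print("greedy:", greedy())   # follows "the" -> "nice", total prob 0.30
print("beam:  ", beam())     # finds "a" -> "dog" -> "</s>", total prob 0.45
```

Here greedy search ties on the first word, commits to "the", and ends with a sequence of probability 0.5 × 0.6 × 1.0 = 0.30, while beam search keeps both openings alive and returns the 0.45-probability sequence.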

What are the limitations of beam search compared to human-generated text?

Studies have shown that beam search tends to generate words and phrases with the highest probability of occurrence, whereas human-generated text does not always follow the highest probability. Some combinations of words used by humans are rare and low-probability, and beam search would fail to produce these phrases.


