What are the three types of bias?

1. Statistical bias. The difference between the expected value of an estimate and the true value

2. Unjust bias. Disproportionate preference for or against a group

3. Unconscious bias. Biases that we don’t realise we have
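
Statistical bias is the one type here that can be computed directly. The sketch below is only an illustration under assumed inputs: the fair-die population, the sample size of 2, and the choice of a variance estimator are not from the course material. It enumerates every possible sample and compares the estimator's expected value with the true value.

```python
from itertools import product
from statistics import mean


def variance(values, ddof=0):
    """Variance of `values`, dividing by (n - ddof)."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values) / (len(values) - ddof)


def estimator_bias(population, sample_size, ddof):
    """Expected value of the variance estimator minus the true variance.

    The expectation is exact: it averages over every equally likely
    sample drawn with replacement from `population`.
    """
    true_value = variance(population, ddof=0)
    estimates = [
        variance(sample, ddof=ddof)
        for sample in product(population, repeat=sample_size)
    ]
    return mean(estimates) - true_value


# Hypothetical population: the six faces of a fair die.
die = [1, 2, 3, 4, 5, 6]

bias_n = estimator_bias(die, sample_size=2, ddof=0)          # divide by n
bias_n_minus_1 = estimator_bias(die, sample_size=2, ddof=1)  # divide by n - 1

print(f"bias when dividing by n:     {bias_n:.4f}")
print(f"bias when dividing by n - 1: {bias_n_minus_1:.4f}")
```

Dividing by n systematically underestimates the variance (a negative bias), while dividing by n - 1 gives a bias of zero up to floating-point rounding.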

What is representation bias and evaluation bias?

Representation bias is when the dataset you use doesn’t accurately represent the problem you are modelling, whereas evaluation bias is when everyone uses the same benchmark dataset, so the errors that exist in that dataset don’t get highlighted.

What is measurement bias?

Measurement bias occurs when the data you use doesn’t directly measure the thing you want to predict. For example, electronic health record (EHR) data were used to predict the probability of strokes. From the EHR data, we were able to identify several factors that could help us better predict the probability of strokes. The factors are as follows:

1. Prior strokes

2. Cardiovascular disease

3. Accidental injury

4. Benign breast lump

5. Colonoscopy

Factors 1 and 2 make sense, but the rest are bizarre. The model picked factors 3 to 5 because the data only covers people who go to hospital when they have symptoms, so we aren’t necessarily measuring strokes. We are partly measuring who seeks medical care.

What is aggregation bias?

Aggregation bias arises when you try to use one model to capture all the different factors of a particular problem. For example, diabetes patients have different complications across ethnicities, so if you train a single model on all diabetes patients regardless of ethnicity, it might suffer from aggregation bias.
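
As a rough sketch of why a single model can fail here, the snippet below fits one straight line to two groups pooled together and compares it with one line per group. Everything in it is a made-up assumption for illustration: the groups "A" and "B", the feature values, and the linear relationships are not real clinical data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, model):
    """Average squared gap between the line's predictions and ys."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the same feature relates to the outcome
# differently in each group (slope 1 in group A, slope 3 in group B).
xs = [0, 1, 2, 3, 4]
ys_a = [1.0 * x + 1 for x in xs]
ys_b = [3.0 * x + 1 for x in xs]

# One model for everyone versus one model per group.
pooled_model = fit_line(xs + xs, ys_a + ys_b)
model_a = fit_line(xs, ys_a)
model_b = fit_line(xs, ys_b)

pooled_error_a = mean_squared_error(xs, ys_a, pooled_model)
pooled_error_b = mean_squared_error(xs, ys_b, pooled_model)
per_group_error_a = mean_squared_error(xs, ys_a, model_a)
per_group_error_b = mean_squared_error(xs, ys_b, model_b)

print("pooled model error    A:", pooled_error_a, " B:", pooled_error_b)
print("per-group model error A:", per_group_error_a, " B:", per_group_error_b)
```

The pooled line settles on a slope between the two groups and so fits neither of them, while the per-group lines fit this toy data exactly. Real data would be noisier, but the same averaging effect applies.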

Humans are biased, so why does algorithmic bias matter?

1. Algorithms and humans are used differently

2. ML can amplify bias

3. ML can create feedback loops

4. Technology is power, and with that comes responsibility

[Point 1] Algorithms are often assumed to be objective and error-free (which is not true), and they are often used at scale, meaning that errors caused by an algorithm can propagate en masse.

[Point 3] Feedback loops occur when your model controls the next round of data you get. Take a crime rate detector as an example. If the detector predicts that more crime will happen in a particular area, more police are deployed to that area. More police in an area means more crime gets recorded there, which in turn trains the model to predict a high “crime” rate in that area.
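
The feedback loop in Point 3 can be sketched as a tiny deterministic simulation. This is a toy model under stated assumptions, not how any real system works: the two areas, the identical true crime rates, the rule that crime is only recorded in proportion to patrol presence, and the `shift` step size are all hypothetical.

```python
def simulate_patrols(rounds, true_rate=(0.1, 0.1), start_share_a=0.55, shift=0.05):
    """Return the share of patrols sent to area A after each round.

    Assumptions (hypothetical): recorded crime in an area equals its
    true rate times the share of patrols there, and each round the
    "model" moves `shift` of the patrols toward whichever area has
    more recorded crime so far.
    """
    share_a = start_share_a
    recorded = [0.0, 0.0]
    history = [share_a]
    for _ in range(rounds):
        # What gets recorded depends on where police are looking.
        recorded[0] += true_rate[0] * share_a
        recorded[1] += true_rate[1] * (1.0 - share_a)
        # The prediction follows the recorded data, and patrols follow the prediction.
        if recorded[0] > recorded[1]:
            share_a = min(1.0, share_a + shift)
        elif recorded[0] < recorded[1]:
            share_a = max(0.0, share_a - shift)
        history.append(share_a)
    return history


# Both areas have the same true crime rate, but A starts with slightly more patrols.
skewed = simulate_patrols(rounds=12)
# With an even starting split, nothing pushes the patrols either way.
balanced = simulate_patrols(rounds=12, start_share_a=0.5)

print("skewed start:  ", [round(s, 2) for s in skewed])
print("balanced start:", [round(s, 2) for s in balanced])
```

Although the two areas are equally risky by construction, the small initial imbalance keeps growing until every patrol ends up in area A, because the model only ever sees the data its own predictions produced.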