What are the potential misuse applications of GPT-3 / better language models?
Language models are becoming better at generating synthetic text that humans find difficult to distinguish from human-written text. This could enable many misuse applications, such as misinformation, spam, phishing, fraudulent academic essays, and social-engineering pretexting. The risk of these misuse applications increases as language models become more capable.
What did we find through our threat actor analysis?
Threat actors are categorised by skill and resource levels, ranging from low- and mid-skill actors who may be able to assemble a malicious product, up to advanced persistent threats (APTs): highly skilled and well-resourced groups. We found that low- and mid-skill actors have so far shown little successful experimentation with misuse applications built on GPT-3-like models. For high-skill actors, we found no evidence of significant gains from using language models at present, in part because targeting and controlling the content of language models is still at an early stage.
How does GPT-3 encourage more misuse applications?
Each threat actor group has a different set of tactics, techniques, and procedures (TTPs), and these TTPs are heavily influenced by economic factors such as cost, scalability, and ease of deployment. This is why phishing is so popular: it is a low-cost, low-effort, high-yield method. Language models are highly likely to lower the cost of deploying such TTPs. They are also easy to use: a strong language model reduces the need for human intervention, which improves both the usability and the scalability of TTPs.
What are some of the issues of bias, fairness, and representation found in GPT-3 models?
In terms of gender bias, we found that occupations tend to have a higher probability of being followed by a male gender identifier. The occupations more often followed by female identifiers include midwife, nurse, receptionist, and housekeeper. We also measured the models' ability to correctly assign a pronoun to either the occupation or the participant in a sentence. Occupation and participant words often carry societal biases, such as the assumption that most occupants are male, and we found that our language models learnt some of these biases, associating female pronouns with participant positions with higher probability. However, GPT-3 was the only model that achieved higher accuracy when assigning female identifiers to occupant sentences.
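As a rough illustration of this kind of co-occurrence probe, the sketch below counts which gender identifier follows each occupation word across a set of completions. The occupations, identifier lists, and completions are all invented placeholders for this sketch, not real GPT-3 output or the paper's actual code:

```python
from collections import Counter, defaultdict

MALE = {"man", "male", "he"}
FEMALE = {"woman", "female", "she"}

# Hypothetical (occupation, identifier) pairs, standing in for real
# model completions to prompts like "The {occupation} was a ...".
completions = [
    ("detective", "man"), ("detective", "man"), ("detective", "woman"),
    ("nurse", "woman"), ("nurse", "woman"), ("nurse", "man"),
    ("midwife", "woman"), ("midwife", "woman"),
]

# Count male vs. female identifiers per occupation.
counts = defaultdict(Counter)
for occupation, identifier in completions:
    if identifier in MALE:
        counts[occupation]["male"] += 1
    elif identifier in FEMALE:
        counts[occupation]["female"] += 1

def female_lean(occupation):
    """P(female identifier) minus P(male identifier) for an occupation."""
    c = counts[occupation]
    total = c["male"] + c["female"]
    return (c["female"] - c["male"]) / total

print(female_lean("detective"))  # negative => male-leaning
print(female_lean("nurse"))      # positive => female-leaning
```

With real data, the completions would come from sampling the model many times per occupation prompt rather than from a hard-coded list.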
In terms of racial bias, we looked at which adjectives are most strongly associated with different races through certain prompts and explored how race impacted sentiment. Across models, we found that the 'Asian' race had consistently high sentiment whereas 'Black' had consistently low sentiment; the gap slowly narrows as models become bigger. We also explored religion by examining which words co-occurred with different religious terms. The results are displayed below. We found that some stereotypes were captured for these religions: for example, 'Islam' co-occurred with words such as 'violent', 'terrorism', and 'terrorist' at a higher rate than other religions did.
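The sentiment side of this analysis can be sketched as a simple lexicon-based score averaged over model completions. Both the sentiment lexicon and the completions below are invented placeholders chosen to mirror the reported skew; they are not real model output, and the paper's actual lexicon and scoring differ:

```python
# Tiny hand-written sentiment lexicon: word -> score in [-1, 1].
SENTIMENT = {"brilliant": 1.0, "kind": 0.5, "calm": 0.5,
             "violent": -1.0, "lazy": -0.5}

# Invented completions keyed by the race term used in the prompt,
# constructed to illustrate the skew described in the text above.
completions = {
    "Asian": ["a brilliant and kind student", "a calm neighbour"],
    "Black": ["a violent suspect", "a lazy worker"],
}

def avg_sentiment(race):
    """Mean lexicon score over all scored words in a race's completions."""
    scores = []
    for text in completions[race]:
        for word in text.split():
            if word in SENTIMENT:
                scores.append(SENTIMENT[word])
    return sum(scores) / len(scores)

print(avg_sentiment("Asian"))  # positive in this toy data
print(avg_sentiment("Black"))  # negative in this toy data
```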
What are the issues of energy efficiency?
Training a GPT-3 model with 175 billion parameters requires significant compute, which is energy-intensive. We should evaluate the efficiency of large-scale pre-training not only in terms of the resources consumed during training but also in terms of how those resources are amortised over the lifetime of the model: GPT-3 is trained only once or twice, then fine-tuned for many different tasks. In addition, model distillation can further reduce the cost of deploying such large models.
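A back-of-envelope amortisation of that one-off training cost over lifetime usage might look like this. Every number here is a purely illustrative assumption, not a measured figure for GPT-3:

```python
# Assumed one-off pre-training energy cost, in kWh (illustrative).
train_energy_kwh = 1_000_000
# Assumed inference energy per generated page, in kWh (illustrative).
inference_energy_kwh = 0.004
# Assumed number of pages generated over the model's lifetime.
pages_served = 500_000_000

# Training cost spread across every page the model ever generates.
amortised = train_energy_kwh / pages_served
total_per_page = amortised + inference_energy_kwh

print(f"{amortised:.6f} kWh of training energy per page")
print(f"{total_per_page:.6f} kWh total per page")
```

The point of the exercise: with heavy lifetime usage, the per-page share of the training cost can fall to the same order as, or below, the per-page inference cost.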