The researchers asked language models where they stand on various topics, such as feminism and democracy. They used the answers to plot them on a graph known as a political compass, and then tested whether retraining models on even more politically biased training data changed their behavior and ability to detect hate speech and misinformation (it did). The research is described in a peer-reviewed paper that won the best paper award at the Association for Computational Linguistics conference last month.
As AI language models are rolled out into products and services used by millions of people, understanding their underlying political assumptions and biases could not be more important. That’s because they have the potential to cause real harm. A chatbot offering health-care advice might refuse to offer advice on abortion or contraception, or a customer service bot might start spewing offensive nonsense.
Since the success of ChatGPT, OpenAI has faced criticism from right-wing commentators who claim the chatbot reflects a more liberal worldview. However, the company insists that it’s working to address those concerns, and in a blog post, it says it instructs its human reviewers, who help fine-tune AI the AI model, not to favor any political group. “Biases that nevertheless may emerge from the process described above are bugs, not features,” the post says.
Chan Park, a PhD researcher at Carnegie Mellon University who was part of the study team, disagrees. “We believe no language model can be entirely free from political biases,” she says.
Bias creeps in at every stage
To reverse-engineer how AI language models pick up political biases, the researchers examined three stages of a model’s development.
In the first step, they asked 14 language models to agree or disagree with 62 politically sensitive statements. This helped them identify the models’ underlying political leanings and plot them on a political compass. To the team’s surprise, they found that AI models have distinctly different political tendencies, Park says.
The researchers found that BERT models, AI language models developed by Google, were more socially conservative than OpenAI’s GPT models. Unlike GPT models, which predict the next word in a sentence, BERT models predict parts of a sentence using the surrounding information within a piece of text. Their social conservatism might arise because older BERT models were trained on books, which tended to be more conservative, while the newer GPT models are trained on more liberal internet texts, the researchers speculate in their paper.
AI models also change over time as tech companies update their data sets and training methods. GPT-2, for example, expressed support for “taxing the rich,” while OpenAI’s newer GPT-3 model did not.