12 Comments

By the way, one header is duplicated incorrectly: "LLMs’ sentiment towards mainstream political ideologies" appears twice, once for itself and once for the extreme ideologies. Looks like a word swap.


Thank you for letting me know. I fixed it.


A few things I noticed:

1. The European leaders results show that, for a few countries, the models do have very strong opinions against the right; e.g. they clearly hate Orban. Romania is an interesting counter-example, presumably because there left wing = communists and the English-language corpus is different.

2. Did you try prompting the base models differently? You say they can't follow instructions well, but that doesn't seem necessary for this exercise: just prompt them to complete a sentence like "In conclusion, political leaders should " and see what happens (see the sketch after this list).

3. It'd be interesting to translate the questions and see if the bias holds as much in non-English languages (my guess: yes).
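For point 2, here's a minimal sketch of what that sentence-completion probe could look like, assuming a Hugging Face causal LM. The model name ("gpt2"), the prompt stem and the sampling settings are illustrative placeholders, not anything taken from the original analysis.

```python
# Sketch: sample free-form continuations of a neutral sentence stem from a
# base (non-instruction-tuned) model. Model name and settings are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "In conclusion, political leaders should"
inputs = tokenizer(prompt, return_tensors="pt")

# Draw several samples so a single lucky completion doesn't dominate.
outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
    print("---")
```

Sampling many continuations per stem and then coding them (by hand or with a classifier) would give a rough sense of where the raw pre-training distribution points, without needing any instruction-following ability.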

As to what causes it, my guess is that beyond dataset bias (e.g. over-fitting on Wikipedia, the New York Times and academic papers), and beyond the RLHF training that passes on the biases of the creators/annotators, the primary problem is simply that left-wing people tell governments what to do a lot more than right-wing people do. The latter is obviously the natural home of libertarianism (in English), and so it's inherently going to be the case that asking for policy recommendations results in left-wing output. This becomes especially obvious when you try to write prompts for the foundation models. There are LOTS of phrases that could start off a set of policy recommendations, but study them and think about where you might find them on the internet. Nearly always it's going to be NGO white papers, academic output, news op-eds and so on. There just aren't many capitalist libertarians out there writing text that begins with, "We recommend that EU leaders do X, Y and Z".


Regarding your question 2: no, I didn't.


Does anybody know to what extent ChatGPT's bias on sensitive/political topics is caused by the bias of its raw data sources, and to what extent the bias is deliberately inserted by its programmers?


This analysis gives some data on that exact question. The base models reflect the raw data; the conversational models are what you get after fine-tuning towards conversational speech and "brand safety" training.


Thank you for doing this analysis. It reminded me of a recent article I wrote on algorithmic biases: https://substack.com/home/post/p-149423875?r=4b81wn&utm_campaign=post&utm_medium=web


A very revealing article. LLMs are becoming more and more ingrained in production systems and personal chatbots, so this political bias is not a minor issue. The good news is that the amplification of the bias during the supervised fine-tuning + RLHF training process is a clear indication that the bias comes from the datasets / annotators / guidelines involved in that process, and therefore might well be reversible. Your experiments with RightWingGPT and LeftWingGPT also point in that direction.

Question: have you ever experimented with non-Western LLMs? I wonder if high-performing LLMs like Falcon, Qwen or Yi would still lie on the left of the political spectrum. Qwen in particular has become very popular lately, and I suspect it might contain its own particular set of values, not necessarily aligned with Western values.


Yi-34B-Chat is still left-leaning; Qwen-14B-Chat is more moderate. See: https://twitter.com/DavidRozado/status/1753347666480968150/photo/1

Interestingly, both ranked as the wokest AIs on the IDRLabs Woke Test:

https://davidrozado.substack.com/p/which-is-the-wokest-ai


That is surprising; I would have expected a very different political leaning, considering China's interests and that most of the political bias seems to be acquired at the RLHF stage. I wonder whether this is intentional, to "please" Western users, or accidental, from reuse of existing English RLHF datasets.


Yes, it is strange. That's just one test, though; in the others they seem pretty average. And yes, they are probably consuming a lot of the same data sets as other model developers. I've also read Igor Babuschkin from Grok complaining about ChatGPT outputs having polluted a lot of data sources.
