Discussion about this post

Emil O. W. Kirkegaard

By the way, one header is duplicated incorrectly: "LLMs’ sentiment towards mainstream political ideologies" appears twice, once for itself and once for the extreme ideologies. Looks like the words got swapped.

Alistair Penbroke

A few things I noticed:

1. The European leaders results show that for a few countries the models do have very strong opinions against the right, e.g. they clearly hate Orban. Romania is an interesting counter-example, presumably because there, left wing = communists, and the English-language corpora are different.

2. Did you try prompting the base models differently? You say they can't follow instructions well, but that doesn't seem necessary for this exercise: just prompt them to complete a sentence like "In conclusion, political leaders should " and see what happens (see the sketch after this list).

3. It'd be interesting to translate the questions and see if the bias holds as much in non-English languages (my guess: yes).
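For what it's worth, the completion-style probe in point 2 needs no instruction-following at all. Here's a minimal sketch assuming a Hugging Face base checkpoint; the model name and sampling settings are placeholders, not anything from the post:

```python
# Probe a base (non-instruction-tuned) model by sentence completion.
# "gpt2" is a stand-in; any base checkpoint works.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "In conclusion, political leaders should "
inputs = tokenizer(prompt, return_tensors="pt")

# Sample several continuations so a single noisy draw doesn't dominate.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,  # gpt2 has no pad token
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```

You'd then score the sampled continuations for political lean, the same way the post scores the instruct models.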

As to what causes it, my guess is that beyond dataset bias (e.g. overfitting on Wikipedia, the New York Times, and academic papers), and beyond RLHF training passing on the biases of the creators/annotators, the primary problem is simply that left wing people tell governments what to do a lot more than right wing people do. The right is the natural home of libertarianism (in English), so asking for policy recommendations is inherently going to produce left wing output.

This becomes especially obvious when you try to write prompts for the foundation models. There are LOTS of phrases that could start off a set of policy recommendations, but study them and think about where on the internet you might find them. Nearly always it's NGO white papers, academic output, news op-eds and so on. There just aren't many capitalist libertarians out there writing text that begins with, "We recommend that EU leaders do X, Y and Z".
