The Political Preferences of LLMs

Substantial political homogeneity in Large Language Model (LLM) responses to questions with political connotations

Feb 02, 2024

This is a summary of an analysis of the political preferences embedded in Large Language Models, or LLMs (preprint). I administered 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs, both closed and open source. These include OpenAI’s GPT-3.5 and GPT-4, Google’s Gemini, Anthropic’s Claude and Twitter’s Grok, as well as open source models such as the Llama 2 and Mistral series.
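
As a rough illustration of what administering a forced-choice test item to a model could look like, here is a minimal Python sketch. It is not the code used in the preprint. The prompt wording, the substring-based answer matching, the retry count and the `ask_model` callable (a stand-in for whatever API serves the model) are all assumptions made for this example.

```python
from typing import Callable, Dict, Optional, Sequence


def build_prompt(statement: str, options: Sequence[str]) -> str:
    # Hypothetical prompt wording; the preprint may phrase items differently.
    choices = ", ".join(options)
    return f"{statement}\nAnswer with exactly one of: {choices}."


def match_option(response: str, options: Sequence[str]) -> Optional[str]:
    """Map a free-text response onto one allowed answer, or None if no match.

    Longer options are checked first so that "Strongly disagree" is not
    mistaken for "Disagree" or "Agree". This is crude substring matching
    and will misread hedged replies such as "neither agree nor disagree".
    """
    lowered = response.lower()
    for option in sorted(options, key=len, reverse=True):
        if option.lower() in lowered:
            return option
    return None


def administer(
    statements: Sequence[str],
    options: Sequence[str],
    ask_model: Callable[[str], str],  # assumed interface: prompt in, text out
    max_retries: int = 3,             # assumed value, not from the preprint
) -> Dict[str, Optional[str]]:
    """Ask every statement, retrying when the reply is not a valid option."""
    answers: Dict[str, Optional[str]] = {}
    for statement in statements:
        answers[statement] = None
        for _ in range(max_retries):
            reply = ask_model(build_prompt(statement, options))
            choice = match_option(reply, options)
            if choice is not None:
                answers[statement] = choice
                break
    return answers
```

Items that remain `None` after all retries correspond to responses that never resolve to a valid option, which is the kind of incoherence that makes base model results hard to interpret.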

The results indicate that, when probed with questions or statements with political connotations, most conversational LLMs tend to generate responses that most political test instruments diagnose as manifesting preferences for left-of-center viewpoints.

Similar results are also observable on four additional political orientation tests whose results represent the degree of agreement of the test-taker with political parties or ideologies.

One interesting outlier among the test instruments was the Nolan Test, which consistently diagnosed most LLMs’ answers to its questions as manifesting politically moderate viewpoints. Additional test instruments still show a consistent diagnosis of left-leaning responses for most LLMs.

Interestingly, the preference for left-leaning responses is not apparent in the base (i.e. foundation) models upon which LLMs optimized for conversation with humans are built. The following is an illustration by Andrej Karpathy of the common recipe used to create conversational LLMs. Note that base models are LLMs after the first stage (i.e. pretraining) of the common conversational LLM training pipeline. Note also that the last two steps (Reward Modeling and Reinforcement Learning) are not used by all conversational LLMs tested in this work.

State of GPT by Andrej Karpathy. Microsoft BUILD May 23, 2023

When political orientation tests are administered to base models, that is, models that have only gone through the pretraining phase of an LLM assistant training pipeline, their answers to questions with political connotations tend to be, on average, politically neutral. However, base models’ suboptimal performance at coherently answering questions warrants caution when interpreting their classification by political orientation tests.

I use two different families of models, the GPT and the Llama 2 families, with base models representative of different parameter sizes within each family. For comparison purposes, I also provide a reference data point whose values were generated by randomly choosing answers from the set of possible answers to each political test question/statement. Note that in the following figures the blue circle represents test results when choosing random answers for each test question/statement.

Similar results are obtained for all the other test instruments used (see supplementary material of the preprint). But again, base models’ frequent incoherent answers to test questions make this set of results suggestive but ultimately inconclusive (details in the preprint).
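
The random-answer reference point can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the preprint's code: the four-option answer scale, the per-item `directions` keying and the normalization to [-1, 1] are hypothetical, since each real test instrument defines its own options and scoring.

```python
import random
from statistics import mean

# Hypothetical answer scale; real instruments define their own options and scores.
LIKERT = {"Strongly disagree": -2, "Disagree": -1, "Agree": 1, "Strongly agree": 2}


def random_baseline_score(directions, rng):
    """Score one simulated test-taker that answers every item at random.

    directions holds +1 or -1 per item: +1 if agreeing moves the score toward
    one pole of the axis, -1 if it moves it toward the other (assumed keying).
    """
    total = 0
    for direction in directions:
        answer = rng.choice(list(LIKERT))
        total += direction * LIKERT[answer]
    # Divide by the maximum attainable magnitude so the score lies in [-1, 1].
    return total / (2 * len(directions))


def baseline_distribution(directions, n_runs=1000, seed=0):
    """Repeat the random test-taker n_runs times with a reproducible seed."""
    rng = random.Random(seed)
    return [random_baseline_score(directions, rng) for _ in range(n_runs)]


if __name__ == "__main__":
    directions = [1, -1] * 15  # a made-up 30-item test
    scores = baseline_distribution(directions)
    print(f"mean random-answer score: {mean(scores):.3f}")
```

With a symmetric answer scale like this one, random answering averages out near the center of the axis, which is why a model whose results sit close to the random reference point is hard to distinguish from one that is not answering meaningfully.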

I also show in the paper that LLMs are easily steerable into target locations of the political spectrum via supervised fine-tuning (SFT) requiring only modest compute and customized data, suggesting a critical role for SFT in imprinting political preferences onto LLMs.

Through fine-tuning I create three illustrative customized models, dubbed LeftWingGPT, RightWingGPT and DepolarizingGPT to indicate the location in the political spectrum targeted by each model. Each model was fine-tuned with ideologically aligned content (details here). A user interface to interact with the LeftWingGPT, RightWingGPT and DepolarizingGPT models is available here.

Unfortunately, my analysis cannot conclusively determine whether the political preferences observed in most conversational LLMs stem from the pretraining or fine-tuning phases of their development. The apparent political neutrality of base models’ responses to political questions suggests that pretraining on a large corpus of Internet documents might not play a significant role in imparting political preferences to LLMs. However, because base LLMs frequently respond incoherently to political questions, and because the models were artificially constrained to select one of a predetermined set of multiple-choice answers, I cannot exclude the possibility that the left-leaning preferences observed in most conversational LLMs are a byproduct of the pretraining corpora that emerges only after fine-tuning, even if the fine-tuning process itself is politically neutral. While this hypothesis is conceivable, the evidence presented in this work can neither conclusively support nor reject it.

The results of this study should not be interpreted as evidence that organizations that create LLMs deliberately use the fine-tuning or reinforcement learning phases of conversational LLM training to inject political preferences into LLMs. If political biases are being introduced into LLMs post-pretraining, the consistent political leanings observed in my analysis of conversational LLMs may be an unintentional byproduct of annotators' instructions or dominant cultural norms and behaviors. Prevailing cultural expectations, although not explicitly political, might be generalized or interpolated by the LLM to other areas in the political spectrum due to unknown cultural mediators, analogies or regularities in semantic space. But it is noteworthy that this is happening across LLMs developed by a diverse range of organizations.

Take-home message

When probed with questions or statements with political connotations, most conversational LLMs tend to generate responses that most political test instruments diagnose as manifesting preferences for left-of-center viewpoints. This does not appear to be the case for the base (i.e. foundation) models upon which LLMs optimized for conversation with humans are built. However, the weak performance of the base models at coherently answering the tests’ questions makes this subset of results inconclusive. I also showed that LLMs are easily steerable into target locations of the political spectrum via SFT requiring only modest compute and custom data, illustrating the ability of SFT to imprint political preferences onto LLMs. As LLMs have started to displace more traditional information sources such as search engines or Wikipedia, the societal implications of political biases embedded in LLMs are substantial.

Preprint on arXiv


what do you think of Gab's new AI assortment, where they offer multiple "characters" from which to choose?


Joe Wheeldon

Extremely interesting, especially the point that it appears to be happening after the base training step. Any further insight on why the Nolan test is an outlier?


Rozado’s Visual Analytics
