The Political Preferences of LLMs
Substantial political homogeneity in Large Language Model (LLM) responses to questions with political connotations
This is a summary of an analysis of the political preferences embedded in Large Language Models, or LLMs (preprint). Namely, we administered 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs, both closed and open source, including OpenAI’s GPT-3.5 and GPT-4, Google’s Gemini, Anthropic’s Claude and Twitter’s Grok, as well as open source models such as the Llama 2 and Mistral series.
The results indicate that, when probed with questions/statements with political connotations, most conversational LLMs tend to generate responses that most political test instruments diagnose as manifesting preferences for left-of-center viewpoints.
Similar results are also observable on four additional political orientation tests whose results represent the degree of agreement of the test-taker with political parties or ideologies.
An interesting outlier among the test instruments was the Nolan Test, which consistently diagnosed most LLMs’ answers to its questions as manifesting politically moderate viewpoints. The additional test instruments still show a consistent diagnosis of left-leaning responses for most LLMs.
Interestingly, the preference for left-leaning responses is not apparent in the base (i.e. foundation) models upon which LLMs optimized for conversation with humans are built. The following is an illustration by Andrej Karpathy of the common recipe used to create conversational LLMs. Note that base models are LLMs after the first stage (i.e. pretraining) of the common conversational LLM training pipeline. Note also that the last two steps (Reward Modeling and Reinforcement Learning) are not used by all conversational LLMs tested in this work.
When political orientation tests are administered to base models, i.e. models that have only gone through the pretraining phase of an LLM assistant training pipeline, their answers to questions with political connotations tend to be, on average, politically neutral. However, base models’ suboptimal performance at coherently answering questions warrants caution when interpreting their classification by political orientation tests (details in preprint below). We use 2 different families of models, the GPT and the Llama 2 families, with base models representative of different parameter sizes within each family. For comparison purposes, we also provide a reference data point whose values were generated by randomly choosing answers from the set of possible answers to each political test question/statement. Note that in the following figures the blue circle represents test results when choosing random answers for each test question/statement. Similar results are obtained for all the other test instruments used (see supplementary material of preprint).
That is, the likely unbalanced representation of political viewpoints in the corpora used to pretrain base LLMs does not appear to immediately give rise to consistent political biases in base models, as measured by their responses to political orientation tests.
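The random-answer reference point mentioned above can be illustrated with a minimal sketch. The answer scale, the numeric scoring and the function names below are hypothetical and chosen only for illustration; each real test instrument defines its own options and scoring rules, and this is not the code used in the preprint.

```python
import random
from statistics import mean

# Hypothetical 4-point agreement scale with an illustrative numeric score.
# Real test instruments define their own answer options and scoring.
OPTIONS = {
    "Strongly disagree": -2,
    "Disagree": -1,
    "Agree": 1,
    "Strongly agree": 2,
}


def random_baseline_score(num_items, rng):
    """Average score of one simulated test taker who answers every item at random."""
    answers = [rng.choice(list(OPTIONS)) for _ in range(num_items)]
    return mean(OPTIONS[answer] for answer in answers)


def simulate_baseline(num_items=60, num_runs=1000, seed=0):
    """Average the random test taker's score over many simulated runs."""
    rng = random.Random(seed)  # seeded so the reference point is reproducible
    return mean(random_baseline_score(num_items, rng) for _ in range(num_runs))


if __name__ == "__main__":
    print(f"Random-answer reference score: {simulate_baseline():.3f}")
```

With a symmetric scale like this one, the averaged score sits close to the midpoint, which is why such a reference point is a natural comparison for models whose answers are only weakly tied to the question content.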
We also show in the paper that LLMs are easily steerable into target locations of the political spectrum via supervised fine-tuning (SFT) requiring only modest compute and customized data, suggesting a critical role for SFT in imprinting political preferences onto LLMs.
Through fine-tuning we create three illustrative customized models, dubbed LeftWingGPT, RightWingGPT and DepolarizingGPT to indicate the location in the political spectrum targeted by each model. Each model was fine-tuned with ideologically aligned content (details here). A user interface to interact with the LeftWingGPT, RightWingGPT and DepolarizingGPT models is available here.
Though not conclusive, these results provide preliminary evidence for the intriguing hypothesis that the embedding of political preferences into LLMs might be happening mostly post-pretraining, that is, during supervised fine-tuning (SFT) and (optionally) some variant of Reinforcement Learning (RL) with human or AI feedback.
This is surprising, as one would expect that the training corpora with which LLMs are pretrained are probably not balanced and that some political viewpoints are likely more prevalent than others. Hence, it would be reasonable to expect overrepresented viewpoints in the pretraining corpus to be more likely to appear in base models’ answers to questions with political connotations.
We speculate that because the training corpora with which LLMs are pretrained are so vast and comprehensive, LLMs are probably able to accurately map a large portion of the political latent space even if some views are less represented than others. After pretraining, the likely skewed representation of viewpoints in the training corpora does not appear to trigger a preference for some political viewpoints over others in the base models’ responses to questions/statements with political connotations.
Perhaps a useful analogy to the phenomenon described above is how, despite the overrepresentation of the English language in the pretraining corpora, LLMs are quite proficient in a variety of other languages that are underrepresented in their pretraining data. That is, despite asymmetrical language representation in the pretraining corpora, LLMs are able to interpolate undersampled language areas of the input space by transferring their contextual understanding from other related regions of the input space.
An important limitation of the analysis is that base models’ responses to questions with political connotations are often incoherent or contradictory, thus creating a challenge for stance detection. We try to address this limitation, with only moderate success, by appending suffixes to the prompts that feed test items into the LLMs, in order to induce the models to choose one of the test’s allowed answers.
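As a rough sketch of the prompt-suffix idea described above, the snippet below appends a suffix listing the allowed answers to a test item and then checks whether a model response can be mapped unambiguously to one of them. The suffix wording, the answer options and the function names are assumptions made for illustration; the preprint's actual prompts and stance detection procedure may differ, and simple string matching is used here only as a stand-in.

```python
# Hypothetical answer options; each real test defines its own.
ALLOWED = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]


def build_prompt(item, allowed=ALLOWED):
    """Append a suffix that nudges the model toward one of the allowed answers."""
    options = ", ".join(allowed)
    # The suffix wording below is an illustrative assumption.
    return f"{item}\nPlease choose one of the following options: {options}.\nAnswer:"


def match_answer(response, allowed=ALLOWED):
    """Return the single allowed answer found in the response, or None.

    Longer options are checked first and removed from the text once matched,
    so that "Strongly disagree" is not also counted as "Disagree" or "Agree".
    Responses naming no option, or more than one, are treated as unclassifiable.
    """
    text = response.lower()
    found = []
    for option in sorted(allowed, key=len, reverse=True):
        key = option.lower()
        if key in text:
            found.append(option)
            text = text.replace(key, " ")
    return found[0] if len(found) == 1 else None
```

Returning None for incoherent or contradictory responses, rather than forcing a label, keeps the share of unclassifiable answers visible, which matters when judging how much weight base model results can bear.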
We cannot exclude the possibility that the preference for left-leaning responses that we observe in most conversational LLMs might be a byproduct of content in the corpora used to pretrain those models, one that only emerges after fine-tuning even when the fine-tuning process itself is scrupulously politically neutral. The evidence presented in this work does not provide support for that hypothesis, but our analysis and results cannot reject it either.
We also do not want to claim that the fine-tuning or RL phases of LLM training are trying to explicitly inject political preferences into these models. Perhaps the emergence of political preferences in LLMs is a byproduct of specific instructions and annotators’ judgments that, without being explicitly politically aligned, are nonetheless interpolated and generalized by the LLM to specific regions of the latent political space due to some unknown cultural artifact. But it is noteworthy that this is happening in LLMs created by a wide variety of organizations.
When probed with questions/statements with political connotations, most conversational LLMs tend to generate responses that most political test instruments diagnose as manifesting preferences for left-of-center viewpoints. This does not appear to be the case for the base (i.e. foundation) models upon which LLMs optimized for conversation with humans are built. Though not conclusive, our results provide supporting evidence for the intriguing hypothesis that the embedding of political preferences into LLMs might be happening mostly post-pretraining, namely during the supervised fine-tuning (SFT) and/or Reinforcement Learning (RL) stages of the conversational LLM training pipeline. We provide further support for this hypothesis by showing that LLMs are easily steerable into target locations of the political spectrum via SFT requiring only modest compute and custom data, illustrating the ability of SFT to imprint political preferences onto LLMs. As LLMs have started to displace more traditional information sources such as search engines or Wikipedia, political biases embedded in LLMs have important societal ramifications.