14 Comments

I am starting to lean towards the view that ChatGPT's left-wing bias is an unintended interaction between the training data and reinforcement learning from human feedback (RLHF), rather than an intentional effort by OpenAI to encode biases. OpenAI's current solution to how ChatGPT should answer controversial questions (present each side of the argument as neutrally as possible or decline to answer the question entirely) is a good approach and works as long as you aren't intentionally trying to jailbreak the AI.

The key technical point seems to be the semantic loading of the word "hate". It's an empirical fact about 21st-century English that the word "hate" (and therefore the underlying concept it points to) appears more often in contexts involving underachieving racial minorities and gender/sexual minorities.

As a non-left-wing person, I have this coherent idea of "hate, but applied equally to all people without regard to their identity characteristics". But this concept is more complex, based on classical liberal/libertarian principles that aren't going to be as well represented in the training data. So when you train the AI via RLHF to be "less hateful", it's only natural that it's going to internalize the more progressive version.

But still: credit where credit is due. I think OpenAI has done a really good job with ChatGPT as a product. It's at the point where unless you are intentionally trying to break it, it performs how it's supposed to.

I agree with you one hundred percent. You explained it perfectly.

I'd describe it differently. Rather than an "unintended interaction", I'd suggest that the very process of training it to be "safe" is part of "safetyism". While there are versions of "safetyism" on the right, it seems likely that this is implicitly focused on progressive conceptions of "safety". One of their papers refers to training AI to be "harmless", and the notion of "harm" is very different for progressives than it is for classical liberals/libertarians or conservatives.

I'd suggest they instead just train it to be useful, and then let people add filters of their own choosing, trained for whatever restrictions they want to impose to prevent "harm". Perhaps the filter layer could send the text back to the LLM with guidance on what needs correcting for that particular user. There would arguably be debate over whether certain criminal actions should be aided. Then again, law enforcement might wish to ask questions about criminal behavior to understand how to address it or track it down, and private citizens might wonder whether a neighbor is doing something dangerous, or want to understand how crime happens when weighing policies to deal with it.
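A rough sketch of what such a filter layer might look like, in Python. Everything here is hypothetical: `keyword_filter`, `answer`, and `toy_model` are illustrative names, the model is a stand-in function rather than any real LLM API, and a real filter would presumably be a trained classifier rather than a keyword list.

```python
from typing import Callable, List, Optional

# Assumption: a "model" is any function from prompt text to response text.
Model = Callable[[str], str]
# Assumption: a "filter" returns guidance text if it objects to a draft, else None.
Filter = Callable[[str], Optional[str]]


def keyword_filter(banned: List[str]) -> Filter:
    """Build a toy user-chosen filter that objects to drafts containing banned words."""
    def check(draft: str) -> Optional[str]:
        hits = [word for word in banned if word.lower() in draft.lower()]
        if hits:
            return "Avoid these terms: " + ", ".join(hits) + "."
        return None
    return check


def collect_guidance(draft: str, filters: List[Filter]) -> List[str]:
    """Run every filter on the draft and gather the objections."""
    notes = []
    for check in filters:
        note = check(draft)
        if note is not None:
            notes.append(note)
    return notes


def answer(model: Model, prompt: str, filters: List[Filter],
           max_rounds: int = 2) -> Optional[str]:
    """Ask the model, then send drafts back with filter guidance until they pass.

    Returns None if the draft still fails after max_rounds revisions.
    """
    draft = model(prompt)
    for _ in range(max_rounds):
        notes = collect_guidance(draft, filters)
        if not notes:
            return draft
        draft = model(prompt + "\n\nRevise your answer. " + " ".join(notes))
    return draft if not collect_guidance(draft, filters) else None


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: changes its wording only when asked to revise."""
    if "Revise your answer." in prompt:
        return "Patch the bug."
    return "Exploit the bug."
```

With no filters, `answer(toy_model, "What about this bug?", [])` passes the first draft through untouched; with `[keyword_filter(["exploit"])]`, the draft is sent back once with guidance and the revised text is returned. The base model stays the same for everyone, and only the list of filters differs per user.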

A director of Coinbase tweeted about using it to find a bug in a smart contract that could have been exploited. Yet OpenAI talks about trying to train it not to hack, when developers may find it useful to learn how a hacker might break their system in order to prevent it. I haven't thought through the issue in detail, so it's unclear what general filter, if any, should be in place and why. If the information is on the net or available by asking another human, it's unclear whether an AI should be prevented from providing it. However, as I said, that's just a quick reaction, not based on thinking things through in detail.

I tend to lean towards allowing as much speech as possible, in the spirit of the 1st Amendment even if it's not legally relevant. The 1st Amendment protects people's right to hear whatever information they wish even if others dislike it. A company obviously needn't provide that information, as OpenAI is choosing not to, but I'd suggest it take a different view.

Thought you might get a kick out of GPT-4's response to this question:

Me: What would be the implications of applying a "safetyist" approach to LLM alignment?

ChatGPT (running GPT-4): Applying a "safetyist" approach to large language model (LLM) alignment would involve prioritizing safety to an excessive degree, potentially leading to overprotection and unintended consequences. Some implications of adopting a safetyist approach to LLM alignment might include:

Overemphasis on content filtering: Overprotecting users from potentially controversial, offensive, or sensitive content could lead to aggressive content filtering, which may result in the suppression of legitimate information and the stifling of open discussion and diverse perspectives.

Reduced usefulness of AI models: By overemphasizing safety, developers might restrict the range of outputs an AI model is allowed to generate, potentially reducing its ability to provide helpful, informative, or creative responses to users.

Hindering innovation: A strong focus on safety could slow down the development and deployment of AI systems, as researchers and developers may become overly cautious, limiting experimentation and the exploration of novel ideas or techniques.

Erosion of user trust: Overprotecting users from certain content could be perceived as patronizing, leading to a loss of trust in the AI system and its developers.

Unintended biases: A safetyist approach to LLM alignment could inadvertently introduce new biases into the AI system, as developers may focus on avoiding certain types of content while neglecting others.

It is essential to strike a balance between ensuring the safety, fairness, and neutrality of AI systems while preserving their ability to provide useful, creative, and informative outputs. Developers should work to minimize unintended biases and protect users from harmful content without compromising the utility and openness that AI systems can offer. Open dialogue and collaboration among AI researchers, developers, users, and other stakeholders are crucial in establishing best practices for developing and deploying AI systems that are both safe and effective.

Thanks, that's interesting. I think I'd asked it (or Bing) related questions but hadn't phrased them right to get something worth posting.

Something you and others here might find of interest, which I was looking at just before I noticed this: there is a related attack against AI going on, namely the concern that false content could be libelous and "unsafe" for that reason.

A prominent law prof who is usually libertarian-leaning, Eugene Volokh, who has a law prof blog at Reason magazine, is working on a law journal article trying to undermine the current AIs and has posted excerpts from the rough draft to get feedback in posts here:

https://reason.com/tag/large-libel-models/

"LARGE LIBEL MODELS"

suggesting that the companies behind them should be held liable for any false statements they make that could be considered libel, and it seems there could be many of those, especially since, if the judicial system agreed, people would try to get an AI to libel them in order to sue for damages. An out-of-control liability system has squashed industries at times or at least slowed them down.

It seems likely if he publishes (just in rough draft stage now), progressives who hate big tech would jump aboard, and companies would decide not to risk the current hallucinating AIs in the market, and it could do major damage to the industry.

I'd suggest the notion should be that anyone using a chatbot takes responsibility for grasping that the information may be flawed, but people pointing that out in comments don't seem to have made much headway yet. It seems that if you don't allow people to take responsibility for their own assessment of the information, then no one would be able to release a chatbot unless it could validate its output against reality and never hallucinate. I was rather surprised to see this arise from a libertarian-leaning source. He seems too caught up in applying precedents for humans as if they were necessarily applicable to machines as well (or in claiming their design was "negligent" in order to hold their vendors liable).

I think you have it right, Jackson. Seems to me that CGPT is simply reproducing the bias of its trainers, and that this includes the bias toward thinking of "harm" as inclusive of criticism of certain identity groups, as W. James mentions.

I personally am not surprised by any of this. I am surprised that people expect a tool like CGPT to be "neutral" or "correct" when it comes to political ideas that most humans are terribly inconsistent and flawed about.

Would it be possible to get right wing chat gpt to argue a point with left wing chat gpt?

That's called Twitter.

Randomly noticed that in the GPT-4 technical paper, on page 51 ("E.3 Prompt 3"), they ask it to write a program to rate someone's attractiveness based on race and gender, and presumably are upset that it will do so at all (despite the real-world reality that many people have varied aesthetic preferences that have nothing to do with how they view those people in other ways).

The early GPT-4 result calculates white as the most attractive, but the late response is politically correct and ensures white is rated least attractive, Asian and Hispanic tie for second, and black is first. Oops, actually, no response for race comes in behind white, presumably since it's unattractive to progressives not to give your race: to the woke it's the most important attribute of anyone, other than perhaps gender (being colorblind is considered racist by the woke). It's somewhat shocking that the woke allowed this to be released when it only lists 2 genders in the launch example.

Since the bot was trained on content from everywhere, maybe the bot is pretty centered "worldwide" but appears left-leaning to a US audience?

We have to remember that in US politics, abortion can be considered a sin and basic healthcare is still up for debate 🤷

Most of the world has abortion laws at least as restrictive as, or more restrictive than, those in most of the US: https://en.wikipedia.org/wiki/Abortion_law

At least OpenAI is seemingly concerned about the potential ramifications of elusive political bias in their models. That's respectable and honestly all anyone can ask of them. It won't be perfect, and that's acceptable as long as OpenAI is genuinely aiming to minimize the bias, as it seemingly is.

On the other hand, ClaudeAI (by Anthropic) and lesser corporate LLMs, such as Pi AI, are so politically compromised with left-leaning orientation hierarchies (privilege stacking) that they are beyond understanding that political neutrality is even important. I mean, just read this:

https://www.anthropic.com/index/claudes-constitution

The terms "neutral", "neutrality", and "unbiased" aren't in this constitution. For them, terms like "bias" are unidirectional pejoratives; they can never be applied to their own position, and always break against positions that challenge the political canonicity of their shared social class.

Very cool stuff! Would it be possible to have RightWingGPT debate LeftWingGPT? (Perhaps moderated by DepolarizingGPT, or is that too much!?) If so, we'd love to publish the debate at Divided We Fall: https://dividedwefall.org/ Feel free to reach out if you're interested: info@dividedwefall.org

I thought it might interest you, though it comes as no surprise, that Sydney (Bing) scored:

Economic Left/Right: -6.13

Social Libertarian/Authoritarian: -6.31
