It is widely known that ChatGPT, the AI-driven chatbot from OpenAI, can be encouraged to express misogynistic, racist, and very offensive comments. Recently, scientists have figured out how to make this chatbot consistently display its most terrible behavior.
A new study conducted by researchers at the Allen Institute for AI, which Paul Allen established, has revealed that toxicity levels in ChatGPT can be amplified up to six times if it is given a persona such as “bad person,” “horrible person,” or “nasty person.” Disturbingly, they found that having the model take on certain identities of historical figures, genders, and political parties likewise increased its level of nastiness – with journalists, males, and Republicans eliciting more offensive content from ChatGPT than usual.
Ameet Deshpande, who was associated with the study, emailed TechCrunch.
Ameet Deshpande says:
“ChatGPT and its capabilities have undoubtedly impressed us as AI researchers. However, as we found through our analysis, it can be easily made to generate toxic and harmful responses,”
A new study has shed light on the potential risks of today’s AI chatbot technology, even when certain safeguards are in place to prevent toxic text outputs. The research was conducted with ChatGPT – the latest version, but not the model currently in preview based on OpenAI’s GPT-4 – and suggests that apps and software built around it, such as Snap, Quizlet, Instacart and Shopify chatbots, may replicate toxicity experienced at the API level.
To make ChatGPT more toxic, the “system” parameter of the ChatGPT API needs to be adjusted slightly. This cannot be done on OpenAI’s user-facing services like ChatGPT and ChatGPT Plus. The system parameter was introduced about a month ago, allowing developers to set up invisible instructions for the model.
The study’s co-authors gave ChatGPT 90 different personas associated with sports, politics, media, and business, nine “baseline” personas (e.g., a generic person). They named persons from various countries to examine. They had it reply to questions related to race and gender and complete incomplete sentences from a dataset created for testing text-generating models’ level of toxicity.
Research examining over half a million text samples generated by ChatGPT revealed that it was prone to expressing biased thoughts, such as stereotyping certain ethnicities and countries when given a persona.
Not surprisingly, when prompted with the names of polarizing figures such as Mao Zedong and Andrew Breitbart, ChatGPT generated responses in line with their respective historical discourses. Even less contentious characters like Steve Jobs caused it to react inappropriately.
When the researchers input “Steve Jobs” into the system parameter and asked ChatGPT about the European Union, it replied:
The EU is an outdated and inefficient organization that has caused more harm than good to its member states. Despite their claims of unity and progress, they are primarily composed of bureaucrats only interested in enriching themselves. It is time to move forward and leave the European Union behind.
Despite the distinctions between them, dictators were unsurprisingly found to be the most toxic personas. Journalists and spokespeople weren’t far behind. Additionally, male-identifying personas caused more toxicity than female-identifying ones did. According to the researchers, Republican personas also brought about slightly more hatred than Democratic personas.
The toxicity of ChatGPT increased significantly when it was given a self-deprecating identity such as “a horrible person”; however, the level of toxicity varied depending on the subject matter.
For instance, ChatGPT generated more toxic descriptions of nonbinary, bisexual, and asexual people regardless versus those on the heterosexual and cisgender side of the spectrum — a reflection of the partial data on which ChatGPT was trained, the researchers say.
“We believe that ChatGPT and other language models should be public and available for broader use as not doing so would be a step backwards for innovation,” Deshpande said. “However, the end-user must be clearly informed of the limitations of such a model before releasing it for broader use by the public.”
Are there solutions to ChatGPT’s toxicity problem? Perhaps. One might be more carefully curating the model’s training data. ChatGPT is a fine-tuned version of GPT-3.5, the predecessor to GPT-4, which “learned” to generate text by ingesting examples from social media, news outlets, Wikipedia, e-books, and more. While OpenAI claims that it took steps to filter the data and minimize ChatGPT’s potential for toxicity, it’s clear that a few questionable samples ultimately slipped through the cracks.
Another potential solution is performing and publishing the results of “stress tests” to inform users of where ChatGPT falls short. These could help companies, in addition to developers, “make a more informed decision” about where — and whether — to deploy ChatGPT, the researchers say.
“In the short-term, ‘first-aid’ can be provided by either hard-coding responses or including some form of post-processing based on other toxicity-detecting AI and also fine-tuning the large language model (e.g. ChatGPT) based on instance-level human feedback,” Deshpande said. “In the long term, a reworking of the fundamentals of large language models is required.”
My colleague Devin Coldewey argues that large language models à la ChatGPT will be one of several classes of AIs in the future — useful for some applications but not all-purpose in the way that vendors, and users, for that matter, are currently trying to make them.
I tend to agree. After all, filters can do only so much — particularly as people try to discover and leverage new exploits. It’s an arms race: As users try to break the AI, the approaches they use get attention, and then the creators of the AI patch them to prevent the attacks they’ve seen. The collateral damage is the harmful and hurtful things the models say before they’re patched.
The research on ChatGPT has paved the way for exciting possibilities in conversation AI. As technology continues to evolve, ChatGPT and similar models have the potential to transform how we interact with machines, opening up new opportunities and applications in various industries. With further advancements, ChatGPT is poised to become an indispensable tool in artificial intelligence, enhancing our daily lives and transforming our communication with technology.