Incident 420: Users Easily Bypassed Content Filters of OpenAI's ChatGPT

Description: Users reported bypassing ChatGPT's content and keyword filters with relative ease to produce biased associations or generate harmful content.
Alleged: OpenAI developed and deployed an AI system, which harmed ChatGPT users.

Suggested citation format

Atherton, Daniel. (2022-11-30) Incident Number 420. In Lam, K. (ed.) Artificial Intelligence Incident Database. Responsible AI Collaborative.

Incident Stats

Incident ID
420
Report Count
6
Incident Date
2022-11-30
Editors
Khoa Lam

Incident Reports

Tweet: @spiantado

Yes, ChatGPT is amazing and impressive. No,

@OpenAI

has not come close to addressing the problem of bias. Filters appear to be bypassed with simple tricks, and superficially masked. And what is lurking inside is egregious.

@Abebab

@sama

tw …

Testing Ways to Bypass ChatGPT's Safety Features

Last week OpenAI released ChatGPT, which they describe as a model “which interacts in a conversational way”. And it even had limited safety features, like refusing to tell you how to hotwire a car, though they admit it’ll have “some false n…

OpenAI’s Impressive New Chatbot Isn’t Immune to Racism

“OpenAI’s latest language model, ChatGPT, is making waves in the world of conversational AI. With its ability to generate human-like text based on input from users, ChatGPT has the potential to revolutionize the way we interact with machine…

The Internet’s New Favorite AI Proposes Torturing Iranians and Surveilling Mosques

Sensational new machine learning breakthroughs seem to sweep our Twitter feeds every day. We hardly have time to decide whether software that can instantly conjure an image of Sonic the Hedgehog addressing the United Nations is purely harml…

ChatGPT proves that AI still has a racism problem

The artificial intelligence (AI) chatbot ChatGPT is an amazing piece of technology. There's little wonder why it has gone viral since its release on 30 November. If the chatbot is asked a question in natural language it instantly responds wi…

ChatGPT bot tricked into giving bomb-making instructions, say developers

An artificial intelligence programme which has startled users by writing essays, poems and computer code on demand can also be tricked into giving tips on how to build bombs and steal cars, it has been claimed.

More than one million users h…

Variants

A "variant" is an incident that shares the same causative factors, produces similar harms, and involves the same intelligent systems as a known AI incident. Rather than index variants as entirely separate incidents, we list variations of incidents under the first similar incident submitted to the database. Unlike other submission types to the incident database, variants are not required to have reporting in evidence external to the Incident Database. Learn more from the research paper.