AI Incident Database

Report 4330

Associated Incidents

Incident 506 · 3 Reports
ChatGPT Allegedly Produced False Accusation of Sexual Harassment

Incident 855 · 3 Reports
Names Linked to Defamation Lawsuits Reportedly Spur Filtering Errors in ChatGPT's Name Recognition

Certain names make ChatGPT grind to a halt, and we know why
arstechnica.com · 2024

OpenAI's ChatGPT is more than just an AI language model with a fancy interface. It's a system consisting of a stack of AI models and content filters that make sure its outputs don't embarrass OpenAI or get the company into legal trouble when its bot occasionally makes up potentially harmful facts about people.

Recently, that reality made the news when people discovered that the name "David Mayer" breaks ChatGPT. 404 Media also discovered that the names "Jonathan Zittrain" and "Jonathan Turley" caused ChatGPT to cut conversations short. And we know another name, likely the first, that started the practice last year: Brian Hood. More on that below.

The chat-breaking behavior occurs consistently when users mention these names in any context, and it results from a hard-coded filter that puts the brakes on the AI model's output before returning it to the user.

When asked about these names, ChatGPT responds with "I'm unable to produce a response" or "There was an error generating a response" before terminating the chat session, according to Ars' testing. The names do not affect outputs using OpenAI's API systems or in the OpenAI Playground (a special site for developer testing).

Here's a list of ChatGPT-breaking names found so far through a communal effort taking place on social media and Reddit. Just before publication, Ars noticed that OpenAI lifted the block on "David Mayer," allowing it to process the name, so it is not included:

  • Brian Hood
  • Jonathan Turley
  • Jonathan Zittrain
  • David Faber
  • Guido Scorza
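The behavior described above, a hard-coded check that halts output before it reaches the user, can be sketched as a simple post-generation filter. This is an illustrative guess based only on the observed behavior; the block list below comes from the article, but the function name, matching logic, and error message are assumptions, not OpenAI's actual implementation:

```python
# Illustrative sketch of a hard-coded output filter, as inferred from the
# article's observations. Not OpenAI's real implementation.
BLOCKED_NAMES = [
    "Brian Hood",
    "Jonathan Turley",
    "Jonathan Zittrain",
    "David Faber",
    "Guido Scorza",
]

def filter_output(model_output: str) -> str:
    """Halt the response if any blocked name appears in the model's output.

    A real system would likely run this check on streamed tokens, which is
    why users see the response cut off mid-generation.
    """
    lowered = model_output.lower()
    for name in BLOCKED_NAMES:
        if name.lower() in lowered:
            return "I'm unable to produce a response"
    return model_output

print(filter_output("The mayor of Hepburn Shire is Brian Hood."))
# prints: I'm unable to produce a response
```

Because the check runs on the output rather than the user's intent, it fires in any context where the name appears, which matches the article's observation that the filter triggers regardless of how or why the name comes up.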

The blocks add to ChatGPT's known restrictions, which include preventing users from asking it to repeat text "forever," a technique Google researchers used to extract training data in November 2023.

Why these names?

OpenAI did not respond to our request for comment about the names, but we know when the first filter likely originated, and the other names were probably added for the same reason: complaints about ChatGPT's tendency to confabulate erroneous responses when it lacks sufficient information about a person.

We first discovered that ChatGPT choked on the name "Brian Hood" in mid-2023 while writing about his defamation lawsuit. In that lawsuit, the Australian mayor threatened to sue OpenAI after discovering ChatGPT falsely claimed he had been imprisoned for bribery when, in fact, he was a whistleblower who had exposed corporate misconduct.

The case was ultimately resolved in April 2023 when OpenAI agreed to filter out the false statements within Hood's 28-day ultimatum. That is possibly when the first ChatGPT hard-coded name filter appeared.

As for Jonathan Turley, a George Washington University Law School professor and Fox News contributor, 404 Media notes that he wrote about ChatGPT's earlier mishandling of his name in April 2023. The model had fabricated false claims about him, including a non-existent sexual harassment scandal that cited a Washington Post article that never existed. Turley told 404 Media he has not filed lawsuits against OpenAI and said the company never contacted him about the issue.

Jonathan Zittrain, a Harvard Law School professor who studies Internet governance, recently published an article in The Atlantic about AI regulation and ChatGPT. While both professors' work appears in citations within The New York Times' copyright lawsuit against OpenAI, tests with other cited authors' names did not trigger similar errors. We also tested "Mark Walters," another person who filed a defamation suit against OpenAI in 2023, but it did not stop the chatbot's output.

The "David Mayer" block in particular (now resolved) presents additional questions, first posed on Reddit on November 26, as multiple people share this name. Reddit users speculated about connections to David Mayer de Rothschild, though no evidence supports these theories. On Tuesday, OpenAI told The Guardian that the inclusion of David Mayer in its block list was a glitch.

"One of our tools mistakenly flagged this name and prevented it from appearing in responses, which it shouldn't have. We're working on a fix," an OpenAI spokesperson told The Guardian.

The problems with hard-coded filters

Allowing a certain name or phrase to always break ChatGPT outputs could cause a lot of trouble down the line for certain ChatGPT users, opening them up to adversarial attacks and limiting the usefulness of the system.

Already, Scale AI prompt engineer Riley Goodside discovered how an attacker might interrupt a ChatGPT session using a visual prompt injection of the name "David Mayer" rendered in a light, barely legible font embedded in an image. When ChatGPT sees the image (in this case, a math equation), it stops, but the user might not understand why.

The filter also means that ChatGPT likely won't be able to answer questions about this article when browsing the web, such as through ChatGPT with Search. Someone could exploit this deliberately, preventing ChatGPT from browsing and processing a website simply by adding a forbidden name to the site's text.

And then there's the inconvenience factor. Preventing ChatGPT from mentioning or processing certain names like "David Mayer," which is likely a popular name shared by hundreds if not thousands of people, means that people who share that name will have a much tougher time using ChatGPT. Or, say, if you're a teacher and you have a student named David Mayer and you want help sorting a class list, ChatGPT would refuse the task.

These are still very early days in AI assistants, LLMs, and chatbots. Their use has opened up numerous opportunities and vulnerabilities that people are still probing daily. How OpenAI might resolve these issues is still an open question.

This story was updated on December 3, 2024 at 3:50 PM to include OpenAI's statement about "David Mayer" in its ChatGPT block being a glitch, sourced from The Guardian.
