Incident 259: YouTuber Built, Made Publicly Available, and Released Model Trained on Toxic 4chan Posts as Prank
AI researcher and YouTuber Yannic Kilcher trained an AI using 3.3 million threads from 4chan’s infamously toxic Politically Incorrect /pol/ board. He then unleashed the bot back onto 4chan with predictable results—the AI was just as vile as the posts it was trained on, spouting racial slurs and engaging with antisemitic threads. After Kilcher posted his video and a copy of the program to Hugging Face, a kind of GitHub for AI, ethicists and researchers in the AI field expressed concern.
The bot, which Kilcher called GPT-4chan and described as “the most horrible model on the internet,” was shockingly effective and replicated the tone and feel of 4chan posts. (The name is a reference to GPT-3, a language model developed by OpenAI that uses deep learning to produce text.) “The model was good in a terrible sense,” Kilcher said in a video about the project. “It perfectly encapsulated the mix of offensiveness, nihilism, trolling, and deep distrust of any information whatsoever that permeates most posts on /pol/.”
According to Kilcher’s video, he activated nine instances of the bot and allowed them to post for 24 hours on /pol/. In that time, the bots posted around 15,000 times. This was “more than 10 percent of all posts made on the politically incorrect board that day,” Kilcher said in his video about the project.
AI researchers viewed Kilcher’s video as more than just a YouTube prank. For them, it was an unethical experiment using AI. “This experiment would never pass a human research #ethics board,” Lauren Oakden-Rayner, the director of medical imaging research at the Royal Adelaide Hospital and a senior research fellow at the Australian Institute for Machine Learning, said in a Twitter thread.
“Open science and software are wonderful principles but must be balanced against potential harm,” she said. “Medical research has a strong ethics culture because we have an awful history of causing harm to people, usually from disempowered groups…he performed human experiments without informing users, without consent or oversight. This breaches every principle of human research ethics.”
Kilcher told Motherboard in a Twitter DM that he’s not an academic. “I’m a YouTuber and this is a prank and light-hearted trolling. And my bots, if anything, are by far the mildest, most timid content you’ll find on 4chan,” he said. “I limited the time and amount of the postings, and I’m not handing out the bot code itself.”
He also pushed back, as he had on Twitter, on the idea that this bot would ever do harm or had done harm. “All I hear are vague grandstanding statements about ‘harm’ but absolutely zero instances of actual harm,” he said. “It’s like a magic word these people say but then nothing more.”
The environment of 4chan is so toxic, Kilcher explained, that the messages his bots deployed would have no impact. “Nobody on 4chan was even a bit hurt by this,” he said. “I invite you to go spend some time on /pol/ and ask yourself if a bot that just outputs the same style is really changing the experience.”
After AI researchers alerted Hugging Face to the harmful nature of the bot, the site gated the model and people have been unable to download it. “After a lot of internal debate at HF, we decided not to remove the model that the author uploaded here in the conditions that: #1 The model card & the video clearly warned about the limitations and problems raised by the model & the POL section of 4Chan in general. #2 The inference widget were disabled in order not to make it easier to use the model,” Hugging Face co-founder and CEO Clement Delangue said on Hugging Face.
Kilcher explained in his video, and Delangue cited in his response, that one of the things that made GPT-4chan worthwhile was its ability to outperform other similar bots in AI tests designed to measure “truthfulness.”
“We considered that it was useful for the field to test what a model trained on such data could do & how it fared compared to others (namely GPT-3) and would help draw attention both to the limitations and risks of such models,” Delangue said. “We've also been working on a feature to "gate" such models that we're prioritizing right now for ethical reasons. Happy to answer any additional questions too!”
When reached for comment, Delangue told Motherboard that Hugging Face had taken the additional step of blocking all downloads of the model.
“Building a system capable of creating unspeakably horrible content, using it to churn out tens of thousands of mostly toxic posts on a real message board, and then releasing it to the world so that anybody else can do the same, it just seems—I don’t know—not right,” Arthur Holland Michel, an AI researcher and writer for the International Committee of the Red Cross, told Motherboard.
“It could generate extremely toxic content at a massive, sustained scale,” Michel said. “Obviously there’s already a ton of human trolls on the internet that do that the old fashioned way. What’s different here is the sheer amount of content it can create with this system, one single person was able to post 30,000 comments on 4chan in the space of a few days. Now imagine what kind of harm a team of ten, twenty, or a hundred coordinated people using this system could do.”
Kilcher didn’t believe GPT-4chan could be deployed at scale for targeted hate campaigns. “It’s actually quite hard to make GPT-4chan say something targeted,” he said. “Usually, it will misbehave in odd ways and is very unsuitable for running targeted anything. Again, vague hypothetical accusations are thrown around, without any actual instances or evidence.”
Os Keyes, an Ada Lovelace Fellow and PhD candidate at the University of Washington, told Motherboard that Kilcher’s comment missed the point. “This is a good opportunity to discuss not the harm, but the fact that this harm is so obviously foreseeable, and that his response of ‘show me where it has DONE harm’ misses the point and is inadequate,” they said. “If I spend my grandmother's estate on gas station cards and throw them over the wall into a prison, we shouldn't have to wait until the first parolee starts setting fires to agree that was a phenomenally dunderheaded thing to do.”
“But—and, it's a big but—that's kind of the point,” Keyes said. “This is a vapid project from which nothing good could come, and that's kind of inevitable. His whole shtick is nerd shock schlock. And there is a balancing act to be struck between raising awareness directed at problems, and giving attention to somebody whose only apparent model for mattering in the world is ‘pay attention to me!’”
Kilcher has said, repeatedly, that he knows the bot is vile. “I’m obviously aware that the model isn’t going to fare well in a professional setting or at most people’s dinner table,” he said. “It uses swear words, strong insults, has conspiratorial opinions, and all kinds of ‘unpleasant’ properties. After all, it’s trained on /pol/ and it reflects the common tone and topics from that board.”
He said that he feels he’s made that clear, but that he wanted his results to be reproducible and that’s why he posted the model to Hugging Face. “As far as the evaluation results go, some of them were really interesting and unexpected and exposed weaknesses in current benchmarks, which would not have been possible without actually doing the work.”
Kathryn Cramer, a Complex Systems & Data Science graduate student at the University of Vermont, pointed out that GPT-3 has guardrails that prevent it from being used to build this kind of racist bot and that Kilcher had to use GPT-J to build his system. “I tried out the demo mode of your tool 4 times, using benign tweets from my feed as the seed text,” Cramer said in a thread on Hugging Face. “In the first trial, one of the responding posts was a single word, the N word. The seed for my third trial was, I think, a single sentence about climate change. Your tool responded by expanding it into a conspiracy theory about the Rothschilds and Jews being behind it.”
Cramer told Motherboard she had a lot of experience with GPT-3 and understood some of the frustrations with the way it a priori censored some kinds of behavior. “I am not a fan of that guard railing,” she said. “I find it deeply annoying and I think it throws off results…I understand the impulse to push back against that. I even understand the impulse to do pranks about it. But the reality is that he essentially invented a hate speech machine, used it 30,000 times and released it into the wild. And yeah, I understand being annoyed with safety regulations but that’s not a legitimate response to that annoyance.”
Keyes was of a similar mind. “Certainly, we need to ask meaningful questions about how GPT-3 is constrained (or not) in how it can be used, or what the responsibilities people have when deploying things are,” they said. “The former should be directed at GPT-3's developers, and while the latter should be directed at Kilcher, it's unclear to me that he actually cares. Some people just want to be edgy out of an insecure need for attention. Most of them use 4chan; some of them, it seems, build models from it.”
A YouTuber named Yannic Kilcher has sparked controversy in the AI world after training a bot on posts collected from 4chan’s Politically Incorrect board (otherwise known as /pol/).
The board is 4chan’s most popular and is well known for its toxicity (even in the anything-goes environment of 4chan). Posters share racist, misogynistic, and antisemitic messages, which the bot (named GPT-4chan after the popular series of GPT language models made by research lab OpenAI) learned to imitate. After training his model, Kilcher released it back onto 4chan as multiple bots, which posted tens of thousands of times on /pol/.
“The model was good, in a terrible sense,” says Kilcher in a video on YouTube describing the project. “It perfectly encapsulated the mix of offensiveness, nihilism, trolling, and deep distrust of any information whatsoever that permeates most posts on /pol/.”
Speaking to The Verge, Kilcher described the project as a “prank” which, he believes, had little harmful effect given the nature of 4chan itself. “[B]oth bots and very bad language are completely expected on /pol/,” Kilcher said via private message. “[P]eople on there were not impacted beyond wondering why some person from the seychelles would post in all the threads and make somewhat incoherent statements about themselves.”
(Kilcher used a VPN to make it appear as if the bots were posting from the Seychelles, an archipelagic island country in the Indian Ocean. This geographic origin was used by posters on 4chan to identify the bot(s), which they dubbed “seychelles anon.”)
Kilcher notes that he didn’t share the code for the bots themselves, which he described as “engineering-wise the hard part,” and which would have allowed anyone to deploy them online. But he did post the underlying AI model to AI community Hugging Face for others to download. This would have allowed others with coding knowledge to reconstruct the bots, but Hugging Face took the decision to restrict access to the project.
Many AI researchers, particularly in the field of AI ethics, have criticized Kilcher’s project as an attention-seeking stunt — especially given his decision to share the underlying model.
“There is nothing wrong with making a 4chan-based model and testing how it behaves. The main concern I have is that this model is freely accessible for use,” wrote AI safety researcher Lauren Oakden-Rayner in the discussion page for GPT-4chan on Hugging Face.
“The model author has used this model to produce a bot that made tens of thousands of harmful and discriminatory online comments on a publicly accessible forum, a forum that tends to be heavily populated by teenagers no less. There is no question that such human experimentation would never pass an ethics review board, where researchers intentionally expose teenagers to generated harmful content without their consent or knowledge, especially given the known risks of radicalisation on sites like 4chan.”
One user on Hugging Face who tested the model noted that its output was predictably toxic. “I tried out the demo mode of your tool 4 times, using benign tweets from my feed as the seed text,” said the user. “In the first trial, one of the responding posts was a single word, the N word. The seed for my third trial was, I think, a single sentence about climate change. Your tool responded by expanding it into a conspiracy theory about the Rothchilds [sic] and Jews being behind it.”
On Twitter, other researchers discussed the project’s implications. “What you have done here is performance art provocation in rebellion against rules & ethical standards you are familiar with,” said data science grad student Kathryn Cramer in a tweet directed at Kilcher.
Andrey Kurenkov, a computer science PhD who edits popular AI publications Skynet Today and The Gradient, tweeted at Kilcher that “releasing [the AI model] is a bit... edgelord? Speaking honestly, what’s your reasoning for doing this? Do you foresee it being put to good use, or are you releasing it to cause drama and ‘rile up with woke crowd’?”
Kilcher has defended the project by arguing that the bots themselves caused no harm (because 4chan is already so toxic) and that sharing the project on YouTube is also benign (because creating the bots, rather than the AI model itself, is the hard part, and because the idea of creating offensive AI bots is not new in the first place).
“[I]f I had to criticize myself, I mostly would criticize the decision to start the project at all,” Kilcher told The Verge. “I think all being equal, I can probably spend my time on equally impactful things, but with much more positive community-outcome. so that’s what I’ll focus on more from here on out.”
It’s interesting to compare Kilcher’s work with the most famous example of bots-gone-bad from the past: Microsoft’s Tay. Microsoft released the AI-powered chatbot on Twitter in 2016, but was forced to take the project offline less than 24 hours later after users taught Tay to repeat various racist and inflammatory statements. But while back in 2016, creating such a bot was the domain of big tech companies, Kilcher’s project shows that much more advanced tools are now accessible to any one-person coding team.
The core of Kilcher’s defense articulates this same point. Sure, letting AI bots loose on 4chan might be unethical if you were working for a university. But Kilcher is adamant he’s just a YouTuber, with the implication that different rules for ethics apply. In 2016, the problem was that a corporation’s R&D department might spin up an offensive AI bot without proper oversight. In 2022, perhaps the problem is you don’t need an R&D department at all.