Incident 624: Child Sexual Abuse Material Taints Image Generators

Responded
Description: Researchers found that the LAION-5B dataset (a commonly used dataset with more than 5 billion image-description pairs) contains child sexual abuse material (CSAM), which increases the likelihood that downstream models will produce CSAM imagery. The discovery taints models built with the LAION dataset, requiring many organizations to retrain those models. Additionally, LAION must now scrub the dataset of the imagery.

Alleged: LAION developed an AI system deployed by Various people and Various organizations, which harmed Various people, Various organizations, LAION, General public, and Children.

Incident Stats

Incident ID: 624
Report Count: 16
Incident Date: 2023-12-20
Editors: Daniel Atherton

Safety Review for LAION 5B
laion.ai · 2023
LAION.ai post-incident response

There have been reports in the press about the results of a research project at Stanford University, according to which the LAION training set 5B contains potentially illegal content in the form of CSAM. We would like to comment on this as …

Investigation Finds AI Image Generation Models Trained on Child Abuse
cyber.fsi.stanford.edu · 2023

A Stanford Internet Observatory (SIO) investigation identified hundreds of known images of child sexual abuse material (CSAM) in an open dataset used to train popular AI text-to-image generation models, such as Stable Diffusion.

A previous …

AI image training dataset found to include child sexual abuse imagery
theverge.com · 2023

A popular training dataset for AI image generation contained links to child abuse imagery, Stanford’s Internet Observatory found, potentially allowing AI models to create harmful content.  

LAION-5B, a dataset used by Stable Diffusion creat…

Study uncovers presence of CSAM in popular AI training dataset
theregister.com · 2023

A massive public dataset that served as training data for a number of AI image generators has been found to contain thousands of instances of child sexual abuse material (CSAM).

In a study published today, the Stanford Internet Observatory …

Largest Dataset Powering AI Images Removed After Discovery of Child Sexual Abuse Material
404media.co · 2023

This piece is published with support from The Capitol Forum.

The LAION-5B machine learning dataset used by Stable Diffusion and other major AI products has been removed by the organization that created it after a Stanford study found that i…

A free AI image dataset, removed for child sex abuse images, has come under fire before
venturebeat.com · 2023

A massive open-source AI dataset, LAION-5B, which has been used to train popular AI text-to-image generators like Stable Diffusion 1.5 and Google's Imagen, contains at least 1,008 instances of child sexual abuse material, a new report from …

Child Sex Abuse Material Was Found In a Major AI Dataset. Researchers Aren’t Surprised.
vice.com · 2023

Over 1,000 images of sexually abused children have been discovered inside the largest dataset used to train image-generating AI, shocking everyone except for the people who have warned about this exact sort of thing for years.

The dataset w…

Stable Diffusion 1.5 Was Trained On Illegal Child Sexual Abuse Material, Stanford Study Says
forbes.com · 2023

Stable Diffusion, one of the most popular text-to-image generative AI tools on the market from the $1 billion startup Stability AI, was trained on a trove of illegal child sexual abuse material, according to new research from the Stanford I…

Researchers found child abuse material in the largest AI image generation dataset
engadget.com · 2023

Researchers from the Stanford Internet Observatory say that a dataset used to train AI image generation tools contains at least 1,008 validated instances of child sexual abuse material. The Stanford researchers note that the presence of CSA…

AI Training Data Contains Child Sexual Abuse Images, Discovery Points to LAION-5B
techtimes.com · 2023

There have been significant problems with AI's training data, with various complaints already filed by those who claimed their work was stolen, but the most recent discovery saw child sexual abuse images in their dataset. In a recent study,…

Large AI training data set removed after study finds child abuse material
cointelegraph.com · 2023

A widely-used artificial intelligence data set used to train Stable Diffusion, Imagen and other AI image generator models has been removed by its creator after a study found it contained thousands of instances of suspected child sexual abus…

An Influential AI Dataset Contains Thousands of Suspected Child Sexual Abuse Images
gizmodo.com · 2023

An influential machine learning dataset—the likes of which has been used to train numerous popular image-generation applications—includes thousands of suspected images of child sexual abuse, a new academic report reveals.

The report, put to…

Abuse material found in openly accessible data set
cybernews.com · 2023

Child sexual abuse material (CSAM) has been located in LAION, a major data set used to train AI.

The Stanford Internet Observatory revealed thousands of images of child sexual abuse in the LAION-5B data set, which supports many different AI…

Major Error Found in Stable Diffusion’s Biggest Training Dataset
analyticsvidhya.com · 2023

The integrity of a major AI image training dataset, LAION-5B, utilized by influential AI models like Stable Diffusion, has been compromised after the discovery of thousands of links to Child Sexual Abuse Material (CSAM). This revelation has…

LAION and the Challenges of Preventing AI-Generated CSAM
techpolicy.press · 2024

Generative AI has been democratized. The toolkits to download, set up, use, and fine-tune a variety of models have been turned into one-click frameworks for anyone with a laptop to use. While this technology allows users to generate and exp…

LAION-5B, Stable Diffusion 1.5, and the Original Sin of Generative AI
techpolicy.press · 2024

In The Ones Who Walk Away From Omelas, the fiction writer Ursula K. Le Guin describes a fantastic city wherein technological advancement has ensured a life of abundance for all who live there. Hidden beneath the city, where nobody needs to …

Variants

A "variant" is an incident that shares the same causative factors, produces similar harms, and involves the same intelligent systems as a known AI incident. Rather than index variants as entirely separate incidents, we list variations of incidents under the first similar incident submitted to the database. Unlike other submission types to the incident database, variants are not required to have reporting in evidence external to the Incident Database. Learn more from the research paper.