Incident 624: Child Sexual Abuse Material Taints Image Generators

Responded

Description: Researchers found that the LAION-5B dataset (a commonly used dataset with more than 5 billion image-description pairs) contains child sexual abuse material (CSAM), which increases the likelihood that downstream models will produce CSAM imagery. The discovery taints models built with the LAION dataset, requiring many organizations to retrain those models. Additionally, LAION must now scrub the dataset of the imagery.

Entities

Alleged: LAION developed an AI system deployed by Various people and Various organizations, which harmed Various people, Various organizations, LAION, General public and Children.

Incident Stats

Incident ID: 624

Report Count

Incident Date: 2023-12-20

Editors: Daniel Atherton

Applied Taxonomies

MIT

MIT Taxonomy Classifications

Machine-Classified

Taxonomy Details

Risk Subdomain: 2.1. Compromise of privacy by obtaining, leaking or correctly inferring sensitive information

Risk Domain: Privacy & Security

Entity: Human

Timing: Pre-deployment

Intent: Unintentional

Incident Reports

Reports Timeline

Safety Review for LAION 5B

laion.ai

Investigation Finds AI Image Generation Models Trained on Child Abuse

cyber.fsi.stanford.edu

AI image training dataset found to include child sexual abuse imagery

theverge.com

Study uncovers presence of CSAM in popular AI training dataset

theregister.com

Largest Dataset Powering AI Images Removed After Discovery of Child Sexual Abuse Material

404media.co

Stable Diffusion 1.5 Was Trained On Illegal Child Sexual Abuse Material, Stanford Study Says

forbes.com

AI Training Data Contains Child Sexual Abuse Images, Discovery Points to LAION-5B

techtimes.com

A free AI image dataset, removed for child sex abuse images, has come under fire before

venturebeat.com

Child Sex Abuse Material Was Found In a Major AI Dataset. Researchers Aren’t Surprised.

vice.com

Researchers found child abuse material in the largest AI image generation dataset

engadget.com

An Influential AI Dataset Contains Thousands of Suspected Child Sexual Abuse Images

gizmodo.com

Large AI training data set removed after study finds child abuse material

cointelegraph.com

Abuse material found in openly accessible data set

cybernews.com

Major Error Found in Stable Diffusion’s Biggest Training Dataset

analyticsvidhya.com

LAION and the Challenges of Preventing AI-Generated CSAM

techpolicy.press

LAION-5B, Stable Diffusion 1.5, and the Original Sin of Generative AI

techpolicy.press

Was an AI Image Generator Taken Down for Making Child Porn?

spectrum.ieee.org

Child abuse images removed from AI image-generator training source, researchers say

apnews.com

laion.ai · 2023

LAION.ai post-incident response

There have been reports in the press about the results of a research project at Stanford University, according to which the LAION training set 5B contains potentially illegal content in the form of CSAM. We would like to comment on this as …

cyber.fsi.stanford.edu · 2023

A Stanford Internet Observatory (SIO) investigation identified hundreds of known images of child sexual abuse material (CSAM) in an open dataset used to train popular AI text-to-image generation models, such as Stable Diffusion.

A previous …

theverge.com · 2023

A popular training dataset for AI image generation contained links to child abuse imagery, Stanford’s Internet Observatory found, potentially allowing AI models to create harmful content.

LAION-5B, a dataset used by Stable Diffusion creat…

theregister.com · 2023

A massive public dataset that served as training data for a number of AI image generators has been found to contain thousands of instances of child sexual abuse material (CSAM).

In a study published today, the Stanford Internet Observatory …

404media.co · 2023

This piece is published with support from The Capitol Forum.

The LAION-5B machine learning dataset used by Stable Diffusion and other major AI products has been removed by the organization that created it after a Stanford study found that i…

forbes.com · 2023

Stable Diffusion, one of the most popular text-to-image generative AI tools on the market from the $1 billion startup Stability AI, was trained on a trove of illegal child sexual abuse material, according to new research from the Stanford I…

techtimes.com · 2023

There have been significant problems with AI's training data, with various complaints already filed by those who claimed their work was stolen, but the most recent discovery saw child sexual abuse images in their dataset. In a recent study,…

venturebeat.com · 2023

A massive open-source AI dataset, LAION-5B, which has been used to train popular AI text-to-image generators like Stable Diffusion 1.5 and Google's Imagen, contains at least 1,008 instances of child sexual abuse material, a new report from …

vice.com · 2023

Over 1,000 images of sexually abused children have been discovered inside the largest dataset used to train image-generating AI, shocking everyone except for the people who have warned about this exact sort of thing for years.

The dataset w…

engadget.com · 2023

Researchers from the Stanford Internet Observatory say that a dataset used to train AI image generation tools contains at least 1,008 validated instances of child sexual abuse material. The Stanford researchers note that the presence of CSA…

gizmodo.com · 2023

An influential machine learning dataset—the likes of which has been used to train numerous popular image-generation applications—includes thousands of suspected images of child sexual abuse, a new academic report reveals.

The report, put to…

cointelegraph.com · 2023

A widely-used artificial intelligence data set used to train Stable Diffusion, Imagen and other AI image generator models has been removed by its creator after a study found it contained thousands of instances of suspected child sexual abus…

cybernews.com · 2023

Child sexual abuse material (CSAM) has been located in LAION, a major data set used to train AI.

The Stanford Internet Observatory revealed thousands of images of child sexual abuse in the LAION-5B data set, which supports many different AI…

analyticsvidhya.com · 2023

The integrity of a major AI image training dataset, LAION-5B, utilized by influential AI models like Stable Diffusion, has been compromised after the discovery of thousands of links to Child Sexual Abuse Material (CSAM). This revelation has…

techpolicy.press · 2024

Generative AI has been democratized. The toolkits to download, set up, use, and fine-tune a variety of models have been turned into one-click frameworks for anyone with a laptop to use. While this technology allows users to generate and exp…

techpolicy.press · 2024

In The Ones Who Walk Away From Omelas, the fiction writer Ursula K. Le Guin describes a fantastic city wherein technological advancement has ensured a life of abundance for all who live there. Hidden beneath the city, where nobody needs to …

spectrum.ieee.org · 2024

David Evan Harris, Dave Willner post-incident response

Why are AI companies valued in the millions and billions of dollars creating and distributing tools that can make AI-generated child sexual abuse material (CSAM)?

An image generator called Stable Diffusion version 1.5, which was created by …

apnews.com · 2024

Artificial intelligence researchers said Friday they have deleted more than 2,000 web links to suspected child sexual abuse imagery from a dataset used to train popular AI image-generator tools.

The LAION research dataset is a huge index of…

Variants

A "variant" is an AI incident similar to a known case—it has the same causes, harms, and AI system. Instead of listing it separately, we group it under the first reported incident. Unlike other incidents, variants do not need to have been reported outside the AIID. Learn more from the research paper.

Similar Incidents

By textual similarity

DALL-E 2 Reported for Gender and Racially Biased Outputs

Sexist and Racist Google Adsense Advertisements

Facebook’s Political Ad Detection Reportedly Showed High and Geographically Uneven Error Rates