AI Incident Database

Report 3557

Associated Incidents

Incident 624 · 18 Reports
Child Sexual Abuse Material Taints Image Generators

Abuse material found in openly accessible data set
cybernews.com · 2023

Child sexual abuse material (CSAM) has been located in LAION, a major data set used to train AI.

The Stanford Internet Observatory revealed thousands of images of child sexual abuse in the LAION-5B data set, which supports many different AI models.

The report shows that AI models such as Stable Diffusion and Google's Imagen "were trained on billions of scraped images in the LAION-5B dataset." This dataset is said to have been created through "unguided crawling that includes a significant amount of explicit material."

These images have allowed AI systems to produce realistic and explicit images of imaginary children while also altering images of clothed individuals into nude photos.

Previous Stanford Internet Observatory work had concluded that machine-learning models can produce CSAM, but it assumed this was possible only because models can combine "two concepts," such as children and explicit acts.

Despite LAION's attempts to flag sexually explicit content and to detect explicit material depicting minors, models were trained on a wide mix of benign and graphic content.

The report concludes that possessing a copy of the LAION-5B dataset implies possessing "thousands of illegal images -- not including all of the intimate imagery published and gathered non-consensually."

The report notes that while the amount of CSAM present does not necessarily mean it drastically shapes a model's output beyond the model's ability to combine concepts, the material likely still exerts some influence.

Despite LAION's stated "zero tolerance policy for illegal content," numerous CSAM images are present in its open-source data set.

The LAION-5B data set has since been taken offline, and the non-profit is working closely with the Internet Watch Foundation, a charity dedicated to protecting children worldwide by removing and preventing abusive content online.


2024 - AI Incident Database
