AI Incident Database

Report 3562

Related Incidents

Incident 624 · 18 Reports
Child Sexual Abuse Material Taints Image Generators

Major Error Found in Stable Diffusion’s Biggest Training Dataset
analyticsvidhya.com · 2023

The integrity of a major AI image training dataset, LAION-5B, utilized by influential AI models like Stable Diffusion, has been compromised after the discovery of thousands of links to Child Sexual Abuse Material (CSAM). This revelation has triggered concerns about the potential ramifications of such content infiltrating the AI ecosystem.

The Unveiling of Disturbing Content

Researchers at the Stanford Internet Observatory uncovered the problem: the LAION-5B dataset contained over 3,000 suspected instances of CSAM. Following their discovery, this extensive dataset, integral to the AI ecosystem, was taken down.

LAION-5B's Temporary Removal

LAION is a non-profit organization responsible for creating open-source tools for machine learning. In response to the findings, the organization decided to temporarily take down its datasets, including LAION-5B and another named LAION-400M. The organization expressed a commitment to ensuring the safety of its datasets before republishing them.

The Methodology Behind the Discovery

The Stanford researchers employed a combination of perceptual and cryptographic hash-based detection methods to identify instances of suspected CSAM in the LAION-5B dataset. Their study raised concerns about the indiscriminate scraping of the internet for AI training data and emphasized the dangers of such practices.
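To make the distinction concrete: cryptographic hashes (e.g. SHA-256) only flag byte-identical files, while perceptual hashes flag visually similar images even after resizing or recompression. The sketch below is not the Stanford team's actual tooling (which matched against databases of known CSAM hashes held by child-safety organizations); it is a minimal, self-contained illustration of the "average hash" perceptual hashing idea, operating on a plain 8×8 grayscale grid so no image library is required.

```python
# Illustrative sketch of perceptual hashing (average hash, "aHash").
# Real pipelines use tools like PhotoDNA or pHash against vetted hash
# lists; this toy version shows only the core mechanism.

def average_hash(pixels):
    """pixels: 8x8 grid of grayscale values (0-255). Returns a 64-bit int:
    each bit is 1 if that pixel is brighter than the grid's mean."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_suspect(img_hash, blocklist, threshold=5):
    # Near-duplicates survive recompression and resizing, so we match
    # within a small Hamming distance rather than requiring equality.
    return any(hamming(img_hash, h) <= threshold for h in blocklist)

# Usage: a slight uniform brightness shift leaves the hash unchanged,
# because every pixel and the mean move together.
img = [[10 * r + c for c in range(8)] for r in range(8)]
near = [[v + 1 for v in row] for row in img]
h1, h2 = average_hash(img), average_hash(near)
print(hamming(h1, h2), is_suspect(h1, [h2]))
```

A cryptographic hash of the two underlying files would differ completely, which is exactly why the researchers combined both detection methods: exact hashes for known files, perceptual hashes for altered copies.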

The Ripple Effect on AI Companies

Major generative AI models, including Stable Diffusion, were trained on LAION-5B. The Stanford paper highlighted the potential influence of CSAM on AI model outputs and the reinforcement of harmful images within the dataset. The repercussions extended to other models, such as Google's Imagen, whose developers found inappropriate content in LAION's datasets during an audit.

Our Say

The revelations about the inclusion of Child Sexual Abuse Material in the LAION-5B dataset underscore the need for responsible practices in the development and utilization of AI training datasets. The incident raises questions about the efficacy of existing filtering mechanisms and the responsibility of organizations to consult with experts in ensuring the safety and legality of their datasets. As the AI community grapples with these challenges, a comprehensive reevaluation of dataset creation processes is imperative to prevent the inadvertent perpetuation of illegal and harmful content through AI models.

Read the Source
