Skip to Content
logologo
AI Incident Database
Open TwitterOpen RSS FeedOpen FacebookOpen LinkedInOpen GitHub
Open Menu
発見する
投稿する
  • ようこそAIIDへ
  • インシデントを発見
  • 空間ビュー
  • テーブル表示
  • リスト表示
  • 組織
  • 分類法
  • インシデントレポートを投稿
  • 投稿ランキング
  • ブログ
  • AIニュースダイジェスト
  • リスクチェックリスト
  • おまかせ表示
  • サインアップ
閉じる
発見する
投稿する
  • ようこそAIIDへ
  • インシデントを発見
  • 空間ビュー
  • テーブル表示
  • リスト表示
  • 組織
  • 分類法
  • インシデントレポートを投稿
  • 投稿ランキング
  • ブログ
  • AIニュースダイジェスト
  • リスクチェックリスト
  • おまかせ表示
  • サインアップ
閉じる

レポート 3226

関連インシデント

インシデント 9963 Report
Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Loading...
Sarah Silverman is suing OpenAI and Meta for copyright infringement
theverge.com · 2023

Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey — are suing OpenAI and Meta each in a US District Court over dual claims of copyright infringement.

The suits alleges, among other things, that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally-acquired datasets containing their works, which they say were acquired from “shadow library” websites like Bibliotik, Library Genesis, Z-Library, and others, noting the books are “available in bulk via torrent systems.”

Golden and Kadrey each declined to comment on the lawsuit, while Silverman’s team did not respond by press time.

In the OpenAI suit, the trio offers exhibits showing that when prompted, ChatGPT will summarize their books, infringing on their copyrights. Silverman’s Bedwetter is the first book shown being summarized by ChatGPT in the exhibits, while Golden’s book Ararat is also used as an example, as is Kadrey’s book Sandman Slim. The claim says the chatbot never bothered to “reproduce any of the copyright management information Plaintiffs included with their published works.”

As for the separate lawsuit against Meta, it alleges the authors’ books were accessible in datasets Meta used to train its LLaMA models, a quartet of open-source AI Models the company introduced in February.

The complaint lays out in steps why the plaintiffs believe the datasets have illicit origins — in a Meta paper detailing LLaMA, the company points to sources for its training datasets, one of which is called ThePile, which was assembled by a company called EleutherAI. ThePile, the complaint points out, was described in an EleutherAI paper as being put together from “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” listed, says the lawsuit, are “flagrantly illegal.”

In both claims, the authors say that they “did not consent to the use of their copyrighted books as training material” for the companies’ AI models. Their lawsuits each contain six counts of various types of copyright violations, negligence, unjust enrichment, and unfair competition. The authors are looking for statutory damages, restitution of profits, and more.

Lawyers Joseph Saveri and Matthew Butterick, who are representing the three authors, write on their LLMlitigation website that they’ve heard from “writers, authors, and publishers who are con­cerned about [ChatGPT’s] uncanny abil­ity to gen­er­ate text sim­i­lar to that found in copy­righted tex­tual mate­ri­als, includ­ing thou­sands of books.”

Saveri has also started litigation against AI companies on behalf of programmers and artists. Getty Images also filed an AI lawsuit, alleging that Stability AI, who created the AI image generation tool Stable Diffusion, trained its model on “millions of images protected by copyright.” Saveri and Butterick are also representing authors Mona Awad and Paul Tremblay in a similar case over the company’s chatbot.

Lawsuits like this aren’t just a headache for OpenAI and other AI companies; they are challenging the very limits of copyright. As we’ve said on The Vergecast every time someone gets Nilay going on copyright law, we’re going to see lawsuits centered around this stuff for years to come.

We’ve reached out to Meta, OpenAI, and the Joseph Saveri Law Firm for comment, but they did not respond by press time.

情報源を読む

リサーチ

  • “AIインシデント”の定義
  • “AIインシデントレスポンス”の定義
  • データベースのロードマップ
  • 関連研究
  • 全データベースのダウンロード

プロジェクトとコミュニティ

  • AIIDについて
  • コンタクトとフォロー
  • アプリと要約
  • エディタのためのガイド

インシデント

  • 全インシデントの一覧
  • フラグの立ったインシデント
  • 登録待ち一覧
  • クラスごとの表示
  • 分類法

2024 - AI Incident Database

  • 利用規約
  • プライバシーポリシー
  • Open twitterOpen githubOpen rssOpen facebookOpen linkedin
  • e1b50cd