Skip to Content
logologo
AI Incident Database
Open TwitterOpen RSS FeedOpen FacebookOpen LinkedInOpen GitHub
Open Menu
Discover
Submit
  • Welcome to the AIID
  • Discover Incidents
  • Spatial View
  • Table View
  • List view
  • Entities
  • Taxonomies
  • Submit Incident Reports
  • Submission Leaderboard
  • Blog
  • AI News Digest
  • Risk Checklists
  • Random Incident
  • Sign Up
Collapse
Discover
Submit
  • Welcome to the AIID
  • Discover Incidents
  • Spatial View
  • Table View
  • List view
  • Entities
  • Taxonomies
  • Submit Incident Reports
  • Submission Leaderboard
  • Blog
  • AI News Digest
  • Risk Checklists
  • Random Incident
  • Sign Up
Collapse

Incident 997: Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

Description: Court records reveal that Meta employees allegedly discussed pirating books to train LLaMA 3, citing cost and speed concerns with licensing. Internal messages suggest Meta accessed LibGen, a repository of over 7.5 million pirated books, with apparent approval from Mark Zuckerberg. Employees allegedly took steps to obscure the dataset’s origins. OpenAI has also been implicated in using LibGen.
Editor Notes: Please refer to these two legal filings for more information; the incident date of 02/28/2023 is drawn from (2): (1) Case 3:23-cv-03417-VC, Document 417-6, filed 02/05/2025, Exhibit C, https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.449.4.pdf; and (2) Case 3:23-cv-03417-VC, Document 449-4, filed 02/20/2025, Woodhouse Exhibit 4, Exhibit C, https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.449.4.pdf. See also Incidents 995 and especially 996 for similarly related cases.

Tools

New ReportNew ReportNew ResponseNew ResponseDiscoverDiscoverView HistoryView History

Entities

View all entities
Alleged: OpenAI , Meta , OpenAI models , Llama 3 , Library Genesis (LibGen) , GPT-4 and BitTorrent developed and deployed an AI system, which harmed Writers , publishers , Journalists , Authors and Academic researchers.
Alleged implicated AI systems: OpenAI models , Llama 3 , Library Genesis (LibGen) , GPT-4 and BitTorrent

Incident Stats

Incident ID
997
Report Count
4
Incident Date
2023-02-28
Editors
Daniel Atherton

Incident Reports

Reports Timeline

Incident Occurrence+1
Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal
+1
The Unbelievable Scale of AI’s Pirated-Books Problem
Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

wired.com

Court docs allege Meta trained AI model using LibGen

Court docs allege Meta trained AI model using LibGen

theregister.com

The Unbelievable Scale of AI’s Pirated-Books Problem

The Unbelievable Scale of AI’s Pirated-Books Problem

theatlantic.com

Authors to protest in London against Meta AI trained using ‘shadow library’

Authors to protest in London against Meta AI trained using ‘shadow library’

theguardian.com

Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal
wired.com · 2025

Meta just lost a major fight in its ongoing legal battle with a group of authors suing the company for copyright infringement over how it trained its artificial intelligence models. Against the company’s wishes, a court unredacted informati…

Court docs allege Meta trained AI model using LibGen
theregister.com · 2025

Meta allegedly downloaded material from an online source that’s been sued for breaching copyright, because it wanted the material to train its AI models, according to a new court filing.

The accusation was made in a document [PDF] filed in …

The Unbelievable Scale of AI’s Pirated-Books Problem
theatlantic.com · 2025

Editor's note: This analysis is part of The Atlantic's investigation into the Library Genesis data set. You can access the search tool directly here. Find The Atlantic's search tool for movie and television writing used to train AI here.


W…

Authors to protest in London against Meta AI trained using ‘shadow library’
theguardian.com · 2025

Authors and other publishing industry professionals will stage a demonstration outside Meta’s London office today in protest of the organisation’s use of copyrighted books to train artificial intelligence.

Novelists Kate Mosse and Tracy Che…

Variants

A "variant" is an incident that shares the same causative factors, produces similar harms, and involves the same intelligent systems as a known AI incident. Rather than index variants as entirely separate incidents, we list variations of incidents under the first similar incident submitted to the database. Unlike other submission types to the incident database, variants are not required to have reporting in evidence external to the Incident Database. Learn more from the research paper.

Similar Incidents

Selected by our editors

The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Dec 2023 · 2 reports

Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Oct 2020 · 2 reports
Previous IncidentNext Incident

Similar Incidents

Selected by our editors

The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Dec 2023 · 2 reports

Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Oct 2020 · 2 reports

Research

  • Defining an “AI Incident”
  • Defining an “AI Incident Response”
  • Database Roadmap
  • Related Work
  • Download Complete Database

Project and Community

  • About
  • Contact and Follow
  • Apps and Summaries
  • Editor’s Guide

Incidents

  • All Incidents in List Form
  • Flagged Incidents
  • Submission Queue
  • Classifications View
  • Taxonomies

2024 - AI Incident Database

  • Terms of use
  • Privacy Policy
  • Open twitterOpen githubOpen rssOpen facebookOpen linkedin
  • ecd56df