Incident 997: Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

Description: Court records reveal that Meta employees allegedly discussed pirating books to train LLaMA 3, citing cost and speed concerns with licensing. Internal messages suggest Meta accessed LibGen, a repository of over 7.5 million pirated books, with apparent approval from Mark Zuckerberg. Employees allegedly took steps to obscure the dataset’s origins. OpenAI has also been implicated in using LibGen.

Editor Notes: Please refer to these two legal filings for more information; the incident date of 02/28/2023 is drawn from (2): (1) Case 3:23-cv-03417-VC, Document 417-6, filed 02/05/2025, Exhibit C, https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.449.4.pdf; and (2) Case 3:23-cv-03417-VC, Document 449-4, filed 02/20/2025, Woodhouse Exhibit 4, Exhibit C, https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.449.4.pdf. See also Incidents 995 and especially 996 for similarly related cases.

Tools

New Report New Response DiscoverView History

Entities

View all entities

Alleged: OpenAI , Meta , OpenAI models , Llama 3 , Library Genesis (LibGen) , GPT-4 and BitTorrent developed and deployed an AI system, which harmed Writers , publishers , Journalists , Authors and Academic researchers.

Alleged implicated AI systems: OpenAI models , Llama 3 , Library Genesis (LibGen) , GPT-4 and BitTorrent

Incident Stats

Incident ID

997

Report Count

Incident Date

2023-02-28

Editors

Daniel Atherton

Incident Reports

Reports Timeline

Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

wired.com

Court docs allege Meta trained AI model using LibGen

theregister.com

The Unbelievable Scale of AI’s Pirated-Books Problem

theatlantic.com

Authors to protest in London against Meta AI trained using ‘shadow library’

theguardian.com

wired.com · 2025

Meta just lost a major fight in its ongoing legal battle with a group of authors suing the company for copyright infringement over how it trained its artificial intelligence models. Against the company’s wishes, a court unredacted informati…

theregister.com · 2025

Meta allegedly downloaded material from an online source that’s been sued for breaching copyright, because it wanted the material to train its AI models, according to a new court filing.

The accusation was made in a document [PDF] filed in …

theatlantic.com · 2025

Editor's note: This analysis is part of The Atlantic's investigation into the Library Genesis data set. You can access the search tool directly here. Find The Atlantic's search tool for movie and television writing used to train AI here.

W…

theguardian.com · 2025

Authors and other publishing industry professionals will stage a demonstration outside Meta’s London office today in protest of the organisation’s use of copyrighted books to train artificial intelligence.

Novelists Kate Mosse and Tracy Che…

Variants

A "variant" is an AI incident similar to a known case—it has the same causes, harms, and AI system. Instead of listing it separately, we group it under the first reported incident. Unlike other incidents, variants do not need to have been reported outside the AIID. Learn more from the research paper.

Seen something similar?