Description: Court records reveal that Meta employees allegedly discussed pirating books to train Llama 3, citing the cost and slow pace of licensing. Internal messages suggest Meta accessed LibGen, a repository of over 7.5 million pirated books, with apparent approval from Mark Zuckerberg. Employees allegedly took steps to obscure the dataset’s origins. OpenAI has also been implicated in using LibGen.
Editor Notes: Please refer to these two legal filings for more information; the incident date of 02/28/2023 is drawn from (2): (1) Case 3:23-cv-03417-VC, Document 417-6, filed 02/05/2025, Exhibit C, https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.449.4.pdf; and (2) Case 3:23-cv-03417-VC, Document 449-4, filed 02/20/2025, Woodhouse Exhibit 4, Exhibit C, https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.449.4.pdf. See also Incidents 995 and, especially, 996 for related cases.
Entities
Alleged: OpenAI, Meta, OpenAI models, Llama 3, Library Genesis (LibGen), GPT-4 and BitTorrent developed and deployed an AI system, which harmed writers, publishers, journalists, authors and academic researchers.
Alleged implicated AI systems: OpenAI models, Llama 3, Library Genesis (LibGen), GPT-4 and BitTorrent
Incident Stats
Incident ID: 997
Report Count: 3
Incident Date: 2023-02-28
Editors: Daniel Atherton
Incident Reports

Meta just lost a major fight in its ongoing legal battle with a group of authors suing the company for copyright infringement over how it trained its artificial intelligence models. Against the company’s wishes, a court unredacted informati…

Meta allegedly downloaded material from an online source that’s been sued for breaching copyright, because it wanted the material to train its AI models, according to a new court filing.
The accusation was made in a document [PDF] filed in …
Variants
A "variant" is an incident that shares the same causative factors, produces similar harms, and involves the same intelligent systems as a known AI incident. Rather than index variants as entirely separate incidents, we list variations of incidents under the first similar incident submitted to the database. Unlike other submission types to the incident database, variants are not required to have reporting in evidence external to the Incident Database.