Incident 996: Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Description: Meta and Bloomberg allegedly used Books3, a dataset containing 191,000 pirated books, to train their AI models, including LLaMA and BloombergGPT, without author consent. Lawsuits from authors such as Sarah Silverman and Michael Chabon claim this constitutes copyright infringement. Books3 includes works from major publishers like Penguin Random House and HarperCollins. Meta argues its AI outputs are not "substantially similar" to the original books, but legal challenges continue.


Entities


Alleged: Various generative AI developers, Meta, EleutherAI, Bloomberg, The Pile and Shawn Presser developed an AI system deployed by Various generative AI developers, Meta, EleutherAI and Bloomberg, which harmed Zadie Smith, Writers, Verso, Stephen King, Sarah Silverman, Richard Kadrey, Publishers found in Books3, Penguin Random House, Oxford University Press, Over 170,000 authors found in Books3, Michael Pollan, Margaret Atwood, Macmillan, HarperCollins, General public, Creative industries, Christopher Golden and Authors.

Alleged implicated AI systems: The Pile, LLaMA, Hugging Face, GPT-J, Books3, BloombergGPT and Bibliotik

Incident Stats

Incident ID

996

Report Count

Incident Date

2020-10-25

Editors

Daniel Atherton

Incident Reports

Reports Timeline

Sarah Silverman is suing OpenAI and Meta for copyright infringement

theverge.com

Revealed: The Authors Whose Pirated Books Are Powering Generative AI

theatlantic.com

AI guzzled millions of books without permission. Authors are fighting back.

washingtonpost.com

theverge.com · 2023

Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey, are suing OpenAI and Meta, each in a US District Court, over dual claims of copyright infringement.

The suits allege, among other things, that Op…

theatlantic.com · 2023

Updated at 1:40 p.m. ET on September 25, 2023

Editor's note: This article is part of The Atlantic's series on Books3. Check out our searchable Books3 database to find specific authors and titles. A deeper analysis of what is in the database…

washingtonpost.com · 2025

David Baldacci, the author of best-selling legal thrillers, watched his son ask ChatGPT to craft a plot in the style of a David Baldacci novel. Within five seconds, he told U.S. senators at a hearing this week on artificial intelligence and…

Variants

A "variant" is an AI incident similar to a known case—it has the same causes, harms, and AI system. Instead of listing it separately, we group it under the first reported incident. Unlike other incidents, variants do not need to have been reported outside the AIID. Learn more from the research paper.


Similar Incidents

Selected by our editors

Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models
