Incident 555: OpenAI's Training Data for LLMs Allegedly Comprised of Copyrighted Books

Description: Two authors alleged in a class action lawsuit that OpenAI infringed authors' copyrights by incorporating into the training data of its generative LLMs, such as ChatGPT, illegal "shadow libraries" that offer copyrighted books without permission.
Editor Notes: Clarified title.


Alleged: OpenAI developed and deployed an AI system, which harmed Paul Tremblay, Mona Awad, and authors of copyrighted works.

Incident Stats

Incident ID
Report Count
Incident Date
Khoa Lam

Incident Reports

Reports Timeline

Lawsuit says OpenAI violated US authors' copyrights · 2023

Two US authors sued OpenAI in San Francisco federal court, claiming in a proposed class action that the company misused their works to "train" its popular generative artificial-intelligence system ChatGPT.

Massachusetts-based writers Paul T…


A "variant" is an incident that shares the same causative factors, produces similar harms, and involves the same intelligent systems as a known AI incident. Rather than index variants as entirely separate incidents, we list variations of incidents under the first similar incident submitted to the database. Unlike other submission types to the incident database, variants are not required to have reporting in evidence external to the Incident Database. Learn more from the research paper.
