AI Incident Database
Entities

Common Crawl

Incidents involved as both Developer and Deployer

Incident 956
1 Report
Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

2025-02-28

A dataset used to train large language models allegedly contained 12,000 live API keys and authentication credentials. Some of these were reportedly still active and allowed unauthorized access. Truffle Security found these secrets in a December 2024 Common Crawl archive, which spans 250 billion web pages. The affected credentials could have been exploited for unauthorized data access, service disruptions, financial fraud, and a variety of other malicious uses.

Incidents implicated systems

Incident 1044
2 Reports
Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

2025-04-15

Researchers reportedly traced the appearance of the nonsensical phrase "vegetative electron microscopy" in scientific papers to contamination in AI training data. Testing indicated that large language models such as GPT-3, GPT-4, and Claude 3.5 may reproduce the term. The error allegedly originated from a digitization mistake that merged unrelated words during scanning, and a later translation error between Farsi and English.

Related Entities
Other entities involved in the same incidents. For example, if this entity is the developer in an incident but another entity is the deployer, the two are marked as related entities.

Entity

Microsoft

Incidents involved as both Developer and Deployer
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

OpenAI

Incidents involved as both Developer and Deployer
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Microsoft Azure OpenAI Service

Incidents involved as Deployer
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

AWS

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Slack

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Mailchimp

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Google

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Intel

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Huawei

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

PayPal

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

IBM

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Tencent

Incidents Harmed By
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Common Crawl dataset (December 2024 archive)

Incidents implicated systems
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Microsoft Copilot

Incidents implicated systems
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Google Gemini

Incidents implicated systems
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Anthropic Claude

Incidents implicated systems
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

ChatGPT

Incidents implicated systems
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

xAI Grok

Incidents implicated systems
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

DeepSeek

Incidents implicated systems
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

LLMs trained on compromised data

Incidents implicated systems
  • Incident 956
    1 Report

    Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

Entity

Anthropic

Incidents involved as both Developer and Deployer
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Researchers

Incidents Harmed By
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Incidents involved as Deployer
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Scientific authors

Incidents Harmed By
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Incidents involved as Deployer
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Scientific publishers

Incidents Harmed By
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Peer reviewers

Incidents Harmed By
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Scholars

Incidents Harmed By
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Readers of scientific publications

Incidents Harmed By
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Scientific record

Incidents Harmed By
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Academic integrity

Incidents Harmed By
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

GPT-3

Incidents implicated systems
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

GPT-4

Incidents implicated systems
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination

Entity

Claude 3.5

Incidents implicated systems
  • Incident 1044
    2 Reports

    Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination
