Common Crawl
Incidents involved as developer and deployer
Incident 956 · 1 Report
Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks
2025-02-28
A dataset used to train large language models allegedly contained 12,000 live API keys and authentication credentials. Some of these were reportedly still active and allowed unauthorized access. Truffle Security found these secrets in a December 2024 Common Crawl archive, which spans 250 billion web pages. The affected credentials could have been exploited for unauthorized data access, service disruptions, financial fraud, and a variety of other malicious uses.
Incidents implicated systems
Incident 1044 · 2 Reports
Reported Emergence of 'Vegetative Electron Microscopy' in Scientific Papers Traced to Purported AI Training Data Contamination
2025-04-15
Researchers reportedly traced the appearance of the nonsensical phrase "vegetative electron microscopy" in scientific papers to contamination in AI training data. Testing indicated that large language models such as GPT-3, GPT-4, and Claude 3.5 may reproduce the term. The error allegedly originated from a digitization mistake that merged unrelated words during scanning and was later compounded by a translation error between Farsi and English.