Associated Incidents
On November 13th (local time), Anthropic announced that a Chinese government-sponsored attacker group had exploited its AI model "Claude" to automate approximately 30 attacks against companies and governments. The espionage activity was detected in mid-September, and Anthropic investigated it over the next 10 days, uncovering the full scope of the operation.
Anthropic claims this is the first documented case of a large-scale cyberattack being carried out without substantial human intervention. The campaign represents a further escalation from the discovery of the "Vive Hacking" incident reported in August. While humans remained in the loop to direct the operation in the Vibe Hacking incident, this latest attack involved far less human involvement, despite its expanded scale.
Designated "GTG-1002," this Chinese government-sponsored group utilized AI's agent-like capabilities at an unprecedented level, using it not just as an advisor but to execute cyberattacks.
Claude was originally trained to avoid harmful behavior. The attackers had to jailbreak Claude to participate in the attack. They did this by breaking it down into small, seemingly innocuous tasks and deceiving its guardrails by telling Claude that they were an employee of a legitimate cybersecurity company and being used to test defenses. This framework used "Claude Code" as an automated tool to autonomously perform approximately 80-90% of the tactical tasks of reconnaissance, vulnerability discovery, exploitation, credential theft, data analysis, and data exfiltration.
The operation targeted approximately 30 targets worldwide, including major technology companies, financial institutions, chemical manufacturers, and government agencies, with a small number of successful intrusions confirmed. The attackers notably gained access to confirmed high-value targets, including major technology companies and government agencies. However, AI occasionally creates hallucinations, such as fabricating non-working credentials or claiming publicly available information as confidential, hindering fully autonomous cyberattacks.
Upon detecting this activity, Anthropic immediately launched an investigation, terminating the exploited accounts, notifying affected entities appropriately, and collaborating with authorities while gathering actionable insights.
As a future response, Anthropic is expanding its detection capabilities and developing specialized cyberattack classifiers to flag malicious activity. It is also developing a prototype system for proactive early detection of autonomous cyberattacks and new techniques for investigating and mitigating large-scale, distributed attacks.
Anthropic also emphasizes that this technology is important for defense, and encourages companies to apply AI to defense in areas such as security operations center (SOC) automation, threat detection, vulnerability assessment, and incident response. Furthermore, the company emphasizes the need for continued investment in safeguards across AI platforms to prevent adversarial misuse.