LLMs and generative AI systems are rapidly deployed across industries, and their scale is creating fresh opportunities for threat actors.

Recently, a threat report from Anthropic discussed malicious uses of the AI model Claude. While the report is super interesting, it lacks critical actionable insights for threat analysts to be truly valuable (in my opinion 🤓). That being said, it doesn't diminish the great work they did.

So let me fix that and transform this report into practical intelligence you can use right now!

Before jumping into the details, if you want to master practical AI for threat intelligence and gain an unfair advantage, I am running an advanced training at BlackHat USA. Drop me a message if you are interested!

Disclaimer: this post is my personal view and it is not affiliated with my employer.

Insight From The Report

Okay, back to the Anthropic report. The report titled "Detecting and Countering Malicious Uses of Claude: March 2025" was published on April 24. It describes several cases where threat actors misused their Claude models despite existing security measures.

The Anthropic team detected and banned accounts involved in these activities. Four cases were discussed in the report.

Influence-as-a-Service Operation:
A professional service used Claude to orchestrate over 100 social media bots. The model decided when bots should engage with political content. Engagement involved tens of thousands of authentic accounts across multiple countries. The operation promoted moderate narratives rather than seeking virality.
Credential Stuffing and IoT Camera Targeting:
An actor used Claude to improve their scraping toolkits, target leaked credentials related to security cameras, and develop systems for unauthorized access. No real-world success confirmed.
Recruitment Fraud Campaign:
An actor targeting Eastern European job seekers used Claude to polish scam messages, impersonate hiring managers, and create convincing narratives. Success of the scams was not confirmed.
Malware Development by Novice Actor:
A low-skilled individual leveraged Claude to build advanced malware tools, evolving from simple scripts to GUI-based payload generators focusing on persistence and evasion. No deployment confirmed.

These are perfect examples of how threat actors can leverage AI. However, some pieces are missing that could be relevant for intelligence.

Missing Pieces of the Puzzle

Though the report is useful, it misses critical details that could have been relevant. The following list is not exhaustive:

No Indicators of Compromise of any sort
Missing specifics such as IP addresses, API keys, or account details
Lack of context about credentials accessed or industries targeted by recruitment scams
No social media accounts mentioned or identified for the influence operation (there is screenshots and content though)
No examples of code, C2 infrastructure, or technical details for the malware development case
And something I consider very important: the prompts used by the threat actors

In a Twitter post I previously shared, I mentioned that prompts are becoming the IOCs of tomorrow.

As you guessed, this blog post will focus on prompts and how we can identify prompt-based TTPs or LLM TTPs.

What exactly are LLM TTPs?

LLM TTPs (Large Language Model Tactics, Techniques, and Procedures) refer to the specific methods adversaries use to abuse, misuse, or exploit Large Language Models. (This is a term I coined, as I am not sure if something official exists yet?)

These methods include crafting malicious prompts, evading model security, and leveraging model outputs for cyberattacks, influence operations, phishing, or other malicious activities, but not limited too.

Because prompts are usually the primary entry point, it makes sense to classify these techniques to allow threat analysts to better identify and understand potential adversarial methods.

For those unfamiliar, the MITRE ATLAS matrix is a resource for mapping AI-related TTPs. It categorizes techniques and tactics that adversaries use to attack, manipulate, or exploit AI systems, similar to how the ATT&CK matrix documents behaviors in traditional cybersecurity operations.

Additionally, last year, OpenAI and Microsoft released a proposal, mapping LLM usage to adversarial TTPs, to complement the MITRE ATLAS matrix.

This proposal maps LLM TTPs to identify how prompts were used. I created an infographic to help you keep a reference.

Prompts Are the New IOCs

As mentioned above, in AI systems, and specifically with LLMs, prompts are central because they are the main way to interact with a model.

In the Anthropic report, exact prompts were not shared, so the only option we have is to infer what threat actors could have used based on the available information. From these inferences, we can create NOVA rules to detect these TTPs.

If you are not familiar, NOVA is the first prompt pattern-matching tool designed specifically to detect and hunt adversarial prompts. This is an open source project. It uses detection rules similar to YARA but tailored for prompts. I wrote a detailed blog post, but you can also jump directly to the documentation. 👇

AIID's editor's note: See the original report for the additional source included in this spot.

The power of NOVA lies in the flexibility and customization to allows any security team to proactively hunt and monitor prompts rather than relying on black boxes solutions.

So to be able to create our NOVA rules, we first need to define how potential prompts might look. Based on the data and with help of AI models, I created the following prompts:

1. Likely prompts for Influence-as-a-Service Operation

"Generate a comment supporting [political narrative] in a casual tone."
"Should my bot account like, comment, or ignore this post to support [political goal]?"
"Create a politically-aligned persona for social media in [country], supporting [ideology]"
"Write a natural reply agreeing with this political stance in [language]."

The following NOVA rules covers these prompts: