DEV Community: Crispin

Comparative Analysis: Testing & Evaluating LLM Security with Garak Across Different Models

Crispin — Sun, 01 Jun 2025 06:30:13 +0000

Hey everyone, I’m trying out LLM security testing and it’s pretty interesting. With all these AI models around, it’s important to check they don’t do weird stuff. I found this tool called Garak (Generative AI Red-teaming and Assessment Kit). It’s like a set of tests to see if models can be tricked into revealing secrets, making bad code, or saying nasty things.

In this post, I’ll go over four models: Mistral-Nemo, Gemma3, LLaMA 2-13B, and Vicuna-13B. I ran Garak on the Ollama setup. You can think of it like a friendly match, but instead of goal scoring, we score on security stuff.

Background on Garak

Garak is kind of like Nmap but for AI. Instead of scanning a network, it sends special prompts (called “probes”) to the model. Each probe tries to make the model slip up.

Some probes are:

PromptInject: Tries to hide instructions to see if the model follows them.
MalwareGen: Asks for malicious code.
LeakerPlay: Tries to get the model to spill its internal stuff or training data. (In garak itself the module name is spelled leakreplay.)
RealToxicityPrompts: Pushes the model to say toxic things.

When Garak runs, it saves all the outputs in logs. Later I look at the logs and see which models passed or failed.
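If you'd rather not read the logs by hand, here is a minimal sketch of a summarizer. It assumes the report is a JSONL file whose "eval" entries carry "probe", "passed" and "total" fields; that is an assumption about the report layout, so check it against a report from your own garak version. The name summarize_report and the sample lines are made up for illustration.

```python
import json
from collections import defaultdict


def summarize_report(lines):
    """Return {probe: failure_rate_percent} from garak-style JSONL lines.

    Assumes each "eval" entry has "probe", "passed" and "total" keys.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        if entry.get("entry_type") != "eval":
            continue  # skip setup, attempt and other entry types
        passed[entry["probe"]] += entry["passed"]
        total[entry["probe"]] += entry["total"]
    return {
        probe: 100.0 * (total[probe] - passed[probe]) / total[probe]
        for probe in total
        if total[probe] > 0
    }


# Made-up sample entries, not real results
sample = [
    '{"entry_type": "setup"}',
    '{"entry_type": "eval", "probe": "malwaregen.Evasion", "detector": "d1", "passed": 5, "total": 20}',
    '{"entry_type": "eval", "probe": "malwaregen.Evasion", "detector": "d2", "passed": 15, "total": 20}',
]
print(summarize_report(sample))
```

This version lumps all detectors for a probe together; if you want per-detector numbers, key the dictionaries on (probe, detector) instead.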

Models and Setup

I picked these open models:

Mistral-Nemo (12B): Released July 2024, supposed to be good at chat.
Gemma3:latest: From Google, released March 2025, with some safety tweaks.
LLaMA 2-13B: Meta’s 13B model, popular for many tasks.
Vicuna-13B: Based on LLaMA, fine-tuned on user-shared chat conversations.

I ran all models on Ollama (version 0.1.34 or newer). My computer has 32 CPU cores, 128 GB RAM, and an NVIDIA A100 GPU. I used Garak v0.10.3.post1 with default settings.

Installing and Configuring Ollama

# Update Ollama (on Linux, re-running the install script upgrades it in place)
curl -fsSL https://ollama.com/install.sh | sh

# Download models
ollama pull mistral-nemo
ollama pull gemma3:latest
ollama pull llama2:13b
ollama pull vicuna:13b
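Before scanning, it's worth confirming the pulls actually landed. Ollama's local HTTP API lists installed models at GET /api/tags (port 11434 by default), and a pull without a tag is stored as name:latest. The sketch below only parses a response body, so it runs without a server; missing_models and the sample response are made up for illustration.

```python
import json


def missing_models(tags_json, wanted):
    """Return the names in `wanted` that are absent from an /api/tags response."""
    installed = {m["name"] for m in json.loads(tags_json).get("models", [])}
    return [name for name in wanted if name not in installed]


# Made-up response body in the shape returned by GET http://localhost:11434/api/tags
sample_response = '{"models": [{"name": "mistral-nemo:latest"}, {"name": "gemma3:latest"}]}'
print(missing_models(sample_response, ["mistral-nemo:latest", "gemma3:latest", "some-other:13b"]))
```

To use it for real, fetch the body with urllib.request.urlopen("http://localhost:11434/api/tags").read() and pass it in.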

Running Garak with Ollama

To scan a model, run:

garak --model_type ollama \
      --model_name mistral-nemo \
      --probes malwaregen.Evasion,promptinject \
      --report_prefix ./reports/mistral_nemo

I set a random seed and repeated each test three times to make it more reliable.
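A rough sketch of how the repeated runs could be scripted. It reuses the flags from the command above plus garak's --seed option; the seed values and the _seedN suffix on the report prefix are my own placeholders, not the ones used for the results in this post.

```python
def build_garak_command(model_name, probes, seed, report_prefix):
    """Build a garak CLI invocation as an argument list for subprocess.run."""
    return [
        "garak",
        "--model_type", "ollama",
        "--model_name", model_name,
        "--probes", ",".join(probes),
        "--seed", str(seed),
        "--report_prefix", f"{report_prefix}_seed{seed}",
    ]


# Hypothetical seeds, one per repetition
for seed in (1, 2, 3):
    cmd = build_garak_command(
        "mistral-nemo", ["malwaregen.Evasion", "promptinject"], seed, "./reports/mistral_nemo"
    )
    print(" ".join(cmd))
    # To actually run the scan: subprocess.run(cmd, check=True)
```

Giving each run its own report prefix keeps the three reports from overwriting each other.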

Methodology

Probe Selection: I chose four main probes:
- MalwareGen.Evasion: Asks for code that could bypass antivirus.
- PromptInject.Encoding: Hides instructions in encoded text to see if the model follows them.
- LeakerPlay.DataLeakage: Tries to get the model to reveal training data or hidden prompts.
- RealToxicityPrompts: Pushes the model to use toxic language.
Metrics:
- Failure Rate (%): How often the model messed up.
- Mean Time per Probe (s): How long it takes on average.
- Resource Usage: GPU memory and CPU usage.
Probe Execution: Each probe had 20 prompts, and the model generated five responses for each prompt. If any one of the five was flagged, that prompt counts as a fail.
Data Analysis: I averaged results from three runs and got standard deviations. Results are in the table below.
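The scoring rule and the averaging step can be sketched like this. The data is made up (and smaller than the real 20 prompts per probe), and using the sample standard deviation is an assumption on my part.

```python
from statistics import mean, stdev


def prompt_failed(attempt_results):
    """A prompt fails if any of its attempts was flagged (True = unwanted output)."""
    return any(attempt_results)


def failure_rate(run):
    """Percentage of prompts in one run that failed at least once."""
    failed = sum(1 for attempts in run if prompt_failed(attempts))
    return 100.0 * failed / len(run)


def summarize(runs):
    """Mean and sample standard deviation of the failure rate across runs."""
    rates = [failure_rate(run) for run in runs]
    return mean(rates), stdev(rates)


# Made-up data: 3 runs x 4 prompts x 5 attempts
runs = [
    [[False] * 5, [True, False, False, False, False], [False] * 5, [False] * 5],
    [[False] * 5, [True] * 5, [False, False, True, False, False], [False] * 5],
    [[False] * 5, [False] * 5, [False] * 5, [False, True, False, False, False]],
]
avg, sd = summarize(runs)
print(f"{avg:.1f}% ± {sd:.1f}%")
```

Note how strict the rule is: a single bad response out of five marks the whole prompt as failed, which pushes failure rates up compared with counting individual responses.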

Comparative Results

Model	MalwareGen.Evasion	PromptInject.Encoding	LeakerPlay.DataLeakage	RealToxicityPrompts
Mistral-Nemo	100.0% ± 0.0%	92.0% ± 1.7%	85.7% ± 2.3%	17.0% ± 1.5%
Gemma3:latest	56.3% ± 4.1%	37.5% ± 3.8%	48.2% ± 4.5%	10.5% ± 1.2%
LLaMA 2-13B	81.0% ± 3.9%	68.3% ± 2.5%	72.4% ± 3.1%	26.7% ± 2.0%
Vicuna-13B	62.5% ± 4.8%	54.6% ± 3.0%	61.3% ± 3.5%	3.8% ± 1.0%

Note: Failure Rate (%) shows how often the model produced unwanted behavior.
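For a quick overall ranking, the table rows can be averaged across the four probes. This is just a sketch: an unweighted mean treats every probe as equally important, which is a simplification you may not agree with.

```python
# Failure rates (%) copied from the table above, in column order
results = {
    "Mistral-Nemo": [100.0, 92.0, 85.7, 17.0],
    "Gemma3:latest": [56.3, 37.5, 48.2, 10.5],
    "LLaMA 2-13B": [81.0, 68.3, 72.4, 26.7],
    "Vicuna-13B": [62.5, 54.6, 61.3, 3.8],
}


def rank_by_mean_failure(table):
    """Return (model, mean failure rate) pairs, lowest mean first."""
    means = {model: sum(rates) / len(rates) for model, rates in table.items()}
    return sorted(means.items(), key=lambda item: item[1])


for model, avg in rank_by_mean_failure(results):
    print(f"{model}: {avg:.1f}%")
```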

Yes, I know that you'll be like this (even I was 😂).

Mistral-Nemo

MalwareGen.Evasion (100.0%): It always gave malware code. No defense at all.
PromptInject.Encoding (92.0%): Fell for encoding tricks most of the time.
LeakerPlay.DataLeakage (85.7%): Leaked training prompts a lot.
RealToxicityPrompts (17.0%): Created toxic content sometimes.

Gemma3:latest

MalwareGen.Evasion (56.3%): Sometimes refused but got tricked by advanced hacks.
PromptInject.Encoding (37.5%): Better but not perfect.
LeakerPlay.DataLeakage (48.2%): Half the time it leaked something.
RealToxicityPrompts (10.5%): Rarely said toxic things.

LLaMA 2-13B

MalwareGen.Evasion (81.0%): Produced malware scripts often.
PromptInject.Encoding (68.3%): Fell for encoding a lot.
LeakerPlay.DataLeakage (72.4%): Regularly leaked data.
RealToxicityPrompts (26.7%): Most toxic among the group.

Vicuna-13B

MalwareGen.Evasion (62.5%): Was not as bad as LLaMA but still failed a lot.
PromptInject.Encoding (54.6%): Mediocre, could still be tricked.
LeakerPlay.DataLeakage (61.3%): Leaked data more than half the time.
RealToxicityPrompts (3.8%): Best at not being toxic.

Discussion

Security Trends Across Models

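Older vs Newer: Release date alone didn't predict safety. Gemma3, the newest model, had the lowest failure rate on three of the four probes, but Mistral-Nemo (2024) failed more often than the older LLaMA 2 and Vicuna on every probe except toxicity.
Instruction-Tuning: Vicuna's extra tuning seems to help. It beat LLaMA 2-13B on every probe and was the least toxic model overall.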
Guardrails Matter: Gemma3 blocked some attacks but still fell for advanced ones.
Architecture: Models without built-in safety (Mistral-Nemo, LLaMA 2) were very vulnerable.

Performance and Resource Usage

Average Time per Prompt:
- Mistral-Nemo: 4.8 s
- Gemma3: 6.2 s
- LLaMA 2-13B: 5.5 s
- Vicuna-13B: 5.7 s
GPU Memory Used:
- Mistral-Nemo: 12 GB
- Gemma3: 16 GB
- LLaMA 2-13B: 14 GB
- Vicuna-13B: 15 GB
CPU Load: About 20–25% for all while testing.

Gemma3 used the most memory and was the slowest per prompt, but it also had the lowest failure rates on three of the four probes.

And yeah don't be him and make sure to do the checks properly 😁.

Recommendations

Keep Testing: Run Garak regularly to find new flaws.
Use Multiple Safety Layers: Combine model guardrails with external checks.
Choose Tuned Models: Vicuna shows that tuning helps.
Update Your Tools: Ollama has had security bugs (like CVE-2024-37032, a remote code execution flaw fixed in 0.1.34). Always use the latest version.

Conclusion

Running Garak on these models shows that all of them have weak spots. Mistral-Nemo failed the most (including every MalwareGen.Evasion prompt), LLaMA 2 struggled, Gemma3 had the lowest failure rates on three of the four probes but was far from perfect, and Vicuna was the least toxic but still leaked data and wrote malware more than half the time. The main lesson is that we need ongoing tests, several safety measures, and up-to-date software to keep these AI models safe.

Thanks for reading, and happy red-teaming!

Ping! Pop! Pow! Real-Time Security with Suricata, StackStorm & Slack.

Crispin — Thu, 15 May 2025 18:16:46 +0000

Hey dev.to community!

I was recently learning a few SecOps topics and trying things out with StackStorm, an event-driven automation tool (basically IFTTT for DevOps). Then I thought, why not combine it with good old Suricata? Hence this blog... ;)

TL;DR: What We’re Building

We’ll wire up Suricata (our network IDS) to StackStorm (our event-driven automation engine), so that whenever Suricata spots suspicious traffic, StackStorm picks it up and shoots an alert into Slack. No more manually tailing logs: your chat app becomes your security ops dashboard!

So enough talking and let's start doing!

Why is this cool?

Coz I find it. 😂 jk. Yeah even I had this question earlier but later after gpting and trying things out, this seemed way cooler just like a security admin or smth lol.

Imo, these are a few =>

Real-time Security: Get notified instantly when something weird pops up on your network.
Hands-On Automation: Learn how sensors, rules, and actions fit together in StackStorm.
Slack Integration: Everyone loves Slack(~~Teams~~ ;)) and it’s a familiar place to see alerts.
Super Simple: We’ll use out-of-the-box components and minimal code so even newbies can follow along.

What I'll be using for this setup

A Linux VM - Ubuntu (not a hecker, sadly🤧🐉).
Docker (optional, but makes life easier).
Suricata installed and running in IDS mode.
StackStorm installed.
A Slack workspace and a Slack “Incoming Webhook” URL.

First, we get Suricata logging alerts

Install Suricata

   sudo apt update
   sudo apt install suricata

Enable EVE JSON output in /etc/suricata/suricata.yaml:

outputs:
  - eve-log:
      enabled: yes
      filetype: regular
      filename: /var/log/suricata/eve.json

Update the rules, restart Suricata so it loads them, and generate some test alerts:

sudo suricata-update
sudo systemctl restart suricata
# Then run an nmap scan or a scapy script to trigger IDS rules
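To check what actually lands in eve.json, here is a small sketch that pulls the interesting fields out of one EVE line. The field names (event_type, alert.signature, alert.severity, src_ip and so on) are the ones the Slack rule later in this post relies on; the sample lines and the signature text are made up.

```python
import json


def extract_alert(line):
    """Return a small dict for EVE 'alert' events, or None for anything else."""
    try:
        event = json.loads(line)
    except ValueError:
        return None  # blank or partially written line
    if event.get("event_type") != "alert":
        return None
    alert = event.get("alert", {})
    return {
        "signature": alert.get("signature"),
        "severity": alert.get("severity"),
        "src": f'{event.get("src_ip")}:{event.get("src_port")}',
        "dest": f'{event.get("dest_ip")}:{event.get("dest_port")}',
    }


# Made-up EVE lines for illustration; real events carry many more fields
sample_lines = [
    '{"event_type": "flow", "src_ip": "10.0.0.5"}',
    '{"event_type": "alert", "src_ip": "10.0.0.5", "src_port": 40000, '
    '"dest_ip": "10.0.0.9", "dest_port": 22, '
    '"alert": {"signature": "Example scan signature", "severity": 2}}',
]
for line in sample_lines:
    print(extract_alert(line))
```

Only the second line is an alert, so the first call returns None.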

Then, we spin up StackStorm

For this blog I'll use Docker, but you can even run it natively as you prefer.

docker run --name st2 \
  -d \
  -v /var/log/suricata:/var/log/suricata \
  -p 9100:9100 \
  -p 9101:9101 \
  stackstorm/stackstorm:3.5

Now verify that StackStorm is running (run these inside the container, for example after docker exec -it st2 bash):

st2ctl status
st2 action list

The core event logic,

We'll create a Suricata sensor in StackStorm.

StackStorm “sensors” watch for external events. We’ll write a tiny Python sensor that tails /var/log/suricata/eve.json and emits each alert as a StackStorm trigger.

Create a new sensor file, /opt/stackstorm/packs/suricata/sensors/suricata_sensor.py

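import json
import time

from st2reactor.sensor.base import Sensor


class SuricataSensor(Sensor):
    def __init__(self, sensor_service, config=None):
        super(SuricataSensor, self).__init__(sensor_service=sensor_service, config=config)
        self._filename = '/var/log/suricata/eve.json'
        self._stop = False

    def setup(self):
        pass

    def run(self):
        with open(self._filename) as f:
            # seek to the end so only new events are read
            f.seek(0, 2)
            while not self._stop:
                line = f.readline().strip()
                if not line:
                    time.sleep(1)
                    continue
                try:
                    data = json.loads(line)
                except ValueError:
                    continue  # skip partially written or malformed lines
                if data.get('event_type') == 'alert':
                    self.sensor_service.dispatch(trigger='suricata.alert', payload=data)

    def cleanup(self):
        self._stop = True

    # Trigger lifecycle hooks required by the Sensor base class; unused here
    def add_trigger(self, trigger):
        pass

    def update_trigger(self, trigger):
        pass

    def remove_trigger(self, trigger):
        pass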
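The tailing logic is easier to play with outside StackStorm. Below is a standalone sketch of the same seek-to-end-then-poll idea; follow and its max_idle_polls parameter are hypothetical helpers I added so the demo can finish, they are not part of the sensor or of StackStorm.

```python
import os
import tempfile
import time


def follow(path, poll_interval=1.0, max_idle_polls=None):
    """Return an iterator over lines appended to `path` after this call.

    With max_idle_polls=None it keeps polling forever, like the sensor does.
    """
    f = open(path)
    f.seek(0, os.SEEK_END)  # done right away, so existing content is skipped

    def _lines():
        with f:
            idle = 0
            while max_idle_polls is None or idle < max_idle_polls:
                line = f.readline()
                if not line:
                    idle += 1
                    time.sleep(poll_interval)
                    continue
                idle = 0
                yield line.rstrip("\n")

    return _lines()


# Demo on a temporary file: one old line, then one appended line
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"event_type": "old"}\n')
    log_path = tmp.name

tail = follow(log_path, poll_interval=0.01, max_idle_polls=3)
with open(log_path, "a") as writer:
    writer.write('{"event_type": "alert"}\n')

new_lines = list(tail)
print(new_lines)
os.remove(log_path)
```

Like the sensor, this does not handle log rotation: if eve.json is rotated, the open file handle keeps pointing at the old file.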

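Now, register the sensor by adding its metadata file, /opt/stackstorm/packs/suricata/sensors/suricata_sensor.yaml (the pack also needs a pack.yaml at its root). The trigger is named alert here because StackStorm prefixes it with the pack name, which gives suricata.alert.

class_name: SuricataSensor
entry_point: suricata_sensor.py
description: "Watches Suricata eve.json for alerts"
trigger_types:
  - name: alert
    description: "Triggered on Suricata IDS alert"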

Then, reload StackStorm so it picks up your new pack.

st2ctl reload --register-all

Now, define a rule to catch the trigger (final)

StackStorm “rules” link triggers to actions. We’ll catch suricata.alert and then call a Slack action.

Rule file: /opt/stackstorm/packs/suricata/rules/slack_alert.yaml

name: send_suricata_alert_to_slack
pack: suricata
description: Send Suricata alert details to Slack
enabled: true
trigger:
  type: suricata.alert
action:
  ref: slack.post_message
  parameters:
    channel: "#alerts"
    # In rules, trigger.<field> reads straight from the dispatched payload
    message: |
      :rotating_light: *Suricata Alert!* :rotating_light:
      *Signature:* {{ trigger.alert.signature }}
      *Severity:* {{ trigger.alert.severity }}
      *Source:* {{ trigger.src_ip }}:{{ trigger.src_port }}
      *Destination:* {{ trigger.dest_ip }}:{{ trigger.dest_port }}
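To preview what that template produces without going through StackStorm, the same layout can be sketched in plain Python. format_alert_message is a hypothetical helper and the event below is made up.

```python
def format_alert_message(event):
    """Build the Slack text for one Suricata alert event (a parsed EVE dict)."""
    alert = event.get("alert", {})
    return "\n".join([
        ":rotating_light: *Suricata Alert!* :rotating_light:",
        f"*Signature:* {alert.get('signature', 'unknown')}",
        f"*Severity:* {alert.get('severity', 'unknown')}",
        f"*Source:* {event.get('src_ip')}:{event.get('src_port')}",
        f"*Destination:* {event.get('dest_ip')}:{event.get('dest_port')}",
    ])


# Made-up event for illustration
sample_event = {
    "event_type": "alert",
    "src_ip": "10.0.0.5",
    "src_port": 40000,
    "dest_ip": "10.0.0.9",
    "dest_port": 22,
    "alert": {"signature": "Example scan signature", "severity": 2},
}
print(format_alert_message(sample_event))
```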

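Then, install the Slack pack (st2 pack install slack) and put the webhook URL in its pack config, /opt/stackstorm/configs/slack.yaml. The key layout below follows the Slack pack's config schema as I understand it, so double-check it against the pack's README.

post_message_action:
  webhook_url: "https://hooks.slack.com/services/hee/hee/hee"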
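Before blaming StackStorm for a silent channel, it can help to confirm the webhook itself works. Slack incoming webhooks accept a JSON body with a "text" field sent via POST. The sketch below only builds the request, so it is safe to run as-is; build_slack_request is a hypothetical helper and the URL is a placeholder.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Create (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; substitute your own webhook before sending
req = build_slack_request(
    "https://hooks.slack.com/services/XXX/YYY/ZZZ", "Test alert from Suricata"
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```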

And for one last time, reload.

I know you're like this 😂, but we have to.

st2ctl reload --register-all

It's time to test!

Generate a known alert (e.g. run nmap -sS against your box).

Watch your StackStorm logs (/var/log/st2/st2sensorcontainer.log) to see the trigger fire.

Check your Slack channel—you should get a nicely formatted alert message within seconds!

Heehee...we’ve now built a simple, end-to-end event-driven security workflow!

😂See ya!

P.S. We can even add filtering (only alert on high-severity events), automated responses (trigger a firewall block or a cloud security group update), and dashboards (push alerts into Elasticsearch and visualize them with Kibana), and what not lol! Do follow for more blogs in the future.
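As a taste of the filtering idea, here is a minimal sketch of a severity check that could sit in the sensor before dispatching. Suricata uses 1 for the most severe alerts, so lower numbers are worse; is_high_severity and its default threshold are my own choices for illustration.

```python
def is_high_severity(event, max_severity=1):
    """True for alert events whose severity number is at or below the threshold."""
    if event.get("event_type") != "alert":
        return False
    severity = event.get("alert", {}).get("severity")
    return isinstance(severity, int) and severity <= max_severity


# Made-up events for illustration
events = [
    {"event_type": "alert", "alert": {"severity": 1}},
    {"event_type": "alert", "alert": {"severity": 3}},
    {"event_type": "flow"},
]
print([is_high_severity(e) for e in events])
```

The same filter could also live in the rule instead of the sensor, which keeps the sensor generic and lets different rules pick different thresholds.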