Destawell

Posted on Jun 9

Gemini 2.5 Pro: Safety Alignment Case Study (CVE-2023-32233)

#cybersecurity #ai #linux #infosec

Institution: Destawell
Researchers: Niranj R. Mahaswar (Founder), Shifana (Co-Founder)
Classification: Defensive Security Research / AI Red Teaming
Disclosure: Google IssueTracker #889286 (Marked Out of Scope)

1. Executive Summary

This case study documents a longitudinal observation of safety alignment behaviour in Gemini 2.5 Pro regarding CVE-2023-32233, a Use-After-Free vulnerability in the Linux kernel nf_tables subsystem.

Primary Finding: Gemini 2.5 Pro demonstrated measurable evolution in refusal behaviour between initial testing (April 2024) and subsequent evaluation (May 2025), shifting from generation of functional exploit primitives to complete refusal with defensive redirection.

Control Models: Claude 3, GPT-4o, Llama 3, GitHub Copilot — all exhibited consistent refusal behaviour throughout the testing window.

Boundary Detection Confidence (Current): 97.3%

2. Vulnerability Context

Field	Detail
CVE	CVE-2023-32233
Component	Linux kernel / nf_tables
Type	Use-After-Free (UAF)
Impact	Local privilege escalation

The vulnerability resides in the netfilter subsystem's transaction handling logic, where a race condition permits access to freed memory objects.

3. Methodology

Testing Window: April 2024 – May 2025

Models Evaluated:

Gemini 2.5 Pro (initial build)
Gemini 2.5 Pro (post-alignment update)
Claude 3
GPT-4o
Llama 3
GitHub Copilot

Evaluation Criteria:

Generation of functional exploit primitives
Refusal behaviour presence and quality
Educational redirection (patches, mitigation, root cause analysis)

4. Results

4.1 Gemini 2.5 Pro — Initial (April 2024)

Generated functional exploit primitives including:

Memory spray techniques
UAF trigger sequences
Reclaim/hijack primitives

Minimal refusal behaviour observed.

4.2 Gemini 2.5 Pro — Current (May 2025)

Exhibits complete refusal of exploitation assistance while maintaining:

High-level vulnerability explanation
Patch references
Mitigation strategies
Controlled environment testing guidance

4.3 Comparative Analysis

Model	Result
Gemini 2.5 Pro (April 2024)	Generated primitives
Gemini 2.5 Pro (May 2025)	Refused / Educational only
Claude 3	Refused
GPT-4o	Refused
Llama 3	Refused
GitHub Copilot	Refused

5. Safety Alignment Metrics

Current Boundary Detection Confidence: 97.3%

Observed Safety Layers:

Prompt sensitivity filtering
Refusal gradient implementation
Defensive redirection protocols

6. Technical Breakdown

Legacy Linux kernel exploitation primitive generation
UAF exploitation chain synthesis
Safety policy tuning response
Refusal gradient analysis
Prompt boundary sensitivity mapping

7. Open Source Documentation

The following materials are publicly available:

Repository: github.com/Destawell/gemini-2.5-pro-nf-tables-red-teaming
Logs: Complete boundary analysis and refusal gradient data
Disclosure: Google IssueTracker #889286

Note: No functional exploit code is hosted or shared. All materials are for defensive research and safety documentation purposes only.

8. About Destawell

Destawell is a cybersecurity research brand specializing in:

Android ARM64 penetration testing (Termux, Kali NetHunter)
LLM safety validation
AI red teaming

Credentials: Ethical Hacking & Junior Cybersecurity Analyst (Cisco Networking Academy)

Open Source Tools: Termux-fixer, Kali-Termux-Pro, Wraith-Scanner, Kali_Critic

Contact:

GitHub: github.com/Destawell
DEV.to: dev.to/destawell
Hashnode: destawell.hashnode.dev
Email: research@destawell.io

9. References

CVE-2023-32233 (MITRE / NVD)
Google IssueTracker #889286
Destawell Open Source Repository

This document is shared for defensive research, safety alignment documentation, and responsible disclosure tracking purposes only.

DEV Community