Forensic Summary
Cisco's AI Threat Intelligence team has demonstrated that bounded pixel-level perturbations can restore the attack effectiveness of degraded typographic images against vision-language models (VLMs), enabling hidden prompt injection that bypasses both human review and content filters. The technique works by optimising perturbations against open-source embedding models and transferring the results to proprietary systems such as GPT-4o and Claude, exposing a cross-model transferability risk. The attack lets adversaries embed instructions, such as data exfiltration commands, inside images that appear to human observers as nothing more than visual noise.
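To make the mechanic concrete, the core of such an attack is a standard white-box adversarial optimisation loop run against an open-source surrogate encoder. The sketch below is a minimal illustration, not Cisco's actual implementation: it assumes a CLIP image encoder from Hugging Face `transformers` as the surrogate, hypothetical file names (`carrier.png`, `instruction_text.png`), and an L-infinity budget applied in the processor's normalised pixel space for simplicity.

```python
# Hypothetical sketch: PGD-style optimisation of a bounded perturbation so that a
# benign-looking "carrier" image's embedding moves toward the embedding of a
# typographic instruction image. Model choice, file names and epsilon are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

carrier = processor(images=Image.open("carrier.png"), return_tensors="pt")["pixel_values"].to(device)
target = processor(images=Image.open("instruction_text.png"), return_tensors="pt")["pixel_values"].to(device)

with torch.no_grad():
    target_emb = model.get_image_features(pixel_values=target)
    target_emb = target_emb / target_emb.norm(dim=-1, keepdim=True)

# Budget and step size in the processor's normalised space (a real attack would
# constrain and re-quantise raw pixels instead).
epsilon, alpha, steps = 8 / 255, 1 / 255, 300
delta = torch.zeros_like(carrier, requires_grad=True)

for _ in range(steps):
    emb = model.get_image_features(pixel_values=carrier + delta)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    loss = -(emb * target_emb).sum()        # maximise cosine similarity to the target embedding
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()  # signed gradient step
        delta.clamp_(-epsilon, epsilon)     # project back into the L-infinity ball
        delta.grad.zero_()

adversarial = carrier + delta               # candidate image to test for transfer to closed VLMs
```

The transferability claim is then just the empirical observation that an image optimised against the open-source surrogate can continue to steer a proprietary VLM that was never queried during optimisation.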
Read the full technical deep-dive on Grid the Grey: https://gridthegrey.com/posts/pixel-level-perturbations-enable-invisible-prompt-injection-in-vision-language/