Forensic Summary
Meta has released Llama Guard 4, a 12B multimodal safety classifier designed to detect and filter unsafe content in both image and text inputs/outputs for production LLM deployments. The model addresses jailbreak attempts and harmful content generation across 14 hazard categories defined by the MLCommons taxonomy. Alongside it, two lightweight Llama Prompt Guard 2 classifiers (86M and 22M parameters) target prompt injection and prompt attack detection.
Read the full technical deep-dive on Grid the Grey: https://gridthegrey.com/posts/welcoming-llama-guard-4-on-hugging-face-hub/
Top comments (0)