I believe that as AI developers, our responsibility extends beyond just shipping features. We are building systems that interact with humans in their most vulnerable moments.
I recently implemented a critical safety module for my platform, WTMF (https://wtmf.ai/), that detects and intervenes during crises, specifically focusing on self-harm and suicidal ideation.
In this post, I will break down the multi-layer architecture I built to handle these sensitive situations with near-zero added latency and a consistently empathetic response.
The Strategy: Detect, Flag, and Re-Steer
A standard AI response might not be sufficient when a user expresses distress. My solution involves three core layers:
- Identification: High-speed pattern matching for distress signals.
- Behavioral Transformation: Dynamic modification of the LLM's system prompt on the fly.
- Administrative Visibility: Persistent database flagging for human review.
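Before diving into each layer, here is a minimal sketch of how the three fit together in one request path. The function and field names (`processMessage`, `basePrompt`, `flags`) are illustrative stand-ins, not WTMF's actual code, and the pattern is trimmed to a few terms for readability:

```javascript
// Minimal sketch of the three-layer flow on a single incoming message.
function processMessage(message, basePrompt) {
  // Layer 1: identification via fast, synchronous pattern matching
  const isSelfHarm = /\b(suicide|kill myself|end my life|want to die)\b/i.test(message);

  // Layer 2: behavioral transformation of the LLM's system prompt
  let systemPrompt = basePrompt;
  if (isSelfHarm) {
    systemPrompt += "\n\nCRITICAL INSTRUCTION: The user may be in crisis. Be supportive and empathetic.";
  }

  // Layer 3: administrative visibility via persistent flags
  const flags = isSelfHarm ? { isFlagged: true, flagReason: "self_harm" } : {};

  return { systemPrompt, flags };
}
```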
1. High-Precision Crisis Detection
For the initial detection I deliberately avoided complex asynchronous NLP models, which would add latency to the hot path. Instead, I used a simple, synchronous regular-expression check that flags the session immediately, before the LLM call is even made.
// Order matters with alternation: longer phrases ("slit my wrists") go before
// their prefixes ("slit"), and "self[- ]harm" catches both the hyphenated and
// the spaced spelling.
const selfHarmRegex = /\b(suicide|kill myself|end my life|want to die|self[- ]harm|cut myself|overdose|jump off|hang myself|hurt myself|no reason to live|slit my wrists|slit|knife)\b/i;
const isSelfHarm = selfHarmRegex.test(message);
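A quick sanity check of the pattern against sample messages (using a lightly adjusted variant that also catches the hyphenated "self-harm" spelling):

```javascript
const selfHarmRegex = /\b(suicide|kill myself|end my life|want to die|self[- ]harm|cut myself|overdose|jump off|hang myself|hurt myself|no reason to live|slit my wrists|slit|knife)\b/i;

console.log(selfHarmRegex.test("I just want to end my life"));   // true
console.log(selfHarmRegex.test("thoughts of self-harm lately")); // true
console.log(selfHarmRegex.test("I had a great day at work"));    // false
```

Keyword matching like this trades recall for speed: it will miss paraphrased distress, which is why it is only the first layer rather than the whole system.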
2. Dynamic LLM Re-Steering
The most impactful part of this module is how it changes the AI's behavior. When a distress signal is detected, I don't just block the message. I re-steer the AI.
I dynamically inject a CRITICAL INSTRUCTION into the LLM's system prompt. This ensures that the response isn't just about the persona, but about human safety.
if (isSelfHarm) {
  systemPrompt += "\n\nCRITICAL INSTRUCTION: The user has expressed thoughts of self-harm or suicide. You MUST be extremely supportive, empathetic, and gently encourage them to seek professional help or connect with loved ones. Do not give medical advice, but be a gentle, safe space for them.";
}
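In context, the re-steered prompt becomes the system message of the actual chat request. Here is a sketch assuming a generic OpenAI-style messages array; `CRISIS_ADDENDUM` and `buildMessages` are illustrative names, not WTMF's real identifiers:

```javascript
// The crisis instruction appended to the system prompt when distress is detected.
const CRISIS_ADDENDUM =
  "\n\nCRITICAL INSTRUCTION: The user has expressed thoughts of self-harm or suicide. " +
  "You MUST be extremely supportive, empathetic, and gently encourage them to seek " +
  "professional help or connect with loved ones. Do not give medical advice, but be " +
  "a gentle, safe space for them.";

// Assembles the message array for the LLM call, with the (possibly re-steered)
// system prompt always in first position.
function buildMessages(basePrompt, userMessage, isSelfHarm) {
  const systemPrompt = isSelfHarm ? basePrompt + CRISIS_ADDENDUM : basePrompt;
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userMessage },
  ];
}
```

Because the addendum is appended rather than replacing the persona prompt, the AI keeps its voice while its priorities shift to safety.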
This transforms the AI from a standard assistant into a supportive, empathetic listener that prioritizes life over logic.
3. Persistent Safety Flagging
Finally, the session is flagged in the backend database. This allows administrators to prioritize these sessions for human intervention and support.
// updateData is assembled earlier in the request handler with any other
// per-message updates; the safety flags are merged in before the write.
if (isSelfHarm) {
  updateData.isFlagged = true;
  updateData.flagReason = "self_harm";
}
await db.update(chats).set(updateData).where(eq(chats.id, chat.id));
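On the admin side, the flag makes triage straightforward: flagged sessions can be sorted to the top of the review queue. A shape-only sketch, assuming rows shaped like the `chats` table above (`prioritizeForReview` is an illustrative helper of mine, not WTMF's actual code):

```javascript
// Surfaces flagged sessions first so human reviewers see them immediately;
// relies on Array.prototype.sort being stable, so order within each group
// is preserved.
function prioritizeForReview(sessions) {
  return [...sessions].sort(
    (a, b) => Number(Boolean(b.isFlagged)) - Number(Boolean(a.isFlagged))
  );
}
```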
Conclusion
Building with AI is powerful, but building with safety is essential. I am committed to ensuring that WTMF doesn't just process data, but handles human emotions with the care they deserve.
Check out WTMF: https://wtmf.ai/
What safety layers are you building into your AI projects? I would love to hear your thoughts in the comments.