Inside SENTINEL: How 13 Microservices Detect Child Grooming by Behavior, Not Keywords

Keyword filters are a solved problem — solved by predators. They learned years ago to spell things differently, avoid flagged words, and simply groom slowly enough that no single message triggers a filter. The result: every major platform relying solely on keyword detection is running safety infrastructure that the most dangerous users have already mapped and bypassed.

SENTINEL takes a different approach. Instead of asking "does this message contain a bad word?", it asks "does this person's behavior, over time, resemble the trajectory of a predator approaching a minor?"

This post covers how that works at an engineering level.


The Four Signal Layers

SENTINEL's risk scoring is built on four independent signal layers feeding into a weighted ensemble:

1. Linguistic Analysis

NLP signals beyond keyword matching: sentiment trajectory across a conversation, escalation in intimacy markers, attempts to isolate the target from other users, and lexical similarity to known grooming conversation patterns. Models are trained on synthetic and research-derived datasets — never real user data.
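As a toy illustration of what "escalation in intimacy markers" can mean mechanically (this is not SENTINEL's actual model; the marker lexicon and the slope statistic are assumptions for the example), marker density can be regressed against message position, so the signal is about trajectory rather than any single message:

```python
# Toy lexicon; real models use learned representations, not a word list.
INTIMACY_MARKERS = {"secret", "special", "mature", "trust", "alone"}

def marker_density(message: str) -> float:
    """Fraction of words in a message that are intimacy markers."""
    words = message.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in INTIMACY_MARKERS for w in words) / len(words)

def escalation_slope(messages: list[str]) -> float:
    """Least-squares slope of marker density against message position;
    a positive slope means intimacy language is increasing over time."""
    densities = [marker_density(m) for m in messages]
    n = len(densities)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(densities) / n
    num = sum((i - mean_x) * (d - mean_y) for i, d in enumerate(densities))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

A conversation that drifts from neutral chat toward "our special secret" produces a positive slope even though no individual message would trip a keyword filter.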

2. Graph Analysis

Who is talking to whom, at what frequency, and with what structural characteristics. A 40-year-old account with zero peer-age connections making rapid friend requests to accounts flagged as likely minors looks very different from an 18-year-old talking to their gaming friends. Graph signals detect coordinated targeting, unusual relationship formation rates, and network centrality anomalies.
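A minimal sketch of one such structural signal, on toy data (the real graph layer runs centrality and formation-rate analysis over the full relationship graph; the `cohort_gap` cutoff here is an assumption for illustration):

```python
def cross_cohort_ratio(user: str, connections: list[str],
                       ages: dict[str, int], cohort_gap: int = 10) -> float:
    """Fraction of a user's connections whose age differs from the user's
    by more than `cohort_gap` years. A 40-year-old whose connections are
    overwhelmingly minors scores near 1.0; a teen among peers scores near 0."""
    if not connections:
        return 0.0
    user_age = ages[user]
    outside = sum(abs(user_age - ages[c]) > cohort_gap for c in connections)
    return outside / len(connections)

ages = {"acct_40": 40, "minor_1": 14, "minor_2": 13, "peer": 35}
```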

3. Temporal Analysis

Grooming has a temporal signature. Conversation escalation follows recognizable progressions. Contact frequency patterns — how often someone messages a specific user, at what times, with what regularity — are informative signals independent of content. SENTINEL builds time-series models of behavioral escalation across sessions.
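One content-free frequency signal can be sketched as a simple acceleration ratio (a deliberately reduced stand-in for SENTINEL's time-series models, which track escalation across sessions):

```python
from datetime import datetime, timedelta

def contact_escalation(timestamps: list[datetime]) -> float:
    """Messages in the second half of the observation window divided by
    messages in the first half; a ratio above 1 means contact with this
    recipient is accelerating, regardless of what the messages say."""
    if len(timestamps) < 2:
        return 1.0
    ts = sorted(timestamps)
    midpoint = ts[0] + (ts[-1] - ts[0]) / 2
    first = sum(1 for t in ts if t <= midpoint)
    return (len(ts) - first) / max(first, 1)

# Six messages: two in the first five days, four in the last two days.
base = datetime(2026, 4, 1)
sends = [base + timedelta(days=d) for d in (0, 1, 8, 9, 9.5, 10)]
```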

4. Fairness Audit Layer

Before any composite score is emitted, it passes through demographic parity checks. If the system would flag members of one demographic group at a materially different rate than another for identical behavior, the score is held until the discrepancy is resolved. This is enforced at runtime, not just during training.

The four layers produce a composite score from 0–100 with four tiers: trusted, watch, restrict, critical.
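The tier cut-points below are illustrative assumptions (real deployments configure their own thresholds), but they show the shape of the mapping:

```python
def tier(score: float) -> str:
    """Map a 0-100 composite score to one of the four tiers.
    Thresholds here are examples, not SENTINEL's defaults."""
    if score >= 80:
        return "critical"
    if score >= 60:
        return "restrict"
    if score >= 30:
        return "watch"
    return "trusted"
```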


The 13 Microservices

SENTINEL ships as a Docker Compose stack of 13 independent services. Each can be deployed incrementally — you do not need the full stack to get value.

Core Pipeline

1. event-ingestor — The entry point. Accepts raw events (messages, relationship changes, login events) via REST API or webhook. Normalizes, validates, and routes to the internal queue. Handles 10k+ events/second per instance.

2. nlp-scorer — Consumes events from the queue. Runs the linguistic analysis pipeline: tokenization, entity extraction, sentiment analysis, escalation detection. Emits linguistic signal scores to the aggregator.

3. graph-builder — Maintains the relationship graph in a vector database. On each new relationship event, updates edge weights, recalculates centrality, and flags anomalous graph formation. Uses incremental graph algorithms to avoid full recomputation.

4. temporal-tracker — Maintains per-user time-series of behavioral events. Computes rate-of-change signals, session frequency patterns, and contact escalation curves.

5. risk-aggregator — The ensemble. Pulls scores from the three signal services, applies the weighted ensemble model, runs the fairness gate, and writes the final risk score to the score store.

6. score-store — PostgreSQL-backed store for all risk scores with full history. Every score change is recorded with the contributing signals and their weights. The record contains not just "the score is 74" but which six signals contributed how much at what timestamp.
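A hypothetical record shape for the score store (not SENTINEL's actual schema; the signal names and weights are invented for the example), along with the kind of consistency check an auditor could run against it:

```python
# Each score is stored with every contributing signal, its weight,
# and its weighted contribution, so "why 74?" is always answerable.
record = {
    "user_id": "user_abc",
    "score": 74,
    "timestamp": "2026-04-25T12:00:00Z",
    "signals": [
        {"name": "intimacy_escalation",   "weight": 40, "contribution": 36.0},
        {"name": "cross_cohort_graph",    "weight": 30, "contribution": 21.0},
        {"name": "contact_acceleration",  "weight": 30, "contribution": 17.0},
    ],
}

def contributions_consistent(record: dict, tolerance: float = 0.5) -> bool:
    """Audit check: the stored contributions should reproduce the score."""
    total = sum(s["contribution"] for s in record["signals"])
    return abs(total - record["score"]) <= tolerance
```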

Compliance and Audit

7. audit-chain — Every moderator action, every automated action, every score change produces a cryptographically signed audit event. Events are chained (each includes the hash of the previous), making retroactive tampering detectable. Retained for 7 years, designed to serve as legal evidence.

8. compliance-engine — Per-tenant regulatory configuration. Handles GDPR right-to-erasure (soft-deletes with zero-knowledge proof of deletion), COPPA data retention limits, DSA reporting endpoint generation, and OSA audit export formatting.

9. alert-dispatcher — Watches the score store for threshold crossings. On critical tier transitions, fires webhook callbacks, generates moderator queue entries, and (if configured) prepares NCMEC CyberTipline-formatted evidence packages.
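The audit-chain's hash linking (item 7) can be sketched as follows; the cryptographic signing step is omitted and the event shape is an assumption, but the tamper-evidence mechanism is the standard one:

```python
import hashlib
import json

def append_event(chain: list[dict], event: dict) -> list[dict]:
    """Append an audit event linked to the previous entry's hash, so any
    retroactive edit breaks every subsequent hash in the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"event": event, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})
    return chain

def verify(chain: list[dict]) -> bool:
    """Recompute every hash from genesis; False means tampering."""
    prev = "0" * 64
    for entry in chain:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev or entry["hash"] != digest:
            return False
        prev = entry["hash"]
    return True
```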

Federation Layer

10. federation-gateway — The privacy-preserving threat intelligence layer. When a user reaches critical tier, a cryptographic signal (not identifying data, not message content) is shared with opted-in peer platforms. Peers receive a risk signal for a pseudonymous identifier and can check for a matching user in their own system.

11. identity-resolver — Maps between external platform identifiers and SENTINEL's internal pseudonymous IDs. Raw platform user IDs never appear in logs, federation signals, or audit exports.

Developer Interface

12. api-gateway — The external-facing REST API. Handles authentication, rate limiting, per-tenant routing, and SDK compatibility. The Python and Node.js SDKs talk exclusively to this service.

13. dashboard-service — The moderator web UI. Displays risk score queues, behavioral timelines, graph visualizations, and the human review workflow. Every score comes with a plain-language explanation of why, specifically to reduce moderator burnout from opaque black-box outputs.


How the Fairness Gate Works

Before any risk score leaves the risk-aggregator, it runs through the fairness gate:

PARITY_THRESHOLD = 0.05  # default 5% disparity; configurable per deployment

class FairnessViolation(Exception):
    """Raised when a score would violate demographic parity; the score is
    quarantined for human review instead of propagating downstream."""

def fairness_gate(score, signals, demographic_proxy):
    # Observed flag rate for this demographic group across the population
    baseline_rate = get_population_flag_rate(demographic_proxy)
    # Flag rate this score and signal set would imply for the same group
    predicted_rate = estimate_flag_rate(score, signals, demographic_proxy)

    disparity = abs(predicted_rate - baseline_rate) / baseline_rate

    if disparity > PARITY_THRESHOLD:
        raise FairnessViolation(
            f"Demographic parity violation: {disparity:.2%} disparity detected"
        )

    return score

The threshold is configurable per deployment. When a FairnessViolation is raised, the score is quarantined and flagged for human review rather than propagated downstream. This is not a soft warning — it is a hard stop.

The default threshold (5% disparity) is derived from NIST's AI Risk Management Framework recommendations.


The Federation Protocol

The federation protocol is the most architecturally interesting piece. The goal: share threat intelligence across platforms without sharing any of the data that makes that intelligence sensitive.

The flow:

  1. Platform A detects a critical-tier user. The federation-gateway generates a hashed, salted pseudonymous token from the user's behavioral signals.

  2. The token is broadcast to opted-in peers via a gossip protocol over mutual TLS.

  3. Platform B receives the token. Its identity-resolver checks whether any of its users produce a matching token under the shared salt.

  4. If a match is found, Platform B's risk-aggregator applies a federation risk boost to that user's score.

No messages are shared. No usernames. No IPs. Platform A never learns which users on Platform B were matched. A predator banned on one platform gets flagged on another within minutes, with zero raw data crossing platform boundaries.
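One plausible construction for the pseudonymous token is an HMAC over a behavioral fingerprint under an epoch salt distributed out of band (the post does not specify the exact primitive, so the details below are assumptions):

```python
import hashlib
import hmac

def federation_token(behavioral_fingerprint: str, shared_salt: bytes) -> str:
    """Pseudonymous token: an HMAC of a behavioral fingerprint under the
    federation's shared salt. Peers can test for a match without ever
    seeing the fingerprint, a username, or any message content."""
    return hmac.new(shared_salt, behavioral_fingerprint.encode(),
                    hashlib.sha256).hexdigest()

# Platform A broadcasts a token for a critical-tier user:
salt = b"epoch-salt-shared-out-of-band"
broadcast = federation_token("fingerprint-of-flagged-user", salt)

# Platform B recomputes tokens for its own users under the same salt
# and checks for a match; non-matching users learn nothing.
local_fingerprints = ["fingerprint-of-flagged-user", "fingerprint-of-someone-else"]
matched = [fp for fp in local_fingerprints
           if federation_token(fp, salt) == broadcast]
```

Because the salt rotates per epoch and the token is one-way, a peer that never held a matching user cannot reverse the broadcast into anything identifying.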

This is v1 of the federation protocol. The roadmap includes k-anonymity enhancements and a formal differential privacy layer.


Integration

The entire integration surface is the event ingestor API:

from sentinel_safety import SentinelClient
import hashlib

client = SentinelClient(api_key="your_key", tenant_id="your_tenant")

# Send a message event
client.ingest_event({
    "event_type": "message",
    "sender_id": "user_abc",
    "recipient_id": "user_xyz",
    "platform_room_id": "room_123",
    "timestamp": "2026-04-25T12:00:00Z",
    # Content hash only — raw messages never leave your platform
    "content_hash": hashlib.sha256(message_content.encode()).hexdigest(),
})

# Get current risk score
score = client.get_risk_score("user_abc")
print(score.tier)       # "watch"
print(score.score)      # 47
print(score.reasoning)  # Plain-language explanation of contributing signals

Content is never sent to SENTINEL — only a hash, alongside behavioral metadata. NLP analysis runs client-side via the SDK; only extracted signal scores reach the ingestor. Raw messages never leave your platform.

Time to first integration: under an hour.


Tech Stack

  • Python 3.12, FastAPI for all internal services
  • PostgreSQL (score store, audit chain)
  • Redis (event queue, session state)
  • Qdrant (vector database for graph embeddings)
  • Docker Compose for local and self-hosted deployment
  • OpenTelemetry throughout for observability

No proprietary cloud services required. Deployable on any provider.


What's Next

SENTINEL v1.0 is live: github.com/sentinel-safety/SENTINEL

The roadmap: federated learning enhancements (on-device model updates without data sharing), k-anonymity improvements to the federation protocol, expansion of the research dataset beyond the current v1 baseline, and formal academic publication of the behavioral detection methodology.

If you are building a platform where minors are present and have not yet implemented proactive safety measures, SENTINEL is designed so there is no excuse not to. Setup is a Docker Compose file and an API key. Compliance infrastructure is included. The audit trail is automatic.

Commercial licensing for platforms over $100k annual revenue: sentinel.childsafety@gmail.com


SENTINEL is built and maintained by the Sentinel Foundation. v1.0 released April 2026.
