sentinel-safety

Posted on Apr 25

Grooming operates over time. Here's how behavioral detection tracks it.

#security #opensource #machinelearning

Every system designed to detect child grooming has the same problem: it's looking at the wrong unit of analysis.

Grooming doesn't happen in a message. It happens across weeks of messages — a slow accumulation of trust, a gradual shift in conversational register, an escalation in contact frequency that would look unremarkable if you sampled any individual session but reads clearly as a pattern when you step back and look at the whole trajectory.

When you build a detection system around message-level classification, you're designing for a problem that doesn't exist. Predators don't send a message that contains the whole grooming attempt. They send a hundred messages across a month, each one just slightly further than the last.

This post is about how temporal signal analysis changes the problem — and specifically, how SENTINEL's temporal layer works.

What keyword filters see

A keyword filter has a view like this:

[message] → [classifier] → flag / no flag

Each message is independent. The system has no memory. What happened in last Tuesday's session doesn't affect how it evaluates today's message.

This maps cleanly onto spam detection, where the signals that make a message spam are usually present in the message itself. It maps badly onto grooming, where the signal is the shape of behavior over time, not the content of individual messages.

A systematic review of the grooming detection literature by An et al. (arXiv:2503.05727, 2025) found that behavioral and temporal features are "consistently underexplored relative to linguistic features across the published literature" despite showing strong discriminative power in the studies that do use them. The architecture of most detection systems — trained on datasets of individual message excerpts — has driven the field toward a unit of analysis that the problem doesn't support.

What the behavioral evidence actually shows

Research on documented grooming cases consistently identifies a set of behavioral patterns that operate across sessions rather than within them:

Escalation velocity. Grooming tends to follow a measurable escalation trajectory: initial low-stakes contact, relationship development, increasing intimacy and exclusivity, then requests for personal information, image sharing, or off-platform contact. The rate at which this escalation moves is a signal. Fast escalation from a new contact is a very different pattern from a years-long friendship.

Contact frequency evolution. Early in grooming, contact is typically sporadic and positioned as casual. As trust develops, contact frequency increases and becomes more purposeful. The shift from irregular to regular to daily to multiple-times-daily contact, across sessions rather than within a single session, is a behavioral signature.

Session-bridging behavior. Predators often end sessions in ways that create continuity with the next one — leaving threads open, referencing the next time they'll talk, creating a sense of ongoing relationship rather than discrete conversations. This cross-session threading is observable as a temporal pattern.

Off-platform migration attempts. Requests to move a conversation from a platform to a private channel (WhatsApp, Signal, Snapchat) tend to cluster at a specific point in the grooming trajectory, after sufficient trust has been established but before the predator feels confident enough to escalate overtly on the monitored platform. The timing of this request, relative to the arc of the relationship, is a signal.

None of these patterns are visible in a message. They're only visible as trajectories.

How SENTINEL's temporal layer works

SENTINEL analyzes user behavior across four signal layers: linguistic, graph, temporal, and fairness. The temporal layer is specifically designed to capture the escalation patterns that cross-session behavioral analysis makes visible.

The core object is what we call the behavioral profile: a rolling window of signals accumulated across sessions for a given user-to-user relationship or a given user's behavior on the platform. This profile is updated with each new event and used to compute temporal features.

The key temporal signals SENTINEL tracks:

Escalation velocity. The rate at which the composite behavioral risk score is increasing over time. A user whose score has risen from 15 to 60 over three weeks looks very different from a user whose score reached 60 in a single session. The trajectory itself carries information.

Contact frequency gradient. How the rate of contact between two users has changed over time. The first week of contact looked casual; by week four, there are multiple sessions per day. The gradient of this change is computed as a temporal signal.

Session boundary behavior. How sessions end and begin. Does the conversation pick up immediately where it left off? Are there explicit continuity markers? Does the session-ending message create an open loop that the next session closes?

Time-of-day pattern shifts. Contact shifting to unusual hours — late night, early morning — is a known escalation marker. SENTINEL tracks whether the distribution of contact times has changed over the observation window.

These signals are composited into a temporal risk contribution that's added to the overall behavioral risk score alongside the linguistic and graph signal contributions.

The practical implication: why trajectory matters more than threshold

The classic approach to classification systems is to set a threshold: if the confidence score exceeds X, flag the content. For message-level classifiers, this makes sense — the score reflects confidence in the single message being malicious.

For temporal systems, the threshold intuition breaks down. The point is not whether today's message exceeds a threshold, but whether the shape of behavior over time matches known grooming trajectories.

SENTINEL scores users rather than messages. The risk score for a user reflects the accumulated weight of behavioral evidence across their entire history on the platform, with decay applied to older signals so that low-risk periods can recover a user's standing. A single suspicious message raises the score modestly. A sustained pattern of escalating contact, register shifts, and frequency increases over three weeks raises it substantially.

This means a moderator review queue populated by SENTINEL's scores looks different from a queue populated by per-message classification scores. The cases at the top of the queue are there because of a behavioral trajectory — because something has been building, not because one message happened to cross a threshold.

What explainability looks like for temporal signals

One of SENTINEL's design requirements is that every risk score comes with a structured plain-language explanation of the signals that contributed to it. For temporal signals, this looks like:

"Contact frequency between this user and [target] has increased 4.2x over the past 21 days. Time-of-day distribution has shifted toward late evening hours. Risk score increased 18 points this week driven primarily by contact frequency escalation."

This explanation structure matters for two reasons.

For human moderators: reviewing a case with this context is fundamentally different from reviewing a number. The moderator understands why the system flagged this user, can evaluate whether the behavioral trajectory matches their knowledge of the specific situation, and can make a better decision about what action, if any, is warranted.

For legal defensibility: if a moderation action is challenged — or if the platform needs to document its proactive detection methodology for DSA or UK Online Safety Act audit purposes — a structured explanation of the behavioral trajectory is far more useful than a classifier confidence score.

The data problem

The honest limitation of temporal detection is that it requires time-series data, which creates challenges that message-level systems don't face.

Most academic grooming detection datasets are collections of chat logs — often the PAN12 benchmark dataset — without full temporal context. Training and evaluating temporal detection systems requires longitudinal data: the full arc of a relationship over time, with session boundaries preserved. This data is scarce in research settings, because it requires either real platform data (which raises obvious consent and ethical issues) or synthetic data generation with careful attention to temporal realism.

SENTINEL ships with a synthetic research dataset of 50 annotated grooming conversations, designed for temporal analysis. It's a starting point; extending it is an explicit project goal. Academic researchers who want to build on this or contribute temporal datasets are specifically invited to engage — reach out at sentinel.childsafety@gmail.com.

Where this leaves detection systems

The practical implication is that building effective grooming detection requires choosing a unit of analysis that matches the phenomenon: not the message, not even the session, but the behavioral trajectory across sessions over time.

Systems that operate at the message level will always face the fundamental evasion problem: a predator who knows not to send any individual message that crosses a threshold can groom successfully while generating only normal-looking messages at each individual checkpoint. Systems that track behavioral trajectories can detect the escalation pattern even when no individual message is above threshold.

This is why SENTINEL's architecture is built around behavioral profiling rather than per-message classification — and why the temporal layer is central to the detection model rather than an add-on.

SENTINEL is open source and free for platforms under $100k annual revenue: https://github.com/sentinel-safety/SENTINEL

For questions, dataset contributions, or research collaboration: sentinel.childsafety@gmail.com

DEV Community