James Smith

Posted on Apr 10

Machine Learning and Scam Detection: The Future of Online Safety

#machinelearning #ai #cybersecurity #datascience

From blocklists to ML: what the arms race between fraud and detection really looks like, and where the next five years will take it.
In 2003, the dominant tool for detecting online scams was a text file. Blocklists were the state of the art: lists of known-bad domains, IP addresses, and email senders. Teams of human analysts updated them weekly and pushed them to email clients and browsers, where they were reasonably effective against an adversary who was slow and operated at modest scale.
Twenty-two years later, the adversary registers ten thousand domains a day, writes personalized phishing messages on the fly with fine-tuned LLMs, routes attacks through legitimate CDNs, and pre-tests campaigns against detection systems before deploying them. The text file is technically still alive, but it now sits seven layers deep inside a neural ensemble that processes four hundred features in under a second.
This is the story of how machine learning transformed scam detection, what the current generation of systems actually does, and where the research is heading. The next five years of this arms race will be decided not by who builds the larger model but by who asks the right questions in the right order.

Three Generations of Scam Detection

To understand what ML is doing here, it helps to trace the progression from rule-based systems to modern hybrid architectures. Each generation addressed the failure mode of the one before it and left new failure modes for the next to resolve.

What Current ML Systems Actually Do

Production scam detection systems today, whether inside Google Safe Browsing, Microsoft Defender SmartScreen, or standalone services, are not single-model systems. They are ensembles of specialized classifiers running in parallel, with outputs combined by a meta-learner that weights each signal by its predictive accuracy for the particular input type.
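The combination step can be sketched as a small weighted stacker over the specialists' outputs. This is a minimal illustration, not how any named product works: the classifier names, the per-input-type weights, and the bias are all hypothetical, and a real meta-learner would learn them from labeled data.

```python
import math

# Hypothetical per-input-type weights for each specialist classifier.
# A real meta-learner would learn these from labeled data.
WEIGHTS = {
    "url_only": {"url": 2.5, "content": 0.0, "graph": 1.5},
    "full_page": {"url": 1.0, "content": 2.5, "graph": 1.5},
}
BIAS = -2.5  # hypothetical; shifts the default verdict towards "safe"


def logit(p, eps=1e-6):
    """Convert a probability to log-odds, clipped to avoid infinities."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def combine(scores, input_type):
    """Combine specialist probabilities into one risk score in (0, 1)."""
    weights = WEIGHTS[input_type]
    z = BIAS + sum(weights[name] * logit(p) for name, p in scores.items())
    return 1 / (1 + math.exp(-z))


risk = combine({"url": 0.2, "content": 0.9, "graph": 0.9}, "full_page")
```

Here a clean-looking URL score is outweighed by strong content and graph signals, which is the behavior the ensemble design is meant to provide.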

The URL classifier

The fastest component is a gradient-boosted tree classifier over URL string features, which runs in under 3 ms. Its input is a 47-dimensional feature vector derived from the raw URL: domain entropy, subdomain depth, TLD risk score, brand keywords outside the second-level domain (SLD), path depth, and special-character density. The classifier itself was covered in earlier research in this series; what matters here is its adversarial robustness profile. It is the easiest component to evade (register a clean-looking domain on a safe TLD) and the most important for performance at scale (it filters out 60-70% of clearly safe URLs before any expensive analysis is done).
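A few of the lexical features named above can be computed with the standard library alone. The sketch below covers four of them and makes one loud simplification: it treats the last two host labels as the registered domain, where production code would consult the Public Suffix List. The function names and the returned keys are illustrative, not the actual 47-dimensional schema.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Shannon entropy in bits per character of a string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def url_features(url):
    """Extract a few lexical features from a raw URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    # Naive assumption: the last two labels are the registered domain.
    # Production code would consult the Public Suffix List instead.
    subdomain_depth = max(len(labels) - 2, 0)
    path_depth = len([seg for seg in parts.path.split("/") if seg])
    special = sum(1 for ch in url if not ch.isalnum())
    return {
        "domain_entropy": shannon_entropy(host),
        "subdomain_depth": subdomain_depth,
        "path_depth": path_depth,
        "special_char_density": special / len(url) if url else 0.0,
    }


features = url_features("https://login.secure.example.com/a/b/c")
```

TLD risk scores and brand-keyword checks need lookup tables, so they are left out of this sketch.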

The transformer-based content classifier

The biggest architectural change of the past four years is the use of fine-tuned transformer models for web page content analysis. A BERT variant trained on a corpus of validated scam pages and genuine pages learns semantic representations that capture intent rather than surface features. A phishing page that avoids every word on a blocklist but still expresses a semantically equivalent request, such as "enter your password to verify your account", will still score highly, because the model interprets intent, not tokens.
This is the component that made keyword evasion obsolete as an attack method. It is also the component most threatened by adversarial LLM-generated content, which can wrap malicious intent in language that reads as legitimate and so slip past the classifier.

The graph neural network layer

The graph neural network (GNN) component is the newest addition to production pipelines. It represents the relationships between entities (domains, IP addresses, registrant identities, payment processors, and hosting providers) as a graph and learns fraud patterns from topology rather than from individual node features. A domain that looks clean on its own might be a direct neighbor of seventeen confirmed fraud domains in the entity graph. The GNN can see this; the URL classifier and content model cannot.
GNN-based detection is what identified the 27-domain cluster in the case study reported elsewhere in this series: a coordinated campaign in which no single domain would have triggered a high-confidence verdict, but the graph topology was unambiguous. Scam Alerts can surface coordinated campaigns that domain-level tools miss entirely because it combines this graph-based signal with its URL lexical analysis and community report feeds.
fraud_gnn.py — simplified DGL implementation
```python
# Simplified GNN message-passing for fraud detection
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv


class FraudGNN(nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden_size)
        self.conv2 = GraphConv(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, g, features):
        # Layer 1: aggregate neighbour features
        h = F.relu(self.conv1(g, features))
        # Layer 2: second-order neighbourhood propagation
        h = F.relu(self.conv2(g, h))
        # Node-level fraud logits (apply softmax for probabilities)
        return self.classifier(h)


# A domain with 17 fraud-network neighbours receives
# high aggregated fraud signal even if its own features
# score clean on URL and content classifiers.
```

The Adversarial Arms Race: Current State

Every improvement in detection capability puts selection pressure on the attacker population. Operators who fail to adapt to a new detection technique stop succeeding and go out of business. Operators who adapt survive, refine their evasion techniques, and share them. The result is an adversarial co-evolution that pushes both sides toward greater sophistication.

The most important entry in this table is the LLM-vs-LLM row. The detection community is now actively training classifiers on LLM-generated phishing content; that is, the same technology used to generate the attack is being used to produce the labeled training data for the defense. This creates an interesting equilibrium problem: as attacker LLMs grow more capable, the training set has to be regenerated on a regular basis. This sub-conflict is won by the organization with the faster model iteration cycle, which is not necessarily the one with the better base model.

The Next Five Years: Four Technologies That Will Shape the Field

FedML. Federated learning across platform networks

The fundamental tension in scam detection is between privacy and signal. The richest scam signals live inside private platform data: email messages, transaction patterns, and user behavior, none of which can be centralized without major privacy and regulatory consequences. Federated learning addresses this by training models locally on each platform's data and aggregating only model gradients, never the raw data. Google has already implemented federated learning to detect spam on the device in Gmail. The next architectural frontier is extending this to cross-platform fraud detection, so that a fraud pattern identified on one payment network informs detection on another without any data being exchanged.
Detection gain: Unlocks private-platform signals without data centralization; vastly increases training set diversity.
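The gradient-sharing idea can be sketched in a few lines. This is a toy illustration under stated assumptions: a logistic-regression model, two made-up platforms with invented data, and a plain sample-weighted average of gradients. The function names are hypothetical, and real deployments typically add protections such as secure aggregation or differential privacy, since gradients alone can leak information.

```python
import math


def local_gradient(weights, examples):
    """Logistic-regression gradient computed on one platform's private data."""
    grad = [0.0] * len(weights)
    for features, label in examples:
        z = sum(w * x for w, x in zip(weights, features))
        pred = 1 / (1 + math.exp(-z))
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x
    return [g / len(examples) for g in grad], len(examples)


def federated_round(weights, platforms, lr=0.5):
    """One round: platforms share gradients only, never raw examples."""
    updates = [local_gradient(weights, data) for data in platforms]
    total = sum(n for _, n in updates)
    # Average the gradients, weighting each platform by its sample count.
    avg = [
        sum(grad[i] * n for grad, n in updates) / total
        for i in range(len(weights))
    ]
    return [w - lr * g for w, g in zip(weights, avg)]


# Toy data: each example is ([bias, feature], label); values are made up.
platform_a = [([1.0, 2.0], 1), ([1.0, -1.0], 0)]
platform_b = [([1.0, 1.5], 1), ([1.0, -2.0], 0), ([1.0, -0.5], 0)]

weights = [0.0, 0.0]
for _ in range(50):
    weights = federated_round(weights, [platform_a, platform_b])
```

The coordinating server only ever sees the `(gradient, sample_count)` pairs, which is the property the paragraph above describes.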

GNN+. Active research: temporal graph networks for evolving fraud topology

Existing GNN models treat the entity graph as a snapshot of relationships at a single point in time. Fraud infrastructure, however, is not static: clusters are spun up and abandoned, old domains are repurposed, and hosting providers change in response to takedowns. Temporal Graph Networks (TGNs) learn how the graph changes over time, not only which nodes are connected but how those connections evolve, effectively turning infrastructure recycling patterns into a detectable signal rather than a reset.
Detection gain: Identifies domain reuse and infrastructure recycling patterns that static graph analysis cannot detect.

XAI. Explainable AI for regulatory compliance and user trust

Regulatory demands for explainability are growing as scam detection becomes infrastructure embedded in banking, payment processing, hiring platforms, and government services. The EU AI Act, which applies to high-risk automated decision systems, requires that users be able to obtain an explanation of why a decision was made. A black-box classifier that cannot explain its fraud verdicts cannot legally be deployed in a growing number of jurisdictions. SHAP values, attention visualization, and counterfactual explanation generation are becoming production requirements rather than research tools.
Detection gain: Regulatory compliance, user trust, and support for false-positive remediation through an appeals process.
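To make the idea of a per-verdict explanation concrete, here is a minimal sketch for the simplest possible case: a linear scoring model. For a linear model with independent features, the contribution `w_i * (x_i - mean_i)` coincides with the SHAP value of feature `i`. The feature names, weights, and baseline values below are invented for illustration; tree ensembles and transformers need dedicated explainers.

```python
def linear_attributions(weights, baseline, x):
    """Per-feature contribution to a linear score relative to a baseline.

    For a linear model with independent features, w_i * (x_i - mean_i)
    coincides with the SHAP value of feature i.
    """
    return {name: weights[name] * (x[name] - baseline[name]) for name in weights}


def linear_score(weights, x):
    """Plain linear risk score (no bias term, for simplicity)."""
    return sum(weights[name] * x[name] for name in weights)


# Hypothetical model weights, population means, and one flagged URL.
weights = {"domain_entropy": 0.8, "subdomain_depth": 0.5, "tld_risk": 1.2}
baseline = {"domain_entropy": 3.0, "subdomain_depth": 1.0, "tld_risk": 0.2}
flagged = {"domain_entropy": 4.0, "subdomain_depth": 3.0, "tld_risk": 0.9}

attributions = linear_attributions(weights, baseline, flagged)
# The feature with the largest contribution is the headline reason.
top_reason = max(attributions, key=attributions.get)
```

The attributions sum to the difference between the flagged score and the baseline score, which is what makes them usable as an explanation in an appeals process.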

LLM². LLM-driven adversarial red-teaming at scale

The most valuable LLM in the detection pipeline is not the classifier; it is the red team. An LLM can generate thousands of new phishing variants every hour, exploring attack vectors that human red teamers would never think to try and finding classifier blind spots before real adversaries do. Automated red-teaming pipelines are now used to probe production classifiers continuously, to generate and label adversarial examples for retraining, and to quantify the robustness margin of deployed systems against the current level of attacker capability.
Detection gain: Continuous classifier hardening against new attack vectors; automatic blind-spot discovery.
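The probe, collect, and label loop can be sketched with toy stand-ins. Nothing here is a real pipeline: the "classifier" is a naive keyword filter and the "LLM" is a fixed synonym table, both hypothetical placeholders chosen so the example runs deterministically without any model.

```python
from itertools import product

# Stand-in for a deployed classifier: a naive keyword filter (hypothetical).
BLOCKED = {"password", "verify"}


def classifier_flags(message):
    return any(word in BLOCKED for word in message.lower().split())


# Stand-in for an LLM paraphraser: fixed synonym substitutions (hypothetical).
SYNONYMS = {
    "password": ["password", "passcode"],
    "verify": ["verify", "confirm"],
}


def generate_variants(message):
    """Yield every combination of synonym substitutions for the message."""
    options = [SYNONYMS.get(word, [word]) for word in message.split()]
    for combo in product(*options):
        yield " ".join(combo)


def red_team(seed):
    """Return variants that evade the classifier, labeled for retraining."""
    return [
        (variant, "phishing")
        for variant in generate_variants(seed)
        if not classifier_flags(variant)
    ]


evasions = red_team("enter your password to verify your account")
```

Each evading variant comes back already labeled, so it can be fed straight into the next retraining set; the ratio of evasions to total variants gives a crude robustness measure.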

The Signal That Models Alone Will Miss

All of the architectural improvements above will make automated detection faster, more accurate, and more resistant to known evasion methods. None of them addresses the fundamental knowledge problem: by definition, a brand-new campaign has no training examples.
This is the gap that community reporting fills, and it is why platforms like Scam Alerts are structurally complementary to more sophisticated ML systems rather than replaced by them. A user who submits a suspicious URL to Scam Alerts.com is producing a ground-truth signal for a campaign that may have launched only hours ago and has no classifier training history. That signal propagates immediately as a heavily weighted feature in the composite risk score, delivering coverage that no amount of model sophistication can create out of thin air.
The architecture of the future is a feedback loop: ML models surface candidates for human review and community verification; community reports supply labeled examples that train the models; retrained models detect zero-day campaigns better and surface more novel campaigns for community verification. Each component strengthens the other.
The organizations building toward this architecture, with automated signals and community intelligence reinforcing each other in a continuous cycle, are the ones creating systems that can actually keep pace with an adversary that builds new systems every day, operates at industrial scale, and tests its evasion tactics against your detection systems before launch.

DEV Community