James Smith

Posted on Apr 9

Case Study: How a Scam Checker Prevented a Large-Scale Fraud Attempt

#webdev #ai #security #cybersecurity

Inside the 11-hour detection window that halted an organized infrastructure fraud scheme aimed at 40,000 potential victims.
At 06:14 on a Thursday morning in February 2025, a single URL submission triggered an automated escalation warning in a scam-detection system. The URL pointed to what appeared to be a peer-to-peer energy trading marketplace: a fairly legitimate-looking site with live pricing charts, a white paper, and an onboarding flow that, although still at an early design stage, offered new users an introductory rate of 12% per year on investments in energy tokens.
The link had been posted to a Telegram channel with 40,000 subscribers, and the submitting user was one of its recipients. They did not know whether it was real, so they ran it through the checker just in case.
Eleven hours and eight minutes later, at 17:22 the same day, the platform had identified the fraud, validated it, and propagated a block on 23 related domains; traced the campaign to an established fraud infrastructure operator; and supplied intelligence to three national cybercrime units. None of the platform's users who checked the URL before acting on it completed any verified financial transaction.
What follows is a technical account of that eleven-hour window: which signals the detection pipeline received, how a single classifier hit grew into a coordinated takedown, and what the case says about the structure of modern fraud prevention at scale.

The Campaign: What the Fraudsters Built.

The campaign was not an improvised, word-of-mouth effort. Before the first public distribution, the infrastructure was assembled over roughly six weeks. Domain registration history, hosting history, and the content deployment schedule, reconstructed from DNS logs, WHOIS data, and cached page versions, revealed a systematic build-out in three phases.
Phase one: domain registration. Twenty-seven domains were registered within a seven-day window across three registrars and two privacy-masking services, with payments routed through a cryptocurrency mixing service. The domains followed a consistent naming pattern that paired energy-sector words with legitimacy-signaling suffixes; words such as "exchange," "verified," "certified," and "network" appeared throughout the set.
Phase two: content deployment. A high-quality web template was rolled out to all domains at the same time, with small cosmetic differences between them to defeat naive deduplication checks. The site had a live pricing feed backed by a real commodity data API, which gave it dynamic, convincingly realistic market data, and a timer that counted down the seconds until the offer expired.
Phase three: distribution. The campaign was distributed across eleven Telegram channels, four subreddits, and two Discord servers, with posting scheduled for time windows when most users are typically online. The largest single distribution point was the 40,000-subscriber Telegram channel, one of whose users submitted the original URL. The operator had bought posting rights in that channel. The entire distribution phase was launched within a 90-minute window. The first scam alert was received 47 minutes after that launch.

The Detection Timeline: 11 Hours, 8 Minutes.

Signal-Level Analysis: Determining What the Classifier Read.

In less than 3 ms, the URL lexical classifier produced an initial risk score of 0.71 for the primary domain, which exceeded its Layer 4 escalation threshold of 0.65. The following is the feature vector it extracted:
url_feature_vector.py — primary domain, 06:14:33 UTC
# Output of url_feature_extractor.py on primary submission
{
    'domain_age_days'        : 43,
    'tld_risk_score'         : 0.62,   # .network TLD
    'brand_in_subdomain'     : False,
    'host_entropy'           : 3.91,   # above 3.8 threshold
    'special_char_count'     : 4,
    'is_ip_host'             : False,
    'path_depth'             : 3,
    'has_redirect_param'     : False,
    'price_claim_in_url'     : True,   # 'certified' detected
    'financial_vocab_density': 0.44,   # high for URL alone
    'composite_risk_score'   : 0.71
}

The financial_vocab_density feature is worth singling out. It measures the share of the URL path occupied by tokens that appear in a curated vocabulary of financial-sector terms: certified, yield, verified, returns, and exchange. A URL vocabulary density as high as 0.44 is in itself a statistically significant warning sign, because a legitimate financial services domain generally has no need to cram its URL with credibility-signaling words.
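To make the feature concrete, here is a minimal sketch of how a vocabulary density and a host entropy could be computed. It is not the platform's extractor: the token-count definition of density, the use of Shannon entropy over hostname characters, and the five-word vocabulary (taken from the examples named above) are all assumptions for illustration.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed vocabulary: only the five example terms named in the text.
FINANCIAL_VOCAB = {"certified", "yield", "verified", "returns", "exchange"}


def financial_vocab_density(url: str) -> float:
    """Fraction of alphabetic tokens in the URL path found in the vocabulary.

    Assumption: density is counted per token, not per character.
    """
    path = urlparse(url).path.lower()
    tokens = re.findall(r"[a-z]+", path)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FINANCIAL_VOCAB)
    return hits / len(tokens)


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (assumed meaning of host_entropy)."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    url = "https://example.com/verified/exchange/login/start"
    print(financial_vocab_density(url))              # 2 of 4 path tokens match
    print(shannon_entropy(urlparse(url).hostname))   # entropy of "example.com"
```

A per-token ratio keeps the feature cheap enough for a lexical pass that runs before any page fetch; the real feature may weight tokens or normalize differently.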

The cluster was discovered using DOM fingerprinting.

The cross-domain structural fingerprinting step was the most technically important one, because it identified all 27 domains as a coordinated cluster at T+17 minutes. The fingerprinting method computes a normalized hash of the DOM's structural skeleton: the tag hierarchy, the pattern of class names, the sequence of form fields, and the order of script loading. It does not rely on surface-level content such as text, images, and color schemes.
Two sites with the same branding but different DOM fingerprints are different sites. Two websites with entirely different looks but identical DOM fingerprints are almost certainly built from the same template, which in a fraud case means the same operator. A 94 percent fingerprint match rate across 22 of the 27 domains, together with a shared third-party API key hardcoded into the page JavaScript, was enough to confirm the cluster to an evidentiary standard.
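The idea of hashing structure while ignoring content can be sketched with the Python standard library. This is an illustrative simplification, not the platform's fingerprinting code: the `SkeletonParser` class, the choice of SHA-256, and the exact skeleton encoding are assumptions, and an exact-match hash like this cannot produce a partial score such as the 94 percent figure above, which would need a similarity measure instead.

```python
import hashlib
from html.parser import HTMLParser

FORM_FIELD_TAGS = {"input", "select", "textarea"}


class SkeletonParser(HTMLParser):
    """Records tags, sorted class names, and form-field names in document order.

    Text, styles, and attribute values such as script URLs are ignored, so
    script order is captured only by where <script> tags appear.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Sort class names so "hero main" and "main hero" normalize identically.
        classes = ".".join(sorted((attr.get("class") or "").split()))
        part = f"{tag}[{classes}]"
        if tag in FORM_FIELD_TAGS:
            part += f"#{attr.get('name') or ''}"
        self.parts.append(part)


def dom_fingerprint(html: str) -> str:
    """SHA-256 over the normalized structural skeleton of an HTML document."""
    parser = SkeletonParser()
    parser.feed(html)
    parser.close()
    skeleton = "|".join(parser.parts)
    return hashlib.sha256(skeleton.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    site_a = '<div class="hero main"><form><input name="email"></form></div>'
    site_b = '<div class="main hero" style="color:red"><form>Hi<input name="email"></form></div>'
    print(dom_fingerprint(site_a) == dom_fingerprint(site_b))  # same skeleton
```

Because only structure is hashed, cosmetic changes between domains (copy, colors, reordered class names) leave the fingerprint unchanged, which is the property the cluster detection relied on.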

The payment flow anomaly

The final signal was the payment processor check at T+54 minutes. Each domain was run in a headless Chromium instance through the full checkout process, adding a product to the cart and proceeding to payment, with the JavaScript network layer configured to log all POST destinations. No known payment processor SDK was present on any of the 27 domains, and card data was being POSTed to subdomains of the attacker's own infrastructure. The resulting HIGH CONFIDENCE fraud verdict did not rely on any single signal; it came from six independent signal layers pointing in the same direction at the same time.
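The decision logic behind such a check can be sketched separately from the browser automation. The example below assumes the POST URLs from the card-entry step have already been captured (in the case above, by headless Chromium) and only classifies their destinations. The function names and the three-entry processor list are hypothetical and purely illustrative; a production allowlist would be far longer and maintained separately.

```python
from urllib.parse import urlparse

# Illustrative allowlist only, not the platform's actual list.
KNOWN_PROCESSOR_DOMAINS = {"stripe.com", "paypal.com", "adyen.com"}


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_post_destinations(post_urls, site_domain):
    """Bucket captured POST URLs into processor, first-party, and other hosts."""
    buckets = {"processor": [], "first_party": [], "other": []}
    for url in post_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host_matches(host, d) for d in KNOWN_PROCESSOR_DOMAINS):
            buckets["processor"].append(url)
        elif host_matches(host, site_domain):
            buckets["first_party"].append(url)
        else:
            buckets["other"].append(url)
    return buckets


def payment_flow_anomalous(post_urls, site_domain) -> bool:
    """Flag a checkout that never reaches a known processor but POSTs
    payment-step data to the site's own domain or subdomains."""
    buckets = classify_post_destinations(post_urls, site_domain)
    return not buckets["processor"] and bool(buckets["first_party"])


if __name__ == "__main__":
    captured = ["https://pay.energy-example.test/submit-card"]
    print(payment_flow_anomalous(captured, "energy-example.test"))
```

On its own this flag is only one layer; as in the case above, it is meant to be combined with the other signals rather than used as a standalone verdict.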

Composite Signal Matrix: Primary Domain Verdict.

There are four technical lessons from this case.

1. Cluster analysis is stronger than individual URL analysis.

The 27-domain cluster was discovered in 17 minutes, not because any single domain was obviously fraudulent, but because fingerprinting linked the domains together. An individual check on any one of them could have returned a moderate rather than a high risk rating. Treating the domains as a graph multiplied the strength of each individual indicator across the whole network.
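One simple way to express this graph view is to connect any two domains that share an indicator and take connected components. The sketch below uses union-find for that; the indicator strings (`fp:` for a fingerprint, `key:` for an API key) and the `.test` domains are made up for illustration, and the source does not say which graph algorithm the platform actually uses.

```python
from collections import defaultdict


def cluster_domains(indicators):
    """Group domains that share at least one indicator, directly or transitively.

    indicators: dict mapping domain -> set of indicator strings.
    Returns a list of domain sets, largest cluster first.
    """
    parent = {domain: domain for domain in indicators}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Invert the mapping: indicator -> domains that exhibit it.
    by_indicator = defaultdict(list)
    for domain, values in indicators.items():
        for value in values:
            by_indicator[value].append(domain)

    # Every domain sharing an indicator joins the same component.
    for domains in by_indicator.values():
        for other in domains[1:]:
            union(domains[0], other)

    clusters = defaultdict(set)
    for domain in indicators:
        clusters[find(domain)].add(domain)
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    observed = {
        "a.test": {"fp:1", "key:X"},
        "b.test": {"fp:1"},
        "c.test": {"key:X"},
        "d.test": {"fp:9"},
    }
    print(cluster_domains(observed))
```

Note that `b.test` and `c.test` share nothing directly; they end up in one cluster only through `a.test`, which is how weak per-domain evidence adds up across a network.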

2. Payment flow execution is irreplaceable as a late-stage signal.

TLD scoring and WHOIS age checks are fast but easy to evade. An operator who registers domains 90 days in advance on a mainstream TLD will pass surface-level tests. The missing-payment-processor check, which requires executing the checkout flow in a headless browser, cannot be evaded without integrating a real payment processor, and that would tie the operator to a traceable financial identity. It is the one indicator that imposes a real cost on evasion.

3. Community timing is a signal on its own and not merely a source of data.

The 14 community reports that arrived within 2.5 hours of the platform's launch were not only confirmatory data; their velocity was informative in itself. A legitimate financial platform does not attract 14 fraud reports in the first two and a half hours of its existence. Measured against a Poisson model of report rates for legitimate sites, the observed arrival rate was a 9-sigma anomaly. Time-windowed report velocity is now a first-class feature of the production classifier.
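The velocity test can be sketched as a Poisson tail probability converted to a sigma equivalent. The source does not state the baseline report rate for legitimate sites, so the rate used in the demo below is a made-up placeholder, and the function names are hypothetical. The tail is summed directly, since computing it as 1 minus the CDF would lose all precision at probabilities this small.

```python
import math
from statistics import NormalDist


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), with lam > 0, summed term by term."""
    if k <= 0:
        return 1.0
    # First term: P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    j = k
    while True:
        total += term
        j += 1
        term *= lam / j  # P(X = j) from P(X = j - 1)
        # Stop once past the mode and the remaining terms are negligible.
        if j > lam and term <= total * 1e-17:
            return total


def sigma_equivalent(p: float) -> float:
    """One-sided z-score whose upper-tail probability equals p (0 < p < 1)."""
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    baseline_rate = 0.5  # hypothetical expected reports per 2.5-hour window
    p = poisson_tail(14, baseline_rate)
    print(f"tail probability: {p:.3e}")
    print(f"sigma equivalent: {sigma_equivalent(p):.2f}")
```

The sigma figure depends entirely on the assumed baseline, so the demo output should not be read as reproducing the 9-sigma result from the case.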

4. Attribution speeds up takedown, but protection does not depend on it.

Threat attribution was completed at T+5h, after the high-confidence fraud verdict had already been reached and the blocklist was already propagating. Attribution is valuable to law enforcement and for anticipating future campaigns by the same operator, but it is not on the critical path for protecting victims. The detection architecture is deliberately designed to separate the two goals: protect users fast, and attribute carefully.

What this case shows about real-time fraud prevention.

The target victim pool was the 40,000 subscribers of that Telegram channel. The objective of the campaign was to turn a small percentage of them, even 0.5 percent (200 people), into financial victims before it could be detected. The eleven-hour window cut that conversion opportunity to near zero, because the first user to run a check immediately set off a detection cascade that protected every later user.
This is the network effect of community-based scam detection. One cautious user's 30-second check, running the URL through a site like ScamAlerts before acting on it, did not just save that user. It activated an automated pipeline that safeguarded the whole downstream population. The detection system is built so that a single honest signal, combined with enough independent verification layers, is sufficient to trigger a coordinated response.
The architecture outlined in this case study (layered signal extraction, domain cluster analysis, payment flow execution, and community report velocity modeling) is the working basis of platforms such as Scam Alerts.com. The case shows not merely that these systems are effective but specifically why: they combine the speed of automated analysis with the depth and reach of community intelligence, in a way that neither achieves alone.

Reading materials and technical resources.

Check a suspicious URL, domain, or phone number in real time at Scam Alerts.com, the site whose detection architecture this case study describes.

DEV Community