TwinShield: How We Built a Living Fraud Detection System with Digital Twins and MongoDB

Authors: Tupurani Sree Rama Akshaj, Bhuvanesh Naidu, Aakash Samudrala, Chandravadan Rao

What TwinShield Actually Is
The core idea behind TwinShield is that fraud doesn't happen in isolation. A single suspicious transaction tells you something, but what tells you much more is how different that transaction is from everything that user has ever done before.
That's what the Digital Twin layer does. Every user in the system has a living document — a profile that tracks their average transaction amount, the devices they typically use, the locations they transact from, their total history, their anomaly count, and a rolling risk score. Every time a new transaction comes in, that document updates itself.
So when the AI engine flags something as suspicious, you're not just getting a score in isolation. You're getting a verdict that's being compared against a continuously evolving baseline of who that user actually is.
That framing — a digital twin of a bank user — is what made MongoDB the right call. A twin isn't a set of rows across four tables. It's one coherent thing that you read and write as a unit.

The Two Collections That Run Everything
We kept the data model simple on purpose. Two collections: transactions and user_profiles.
transactions is pretty much what it sounds like. Every financial event that comes through the system lands here as one document — the user ID, the amount, the timestamp, the device, whether that device is trusted, the location, the IP. Once the AI engine finishes scoring it, we also write the anomaly score, the risk level (LOW, MEDIUM, or HIGH), and a boolean flag for whether it's been classified as an anomaly.
Everything in one place. No joins to run when the dashboard needs to pull recent anomalies.
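As a rough sketch of that shape (field names here are illustrative, not necessarily the exact ones in the repo), the whole thing maps onto a single Spring Data entity:

```java
import java.time.Instant;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

// Illustrative sketch of a transaction document; real field names may differ.
@Document(collection = "transactions")
public class Transaction {
    @Id
    private String id;
    private String userId;
    private double amount;
    private Instant timestamp;
    private String deviceId;
    private boolean trustedDevice;
    private String location;
    private String ipAddress;
    // Written after the AI engine finishes scoring the transaction
    private double anomalyScore;
    private String riskLevel;   // LOW, MEDIUM, or HIGH
    private boolean anomaly;    // classified as an anomaly or not
    // getters and setters omitted for brevity
}
```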

user_profiles is where the actual Digital Twin lives. This document is never static — it rebuilds itself after every transaction:
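Roughly, the twin document looks like this (a hedged sketch: typicalDevices, typicalLocations, overallRiskScore, and peakAnomalyScore are the fields discussed in this post; the remaining names are illustrative):

```java
import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

// Sketch of the Digital Twin document; one document per user.
@Document(collection = "user_profiles")
public class UserProfile {
    @Id
    private String userId;
    private double avgTransactionAmount;    // rolling average, recalculated after every transaction
    private List<String> typicalDevices;    // devices this user normally transacts from
    private List<String> typicalLocations;  // locations this user normally transacts from
    private long totalTransactions;
    private long anomalyCount;
    private double overallRiskScore;        // rolling risk score for the twin
    private double peakAnomalyScore;        // added mid-project, no migration needed
    // getters and setters omitted for brevity
}
```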

The rolling average calculation in TransactionService runs after every transaction and writes the updated values back. The twin just... keeps learning.
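A minimal sketch of that update step, assuming an incremental mean (the real TransactionService may compute it differently):

```java
// Inside TransactionService (sketch): fold the new transaction into the twin and save it back.
public void updateTwin(UserProfile profile, Transaction tx, double anomalyScore, boolean isAnomaly) {
    long newCount = profile.getTotalTransactions() + 1;
    // Incremental mean: newAvg = oldAvg + (amount - oldAvg) / newCount
    double newAvg = profile.getAvgTransactionAmount()
            + (tx.getAmount() - profile.getAvgTransactionAmount()) / newCount;

    profile.setTotalTransactions(newCount);
    profile.setAvgTransactionAmount(newAvg);
    if (isAnomaly) {
        profile.setAnomalyCount(profile.getAnomalyCount() + 1);
    }
    profile.setPeakAnomalyScore(Math.max(profile.getPeakAnomalyScore(), anomalyScore));

    userProfileRepository.save(profile); // one document, one write
}
```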

Why This Wouldn't Have Worked as Well in a Relational DB
Take typicalDevices and typicalLocations. In MySQL, those would have needed their own lookup tables. Every time we wanted to check whether a user's current device is in their known-device list, we'd be running a join. Every time we wanted to update that list, we'd be touching a separate table.
In MongoDB, they're just arrays in the same document. We read them, check if the current device is in there, and update the list — in the same operation, touching one document.
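In code, that check-and-update stays a few lines against the same document (a sketch, not the exact implementation):

```java
// Both the membership check and the update touch the same user_profiles document.
if (!profile.getTypicalDevices().contains(tx.getDeviceId())) {
    profile.getTypicalDevices().add(tx.getDeviceId());
}
if (!profile.getTypicalLocations().contains(tx.getLocation())) {
    profile.getTypicalLocations().add(tx.getLocation());
}
userProfileRepository.save(profile);
```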

That simplicity adds up fast when you're doing this for every transaction, in real time.
And then there's the schema flexibility thing. Midway through development we decided to add peakAnomalyScore to the user profile. In MongoDB, we added the field and started writing to it. Old documents just returned null for that field until they got their next update. No migration script, no ALTER TABLE, no downtime, no drama. That happened a couple more times over the course of the project. Each time: zero overhead.

The Spring Data Repository Layer
Because we used Spring Data MongoDB rather than JPA or Hibernate, the repository layer stayed very clean.
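A sketch of what those interfaces look like (findByOverallRiskScoreGreaterThan is the query the dashboard uses later in this post; the other method names are illustrative):

```java
import java.util.List;
import org.springframework.data.mongodb.repository.MongoRepository;

// Each interface lives in its own file; no implementation classes are written by hand.
public interface TransactionRepository extends MongoRepository<Transaction, String> {
    List<Transaction> findByUserIdOrderByTimestampDesc(String userId);
    List<Transaction> findByRiskLevel(String riskLevel);
    List<Transaction> findByAnomalyScoreGreaterThan(double minScore);
}

public interface UserProfileRepository extends MongoRepository<UserProfile, String> {
    List<UserProfile> findByOverallRiskScoreGreaterThan(double threshold);
}
```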
Spring Data just translates those method names into MongoDB queries. There's no query language to maintain separately, no XML mapping, no annotation soup. It reads almost like pseudocode, which made the backend much easier to reason about.

The AI Engine: Isolation Forest
For anomaly detection, we went with Isolation Forest from scikit-learn, running inside a Python Flask service that the Spring Boot backend calls over HTTP.
The intuition behind Isolation Forest is pretty elegant: if you randomly partition a dataset over and over, anomalous data points — the rare, structurally weird ones — get isolated from the rest faster than normal ones. The model measures how quickly each point gets cut off, and that isolation speed becomes its anomaly score.
For fraud detection specifically, this is a good fit. Fraud is by definition the outlier — the transaction that doesn't look like anything else in the user's history.
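The integration itself is just an HTTP round trip from Spring Boot to Flask. Here's a sketch of the call, assuming a /score endpoint that returns the normalized anomaly score as JSON (the endpoint path and response key are assumptions):

```java
import java.util.Map;
import org.springframework.web.client.RestClientException;
import org.springframework.web.client.RestTemplate;

// Sketch of the client the backend uses to reach the Flask AI service.
public class AiScoringClient {

    private final RestTemplate restTemplate = new RestTemplate();
    private final String scoreUrl = "http://localhost:5000/score"; // assumed endpoint

    /** Returns the anomaly score from the AI service, or null if the service is unreachable. */
    public Double score(Map<String, Object> features) {
        try {
            Map<?, ?> response = restTemplate.postForObject(scoreUrl, features, Map.class);
            Object score = (response == null) ? null : response.get("anomaly_score"); // assumed key
            return (score == null) ? null : ((Number) score).doubleValue();
        } catch (RestClientException e) {
            return null; // caller falls back to the rule engine described below
        }
    }
}
```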

The Six Features We Feed It
We don't pass raw transaction data to the model. We engineer six features first:

The raw score from the model gets normalized to 0–1. Above 0.65 becomes MEDIUM risk. Above 0.80 is HIGH.
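That thresholding is a small piece of backend logic; roughly (assuming anything at or below 0.65 stays LOW):

```java
// Map the normalized 0-1 anomaly score onto the three risk levels.
public static String riskLevel(double normalizedScore) {
    if (normalizedScore > 0.80) return "HIGH";
    if (normalizedScore > 0.65) return "MEDIUM";
    return "LOW";
}
```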
At startup, the model pre-trains on 600 synthetic normal transactions so it has a baseline before any real data comes in. After that, you can retrain it through the /train endpoint as real transactions accumulate.

The Fallback: Because Things Break
One thing we were pretty deliberate about is that the Flask AI service runs as a separate process. If it goes down, we didn't want the whole system to fail with it.
So inside the Spring Boot backend, we built a Java-based rule engine as a fallback:
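The shape of that fallback is roughly the following (the individual rules, weights, and thresholds here are illustrative, not the exact ones in the repo):

```java
import java.time.ZoneOffset;

// Fallback rule engine used when the Flask AI service is unreachable (sketch).
public class RuleBasedFallback {

    // Each rule adds weight; the weights and thresholds below are illustrative.
    public double fallbackScore(Transaction tx, UserProfile profile) {
        double score = 0.0;
        int hour = tx.getTimestamp().atZone(ZoneOffset.UTC).getHour();

        if (tx.getAmount() > profile.getAvgTransactionAmount() * 3) score += 0.4;     // unusually large transfer
        if (hour < 6) score += 0.2;                                                   // unusual hour
        if (!profile.getTypicalDevices().contains(tx.getDeviceId())) score += 0.25;   // unknown device
        if (!profile.getTypicalLocations().contains(tx.getLocation())) score += 0.25; // unknown location

        return Math.min(score, 1.0); // scored on the same 0-1 scale as the AI engine
    }
}
```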

When the AI call fails, the backend catches the exception and routes through this instead. Every transaction still gets a verdict. The system degrades gracefully rather than going down.
It's not as sophisticated as the Isolation Forest model, but it covers the obvious cases and it keeps the whole thing alive under failure conditions. We'd rather have a slightly less accurate system than a broken one.

The Simulation Engine
We built a simulation engine that injects four types of attack patterns on demand:

Large night transfers — High-value transactions during unusual hours
Untrusted device attacks — Transactions from a device not in the user's history
Geo-suspicious transactions — Locations the user has never transacted from
Combined attacks — All three signals at once

Every simulated transaction runs through the exact same pipeline as a real one: the AI engine scores it, MongoDB stores it, and the user's Digital Twin updates. You can trigger a fraud scenario in seconds and watch the dashboard react.
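A sketch of what building a combined-attack transaction might look like (the generator details are illustrative):

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.UUID;

// Build a "combined attack": large amount, night-time, unknown device, unknown location (sketch).
public Transaction buildCombinedAttack(UserProfile profile) {
    Transaction tx = new Transaction();
    tx.setUserId(profile.getUserId());
    tx.setAmount(profile.getAvgTransactionAmount() * 10);                    // far above the twin's baseline
    tx.setTimestamp(LocalDate.now().atTime(3, 0).toInstant(ZoneOffset.UTC)); // 3 AM transfer
    tx.setDeviceId("sim-device-" + UUID.randomUUID());                       // never seen before
    tx.setTrustedDevice(false);
    tx.setLocation("sim-unknown-location");                                  // not in typicalLocations
    return tx; // submitted through the normal scoring pipeline, not a side channel
}
```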
This turned out to be genuinely important. When we were showing the system to people, being able to hit a button and watch it detect a combined attack in real time was far more convincing than showing static test logs. Demos matter.

The Dashboard: Eight-Second Polling
The React frontend polls the summary endpoint every eight seconds. One API call returns total transaction count, anomaly count and rate, active user count, and how many Digital Twins have crossed into high-risk territory.
For the detailed view, findByOverallRiskScoreGreaterThan(0.7) pulls all user profiles where accumulated evidence has pushed the rolling risk score into dangerous territory — users whose Digital Twin has drifted far from their normal baseline.
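A sketch of the summary side of that (the endpoint path and payload keys are assumptions; anomaly rate and active-user counts are omitted here for brevity):

```java
import java.util.Map;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Sketch of the endpoint the React dashboard polls every eight seconds.
@RestController
public class DashboardController {

    private final TransactionRepository transactions;
    private final UserProfileRepository profiles;

    public DashboardController(TransactionRepository transactions, UserProfileRepository profiles) {
        this.transactions = transactions;
        this.profiles = profiles;
    }

    @GetMapping("/api/dashboard/summary") // assumed path
    public Map<String, Object> summary() {
        return Map.of(
                "totalTransactions", transactions.count(),
                "highRiskUsers", profiles.findByOverallRiskScoreGreaterThan(0.7).size()
        );
    }
}
```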
It's not fancy, but it's live and it's real. Watching a user's risk score climb in real time after a simulated attack sequence is oddly satisfying.

What We Actually Learned
The biggest shift was thinking about data as documents rather than rows. Once you make that switch, the Digital Twin concept clicks into place naturally. A user's behavioural profile is a coherent thing — you want to read it whole, update it whole, and query it directly. It maps to a document much better than it maps to a set of relational tables.
The schema flexibility sounds like a minor convenience until you're mid-project and you realize you need to add a field. Then it becomes one of the most valuable things about MongoDB. We weren't writing migration scripts. We were building features.
If you're working on anything that involves evolving user profiles, real-time scoring, or live behavioural modelling, the document model is worth thinking seriously about. For us, the fit between MongoDB and the Digital Twin architecture wasn't coincidental. It was the right tool for what we were actually trying to build.

What's Next
Replacing the polling dashboard with MongoDB Aggregation Pipelines for server-side time-series fraud trends
Exploring Atlas Vector Search for semantic similarity queries across past transactions
A scheduled retraining pipeline that pulls fresh MongoDB data and updates the Isolation Forest model automatically

Special Thanks
A huge thank you to our mentor @chanda_rajkumar for the guidance, the critical feedback, and for pushing us to think more carefully about the architecture at every stage. This project is significantly better because of his involvement.

Resources
GitHub: https://github.com/SriAkshaj-720/TwinShieldV2.git
Spring Data MongoDB: docs.spring.io/spring-data/mongodb
Isolation Forest (scikit-learn): scikit-learn.org
MongoDB Atlas: mongodb.com/atlas
YouTube link:
