<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akshaj Sri</title>
    <description>The latest articles on DEV Community by Akshaj Sri (@akshaj_sri_c502f4da482948).</description>
    <link>https://dev.to/akshaj_sri_c502f4da482948</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3899917%2F8debecb3-ba61-40ee-8910-8b32e11021f5.jpg</url>
      <title>DEV Community: Akshaj Sri</title>
      <link>https://dev.to/akshaj_sri_c502f4da482948</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akshaj_sri_c502f4da482948"/>
    <language>en</language>
    <item>
      <title>TwinShield: How We Built a Living Fraud Detection System with Digital Twins and MongoDB</title>
      <dc:creator>Akshaj Sri</dc:creator>
      <pubDate>Mon, 27 Apr 2026 15:57:23 +0000</pubDate>
      <link>https://dev.to/akshaj_sri_c502f4da482948/twinshield-how-we-built-a-living-fraud-detection-system-with-digital-twins-and-mongodb-1ng5</link>
      <guid>https://dev.to/akshaj_sri_c502f4da482948/twinshield-how-we-built-a-living-fraud-detection-system-with-digital-twins-and-mongodb-1ng5</guid>
      <description>&lt;p&gt;Authors: Tupurani Sree Rama Akshaj,Bhuvanesh Naidu, Aakash Samudrala, Chandravadan Rao&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What TwinShield Actually Is&lt;/strong&gt;&lt;br&gt;
The core idea behind TwinShield is that fraud doesn't happen in isolation. A single suspicious transaction tells you something, but what tells you much more is how different that transaction is from everything that user has ever done before.&lt;br&gt;
That's what the Digital Twin layer does. Every user in the system has a living document — a profile that tracks their average transaction amount, the devices they typically use, the locations they transact from, their total history, their anomaly count, and a rolling risk score. Every time a new transaction comes in, that document updates itself.&lt;br&gt;
So when the AI engine flags something as suspicious, you're not just getting a score in isolation. You're getting a verdict that's being compared against a continuously evolving baseline of who that user actually is.&lt;br&gt;
That framing — a digital twin of a bank user — is what made MongoDB the right call. A twin isn't a set of rows across four tables. It's one coherent thing that you read and write as a unit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90meaq5romq17ci89e59.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90meaq5romq17ci89e59.png" alt=" " width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Two Collections That Run Everything&lt;/strong&gt;&lt;br&gt;
We kept the data model simple on purpose. Two collections: &lt;strong&gt;transactions&lt;/strong&gt; and &lt;strong&gt;user_profiles.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;transactions&lt;/strong&gt; is pretty much what it sounds like. Every financial event that comes through the system lands here as one document — the user ID, the amount, the timestamp, the device, whether that device is trusted, the location, the IP. Once the AI engine finishes scoring it, we also write the anomaly score, the risk level (LOW, MEDIUM, or HIGH), and a boolean flag for whether it's been classified as an anomaly.&lt;br&gt;
Everything in one place. No joins to run when the dashboard needs to pull recent anomalies.&lt;/p&gt;
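&lt;p&gt;As a rough sketch, a single &lt;strong&gt;transactions&lt;/strong&gt; document might look like the following. The field names here are assumptions based on the description above, not the exact production schema:&lt;/p&gt;

```python
# Illustrative shape of one document in the `transactions` collection.
# All field names are assumptions inferred from the prose, not the real schema.
transaction = {
    "userId": "user_42",
    "amount": 1250.00,
    "timestamp": "2026-04-27T15:57:23Z",
    "deviceId": "android-pixel-7",
    "trustedDevice": True,
    "location": "Hyderabad, IN",
    "ip": "203.0.113.7",
    # Written back after the AI engine scores the transaction:
    "anomalyScore": 0.12,
    "riskLevel": "LOW",      # LOW, MEDIUM, or HIGH
    "isAnomaly": False,
}
```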

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjquy2yk62fd5dru3isgc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjquy2yk62fd5dru3isgc.png" alt=" " width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenp86ggtng6695ht0do1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenp86ggtng6695ht0do1.png" alt=" " width="744" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;user_profiles&lt;/strong&gt; is where the actual Digital Twin lives. This document is never static — it rebuilds itself after every transaction:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnaeizaracziadizl1k0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnaeizaracziadizl1k0r.png" alt=" " width="749" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The rolling average calculation in &lt;strong&gt;TransactionService&lt;/strong&gt; runs after every transaction and writes the updated values back. The twin just... keeps learning.&lt;/p&gt;
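&lt;p&gt;One common way to implement that rolling average is an incremental mean, which avoids rescanning the user's whole history on every transaction. This is a sketch of the idea, not the exact TransactionService code:&lt;/p&gt;

```python
def update_rolling_average(current_avg: float, count: int, new_amount: float) -> float:
    """Incremental mean: fold one new transaction into the existing average
    without re-reading the user's full transaction history."""
    return (current_avg * count + new_amount) / (count + 1)

# The average of [100, 200] is 150; folding in 300 yields 200.
assert update_rolling_average(150.0, 2, 300.0) == 200.0
```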

&lt;p&gt;&lt;strong&gt;Why This Wouldn't Have Worked as Well in a Relational DB&lt;/strong&gt;&lt;br&gt;
Take &lt;strong&gt;typicalDevices&lt;/strong&gt; and &lt;strong&gt;typicalLocations&lt;/strong&gt;. In MySQL, those would have needed their own lookup tables. Every time we wanted to check whether a user's current device is in their known-device list, we'd be running a join. Every time we wanted to update that list, we'd be touching a separate table.&lt;br&gt;
In MongoDB, they're just arrays in the same document. We read them, check if the current device is in there, and update the list — in the same operation, touching one document.&lt;/p&gt;
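&lt;p&gt;In Python terms, the membership check is a plain list lookup on the profile document, and the write-back is a single-document update. The field names and the &lt;strong&gt;$addToSet&lt;/strong&gt; usage below are illustrative assumptions about how such an update could be expressed:&lt;/p&gt;

```python
profile = {"userId": "user_42", "typicalDevices": ["android-pixel-7"]}

def is_known_device(profile: dict, device_id: str) -> bool:
    """Check the embedded array directly -- no join, no second table."""
    return device_id in profile.get("typicalDevices", [])

# The corresponding single-document MongoDB update; $addToSet appends the
# device only if it is not already present in the array.
update_spec = {"$addToSet": {"typicalDevices": "new-laptop"}}
```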

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwh2hnw7evlmk0ly2sjsi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwh2hnw7evlmk0ly2sjsi.png" alt=" " width="728" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That simplicity adds up fast when you're doing this for every transaction, in real time.&lt;br&gt;
And then there's the schema flexibility thing. Midway through development we decided to add &lt;strong&gt;peakAnomalyScore&lt;/strong&gt; to the user profile. In MongoDB, we added the field and started writing to it. Old documents just returned null for that field until they got their next update. No migration script, no ALTER TABLE, no downtime, no drama. That happened a couple more times over the course of the project. Each time: zero overhead.&lt;/p&gt;
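&lt;p&gt;The "old documents just return null" behaviour is easy to picture with two profile documents, one written before the field existed and one after (values here are made up):&lt;/p&gt;

```python
# Written before peakAnomalyScore existed -- the field is simply absent.
old_profile = {"userId": "user_1", "overallRiskScore": 0.2}
# Written after the field was introduced.
new_profile = {"userId": "user_2", "overallRiskScore": 0.7, "peakAnomalyScore": 0.91}

# Reads treat the absent field as null/None; no migration needed.
assert old_profile.get("peakAnomalyScore") is None
assert new_profile["peakAnomalyScore"] == 0.91
```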

&lt;p&gt;&lt;strong&gt;The Spring Data Repository Layer&lt;/strong&gt;&lt;br&gt;
Because we used Spring Data MongoDB rather than JPA or Hibernate, the repository layer stayed very clean.&lt;br&gt;
Spring Data translates derived query-method names (such as findByOverallRiskScoreGreaterThan, which the dashboard uses) straight into MongoDB queries. There's no query language to maintain separately, no XML mapping, no annotation soup. It reads almost like pseudocode, which made the backend much easier to reason about.&lt;/p&gt;
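&lt;p&gt;For instance, the findByOverallRiskScoreGreaterThan(0.7) method used for the dashboard maps to a plain MongoDB range filter. The tiny evaluator below mimics how that filter matches a document, just to make the translation concrete:&lt;/p&gt;

```python
# The MongoDB filter that findByOverallRiskScoreGreaterThan(0.7) derives to:
query = {"overallRiskScore": {"$gt": 0.7}}

def matches(doc: dict, query: dict) -> bool:
    """Evaluate a single-field $gt filter the way MongoDB would."""
    field, cond = next(iter(query.items()))
    return doc.get(field, float("-inf")) > cond["$gt"]
```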

&lt;p&gt;&lt;strong&gt;The AI Engine: Isolation Forest&lt;/strong&gt;&lt;br&gt;
For anomaly detection, we went with Isolation Forest from scikit-learn, running inside a Python Flask service that the Spring Boot backend calls over HTTP.&lt;br&gt;
The intuition behind Isolation Forest is pretty elegant: if you randomly partition a dataset over and over, anomalous data points — the rare, structurally weird ones — get isolated from the rest faster than normal ones. The model measures how quickly each point gets cut off, and that isolation speed becomes its anomaly score.&lt;br&gt;
For fraud detection specifically, this is a good fit. Fraud is by definition the outlier — the transaction that doesn't look like anything else in the user's history.&lt;/p&gt;
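&lt;p&gt;A minimal scikit-learn sketch of that intuition, trained on synthetic "normal" amounts (the data and parameters here are illustrative, not the production setup):&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic "normal" transactions: 1-D amounts clustered around 100.
normal = rng.normal(loc=100.0, scale=10.0, size=(500, 1))

model = IsolationForest(random_state=0).fit(normal)

# decision_function: lower scores mean the point was isolated faster,
# i.e. it is more anomalous.
normal_score = model.decision_function([[100.0]])[0]
outlier_score = model.decision_function([[5000.0]])[0]
assert outlier_score < normal_score
```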

&lt;p&gt;&lt;strong&gt;The Six Features We Feed It&lt;/strong&gt;&lt;br&gt;
We don't pass raw transaction data to the model. We engineer six features first:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqodo9ot2440zhp2jx71r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqodo9ot2440zhp2jx71r.png" alt=" " width="681" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft56g1b9crthzwp384er2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft56g1b9crthzwp384er2.png" alt=" " width="746" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The raw score from the model gets normalized to 0–1: up to 0.65 is LOW risk, above 0.65 is MEDIUM, and above 0.80 is HIGH.&lt;br&gt;
At startup, the model pre-trains on 600 synthetic normal transactions so it has a baseline before any real data comes in. After that, you can retrain it through the /train endpoint as real transactions accumulate.&lt;/p&gt;
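&lt;p&gt;The threshold mapping is a few lines of plain code (exact boundary handling at 0.65 and 0.80 is our reading of the rules above):&lt;/p&gt;

```python
def risk_level(score: float) -> str:
    """Map a normalized 0-1 anomaly score to the three risk bands:
    above 0.80 is HIGH, above 0.65 is MEDIUM, everything else is LOW."""
    if score > 0.80:
        return "HIGH"
    if score > 0.65:
        return "MEDIUM"
    return "LOW"
```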

&lt;p&gt;&lt;strong&gt;The Fallback: Because Things Break&lt;/strong&gt;&lt;br&gt;
One thing we were pretty deliberate about is that the Flask AI service is a separate process. If it goes down, we didn't want the whole system to fail.&lt;br&gt;
So inside the Spring Boot backend, we built a Java-based rule engine as a fallback:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuluxkq8ev01dvttatq0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuluxkq8ev01dvttatq0z.png" alt=" " width="752" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the AI call fails, the backend catches the exception and routes through this instead. Every transaction still gets a verdict. The system degrades gracefully rather than going down.&lt;br&gt;
It's not as sophisticated as the Isolation Forest model, but it covers the obvious cases and it keeps the whole thing alive under failure conditions. We'd rather have a slightly less accurate system than a broken one.&lt;/p&gt;
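&lt;p&gt;In spirit, the fallback is a handful of additive rules over the same signals the simulator exercises. The thresholds and weights below are illustrative stand-ins, not the values in the actual Java rule engine:&lt;/p&gt;

```python
def fallback_score(tx: dict) -> float:
    """Rule-based fallback used when the AI service is unreachable.
    Thresholds and weights here are illustrative, not production values."""
    score = 0.0
    if tx["amount"] > 10_000:       # unusually large transfer
        score += 0.4
    if not tx["trustedDevice"]:     # device not in the user's history
        score += 0.3
    if tx.get("hour", 12) < 5:      # middle-of-the-night activity
        score += 0.3
    return min(score, 1.0)
```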

&lt;p&gt;&lt;strong&gt;The Simulation Engine&lt;/strong&gt;&lt;br&gt;
We built a simulation engine that injects four types of attack patterns on demand:&lt;/p&gt;

&lt;p&gt;Large night transfers — High-value transactions during unusual hours&lt;br&gt;
Untrusted device attacks — Transactions from a device not in the user's history&lt;br&gt;
Geo-suspicious transactions — Locations the user has never transacted from&lt;br&gt;
Combined attacks — All three signals at once&lt;/p&gt;

&lt;p&gt;Every simulated transaction runs through the exact same pipeline as a real one: the AI engine scores it, MongoDB stores it, and the user's Digital Twin updates. You can trigger a fraud scenario in seconds and watch the dashboard react.&lt;br&gt;
This turned out to be genuinely important. When we were showing the system to people, being able to hit a button and watch it detect a combined attack in real time was far more convincing than showing static test logs. Demos matter.&lt;/p&gt;
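&lt;p&gt;A simulator along these lines can be as simple as overriding a baseline transaction with per-pattern fields, merging all of them for the combined attack. The field names and values below are hypothetical:&lt;/p&gt;

```python
# Per-pattern overrides applied to a baseline "normal" transaction.
ATTACK_PATTERNS = {
    "large_night_transfer": {"amount": 25_000, "hour": 3},
    "untrusted_device": {"trustedDevice": False},
    "geo_suspicious": {"location": "UNSEEN_LOCATION"},
}

def simulate(pattern: str, user_id: str) -> dict:
    """Build a synthetic transaction for one attack pattern;
    "combined" merges all three signals into a single transaction."""
    base = {"userId": user_id, "amount": 120.0, "hour": 14,
            "trustedDevice": True, "location": "Hyderabad, IN"}
    if pattern == "combined":
        for overrides in ATTACK_PATTERNS.values():
            base.update(overrides)
    else:
        base.update(ATTACK_PATTERNS[pattern])
    return base
```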

&lt;p&gt;&lt;strong&gt;The Dashboard: Eight-Second Polling&lt;/strong&gt;&lt;br&gt;
The React frontend polls the summary endpoint every eight seconds. One API call returns total transaction count, anomaly count and rate, active user count, and how many Digital Twins have crossed into high-risk territory.&lt;br&gt;
For the detailed view, findByOverallRiskScoreGreaterThan(0.7) pulls all user profiles where accumulated evidence has pushed the rolling risk score past 0.7 — users whose Digital Twin has drifted far from their normal baseline.&lt;br&gt;
It's not fancy, but it's live and it's real. Watching a user's risk score climb in real time after a simulated attack sequence is oddly satisfying.&lt;/p&gt;
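&lt;p&gt;The summary endpoint's payload is a straightforward reduction over the two collections. This sketch recomputes it in memory from document lists; the real endpoint and its field names may differ:&lt;/p&gt;

```python
def summarize(transactions: list, profiles: list) -> dict:
    """Compute the dashboard summary counts from the two collections.
    Field names are assumptions mirroring the prose above."""
    anomalies = sum(1 for t in transactions if t["isAnomaly"])
    return {
        "totalTransactions": len(transactions),
        "anomalyCount": anomalies,
        "anomalyRate": anomalies / len(transactions) if transactions else 0.0,
        "activeUsers": len({t["userId"] for t in transactions}),
        # Digital Twins that have crossed into high-risk territory:
        "highRiskTwins": sum(1 for p in profiles if p["overallRiskScore"] > 0.7),
    }
```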

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l69f89kkpu7hee8jggw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l69f89kkpu7hee8jggw.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What We Actually Learned&lt;/strong&gt;&lt;br&gt;
The biggest shift was thinking about data as documents rather than rows. Once you make that switch, the Digital Twin concept clicks into place naturally. A user's behavioural profile is a coherent thing — you want to read it whole, update it whole, and query it directly. It maps to a document much better than it maps to a set of relational tables.&lt;br&gt;
The schema flexibility sounds like a minor convenience until you're mid-project and you realize you need to add a field. Then it becomes one of the most valuable things about MongoDB. We weren't writing migration scripts. We were building features.&lt;br&gt;
If you're working on anything that involves evolving user profiles, real-time scoring, or live behavioural modelling, the document model is worth thinking seriously about. For us, the fit between MongoDB and the Digital Twin architecture wasn't coincidental. It was the right tool for what we were actually trying to build.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzv33dsdqxflsn1319nab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzv33dsdqxflsn1319nab.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;br&gt;
Replacing the polling dashboard with MongoDB Aggregation Pipelines for server-side time-series fraud trends&lt;br&gt;
Exploring Atlas Vector Search for semantic similarity queries across past transactions&lt;br&gt;
A scheduled retraining pipeline that pulls fresh MongoDB data and updates the Isolation Forest model automatically&lt;/p&gt;
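&lt;p&gt;For the server-side trend idea, the shape we have in mind is a standard aggregation pipeline over the transactions collection; stage syntax follows MongoDB's aggregation framework, field names are our assumptions:&lt;/p&gt;

```python
# Hourly fraud-trend aggregation sketch: count anomalies and average their
# scores per hour, server-side, instead of polling and reducing client-side.
fraud_trend_pipeline = [
    {"$match": {"isAnomaly": True}},
    {"$group": {
        "_id": {"$dateTrunc": {"date": "$timestamp", "unit": "hour"}},
        "anomalies": {"$sum": 1},
        "avgScore": {"$avg": "$anomalyScore"},
    }},
    {"$sort": {"_id": 1}},
]
```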

&lt;p&gt;&lt;strong&gt;Special Thanks&lt;/strong&gt;&lt;br&gt;
A huge thank you to our mentor &lt;a class="mentioned-user" href="https://dev.to/chanda_rajkumar"&gt;@chanda_rajkumar&lt;/a&gt; for the guidance, the critical feedback, and for pushing us to think more carefully about the architecture at every stage. This project is significantly better because of his involvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/SriAkshaj-720/TwinShieldV2.git" rel="noopener noreferrer"&gt;https://github.com/SriAkshaj-720/TwinShieldV2.git&lt;/a&gt;&lt;br&gt;
Spring Data MongoDB: docs.spring.io/spring-data/mongodb&lt;br&gt;
Isolation Forest (scikit-learn): scikit-learn.org&lt;br&gt;
MongoDB Atlas: mongodb.com/atlas&lt;br&gt;
YouTube: &lt;iframe src="https://www.youtube.com/embed/muTlqpD6jdo"&gt;&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>mongodb</category>
      <category>cybersecurity</category>
    </item>
  </channel>
</rss>
