DEV Community

vishalmysore

Fraud Detection with Knowledge Graphs: A Protégé and VidyaAstra Approach

Fraud detection is one of the most common use cases presented for knowledge graph applications. It has already been covered in a variety of articles and implementations across the industry. Here I show how to approach it with Protégé and the VidyaAstra plugin.

Machine learning, neural networks, and other artificial intelligence techniques are commonly used to detect credit card fraud and money laundering schemes. However, these approaches have significant limitations: they rely on statistical models, which attackers can fool with synthetic data sets or with deliberately crafted inputs known as adversarial attacks.

Graph databases and knowledge graphs are a strong fit for fraud detection.

They hold a large amount of interconnected data, so even if one piece of information is incorrect or missing, the system can still identify fraudulent patterns through relationship analysis.

Knowledge graphs can be used to detect:

  • Fake identities (e.g., people trying to open accounts with fake ID cards)
  • Credit card fraud (e.g., someone making purchases with a stolen credit card)
  • Money laundering (e.g., circular money flows and layering schemes)

This article presents my approach to building fraud detection knowledge graphs using Protégé and the VidyaAstra plugin, based on industry-standard techniques discussed in fraud detection literature.


Table of Contents

  1. What is a Knowledge Graph?
  2. Why Knowledge Graphs for Fraud Detection?
  3. Limitations of Traditional ML Approaches
  4. The Knowledge Graph Advantage
  5. Building a Fraud Detection Knowledge Graph
  6. Real-World Use Case: Circular Money Flow Detection
  7. Using VidyaAstra with Protégé
  8. Graph Algorithms for Fraud Detection
  9. Getting Started

What is a Knowledge Graph?

A knowledge graph is a database of facts and relations between different entities.

A knowledge graph can be used to represent the world: its objects are concepts or physical things, together with their attributes, relationships, and metadata. For example, a financial institution could have a knowledge graph containing information about its customers, accounts, loans, transactions, and employees. The institution might also have a separate knowledge graph containing information about its offices and locations.

In fraud detection, a knowledge graph represents:

  • Entities: Accounts, customers, transactions, merchants, IP addresses, devices, locations
  • Relationships: sends_money_to, owns_account, uses_device, located_at, shares_email_with
  • Attributes: transaction amounts, timestamps, risk scores, customer profiles

Graph databases are a natural way of building a knowledge graph because they store relations between entities efficiently. Each entity is represented as a node, and each relationship between two entities as an edge between them. This representation lets us run graph algorithms on the knowledge graph to answer questions such as whether a user is likely fraudulent or whether a transaction pattern indicates money laundering.
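
To make the node-and-edge idea concrete, here is a minimal sketch in Python that stores facts as (subject, predicate, object) triples and indexes them by subject. The identifiers are made up for illustration, the predicate names follow the relationship list above, and a real system would use a triple store or graph database rather than an in-memory dictionary.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple; all identifiers are illustrative.
triples = [
    ("Customer_1", "owns_account", "Account_A"),
    ("Customer_2", "owns_account", "Account_B"),
    ("Account_A", "sends_money_to", "Account_B"),
    ("Account_A", "uses_device", "Device_X"),
    ("Account_B", "uses_device", "Device_X"),
]

# Index outgoing edges by subject so that neighbours can be looked up directly.
outgoing = defaultdict(list)
for subject, predicate, obj in triples:
    outgoing[subject].append((predicate, obj))


def neighbours(entity, predicate):
    """Return the objects linked to `entity` by `predicate`."""
    return [obj for pred, obj in outgoing[entity] if pred == predicate]


print(neighbours("Account_A", "sends_money_to"))  # ['Account_B']
```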


Why Knowledge Graphs for Fraud Detection?

Limitations of Traditional ML Approaches

Machine learning, neural networks, and AI techniques have made significant strides in fraud detection, but they face critical limitations:

1. Vulnerability to Adversarial Attacks

  • Statistical models can be fooled by synthetic data sets
  • Adversarial attacks can bypass pattern recognition
  • Hackers can craft transactions that appear legitimate to ML models

2. Black Box Problem

  • Neural networks provide predictions without explanations
  • Regulators and compliance officers need to understand WHY a transaction was flagged
  • Difficult to justify account freezing or SAR filing based on opaque model outputs

3. Statistical Limitations

  • Require large amounts of labeled fraud data (which is rare)
  • Struggle with new fraud patterns not seen in training data
  • High false positive rates (often 90%+ in production)
  • Cannot capture complex multi-hop relationships

4. Missing Contextual Understanding

  • Treat transactions in isolation
  • Don't understand relationships between entities
  • Can't reason about patterns like "money returning to origin through intermediaries"

The Knowledge Graph Solution

Knowledge graphs address these limitations by:

Relationship-Native: Connections are first-class citizens, not expensive joins

Context-Aware: Every entity exists within a web of relationships

Explainable: Query results show the exact path of reasoning

Pattern-Based: Define fraud patterns once, detect them everywhere

Robust: Missing or incorrect data doesn't break relationship analysis

Most importantly: Knowledge graphs combine the power of graph algorithms (DFS, cycle detection, community finding) with semantic reasoning (ontologies, inference rules) to detect fraud patterns that are invisible to traditional approaches.


The Knowledge Graph Advantage

How Knowledge Graphs Detect Fraud

The power of knowledge graphs for fraud detection comes from their ability to model and query complex relationships:

Traditional Database Approach:
SELECT * FROM transactions 
WHERE amount > 10000 AND suspicious_flag = TRUE
→ Finds individual suspicious transactions (high false positives)

Knowledge Graph Approach:
MATCH (a1:Account)-[:SENDS_MONEY]->(a2:Account)-[:SENDS_MONEY]->
      (a3:Account)-[:SENDS_MONEY]->(a4:Account)-[:SENDS_MONEY]->(a1)
WHERE a1 <> a2 AND a1 <> a3 AND a1 <> a4
  AND a2 <> a3 AND a2 <> a4 AND a3 <> a4
→ Finds circular money flows (actual fraud pattern)

Key Capabilities

1. Multi-Hop Relationship Queries

Find patterns like:

  • Account A sends to B, B sends to C, C sends back to A (circular flow)
  • Multiple accounts sharing the same email, phone, or device ID
  • Transaction chains that end at known fraudulent merchants
  • Short paths between unrelated accounts (potential collusion)

2. Pattern Matching

Define suspicious patterns once in your ontology:

<owl:Class rdf:about="#CircularMoneyFlow">
  <rdfs:subClassOf rdf:resource="#FraudPattern"/>
  <rdfs:comment>
    Money returns to originating account through intermediaries
  </rdfs:comment>
</owl:Class>

Then detect them automatically using graph algorithms and SPARQL queries.

3. Semantic Reasoning

The ontology enables automatic inference:

Facts:
- Transaction_T1 connects Account_A to Account_B
- Account_A shares_email_with Account_C
- Account_C shares_device_with Account_D

Inferred Knowledge:
- Account_A potentially_colluding_with Account_D
- Risk_Score increases due to device sharing
- Pattern matches "Account Takeover" fraud type
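
In an ontology, a rule like this would typically be expressed as a property chain or in a rule language and evaluated by a reasoner. The sketch below only mimics the first inference in plain Python, under the assumption that the rule is "X shares_email_with Y and Y shares_device_with Z implies X potentially_colluding_with Z". The function name is hypothetical.

```python
# Facts taken from the example above, written as (subject, predicate, object).
facts = {
    ("Account_A", "shares_email_with", "Account_C"),
    ("Account_C", "shares_device_with", "Account_D"),
}


def infer_collusion(facts):
    """Apply one hand-written rule:
    X shares_email_with Y and Y shares_device_with Z
    implies X potentially_colluding_with Z.
    """
    inferred = set()
    for x, p1, y in facts:
        if p1 != "shares_email_with":
            continue
        for y2, p2, z in facts:
            if p2 == "shares_device_with" and y2 == y and z != x:
                inferred.add((x, "potentially_colluding_with", z))
    return inferred


print(infer_collusion(facts))
# {('Account_A', 'potentially_colluding_with', 'Account_D')}
```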

Building a Fraud Detection Knowledge Graph

Step 1: Data Modeling

The most important step is creating a graph of relationships between various pieces of information about users and transactions. The key is to associate all available information with account IDs:

Core Entities:

  • Accounts: Unique identifiers for financial accounts
  • Customers: People or businesses who own accounts
  • Transactions: Money transfers between accounts
  • Devices: Phones, computers used to access accounts
  • Locations: IP addresses, physical addresses
  • Merchants: Businesses receiving payments

Relationships to Model:

  • owns_account: Customer → Account
  • sends_money_to: Account → Account
  • uses_device: Account → Device
  • accessed_from: Account → IP Address
  • shares_email: Account → Account
  • shares_phone: Account → Account
  • located_at: Account → Location

Attributes:

  • Account: account_number, creation_date, account_type
  • Transaction: amount, timestamp, currency, status
  • Customer: name, DOB, tax_id, email, phone
  • Device: device_id, device_type, OS, browser

Step 2: Define Suspicious Patterns

Using a knowledge graph, you can build powerful rules that detect known fraudulent behavior:

Common Fraud Patterns to Look For:

A. Common Attributes (Identity Fraud)

  • Multiple accounts using the same email address
  • Multiple accounts using the same phone number
  • Same tax identification number across different names
  • Same device accessing unrelated accounts
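
As a rough sketch of how a shared-attribute check could work outside the ontology, the Python below groups accounts by attribute value and keeps only the values used by more than one account. The record layout and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative records: (account_id, attribute_name, attribute_value).
records = [
    ("ACC-001", "email", "a@example.com"),
    ("ACC-002", "email", "a@example.com"),
    ("ACC-003", "email", "b@example.com"),
    ("ACC-001", "phone", "555-0100"),
    ("ACC-003", "phone", "555-0100"),
]


def shared_attributes(records):
    """Map each (attribute, value) pair to the accounts using it, keeping only shared ones."""
    groups = defaultdict(set)
    for account, attribute, value in records:
        groups[(attribute, value)].add(account)
    return {key: sorted(accounts) for key, accounts in groups.items() if len(accounts) > 1}


for key, accounts in shared_attributes(records).items():
    print(key, accounts)
```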

B. Circular Money Flow (Money Laundering)

  • Money sent from Account A → B → C → D → back to A
  • Short timeframe between transactions in the cycle
  • Equal or similar amounts at each step
  • No legitimate business relationship between accounts

C. Rapid Transactions (Layering)

  • Short paths between multiple accounts
  • High transaction velocity (many transactions in short time)
  • Large total amount split across many small transactions
  • Transactions outside normal patterns (e.g., 3 AM on weekends)

D. Structuring (Smurfing)

  • Multiple transactions just below reporting threshold ($10,000)
  • Same source account splitting large amount
  • Coordinated timing across different accounts
  • Similar amounts to avoid detection
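
A structuring check can be sketched as a simple aggregation: collect each source account's transfers below the reporting threshold and flag sources with many such transfers and a large total. The $10,000 threshold comes from the text; the other two cut-offs and the sample transfers are illustrative assumptions, not regulatory values.

```python
from collections import defaultdict

REPORTING_THRESHOLD = 10_000   # threshold quoted in the text
MIN_TRANSACTIONS = 3           # illustrative cut-off, not a regulatory value
MIN_TOTAL = 25_000             # illustrative cut-off, not a regulatory value

# Illustrative transfers: (source_account, destination_account, amount).
transfers = [
    ("ACC-001", "ACC-010", 9_500),
    ("ACC-001", "ACC-011", 9_800),
    ("ACC-001", "ACC-012", 9_900),
    ("ACC-002", "ACC-010", 12_000),
    ("ACC-003", "ACC-011", 400),
]


def structuring_candidates(transfers):
    """Return sources with many sub-threshold transfers whose total is large."""
    below = defaultdict(list)
    for source, _dest, amount in transfers:
        if amount < REPORTING_THRESHOLD:
            below[source].append(amount)
    return {
        source: sum(amounts)
        for source, amounts in below.items()
        if len(amounts) >= MIN_TRANSACTIONS and sum(amounts) >= MIN_TOTAL
    }


print(structuring_candidates(transfers))  # {'ACC-001': 29200}
```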

E. Account Takeover

  • Sudden change in transaction patterns
  • New device or location accessing account
  • Large withdrawals shortly after access change
  • Password/email changes followed by transfers

Step 3: Create the Ontology

Using Protégé and VidyaAstra, you can create an OWL ontology that formally defines your fraud detection domain:

Using VidyaAstra's "Create New Ontology" Mode:

Description:
"Create a fraud detection ontology for anti-money laundering. 
Include the following:

Entities:
- Account (with properties: account_id, balance, account_type, creation_date)
- Customer (with properties: customer_id, name, email, phone, tax_id)
- Transaction (with properties: transaction_id, amount, timestamp, currency)
- Device (with properties: device_id, type, ip_address)
- FraudPattern (parent class for all fraud types)

Fraud Pattern Types:
- CircularMoneyFlow (subclass of FraudPattern)
- MoneyLaundering (subclass of FraudPattern)
- Structuring (subclass of FraudPattern)
- AccountTakeover (subclass of FraudPattern)
- IdentityFraud (subclass of FraudPattern)

Relationships:
- sendsMoneyTo (Account to Account)
- ownsAccount (Customer to Account)
- usesDevice (Account to Device)
- sharesEmail (Account to Account)
- sharesPhone (Account to Account)
- involvedIn (Account to FraudPattern)
- detectedBy (FraudPattern to DetectionAlgorithm)

Detection Algorithms:
- DFS (Depth First Search for cycle detection)
- TarjanSCC (Strongly Connected Components)
- LouvainCommunity (Community detection for fraud rings)

Risk Levels:
- HighRisk, MediumRisk, LowRisk

Include data properties for risk scores, transaction amounts, and timestamps."

VidyaAstra will generate a complete OWL ontology in 20-30 seconds, including:

  • All class definitions with proper hierarchy
  • Object properties with domain and range
  • Data properties with appropriate types
  • Sample individuals for testing

Real-World Use Case: Circular Money Flow Detection

The Scenario

Circular money flow is a classic money laundering technique where funds are moved through a series of accounts and eventually return to the originating account. This creates the appearance of legitimate business activity while obscuring the illicit origin of funds.

Example Pattern:

Account A sends $50,000 → Account B
Account B sends $50,000 → Account C  
Account C sends $50,000 → Account D
Account D sends $50,000 → Account A (returns to origin)

This pattern is difficult to detect with traditional database queries because it requires:

  • Multi-hop relationship traversal (4 steps)
  • Cycle detection algorithms
  • Understanding that the pattern indicates fraud

Building the Ontology with VidyaAstra

Step 1: Create the Fraud Detection Ontology (5 minutes)

Open Protégé and launch the VidyaAstra plugin. Select "Create New Ontology" mode and provide this description:

Create a fraud detection ontology for anti-money laundering with circular money flow detection.

Include:
- Account entities with properties (account_id, balance, creation_date)
- Transaction entities linking accounts
- CircularMoneyFlow fraud pattern class
- MoneyLaundering parent class
- DFS and Tarjan cycle detection algorithms
- Risk levels (High, Medium, Low)
- Investigation and compliance action classes

Add relationships:
- sendsMoneyTo (Account to Account)
- involvedIn (Account to FraudPattern)
- detectedBy (FraudPattern to Algorithm)
- triggers (RiskLevel to Action)

Add data properties:
- riskScore (decimal 0-1)
- cycleLength (integer)
- totalAmount (decimal)
- detectionTimestamp (datetime)

VidyaAstra will generate a complete OWL ontology including all classes, properties, and basic individuals.

Step 2: Add Sample Transaction Data

Add these individuals to your ontology to represent the circular money flow pattern:

<!-- Accounts in the cycle -->
<owl:NamedIndividual rdf:about="#Account_A">
  <rdf:type rdf:resource="#Account"/>
  <accountId>ACC-001</accountId>
  <sendsMoneyTo rdf:resource="#Account_B"/>
  <involvedIn rdf:resource="#CircularFlow_001"/>
</owl:NamedIndividual>

<owl:NamedIndividual rdf:about="#Account_B">
  <rdf:type rdf:resource="#Account"/>
  <accountId>ACC-002</accountId>
  <sendsMoneyTo rdf:resource="#Account_C"/>
  <involvedIn rdf:resource="#CircularFlow_001"/>
</owl:NamedIndividual>

<owl:NamedIndividual rdf:about="#Account_C">
  <rdf:type rdf:resource="#Account"/>
  <accountId>ACC-003</accountId>
  <sendsMoneyTo rdf:resource="#Account_D"/>
  <involvedIn rdf:resource="#CircularFlow_001"/>
</owl:NamedIndividual>

<owl:NamedIndividual rdf:about="#Account_D">
  <rdf:type rdf:resource="#Account"/>
  <accountId>ACC-004</accountId>
  <sendsMoneyTo rdf:resource="#Account_A"/>
  <involvedIn rdf:resource="#CircularFlow_001"/>
</owl:NamedIndividual>

<!-- The fraud pattern instance -->
<owl:NamedIndividual rdf:about="#CircularFlow_001">
  <rdf:type rdf:resource="#CircularMoneyFlow"/>
  <cycleLength rdf:datatype="&xsd;integer">4</cycleLength>
  <totalAmount rdf:datatype="&xsd;decimal">200000</totalAmount>
  <riskScore rdf:datatype="&xsd;decimal">0.92</riskScore>
  <detectedBy rdf:resource="#DFS_Algorithm"/>
  <hasRiskLevel rdf:resource="#HighRisk"/>
</owl:NamedIndividual>

Step 3: Query Using Natural Language

Switch to "Basic Query" mode in VidyaAstra and ask:

"Show me all accounts involved in circular money flow"

VidyaAstra will:

  1. Translate your question to SPARQL
  2. Execute the query against the ontology
  3. Return results in plain English

Example SPARQL generated:

SELECT ?account ?accountId WHERE {
  ?account rdf:type :Account .
  ?account :involvedIn ?pattern .
  ?pattern rdf:type :CircularMoneyFlow .
  ?account :accountId ?accountId .
}

Result:

Found 4 accounts involved in circular money flow:
- Account_A (ACC-001)
- Account_B (ACC-002)
- Account_C (ACC-003)
- Account_D (ACC-004)

Pattern: A → B → C → D → A
Risk Score: 0.92 (High Risk)
Detection Method: DFS Algorithm

Step 4: Run Graph Algorithms

The ontology structure allows you to run various graph algorithms to detect and analyze fraud patterns:

1. Depth-First Search (DFS) for Cycle Detection

DFS is used to detect cycles in the transaction graph. It traverses the graph starting from each account and checks if it can return to the starting point.

# SPARQL query to find 4-hop cycles (a fixed-length pattern match; a DFS generalizes this to cycles of any length)
SELECT ?a1 ?a2 ?a3 ?a4
WHERE {
  ?a1 :sendsMoneyTo ?a2 .
  ?a2 :sendsMoneyTo ?a3 .
  ?a3 :sendsMoneyTo ?a4 .
  ?a4 :sendsMoneyTo ?a1 .
}

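Complexity: O(V + E) for a DFS that checks whether any cycle exists, where V = accounts and E = transactions. The fixed-length SPARQL pattern above is a four-way join rather than a DFS, so its cost depends on the query engine and the data.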

Best for: Finding simple cycles quickly
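
For comparison with the SPARQL pattern, here is a minimal Python sketch of DFS-based cycle detection using a recursion stack (nodes currently on the DFS path). It assumes the transaction graph fits in memory as a dictionary mapping each account to the accounts it sends money to, and it returns the first cycle it finds rather than all cycles. The function name and sample graph are illustrative.

```python
def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if the graph is acyclic.

    `graph` maps each account to the accounts it sends money to.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    colour = {node: WHITE for node in nodes}
    path = []                              # nodes on the current DFS path

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:        # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(nodes):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


transfers = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"], "E": ["A"]}
print(find_cycle(transfers))  # ['A', 'B', 'C', 'D', 'A']
```

The recursive form keeps the sketch short; for very large graphs an iterative DFS avoids Python's recursion limit.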

2. Tarjan's Strongly Connected Components (SCC)

Identifies groups of accounts where money can flow between any two accounts in the group. This is more sophisticated than simple cycle detection.

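# Find accounts that belong to a non-trivial strongly connected component
# (mutual reachability via property paths; the query engine does not run Tarjan's algorithm)
SELECT DISTINCT ?account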
WHERE {
  ?account :sendsMoneyTo+ ?otherAccount .
  ?otherAccount :sendsMoneyTo+ ?account .
  FILTER(?account != ?otherAccount)
}

Best for: Detecting complex fraud rings where money circulates among multiple accounts
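
The following is a compact Python sketch of Tarjan's algorithm, assuming the same in-memory dictionary representation of the transaction graph. Components with more than one account are the candidate fraud rings. The function name and sample graph are illustrative.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. Returns a list of components, each a sorted list of nodes."""
    index_of = {}      # discovery index per node
    low_link = {}      # smallest index reachable from the node's DFS subtree
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(node):
        index_of[node] = low_link[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt not in index_of:
                visit(nxt)
                low_link[node] = min(low_link[node], low_link[nxt])
            elif nxt in on_stack:
                low_link[node] = min(low_link[node], index_of[nxt])
        if low_link[node] == index_of[node]:   # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    for node in sorted(nodes):
        if node not in index_of:
            visit(node)
    return components


transfers = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["D"], "F": ["A"]}
rings = [c for c in strongly_connected_components(transfers) if len(c) > 1]
print(rings)  # [['D', 'E'], ['A', 'B', 'C']]
```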

3. Louvain Algorithm for Community Detection

Groups accounts into communities based on transaction patterns. Fraudulent accounts often form tight communities.

Use case: Identify clusters of accounts that primarily transact with each other, suggesting coordination or collusion.

4. PageRank for Account Importance

Assigns importance scores to accounts based on incoming and outgoing transaction patterns.

Use case: Identify "hub" accounts that are central to money laundering operations.
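
A minimal power-iteration sketch of PageRank in Python is shown below. The damping factor of 0.85 and the fixed iteration count are conventional defaults chosen here as assumptions, and the sample graph is made up so that one account clearly acts as a hub.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph given as {node: [targets]}."""
    nodes = sorted(set(graph) | {n for targets in graph.values() for n in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by accounts with no outgoing transfers is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Illustrative graph: three accounts send money to a hub, which pays them back.
transfers = {"A": ["HUB"], "B": ["HUB"], "C": ["HUB"], "HUB": ["A", "B", "C"]}
scores = pagerank(transfers)
top_account = max(scores, key=scores.get)
print(top_account)  # HUB
```

Note that PageRank rewards incoming flow, so an account that only receives money can outrank the hub that feeds it; the scores are a starting point for investigation rather than a verdict.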

5. Shortest Path Analysis

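Find the shortest path between two accounts to understand how money flows. SPARQL property paths can confirm that a connection exists and list the accounts along it, but they do not return path lengths, so a true shortest path needs a graph algorithm such as breadth-first search.

# Using property paths to find accounts lying on some path from A to B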
SELECT ?intermediateAccount
WHERE {
  :SuspiciousAccount_A :sendsMoneyTo+ ?intermediateAccount .
  ?intermediateAccount :sendsMoneyTo+ :SuspiciousAccount_B .
}

Use case: Track how illicit funds move from source to destination.
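
When an actual shortest path is needed, a breadth-first search over the transfer graph is one straightforward option. The Python sketch below counts hops only (every transfer has equal weight); weighting by amount or time would call for Dijkstra instead. The function name and sample graph are illustrative.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: return the fewest-hop path from source to target, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


transfers = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(shortest_path(transfers, "A", "E"))  # ['A', 'B', 'D', 'E']
```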


Using VidyaAstra with Protégé

VidyaAstra extends Protégé with three key capabilities that make fraud detection ontology development accessible:

1. Basic Query Mode - Natural Language to SPARQL

Instead of writing complex SPARQL queries manually, fraud analysts can ask questions in plain English:

Traditional Approach:

SELECT ?account ?riskScore
WHERE {
  ?account :involvedIn ?pattern .
  ?pattern rdf:type :CircularMoneyFlow .
  ?pattern :riskScore ?riskScore .
  FILTER (?riskScore > 0.8)
}

VidyaAstra Approach:

Simply ask: "Which accounts are involved in high-risk circular money flows?"

The plugin:

  1. Analyzes the current ontology structure
  2. Sends the context + question to an LLM (GPT-4, Claude, Nvidia)
  3. Gets back valid SPARQL
  4. Executes it and returns results in plain English

2. Create New Ontology Mode - AI-Generated OWL

Instead of manually creating classes, properties, and individuals in OWL/RDF XML, describe what you need:

"Create a fraud detection ontology for credit card fraud including:
- Transaction entities with amount, timestamp, merchant
- Customer entities with account info
- Fraud patterns: velocity checks, geographic anomalies, merchant risk
- Detection rules for unusual spending patterns"

VidyaAstra generates a complete, valid OWL ontology in ~20 seconds.

3. Modify Ontology Mode - Intelligent Updates

Extend existing ontologies without manual XML editing:

"Add a new fraud type called 'Account Takeover' that includes:
- Login from new device
- Password change
- Followed by large withdrawal
Link it to HighRisk level"

The plugin updates your ontology, validates consistency, and applies changes immediately.


Graph Algorithms for Fraud Detection

Here are the key graph algorithms used in knowledge graph-based fraud detection systems:

1. Cycle Detection Algorithms

Depth-First Search (DFS)

  • Purpose: Find circular money flows
  • How it works: Traverses the graph and maintains a recursion stack to detect back edges (cycles)
  • Complexity: O(V + E)
  • Equivalent check in SPARQL (mutual reachability via property paths, not a literal DFS):
SELECT ?start ?end
WHERE {
  ?start :sendsMoneyTo+ ?end .
  ?end :sendsMoneyTo+ ?start .
  FILTER(?start != ?end)
}

Tarjan's Strongly Connected Components

  • Purpose: Find groups of accounts where money circulates
  • How it works: Uses DFS with low-link values to identify SCCs in one pass
  • Complexity: O(V + E)
  • Use case: Detect complex fraud rings, not just simple cycles

2. Path Finding Algorithms

Shortest Path (Dijkstra/Bellman-Ford)

  • Purpose: Find how money moves from point A to B
  • Use case: Track layering schemes where money passes through multiple intermediaries
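  • SPARQL Property Paths (this counts the accounts lying between A and B; SPARQL 1.1 property paths cannot report the length of a path):
SELECT (COUNT(DISTINCT ?intermediate) AS ?intermediateCount)
WHERE {
  :Account_A :sendsMoneyTo+ ?intermediate .
  ?intermediate :sendsMoneyTo+ :Account_B .
}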

All Paths Enumeration

  • Purpose: Find all possible routes money can take
  • Use case: Identify alternative laundering paths, redundant connections

3. Community Detection Algorithms

Louvain Algorithm

  • Purpose: Identify clusters of accounts that transact primarily with each other
  • How it works: Optimizes modularity by iteratively moving nodes between communities
  • Use case: Detect organized fraud rings, mule account networks

Label Propagation

  • Purpose: Fast community detection for large graphs
  • How it works: Nodes adopt the most common label among their neighbors
  • Use case: Real-time fraud ring detection
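
Label propagation is simple enough to sketch in a few lines of Python. This version treats transfers as undirected links and updates nodes one at a time in sorted order. Ties are broken by taking the largest label purely to keep the sketch deterministic; that rule is my assumption, and real implementations usually break ties at random. The sample graph (two tightly connected groups joined by a single transfer) is illustrative.

```python
from collections import Counter
from itertools import combinations


def label_propagation(edges, max_rounds=20):
    """Asynchronous label propagation on an undirected view of the graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    labels = {node: node for node in neighbours}   # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            # Most common neighbour label; ties go to the largest label (arbitrary choice).
            new_label = max(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Two groups of four accounts that all transact with each other, joined by one transfer.
edges = list(combinations("ABCD", 2)) + list(combinations("EFGH", 2)) + [("D", "E")]
labels = label_propagation(edges)
communities = {}
for node, label in labels.items():
    communities.setdefault(label, []).append(node)
print(sorted(sorted(members) for members in communities.values()))
# [['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H']]
```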

4. Centrality Algorithms

PageRank

  • Purpose: Identify important "hub" accounts in the transaction network
  • Use case: Find money mules, central accounts in laundering operations

Betweenness Centrality

  • Purpose: Find accounts that act as bridges between different parts of the network
  • Use case: Identify intermediary accounts used for layering

Degree Centrality

  • Purpose: Count incoming/outgoing transactions per account
  • Use case: Detect accounts with unusual transaction volumes
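
Degree centrality is the simplest of these measures to compute. The Python sketch below counts outgoing and incoming transfers per account and lists the accounts at or above a threshold; the threshold of 3 and the sample transfers are illustrative assumptions.

```python
from collections import Counter

# Illustrative transfers as (source, destination) pairs.
transfers = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]

out_degree = Counter(source for source, _ in transfers)
in_degree = Counter(destination for _, destination in transfers)


def unusual_accounts(degree, threshold):
    """Return accounts whose transfer count is at or above the threshold."""
    return sorted(account for account, count in degree.items() if count >= threshold)


print(unusual_accounts(out_degree, 3))  # ['A']
print(unusual_accounts(in_degree, 3))   # ['D']
```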

5. Pattern Matching Algorithms

Subgraph Isomorphism

  • Purpose: Find instances of known fraud patterns in the transaction graph
  • How it works: Match a template pattern against the full graph
  • SPARQL Example:
# Match the "smurfing" pattern: one source, many small transactions
SELECT ?source (COUNT(?transaction) AS ?numTransactions) (SUM(?amount) AS ?total)
WHERE {
  ?source :sendsMoneyTo ?dest .
  ?transaction :from ?source ;
               :to ?dest ;
               :amount ?amount .
  FILTER(?amount < 10000)
}
GROUP BY ?source
HAVING (COUNT(?transaction) > 10 && SUM(?amount) > 50000)

6. Temporal Analysis

Time-Window Queries

  • Purpose: Detect rapid transaction sequences
  • Use case: Layering detection, velocity checks
SELECT ?account (COUNT(?tx) AS ?count)
WHERE {
  ?tx :fromAccount ?account ;
      :timestamp ?time .
  FILTER(?time > "2024-11-23T00:00:00"^^xsd:dateTime &&
         ?time < "2024-11-23T01:00:00"^^xsd:dateTime)
}
GROUP BY ?account
HAVING (COUNT(?tx) > 10)
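
The SPARQL query above checks one fixed hour. A velocity check more often needs a sliding window, which can be sketched in Python as follows. The window length, the limit, and the sample data are illustrative assumptions, and the function name is hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def high_velocity_accounts(transactions, window=timedelta(hours=1), limit=3):
    """Return accounts that made more than `limit` transactions within any `window`."""
    by_account = defaultdict(list)
    for account, timestamp in transactions:
        by_account[account].append(timestamp)

    flagged = set()
    for account, times in by_account.items():
        recent = deque()                      # timestamps inside the current window
        for t in sorted(times):
            recent.append(t)
            while t - recent[0] > window:     # drop timestamps older than the window
                recent.popleft()
            if len(recent) > limit:
                flagged.add(account)
                break
    return flagged


start = datetime(2024, 11, 23, 0, 0)
transactions = [("ACC-001", start + timedelta(minutes=10 * i)) for i in range(5)]
transactions += [("ACC-002", start + timedelta(hours=2 * i)) for i in range(5)]
print(high_velocity_accounts(transactions))  # {'ACC-001'}
```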

Visualization and Manual Inspection

In addition to automated graph algorithms, fraud analysts need to visually inspect suspicious patterns. Protégé's built-in visualization tools, combined with VidyaAstra's query capabilities, allow manual exploration:

OntoGraf Visualization

  1. Open OntoGraf view in Protégé
  2. Select a suspicious account (e.g., Account_A)
  3. Visualize relationships (:sendsMoneyTo, :sharesEmail, etc.)
  4. Manually trace money flow paths

SPARQL-Based Exploration

Use VidyaAstra to iteratively drill down:

Query 1: "Show accounts with more than 5 outgoing transactions"
Query 2: "Which of these accounts share email addresses?"
Query 3: "Show the transaction history for Account_XYZ"
Query 4: "Are any of these accounts involved in fraud patterns?"

This iterative, conversational approach combines automated detection with human expertise.


Integration with External Graph Databases

While Protégé is excellent for ontology development and testing, production fraud detection systems typically use dedicated graph databases for scalability:

Integration Architecture

┌─────────────────────────────────────────┐
│  Protégé + VidyaAstra                   │
│  • Ontology design & testing            │
│  • Pattern definition                   │
│  • Query prototyping                    │
└─────────────────────────────────────────┘
          ↓ (Export OWL)
┌─────────────────────────────────────────┐
│  Graph Database (Production)            │
│  • Apache Jena Fuseki                   │
│  • GraphDB                              │
│  • Neo4j (with neosemantics plugin)     │
│  • Amazon Neptune                       │
└─────────────────────────────────────────┘
          ↓
┌─────────────────────────────────────────┐
│  Real-time Transaction Processing       │
│  • Stream processing (Kafka, Flink)     │
│  • Pattern matching                     │
│  • Alert generation                     │
└─────────────────────────────────────────┘

Export Process

  1. Design ontology in Protégé with VidyaAstra
  2. Test queries on sample data
  3. Export OWL file
  4. Load into production triple store:
# Apache Jena Fuseki
curl -X POST \
  -H "Content-Type: application/rdf+xml" \
  --data-binary @fraud-detection-ontology.owl \
  http://localhost:3030/fraud/data

# GraphDB
curl -X POST \
  -H "Content-Type: application/rdf+xml" \
  --data-binary @fraud-detection-ontology.owl \
  http://localhost:7200/repositories/fraud/statements
  5. Populate with production transaction data
  6. Run graph algorithms at scale

Getting Started

Prerequisites

  1. Protégé 5.6.4+ - Download from https://protege.stanford.edu/
  2. Java 11+ - Required for running Protégé and plugins
  3. LLM API Access - OpenAI, Anthropic Claude, or Nvidia NGC API key
  4. VidyaAstra Plugin - Download from https://github.com/vishalmysore/vidyaastra-plugin

Installation

Step 1: Install Protégé

Download and install Protégé for your operating system.

Step 2: Install VidyaAstra Plugin

# Windows
Copy-Item vidyaastra-1.0.1.jar "C:\Program Files\Protege-5.6.7\plugins\"

# macOS
cp vidyaastra-1.0.1.jar "/Applications/Protege.app/Contents/Java/plugins/"

# Linux
cp vidyaastra-1.0.1.jar "$HOME/Protege-5.6.7/plugins/"

Step 3: Launch Protégé and Activate Plugin

  1. Start Protégé
  2. Go to Window → Views → Ontology Views → VidyaAstra View
  3. The VidyaAstra panel will appear

Step 4: Configure API Key

Enter your OpenAI/Claude/Nvidia API key in the VidyaAstra preferences.

Quick Start Example

Create Your First Fraud Detection Ontology:

  1. Select "Create New Ontology" mode
  2. Enter this description:
Create a fraud detection ontology for money laundering detection.
Include:
- Account and Transaction entities
- CircularMoneyFlow fraud pattern
- sendsMoneyTo relationship
- DFS cycle detection algorithm
- Risk levels and scores
  3. Click "Ask AI" and wait 20-30 seconds
  4. Save the generated ontology as fraud-detection.owl

Query Your Ontology:

  1. Switch to "Basic Query" mode
  2. Ask: "Show me all circular money flow patterns"
  3. VidyaAstra translates to SPARQL and returns results

Modify Your Ontology:

  1. Switch to "Modify Ontology" mode
  2. Request: "Add a Structuring fraud pattern for transactions below $10,000"
  3. Changes are applied and validated automatically

Technical Implementation Details

How VidyaAstra Works

1. Natural Language Query Processing

// Simplified flow
String userQuery = "Which accounts have high risk scores?";

// 1. Extract ontology context
String context = extractClassesAndProperties(activeOntology);

// 2. Build LLM prompt
String prompt = "Given this ontology:\n" + context + 
                "\nTranslate to SPARQL: " + userQuery;

// 3. Call LLM
String sparqlQuery = llm.complete(prompt);

// 4. Execute query
ResultSet results = ontology.executeQuery(sparqlQuery);

// 5. Format results
String answer = formatAsNaturalLanguage(results);

2. AI Ontology Generation

// Simplified flow
String description = "Create fraud detection ontology...";

// 1. Generate with strict prompt
String systemPrompt = "Generate valid OWL/RDF XML only. " +
                      "No markdown, no explanations.";

// 2. Get LLM response
String owlXml = llm.complete(systemPrompt, description);

// 3. Clean and validate
owlXml = removeMarkdown(owlXml);
owlXml = fixCommonXmlIssues(owlXml);

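// 4. Parse and validate with the OWL API (StringDocumentSource is in org.semanticweb.owlapi.io)
OWLOntology ont = manager.loadOntologyFromOntologyDocument(new StringDocumentSource(owlXml));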

// 5. Save
saveOntology(ont, "generated-ontology.owl");
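
The cleanup and validation steps are where LLM output most often goes wrong, so here is a rough illustration of what such a step might look like. The plugin's actual `removeMarkdown` code is not shown in this article; the Python below is only my guess at the idea, with hypothetical function names. It strips a surrounding Markdown code fence and checks that the remainder is well-formed XML, which is a much weaker check than loading the result with the OWL API.

```python
import re
import xml.etree.ElementTree as ET

FENCE_MARK = "`" * 3   # a Markdown code fence marker

# Matches a reply that is wrapped in a single fenced block, with an optional language tag.
FENCE = re.compile(
    r"^" + FENCE_MARK + r"[a-zA-Z]*\s*\n(.*?)\n" + FENCE_MARK + r"\s*$",
    re.DOTALL,
)


def strip_markdown_fence(text):
    """Return the content of a fenced block, or the stripped text if it is not fenced."""
    text = text.strip()
    match = FENCE.match(text)
    return match.group(1) if match else text


def is_well_formed_xml(text):
    """Check XML well-formedness only; this does not validate OWL semantics."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


reply = (
    FENCE_MARK + "xml\n"
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>\n'
    + FENCE_MARK
)
cleaned = strip_markdown_fence(reply)
print(is_well_formed_xml(cleaned))  # True
```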

SPARQL Query Examples

Find Circular Money Flows:

PREFIX : <http://example.org/fraud#>

SELECT DISTINCT ?account1 ?account2 ?account3 ?account4
WHERE {
  ?account1 :sendsMoneyTo ?account2 .
  ?account2 :sendsMoneyTo ?account3 .
  ?account3 :sendsMoneyTo ?account4 .
  ?account4 :sendsMoneyTo ?account1 .
}

Find Accounts Sharing Email:

SELECT ?account1 ?account2 ?email
WHERE {
  ?account1 :hasEmail ?email .
  ?account2 :hasEmail ?email .
  FILTER(?account1 != ?account2)
}

Find High-Risk Patterns:

SELECT ?pattern ?riskScore
WHERE {
  ?pattern rdf:type :FraudPattern .
  ?pattern :riskScore ?riskScore .
  FILTER(?riskScore > 0.80)
}
ORDER BY DESC(?riskScore)

Temporal Analysis - Rapid Transactions:

SELECT ?account (COUNT(?tx) AS ?txCount)
WHERE {
  ?tx :fromAccount ?account ;
      :timestamp ?time .
  FILTER(?time >= "2024-11-23T00:00:00"^^xsd:dateTime &&
         ?time <= "2024-11-23T02:00:00"^^xsd:dateTime)
}
GROUP BY ?account
HAVING (COUNT(?tx) > 5)

Conclusion

Why Knowledge Graphs for Fraud Detection

Fraud detection is fundamentally a relationship problem:

  • Money flows through networks of accounts
  • Fraudsters create patterns across transactions
  • Detection requires multi-hop analysis
  • Explanations need semantic context

Traditional approaches struggle with:

  • ML/Neural Networks: Black boxes vulnerable to adversarial attacks, can't explain decisions
  • Rule-Based Systems: Brittle, high false positives, miss complex patterns
  • SQL Databases: Multi-hop queries are slow and complex

Knowledge graphs solve these problems by natively representing relationships and enabling graph algorithms.

Why Protégé + VidyaAstra

Protégé provides:

  • Industry-standard OWL ontology editor
  • SPARQL query engine
  • Reasoning capabilities (Pellet, HermiT, ELK)
  • Visualization tools

VidyaAstra adds:

  • Natural language query interface (no SPARQL expertise needed)
  • AI-powered ontology generation (minutes vs. weeks)
  • Intelligent ontology modification
  • Multi-LLM support (OpenAI, Claude, Nvidia)

Together, they enable fraud analysts to build and query knowledge graphs without deep technical expertise in ontologies or SPARQL.

Next Steps

  1. Download Protégé and VidyaAstra
  2. Create your first fraud detection ontology using the examples in this article
  3. Load sample transaction data
  4. Query using natural language
  5. Extend with your specific fraud patterns
  6. Deploy to production graph database when ready

Resources

Sample Ontology

The complete fraud detection ontology example is available in this repository:

  • File: fraud-detection-ontology.owl
  • Includes: 28 classes, 12 object properties, 8 data properties
  • Sample Data: Circular money flow with 4 accounts

About

Author: Vishal Mysore

Repository: https://github.com/vishalmysore/vidyaastra-plugin

Disclaimer

This article presents my approach to fraud detection using knowledge graphs, building on industry-standard techniques with Protégé and the VidyaAstra plugin. The circular money flow use case is a well-documented fraud pattern in financial crime literature and has been covered in many articles before, including the ones listed below. This implementation demonstrates how ontologies and graph algorithms can detect such patterns.

https://medium.com/neo4j/find-circular-money-flow-with-neo4j-c9138e1c3183
https://www.journalofaccountancy.com/issues/2009/dec/20091793/
https://digitaldealer.com/news/circular-bank-statement-fraud-the-new-synthetic-income-scam-dealers-lenders-must-fight/168087/

Important Notes:

  • The views and techniques presented here are my own.
  • This is an educational demonstration using publicly available fraud detection patterns documented in academic and industry literature
  • The examples use fictional data and scenarios for illustration purposes only
  • This implementation is not production-ready and should not be used for actual fraud detection without proper validation, compliance review, and security hardening
  • Organizations implementing fraud detection systems should consult with their legal, compliance, and security teams
