vishalmysore

Posted on Nov 23

Fraud Detection with Knowledge Graphs: A Protégé and VidyaAstra Approach

#cybersecurity #database #tutorial

Fraud detection is one of the most common use cases for knowledge graphs. It has already been covered in a variety of articles and implementations across the industry. Here I show how to do it with Protégé and the VidyaAstra plugin.

Machine learning, neural networks, and other artificial intelligence techniques are commonly used to detect credit card fraud and money laundering schemes. However, these approaches have significant limitations: they rely on statistical models, which attackers can fool with synthetic data sets or other adversarial attacks.

Graph databases and knowledge graphs are widely regarded as a leading approach to fraud detection.

The reason is that they contain a massive amount of interconnected data, and even if one piece of information is incorrect or missing, the system can still identify fraudulent patterns through relationship analysis.

Knowledge graphs can be used to detect:

Fake identities (e.g., people trying to open accounts with fake ID cards)
Credit card fraud (e.g., someone applying for credit with a stolen credit card)
Money laundering (e.g., circular money flows and layering schemes)

This article presents my approach to building fraud detection knowledge graphs using Protégé and the VidyaAstra plugin, based on industry-standard techniques discussed in fraud detection literature.

What is a Knowledge Graph?
Why Knowledge Graphs for Fraud Detection?
Limitations of Traditional ML Approaches
The Knowledge Graph Advantage
Building a Fraud Detection Knowledge Graph
Real-World Use Case: Circular Money Flow Detection
Using VidyaAstra with Protégé
Graph Algorithms for Fraud Detection
Getting Started

What is a Knowledge Graph?

A knowledge graph is a database of facts and relations between different entities.

A knowledge graph can be used to represent the world, with objects being concepts or physical things, together with their attributes, relationships, and metadata. For example, a financial institution could have a knowledge graph containing information about its customers, accounts, loans, transactions, and employees. The institution might also have a separate knowledge graph containing information about its offices and locations.

In fraud detection, a knowledge graph represents:

Entities: Accounts, customers, transactions, merchants, IP addresses, devices, locations
Relationships: sends_money_to, owns_account, uses_device, located_at, shares_email_with
Attributes: transaction amounts, timestamps, risk scores, customer profiles

Graph databases are a natural way of building a knowledge graph because they store relations between entities efficiently. Each entity becomes a node, and each relationship between two entities becomes an edge between them. This representation lets us run graph algorithms on the knowledge graph to answer questions such as whether a user is fraudulent or whether a transaction pattern indicates money laundering.

Why Knowledge Graphs for Fraud Detection?

Limitations of Traditional ML Approaches

Machine learning, neural networks, and AI techniques have made significant strides in fraud detection, but they face critical limitations:

1. Vulnerability to Adversarial Attacks

Statistical models can be fooled by synthetic data sets
Adversarial attacks can bypass pattern recognition
Hackers can craft transactions that appear legitimate to ML models

2. Black Box Problem

Neural networks provide predictions without explanations
Regulators and compliance officers need to understand WHY a transaction was flagged
Difficult to justify account freezing or SAR filing based on opaque model outputs

3. Statistical Limitations

Require large amounts of labeled fraud data (which is rare)
Struggle with new fraud patterns not seen in training data
High false positive rates (often 90%+ in production)
Cannot capture complex multi-hop relationships

4. Missing Contextual Understanding

Treat transactions in isolation
Don't understand relationships between entities
Can't reason about patterns like "money returning to origin through intermediaries"

The Knowledge Graph Solution

Knowledge graphs address these limitations by:

✅ Relationship-Native: Connections are first-class citizens, not expensive joins

✅ Context-Aware: Every entity exists within a web of relationships

✅ Explainable: Query results show the exact path of reasoning

✅ Pattern-Based: Define fraud patterns once, detect them everywhere

✅ Robust: Missing or incorrect data doesn't break relationship analysis

Most importantly: Knowledge graphs combine the power of graph algorithms (DFS, cycle detection, community finding) with semantic reasoning (ontologies, inference rules) to detect fraud patterns that are invisible to traditional approaches.

The Knowledge Graph Advantage

How Knowledge Graphs Detect Fraud

The power of knowledge graphs for fraud detection comes from their ability to model and query complex relationships:

Traditional Database Approach:
SELECT * FROM transactions 
WHERE amount > 10000 AND suspicious_flag = TRUE
→ Finds individual suspicious transactions (high false positives)

Knowledge Graph Approach:
MATCH (a1:Account)-[:SENDS_MONEY]->(a2:Account)-[:SENDS_MONEY]->
      (a3:Account)-[:SENDS_MONEY]->(a4:Account)-[:SENDS_MONEY]->(a1)
RETURN a1, a2, a3, a4
→ Finds circular money flows (actual fraud pattern)

Key Capabilities

1. Multi-Hop Relationship Queries

Find patterns like:

Account A sends to B, B sends to C, C sends back to A (circular flow)
Multiple accounts sharing the same email, phone, or device ID
Transaction chains that end at known fraudulent merchants
Short paths between unrelated accounts (potential collusion)

2. Pattern Matching

Define suspicious patterns once in your ontology:

<owl:Class rdf:about="#CircularMoneyFlow">
  <rdfs:subClassOf rdf:resource="#FraudPattern"/>
  <rdfs:comment>
    Money returns to originating account through intermediaries
  </rdfs:comment>
</owl:Class>

Then detect them automatically using graph algorithms and SPARQL queries.

3. Semantic Reasoning

The ontology enables automatic inference:

Facts:
- Transaction_T1 connects Account_A to Account_B
- Account_A shares_email_with Account_C
- Account_C shares_device_with Account_D

Inferred Knowledge:
- Account_A potentially_colluding_with Account_D
- Risk_Score increases due to device sharing
- Pattern matches "Account Takeover" fraud type
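
The inference step above can be approximated outside a reasoner with a small graph walk. The following is a minimal Python sketch, not how Protégé reasoners or VidyaAstra derive facts: it treats the sharing relations as symmetric links and reports account pairs that are connected only indirectly. The `FACTS` list and the function name `infer_potential_collusion` are illustrative assumptions.

```python
from collections import defaultdict, deque

# Hypothetical facts mirroring the example above: (subject, relation, object).
FACTS = [
    ("Account_A", "shares_email_with", "Account_C"),
    ("Account_C", "shares_device_with", "Account_D"),
]

SHARING_RELATIONS = {"shares_email_with", "shares_device_with"}


def infer_potential_collusion(facts):
    """Return account pairs linked only through a chain of sharing relations."""
    # Sharing is symmetric, so build an undirected adjacency map.
    neighbours = defaultdict(set)
    for subject, relation, obj in facts:
        if relation in SHARING_RELATIONS:
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)

    inferred = set()
    for start in neighbours:
        # Breadth-first search collects every account reachable from start.
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Keep indirect links only; direct links are already asserted facts.
        for other in seen - {start} - neighbours[start]:
            inferred.add(tuple(sorted((start, other))))
    return sorted(inferred)


print(infer_potential_collusion(FACTS))
```

In an OWL setting the same effect is usually obtained with property chains or rules; the sketch only shows the underlying idea.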

Building a Fraud Detection Knowledge Graph

Step 1: Data Modeling

The most important step is creating a graph of relationships between various pieces of information about users and transactions. The key is to associate all available information with account IDs:

Core Entities:

Accounts: Unique identifiers for financial accounts
Customers: People or businesses who own accounts
Transactions: Money transfers between accounts
Devices: Phones, computers used to access accounts
Locations: IP addresses, physical addresses
Merchants: Businesses receiving payments

Relationships to Model:

owns_account: Customer → Account
sends_money_to: Account → Account
uses_device: Account → Device
accessed_from: Account → IP Address
shares_email: Account → Account
shares_phone: Account → Account
located_at: Account → Location

Attributes:

Account: account_number, creation_date, account_type
Transaction: amount, timestamp, currency, status
Customer: name, DOB, tax_id, email, phone
Device: device_id, device_type, OS, browser

Step 2: Define Suspicious Patterns

Using a knowledge graph, you can build powerful rules that detect known fraudulent behavior:

Common Fraud Patterns to Look For:

A. Common Attributes (Identity Fraud)

Multiple accounts using the same email address
Multiple accounts using the same phone number
Same tax identification number across different names
Same device accessing unrelated accounts

B. Circular Money Flow (Money Laundering)

Money sent from Account A → B → C → D → back to A
Short timeframe between transactions in the cycle
Equal or similar amounts at each step
No legitimate business relationship between accounts

C. Rapid Transactions (Layering)

Short paths between multiple accounts
High transaction velocity (many transactions in short time)
Large total amount split across many small transactions
Transactions outside normal patterns (e.g., 3 AM on weekends)

D. Structuring (Smurfing)

Multiple transactions just below reporting threshold ($10,000)
Same source account splitting large amount
Coordinated timing across different accounts
Similar amounts to avoid detection

E. Account Takeover

Sudden change in transaction patterns
New device or location accessing account
Large withdrawals shortly after access change
Password/email changes followed by transfers

Step 3: Create the Ontology

Using Protégé and VidyaAstra, you can create an OWL ontology that formally defines your fraud detection domain:

Using VidyaAstra's "Create New Ontology" Mode:

Description:
"Create a fraud detection ontology for anti-money laundering. 
Include the following:

Entities:
- Account (with properties: account_id, balance, account_type, creation_date)
- Customer (with properties: customer_id, name, email, phone, tax_id)
- Transaction (with properties: transaction_id, amount, timestamp, currency)
- Device (with properties: device_id, type, ip_address)
- FraudPattern (parent class for all fraud types)

Fraud Pattern Types:
- CircularMoneyFlow (subclass of FraudPattern)
- MoneyLaundering (subclass of FraudPattern)
- Structuring (subclass of FraudPattern)
- AccountTakeover (subclass of FraudPattern)
- IdentityFraud (subclass of FraudPattern)

Relationships:
- sendsMoneyTo (Account to Account)
- ownsAccount (Customer to Account)
- usesDevice (Account to Device)
- sharesEmail (Account to Account)
- sharesPhone (Account to Account)
- involvedIn (Account to FraudPattern)
- detectedBy (FraudPattern to DetectionAlgorithm)

Detection Algorithms:
- DFS (Depth First Search for cycle detection)
- TarjanSCC (Strongly Connected Components)
- LouvainCommunity (Community detection for fraud rings)

Risk Levels:
- HighRisk, MediumRisk, LowRisk

Include data properties for risk scores, transaction amounts, and timestamps."

VidyaAstra will generate a complete OWL ontology in 20-30 seconds, including:

All class definitions with proper hierarchy
Object properties with domain and range
Data properties with appropriate types
Sample individuals for testing

Real-World Use Case: Circular Money Flow Detection

The Scenario

Circular money flow is a classic money laundering technique where funds are moved through a series of accounts and eventually return to the originating account. This creates the appearance of legitimate business activity while obscuring the illicit origin of funds.

Example Pattern:

Account A sends $50,000 → Account B
Account B sends $50,000 → Account C  
Account C sends $50,000 → Account D
Account D sends $50,000 → Account A (returns to origin)

This pattern is difficult to detect with traditional database queries because it requires:

Multi-hop relationship traversal (4 steps)
Cycle detection algorithms
Understanding that the pattern indicates fraud

Building the Ontology with VidyaAstra

Step 1: Create the Fraud Detection Ontology (5 minutes)

Open Protégé and launch the VidyaAstra plugin. Select "Create New Ontology" mode and provide this description:

Create a fraud detection ontology for anti-money laundering with circular money flow detection.

Include:
- Account entities with properties (account_id, balance, creation_date)
- Transaction entities linking accounts
- CircularMoneyFlow fraud pattern class
- MoneyLaundering parent class
- DFS and Tarjan cycle detection algorithms
- Risk levels (High, Medium, Low)
- Investigation and compliance action classes

Add relationships:
- sendsMoneyTo (Account to Account)
- involvedIn (Account to FraudPattern)
- detectedBy (FraudPattern to Algorithm)
- triggers (RiskLevel to Action)

Add data properties:
- riskScore (decimal 0-1)
- cycleLength (integer)
- totalAmount (decimal)
- detectionTimestamp (datetime)

VidyaAstra will generate a complete OWL ontology including all classes, properties, and basic individuals.

Step 2: Add Sample Transaction Data

Add these individuals to your ontology to represent the circular money flow pattern:

<!-- Accounts in the cycle -->
<owl:NamedIndividual rdf:about="#Account_A">
  <rdf:type rdf:resource="#Account"/>
  <accountId>ACC-001</accountId>
  <sendsMoneyTo rdf:resource="#Account_B"/>
  <involvedIn rdf:resource="#CircularFlow_001"/>
</owl:NamedIndividual>

<owl:NamedIndividual rdf:about="#Account_B">
  <rdf:type rdf:resource="#Account"/>
  <accountId>ACC-002</accountId>
  <sendsMoneyTo rdf:resource="#Account_C"/>
  <involvedIn rdf:resource="#CircularFlow_001"/>
</owl:NamedIndividual>

<owl:NamedIndividual rdf:about="#Account_C">
  <rdf:type rdf:resource="#Account"/>
  <accountId>ACC-003</accountId>
  <sendsMoneyTo rdf:resource="#Account_D"/>
  <involvedIn rdf:resource="#CircularFlow_001"/>
</owl:NamedIndividual>

<owl:NamedIndividual rdf:about="#Account_D">
  <rdf:type rdf:resource="#Account"/>
  <accountId>ACC-004</accountId>
  <sendsMoneyTo rdf:resource="#Account_A"/>
  <involvedIn rdf:resource="#CircularFlow_001"/>
</owl:NamedIndividual>

<!-- The fraud pattern instance -->
<owl:NamedIndividual rdf:about="#CircularFlow_001">
  <rdf:type rdf:resource="#CircularMoneyFlow"/>
  <cycleLength rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">4</cycleLength>
  <totalAmount rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">200000</totalAmount>
  <riskScore rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">0.92</riskScore>
  <detectedBy rdf:resource="#DFS_Algorithm"/>
  <hasRiskLevel rdf:resource="#HighRisk"/>
</owl:NamedIndividual>

Step 3: Query Using Natural Language

Switch to "Basic Query" mode in VidyaAstra and ask:

"Show me all accounts involved in circular money flow"

VidyaAstra will:

Translate your question to SPARQL
Execute the query against the ontology
Return results in plain English

Example SPARQL generated:

SELECT ?account ?accountId WHERE {
  ?account rdf:type :Account .
  ?account :involvedIn ?pattern .
  ?pattern rdf:type :CircularMoneyFlow .
  ?account :accountId ?accountId .
}

Result:

Found 4 accounts involved in circular money flow:
- Account_A (ACC-001)
- Account_B (ACC-002)
- Account_C (ACC-003)
- Account_D (ACC-004)

Pattern: A → B → C → D → A
Risk Score: 0.92 (High Risk)
Detection Method: DFS Algorithm

Step 4: Run Graph Algorithms

The ontology structure allows you to run various graph algorithms to detect and analyze fraud patterns:

1. Depth-First Search (DFS) for Cycle Detection

DFS is used to detect cycles in the transaction graph. It traverses the graph starting from each account and checks if it can return to the starting point.

# SPARQL query to find 4-hop cycles (a fixed-length pattern match; the query engine performs the traversal)
SELECT ?a1 ?a2 ?a3 ?a4
WHERE {
  ?a1 :sendsMoneyTo ?a2 .
  ?a2 :sendsMoneyTo ?a3 .
  ?a3 :sendsMoneyTo ?a4 .
  ?a4 :sendsMoneyTo ?a1 .
}

Complexity: O(V + E) for DFS-based cycle detection, where V = accounts, E = transactions; the cost of the SPARQL pattern above depends on the query engine

Best for: Finding simple cycles quickly
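
To make the DFS idea concrete, here is a minimal Python sketch of cycle detection with a recursion stack. It assumes the transaction graph has been exported as an adjacency dictionary; the `TRANSFERS` data and the function name `find_cycle` are illustrative and not part of VidyaAstra.

```python
# Hypothetical adjacency map for the A -> B -> C -> D -> A example.
TRANSFERS = {
    "Account_A": ["Account_B"],
    "Account_B": ["Account_C"],
    "Account_C": ["Account_D"],
    "Account_D": ["Account_A"],
}


def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the recursion stack, finished
    colour = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # Back edge: nxt is on the current path, so a cycle closes here.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle(TRANSFERS))
```

The sketch stops at the first cycle it finds. It is recursive, so very long transaction chains would need an iterative version or a higher recursion limit.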

2. Tarjan's Strongly Connected Components (SCC)

Identifies groups of accounts where money can flow between any two accounts in the group. This is more sophisticated than simple cycle detection.

# Find accounts that lie on at least one cycle (members of a non-trivial strongly connected component)
SELECT DISTINCT ?account
WHERE {
  ?account :sendsMoneyTo+ ?otherAccount .
  ?otherAccount :sendsMoneyTo+ ?account .
  FILTER(?account != ?otherAccount)
}

Best for: Detecting complex fraud rings where money circulates among multiple accounts
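
The SPARQL query only tells you which accounts sit on a cycle; grouping them into components is what Tarjan's algorithm does. Below is a minimal Python sketch under the assumption that the graph is available as an adjacency dictionary. The `RING_GRAPH` data and the function name are illustrative.

```python
# Hypothetical graph: A, B, C form a ring; D and E only receive money.
RING_GRAPH = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": ["E"],
}


def strongly_connected_components(graph):
    """Tarjan's algorithm: return a list of SCCs, each a sorted list of nodes."""
    index_of = {}   # discovery order of each node
    low_link = {}   # smallest discovery index reachable from the node
    on_stack = set()
    stack = []
    components = []
    next_index = 0

    def visit(node):
        nonlocal next_index
        index_of[node] = low_link[node] = next_index
        next_index += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in index_of:
                visit(nxt)
                low_link[node] = min(low_link[node], low_link[nxt])
            elif nxt in on_stack:
                low_link[node] = min(low_link[node], index_of[nxt])
        if low_link[node] == index_of[node]:
            # node is the root of a component: pop the stack down to it.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in list(graph):
        if node not in index_of:
            visit(node)
    return components


# Components with more than one account are candidate fraud rings.
rings = [c for c in strongly_connected_components(RING_GRAPH) if len(c) > 1]
print(rings)
```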

3. Louvain Algorithm for Community Detection

Groups accounts into communities based on transaction patterns. Fraudulent accounts often form tight communities.

Use case: Identify clusters of accounts that primarily transact with each other, suggesting coordination or collusion.

4. PageRank for Account Importance

Assigns importance scores to accounts based on incoming and outgoing transaction patterns.

Use case: Identify "hub" accounts that are central to money laundering operations.
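
As a rough illustration of how such scores are computed, here is a minimal power-iteration PageRank in Python. It is a sketch under simple assumptions (unweighted edges, damping factor 0.85, a fixed number of iterations); the `MULE_GRAPH` data and the function name `pagerank` are illustrative.

```python
# Hypothetical graph: three accounts feed a hub, which pays out to two others.
MULE_GRAPH = {
    "A": ["Hub"],
    "B": ["Hub"],
    "C": ["Hub"],
    "Hub": ["X", "Y"],
}


def pagerank(graph, damping=0.85, iterations=50):
    """Return a dict of PageRank scores that sum to 1."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by accounts with no outgoing transfers is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


scores = pagerank(MULE_GRAPH)
top_account = max(scores, key=scores.get)
print(top_account)
```

A high score alone is not evidence of fraud; large legitimate merchants are hubs too, so the score is best combined with other signals.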

5. Shortest Path Analysis

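Find the shortest path between two accounts to understand how money flows. SPARQL 1.1 property paths test reachability but do not return path lengths, so the query below lists every account that lies on some path between the two.

# Using property paths to find accounts connecting the two
SELECT DISTINCT ?intermediateAccount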
WHERE {
  :SuspiciousAccount_A :sendsMoneyTo+ ?intermediateAccount .
  ?intermediateAccount :sendsMoneyTo+ :SuspiciousAccount_B .
}

Use case: Track how illicit funds move from source to destination.
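
When the actual shortest route is needed, a breadth-first search over the exported transaction graph is enough for unweighted hops. The following Python sketch assumes an adjacency dictionary; `FLOW_GRAPH` and `shortest_path` are illustrative names.

```python
from collections import deque

# Hypothetical graph with a 3-hop and a 2-hop route from A to B.
FLOW_GRAPH = {
    "A": ["M1", "M2"],
    "M1": ["M3"],
    "M2": ["B"],
    "M3": ["B"],
}


def shortest_path(graph, source, target):
    """Return the fewest-hop path from source to target, or None."""
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(FLOW_GRAPH, "A", "B"))
```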

Using VidyaAstra with Protégé

VidyaAstra extends Protégé with three key capabilities that make fraud detection ontology development accessible:

1. Basic Query Mode - Natural Language to SPARQL

Instead of writing complex SPARQL queries manually, fraud analysts can ask questions in plain English:

Traditional Approach:

SELECT ?account ?riskScore
WHERE {
  ?account :involvedIn ?pattern .
  ?pattern rdf:type :CircularMoneyFlow .
  ?pattern :riskScore ?riskScore .
  FILTER (?riskScore > 0.8)
}

VidyaAstra Approach:

Simply ask: "Which accounts are involved in high-risk circular money flows?"

The plugin:

Analyzes the current ontology structure
Sends the context + question to an LLM (GPT-4, Claude, Nvidia)
Gets back valid SPARQL
Executes it and returns results in plain English

2. Create New Ontology Mode - AI-Generated OWL

Instead of manually creating classes, properties, and individuals in OWL/RDF XML, describe what you need:

"Create a fraud detection ontology for credit card fraud including:
- Transaction entities with amount, timestamp, merchant
- Customer entities with account info
- Fraud patterns: velocity checks, geographic anomalies, merchant risk
- Detection rules for unusual spending patterns"

VidyaAstra generates a complete, valid OWL ontology in ~20 seconds.

3. Modify Ontology Mode - Intelligent Updates

Extend existing ontologies without manual XML editing:

"Add a new fraud type called 'Account Takeover' that includes:
- Login from new device
- Password change
- Followed by large withdrawal
Link it to HighRisk level"

The plugin updates your ontology, validates consistency, and applies changes immediately.

Graph Algorithms for Fraud Detection

Here are the key graph algorithms used in knowledge graph-based fraud detection systems:

1. Cycle Detection Algorithms

Depth-First Search (DFS)

Purpose: Find circular money flows
How it works: Traverses the graph and maintains a recursion stack to detect back edges (cycles)
Complexity: O(V + E)
Equivalent reachability check in SPARQL (the query engine, not the query, performs the traversal):

SELECT ?start ?end
WHERE {
  ?start :sendsMoneyTo+ ?end .
  ?end :sendsMoneyTo+ ?start .
  FILTER(?start != ?end)
}

Tarjan's Strongly Connected Components

Purpose: Find groups of accounts where money circulates
How it works: Uses DFS with low-link values to identify SCCs in one pass
Complexity: O(V + E)
Use case: Detect complex fraud rings, not just simple cycles

2. Path Finding Algorithms

Shortest Path (Dijkstra/Bellman-Ford)

Purpose: Find how money moves from point A to B
Use case: Track layering schemes where money passes through multiple intermediaries
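SPARQL property paths (this counts the accounts lying between A and B; SPARQL 1.1 cannot return the length of the shortest path):

SELECT (COUNT(DISTINCT ?intermediate) AS ?intermediateCount)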
WHERE {
  :Account_A :sendsMoneyTo+ ?intermediate .
  ?intermediate :sendsMoneyTo+ :Account_B .
}

All Paths Enumeration

Purpose: Find all possible routes money can take
Use case: Identify alternative laundering paths, redundant connections

3. Community Detection Algorithms

Louvain Algorithm

Purpose: Identify clusters of accounts that transact primarily with each other
How it works: Optimizes modularity by iteratively moving nodes between communities
Use case: Detect organized fraud rings, mule account networks
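
The quantity Louvain optimizes is modularity. The Python sketch below does not implement Louvain itself; it only computes the modularity of a given partition for an undirected, unweighted graph, so you can see why a "two rings" grouping scores higher than lumping every account together. The edge list and names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical graph: two triangles of accounts joined by a single transfer.
EDGES = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("X", "Y"), ("Y", "Z"), ("X", "Z"),
    ("C", "X"),
]


def modularity(edges, community_of):
    """Modularity Q = sum over communities of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


two_rings = {"A": 0, "B": 0, "C": 0, "X": 1, "Y": 1, "Z": 1}
one_blob = {node: 0 for node in two_rings}

q_split = modularity(EDGES, two_rings)
q_blob = modularity(EDGES, one_blob)
print(q_split, q_blob)
```

Louvain repeatedly moves nodes between communities whenever the move increases this score, then merges communities and repeats.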

Label Propagation

Purpose: Fast community detection for large graphs
How it works: Nodes adopt the most common label among their neighbors
Use case: Real-time fraud ring detection
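
Label propagation is short enough to sketch in full. The Python version below makes two simplifying assumptions so that it is reproducible: nodes are visited in sorted order, and ties are broken by taking the largest label (the standard algorithm breaks ties at random). The edge list and function name are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical graph: two triangles of accounts joined by a single transfer.
LPA_EDGES = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("X", "Y"), ("Y", "Z"), ("X", "Z"),
    ("C", "X"),
]


def label_propagation(edges, max_rounds=20):
    """Return a dict mapping each node to its community label."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Every node starts in its own community.
    labels = {node: node for node in neighbours}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            # Deterministic tie-break (an assumption of this sketch).
            choice = max(label for label, count in counts.items() if count == best)
            if labels[node] != choice:
                labels[node] = choice
                changed = True
        if not changed:
            break
    return labels


labels = label_propagation(LPA_EDGES)
print(labels)
```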

4. Centrality Algorithms

PageRank

Purpose: Identify important "hub" accounts in the transaction network
Use case: Find money mules, central accounts in laundering operations

Betweenness Centrality

Purpose: Find accounts that act as bridges between different parts of the network
Use case: Identify intermediary accounts used for layering

Degree Centrality

Purpose: Count incoming/outgoing transactions per account
Use case: Detect accounts with unusual transaction volumes
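
Degree centrality is the simplest of the three to compute. A minimal Python sketch, assuming transfers are available as (source, destination) pairs; the data and the name `degree_counts` are illustrative.

```python
from collections import Counter

# Hypothetical transfers as (source, destination) pairs.
TRANSFER_PAIRS = [
    ("ACC-001", "ACC-002"),
    ("ACC-001", "ACC-003"),
    ("ACC-001", "ACC-004"),
    ("ACC-002", "ACC-004"),
    ("ACC-003", "ACC-004"),
]


def degree_counts(transfers):
    """Return (in_degree, out_degree) counters keyed by account."""
    out_degree = Counter(source for source, _ in transfers)
    in_degree = Counter(dest for _, dest in transfers)
    return in_degree, out_degree


in_degree, out_degree = degree_counts(TRANSFER_PAIRS)
print(out_degree.most_common(1), in_degree.most_common(1))
```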

5. Pattern Matching Algorithms

Subgraph Isomorphism

Purpose: Find instances of known fraud patterns in the transaction graph
How it works: Match a template pattern against the full graph
SPARQL Example:

# Match the "smurfing" pattern: one source, multiple small transactions
SELECT ?source (COUNT(?dest) AS ?numTransactions) (SUM(?amount) AS ?total)
WHERE {
  ?source :sendsMoneyTo ?dest .
  ?transaction :from ?source ;
               :to ?dest ;
               :amount ?amount .
  FILTER(?amount < 10000)
}
GROUP BY ?source
HAVING (COUNT(?dest) > 10 && SUM(?amount) > 50000)
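
The same aggregation can be prototyped outside the triple store. The Python sketch below mirrors the thresholds in the query above (amounts under 10,000, more than 10 transactions, more than 50,000 in total); the transaction tuples and the name `find_structuring` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical transactions as (source, destination, amount).
TRANSACTIONS = (
    [("ACC-001", f"ACC-{100 + i}", 9500.0) for i in range(12)]
    + [("ACC-002", "ACC-200", 9000.0)] * 3
    + [("ACC-003", "ACC-300", 60000.0)]
)


def find_structuring(transactions, threshold=10_000, min_count=10, min_total=50_000):
    """Return sources with many sub-threshold transfers adding up to a large sum."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for source, _dest, amount in transactions:
        if amount < threshold:
            counts[source] += 1
            totals[source] += amount
    return sorted(
        source
        for source in counts
        if counts[source] > min_count and totals[source] > min_total
    )


print(find_structuring(TRANSACTIONS))
```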

6. Temporal Analysis

Time-Window Queries

Purpose: Detect rapid transaction sequences
Use case: Layering detection, velocity checks

SELECT ?account (COUNT(?tx) AS ?count)
WHERE {
  ?tx :fromAccount ?account ;
      :timestamp ?time .
  FILTER(?time > "2024-11-23T00:00:00"^^xsd:dateTime &&
         ?time < "2024-11-23T01:00:00"^^xsd:dateTime)
}
GROUP BY ?account
HAVING (COUNT(?tx) > 10)
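
The query above checks one fixed hour. A sliding window catches bursts wherever they start. Here is a minimal Python sketch of that variation, assuming transactions are available as (account, timestamp) pairs; the data and the name `rapid_accounts` are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BASE = datetime(2024, 11, 23, 0, 0)

# Hypothetical data: ACC-001 sends every 4 minutes, ACC-002 every 30 minutes.
TIMED_TRANSACTIONS = (
    [("ACC-001", BASE + timedelta(minutes=4 * i)) for i in range(12)]
    + [("ACC-002", BASE + timedelta(minutes=30 * i)) for i in range(12)]
)


def rapid_accounts(transactions, window=timedelta(hours=1), max_count=10):
    """Return accounts with more than max_count transactions inside any window."""
    times = defaultdict(list)
    for account, timestamp in transactions:
        times[account].append(timestamp)

    flagged = []
    for account, stamps in times.items():
        stamps.sort()
        start = 0
        for end, stamp in enumerate(stamps):
            # Shrink the window from the left until it spans at most `window`.
            while stamp - stamps[start] > window:
                start += 1
            if end - start + 1 > max_count:
                flagged.append(account)
                break
    return sorted(flagged)


print(rapid_accounts(TIMED_TRANSACTIONS))
```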

Visualization and Manual Inspection

In addition to automated graph algorithms, fraud analysts need to visually inspect suspicious patterns. Protégé's built-in visualization tools, combined with VidyaAstra's query capabilities, allow manual exploration:

OntoGraf Visualization

Open OntoGraf view in Protégé
Select a suspicious account (e.g., Account_A)
Visualize relationships (:sendsMoneyTo, :sharesEmail, etc.)
Manually trace money flow paths

SPARQL-Based Exploration

Use VidyaAstra to iteratively drill down:

Query 1: "Show accounts with more than 5 outgoing transactions"
Query 2: "Which of these accounts share email addresses?"
Query 3: "Show the transaction history for Account_XYZ"
Query 4: "Are any of these accounts involved in fraud patterns?"

This iterative, conversational approach combines automated detection with human expertise.

Integration with External Graph Databases

While Protégé is excellent for ontology development and testing, production fraud detection systems typically use dedicated graph databases for scalability:

Integration Architecture

┌─────────────────────────────────────────┐
│  Protégé + VidyaAstra                   │
│  • Ontology design & testing            │
│  • Pattern definition                   │
│  • Query prototyping                    │
└─────────────────────────────────────────┘
          ↓ (Export OWL)
┌─────────────────────────────────────────┐
│  Graph Database (Production)            │
│  • Apache Jena Fuseki                   │
│  • GraphDB                              │
│  • Neo4j (with neosemantics plugin)     │
│  • Amazon Neptune                       │
└─────────────────────────────────────────┘
          ↓
┌─────────────────────────────────────────┐
│  Real-time Transaction Processing       │
│  • Stream processing (Kafka, Flink)     │
│  • Pattern matching                     │
│  • Alert generation                     │
└─────────────────────────────────────────┘

Export Process

Design ontology in Protégé with VidyaAstra
Test queries on sample data
Export OWL file
Load into production triple store:

# Apache Jena Fuseki
curl -X POST \
  -H "Content-Type: application/rdf+xml" \
  --data-binary @fraud-detection-ontology.owl \
  http://localhost:3030/fraud/data

# GraphDB
curl -X POST \
  -H "Content-Type: application/rdf+xml" \
  --data-binary @fraud-detection-ontology.owl \
  http://localhost:7200/repositories/fraud/statements

Populate with production transaction data
Run graph algorithms at scale

Getting Started

Prerequisites

Protégé 5.6.4+ - Download from https://protege.stanford.edu/
Java 11+ - Required for running Protégé and plugins
LLM API Access - OpenAI, Anthropic Claude, or Nvidia NGC API key
VidyaAstra Plugin - Download from https://github.com/vishalmysore/vidyaastra-plugin

Installation

Step 1: Install Protégé

Download and install Protégé for your operating system.

Step 2: Install VidyaAstra Plugin

# Windows (PowerShell; adjust the folder name to your installed Protégé version)
Copy-Item vidyaastra-1.0.1.jar "C:\Program Files\Protege-5.6.7\plugins\"

# macOS
cp vidyaastra-1.0.1.jar "/Applications/Protege.app/Contents/Java/plugins/"

# Linux
cp vidyaastra-1.0.1.jar "$HOME/Protege-5.6.7/plugins/"

Step 3: Launch Protégé and Activate Plugin

Start Protégé
Go to Window → Views → Ontology Views → VidyaAstra View
The VidyaAstra panel will appear

Step 4: Configure API Key

Enter your OpenAI/Claude/Nvidia API key in the VidyaAstra preferences.

Quick Start Example

Create Your First Fraud Detection Ontology:

Select "Create New Ontology" mode
Enter this description:

Create a fraud detection ontology for money laundering detection.
Include:
- Account and Transaction entities
- CircularMoneyFlow fraud pattern
- sendsMoneyTo relationship
- DFS cycle detection algorithm
- Risk levels and scores

Click "Ask AI" and wait 20-30 seconds
Save the generated ontology as fraud-detection.owl

Query Your Ontology:

Switch to "Basic Query" mode
Ask: "Show me all circular money flow patterns"
VidyaAstra translates to SPARQL and returns results

Modify Your Ontology:

Switch to "Modify Ontology" mode
Request: "Add a Structuring fraud pattern for transactions below $10,000"
Changes are applied and validated automatically

Technical Implementation Details

How VidyaAstra Works

1. Natural Language Query Processing

// Simplified flow
String userQuery = "Which accounts have high risk scores?";

// 1. Extract ontology context
String context = extractClassesAndProperties(activeOntology);

// 2. Build LLM prompt
String prompt = "Given this ontology:\n" + context + 
                "\nTranslate to SPARQL: " + userQuery;

// 3. Call LLM
String sparqlQuery = llm.complete(prompt);

// 4. Execute query
ResultSet results = ontology.executeQuery(sparqlQuery);

// 5. Format results
String answer = formatAsNaturalLanguage(results);

2. AI Ontology Generation

// Simplified flow
String description = "Create fraud detection ontology...";

// 1. Generate with strict prompt
String systemPrompt = "Generate valid OWL/RDF XML only. " +
                      "No markdown, no explanations.";

// 2. Get LLM response
String owlXml = llm.complete(systemPrompt, description);

// 3. Clean and validate
owlXml = removeMarkdown(owlXml);
owlXml = fixCommonXmlIssues(owlXml);

// 4. Validate with OWL API
OWLOntology ont = manager.loadOntologyFromOntologyDocument(new StringDocumentSource(owlXml));

// 5. Save
saveOntology(ont, "generated-ontology.owl");
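
The "clean and validate" step is where LLM output most often goes wrong, so here is a minimal Python sketch of the idea. It is not the plugin's actual code (the plugin is written in Java): it strips a markdown code fence if one is present, trims any chatter around the XML, and checks that the result is well-formed XML. Well-formed XML is a weaker check than loading the ontology with the OWL API. The function name `extract_rdf_xml` and the sample response are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Matches a markdown code fence (three backticks) and captures its content.
FENCE = re.compile(r"`{3}[a-zA-Z]*\s*\n(.*?)`{3}", re.DOTALL)


def extract_rdf_xml(llm_response):
    """Return the XML payload of an LLM response, or raise if it is not well formed."""
    match = FENCE.search(llm_response)
    text = match.group(1) if match else llm_response
    start = text.find("<")
    end = text.rfind(">")
    if start == -1 or end == -1:
        raise ValueError("no XML found in response")
    xml_text = text[start:end + 1].strip()
    ET.fromstring(xml_text)  # raises xml.etree.ElementTree.ParseError if malformed
    return xml_text


# Hypothetical LLM response wrapped in a fence and surrounded by chatter.
fence = "`" * 3
sample_response = (
    "Here is the ontology:\n"
    + fence + "xml\n"
    + '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    + '         xmlns:owl="http://www.w3.org/2002/07/owl#">\n'
    + '  <owl:Class rdf:about="http://example.org/fraud#Account"/>\n'
    + "</rdf:RDF>\n"
    + fence
    + "\nLet me know if you need changes."
)

cleaned = extract_rdf_xml(sample_response)
print(cleaned)
```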

SPARQL Query Examples

Find Circular Money Flows:

PREFIX : <http://example.org/fraud#>

SELECT DISTINCT ?account1 ?account2 ?account3 ?account4
WHERE {
  ?account1 :sendsMoneyTo ?account2 .
  ?account2 :sendsMoneyTo ?account3 .
  ?account3 :sendsMoneyTo ?account4 .
  ?account4 :sendsMoneyTo ?account1 .
}

Find Accounts Sharing Email:

SELECT ?account1 ?account2 ?email
WHERE {
  ?account1 :hasEmail ?email .
  ?account2 :hasEmail ?email .
  FILTER(STR(?account1) < STR(?account2))
}
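
For a quick check on exported data, the same shared-attribute test can be sketched in Python. The sketch groups accounts by a normalized email address and emits each pair once; the normalization (trimming and lower-casing) is my assumption, and the data and function name are illustrative.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical account-to-email mapping.
EMAIL_OF = {
    "ACC-001": "x@example.com",
    "ACC-002": "X@Example.com ",
    "ACC-003": "y@example.com",
}


def accounts_sharing(attribute_of):
    """Return (account1, account2, value) for every pair sharing a value."""
    by_value = defaultdict(list)
    for account, value in attribute_of.items():
        # Normalize so trivially different spellings still match (assumption).
        by_value[value.strip().lower()].append(account)

    pairs = []
    for value, accounts in by_value.items():
        for first, second in combinations(sorted(accounts), 2):
            pairs.append((first, second, value))
    return sorted(pairs)


print(accounts_sharing(EMAIL_OF))
```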

Find High-Risk Patterns:

SELECT ?pattern ?riskScore
WHERE {
  ?pattern rdf:type :FraudPattern .
  ?pattern :riskScore ?riskScore .
  FILTER(?riskScore > 0.80)
}
ORDER BY DESC(?riskScore)

Temporal Analysis - Rapid Transactions:

SELECT ?account (COUNT(?tx) AS ?txCount)
WHERE {
  ?tx :fromAccount ?account ;
      :timestamp ?time .
  FILTER(?time >= "2024-11-23T00:00:00"^^xsd:dateTime &&
         ?time <= "2024-11-23T02:00:00"^^xsd:dateTime)
}
GROUP BY ?account
HAVING (COUNT(?tx) > 5)

Conclusion

Why Knowledge Graphs for Fraud Detection

Fraud detection is fundamentally a relationship problem:

Money flows through networks of accounts
Fraudsters create patterns across transactions
Detection requires multi-hop analysis
Explanations need semantic context

Traditional approaches struggle with:

ML/Neural Networks: Black boxes vulnerable to adversarial attacks, can't explain decisions
Rule-Based Systems: Brittle, high false positives, miss complex patterns
SQL Databases: Multi-hop queries are slow and complex

Knowledge graphs solve these problems by natively representing relationships and enabling graph algorithms.

Why Protégé + VidyaAstra

Protégé provides:

Industry-standard OWL ontology editor
SPARQL query engine
Reasoning capabilities (Pellet, HermiT, ELK)
Visualization tools

VidyaAstra adds:

Natural language query interface (no SPARQL expertise needed)
AI-powered ontology generation (minutes vs. weeks)
Intelligent ontology modification
Multi-LLM support (OpenAI, Claude, Nvidia)

Together, they enable fraud analysts to build and query knowledge graphs without deep technical expertise in ontologies or SPARQL.

Next Steps

Download Protégé and VidyaAstra
Create your first fraud detection ontology using the examples in this article
Load sample transaction data
Query using natural language
Extend with your specific fraud patterns
Deploy to production graph database when ready

Resources

Software

Protégé: https://protege.stanford.edu/
VidyaAstra Plugin: https://github.com/vishalmysore/vidyaastra-plugin

Documentation

OWL 2 Primer: https://www.w3.org/TR/owl2-primer/
SPARQL 1.1 Query Language: https://www.w3.org/TR/sparql11-query/
OWL API: https://github.com/owlcs/owlapi

Sample Ontology

The complete fraud detection ontology example is available in this repository:

File: fraud-detection-ontology.owl
Includes: 28 classes, 12 object properties, 8 data properties
Sample Data: Circular money flow with 4 accounts

About

Author: Vishal Mysore

Repository: https://github.com/vishalmysore/vidyaastra-plugin

Disclaimer

This article presents my approach to fraud detection using knowledge graphs, building on industry-standard techniques with Protégé and the VidyaAstra plugin. The circular money flow use case is a well-documented fraud pattern in financial crime literature and has been covered in many articles before, and this implementation demonstrates how ontologies and graph algorithms can detect such patterns effectively.

https://medium.com/neo4j/find-circular-money-flow-with-neo4j-c9138e1c3183
https://www.journalofaccountancy.com/issues/2009/dec/20091793/
https://digitaldealer.com/news/circular-bank-statement-fraud-the-new-synthetic-income-scam-dealers-lenders-must-fight/168087/

Important Notes:

The views and techniques presented here are my own.
This is an educational demonstration using publicly available fraud detection patterns documented in academic and industry literature
The examples use fictional data and scenarios for illustration purposes only
This implementation is not production-ready and should not be used for actual fraud detection without proper validation, compliance review, and security hardening
Organizations implementing fraud detection systems should consult with their legal, compliance, and security teams
