Imagine querying your vast, disparate datasets using plain English, getting not just exact matches, but semantically relevant information. That is the power of MindsDB Knowledge Bases, and I'm excited to share how I integrated them into a Django application.
Introduction
Data is everywhere, but truly understanding it often requires complex SQL queries or specialized tools. What if you could ask your data questions naturally, like "Show me clients in high-income urban areas" or "Summarize suspicious transactions"? This is exactly what MindsDB's new Knowledge Bases (KBs) aim to enable.
This article details how a django transaction application can turn a dashboard of information. I decided to build a simple, yet powerful, Django web application that allows users to perform semantic searches on mock financial data and even get AI-powered summaries. The application only has three tables to keep it simple. It is based on the dataset from the following: Kaggle Dataset We will start by building a normal Django application with postgres as the backend database. We will later then show how to make your data available to MindsDB. Django knowledge is neccessary to follow along as I will not explain the framework.
Bridging Natural Language and Structured Data
Traditional databases excel at structured queries (e.g., SELECT * FROM clients WHERE age > 30
). However, they fall short when you want to query based on meaning or context, like finding "high-risk clients" or "transactions indicative of fraud." This gap often requires complex machine learning pipelines.
MindsDB Knowledge Bases address this by:
- Semantic Search: Understanding the meaning behind your queries, not just keywords.
- Unified Access: Querying data across different sources (like my PostgreSQL DB).
- Metadata Filtering: Combining semantic search with traditional SQL filters.
Project Overview: A Django App for Financial Data Insights
My application is a Django-based web interface designed to interact with a PostgreSQL database containing mock financial data:
-
Client
: Information about individual clients (age, gender, address, income). -
Card
: Details about credit/debit cards associated with clients. -
Transaction
: Records of financial transactions (amount, merchant, date, location).
The core of the application's intelligence comes from MindsDB, which transforms this structured data into semantically searchable Knowledge Bases. The Django app then acts as a user-friendly frontend to query these KBs.
Django Application Setup
The Django project, named transaction_dashboard_proj
or any other name, uses PostgreSQL as its database and python-dotenv
for environment variable management. The name of the app will be called finance
and swiftly added to a list of apps in the settings.py
file.
Herer is the requirements file for the project:
Django>=5.2.3
psycopg2-binary>=2.9.9
python-dotenv>=1.0.0
pandas>=2.0.0
mindsdb_sdk>=1.0.0
The finance/models.py
defines the database schema:
# core/models.py
from django.db import models
class Client(models.Model):
"""Client model - separate from Django User model since clients cannot login"""
current_age = models.IntegerField()
retirement_age = models.IntegerField()
birth_year = models.IntegerField()
birth_month = models.IntegerField()
gender = models.CharField(max_length=10)
address = models.TextField()
latitude = models.DecimalField(max_digits=9, decimal_places=6)
longitude = models.DecimalField(max_digits=9, decimal_places=6)
per_capita_income = models.DecimalField(max_digits=12, decimal_places=2)
yearly_income = models.DecimalField(max_digits=12, decimal_places=2)
total_debt = models.DecimalField(max_digits=12, decimal_places=2)
credit_score = models.IntegerField(default=0)
num_credit_cards = models.IntegerField(default=0)
class Meta:
db_table = 'client'
ordering = ['id']
def __str__(self):
return f"Client {self.id}"
class Card(models.Model):
"""Card model linked to clients"""
CARD_BRAND_CHOICES = [
('visa', 'Visa'),
('mastercard', 'Mastercard'),
('amex', 'American Express'),
('discover', 'Discover'),
]
CARD_TYPE_CHOICES = [
('credit', 'Credit'),
('debit', 'Debit'),
('prepaid', 'Prepaid'),
]
client = models.ForeignKey(Client, on_delete=models.CASCADE, related_name='cards')
card_brand = models.CharField(max_length=20, choices=CARD_BRAND_CHOICES)
card_type = models.CharField(max_length=25, choices=CARD_TYPE_CHOICES)
card_number = models.CharField(max_length=20) # Masked card number
expires = models.CharField(max_length=12)
cvv = models.CharField(max_length=4) # Should be encrypted in production
has_chip = models.BooleanField(default=True)
num_cards_issued = models.IntegerField(default=1)
credit_limit = models.DecimalField(max_digits=12, decimal_places=2, null=True, blank=True)
acct_open_date = models.CharField(max_length=8, default="01/1970")
year_pin_last_changed = models.IntegerField(default=2025)
card_on_dark_web = models.CharField(default="No")
class Meta:
db_table = 'cards'
ordering = ['id']
def __str__(self):
return f"Card {self.id} - {self.card_brand} {self.card_type}"
class Transaction(models.Model):
"""Transaction model linked to clients and cards"""
CHIP_CHOICES = [
('chip', 'Chip Transaction'),
('swipe', 'Swipe Transaction'),
]
date = models.DateTimeField()
client = models.ForeignKey(Client, on_delete=models.CASCADE, related_name='transactions')
card = models.ForeignKey(Card, on_delete=models.CASCADE, related_name='transactions')
amount = models.DecimalField(max_digits=10, decimal_places=2)
use_chip = models.CharField(default="Chip Transaction", choices=CHIP_CHOICES)
merchant_id = models.CharField(max_length=50)
merchant_city = models.CharField(max_length=100)
merchant_state = models.CharField(max_length=100, null=True, blank=True)
zip = models.CharField(max_length=10, null=True, blank=True)
mcc = models.IntegerField(default=1000)
errors = models.CharField(max_length=200, null=True, blank=True)
class Meta:
db_table = 'transactions'
ordering = ['-date', 'id']
def __str__(self):
return f"Transaction {self.id} - ${self.amount} on {self.date}"
Note: Clients are not Django users and do not have login capabilities in this application, focusing purely on data querying. Some of the fields may not be exactly PCI compliant or best practice, but we working with the dataset provided for testing.
Run Django migrations to create the tables in the database:
python manage.py makemigrations
python manage.py migrate
The Kaggle Dataset
Download the dataset from the Financial Transactions Dataset: Analytics from Kaggle, specifically targeting only 3 files needed for the table models above. They are the Transaction data (transactions_data.csv
), Card information( cards_dat.csv
) and the Users data(users_data
) which is renamed client for our Django app to avoid confusion with the django auth_user. Please note the transactions_data file is massive and will need to be split into 3 or more files for easier handling. You can find a script online to split it up.
After downloading the files we will copy them into the database by using the copy command in psql
. Start with client data(users_data.csv), followed by card data, and finally transaction data(as it has card & client foreign keys).
Log into psql:
\connect fraud_mindsdb;
\copy client(id,current_age,retirement_age,birth_year,birth_month,gender,address,latitude,longitude,per_capita_income,yearly_income,total_debt,credit_score,num_credit_cards) FROM '/tmp/users_data.csv' DELIMITER ',' CSV header;
\copy cards(id, client_id, card_brand, card_type, card_number, expires, cvv, has_chip, num_cards_issued, credit_limit, acct_open_date, year_pin_last_changed, card_on_dark_web) FROM '/tmp/cards_data.csv' DELIMITER ',' CSV header;
\copy transactions(id, date,client_id, card_id, amount,use_chip,merchant_id, merchant_city, merchant_state, zip, mcc, errors) '/tmp/transaction_data.csv' DELIMITER ',' CSV header;
Connecting MindsDB to PostgreSQL
We are going to run MindsDB locally instead of using the cloud version. You will need docker installed. To run MindsDB Web Studio locally execute the following:
docker run -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
If you get a 'Connection refused' error run the following to allow MindsDB to access your local Postgres instance:
docker run --network=host mindsdb/mindsdb
The first step in MindsDB is to connect it to our existing PostgreSQL database. This allows MindsDB to access the Client, Card, and Transaction tables. I ran this command directly in the MindsDB Web Studio:
CREATE DATABASE finance_db
WITH ENGINE = 'postgres',
PARAMETERS = {
"host": "your_db_host",
"port": 5432,
"user": "your_user",
"password": "your_password",
"database": "your_postgresdb"
};
This creates a virtual database finance _db inside MindsDB, mirroring our PostgreSQL schema.
Creating MindsDB Knowledge Bases
Next, we define two Knowledge Bases: one for Client data (client_kb) and one for Transaction data (transaction_kb). The key here is defining the CONTENT_COLUMN (what MindsDB will semantically understand) and METADATA_COLUMNS (additional data to retrieve and filter on).
-- For Clients:
CREATE KNOWLEDGE_BASE client_kb
USING
embedding_model = {
"provider": "gemini",
"model_name": 'gemini-embedding-exp-03-07',
"api_key": "xxxxxx"
},
reranking_model = {
"provider": "gemini",
"model_name": "gemini-pro",
"api_key": "xxxxxx"
},
content_column = 'address',
metadata_columns = ['per_capita_income', 'current_age', 'gender', 'birth_year'];
-- For Transactions:
CREATE KNOWLEDGE_BASE transaction_kb
USING
embedding_model = {
"provider": "gemini",
"model_name": 'gemini-embedding-exp-03-07',
"api_key": "xxxxxx"
},
reranking_model = {
"provider": "gemini",
"model_name": "gemini-pro",
"api_key": "xxxxxx"
},
content_column = ['merchant_city', 'merchant_state'],
metadata_columns = ['amount', 'use_chip', 'date', 'client_id'];
After defining the KBs, we ingest the existing data in the postgres database of the Django app:
INSERT INTO client_kb (content, metadata)
SELECT
address as content,
JSON_OBJECT(
'id', id,
'per_capita_income', per_capita_income,
'current_age', current_age,
'gender', gender,
'birth_year', birth_year
) as metadata
FROM finance_db.core_client;
-- Insert transaction data with merchant location as content
INSERT INTO transaction_kb (content, metadata)
SELECT
CONCAT(merchant_city, ', ', merchant_state) as content,
JSON_OBJECT(
'id', id,
'amount', amount,
'use_chip', use_chip,
'date', date,
'client_id', client_id
) as metadata
FROM finance_db.core_transaction;
Indexing for Performance
To ensure efficient semantic queries, it's crucial to create indexes on the Knowledge Bases:
CREATE INDEX ON client_kb;
CREATE INDEX ON transaction_kb;
This optimizes the underlying vector search for faster retrieval.
Automated Data Ingestion with MindsDB JOBS
For a dynamic application, new data is constantly added. In our case as we inserted the data from CSV files we can simulate that by spliting the files and inserting the last CSV file to simulate new transactions or we can add them in the Django admin app in the backend. MindsDB JOBS allow for periodic updates of the Knowledge Bases. We will set up a job to automatically insert new transactions into transaction_kb every hour:
CREATE JOB update_transaction_kb_job (
INSERT INTO transaction_kb SELECT * FROM finance_db.transactions WHERE date > LAST;
) EVERY 1 HOUR;
The WHERE date > LAST clause is a powerful MindsDB feature that intelligently fetches only records added since the last job run or ingestion, ensuring efficiency.
Multi-step Workflows: KB + AI Table for Summarization
One of the most exciting aspects is combining KBs with MindsDB AI Tables for multi-step workflows. We will create an AI Table using Google Gemini (you can use OpenAI if you got moola) to summarize transaction details. You need to configure your Gemini engine in MindsDB first.
First, set up your Gemini engine (if you haven't already):
CREATE ML_ENGINE google_gemini_engine
FROM google_gemini
USING
api_key = 'AIzaSyA0for-example';
Then, define the summarization model:
CREATE ML_MODEL summarize_transactions_model
PREDICT summary
USING
engine = 'google_gemini_engine',
model_name = 'gemini-pro',
prompt_template = 'Summarize the following transaction details into a concise overview, highlighting key aspects: {{transaction_details}}.';
Now, the magic happens in a combined query, this will be called in the view of the Django application(You don't need to call it through the MindsDB web view):
SELECT
t.id AS transaction_id,
t.amount,
t.date,
t.merchant_city,
t.merchant_state,
t.use_chip,
t.client_id,
s.summary -- The predicted summary from the AI model
FROM
transaction_kb AS t
JOIN
summarize_transactions_model AS s ON 1=1 -- Join the KB with the AI model
WHERE
t.content LIKE 'unusual large spending' AND -- Semantic search for transactions
s.transaction_details = CONCAT( -- Feed relevant columns into the AI model's prompt
'Transaction ID: ', t.id,
', Amount: $', t.amount,
', Date: ', t.date,
', Merchant City: ', t.merchant_city,
', Merchant State: ', t.merchant_state,
', Used Chip: ', t.use_chip,
', Client ID: ', t.client_id
);
This query first finds transactions semantically related to "unusual large spending" from the transaction_kb, and then for each found transaction, it sends a concatenated string of its details to the summarize_transactions_model to generate a summary.
MindsDB Integration in Django (mindsdb_util.py)
To bridge Django and MindsDB, we create a mindsdb_util.py file within my Django core app. This utility uses the mindsdb_sdk Python library to connect to and query MindsDB. All the select queries will be called from a django view and generate data to be shown in the dashboard.
import os
from typing import List, Dict, Any, Optional
from django.conf import settings
from django.db import models
import mindsdb_sdk
from .models import Transaction, Client, Card
class MindsDBUtil:
"""
Utility class for executing MindsDB SQL queries using mindsdb_sdk
"""
def __init__(self):
"""Initialize the MindsDB connection using mindsdb_sdk"""
try:
# Get MindsDB connection parameters from environment or settings
host = getattr(settings, 'MINDSDB_HOST', 'localhost')
port = getattr(settings, 'MINDSDB_PORT', 47334)
username = getattr(settings, 'MINDSDB_USER', None)
password = getattr(settings, 'MINDSDB_PASSWORD', None)
# Initialize MindsDB connection
if username and password:
self.connection = mindsdb_sdk.connect(
host=host,
port=port,
username=username,
password=password
)
else:
self.connection = mindsdb_sdk.connect(
host=host,
port=port
)
self.server = self.connection.get_server()
print(f"Successfully connected to MindsDB at {host}:{port}")
except Exception as e:
print(f"Warning: Could not initialize MindsDB connection: {e}")
self.connection = None
self.server = None
def execute_query(self, query: str) -> List[Dict[str, Any]]:
"""
Execute a MindsDB SQL query and return results
Args:
query: SQL query to execute
Returns:
List of dictionaries containing query results
"""
if not self.connection:
raise Exception("MindsDB connection not available")
try:
# Execute the query
result = self.connection.query(query)
# Convert to list of dictionaries
if hasattr(result, 'fetchall'):
columns = [desc[0] for desc in result.description]
rows = result.fetchall()
return [dict(zip(columns, row)) for row in rows]
else:
# Handle different result formats
return result.to_dict('records') if hasattr(result, 'to_dict') else []
except Exception as e:
print(f"Error executing query: {e}")
raise
def find_wealthy_clients(self, min_age: int = 40, min_income: float = 70000) -> List[Dict[str, Any]]:
"""
Find clients in wealthy areas with age and income filtering
Note: This assumes your client_kb has columns: id, address, current_age, per_capita_income, gender
You may need to adjust based on your actual KB schema
"""
query = f"""
SELECT
c.id AS client_id,
c.address,
c.current_age,
c.per_capita_income,
c.gender
FROM
client_kb AS c
WHERE
c.current_age > {min_age} AND
c.per_capita_income > {min_income};
"""
return self.execute_query(query)
def find_travel_expenses(self, min_amount: float = 500, use_chip: bool = True) -> List[Dict[str, Any]]:
"""
Find transactions related to travel with amount and chip usage filtering
Note: This assumes your transaction_kb has appropriate columns
"""
# Use proper boolean value for MindsDB
chip_value = 'true' if use_chip else 'false'
query = f"""
SELECT
t.id AS transaction_id,
t.amount,
t.date,
t.merchant_city,
t.merchant_state,
t.use_chip,
t.client_id
FROM
transaction_kb AS t
WHERE
t.amount > {min_amount} AND
t.use_chip = {chip_value};
"""
return self.execute_query(query)
def find_online_shopping(self, state: str = 'California') -> List[Dict[str, Any]]:
"""
Find transactions for online shopping in a specific state
"""
query = f"""
SELECT
t.id AS transaction_id,
t.amount,
t.merchant_city,
t.merchant_state,
t.client_id
FROM
transaction_kb AS t
WHERE
t.merchant_state = '{state}';
"""
return self.execute_query(query)
def semantic_search_transactions(self, search_term: str, limit: int = 10) -> List[Dict[str, Any]]:
"""
Perform semantic search on transaction knowledge base
Args:
search_term: Natural language search term (e.g., "suspicious activity", "travel expenses")
limit: Maximum number of results to return
"""
query = f"""
SELECT
*
FROM transaction_kb
WHERE
MATCH('{search_term}')
LIMIT {limit};
"""
return self.execute_query(query)
def semantic_search_clients(self, search_term: str, limit: int = 10) -> List[Dict[str, Any]]:
"""
Perform semantic search on client knowledge base
Args:
search_term: Natural language search term (e.g., "wealthy suburban areas")
limit: Maximum number of results to return
"""
query = f"""
SELECT
*
FROM client_kb
WHERE
MATCH('{search_term}')
LIMIT {limit};
"""
return self.execute_query(query)
def get_transaction_summary(self, transaction_id: int) -> List[Dict[str, Any]]:
"""
Get AI-generated summary for a specific transaction
Note: This assumes you have a model named 'summarize_transactions_model'
and that the model expects transaction_id as input
"""
query = f"""
SELECT
transaction_id,
summary
FROM summarize_transactions_model
WHERE
transaction_id = {transaction_id};
"""
return self.execute_query(query)
def analyze_suspicious_patterns(self, client_id: Optional[int] = None) -> List[Dict[str, Any]]:
"""
Use AI model to analyze suspicious transaction patterns
This is a more realistic approach than the original join-based method
"""
where_clause = f"WHERE client_id = {client_id}" if client_id else ""
query = f"""
SELECT
t.id,
t.amount,
t.date,
t.merchant_city,
t.merchant_state,
t.client_id,
p.risk_score,
p.risk_reason
FROM transaction_kb t
JOIN pattern_analysis_model p ON t.id = p.transaction_id
{where_clause}
ORDER BY p.risk_score DESC;
"""
return self.execute_query(query)
def get_knowledge_base_stats(self) -> Dict[str, int]:
"""
Get statistics about the knowledge bases
"""
try:
client_count = self.execute_query("SELECT COUNT(*) as count FROM client_kb;")
transaction_count = self.execute_query("SELECT COUNT(*) as count FROM transaction_kb;")
return {
'client_kb_count': client_count[0]['count'] if client_count else 0,
'transaction_kb_count': transaction_count[0]['count'] if transaction_count else 0
}
except Exception as e:
print(f"Error getting knowledge base stats: {e}")
return {'client_kb_count': 0, 'transaction_kb_count': 0}
def custom_semantic_search(self,
search_term: str,
kb_type: str = 'transaction',
filters: Dict[str, Any] = None,
limit: int = 10) -> List[Dict[str, Any]]:
"""
Perform custom semantic search on knowledge bases with optional filters
Args:
search_term: Natural language search term
kb_type: 'transaction' or 'client'
filters: Dictionary of filters to apply
limit: Maximum number of results
Returns:
List of matching results
"""
kb_name = f"{kb_type}_kb"
# Build the base semantic search query
query = f"SELECT * FROM {kb_name} WHERE MATCH('{search_term}')"
# Add filters if provided
if filters:
filter_conditions = []
for key, value in filters.items():
if isinstance(value, str):
# Escape single quotes in string values
escaped_value = value.replace("'", "''")
filter_conditions.append(f"{key} = '{escaped_value}'")
elif isinstance(value, bool):
filter_conditions.append(f"{key} = {'true' if value else 'false'}")
else:
filter_conditions.append(f"{key} = {value}")
if filter_conditions:
query += " AND " + " AND ".join(filter_conditions)
query += f" LIMIT {limit};"
return self.execute_query(query)
def test_connection(self) -> bool:
"""
Test if MindsDB connection is working
"""
try:
if not self.connection:
return False
# Try a simple query
result = self.execute_query("SELECT 1 as test;")
return len(result) > 0 and result[0].get('test') == 1
except Exception as e:
print(f"Connection test failed: {e}")
return False
def list_available_tables(self) -> List[str]:
"""
List all available tables/knowledge bases in MindsDB
"""
try:
result = self.execute_query("SHOW TABLES;")
return [row.get('table_name', row.get('Tables_in_mindsdb', '')) for row in result]
except Exception as e:
print(f"Error listing tables: {e}")
return []
def describe_table(self, table_name: str) -> List[Dict[str, Any]]:
"""
Get schema information for a specific table
"""
try:
return self.execute_query(f"DESCRIBE {table_name};")
except Exception as e:
print(f"Error describing table {table_name}: {e}")
return []
# Global instance for easy access
mindsdb_util = MindsDBUtil()
Django views then simply import mindsdb_client and call its query() method:
from django.shortcuts import render
from .mindsdb_util import mindsdb_client
def client_kb_search_view(request):
query_text = request.GET.get('query', '')
if query_text:
sql_query = f"SELECT id, address, current_age, per_capita_income FROM client_kb WHERE content LIKE '{query_text}' LIMIT 10;"
results_df = mindsdb_client.query(sql_query)
# Convert to list of dicts for template
results = results_df.to_dict('records')
return render(request, 'finance/client_kb_search.html', {'query': query_text, 'results': results})
This application demonstrates several powerful use cases for MindsDB Knowledge Bases in a financial context:
- Customer Segmentation & Targeted Marketing: Semantically search for "affluent clients in suburban areas" to tailor product offerings.
- Fraud Detection: Query "suspicious transactions in unusual locations" or "unusually large spending patterns" and get AI summaries for quick analysis by human analysts.
- Risk Assessment: Identify clients or transaction types that semantically match "high-risk profiles."
- Compliance & Audit: Quickly retrieve all data points related to specific semantic concepts for reporting.
Conclusion
MindsDB Knowledge Bases significantly lower the barrier to integrating advanced semantic search and AI capabilities into existing data infrastructure. The ability to combine natural language queries with traditional SQL filtering, and to chain these with AI Tables, opens up a world of possibilities for building intelligent data applications.
This project was a great experience in leveraging MindsDB's powerful features to transform raw data into actionable, semantically queryable insights within a familiar Django environment.
GitHub Repository: [https://github.com/samaras/django-mindsdb-transactions]
MindsDB Docs: https://mindsdb.com/docs/
Top comments (1)
This is awesome