Samuel Komfi

Posted on Jul 1

Unlocking Data Insights: A Semantic Search App with MindsDB and Django

#programming #django #rag #nlp

Imagine querying your vast, disparate datasets using plain English, getting not just exact matches, but semantically relevant information. That is the power of MindsDB Knowledge Bases, and I'm excited to share how I integrated them into a Django application.

Introduction

Data is everywhere, but truly understanding it often requires complex SQL queries or specialized tools. What if you could ask your data questions naturally, like "Show me clients in high-income urban areas" or "Summarize suspicious transactions"? This is exactly what MindsDB's new Knowledge Bases (KBs) aim to enable.

This article details how a django transaction application can turn a dashboard of information. I decided to build a simple, yet powerful, Django web application that allows users to perform semantic searches on mock financial data and even get AI-powered summaries. The application only has three tables to keep it simple. It is based on the dataset from the following: Kaggle Dataset We will start by building a normal Django application with postgres as the backend database. We will later then show how to make your data available to MindsDB. Django knowledge is neccessary to follow along as I will not explain the framework.

Bridging Natural Language and Structured Data

Traditional databases excel at structured queries (e.g., SELECT * FROM clients WHERE age > 30). However, they fall short when you want to query based on meaning or context, like finding "high-risk clients" or "transactions indicative of fraud." This gap often requires complex machine learning pipelines.

MindsDB Knowledge Bases address this by:

Semantic Search: Understanding the meaning behind your queries, not just keywords.
Unified Access: Querying data across different sources (like my PostgreSQL DB).
Metadata Filtering: Combining semantic search with traditional SQL filters.

Project Overview: A Django App for Financial Data Insights

My application is a Django-based web interface designed to interact with a PostgreSQL database containing mock financial data:

Client: Information about individual clients (age, gender, address, income).
Card: Details about credit/debit cards associated with clients.
Transaction: Records of financial transactions (amount, merchant, date, location).

The core of the application's intelligence comes from MindsDB, which transforms this structured data into semantically searchable Knowledge Bases. The Django app then acts as a user-friendly frontend to query these KBs.

Django Application Setup

The Django project, named transaction_dashboard_proj or any other name, uses PostgreSQL as its database and python-dotenv for environment variable management. The name of the app will be called finance and swiftly added to a list of apps in the settings.py file.

Herer is the requirements file for the project:

Django>=5.2.3
psycopg2-binary>=2.9.9
python-dotenv>=1.0.0 
pandas>=2.0.0
mindsdb_sdk>=1.0.0

The finance/models.py defines the database schema:

# core/models.py  
from django.db import models  

class Client(models.Model):
    """Client model - separate from Django User model since clients cannot login"""
    current_age = models.IntegerField()
    retirement_age = models.IntegerField()
    birth_year = models.IntegerField()
    birth_month = models.IntegerField()
    gender = models.CharField(max_length=10)
    address = models.TextField()
    latitude = models.DecimalField(max_digits=9, decimal_places=6)
    longitude = models.DecimalField(max_digits=9, decimal_places=6)
    per_capita_income = models.DecimalField(max_digits=12, decimal_places=2)
    yearly_income = models.DecimalField(max_digits=12, decimal_places=2)
    total_debt = models.DecimalField(max_digits=12, decimal_places=2)
    credit_score = models.IntegerField(default=0)
    num_credit_cards = models.IntegerField(default=0)

    class Meta:
        db_table = 'client'
        ordering = ['id']

    def __str__(self):
        return f"Client {self.id}"


class Card(models.Model):
    """Card model linked to clients"""
    CARD_BRAND_CHOICES = [
        ('visa', 'Visa'),
        ('mastercard', 'Mastercard'),
        ('amex', 'American Express'),
        ('discover', 'Discover'),
    ]

    CARD_TYPE_CHOICES = [
        ('credit', 'Credit'),
        ('debit', 'Debit'),
        ('prepaid', 'Prepaid'),
    ]

    client = models.ForeignKey(Client, on_delete=models.CASCADE, related_name='cards')
    card_brand = models.CharField(max_length=20, choices=CARD_BRAND_CHOICES)
    card_type = models.CharField(max_length=25, choices=CARD_TYPE_CHOICES)
    card_number = models.CharField(max_length=20)  # Masked card number
    expires = models.CharField(max_length=12)
    cvv = models.CharField(max_length=4)  # Should be encrypted in production
    has_chip = models.BooleanField(default=True)
    num_cards_issued = models.IntegerField(default=1)
    credit_limit = models.DecimalField(max_digits=12, decimal_places=2, null=True, blank=True)
    acct_open_date = models.CharField(max_length=8, default="01/1970")
    year_pin_last_changed = models.IntegerField(default=2025)
    card_on_dark_web = models.CharField(default="No")


    class Meta:
        db_table = 'cards'
        ordering = ['id']

    def __str__(self):
        return f"Card {self.id} - {self.card_brand} {self.card_type}"


class Transaction(models.Model):
    """Transaction model linked to clients and cards"""

    CHIP_CHOICES = [
        ('chip', 'Chip Transaction'),
        ('swipe', 'Swipe Transaction'),
    ]

    date = models.DateTimeField()
    client = models.ForeignKey(Client, on_delete=models.CASCADE, related_name='transactions')
    card = models.ForeignKey(Card, on_delete=models.CASCADE, related_name='transactions')
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    use_chip = models.CharField(default="Chip Transaction", choices=CHIP_CHOICES)
    merchant_id = models.CharField(max_length=50)
    merchant_city = models.CharField(max_length=100)
    merchant_state = models.CharField(max_length=100, null=True, blank=True)
    zip = models.CharField(max_length=10, null=True, blank=True)
    mcc = models.IntegerField(default=1000)
    errors = models.CharField(max_length=200, null=True, blank=True)

    class Meta:
        db_table = 'transactions'
        ordering = ['-date', 'id']

    def __str__(self):
        return f"Transaction {self.id} - ${self.amount} on {self.date}"

Note: Clients are not Django users and do not have login capabilities in this application, focusing purely on data querying. Some of the fields may not be exactly PCI compliant or best practice, but we working with the dataset provided for testing.

Run Django migrations to create the tables in the database:

python manage.py makemigrations
python manage.py migrate

The Kaggle Dataset

Download the dataset from the Financial Transactions Dataset: Analytics from Kaggle, specifically targeting only 3 files needed for the table models above. They are the Transaction data (transactions_data.csv), Card information( cards_dat.csv) and the Users data(users_data) which is renamed client for our Django app to avoid confusion with the django auth_user. Please note the transactions_data file is massive and will need to be split into 3 or more files for easier handling. You can find a script online to split it up.

After downloading the files we will copy them into the database by using the copy command in psql. Start with client data(users_data.csv), followed by card data, and finally transaction data(as it has card & client foreign keys).

Log into psql:

\connect fraud_mindsdb;

\copy client(id,current_age,retirement_age,birth_year,birth_month,gender,address,latitude,longitude,per_capita_income,yearly_income,total_debt,credit_score,num_credit_cards) FROM '/tmp/users_data.csv' DELIMITER ',' CSV header;

\copy cards(id, client_id, card_brand, card_type, card_number, expires, cvv, has_chip, num_cards_issued, credit_limit, acct_open_date, year_pin_last_changed, card_on_dark_web) FROM '/tmp/cards_data.csv' DELIMITER ',' CSV header;

\copy transactions(id, date,client_id, card_id, amount,use_chip,merchant_id, merchant_city, merchant_state, zip, mcc, errors) '/tmp/transaction_data.csv' DELIMITER ',' CSV header;

Connecting MindsDB to PostgreSQL

We are going to run MindsDB locally instead of using the cloud version. You will need docker installed. To run MindsDB Web Studio locally execute the following:

docker run -p 47334:47334 -p 47335:47335 mindsdb/mindsdb

If you get a 'Connection refused' error run the following to allow MindsDB to access your local Postgres instance:

docker run --network=host mindsdb/mindsdb

The first step in MindsDB is to connect it to our existing PostgreSQL database. This allows MindsDB to access the Client, Card, and Transaction tables. I ran this command directly in the MindsDB Web Studio:

CREATE DATABASE finance_db
WITH ENGINE = 'postgres',
PARAMETERS = {
  "host": "your_db_host",
  "port": 5432,
  "user": "your_user",
  "password": "your_password",
  "database": "your_postgresdb"
};

This creates a virtual database finance _db inside MindsDB, mirroring our PostgreSQL schema.

Creating MindsDB Knowledge Bases

Next, we define two Knowledge Bases: one for Client data (client_kb) and one for Transaction data (transaction_kb). The key here is defining the CONTENT_COLUMN (what MindsDB will semantically understand) and METADATA_COLUMNS (additional data to retrieve and filter on).


-- For Clients:  
CREATE KNOWLEDGE_BASE client_kb
USING 
    embedding_model = {
        "provider": "gemini",
        "model_name": 'gemini-embedding-exp-03-07',
        "api_key": "xxxxxx"
    },
    reranking_model = {
        "provider": "gemini",
        "model_name": "gemini-pro",
        "api_key": "xxxxxx"
    },
    content_column = 'address',
    metadata_columns = ['per_capita_income', 'current_age', 'gender', 'birth_year']; 

-- For Transactions:  
CREATE KNOWLEDGE_BASE transaction_kb
USING 
    embedding_model = {
        "provider": "gemini",
        "model_name": 'gemini-embedding-exp-03-07',
        "api_key": "xxxxxx"
    },
    reranking_model = {
        "provider": "gemini",
        "model_name": "gemini-pro",
        "api_key": "xxxxxx"
    },
    content_column = ['merchant_city', 'merchant_state'],
    metadata_columns = ['amount', 'use_chip', 'date', 'client_id'];

After defining the KBs, we ingest the existing data in the postgres database of the Django app:

INSERT INTO client_kb (content, metadata)
SELECT 
    address as content,
    JSON_OBJECT(
        'id', id,
        'per_capita_income', per_capita_income,
        'current_age', current_age,
        'gender', gender,
        'birth_year', birth_year
    ) as metadata
FROM finance_db.core_client;

-- Insert transaction data with merchant location as content
INSERT INTO transaction_kb (content, metadata)
SELECT 
    CONCAT(merchant_city, ', ', merchant_state) as content,
    JSON_OBJECT(
        'id', id,
        'amount', amount,
        'use_chip', use_chip,
        'date', date,
        'client_id', client_id
    ) as metadata
FROM finance_db.core_transaction;

Indexing for Performance

To ensure efficient semantic queries, it's crucial to create indexes on the Knowledge Bases:

CREATE INDEX ON client_kb;  
CREATE INDEX ON transaction_kb;

This optimizes the underlying vector search for faster retrieval.

Automated Data Ingestion with MindsDB JOBS

For a dynamic application, new data is constantly added. In our case as we inserted the data from CSV files we can simulate that by spliting the files and inserting the last CSV file to simulate new transactions or we can add them in the Django admin app in the backend. MindsDB JOBS allow for periodic updates of the Knowledge Bases. We will set up a job to automatically insert new transactions into transaction_kb every hour:

CREATE JOB update_transaction_kb_job (  
    INSERT INTO transaction_kb SELECT * FROM finance_db.transactions WHERE date > LAST;  
) EVERY 1 HOUR;

The WHERE date > LAST clause is a powerful MindsDB feature that intelligently fetches only records added since the last job run or ingestion, ensuring efficiency.

Multi-step Workflows: KB + AI Table for Summarization

One of the most exciting aspects is combining KBs with MindsDB AI Tables for multi-step workflows. We will create an AI Table using Google Gemini (you can use OpenAI if you got moola) to summarize transaction details. You need to configure your Gemini engine in MindsDB first.

First, set up your Gemini engine (if you haven't already):

CREATE ML_ENGINE google_gemini_engine
FROM google_gemini
USING
    api_key = 'AIzaSyA0for-example';

Then, define the summarization model:

CREATE ML_MODEL summarize_transactions_model
PREDICT summary
USING
    engine = 'google_gemini_engine', 
    model_name = 'gemini-pro',
    prompt_template = 'Summarize the following transaction details into a concise overview, highlighting key aspects: {{transaction_details}}.';

Now, the magic happens in a combined query, this will be called in the view of the Django application(You don't need to call it through the MindsDB web view):

SELECT  
    t.id AS transaction_id,  
    t.amount,  
    t.date,  
    t.merchant_city,  
    t.merchant_state,  
    t.use_chip,  
    t.client_id,  
    s.summary -- The predicted summary from the AI model  
FROM  
    transaction_kb AS t  
JOIN  
    summarize_transactions_model AS s ON 1=1 -- Join the KB with the AI model  
WHERE  
    t.content LIKE 'unusual large spending' AND -- Semantic search for transactions  
    s.transaction_details = CONCAT( -- Feed relevant columns into the AI model's prompt  
        'Transaction ID: ', t.id,  
        ', Amount: $', t.amount,  
        ', Date: ', t.date,  
        ', Merchant City: ', t.merchant_city,  
        ', Merchant State: ', t.merchant_state,  
        ', Used Chip: ', t.use_chip,  
        ', Client ID: ', t.client_id  
    );

This query first finds transactions semantically related to "unusual large spending" from the transaction_kb, and then for each found transaction, it sends a concatenated string of its details to the summarize_transactions_model to generate a summary.

MindsDB Integration in Django (mindsdb_util.py)

To bridge Django and MindsDB, we create a mindsdb_util.py file within my Django core app. This utility uses the mindsdb_sdk Python library to connect to and query MindsDB. All the select queries will be called from a django view and generate data to be shown in the dashboard.

import os
from typing import List, Dict, Any, Optional
from django.conf import settings
from django.db import models
import mindsdb_sdk
from .models import Transaction, Client, Card


class MindsDBUtil:
    """
    Utility class for executing MindsDB SQL queries using mindsdb_sdk
    """

    def __init__(self):
        """Initialize the MindsDB connection using mindsdb_sdk"""
        try:
            # Get MindsDB connection parameters from environment or settings
            host = getattr(settings, 'MINDSDB_HOST', 'localhost')
            port = getattr(settings, 'MINDSDB_PORT', 47334)
            username = getattr(settings, 'MINDSDB_USER', None)
            password = getattr(settings, 'MINDSDB_PASSWORD', None)

            # Initialize MindsDB connection
            if username and password:
                self.connection = mindsdb_sdk.connect(
                    host=host,
                    port=port,
                    username=username,
                    password=password
                )
            else:
                self.connection = mindsdb_sdk.connect(
                    host=host,
                    port=port
                )

            self.server = self.connection.get_server()
            print(f"Successfully connected to MindsDB at {host}:{port}")

        except Exception as e:
            print(f"Warning: Could not initialize MindsDB connection: {e}")
            self.connection = None
            self.server = None

    def execute_query(self, query: str) -> List[Dict[str, Any]]:
        """
        Execute a MindsDB SQL query and return results

        Args:
            query: SQL query to execute

        Returns:
            List of dictionaries containing query results
        """
        if not self.connection:
            raise Exception("MindsDB connection not available")

        try:
            # Execute the query
            result = self.connection.query(query)

            # Convert to list of dictionaries
            if hasattr(result, 'fetchall'):
                columns = [desc[0] for desc in result.description]
                rows = result.fetchall()
                return [dict(zip(columns, row)) for row in rows]
            else:
                # Handle different result formats
                return result.to_dict('records') if hasattr(result, 'to_dict') else []

        except Exception as e:
            print(f"Error executing query: {e}")
            raise

    def find_wealthy_clients(self, min_age: int = 40, min_income: float = 70000) -> List[Dict[str, Any]]:
        """
        Find clients in wealthy areas with age and income filtering

        Note: This assumes your client_kb has columns: id, address, current_age, per_capita_income, gender
        You may need to adjust based on your actual KB schema
        """
        query = f"""
        SELECT
            c.id AS client_id,
            c.address,
            c.current_age,
            c.per_capita_income,
            c.gender
        FROM
            client_kb AS c
        WHERE
            c.current_age > {min_age} AND                    
            c.per_capita_income > {min_income};
        """
        return self.execute_query(query)

    def find_travel_expenses(self, min_amount: float = 500, use_chip: bool = True) -> List[Dict[str, Any]]:
        """
        Find transactions related to travel with amount and chip usage filtering

        Note: This assumes your transaction_kb has appropriate columns
        """
        # Use proper boolean value for MindsDB
        chip_value = 'true' if use_chip else 'false'
        query = f"""
        SELECT
            t.id AS transaction_id,
            t.amount,
            t.date,
            t.merchant_city,
            t.merchant_state,
            t.use_chip,
            t.client_id
        FROM
            transaction_kb AS t
        WHERE
            t.amount > {min_amount} AND                   
            t.use_chip = {chip_value};
        """
        return self.execute_query(query)

    def find_online_shopping(self, state: str = 'California') -> List[Dict[str, Any]]:
        """
        Find transactions for online shopping in a specific state
        """
        query = f"""
        SELECT
            t.id AS transaction_id,
            t.amount,
            t.merchant_city,
            t.merchant_state,
            t.client_id
        FROM
            transaction_kb AS t
        WHERE
            t.merchant_state = '{state}';
        """
        return self.execute_query(query)

    def semantic_search_transactions(self, search_term: str, limit: int = 10) -> List[Dict[str, Any]]:
        """
        Perform semantic search on transaction knowledge base

        Args:
            search_term: Natural language search term (e.g., "suspicious activity", "travel expenses")
            limit: Maximum number of results to return
        """
        query = f"""
        SELECT 
            *
        FROM transaction_kb 
        WHERE 
            MATCH('{search_term}')
        LIMIT {limit};
        """
        return self.execute_query(query)

    def semantic_search_clients(self, search_term: str, limit: int = 10) -> List[Dict[str, Any]]:
        """
        Perform semantic search on client knowledge base

        Args:
            search_term: Natural language search term (e.g., "wealthy suburban areas")
            limit: Maximum number of results to return
        """
        query = f"""
        SELECT 
            *
        FROM client_kb 
        WHERE 
            MATCH('{search_term}')
        LIMIT {limit};
        """
        return self.execute_query(query)

    def get_transaction_summary(self, transaction_id: int) -> List[Dict[str, Any]]:
        """
        Get AI-generated summary for a specific transaction

        Note: This assumes you have a model named 'summarize_transactions_model'
        and that the model expects transaction_id as input
        """
        query = f"""
        SELECT 
            transaction_id,
            summary
        FROM summarize_transactions_model 
        WHERE 
            transaction_id = {transaction_id};
        """
        return self.execute_query(query)

    def analyze_suspicious_patterns(self, client_id: Optional[int] = None) -> List[Dict[str, Any]]:
        """
        Use AI model to analyze suspicious transaction patterns

        This is a more realistic approach than the original join-based method
        """
        where_clause = f"WHERE client_id = {client_id}" if client_id else ""

        query = f"""
        SELECT 
            t.id,
            t.amount,
            t.date,
            t.merchant_city,
            t.merchant_state,
            t.client_id,
            p.risk_score,
            p.risk_reason
        FROM transaction_kb t
        JOIN pattern_analysis_model p ON t.id = p.transaction_id
        {where_clause}
        ORDER BY p.risk_score DESC;
        """
        return self.execute_query(query)

    def get_knowledge_base_stats(self) -> Dict[str, int]:
        """
        Get statistics about the knowledge bases
        """
        try:
            client_count = self.execute_query("SELECT COUNT(*) as count FROM client_kb;")
            transaction_count = self.execute_query("SELECT COUNT(*) as count FROM transaction_kb;")

            return {
                'client_kb_count': client_count[0]['count'] if client_count else 0,
                'transaction_kb_count': transaction_count[0]['count'] if transaction_count else 0
            }
        except Exception as e:
            print(f"Error getting knowledge base stats: {e}")
            return {'client_kb_count': 0, 'transaction_kb_count': 0}

    def custom_semantic_search(self, 
                             search_term: str, 
                             kb_type: str = 'transaction',
                             filters: Dict[str, Any] = None,
                             limit: int = 10) -> List[Dict[str, Any]]:
        """
        Perform custom semantic search on knowledge bases with optional filters

        Args:
            search_term: Natural language search term
            kb_type: 'transaction' or 'client'
            filters: Dictionary of filters to apply
            limit: Maximum number of results

        Returns:
            List of matching results
        """
        kb_name = f"{kb_type}_kb"

        # Build the base semantic search query
        query = f"SELECT * FROM {kb_name} WHERE MATCH('{search_term}')"

        # Add filters if provided
        if filters:
            filter_conditions = []
            for key, value in filters.items():
                if isinstance(value, str):
                    # Escape single quotes in string values
                    escaped_value = value.replace("'", "''")
                    filter_conditions.append(f"{key} = '{escaped_value}'")
                elif isinstance(value, bool):
                    filter_conditions.append(f"{key} = {'true' if value else 'false'}")
                else:
                    filter_conditions.append(f"{key} = {value}")

            if filter_conditions:
                query += " AND " + " AND ".join(filter_conditions)

        query += f" LIMIT {limit};"
        return self.execute_query(query)

    def test_connection(self) -> bool:
        """
        Test if MindsDB connection is working
        """
        try:
            if not self.connection:
                return False

            # Try a simple query
            result = self.execute_query("SELECT 1 as test;")
            return len(result) > 0 and result[0].get('test') == 1

        except Exception as e:
            print(f"Connection test failed: {e}")
            return False

    def list_available_tables(self) -> List[str]:
        """
        List all available tables/knowledge bases in MindsDB
        """
        try:
            result = self.execute_query("SHOW TABLES;")
            return [row.get('table_name', row.get('Tables_in_mindsdb', '')) for row in result]
        except Exception as e:
            print(f"Error listing tables: {e}")
            return []

    def describe_table(self, table_name: str) -> List[Dict[str, Any]]:
        """
        Get schema information for a specific table
        """
        try:
            return self.execute_query(f"DESCRIBE {table_name};")
        except Exception as e:
            print(f"Error describing table {table_name}: {e}")
            return []


# Global instance for easy access
mindsdb_util = MindsDBUtil()

Django views then simply import mindsdb_client and call its query() method:

from django.shortcuts import render  
from .mindsdb_util import mindsdb_client  

def client_kb_search_view(request):  
    query_text = request.GET.get('query', '')  
    if query_text:  
        sql_query = f"SELECT id, address, current_age, per_capita_income FROM client_kb WHERE content LIKE '{query_text}' LIMIT 10;"  
        results_df = mindsdb_client.query(sql_query)  
        # Convert to list of dicts for template  
        results = results_df.to_dict('records') 
        
    return render(request, 'finance/client_kb_search.html', {'query': query_text, 'results': results})

This application demonstrates several powerful use cases for MindsDB Knowledge Bases in a financial context:

Customer Segmentation & Targeted Marketing: Semantically search for "affluent clients in suburban areas" to tailor product offerings.
Fraud Detection: Query "suspicious transactions in unusual locations" or "unusually large spending patterns" and get AI summaries for quick analysis by human analysts.
Risk Assessment: Identify clients or transaction types that semantically match "high-risk profiles."
Compliance & Audit: Quickly retrieve all data points related to specific semantic concepts for reporting.

Conclusion

MindsDB Knowledge Bases significantly lower the barrier to integrating advanced semantic search and AI capabilities into existing data infrastructure. The ability to combine natural language queries with traditional SQL filtering, and to chain these with AI Tables, opens up a world of possibilities for building intelligent data applications.

This project was a great experience in leveraging MindsDB's powerful features to transform raw data into actionable, semantically queryable insights within a familiar Django environment.

GitHub Repository: [https://github.com/samaras/django-mindsdb-transactions]
MindsDB Docs: https://mindsdb.com/docs/

Top comments (1)

Md Abid Hussain • Jul 2

This is awesome