
Abu Taher Siddik

Build a Powerful Semantic Knowledge Base CLI with MindsDB and Google Gemini 2.5 Flash

Introduction

In today’s data-driven world, building intelligent systems that can understand, organize, and retrieve information semantically is a game-changer. Whether you’re powering customer support, knowledge management, or AI assistants, having a semantic knowledge base that can ingest your data and answer natural language queries is essential.

In this post, I’ll walk you through building a production-ready CLI tool that leverages MindsDB’s Knowledge Base capabilities combined with Google’s Gemini 2.5 Flash model for state-of-the-art embeddings and reranking. Plus, we’ll explore advanced features like asynchronous jobs, metadata handling, and MindsDB AI Tables for summarization and classification.


What is MindsDB Knowledge Base?

MindsDB Knowledge Base is an AI-powered layer on top of your data that enables semantic search, natural language querying, and AI-driven insights. It integrates with popular AI models to provide embeddings and reranking, turning your raw data into a searchable store that can be queried by meaning rather than by exact keywords.


Why Google Gemini 2.5 Flash?

Google’s Gemini 2.5 Flash model offers advanced reasoning, large context windows, and efficient performance, making it an excellent choice for semantic tasks. MindsDB now supports Gemini 2.5 Flash as a first-class AI engine, allowing you to build powerful knowledge bases without relying solely on OpenAI.


What You’ll Build

A CLI app that can:

  • Create and configure a MindsDB Knowledge Base with Gemini 2.5 Flash embeddings
  • Ingest CSV data asynchronously using MindsDB JOBs
  • Create semantic indexes asynchronously
  • Perform semantic search queries with metadata-aware SQL
  • Create and query MindsDB AI Tables for summarization, classification, and generation

Prerequisites

  • Python 3.8+
  • MindsDB Python SDK
  • Pandas
  • MindsDB Cloud account & API key
  • Google Gemini API key
  • CSV data file

Step 1: Setting Up the CLI App

I built the CLI tool in Python using the MindsDB SDK. It handles everything from engine creation to querying. Here’s a snippet showing how to create the Gemini engine:

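def create_gemini_engine(self, engine_name, gemini_api_key):
    """Register a Google Gemini ML engine in MindsDB under engine_name."""
    # Both values are interpolated directly into the SQL text.
    create_engine_sql = f"""
    CREATE ML_ENGINE {engine_name}
    FROM google_gemini
    USING api_key = '{gemini_api_key}';
    """
    # With the MindsDB SDK, query() returns a lazy Query; fetch() executes it.
    self.client.query(create_engine_sql).fetch()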

This registers the Gemini engine in MindsDB for use in knowledge bases.
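Because the engine name and API key are interpolated straight into the statement, it may be worth validating them first. The following is a minimal sketch, not part of the MindsDB SDK: `build_create_engine_sql` is a hypothetical helper that only builds the SQL string, and it assumes standard SQL quote doubling is an acceptable way to escape the key.

```python
import re

# Hypothetical helper (not a MindsDB API): it only builds the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_create_engine_sql(engine_name: str, gemini_api_key: str) -> str:
    """Return a CREATE ML_ENGINE statement with a checked name and escaped key."""
    if not _IDENTIFIER.fullmatch(engine_name):
        raise ValueError(f"invalid engine name: {engine_name!r}")
    # Assumption: doubling single quotes keeps the key inside its string literal.
    escaped_key = gemini_api_key.replace("'", "''")
    return (
        f"CREATE ML_ENGINE {engine_name}\n"
        f"FROM google_gemini\n"
        f"USING api_key = '{escaped_key}';"
    )


print(build_create_engine_sql("gemini_engine", "YOUR_GEMINI_API_KEY"))
```

The returned string can be passed to the same query call used in the method above.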


Step 2: Creating the Knowledge Base

The knowledge base is created with Gemini 2.5 Flash as both the embedding and reranking model:

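# Doubled braces ({{ }}) emit literal JSON braces inside the f-string.
# metadata_columns and content_columns are Python lists, rendered as ['a', 'b'].
create_kb_sql = f"""
CREATE KNOWLEDGE_BASE {kb_name}
USING
embedding_model = {{
    "provider": "google_gemini",
    "engine": "{engine_name}",
    "model_name": "gemini-2.5-flash"
}},
reranking_model = {{
    "provider": "google_gemini",
    "engine": "{engine_name}",
    "model_name": "gemini-2.5-flash"
}},
metadata_columns = {metadata_columns},
content_columns = {content_columns},
id_column = '{id_column}';
"""
# With the MindsDB SDK, query() returns a lazy Query; fetch() executes it.
self.client.query(create_kb_sql).fetch()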
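The doubled braces in the f-string are easy to get wrong, so one option is to render the model settings with `json.dumps` instead. This is a sketch under assumptions: `build_create_kb_sql` and `sql_string_list` are hypothetical helpers, the statement layout simply mirrors the one above, and `YOUR_MODEL_NAME` is a placeholder rather than a real model ID.

```python
import json
from typing import List


def sql_string_list(values: List[str]) -> str:
    """Render ["a", "b"] as ['a', 'b'] with single-quoted items."""
    quoted = ["'" + value.replace("'", "''") + "'" for value in values]
    return "[" + ", ".join(quoted) + "]"


def build_create_kb_sql(kb_name: str, engine_name: str, model_name: str,
                        metadata_columns: List[str], content_columns: List[str],
                        id_column: str) -> str:
    """Build the CREATE KNOWLEDGE_BASE text; hypothetical helper, not an SDK call."""
    # The same settings are used for embedding and reranking, as in the article.
    model = {
        "provider": "google_gemini",
        "engine": engine_name,
        "model_name": model_name,
    }
    model_json = json.dumps(model, indent=4)
    return (
        f"CREATE KNOWLEDGE_BASE {kb_name}\n"
        f"USING\n"
        f"embedding_model = {model_json},\n"
        f"reranking_model = {model_json},\n"
        f"metadata_columns = {sql_string_list(metadata_columns)},\n"
        f"content_columns = {sql_string_list(content_columns)},\n"
        f"id_column = '{id_column}';"
    )


print(build_create_kb_sql("my_kb", "gemini_engine", "YOUR_MODEL_NAME",
                          ["category"], ["content"], "id"))
```

Keeping the model settings in a dictionary also makes it simple to use different settings for embedding and reranking later.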

Step 3: Ingesting Data with Jobs

To handle large datasets efficiently, data ingestion and index creation run asynchronously as MindsDB JOBs:

job = kb.insert(df, async_mode=True)
job.wait()

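Running ingestion as a job moves the heavy work to the MindsDB server, and job.wait() then blocks until it finishes, so the CLI can report completion once the data is in place.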
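Before handing rows to the knowledge base, it can help to check the CSV and split it into batches. The sketch below uses Python's built-in `csv` module instead of Pandas so it runs without extra packages. The column names `id` and `content`, the batch size, and the helper names `read_rows` and `batches` are all assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterator, List

Row = Dict[str, str]


def read_rows(csv_text: str, required_columns: List[str]) -> List[Row]:
    """Parse CSV text and fail early if a required column is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in required_columns if name not in header]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    return list(reader)


def batches(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Made-up sample data standing in for data.csv.
sample = (
    "id,content,category\n"
    "1,How to reset a password,account\n"
    "2,Billing FAQ,billing\n"
    "3,Export your data,account\n"
)
rows = read_rows(sample, ["id", "content"])
for batch in batches(rows, 2):
    print([row["id"] for row in batch])
```

Each batch could then be converted to a DataFrame and inserted as its own job, which keeps a single failed batch from forcing a full re-ingest.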


Step 4: Semantic Search with Metadata

The CLI supports semantic queries with metadata-aware SQL, using window functions like LAST_VALUE() to retrieve the latest metadata per record:

SELECT id, content,
  LAST_VALUE(updated_at) OVER (PARTITION BY id ORDER BY updated_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS latest_update,
  relevance_score
FROM kb_name
WHERE content LIKE '<query>'
AND relevance_score >= <relevance_threshold>
ORDER BY relevance_score DESC
LIMIT <limit>;
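The explicit frame clause in that window is what makes LAST_VALUE() return the newest value for every row of a record. The sketch below shows the difference using SQLite, chosen only because it ships with Python (window functions need SQLite 3.25 or later). The `docs` table and its rows are made up, and MindsDB's SQL dialect may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, content TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "reset password v1", "2024-01-01"),
        (1, "reset password v2", "2024-03-01"),
        (2, "billing faq", "2024-02-15"),
    ],
)

# Full frame: every row in the partition sees the partition's last value.
FULL_FRAME = """
SELECT id, updated_at,
  LAST_VALUE(updated_at) OVER (
    PARTITION BY id ORDER BY updated_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS latest_update
FROM docs
ORDER BY id, updated_at
"""

# Default frame: the window ends at the current row, so each row sees itself.
DEFAULT_FRAME = """
SELECT id, updated_at,
  LAST_VALUE(updated_at) OVER (PARTITION BY id ORDER BY updated_at) AS latest_update
FROM docs
ORDER BY id, updated_at
"""

full_rows = conn.execute(FULL_FRAME).fetchall()
default_rows = conn.execute(DEFAULT_FRAME).fetchall()

print(full_rows[0])     # (1, '2024-01-01', '2024-03-01')
print(default_rows[0])  # (1, '2024-01-01', '2024-01-01')
```

Without the frame clause, the older row for id 1 reports its own timestamp as the latest update, which is why the query spells the frame out.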

Step 5: AI Tables for Summarization and Classification

MindsDB AI Tables enable you to create tables that perform AI tasks such as summarization or classification on your data:

CREATE AI TABLE summary_table
AS SELECT content
FROM kb_name
PREDICT summary
USING task = 'summarization';

You can then query the AI Table to get AI-generated summaries or classifications.
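One way the CLI might assemble that statement from its flags is sketched below. `build_ai_table_sql` is a hypothetical helper whose parameter names follow the CLI options (`--ai_table_name`, `--source_table`, `--task_type`, `--input_columns`, `--output_column`). The statement layout is copied from the example above and the allowed task list is an assumption, so check both against the MindsDB version you run.

```python
from typing import List

# Assumption: the task types named in this post are the only ones accepted.
ALLOWED_TASKS = {"summarization", "classification", "generation"}


def build_ai_table_sql(ai_table_name: str, source_table: str, task_type: str,
                       input_columns: List[str], output_column: str) -> str:
    """Build the AI Table statement shown above from CLI-style arguments."""
    if task_type not in ALLOWED_TASKS:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not input_columns:
        raise ValueError("at least one input column is required")
    columns = ", ".join(input_columns)
    return (
        f"CREATE AI TABLE {ai_table_name}\n"
        f"AS SELECT {columns}\n"
        f"FROM {source_table}\n"
        f"PREDICT {output_column}\n"
        f"USING task = '{task_type}';"
    )


print(build_ai_table_sql("summary_table", "kb_name", "summarization",
                         ["content"], "summary"))
```

Rejecting unknown task types before sending anything keeps typos from turning into server-side errors.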


How to Use the CLI Tool

  1. Ingest data:
python kb_cli_advanced.py --api_key YOUR_MDB_API_KEY --gemini_api_key YOUR_GEMINI_API_KEY --kb_name my_kb --input_file data.csv
  2. Search semantically:
python kb_cli_advanced.py --api_key YOUR_MDB_API_KEY --kb_name my_kb --query "reset password" --limit 5 --relevance_threshold 0.6
  3. Create AI Table:
python kb_cli_advanced.py --api_key YOUR_MDB_API_KEY --create_ai_table --ai_table_name summary_table --source_table my_kb --task_type summarization --input_columns content --output_column summary
  4. Query AI Table:
python kb_cli_advanced.py --api_key YOUR_MDB_API_KEY --query_ai_table --ai_table_name summary_table --limit 5
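The flags used in these commands could be wired up with `argparse` roughly as follows. This is a sketch rather than the actual kb_cli_advanced.py source: the flag names come from the commands above, while the types, defaults, and the `pick_action` dispatch order are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags used in the commands above (types and defaults assumed)."""
    parser = argparse.ArgumentParser(prog="kb_cli_advanced.py")
    parser.add_argument("--api_key", required=True)
    parser.add_argument("--gemini_api_key")
    parser.add_argument("--kb_name")
    parser.add_argument("--input_file")
    parser.add_argument("--query")
    parser.add_argument("--limit", type=int, default=5)
    parser.add_argument("--relevance_threshold", type=float, default=0.0)
    parser.add_argument("--create_ai_table", action="store_true")
    parser.add_argument("--query_ai_table", action="store_true")
    parser.add_argument("--ai_table_name")
    parser.add_argument("--source_table")
    parser.add_argument("--task_type")
    parser.add_argument("--input_columns", nargs="+")
    parser.add_argument("--output_column")
    return parser


def pick_action(args: argparse.Namespace) -> str:
    """Map parsed flags to one of the four workflows listed above."""
    if args.create_ai_table:
        return "create_ai_table"
    if args.query_ai_table:
        return "query_ai_table"
    if args.input_file:
        return "ingest"
    if args.query:
        return "search"
    raise SystemExit("nothing to do: pass --input_file, --query, or an AI Table flag")


# Example: the semantic search command, passed as an explicit argument list.
demo = build_parser().parse_args(
    ["--api_key", "YOUR_MDB_API_KEY", "--kb_name", "my_kb",
     "--query", "reset password", "--limit", "5", "--relevance_threshold", "0.6"]
)
print(pick_action(demo), demo.limit, demo.relevance_threshold)
```

Checking the AI Table flags first means `--limit` can be shared between search and AI Table queries without ambiguity.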

Conclusion

By combining MindsDB’s Knowledge Base with Google Gemini 2.5 Flash and advanced features like JOBs and AI Tables, you can build scalable, intelligent semantic search and AI applications with ease.

Feel free to check out the full source code and README on my GitHub repo. Happy coding!

