Introduction
In today’s data-driven world, building intelligent systems that can understand, organize, and retrieve information semantically is a game-changer. Whether you’re powering customer support, knowledge management, or AI assistants, having a semantic knowledge base that can ingest your data and answer natural language queries is essential.
In this post, I’ll walk you through building a production-ready CLI tool that leverages MindsDB’s Knowledge Base capabilities combined with Google’s Gemini 2.5 Flash model for state-of-the-art embeddings and reranking. Plus, we’ll explore advanced features like asynchronous jobs, metadata handling, and MindsDB AI Tables for summarization and classification.
What is MindsDB Knowledge Base?
MindsDB Knowledge Base is an AI-powered layer on top of your data that enables semantic search, natural language querying, and AI-driven insights. It integrates with popular AI models to provide embeddings and reranking, turning your raw data into a searchable store that can be queried by meaning rather than by exact keywords.
Why Google Gemini 2.5 Flash?
Google’s Gemini 2.5 Flash model offers advanced reasoning, large context windows, and efficient performance, making it an excellent choice for semantic tasks. MindsDB now supports Gemini 2.5 Flash as a first-class AI engine, allowing you to build powerful knowledge bases without relying solely on OpenAI.
What You’ll Build
A CLI app that can:
- Create and configure a MindsDB Knowledge Base with Gemini 2.5 Flash embeddings
- Ingest CSV data asynchronously using MindsDB JOBs
- Create semantic indexes asynchronously
- Perform semantic search queries with metadata-aware SQL
- Create and query MindsDB AI Tables for summarization, classification, and generation
Prerequisites
- Python 3.8+
- MindsDB Python SDK
- Pandas
- MindsDB Cloud account & API key
- Google Gemini API key
- CSV data file
Step 1: Setting Up the CLI App
I built the CLI tool in Python using the MindsDB SDK. It handles everything from engine creation to querying. Here’s a snippet showing how to create the Gemini engine:
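def create_gemini_engine(self, engine_name, gemini_api_key):
    # Build the statement that registers a Gemini-backed ML engine.
    create_engine_sql = f"""
    CREATE ML_ENGINE {engine_name}
    FROM google_gemini
    USING api_key = '{gemini_api_key}';
    """
    # Assuming self.client is a MindsDB SDK server object: query() only
    # builds a lazy Query, and fetch() is what sends it to MindsDB.
    self.client.query(create_engine_sql).fetch()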
This registers the Gemini engine in MindsDB for use in knowledge bases.
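Because the snippet above interpolates the engine name and API key straight into the SQL text, a stray quote in either value would break the statement. Here is a minimal sketch of a safer string builder. `build_create_engine_sql` and its validation rule are hypothetical helpers for illustration, not part of the MindsDB SDK, and the function only builds the string; it does not talk to MindsDB.

```python
import re

# Hypothetical helper (not a MindsDB SDK function): builds the SQL text only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_create_engine_sql(engine_name: str, gemini_api_key: str) -> str:
    """Return a CREATE ML_ENGINE statement with a validated name and escaped key."""
    # Reject names that are not plain identifiers, so nothing else can be injected.
    if not _IDENTIFIER.fullmatch(engine_name):
        raise ValueError(f"invalid engine name: {engine_name!r}")
    # Double single quotes so the key cannot end the SQL string literal early.
    escaped_key = gemini_api_key.replace("'", "''")
    return (
        f"CREATE ML_ENGINE {engine_name}\n"
        f"FROM google_gemini\n"
        f"USING api_key = '{escaped_key}';"
    )


print(build_create_engine_sql("gemini_engine", "YOUR_GEMINI_API_KEY"))
```

The returned string can be passed to the same query call used in `create_gemini_engine`.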
Step 2: Creating the Knowledge Base
The knowledge base is created with Gemini 2.5 Flash as both the embedding and reranking model:
# Doubled braces ({{ and }}) emit literal braces inside the f-string.
create_kb_sql = f"""
CREATE KNOWLEDGE_BASE {kb_name}
USING
    embedding_model = {{
        "provider": "google_gemini",
        "engine": "{engine_name}",
        "model_name": "gemini-2.5-flash"
    }},
    reranking_model = {{
        "provider": "google_gemini",
        "engine": "{engine_name}",
        "model_name": "gemini-2.5-flash"
    }},
    metadata_columns = {metadata_columns},
    content_columns = {content_columns},
    id_column = '{id_column}';
"""
# Assuming self.client is a MindsDB SDK server object: fetch() executes the lazy query.
self.client.query(create_kb_sql).fetch()
Step 3: Ingesting Data with Jobs
To handle large datasets efficiently, data ingestion and index creation run asynchronously as MindsDB JOBs:
job = kb.insert(df, async_mode=True)
job.wait()
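Running the work as a job keeps the heavy lifting on the MindsDB side: the CLI only submits the job and waits for it to finish, which scales better than processing every row in the client.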
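Conceptually, waiting on an asynchronous job comes down to polling its status until it completes or a timeout expires. The sketch below shows that pattern in plain Python. `wait_until` and the fake status check are hypothetical stand-ins, not the SDK's own job API.

```python
import time
from typing import Callable


def wait_until(is_done: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll is_done() until it returns True or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real job status check: reports completion on the third poll.
polls = {"count": 0}


def fake_job_finished() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


finished = wait_until(fake_job_finished, timeout=5.0, interval=0.01)
print(finished, polls["count"])  # prints: True 3
```

A timeout matters here because a stalled ingestion job would otherwise block the CLI indefinitely.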
Step 4: Semantic Search with Metadata
The CLI supports semantic queries with metadata-aware SQL, using window functions like LAST_VALUE() to retrieve the latest metadata per record:
SELECT id, content,
       LAST_VALUE(updated_at) OVER (
           PARTITION BY id ORDER BY updated_at
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS latest_update,
       relevance_score
FROM kb_name
WHERE content LIKE '<query>'
  AND relevance_score >= <relevance_threshold>
ORDER BY relevance_score DESC
LIMIT <limit>;
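To see why the query spells out the full window frame, here is a small check using Python's built-in sqlite3 module as a local stand-in for MindsDB. That substitution is an assumption made only so the example runs anywhere, and it needs SQLite 3.25 or newer for window functions. The table and rows are made up. With the default frame, LAST_VALUE() only looks as far as the current row, so it returns each row's own timestamp instead of the latest one per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, content TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "reset password v1", "2024-01-01"),
        (1, "reset password v2", "2024-03-01"),
        (2, "billing faq", "2024-02-15"),
    ],
)

FULL_FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"


def latest_updates(frame_clause: str):
    """Run the LAST_VALUE query with the given frame clause ('' = default frame)."""
    sql = f"""
        SELECT id, content,
               LAST_VALUE(updated_at) OVER (
                   PARTITION BY id ORDER BY updated_at {frame_clause}
               ) AS latest_update
        FROM docs
        ORDER BY id, updated_at
    """
    return conn.execute(sql).fetchall()


# Full frame: every row of id 1 sees the newest timestamp, 2024-03-01.
print(latest_updates(FULL_FRAME))
# Default frame: each row only sees timestamps up to its own.
print(latest_updates(""))
```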
Step 5: AI Tables for Summarization and Classification
MindsDB AI Tables enable you to create tables that perform AI tasks such as summarization or classification on your data:
CREATE AI TABLE summary_table
AS SELECT content
FROM kb_name
PREDICT summary
USING task = 'summarization';
You can then query the AI Table to get AI-generated summaries or classifications.
How to Use the CLI Tool
- Ingest data:
python kb_cli_advanced.py --api_key YOUR_MDB_API_KEY --gemini_api_key YOUR_GEMINI_API_KEY --kb_name my_kb --input_file data.csv
- Search semantically:
python kb_cli_advanced.py --api_key YOUR_MDB_API_KEY --kb_name my_kb --query "reset password" --limit 5 --relevance_threshold 0.6
- Create AI Table:
python kb_cli_advanced.py --api_key YOUR_MDB_API_KEY --create_ai_table --ai_table_name summary_table --source_table my_kb --task_type summarization --input_columns content --output_column summary
- Query AI Table:
python kb_cli_advanced.py --api_key YOUR_MDB_API_KEY --query_ai_table --ai_table_name summary_table --limit 5
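The flags used in the commands above could be wired up with argparse roughly as follows. This is a sketch of the argument parsing only: the flag names come from the commands above, while the types, defaults, and which flags are required are assumptions rather than the tool's actual behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="kb_cli_advanced.py")
    # Every example command passes --api_key, so it is assumed to be required.
    parser.add_argument("--api_key", required=True, help="MindsDB API key")
    parser.add_argument("--gemini_api_key", help="Google Gemini API key")
    parser.add_argument("--kb_name", help="knowledge base name")
    parser.add_argument("--input_file", help="CSV file to ingest")
    parser.add_argument("--query", help="semantic search text")
    parser.add_argument("--limit", type=int, default=5, help="maximum rows returned")
    parser.add_argument("--relevance_threshold", type=float, default=0.0)
    # Mode switches for the AI Table commands.
    parser.add_argument("--create_ai_table", action="store_true")
    parser.add_argument("--query_ai_table", action="store_true")
    parser.add_argument("--ai_table_name")
    parser.add_argument("--source_table")
    parser.add_argument("--task_type")
    parser.add_argument("--input_columns")
    parser.add_argument("--output_column")
    return parser


# Parse the "search semantically" example instead of reading sys.argv.
args = build_parser().parse_args(
    ["--api_key", "YOUR_MDB_API_KEY", "--kb_name", "my_kb",
     "--query", "reset password", "--limit", "5", "--relevance_threshold", "0.6"]
)
print(args.query, args.limit, args.relevance_threshold)
```

Declaring `--limit` and `--relevance_threshold` with explicit types means malformed values are rejected by argparse before any SQL is built.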
Conclusion
By combining MindsDB’s Knowledge Base with Google Gemini 2.5 Flash and advanced features like JOBs and AI Tables, you can build scalable, intelligent semantic search and AI applications with ease.
Feel free to check out the full source code and README on my GitHub repo. Happy coding!