Just spent some time getting hands-on with seekdb, and it’s been a pleasant surprise—here’s a quick breakdown of what caught my eye (no fluff, just the good stuff):
✅ Lightweight & easy to spin up: Runs smoothly on my MacBook via Docker Desktop, or straight up with pip on Linux. macOS/Windows native support is on the way, so soon it’ll be a simple one-command install, no Docker required.
✅ Unified architecture done right: Natively supports relational, vector, full-text, JSON, and GIS data types—all indexes update atomically in the same transaction. Zero Data Lag, strict ACID compliance, and none of the latency/inconsistency headaches from traditional CDC sync.
✅ AI-Native out of the box: Built-in embedding models and AI functions mean one SQL query handles vector + full-text + scalar filtering. No more messy glue code to stitch tech stacks together—perfect for powering RAG workflows.
✅ Schema-free API: Write directly, no need to predefine rigid table structures—saves so much setup time.
✅ Full MySQL compatibility: Easy upgrade path for traditional databases looking to add AI capabilities without a complete overhaul.
✅ Open-source (Apache 2.0) with OceanBase backing: Long-term support is locked in, and the project’s only getting better—always a win for the community.

In this tutorial, we'll build an intelligent book search application from scratch using seekdb, demonstrating semantic search, hybrid search, and other core capabilities.
What We'll Build
The app exercises seekdb's main features end to end:
1. Data Import
- Import from CSV files into seekdb
- Support batch data import
- Automatically convert book text information into 384-dimensional vector embeddings
2. Three Search Capabilities
- Semantic Search: Based on vector similarity, use natural language queries to find semantically related books
- Metadata Filtering: Precise filtering by rating, genre, year, price, and other fields
- Hybrid Search: Combines full-text retrieval, semantic search, and metadata filtering, merging the results with the RRF (Reciprocal Rank Fusion) algorithm
3. Index Optimization
- Create HNSW vector indexes to boost semantic search performance
- Generate column indexes from metadata (extract fields from JSON to create indexes)
4. Tech Stack
- Database: seekdb, pyseekdb (seekdb's Python SDK), pymysql
- Data Processing: pandas
Prerequisites
1. Install OrbStack
OrbStack is a lightweight Docker alternative optimized for Mac. It starts fast and uses fewer resources. We'll use it to deploy seekdb locally.
Step 1: Install via Homebrew (Recommended)
brew install orbstack
Or download from the official website: https://orbstack.dev
Step 2: Start OrbStack
# Start OrbStack
open -a OrbStack
# Verify installation
orb version
2. Deploy seekdb Image
If downloads are slow, configure a Docker registry mirror close to your region in OrbStack settings.
# Pull seekdb image
docker pull oceanbase/seekdb:latest
# Start seekdb container
docker run -d \
--name seekdb \
-p 2881:2881 \
-e MODE=slim \
oceanbase/seekdb:latest
# Check container status
docker ps | grep seekdb
# View logs (ensure service started successfully)
docker logs seekdb
Wait about 30 seconds for seekdb to fully start. You can monitor the startup logs with docker logs -f seekdb. When you see "boot success", it's ready.
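If you'd rather not watch the logs by hand, the wait can be scripted. This is a minimal sketch, assuming the "boot success" marker mentioned above appears in the output of docker logs; wait_for_boot and docker_logs are illustrative helper names, not part of seekdb or the tutorial repo.

```python
import subprocess
import time


def wait_for_boot(fetch_logs, marker="boot success", timeout=120.0, interval=2.0):
    """Poll fetch_logs() until marker appears. Return True, or False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if marker in fetch_logs():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def docker_logs(container="seekdb"):
    """Return the combined stdout and stderr of `docker logs <container>`."""
    result = subprocess.run(
        ["docker", "logs", container],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


# Usage (requires the container started above):
#   ready = wait_for_boot(docker_logs)
```

Passing the log fetcher in as a function keeps the polling logic testable without Docker.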
3. Download the Dataset
Download the dataset from: https://www.kaggle.com/datasets/sootersaalu/amazon-top-50-bestselling-books-2009-2019
Rename it to: bestsellers_with_categories.csv. It contains 550 records of Amazon's historical bestsellers.

4. Download the Tutorial Code
git clone https://github.com/kejun/demo-seekdb-hybridsearch.git
Project Structure:
demo-seekdb-books-hybrid-search/
├── database/
│ ├── db_client.py # Database client wrapper
│ └── index_manager.py # Index manager
├── data/
│ └── processor.py # Data processor
├── models/
│ └── book_metadata.py # Book metadata model
├── utils/
│ └── text_utils.py # Text processing utilities
├── import_data.py # Data import script
├── hybrid_search.py # Hybrid search demo
└── bestsellers_with_categories.csv # Data file
Create Python Virtual Environment:
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate # macOS/Linux
# or
.\venv\Scripts\activate # Windows
Install Dependencies:
pip install -r requirements.txt
Execution Results
Run python import_data.py to import data. You'll see the entire process: load data file → connect to database → create database → create collection → batch import data → create metadata indexes.
(Note: seekdb currently supports HNSW indexes for embedding columns and full-text indexes for document columns. Metadata field indexing is planned for future releases.)
seekdb uses a schema-free interface design. For example, in data/processor.py, when calling collection.add(), you can pass any dictionary directly:
collection.add(
ids=valid_ids,
documents=valid_documents,
metadatas=valid_metadatas # Pass dictionary list directly, no schema predefinition needed
)
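To show what those lists might contain, here is a rough sketch that turns CSV rows into ids, documents, and metadata dictionaries. The column names come from the dataset and the metadata keys match the index fields in the import log; the row_to_record helper, the id format, and the way the document text is composed are my assumptions, not necessarily what data/processor.py does.

```python
import csv
import io

# One sample row in the dataset's column layout.
SAMPLE_CSV = (
    "Name,Author,User Rating,Reviews,Price,Year,Genre\n"
    '"The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change",'
    "Stephen R. Covey,4.6,9325,24,2011,Non Fiction\n"
)


def row_to_record(index, row):
    """Convert one CSV row (dict keyed by header) into (id, document, metadata)."""
    metadata = {
        "author": row["Author"].strip(),
        "user_rating": float(row["User Rating"]),
        "reviews": int(row["Reviews"]),
        "price": float(row["Price"]),
        "year": int(row["Year"]),
        "genre": row["Genre"].strip(),
    }
    # Hypothetical document text: title, author, and genre in one sentence.
    document = f"{row['Name'].strip()} by {metadata['author']} ({metadata['genre']})"
    return f"book_{index}", document, metadata


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
records = [row_to_record(i, row) for i, row in enumerate(rows)]
valid_ids, valid_documents, valid_metadatas = map(list, zip(*records))
```

The three lists line up by position, which is the shape the collection.add() call above expects.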
Complete Results (abbreviated):
Loading data file: bestsellers_with_categories.csv
Data loaded!
- Total rows: 550
- Total columns: 7
- Column names: Name, Author, User Rating, Reviews, Price, Year, Genre
- Load time: 0.01 seconds
Connecting to database...
Host: 127.0.0.1:2881
Database: demo_books
Collection: book_info
Database ready
Database connection successful
Creating/rebuilding collection...
Collection name: book_info
Vector dimensions: 384
Distance metric: cosine
Collection created successfully
Processing data...
Data preprocessing complete!
- Total records: 550
- Validation errors: 0
- Processing time: 0.05 seconds
Importing data to collection...
- Batch size: 100
- Total batches: 6
- Starting import...
Import progress: 100%|█████████████████████████████████████| 6/6 [00:53<00:00, 8.97s/batch]
Data import complete!
- Import time: 53.83 seconds
- Average speed: 10 records/second
Creating metadata indexes...
- Index fields: genre, year, user_rating, author, reviews, price
Index creation complete!
- Creation time: 3.81 seconds
Data import process complete!
Total time: 59.64 seconds
Imported records: 550
Database: demo_books
Collection: book_info
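The batch numbers in the log follow from plain chunking: 550 records at 100 per batch gives 6 batches, the last one holding 50. A small sketch of that splitting, where iter_batches is an illustrative name rather than a function from the repo:

```python
import math


def iter_batches(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


records = list(range(550))  # stand-in for the 550 preprocessed books
batches = list(iter_batches(records, 100))
expected_batches = math.ceil(len(records) / 100)
```

Each batch would then be passed to collection.add() as shown earlier. Most of the 53.83 seconds is presumably spent computing embeddings, which works out to roughly 10 records per second.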
After importing data, you can query the database directly from the terminal with the MySQL client (or obclient, if you prefer to install it).
# Enter seekdb container
docker exec -it seekdb bash
# Connect using MySQL client (seekdb is MySQL-compatible)
mysql -h127.0.0.1 -P2881 -uroot
book_info is a seekdb collection, which corresponds to the underlying table name c$v1$book_info:
-- View all databases
SHOW DATABASES;
-- Switch to the demo database (named demo_books in the import log)
USE demo_books;
-- View all tables (collections)
SHOW TABLES;
-- View collection structure
DESC c$v1$book_info;
-- Query collection data
SELECT * FROM c$v1$book_info LIMIT 10;
-- Count records
SELECT COUNT(*) FROM c$v1$book_info;
-- Exit
EXIT;
[Screenshot: table schema shown by DESC c$v1$book_info]
[Screenshot: indexes created on the table]
(Note: pyseekdb doesn't currently support direct indexing of metadata columns, so the project uses pymysql + SQL DDL to implement metadata indexing. The next pyseekdb version will support automatic indexing of metadata fields.)
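The project's actual DDL isn't shown in this post, so the following is only a sketch of what "extract fields from JSON to create indexes" could look like in MySQL-style SQL. It assumes the collection table stores metadata in a JSON column named metadata and that seekdb accepts MySQL generated-column syntax; the helper name, the meta_ column prefix, and the index names are made up for illustration.

```python
import re

# Conservative identifier checks so values can be safely placed into DDL text.
_FIELD = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TABLE = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def build_metadata_index_ddl(table, field, sql_type, json_column="metadata"):
    """Return two DDL strings: add a generated column, then index it.

    Assumption: the JSON column is called `metadata`; adjust json_column
    after checking the real schema with DESC.
    """
    if not _TABLE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not _FIELD.match(field) or not _FIELD.match(json_column):
        raise ValueError(f"unexpected identifier: {field!r} / {json_column!r}")
    column = f"meta_{field}"
    add_column = (
        f"ALTER TABLE `{table}` ADD COLUMN `{column}` {sql_type} "
        f"GENERATED ALWAYS AS "
        f"(JSON_UNQUOTE(JSON_EXTRACT(`{json_column}`, '$.{field}'))) VIRTUAL"
    )
    create_index = f"CREATE INDEX `idx_{column}` ON `{table}` (`{column}`)"
    return [add_column, create_index]


statements = build_metadata_index_ddl("c$v1$book_info", "genre", "VARCHAR(64)")
```

Each statement could then be run through a pymysql cursor with cursor.execute(statement). Check the table's real column names with DESC before trying anything like this.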
Running Hybrid Search
Next, run python hybrid_search.py. seekdb's built-in embedding model is sentence-transformers/all-MiniLM-L6-v2, which produces 384-dimensional vectors. For better results, configure an external model service.
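The collection was created with the cosine distance metric (see the import log), and the search output further down prints both a distance and a similarity equal to 1 minus that distance. Here is a small sketch of the standard cosine distance on toy 3-dimensional vectors instead of real 384-dimensional embeddings. I'm assuming seekdb uses the usual definition, 1 minus cosine similarity.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def similarity_from_distance(distance):
    """Convert a distance into the similarity figure printed by the demo."""
    return 1.0 - distance


query_vec = [0.2, 0.7, 0.1]   # toy stand-in for a query embedding
book_vec = [0.25, 0.6, 0.3]   # toy stand-in for a book embedding
distance = cosine_distance(query_vec, book_vec)
similarity = similarity_from_distance(distance)
```

For the first semantic result below, a distance of 0.5358 maps to a similarity of 0.4642.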
Hybrid search is seekdb's killer feature. It simultaneously executes full-text retrieval and vector retrieval, then merges results using the RRF (Reciprocal Rank Fusion) algorithm.
In the snippet below, query_params defines a full-text search for "inspirational", filtered by user rating (user_rating >= 4.5) from the metadata. knn_params defines the semantic search: query_texts is the phrase "inspirational life advice", with the same rating filter.
Code Snippet:
query_params = {
"where_document": {"$contains": "inspirational"},
"where": {"user_rating": {"$gte": 4.5}},
"n_results": 5
}
knn_params = {
"query_texts": ["inspirational life advice"],
"where": {"user_rating": {"$gte": 4.5}},
"n_results": 5
}
results = collection.hybrid_search(
query=query_params,
knn=knn_params,
rank={"rrf": {}},
n_results=5,
include=["metadatas", "documents", "distances"]
)
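To make the RRF step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. Each document scores the sum of 1 / (k + rank) across the lists it appears in. The constant k = 60 is the value commonly used for RRF; the tutorial doesn't say which constant seekdb uses, and the book ids below are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids. Return (id, score) pairs, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


full_text_hits = ["mindset", "habits", "grit"]   # hypothetical full-text ranking
vector_hits = ["mindset", "atomic", "habits"]    # hypothetical KNN ranking
fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents found by both retrievers rise to the top, which is why hybrid search can beat either method alone. With k = 60, a first-place hit in a single list contributes 1/61 ≈ 0.0164. Values of that size show up in the "Similarity distance" field of the hybrid output below, so that field may hold a fused rank score rather than a cosine distance. I haven't confirmed this.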
The results are impressively accurate. Complete execution results (abbreviated):
=== Semantic Search ===
Query: ['self improvement motivation success']
Semantic Search - Found 5 results:
[1] The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change
Author: Stephen R. Covey
Rating: 4.6
Reviews: 9325
Price: $24.0
Year: 2011
Genre: Non Fiction
Similarity distance: 0.5358
Similarity: 0.4642
(Other results omitted...)
=== Hybrid Search (Rating≥4.5) ===
Query: {'where_document': {'$contains': 'inspirational'}, 'where': {'user_rating': {'$gte': 4.5}}, 'n_results': 5}
KNN Query Texts: ['inspirational life advice']
Hybrid Search (Rating≥4.5) - Found 5 results:
[1] Mindset: The New Psychology of Success
Author: Carol S. Dweck
Rating: 4.6
Reviews: 5542
Price: $10.0
Year: 2014
Genre: Non Fiction
Similarity distance: 0.0159
Similarity: 0.9841
(Other results omitted...)
=== Hybrid Search (Non Fiction) ===
Query: {'where_document': {'$contains': 'business'}, 'where': {'genre': 'Non Fiction'}, 'n_results': 5}
KNN Query Texts: ['business entrepreneurship leadership']
Hybrid Search (Non Fiction) - Found 5 results:
[1] The Five Dysfunctions of a Team: A Leadership Fable
Author: Patrick Lencioni
Rating: 4.6
Reviews: 3207
Price: $6.0
Year: 2009
Genre: Non Fiction
Similarity distance: 0.0164
Similarity: 0.9836
(Other results omitted...)
=== Hybrid Search (Fiction, After 2015, Rating≥4.0) ===
Query: {'where_document': {'$contains': 'fiction'}, 'where': {'$and': [{'year': {'$gte': 2015}}, {'user_rating': {'$gte': 4.0}}, {'genre': 'Fiction'}]}, 'n_results': 5}
KNN Query Texts: ['fiction story novel']
Hybrid Search (Fiction, After 2015, Rating≥4.0) - Found 5 results:
[1] A Gentleman in Moscow: A Novel
Author: Amor Towles
Rating: 4.7
Reviews: 19699
Price: $15.0
Year: 2017
Genre: Fiction
Similarity distance: 0.0154
Similarity: 0.9846
(Other results omitted...)
=== Hybrid Search (Reviews≥10000) ===
Query: {'where_document': {'$contains': 'popular'}, 'where': {'reviews': {'$gte': 10000}}, 'n_results': 10}
KNN Query Texts: ['popular bestseller']
Hybrid Search (Reviews≥10000) - Found 10 results:
[1] Twilight (The Twilight Saga, Book 1)
Author: Stephenie Meyer
Rating: 4.7
Reviews: 11676
Price: $9.0
Year: 2009
Genre: Fiction
Similarity distance: 0.0143
Similarity: 0.9857
[2] 1984 (Signet Classics)
Author: George Orwell
Rating: 4.7
Reviews: 21424
Price: $6.0
Year: 2017
Genre: Fiction
Similarity distance: 0.0145
Similarity: 0.9855
[3] Last Week Tonight with John Oliver Presents A Day in the Life of Marlon Bundo (Better Bundo Book, LGBT Childrens Book)
Author: Jill Twiss
Rating: 4.9
Reviews: 11881
Price: $13.0
Year: 2018
Genre: Fiction
Similarity distance: 0.0147
Similarity: 0.9853
(Other results omitted...)
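The where filters printed in the output above use three forms: plain equality, $gte, and $and. As a rough illustration of what they mean, here is an in-memory evaluator for just those forms. It sketches the semantics only and is not how seekdb evaluates filters inside the engine; the matches function is a hypothetical helper.

```python
def matches(metadata, where):
    """Return True if metadata satisfies where (equality, $gte, $and only)."""
    for key, condition in where.items():
        if key == "$and":
            # Every sub-filter in the list must hold.
            if not all(matches(metadata, clause) for clause in condition):
                return False
        elif isinstance(condition, dict):
            value = metadata.get(key)
            for op, target in condition.items():
                if op != "$gte":
                    raise ValueError(f"unsupported operator: {op}")
                if value is None or value < target:
                    return False
        elif metadata.get(key) != condition:
            return False
    return True


# Filter copied from the "Fiction, After 2015, Rating≥4.0" query above.
fiction_filter = {
    "$and": [
        {"year": {"$gte": 2015}},
        {"user_rating": {"$gte": 4.0}},
        {"genre": "Fiction"},
    ]
}
gentleman = {"year": 2017, "user_rating": 4.7, "genre": "Fiction"}
twilight = {"year": 2009, "user_rating": 4.7, "genre": "Fiction"}
```

A Gentleman in Moscow (2017) passes this filter, while Twilight (2009) fails the year condition.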
Vibe Coding Friendly
If you're using Cursor or Claude Code for development, you've probably installed context7-mcp. It queries the latest API documentation, code examples, and more—the perfect companion for vibe coding. I noticed seekdb has been added to Context7:
- seekdb: https://context7.com/oceanbase/seekdb
- pyseekdb: https://context7.com/oceanbase/pyseekdb
If you haven't installed it yet, I highly recommend it:
{
"mcpServers": {
"context7": {
"command": "npx",
"args": [
"-y",
"@upstash/context7-mcp",
"--api-key",
"<your-apiKey-created-on-context7>"
]
}
}
}
After installation, you can learn seekdb while you build with it.
Key Takeaways
What makes seekdb special:
- Lightweight & Easy to Deploy: Runs smoothly on a MacBook, with native macOS/Windows support coming soon
- Unified Architecture: Combines relational, vector, full-text, JSON, and GIS in one system
- AI-Native: Built-in embeddings and AI functions, no glue code needed
- Schema-Free: Write directly without predefining schemas
- MySQL-Compatible: Easy migration path for existing databases
- Open Source: Apache 2.0 license with OceanBase backing
The hybrid search capability is particularly impressive—combining semantic understanding with precise metadata filtering delivers results that feel both intelligent and accurate.
- Repo: github.com/oceanbase/seekdb (Apache 2.0 — Stars, Issues, PRs welcome)
- Docs: seekdb documentation
- Discord: https://discord.com/channels/1331061822945624085/1331061823465590805
- Medium: https://medium/seekdb
- Press: OceanBase Releases seekdb (MarkTechPost)
I hope this tutorial helps you get started with seekdb more smoothly. Enjoy building! 🚀

