Most RAG pipelines do one thing well: find text chunks that are semantically similar to a query. But a real customer question like "My API calls are failing and I need to upgrade my plan" isn't answered by similarity alone. You need:
- Semantic similarity to understand the intent (vector search)
- Keyword precision to catch exact terms like "API" and "upgrade" (text search)
- Metadata filtering to surface only relevant, high-quality articles (SQL conditions)
- Relationship awareness to follow the thread from "API rate limits" to "plan upgrade" to "billing FAQ" (graph traversal)
Gluing Pinecone + Elasticsearch + Neo4j together to get this is an infrastructure nightmare. VelesQL does it in one query string, inside a single embedded engine.
In this tutorial, we'll build a customer support AI agent that uses all four query types against VelesDB® v1.9.3.
What is VelesQL?
VelesQL is a SQL-like language with three domain-specific extensions:
- NEAR - vector similarity search (HNSW)
- MATCH 'term' - full-text search (BM25)
- MATCH (a)-[:REL]->(b) - Cypher-like graph pattern matching
Combine them with standard SQL predicates (AND, OR, <, =, >) and you can express hybrid retrieval logic in one statement. If you know SQL and a bit of Cypher, you already know 90% of VelesQL.
-- Hybrid vector + text + metadata
SELECT * FROM support_kb
WHERE vector NEAR $v
AND content MATCH 'authentication'
AND resolved_pct > 80
LIMIT 5
-- Graph + vector in one query
MATCH (doc:Document)-[:AUTHORED_BY]->(author:Person)
WHERE similarity(doc.embedding, $question) > 0.8
AND author.department = 'Engineering'
RETURN author.name, doc.title
ORDER BY similarity() DESC LIMIT 5
The first query combines vector search, keyword matching, and SQL filters. The second one walks the knowledge graph with node labels, filters by metadata, and ranks nodes by semantic similarity. Both are VelesQL, both run in one engine.
Setup
pip install velesdb
That installs a ~6MB native Rust engine. No Docker, no server, no config files.
import velesdb
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
db = velesdb.Database("./support_data")
# Create a graph collection: vector + graph + columnar in one object
graph = db._inner.create_graph_collection(
"support_kb", dimension=384, metric="cosine"
)
# Get the collection handle for upsert and VelesQL SELECT queries
collection = db._inner.get_collection("support_kb")
New in v1.9.3: create_graph_collection gives you a unified object that supports vector search, graph traversal, and Cypher-like MATCH queries. The underlying collection handle is used for upserting points and running SELECT queries.
Step 1: Build the knowledge base
We'll populate a support knowledge base for a SaaS product. Each article gets a vector embedding from its title and content, plus structured metadata and a _labels field for graph node typing:
articles = [
{
"id": 1,
"title": "How to reset your password",
"content": "To reset your password, navigate to Settings, "
"then Security, then Reset Password...",
"category": "account",
"product": "auth-service",
"difficulty": "beginner",
"resolved_pct": 95.0,
"views": 12500,
"_labels": ["Article"],
},
{
"id": 2,
"title": "Setting up two-factor authentication",
"content": "Enable 2FA from Settings, Security, 2FA. "
"We support Google Authenticator and YubiKey...",
"category": "account",
"product": "auth-service",
"difficulty": "intermediate",
"resolved_pct": 88.0,
"views": 8900,
"_labels": ["Article"],
},
# ... 8 more articles covering billing, technical, compliance
]
points = []
for article in articles:
text = f"{article['title']} {article['content']}"
embedding = model.encode(text).tolist()
payload = {k: v for k, v in article.items() if k != "id"}
points.append({
"id": article["id"],
"vector": embedding,
"payload": payload
})
collection.upsert(points)
Ten articles across four categories: account, billing, technical, compliance. Each has a difficulty level, a resolution rate, and a view count. The _labels field is what lets MATCH queries filter by node type (like :Article or :Person in Cypher). In production you'd have thousands - the queries scale the same way.
Step 2: Semantic search with NEAR
A customer writes: "I forgot my password and I'm locked out." Let's find relevant articles:
question = "I forgot my password and I am locked out of my account"
query_vec = model.encode(question).tolist()
results = collection.query(
"SELECT * FROM support_kb WHERE vector NEAR $v LIMIT 5",
params={"v": query_vec}
)
for r in results:
print(f" {r['fused_score']:.3f} | {r['payload']['title']}")
0.645 | How to reset your password
0.203 | Setting up two-factor authentication
0.160 | Resolving payment failures
0.083 | SSO integration with SAML and OIDC
0.080 | Upgrading your subscription plan
The NEAR operator runs HNSW similarity search against the article embeddings. Results are ranked by cosine similarity. This is the same search you'd get from any vector database, but expressed as a query string you can log, compose, and debug.
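To make "ranked by cosine similarity" concrete, here's a minimal stand-in in plain Python (no VelesDB dependency, toy vectors and titles) showing the scoring that NEAR's ranking is based on:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 = same direction, 0.0 = orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, docs):
    # docs: list of (title, vector) pairs; highest similarity first
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in docs]
    return sorted(scored, reverse=True)

docs = [
    ("password reset", [0.9, 0.1, 0.0]),
    ("billing invoice", [0.1, 0.9, 0.2]),
]
print(rank_by_similarity([1.0, 0.0, 0.0], docs))  # highest score first
```

HNSW just makes this ranking fast at scale: instead of scoring every vector, it navigates a layered proximity graph to find the nearest neighbors approximately.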
Step 3: Add metadata filters
The customer is a free-tier user. We only want beginner-level articles with a high resolution rate:
results = collection.query(
"""SELECT * FROM support_kb
WHERE vector NEAR $v
AND difficulty = 'beginner'
AND resolved_pct > 90
LIMIT 5""",
params={"v": query_vec}
)
0.645 | How to reset your password | beginner | 95.0%
0.013 | Understanding your invoice | beginner | 92.0%
The SQL conditions work on the structured metadata alongside the vector similarity. The advanced SSO and 2FA articles are gone because they're not "beginner" difficulty. This is the part most vector databases make painful - you either filter client-side (wasting bandwidth) or learn a custom filter DSL. VelesQL keeps it SQL.
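For contrast, here's what the client-side fallback looks like when a vector store can't filter in-engine - a sketch assuming results arrive as dicts with a payload field, shaped like the VelesDB rows above:

```python
def post_filter(results, difficulty="beginner", min_resolved=90):
    # Client-side filtering: over-fetch candidates, then discard most of them.
    # This wastes bandwidth and can return fewer rows than requested --
    # exactly the problem in-engine SQL predicates avoid.
    return [
        r for r in results
        if r["payload"]["difficulty"] == difficulty
        and r["payload"]["resolved_pct"] > min_resolved
    ]

results = [
    {"payload": {"title": "How to reset your password",
                 "difficulty": "beginner", "resolved_pct": 95.0}},
    {"payload": {"title": "SSO integration with SAML and OIDC",
                 "difficulty": "advanced", "resolved_pct": 85.0}},
]
print([r["payload"]["title"] for r in post_filter(results)])
```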
Step 4: Keyword precision with MATCH
Now a developer writes: "How do I configure authentication for my API?" This needs both semantic understanding and keyword precision. The word "authentication" should match exactly:
question = "How do I configure authentication for my API?"
query_vec = model.encode(question).tolist()
results = collection.query(
"""SELECT * FROM support_kb
WHERE vector NEAR $v
AND content MATCH 'authentication'
LIMIT 5""",
params={"v": query_vec}
)
for r in results:
print(f" fused={r['fused_score']:.3f} vec={r['vector_score']:.3f}"
f" | {r['payload']['title']}")
fused=0.017 vec=0.017 | Setting up two-factor authentication
fused=0.016 vec=0.016 | Webhook configuration and troubleshooting
fused=0.016 vec=0.016 | GraphQL API getting started guide
fused=0.008 vec=0.008 | How to reset your password
fused=0.008 vec=0.008 | SSO integration with SAML and OIDC
The MATCH operator runs BM25 full-text search. When MATCH is combined with NEAR, VelesDB fuses the two signals using Reciprocal Rank Fusion (RRF). Articles containing "authentication" get a score boost: the fused_score blends semantic relevance (from the vector) with keyword relevance (from BM25).
This solves a classic RAG failure mode: pure vector search might rank a vaguely related billing article above a directly relevant authentication guide, because the embeddings happen to be close. Adding MATCH anchors the results to the user's exact terminology.
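Reciprocal Rank Fusion itself is simple enough to sketch in a few lines. This is the textbook formula (each doc scores the sum of 1/(k + rank) over the lists it appears in, with k commonly 60), not VelesDB's internal implementation:

```python
def rrf_fuse(rankings, k=60):
    # rankings: one ranked list of doc ids per signal (e.g. vector, BM25)
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_hits = ["2fa-setup", "webhooks", "password-reset"]
bm25_hits = ["2fa-setup", "sso-guide", "webhooks"]
fused = rrf_fuse([vector_hits, bm25_hits])
print(fused[0][0])  # "2fa-setup" ranks first: it tops both lists
```

Because RRF works on ranks rather than raw scores, it fuses signals with incompatible scales (cosine similarity vs. BM25) without any normalization step.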
Step 5: Graph relationships and traversal
So far we've been querying the vector+columnar store. VelesDB also includes a native graph engine. Let's add knowledge relationships:
# Articles form a learning path
graph.add_edge({
"id": 101, "source": 1, "target": 2,
"label": "NEXT_STEP",
"properties": {"reason": "After password reset, enable 2FA"}
})
graph.add_edge({
"id": 102, "source": 2, "target": 8,
"label": "NEXT_STEP",
"properties": {"reason": "Advanced: set up SSO after 2FA"}
})
# Technical cross-references
graph.add_edge({
"id": 105, "source": 5, "target": 6,
"label": "RELATED",
"properties": {"reason": "Rate limits affect webhook delivery"}
})
graph.add_edge({
"id": 106, "source": 6, "target": 10,
"label": "NEXT_STEP",
"properties": {"reason": "After webhooks, try the GraphQL API"}
})
Now traverse the graph from the top vector search result. If the customer asked about password resets, what should they read next?
# BFS: find related articles within 2 hops
related = graph.traverse_bfs(1, max_depth=2)
for r in related:
print(f" depth={r['depth']} | {r['target_id']} via edge path {r['path']}")
depth=1 | 2 via edge path [101]
depth=2 | 8 via edge path [101, 102]
The customer asked about password reset. The graph tells us: next, suggest 2FA. After that, SSO. This is the GraphRAG pattern - use vector similarity to find the entry point, then walk the relationship graph to surface connected knowledge.
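Under the hood, traverse_bfs boils down to a standard breadth-first search over the edge list. A minimal stand-in in plain Python, using hypothetical edge dicts mirroring the ones added above:

```python
from collections import deque

def traverse_bfs(edges, start, max_depth=2):
    # edges: list of {"id", "source", "target"} dicts, as in the graph above
    adjacency = {}
    for e in edges:
        adjacency.setdefault(e["source"], []).append(e)
    results, visited = [], {start}
    queue = deque([(start, 0, [])])  # (node, depth, edge-id path)
    while queue:
        node, depth, path = queue.popleft()
        if depth == max_depth:
            continue
        for e in adjacency.get(node, []):
            if e["target"] not in visited:
                visited.add(e["target"])
                new_path = path + [e["id"]]
                results.append({"depth": depth + 1,
                                "target_id": e["target"],
                                "path": new_path})
                queue.append((e["target"], depth + 1, new_path))
    return results

edges = [{"id": 101, "source": 1, "target": 2},
         {"id": 102, "source": 2, "target": 8}]
for r in traverse_bfs(edges, 1):
    print(r)  # same depth / target_id / path shape as the output above
```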
Step 6: Cypher-like MATCH queries
The Python traversal API works, but writing graph logic in application code means your queries are scattered across functions. VelesQL supports Cypher-like MATCH syntax so you can express graph patterns directly in a query string:
results = graph.match_query(
"MATCH (article:Article)-[:NEXT_STEP]->(next:Article) "
"WHERE article.category = 'account' "
"RETURN article.title, next.title, next.difficulty "
"ORDER BY article.title "
"LIMIT 10"
)
for r in results:
p = r["projected"]
print(f" {p['article.title']} -> {p['next.title']} ({p['next.difficulty']})")
How to reset your password -> Setting up two-factor authentication (intermediate)
Setting up two-factor authentication -> SSO integration with SAML and OIDC (advanced)
Same data, same graph, but now the query is a single readable string. (article:Article)-[:NEXT_STEP]->(next:Article) reads like a sentence: "find Article nodes linked by a NEXT_STEP edge to another Article." Filter with WHERE, pick your columns with RETURN, sort, paginate. If you've used Cypher, this is familiar.
Now the part that no graph database does alone - combining graph traversal with vector similarity in one query:
question = "I need help securing my account"
query_vec = model.encode(question).tolist()
results = graph.match_query(
"MATCH (article:Article)-[:NEXT_STEP]->(next:Article) "
"WHERE similarity(article.embedding, $v) > 0.3 "
"RETURN article.title, next.title "
"ORDER BY similarity() DESC "
"LIMIT 5",
params={"v": query_vec},
vector=query_vec,
threshold=0.3
)
for r in results:
print(f" score={r['score']:.3f} | "
f"{r['projected']['article.title']} -> {r['projected']['next.title']}")
score=0.621 | How to reset your password -> Setting up two-factor authentication
score=0.198 | Setting up two-factor authentication -> SSO integration with SAML and OIDC
Read that query out loud: "find Article nodes connected by NEXT_STEP edges, where the article is semantically similar to my question, return titles sorted by similarity." That's vector search and graph traversal in one statement.
Under the hood, the planner picks the best execution strategy automatically:
plan = graph.explain(
"MATCH (article:Article)-[:NEXT_STEP]->(next:Article) "
"WHERE similarity(article.embedding, $v) > 0.5 "
"RETURN article.title, next.title LIMIT 5"
)
print(plan["tree"])
Query Plan:
+-- MatchTraversal
| +-- Strategy: VectorFirst: Search top-20 candidates, then validate graph
| +-- Max Depth: 1
| +-- Relationships: 1
| +-- Similarity Threshold: 0.50
+-- Limit: 5
Estimated cost: 0.251ms
Index used: HNSW
Three strategies available: VectorFirst (find similar nodes, then walk edges), GraphFirst (traverse first, then score by similarity), or Parallel (run both and merge). The planner chooses based on your query shape. You don't need to think about it.
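To make VectorFirst concrete, here's a toy executor: rank nodes by similarity, then keep only candidates that actually satisfy the graph pattern. The helper names and in-memory lists are illustrative only; the real planner works against indexes:

```python
def dot(a, b):
    # Toy similarity stand-in (dot product)
    return sum(x * y for x, y in zip(a, b))

def vector_first(nodes, edges, query_vec, top_n=20, threshold=0.5):
    # Step 1: rank all nodes by similarity, keep the top candidates
    scored = sorted(
        ((dot(query_vec, n["embedding"]), n["id"]) for n in nodes),
        reverse=True,
    )[:top_n]
    # Step 2: validate the graph pattern -- keep candidates with an
    # outgoing edge, i.e. ones that can bind (a)-[...]->(b)
    sources = {e["source"] for e in edges}
    return [(score, nid) for score, nid in scored
            if score > threshold and nid in sources]

nodes = [{"id": 1, "embedding": [1.0, 0.0]},
         {"id": 2, "embedding": [0.0, 1.0]}]
edges = [{"source": 1, "target": 2}]
print(vector_first(nodes, edges, [1.0, 0.0]))
# node 1 is similar AND has an outgoing edge; node 2 fails both checks
```

GraphFirst would simply invert the two steps (traverse, then score), which wins when the pattern is highly selective but the collection is large.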
In a traditional stack, this would be: run a vector search in Pinecone, take the IDs, query Neo4j for neighbors, join results in Python. Here it's one query, one engine, one result set.
Step 7: The complete pipeline
Here's the full support agent in action. One customer question triggers vector search, text matching, metadata filtering, and graph traversal:
def answer_support_question(question: str, graph):
query_vec = model.encode(question).tolist()
# Step 1: Hybrid search with filters
results = collection.query(
"""SELECT * FROM support_kb
WHERE vector NEAR $v
AND content MATCH 'API'
AND resolved_pct > 70
LIMIT 3""",
params={"v": query_vec}
)
top = results[0]
print(f"Best match: {top['payload']['title']}")
print(f" Score: {top['fused_score']:.3f}")
# Step 2: Walk the knowledge graph from the top result
related = graph.traverse_bfs(top["id"], max_depth=2)
if related:
print("Related articles:")
for r in related:
print(f" -> article {r['target_id']} "
f"(depth={r['depth']})")
return results, related
answer_support_question(
"My API calls are getting rejected, need help with rate limits",
graph
)
Best match: API rate limiting and quotas
Score: 0.016
Related articles:
-> article 6 (depth=1)
-> article 10 (depth=2)
The hybrid search found the API rate limiting article (combining semantic match + keyword "API" + high resolution filter). Then traverse_bfs walked the graph from that entry point to surface connected articles: webhook troubleshooting (article 6, because rate limits affect webhook delivery) and the GraphQL API guide (article 10, as a next step after webhooks).
Four query paradigms - vector, text, columnar, graph - two calls, one engine, zero infrastructure.
What's under the hood
VelesDB v1.9.3 gives you visibility into query execution:
plan = graph.explain(
"SELECT * FROM support_kb "
"WHERE vector NEAR $v AND category = 'technical' LIMIT 10"
)
print(plan["tree"])
Query Plan:
+-- VectorSearch
| +-- Collection: support_kb
| +-- ef_search: 100
| +-- Candidates: 10
+-- Filter
| +-- Conditions: category = ?
| +-- Selectivity: 50.0%
+-- Limit: 10
Estimated cost: 0.106ms
Index used: HNSW
Filter strategy: post-filtering (low selectivity)
The planner tells you which index it used, the filter strategy, and the estimated cost. When you're debugging slow queries in production, this is gold.
Adaptive search quality
VelesDB v1.9.3 includes search quality modes that trade precision for speed:
# Fast: fewer candidates, lower latency
results = collection.search_with_quality(query_vec, "fast", top_k=5)
# Balanced: good default
results = collection.search_with_quality(query_vec, "balanced", top_k=5)
# Accurate: more candidates, higher recall
results = collection.search_with_quality(query_vec, "accurate", top_k=5)
Five modes available: fast, balanced, accurate, perfect, and autotune. Use fast for autocomplete, accurate for final answers, autotune to let the engine decide based on collection size.
Multi-query fusion
When one question maps to multiple search intents, fuse them:
# Customer asks about both account and billing
q1 = model.encode("password reset account access").tolist()
q2 = model.encode("billing invoice payment").tolist()
results = collection.multi_query_search(
vectors=[q1, q2],
top_k=5,
fusion=velesdb.FusionStrategy.rrf()
)
Five fusion strategies:
velesdb.FusionStrategy.rrf() # Reciprocal Rank Fusion
velesdb.FusionStrategy.average() # Average scores
velesdb.FusionStrategy.maximum() # Take the max
velesdb.FusionStrategy.relative_score(0.7, 0.3) # Weighted by score type
velesdb.FusionStrategy.weighted(0.5, 0.3, 0.2) # Custom weight blend
This is useful when a single user query decomposes into multiple search intents - common in agentic RAG pipelines where the LLM generates sub-queries.
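The simpler strategies are easy to sketch. Assuming each sub-query returns a {doc_id: score} map, average and maximum fusion look like this (a toy illustration, not VelesDB internals):

```python
def fuse_average(score_maps):
    # Mean score across sub-queries; a doc missing from a map counts as 0.0
    doc_ids = set().union(*score_maps)
    return {d: sum(m.get(d, 0.0) for m in score_maps) / len(score_maps)
            for d in doc_ids}

def fuse_maximum(score_maps):
    # Best score any single sub-query gave the doc
    doc_ids = set().union(*score_maps)
    return {d: max(m.get(d, 0.0) for m in score_maps) for d in doc_ids}

q1 = {"password-reset": 0.9, "2fa-setup": 0.4}
q2 = {"invoice-guide": 0.8, "password-reset": 0.2}
print(fuse_average([q1, q2]))  # password-reset averages to ~0.55
print(fuse_maximum([q1, q2]))  # password-reset keeps its best score, 0.9
```

Average rewards docs relevant to every intent; maximum rewards a strong hit on any single intent. RRF, as shown earlier, sidesteps score scales entirely by fusing on ranks.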
VelesQL parser
You can now validate and introspect queries before executing them:
from velesdb import VelesQL
# Validate user input
assert VelesQL.is_valid("SELECT * FROM kb WHERE vector NEAR $v LIMIT 10")
assert not VelesQL.is_valid("DROP TABLE kb") # Nice try
# Introspect a query
parsed = VelesQL.parse(
"SELECT * FROM support_kb WHERE vector NEAR $v "
"AND category = 'technical' LIMIT 5"
)
print(parsed.table_name) # "support_kb"
print(parsed.has_vector_search()) # True
print(parsed.has_where_clause()) # True
print(parsed.limit) # 5
The parser understands GROUP BY, HAVING, ORDER BY, JOIN, DISTINCT, and OFFSET:
agg = VelesQL.parse(
"SELECT category, COUNT(*) FROM kb GROUP BY category"
)
print(agg.group_by) # ["category"]
ordered = VelesQL.parse(
"SELECT * FROM kb ORDER BY views DESC LIMIT 10"
)
print(ordered.order_by) # [("views", "DESC")]
This is valuable for building query builders, admin UIs, or query validation middleware in production systems.
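If you want to see the allow-list idea behind is_valid in isolation, it can be sketched with a regex gate. This is a deliberately crude illustration with a hypothetical helper name, nowhere near the real VelesQL grammar:

```python
import re

# Toy gate: accept only read-only SELECT statements against known tables.
# The real VelesQL parser builds a full AST; this only shows the principle.
SELECT_RE = re.compile(
    r"^\s*SELECT\s+.+\s+FROM\s+(\w+)", re.IGNORECASE | re.DOTALL
)

def is_probably_safe(query, allowed_tables):
    m = SELECT_RE.match(query)
    return bool(m) and m.group(1) in allowed_tables

print(is_probably_safe("SELECT * FROM kb WHERE vector NEAR $v", {"kb"}))  # True
print(is_probably_safe("DROP TABLE kb", {"kb"}))                          # False
```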
The full picture
| Capability | VelesQL Syntax | What it does |
|---|---|---|
| Vector search | WHERE vector NEAR $v | Semantic similarity (HNSW) |
| Text search | AND content MATCH 'term' | BM25 keyword ranking |
| Metadata filter | AND price < 100 | SQL-style conditions |
| Graph pattern | MATCH (a:Label)-[:REL]->(b) | Cypher-like traversal |
| Graph + vector | WHERE similarity() > 0.8 | Fused vector + graph |
| Graph traversal | graph.traverse_bfs() | Programmatic BFS/DFS |
| Query plan | graph.explain() | Execution introspection |
| Quality modes | search_with_quality("fast") | Precision/speed tradeoff |
| Multi-query fusion | multi_query_search() | Combine multiple intents |
| Query validation | VelesQL.is_valid() | Pre-execution checks |
All of this runs inside a single ~6MB embedded engine. No Docker, no network calls, no infrastructure to maintain.
Why this matters for RAG
The typical RAG architecture looks like this:
- Embed the user question
- Vector search in Pinecone/Qdrant/Weaviate
- Maybe filter results client-side
- Maybe query a separate text search engine
- Maybe query a graph database for relationships
- Glue it all together in application code
With VelesQL:
- Embed the user question
- SELECT ... WHERE vector NEAR $v AND content MATCH 'term' AND filter = value
- MATCH (a)-[:RELATED]->(b) WHERE similarity(a.embedding, $v) > 0.5
Two queries. One engine. Vector, text, metadata, and graph - all in VelesQL.
Less infrastructure. Less code. Fewer failure modes. And because it's embedded, your tests run in milliseconds against real data - no mock database needed.
Getting started
pip install velesdb sentence-transformers
The complete runnable script for this tutorial is available as a single Python file. Copy, paste, run.
Full docs: velesdb.com/en
GitHub: github.com/cyberlife-coder/VelesDB
VelesDB is source-available under the Elastic License 2.0.
What's your current stack for hybrid retrieval? Are you gluing multiple databases together, or have you found a unified approach? I'd love to hear what works (and what breaks) in your RAG pipelines.