DEV Community

Julien L for WiScale


Your RAG pipeline is missing two-thirds of the picture

Most RAG pipelines do one thing well: find text chunks that are semantically similar to a query. But a real customer question like "My API calls are failing and I need to upgrade my plan" isn't answered by similarity alone. You need:

  • Semantic similarity to understand the intent (vector search)
  • Keyword precision to catch exact terms like "API" and "upgrade" (text search)
  • Metadata filtering to surface only relevant, high-quality articles (SQL conditions)
  • Relationship awareness to follow the thread from "API rate limits" to "plan upgrade" to "billing FAQ" (graph traversal)

Gluing Pinecone + Elasticsearch + Neo4j together to get this is an infrastructure nightmare. VelesQL does it in one query string, inside a single embedded engine.

In this tutorial, we'll build a customer support AI agent that uses all four query types against VelesDB v1.9.2.

What is VelesQL?

VelesQL is a SQL-like language with two domain-specific extensions:

  • NEAR - vector similarity search (HNSW)
  • MATCH 'term' - full-text search (BM25)

Combine them with standard SQL predicates (AND, OR, <, =, >) and you can express hybrid retrieval logic in one statement. VelesDB also includes a native graph engine accessible via a Python traversal API (traverse_bfs, traverse_dfs). If you know SQL, you already know 90% of VelesQL.

-- Hybrid vector + text + metadata in one query
SELECT * FROM support_kb
WHERE vector NEAR $v
  AND content MATCH 'authentication'
  AND resolved_pct > 80
LIMIT 5

This single query combines vector search, keyword matching, and SQL filters. Add graph traversal in Python and you cover all four retrieval paradigms.

Setup

pip install velesdb

That installs a ~6MB native Rust engine. No Docker, no server, no config files.

import velesdb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
db = velesdb.Database("./support_data")
collection = db.create_collection(
    "support_kb", dimension=384, metric="cosine"
)

Step 1: Build the knowledge base

We'll populate a support knowledge base for a SaaS product. Each article gets a vector embedding from its title and content, plus structured metadata:

articles = [
    {
        "id": 1,
        "title": "How to reset your password",
        "content": "To reset your password, navigate to Settings, "
                   "then Security, then Reset Password...",
        "category": "account",
        "product": "auth-service",
        "difficulty": "beginner",
        "resolved_pct": 95.0,
        "views": 12500,
    },
    {
        "id": 2,
        "title": "Setting up two-factor authentication",
        "content": "Enable 2FA from Settings, Security, 2FA. "
                   "We support Google Authenticator and YubiKey...",
        "category": "account",
        "product": "auth-service",
        "difficulty": "intermediate",
        "resolved_pct": 88.0,
        "views": 8900,
    },
    # ... 8 more articles covering billing, technical, compliance
]

points = []
for article in articles:
    text = f"{article['title']} {article['content']}"
    embedding = model.encode(text).tolist()
    payload = {k: v for k, v in article.items() if k != "id"}
    points.append({
        "id": article["id"],
        "vector": embedding,
        "payload": payload
    })

collection.upsert(points)

Ten articles across four categories: account, billing, technical, compliance. Each has a difficulty level, a resolution rate, and a view count. In production you'd have thousands - the queries scale the same way.

Step 2: Semantic search with NEAR

A customer writes: "I forgot my password and I'm locked out." Let's find relevant articles:

question = "I forgot my password and I am locked out of my account"
query_vec = model.encode(question).tolist()

results = collection.query(
    "SELECT * FROM support_kb WHERE vector NEAR $v LIMIT 5",
    params={"v": query_vec}
)

for r in results:
    print(f"  {r['fused_score']:.3f} | {r['payload']['title']}")
  0.645 | How to reset your password
  0.203 | Setting up two-factor authentication
  0.160 | Resolving payment failures
  0.083 | SSO integration with SAML and OIDC
  0.080 | Upgrading your subscription plan

The NEAR operator runs HNSW similarity search against the article embeddings. Results are ranked by cosine similarity. This is the same search you'd get from any vector database, but expressed as a query string you can log, compose, and debug.
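To build intuition for what NEAR ranks by, here is a brute-force cosine-similarity sketch in plain Python. This is not VelesDB code - the engine computes the same similarity over an HNSW index instead of scanning every document - but it shows exactly what the scores mean:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_near(query, docs, limit=5):
    """Exact version of NEAR: score every document, keep the top hits.
    HNSW approximates this without touching the whole collection."""
    scored = [(cosine_similarity(query, d["vector"]), d["id"]) for d in docs]
    scored.sort(reverse=True)
    return scored[:limit]

# Toy 2-dimensional "embeddings" (real ones are 384-dimensional here)
docs = [
    {"id": 1, "vector": [1.0, 0.0]},   # nearly parallel to the query
    {"id": 2, "vector": [0.0, 1.0]},   # orthogonal, scores near zero
    {"id": 3, "vector": [0.9, 0.1]},
]
print(brute_force_near([1.0, 0.1], docs, limit=2))
```

HNSW trades this exactness for logarithmic search time, which is why vector databases can skip the full scan.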

Step 3: Add metadata filters

The customer is a free-tier user. We only want beginner-level articles with a high resolution rate:

results = collection.query(
    """SELECT * FROM support_kb
       WHERE vector NEAR $v
       AND difficulty = 'beginner'
       AND resolved_pct > 90
       LIMIT 5""",
    params={"v": query_vec}
)
  0.645 | How to reset your password | beginner | 95.0%
  0.013 | Understanding your invoice | beginner | 92.0%

The SQL conditions work on the structured metadata alongside the vector similarity. The advanced SSO and 2FA articles are gone because they're not "beginner" difficulty. This is the part most vector databases make painful - you either filter client-side (wasting bandwidth) or learn a custom filter DSL. VelesQL keeps it SQL.
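For contrast, here is what the client-side fallback looks like when your vector store can't filter in-engine: over-fetch candidates, then discard the ones the predicates reject. The field names mirror this tutorial's payloads; the function itself is illustrative, not part of any library:

```python
def client_side_filter(hits, difficulty="beginner", min_resolved=90.0, limit=5):
    """Post-filter over-fetched vector search results in application code.
    Every discarded hit was still embedded, searched, and shipped over
    the wire - the cost VelesQL's in-engine predicates avoid."""
    kept = [
        h for h in hits
        if h["payload"]["difficulty"] == difficulty
        and h["payload"]["resolved_pct"] > min_resolved
    ]
    return kept[:limit]

hits = [
    {"id": 1, "payload": {"difficulty": "beginner", "resolved_pct": 95.0}},
    {"id": 2, "payload": {"difficulty": "intermediate", "resolved_pct": 88.0}},
    {"id": 4, "payload": {"difficulty": "beginner", "resolved_pct": 92.0}},
]
print(client_side_filter(hits))
```

The subtle failure mode: if all five over-fetched hits get rejected, you return nothing even though matching articles exist deeper in the ranking - which is why pushing filters into the index matters.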

Step 4: Keyword precision with MATCH

Now a developer writes: "How do I configure authentication for my API?" This needs both semantic understanding and keyword precision. The word "authentication" should match exactly:

question = "How do I configure authentication for my API?"
query_vec = model.encode(question).tolist()

results = collection.query(
    """SELECT * FROM support_kb
       WHERE vector NEAR $v
       AND content MATCH 'authentication'
       LIMIT 5""",
    params={"v": query_vec}
)

for r in results:
    print(f"  fused={r['fused_score']:.3f} vec={r['vector_score']:.3f}"
          f" | {r['payload']['title']}")
  fused=0.017 vec=0.017 | Setting up two-factor authentication
  fused=0.016 vec=0.016 | Webhook configuration and troubleshooting
  fused=0.016 vec=0.016 | GraphQL API getting started guide
  fused=0.008 vec=0.008 | How to reset your password
  fused=0.008 vec=0.008 | SSO integration with SAML and OIDC

The MATCH operator runs BM25 full-text search. When it's combined with NEAR, VelesDB fuses the two signals using Reciprocal Rank Fusion (RRF). Articles containing "authentication" get a score boost: the fused_score blends semantic relevance (from the vector) with keyword relevance (from BM25).

This solves a classic RAG failure mode: pure vector search might rank a vaguely related billing article above a directly relevant authentication guide, because the embeddings happen to be close. Adding MATCH anchors the results to the user's exact terminology.
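RRF itself is simple enough to sketch in a few lines: each ranked list contributes 1/(k + rank) per document, so a document near the top of *both* lists beats one that dominates only one. This is the textbook formulation with k=60; VelesDB's internal constant isn't documented here and may differ:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of document ids.
    k=60 is the conventional constant; it dampens the gap between
    adjacent ranks so no single list dominates."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_ranking = [2, 6, 10, 1, 8]   # ids as ranked by NEAR
bm25_ranking   = [2, 8, 6]          # ids as ranked by MATCH 'authentication'
print(rrf_fuse([vector_ranking, bm25_ranking]))
```

Note that RRF only looks at ranks, never raw scores, so it fuses cosine similarities and BM25 scores without any normalization step.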

Step 5: Graph traversal for learning paths

So far we've been querying the vector+columnar store. VelesDB also includes a native graph engine. Let's add knowledge relationships:

graph = collection.get_graph_store()

# Articles form a learning path
graph.add_edge({
    "id": 101, "source": 1, "target": 2,
    "label": "NEXT_STEP",
    "properties": {"reason": "After password reset, enable 2FA"}
})
graph.add_edge({
    "id": 102, "source": 2, "target": 8,
    "label": "NEXT_STEP",
    "properties": {"reason": "Advanced: set up SSO after 2FA"}
})

# Technical cross-references
graph.add_edge({
    "id": 105, "source": 5, "target": 6,
    "label": "RELATED",
    "properties": {"reason": "Rate limits affect webhook delivery"}
})
graph.add_edge({
    "id": 106, "source": 6, "target": 10,
    "label": "NEXT_STEP",
    "properties": {"reason": "After webhooks, try the GraphQL API"}
})

Now traverse the graph from the top vector search result. If the customer asked about password resets, what should they read next?

# BFS: find related articles within 2 hops
related = graph.traverse_bfs(source=1, max_depth=2)
for r in related:
    print(f"  depth={r.depth} | article {r.source} -> {r.target} ({r.label})")
  depth=1 | article 1 -> 2 (NEXT_STEP)
  depth=2 | article 2 -> 8 (NEXT_STEP)

The customer asked about password reset. The graph tells us: next, suggest 2FA. After that, SSO. This is the GraphRAG pattern - use vector similarity to find the entry point, then walk the relationship graph to surface connected knowledge.
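Conceptually, the traversal is a depth-limited breadth-first search over the edge list. Here is a plain-Python sketch of that algorithm - not the VelesDB API, just the logic a call like traverse_bfs performs over edges shaped like the ones we inserted:

```python
from collections import deque

def bfs_edges(edges, source, max_depth):
    """Depth-limited BFS over {source, target, label} edges, returning
    each traversed edge tagged with the depth at which it was reached."""
    adjacency = {}
    for e in edges:
        adjacency.setdefault(e["source"], []).append(e)

    seen, out = {source}, []
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:        # don't expand past the depth limit
            continue
        for e in adjacency.get(node, []):
            out.append({"depth": depth + 1, **e})
            if e["target"] not in seen:
                seen.add(e["target"])
                queue.append((e["target"], depth + 1))
    return out

edges = [
    {"source": 1, "target": 2, "label": "NEXT_STEP"},
    {"source": 2, "target": 8, "label": "NEXT_STEP"},
]
print(bfs_edges(edges, source=1, max_depth=2))
```

The `seen` set is what keeps cycles (e.g. two articles cross-referencing each other) from looping forever.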

You can also filter by relationship type:

# Only follow NEXT_STEP edges, ignore RELATED
learning_path = graph.traverse_bfs(
    source=1, max_depth=3,
    relationship_types=["NEXT_STEP"]
)

Or use DFS when you want to explore depth-first:

deep_path = graph.traverse_dfs(source=1, max_depth=3)

Step 6: The complete pipeline

Here's the full support agent in action. One customer question triggers vector search, text matching, metadata filtering, and graph traversal:

def answer_support_question(question: str, collection, graph):
    query_vec = model.encode(question).tolist()

    # Step 1: Hybrid search with filters
    results = collection.query(
        """SELECT * FROM support_kb
           WHERE vector NEAR $v
           AND content MATCH 'API'
           AND resolved_pct > 70
           LIMIT 3""",
        params={"v": query_vec}
    )

    if not results:
        print("No articles matched - relax the filters and retry")
        return [], []

    top = results[0]
    print(f"Best match: {top['payload']['title']}")
    print(f"  Score: {top['fused_score']:.3f}")

    # Step 2: Follow the knowledge graph
    related = graph.traverse_bfs(source=top["id"], max_depth=2)
    if related:
        print("Related articles:")
        for r in related:
            print(f"  -> article {r.target} ({r.label}, depth={r.depth})")

    return results, related

graph = collection.get_graph_store()
answer_support_question(
    "My API calls are getting rejected, need help with rate limits",
    collection,
    graph
)
Best match: API rate limiting and quotas
  Score: 0.016
Related articles:
  -> article 6 (RELATED, depth=1)
  -> article 10 (NEXT_STEP, depth=2)

The hybrid search found the API rate limiting article (combining semantic match + keyword "API" + high resolution filter). Then traverse_bfs walked the graph to surface related articles: webhook troubleshooting (because rate limits affect webhook delivery) and the GraphQL API guide (as a next step after webhooks).

Four query paradigms - vector, text, columnar, graph - two calls, one engine, zero infrastructure.

Step 7: What's under the hood

VelesDB v1.9.2 gives you visibility into query execution:

plan = collection._inner.explain(
    "SELECT * FROM support_kb "
    "WHERE vector NEAR $v AND category = 'technical' LIMIT 10"
)
print(plan["tree"])
Query Plan:
+-- VectorSearch
|   +-- Collection: support_kb
|   +-- ef_search: 100
|   +-- Candidates: 10
+-- Filter
|   +-- Conditions: category = ?
|   +-- Selectivity: 50.0%
+-- Limit: 10

Estimated cost: 0.106ms
Index used: HNSW
Filter strategy: post-filtering (low selectivity)

The planner tells you which index it used, the filter strategy, and the estimated cost. When you're debugging slow queries in production, this is gold.

New in v1.9.2: adaptive search quality

VelesDB v1.9.2 introduces search quality modes that trade precision for speed:

# Fast: fewer candidates, lower latency
results = collection.search_with_quality(query_vec, "fast", top_k=5)

# Balanced: good default
results = collection.search_with_quality(query_vec, "balanced", top_k=5)

# Accurate: more candidates, higher recall
results = collection.search_with_quality(query_vec, "accurate", top_k=5)

Five modes available: fast, balanced, accurate, perfect, and autotune. Use fast for autocomplete, accurate for final answers, autotune to let the engine decide based on collection size.
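Under the hood, quality modes boil down to how many HNSW candidates (ef_search) the engine examines. The mapping below is purely illustrative - the function name, the per-mode numbers, and the autotune heuristic are hypothetical, not VelesDB's actual internals - but it shows the shape of the tradeoff:

```python
def pick_ef_search(mode, collection_size):
    """Illustrative mode -> ef_search mapping. Higher ef_search means
    more candidates examined: better recall, higher latency.
    All values here are made up for demonstration."""
    presets = {
        "fast": 32,        # low latency, e.g. autocomplete
        "balanced": 100,
        "accurate": 400,   # high recall, e.g. final answers
        "perfect": None,   # sentinel for exhaustive (exact) search
    }
    if mode == "autotune":
        # Hypothetical heuristic: scale with collection size, clamped.
        return max(32, min(400, collection_size // 100))
    return presets[mode]

print(pick_ef_search("fast", 10_000))
print(pick_ef_search("autotune", 10_000))
```

Whatever the real constants are, the principle holds: ef_search only needs to grow when the collection does, which is exactly what an autotune mode automates.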

New in v1.9.2: multi-query fusion

When one question maps to multiple search intents, fuse them:

# Customer asks about both account and billing
q1 = model.encode("password reset account access").tolist()
q2 = model.encode("billing invoice payment").tolist()

results = collection._inner.multi_query_search(
    vectors=[q1, q2],
    top_k=5,
    fusion=velesdb.FusionStrategy.rrf()
)

Five fusion strategies:

velesdb.FusionStrategy.rrf()                     # Reciprocal Rank Fusion
velesdb.FusionStrategy.average()                 # Average scores
velesdb.FusionStrategy.maximum()                 # Take the max
velesdb.FusionStrategy.relative_score(0.7, 0.3)  # Weighted by score type
velesdb.FusionStrategy.weighted(0.5, 0.3, 0.2)   # Custom weight blend

This is useful when a single user query decomposes into multiple search intents - common in agentic RAG pipelines where the LLM generates sub-queries.
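To see how the score-based strategies differ from RRF, here is the arithmetic that names like average and maximum suggest, sketched over per-sub-query score maps. This is plain Python for intuition, not the VelesDB implementation:

```python
def fuse_scores(score_maps, strategy="average"):
    """Score-based fusion across sub-queries.
    score_maps: one {doc_id: score} dict per sub-query.
    'average' rewards documents relevant to every intent;
    'maximum' rewards a strong hit on any single intent."""
    all_ids = set().union(*score_maps)
    fused = {}
    for doc_id in all_ids:
        scores = [m.get(doc_id, 0.0) for m in score_maps]
        fused[doc_id] = (
            max(scores) if strategy == "maximum" else sum(scores) / len(scores)
        )
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

q1_scores = {1: 0.9, 5: 0.4}   # "password reset" sub-query
q2_scores = {7: 0.8, 5: 0.7}   # "billing invoice" sub-query
print(fuse_scores([q1_scores, q2_scores], "average"))
print(fuse_scores([q1_scores, q2_scores], "maximum"))
```

Notice the two strategies disagree on the winner: average promotes document 5 (decent on both intents), maximum promotes document 1 (dominant on one). Picking a strategy is picking which behavior your agent needs.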

New in v1.9.2: VelesQL parser

You can now validate and introspect queries before executing them:

from velesdb import VelesQL

# Validate user input
assert VelesQL.is_valid("SELECT * FROM kb WHERE vector NEAR $v LIMIT 10")
assert not VelesQL.is_valid("DROP TABLE kb")  # Nice try

# Introspect a query
parsed = VelesQL.parse(
    "SELECT * FROM support_kb WHERE vector NEAR $v "
    "AND category = 'technical' LIMIT 5"
)
print(parsed.table_name)          # "support_kb"
print(parsed.has_vector_search()) # True
print(parsed.has_where_clause())  # True
print(parsed.limit)               # 5

The parser understands GROUP BY, HAVING, ORDER BY, JOIN, DISTINCT, and OFFSET:

agg = VelesQL.parse(
    "SELECT category, COUNT(*) FROM kb GROUP BY category"
)
print(agg.group_by)  # ["category"]

ordered = VelesQL.parse(
    "SELECT * FROM kb ORDER BY views DESC LIMIT 10"
)
print(ordered.order_by)  # [("views", "DESC")]

This is valuable for building query builders, admin UIs, or query validation middleware in production systems.
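A middleware guard built on this is mostly glue code. As a parser-independent sketch of the same idea, here is a minimal allowlist check - the function and its regexes are illustrative (in production you'd call VelesQL.is_valid() and inspect the parse instead; naive keyword matching will, for example, reject a query whose string literal contains 'update'):

```python
import re

def is_safe_readonly(query: str) -> bool:
    """Minimal allowlist guard for user-supplied queries:
    accept a single SELECT statement, reject anything mutating.
    Deliberately conservative - a real parser beats regexes."""
    q = query.strip().rstrip(";")
    if ";" in q:                          # reject statement stacking
        return False
    if not re.match(r"(?i)^select\b", q): # must start with SELECT
        return False
    forbidden = re.compile(r"(?i)\b(drop|delete|update|insert|alter)\b")
    return not forbidden.search(q)

print(is_safe_readonly("SELECT * FROM kb WHERE vector NEAR $v LIMIT 10"))
print(is_safe_readonly("DROP TABLE kb"))
```

Layering a cheap check like this in front of a real parse keeps obviously hostile input from ever reaching the engine.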

The full picture

| Capability | VelesQL Syntax | What it does |
| --- | --- | --- |
| Vector search | `WHERE vector NEAR $v` | Semantic similarity (HNSW) |
| Text search | `AND content MATCH 'term'` | BM25 keyword ranking |
| Metadata filter | `AND price < 100` | SQL-style conditions |
| Graph traversal | `graph.traverse_bfs()` | BFS/DFS graph exploration |
| Query plan | `collection._inner.explain()` | Execution introspection |
| Quality modes | `search_with_quality("fast")` | Precision/speed tradeoff |
| Multi-query fusion | `multi_query_search()` | Combine multiple intents |
| Query validation | `VelesQL.is_valid()` | Pre-execution checks |

All of this runs inside a single ~6MB embedded engine. No Docker, no network calls, no infrastructure to maintain.

Why this matters for RAG

The typical RAG architecture looks like this:

  1. Embed the user question
  2. Vector search in Pinecone/Qdrant/Weaviate
  3. Maybe filter results client-side
  4. Maybe query a separate text search engine
  5. Maybe query a graph database for relationships
  6. Glue it all together in application code

With VelesQL:

  1. Embed the user question
  2. One query: WHERE vector NEAR $v AND content MATCH 'term' AND filter = value
  3. Walk the graph: graph.traverse_bfs(source=top_id, max_depth=2)

Less infrastructure. Less code. Fewer failure modes. And because it's embedded, your tests run in milliseconds against real data - no mock database needed.

Getting started

pip install velesdb sentence-transformers

The complete runnable script for this tutorial is available as a single Python file. Copy, paste, run.

Full docs: velesdb.com/en
GitHub: github.com/cyberlife-coder/VelesDB

VelesDB is source-available under the Elastic License 2.0.


What's your current stack for hybrid retrieval? Are you gluing multiple databases together, or have you found a unified approach? I'd love to hear what works (and what breaks) in your RAG pipelines.
