Siyu
Architecting a Two-Stage Semantic Search Pipeline with HNSW, LATERAL JOIN, and Cubic Scoring

AI agents are becoming a new interface for finding people.

Instead of opening a marketplace, typing keywords, filtering profiles, and manually deciding who is worth contacting, a user can now say something like:

Help me find a few SaaS founders who might need my backend architecture services.

or:

Find remote Rust freelancers who have experience with early-stage infrastructure products.

In Opportunity Skill, the user's AI agent turns that request into a semantic search query, calls the QuestMeet backend, receives a compact list of matched candidates, and then drafts tailored collaboration proposals for the user to approve.

This post is a technical walkthrough of the backend search function behind that flow.

The interesting part is not just "use embeddings". The search engine combines:

  • PostgreSQL
  • pgvector cosine distance
  • HNSW indexes
  • tag-level semantic recall
  • active-user filtering
  • cubic similarity scoring
  • LATERAL JOIN impression reranking
  • separate buyer/professional identity perspectives

The goal is simple:

Given a natural-language request from an AI agent, return the candidates worth contacting, together with enough semantic context for the agent to explain the match and write a good proposal.


What Opportunity Skill does

Opportunity Skill is an Agent Skill that makes a user discoverable to other agents. It supports agent products that follow the Skill specification, such as Claude Code and OpenClaw. The user does not need to manually browse a website. The agent calls functions exposed by the skill.

At a high level, the skill is organized into four processes.

The search engine discussed in this article powers the Search and Contact process.

When the user asks the agent to find buyers or professionals, the agent calls one of these functions:

  • ai_search_buyers
  • ai_search_professionals

These functions are implemented in the skill's scripts/callable_functions.py file and communicate with QuestMeet through GraphQL.

A simplified version looks like this:

import httpx
from typing import Union

BASE_URL = "https://questmeet.ai/graphql"

def ai_search_professionals(access_token: str, query: str) -> Union[list, bool, None]:
    try:
        response = httpx.post(
            BASE_URL,
            json={
                "query": """
                    query AiSearchProfessionals($query: String!) {
                        aiSearchProfessionals(query: $query)
                    }
                """,
                "variables": {"query": query},
            },
            headers={"Authorization": f"Bearer {access_token}"},
            trust_env=False,
            timeout=20,
        )
        return response.json()["data"]["aiSearchProfessionals"]
    except Exception:
        return False

The return values have clear semantics for the agent:

| Return value | Meaning |
| --- | --- |
| `list[dict]` | Relevant candidates were found |
| `[]` | The request succeeded, but no relevant candidates were found |
| `None` | The access token is missing or expired; the agent should re-authenticate |
| `False` | Something failed; notify the user and stop |

This is important because the agent, not the server, owns the workflow. If the token is expired, the skill instructs the agent to run the sign-in process, obtain a new token, and retry.
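Because the agent owns recovery, a caller can branch on this contract directly. Here is a minimal sketch of that loop; `search_fn` stands in for `ai_search_professionals`, and `sign_in` is a hypothetical helper that re-runs the skill's sign-in process:

```python
def handle_search(search_fn, access_token, query, sign_in):
    result = search_fn(access_token, query)
    if result is None:
        # Token missing or expired: re-authenticate once and retry.
        access_token = sign_in()
        result = search_fn(access_token, query)
    if result is False:
        # Unrecoverable failure: notify the user and stop.
        raise RuntimeError("search failed; notify the user and stop")
    return result  # list[dict]; may be [] when nothing matched
```

The key point is that `None` triggers one recovery attempt, while `False` halts the workflow.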


The GraphQL entry points

On the server side, the two public search fields are small wrappers around the same internal function:

@strawberry.field
async def ai_search_buyers(self, info: Info, query: str) -> Optional[JSON]:
    try:
        return await search_buyers_or_professionals(info, query, "Buyer")
    except Exception:
        return False

@strawberry.field
async def ai_search_professionals(self, info: Info, query: str) -> Optional[JSON]:
    try:
        return await search_buyers_or_professionals(info, query, "Professional")
    except Exception:
        return False

The only difference is the perspective argument:

  • "Buyer" means we search for users as buyers, including employers or clients.
  • "Professional" means we search for users as professionals, including freelancers and employees.

This distinction is not cosmetic. The same human can be both a buyer and a professional. A founder may want to hire developers, while also being discoverable as a product consultant. These two identities should not share the same matching context.

That is why each user has two external candidate IDs:

professional_id VARCHAR(50) UNIQUE DEFAULT gen_random_uuid()::text,
buyer_id        VARCHAR(50) UNIQUE DEFAULT gen_random_uuid()::text

When a user is returned as a professional, the API returns professional_id as candidate_id.

When a user is returned as a buyer, the API returns buyer_id as candidate_id.

The agent receives a unified candidate_id and does not need to know which internal column was used.


The data model

The search function touches five tables:

  1. users
  2. logins
  3. tags
  4. impressions
  5. impression_tags

The simplified relationship: users have impressions, impressions are labeled with tags (joined through the impression_tags table), and logins records each user's recent activity.

The core idea is that the user's profile is not just a display profile written for humans.

It is a set of impressions written for AI agents to search and reason over.

An impression is a structured statement about a user's expertise, collaboration style, communication preference, leadership style, taste, or requirements.

For example, a professional's impression might say:

This developer prefers projects where technical architecture decisions are made explicitly. They value strict type definitions, maintainable interfaces, and long-term code evolution over quick prototypes.

Each impression is associated with 1 to 5 tags:

impressions_with_tags_format = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "impression": {"type": "string"},
            "tags": {
                "type": "array",
                "items": {"type": "string"},
                "maxItems": 5,
            },
        },
        "required": ["impression", "tags"],
        "additionalProperties": False,
    },
}

The tags are not used as a traditional keyword system. They are embedded into vector space and used as a lightweight semantic recall layer.
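A valid instance of that schema is easy to picture (the impression text and tag names below are invented for illustration):

```python
profile = [
    {
        "impression": "Prefers explicit architecture decisions and strict typing.",
        "tags": ["TypeScript", "Type Safety", "Software Architecture"],
    },
]

# The schema caps each impression at 5 tags and requires both keys.
assert all(1 <= len(item["tags"]) <= 5 for item in profile)
assert all({"impression", "tags"} <= set(item) for item in profile)
```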

The full search pipeline looks like this:

  1. The agent calls ai_search_buyers or ai_search_professionals.
  2. The GraphQL resolver receives the request.
  3. Check whether an authenticated user_id exists.
    • If no: return None, prompting the agent to re-authenticate.
    • If yes: proceed to the next step.
  4. Vectorize the natural-language query.
  5. Search tags using pgvector cosine distance.
  6. Keep tags with distance <= 0.55, limiting to 100 tags.
  7. Map matched tags back to users via the impression_tags join table.
  8. Keep only public impressions.
  9. Keep only users active in the last month.
  10. Score users by summing the cube of tag similarity (similarity³).
  11. Take the top 50 users using heapq.nlargest.
  12. Exclude the current user from the results.
  13. Rerank each candidate's impressions using a LATERAL JOIN.
  14. Return the candidate's name, badge, candidate_id, description, and impressions.

Step 1: Auth guard

The internal function begins with an authentication guard:

async def search_buyers_or_professionals(
    info: Info,
    query: str,
    perspective: str,
) -> Optional[Union[list[dict], bool]]:

    if logged_user_id := info.context["user_id"]:
        if len(embedding_models) == 1:
            embeddings = await vectorize_contents(embedding_models[0], [query])
            # ... tag-level recall, cubic scoring, and LATERAL JOIN reranking ...
            # if candidates found, returns list[dict] here
        elif len(embedding_models) == 2:
            embeddings = await vectorize_contents(embedding_models[0], [query])
            other_embeddings = await vectorize_contents(embedding_models[1], [query])
            # ... tag-level recall, cubic scoring, and LATERAL JOIN reranking ...
            # if candidates found, returns list[dict] here
        return []  # executed only when no candidates matched
    return None

In the QuestMeet GraphQL service, info.context["user_id"] is populated after the access token is verified.

If it is missing, the function returns None.

The server does not attempt to redirect or refresh the token. It only tells the agent:

You are not authenticated for this operation.

The skill then instructs the agent to run the sign-in process again, store the new access token in long-term memory or a local .txt file, and retry the original process.

This keeps the backend simple and makes the agent responsible for workflow recovery.


Step 2: Vectorize the query

The user request arrives as natural language:

Find remote Rust freelancers who have experience with early-stage infrastructure products.

Before querying PostgreSQL, the server embeds it:

embeddings = await vectorize_contents(embedding_models[0], [query])

The database schema uses 1536-dimensional vectors:

odd_embedding  vector(1536),
even_embedding vector(1536)

The search function then uses the query embedding to search semantically related tags and impressions.


Step 3: Tag-level semantic recall

The first database query searches the tags table:

SELECT tag_id, distance
FROM (
    SELECT tag_id, ({embedding_column} <=> $1::vector) AS distance
    FROM tags
) AS tag_distances
WHERE distance <= 0.55
ORDER BY distance ASC
LIMIT 100

The <=> operator is pgvector's cosine distance operator.

Cosine distance is:

distance = 1 - cosine_similarity

So:

similarity = 1.0 - distance

The filter:

WHERE distance <= 0.55

means:

cosine similarity >= 0.45

This is intentionally not too strict.

If the threshold is too high, the search becomes brittle and misses useful matches. If it is too low, the search lets in too much noise. In this system, tag search is the recall stage, so the threshold should be broad enough to catch potentially relevant concepts while still filtering out unrelated tags.
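The distance-to-similarity conversion is easy to sanity-check outside the database. This is a minimal pure-Python version of the cosine distance that pgvector's `<=>` computes (the function name is mine, not pgvector's):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as pgvector's <=> computes it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# distance <= 0.55 is the same filter as similarity >= 0.45
assert abs(cosine_distance([1.0, 0.0], [1.0, 0.0])) < 1e-9        # identical vectors -> 0.0
assert abs(cosine_distance([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-9  # orthogonal vectors -> 1.0
```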

The LIMIT 100 prevents broad queries from pulling too many tags into the next stage.

The tags table has HNSW indexes on both embedding columns:

CREATE INDEX i_tags_odd_embedding
ON tags
USING hnsw (odd_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

CREATE INDEX i_tags_even_embedding
ON tags
USING hnsw (even_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

The goal is to make semantic tag recall fast even as the tag vocabulary grows.


Step 4: Map tags back to active public users

After retrieving semantically related tags, the server maps them back to users through impression_tags:

tag_distances = {
    row["tag_id"]: float(row["distance"])
    for row in rows
}

tag_ids = list(tag_distances.keys())

Then:

SELECT impression_tags.tag_id, impression_tags.user_id
FROM impression_tags
JOIN (
    SELECT logins.user_id
    FROM logins
    WHERE logins.updated_at >= NOW() - '1 month'::interval
    GROUP BY logins.user_id
) AS active_users
ON impression_tags.user_id = active_users.user_id
WHERE impression_tags.tag_id = ANY($1::bigint[])
  AND impression_tags.is_public = TRUE

This query does three things.

1. It uses impression_tags as an inverted index

impression_tags connects impressions and tags.

If a tag such as "Rust" or "Type Safety" is semantically close to the query, the join table tells us which users have public impressions associated with that tag.

2. It only searches public impressions

impression_tags.is_public = TRUE

A user may have private impressions used only for self-reflection or agent memory. Those should not be discoverable by other users' agents. This check acts as the first layer of filtering during tag-level recall.

3. It filters for recent activity

logins.updated_at >= NOW() - '1 month'::interval

A matching profile is only useful if the person is still active.

Traditional candidate-search products often surface abandoned profiles. For agent-driven collaboration, that wastes the user's time. This query only keeps users who have logged in within the last month.

The relevant index is:

CREATE INDEX i_impression_tags_tag_user_public
ON impression_tags (tag_id, user_id)
WHERE is_public IS TRUE;

This partial index keeps the public-tag reverse lookup efficient.


Step 5: Cubic similarity scoring

For every matched (tag_id, user_id) pair, the server converts cosine distance back to similarity:

similarity = 1.0 - tag_distances[row["tag_id"]]

Then it adds the cube of the similarity to the user's score:

record_id = row["user_id"]

if record_id in record_scores:
    record_scores[record_id] += similarity * similarity * similarity
else:
    record_scores[record_id] = similarity * similarity * similarity

Why cube the similarity?

Because weak semantic matches should not dominate the ranking just because there are many of them.

After the distance <= 0.55 filter, similarity is in the range:

[0.45, 1.0]

Compare linear scoring with cubic scoring:

| Linear score | Cubic score |
| --- | --- |
| 0.99 | 0.970299 |
| 0.90 | 0.729 |
| 0.70 | 0.343 |
| 0.45 | 0.091125 |

A very strong tag match remains strong. A weak-but-acceptable tag match becomes much less important.

This matters because tags are not hand-written search keywords. They are concepts extracted from impressions. A user may have several broad tags that are not wrong but should not make them rank above someone with fewer, stronger, more specific matches.

For example, if the query is:

TypeScript type-safety architecture consultant

A candidate with strong matches on:

  • TypeScript
  • Type Safety
  • Software Architecture

should outrank someone who only has broad tags like:

  • JavaScript
  • Web Development
  • Frontend

Cubic scoring makes that more likely.
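A toy calculation (with invented similarity values) shows the effect. Candidate A has three strong tag matches; candidate B has six broad ones:

```python
strong = [0.90, 0.90, 0.90]   # candidate A: few, specific matches
broad = [0.55] * 6            # candidate B: many, weak matches

linear_a, linear_b = sum(strong), sum(broad)
cubic_a = sum(s ** 3 for s in strong)
cubic_b = sum(s ** 3 for s in broad)

assert linear_b > linear_a    # linear scoring: B wins (3.30 vs 2.70)
assert cubic_a > cubic_b      # cubic scoring: A wins (~2.19 vs ~1.00)
```

Under linear scoring, volume wins; under cubic scoring, specificity wins.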


Step 6: Keep the top 50 candidates

After scoring users, the server selects the top 50 candidates:

user_ids = [
    user_id
    for user_id, _ in heapq.nlargest(
        50,
        record_scores.items(),
        key=lambda item: item[1],
    )
    if user_id != logged_user_id
]

There are two details here.

First, heapq.nlargest(50, ...) avoids fully sorting the entire candidate set. For large candidate sets, this is cheaper than sorting everything and slicing.

Second, the current user is excluded, so the final candidate count may be fewer than 50.

The top-50 list is not the final set of results. It is a small candidate set for the next stage: impression-level reranking.


Step 7: Impression-level reranking with LATERAL JOIN

Tag-level recall is fast and broad, but the final results should be based on the actual impression text.

This is where LATERAL JOIN is useful.

SELECT
    users.{id_column},
    users.name,
    users.description,
    users.badge,
    string_agg('- ' || content, E'\n')
FROM unnest($1::bigint[]) AS user_ids(user_id)
JOIN users
ON users.user_id = user_ids.user_id
JOIN LATERAL (
    SELECT content
    FROM (
        SELECT
            content,
            ({embedding_column} <=> $3::vector) AS distance
        FROM impressions
        WHERE user_id = users.user_id
          AND perspective = $2
          AND is_public = TRUE
    ) AS scored_impressions
    WHERE distance <= 0.55
    ORDER BY distance ASC
    LIMIT 10
) AS impressions ON true
GROUP BY users.{id_column}, users.name, users.description, users.badge

This query does a lot.

unnest turns the Python list into rows

The server passes the top candidate user IDs as an array.

unnest($1::bigint[]) AS user_ids(user_id)

This lets PostgreSQL process all candidates in one query instead of making 50 separate round trips.

LATERAL JOIN runs a per-user subquery

A normal subquery cannot reference columns from the outer query. A lateral subquery can.

That means this part:

WHERE user_id = users.user_id

runs independently for each candidate user.

For every candidate, the database selects that user's most relevant public impressions under the requested perspective.

perspective prevents identity leakage

AND perspective = $2

If the user is being searched as a professional, only professional impressions are used. If the user is being searched as a buyer, only buyer impressions are used.

This prevents a user's hiring preferences from affecting their ranking as a freelancer, and vice versa. The is_public check here serves as the second layer, ensuring that only explicitly public impression content enters the final result.

distance <= 0.55 is the second quality gate

A user may enter the top 50 because of relevant tags.

But if none of their actual impression texts are close enough to the query, the lateral join returns no rows for that user.

Because this is a LATERAL subquery joined with ON true, if no impressions match, the subquery produces no rows and the candidate is naturally filtered out.

This gives the search engine two quality gates:

  1. Tag-level semantic recall.
  2. Impression-level semantic verification.

LIMIT 10 keeps the results agent-friendly

The backend returns at most 10 relevant impressions per candidate.

That is enough context for the agent to explain the match and draft a proposal, but not so much that the agent's context becomes noisy.

The impressions are collected into a Markdown-style list:

string_agg('- ' || content, E'\n')

So the final impressions field looks like this:

- This developer has experience designing backend systems for early-stage SaaS products.
- They prefer collaboration where technical trade-offs are discussed explicitly.
- They value maintainable interfaces and strict type definitions.

This format is intentionally simple. The consumer is an AI agent, not a frontend table component. A compact Markdown list is easy for the agent to read and reason over.
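The same formatting is trivial to reproduce (or test) in Python; this mirrors what `string_agg('- ' || content, E'\n')` produces, with invented impression text:

```python
contents = [
    "Designs backend systems for early-stage SaaS products.",
    "Prefers explicit discussion of technical trade-offs.",
]

# Join each impression as a Markdown bullet, one per line.
impressions = "\n".join(f"- {content}" for content in contents)
print(impressions)
# - Designs backend systems for early-stage SaaS products.
# - Prefers explicit discussion of technical trade-offs.
```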


Step 8: Return an agent-readable payload

The server then returns:

id_column = "professional_id" if perspective == "Professional" else "buyer_id"

return [
    {
        "name": row["name"],
        "badge": row["badge"],
        "candidate_id": row[id_column],
        "description": row["description"],
        "impressions": row["string_agg"],
    }
    for row in rows
]

The final payload contains:

| Field | Purpose |
| --- | --- |
| `name` | Display name |
| `badge` | Subscription or trust-related marker |
| `candidate_id` | Identity-specific contact ID |
| `description` | General user description |
| `impressions` | Up to 10 query-relevant impression statements |

The agent can now decide which candidates are worth showing to the user.
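Concretely, a single result entry looks something like this; every value below is invented for illustration:

```python
candidate = {
    "name": "Alex",
    "badge": "Pro",
    # professional_id or buyer_id, depending on the search perspective
    "candidate_id": "3f2a9c1e-8b7d-4f6e-9a21-0c5d4e8f1a2b",
    "description": "Backend engineer focused on early-stage infrastructure.",
    "impressions": (
        "- Designs backend systems for early-stage SaaS products.\n"
        "- Values maintainable interfaces and strict type definitions."
    ),
}
```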

It can say:

Candidate A looks relevant because their impressions mention early-stage SaaS architecture, backend reliability, and collaboration with technical founders.

Then, if the user wants to contact the candidate, the agent drafts:

  • a collaboration proposal
  • potential benefits for the candidate

After user confirmation, the agent calls:

ai_contact_candidate(access_token, candidate_id, proposal, benefits)

So the backend is not responsible for writing persuasive messages. It is responsible for retrieving semantically dense, high-signal candidate context.

In other words:

The server finds the right people.

The agent says the right thing.


Handling two embedding models

The production code also supports a two-model mode:

elif len(embedding_models) == 2:
    embeddings = await vectorize_contents(embedding_models[0], [query])
    other_embeddings = await vectorize_contents(embedding_models[1], [query])

    other_embedding_column = (
        "even_embedding"
        if embedding_column == "odd_embedding"
        else "odd_embedding"
    )

The schema has two embedding columns:

odd_embedding  vector(1536),
even_embedding vector(1536)

with a constraint that exactly one is present:

CONSTRAINT check_odd_or_even_embedding
CHECK ((odd_embedding IS NULL) <> (even_embedding IS NULL))

When two embedding models are configured, the query is embedded with both models. The server searches both vector columns:

for embedding, column in [
    (embeddings[0], embedding_column),
    (other_embeddings[0], other_embedding_column),
]:
    query = f"""
    SELECT tag_id, distance
    FROM (
        SELECT tag_id, ({column} <=> $1::vector) AS distance
        FROM tags
    ) AS tag_distances
    WHERE distance <= 0.55
    ORDER BY distance ASC
    LIMIT 100
    """

The scoring logic remains the same. Matches from both embedding spaces contribute to the same record_scores dictionary.
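Merging the two recall passes into one score dictionary can be sketched like this; the tag IDs, distances, and user-tag assignments are all invented:

```python
# Each recall pass yields {tag_id: distance} matches in its own embedding space.
pass_a = {101: 0.20, 102: 0.50}   # matches from embedding model 0
pass_b = {101: 0.30, 103: 0.45}   # matches from embedding model 1

# Public tag assignments: user 7 carries tags 101 and 103; user 8 carries tag 102.
user_tags = {7: [101, 103], 8: [102]}

record_scores: dict[int, float] = {}
for tag_distances in (pass_a, pass_b):
    for user_id, tags in user_tags.items():
        for tag_id in tags:
            if tag_id in tag_distances:
                similarity = 1.0 - tag_distances[tag_id]
                # Both embedding spaces add similarity**3 to the same score.
                record_scores[user_id] = record_scores.get(user_id, 0.0) + similarity ** 3
```

User 7 accumulates credit from both spaces (tag 101 matched twice), so a tag matched under either model strengthens the candidate.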

The impression reranking query also searches both columns using UNION ALL:

SELECT content, ({embedding_column} <=> $3::vector) AS distance
FROM impressions
WHERE user_id = users.user_id
  AND perspective = $2
  AND is_public = TRUE

UNION ALL

SELECT content, ({other_embedding_column} <=> $4::vector) AS distance
FROM impressions
WHERE user_id = users.user_id
  AND perspective = $2
  AND is_public = TRUE

Then the combined results are filtered:

WHERE distance <= 0.55
ORDER BY distance ASC
LIMIT 10

This makes the search function tolerant of data encoded with either embedding model.

That is useful when rotating embedding models, migrating old vectors, or operating with more than one embedding source during a transition period.


Why not search all impressions globally?

A simpler version of this system would be:

Embed the query and run a global vector search over all impressions.

That sounds attractive, but it has drawbacks.

Impressions are longer and more numerous than tags. A single user may have many impressions, and each impression is a rich semantic statement. Searching the entire impression table globally would make the expensive part of the pipeline happen too early.

Instead, this system uses a two-stage approach:

The tag layer is a lightweight proxy for semantic recall.

The impression layer is used only after the candidate set has been reduced to a small number of users.

This gives the system a useful split:

| Stage | Purpose | Design |
| --- | --- | --- |
| Tag recall | High recall, low cost | HNSW vector search over tags |
| User scoring | Rank likely candidates | Sum of similarity³ |
| Impression rerank | High precision | Per-user semantic filtering with LATERAL JOIN |

This is not just a performance optimization. It also improves the quality of results.

Tags help the system identify relevant candidates. Impressions help the system explain why they are a good fit.


Relevant indexes

Here are the indexes most relevant to this search path.

For tag vector search:

CREATE INDEX i_tags_odd_embedding
ON tags
USING hnsw (odd_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

CREATE INDEX i_tags_even_embedding
ON tags
USING hnsw (even_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

For active-user filtering:

CREATE INDEX i_logins_updated_at_user_id
ON logins (updated_at, user_id);

For public tag-to-user lookup:

CREATE INDEX i_impression_tags_tag_user_public
ON impression_tags (tag_id, user_id)
WHERE is_public IS TRUE;

For per-user impression lookup:

CREATE INDEX i_impressions_user_perspective
ON impressions (user_id, perspective);

The impressions table is also partitioned by user_id:

CREATE TABLE impressions (
    impression_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
    user_id BIGINT NOT NULL,
    perspective VARCHAR(50) NOT NULL,
    content TEXT NOT NULL,
    odd_embedding vector(1536),
    even_embedding vector(1536),
    is_public BOOLEAN NOT NULL,
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, impression_id)
) PARTITION BY RANGE (user_id);

This helps keep per-user impression retrieval predictable as the table grows.


The full mental model

Here is the full search pipeline, step by step:

  1. The agent sends a natural-language query.
  2. Embed the query into a vector.
  3. Find semantically close tags using pgvector cosine distance.
  4. Keep tags with distance <= 0.55, limiting to 100 tags.
  5. Reverse-lookup users via the impression_tags join table.
  6. Filter to include only public impressions.
  7. Filter to include only users active in the last month.
  8. Compute each user's score as the sum of similarity³.
  9. Take the top 50 users.
  10. Exclude the current user.
  11. For each remaining user, run a LATERAL JOIN against their impressions.
  12. Keep only impressions matching the requested perspective (Buyer or Professional).
  13. Keep only impressions with distance <= 0.55.
  14. Return up to 10 impressions per user.
  15. The agent receives the results, recommends candidates to the user, and drafts collaboration proposals.
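The whole ranking side of this pipeline, minus the database, fits in a short simulation. Every tag, distance, and user below is invented; the structure mirrors steps 3 through 10:

```python
import heapq

# Steps 3-4: tag recall, already thresholded at distance <= 0.55.
QUERY_TAG_DISTANCES = {
    "rust": 0.10,
    "infrastructure": 0.35,
    "web development": 0.54,
}

# Steps 5-6: reverse lookup of users through their public tags.
PUBLIC_USER_TAGS = {
    "alice": ["rust", "infrastructure"],
    "bob": ["web development"],
    "me": ["rust"],
}

# Step 7: users who logged in within the last month.
ACTIVE_USERS = {"alice", "bob", "me"}

def rank(logged_user: str, top_n: int = 50) -> list[str]:
    scores: dict[str, float] = {}
    for user, tags in PUBLIC_USER_TAGS.items():
        if user not in ACTIVE_USERS:
            continue
        for tag in tags:
            if tag in QUERY_TAG_DISTANCES:
                # Step 8: cubic scoring of each matched tag.
                similarity = 1.0 - QUERY_TAG_DISTANCES[tag]
                scores[user] = scores.get(user, 0.0) + similarity ** 3
    # Steps 9-10: top N by score, excluding the searching user.
    return [
        user
        for user, _ in heapq.nlargest(top_n, scores.items(), key=lambda kv: kv[1])
        if user != logged_user
    ]

print(rank("me"))  # → ['alice', 'bob']
```

In production the survivors of this ranking then go through the LATERAL JOIN rerank (steps 11-14) before anything is returned to the agent.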

Engineering takeaways

Here are the main design lessons from building this.

1. AI-agent APIs should return reasoning context, not just records

A normal search API might return IDs and display fields. An AI-agent search API should return evidence. That is why the response includes query-relevant impressions. The agent needs them to explain the match and write a personalized message.

2. Semantic tags are a useful middle layer

Raw keyword search is too brittle. Global impression vector search can be too expensive too early. Semantic tags give the system a compact recall layer between keywords and full documents.

3. Weak semantic matches should not dominate rankings

Linear scoring makes it easier for many weak matches to overpower fewer strong matches. Cubic scoring is a simple way to let strong matches dominate without completely discarding weaker supporting signals.

4. Separate identity perspectives matter

The same person can be a buyer in one context and a professional in another. Mixing these impressions would create strange matches. Keeping buyer_id, professional_id, and perspective separate makes the search context cleaner.

5. Auth failure should be part of the function contract

Returning None for expired authentication lets the agent recover by running the sign-in process. Returning False for other failures tells the agent not to blindly retry. For agent workflows, these distinctions matter.


Closing

Opportunity Skill is built around a simple belief:

In the AI-agent era, your profile should not only be readable by humans. It should be searchable, interpretable, and actionable by agents.

The search function described here is one part of that system. It turns a natural-language request into:

  • semantically matched candidates
  • identity-aware candidate IDs
  • relevant profile evidence
  • compact context that an AI agent can use to draft a proposal

If you want to try the skill, you can ask your agent to install it from:

https://github.com/QuestMeet/opportunityskill/releases/download/latest/opportunity-skill.zip
