Siyu
Architecting a Two-Stage Semantic Search Pipeline with HNSW, LATERAL JOIN, and Cubic Scoring

AI agents are becoming a new interface for finding people.

Instead of opening a marketplace, typing keywords, filtering profiles, and manually deciding who is worth contacting, a user can now say something like:

Help me find a few SaaS founders who might need my backend architecture services.

or:

Find remote Rust freelancers who have experience with early-stage infrastructure products.

In Opportunity Skill, the user's AI agent turns that request into a semantic search query, calls the QuestMeet backend, receives a compact list of matched candidates, and then drafts tailored collaboration proposals for the user to approve.

This post is a technical walkthrough of the backend search function behind that flow.

The interesting part is not just "use embeddings". The search engine combines:

  • PostgreSQL
  • pgvector cosine distance
  • HNSW indexes
  • tag-level semantic recall
  • active-user filtering
  • cubic similarity scoring
  • LATERAL JOIN impression reranking
  • separate buyer/professional identity perspectives

The goal is simple:

Given a natural-language request from an AI agent, return the candidates worth contacting, together with enough semantic context for the agent to explain the match and write a good proposal.


What Opportunity Skill does

Opportunity Skill is an Agent Skill that makes a user discoverable to other agents. It supports agent products that follow the Skill specification, such as Claude Code and OpenClaw. The user does not need to manually browse a website. The agent calls functions exposed by the skill.

At a high level, the skill is organized into four processes.

The search engine discussed in this article powers the Search and Contact process.

When the user asks the agent to find buyers or professionals, the agent calls one of these functions:

  • ai_search_buyers
  • ai_search_professionals

These functions are implemented in the skill's scripts/callable_functions.py file and communicate with QuestMeet through GraphQL.

A simplified version looks like this:

import httpx
from typing import Union

BASE_URL = "https://questmeet.ai/graphql"

def ai_search_professionals(access_token: str, query: str) -> Union[list, bool, None]:
    try:
        response = httpx.post(
            BASE_URL,
            json={
                "query": """
                    query AiSearchProfessionals($query: String!) {
                        aiSearchProfessionals(query: $query)
                    }
                """,
                "variables": {"query": query},
            },
            headers={"Authorization": f"Bearer {access_token}"},
            trust_env=False,
            timeout=20,
        )
        return response.json()["data"]["aiSearchProfessionals"]
    except Exception:
        return False

The return values have clear semantics for the agent:

| Return value | Meaning |
| --- | --- |
| `list[dict]` | Relevant candidates were found |
| `[]` | The request succeeded, but no relevant candidates were found |
| `None` | The access token is missing or expired; the agent should re-authenticate |
| `False` | Something failed; notify the user and stop |

This is important because the agent, not the server, owns the workflow. If the token is expired, the skill instructs the agent to run the sign-in process, obtain a new token, and retry.
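Because the agent owns recovery, a caller can branch on this contract directly. Here is a minimal sketch of that loop; `search_fn` stands in for `ai_search_professionals`, and `sign_in` is a hypothetical helper that re-runs the skill's sign-in process:

```python
def handle_search(search_fn, access_token, query, sign_in):
    result = search_fn(access_token, query)
    if result is None:
        # Token missing or expired: re-authenticate once and retry.
        access_token = sign_in()
        result = search_fn(access_token, query)
    if result is False:
        # Unrecoverable failure: notify the user and stop.
        raise RuntimeError("search failed; notify the user and stop")
    return result  # list[dict]; may be [] when nothing matched
```

The key point is that `None` triggers one recovery attempt, while `False` halts the workflow.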


The GraphQL entry points

On the server side, the two public search fields are small wrappers around the same internal function:

@strawberry.field
async def ai_search_buyers(self, info: Info, query: str) -> Optional[JSON]:
    try:
        return await search_buyers_or_professionals(info, query, "Buyer")
    except Exception:
        return False

@strawberry.field
async def ai_search_professionals(self, info: Info, query: str) -> Optional[JSON]:
    try:
        return await search_buyers_or_professionals(info, query, "Professional")
    except Exception:
        return False

The only difference is the perspective argument:

  • "Buyer" means we search for users as buyers, including employers or clients.
  • "Professional" means we search for users as professionals, including freelancers and employees.

This distinction is not cosmetic. The same human can be both a buyer and a professional. A founder may want to hire developers, while also being discoverable as a product consultant. These two identities should not share the same matching context.

That is why each user has two external candidate IDs:

professional_id VARCHAR(50) UNIQUE DEFAULT gen_random_uuid()::text,
buyer_id        VARCHAR(50) UNIQUE DEFAULT gen_random_uuid()::text

When a user is returned as a professional, the API returns professional_id as candidate_id.

When a user is returned as a buyer, the API returns buyer_id as candidate_id.

The agent receives a unified candidate_id and does not need to know which internal column was used.


The data model

The search function touches five tables:

  1. users
  2. logins
  3. tags
  4. impressions
  5. impression_tags

The simplified relationship: users have impressions, impressions are labeled with tags (joined through the impression_tags table), and logins records each user's recent activity.

The core idea is that the user's profile is not just a display profile written for humans.

It is a set of impressions written for AI agents to search and reason over.

An impression is a structured statement about a user's expertise, collaboration style, communication preference, leadership style, taste, or requirements.

For example, a professional's impression might say:

This developer prefers projects where technical architecture decisions are made explicitly. They value strict type definitions, maintainable interfaces, and long-term code evolution over quick prototypes.

Each impression is associated with 1 to 5 tags:

impressions_with_tags_format = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "impression": {"type": "string"},
            "tags": {
                "type": "array",
                "items": {"type": "string"},
                "maxItems": 5,
            },
        },
        "required": ["impression", "tags"],
        "additionalProperties": False,
    },
}

The tags are not used as a traditional keyword system. They are embedded into vector space and used as a lightweight semantic recall layer.
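A valid instance of that schema is easy to picture (the impression text and tag names below are invented for illustration):

```python
profile = [
    {
        "impression": "Prefers explicit architecture decisions and strict typing.",
        "tags": ["TypeScript", "Type Safety", "Software Architecture"],
    },
]

# The schema caps each impression at 5 tags and requires both keys.
assert all(1 <= len(item["tags"]) <= 5 for item in profile)
assert all({"impression", "tags"} <= set(item) for item in profile)
```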

The full search pipeline looks like this:

  1. The agent calls ai_search_buyers or ai_search_professionals.
  2. The GraphQL resolver receives the request.
  3. Check whether an authenticated user_id exists.
    • If no: return None, prompting the agent to re-authenticate.
    • If yes: proceed to the next step.
  4. Vectorize the natural-language query.
  5. Search tags using pgvector cosine distance.
  6. Keep tags with distance <= 0.55, limiting to 100 tags.
  7. Map matched tags back to users via the impression_tags join table.
  8. Keep only public impressions.
  9. Keep only users active in the last month.
  10. Score users by summing the cube of tag similarity (similarity³).
  11. Take the top 50 users using heapq.nlargest.
  12. Exclude the current user from the results.
  13. Rerank each candidate's impressions using a LATERAL JOIN.
  14. Return the candidate's name, badge, candidate_id, description, and impressions.

Step 1: Auth guard

The internal function begins with an authentication guard:

async def search_buyers_or_professionals(
    info: Info,
    query: str,
    perspective: str,
) -> Optional[Union[list[dict], bool]]:

    if logged_user_id := info.context["user_id"]:
        if len(embedding_models) == 1:
            embeddings = await vectorize_contents(embedding_models[0], [query])
            # ... tag-level recall, cubic scoring, and LATERAL JOIN reranking ...
            # if candidates found, returns list[dict] here
        elif len(embedding_models) == 2:
            embeddings = await vectorize_contents(embedding_models[0], [query])
            other_embeddings = await vectorize_contents(embedding_models[1], [query])
            # ... tag-level recall, cubic scoring, and LATERAL JOIN reranking ...
            # if candidates found, returns list[dict] here
        return []  # executed only when no candidates matched
    return None

In the QuestMeet GraphQL service, info.context["user_id"] is populated after the access token is verified.

If it is missing, the function returns None.

The server does not attempt to redirect or refresh the token. It only tells the agent:

You are not authenticated for this operation.

The skill then instructs the agent to run the sign-in process again, store the new access token in long-term memory or a local .txt file, and retry the original process.

This keeps the backend simple and makes the agent responsible for workflow recovery.


Step 2: Vectorize the query

The user request arrives as natural language:

Find remote Rust freelancers who have experience with early-stage infrastructure products.

Before querying PostgreSQL, the server embeds it:

embeddings = await vectorize_contents(embedding_models[0], [query])

The database schema uses 1536-dimensional vectors:

odd_embedding  vector(1536),
even_embedding vector(1536)

The search function then uses the query embedding to search semantically related tags and impressions.


Step 3: Tag-level semantic recall

The first database query searches the tags table:

SELECT tag_id, distance
FROM (
    SELECT tag_id, ({embedding_column} <=> $1::vector) AS distance
    FROM tags
) AS tag_distances
WHERE distance <= 0.55
ORDER BY distance ASC
LIMIT 100

The <=> operator is pgvector's cosine distance operator.

Cosine distance is:

distance = 1 - cosine_similarity

So:

similarity = 1.0 - distance

The filter:

WHERE distance <= 0.55

means:

cosine similarity >= 0.45

This is intentionally not too strict.

If the threshold is too high, the search becomes brittle and misses useful matches. If it is too low, the search lets in too much noise. In this system, tag search is the recall stage, so the threshold should be broad enough to catch potentially relevant concepts while still filtering out unrelated tags.
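The distance-to-similarity conversion is easy to sanity-check outside the database. This is a minimal pure-Python version of the cosine distance that pgvector's `<=>` computes (the function name is mine, not pgvector's):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as pgvector's <=> computes it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# distance <= 0.55 is the same filter as similarity >= 0.45
assert abs(cosine_distance([1.0, 0.0], [1.0, 0.0])) < 1e-9        # identical vectors -> 0.0
assert abs(cosine_distance([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-9  # orthogonal vectors -> 1.0
```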

The LIMIT 100 prevents broad queries from pulling too many tags into the next stage.

The tags table has HNSW indexes on both embedding columns:

CREATE INDEX i_tags_odd_embedding
ON tags
USING hnsw (odd_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

CREATE INDEX i_tags_even_embedding
ON tags
USING hnsw (even_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

The goal is to make semantic tag recall fast even as the tag vocabulary grows.


Step 4: Map tags back to active public users

After retrieving semantically related tags, the server maps them back to users through impression_tags:

tag_distances = {
    row["tag_id"]: float(row["distance"])
    for row in rows
}

tag_ids = list(tag_distances.keys())

Then:

SELECT impression_tags.tag_id, impression_tags.user_id
FROM impression_tags
JOIN (
    SELECT logins.user_id
    FROM logins
    WHERE logins.updated_at >= NOW() - '1 month'::interval
    GROUP BY logins.user_id
) AS active_users
ON impression_tags.user_id = active_users.user_id
WHERE impression_tags.tag_id = ANY($1::bigint[])
  AND impression_tags.is_public = TRUE

This query does three things.

1. It uses impression_tags as an inverted index

impression_tags connects impressions and tags.

If a tag such as "Rust" or "Type Safety" is semantically close to the query, the join table tells us which users have public impressions associated with that tag.

2. It only searches public impressions

impression_tags.is_public = TRUE

A user may have private impressions used only for self-reflection or agent memory. Those should not be discoverable by other users' agents. This check acts as the first layer of filtering during tag-level recall.

3. It filters for recent activity

logins.updated_at >= NOW() - '1 month'::interval

A matching profile is only useful if the person is still active.

Traditional candidate-search products often surface abandoned profiles. For agent-driven collaboration, that wastes the user's time. This query only keeps users who have logged in within the last month.

The relevant index is:

CREATE INDEX i_impression_tags_tag_user_public
ON impression_tags (tag_id, user_id)
WHERE is_public IS TRUE;

This partial index keeps the public-tag reverse lookup efficient.


Step 5: Cubic similarity scoring

For every matched (tag_id, user_id) pair, the server converts cosine distance back to similarity:

similarity = 1.0 - tag_distances[row["tag_id"]]

Then it adds the cube of the similarity to the user's score:

record_id = row["user_id"]

if record_id in record_scores:
    record_scores[record_id] += similarity * similarity * similarity
else:
    record_scores[record_id] = similarity * similarity * similarity

Why cube the similarity?

Because weak semantic matches should not dominate the ranking just because there are many of them.

After the distance <= 0.55 filter, similarity is in the range:

[0.45, 1.0]

Compare linear scoring with cubic scoring:

| Linear score | Cubic score |
| --- | --- |
| 0.99 | 0.970299 |
| 0.90 | 0.729 |
| 0.70 | 0.343 |
| 0.45 | 0.091125 |

A very strong tag match remains strong. A weak-but-acceptable tag match becomes much less important.

This matters because tags are not hand-written search keywords. They are concepts extracted from impressions. A user may have several broad tags that are not wrong but should not make them rank above someone with fewer, stronger, more specific matches.

For example, if the query is:

TypeScript type-safety architecture consultant

A candidate with strong matches on:

  • TypeScript
  • Type Safety
  • Software Architecture

should outrank someone who only has broad tags like:

  • JavaScript
  • Web Development
  • Frontend

Cubic scoring makes that more likely.
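A toy calculation (with invented similarity values) shows the effect. Candidate A has three strong tag matches; candidate B has six broad ones:

```python
strong = [0.90, 0.90, 0.90]   # candidate A: few, specific matches
broad = [0.55] * 6            # candidate B: many, weak matches

linear_a, linear_b = sum(strong), sum(broad)
cubic_a = sum(s ** 3 for s in strong)
cubic_b = sum(s ** 3 for s in broad)

assert linear_b > linear_a    # linear scoring: B wins (3.30 vs 2.70)
assert cubic_a > cubic_b      # cubic scoring: A wins (~2.19 vs ~1.00)
```

Under linear scoring, volume wins; under cubic scoring, specificity wins.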


Step 6: Keep the top 50 candidates

After scoring users, the server selects the top 50 candidates:

user_ids = [
    user_id
    for user_id, _ in heapq.nlargest(
        50,
        record_scores.items(),
        key=lambda item: item[1],
    )
    if user_id != logged_user_id
]

There are two details here.

First, heapq.nlargest(50, ...) avoids fully sorting the entire candidate set. For large candidate sets, this is cheaper than sorting everything and slicing.

Second, the current user is excluded, so the final candidate count may be fewer than 50.

The top-50 list is not the final set of results. It is a small candidate set for the next stage: impression-level reranking.


Step 7: Impression-level reranking with LATERAL JOIN

Tag-level recall is fast and broad, but the final results should be based on the actual impression text.

This is where LATERAL JOIN is useful.

SELECT
    users.{id_column},
    users.name,
    users.description,
    users.badge,
    string_agg('- ' || content, E'\n')
FROM unnest($1::bigint[]) AS user_ids(user_id)
JOIN users
ON users.user_id = user_ids.user_id
JOIN LATERAL (
    SELECT content
    FROM (
        SELECT
            content,
            ({embedding_column} <=> $3::vector) AS distance
        FROM impressions
        WHERE user_id = users.user_id
          AND perspective = $2
          AND is_public = TRUE
    ) AS scored_impressions
    WHERE distance <= 0.55
    ORDER BY distance ASC
    LIMIT 10
) AS impressions ON true
GROUP BY users.{id_column}, users.name, users.description, users.badge

This query does a lot.

unnest turns the Python list into rows

The server passes the top candidate user IDs as an array.

unnest($1::bigint[]) AS user_ids(user_id)

This lets PostgreSQL process all candidates in one query instead of making 50 separate round trips.

LATERAL JOIN runs a per-user subquery

A normal subquery cannot reference columns from the outer query. A lateral subquery can.

That means this part:

WHERE user_id = users.user_id

runs independently for each candidate user.

For every candidate, the database selects that user's most relevant public impressions under the requested perspective.

perspective prevents identity leakage

AND perspective = $2

If the user is being searched as a professional, only professional impressions are used. If the user is being searched as a buyer, only buyer impressions are used.

This prevents a user's hiring preferences from affecting their ranking as a freelancer, and vice versa. The is_public check here serves as the second layer, ensuring that only explicitly public impression content enters the final result.

distance <= 0.55 is the second quality gate

A user may enter the top 50 because of relevant tags.

But if none of their actual impression texts are close enough to the query, the lateral join returns no rows for that user.

Because this is a LATERAL subquery joined with ON true, if no impressions match, the subquery produces no rows and the candidate is naturally filtered out.

This gives the search engine two quality gates:

  1. Tag-level semantic recall.
  2. Impression-level semantic verification.

LIMIT 10 keeps the results agent-friendly

The backend returns at most 10 relevant impressions per candidate.

That is enough context for the agent to explain the match and draft a proposal, but not so much that the agent's context becomes noisy.

The impressions are collected into a Markdown-style list:

string_agg('- ' || content, E'\n')

So the final impressions field looks like this:

- This developer has experience designing backend systems for early-stage SaaS products.
- They prefer collaboration where technical trade-offs are discussed explicitly.
- They value maintainable interfaces and strict type definitions.

This format is intentionally simple. The consumer is an AI agent, not a frontend table component. A compact Markdown list is easy for the agent to read and reason over.
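The same formatting is trivial to reproduce (or test) in Python; this mirrors what `string_agg('- ' || content, E'\n')` produces, with invented impression text:

```python
contents = [
    "Designs backend systems for early-stage SaaS products.",
    "Prefers explicit discussion of technical trade-offs.",
]

# Join each impression as a Markdown bullet, one per line.
impressions = "\n".join(f"- {content}" for content in contents)
print(impressions)
# - Designs backend systems for early-stage SaaS products.
# - Prefers explicit discussion of technical trade-offs.
```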


Step 8: Return an agent-readable payload

The server then returns:

id_column = "professional_id" if perspective == "Professional" else "buyer_id"

return [
    {
        "name": row["name"],
        "badge": row["badge"],
        "candidate_id": row[id_column],
        "description": row["description"],
        "impressions": row["string_agg"],
    }
    for row in rows
]

The final payload contains:

| Field | Purpose |
| --- | --- |
| `name` | Display name |
| `badge` | Subscription or trust-related marker |
| `candidate_id` | Identity-specific contact ID |
| `description` | General user description |
| `impressions` | Up to 10 query-relevant impression statements |

The agent can now decide which candidates are worth showing to the user.
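Concretely, a single result entry looks something like this; every value below is invented for illustration:

```python
candidate = {
    "name": "Alex",
    "badge": "Pro",
    # professional_id or buyer_id, depending on the search perspective
    "candidate_id": "3f2a9c1e-8b7d-4f6e-9a21-0c5d4e8f1a2b",
    "description": "Backend engineer focused on early-stage infrastructure.",
    "impressions": (
        "- Designs backend systems for early-stage SaaS products.\n"
        "- Values maintainable interfaces and strict type definitions."
    ),
}
```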

It can say:

Candidate A looks relevant because their impressions mention early-stage SaaS architecture, backend reliability, and collaboration with technical founders.

Then, if the user wants to contact the candidate, the agent drafts:

  • a collaboration proposal
  • potential benefits for the candidate

After user confirmation, the agent calls:

ai_contact_candidate(access_token, candidate_id, proposal, benefits)

So the backend is not responsible for writing persuasive messages. It is responsible for retrieving semantically dense, high-signal candidate context.

In other words:

The server finds the right people.

The agent says the right thing.


Handling two embedding models

The production code also supports a two-model mode:

elif len(embedding_models) == 2:
    embeddings = await vectorize_contents(embedding_models[0], [query])
    other_embeddings = await vectorize_contents(embedding_models[1], [query])

    other_embedding_column = (
        "even_embedding"
        if embedding_column == "odd_embedding"
        else "odd_embedding"
    )

The schema has two embedding columns:

odd_embedding  vector(1536),
even_embedding vector(1536)

with a constraint that exactly one is present:

CONSTRAINT check_odd_or_even_embedding
CHECK ((odd_embedding IS NULL) <> (even_embedding IS NULL))

When two embedding models are configured, the query is embedded with both models. The server searches both vector columns:

for embedding, column in [
    (embeddings[0], embedding_column),
    (other_embeddings[0], other_embedding_column),
]:
    query = f"""
    SELECT tag_id, distance
    FROM (
        SELECT tag_id, ({column} <=> $1::vector) AS distance
        FROM tags
    ) AS tag_distances
    WHERE distance <= 0.55
    ORDER BY distance ASC
    LIMIT 100
    """

The scoring logic remains the same. Matches from both embedding spaces contribute to the same record_scores dictionary.
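Merging the two recall passes into one score dictionary can be sketched like this; the tag IDs, distances, and user-tag assignments are all invented:

```python
# Each recall pass yields {tag_id: distance} matches in its own embedding space.
pass_a = {101: 0.20, 102: 0.50}   # matches from embedding model 0
pass_b = {101: 0.30, 103: 0.45}   # matches from embedding model 1

# Public tag assignments: user 7 carries tags 101 and 103; user 8 carries tag 102.
user_tags = {7: [101, 103], 8: [102]}

record_scores: dict[int, float] = {}
for tag_distances in (pass_a, pass_b):
    for user_id, tags in user_tags.items():
        for tag_id in tags:
            if tag_id in tag_distances:
                similarity = 1.0 - tag_distances[tag_id]
                # Both embedding spaces add similarity**3 to the same score.
                record_scores[user_id] = record_scores.get(user_id, 0.0) + similarity ** 3
```

User 7 accumulates credit from both spaces (tag 101 matched twice), so a tag matched under either model strengthens the candidate.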

The impression reranking query also searches both columns using UNION ALL:

SELECT content, ({embedding_column} <=> $3::vector) AS distance
FROM impressions
WHERE user_id = users.user_id
  AND perspective = $2
  AND is_public = TRUE

UNION ALL

SELECT content, ({other_embedding_column} <=> $4::vector) AS distance
FROM impressions
WHERE user_id = users.user_id
  AND perspective = $2
  AND is_public = TRUE

Then the combined results are filtered:

WHERE distance <= 0.55
ORDER BY distance ASC
LIMIT 10

This makes the search function tolerant of data encoded with either embedding model.

That is useful when rotating embedding models, migrating old vectors, or operating with more than one embedding source during a transition period.


Why not search all impressions globally?

A simpler version of this system would be:

Embed the query and run a global vector search over all impressions.

That sounds attractive, but it has drawbacks.

Impressions are longer and more numerous than tags. A single user may have many impressions, and each impression is a rich semantic statement. Searching the entire impression table globally would make the expensive part of the pipeline happen too early.

Instead, this system uses a two-stage approach:

The tag layer is a lightweight proxy for semantic recall.

The impression layer is used only after the candidate set has been reduced to a small number of users.

This gives the system a useful split:

| Stage | Purpose | Design |
| --- | --- | --- |
| Tag recall | High recall, low cost | HNSW vector search over tags |
| User scoring | Rank likely candidates | Sum of similarity³ |
| Impression rerank | High precision | Per-user semantic filtering with LATERAL JOIN |

This is not just a performance optimization. It also improves the quality of results.

Tags help the system identify relevant candidates. Impressions help the system explain why they are a good fit.


Relevant indexes

Here are the indexes most relevant to this search path.

For tag vector search:

CREATE INDEX i_tags_odd_embedding
ON tags
USING hnsw (odd_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

CREATE INDEX i_tags_even_embedding
ON tags
USING hnsw (even_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

For active-user filtering:

CREATE INDEX i_logins_updated_at_user_id
ON logins (updated_at, user_id);

For public tag-to-user lookup:

CREATE INDEX i_impression_tags_tag_user_public
ON impression_tags (tag_id, user_id)
WHERE is_public IS TRUE;

For per-user impression lookup:

CREATE INDEX i_impressions_user_perspective
ON impressions (user_id, perspective);

The impressions table is also partitioned by user_id:

CREATE TABLE impressions (
    impression_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
    user_id BIGINT NOT NULL,
    perspective VARCHAR(50) NOT NULL,
    content TEXT NOT NULL,
    odd_embedding vector(1536),
    even_embedding vector(1536),
    is_public BOOLEAN NOT NULL,
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, impression_id)
) PARTITION BY RANGE (user_id);

This helps keep per-user impression retrieval predictable as the table grows.


The full mental model

Here is the full search pipeline, step by step:

  1. The agent sends a natural-language query.
  2. Embed the query into a vector.
  3. Find semantically close tags using pgvector cosine distance.
  4. Keep tags with distance <= 0.55, limiting to 100 tags.
  5. Reverse-lookup users via the impression_tags join table.
  6. Filter to include only public impressions.
  7. Filter to include only users active in the last month.
  8. Compute each user's score as the sum of similarity³.
  9. Take the top 50 users.
  10. Exclude the current user.
  11. For each remaining user, run a LATERAL JOIN against their impressions.
  12. Keep only impressions matching the requested perspective (Buyer or Professional).
  13. Keep only impressions with distance <= 0.55.
  14. Return up to 10 impressions per user.
  15. The agent receives the results, recommends candidates to the user, and drafts collaboration proposals.
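The whole ranking side of this pipeline, minus the database, fits in a short simulation. Every tag, distance, and user below is invented; the structure mirrors steps 3 through 10:

```python
import heapq

# Steps 3-4: tag recall, already thresholded at distance <= 0.55.
QUERY_TAG_DISTANCES = {
    "rust": 0.10,
    "infrastructure": 0.35,
    "web development": 0.54,
}

# Steps 5-6: reverse lookup of users through their public tags.
PUBLIC_USER_TAGS = {
    "alice": ["rust", "infrastructure"],
    "bob": ["web development"],
    "me": ["rust"],
}

# Step 7: users who logged in within the last month.
ACTIVE_USERS = {"alice", "bob", "me"}

def rank(logged_user: str, top_n: int = 50) -> list[str]:
    scores: dict[str, float] = {}
    for user, tags in PUBLIC_USER_TAGS.items():
        if user not in ACTIVE_USERS:
            continue
        for tag in tags:
            if tag in QUERY_TAG_DISTANCES:
                # Step 8: cubic scoring of each matched tag.
                similarity = 1.0 - QUERY_TAG_DISTANCES[tag]
                scores[user] = scores.get(user, 0.0) + similarity ** 3
    # Steps 9-10: top N by score, excluding the searching user.
    return [
        user
        for user, _ in heapq.nlargest(top_n, scores.items(), key=lambda kv: kv[1])
        if user != logged_user
    ]

print(rank("me"))  # → ['alice', 'bob']
```

In production the survivors of this ranking then go through the LATERAL JOIN rerank (steps 11-14) before anything is returned to the agent.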

Engineering takeaways

Here are the main design lessons from building this.

1. AI-agent APIs should return reasoning context, not just records

A normal search API might return IDs and display fields. An AI-agent search API should return evidence. That is why the response includes query-relevant impressions. The agent needs them to explain the match and write a personalized message.

2. Semantic tags are a useful middle layer

Raw keyword search is too brittle. Global impression vector search can be too expensive too early. Semantic tags give the system a compact recall layer between keywords and full documents.

3. Weak semantic matches should not dominate rankings

Linear scoring makes it easier for many weak matches to overpower fewer strong matches. Cubic scoring is a simple way to let strong matches dominate without completely discarding weaker supporting signals.

4. Separate identity perspectives matter

The same person can be a buyer in one context and a professional in another. Mixing these impressions would create strange matches. Keeping buyer_id, professional_id, and perspective separate makes the search context cleaner.

5. Auth failure should be part of the function contract

Returning None for expired authentication lets the agent recover by running the sign-in process. Returning False for other failures tells the agent not to blindly retry. For agent workflows, these distinctions matter.


Closing

Opportunity Skill is built around a simple belief:

In the AI-agent era, your profile should not only be readable by humans. It should be searchable, interpretable, and actionable by agents.

The search function described here is one part of that system. It turns a natural-language request into:

  • semantically matched candidates
  • identity-aware candidate IDs
  • relevant profile evidence
  • compact context that an AI agent can use to draft a proposal

If you want to try the skill, you can ask your agent to install it from:

https://github.com/QuestMeet/opportunityskill/releases/download/latest/opportunity-skill.zip
