Artem

Posted on May 24

Building Production-Ready Semantic Search with Python and Snowflake Cortex

#python #snowflake #ai #backend

Recently I was given the task of implementing AI-powered semantic search for our catalogue. As we already use Snowflake, we decided to build the feature on Cortex Search Service.

If you do not know what Cortex Search is yet, the Snowflake documentation has a quick overview. Basically, Cortex Search lets you build low-latency semantic and full-text search directly on top of data already stored in Snowflake.

In this article, I’ll share my experience integrating Cortex Search Service into our Python backend app, including the pitfalls we encountered.

Keep the search column focused

The SEARCH_TEXT column, or more precisely the Cortex Search “search column,” is the column Cortex Search indexes and uses for search retrieval.

One mistake I made at first was trying to put almost every available field into the SEARCH_TEXT column.

The logic looked reasonable initially: the more fields Cortex Search sees, the more context it has. But in practice, this can make the searchable text noisy.

For example, fields like IDs, status flags, tenant/company IDs, currency IDs, and internal category IDs are usually not useful for semantic search:

'id:' || COALESCE(id::STRING, ''),
'warehouse_id:' || COALESCE(warehouse_id::STRING, ''),
'project_id:' || COALESCE(project_id::STRING, ''),
'is_active:' || COALESCE(is_active::STRING, ''),
'status:' || COALESCE(status, ''),
'currency_id:' || COALESCE(currency_id::STRING, '')

These fields are usually not part of what the user is searching for. They should be exposed as ATTRIBUTES and used for filtering instead. I’ll cover that in the next section.

The SEARCH_TEXT field should be focused on fields that describe the actual meaning of the item:

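CONCAT_WS(
    ' ',
    -- In Snowflake, CONCAT_WS returns NULL if any argument is NULL,
    -- so default each nullable field to an empty string.
    COALESCE(title, ''),
    COALESCE(brand, ''),
    COALESCE(origin, ''),
    COALESCE(category_name, ''),
    COALESCE(subcategory_name, ''),
    COALESCE(short_description, '')
) AS SEARCH_TEXT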

Include the fields that are useful for end-user search queries, and leave the rest out.
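The same idea can be sketched on the Python side, for example when rows are prepared in application code before they are loaded into the search source table. This is a minimal sketch, not the pipeline from this project: `SEARCH_FIELDS` and `build_search_text` are hypothetical names, and the field list simply mirrors the SQL example above.

```python
from typing import Any, Mapping

# Descriptive fields only; IDs, flags and other metadata are left out on purpose.
# The list mirrors the SQL example above and is an assumption, not a rule.
SEARCH_FIELDS = (
    "title",
    "brand",
    "origin",
    "category_name",
    "subcategory_name",
    "short_description",
)


def build_search_text(record: Mapping[str, Any]) -> str:
    """Join the descriptive fields of a record into one searchable string."""
    parts = []
    for field in SEARCH_FIELDS:
        value = record.get(field)
        if value is None:
            continue  # missing or NULL fields are skipped entirely
        text = str(value).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


example = {
    "id": 42,
    "title": "Leather work gloves",
    "brand": "Acme",
    "origin": None,
    "category_name": "Safety",
    "short_description": "  Cut-resistant gloves  ",
    "is_active": True,
}
print(build_search_text(example))
```

Metadata such as `id` and `is_active` never reaches the search text here; it stays available as separate columns for filtering.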

Expose filterable fields as `ATTRIBUTES`

Another small but important detail in the Cortex Search Service configuration is the ATTRIBUTES property.

The ON column is the searchable text column. This is what Cortex Search uses for matching user queries. But if you also need to filter results by metadata, such as organization, status, category, brand, region, or availability, those columns must be added to ATTRIBUTES when the service is created.

Attributes are not the main searched text. They are columns returned alongside search results and available for filtering or display. Snowflake documents that Cortex Search filtering works on the ATTRIBUTES columns specified in the CREATE CORTEX SEARCH SERVICE command.

CREATE OR REPLACE CORTEX SEARCH SERVICE DEMO_DB.SEARCH.PRODUCT_SEARCH_SERVICE
ON SEARCH_TEXT
ATTRIBUTES (
  CITY,
  COUNTRY,
  CURRENCY,
  IS_ACTIVE
)
WAREHOUSE = WH_SEARCH_DEMO
TARGET_LAG = '1 hour'
AS
SELECT
    ID,
    SEARCH_TEXT, -- the ON column must be part of the source query
    CITY,
    COUNTRY,
    CURRENCY,
    IS_ACTIVE
FROM DEMO_DB.SEARCH.PRODUCT_SEARCH_SOURCE;

The main thing to remember: every column you want to filter by later has to be available as an attribute. Also, Snowflake notes that columns in ATTRIBUTES must be included in the source query used to create the service.

Then we can apply these filters in Python code:

from typing import Any, Dict, List

COUNTRY_ATTRIBUTE = "COUNTRY"
CITY_ATTRIBUTE = "CITY"
IS_ACTIVE_ATTRIBUTE = "IS_ACTIVE"

filters: List[Dict[str, Any]] = [
    {
        "@or": [
            {
                "@eq": {
                    COUNTRY_ATTRIBUTE: "UK",
                }
            },
            {
                "@eq": {
                    CITY_ATTRIBUTE: "London",
                }
            },
        ],
    },
    {
        "@eq": {
            IS_ACTIVE_ATTRIBUTE: True,
        }
    },
]

response = search_service.search(
    query=query,
    columns=[
        "ID",
        "COUNTRY",
        "CITY",
        "IS_ACTIVE",
    ],
    filter={
        "@and": filters,
    },
    limit=20,
)

Try to keep filters selective and easy to reason about. Good filters should reduce the search space before Cortex Search ranks and returns results. If the filter payload becomes too large or too nested, it may be a sign that the search source table should be adjusted instead of moving too much application logic into the search query.
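One way to keep filter payloads easy to reason about is to build them with a few small helpers instead of hand-written nested dictionaries. The sketch below is only an illustration: `eq`, `any_of` and `all_of` are hypothetical helper names, not part of the Snowflake API, and they only produce the same `@eq`, `@or` and `@and` structure shown above.

```python
from typing import Any, Dict, Sequence

Filter = Dict[str, Any]


def eq(attribute: str, value: Any) -> Filter:
    """Equality condition on a single attribute."""
    return {"@eq": {attribute: value}}


def _combine(operator: str, conditions: Sequence[Filter]) -> Filter:
    if not conditions:
        raise ValueError(f"{operator} needs at least one condition")
    if len(conditions) == 1:
        # A single condition does not need a logical wrapper.
        return conditions[0]
    return {operator: list(conditions)}


def any_of(*conditions: Filter) -> Filter:
    """Match when at least one condition holds."""
    return _combine("@or", conditions)


def all_of(*conditions: Filter) -> Filter:
    """Match only when every condition holds."""
    return _combine("@and", conditions)


# Same filter as in the example above: (country = UK OR city = London) AND active.
filter_payload = all_of(
    any_of(eq("COUNTRY", "UK"), eq("CITY", "London")),
    eq("IS_ACTIVE", True),
)
print(filter_payload)
```

The resulting dictionary can be passed as the `filter` argument of the search call. Because single-condition groups are unwrapped, the payload stays as flat as the logic allows.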

Pay attention to `TARGET_LAG`

One more important argument in the CREATE OR REPLACE CORTEX SEARCH SERVICE command is TARGET_LAG.

In simple terms, TARGET_LAG controls how fresh your Cortex Search index is compared to the source table.

For instance:

TARGET_LAG = '10 minutes'

This does not mean that every new row becomes searchable immediately. Cortex Search still needs to refresh its internal index first. So when you insert or update rows in the source table, those changes will appear in search results only after the next refresh happens.

This is especially important if you use managed embeddings. Cortex Search has to process the updated source data, create or update the embeddings, and refresh the search index before users can find those records through semantic search.

So if TARGET_LAG is too long, your search results can become stale. A newly published item may already exist in your source table, but users still will not be able to find it in search for some time.

At the same time, setting TARGET_LAG too low is not always the best answer. More frequent refreshes can mean more Snowflake work in the background, and that can increase credit usage. Snowflake also mentions that a target lag that is too low may refresh the index more often than needed.

So the right value depends on how fresh your search results really need to be.

CREATE OR REPLACE CORTEX SEARCH SERVICE DEMO_DB.SEARCH.PRODUCT_SEARCH_SERVICE
ON SEARCH_TEXT
ATTRIBUTES
(
    ITEM_ID,
    STATUS,
    IS_ACTIVE,
    ACCOUNT_ID
)
WAREHOUSE = WH_SEARCH_DEMO
TARGET_LAG = '30 minutes'
AS
SELECT
    ITEM_ID,
    SEARCH_TEXT,
    STATUS,
    IS_ACTIVE,
    ACCOUNT_ID
FROM DEMO_DB.SEARCH.PRODUCT_SEARCH_SOURCE;

For a user-facing catalog search, where newly published items should appear quickly, something like 15-20 minutes may make sense.

For an internal knowledge base, documentation search, or any data that does not change often, 1 hour or even longer may be completely fine.
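To make the trade-off explicit, it can help to compare the configured lag with the freshness the product actually needs. The following is a rough sketch under stated assumptions: `parse_target_lag` and `is_fresh_enough` are hypothetical helpers, the parser only handles the `'<number> <unit>'` form used in this article, and it treats TARGET_LAG as an upper bound on staleness, which is a simplification because it is a target rather than a guarantee.

```python
import re
from datetime import timedelta

# Accepts values such as '30 minutes' or '1 hour' (singular or plural unit).
_TARGET_LAG_RE = re.compile(
    r"^\s*(\d+)\s+(second|minute|hour|day)s?\s*$", re.IGNORECASE
)


def parse_target_lag(value: str) -> timedelta:
    """Convert a TARGET_LAG string into a timedelta."""
    match = _TARGET_LAG_RE.match(value)
    if match is None:
        raise ValueError(f"Unsupported TARGET_LAG value: {value!r}")
    amount = int(match.group(1))
    unit = match.group(2).lower() + "s"  # timedelta expects plural keywords
    return timedelta(**{unit: amount})


def is_fresh_enough(target_lag: str, max_staleness: timedelta) -> bool:
    """True if the configured lag fits within the staleness the product tolerates."""
    return parse_target_lag(target_lag) <= max_staleness


# A catalog that wants new items searchable within 20 minutes.
requirement = timedelta(minutes=20)
for lag in ("15 minutes", "30 minutes", "1 hour"):
    print(lag, is_fresh_enough(lag, requirement))
```

A check like this could run in a deployment script or a test, so that a later change to TARGET_LAG is a conscious decision rather than an accident.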

Choose the right embedding model for your use-case

Cortex Search uses an embedding model during the vector search stage. In simple terms, this model converts your search column and the user query into vectors, so Cortex Search can find records that are semantically similar, not only records that contain the same keywords. Snowflake allows selecting the model with the EMBEDDING_MODEL parameter when creating the Cortex Search Service.

CREATE OR REPLACE CORTEX SEARCH SERVICE DEMO_DB.SEARCH.PRODUCT_SEARCH_SERVICE
ON SEARCH_TEXT
ATTRIBUTES
(
    ITEM_ID,
    STATUS,
    IS_ACTIVE,
    ACCOUNT_ID
)
WAREHOUSE = WH_SEARCH_DEMO
TARGET_LAG = '30 minutes'
EMBEDDING_MODEL = 'snowflake-arctic-embed-m-v1.5' -- set the embedding model explicitly
AS
SELECT
    ITEM_ID,
    SEARCH_TEXT,
    STATUS,
    IS_ACTIVE,
    ACCOUNT_ID
FROM DEMO_DB.SEARCH.PRODUCT_SEARCH_SOURCE;

Snowflake lists snowflake-arctic-embed-m-v1.5 as the default Cortex Search embedding model. It has 768 output dimensions, a 512 token context window, and English-only language support. Snowflake also describes it as the fastest indexing option among the available Cortex Search models. That makes it a good starting point for English-only catalogs or internal search where indexing speed matters.

If your catalog is multilingual, the default English-only model may not work for you. In that case, check multilingual models such as snowflake-arctic-embed-l-v2.0, snowflake-arctic-embed-l-v2.0-8k or voyage-multilingual-2. For the full list of supported models, dimensions, context windows, and language support, refer to the official Snowflake embedding models table.

Snowflake’s CREATE CORTEX SEARCH SERVICE docs show EMBEDDING_MODEL as part of the service definition, and changing it usually means recreating the service rather than just tuning a runtime parameter. So it is worth testing this early, especially if your product has multilingual data or users search in different languages.

Cortex connection warmup

One of the main reasons we integrated Cortex Search was to reduce search query latency, as we have a huge number of items.

The first bottleneck came from connection handling. I used the Snowflake Python library, and each new search request required opening a new connection. That connection setup alone took around 1–1.5 seconds, while the actual search query took only 500–600 ms. So before optimizing the search itself, I focused on removing the connection setup cost from the request path.

The optimization was to move the Snowflake connection setup out of the request path. I added a warmup() method that resolves the Cortex Search service once and caches the Snowflake connection, Root object, and service reference at the worker-process level. The cache is protected with a lock, so initialization stays safe even if multiple requests arrive at the same time.

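import threading

import snowflake.connector
from snowflake.core import Root


class CortexSearchCatalogService:
    # Class-level state shared by every instance in the worker process.
    # Assumed here, since only the methods are shown: the lock must be
    # re-entrant, because _get_service() calls _get_root() while holding it.
    _lock = threading.RLock()
    _connection = None
    _root = None
    _service_cache = {}

    # self.database, self.schema, self.service_name and
    # self._connection_parameters are set in __init__ (not shown), and
    # CortexSearchCatalogServiceError is the project's own exception class.

    def warmup(self) -> None:
        # Resolve and cache the service so later requests skip the setup cost.
        self._get_service()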

    def _get_service(self):
        cache_key = (self.database, self.schema, self.service_name)
        service = self.__class__._service_cache.get(cache_key)
        if service:
            return service

        with self.__class__._lock:
            service = self.__class__._service_cache.get(cache_key)
            if service:
                return service

            root = self._get_root()
            service = (
                root.databases[self.database]
                .schemas[self.schema]
                .cortex_search_services[self.service_name]
            )
            self.__class__._service_cache[cache_key] = service
            return service

    def _get_root(self):
        root = self.__class__._root
        connection = self.__class__._connection
        if root and connection and not connection.is_closed():
            return root

        with self.__class__._lock:
            root = self.__class__._root
            connection = self.__class__._connection
            if root and connection and not connection.is_closed():
                return root

            if connection and not connection.is_closed():
                try:
                    connection.close()
                except Exception:
                    pass

            self.__class__._connection = None
            self.__class__._root = None
            self.__class__._service_cache = {}

            try:
                connection = snowflake.connector.connect(**self._connection_parameters)
            except Exception as exc:
                raise CortexSearchCatalogServiceError(
                    f"Failed to create Snowflake connection for Cortex search: {exc}"
                ) from exc

            self.__class__._connection = connection
            root = Root(connection)
            self.__class__._root = root
            return root

Then I called this warmup method from Gunicorn’s post_fork hook. Because Gunicorn workers are separate processes, each worker has to create its own Snowflake connection after it forks. With this change, the connection is opened during worker startup instead of during the first search request, which removes the 1–1.5 second connection setup cost from the user-facing path.

def post_fork(server, worker):
    # Warm the Snowflake Cortex client in each worker process so the first user
    # search does not pay the connection/session setup cost on the request path.

    try:
        if not _cortex_warmup_is_configured():
            server.log.debug(
                "Skipping Cortex warmup for worker pid=%s because Snowflake settings are incomplete.",
                worker.pid,
            )
            return

        from apps.db.cortex_search_services.cortex_search_catalog_service import CortexSearchCatalogService

        CortexSearchCatalogService().warmup()
        server.log.info("Cortex warmup completed for worker pid=%s", worker.pid)
    except Exception:
        server.log.exception("Cortex warmup failed for worker pid=%s", worker.pid)
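The double-checked locking used in _get_service and _get_root can be tried out without a Snowflake account. The sketch below is a stripped-down illustration of that pattern only: `LazyCache` and `fake_connect` are hypothetical names, and the factory stands in for the expensive connection setup.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class LazyCache:
    """Create each cached object at most once, even under concurrent access."""

    def __init__(self) -> None:
        # Re-entrant, so a factory may call get_or_create() again on the same cache.
        self._lock = threading.RLock()
        self._items: Dict[Hashable, Any] = {}

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        # Fast path: no locking once the object exists.
        item = self._items.get(key)
        if item is not None:
            return item

        with self._lock:
            # Second check: another thread may have created it while we waited.
            item = self._items.get(key)
            if item is not None:
                return item
            item = factory()
            self._items[key] = item
            return item


calls = []


def fake_connect() -> object:
    # Stand-in for the expensive Snowflake connection setup.
    calls.append(1)
    return object()


cache = LazyCache()
threads = [
    threading.Thread(target=cache.get_or_create, args=("service", fake_connect))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(calls))  # prints 1: the factory ran once for all eight threads
```

The first check avoids lock contention on the hot path, and the second check inside the lock prevents two threads from both paying the setup cost.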

Tuning Cortex Search scoring weights

The next important optimization was not about latency, but about result quality.

Cortex Search uses hybrid ranking. It can combine semantic/vector similarity, keyword/text matching, and neural reranking. Snowflake exposes this through scoring_config.weights, where vectors, texts, and reranker control the relative contribution of each scoring component. By default, these components have equal weight, but you can adjust them per query depending on your use case.

For example, in our case, pure keyword matching could sometimes rank the wrong item higher just because it contained matching words. One real example was a “glove protection” sign being ranked above actual gloves. The text match was strong, but semantically it was not the product the user expected.

To fix this, I increased the vector score weight relative to the text score:

# Tune these weights to adjust Cortex ranking before results reach application code.
# vectors = semantic similarity, texts = keyword match, reranker = neural reranker.
# Raise vectors relative to texts to prevent keyword-heavy but semantically irrelevant
# items from outranking semantically correct ones.
DEFAULT_SCORING_CONFIG = {
    "weights": {
        "vectors": 2,
        "texts": 1,
        "reranker": 1,
    }
}

This is an important parameter because it directly changes how Cortex Search orders results before they reach your application code. If your search is closer to traditional full-text search, you may want to increase the texts weight. If your users search by meaning, synonyms, descriptions, or natural language, increasing the vectors weight can produce better results.

Another parameter worth testing is reranker. Cortex Search uses semantic reranking by default to improve relevance, but reranking can also increase query latency. Snowflake allows disabling reranking when lower latency is more important than the additional quality improvement.
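Since the weights are relative, it can be useful to look at them as shares of the total when comparing configurations. The sketch below is only a way to reason about the numbers: `build_scoring_config` and `relative_shares` are hypothetical helpers, the validation rules are my own assumption, and how Cortex Search combines the scores internally is not modelled here.

```python
from typing import Dict

ScoringConfig = Dict[str, Dict[str, float]]


def build_scoring_config(
    vectors: float = 1, texts: float = 1, reranker: float = 1
) -> ScoringConfig:
    """Build a scoring config in the same shape as DEFAULT_SCORING_CONFIG above."""
    weights = {"vectors": vectors, "texts": texts, "reranker": reranker}
    # Assumed sanity checks; they are not taken from the Snowflake documentation.
    if any(value < 0 for value in weights.values()):
        raise ValueError("weights must not be negative")
    if sum(weights.values()) == 0:
        raise ValueError("at least one weight must be positive")
    return {"weights": weights}


def relative_shares(config: ScoringConfig) -> Dict[str, float]:
    """Express each weight as a fraction of the total weight."""
    weights = config["weights"]
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}


semantic_first = build_scoring_config(vectors=2, texts=1, reranker=1)
print(relative_shares(semantic_first))
```

With vectors=2 and the other two left at 1, the vector score accounts for half of the total weight. The config itself would then be passed along with the query, assuming your version of the Python API accepts a `scoring_config` argument on `search()`.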

Return only the data you need from Cortex Search

One practical limitation worth knowing before you wire Cortex Search into production code is payload size.

Snowflake documents response size limits for Cortex Search queries: the REST API and Python API response payload must not exceed 10 MB.

This means you should be careful with both sides of the query: what you send to Cortex Search and what you ask it to return.
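A rough size estimate is often enough to notice when a query is drifting toward the limit. The following is a minimal sketch, not an exact measurement: `json_size_bytes` and `fits_response_limit` are hypothetical helpers, the JSON encoding only approximates what is sent over the wire, and reading "10 MB" as 10 * 1024 * 1024 bytes with a 20% safety margin is my own assumption.

```python
import json
from typing import Any

# Assumption: 10 MB interpreted as 10 * 1024 * 1024 bytes.
MAX_RESPONSE_BYTES = 10 * 1024 * 1024


def json_size_bytes(payload: Any) -> int:
    """Approximate the size of a payload as compact UTF-8 encoded JSON."""
    encoded = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def fits_response_limit(
    results: Any,
    limit_bytes: int = MAX_RESPONSE_BYTES,
    safety_margin: float = 0.8,
) -> bool:
    """True if the estimated size stays below the limit with some headroom."""
    return json_size_bytes(results) <= limit_bytes * safety_margin


# IDs only: tiny payload.
id_only = [{"ID": i} for i in range(50)]
# Full descriptions: ten rows of roughly 1 MB each.
heavy = [{"ID": i, "DESCRIPTION": "x" * 1_000_000} for i in range(10)]

print(json_size_bytes(id_only), fits_response_limit(id_only))
print(json_size_bytes(heavy), fits_response_limit(heavy))
```

Running a check like this against sample results in a test makes it visible how quickly large text columns eat into the budget compared with returning IDs only.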

For filters, try to send only what is really needed. Huge nested filters are the shortest path to silent bugs in production. If you have a large OR condition over the same field, prefer a more compact operator when possible. For example, instead of building a long @or list:

filter={
    "@or": [
        {"@eq": {"STATUS": "published"}},
        {"@eq": {"STATUS": "scheduled"}},
        {"@eq": {"STATUS": "archived"}},
    ]
}

Try to use @in if it fits your case:

filter={
    "@in": {
        "STATUS": ["published", "scheduled", "archived"]
    }
}

The same applies to your data model. If every search request needs a huge amount of filter logic, it may be a sign that the search-facing model is not shaped well enough. Sometimes it is worth preparing a cleaner search source table or view with fields that are easier to filter on, instead of pushing too much application logic into the Cortex Search request.

For responses, I prefer to keep Cortex Search result columns small. In most application search flows, Cortex Search does not need to return the full object payload. It can return only the internal IDs and maybe a few lightweight fields needed for ranking or debugging.

response = search_service.search(
    query=query,
    columns=[
        "ID",
    ],
    filter=filter,
    limit=50,
)

Then the application can take those IDs and fetch the full records from the primary application database:

item_ids = [row["ID"] for row in response.results]

items = (
    CatalogItem.objects
    .filter(id__in=item_ids)
    .select_related("brand", "category")
    .prefetch_related("tags")
)

This keeps Cortex Search focused on what it does best: finding relevant records. Your application's database remains responsible for loading the full domain objects, and you also save on query costs.
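One detail to watch with the ID-based fetch: an `id__in` lookup returns rows in whatever order the database chooses, not in the relevance order Cortex Search produced. A minimal sketch of restoring that order in application code is shown below; `order_by_rank` is a hypothetical helper, and it assumes the IDs have the same type on both sides (for example, both integers).

```python
from typing import Callable, Hashable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def order_by_rank(
    ranked_ids: Sequence[Hashable],
    items: Iterable[T],
    get_id: Callable[[T], Hashable],
) -> List[T]:
    """Return items in the order of ranked_ids, skipping IDs with no matching item."""
    by_id = {get_id(item): item for item in items}
    return [by_id[item_id] for item_id in ranked_ids if item_id in by_id]


# Relevance order from the search service (best match first).
ranked_ids = [7, 3, 9]
# Rows as the database happened to return them; ID 9 no longer exists.
db_rows = [{"id": 3, "title": "Work gloves"}, {"id": 7, "title": "Winter gloves"}]

ordered = order_by_rank(ranked_ids, db_rows, get_id=lambda row: row["id"])
print([row["id"] for row in ordered])  # prints [7, 3]
```

With the Django query above, the call would look like `order_by_rank(item_ids, items, get_id=lambda item: item.id)`. IDs that are still in the search index but already gone from the primary database are dropped silently, which also covers the window before the next index refresh.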

Conclusion

Cortex Search is a powerful option when you already have data in Snowflake and need low-latency semantic and full-text search without building a separate search pipeline from scratch.

But the important part is that it is not completely “set and forget.” The quality and performance depend a lot on how you configure the service and how you use it from your backend application.

The biggest lessons from my experience were:

warm up the Snowflake connection before the first user request;
keep the search column focused on fields users actually search by;
choose the embedding model based on your data and language requirements;
tune scoring weights depending on whether your use case needs more semantic or keyword-based matching;
add required fields to ATTRIBUTES if you need filtering;
configure TARGET_LAG based on how fresh the search results need to be;
return only the data you need from Cortex Search and load full objects from your primary application database.

Overall, Cortex Search can work really well for application-level search, especially for catalog search, documentation search, and other data already stored in Snowflake.
