DEV Community: Gleb Otochkin

How your sample data impact vector tests in PostgreSQL and AlloyDB.

Gleb Otochkin — Tue, 14 Jul 2026 04:44:34 +0000

Introduction

If you are using vector data in PostgreSQL on Cloud SQL or AlloyDB, you probably use some benchmark queries or functions to check your vector search performance. From time to time, I see people performing speed benchmarks or quality testing using generated, fully synthetic data. The main goal, after all, is to test, for example, performance with a particular data type and get some comparable numbers. Does it really matter what data I have in my vector column?

It really depends, and things can indeed go very wrong when you try to evaluate the index’s performance. I use synthetic data all the time for my tests. But it depends on what I am testing. In some cases, real vs. synthetic data might have a significant impact on the results. Let me show you how it can change the outcome of some tests for the vector data type in PostgreSQL.

Two datasets

Some tests can be done using fully synthetic data. For example, if you test inserts, updates, or deletes for vectors, then synthetic data works for you. In such cases, I usually create a fully synthetic dataset of totally random, normalized number arrays. Here is an example of a simple function and how to generate the dataset.

-- Create a function 
CREATE OR REPLACE FUNCTION generate_random_vector(dimensions integer)
RETURNS vector
LANGUAGE sql
AS $$
    SELECT array_agg(random()*2-1)::vector
    FROM generate_series(1, dimensions);
$$;
-- Creating the test table with vector data type 768 dimensions
CREATE TABLE tvectr768(
id BIGINT PRIMARY KEY,
d VECTOR(768)
);
-- Populate the table with 1M or fows
INSERT INTO tvectr768 (id, d)
SELECT
    i AS id,
    -- This function is now called for each row
    generate_random_vector(768) AS d
FROM
    generate_series(1, 1000000) AS i;

In the end, we have a table with 1 million vectors. Each vector dimension value is between -1 and 1. So, the vector is normalized and shows more or less the same behaviour for the DML operations as normally generated embedding for a piece of content.

But when I test recall quality or performance for vector search, I use one of the publicly available datasets with vectors created based on real data. You can find multiple datasets for embeddings on Hugging Face or other sources. For example, you can check the Cohere Labs datasets here: https://huggingface.co/CohereLabs/datasets.

Here are my 1M rows with the same dimensions and structure, but all the vectors in those rows are based on real data.

pgdata=# \d tvreal768
               Table "public.tvreal768"
 Column | Type | Collation | Nullable | Default
--------+-------------+-----------+----------+---------
 id | bigint | | |
 d | vector(768) | | |

pgdata=#

Both tables have the same size and structure.

pgdata=# SELECT
pgdata-# relname AS table_name,
pgdata-# pg_size_pretty(pg_table_size(relid)) AS table_size
pgdata-# FROM
pgdata-# pg_catalog.pg_statio_user_tables
pgdata-# WHERE
pgdata-# relname IN ('tvectr768', 'tvreal768');
 table_name | table_size
------------+------------
 tvectr768 | 4008 MB
 tvreal768 | 4008 MB

If you go a bit deeper, you can see that both tables are about 57 MB in heap size, plus 3950 MB in TOAST where the actual vector data is stored. It makes perfect sense, since the d column with vector(768) is larger than the default 2k threshold and is stored entirely in the TOAST segment.

pgdata-# relname AS table_name,
pgdata-# pg_size_pretty(pg_relation_size(oid)) AS heap_size,
pgdata-# pg_size_pretty(pg_table_size(oid) - pg_relation_size(oid)) AS toast_size
pgdata-# FROM pg_class
pgdata-# WHERE relname = ANY ('{"tvectr768" ,"tvreal768"}')
pgdata-# AND relkind = 'r';
 table_name | heap_size | toast_size
------------+-----------+------------
 tvectr768 | 57 MB | 3950 MB
 tvreal768 | 57 MB | 3950 MB

Let us do some performance tests and compare the results.

Building Vector Index

For my tests, I was using Postgres 18 on a Debian VM in Google Cloud, and AlloyDB Omni for ScaNN index tests on a VM of the same size. We are mostly interested in the difference between “real” and synthetic vector values, but this varies depending on the index. The first test is to build an IVFFLAT index on our vectors.

Here is an example for the IVFFLAT index on synthetic vector data.

drop index if exists tvectr768_d_idx;
CREATE INDEX ON tvectr768 USING ivfflat (d vector_cosine_ops) WITH (lists=1000);

You can see I was using default parameters, specifying only the number of lists. According to my tests, it takes about 271,068 ms to build the index on the synthetic tvectr768 table. And when it is built, the size of the index is roughly equal to the size of all the vectors (3912 MB), which makes total sense.

When we run the same for the tvreal768 table with real vectors, we get similar results: 271,788 ms. So far, there is no difference. And it is probably expected if we think about how the IVFFLAT index is being built with splitting to buckets according to the number of lists and then simply moving each vector to one or another bucket.

What about the HNSW index? Here is an example of a query used to build the HNSW index.

drop index if exists tvectr768_d_idx;
CREATE INDEX ON tvectr768 USING hnsw (d vector_cosine_ops);

And here we start to see the difference in performance. The HNSW is a graph-based index and it is much harder to find node connections for a totally random array of numbers. You can play with different build parameters for the index and see how it can impact the results.

For the building with default parameters it took on average 6601789 ms for synthetic data and 4174196 ms for the “real” vectors. That’s 1.5 faster for the real data and it is significant.

So, why is it happening? How does the randomness in the numbers in the vector impact the performance? It probably deserves a separate article but I will try to give a short explanation. The HNSW builds its graph incrementally. When it adds another vector to the index it has to search for the right place in the graph. It means to find the closest neighbors to connect the new vector to. HNSW relies on the assumption that your data contains some underlying structure and it means clusters, patterns, and dense neighborhoods. The embedding vectors have but the random vector most likely not. Because of that the build process has to evaluate more candidates to find neighbors. That process is mathematically expensive for distances like cosine distance and takes time. That’s a short explanation but I hope it helps to understand to some extent the reasons behind the difference in built time.

With the ScaNN index we have another case of striking difference between building time for index with fully synthetic vs “real” dataset. I was using the basic recommended parameters for the ScaNN index according to the guide for AlloyDB Omni.

BEGIN;
SET LOCAL scann.num_leaves_to_search = 1;
SET LOCAL scann.pre_reordering_num_neighbors=50;
drop index if exists tvreal768_d_idx;
CREATE INDEX tvreal768_d_idx ON tvreal768 USING scann (d cosine) WITH (num_leaves = 1000);
END;

It took 237094 ms to build the index on the synthetic data vs only 62673 ms on the “real” dataset. And the size of indexes was quite different because AlloyDB ScaNN was using internal index optimization for the real dataset. We have 1562 MB for the synthetic data index vs 275 MB for the “real” one.

So, when you start to evaluate performance in building indexes then you probably should use proper data or you can get misleading results. I would recommend using vectors built on your actual data.

Vector Search Performance

Now that we have our ANN indexes built, we can evaluate how quickly you can find the top 5 similar vectors using the HNSW index. I am using a procedure that takes 100 samples from the table and uses some of those samples as predicates to find the 5 most similar vectors. I repeated this 11 times, discarding the first execution. This gives me the throughput in Queries Per Second (QPS) and the average execution time per query, along with the max and min times for all executions. You can check the procedure benchmark_vector_search in the Appendix chapter and the end of the article or download it from GitHub.

Here is a sample output from the procedure.

NOTICE: ------------------------------------------------------------------------
NOTICE: Benchmark finished successfully.
NOTICE: Total queries executed: 1000 (excluding warmup)
NOTICE: Total execution time: 2.145 seconds (excluding warmup)
NOTICE: Throughput (overall): 466.1 QPS (excluding warmup)
NOTICE: ------------------------------------------------------------------------
NOTICE: Latency Stats (excluding warmup):
NOTICE: Average: 2.125 ms
NOTICE: Minimum: 1.389 ms
NOTICE: Maximum: 3.888 ms
NOTICE: ------------------------------------------------------------------------

When we run our tests, it shows 4.173 ms on average for synthetic data vs. 2.102 ms for our “real” vectors using HNSW index. Interestingly enough, for AlloyDB ScaNN, the difference is not so big — only 5.958 ms vs. 5.161 ms if we use 1,000 lists. But if we use an index with only 100 lists, we can see a significant difference in performance, where the response time for synthetic data is 7.239 ms vs. 4.249 ms for the “real” ones.

The reasons behind the difference in performance have the same nature but slightly different consequences. Since the random vector lacks clusters and dense neighborhoods it goes through more steps to find the closest neighbours. Instead, for example going through 32 nodes to find the closest neighbour it has to traverse 256 or more and even in such a case the quality might be still low.

So, if you are testing HNSW, which is widely used in the community, the 2x difference is big enough and fully justifies the effort of getting a real dataset.

Vector Search Quality

We already established that synthetic data can mislead you about build time or vector search speed for your vector indexes. What about quality?

Usually, when we talk about vector search quality, we use the term “recall,” which represents the percentage of exact matches returned by an ANN index compared to the list of hits returned by a KNN exact search. For example, we do a KNN search returning the top 10 similar vectors. Then, we run exactly the same query with an ANN index and compare how many of the 10 vectors returned by the ANN search match the KNN results. If it is 9 out of 10, then it is 90% recall.

In my procedure evaluate_vector_recall, I use 100 random samples from the dataset and then iterate on those samples to get the recall for each of them, averaging the results at the end. I tested it with the top 10 similar vectors (k_param=>10). The procedure is provided in the Appendix and on GitHub.

Here is a sample output of the procedure.

pgdata=# call evaluate_vector_recall('tvreal768',768,k_param=>10);
NOTICE: Starting vector similarity search recall evaluation...
NOTICE: Target table: tvreal768, Expected dimensions: 768
NOTICE: Parameters - Vector Column: d, ID Column: id, K: 10, Sample Size: 100
NOTICE: Selecting 100 random vectors for sampling from tvreal768...
NOTICE: ------------------------------------------------------------------------
NOTICE: Recall Evaluation Finished.
NOTICE: Total queries evaluated: 100
NOTICE: Average Recall@10: %91.80
NOTICE: Minimum Recall@10: %0.00
NOTICE: Maximum Recall@10: %100.00
NOTICE: ------------------------------------------------------------------------
CALL

According to my test results, the recall for synthetic data is quite low. On average for the 100 samples, I am getting only 3.8% recall using HNSW, while for “real” data, the recall is 91.8%. This means our search on synthetic data is not only slow but also hugely inaccurate. When we try it using the AlloyDB ScaNN index, it shows 16.4% recall for synthetic data vs. 91.4% for the “real” data. Only IVFFLAT didn’t show itself good with the real dataset giving only about 51.9% recall on average. But the quality with synthetic data was still much worse than that — only 10.6%

Of course, your test results can and most likely will be different, but the trend is clear. Even if ScaNN shows slightly better results for a synthetic vector search, it is still quite misleading and inaccurate.

You also can opt-in to use the evaluate_query_recall procedure coming with the AlloyDB Omni and it gives you the option to measure recall. Read more in the official guide on how to use the procedure.

Conclusion

Here is my short summary for all the tests and results

Synthetic vectors can be used for pure performance DML tests where the vectors are not used as predicates and are only used to fill up space. For example, for inserts, updates, or deletes of rows filtered by non-vector column data.
Synthetic vector data shows incorrect results for index build times and can mislead testers by giving inaccurate results. So far, in most of my tests, synthetic data required more time to build the index compared to realistic vector data.
An ANN vector search executed against a synthetic dataset can be 1.5 times slower than for a similar dataset with realistic vectors.
Recall quality for an ANN vector search cannot be measured against synthetic vectors. In my tests, the recall quality ranged from 3.8% to 16.4%, depending on the index type and index parameters. The real dataset showed more than 90% recall for most of the tests.
Get a real dataset and use it for all your performance and quality testing for ANN indexes.

Appendix

Here are some procedures I used for testing. I don’t pretend they are perfect in any way but should be clear enough for everyone who tries to use it. I used some hardcoded values to make it simpler to read. Both procedures are available on GitHub.

Here is the procedure to check response time for vector search. It is using cosine distance for measurement and takes the first 100 vectors from the table as a sample to be used later.

CREATE OR REPLACE PROCEDURE benchmark_vector_search(
    n_repetitions integer,
    table_name_param text,
    vector_dimensions integer
)
LANGUAGE plpgsql
AS $$
DECLARE
    -- Array to hold the 100 vectors sampled for the benchmark run
    sample_vectors vector[];
    -- Variable to hold the current vector being searched for
    current_vector vector;
    -- Variables for timing
    overall_start_time timestamptz;
    overall_end_time timestamptz;
    repetition_start_time timestamptz;
    q_start timestamptz;
    q_end timestamptz;
    q_diff double precision; -- in milliseconds

    -- Metrics tracked across the entire benchmark
    global_min_latency double precision := 1e9;
    global_max_latency double precision := 0;
    global_sum_latency double precision := 0;
    overall_total_queries integer := 0;

    -- Variable for dimension validation
    actual_dimensions integer;
    actual_sample_size integer;

    -- Loop counter
    i int;
BEGIN
    RAISE NOTICE 'Starting vector similarity search benchmark...';
    RAISE NOTICE 'Target table: %, Repetitions: %, Expected dimensions: %', table_name_param, n_repetitions, vector_dimensions;

    -- Validate that the vector column "d" exists and dimensions match
    BEGIN
        EXECUTE format('SELECT vector_dims(d) FROM %I WHERE d IS NOT NULL LIMIT 1', table_name_param)
        INTO actual_dimensions;

        IF actual_dimensions IS NULL THEN
            RAISE EXCEPTION 'Could not find any non-NULL vectors in table %. The table might be empty or column "d" contains only NULLs.', table_name_param;
        END IF;

        IF actual_dimensions != vector_dimensions THEN
            RAISE EXCEPTION 'Vector dimension mismatch. Expected dimension %, but found dimension % in table %.', vector_dimensions, actual_dimensions, table_name_param;
        END IF;
    EXCEPTION
        WHEN undefined_table THEN
            RAISE EXCEPTION 'Table "%" does not exist.', table_name_param;
        WHEN undefined_column THEN
            RAISE EXCEPTION 'Table "%" does not have a column named "d".', table_name_param;
    END;

    RAISE NOTICE 'Dimension validation successful.';

    -- Fetch 100 random vectors to use as search queries
    RAISE NOTICE 'Selecting 100 random vectors for sampling from %...', table_name_param;
    EXECUTE format('SELECT array_agg(d) FROM (SELECT d FROM %I WHERE d IS NOT NULL ORDER BY random() LIMIT 100) AS random_sample', table_name_param)
    INTO sample_vectors;

    IF sample_vectors IS NULL OR COALESCE(array_length(sample_vectors, 1), 0) = 0 THEN
        RAISE EXCEPTION 'Could not retrieve any sample vectors. The table % might be empty or all vectors are NULL.', table_name_param;
    END IF;

    actual_sample_size := array_length(sample_vectors, 1);
    IF actual_sample_size < 100 THEN
        RAISE WARNING 'Could not retrieve 100 sample vectors. Using % available sample vectors instead.', actual_sample_size;
    END IF;

    RAISE NOTICE 'Starting main benchmark loop...';
    overall_start_time := clock_timestamp();

    -- The main outer loop that repeats the entire test (warmup + N repetitions)
    FOR i IN 0..n_repetitions LOOP
        repetition_start_time := clock_timestamp();

        -- Reset overall_start_time to start of repetition 1 to exclude warmup duration from QPS/Totals
        IF i = 1 THEN
            overall_start_time := clock_timestamp();
        END IF;

        DECLARE
            rep_min_latency double precision := 1e9;
            rep_max_latency double precision := 0;
            rep_sum_latency double precision := 0;
            rep_queries integer := 0;
            rep_duration double precision;
        BEGIN
            -- The inner loop that iterates through each of the sampled vectors
            FOREACH current_vector IN ARRAY sample_vectors
            LOOP
                q_start := clock_timestamp();
                -- Perform the core operation (limiting to top 5)
                EXECUTE format('SELECT 1 FROM %I ORDER BY d <=> $1 LIMIT 5', table_name_param)
                USING current_vector;
                q_end := clock_timestamp();

                q_diff := extract(epoch from (q_end - q_start)) * 1000.0; -- milliseconds

                -- Accumulate local statistics
                IF q_diff < rep_min_latency THEN rep_min_latency := q_diff; END IF;
                IF q_diff > rep_max_latency THEN rep_max_latency := q_diff; END IF;
                rep_sum_latency := rep_sum_latency + q_diff;
                rep_queries := rep_queries + 1;
            END LOOP;

            -- Calculate total time for this repetition
            rep_duration := extract(epoch from (clock_timestamp() - repetition_start_time));

            IF i = 0 THEN
                -- Warmup iteration: report it but do not add to global statistics
                RAISE NOTICE 'Warmup Repetition: Avg Latency = % ms (Min: % ms, Max: % ms) | QPS: %', 
                             round((rep_sum_latency / rep_queries)::numeric, 3), 
                             round(rep_min_latency::numeric, 3), 
                             round(rep_max_latency::numeric, 3),
                             round((rep_queries / rep_duration)::numeric, 1);
            ELSE
                -- Accumulate global statistics
                IF rep_min_latency < global_min_latency THEN global_min_latency := rep_min_latency; END IF;
                IF rep_max_latency > global_max_latency THEN global_max_latency := rep_max_latency; END IF;
                global_sum_latency := global_sum_latency + rep_sum_latency;
                overall_total_queries := overall_total_queries + rep_queries;

                RAISE NOTICE 'Repetition %/%: Avg Latency = % ms (Min: % ms, Max: % ms) | QPS: %', 
                             i, n_repetitions, 
                             round((rep_sum_latency / rep_queries)::numeric, 3), 
                             round(rep_min_latency::numeric, 3), 
                             round(rep_max_latency::numeric, 3),
                             round((rep_queries / rep_duration)::numeric, 1);
            END IF;
        END;
    END LOOP;

    overall_end_time := clock_timestamp();

    -- Log overall summary metrics
    DECLARE
        total_benchmark_time double precision;
        overall_qps double precision;
    BEGIN
        total_benchmark_time := extract(epoch from (overall_end_time - overall_start_time));
        overall_qps := overall_total_queries / total_benchmark_time;

        RAISE NOTICE '------------------------------------------------------------------------';
        RAISE NOTICE 'Benchmark finished successfully.';
        RAISE NOTICE 'Total queries executed: % (excluding warmup)', overall_total_queries;
        RAISE NOTICE 'Total execution time: % seconds (excluding warmup)', round(total_benchmark_time::numeric, 3);
        RAISE NOTICE 'Throughput (overall): % QPS (excluding warmup)', round(overall_qps::numeric, 1);
        RAISE NOTICE '------------------------------------------------------------------------';
        RAISE NOTICE 'Latency Stats (excluding warmup):';
        RAISE NOTICE ' Average: % ms', round((global_sum_latency / overall_total_queries)::numeric, 3);
        RAISE NOTICE ' Minimum: % ms', round(global_min_latency::numeric, 3);
        RAISE NOTICE ' Maximum: % ms', round(global_max_latency::numeric, 3);
        RAISE NOTICE '------------------------------------------------------------------------';
    END;
END;
$$;

And here is the procedure where I measure recall quality. It takes recall from each sample and then calculates an average recall quality for the entire execution.

CREATE OR REPLACE PROCEDURE evaluate_vector_recall(
    table_name_param text,
    vector_dimensions integer,
    vector_col_param text DEFAULT 'd',
    id_col_param text DEFAULT 'id',
    k_param integer DEFAULT 5,
    sample_size_param integer DEFAULT 100
)
LANGUAGE plpgsql
AS $$
DECLARE
    -- Array to hold the vectors sampled for the recall evaluation
    sample_vectors vector[];
    -- Variable to hold the current vector being searched for
    current_vector vector;
    -- Arrays to hold the result IDs from exact and approximate searches
    exact_ids text[];
    approx_ids text[];
    -- Metrics variables
    overlap_count integer;
    vector_recall double precision;
    sum_recall double precision := 0.0;
    min_recall double precision := 1.0;
    max_recall double precision := 0.0;
    actual_dimensions integer;
    actual_sample_size integer;
    total_queries integer := 0;
BEGIN
    RAISE NOTICE 'Starting vector similarity search recall evaluation...';
    RAISE NOTICE 'Target table: %, Expected dimensions: %', table_name_param, vector_dimensions;
    RAISE NOTICE 'Parameters - Vector Column: %, ID Column: %, K: %, Sample Size: %', 
                 vector_col_param, id_col_param, k_param, sample_size_param;

    -- Validate vector column exists and dimension matches
    BEGIN
        EXECUTE format('SELECT vector_dims(%I) FROM %I WHERE %I IS NOT NULL LIMIT 1', 
                       vector_col_param, table_name_param, vector_col_param)
        INTO actual_dimensions;

        IF actual_dimensions IS NULL THEN
            RAISE EXCEPTION 'Could not find any non-NULL vectors in column % of table %. The table might be empty or contains only NULLs.', 
                            vector_col_param, table_name_param;
        END IF;

        IF actual_dimensions != vector_dimensions THEN
            RAISE EXCEPTION 'Vector dimension mismatch. Expected dimension %, but found dimension % in table %.', 
                            vector_dimensions, actual_dimensions, table_name_param;
        END IF;
    EXCEPTION
        WHEN undefined_table THEN
            RAISE EXCEPTION 'Table "%" does not exist.', table_name_param;
        WHEN undefined_column THEN
            RAISE EXCEPTION 'Column "%" does not exist in table "%".', vector_col_param, table_name_param;
    END;

    -- Validate ID column exists
    BEGIN
        EXECUTE format('SELECT %I FROM %I LIMIT 1', id_col_param, table_name_param);
    EXCEPTION
        WHEN undefined_column THEN
            RAISE EXCEPTION 'ID column "%" does not exist in table "%". Cannot measure recall.', 
                            id_col_param, table_name_param;
    END;

    -- Select random sample vectors (ignoring nulls)
    RAISE NOTICE 'Selecting % random vectors for sampling from %...', sample_size_param, table_name_param;
    EXECUTE format('SELECT array_agg(%I) FROM (SELECT %I FROM %I WHERE %I IS NOT NULL ORDER BY random() LIMIT $1) AS random_sample', 
                   vector_col_param, vector_col_param, table_name_param, vector_col_param)
    INTO sample_vectors
    USING sample_size_param;

    IF sample_vectors IS NULL OR COALESCE(array_length(sample_vectors, 1), 0) = 0 THEN
        RAISE EXCEPTION 'Could not retrieve any sample vectors. The table % might be empty or all vectors are NULL.', table_name_param;
    END IF;

    actual_sample_size := array_length(sample_vectors, 1);
    IF actual_sample_size < sample_size_param THEN
        RAISE WARNING 'Could not retrieve % sample vectors. Using % available sample vectors instead.', 
                      sample_size_param, actual_sample_size;
    END IF;

    -- Iterate through sampled vectors to calculate recall
    FOREACH current_vector IN ARRAY sample_vectors
    LOOP
        -- 1. Find exact nearest neighbors (ground truth) by forcing sequential scan
        SET LOCAL enable_indexscan = off;
        SET LOCAL enable_bitmapscan = off;
        EXECUTE format('SELECT array_agg(%I::text) FROM (SELECT %I FROM %I ORDER BY %I <=> $1 LIMIT $2) AS exact_search', 
                       id_col_param, id_col_param, table_name_param, vector_col_param)
        INTO exact_ids
        USING current_vector, k_param;

        -- 2. Find approximate nearest neighbors (allowing index scan)
        SET LOCAL enable_indexscan = on;
        SET LOCAL enable_bitmapscan = on;
        EXECUTE format('SELECT array_agg(%I::text) FROM (SELECT %I FROM %I ORDER BY %I <=> $1 LIMIT $2) AS approx_search', 
                       id_col_param, id_col_param, table_name_param, vector_col_param)
        INTO approx_ids
        USING current_vector, k_param;

        -- 3. Calculate overlap
        IF exact_ids IS NOT NULL AND approx_ids IS NOT NULL THEN
            SELECT COUNT(*) INTO overlap_count
            FROM unnest(approx_ids) a
            JOIN unnest(exact_ids) e ON a = e;

            vector_recall := overlap_count::double precision / k_param;
            sum_recall := sum_recall + vector_recall;

            IF vector_recall < min_recall THEN min_recall := vector_recall; END IF;
            IF vector_recall > max_recall THEN max_recall := vector_recall; END IF;

            total_queries := total_queries + 1;
        END IF;
    END LOOP;

    -- Print recall report
    IF total_queries > 0 THEN
        RAISE NOTICE '------------------------------------------------------------------------';
        RAISE NOTICE 'Recall Evaluation Finished.';
        RAISE NOTICE 'Total queries evaluated: %', total_queries;
        RAISE NOTICE 'Average Recall@%: %%%', k_param, round(((sum_recall / total_queries) * 100.0)::numeric, 2);
        RAISE NOTICE 'Minimum Recall@%: %%%', k_param, round((min_recall * 100.0)::numeric, 2);
        RAISE NOTICE 'Maximum Recall@%: %%%', k_param, round((max_recall * 100.0)::numeric, 2);
        RAISE NOTICE '------------------------------------------------------------------------';
    ELSE
        RAISE WARNING 'No recall queries were successfully evaluated.';
    END IF;
END;
$$;

Migrating to Antigravity CLI from Gemini CLI with MCP for Google Cloud Databases

Gleb Otochkin — Tue, 30 Jun 2026 05:31:23 +0000

Time to Move

If you are following Google news closely you are probably aware about Antigravity CLI announced on the Google I/O 2026 and gradual move to the new tools ecosystem from the previous generation of AI assistants. In short, it’s time to move from Gemini CLI to Antigravity CLI. In this article I’ll explain how to set up your essential MCP servers for Google Cloud in the Antigravity CLI.

Gemini CLI MCP Configuration

Everybody has their own set of extensions and tools for Gemini CLI. In my case, I work extensively with Google Cloud databases and, as such, I have a few “must-have” extensions for Gemini CLI.

Here is my minimum Gemini CLI configuration for MCP servers.

gleb@db-connect:~$ gemini extension list
Ignore file not found: /home/gleb/.geminiignore, continue without it.
✓ alloydb (1.0.0)
 ID: dffab4e5e5d86ea81431cad2bf77fc027c0042d54fc4414c67145bfa255ee6cf
 name: dffab4e5e5d86ea81431cad2bf77fc027c0042d54fc4414c67145bfa255ee6cf
 Path: /home/gleb/.gemini/extensions/alloydb
 Enabled (User): true
 Enabled (Workspace): true
 MCP servers:
  AlloyDB MCP Server

✓ cloud-sql (1.0.0)
 ID: fe4d062a1349d2dddc18bd27899586f9c5a6ea524a95c44ec45257cbd18e9ed9
 name: fe4d062a1349d2dddc18bd27899586f9c5a6ea524a95c44ec45257cbd18e9ed9
 Path: /home/gleb/.gemini/extensions/cloud-sql
 Enabled (User): true
 Enabled (Workspace): true
 MCP servers:
  Cloud SQL MCP Server

✓ developer-knowledge (0.1.0)
 ID: 7b4042e11865c674ad1b7007c3040ba4a2a277c16c8d9bda9ba10ac7a70d2eec
 name: 47ba5f7123468475fe1edc973bf455cabe8eb51061f1e892d0bb9e1867db98b9
 Path: /home/gleb/.gemini/extensions/developer-knowledge
 Source: https://github.com/gemini-cli-extensions/developer-knowledge (Type: git)
 Enabled (User): true
 Enabled (Workspace): true
 Context files:
  /home/gleb/.gemini/extensions/developer-knowledge/GEMINI.md
 MCP servers:
  developer-knowledge

✓ spanner (1.0.0)
 ID: eb1ea9d72fe0c79269a8dc2b047ddfb531b004ed19afb45eaef9e9a38cccf351
 name: eb1ea9d72fe0c79269a8dc2b047ddfb531b004ed19afb45eaef9e9a38cccf351
 Path: /home/gleb/.gemini/extensions/spanner
 Enabled (User): true
 Enabled (Workspace): true
 MCP servers:
  Spanner MCP Server

There are three extensions for different Google Cloud databases and the Developer Knowledge MCP. For some projects I have additional MCP servers depending on the nature of the project, but for this tutorial, we will limit ourselves to just those four extensions. Our goal is to move everything to Antigravity CLI and get them working.

Install Antigravity CLI

The full instructions on how to install Antigravity CLI are in the documentation, but here is a short version. I am showing this on a Debian Linux box ,but the steps are generally the same for a Mac installation and migration.

Run the installation script in a terminal window on your Mac or Linux machine.

curl -fsSL https://antigravity.google/cli/install.sh | bash

At the end of the installation, it should show something like this:

On Linux, it automatically adds an alias for the binary, so you can start the tool using the agy alias. On the first launch, it will ask you to authenticate either with your Google account or by using a specific project.

And if you’ve chosen a project path then after authentication it will ask to put the project ID:

Next, it will ask about your location and then prompt you to choose a color scheme. On the color scheme screen, there’s a checkbox at the bottom that allows you to import all your existing Gemini CLI extensions into Antigravity CLI.

And as you can see, all my existing extensions are listed there. Can we wrap up here and call it a day? We could, but what if we missed the checkbox and clicked ‘Next’ without importing our extensions? As a result, we wouldn’t have any of our old MCPs in our Antigravity CLI.

Is it possible to migrate them afterward? Yes, of course, and in the next section, we’ll do exactly that.

Migrating Extensions to Antigravity Plugins

The full guide for migrating Gemini CLI extensions is in the documentation and you can take a look there if something isn’t working or if you want the latest information and updates. Here is my short version, along with a few fixes I had to apply.

First, exit Antigravity and run the following command in the terminal:

agy plugin import gemini

The command should find all existing extensions, skills, hooks, and MCP servers and convert them into Antigravity plugins. You should see something like this:

Now you can start the Antigravity CLI and check your MCP servers using the /mcp command. All of your MCP servers should be visible in the output.

At first glance, it looks like everything is fine. But you’ll notice that only the Developer Knowledge MCP shows its tools, while none of the database plugins show any tools at all. If that happens, you might need to fix your configuration.

Have a look into the tools definitions in the ~/.gemini/antigravity-cli/mcp directory — the tools don’t have parameters.

gleb@db-connect:~$ cat .gemini/antigravity-cli/mcp/AlloyDB\ MCP\ Server/list_clusters.json | jq
{
  "name": "list_clusters",
  "description": "List all clusters",
  "parameters": null
}

Also, if we check the Antigravity CLI log at ~/.gemini/antigravity-cli/cli.log, we can see that the database MCP servers are not authenticating correctly.

W0618 18:12:18.733858 86343 mcp_auth.go:634] OAuth setup failed for Spanner MCP Server: OAuth client ID required: server does not support dynamic client registration
W0618 18:12:18.734186 86343 mcp_auth.go:634] OAuth setup failed for AlloyDB MCP Server: OAuth client ID required: server does not support dynamic client registration
W0618 18:12:18.740350 86343 mcp_auth.go:634] OAuth setup failed for Cloud SQL MCP Server: OAuth client ID required: server does not support dynamic client registration
W0618 18:12:18.917512 86343 mcp_auth.go:634] OAuth setup failed for developer-knowledge: OAuth client ID required: server does not support dynamic client registration

And when we look into the plugin config for the imported AlloyDB MCP extension, we can see it doesn’t have an authentication section. Also, the plugin’s name consists of several words separated by spaces.

gleb@db-connect:~$ cat .gemini/config/plugins/alloydb/mcp_config.json 
{
  "mcpServers": {
    "AlloyDB MCP Server": {
      "command": "",
      "args": null,
      "cwd": "",
      "env": null,
      "serverUrl": "https://alloydb.googleapis.com/mcp"
    }
  }
}

The authProviderType parameter is missing, and the name for the MCP server should be fixed. This is likely why it isn’t working correctly. Let’s rename the MCP server and add the missing parameter. Here is the fixed configuration for the AlloyDB MCP plugin:

{
  "mcpServers": {
    "AlloyDB": {
      "command": "",
      "args": null,
      "cwd": "",
      "env": null,
      "serverUrl": "https://alloydb.googleapis.com/mcp",
      "authProviderType": "google_credentials"
    }
  }
}

Now, if we start the Antigravity CLI again and run the /mcp command, we can see all the tools associated with the remote AlloyDB MCP.

And when we test the MCP, we can confirm that it works correctly now.

We now have a new subdirectory for the AlloyDB MCP server, and the tools have all the necessary parameters and required information there.

gleb@db-connect:~$ cat .gemini/antigravity-cli/mcp/AlloyDB/list_clusters.json | jq
{
  "name": "list_clusters",
  "description": "List all clusters",
  "parameters": {
    "description": "Message for requesting list of Clusters",
    "properties": {
      "filter": {
        "description": "Optional. Filtering results",
        "type": "string"
      },
      "orderBy": {
        "description": "Optional. Hint for how to order the results",
        "type": "string"
      },
      "pageSize": {
        "description": "Optional. Requested page size. Server may return fewer items than requested. If unspecified, server will pick an appropriate default.",
        "format": "int32",
        "type": "integer"
      },
      "pageToken": {
        "description": "A token identifying a page of results the server should return.",
        "type": "string"
      },
      "parent": {
        "description": "Required. The name of the parent resource. For the required format, see the comment on the Cluster.name field. Additionally, you can perform an aggregated list operation by specifying a value with the following format: * projects/{project}/locations/-",
        "type": "string"
      }
    },
    "required": [
      "parent"
    ],
    "type": "object"
  }
}

The old subdirectory with the incomplete tools can now be removed from the MCP directory.

gleb@db-connect:~$ rm -rf .gemini/antigravity-cli/mcp/AlloyDB\ MCP\ Server/

We can fix the rest of our MCP servers for Cloud SQL and Spanner in the same way. For the Developer Knowledge MCP, however, we only need to add the authProviderType, its name didn’t have any spaces and was processed correctly.

In the end, you should see something like this when you list your MCP servers in the Antigravity CLI:

And a couple of words about the Developer Knowledge MCP. I think it’s one of the “must-have” MCPs in any configuration. It provides up-to-date information directly from the Google documentation and is one of the tools I use every single day.

You can also test the Developer Knowledge MCP by asking it to prepare a tutorial on a specific product option. For example, you can ask: ‘Prepare a tutorial with options on how I can connect to an AlloyDB instance.’ The Developer Knowledge MCP will use the search_documents tool to find the information and then create the tutorial based on the latest version of the Google documentation.

Try it and let us know

I recommend Antigravity CLI to everyone who likes command-line tools and is more comfortable with a terminal than a GUI. But keep in mind that Antigravity also has a GUI application and IDE, so you have a choice.

If you need to customize your Antigravity setup, it supports skills, plugins, rules, hooks, and MCP servers.

Happy migration, and let us know how it works for you!

Gemini 3.5 Flash in Google Cloud Databases

Gleb Otochkin — Sat, 23 May 2026 09:41:05 +0000

I hope you’ve been able to attend or at least watch the IO 26 keynote. Among other exciting announcements Google introduced the new Gemini 3.5 Flash model, and I immediately went to test it in Cloud SQL and AlloyDB. It worked really well, returning faster and more accurate responses in my particular use case compared to Gemini 2.5 Flash or even Gemini 3 Flash Preview. I encourage you to try it out for yourself and test it with your own workloads. Let me show you how I tested it.

Cloud SQL

Cloud SQL has full AI integration via the google_ml extension and can call models directly from a SQL query. It has several preregistered models, but Gemini 3.5 Flash was not there yet when I tested it. To work with the model, I registered it using the following call. Replace the PROJECT_ID placeholder in the code by your Google project id.

CALL google_ml.create_model(
    model_id => 'gemini-3.5-flash',
    model_request_url => 'https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/google/models/gemini-3.5-flash:generateContent',
    model_provider => 'google',
    model_type => 'generic',
    model_auth_type => 'cloudsql_service_agent_iam'
);

If you look at the model_request_url parameter, you’ll notice I’ve put the full URL path there with the global location. And I have a reason for that. Even if Cloud SQL and AlloyDB allow you to use a short variant for the URL, it might not work with the new models. If you try to use something like publishers/google/models/gemini-3.5-flash:generateContent, you will most likely get an error, since by default it tries to use a regional path — which is not available for Gemini 3.5 models.

After the registration, the model can be used in a query or in a function. I tested it with the following query:

SELECT google_ml.predict_row(
    model_id => 'gemini-3.5-flash',
    request_body => json_build_object(
        'contents', json_build_array(
            json_build_object(
                'role', 'user',
                'parts', json_build_array(
                    json_build_object('text', 'Explain MCP server for a relational database in 50 words or less.')
                )
            )
        )
    )
) ->'candidates' -> 0 -> 'content' -> 'parts' -> 0 -> 'text' AS ai_response;

It took about 4.1 seconds to get the response from the Gemini 3.5 Flash model. Then I registered the Gemini 2.5 Flash and Gemini 3 Flash Preview models and repeated the same query using their endpoints. For Gemini 2.5 Flash, it took 5.3 seconds on average, and 12.3 seconds for Gemini 3 Flash Preview. I expected version 3.5 to be faster than the preview version, but it was even faster than version 2.5.

Then we have to look into the quality of the response. Gemini 3.5 Flash provided an answer that aligns better with how the term “MCP” is used today when compared to the response from Gemini 2.5 Flash.

Here is an example of the response from Gemini 2.5 Flash:

“MCP server isn’t a standard term in relational databases. It *could* refer to a Master Control Program in legacy systems, acting as a central process coordinating database operations. It’s not a common component in modern database architectures.”

It looks like Gemini 2.5 Flash talks about an OS for mainframes. I am not sure how relevant that is for a general audience. Now, let’s compare it with the response from Gemini 3.5 Flash.

“An MCP (Model Context Protocol) server is a secure bridge connecting AI models to a relational database. It translates the AI’s natural language requests into SQL commands, allowing the model to safely inspect schemas, query data, and perform database operations in real-time.”

That makes more sense and especially now when everybody uses AI and agents.

Here is the summary table for the models performance:

Model Version | Response Time | Response Quality
-----------------------|---------------|------------------
Gemini 3.5 Flash | 4.1 seconds | Excellent
Gemini 2.5 Flash | 5.3 seconds | Average
Gemini 3 Flash Preview | 12.3 seconds | Good

After the tests with Cloud SQL, I moved to my AlloyDB cluster to see how it worked there.

AlloyDB

I used a very similar procedure to register the new model in the AlloyDB database, with the only difference being the model_type parameter. In AlloyDB, we can register it with the model type “llm”. Again — don’t forget to replace the PROJECT_ID placeholder by your project id.

CALL google_ml.create_model(
model_id => 'gemini-3.5-flash',
model_request_url => 'https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/google/models/gemini-3.5-flash:generateContent',
model_provider => 'google',
model_type => 'llm'
);

To test the model in AlloyDB, I used a different function — ai.generate. The ai.generate function is much more convenient to use — you don’t need to prepare your JSON input, and it returns the output as plain text right out of the box.

SELECT ai.generate(
    'Explain MCP server for a relational database in 50 words or less.',
    model_id => 'gemini-3.5-flash'
) AS ai_response;

In AlloyDB, the response from Gemini 3.5 Flash took the same 4.2 seconds on average, with the same high-quality results.

The Gemini 2.5 Flash model was already there out of the box, and I registered Gemini 3 Flash Preview using the same approach as above. After testing the query using ai.generate, I got 5.4 seconds for Gemini 2.5 Flash and 13.1 seconds on average for Gemini 3 Flash Preview. The quality of the response was the same as in my Cloud SQL tests. And the ai.generate on AlloyDB was performing with the same speed as google_ml.predict_row on Cloud SQL. I tested it with both regional and global endpoints getting the same performance.

The new model is more expensive but considering speed and quality of the result it might pay off especially for agentic workload where low quality response can lead to more turns for an agent and more consumption in the end.

Summary

My experience so far with the new Gemini 3.5 Flash model is faster and better responses over the previous Gemini 2.5 Flash model, and it is much faster than the Gemini 3 Flash Preview we have been testing previously. I think it is the new ‘workhorse’ for most of my workloads. Try it out for yourself, and read about the new Gemini model and all the other I/O ’26 announcements in the Google blog.

Demystifying max connections limit in Cloud SQL for PostgreSQL

Gleb Otochkin — Thu, 30 Apr 2026 22:43:25 +0000

Introduction

If you’ve worked with Google Cloud SQL for PostgreSQL, you’re likely aware that it sets a maximum number of connections for your instance at creation. This number isn’t static. It depends on the size — or rather, the machine type of the instance. For example, if you choose the db-f1-micro (the smallest available tier), you are capped at 25 connections by default.

And it might happen that you’ve already experienced an early wakeup call when your application suddenly ran out of connection and triggered an error like C: 53300: remaining connection slots are reserved…. That means you have run out of connections.

Let’s discuss connections, database parameters, and the logic behind these limits. My goal is to clear up some of the common questions I hear in the community and in private conversations with developers. I’ve structured this post as a Q&A to address the most frequently asked questions.

Max Connections in Cloud SQL for Postgres

Is the connection number a hard limit?

In chats and conversations with developers, I often hear about how people try to solve the problem when they hit the max number of connections. Quite often, there is a general assumption that 25 connections for a db-f1-micro is a hard limit, and the only ways to tackle it are to change the shape of the instance or implement connection pooling.

I don’t have anything against both approaches — they might be fully justified. But it is not a hard limit, and you can change it by updating the max_connections database flag using either the Google Cloud Console or the gcloud SDK. Keep in mind that the change requires a short downtime to apply. Read more in the documentation.

Will it automatically change if the instance is resized?

If you change your Cloud SQL instance size and haven’t manually set the max_connections database flag, the value will change automatically according to your instance size. For example, if you have a db-f1-micro instance with 25 connections and increase the size to db-g1-small, your max_connections will increase to 50.

What if I set up a custom max_connections flag?

If you define your preferred max_connections as a database flag, it will stay the same even through instance reconfigurations like changing the size. This can play a trick on someone who upgrades the instance size in anticipation of a higher max_connections value, only to find that it hasn’t changed.

What are the factors impacting the max_connections?

The main factor is the instance memory. In PostgreSQL, you have shared memory for the data pages you work with, and each session also has its own individual memory. This is a bit of a simplified description, but it is probably enough to explain the memory impact. Let’s go down to the memory allocations.

You need shared buffers to work with your data — each page of data from a table or index is copied to this area so it can be accessed by your session. The more data you have and the more you need to work with, the more shared buffers you’ll need for better performance. Otherwise, you will be constantly moving data to and from the disk.

When you connect to the instance, your session allocates roughly 2 MB of memory. Then, as you work with data, the memory allocation depends on your operations and the value of the work_mem parameter. By default, it is 4 MB. If you have a sorting or hashing operation in your query (like an ORDER BY), your session will allocate those 4 MB for that operation. However, that is per operation — a single query could potentially run multiple sorting or hashing tasks, multiplying the memory allocation.

Additionally, background processes like vacuuming and logical replication require memory too.

I will live out some other details like temporary buffers and others, operation system processes and caches. You can check all that in the PostgreSQL documentation. But considering all of this, the 25 max_connections limit for a db-f1-micro instance with only 600 MB of memory doesn’t look so small anymore.

And if we look at a graph of default max_connections values relative to memory, we can see it isn’t linear. Different factors have different impacts as the instance grows in size and the number of potential connections increases.

Can I set up max_connections to a higher or lower value?

Yes, you can, but you must consider all factors regarding potential memory allocation. If you run out of available memory, your connection will be terminated by an Out of Memory (OOM) error. Additionally, keep in mind that once you set max_connections as a database flag, it will no longer change dynamically based on the instance size; you will need to update the value manually if you decide to resize the instance.

How can I handle thousands of connections?

If your application requires hundreds or thousands of concurrent connections, the best approach is not to increase the max_connections flag.

Instead, I recommend using connection pooling. Cloud SQL offers Managed Connection Pooling (MCP), which is available in the Enterprise Plus edition. Alternatively, you can set up your own using tools like PgBouncer. Connection pooling solutions act as a proxy between your application and the database. They allow thousands of lightweight client connections- from sources like serverless scripts — to share a small, fixed number of heavy backend server connections. Read more about managed connection pooling in documentation.

Summary

The default max_connections value is tied to the instance memory and adjusts automatically as you resize the instance — unless you have manually defined it as a database flag.

It’s important to remember that the default value is based on best practices and serves as guidance, not a hard limit. However, if you choose to set a custom value, you are responsible for managing it. You’ll need to remember to adjust it manually if you ever resize your instance.

If your application is designed to use a high volume of connections, your best bet might be to implement a connection pooling solution.

Setting up an AlloyDB Instance with a Public IP in Minutes

Gleb Otochkin — Tue, 14 Apr 2026 18:35:08 +0000

Introduction

Sometimes when you’re building a proof of concept or a quick demo, you just need a simple database backend with a public IP address. What if I told you that you can create an AlloyDB instance with a public IP without having to dive into private IP network configuration? Let me show you how to do exactly that.

Using gcloud CLI

AlloyDB is an enterprise-grade, fully PostgreSQL-compatible database with tons of unique features, making it a Swiss Army knife of a data backend for any kind of application. As such, it comes with only a private IP by default. This makes it more secure, but at the same time, requires additional actions at the network level. However, as I mentioned in the intro, sometimes you just want something quick and dirty to verify functionality or run a demo. So, how do you create an AlloyDB instance with a public IP enabled in a few quick steps?

I am going to use a command-line approach and show you how to create the smallest possible AlloyDB instance with a public IP without configuring a private network. I am using a brand-new project with a default network.

If this is a brand-new project, we still need to enable the minimum required APIs. Run the following command in Google Cloud Shell or from your Mac terminal. I am assuming you already have all the required privileges in the project.

gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       servicenetworking.googleapis.com

And then run the command to create your AlloyDB cluster.

export PGPASSWORD="MyVeryStrictPassword123+"
echo $PGPASSWORD
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --region=$REGION \
    --password=$PASSWORD \
    --enable-private-service-connect

Did you notice the --enable-private-service-connect parameter? This creates a Private Service Connect (PSC) enabled AlloyDB cluster. Once the cluster is created, run the following command to create the primary instance. For my tests, when they don’t require a large cache or heavy CPU power, I usually opt for the C4A machine type with a single CPU — it is enough to demonstrate functionality and costs less than other configurations.

gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --machine-type=c4a-highmem-1 \
    --cpu-count=1 \
    --availability-type=ZONAL \
    --database-flags=password.enforce_complexity=on \
    --assign-inbound-public-ip=ASSIGN_IPV4 \
    --region=$REGION \
    --cluster=$ADBCLUSTER

And that’s it — after 5–8 minutes, you will have an AlloyDB primary instance running with only a public IP enabled. From there, you can use gcloud to connect to the instance, create a database, and run your queries. For more details on that step, take a look at one of my previous posts where I explain how to use gcloud to connect to AlloyDB.

gcloud beta alloydb connect $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --public-ip

Using Terraform

Of course, you can also use Terraform Google provider to automate this process. Below is the alloydb-poc.tf configuration file:

terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      version = ">= 7.0.0"
    }
  }
}
provider "google" {
  project = var.project_id
  region = var.region
}
resource "google_alloydb_cluster" "default" {
  cluster_id = var.cluster_id
  location = var.region
  psc_config {
    psc_enabled = true
  }
  initial_user {
    user = "postgres"
    password = var.db_password
  }
}
resource "google_alloydb_instance" "primary" {
  cluster = google_alloydb_cluster.default.name
  instance_id = var.instance_id
  instance_type = "PRIMARY"
  availability_type = "ZONAL"
  machine_config {
    machine_type="c4a-highmem-1"
    cpu_count = 1
  }
  database_flags = {
    "password.enforce_complexity" = "on"
  }
  network_config {
    enable_public_ip = true
  }
}
variable "project_id" {
  description = "The GCP Project ID"
  type = string
}
variable "region" {
  description = "The GCP Region (e.g., us-central1)"
  type = string
  default = "us-central1"
}
variable "cluster_id" {
  description = "The name of the AlloyDB cluster"
  type = string
  default = "alloydb-aip-01"
}
variable "instance_id" {
  description = "The name of the primary instance"
  type = string
  default = "alloydb-aip-01-pr"
}
variable "db_password" {
  description = "Password for the default postgres user (must meet complexity requirements!)"
  type = string
  sensitive = true
}
output "alloydb_public_ip" {
  description = "The Public IP address assigned to the AlloyDB primary instance"
  value = google_alloydb_instance.primary.public_ip_address
}

Run your Terraform deployment using the commands below, but remember to replace the YOUR_PROJECT_ID placeholder with your actual Google Cloud project ID and put your own password:

terraform init
terraform apply \
  -var="project_id=YOUR_PROJECT_ID" \
  -var="db_password=MyVeryStrictPassword123+"

In just a few minutes, you’ll have an AlloyDB instance up and running with a public IP.

I hope this makes your life easier and serves as a handy ‘hack’ for your daily workflow, whether you’re building a quick proof of concept or jumping into a vibe coding session. By the way, you can use this method for some codelabs like this one when you prefer to use your local machine instead of the Google Cloud shell.

AlloyDB easy connection using gcloud

Gleb Otochkin — Sat, 11 Apr 2026 05:17:47 +0000

Introduction

For those who work with Google AlloyDB for PostgreSQL on a daily basis, connectivity is likely not an issue. As a developer or administrator, you probably already have a preferred set of tools and connection methods, or you use AlloyDB Studio to run quick queries.

However, when starting a new project or cluster — or if you are new to AlloyDB and looking for a straightforward way to connect via the command line with proper mTLS encryption — there is a new method available. You can now connect to an AlloyDB instance using the Google Cloud SDK (gcloud). Let me introduce you to how it works.

Setting up

The gcloud CLI is available out of the box on Google Cloud Shell and on Google Cloud Compute Engine VMs created with Google-provided templates. But, if you are running it from your laptop or a custom VM, you may need to install it by following the official documentation.

You also need two additional components to get everything working: a PostgreSQL client and the AlloyDB Auth Proxy. If you are using Google Cloud Shell, both components come preinstalled. However, if you are using your own laptop or VM, you will need to add them manually.

Let’s assume you’ve already installed the gcloud CLI. But before moving forward check the version using the versioncommand:

gcloud --version

It should return version 563 or higher.

my-mac:~ $ gcloud --version
Google Cloud SDK 564.0.0
...

If the version is lower then you need to update it:

gcloud components update

The next step is to get the PostgreSQL client software. The latest PostgreSQL client software can be downloaded from the official website, or installed via a package manager for Linux or the brew utility for macOS. Detailed instructions can be found in our documentation. Here is how to install version 18 using the Homebrew utility on a Mac:

brew install postgresql@18

After installing you can verify it by checking the version:

psql --version

The final component is the AlloyDB Auth Proxy. The gcloud CLI uses this proxy to create an mTLS-encrypted connection and establish a link to the instance. On a Mac, run the following commands:

URL="https://storage.googleapis.com/alloydb-auth-proxy/v1.14.2"
curl -o alloydb-auth-proxy "$URL/alloydb-auth-proxy.darwin.arm64"
chmod +x alloydb-auth-proxy
mkdir $HOME/bin
mv alloydb-auth-proxy $HOME/bin/
export PATH="$HOME/bin:$PATH"

See the AlloyDB Auth proxy documentation on how to install it for other platforms. By the way if your gcloud CLI cannot find the AlloyDB Auth Proxy in your system’s PATH, it will automatically provide instructions on how to install it.

Once all components are installed, you can run the gcloud beta alloydb instances connect command to access your database via the mTLS encryption provided by the proxy. Additionally, if you are connecting using an AlloyDB public IP, you do not need to add your personal public IP to the authorized networks; the proxy handles this automatically. Here is how I connect to my AlloyDB instance while testing a codelab:

REGION=us-central1
CLUSTER_NAME=alloydb-aip-01
INSTANCE_NAME=alloydb-aip-01-pr
gcloud beta alloydb connect $INSTANCE_NAME --cluster=$CLUSTER_NAME --region=$REGION --public-ip

Then you type your password for the user postgres and you are in.

my-mac:~ $ REGION=us-central1
my-mac:~ $ CLUSTER_NAME=alloydb-aip-01
my-mac:~ $ INSTANCE_NAME=alloydb-aip-01-pr
my-mac:~ $ gcloud beta alloydb connect $INSTANCE_NAME --cluster=$CLUSTER_NAME --region=$REGION --public-ip
Starting the AlloyDB Auth Proxy...
Running command:
 alloydb-auth-proxy projects/gleb-genai-002/locations/us-central1/clusters/alloydb-aip-01/instances/alloydb-aip-01-pr --port 9471 --public-ip

Connecting to the AlloyDB Auth Proxy...
Running command:
 psql -h 127.0.0.1 -p 9471 -U postgres -d postgres
Password for user postgres:
psql (18.0 (Postgres.app), server 16.11)
Type "help" for help.

postgres=>

A few final notes: You must be authenticated with an account that has the proper permissions to connect to AlloyDB instances, specifically the roles/alloydb.cloent and roless/serviceUsageConsumer roles. Additionally, if you want to use IAM authentication, ensure it is enabled on your AlloyDB cluster, that you have the roles/alloydb.databaseUser role, and that you have created the IAM user within the cluster itself.

As of the time of writing, this feature is still in Preview. And once more — you may need to update your Google Cloud SDK to version 563.0.0 or higher. For more details on available flags and configurations, refer to the official documentation regarding connecting via the gcloud CLI.

Happy testing! You can try it with some of our latest AlloyDB codelabs.

Talk to Your Data: Analyze Data in AlloyDB Using Natural Language

Gleb Otochkin — Sat, 07 Feb 2026 05:22:03 +0000

SQL is language for your data

If you are working with databases as an analyst or developer you are probably quite familiar with SQL, or Structured Query Language. This is the language you use to work with data, extract and aggregate information, analyze and build the entire database backend. The language itself is easy and difficult at the same time depending on what you want to do and how complicated your data schema is. Modern AI models can translate natural language requests to SQL queries relatively well but the devil is in the details. Did I mention that SQL in Oracle can be slightly different from SQL in Postgres? And to write a good SQL query working correctly with your data you most likely need to dig down into the data, understand dependencies between tables and columns and how to combine them together to achieve the results.

So, how to make it reliable and make sure the AI model knows enough to make it working? That’s the main topic of this blog. AlloyDB has a new set of functionswhich can help you to create complex SQL queries using a natural language request.

Components

I am starting with some basic components and functions required to enable NL2SQL (Natural Language To SQL) capabilities in AlloyDB. Just to be on the safe side — at the time when the blog is written the feature is still in preview and some things can be changed in the final version.

The functionality is provided by the alloydb_ai_nl extension. You can read in the documentation how to enable the extension and make it working in your database.

Once the feature is enabled you can create a basic configuration by one simple command using alloydb_ai_nl.g_create_configuration function. That provides you basic functionality and you can already try to generate queries using alloydb_ai_nl.get_sql function. In this mode AlloyDB translates your natural language query to a SQL primarily based on tables and columns names making logical connections between different relations. That’s not too bad and you might get some results out of the box. But … if your data are not in the “public” schema then it is not useful since by default it checks only tables and views in that “public” schema. And what would happen if you have similar table names or same name columns in different tables? In such a case you really need to know your data. So, we have to give more information about the data and tables layout to the AlloyDB natural language processing.

Let us dive into the process and go through the general steps to make the best out of the feature.

The first step as we’ve already mentioned is to create a configuration using the alloydb_ai_nl.g_create_configuration function. That will create some kind of container for all future information about our data.

Then we register our schema in the configuration using alloydb_ai_nl.g_manage_configuration function. And when I say “schema” I mean the Postgres schema where you create your database objects. By default it is “public” but if you are serious about data separation and access you might use a dedicated schema for your application tables, indexes and other objects. Here is how you register schemas ecomm and public for the natural language configuration. In the example we have named our natural language configuration as cymbal_ecomm_config.

SELECT
  alloydb_ai_nl.g_manage_configuration(
    operation => 'register_schema',
    configuration_id_in => 'cymbal_ecomm_config',
    schema_names_in => '{ecomm,public}'
  );

We can configure one or multiple schemas in the same configuration. It can be useful when you want to build cross schema analytical queries for example.

When you register a schema in the configuration it is getting first knowledge about your data and can build queries based on that information. It will try to build the query based primarily on the tables metadata but it might not be enough. To make it more reliable and accurate we need to check the actual contents of the tables and their dependencies.

To build that information layer about your data we create a schema context. In the automatic mode it will analyze all your tables and columns in the registered schema trying to understand dependencies and what exactly is stored there. That is done using the alloydb_ai_nl.generate_schema_context function. Here is an example of generating context for our cymbal_ecomm_config configuration.

SELECT
  alloydb_ai_nl.generate_schema_context(
    nl_config_id => 'cymbal_ecomm_config',
    overwrite_if_exist => TRUE
  );

After the execution, which can take some time, the generated information will be stored in the internal tables but not yet applied. You can review it before applying using the alloydb_ai_nl.generated_schema_context_view.

SELECT schema_object, object_context
FROM alloydb_ai_nl.generated_schema_context_view;

And you can be more specific and read the generated context for a particular table or a column. For example, if we want to get information about the ecomm.events table we can run the following.

SELECT
  object_context
FROM
  alloydb_ai_nl.generated_schema_context_view
WHERE
  schema_object = 'ecomm.events';

If you are not satisfied with the result you can update the context for the table using the alloydb_ai_nl.update_generated_relation_context function. In my experience in most of the cases the automatically generated context is mostly correct and doesn’t require additional correction.

Then you can choose what context you want to be used for the query generation. You might choose to apply all of it or, for example, only for a particular table. Here is an example of how to apply it only for the ecomm.events table.

SELECT alloydb_ai_nl.apply_generated_relation_context(
  relation_name => 'ecomm.events', 
  overwrite_if_exist => TRUE
);

You already noticed optional parameter overwrite_if_exist for the managing context functions. It commands to replace any existing context by the new one. It helps to redefine context from time to time making it better.

After applying the context it disappears from the alloydb_ai_nl.generated_schema_context_view and starts to be used for all the new queries generations.

By the way you also can add your custom application context based on your internal domestic knowledge about queries patterns and conditions. You can read about it more in the documentation.

That can be sufficient for some applications but what if we have some particular queries patterns where we use some domestic functions or maybe certain predicates to be used? In such a case you might look at the query templates. A query template can be added to the configuration based on the natural language intent and define the query structure to be used to get reliable and deterministic execution for the known query patterns specific for your business. Query templates support intent parametrization and query fragments to make it more flexible.

There are functions in the allydb_ai_nl extension to manage the query templates and fragments. Here is an example of how to add a query template:

SELECT alloydb_ai_nl.add_template(
    nl_config_id => 'cymbal_ecomm_config',
    intent => 'List the last names and the country of all customers who bought products of `Republic Outpost` in the last year.',
    sql => 'SELECT DISTINCT u."last_name", u."country" FROM "ecomm"."users" AS u INNER JOIN "ecomm"."order_items" AS oi ON u.id = oi."user_id" INNER JOIN "ecomm"."products" AS ep ON oi.product_id = ep.id WHERE ep.brand = ''Republic Outpost'' AND oi.created_at >= DATE_TRUNC(''year'', CURRENT_DATE - INTERVAL ''1 year'') AND oi.created_at < DATE_TRUNC(''year'', CURRENT_DATE)',
    sql_explanation => 'To answer this question, JOIN `ecomm.users` with `ecom.order_items` on having the same `users.id` and `order_items.user_id`, and JOIN the result with ecom.products on having the same `order_items.product_id` and `products.id`. Then filter rows with products.brand = ''Republic Outpost'' and by `order_items.created_at` for the last year. Return the `last_name` and the `country` of the users with matching records.',
    check_intent => TRUE
);

Or disable the query template

SELECT alloydb_ai_nl.disable_template(INPUT template_id);

And you can automatically generate query templates based on your query history using alloydb_ai_nl.generate_templates functions.

SELECT
  alloydb_ai_nl.generate_templates(
    'cymbal_ecomm_config',
);

In addition to all that configuration options you also can create value indexes based on samples of your data in the tables. The value index provides associations between column name plus values in the column and a concept type which can be for example a city, country or name. It can associate a value used in the natural language request with a potential concept type and what table and column can be used in the resulting SQL. So if somebody asks “How many Clades do we have?” — the value index can help to figure out that the “Clades” in the request is the brand name, not a name for a product. You can get more information about concepts types and value indexes in the guide.

If you combine all those components together it can help you to avoid uncertainty and make the natural language to SQL reliable and predictable. At the same time the fragments and deep knowledge of your data helps to generate queries for deep analysis and still avoid disambiguation.

Summary

The AlloyDB NL2SQL makes your natural language to SQL processing robust and enterprise ready, preventing disambiguation and saving from the non-deterministic nature of AI models. At the same time it is still flexible enough to make it useful and dynamic, helping data analysts to dig in through data generating reports and helping with analysis.

You can try it now and let us know what you think. Start from the Google Cloud codelab and test all the listed NL2SQL features in your own project. Happy testing.

AlloyDB Agentic RAG Application with MCP Toolbox [Part 2]

Gleb Otochkin — Tue, 25 Nov 2025 21:18:50 +0000

This is Part 2 of the AlloyDB Agentic RAG application codelab, please start with Part 1.

7. Deploy the MCP Toolbox to Cloud Run

Now we can deploy the MCP Toolbox to Cloud Run. There are different ways how the MCP toolbox can be deployed. The simplest way is to run it from the command line but if we want to have it as a scalable and reliable service then Cloud Run is a better solution.

Prepare Client ID

To use booking functionality of the application we need to prepare OAuth 2.0 Client ID using Cloud Console. Without it we cannot sign into the application with our Google credentials to make a booking and record the booking to the database.

In the Cloud Console go to the APIs and Services and click on "OAuth consent screen". Here is a link to the page. It will open the Oauth Overview page where we click Get Started.

On the next page we provide the application name, user support email and click Next.

On the next screen we choose Internal for our application and click Next again.

Then again we provide contact email and click Next

Then we agree with Google API services policies and push the Create button.

It will lead us to the page where we can create an OAuth client.

On the screen we choose "Web Application" from the dropdown menu, put "Cymbal Air" as application and push the Add URI button.

The URIs represent trusted sources for the application and they depend on where you are trying to reach the application from. We put "http://localhost:8081" as authorized URI and "http://localhost:8081/login/google" as redirect URI. Those values would work if you put in your browser "http://localhost:8081" as a URI for connection. For example, when you connect through an SSH tunnel from your computer for example. I will show you how to do it later.

After pushing the "Create" button you get a popup window with your clients credentials. And the credentials will be recorded in the system. You always can copy the client ID to be used when you start your application.

Later you will see where you provide that client ID.

Create Service Account

We need a dedicated service account for our Cloud Run service with all required privileges. For our service we need access to AlloyDB and Cloud Secret Manager. As for the name for the service account we are going to use toolbox-identity.

Open another Cloud Shell tab using the sign "+" at the top.

In the new cloud shell tab execute:

export PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create toolbox-identity

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/alloydb.client"
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

Please pay attention if you have any errors. The command is supposed to create a service account for cloud run service and grant privileges to work with secret manager, database and Vertex AI.

Close the tab by either pressing ctrl+d or executing command "exit" in the tab:

exit

Prepare MCP Toolbox Configuration

Prepare configuration file for the MCP Toolbox. You can read about all configuration options in the documentation but here we are going to use the sample tools.yaml file and replace some values such as cluster and instance name, AlloyDB password and the project id by our actual values.

Export AlloyDB Password:

export PGPASSWORD=<noted AlloyDB password>

Export client ID we prepared in the previous step:

export CLIENT_ID=<noted OAuth 2.0 client ID for our application>

Prepare configuration file.

PROJECT_ID=$(gcloud config get-value project)
ADBCLUSTER=alloydb-aip-01
sed -e "s/project: retrieval-app-testing/project: $(gcloud config get-value project)/g" \
-e "s/cluster: my-alloydb-cluster/cluster: $ADBCLUSTER/g" \
-e "s/instance: my-alloydb-instance/instance: $ADBCLUSTER-pr/g" \
-e "s/password: postgres/password: $PGPASSWORD\\n    ipType: private/g" \
-e "s/^ *clientId: .*/    clientId: $CLIENT_ID/g" \
cymbal-air-toolbox-demo/tools.yaml >~/tools.yaml

If you look into the file section defining the target data source you will see that we also added a line to use private IP for connection.

sources:
  my-pg-instance:
    kind: alloydb-postgres
    project: gleb-test-short-003-471020
    region: us-central1
    cluster: alloydb-aip-01
    instance: alloydb-aip-01-pr
    database: assistantdemo
    user: postgres
    password: L23F...
    ipType: private
authServices:
  my_google_service:
    kind: google
    clientId: 96828*******-***********.apps.googleusercontent.com

Create a secret using the tools.yaml configuration as a source.

In the VM ssh console execute:

gcloud secrets create tools --data-file=tools.yaml

Expected console output:

student@instance-1:~$ gcloud secrets create tools --data-file=tools.yaml
Created version [1] of the secret [tools].

Deploy the MCP Toolbox as a Cloud Run Service

Now everything is ready to deploy the MCP Toolbox as a service to Cloud Run. For local testing you can run "./toolbox –tools-file=./tools.yaml" but if we want our application to run in the cloud the deployment in Cloud Run makes much more sense.

In the VM SSH session execute:

export IMAGE=us-central1-docker.pkg.dev/database-toolbox/toolbox/toolbox:latest
gcloud run deploy toolbox \
    --image $IMAGE \
    --service-account toolbox-identity \
    --region us-central1 \
    --set-secrets "/app/tools.yaml=tools:latest" \
    --args="--tools-file=/app/tools.yaml","--address=0.0.0.0","--port=8080" \
    --network default \
    --subnet default \
    --no-allow-unauthenticated

Expected console output:

student@instance-1:~$ export IMAGE=us-central1-docker.pkg.dev/database-toolbox/toolbox/toolbox:latest
gcloud run deploy toolbox \
    --image $IMAGE \
    --service-account toolbox-identity \
    --region us-central1 \
    --set-secrets "/app/tools.yaml=tools:latest" \
    --args="--tools-file=/app/tools.yaml","--address=0.0.0.0","--port=8080" \
    --network default \
    --subnet default \
    --no-allow-unauthenticated
Deploying container to Cloud Run service [toolbox] in project [gleb-test-short-002-470613] region [us-central1]
✓ Deploying new service... Done.                                                                                                                                                                                                
  ✓ Creating Revision...                                                                                                                                                                                                        
  ✓ Routing traffic...                                                                                                                                                                                                          
Done.                                                                                                                                                                                                                           
Service [toolbox] revision [toolbox-00001-l9c] has been deployed and is serving 100 percent of traffic.
Service URL: https://toolbox-868691532292.us-central1.run.app

student@instance-1:~$

Verify The Service

Now we can check if the service is up and we can access the endpoint. We use gcloud utility to get the retrieval service endpoint and the authentication token. Alternatively you can check the service URI in the cloud console.

You can copy the value and replace in the curl command the "$(gcloud run services list –filter="(toolbox)" –format="value(URL)" part .

Here is how to get the URL dynamically from the command line:

curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" $(gcloud  run services list --filter="(toolbox)" --format="value(URL)")

Expected console output:

student@instance-1:~$ curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" $(gcloud  run services list --filter="(toolbox)" --format="value(URL)")
🧰 Hello, World! 🧰student@instance-1:~$

If we see the "Hello World" message it means our service is up and serving the requests.

8. Deploy Sample Application

Now when we have the retrieval service up and running we can deploy a sample application. The application represents an online airport assistant which can give you information about flights, airports and even book a flight based on the flights and airport data from our database.

The application can be deployed locally, on a VM in the cloud or any other service like Cloud Run or Kubernetes. Here we are going to show how to deploy it on the VM first.

Prepare the environment

We continue to work on our VM using the same SSH session. To run our application we need some Python modules and we have already added them when we initiated our database earlier. Let's switch to our Python virtual environment and change our location to the app directory.

In the VM SSH session execute:

source ~/.venv/bin/activate
cd cymbal-air-toolbox-demo

Expected output (redacted):

student@instance-1:~$ source ~/.venv/bin/activate
cd cymbal-air-toolbox-demo
(.venv) student@instance-1:~/cymbal-air-toolbox-demo$

Run Assistant Application

Before starting the application we need to set up some environment variables. The basic functionality of the application such as query flights and airport amenities requires only TOOLBOX_URL which points application to the retrieval service. We can get it using the gcloud command .

In the VM SSH session execute:

export TOOLBOX_URL=$(gcloud  run services list --filter="(toolbox)" --format="value(URL)")

Expected output (redacted):

student@instance-1:~/cymbal-air-toolbox-demo$ export BASE_URL=$(gcloud  run services list --filter="(toolbox)" --format="value(URL)")

To use more advanced capabilities of the application like booking and changing flights we need to sign-in to the application using our Google account and for that purpose we need to provide CLIENT_ID environment variable using the OAuth client ID from the Prepare Client ID chapter:

export CLIENT_ID=215....apps.googleusercontent.com

Expected output (redacted):

student@instance-1:~/cymbal-air-toolbox-demo$ export CLIENT_ID=215....apps.googleusercontent.com

And now we can run our application:

python run_app.py

Expected output:

student@instance-1:~/cymbal-air-toolbox-demo/llm_demo$ python run_app.py
INFO:     Started server process [2900]
INFO:     Waiting for application startup.
Loading application...
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8081 (Press CTRL+C to quit)

Connect to the Application

You have several ways to connect to the application running on the VM. For example you can open port 8081 on the VM using firewall rules in the VPC or create a load balancer with public IP. Here we are going to use a SSH tunnel to the VM translating the local port 8080 to the VM port 8081.

Connecting From Local Machine

When we want to connect from a local machine we need to run a SSH tunnel. It can be done using gcloud compute ssh:

gcloud compute ssh instance-1 --zone=us-central1-a -- -L 8081:localhost:8081

Expected output:

student-macbookpro:~ student$ gcloud compute ssh instance-1 --zone=us-central1-a -- -L 8080:localhost:8081
Warning: Permanently added 'compute.7064281075337367021' (ED25519) to the list of known hosts.
Linux instance-1.us-central1-c.c.gleb-test-001.internal 6.1.0-21-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
student@instance-1:~$

Now we can open the browser and use http://localhost:8081 to connect to our application. We should see the application screen.

Connecting From Cloud Shell

Alternatively we can use Google Cloud Shell to connect. Open another Cloud Shell tab using the sign "+" at the top.

In the new tab get the origin and redirect URI for your web client executing the gcloud command:

echo "origin:"; echo "https://8080-$WEB_HOST"; echo "redirect:"; echo "https://8080-$WEB_HOST/login/google"

Here is the expected output:

student@cloudshell:~ echo "origin:"; echo "https://8080-$WEB_HOST"; echo "redirect:"; echo "https://8080-$WEB_HOST/login/google"
origin:
https://8080-cs-35704030349-default.cs-us-east1-rtep.cloudshell.dev
redirect:
https://8080-cs-35704030349-default.cs-us-east1-rtep.cloudshell.dev/login/google

And use the origin and the redirect of URIs as the "Authorized JavaScript origins" and "Authorized redirect URIs" for our credentials created in the "Prepare Client ID" chapter replacing or adding to the originally provided http://localhost:8080 values.

Click on "Cymbal Air" on the OAuth 2.0 client IDs page.

Put the origin and redirect URIs for the Cloud Shell and push the Save button.

In the new cloud shell tab start the tunnel to your VM by executing the gcloud command:

gcloud compute ssh instance-1 --zone=us-central1-a -- -L 8080:localhost:8081

If it will show an error "Cannot assign requested address" - please ignore it.

Here is the expected output:

student@cloudshell:~ gcloud compute ssh instance-1 --zone=us-central1-a -- -L 8080:localhost:8081
bind [::1]:8081: Cannot assign requested address
inux instance-1.us-central1-a.c.gleb-codelive-01.internal 6.1.0-21-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat May 25 19:15:46 2024 from 35.243.235.73
student@instance-1:~$

It opens port 8080 on your cloud shell which can be used for the "Web preview".

Click on the "Web preview" button on the right top of your Cloud Shell and from the drop down menu choose "Preview on port 8080"

It opens a new tab in your web browser with the application interface. You should be able to see the "Cymbal Air Customer Service Assistant" page.

Sign into the Application

When everything is set up and your application is open we can use the "Sign in" button at the top right of our application screen to provide our credentials. That is optional and required only if you want to try booking functionality of the application.

It will open a pop-up window where we can choose our credentials.

After signing in the application is ready and you can start to post your requests into the field at the bottom of the window.

This demo showcases the Cymbal Air customer service assistant. Cymbal Air is a fictional passenger airline. The assistant is an AI chatbot that helps travelers to manage flights and look up information about Cymbal Air's hub at San Francisco International Airport (SFO).

Without signing in (without CLIENT_ID) it can help answer users questions like:

When is the next flight to Denver?

Are there any luxury shops around gate C28?

Where can I get coffee near gate A6?

Where can I buy a gift?

Please find a flight from SFO to Denver departing today

When you are signed in to the application you can try other capabilities like booking flights or check if the seat assigned to you is a window or aisle seat.

The application uses the latest Google foundation models to generate responses and augment it by information about flights and amenities from the operational AlloyDB database. You can read more about this demo application on the Github page of the project.

9. Clean up environment

Now when all tasks are completed we can clean up our environment

Delete Cloud Run Service

In Cloud Shell execute:

gcloud run services delete toolbox --region us-central1

Expected console output:

student@cloudshell:~ (gleb-test-short-004)$ gcloud run services delete retrieval-service --region us-central1
Service [retrieval-service] will be deleted.

Do you want to continue (Y/n)?  Y

Deleting [retrieval-service]...done.                                                                                                                                                                                                                 
Deleted service [retrieval-service].

Delete the Service Account for cloud run service

In Cloud Shell execute:

PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts delete toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com --quiet

Expected console output:

student@cloudshell:~ (gleb-test-short-004)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-222]
student@cloudshell:~ (gleb-test-short-004)$ gcloud iam service-accounts delete retrieval-identity@$PROJECT_ID.iam.gserviceaccount.com --quiet
deleted service account [retrieval-identity@gleb-test-short-004.iam.gserviceaccount.com]
student@cloudshell:~ (gleb-test-short-004)$

Destroy the AlloyDB instances and cluster when you are done with the lab.

Delete AlloyDB cluster and all instances

If you've used the trial version of AlloyDB. Do not delete the trial cluster if you have plans to test other labs and resources using the trial cluster. You will not be able to create another trial cluster in the same project.

The cluster is destroyed with option force which also deletes all the instances belonging to the cluster.

In the cloud shell define the project and environment variables if you've been disconnected and all the previous settings are lost:

gcloud config set project <your project id>

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
export PROJECT_ID=$(gcloud config get-value project)

Delete the cluster:

gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force

📝 Note: The command takes 3-5 minutes to execute

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force

All of the cluster data will be lost when the cluster is deleted.

Do you want to continue (Y/n)?  Y

Operation ID: operation-1697820178429-6082890a0b570-4a72f7e4-4c5df36f
Deleting cluster...done.

Delete AlloyDB Backups

Delete all AlloyDB backups for the cluster:

📝 Note: The command will destroy all data backups for the cluster with name specified in environment variable

for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done
Operation ID: operation-1697826266108-60829fb7b5258-7f99dc0b-99f3c35f
Deleting backup...done.

Now we can destroy our VM

Delete GCE VM

In Cloud Shell execute:

export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet
Deleted

Delete the Service Account for GCE VM and The Retrieval service

In Cloud Shell execute:

PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts delete compute-aip@$PROJECT_ID.iam.gserviceaccount.com --quiet

Expected console output:

student@cloudshell:~ (gleb-test-short-004)$ PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts delete compute-aip@$PROJECT_ID.iam.gserviceaccount.com --quiet
Your active configuration is: [cloudshell-222]
deleted service account [compute-aip@gleb-test-short-004.iam.gserviceaccount.com]
student@cloudshell:~ (gleb-test-short-004)$

10. Congratulations

Congratulations for completing the codelab.

What we've covered

✅ How to deploy AlloyDB Cluster
✅ How to connect to the AlloyDB
✅ How to configure and deploy MCP Toolbox Service
✅ How to deploy a sample application using the deployed service

AlloyDB Agentic RAG Application with MCP Toolbox [Part 1]

Gleb Otochkin — Tue, 25 Nov 2025 21:17:28 +0000

1. Introduction

In this codelab, you will learn how to create an AlloyDB cluster, deploy the MCP toolbox, and configure it to use AlloyDB as a data source. You'll then build a sample interactive RAG application that uses the deployed toolbox to ground its requests.

You can get more information about the MCP Toolbox on the documentation page and the sample Cymbal Air application here.

This lab is part of a lab collection dedicated to AlloyDB AI features. You can read more on the AlloyDB AI page in documentation and see other labs.

Prerequisites

A basic understanding of the Google Cloud Console
Basic skills in command line interface and Google Cloud shell

What you'll learn

✅ How to deploy AlloyDB Cluster with Vertex AI integration
✅ How to connect to the AlloyDB
✅ How to configure and deploy MCP Tooolbox Service
✅ How to deploy a sample application using the deployed service

What you'll need

A Google Cloud Account and Google Cloud Project
A web browser such as Chrome

2. Setup and Requirements

Self-paced environment setup

Sign-in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project.
For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.

⚠️ Caution: A project ID is globally unique and can't be used by anyone else after you've selected it. You are the only user of that ID. Even if a project is deleted, the ID can't be used again

📝 Note: If you use a Gmail account, you can leave the default location set to No organization. If you use a Google Workspace

Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

From the Google Cloud Console, click the Cloud Shell icon on the top right toolbar:

It should only take a few moments to provision and connect to the environment. When it is finished, you should see something like this:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.

3. Before you begin

Enable API

Output:

Please be aware that some resources you enable are going to incur some cost if you are not using the promotional tier. In normal circumstances if all the resources are destroyed upon completion of the lab the cost of all resources would not exceed $5. We recommend checking your billing and making sure the exercise is acceptable for you.

Inside Cloud Shell, make sure that your project ID is setup:

Usually the project ID is shown in parentheses in the command prompt in the cloud shell as it is shown in the picture:

gcloud config set project [YOUR-PROJECT-ID]

Then set the PROJECT_ID environment variable to your Google Cloud project ID:

PROJECT_ID=$(gcloud config get-value project)

Enable all necessary services:

gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       vpcaccess.googleapis.com \
                       aiplatform.googleapis.com \
                       cloudbuild.googleapis.com \
                       artifactregistry.googleapis.com \
                       run.googleapis.com \
                       iam.googleapis.com \
                       secretmanager.googleapis.com

Expected output

student@cloudshell:~ (gleb-test-short-004)$ gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       vpcaccess.googleapis.com \
                       aiplatform.googleapis.com \
                       cloudbuild.googleapis.com \
                       artifactregistry.googleapis.com \
                       run.googleapis.com \
                       iam.googleapis.com \
                       secretmanager.googleapis.com
Operation "operations/acf.p2-404051529011-664c71ad-cb2b-4ab4-86c1-1f3157d70ba1" finished successfully.

4. Deploy AlloyDB Cluster

Create AlloyDB cluster and primary instance. The following procedure describes how to create an AlloyDB cluster and instance using Google Cloud SDK. If you prefer the console approach you can follow the documentation here.

Before creating an AlloyDB cluster we need an available private IP range in our VPC to be used by the future AlloyDB instance. If we don't have it then we need to create it, assign it to be used by internal Google services and after that we will be able to create the cluster and instance.

Create private IP range

📝 Note: This step is required only if you don't already have an unused private IP range assigned to work with Google internal services.

We need to configure Private Service Access configuration in our VPC for AlloyDB. The assumption here is that we have the "default" VPC network in the project and it is going to be used for all actions.

Create the private IP range:

gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default

Create private connection using the allocated IP range:

gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default

📝 Note: The second command takes a couple of minutes to execute

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default
Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/global/addresses/psa-range].

student@cloudshell:~ (test-project-402417)$ gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default
Operation "operations/pssn.p24-4470404856-595e209f-19b7-4669-8a71-cbd45de8ba66" finished successfully.

student@cloudshell:~ (test-project-402417)$

Create AlloyDB Cluster

In this section we are creating an AlloyDB cluster in the us-central1 region.

Define password for the postgres user. You can define your own password or use a random function to generate one

export PGPASSWORD=`openssl rand -hex 12`

Expected console output:

student@cloudshell:~ (test-project-402417)$ export PGPASSWORD=`openssl rand -hex 12`

Note the PostgreSQL password for future use.

echo $PGPASSWORD

You will need that password in the future to connect to the instance as the postgres user. I suggest writing it down or copying it somewhere to be able to use later.

Expected console output:

student@cloudshell:~ (test-project-402417)$ echo $PGPASSWORD
bbefbfde7601985b0dee5723

Create a Free Trial Cluster

If you haven't been using AlloyDB before you can create a free trial cluster:

Define region and AlloyDB cluster name. We are going to use us-central1 region and alloydb-aip-01 as a cluster name:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01

Run command to create the cluster:

gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION \
    --subscription-type=TRIAL

Expected console output:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION \
    --subscription-type=TRIAL
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.

Create an AlloyDB primary instance for our cluster in the same cloud shell session. If you are disconnected you will need to define the region and cluster name environment variables again.

📝 Note: The instance creation usually takes 6-10 minutes to complete

gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=8 \
    --region=$REGION \
    --cluster=$ADBCLUSTER

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=8 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.

Create AlloyDB Standard Cluster

If it is not your first AlloyDB cluster in the project proceed with creation of a standard cluster.

Define region and AlloyDB cluster name. We are going to use us-central1 region and alloydb-aip-01 as a cluster name:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01

Run command to create the cluster:

gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION

Expected console output:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION 
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.

Create an AlloyDB primary instance for our cluster in the same cloud shell session. If you are disconnected you will need to define the region and cluster name environment variables again.

📝 Note: The instance creation usually takes 6-10 minutes to complete

gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --cluster=$ADBCLUSTER

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.

Grant Necessary Permissions to AlloyDB

Add Vertex AI permissions to the AlloyDB service agent.

Open another Cloud Shell tab using the sign "+" at the top.

In the new cloud shell tab execute:

PROJECT_ID=$(gcloud config get-value project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-11039]
student@cloudshell:~ (test-project-001-402417)$ gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
Updated IAM policy for project [test-project-001-402417].
bindings:
- members:
  - serviceAccount:service-4470404856@gcp-sa-alloydb.iam.gserviceaccount.com
  role: roles/aiplatform.user
- members:
...
etag: BwYIEbe_Z3U=
version: 1

Close the tab by either execution command "exit" in the tab:

exit

5. Prepare GCE Virtual Machine

We are going to use a Google Compute Engine (GCE) VM as our platform to work with the database and deploy different parts of the sample application. Using a VM gives us more flexibility in installed components and direct access to the private AlloyDB IP for data preparation steps.

Create Service Account

Since we will use our VM to deploy the MCP Toolbox as a service and deploy or host the sample application, the first step is to create a Google Service Account (GSA). The GSA will be used by the GCE VM, and we will need to grant it the necessary privileges to work with other services.

In the Cloud Shell execute:

PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create compute-aip --project $PROJECT_ID

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/cloudbuild.builds.editor"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/run.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/alloydb.viewer"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/alloydb.client"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com \
    --role roles/secretmanager.admin
Deploy GCE VM
Create a GCE VM in the same region and VPC as the AlloyDB cluster.

In Cloud Shell execute:

ZONE=us-central1-a
PROJECT_ID=$(gcloud config get-value project)
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --create-disk=auto-delete=yes,boot=yes,image=projects/debian-cloud/global/images/$(gcloud compute images list --filter="family=debian-12 AND family!=debian-12-arm64" --format="value(name)") \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --service-account=compute-aip@$PROJECT_ID.iam.gserviceaccount.com

Expected console output:

student@cloudshell:~ (test-project-402417)$ ZONE=us-central1-a
PROJECT_ID=$(gcloud config get-value project)
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --create-disk=auto-delete=yes,boot=yes,image=projects/debian-cloud/global/images/$(gcloud compute images list --filter="family=debian-12 AND family!=debian-12-arm64" --format="value(name)") \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --service-account=compute-aip@$PROJECT_ID.iam.gserviceaccount.com
Your active configuration is: [cloudshell-10282]
Created [https://www.googleapis.com/compute/v1/projects/gleb-test-short-002-470613/zones/us-central1-a/instances/instance-1].
NAME: instance-1
ZONE: us-central1-a
MACHINE_TYPE: n1-standard-1
PREEMPTIBLE: 
INTERNAL_IP: 10.128.0.2
EXTERNAL_IP: 34.28.55.32
STATUS: RUNNING

Install Postgres Client

Install the PostgreSQL client software on the deployed VM

Connect to the VM:

🗒️ Note: First time the SSH connection to the VM can take longer since the process includes creation of RSA key for secure connection and propagating the public part of the key to the project

gcloud compute ssh instance-1 --zone=us-central1-a

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud compute ssh instance-1 --zone=us-central1-a
Updating project ssh metadata...working..Updated [https://www.googleapis.com/compute/v1/projects/test-project-402417].                                                                                                                                                         
Updating project ssh metadata...done.                                                                                                                                                                                                                                              
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.5110295539541121102' (ECDSA) to the list of known hosts.
Linux instance-1 5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
student@instance-1:~$

Install the software running command inside the VM:

sudo apt-get update
sudo apt-get install --yes postgresql-client

Expected console output:

student@instance-1:~$ sudo apt-get update
sudo apt-get install --yes postgresql-client
Get:1 file:/etc/apt/mirrors/debian.list Mirrorlist [30 B]
Get:4 file:/etc/apt/mirrors/debian-security.list Mirrorlist [39 B]
Hit:7 https://packages.cloud.google.com/apt google-compute-engine-bookworm-stable InRelease
Get:8 https://packages.cloud.google.com/apt cloud-sdk-bookworm InRelease [1652 B]
Get:2 https://deb.debian.org/debian bookworm InRelease [151 kB]
Get:3 https://deb.debian.org/debian bookworm-updates InRelease [55.4 kB]
...redacted...
update-alternatives: using /usr/share/postgresql/15/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (15+248) ...
Processing triggers for man-db (2.11.2-2) ...
Processing triggers for libc-bin (2.36-9+deb12u7) ...

Connect to the AlloyDB Instance

Connect to the primary instance from the VM using psql.

Continue with the opened SSH session to your VM. If you have been disconnected then connect again using the same command as above.

Use the previously noted $PGASSWORD and the cluster name to connect to AlloyDB from the GCE VM:

export PGPASSWORD=<Noted password>

PROJECT_ID=$(gcloud config get-value project)
REGION=us-central1
ADBCLUSTER=alloydb-aip-01
INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres sslmode=require"

Expected console output:

student@instance-1:~$ PROJECT_ID=$(gcloud config get-value project)
REGION=us-central1
ADBCLUSTER=alloydb-aip-01
INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres sslmode=require"
psql (15.13 (Debian 15.13-0+deb12u1), server 16.8)
WARNING: psql major version 15, server major version 16.
         Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=>

Exit from the psql session keeping the SSH connection up:

exit

Expected console output:

postgres=> exit
student@instance-1:~$

6. Initialize the database

We are going to use our client VM as a platform to populate our database with data and host our application. The first step is to create a database and populate it with data.

Create Database

Create a database with the name "assistantdemo".

In the GCE VM session execute:

📝 Note: If your SSH session was terminated you need to reset your environment variables such as:

export PGPASSWORD=

export REGION=us-central1

export ADBCLUSTER=alloydb-aip-01

export INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")

psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE assistantdemo"

Expected console output:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE assistantdemo"
CREATE DATABASE
student@instance-1:~$

Prepare Python Environment

To continue we are going to use prepared Python scripts from GitHub repository but before doing that we need to install the required software.

In the GCE VM execute:

sudo apt install -y python3.11-venv git
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip

Expected console output:

student@instance-1:~$ sudo apt install -y python3.11-venv git
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  git-man liberror-perl patch python3-distutils python3-lib2to3 python3-pip-whl python3-setuptools-whl
Suggested packages:
  git-daemon-run | git-daemon-sysvinit git-doc git-email git-gui gitk gitweb git-cvs git-mediawiki git-svn ed diffutils-doc
The following NEW packages will be installed:
  git git-man liberror-perl patch python3-distutils python3-lib2to3 python3-pip-whl python3-setuptools-whl python3.11-venv
0 upgraded, 9 newly installed, 0 to remove and 2 not upgraded.
Need to get 12.4 MB of archives.
After this operation, 52.2 MB of additional disk space will be used.
Get:1 file:/etc/apt/mirrors/debian.list Mirrorlist [30 B]
...redacted...
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-24.0
(.venv) student@instance-1:~$

Verify Python version.

In the GCE VM execute:

python -V

Expected console output:

(.venv) student@instance-1:~$ python -V
Python 3.11.2
(.venv) student@instance-1:~$

Install MCP Toolbox Locally

MCP Toolbox for Databases (later in the text MCP toolbox or toolbox) is an open source MCP server working with different data sources. It helps you to develop tools faster by providing a level of abstraction for different data sources and adding features like authentication and connection pooling. You can read about all the features on the official page.

We are going to use the MCP toolbox to initiate our sample dataset and later to be used as MCP server to handle data source requests from our application during Retrieval Augmented Generation (RAG) flow.

Let's install the MCP toolbox locally to populate the assistantdemo database.

In the GCE VM execute:

export VERSION=0.16.0
curl -O https://storage.googleapis.com/genai-toolbox/v$VERSION/linux/amd64/toolbox
chmod +x toolbox

Expected console output:

(.venv) student@instance-1:~$ export VERSION=0.16.0
curl -O https://storage.googleapis.com/genai-toolbox/v$VERSION/linux/amd64/toolbox
chmod +x toolbox
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  133M  100  133M    0     0   158M      0 --:--:-- --:--:-- --:--:--  158M

Run Toolbox for Data Initialization

In the GCE VM execute:

📝Note: If your SSH session was terminated by inactivity or any other reason you need to set your environment variables such as:

export PGPASSWORD=

REGION=us-central1

ADBCLUSTER=alloydb-aip-01

INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")

Export environment variables for database population:

export ALLOYDB_POSTGRES_PROJECT=$(gcloud config get-value project)
export ALLOYDB_POSTGRES_REGION="us-central1"
export ALLOYDB_POSTGRES_CLUSTER="alloydb-aip-01"
export ALLOYDB_POSTGRES_INSTANCE="alloydb-aip-01-pr"
export ALLOYDB_POSTGRES_DATABASE="assistantdemo"
export ALLOYDB_POSTGRES_USER="postgres"
export ALLOYDB_POSTGRES_PASSWORD=$PGPASSWORD
export ALLOYDB_POSTGRES_IP_TYPE="private"

Start toolbox for the database initiation. It will start the process locally which will help you to connect seamlessly to the destination database on AlloyDB to fill it up with sample data.

./toolbox --prebuilt alloydb-postgres

Expected console output. You should see in the last line of the output - "Server ready to serve!":

student@instance-1:~$ cexport ALLOYDB_POSTGRES_PROJECT=$PROJECT_ID
export ALLOYDB_POSTGRES_REGION="us-central1"
export ALLOYDB_POSTGRES_CLUSTER="alloydb-aip-01"
export ALLOYDB_POSTGRES_INSTANCE="alloydb-aip-01-pr"
export ALLOYDB_POSTGRES_DATABASE="assistantdemo"
export ALLOYDB_POSTGRES_USER="postgres"
export ALLOYDB_POSTGRES_PASSWORD=$PGPASSWORD
export ALLOYDB_POSTGRES_IP_TYPE="private"
student@instance-1:~$ ./toolbox --prebuilt alloydb-postgres
2025-09-02T18:30:58.957655886Z INFO "Using prebuilt tool configuration for alloydb-postgres" 
2025-09-02T18:30:59.507306664Z INFO "Initialized 1 sources." 
2025-09-02T18:30:59.50748379Z INFO "Initialized 0 authServices." 
2025-09-02T18:30:59.507618807Z INFO "Initialized 2 tools." 
2025-09-02T18:30:59.507726704Z INFO "Initialized 2 toolsets." 
2025-09-02T18:30:59.508258894Z INFO "Server ready to serve!"

Do not exit or close this tab of the Cloud Shell until data population is complete.

Populate Database

Open another Cloud Shell tab using the sign "+" at the top.

And connect to the instance-1 VM:

gcloud compute ssh instance-1 --zone=us-central1-a

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud compute ssh instance-1 --zone=us-central1-a
Linux instance-1 6.1.0-37-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.140-1 (2025-05-22) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Tue Sep  2 21:44:07 2025 from 35.229.111.9
student@instance-1:~$

Clone the GitHub repository with the code for the retrieval service and sample application.

In the GCE VM execute:

git clone  https://github.com/GoogleCloudPlatform/cymbal-air-toolbox-demo.git

Expected console output:

student@instance-1:~$ git clone  https://github.com/GoogleCloudPlatform/cymbal-air-toolbox-demo.git
Cloning into 'cymbal-air-toolbox-demo'...
remote: Enumerating objects: 3481, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 3481 (delta 16), reused 7 (delta 5), pack-reused 3434 (from 3)
Receiving objects: 100% (3481/3481), 57.96 MiB | 6.04 MiB/s, done.
Resolving deltas: 100% (2549/2549), done.
student@instance-1:~

Please pay attention if you have any errors.

Prepare Python environment and install requirement packages:

source .venv/bin/activate
cd cymbal-air-toolbox-demo
pip install -r requirements.txt

Set Python path to the repository root folder and run script to populate the database with the sample dataset. The first command is adding a path to our Python modules to our environment and the second command is populating our database with the data.

export PYTHONPATH=$HOME/cymbal-air-toolbox-demo
python data/run_database_init.py

Expected console output(redacted). You should see "database init done" at the end:

student@instance-1:~$ source .venv/bin/activate
(.venv) student@instance-1:~$ 
(.venv) student@instance-1:~$ cd cymbal-air-toolbox-demo/
(.venv) student@instance-1:~/cymbal-air-toolbox-demo$ pip install -r requirements.txt
python run_database_init.py
Collecting fastapi==0.115.0 (from -r requirements.txt (line 1))
  Downloading fastapi-0.115.0-py3-none-any.whl.metadata (27 kB)
Collecting google-auth==2.40.3 (from -r requirements.txt (line 2))
  Downloading google_auth-2.40.3-py2.py3-none-any.whl.metadata (6.2 kB)
Collecting google-cloud-aiplatform==1.97.0 (from google-cloud-aiplatform[evaluation]==1.97.0->-r requirements.txt (line 3))
  Downloading google_cloud_aiplatform-1.97.0-py2.py3-none-any.whl.metadata (36 kB)
Collecting itsdangerous==2.2.0 (from -r requirements.txt (line 4))
  Downloading itsdangerous-2.2.0-py3-none-any.whl.metadata (1.9 kB)
Collecting jinja2==3.1.5 (from -r requirements.txt (line 5))
  Downloading jinja2-3.1.5-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-community==0.3.25 (from -r requirements.txt (line 6))
  Downloading langchain_community-0.3.25-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain==0.3.25 (from -r requirements.txt (line 7))
...

(.venv) student@instance-1:~/cymbal-air-toolbox-demo$ 
(.venv) student@instance-1:~/cymbal-air-toolbox-demo$ export PYTHONPATH=$HOME/cymbal-air-toolbox-demo
python data/run_database_init.py
Airports table initialized
Amenities table initialized
Flights table initialized
Tickets table initialized
Policies table initialized
database init done.
(.venv) student@instance-1:~/cymbal-air-toolbox-demo$

You can close this tab now.

In the VM session execute:

exit

And in the Cloud Shell session press ctrl+d or execute :

exit

In the first tab with running MCP Toolbox press ctrl+c in to exit from the toolbox running session.

The database has been populated with sample data for the application.

You can verify it by connecting to the database and checking the number of rows in the airports table. You can use the psql utility as we've used before or AlloyDB Studio . here is how you can check it using psql

In the ssh session to instance-1 VM execute:

export PGPASSWORD=<Noted AlloyDB password>

REGION=us-central1
ADBCLUSTER=alloydb-aip-01
INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres dbname=assistantdemo" -c "SELECT COUNT(*) FROM airports"

Expected console output:

student@instance-1:~$ REGION=us-central1
ADBCLUSTER=alloydb-aip-01
INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres dbname=assistantdemo" -c "SELECT COUNT(*) FROM airports"
 count 
-------
  7698
(1 row)

The database is ready and we can move on to MCP Toolbox deployment.

You've completed Part 1 of the AlloyDB Agentic RAG application codelab, please continue to Part 2.

Cloud SQL vs. Specialized Databases: Choosing Your Vector Search Solution

Gleb Otochkin — Wed, 29 Oct 2025 04:32:59 +0000

Introduction

The popularity of vector search has exploded with the introduction of Generative AI, where it serves as the main engine for semantic search.

This demand initially led to specialized vector databases like Pinecone, Milvus and others, but now, most mainstream databases have also incorporated vector search capabilities.

This leaves developers with a critical decision: adopt a new, specialized database or use their existing one? The wrong choice can lead to poor performance and costly refactoring. This post explains why the best solution is often the simplest one — leveraging your current relational database.

In this post I explain when you don’t need a specialized vector solution and your relational database might be a better choice. I will use Cloud SQL for MySQL as one of the relational databases with full vector support.

Simplified architecture and operations

Using your existing database like Cloud SQL for MySQL for vectors greatly simplifies your architecture and daily operations for three key reasons.

First, unified data management means your vectors live with your operational data. This streamlines everything from security and backups to monitoring, eliminating the need to manage a separate database and a complex data pipeline.
Second, you use standard tools. There are no new SDKs or languages to learn, as you can manage vectors using the standard SQL you already know.
Finally, this creates a flat learning curve. Your team can adopt vector search immediately without having to learn a new distributed system or syntax.

Data consistency

Data in applications are not static and often the source data for the vector embeddings are subject to change too. When it happens the corresponding vector has to be updated.

If you use Cloud SQL as your vector store you can do it in the same transaction inserting, deleting or updating the corresponding data and the vector together. It will eliminate any possible discrepancy between the vector and the data it represents. If a transaction is rolled back, data consistency stays intact.
When your vectors are in the Cloud SQL for MySQL, your application doesn’t need to wait for a data pipeline (ETL) to sync changes from your application database to your vector store. This removes a potential point of failure and eliminates data lag. You always have the latest data and vectors for your search.
With a separate store for vectors you need to capture the changes in the database and transfer those changes to the vector database or introduce the vector store to your application making it part of the application. That might create additional problems synchronizing different application services caches or correct handling of rollbacks on application level.

Hybrid Search

One of the main benefits of keeping the data and vectors together is the ability to use filtering and hybrid searches using non-vectorized data.

For example, if you search a product based on a product description using embedding vectors for the description you can also introduce a filter on the product brand or combine it with other preferences from a user profile.
Pre-filtering combines B-Tree indexes on other columns with vector search, reducing the search space for the vectors. It can drastically improve performance and required memory.
In some cases if you get data from a vector search and try to apply post-filtering they can remove bulk of the returned dataset and effectively reduce the number of returned values in some cases to zero.

Lower Total Cost of Ownership

This is more of a business reason than a technical one, but that doesn’t make it any less important. Moving your vectors into Cloud SQL for MySQL can significantly reduce your bill. Here is the reasoning behind that statement:

No data transfer cost. All your data is in the same place in the same database and you don’t need to keep a pipeline and additional resources.
Consolidation of resources. All your data is stored, backed up and managed in the same environment. Instead of provisioning a new database just for vectors, you can utilize your existing resources.
Reduced engineering hours. Your engineers spend less time learning, deploying and maintaining separate systems to keep your vectors.

When a Vector Database is a better choice

Cloud SQL for MySQL can be one of the better choices to store and work with your vectors but sometimes it might be prudent to consider a specialized vector database.

Some advanced vector-specific features provided by some of the vector databases which are not available on the Cloud SQL. For example, the concept of namespaces in Pinecone can be appealing to some workloads.
A massive scale with billions of vectors. In such a case one of the specialized solutions like the Google Vector Search might be a more feasible destination.
Sometimes decoupling makes sense when the vectors are serving different applications in microservices deployments.

Want to know more?

This is just a first blog from a series of articles about vector search in Cloud SQL written by me and my colleagues. If you want to know more about KNN and ANN and what stands behind it please read a blog Vector Search: Demystifying ANN and KNN written by Shu Zhou.

Automating AlloyDB Operations

Gleb Otochkin — Sat, 30 Aug 2025 02:58:21 +0000

Introduction

One of the best features of cloud services is the management API. Let’s imagine you need to implement some automated tasks. For example, you want your database instance, or multiple instances to stop at certain times, scale up, or do something else like start an on-demand backup. In the case of an in-house deployment, you need to program everything by yourself from start to finish. And believe me, I’ve done it. It may sound easy, but it is not.

However, virtually every cloud service comes with an API to handle all those tasks. That API is documented, aligned with what the service can do, and in some cases, it also has a client SDK. AlloyDB is no exception, and it has a documented API that can be used for automation. In one of my previous blogs, I’ve written how to scale up primary instances using CPU monitoring. Here, I am going to show how you can automate some other tasks.

APIs

The AlloyDB API documentation is available on the main reference page. There, you can find there reverence for version v1, v1beta and shared types such as “Date” and others.

Expanding the API reference might seem overwhelming at first with its long list of resources and types.

In reality, it’s quite straightforward. To help visualize the structure, let’s create a graph of the main AlloyDB API resources.

At a high level, the AlloyDB resource hierarchy begins with a Project, which contains one or more Locations. Nested under a specific location is the Cluster, which in turn contains resources like Instances, Backups, and Users.

All changes to these resources are done through Operations. Any request that modifies a resource, such as creating an instance or a backup, triggers an operation that you can monitor.

While this is a simplified view, it covers the basics. For this post, we’ll focus on just a few of these key resources and show how to manage them using Go and the REST API.

Cluster

In a project we can have one unique cluster per location. You cannot have two clusters with the same name in the same region. So, to define a cluster or clusters we want to modify we have to specify a project and a certain location as root resources. In case of an AlloyDB cluster the location will be represented by a region. You can get the location resource definition here in the reference. Here is an example of how the location resource can be defined in the code.

// List of available locations
type Locations struct {
 Locations []Location `json:"locations"`
}

type Location struct {
 Name string `json:"name"`
 LocationId string `json:"locationId"`
 DisplayName string `json:"displayName"`
}

...
// Get list of all locations instances for clusters with defined name in all locations
   locationsURL := fmt.Sprintf("%s/projects/%s/locations", apiURL, project)

   resp, err := client.Get(locationsURL)
   if err != nil {
    return nil, fmt.Errorf("failed to get all locations for project %s: %v", project, err)
   }
...
// Get list of locations 
locations := Locations{}
   err = json.Unmarshal(locationsListBody, &locations)
   if err != nil {
    return nil, fmt.Errorf("failed to unmarshal all locations: %v", err)
   }

The examples in this post use the Go language to make direct HTTP requests to the API. You can find the full source code for the Cloud Run function here.

Once we know the location we can define our cluster using cluster name as a parameter. For all clusters in the project we use an alias ALL and it will tell us to use “-” in the request URL to define all clusters.

Our sample code focuses on instance management, so we don’t explicitly define a Cluster type (or struct) in Go. Instead, we simply use the cluster’s name directly in the request URL to target the correct resources.

However, if you were performing actions on the cluster itself, like creating a new one, you would need to define that Cluster type in your code to properly structure the API request. You can find the full description of the cluster API resource in the documentation.

Instance

Each cluster can have one or more instances where one instance is the primary and the rest would belong to one of the read pools. Some operations like backup require the primary instance to be available. To create a proper request body for the instance it is defined as a type or struct in the Go code.

Here is how we would define the instances in the code.

// List of AlloyDB instances from API response
type Instances struct {
 Instances []Instance `json:"instances"`
}

// A single instance from the list
type Instance struct {
 Name string `json:"name"`
 DisplayName string `json:"displayName"`
 Uid string `json:"uid"`
 CreateTime string `json:"createTime"`
 UpdateTime string `json:"updateTime"`
 DeleteTime string `json:"deleteTime"`
 State string `json:"state"`
 InstanceType string `json:"instanceType"`
 Description string `json:"description"`
 IpAddress string `json:"ipAddress"`
 Reconciling bool `json:"reconciling"`
 Etag string `json:"etag"`
}

We have two types — one for instance itself and the other for a list of instances where each member is a single instance. It helps when we want to perform an operation on all instances of a cluster. Now, let’s talk about operations or what we can do with an instance.

Operations

What can we do with an instance? We can create, delete, change shape, and, more recently, start and stop them. You can read about starting and stopping instances in one of my previous blogs. For a complete list of all available methods, please refer to the reference documentation.

To illustrate, let’s delete an instance using curl. This is done by sending a DELETE request to the instance URL.

For this example, we’ll assume the following details:

Project ID: test-project-123
Region: us-central1
Cluster Name: my-cluster
Instance Name: my-instance

Given these parameters, the request would look like this:

curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" https://alloydb.googleapis.com/v1beta/projects/test-project-123/locations/us-central1/clusters/my-cluster/instances/my-instance

You noticed that I used an OAuth token to authenticate my request. That command works if you have gcloud SDK on your machine and authenticated in the cloud.

When you post such a request it will return an id of the operation triggered by that request. You can monitor the operation status using a “GET” request to the operations endpoint.

It’s also worth mentioning the failover operation, which is useful for managing High Availability (HA) instances by allowing you to switch between zones.

There are other AlloyDB resources such as backups and users which you can include to your tool and automate but now I want to focus on the way you initiate one or another operation.

Cloud Function

Technically you could use Google Cloud Scheduler to send HTTPS requests directly to AlloyDB API endpoints, but this approach has limitations. One of such limitations is that you often don’t know the exact name of a resource in advance.

For example, if you want to stop all AlloyDB instances in a project, you first need to query the API to get a list of those instances before you can send a ‘stop’ request for each one. The same is true for managing backups, where you must know a backup’s unique name to interact with it.

A more flexible solution is to use a serverless function (like Cloud Functions or Cloud Run) that is triggered by a Pub/Sub message. The message payload can dynamically specify the desired action, the target resources, and any other parameters you need.

Here is an example of what such a message payload might look like:

{
    "project": "test-project-123",
    "location": "us-central1",
    "operation": "STOP",
    "cluster": "my-cluster",
    "instance": "my-instance",
    "retention": 0
}

This message initiates a STOP operation on the my-instance instance of the my-cluster AlloyDB cluster in the us-central1 region.

The retention field here is for a future implementation of backup management, where you can specify a retention period for your manual backups.

In the code, the message would be represented by structs like the following:

type PubSubMessage struct {
 Data []byte `json:"data"`
}
type Parameters struct {
 Project string `json:"project"`
 Location string `json:"location"`
 Operation string `json:"operation"`
 Cluster string `json:"cluster"`
 Instance string `json:"instance"`
 Retention int `json:"retention"`
}

When we create a function, we specify an EventArc trigger that will invoke the function whenever a Pub/Sub message is published to a topic. The web console interface allows us to create the Pub/Sub topic at the same time we define the trigger for the function.

After defining all the metadata for the function, we can add the source code. As a reminder, the sample code is available for download from GitHub.

Now, whenever you publish a message to the alloydb-mgmt-topic using the JSON format discussed earlier, you can start, stop, scale, or delete a specific instance or all instances in a project. This can also be combined with monitoring to, for example, scale an instance up or down, as was described in one of the previous blogs.

Summary

AlloyDB service in Google Cloud gives you a great start with automated services and features that require minimal management. However, as your business grows and requires unique features, the AlloyDB API provides the ability to manage and automate all kinds of tasks based on your requirements.

Try the sample function code with the REST API and HTTP requests and also check the examples in the previously published blogs about start and stop automation and vertical autoscaling. Google also provides a Go SDK for AlloyDB along with SDK for other languages. Please try the code and let me know if you would like to see a version of the sample function that is based on the Go SDK.

B-tree indexes for JSON in PostgreSQL

Gleb Otochkin — Thu, 17 Jul 2025 02:28:46 +0000

Introduction

PostgreSQL is a swiss knife of databases — it supports all kinds of data. But different data types create different challenges. If you are using PostgreSQL to store JSON data you probably have heard about potential performance problems of querying JSON data. The general rule of thumb is to use JSONB data types when you can. But sometimes you want to use JSON not JSONB to preserve some information, like duplicated keys, maybe some null values or order of keys in the JSON. But how can we query the data more efficiently if we use JSON? In this blog I will talk about using B-tree indexes on JSON data.

JSON and B tree indexes

Let’s start from the basics and talk about JSON in Postgres. It can be stored either as a JSON data type where JSON is stored as a text or as a JSONB in a binary format. You could hear that you need JSONB to use indexes for your data. That’s not entirely true. While JSONB has more options for indexes, the JSON data type still supports some indexing. For example, you can create a B-Tree index for your known JSON keys and that index will be similar in behaviour to any other B-Tree indexes on expression in Postgres.

Here is an example of how you can create one.

I have a table jproducts with two columns — id and product where id is a primary key and the product is a JSON data type. The table has about 29 000 rows.

testdb=> \d ecomm.jproducts
              Table "ecomm.jproducts"
 Column | Type | Collation | Nullable | Default
---------+--------+-----------+----------+---------
 id | bigint | | not null |
 product | json | | |
Indexes:
    "jproducts_pkey" PRIMARY KEY, btree (id)

testdb=>

And here is the list of keys in my JSON:

          key | value_type
------------------------+------------
 brand | string
 category | string
 cost | number
 department | string
 distribution_center_id | number
 id | number
 name | string
 product_description | string
 product_image_uri | string
 retail_price | number
 sku | string

Let’s say we know that our application is going to filter data by brand and it is going to be executed often enough. If we run the query without an index then we can see it takes about 42 ms.

testdb=> explain analyze select product->>'department' from ecomm.jproducts where product @> jsonb_build_object('brand','Victor');
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Gather (cost=1000.00..7143.47 rows=2036 width=32) (actual time=0.310..41.958 rows=178 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on jproducts (cost=0.00..5939.87 rows=848 width=32) (actual time=0.068..39.653 rows=59 loops=3)
         Filter: ((product ->> 'brand'::text) = 'Victor'::text)
         Rows Removed by Filter: 9647
 Planning Time: 0.051 ms
 Execution Time: 41.998 ms
(8 rows)

Even if 42ms looks reasonable enough it can create some performance problems when it scales to hundreds of requests per second. What if we create an index for the key ‘brand’?

create index jproducts_brand on ecomm.jproducts using btree ((product->>'brand'));

Now if we execute the same query we can see the index is used:

testdb=> explain analyze select product->>'department' from ecomm.jproducts where product->>'brand' = 'Victor';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..488.04 rows=146 width=32) (actual time=0.051..0.481 rows=178 loops=1)
   Recheck Cond: ((product ->> 'brand'::text) = 'Victor'::text)
   Heap Blocks: exact=164
   -> Bitmap Index Scan on jproducts_brand (cost=0.00..5.38 rows=146 width=0) (actual time=0.024..0.024 rows=178 loops=1)
         Index Cond: ((product ->> 'brand'::text) = 'Victor'::text)
 Planning Time: 0.153 ms
 Execution Time: 0.455 ms
(7 rows)

Our new index reduces the execution time almost by 100 times. That is a significant performance boost. But will the index be used for any operation when we work with the ‘brand’ key? Let’s change our query a bit and use the LIKE operator or the UPPER function.

testdb=> explain analyze select product->>'department' from ecomm.jproducts where product->>'brand' like '%Victor%';
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on jproducts (cost=0.00..3550.82 rows=9 width=32) (actual time=0.079..42.731 rows=178 loops=1)
   Filter: ((product ->> 'brand'::text) ~~'%Victor%'::text)
   Rows Removed by Filter: 28942
 Planning Time: 0.053 ms
 Execution Time: 42.768 ms
(5 rows)

testdb=> explain analyze select product->>'department' from ecomm.jproducts where upper(product->>'brand') = 'VICTOR';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on jproducts (cost=0.00..3623.96 rows=146 width=32) (actual time=1.221..54.989 rows=178 loops=1)
   Filter: (upper((product ->> 'brand'::text)) = 'VICTOR'::text)
   Rows Removed by Filter: 28942
 Planning Time: 0.052 ms
 Execution Time: 55.028 ms
(5 rows)

testdb=>

The index is not used and it resembles exactly the same behaviour as for any other B-tree indexes on expressions. It has to be able to search in its binary tree using exactly the same expression as specified during the index creation. If we want the index to be used for a query with the UPPER function then the function has to be defined in the index.

create index jproducts_brand_upper on ecomm.jproducts using btree (UPPER(product->>'brand'));

And then the index will indeed be used if we have a query with the UPPER function.

testdb=> explain analyze select product->>'department' from ecomm.jproducts where upper(product->>'brand') = 'VICTOR';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..488.41 rows=146 width=32) (actual time=0.090..0.910 rows=178 loops=1)
   Recheck Cond: (upper((product ->> 'brand'::text)) = 'VICTOR'::text)
   Heap Blocks: exact=164
   -> Bitmap Index Scan on jproducts_brand_upper (cost=0.00..5.38 rows=146 width=0) (actual time=0.023..0.023 rows=178 loops=1)
         Index Cond: (upper((product ->> 'brand'::text)) = 'VICTOR'::text)
 Planning Time: 0.149 ms
 Execution Time: 0.952 ms
(7 rows)

testdb=>

Let’s talk about planner and stats for the index. Does PostgreSQL gather statistics for the keys in the JSON, and would it impact the planner’s decision on whether or not to use the index? I analyzed the table before creating our index but didn’t do it after.

Let’s check the planner’s expectation about the number of rows for the brands ‘Victor’ and ‘Verona Q’ .

testdb=> explain select product->>'department' from ecomm.jproducts where product->>'brand' = 'Victor';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..488.04 rows=146 width=32)
   Recheck Cond: ((product ->> 'brand'::text) = 'Victor'::text)
   -> Bitmap Index Scan on jproducts_brand (cost=0.00..5.38 rows=146 width=0)
         Index Cond: ((product ->> 'brand'::text) = 'Victor'::text)
(4 rows)

testdb=> explain select product->>'department' from ecomm.jproducts where product->>'brand' = 'Verona Q';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..488.04 rows=146 width=32)
   Recheck Cond: ((product ->> 'brand'::text) = 'Verona Q'::text)
   -> Bitmap Index Scan on jproducts_brand (cost=0.00..5.38 rows=146 width=0)
         Index Cond: ((product ->> 'brand'::text) = 'Verona Q'::text)
(4 rows)

testdb=>

It seems that the planner assumes the number of rows is roughly the same and equal to 146. By default PostgreSQL doesn’t gather statistics for keys in your JSON but it can do it for expressions or when you create an index on expression. We have created the index and we can help our planner by adding the statistics.

Let’s analyze the table and repeat one of our queries.

testdb=> analyze ecomm.jproducts;
ANALYZE
testdb=> explain select product->>'department' from ecomm.jproducts where product->>'brand' = 'Verona Q';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=16.30..2162.20 rows=1034 width=32)
   Recheck Cond: ((product ->> 'brand'::text) = 'Verona Q'::text)
   -> Bitmap Index Scan on jproducts_brand (cost=0.00..16.04 rows=1034 width=0)
         Index Cond: ((product ->> 'brand'::text) = 'Verona Q'::text)
(4 rows)

testdb=> select count(*) from ecomm.jproducts where product->>'brand' = 'Verona Q';
 count
-------
  1034
(1 row)

Now the planner knows exactly how many rows of each particular brand we have and that can help to make the right decision. It might prefer the sequential scan (seq_scan) operation sometimes because it can be less expensive if cardinality (number of rows returned by the query block) is too high. The reason for that is the cost of querying using seq_scan is by default four times less costly than the random scan used for index. You can see (and change) it using the following parameters.

testdb=> show seq_page_cost ;
 seq_page_cost
---------------
 1
(1 row)

testdb=> show random_page_cost ;
 random_page_cost
------------------
 4
(1 row)

We can demonstrate it using an index on product->department key.

testdb=> create index jproducts_department on ecomm.jproducts using btree ((product->>'department'));
CREATE INDEX
Time: 93.934 ms
testdb=> explain select product->>'brand' from ecomm.jproducts where product->>'department' = 'Women';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..491.90 rows=146 width=32)
   Recheck Cond: ((product ->> 'department'::text) = 'Women'::text)
   -> Bitmap Index Scan on jproducts_department (cost=0.00..5.38 rows=146 width=0)
         Index Cond: ((product ->> 'department'::text) = 'Women'::text)
(4 rows)

Time: 63.887 ms
testdb=> analyze ecomm.jproducts;
ANALYZE
Time: 166.026 ms
testdb=> explain select product->>'brand' from ecomm.jproducts where product->>'department' = 'Women';
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on jproducts (cost=0.00..3869.77 rows=15989 width=32)
   Filter: ((product ->> 'department'::text) = 'Women'::text)
(2 rows)

Time: 58.981 ms
testdb=>

You can see that after creating the index we have the same default assumption about the number of rows for each department and the index is chosen to get the data. But as soon as we analyzed the table again it updated the stats and recognized that more than half of the table needs to be scanned. The overall cost for our index scan is higher than the sequential scan for the table. This is because we’re retrieving most of our table’s pages, and an index scan has a higher cost per page. As a result, the planner chooses to use sequential scan and ignore the index.

That’s great but what would we do if we don’t know what keys would be used in our application queries? Can we create an index for each key? We can but it might not be the best decision. Each index generates significant overhead for all operations on the data. Every inserted, deleted or updated row should update the indexes and then later be a subject of the vacuuming process. Here is a simple example of impact on insert from our two indexes.

-- Without any indexes on JSON
testdb=> insert into ecomm.jproducts select id, to_json(t) from products t;
INSERT 0 29120
Time: 954.380 ms
testdb=>

-- Creating indexes 
testdb=> create index jproducts_brand on ecomm.jproducts using btree ((product->>'brand'));
CREATE INDEX
Time: 69.097 ms
testdb=> create index jproducts_department on ecomm.jproducts using btree ((product->>'department'));
CREATE INDEX
Time: 63.080 ms
testdb=>

-- Insert with two indexes
testdb=> insert into ecomm.jproducts select id, to_json(t) from products t;
INSERT 0 29120
Time: 3008.885 ms (00:03.009)
testdb=>

Just two indexes increased insert time by 3 times and it is not taking into consideration the full impact from any deletes or updates, which will result in additional vacuuming of obsolete tuples.

Conclusion

Let’s recap what we’ve discussed here.

You can use index on expressions for your JSON data when you know what JSON keys your application or queries are going to use.
The indexes follow the same rules as any other b-tree indexes on expressions — your query should use a compatible expression.
You need to analyze your table after creating an index to have correct statistics for your indexed keys to help planner to make correct decisions about using the index.
Be mindful of index overhead and potential impact on your DML and maintenance operations. Create only indexes you really need.

What if you don’t know what keys are going to be used in your application and maybe don’t know what new keys can appear in your JSON data? Then maybe it makes sense to look into JSONB data type and GIN index. And that is what we are going to discuss in the next blog. Stay tuned.