DEV Community

Gleb Otochkin

Posted on • Originally published at Medium on

State-of-the-art text embedding in AlloyDB with the latest Gemini model

I think most readers are aware of large language models (LLMs) and their ability to work with the semantic meaning of a phrase rather than its exact wording. An embedding is a numeric representation of that meaning in the form of a numeric array (a vector). Those arrays can be compared with one another, and the difference between them is called the “distance”. The smaller the distance between two embeddings, the more similar the semantic meaning of the corresponding phrases.
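
To make the idea of distance concrete, here is a minimal Python sketch of cosine distance, the metric computed by the pgvector <=> operator used in the queries later in this post. The three-dimensional vectors are made-up toy values, not real model output; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", chosen only to illustrate the comparison.
query = [0.9, 0.1, 0.0]
similar_phrase = [0.8, 0.2, 0.1]
unrelated_phrase = [0.0, 0.2, 0.9]

print("similar:  ", cosine_distance(query, similar_phrase))
print("unrelated:", cosine_distance(query, unrelated_phrase))
```

The vector pointing in nearly the same direction as the query gets a distance close to 0, while the one pointing elsewhere gets a distance close to 1.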

There are many different embedding models around. Are they all the same? Of course not. Some work better than others, and there are different ways to compare their quality. For example, the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard ranks hundreds of embedding models, and it is worth a look.

Google recently released its latest embedding model, the Gemini Embedding text model. You can read about the model in the Google blog.

And the model is available right now in AlloyDB. You can use the standard embedding function to call the new model out of the box. Let’s briefly check the new model.

How to call the embedding model

You can call the model using something like the following:

select embedding('gemini-embedding-001', 'What is AlloyDB?');

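It works out of the box and returns the result for a single request at about the same speed as the current text embedding model, text-embedding-005. The exact timing depends on the request itself and on other factors that affect communication between the instance and the model endpoint. Note that in the output below nearly all of the elapsed time is reported as Planning Time rather than Execution Time, presumably because a call with constant arguments is evaluated while the query is being planned.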

quickstart_db=> explain analyze select embedding('text-embedding-005', 'What is AlloyDB?');
QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 87.099 ms
Execution Time: 0.012 ms
(3 rows)
Time: 87.827 ms
quickstart_db=> explain analyze select embedding('gemini-embedding-001', 'What is AlloyDB?');
QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 77.402 ms
Execution Time: 0.016 ms
(3 rows)
Time: 78.135 ms
quickstart_db=>

Keep in mind that gemini-embedding-001 returns a 3072-dimensional vector by default, versus 768 dimensions for text-embedding-005. That means you need the vector(3072) column data type for the new model, and the embeddings consume more space.

quickstart_db=> create table t1 as select id, my_text, embedding('text-embedding-005', my_text) from t0;
SELECT 2000
quickstart_db=> SELECT pg_size_pretty(pg_total_relation_size('t1'));
pg_size_pretty
----------------
8496 kB
(1 row)
Time: 1.030 ms
quickstart_db=> create table t2 as select id, my_text, embedding('gemini-embedding-001', my_text) from t0;
SELECT 2000
quickstart_db=> SELECT pg_size_pretty(pg_total_relation_size('t2'));
pg_size_pretty
----------------
28 MB
(1 row)
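
As a rough cross-check on the measured sizes, the Python sketch below estimates the raw vector payload for the 2000 rows. It assumes each dimension is stored as a 4-byte float and ignores array headers, the my_text column, and any page or TOAST overhead, so it is only a lower bound. The function name is made up for illustration.

```python
BYTES_PER_FLOAT4 = 4  # assumption: one 4-byte float per dimension
ROWS = 2000           # row count reported by "SELECT 2000" above


def raw_vector_bytes(dimensions, rows=ROWS):
    """Estimate bytes used by the vector values alone, without any overhead."""
    return dimensions * BYTES_PER_FLOAT4 * rows


for model, dims in (("text-embedding-005", 768), ("gemini-embedding-001", 3072)):
    mib = raw_vector_bytes(dims) / (1024 * 1024)
    print(f"{model}: {dims} dimensions, about {mib:.1f} MiB of raw vector data")
```

Under those assumptions the raw payload comes to about 5.9 MiB versus 23.4 MiB, a fourfold difference, which is in the same range as the measured 8496 kB and 28 MB once the text column and storage overhead are added.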

What about the quality of the response? Of course, the previously mentioned MTEB gives standardized and comprehensive data, but what about real-life experience? Let’s try to find an answer to one of the most important questions in the universe.

Battle of Bagels

I am creating a table expression with some basic information about bagel vendors and asking which of them make the truest bagels.

First we try it using the text-embedding-005 model and a short random selection of bagel shops.

WITH bagels_vendors(brand, description, location) AS (
  VALUES
    ('Brooklyn Bagel & Coffee Company', 'Offers large, hand-rolled bagels with a good balance of chewy and soft texture.', 'Multiple locations in Manhattan and Brooklyn'),
    ('Ess-a-Bagel', 'Offers large, chewy, and dense bagels, considered a classic New York style.', 'Midtown East, Manhattan'),
    ('Tompkins Square Bagels', 'Popular spot with a wide variety of creative and classic bagel flavors and toppings.', 'East Village, Manhattan'),
    ('St-Viateur Bagel', 'Iconic Montreal bagel shop since 1957, known for hand-rolled bagels baked in a wood-fired oven.', 'Montreal, Quebec'),
    ('Fairmount Bagel', 'A long-standing Montreal institution since 1949, famous for its slightly denser and wood-fired bagels.', 'Montreal, Quebec'),
    ('Bagel Etc.', 'A popular Montreal spot since 1982, known for its neon-and-vinyl diner aesthetic and bagel-centric breakfast.', 'Montreal, Quebec')
),
combined_bagels (brand, description, location,text_embedding_005) AS (
  SELECT brand, description, location,embedding('text-embedding-005','brand name: '||brand||' description: '||description)::vector
    FROM bagels_vendors
)
select brand, location,text_embedding_005 <=> embedding ('text-embedding-005','The only correct way to make bagels')::vector as distance from combined_bagels order by distance;

              brand              |                   location                   |      distance
---------------------------------+----------------------------------------------+---------------------
 Brooklyn Bagel & Coffee Company | Multiple locations in Manhattan and Brooklyn | 0.3357732955292745
 Ess-a-Bagel                     | Midtown East, Manhattan                      | 0.34294438809905525
 Tompkins Square Bagels          | East Village, Manhattan                      | 0.36784331284309657
 Fairmount Bagel                 | Montreal, Quebec                             | 0.3744778134972202
 St-Viateur Bagel                | Montreal, Quebec                             | 0.3906381759225491
 Bagel Etc.                      | Montreal, Quebec                             | 0.42936727199347535

You can see that the top three places are all taken by New York style bagels. What if we replace the model with the new gemini-embedding-001?

WITH bagels_vendors(brand, description, location) AS (
  VALUES
    ('Brooklyn Bagel & Coffee Company', 'Offers large, hand-rolled bagels with a good balance of chewy and soft texture.', 'Multiple locations in Manhattan and Brooklyn'),
    ('Ess-a-Bagel', 'Offers large, chewy, and dense bagels, considered a classic New York style.', 'Midtown East, Manhattan'),
    ('Tompkins Square Bagels', 'Popular spot with a wide variety of creative and classic bagel flavors and toppings.', 'East Village, Manhattan'),
    ('St-Viateur Bagel', 'Iconic Montreal bagel shop since 1957, known for hand-rolled bagels baked in a wood-fired oven.', 'Montreal, Quebec'),
    ('Fairmount Bagel', 'A long-standing Montreal institution since 1949, famous for its slightly denser and wood-fired bagels.', 'Montreal, Quebec'),
    ('Bagel Etc.', 'A popular Montreal spot since 1982, known for its neon-and-vinyl diner aesthetic and bagel-centric breakfast.', 'Montreal, Quebec')
),
combined_bagels (brand, description, location,gemini_embedding_001) AS (
  SELECT brand, description, location,embedding('gemini-embedding-001','brand name: '||brand||' description: '||description)::vector
    FROM bagels_vendors
)
select brand, location,gemini_embedding_001 <=> embedding('gemini-embedding-001','The only correct way to make bagels')::vector as distance from combined_bagels order by distance;

And here the Montreal style bagels move up a bit. They do not take first place, but St-Viateur Bagel claims an honorable third place.

              brand              |                   location                   |      distance
---------------------------------+----------------------------------------------+---------------------
 Ess-a-Bagel                     | Midtown East, Manhattan                      | 0.32417611253786394
 Brooklyn Bagel & Coffee Company | Multiple locations in Manhattan and Brooklyn | 0.35497361421585083
 St-Viateur Bagel                | Montreal, Quebec                             | 0.3588242530822754
 Fairmount Bagel                 | Montreal, Quebec                             | 0.3619239710614687
 Tompkins Square Bagels          | East Village, Manhattan                      | 0.3629311159767006
 Bagel Etc.                      | Montreal, Quebec                             | 0.40006683469017823

We still have the same two shops in first and second place, but in a different order and with a bigger gap in distance between them. The gap might look small, but in some real applications that difference can change behaviour. In my testing with a couple of other datasets, it felt like the new model gave better results. And of course you should definitely test it on your own data.
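
The comparison can be reproduced outside the database. The Python sketch below copies the distances from the two result sets in this post and prints the top three brands and the gap between first and second place for each model. The helper names are made up for illustration.

```python
# Cosine distances copied from the two query results in this post.
text_embedding_005 = {
    "Brooklyn Bagel & Coffee Company": 0.3357732955292745,
    "Ess-a-Bagel": 0.34294438809905525,
    "Tompkins Square Bagels": 0.36784331284309657,
    "Fairmount Bagel": 0.3744778134972202,
    "St-Viateur Bagel": 0.3906381759225491,
    "Bagel Etc.": 0.42936727199347535,
}
gemini_embedding_001 = {
    "Ess-a-Bagel": 0.32417611253786394,
    "Brooklyn Bagel & Coffee Company": 0.35497361421585083,
    "St-Viateur Bagel": 0.3588242530822754,
    "Fairmount Bagel": 0.3619239710614687,
    "Tompkins Square Bagels": 0.3629311159767006,
    "Bagel Etc.": 0.40006683469017823,
}


def ranking(distances):
    """Return brands ordered from the smallest to the largest distance."""
    return sorted(distances, key=distances.get)


def top_gap(distances):
    """Return the distance gap between the first and second ranked brands."""
    first, second = sorted(distances.values())[:2]
    return second - first


for model, distances in (
    ("text-embedding-005", text_embedding_005),
    ("gemini-embedding-001", gemini_embedding_001),
):
    print(model, ranking(distances)[:3], f"gap={top_gap(distances):.4f}")
```

With these numbers the gap between the top two shops grows from roughly 0.007 with text-embedding-005 to roughly 0.031 with gemini-embedding-001.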

Try it out

In my opinion, Montreal style bagels are the best and should be among the top-rated bagels. The new Gemini embedding model rates them better than the old one does, and it is of course right.

Try it with your data and see if you can get more accurate results using the new Gemini embedding model in AlloyDB for your text embedding needs today. And if you want some hands-on experience with embeddings, try one of the embeddings codelabs for AlloyDB.

