Cloud SQL vs. Specialized Databases: Choosing Your Vector Search Solution

#data #googlecloudsql #rags #vectordatabase

Introduction

The popularity of vector search has exploded with the introduction of Generative AI, where it serves as the main engine for semantic search.

This demand initially led to specialized vector databases like Pinecone, Milvus and others, but now, most mainstream databases have also incorporated vector search capabilities.

This leaves developers with a critical decision: adopt a new, specialized database or use their existing one? The wrong choice can lead to poor performance and costly refactoring. This post explains why the best solution is often the simplest one — leveraging your current relational database.

In this post I explain when you don’t need a specialized vector solution and your relational database might be a better choice. I will use Cloud SQL for MySQL as one of the relational databases with full vector support.

Simplified architecture and operations

Using your existing database like Cloud SQL for MySQL for vectors greatly simplifies your architecture and daily operations for three key reasons.

First, unified data management means your vectors live with your operational data. This streamlines everything from security and backups to monitoring, eliminating the need to manage a separate database and a complex data pipeline.
Second, you use standard tools. There are no new SDKs or languages to learn, as you can manage vectors using the standard SQL you already know.
Finally, this creates a flat learning curve. Your team can adopt vector search immediately without having to learn a new distributed system or syntax.

Data consistency

Data in applications are not static and often the source data for the vector embeddings are subject to change too. When it happens the corresponding vector has to be updated.

If you use Cloud SQL as your vector store you can do it in the same transaction inserting, deleting or updating the corresponding data and the vector together. It will eliminate any possible discrepancy between the vector and the data it represents. If a transaction is rolled back, data consistency stays intact.
When your vectors are in the Cloud SQL for MySQL, your application doesn’t need to wait for a data pipeline (ETL) to sync changes from your application database to your vector store. This removes a potential point of failure and eliminates data lag. You always have the latest data and vectors for your search.
With a separate store for vectors you need to capture the changes in the database and transfer those changes to the vector database or introduce the vector store to your application making it part of the application. That might create additional problems synchronizing different application services caches or correct handling of rollbacks on application level.

Hybrid Search

One of the main benefits of keeping the data and vectors together is the ability to use filtering and hybrid searches using non-vectorized data.

For example, if you search a product based on a product description using embedding vectors for the description you can also introduce a filter on the product brand or combine it with other preferences from a user profile.
Pre-filtering combines B-Tree indexes on other columns with vector search, reducing the search space for the vectors. It can drastically improve performance and required memory.
In some cases if you get data from a vector search and try to apply post-filtering they can remove bulk of the returned dataset and effectively reduce the number of returned values in some cases to zero.

Lower Total Cost of Ownership

This is more of a business reason than a technical one, but that doesn’t make it any less important. Moving your vectors into Cloud SQL for MySQL can significantly reduce your bill. Here is the reasoning behind that statement:

No data transfer cost. All your data is in the same place in the same database and you don’t need to keep a pipeline and additional resources.
Consolidation of resources. All your data is stored, backed up and managed in the same environment. Instead of provisioning a new database just for vectors, you can utilize your existing resources.
Reduced engineering hours. Your engineers spend less time learning, deploying and maintaining separate systems to keep your vectors.

When a Vector Database is a better choice

Cloud SQL for MySQL can be one of the better choices to store and work with your vectors but sometimes it might be prudent to consider a specialized vector database.

Some advanced vector-specific features provided by some of the vector databases which are not available on the Cloud SQL. For example, the concept of namespaces in Pinecone can be appealing to some workloads.
A massive scale with billions of vectors. In such a case one of the specialized solutions like the Google Vector Search might be a more feasible destination.
Sometimes decoupling makes sense when the vectors are serving different applications in microservices deployments.

Want to know more?

This is just a first blog from a series of articles about vector search in Cloud SQL written by me and my colleagues. If you want to know more about KNN and ANN and what stands behind it please read a blog Vector Search: Demystifying ANN and KNN written by Shu Zhou.