DEV Community: Gaurav Tarlok Kakkar

How About Ditching the Hype: Do We Really Need a Specialized Vector Database?

Gaurav Tarlok Kakkar — Thu, 05 Oct 2023 20:44:53 +0000

With the emergence of Generative AI, vector databases have surged in popularity. They've found their niche in powering Retrieval Augmented Generation (RAG) applications. However, as we delve into the landscape of databases, a common trend emerges: nearly every database provider is incorporating vector search capabilities into their offerings. It's a strategic move driven by the fact that vector search is integral to capturing a substantial share of the RAG workload.

Some of the major releases:

Databricks: Databricks introduces new generative AI tools.
pgvector and pgvecto.rs: Postgres extensions that provide vector similarity search.
Cloudflare launches Vectorize: A vector database for shipping AI-powered applications to production, fast.
MongoDB Atlas Vector Search: Vector Search capability designed to meet the demands of data.
Elastic: Vector search powers the next generation of search experiences.
Oracle Integrated Vector Database: Integrated Vector Database to Augment Generative AI.
Sqlite-vss: A SQLite extension for efficient vector search, based on Faiss.
PlanetScale: Adding vector storage and search to MySQL.

So, the big question is: Is all this effort going to make the difference between vector and other databases disappear over time? Open thoughts 🤔

Why might customers consider moving to a separate database for vector search when their current database provider already offers vector search capabilities?
Will these databases come with RAG capabilities right out of the box, or will libraries like LangChain and LlamaIndex be used as ETL pipelines on top of them to facilitate RAG?
Conversely, can these extensions or bolt-on vector search support meet the scalability, latency, cost, and index freshness requirements of applications?
What if a specialized architectural change is needed to handle vector search due to the massive embedding size?
Perhaps both options will coexist, but for smaller workloads, the difference in performance and cost between specialized vector databases and built-in support may not be significant enough to justify maintaining a new database.
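
To make the small-workload case concrete, here is a minimal sketch of what vector search boils down to: an exhaustive cosine-similarity scan over stored embeddings. It is an illustration only, not how any of the products above implement search. The three-dimensional vectors, the document ids, and the `top_k` helper are all made up for the example; real embeddings have hundreds or thousands of dimensions, and production systems typically use approximate indexes instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the ids of the k stored vectors most similar to the query.

    `documents` maps a document id to its embedding. This scans every
    vector, so cost grows linearly with the number of documents.
    """
    scored = [
        (cosine_similarity(query, embedding), doc_id)
        for doc_id, embedding in documents.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings (hypothetical values, chosen only for illustration).
DOCS = {
    "doc_parks": [0.9, 0.1, 0.0],
    "doc_listings": [0.1, 0.9, 0.0],
    "doc_github": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.2, 0.0], DOCS, k=2))
```

A full scan like this is often acceptable for a few thousand vectors; the questions above about scalability, latency, and index freshness start to matter once the collection outgrows it.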

Soft Join in PostgreSQL using LLMs

Gaurav Tarlok Kakkar — Wed, 04 Oct 2023 20:20:36 +0000

Data analysts often struggle when two datasets share no common column, because there is then no way to join the two tables and aggregate stats across them. 😩 However, thanks to LLMs, we can now do it. 🙌

In this short post, I will illustrate how EvaDB enables AI-powered soft/semantic joins between tables that do not directly share a joinable column. 😎 The remarkable part is that this can be done without leaving your favorite database, whether it's PostgreSQL, MySQL, etc. 🚀

Challenge: "AI-Powered" Join

Consider a scenario where you have two tables: one with details about Airbnb listings in San Francisco and the other providing insights into the city's parks. Our objective is to identify Airbnb listings located in neighborhoods with a high concentration of nearby parks. These tables lack a common column for a straightforward join: the Airbnb dataset has a neighborhood column, while the parks dataset has a zipcode column.

EvaDB addresses this challenge by facilitating the merging operation using LLMs. Below is the key query to create a new reference table that can be joined with other tables easily.

CREATE TABLE reference_table AS
SELECT parkname, parktype, 
       LLM(
       "Return the San Francisco neighborhood name when provided with a zipcode. The possible neighborhoods are: {neighbourhoods_str}. The response should be an item from the provided list. Do not add any more words.",
       zipcode) 
FROM postgres_db.recreational_park_dataset;

As depicted in the figure below, it generates a new table with the neighborhood column corresponding to the zipcode, enabling us to seamlessly join the two datasets using the neighborhood column.
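
The join step that follows can be sketched in plain Python, under loud assumptions. This is not EvaDB's API: `fake_llm` is a hypothetical stand-in that returns canned answers where the real query would call an LLM, SQLite replaces PostgreSQL so the example runs anywhere, and the park names, listings, and neighbourhood list are invented sample rows. Only the prompt text and the `{neighbourhoods_str}` placeholder come from the query above.

```python
import sqlite3

# Hypothetical neighbourhood list; in the post it comes from the Airbnb data.
NEIGHBOURHOODS = ["Mission", "Marina", "Sunset"]

# Prompt text taken from the query above, with its placeholder left in.
PROMPT_TEMPLATE = (
    "Return the San Francisco neighborhood name when provided with a zipcode. "
    "The possible neighborhoods are: {neighbourhoods_str}. "
    "The response should be an item from the provided list. "
    "Do not add any more words."
)

# Canned answers standing in for the LLM (illustrative mapping only).
FAKE_LLM_ANSWERS = {"94110": "Mission", "94123": "Marina", "94122": "Sunset"}


def build_prompt(neighbourhoods):
    """Fill the {neighbourhoods_str} placeholder with a comma-separated list."""
    return PROMPT_TEMPLATE.format(neighbourhoods_str=", ".join(neighbourhoods))


def fake_llm(prompt, zipcode):
    """Hypothetical LLM stub: ignores the prompt and looks up a canned answer."""
    return FAKE_LLM_ANSWERS.get(zipcode, "Unknown")


def soft_join(parks, listings):
    """Build the reference table, then count parks per listing's neighbourhood."""
    prompt = build_prompt(NEIGHBOURHOODS)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reference_table (parkname TEXT, neighbourhood TEXT)")
    conn.execute("CREATE TABLE listings (name TEXT, neighbourhood TEXT)")
    # Each park's zipcode is translated into a neighbourhood before insertion.
    conn.executemany(
        "INSERT INTO reference_table VALUES (?, ?)",
        [(name, fake_llm(prompt, zipcode)) for name, zipcode in parks],
    )
    conn.executemany("INSERT INTO listings VALUES (?, ?)", listings)
    rows = conn.execute(
        """
        SELECT l.name, COUNT(r.parkname) AS park_count
        FROM listings AS l
        LEFT JOIN reference_table AS r ON r.neighbourhood = l.neighbourhood
        GROUP BY l.name
        ORDER BY park_count DESC, l.name
        """
    ).fetchall()
    conn.close()
    return rows


# Invented sample rows: (parkname, zipcode) and (listing name, neighbourhood).
PARKS = [("Park A", "94110"), ("Park B", "94110"), ("Park C", "94123")]
LISTINGS = [
    ("Cozy studio", "Mission"),
    ("Bay view flat", "Marina"),
    ("Beach house", "Sunset"),
]

if __name__ == "__main__":
    for name, park_count in soft_join(PARKS, LISTINGS):
        print(name, park_count)
```

Once the zipcode has been turned into a neighbourhood, the rest is an ordinary equality join, which is why the LLM is only needed to build the reference table.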

How cool is this? 🤩 Mind-blown! 💥

Full Tutorial: Google Colab.
Show some ❤️❤️ to EvaDB! Your support motivates me to keep the project going. 🤝

Stargazers Reloaded – LLM-Powered Analyses of Your GitHub Community

Gaurav Tarlok Kakkar — Mon, 02 Oct 2023 17:44:25 +0000

GitHub ⭐ symbolizes a repository's popularity in the developer community. Whether you're a developer, open-source enthusiast, or simply curious about tech trends, these stars provide insights into the coding community.

What if we could delve into the minds of these star-givers, extracting insights from their profiles to understand their interests, locations, and more? Stargazers Reloaded makes it super easy to gain insights about your GitHub community using large language models (LLMs).

Under the hood, it is powered by EvaDB, an emerging database tailored for AI apps.