<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gaurav Tarlok Kakkar</title>
    <description>The latest articles on DEV Community by Gaurav Tarlok Kakkar (@gaurav274).</description>
    <link>https://dev.to/gaurav274</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1068599%2F0a60d84e-8f2e-4ead-9389-513ef478da98.png</url>
      <title>DEV Community: Gaurav Tarlok Kakkar</title>
      <link>https://dev.to/gaurav274</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gaurav274"/>
    <language>en</language>
    <item>
      <title>How About Ditching the Hype: Do We Really Need a Specialized Vector Database?</title>
      <dc:creator>Gaurav Tarlok Kakkar</dc:creator>
      <pubDate>Thu, 05 Oct 2023 20:44:53 +0000</pubDate>
      <link>https://dev.to/gaurav274/how-about-ditching-the-hype-do-we-really-need-a-specialized-vector-database-324l</link>
      <guid>https://dev.to/gaurav274/how-about-ditching-the-hype-do-we-really-need-a-specialized-vector-database-324l</guid>
      <description>&lt;p&gt;With the emergence of Generative AI, vector databases have surged in popularity. They've found their niche in powering Retrieval Augmented Generation (RAG) applications. However, as we delve into the landscape of databases, a common trend emerges: nearly every database provider is incorporating vector search capabilities into their offerings. It's a strategic move driven by the fact that vector search is integral to capturing a substantial share of the RAG workload.&lt;/p&gt;

&lt;p&gt;Some of the major releases: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.databricks.com/company/newsroom/press-releases/databricks-introduces-new-generative-ai-tools-investing-lakehouse" rel="noopener noreferrer"&gt;Databricks&lt;/a&gt;: Databricks introduces new generative AI tools.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;Pgvector&lt;/a&gt; and &lt;a href="https://github.com/tensorchord/pgvecto.rs" rel="noopener noreferrer"&gt;Pgvecto.rs&lt;/a&gt;: Postgres extensions that provide vector similarity search.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://blog.cloudflare.com/vectorize-vector-database-open-beta/" rel="noopener noreferrer"&gt;Cloudflare launches Vectorize&lt;/a&gt;: A vector database for shipping AI-powered applications to production, fast.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.mongodb.com/blog/post/introducing-atlas-vector-search-build-intelligent-applications-semantic-search-ai" rel="noopener noreferrer"&gt;MongoDB Atlas Vector Search&lt;/a&gt;: Vector Search capability designed to meet the demands of data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.elastic.co/enterprise-search/vector-search" rel="noopener noreferrer"&gt;Elastic&lt;/a&gt;: Vector search powers the next generation of search experiences.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.oracle.com/news/announcement/ocw-integrated-vector-database-augments-generative-ai-2023-09-19/" rel="noopener noreferrer"&gt;Oracle Integrated Vector Database&lt;/a&gt;: Integrated Vector Database to Augment Generative AI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/asg017/sqlite-vss" rel="noopener noreferrer"&gt;Sqlite-vss&lt;/a&gt;: A SQLite extension for efficient vector search, based on Faiss.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://planetscale.com/blog/planetscale-is-bringing-vector-search-and-storage-to-mysql" rel="noopener noreferrer"&gt;PlanetScale&lt;/a&gt;: Adding vector storage and search to MySQL.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, the big question is: will all this effort erase the distinction between vector databases and other databases over time? Open thoughts 🤔 &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why might customers consider moving to a separate database for vector search when their current database provider already offers vector search capabilities?&lt;/li&gt;
&lt;li&gt;Will these databases come with RAG capabilities right out of the box, or will libraries like &lt;a href="https://github.com/langchain-ai/langchain" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; and &lt;a href="https://github.com/run-llama/llama_index" rel="noopener noreferrer"&gt;llama-index&lt;/a&gt; be used as ETL pipelines on top of these databases to facilitate RAG?&lt;/li&gt;
&lt;li&gt;Conversely, can these extensions or bolt-on vector search features meet the scalability, latency, cost, and index freshness requirements of applications? &lt;/li&gt;
&lt;li&gt;What if a specialized architectural change is needed to handle vector search due to the massive embedding size?&lt;/li&gt;
&lt;li&gt;Perhaps both options will coexist, but for smaller workloads, the difference in performance and cost between specialized vector databases and built-in support may not be significant enough to justify maintaining a new database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/item?id=37747534" rel="noopener noreferrer"&gt;https://news.ycombinator.com/item?id=37747534&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://nextword.substack.com/p/vector-database-is-not-a-separate" rel="noopener noreferrer"&gt;https://nextword.substack.com/p/vector-database-is-not-a-separate&lt;/a&gt; &lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>database</category>
      <category>chatgpt</category>
      <category>python</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Soft Join in PostgreSQL using LLMs</title>
      <dc:creator>Gaurav Tarlok Kakkar</dc:creator>
      <pubDate>Wed, 04 Oct 2023 20:20:36 +0000</pubDate>
      <link>https://dev.to/gaurav274/soft-join-in-postgresql-using-llms-1jdn</link>
      <guid>https://dev.to/gaurav274/soft-join-in-postgresql-using-llms-1jdn</guid>
      <description>&lt;p&gt;Data analysts often struggle when two datasets share no common column, because there is then no direct way to join the two tables and aggregate statistics across them. 😩 However, thanks to LLMs, we can now achieve this. 🙌&lt;/p&gt;

&lt;p&gt;In this short post, I will illustrate how EvaDB enables AI-powered soft/semantic joins between tables that do not directly share a joinable column. 😎 The remarkable part is that this can be done without leaving your favorite database, whether it's PostgreSQL, MySQL, etc. 🚀&lt;/p&gt;

&lt;h3&gt;Challenge: "AI-Powered" Join&lt;/h3&gt;

&lt;p&gt;Consider a scenario where you have two tables: one with details about Airbnb listings in San Francisco and the other providing insights into the city's parks. Our objective is to identify Airbnb listings located in neighborhoods with a high concentration of nearby parks. These tables lack a common column for a straightforward join. The Airbnb dataset includes a &lt;code&gt;neighborhood&lt;/code&gt; column, while the parks dataset features a &lt;code&gt;zipcode&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wFZb1DyU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zibnztdahemkj2tp9o1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wFZb1DyU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zibnztdahemkj2tp9o1i.png" alt="Cannot join tables with no common columns" width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;EvaDB addresses this challenge by using an LLM to derive a joinable column. The key query below creates a new reference table that can be easily joined with other tables.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;reference_table&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;parkname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parktype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
       &lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="nv"&gt;"Return the San Francisco neighborhood name when provided with a zipcode. The possible neighborhoods are: {neighbourhoods_str}. The response should be an item from the provided list. Do not add any more words."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;zipcode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;postgres_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;recreational_park_dataset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As depicted in the figure below, it generates a new table with the &lt;code&gt;neighborhood&lt;/code&gt; column corresponding to the &lt;code&gt;zipcode&lt;/code&gt;, enabling us to seamlessly join the two datasets using the &lt;code&gt;neighborhood&lt;/code&gt; column. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BSWQlFV3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3sn65jonfmtr2usaiq1p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BSWQlFV3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3sn65jonfmtr2usaiq1p.png" alt="LLM-powered soft join" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;
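
&lt;p&gt;To make the idea concrete, here is a minimal plain-Python sketch of the same two steps: derive a &lt;code&gt;neighborhood&lt;/code&gt; for each park, then join on it. This is not EvaDB's API. The &lt;code&gt;llm_neighborhood&lt;/code&gt; function is a hypothetical stand-in for the LLM call, and the zipcode mapping, park names, and listings are made-up illustrative values.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical stand-in for the LLM call: a hard-coded lookup with
# illustrative values, not real model output.
ZIP_TO_NEIGHBORHOOD = {
    "94110": "Mission",
    "94117": "Haight Ashbury",
}


def llm_neighborhood(zipcode):
    """Pretend LLM: return a neighborhood name for a zipcode."""
    return ZIP_TO_NEIGHBORHOOD.get(zipcode, "Unknown")


# Illustrative rows; in the post, the real datasets live in PostgreSQL.
parks = [
    {"parkname": "Park A", "zipcode": "94110"},
    {"parkname": "Park B", "zipcode": "94110"},
    {"parkname": "Park C", "zipcode": "94117"},
]
listings = [
    {"listing": "Sunny studio", "neighborhood": "Mission"},
    {"listing": "Victorian flat", "neighborhood": "Haight Ashbury"},
    {"listing": "Bay view loft", "neighborhood": "Marina"},
]

# Step 1: build the reference table by deriving a neighborhood per park.
reference_table = [
    {"parkname": park["parkname"],
     "neighborhood": llm_neighborhood(park["zipcode"])}
    for park in parks
]

# Step 2: count parks per neighborhood.
parks_per_neighborhood = Counter(
    row["neighborhood"] for row in reference_table
)

# Step 3: join listings to park counts on the shared neighborhood column.
joined = [
    {**item, "park_count": parks_per_neighborhood.get(item["neighborhood"], 0)}
    for item in listings
]
joined.sort(key=lambda row: row["park_count"], reverse=True)

for row in joined:
    print(row["listing"], row["neighborhood"], row["park_count"])
```

&lt;p&gt;In the real setup, the LLM replaces the lookup table and the join runs as SQL, but the shape of the computation is the same: once both sides carry a &lt;code&gt;neighborhood&lt;/code&gt; value, the join is an ordinary equality join.&lt;/p&gt;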

&lt;p&gt;How cool is this? 🤩 Mind-blown! 💥&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full Tutorial: &lt;a href="https://colab.research.google.com/github/georgia-tech-db/eva/blob/staging/tutorials/15-AI-powered-join.ipynb"&gt;Google Colab&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Show some ❤️❤️ to &lt;a href="https://github.com/georgia-tech-db/evadb"&gt;EvaDB&lt;/a&gt;!  Your support motivates me to keep the project going.
🤝&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>database</category>
      <category>openai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Stargazers Reloaded – LLM-Powered Analyses of Your GitHub Community</title>
      <dc:creator>Gaurav Tarlok Kakkar</dc:creator>
      <pubDate>Mon, 02 Oct 2023 17:44:25 +0000</pubDate>
      <link>https://dev.to/gaurav274/stargazers-reloaded-llm-powered-analyses-of-your-github-community-23cc</link>
      <guid>https://dev.to/gaurav274/stargazers-reloaded-llm-powered-analyses-of-your-github-community-23cc</guid>
      <description>&lt;p&gt;GitHub ⭐s symbolize a repository's popularity in the developer community. Whether you're a developer, open-source enthusiast, or simply curious about tech trends, these stars provide insights into the coding community.&lt;/p&gt;

&lt;p&gt;What if we could delve into the minds of these star-givers, extracting insights from their profiles to understand their interests, locations, and more? &lt;a href="https://github.com/pchunduri6/stargazers-reloaded"&gt;Stargazers Reloaded&lt;/a&gt; makes it super easy to gain insights about your GitHub community using large language models (LLMs).&lt;/p&gt;

&lt;p&gt;Under the hood, it is powered by &lt;a href="https://github.com/georgia-tech-db/evadb"&gt;EvaDB&lt;/a&gt;, an emerging database tailored for AI apps.&lt;/p&gt;

</description>
      <category>chatgpt</category>
      <category>programming</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
