<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hervé ishimwe</title>
    <description>The latest articles on DEV Community by hervé ishimwe (@ishzuzo).</description>
    <link>https://dev.to/ishzuzo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2518944%2F712b3261-0b61-4041-9dba-f93faff3e844.jpeg</url>
      <title>DEV Community: hervé ishimwe</title>
      <link>https://dev.to/ishzuzo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ishzuzo"/>
    <language>en</language>
    <item>
      <title>Best Open-Source AI Model: Experimenting With Phi-4 and Ollama in PostgreSQL</title>
      <dc:creator>hervé ishimwe</dc:creator>
      <pubDate>Fri, 24 Jan 2025 00:38:55 +0000</pubDate>
      <link>https://dev.to/tigerdata/best-open-source-ai-model-experimenting-with-phi-4-and-ollama-in-postgresql-5e5c</link>
      <guid>https://dev.to/tigerdata/best-open-source-ai-model-experimenting-with-phi-4-and-ollama-in-postgresql-5e5c</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8097rkq7ep8r1na0w2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8097rkq7ep8r1na0w2k.png" alt="open-source-ai-model-experimenting" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The emergence of lightweight, powerful open-source AI models like &lt;a href="https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090" rel="noopener noreferrer"&gt;&lt;u&gt;Microsoft’s Phi-4&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://ollama.com/library/llama3.2" rel="noopener noreferrer"&gt;&lt;u&gt;Meta’s Llama 3.2&lt;/u&gt;&lt;/a&gt; has transformed the AI landscape, making it more accessible to developers. &lt;/p&gt;

&lt;p&gt;Open-source AI models are artificial intelligence models whose code, weights, and architecture are publicly available for anyone to view, use, modify, and distribute. These open-source tools for AI applications are cost-effective, highly customizable, and allow complete control over the data flow, making them ideal for building privacy-focused, LLM-powered systems.&lt;/p&gt;

&lt;p&gt;(If you want to learn more about this, we recently built &lt;a href="https://www.timescale.com/blog/build-a-fully-local-rag-app-with-postgresql-mistral-and-ollama" rel="noopener noreferrer"&gt;&lt;u&gt;a fully local retrieval-augmented generation (RAG) application using Mistral&lt;/u&gt;&lt;/a&gt;, another leading open-source model.)&lt;/p&gt;

&lt;p&gt;However, identifying the best open-source embedding or generative model for your AI use case remains challenging. Running these models locally requires computational resources, technical expertise, and time to establish robust evaluation workflows. These hurdles can slow development progress and discourage adoption despite the advantages of open-source tools. &lt;/p&gt;

&lt;p&gt;But it doesn’t have to be this way. In this blog post, you’ll learn how to simplify this process using &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;&lt;u&gt;Ollama&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://github.com/timescale/pgai" rel="noopener noreferrer"&gt;&lt;u&gt;pgai&lt;/u&gt;&lt;/a&gt;. This will enable you to experiment with different models and quickly implement a RAG system using Microsoft’s Phi-4 in PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  Exploring Open-Source AI Tools: Phi-4, Microsoft’s Cutting-Edge Open-Source LLM
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090" rel="noopener noreferrer"&gt;&lt;u&gt;Phi-4&lt;/u&gt;&lt;/a&gt;, developed by &lt;a href="https://azure.microsoft.com/en-us/products/phi/" rel="noopener noreferrer"&gt;&lt;u&gt;Microsoft&lt;/u&gt;&lt;/a&gt;, is a compact, open-source large language model with &lt;strong&gt;14.7 billion&lt;/strong&gt;  &lt;strong&gt;parameters&lt;/strong&gt; , a &lt;strong&gt;16K-token context window&lt;/strong&gt; , and &lt;strong&gt;a size of just 9.1 GB&lt;/strong&gt;. It is designed for research on large language models and use in general AI systems, &lt;a href="https://huggingface.co/microsoft/phi-4#intended-use" rel="noopener noreferrer"&gt;&lt;u&gt;primarily focusing on English&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Phi-4 excels in reasoning tasks, especially mathematics, outperforming even larger models like &lt;a href="https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/" rel="noopener noreferrer"&gt;&lt;u&gt;Gemini Pro 1.5&lt;/u&gt;&lt;/a&gt;. Its superior performance stems from its high-quality training data, including synthetic datasets, Q&amp;amp;A datasets, curated academic publications, and filtered public domain websites. This diversity makes Phi-4 a unique option for developers to experiment with, especially for those seeking accuracy and efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dxu9pj69yu7oswgglet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dxu9pj69yu7oswgglet.png" alt="Phi-4 performance on math competition problems. Source: Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning " width="800" height="350"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Phi-4 performance on math competition problems. Source: Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  AI Model Experimentation: Integrating Ollama and Pgai
&lt;/h2&gt;

&lt;p&gt;To fully appreciate the strength of combining Ollama and pgai, let’s first understand what each tool offers individually and explore the potential of their integration.&lt;/p&gt;
&lt;h3&gt;
  Ollama: Open-source models at your fingertips
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ollama/ollama" rel="noopener noreferrer"&gt;&lt;u&gt;Ollama&lt;/u&gt;&lt;/a&gt; (Omni-Layer Learning Language Acquisition Model) is an open-source tool that provides a unified interface for accessing and running embedding models and LLMs locally. Abstracting API complexities allows developers to focus on building applications without worrying about handling different model endpoints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;&lt;u&gt;With Ollama&lt;/u&gt;&lt;/a&gt;, you can easily access &lt;a href="https://ollama.com/library/phi4" rel="noopener noreferrer"&gt;&lt;u&gt;Phi-4&lt;/u&gt;&lt;/a&gt; and other models like &lt;a href="https://ollama.com/library/llama3.2" rel="noopener noreferrer"&gt;&lt;u&gt;Llama 3.2&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://ollama.com/library/mistral" rel="noopener noreferrer"&gt;&lt;u&gt;Mistral&lt;/u&gt;&lt;/a&gt;, making comparisons and experimenting straightforward. Additionally, Ollama simplifies downloading and managing embedding models, such as &lt;a href="https://ollama.com/library/nomic-embed-text" rel="noopener noreferrer"&gt;&lt;u&gt;nomic-embed-text&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://ollama.com/library/mxbai-embed-large" rel="noopener noreferrer"&gt;&lt;u&gt;mxbai-embed-large&lt;/u&gt;&lt;/a&gt;, which can be integrated seamlessly into your workflows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;"Downloading embeddings models for experimenting...."&lt;/span&gt;
&lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="n"&gt;pull&lt;/span&gt; &lt;span class="n"&gt;nomic&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="n"&gt;pull&lt;/span&gt; &lt;span class="n"&gt;mxbai&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;large&lt;/span&gt;

&lt;span class="n"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;"Downloading generative models for experimenting...."&lt;/span&gt;
&lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="n"&gt;pull&lt;/span&gt; &lt;span class="n"&gt;phi4&lt;/span&gt;
&lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="n"&gt;pull&lt;/span&gt; &lt;span class="n"&gt;llama3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  Pgai: The AI engine within PostgreSQL
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/timescale/pgai" rel="noopener noreferrer"&gt;&lt;u&gt;Pgai&lt;/u&gt;&lt;/a&gt; is an open-source PostgreSQL extension that integrates embedding generation and response workflows into the database. This approach eliminates the need for external pipelines, enabling seamless interaction with your data. Pgai supports various model providers, including &lt;a href="https://github.com/timescale/pgai/blob/main/docs/ollama.md" rel="noopener noreferrer"&gt;&lt;u&gt;Ollama&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://github.com/timescale/pgai/blob/main/docs/openai.md" rel="noopener noreferrer"&gt;&lt;u&gt;OpenAI&lt;/u&gt;&lt;/a&gt;, and &lt;a href="https://github.com/timescale/pgai/blob/main/docs/cohere.md" rel="noopener noreferrer"&gt;&lt;u&gt;Cohere&lt;/u&gt;&lt;/a&gt;, making it a versatile choice for AI development.&lt;/p&gt;

&lt;h4&gt;
  Why use pgai?
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Familiarity and ease of use&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pgai leverages PostgreSQL, &lt;a href="https://survey.stackoverflow.co/2024/technology#most-popular-technologies-database" rel="noopener noreferrer"&gt;&lt;u&gt;a popular open-source database ranked the most popular database in Stack Overflow’s Developer Survey for two consecutive years&lt;/u&gt;&lt;/a&gt;. Its SQL-based interface ensures that both new and experienced developers can execute AI-related functions intuitively, smoothing the transition to AI development. Moreover, using PostgreSQL as your vector database ensures efficient data management by eliminating redundancy and allowing you to store your data alongside its embeddings in a unified system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://github.com/timescale/pgai/blob/90e07b1ab45275215a69e7c190a23d08f1482666/docs/ollama.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;u&gt;Ollama API integration&lt;/u&gt;&lt;/strong&gt;&lt;/a&gt; &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pgai abstracts the complexities of &lt;a href="https://github.com/ollama/ollama/blob/main/docs/api.md" rel="noopener noreferrer"&gt;&lt;u&gt;Ollama’s API&lt;/u&gt;&lt;/a&gt;, allowing developers to focus on building applications without technical hurdles. This integration ensures steady progress, whether you’re a novice or a seasoned engineer.&lt;/p&gt;
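
&lt;p&gt;As a quick smoke test of this integration (a sketch assuming Ollama is serving locally and &lt;code&gt;phi4&lt;/code&gt; has already been pulled; &lt;code&gt;ai.ollama_generate&lt;/code&gt; is documented in pgai’s Ollama docs), you can prompt a model directly from SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Prompt Phi-4 straight from PostgreSQL; the JSON result mirrors
-- Ollama's generate response, so extract the 'response' field
SELECT ai.ollama_generate(
    'phi4',
    'In one sentence, what is PostgreSQL?'
)-&gt;&gt;'response' AS answer;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;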

&lt;p&gt;Before continuing, let’s set up our PostgreSQL database and install the necessary extensions. This tutorial uses &lt;a href="https://huggingface.co/datasets/sgoel9/paul_graham_essays" rel="noopener noreferrer"&gt;&lt;u&gt;Paul Graham's essays&lt;/u&gt;&lt;/a&gt; as the source data. All pgai functions live in the &lt;code&gt;ai&lt;/code&gt; schema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Install the pgai extension&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt; &lt;span class="k"&gt;CASCADE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Configure at the session level the host from which Ollama is served &lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai.ollama_host'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'http://host.for.ollama:port'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create the source table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;essays&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Load dataset from Hugging Face&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'sgoel9/paul_graham_essays'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'essays'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;if_table_exists&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'append'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
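
&lt;p&gt;A quick sanity check (a sketch; the column names follow the &lt;code&gt;essays&lt;/code&gt; table created above) confirms the dataset loaded as expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Confirm the essays were loaded
SELECT count(*) AS essay_count FROM essays;

-- Peek at one row, truncating the essay text for readability
SELECT title, left(text, 80) AS preview
FROM essays
LIMIT 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;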



&lt;h3&gt;
  Pgai Vectorizer: Experiment with embedding models
&lt;/h3&gt;

&lt;p&gt;Pgai also includes &lt;a href="https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md#vectorizer-quick-start" rel="noopener noreferrer"&gt;&lt;u&gt;pgai Vectorizer&lt;/u&gt;&lt;/a&gt;, which &lt;a href="https://www.timescale.com/blog/how-to-automatically-create-update-embeddings-in-postgresql" rel="noopener noreferrer"&gt;&lt;u&gt;automates embedding generation and synchronization with your source data using a single SQL command&lt;/u&gt;&lt;/a&gt;. This feature is invaluable for testing multiple embedding models, saving time and computational resources. Combined with Ollama, you can configure vectorizers that use open-source embedding models, then compare how each performs on its own and when paired with different generative models in any AI-powered system. You can read about &lt;a href="https://www.timescale.com/blog/finding-the-best-open-source-embedding-model-for-rag" rel="noopener noreferrer"&gt;&lt;u&gt;the simple evaluation workflow we recently used to find the best open-source embedding model for RAG&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here’s how to configure vectorizers for embedding models, &lt;code&gt;nomic-embed-text&lt;/code&gt; and &lt;code&gt;mxbai-embed-large&lt;/code&gt;. For more information about setting up the environment, please check out this &lt;a href="https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md" rel="noopener noreferrer"&gt;&lt;u&gt;pgai Vectorizer quick start with Ollama&lt;/u&gt;&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Configure vectorizer using nomic-embed-text&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_vectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'essays'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'essays_nomic_embed_embeddings'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_ollama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'nomic-embed-text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;chunking&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunking_recursive_character_text_splitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;formatting&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatting_python_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'title: $title $chunk'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Configure vectorizer using mxbai-embed-large&lt;/span&gt;
 &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_vectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'essays'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'essays_mxbai_embed_large_embeddings'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- name of the view&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_ollama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'mxbai-embed-large'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;chunking&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunking_recursive_character_text_splitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;formatting&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatting_python_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'title: $title $chunk'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, each vectorizer creates two objects: &lt;strong&gt;an embeddings table&lt;/strong&gt;, named by appending &lt;strong&gt;_store&lt;/strong&gt; to the destination parameter, which stores the generated vector embeddings, and &lt;strong&gt;a view&lt;/strong&gt;, named after the destination parameter itself, that &lt;strong&gt;combines information from the source table, essays, and the newly created embeddings table&lt;/strong&gt;.&lt;/p&gt;
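
&lt;p&gt;Since embeddings are generated asynchronously, you can monitor progress before querying (a sketch; &lt;code&gt;ai.vectorizer_status&lt;/code&gt; is described in the pgai Vectorizer docs, and the view name below matches the first vectorizer’s destination):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Check whether the vectorizers have finished embedding
SELECT * FROM ai.vectorizer_status;

-- Query the view that joins essays with their embeddings
SELECT title, chunk
FROM essays_nomic_embed_embeddings
LIMIT 3;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;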

&lt;p&gt;The following SQL function, &lt;code&gt;generate_rag_response&lt;/code&gt;, facilitates vector search on a specified view linked to one of the generated embedding tables. It leverages the vector representation of the user’s query, created using the pgai function, &lt;a href="https://github.com/timescale/pgai/blob/main/docs/ollama.md#embed" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;ai.ollama_embed&lt;/code&gt;&lt;/u&gt;&lt;/a&gt;, and the same embedding model as the table it searches. This ensures consistency in representation and retrieves the most relevant chunks.&lt;/p&gt;

&lt;p&gt;Then, the retrieved chunks are passed to the generative model alongside the user’s query for response generation using another pgai function, &lt;a href="https://github.com/timescale/pgai/blob/main/docs/ollama.md#chat-complete" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;ai.ollama_chat_complete&lt;/code&gt;&lt;/u&gt;&lt;/a&gt;, which hits the &lt;a href="https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;/api/chat&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; endpoint from Ollama.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;generate_rag_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query_text&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embeddings_view&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'essays_nomic_embed_embeddings'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_model&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'nomic-embed-text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;generative_model&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'phi4'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_limit&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;DECLARE&lt;/span&gt;
   &lt;span class="n"&gt;context_chunks&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'You are a helpful assistant. Provide accurate, well-reasoned responses based on the given context. If the context is insufficient to answer the question, say so.'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
   &lt;span class="c1"&gt;-- Perform similarity search to find relevant text chunks&lt;/span&gt;
   &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;string_agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;': '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;context_chunks&lt;/span&gt;
   &lt;span class="k"&gt;FROM&lt;/span&gt;
   &lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;
       &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;embeddings_view&lt;/span&gt; 
       &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama_embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;chunk_limit&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;relevant_posts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

   &lt;span class="c1"&gt;-- Generate a chat response using Phi-4&lt;/span&gt;
   &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama_chat_complete&lt;/span&gt;
   &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;generative_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;jsonb_build_array&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'system'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;
       &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_text&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;Use the following context to respond.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;context_chunks&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'message'&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

   &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function is versatile and supports experimentation with both embedding and generative models. &lt;strong&gt;To experiment with embedding models&lt;/strong&gt;, you can fix the generative model while switching between different embedding-generated views. Conversely, &lt;strong&gt;to test various generative models&lt;/strong&gt;, you can fix the embedding model and then alternate between different generative models.&lt;/p&gt;

&lt;p&gt;Here are examples for each use case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Experimenting with embedding models (mxbai-embed-large and Phi-4)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;generate_rag_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'Give me some startup advice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;embedding_view&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'essays_mxbai_embed_large_embeddings'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
     &lt;span class="n"&gt;embedding_model&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'mxbai-embed-large'&lt;/span&gt;  
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Experimenting with generative models (nomic-embed-text and Llama 3.2)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;generate_rag_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'Give me some startup advice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
     &lt;span class="n"&gt;generative_model&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'llama3.2'&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Open-source AI tools like Ollama and pgai make experimenting with embedding and generative models intuitive and efficient. By leveraging these AI tools alongside Microsoft’s Phi-4 and PostgreSQL, you can rapidly prototype and implement AI-powered applications while retaining complete control over your data. Whether comparing open-source AI models or deploying a robust RAG system, this stack allows you to innovate with ease and speed. &lt;/p&gt;

&lt;p&gt;If you’re building AI applications, explore &lt;a href="https://www.timescale.com/ai" rel="noopener noreferrer"&gt;&lt;u&gt;Timescale’s complete open-source AI stack&lt;/u&gt;&lt;/a&gt; or head to &lt;a href="https://github.com/timescale/pgai" rel="noopener noreferrer"&gt;&lt;u&gt;pgai&lt;/u&gt;&lt;/a&gt;’s GitHub repository to start bringing AI workflows into PostgreSQL—no need to leave your database. If you find it helpful, we would love your support—leave us a ⭐!  &lt;/p&gt;

&lt;h3&gt;
  Further reading
&lt;/h3&gt;

&lt;p&gt;Want to learn more? Check out these blog posts and resources about open-source AI models and tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.timescale.com/blog/the-emerging-open-source-ai-stack" rel="noopener noreferrer"&gt;&lt;u&gt;Stop Paying the OpenAI Tax: The Emerging Open-Source AI Stack&lt;/u&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.timescale.com/blog/open-source-vs-openai-embeddings-for-rag" rel="noopener noreferrer"&gt;&lt;u&gt;Evaluating Open-Source vs. OpenAI Embeddings for RAG: A How-To Guide&lt;/u&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.timescale.com/blog/which-openai-embedding-model-is-best" rel="noopener noreferrer"&gt;&lt;u&gt;Which OpenAI Embedding Model Is Best for Your RAG App With Pgvector?&lt;/u&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://youtu.be/-ikCYKcPoqU?si=vS_WE7-6UbGFRz3U" rel="noopener noreferrer"&gt;&lt;u&gt;Local RAG Using Llama 3, Ollama, and PostgreSQL&lt;/u&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.timescale.com/blog/vector-databases-are-the-wrong-abstraction" rel="noopener noreferrer"&gt;&lt;u&gt;Vector Databases Are the Wrong Abstraction&lt;/u&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>postgres</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
