<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: dhritich20baruah</title>
    <description>The latest articles on DEV Community by dhritich20baruah (@dhritich20baruah).</description>
    <link>https://dev.to/dhritich20baruah</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2213000%2F27ddd65b-5e62-4e5d-8327-fae70b123f79.jpeg</url>
      <title>DEV Community: dhritich20baruah</title>
      <link>https://dev.to/dhritich20baruah</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhritich20baruah"/>
    <language>en</language>
    <item>
      <title>How I Built a Secure, 3,072-Dim AI Document Indexer Using Next.js &amp; Supabase.</title>
      <dc:creator>dhritich20baruah</dc:creator>
      <pubDate>Wed, 20 May 2026 03:20:19 +0000</pubDate>
      <link>https://dev.to/dhritich20baruah/how-i-built-a-secure-3072-dim-ai-document-indexer-using-nextjs-supabase-1bbf</link>
      <guid>https://dev.to/dhritich20baruah/how-i-built-a-secure-3072-dim-ai-document-indexer-using-nextjs-supabase-1bbf</guid>
      <description>&lt;p&gt;Building a production-ready RAG (Retrieval-Augmented Generation) application from scratch is difficult and time-consuming.&lt;/p&gt;

&lt;p&gt;You get an idea for a project or business, but before you can write a single line of core business logic, you find yourself spending weeks fighting with multimodal file parsing, configuring vector extensions, designing a complex database architecture, and securing multi-tenant storage.&lt;/p&gt;

&lt;p&gt;The idea behind building DocuIntel was to solve that exact infrastructure headache. I wanted a template where I could just drop in my API keys, run a single setup script, and have a fully functioning, secure AI document portal ready to go.&lt;/p&gt;

&lt;p&gt;Here is a deep dive into the architecture, some of the technical hurdles I ran into, and how I solved them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Multimodal Architecture&lt;/strong&gt;&lt;br&gt;
DocuIntel needs to process a wide variety of inputs: PDFs, Word docs, images (JPG/PNG), and even audio files (MP3s) for transcription.&lt;/p&gt;

&lt;p&gt;At first, I routed each file type through a different third-party parsing library: Tesseract OCR for extracting text from images and pdf-parser for PDF files.&lt;br&gt;
Instead of keeping that setup, I offloaded the heavy lifting directly to Gemini 2.0 Flash. Gemini’s native multimodal capabilities handle OCR and layout analysis beautifully.&lt;/p&gt;

&lt;p&gt;To keep search results consistent and high-quality, I hardcoded a strict master SYSTEM_PROMPT inside the backend processing route (/api/process/route.ts). It instructs the model to behave like a disciplined document parser rather than a chat assistant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
  You are an expert document parser for DocuIntel. 
  Your goal is to extract text with 100% accuracy for semantic search indexing.

  RULES:
  1. PRESERVE STRUCTURE: Use Markdown for headings, subheadings, and lists.
  2. TABLES: Convert all data tables into clean Markdown table format.
  3. NO CHAT: Do not say "Here is the text" or "I have processed the file." 
  4. NOISE REDUCTION: Ignore headers, footers, and page numbers.
  5. SMART METADATA: At the very end of your output, add a section called '---METADATA---' 
     and list 5-10 key entities or topics (e.g., 'Company: Acme Corp', 'Date: 2024-01-01').
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why the Automated Metadata Tag Matters&lt;br&gt;
Because Gemini appends the metadata tags directly to the extracted text, the vector embedding model (gemini-embedding-2) sees those high-value concepts when it encodes the document. If a user searches for a specific company or date, semantic search is more likely to surface the document even if that detail only appeared once in fine print.&lt;/p&gt;
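
&lt;p&gt;If you want to use the metadata section separately (for example, to show tags in the UI), the output can be split at the marker. This is a minimal sketch, not code from DocuIntel: the splitModelOutput helper and the ParsedOutput type are hypothetical names, and it assumes the model writes one entity per line after '---METADATA---', as rule 5 of the prompt asks.&lt;/p&gt;

```typescript
// Marker string taken from rule 5 of the SYSTEM_PROMPT.
const METADATA_MARKER = "---METADATA---";

// Hypothetical result shape for this sketch.
interface ParsedOutput {
  body: string; // Markdown text that appears before the marker
  metadata: string[]; // one entry per non-empty line after the marker
}

// Hypothetical helper: splits the model output at the last marker occurrence.
function splitModelOutput(raw: string): ParsedOutput {
  const index = raw.lastIndexOf(METADATA_MARKER);
  if (index === -1) {
    // The model did not emit the section; treat everything as body text.
    return { body: raw.trim(), metadata: [] };
  }
  const body = raw.slice(0, index).trim();
  const metadata = raw
    .slice(index + METADATA_MARKER.length)
    .split("\n")
    // Strip leading list bullets and whitespace from each line.
    .map((line) => line.replace(/^[-*\s]+/, "").trim())
    .filter((line) => line !== "");
  return { body, metadata };
}

const example = splitModelOutput(
  "# Invoice\n\nTotal due: 40 USD\n\n---METADATA---\n- Company: Acme Corp\n- Date: 2024-01-01\n"
);
console.log(example.metadata);
```

&lt;p&gt;Falling back to an empty metadata list when the marker is missing keeps the indexing step from failing on a response that ignores the rule.&lt;/p&gt;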

&lt;p&gt;&lt;strong&gt;Navigating the New Supabase Security Standards&lt;/strong&gt;&lt;br&gt;
Supabase is moving to a "Secure by Default" model, which will come into force by 30 May 2026.&lt;/p&gt;

&lt;p&gt;Previously, creating a table in the public schema automatically exposed it to the Data API. Under the new model, new tables require explicit grants. Without them, frontend client libraries such as supabase-js get a 42501 (insufficient privilege) error, even if Row-Level Security (RLS) is enabled.&lt;/p&gt;

&lt;p&gt;To future-proof the setup, my SQL initialization script grants privileges explicitly to the authenticated and service_role roles right after creating the tables and vector search RPC functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create the documents table with 3,072-dimensional vector support&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;uuid_generate_v4&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;file_name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;file_url&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;               
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;references&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;user_email&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;     &lt;span class="c1"&gt;-- Optimized for high-res gemini-embedding-2&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'utc'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- EXPLICIT GRANTS (Fixes 42501 Permission Errors)&lt;/span&gt;
&lt;span class="k"&gt;grant&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;delete&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;authenticated&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;grant&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;service_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
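
&lt;p&gt;When a grant is missing, the failure reaches the client as an error object. Here is a minimal sketch of how a frontend might recognise that case. It assumes the error exposes a code field carrying the PostgreSQL SQLSTATE (check this against the supabase-js version you use); DbErrorLike and describeDbError are hypothetical names, not part of DocuIntel.&lt;/p&gt;

```typescript
// Only the fields this sketch reads from a failed query result (assumed shape).
interface DbErrorLike {
  code?: string;
  message?: string;
}

// 42501 is the PostgreSQL SQLSTATE for insufficient_privilege.
const INSUFFICIENT_PRIVILEGE = "42501";

// Hypothetical helper: turns a query error into a short diagnostic string.
function describeDbError(error: DbErrorLike | null): string {
  if (error === null) {
    return "ok";
  }
  if (error.code === INSUFFICIENT_PRIVILEGE) {
    return "permission denied: check the GRANT statements for this role";
  }
  return "query failed: " + (error.message ?? "unknown error");
}

console.log(describeDbError({ code: "42501", message: "permission denied for table documents" }));
```

&lt;p&gt;Separating this case from other failures makes a missing grant easier to tell apart from an RLS policy that simply returns no rows.&lt;/p&gt;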



&lt;p&gt;&lt;strong&gt;Hardened Multi-Tenant Storage Policies&lt;/strong&gt;&lt;br&gt;
The app deals with private user documents, so User A must never be able to discover or access User B's files. To enforce this, I structured the Supabase Storage bucket so that every uploaded file is placed in a folder named exactly after the uploading user's authenticated ID (auth.uid()).&lt;/p&gt;

&lt;p&gt;Here are the folder-level RLS storage policies that enforce that rule on every single request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Restrict file uploads to the user's own UID folder&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="nv"&gt;"Users can upload their own documents"&lt;/span&gt;
&lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;authenticated&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;check&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;bucket_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'documents'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; 
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;foldername&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Restrict file reading to the user's own UID folder&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="nv"&gt;"Users can view their own documents"&lt;/span&gt;
&lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;authenticated&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;bucket_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'documents'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; 
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;foldername&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
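
&lt;p&gt;On the client, the upload path has to follow the same convention or the policy will reject the request. The sketch below shows one way to build such a path and to mirror the policy's first-folder check before uploading. Both helpers (buildObjectPath and isInOwnFolder) are hypothetical names for illustration; the real enforcement remains the SQL policies above.&lt;/p&gt;

```typescript
// Hypothetical helper: builds the object key "userId/fileName" that the policies expect.
function buildObjectPath(userId: string, fileName: string): string {
  // Replace path separators so the file name cannot introduce extra folders.
  const safeName = fileName.replace(/[\\\/]/g, "_");
  return userId + "/" + safeName;
}

// Client-side mirror of the policy condition:
// (storage.foldername(name))[1] = auth.uid()::text
function isInOwnFolder(objectPath: string, userId: string): boolean {
  const segments = objectPath.split("/");
  // A bare file name has no folder segment, so it can never match.
  if (segments.length === 1) {
    return false;
  }
  return segments[0] === userId;
}

// Placeholder ID for illustration; in the app this would come from the auth session.
const demoUserId = "user-123";
const demoPath = buildObjectPath(demoUserId, "report.pdf");
console.log(demoPath, isInOwnFolder(demoPath, demoUserId));
```

&lt;p&gt;A client-side check like this only gives earlier feedback; a request that skips it is still blocked by the database policies.&lt;/p&gt;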



&lt;p&gt;&lt;strong&gt;Keeping the Frontend Clean &amp;amp; Readable&lt;/strong&gt;&lt;br&gt;
On the frontend (Next.js 14 + TypeScript + Tailwind CSS), I wanted a UI that stays easy to extend. A small but common mess I see in boilerplate projects is long chains of nested ternary operators in the JSX to pick file icons.&lt;/p&gt;

&lt;p&gt;To keep things clean, I grouped the file extensions into arrays and used the standard JavaScript .includes() method to pick the matching Lucide React icon, styled with a single utility class string:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa167rkfwmmnep3bb8xdl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa167rkfwmmnep3bb8xdl.jpg" alt=" " width="799" height="375"&gt;&lt;/a&gt;&lt;/p&gt;
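
&lt;p&gt;For readers who cannot view the screenshot, here is a rough text sketch of the same idea. It is not a transcription of the image: the extension lists are assumptions based on the file types mentioned earlier, and iconNameFor returns plain icon names as strings so the example runs without React.&lt;/p&gt;

```typescript
// Extension groups; the exact lists are assumptions for this sketch.
const IMAGE_EXTENSIONS = ["jpg", "jpeg", "png"];
const AUDIO_EXTENSIONS = ["mp3"];
const DOCUMENT_EXTENSIONS = ["pdf", "doc", "docx"];

// Hypothetical helper: maps a file name to an icon name with flat checks
// instead of nested ternaries.
function iconNameFor(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  // Files without a dot get an empty extension and fall through to the default.
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (IMAGE_EXTENSIONS.includes(ext)) return "Image";
  if (AUDIO_EXTENSIONS.includes(ext)) return "Music";
  if (DOCUMENT_EXTENSIONS.includes(ext)) return "FileText";
  return "File";
}

console.log(iconNameFor("scan.PNG"), iconNameFor("notes.docx"));
```

&lt;p&gt;In a component, the returned name would be swapped for the corresponding icon component, and adding a new file type means extending one array rather than editing the JSX.&lt;/p&gt;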

&lt;p&gt;Wrapping Up:&lt;br&gt;
Building these layers from scratch took extensive testing, debugging environment variables, and reading through updated security documentation. But once it's configured, it works like magic: a user uploads an asset, Gemini reads and indexes it with a 3,072-dimension vector embedding, and you can instantly query your documents conceptually rather than just matching keywords.&lt;/p&gt;

&lt;p&gt;If you are planning to build an AI document product for a client or launch your own micro-SaaS, you don't have to spend your weekend configuring this infrastructure from zero.&lt;/p&gt;

&lt;p&gt;I've packaged this exact production-ready foundation (complete with the 1-click database initialization script, Next.js frontend, and pre-configured API routes) into a developer-friendly boilerplate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dhritiman.gumroad.com/l/docuintel" rel="noopener noreferrer"&gt;https://dhritiman.gumroad.com/l/docuintel&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nextjs</category>
      <category>rag</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
