<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Malik</title>
    <description>The latest articles on DEV Community by Malik (@malik_the_dev).</description>
    <link>https://dev.to/malik_the_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3947543%2F58e1be1d-03c7-4211-9e86-dff1ab76c116.png</url>
      <title>DEV Community: Malik</title>
      <link>https://dev.to/malik_the_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/malik_the_dev"/>
    <language>en</language>
    <item>
      <title>LocalFind Gemma — AI-Powered Semantic Search and Chat for Your Local Files</title>
      <dc:creator>Malik</dc:creator>
      <pubDate>Sat, 23 May 2026 19:29:07 +0000</pubDate>
      <link>https://dev.to/malik_the_dev/localfind-gemma-ai-powered-semantic-search-and-chat-for-your-local-files-4fi9</link>
      <guid>https://dev.to/malik_the_dev/localfind-gemma-ai-powered-semantic-search-and-chat-for-your-local-files-4fi9</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;LocalFind Gemma is a fully local, privacy-first semantic search engine for your own files — documents, images, and audio — powered by Gemma 4 running on Ollama.&lt;/p&gt;

&lt;p&gt;Most search tools match filenames or keywords. LocalFind Gemma understands &lt;em&gt;content&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Images indexed by what's in them&lt;/strong&gt; — Gemma 4 captions every image at sync time so you can
search "whiteboard with the system architecture diagram" or "receipt from the coffee shop" and
actually find it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent that reads images to answer questions&lt;/strong&gt; — ask "how much does that invoice say?" and the
agent finds the image, sends it to Gemma 4 vision, and gives you &lt;em&gt;the number&lt;/em&gt;, not a file path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio fully searchable&lt;/strong&gt; — Whisper transcribes recordings at index time so you can search
across hours of meetings by what was &lt;em&gt;said&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-lingual search&lt;/strong&gt; — the &lt;code&gt;nomic-embed-text-v2-moe&lt;/code&gt; embedding model supports ~100
languages in a shared vector space. Search in French, find English documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Supported file types: PDF, DOCX, TXT, MD, CSV, JPG, PNG, GIF, BMP, WEBP, MP3, WAV, FLAC, M4A.&lt;/p&gt;

&lt;p&gt;Everything — Gemma 4, Whisper, the ChromaDB vector store — runs on your machine. No API keys, no cloud, no data leaving your device. There's also an optional Claude Desktop integration via MCP for files you're comfortable sharing with a third party.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpf4rahthf5imuwrhdeid.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpf4rahthf5imuwrhdeid.png" alt=" " width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/60K9hdldK0s"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/maliklovable1-spec/localfind-gemma" rel="noopener noreferrer"&gt;https://github.com/maliklovable1-spec/localfind-gemma&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;Gemma 4 isn't just the chat model here — it's active at three distinct points in the pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Index time: captioning every image&lt;/strong&gt;&lt;br&gt;
When you sync a folder, each image is sent to Gemma 4 via Ollama's vision API. The caption is embedded and stored permanently in ChromaDB. Future searches use the stored caption; the model isn't called again unless you re-sync. This means fast search without repeated inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Agent reasoning and tool use&lt;/strong&gt;&lt;br&gt;
The conversational agent runs on &lt;code&gt;gemma4:e4b&lt;/code&gt; (the recommended default). It decides when to search, what query to issue, and how to synthesise results into a direct answer rather than just returning file paths.&lt;/p&gt;

&lt;p&gt;I chose &lt;strong&gt;e4b&lt;/strong&gt; over e2b because it follows tool-use instructions more reliably — which matters a lot in an agentic loop where the model needs to decide between search, image reading, and response synthesis. e2b is also supported for users with less RAM (~12 GB vs 16 GB).&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmo8ft5mwxnjpr4ql56m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmo8ft5mwxnjpr4ql56m.png" alt=" " width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Live image reading&lt;/strong&gt;&lt;br&gt;
When the agent finds an image relevant to your question, it sends the image bytes directly to Ollama's native &lt;code&gt;/api/chat&lt;/code&gt; API with your question as context. Gemma 4 reads the image and the agent uses that to answer you. The bytes go from your disk to your local Ollama process —nowhere else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A note on audio&lt;/strong&gt;&lt;br&gt;
Gemma 4 E2B and E4B natively support audio transcription at the architecture level — multilingual, up to 30 seconds, built into the model. LocalFind Gemma currently uses Whisper for audio because&lt;br&gt;
Ollama doesn't expose audio input via its API yet. Once Ollama ships that support &lt;br&gt;
([issue #11798(&lt;a href="https://github.com/ollama/ollama/issues/11798)" rel="noopener noreferrer"&gt;https://github.com/ollama/ollama/issues/11798)&lt;/a&gt;), the transcription backend can&lt;br&gt;
switch to Gemma 4 — the architecture is already designed with that transition in mind, though it will require some code changes depending on how Ollama exposes the audio API.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
