<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Avraham Aminov</title>
    <description>The latest articles on DEV Community by Avraham Aminov (@avraham_aminov_542e8309b6).</description>
    <link>https://dev.to/avraham_aminov_542e8309b6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3915837%2F03b23a7d-4827-4f2c-a09d-348b62e442d3.jpg</url>
      <title>DEV Community: Avraham Aminov</title>
      <link>https://dev.to/avraham_aminov_542e8309b6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/avraham_aminov_542e8309b6"/>
    <language>en</language>
    <item>
      <title>Building a Local AI SEO Agent with Gemma, Ollama, Docker, and React</title>
      <dc:creator>Avraham Aminov</dc:creator>
      <pubDate>Thu, 07 May 2026 21:10:08 +0000</pubDate>
      <link>https://dev.to/avraham_aminov_542e8309b6/building-a-local-ai-seo-agent-with-gemma-ollama-docker-and-react-303j</link>
      <guid>https://dev.to/avraham_aminov_542e8309b6/building-a-local-ai-seo-agent-with-gemma-ollama-docker-and-react-303j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;For the Gemma 4 Challenge, I built &lt;strong&gt;Local AI SEO Agent&lt;/strong&gt;: a privacy-friendly SEO audit tool that runs AI analysis locally with Gemma.&lt;/p&gt;

&lt;p&gt;The app takes a public webpage URL, scans the page for technical SEO signals, sends a compact structured summary to Gemma through Ollama, validates the model response, and displays a practical SEO report.&lt;/p&gt;

&lt;p&gt;The main constraint was intentional: &lt;strong&gt;no cloud AI APIs&lt;/strong&gt;. The AI layer runs locally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j45sfovijnqot8zojqq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j45sfovijnqot8zojqq.png" alt=" " width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local AI For SEO
&lt;/h2&gt;

&lt;p&gt;An SEO audit typically covers page metadata, headings, links, schema, and content structure, and turns those signals into recommendations. That data can be sensitive for businesses, agencies, and in-progress websites.&lt;/p&gt;

&lt;p&gt;Cloud AI can be useful, but for this project I wanted to avoid:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sending page audit data to an external AI provider&lt;/li&gt;
&lt;li&gt;paying per token or per request&lt;/li&gt;
&lt;li&gt;depending on a remote inference API&lt;/li&gt;
&lt;li&gt;building a demo that only works with a hosted service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Local AI fits this workflow well because the task is bounded. The backend extracts facts, then Gemma reasons over those facts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What The App Does
&lt;/h2&gt;

&lt;p&gt;The product flow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;URL + model mode -&amp;gt; SEO scan -&amp;gt; Gemma analysis -&amp;gt; validated JSON -&amp;gt; report UI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deterministic scanner extracts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;title and meta description&lt;/li&gt;
&lt;li&gt;canonical, robots, and viewport tags&lt;/li&gt;
&lt;li&gt;heading structure&lt;/li&gt;
&lt;li&gt;image alt coverage&lt;/li&gt;
&lt;li&gt;internal, external, and empty link counts&lt;/li&gt;
&lt;li&gt;Open Graph tags&lt;/li&gt;
&lt;li&gt;JSON-LD schema count&lt;/li&gt;
&lt;li&gt;visible text length and word count&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemma generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SEO score&lt;/li&gt;
&lt;li&gt;summary&lt;/li&gt;
&lt;li&gt;critical issues&lt;/li&gt;
&lt;li&gt;medium issues&lt;/li&gt;
&lt;li&gt;recommendations&lt;/li&gt;
&lt;li&gt;suggested title&lt;/li&gt;
&lt;li&gt;suggested meta description&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The UI also exposes product-oriented controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast / Quality model mode&lt;/li&gt;
&lt;li&gt;runtime metrics&lt;/li&gt;
&lt;li&gt;cache status&lt;/li&gt;
&lt;li&gt;SEO health badges&lt;/li&gt;
&lt;li&gt;Copy report&lt;/li&gt;
&lt;li&gt;Download JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p1mf116oadmhfl5em4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p1mf116oadmhfl5em4z.png" alt=" " width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hvd1x0py5v7g78h6sbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hvd1x0py5v7g78h6sbb.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Gemma Is Used
&lt;/h2&gt;

&lt;p&gt;I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemma4:e4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;through Ollama.&lt;/p&gt;

&lt;p&gt;Gemma is the reasoning layer of the product. It does not fetch websites and it does not parse HTML. Instead, it receives a structured SEO summary from the backend and converts those signals into a human-readable audit.&lt;/p&gt;

&lt;p&gt;That means the AI has a focused job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;structured SEO facts -&amp;gt; prioritized SEO recommendations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I selected &lt;code&gt;gemma4:e4b&lt;/code&gt; because it is stronger than the smallest edge variant while still being practical for local development. In my local Docker setup, a full audit generally takes around 1-2 minutes depending on whether the model is already loaded.&lt;/p&gt;

&lt;p&gt;I later added a mode selector:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Fast&lt;/code&gt; uses &lt;code&gt;gemma4:e2b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Quality&lt;/code&gt; uses &lt;code&gt;gemma4:e4b&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes the local AI tradeoff visible to users instead of hiding it.&lt;/p&gt;
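&lt;p&gt;Under the hood, each mode maps to a model tag and a request body for Ollama's &lt;code&gt;/api/generate&lt;/code&gt; endpoint. A minimal sketch of that mapping (the wiring is illustrative; &lt;code&gt;format: "json"&lt;/code&gt; and &lt;code&gt;stream: false&lt;/code&gt; are standard Ollama request fields):&lt;/p&gt;

```typescript
// Map the UI mode to a local Gemma tag and build the body for
// Ollama's /api/generate endpoint. "format": "json" asks Ollama to
// constrain the completion to valid JSON; "stream": false returns
// one complete response, which is simpler to validate.
const MODEL_BY_MODE = {
  fast: "gemma4:e2b",
  quality: "gemma4:e4b",
} as const;

function buildOllamaRequest(mode: keyof typeof MODEL_BY_MODE, prompt: string) {
  return {
    model: MODEL_BY_MODE[mode],
    prompt,
    stream: false,
    format: "json",
  };
}
```

&lt;p&gt;The backend would POST this body to the Ollama service from the Docker setup (&lt;code&gt;http://localhost:11435/api/generate&lt;/code&gt;).&lt;/p&gt;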

&lt;p&gt;The report also shows the exact model, prompt version, cache state, scan time, Gemma time, and total runtime. That made performance more transparent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mode: Fast
Model: gemma4:e2b
Cache: Miss
Scan: 946ms
Gemma: 59s
Total: 1m 0s
Prompt: seo-audit-v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For comparison, &lt;code&gt;Quality&lt;/code&gt; mode with &lt;code&gt;gemma4:e4b&lt;/code&gt; produced deeper analysis but took longer on my machine.&lt;/p&gt;
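&lt;p&gt;Rendering those runtime numbers is simple arithmetic over the recorded durations. A small helper in the spirit of the metrics shown above (illustrative, not the project's exact code):&lt;/p&gt;

```typescript
// Render a millisecond duration the way the report header shows it:
// under a second as "946ms", under a minute as "59s", then "1m 0s".
function formatDuration(ms: number): string {
  if (ms < 1000) return Math.round(ms) + "ms";
  const totalSeconds = Math.round(ms / 1000);
  if (totalSeconds < 60) return totalSeconds + "s";
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes + "m " + seconds + "s";
}
```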

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The app has three main parts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;React UI
  -&amp;gt; Express API
  -&amp;gt; SEO scanner
  -&amp;gt; prompt builder
  -&amp;gt; Ollama
  -&amp;gt; Gemma
  -&amp;gt; JSON validator
  -&amp;gt; report UI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontend never talks directly to Ollama. It only calls the backend.&lt;/p&gt;

&lt;p&gt;The backend owns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;URL validation&lt;/li&gt;
&lt;li&gt;website fetching&lt;/li&gt;
&lt;li&gt;HTML parsing&lt;/li&gt;
&lt;li&gt;prompt building&lt;/li&gt;
&lt;li&gt;Ollama communication&lt;/li&gt;
&lt;li&gt;AI response validation&lt;/li&gt;
&lt;li&gt;report caching&lt;/li&gt;
&lt;li&gt;report formatting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation made the project easier to reason about. The scanner extracts facts, Gemma interprets them, and the frontend presents the final report.&lt;/p&gt;
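&lt;p&gt;That pipeline can be sketched as one orchestration function with injected stages. The names and signatures here are illustrative, not the project's actual API; the point is that the scanner, the model call, and the validator stay independent:&lt;/p&gt;

```typescript
// Hypothetical shape of the backend pipeline: each stage is injected,
// so scanning, prompting, inference, and validation can be tested alone.
type Deps = {
  scan: (url: string) => Promise<any>;        // deterministic SEO scan
  buildPrompt: (summary: any) => string;      // compact structured prompt
  askGemma: (prompt: string) => Promise<string>; // Ollama round trip
  validate: (raw: string) => any;             // throws on bad model output
};

async function runAudit(url: string, deps: Deps) {
  const summary = await deps.scan(url);       // facts, no AI involved
  const raw = await deps.askGemma(deps.buildPrompt(summary));
  return deps.validate(raw);                  // only validated JSON reaches the UI
}
```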

&lt;h2&gt;
  
  
  Backend Scanner
&lt;/h2&gt;

&lt;p&gt;The scanner uses Axios to fetch the HTML and Cheerio to parse it.&lt;/p&gt;

&lt;p&gt;Example scanner summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Auto Locksmith London - 2,000+ Reviews | Car Key Replacement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"metaDescriptionLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;155&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"headings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"counts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"h1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"h2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"images"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"missingAlt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"wordCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;878&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend also rejects risky input such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;localhost URLs&lt;/li&gt;
&lt;li&gt;loopback IP addresses&lt;/li&gt;
&lt;li&gt;private network IP addresses&lt;/li&gt;
&lt;li&gt;malformed URLs&lt;/li&gt;
&lt;li&gt;unsupported protocols&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because the backend fetches user-provided URLs.&lt;/p&gt;
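&lt;p&gt;A dependency-free sketch of that guard (the private ranges are the usual RFC 1918 blocks; the real check may cover more, and a production SSRF guard should also resolve DNS and re-check the resulting IP):&lt;/p&gt;

```typescript
// Reject URLs the backend should never fetch: malformed input,
// non-HTTP protocols, localhost, loopback, and private-network hosts.
function parseUrl(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null; // malformed URL
  }
}

function isSafeTargetUrl(input: string): boolean {
  const url = parseUrl(input);
  if (url === null) return false;
  if (!["http:", "https:"].includes(url.protocol)) return false;
  const host = url.hostname;
  if (host === "localhost" || host === "[::1]") return false;
  if (/^127\./.test(host)) return false;             // loopback
  if (/^10\./.test(host)) return false;              // RFC 1918
  if (/^172\.(1[6-9]|2\d|3[01])\./.test(host)) return false;
  if (/^192\.168\./.test(host)) return false;
  if (/^169\.254\./.test(host)) return false;        // link-local
  return true;
}
```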

&lt;h2&gt;
  
  
  Prompt And JSON Validation
&lt;/h2&gt;

&lt;p&gt;The prompt tells Gemma to return JSON only.&lt;/p&gt;

&lt;p&gt;Required output shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Short SEO summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"criticalIssues"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mediumIssues"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommendations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"suggestedTitle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"suggestedMetaDescription"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend validates the response with Zod before returning it to the frontend.&lt;/p&gt;

&lt;p&gt;If Gemma returns malformed JSON, missing required fields, or an invalid score, the API returns a clean error instead of rendering unreliable data.&lt;/p&gt;
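&lt;p&gt;The project uses Zod for this; a dependency-free sketch of the same gate, using the field list from the shape above, could look like:&lt;/p&gt;

```typescript
// Minimal stand-in for the Zod schema: parse the raw model output,
// check required fields, and reject out-of-range scores. Throws
// instead of letting unreliable data reach the frontend.
function parseAuditReport(raw: string) {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Gemma returned malformed JSON");
  }
  if (typeof data.score !== "number" || data.score < 0 || data.score > 100) {
    throw new Error("invalid score");
  }
  if (typeof data.summary !== "string") throw new Error("missing field: summary");
  for (const key of ["criticalIssues", "mediumIssues", "recommendations"]) {
    if (!Array.isArray(data[key])) throw new Error("missing field: " + key);
  }
  return data;
}
```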

&lt;p&gt;I also reduced the prompt size by sending a scanner summary instead of the full raw scan object. That made local inference more predictable.&lt;/p&gt;
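&lt;p&gt;The prompt stays small when only the summary is embedded. An illustrative builder (the real prompt's wording differs; only the &lt;code&gt;seo-audit-v1&lt;/code&gt; label comes from the report metadata):&lt;/p&gt;

```typescript
// Illustrative prompt builder: embed only the compact scanner summary
// and restate the JSON-only contract at the end of the prompt.
const PROMPT_VERSION = "seo-audit-v1";

function buildSeoPrompt(summary: object): string {
  return [
    "You are an SEO auditor. Analyze the page signals below.",
    JSON.stringify(summary),
    "Respond with JSON only, matching the required report shape.",
  ].join("\n");
}
```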

&lt;p&gt;The API response includes runtime metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runtime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"quality"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemma4:e4b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"localAi"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scanDurationMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"aiDurationMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"totalDurationMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;91200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cacheHit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"promptVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"seo-audit-v1"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07at9sawhipk0dbtg1gq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07at9sawhipk0dbtg1gq.png" alt=" " width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4ni9tj6hpmqrqjieoxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4ni9tj6hpmqrqjieoxl.png" alt=" " width="800" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frontend
&lt;/h2&gt;

&lt;p&gt;The frontend is built with React, TypeScript, Vite, and Tailwind CSS.&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;URL input&lt;/li&gt;
&lt;li&gt;Fast / Quality model selector&lt;/li&gt;
&lt;li&gt;loading state with elapsed time&lt;/li&gt;
&lt;li&gt;SEO score card&lt;/li&gt;
&lt;li&gt;SEO health badges&lt;/li&gt;
&lt;li&gt;runtime badges&lt;/li&gt;
&lt;li&gt;summary panel&lt;/li&gt;
&lt;li&gt;issue lists&lt;/li&gt;
&lt;li&gt;recommendations&lt;/li&gt;
&lt;li&gt;suggested metadata&lt;/li&gt;
&lt;li&gt;scan highlights&lt;/li&gt;
&lt;li&gt;Copy report action&lt;/li&gt;
&lt;li&gt;Download JSON action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The loading state is important because local inference can take time, especially on the first request when Ollama loads the model into memory.&lt;/p&gt;

&lt;p&gt;I also added in-memory caching by URL and model. When the same URL is analyzed again with the same mode, the app can return the previous report immediately and mark it as a cache hit.&lt;/p&gt;
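&lt;p&gt;That cache is small enough to sketch directly. Keying on mode plus URL keeps Fast and Quality reports separate; the TTL here is an assumed value, not the project's actual setting:&lt;/p&gt;

```typescript
// Short-lived in-memory report cache keyed by model mode + URL.
const CACHE_TTL_MS = 10 * 60 * 1000; // assumed: 10 minutes
const cache = new Map(); // key -> { report, storedAt }

function cacheKey(url: string, mode: string): string {
  return mode + "::" + url;
}

function getCached(url: string, mode: string) {
  const entry = cache.get(cacheKey(url, mode));
  if (!entry) return null;
  if (Date.now() - entry.storedAt > CACHE_TTL_MS) {
    cache.delete(cacheKey(url, mode)); // expired
    return null;
  }
  return entry.report; // cache hit
}

function putCached(url: string, mode: string, report: object) {
  cache.set(cacheKey(url, mode), { report, storedAt: Date.now() });
}
```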

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibi2fxdlqaiuzsgqgz2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibi2fxdlqaiuzsgqgz2y.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey9s0pdk7ybes7cm3vo0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey9s0pdk7ybes7cm3vo0.png" alt=" " width="800" height="117"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3j7t4fyn02qveb8le5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3j7t4fyn02qveb8le5n.png" alt=" " width="800" height="210"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker Setup
&lt;/h2&gt;

&lt;p&gt;The project runs with Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The services are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;frontend&lt;/li&gt;
&lt;li&gt;backend&lt;/li&gt;
&lt;li&gt;ollama&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;frontend: &lt;code&gt;http://localhost:5174&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;backend: &lt;code&gt;http://localhost:3001&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Ollama: &lt;code&gt;http://localhost:11435&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
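&lt;p&gt;A Compose file matching those host ports could look like the following sketch. The build contexts, container-side ports, environment variable name, and volume are assumptions; only the host ports come from the list above:&lt;/p&gt;

```yaml
services:
  frontend:
    build: ./frontend          # assumed build context
    ports:
      - "5174:5173"            # host 5174 -> Vite dev server (assumed container port)
  backend:
    build: ./backend           # assumed build context
    ports:
      - "3001:3001"
    environment:
      - OLLAMA_URL=http://ollama:11434   # assumed variable name
    depends_on:
      - ollama
  ollama:
    image: ollama/ollama
    ports:
      - "11435:11434"          # host 11435 avoids clashing with a host Ollama on 11434
    volumes:
      - ollama-data:/root/.ollama        # persist pulled models across restarts
volumes:
  ollama-data:
```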

&lt;p&gt;After starting the containers, pull the model into the Ollama service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull gemma4:e4b
docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull gemma4:e2b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rwq112h92qe4fxgpuek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rwq112h92qe4fxgpuek.png" alt=" " width="800" height="89"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges
&lt;/h2&gt;

&lt;p&gt;The biggest challenge was local model latency.&lt;/p&gt;

&lt;p&gt;The scanner is fast, but local inference with a 9.6GB model is hardware-dependent. The first request can be slow because Ollama needs to load the model into memory.&lt;/p&gt;

&lt;p&gt;I handled this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;increasing the Ollama request timeout&lt;/li&gt;
&lt;li&gt;adding a clearer loading state&lt;/li&gt;
&lt;li&gt;reducing prompt size&lt;/li&gt;
&lt;li&gt;adding Fast and Quality model modes&lt;/li&gt;
&lt;li&gt;adding a short-lived in-memory report cache&lt;/li&gt;
&lt;li&gt;validating AI output carefully&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another challenge was keeping the AI output predictable. Asking for JSON is not enough by itself, so the backend validates the response and fills optional fields with safe defaults.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Local AI works well when the task is clearly bounded.&lt;/p&gt;

&lt;p&gt;For this project, Gemma does not need to browse the web or guess what is on the page. The scanner gives it structured facts, and the model focuses on interpretation.&lt;/p&gt;

&lt;p&gt;The pattern I liked most was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deterministic extraction + local AI reasoning + strict validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That feels like a practical way to use local models in developer tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;I intentionally kept the MVP focused on one-page analysis.&lt;/p&gt;

&lt;p&gt;Future improvements could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-page crawling&lt;/li&gt;
&lt;li&gt;sitemap support&lt;/li&gt;
&lt;li&gt;report history&lt;/li&gt;
&lt;li&gt;PDF export&lt;/li&gt;
&lt;li&gt;Lighthouse integration&lt;/li&gt;
&lt;li&gt;browser extension&lt;/li&gt;
&lt;li&gt;WordPress plugin&lt;/li&gt;
&lt;li&gt;persistent cache or saved reports&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Repository
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/avi-aminov/local-ai-seo-agent" rel="noopener noreferrer"&gt;avi-aminov/local-ai-seo-agent&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Local AI SEO Agent shows how Gemma can power a real developer tool without relying on cloud AI APIs.&lt;/p&gt;

&lt;p&gt;The project combines deterministic SEO scanning with local AI reasoning, validates the model output, and presents the result in a clean web UI.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
