<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yoges Mohan</title>
    <description>The latest articles on DEV Community by Yoges Mohan (@yoges_mohan_511bda5afbe7d).</description>
    <link>https://dev.to/yoges_mohan_511bda5afbe7d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3034195%2F441825f1-811e-48cf-91b5-8d8a7c724621.jpg</url>
      <title>DEV Community: Yoges Mohan</title>
      <link>https://dev.to/yoges_mohan_511bda5afbe7d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yoges_mohan_511bda5afbe7d"/>
    <language>en</language>
    <item>
      <title>Finishing What I Started: AutoDoc: AI-Powered OpenAPI Docs from C# 🤖</title>
      <dc:creator>Yoges Mohan</dc:creator>
      <pubDate>Thu, 28 May 2026 07:55:03 +0000</pubDate>
      <link>https://dev.to/yoges_mohan_511bda5afbe7d/finishing-what-i-started-autodoc-ai-powered-openapi-docs-from-c-25g2</link>
      <guid>https://dev.to/yoges_mohan_511bda5afbe7d/finishing-what-i-started-autodoc-ai-powered-openapi-docs-from-c-25g2</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-05-21"&gt;GitHub Finish-Up-A-Thon Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built:
&lt;/h2&gt;

&lt;p&gt;AutoDoc is a tool that automatically generates enriched OpenAPI 3.0.3 documentation from raw C# ASP.NET Core controller code using a local AI model - no manual annotations, no attribute decorators, no maintenance required.&lt;/p&gt;

&lt;p&gt;The idea came from a real frustration I noticed during my software engineering internship. Swagger documentation in .NET projects only documents what developers explicitly declare. Miss an attribute, forget an error response, or skip a summary, and your docs silently drift from reality. Developers spend time writing documentation instead of writing code.&lt;/p&gt;

&lt;p&gt;AutoDoc fixes this by reading raw C# controller code directly and inferring everything automatically: routes, HTTP methods, parameters, response codes, error schemas, and operation summaries.&lt;/p&gt;

&lt;p&gt;The project was built in 3 phases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Phase 1 - Console App&lt;/td&gt;
&lt;td&gt;Paste controller code, get OpenAPI YAML in the terminal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 2 - Docker Web API&lt;/td&gt;
&lt;td&gt;Containerised REST backend, call via HTTP POST, get YAML back&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 3 - Playground UI&lt;/td&gt;
&lt;td&gt;Browser dashboard: paste code, see Raw YAML + Swagger Preview instantly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What started as a terminal script became a fully containerised, browser-accessible API documentation tool, and this challenge is what pushed me to finish it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo:
&lt;/h2&gt;

&lt;p&gt;How it works: paste any C# ASP.NET Core controller, click generate, get full OpenAPI docs instantly.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 1 - Paste your controller and click Generate
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvggs1otrh2u00ljb0v7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvggs1otrh2u00ljb0v7.png" alt=" " width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AutoDoc Playground UI is a split-panel dashboard. On the left, you paste any ASP.NET Core controller code - no modifications needed. The sample &lt;code&gt;TodoController&lt;/code&gt; comes pre-filled so you can test immediately. Hit Generate OpenAPI Docs and AutoDoc sends the raw code to the AI backend, which runs locally via Ollama. No documentation attributes. No decorators. No &lt;code&gt;[ProducesResponseType]&lt;/code&gt;. Just raw C# code.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 2 - Raw YAML is generated instantly
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Input: this is the only code AutoDoc received:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ApiController&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"api/[controller]"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoController&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ControllerBase&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;HttpGet&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;IActionResult&lt;/span&gt; &lt;span class="nf"&gt;GetAll&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"Task 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Task 2"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;[ProducesResponseType]&lt;/code&gt;. No &lt;code&gt;[SwaggerOperation]&lt;/code&gt;. No XML comments.&lt;br&gt;
No manual annotations of any kind. Just raw C# code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output: full OpenAPI 3.0.3 spec generated automatically:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d177fr1gxgqeb57wh9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d177fr1gxgqeb57wh9e.png" alt=" " width="800" height="816"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AI inferred everything automatically:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What was inferred&lt;/th&gt;
&lt;th&gt;How&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;operationId&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Unique names like &lt;code&gt;GetAllTodos&lt;/code&gt;, &lt;code&gt;CreateTodo&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tags&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Grouped by controller name &lt;code&gt;TodoController&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;summary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Human-readable descriptions per endpoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;requestBody&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Schema inferred from method parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;responses&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;200, 201, 400, 404, 500 from code logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;components/schemas/ErrorResponse&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Defined once, referenced everywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of this existed in the original controller code.&lt;/p&gt;


&lt;h3&gt;
  
  
  Step 3 - Swagger Preview: POST endpoint
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjx7mg7sqg3a0t6856obt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjx7mg7sqg3a0t6856obt.png" alt=" " width="800" height="780"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Switching to the Swagger Preview tab renders the YAML as a live interactive Swagger UI. The POST &lt;code&gt;/api/Todo&lt;/code&gt; endpoint shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A required request body with example JSON schema.&lt;/li&gt;
&lt;li&gt;201 Created response with the created resource structure.&lt;/li&gt;
&lt;li&gt;Internal Server Error fallback - inferred automatically, not declared anywhere in the controller.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 4 - Swagger Preview: GET endpoint
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafoktd7a54f7v8yshxm8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafoktd7a54f7v8yshxm8.png" alt=" " width="800" height="759"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The GET &lt;code&gt;/api/Todo&lt;/code&gt; endpoint shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;200 OK response returning an array - inferred from &lt;code&gt;Ok(new[] { "Task 1", "Task 2" })&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Internal Server Error fallback - added to every endpoint automatically.&lt;/li&gt;
&lt;li&gt;Try it out button - fully interactive, ready to fire real requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI understood what &lt;code&gt;Ok(new[] { ... })&lt;/code&gt; returns and documented it correctly as an array schema.&lt;/p&gt;


&lt;h3&gt;
  
  
  Step 5 - Auto-generated Schemas
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbte4fekce6nha2dsifwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbte4fekce6nha2dsifwi.png" alt=" " width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the bottom of the Swagger Preview, the &lt;strong&gt;Schemas&lt;/strong&gt; section shows the auto-generated &lt;code&gt;ErrorResponse&lt;/code&gt; schema with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;code&lt;/code&gt; : integer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;message&lt;/code&gt; : string&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This schema was never defined anywhere in the C# controller. AutoDoc inferred it from the pattern of error responses across all endpoints and generated it once, referenced consistently throughout the entire spec.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Comeback Story:
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Where it started
&lt;/h3&gt;

&lt;p&gt;AutoDoc began as a simple console application - a proof of concept to answer one question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Can an AI model read raw C# controller code and generate valid OpenAPI documentation without any manual annotations?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The answer was yes. But the Phase 1 console app was far from finished:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What existed&lt;/th&gt;
&lt;th&gt;What was missing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Console app that called Ollama locally&lt;/td&gt;
&lt;td&gt;No HTTP interface - unusable by others&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw YAML printed to terminal&lt;/td&gt;
&lt;td&gt;No output validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic Llama 3.2 prompt&lt;/td&gt;
&lt;td&gt;No error handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proof the concept worked&lt;/td&gt;
&lt;td&gt;No UI, no Docker, no way to share it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It worked on my machine. That was it.&lt;/p&gt;


&lt;h3&gt;
  
  
  What I changed:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 - From console app to Docker REST API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The console app was converted into a proper ASP.NET Core minimal API with two endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /generate-openapi&lt;/code&gt; - receives controller code as JSON, returns enriched OpenAPI YAML.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /health&lt;/code&gt; - confirms the service is running.&lt;/li&gt;
&lt;/ul&gt;
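
&lt;p&gt;As a rough illustration of those two routes, here is a minimal sketch rather than AutoDoc's actual &lt;code&gt;Program.cs&lt;/code&gt;. It assumes a project that uses the &lt;code&gt;Microsoft.NET.Sdk.Web&lt;/code&gt; SDK with implicit usings. The &lt;code&gt;GenerateRequest&lt;/code&gt; record, its &lt;code&gt;Code&lt;/code&gt; property, the JSON shapes, and the placeholder YAML are all hypothetical stand-ins for the real model call.&lt;/p&gt;

```csharp
// Minimal API sketch (assumed shape, not the project's real code).
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Liveness check: confirms the service is running.
app.MapGet("/health", () => Results.Ok(new { status = "ok" }));

// Receives controller code as JSON, e.g. { "code": "..." }, and returns YAML text.
app.MapPost("/generate-openapi", (GenerateRequest request) =>
{
    if (string.IsNullOrWhiteSpace(request.Code))
    {
        return Results.BadRequest(new { code = 400, message = "Controller code is required." });
    }

    // Placeholder: the real service would call the model and run the cleaning pipeline here.
    var yaml = "openapi: 3.0.3";
    return Results.Text(yaml, "application/yaml");
});

app.Run();

// Hypothetical request body type; type declarations must follow the top-level statements.
record GenerateRequest(string Code);
```

&lt;p&gt;Returning the YAML as plain text keeps the client side simple: the caller can show the response body as-is or hand it to a YAML parser.&lt;/p&gt;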

&lt;p&gt;A &lt;code&gt;Dockerfile&lt;/code&gt; was added so the entire service runs in a container with a single &lt;code&gt;docker run&lt;/code&gt; command. The Ollama host was made configurable via environment variable so it works both locally (&lt;code&gt;localhost:11434&lt;/code&gt;) and inside Docker (&lt;code&gt;host.docker.internal:11434&lt;/code&gt;).&lt;/p&gt;
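
&lt;p&gt;A minimal sketch of that host lookup, under stated assumptions: the variable name &lt;code&gt;OLLAMA_HOST&lt;/code&gt; and the helper &lt;code&gt;ResolveOllamaBaseUrl&lt;/code&gt; are hypothetical (the post does not name them), and the fallback mirrors the &lt;code&gt;localhost:11434&lt;/code&gt; value mentioned above.&lt;/p&gt;

```csharp
using System;

// Hypothetical helper: turns an optional host setting into a base URL.
static string ResolveOllamaBaseUrl(string? configured)
{
    // Fall back to the local default when nothing is configured.
    var host = string.IsNullOrWhiteSpace(configured) ? "localhost:11434" : configured.Trim();

    // Accept both "host:port" and full URLs such as "http://host:port".
    return host.Contains("://") ? host : "http://" + host;
}

// OLLAMA_HOST is an assumed variable name, used here for illustration only.
var baseUrl = ResolveOllamaBaseUrl(Environment.GetEnvironmentVariable("OLLAMA_HOST"));
Console.WriteLine("Ollama base URL: " + baseUrl);
```

&lt;p&gt;With a lookup like this, a container started with &lt;code&gt;docker run -e OLLAMA_HOST=host.docker.internal:11434&lt;/code&gt; would target the host machine, while a plain &lt;code&gt;dotnet run&lt;/code&gt; would keep using the local default.&lt;/p&gt;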

&lt;p&gt;&lt;strong&gt;Phase 3 - Playground UI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A browser-based dashboard was built and served directly from the Docker container via &lt;code&gt;wwwroot&lt;/code&gt;. No separate frontend deployment needed - the UI and API ship together.&lt;/p&gt;

&lt;p&gt;The playground features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Split-panel layout - controller input left, output right.&lt;/li&gt;
&lt;li&gt;Raw YAML tab - full generated spec in monospace.&lt;/li&gt;
&lt;li&gt;Swagger Preview tab - live interactive Swagger UI rendered from the YAML.&lt;/li&gt;
&lt;li&gt;Status pill - shows generation state in real time.&lt;/li&gt;
&lt;li&gt;Pre-filled sample controller - works out of the box with zero setup.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The hardest part - taming Llama 3.2 output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Llama 3.2 is a small, free, local model. It does a remarkable job generating mostly correct OpenAPI YAML - but it makes small inconsistent mistakes. Each one broke the Swagger Preview with a parsing error.&lt;/p&gt;

&lt;p&gt;A full post-processing pipeline was built to fix every category of output error:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Helper function&lt;/th&gt;
&lt;th&gt;Problem it solves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MergeDuplicatePaths&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Llama repeating the same API path twice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MergeDuplicateMethods&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Llama repeating GET/POST under the same path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MergeDuplicateComponents&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Llama emitting &lt;code&gt;components:&lt;/code&gt; key twice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RemoveDuplicateSchemaKeys&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Llama defining &lt;code&gt;ErrorResponse&lt;/code&gt; twice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RemoveMalformedSecurity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Llama adding broken &lt;code&gt;security:&lt;/code&gt; blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;FixMalformedOperationIds&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Llama using &lt;code&gt;{verb}&lt;/code&gt; as a literal placeholder&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NormaliseOutput&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Schema name typos like &lt;code&gt;ErrorRespond&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each helper was born from a real parsing error seen in production output. The pipeline runs on every generation before the YAML is returned to the client.&lt;/p&gt;
&lt;h3&gt;
  
  
  Before and after:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Phase 1 - Console App&lt;/th&gt;
&lt;th&gt;Phase 3 - Full AutoDoc&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;Terminal only&lt;/td&gt;
&lt;td&gt;Browser dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Local .NET required&lt;/td&gt;
&lt;td&gt;Docker container&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;Raw YAML in terminal&lt;/td&gt;
&lt;td&gt;Raw YAML + live Swagger Preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validation&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;YAML cleaning pipeline + spec validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Usability&lt;/td&gt;
&lt;td&gt;Developer only&lt;/td&gt;
&lt;td&gt;Anyone with a browser&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shareable&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes - one &lt;code&gt;docker run&lt;/code&gt; command&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What started as a terminal experiment is now a containerised, browser-accessible documentation tool that anyone can run and use.&lt;/p&gt;
&lt;h2&gt;
  
  
  My Experience with GitHub Copilot:
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot was my implementation partner throughout AutoDoc. I defined the problems and the architecture; Copilot wrote the code.&lt;/p&gt;
&lt;h3&gt;
  
  
  The approach:
&lt;/h3&gt;

&lt;p&gt;I used Copilot not for autocomplete but as a problem-to-code translator. I described what I needed in plain English, Copilot read my existing code, and produced working implementations I could review, test, and keep.&lt;/p&gt;
&lt;h3&gt;
  
  
  Prompt 1 - Serving the Playground UI from the API
&lt;/h3&gt;

&lt;p&gt;The first key prompt connected the backend to the frontend:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I am building an ASP.NET Core 10 Minimal API called AutoDoc.I want to serve static files from a wwwroot folder and make sure it serves index.html as the default page. How should I update my Program.cs to allow this?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foojkc0fbmmzmhbi2xeoz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foojkc0fbmmzmhbi2xeoz.png" alt=" " width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copilot read my existing &lt;code&gt;Program.cs&lt;/code&gt; directly, understood the current pipeline, and knew exactly what to add without breaking anything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68hhhpd3x7bxkcvyr5c3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68hhhpd3x7bxkcvyr5c3.png" alt=" " width="648" height="720"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It explained what it was doing and why - then applied the change automatically:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7e4hbg11bqymnl51035r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7e4hbg11bqymnl51035r.png" alt=" " width="685" height="635"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UseDefaultFiles&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UseStaticFiles&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two lines. Copilot explained that &lt;code&gt;UseDefaultFiles()&lt;/code&gt; rewrites &lt;code&gt;/&lt;/code&gt; to &lt;code&gt;/index.html&lt;/code&gt; and &lt;code&gt;UseStaticFiles()&lt;/code&gt; serves everything in &lt;code&gt;wwwroot&lt;/code&gt;. It then validated the file for compile errors before confirming. The change was applied in one shot - I clicked &lt;strong&gt;Keep&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firngnz4cwfywc3wmhbys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firngnz4cwfywc3wmhbys.png" alt=" " width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result: &lt;code&gt;index.html&lt;/code&gt; created in &lt;code&gt;wwwroot&lt;/code&gt;, &lt;code&gt;Program.cs&lt;/code&gt; updated, Copilot confirming &lt;strong&gt;1 file changed, +4 -0&lt;/strong&gt;. The Playground UI was now being served directly from the Docker container.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt 2 - Building the Playground UI from scratch
&lt;/h3&gt;

&lt;p&gt;With the static file serving in place, the next prompt built the entire frontend:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Build a dark-themed single-page HTML dashboard for AutoDoc.Left panel: controller name input and C# code textarea with a Generate button. Right panel: tabbed output with Raw YAML (monospace) and Swagger Preview rendered from the YAML using SwaggerUIBundle. Show a status pill in the header that updates during generation. No frameworks, vanilla JS only."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Copilot generated the complete &lt;code&gt;index.html&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dark theme with CSS variables.&lt;/li&gt;
&lt;li&gt;Split-panel responsive layout.&lt;/li&gt;
&lt;li&gt;Tab switching between Raw YAML and Swagger Preview.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fetch&lt;/code&gt; call to &lt;code&gt;/generate-openapi&lt;/code&gt; with full error handling.&lt;/li&gt;
&lt;li&gt;Swagger UI integration using &lt;code&gt;jsyaml&lt;/code&gt; for YAML-to-spec parsing.&lt;/li&gt;
&lt;li&gt;Real-time status pill updates during generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire Playground UI came from a single prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt 3 - YAML post-processing pipeline
&lt;/h3&gt;

&lt;p&gt;Every YAML cleaner in AutoDoc came from describing a real Llama output bug to Copilot:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Llama is generating the &lt;code&gt;components:&lt;/code&gt; key twice in the output YAML. Write a C# helper that scans line by line and removes duplicate &lt;code&gt;components:&lt;/code&gt; keys, keeping only the first."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Copilot wrote &lt;code&gt;MergeDuplicateComponents&lt;/code&gt;. I repeated this pattern for all seven helpers - each one targeting a specific category of Llama output error discovered during testing.&lt;/p&gt;
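
&lt;p&gt;To make the idea concrete, here is a minimal sketch of a line-by-line cleaner in the spirit of that prompt. It is not the code Copilot produced: the name &lt;code&gt;RemoveDuplicateComponentsKeys&lt;/code&gt; and the sample YAML are hypothetical. It only drops repeated top-level &lt;code&gt;components:&lt;/code&gt; lines, so any child lines of a dropped key stay where they are and attach to whatever mapping precedes them; a real helper may need to do more.&lt;/p&gt;

```csharp
using System;
using System.Text;

// Sketch: keep the first top-level "components:" line and skip any later repeats.
static string RemoveDuplicateComponentsKeys(string yaml)
{
    var output = new StringBuilder();
    var seen = false;

    foreach (var line in yaml.Replace("\r\n", "\n").Split('\n'))
    {
        // Leading whitespace is kept, so only a key in column 0 matches.
        var isComponentsKey = line.TrimEnd() == "components:";
        if (isComponentsKey)
        {
            if (seen)
            {
                continue; // duplicate key line: drop it
            }
            seen = true;
        }

        output.Append(line).Append('\n');
    }

    // Trailing newlines are trimmed so the result ends on the last content line.
    return output.ToString().TrimEnd('\n');
}

// Hypothetical model output with the key emitted twice in a row.
var sample = "openapi: 3.0.3\ncomponents:\ncomponents:\n  schemas:\n    ErrorResponse:\n      type: object";
Console.WriteLine(RemoveDuplicateComponentsKeys(sample));
```

&lt;p&gt;Working on raw lines instead of a parsed document is a deliberate trade-off in this sketch: a YAML parser may reject duplicate keys outright, so the text has to be repaired before it can be parsed at all.&lt;/p&gt;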




&lt;h3&gt;
  
  
  What I learned:
&lt;/h3&gt;

&lt;p&gt;Copilot was most valuable not when I said &lt;em&gt;"write this"&lt;/em&gt; but when I said &lt;em&gt;"here is the problem I am seeing, fix it."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It read my existing code before responding. It explained its reasoning. It validated changes before applying them. It felt less like autocomplete and more like a senior developer who already knew my codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations I noticed:
&lt;/h3&gt;

&lt;p&gt;Copilot is powerful but not perfect. Here is what it could not do:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;th&gt;What happened&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Could not predict Llama output bugs&lt;/td&gt;
&lt;td&gt;Every YAML cleaner was reactive - I discovered the bug first, then asked Copilot to fix it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Did not flag environment differences&lt;/td&gt;
&lt;td&gt;Generated &lt;code&gt;host.docker.internal&lt;/code&gt; hardcoded - broke local &lt;code&gt;dotnet run&lt;/code&gt;, I found the error myself&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Could not test browser output&lt;/td&gt;
&lt;td&gt;Generated the Playground UI confidently but could not see the broken Swagger Preview in a browser&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt quality = output quality&lt;/td&gt;
&lt;td&gt;Vague prompts gave generic results - writing precise prompts was a skill I had to develop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern was consistent: Copilot excelled at implementation, but discovery and debugging required a human. It could not run the code, open the browser, or experience the errors - I had to do that and bring the findings back to Copilot to fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next:
&lt;/h2&gt;

&lt;p&gt;AutoDoc is functional but it is just the beginning.&lt;/p&gt;

&lt;h3&gt;
  
  
  The project:
&lt;/h3&gt;

&lt;p&gt;The current version uses &lt;strong&gt;Llama 3.2 locally via Ollama&lt;/strong&gt;, which is free and private but limited in output consistency. The natural next step is upgrading to a more capable model for cleaner, more reliable YAML generation.&lt;/p&gt;

&lt;p&gt;Here is what I hope to build next:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Support for larger Ollama models (Llama 3.1, Mistral)&lt;/td&gt;
&lt;td&gt;Better output quality, fewer post-processing fixes needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-controller batch generation&lt;/td&gt;
&lt;td&gt;Generate docs for an entire project at once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Actions CI/CD integration&lt;/td&gt;
&lt;td&gt;Auto-generate docs on every push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diff-based incremental generation&lt;/td&gt;
&lt;td&gt;Only regenerate endpoints that changed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Export to JSON&lt;/td&gt;
&lt;td&gt;Support both YAML and JSON OpenAPI formats&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The biggest hope is this: AI-generated documentation that stays in sync with code automatically - no manual maintenance, no drift, no outdated Swagger specs.&lt;/p&gt;

&lt;p&gt;Developers should write code. The AI should write the docs.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Copilot:
&lt;/h3&gt;

&lt;p&gt;Working with GitHub Copilot on AutoDoc changed how I think about building software. Before this project, I used it for small autocomplete suggestions. After this project, I use it as a genuine collaborator.&lt;/p&gt;

&lt;p&gt;What I hope to do differently next time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with Copilot earlier&lt;/strong&gt; - involve it in architecture decisions before writing the first line, not just implementation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better prompt engineering&lt;/strong&gt; - the more precise and constrained the prompt, the better the output. This is a skill worth developing deliberately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust it more on boilerplate&lt;/strong&gt; - every time I wrote something myself that I could have asked Copilot to write, I lost time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most important thing I learned: Copilot is only as good as the problem you give it. Vague problems get vague code. Clear problems get working code.&lt;/p&gt;

&lt;p&gt;AutoDoc taught me to think more clearly about problems because I had to describe them precisely enough for an AI to solve them. That skill makes me a better developer with or without Copilot.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Try it yourself: clone the repo, run Ollama, and paste your first C# controller. I'd love to see what AutoDoc generates for your API. Drop a comment with your results.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/jmy744/AutoDoc" rel="noopener noreferrer"&gt;https://github.com/jmy744/AutoDoc&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
    </item>
    <item>
      <title>Beyond Alt-Text: Building a Personalized AI Narrator for Accessibility</title>
      <dc:creator>Yoges Mohan</dc:creator>
      <pubDate>Sun, 13 Apr 2025 09:44:41 +0000</pubDate>
      <link>https://dev.to/yoges_mohan_511bda5afbe7d/beyond-alt-text-building-a-personalized-ai-narrator-for-accessibility-2hof</link>
      <guid>https://dev.to/yoges_mohan_511bda5afbe7d/beyond-alt-text-building-a-personalized-ai-narrator-for-accessibility-2hof</guid>
      <description>&lt;p&gt;What if the way visually impaired users experienced images online wasn't just about basic identification, but about genuine understanding tailored to their world? Imagine, instead of simply hearing 'painting of a woman', an art student could get insights into the brushstrokes and historical context relevant to their studies. Or picture a botanist learning the specific species of a flower in a photo, not just 'flower'.&lt;/p&gt;

&lt;p&gt;Currently, standard image descriptions and alt-text, while essential, often provide only these basic labels. This limits deeper engagement and can create unequal access to the rich information embedded in visual content, especially for individuals with specialized knowledge or passions. Why should their experience be less informative just because the default description is generic?&lt;/p&gt;

&lt;p&gt;In this project, I explore that 'What if?'. I introduce the novel &lt;strong&gt;Personalized AI Narrator&lt;/strong&gt;, a prototype I built using Google Cloud's powerful Vertex AI Gemini models. My goal is to move beyond generic alt-text by automatically generating image descriptions dynamically tailored to an individual's unique interests, aiming for a future where everyone has the opportunity to connect with visual information in a way that truly resonates with them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introducing the 'Personalized AI Narrator'&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So, how did I bridge this gap? The solution I explored in this project is the Personalized AI Narrator prototype. Instead of just one standard description for everyone, the aim is to generate a narration that specifically highlights what you, as an individual user, might find most relevant or interesting in an image.&lt;/p&gt;

&lt;p&gt;The process involves several steps, orchestrating different AI capabilities on Vertex AI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqditvy4hemwoat3tt2q3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqditvy4hemwoat3tt2q3.png" alt="Image description" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Detailed Image Understanding:&lt;/strong&gt; An advanced multimodal Gemini model (gemini-1.5-pro-002) first analyzes the image to generate a rich, detailed base description.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text &amp;amp; Interest Representation:&lt;/strong&gt; This base description is split into sentence chunks. Both the chunks and the user's predefined interests are then converted into numerical embeddings with a Vertex AI embedding model (text-embedding-004), which captures their semantic meaning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic Relevance Matching:&lt;/strong&gt; The system calculates cosine similarity between the user's interest embeddings and the description chunk embeddings to find the parts of the description most relevant to the user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context Selection:&lt;/strong&gt; The top N most relevant chunks (the top 5 sentences in the examples below) are selected as context for the next step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tailored Synthesis (Controlled Generation):&lt;/strong&gt; Finally, this selected relevant_context and the user's interests are fed to a Gemini text model &lt;em&gt;(gemini-2.0-flash)&lt;/em&gt;. Guided by a specific prompt I engineered, the model synthesizes these relevant excerpts into a concise, new narrative tailored to the user, ensuring it remains grounded in the selected information.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result? A narration designed to provide deeper insight and a more engaging, informative experience.&lt;/p&gt;
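&lt;p&gt;Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal sketch, not the notebook's code: the function names, the regex-based sentence splitter, and the tiny 2-D vectors are hypothetical stand-ins (real vectors would come from text-embedding-004), and scoring each chunk by its best match across all interests is one assumed way to combine several interests.&lt;/p&gt;

```python
import math
import re


def split_sentences(text):
    """Split a description into sentence chunks (simple punctuation-based rule)."""
    pieces = re.findall(r"[^.!?]+[.!?]?", text)
    return [piece.strip() for piece in pieces if piece.strip()]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_top_chunks(chunks, chunk_vectors, interest_vectors, top_n=5):
    """Rank chunks by their best similarity to any interest and keep the top_n."""
    scored = []
    for chunk, vector in zip(chunks, chunk_vectors):
        best = max(cosine_similarity(vector, iv) for iv in interest_vectors)
        scored.append((best, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]


description = (
    "A Plain Tiger butterfly rests on a flower. "
    "The petals shade from red to yellow. "
    "The background is softly blurred."
)
chunks = split_sentences(description)

# Toy 2-D vectors standing in for real embeddings (hypothetical values).
chunk_vectors = [[0.2, 0.9], [0.9, 0.1], [0.1, 0.2]]
interest_vectors = [[1.0, 0.0]]  # a single "botany-like" interest direction

top_chunks = select_top_chunks(chunks, chunk_vectors, interest_vectors, top_n=2)
print(top_chunks)
```

&lt;p&gt;With these toy vectors, the sentence about the petals ranks first because its vector points almost in the same direction as the interest vector. Swapping in real embeddings changes only where the vectors come from, not the ranking logic.&lt;/p&gt;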

&lt;p&gt;&lt;strong&gt;Powered by Vertex AI Gemini&lt;/strong&gt;&lt;br&gt;
At the heart of this prototype is Google Cloud's Vertex AI platform. I used Gemini models for image understanding &lt;em&gt;(gemini-1.5-pro-002)&lt;/em&gt; and controlled generation &lt;em&gt;(gemini-2.0-flash)&lt;/em&gt;. Vertex AI's Embeddings API &lt;em&gt;(text-embedding-004)&lt;/em&gt; provided the numerical representations for semantic matching, and I used Gemini again for automated Gen AI evaluation. The Vertex AI SDK for Python made orchestrating these capabilities straightforward in my Kaggle Notebook environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From Generic to Personal: Seeing the Difference&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But does it truly make a difference? Let's see it in action with an example image:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frub46y6ufm1qgmr3d2jh.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frub46y6ufm1qgmr3d2jh.jpeg" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A vibrant butterfly on a red and yellow flower.&lt;/em&gt;&lt;br&gt;
Source: &lt;a href="https://images.pexels.com/photos/462118/pexels-photo-462118.jpeg" rel="noopener noreferrer"&gt;https://images.pexels.com/photos/462118/pexels-photo-462118.jpeg&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Base Description (from Gemini Vision) identified key details like:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...Plain Tiger butterfly (Danaus chrysippus)... perched on a Blanket flower (Gaillardia)... wings are a mix of orange, black, and white... flower... exhibits a gradient of red and yellow petals... background... bokeh effect... green stems and leaves... bright and even lighting... natural scene of pollination...
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;em&gt;(A detailed starting point, rich with potential context).&lt;/em&gt;&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Personalized for Evelyn (Botanist) (Using Top 5 relevant sentences):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Evelyn, the image features Blanket flowers (likely Gaillardia) displaying a fascinating gradient of red and yellow petals, with the red deepening towards the brownish-red center of the blooms. Note how one flower, slightly behind the main subject, shows a more even split of red and yellow hues, while glimpses of green stems and leaves add ecological context to the vibrant blossoms.
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;em&gt;(Result: Excellent focus on flower specifics, color gradients, and ecological context – details directly present in relevant base description chunks).&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;AI Evaluation Score:&lt;/strong&gt; An automated AI check rated this &lt;strong&gt;Score: 4/5&lt;/strong&gt;, with the justification that it effectively highlighted grounded details such as the flower's color variations and ecological context relevant to botany.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Personalized for Anya (Art Student) (Using Top 5 relevant sentences):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Anya, imagine a vibrant close-up photograph reminiscent of Renaissance portraiture, where a butterfly takes center stage against a softly blurred background of flowers. The bright, even lighting enhances the vivid colors, creating a depth of field that subtly emphasizes the delicate interaction between the butterfly and its floral perch, much like the sfumato technique used to soften edges and focus attention.
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;em&gt;(Result: Attempts to link visual elements like lighting and depth of field to Renaissance concepts).&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;AI Evaluation Score:&lt;/strong&gt; The AI evaluation gave this a lower &lt;strong&gt;Score: 2/5&lt;/strong&gt;. The justification highlighted that the connections to Renaissance art were superficial and weakly supported by the base description's mention of 'blurred background' and 'vivid colors.'&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
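&lt;p&gt;The scores above come from the automated evaluation step. If the evaluator's reply contains the 'Score: N/5' form shown in these results, the number could be pulled out with a small parser. This is a sketch under that assumption; &lt;code&gt;parse_score&lt;/code&gt; is a hypothetical helper and not part of the notebook.&lt;/p&gt;

```python
import re


def parse_score(evaluation_text, max_score=5):
    """Return the integer N from a 'Score: N/5' pattern, or None if absent or invalid."""
    match = re.search(r"Score:\s*(\d+)\s*/\s*(\d+)", evaluation_text)
    if match is None:
        return None
    score = int(match.group(1))
    scale = int(match.group(2))
    # Reject replies that use a different scale or fall outside 1..max_score.
    if scale != max_score or score not in range(1, max_score + 1):
        return None
    return score


print(parse_score("Score: 4/5. Grounded details were highlighted."))
print(parse_score("The narration was fine."))
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for a missing or out-of-range score keeps a malformed evaluator reply from being counted as a real rating.&lt;/p&gt;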

&lt;p&gt;&lt;strong&gt;Code Snippet - The Prompt's Core:&lt;/strong&gt; How did I guide the AI for personalization? Through prompt engineering based on the selected context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;personalization_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Act as [Role: Expert Narrator]...
User Profile: Name: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;persona_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Interests: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;interests_string&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Relevant Context:
---
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;relevant_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; #&amp;lt;-- The Top 5 selected sentences
---
Task: Synthesize context concisely focusing on interests, grounded ONLY in context provided...&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(This snippet shows the core instruction that guides the controlled generation from the selected context. The ellipses mark prompt text omitted here.)&lt;/p&gt;
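&lt;p&gt;The two placeholders &lt;code&gt;interests_string&lt;/code&gt; and &lt;code&gt;relevant_context&lt;/code&gt; have to be built before the prompt is formatted. Here is a minimal sketch of how that might look, assuming the interests are a list of strings and the selected sentences are joined with newlines. The template is a shortened, hypothetical stand-in for the profile and context lines visible in the snippet, not the full prompt, and the example interests are made up.&lt;/p&gt;

```python
# Hypothetical stand-in covering only the profile and context lines of the
# prompt; the role and task instructions are left out on purpose.
TEMPLATE = (
    "User Profile: Name: {persona_name}, Interests: {interests_string}\n"
    "Relevant Context:\n"
    "---\n"
    "{relevant_context}\n"
    "---\n"
)


def build_prompt(persona_name, interests, top_chunks):
    """Fill the template from a persona name, interest list, and selected chunks."""
    interests_string = ", ".join(interests)   # assumed separator
    relevant_context = "\n".join(top_chunks)  # one selected sentence per line
    return TEMPLATE.format(
        persona_name=persona_name,
        interests_string=interests_string,
        relevant_context=relevant_context,
    )


prompt = build_prompt(
    "Evelyn",
    ["botany", "plant ecology"],  # example interests, not from the notebook
    [
        "The flower shows a gradient of red and yellow petals.",
        "Green stems and leaves are visible.",
    ],
)
print(prompt)
```

&lt;p&gt;Keeping the context between the two &lt;code&gt;---&lt;/code&gt; delimiters makes it clear to the model which text it is allowed to draw on.&lt;/p&gt;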

&lt;p&gt;This comparison, using the actual results, shows the potential of the semantic approach: Evelyn's narration scored well because relevant details existed in the base description. It also shows the current limitation around grounding when source details are sparse, reflected in Anya's lower score and in the AI evaluation's justification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notebook Output:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Evelyn:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71ahhjrwfuyp257llz8n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71ahhjrwfuyp257llz8n.png" alt="Image description" width="800" height="86"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8fpsv367woe4eqqau7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8fpsv367woe4eqqau7o.png" alt="Image description" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbpovn2gfya9mdcm70ju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbpovn2gfya9mdcm70ju.png" alt="Image description" width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Anya:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8dbfjh0eqrbtcno80fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8dbfjh0eqrbtcno80fo.png" alt="Image description" width="800" height="88"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiady2pi42aj6vub0ejje.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiady2pi42aj6vub0ejje.png" alt="Image description" width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favqa0dokd9h4a4jmij3k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favqa0dokd9h4a4jmij3k.png" alt="Image description" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Potential: Towards Richer Digital Accessibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The need for better digital accessibility is immense. The &lt;strong&gt;&lt;a href="https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment" rel="noopener noreferrer"&gt;World Health Organization (WHO)&lt;/a&gt;&lt;/strong&gt; estimates that &lt;strong&gt;at least 1 billion people&lt;/strong&gt; globally have a vision impairment that could have been prevented or has yet to be addressed. This figure underscores the urgency for innovative solutions. Tools like the Personalized AI Narrator show how AI can contribute to more inclusive and equitable digital experiences. By generating descriptions that resonate with individual interests, I believe this approach can help users move beyond basic labeling towards deeper understanding and engagement with visual content.&lt;/p&gt;

&lt;p&gt;As my results showed, a key limitation of the current grounded approach (even with semantic chunk selection) is its dependency on the initial image analysis: the narration can only be as rich as the base description. To overcome this, future work should integrate external knowledge sources through retrieval-augmented generation (RAG). Integration with screen readers is another vital next step for real-world usability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion: A Step Towards More Personal AI Narration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Personalized AI Narrator prototype I built showcases a novel application of Vertex AI Gemini for enhancing accessibility. By tailoring image descriptions to individual interests, it offers a glimpse into a future where visual content is not just described, but truly brought to life for everyone, respecting individual perspectives. While challenges remain, particularly around balancing relevance with groundedness when source details are sparse, the potential for using AI to foster greater digital inclusion is immense.&lt;/p&gt;




&lt;p&gt;Thanks for reading, and have a great day!&lt;/p&gt;

&lt;p&gt;Explore the full implementation I developed and try it yourself in the Kaggle Notebook here: &lt;a href="https://www.kaggle.com/code/yogesmohan/personalized-ai-narrator-for-visual-accessibility" rel="noopener noreferrer"&gt;https://www.kaggle.com/code/yogesmohan/personalized-ai-narrator-for-visual-accessibility&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>a11y</category>
      <category>googlecloud</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
