<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: syamaner</title>
    <description>The latest articles on DEV Community by syamaner (@syamaner).</description>
    <link>https://dev.to/syamaner</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F851470%2F28232910-3c25-4488-ac27-a360494dcfc8.jpeg</url>
      <title>DEV Community: syamaner</title>
      <link>https://dev.to/syamaner</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/syamaner"/>
    <language>en</language>
    <item>
      <title>Part 1: The Architecture &amp; The Agent - Spec-Driven ML Development With Warp/Oz</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Tue, 14 Apr 2026 07:55:00 +0000</pubDate>
      <link>https://dev.to/syamaner/part-1-the-architecture-the-agent-spec-driven-ml-development-with-warpoz-3al6</link>
      <guid>https://dev.to/syamaner/part-1-the-architecture-the-agent-spec-driven-ml-development-with-warpoz-3al6</guid>
      <description>&lt;p&gt;Last year I built a prototype coffee first crack detector and wrote about it in a 3-part series. The prototype works - I have been running it on my own roasts since November - but it carries the technical debt of something built to prove a concept rather than to last.&lt;/p&gt;

&lt;p&gt;This series is the production rebuild. The outcome: an Audio Spectrogram Transformer at &lt;strong&gt;97.4% accuracy and 100% precision&lt;/strong&gt; on first crack detection, running on a Raspberry Pi 5 at 2.09 seconds per 10-second window. The full pipeline - data preparation, training, evaluation, ONNX INT8 export, edge validation, and a Gradio UI - shipped in two evenings.&lt;/p&gt;
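&lt;p&gt;As a quick sanity check on that headline latency, 2.09 seconds of compute per 10-second window works out to roughly a 0.21 real-time factor - comfortable headroom for live monitoring. A minimal sketch of the arithmetic:&lt;/p&gt;

```python
# Back-of-the-envelope check on the reported edge latency:
# 2.09 s of compute per 10 s audio window.
window_s = 10.0    # length of each inference window (seconds)
latency_s = 2.09   # reported RPi5 INT8 latency per window (seconds)

real_time_factor = latency_s / window_s  # fraction of real time spent computing
headroom_s = window_s - latency_s        # idle seconds per window

print(round(real_time_factor, 3), round(headroom_s, 2))
```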

&lt;p&gt;I didn't build this by brute-forcing the codebase myself. I acted strictly as the engineering lead, while Warp and its AI agent, Oz, handled the implementation from inside my terminal. My responsibilities were entirely architectural:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Designing the workflow:&lt;/strong&gt; Setting the strict rules of engagement between the agent and the codebase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defining the science:&lt;/strong&gt; Dictating the specs, testing strategy, evaluation metrics, and dataset annotation approach.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Directing the execution:&lt;/strong&gt; Guiding the agent through the implementation and reviewing the output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Operating this way over the weekend, Warp/Oz executed an 18-story (at the time) epic across 10 pull requests. That resulted in 11,087 lines of Python across 75 files, with 52 of those commits explicitly co-authored by the agent. Copilot reviewed every PR, flagging 111 individual issues across 28 review batches. The model is &lt;a href="https://huggingface.co/syamaner/coffee-first-crack-detection" rel="noopener noreferrer"&gt;published on Hugging Face&lt;/a&gt;, the dataset is &lt;a href="https://huggingface.co/datasets/syamaner/coffee-first-crack-audio" rel="noopener noreferrer"&gt;open-sourced&lt;/a&gt;, and the source is on &lt;a href="https://github.com/syamaner/coffee-first-crack-detection" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This post is about the system that made that possible - not the model itself. The ML science comes in Posts 2 and 3. Here, I want to show the exact architecture I used to direct an AI agent through a complex, multi-phase ML project without losing control of the engineering decisions that matter.&lt;/p&gt;

&lt;p&gt;Before the agent could train anything, I had to build the training data from scratch. There is no public audio dataset for coffee roasting first crack - not on Hugging Face, not on Kaggle, not in academic literature. That meant recording roasting sessions, annotating them in Label Studio, and architecting a recording-level data pipeline to prevent the chunk-level leakage that silently inflates test metrics in time-series audio ML. The full data engineering story is in Post 2 (coming soon).&lt;/p&gt;
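&lt;p&gt;To make the leakage point concrete, here is a minimal, hypothetical sketch of a recording-level split: every chunk from a given recording lands in exactly one split, so overlapping windows from the same roast can never straddle train and test. Function and field names are illustrative, not the project's actual code:&lt;/p&gt;

```python
import random
from collections import defaultdict

def recording_level_split(chunks, test_fraction=0.2, seed=42):
    """chunks: iterable of (recording_id, chunk) pairs. Returns (train, test) chunk lists."""
    by_recording = defaultdict(list)
    for recording_id, chunk in chunks:
        by_recording[recording_id].append(chunk)

    recordings = sorted(by_recording)        # deterministic base order
    random.Random(seed).shuffle(recordings)  # seeded shuffle for reproducibility

    n_test = max(1, round(len(recordings) * test_fraction))

    # Assign whole recordings, never individual chunks, to each split
    train = [c for rid in recordings[n_test:] for c in by_recording[rid]]
    test = [c for rid in recordings[:n_test] for c in by_recording[rid]]
    return train, test
```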

&lt;h2&gt;
  
  
  From Prototype to Production
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://dev.to/syamaner/part-1-training-a-neural-network-to-detect-coffee-first-crack-from-audio-an-agentic-development-1jei"&gt;prototype&lt;/a&gt; had accumulated real technical debt. The code was monolithic, the model had no reusable packaging, the MCP server architecture had flaws I had been working around, and nothing ran on edge hardware. I had to use my laptop for every roast.&lt;/p&gt;

&lt;p&gt;This series covers the production rebuild. Same domain, completely new architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A standalone, Hugging Face-native training repository.&lt;/li&gt;
&lt;li&gt;Strict data engineering to prevent audio leakage.&lt;/li&gt;
&lt;li&gt;ONNX INT8 quantization for Raspberry Pi 5 edge deployment.&lt;/li&gt;
&lt;li&gt;A live Gradio Space for public inference.&lt;/li&gt;
&lt;/ul&gt;
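&lt;p&gt;For intuition on the INT8 step: quantization maps each float weight onto an integer in a small fixed range via a scale factor, trading a little precision for a quarter of the FP32 memory. A toy, pure-Python illustration of the idea - not the actual ONNX Runtime implementation:&lt;/p&gt;

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: floats to integers in [-127, 127] plus a scale."""
    m = max(abs(v) for v in values)
    scale = m / 127.0 if m else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]

weights = [0.02, -0.5, 0.73, -0.01]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)  # close to the originals, at 1 byte per weight
```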

&lt;h2&gt;
  
  
  The Director/Coder Dynamic
&lt;/h2&gt;

&lt;p&gt;The core pattern was a strict, enforced separation of concerns between three actors:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I (the human) owned:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Architecture:&lt;/strong&gt; Defining repository structure, module boundaries, and enforcing Hugging Face's &lt;code&gt;save_pretrained&lt;/code&gt;/&lt;code&gt;from_pretrained&lt;/code&gt; as the standard packaging contract.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The ML Science:&lt;/strong&gt; Model selection (AST over CNN), data split strategy (recording-level to prevent leakage), class weighting, and hyperparameter math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Workflow Constraints:&lt;/strong&gt; Defining the project rules, writing the parameterised skills, and managing the state of the epic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Quality Gates:&lt;/strong&gt; Reviewing every PR, interpreting the evaluation metrics, and deciding when to retrain versus when to ship.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Oz (Warp's terminal-native agent) owned:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Terminal Execution:&lt;/strong&gt; Running training loops, evaluations, ONNX exports, and SSH sessions directly on the Raspberry Pi.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Generation:&lt;/strong&gt; Writing the boilerplate - &lt;code&gt;WeightedLossTrainer&lt;/code&gt; subclasses, CLI argument parsers, pytest scaffolds, and audio data loaders.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill Invocation:&lt;/strong&gt; Executing parameterised skill files (e.g., &lt;code&gt;.claude/skills/train-model/SKILL.md&lt;/code&gt;) that encoded exact command sequences and validation checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Management:&lt;/strong&gt; Reading the epic document, updating context, and checking off stories after completing a phase.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot owned:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Async Code Review:&lt;/strong&gt; Flagging type safety issues, API misuse, missing error handling, and dependency hygiene across all 10 PRs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Reality Check:&lt;/strong&gt; Copilot never once caught a machine learning logic error. Every data leakage fix, hyperparameter correction, and precision/recall tradeoff decision came from me. &lt;em&gt;Copilot acts as an aggressive linter for code, not a reviewer for ML science.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This three-way split wasn't a gentleman's agreement - it was hardcoded into the project via an &lt;code&gt;AGENTS.md&lt;/code&gt; file. Whenever Oz started a task, it was forced to read this rulebook first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agentic Setup: AGENTS.md, Epics, and Skills
&lt;/h2&gt;

&lt;p&gt;Three files controlled the entire project.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;AGENTS.md&lt;/code&gt; - The Rulebook
&lt;/h3&gt;

&lt;p&gt;This file sits at the repository root. The agent is instructed to read it before starting any task. It contains the project rules, quick commands, codebase architecture, and platform-specific constraints. Here is the exact rules section from this project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Python 3.11+ with full type hints on all public functions and methods
&lt;span class="p"&gt;-&lt;/span&gt; Google-style docstrings
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`ruff check`&lt;/span&gt; and &lt;span class="sb"&gt;`ruff format`&lt;/span&gt; must pass before marking code complete
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`pyright`&lt;/span&gt; must pass with no errors on new code
&lt;span class="p"&gt;-&lt;/span&gt; All dependencies declared in &lt;span class="sb"&gt;`pyproject.toml`&lt;/span&gt; - never install ad-hoc
&lt;span class="p"&gt;-&lt;/span&gt; Large files (WAV, checkpoints, ONNX models) go to Hugging Face Hub - never commit to git
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`data/`&lt;/span&gt;, &lt;span class="sb"&gt;`experiments/`&lt;/span&gt;, and &lt;span class="sb"&gt;`exports/`&lt;/span&gt; are &lt;span class="sb"&gt;`.gitignore`&lt;/span&gt;'d - keep them that way
&lt;span class="p"&gt;-&lt;/span&gt; Seed all RNG using &lt;span class="sb"&gt;`configs/default.yaml`&lt;/span&gt; seed value
&lt;span class="p"&gt;-&lt;/span&gt; One PR per story, branch: &lt;span class="sb"&gt;`feature/{issue-number}-{slug}`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Before starting a task: read &lt;span class="sb"&gt;`docs/state/registry.md`&lt;/span&gt; → open epic file → check GitHub issue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That last line is the critical one. It forces the agent into a state-reading loop before writing any code. Without it, the agent starts generating based on stale context.&lt;/p&gt;

&lt;p&gt;The file also includes a codebase architecture map, quick commands for every operation (training, evaluation, export, benchmarking), and platform-specific notes for MPS, CUDA, and the RPi5. The &lt;a href="https://github.com/syamaner/coffee-first-crack-detection/blob/main/AGENTS.md" rel="noopener noreferrer"&gt;full file is on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Epic State Management - The Checklist
&lt;/h3&gt;

&lt;p&gt;A registry file (&lt;code&gt;docs/state/registry.md&lt;/code&gt;) points to the active epic. The epic file itself (&lt;code&gt;docs/state/epics/coffee-first-crack-detection.md&lt;/code&gt;) contains 18 stories grouped into 6 phases, each linked to a GitHub issue. Before and after every task, the agent reads the epic state and updates it according to this protocol:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before starting any task:
1. Read docs/state/registry.md to find the active epic
2. Open the epic file - check story status
3. Open the GitHub story issue - read comments for latest requirements
4. Work on a branch: feature/{issue-number}-{slug}

After completing a story:
1. Check off the story in the epic doc
2. Update Active Context section with what was built
3. Comment on the GitHub story issue, then close it
4. Tick the checkbox in GitHub epic issue #1
5. Open a PR referencing the story issue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is how 18 stories were delivered without losing track of what was done, what was next, or what had changed. The agent maintained its own project state.&lt;/p&gt;
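&lt;p&gt;The registry lookup itself is trivial - which is the point. A hypothetical sketch of that first state-reading step (the registry format here is invented for illustration):&lt;/p&gt;

```python
def active_epic(registry_text):
    """Return the path of the active epic declared in a registry file (invented format)."""
    for line in registry_text.splitlines():
        if line.lower().startswith("active epic:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no active epic declared in registry")

registry = "# Registry\nActive epic: docs/state/epics/coffee-first-crack-detection.md\n"
print(active_epic(registry))  # prints the epic path the agent should open next
```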

&lt;p&gt;Here is Oz running the full data preparation pipeline - chunking 973 audio segments, performing the recording-level split, and then invoking the &lt;code&gt;/train-model&lt;/code&gt; skill:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zv6zedgjlmthxwiwx6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zv6zedgjlmthxwiwx6a.png" alt="Oz Train model skill invocation" width="625" height="713"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Parameterised Skills - The Playbooks
&lt;/h3&gt;

&lt;p&gt;Skills are markdown files under &lt;code&gt;.claude/skills/&lt;/code&gt; that encode exact command sequences for common operations. Each skill defines the prerequisites, the commands, and the validation steps. I wrote four:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;train-model/SKILL.md&lt;/code&gt; - End-to-end training with data validation and checkpoint saving.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;evaluate-model/SKILL.md&lt;/code&gt; - Test-set evaluation with metrics report generation.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;export-onnx/SKILL.md&lt;/code&gt; - ONNX export (FP32 + INT8) with size and latency benchmarking.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;push-to-hub/SKILL.md&lt;/code&gt; - Publish model and dataset to the Hugging Face Hub.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I told Oz to "train the model," it didn't improvise. It read the skill file and followed the exact sequence I defined. This eliminated an entire class of errors where the agent guesses at flags, skips validation steps, or forgets to save the feature extractor configuration alongside the model weights.&lt;/p&gt;
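&lt;p&gt;Mechanically, a skill reduces to an ordered list of commands plus validation checks, executed strictly in sequence with no improvisation. A hedged sketch of that contract - the commands and structure here are illustrative, not the project's actual skill files:&lt;/p&gt;

```python
import subprocess

def run_skill(steps, dry_run=True):
    """Execute a skill: each step is a dict with 'cmd' and an optional 'check' callable."""
    executed = []
    for step in steps:
        if not dry_run:
            # fail fast: a non-zero exit aborts the whole skill
            subprocess.run(step["cmd"], shell=True, check=True)
        check = step.get("check")
        if check is not None and not check():
            raise RuntimeError("validation failed after: " + step["cmd"])
        executed.append(step["cmd"])
    return executed

# Illustrative step sequence, loosely modelled on a train-model skill
train_skill = [
    {"cmd": "python -m src.train --config configs/default.yaml"},
    {"cmd": "python -m src.evaluate --split val"},
]
print(run_skill(train_skill))  # dry run: lists the sequence without executing it
```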

&lt;p&gt;Here is Oz chaining the &lt;code&gt;/export-onnx&lt;/code&gt; and &lt;code&gt;/push-to-hub&lt;/code&gt; skills to export the model and publish everything to Hugging Face Hub in a single sequence:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7jyz7muoaz142yj4htf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7jyz7muoaz142yj4htf.png" alt="Oz Export ONNX and Push to HF Hub Skill" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  A Generalised AGENTS.md Template
&lt;/h3&gt;

&lt;p&gt;Here is a stripped-down version you can drop into any project. Replace the placeholders with your domain-specific rules.&lt;/p&gt;

&lt;p&gt;This file is not documentation for humans. It is a &lt;strong&gt;system prompt for your codebase&lt;/strong&gt;. Every rule you omit is a decision the agent will make on its own - and it will make it differently every time.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# AGENTS.md - [Project Name]&lt;/span&gt;

Project rules and context for AI coding agents.

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [Language] [version]+ with [typing/linting requirements]
&lt;span class="p"&gt;-&lt;/span&gt; [Formatter] and [linter] must pass before marking code complete
&lt;span class="p"&gt;-&lt;/span&gt; All dependencies declared in [manifest file] - never install ad-hoc
&lt;span class="p"&gt;-&lt;/span&gt; Large files go to [remote storage] - never commit to git
&lt;span class="p"&gt;-&lt;/span&gt; Before starting a task: read &lt;span class="sb"&gt;`docs/state/registry.md`&lt;/span&gt; → open epic → check issue

&lt;span class="gu"&gt;## Quick Commands&lt;/span&gt;
&lt;span class="gu"&gt;### Setup&lt;/span&gt;
[environment setup commands]

&lt;span class="gu"&gt;### Build / Test / Deploy&lt;/span&gt;
[the exact commands for each operation]

&lt;span class="gu"&gt;## Codebase Architecture&lt;/span&gt;
[directory tree with one-line descriptions per module]

&lt;span class="gu"&gt;## Epic State Management&lt;/span&gt;
Before starting any task:
&lt;span class="p"&gt;1.&lt;/span&gt; Read docs/state/registry.md
&lt;span class="p"&gt;2.&lt;/span&gt; Check story status in the epic file
&lt;span class="p"&gt;3.&lt;/span&gt; Read the GitHub issue for latest requirements
&lt;span class="p"&gt;4.&lt;/span&gt; Branch: feature/{issue-number}-{slug}

After completing a story:
&lt;span class="p"&gt;1.&lt;/span&gt; Check off the story in the epic doc
&lt;span class="p"&gt;2.&lt;/span&gt; Update Active Context
&lt;span class="p"&gt;3.&lt;/span&gt; Close the GitHub issue
&lt;span class="p"&gt;4.&lt;/span&gt; Open a PR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  The Build &amp;amp; The Fails
&lt;/h2&gt;

&lt;p&gt;The first commit after the initial scaffold was &lt;code&gt;feat(S5/S6/S8): implement train.py, evaluate.py, inference.py&lt;/code&gt;. In a single pass, Oz generated the training pipeline, evaluation harness, and sliding-window inference module. It followed the &lt;code&gt;AGENTS.md&lt;/code&gt; rules, used the correct base model (&lt;code&gt;MIT/ast-finetuned-audioset-10-10-0.4593&lt;/code&gt;), and wired up the &lt;code&gt;WeightedLossTrainer&lt;/code&gt; subclass with class-weighted &lt;code&gt;CrossEntropyLoss&lt;/code&gt; exactly as I specified.&lt;/p&gt;
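&lt;p&gt;For readers unfamiliar with class-weighted loss: it scales each example's cross-entropy by a per-class weight, so the rare positive class ("first crack") is not drowned out by the majority class. A dependency-free sketch of the arithmetic (weights illustrative, not the project's values):&lt;/p&gt;

```python
import math

def weighted_cross_entropy(logits, label, weights):
    """Class-weighted CE for one example: -weights[label] * log_softmax(logits)[label]."""
    m = max(logits)  # stabilise the softmax numerically
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    log_prob = logits[label] - log_sum
    return -weights[label] * log_prob

# With uniform weights this is ordinary cross-entropy; up-weighting the rare
# positive class (index 1 here) scales its penalty for misclassification.
uniform = weighted_cross_entropy([0.0, 0.0], 1, [1.0, 1.0])   # log(2)
boosted = weighted_cross_entropy([0.0, 0.0], 1, [1.0, 4.0])   # 4 * log(2)
```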

&lt;p&gt;Then training failed.&lt;/p&gt;
&lt;h3&gt;
  
  
  The &lt;code&gt;input_features&lt;/code&gt; vs &lt;code&gt;input_values&lt;/code&gt; Bug
&lt;/h3&gt;

&lt;p&gt;Oz wrote the dataset adapter to return &lt;code&gt;input_features&lt;/code&gt; as the tensor key - a reasonable guess if you have seen other Hugging Face audio pipelines. But &lt;code&gt;ASTFeatureExtractor&lt;/code&gt; returns &lt;code&gt;input_values&lt;/code&gt;, not &lt;code&gt;input_features&lt;/code&gt;. The model silently received no input and the loss exploded.&lt;/p&gt;

&lt;p&gt;Here is the exact diff from the fix commit (&lt;a href="https://github.com/syamaner/coffee-first-crack-detection/commit/75bbb4b" rel="noopener noreferrer"&gt;&lt;code&gt;75bbb4b&lt;/code&gt;&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;# src/coffee_first_crack/train.py - _HFDatasetAdapter.__getitem__
&lt;span class="gd"&gt;-            "input_features": inputs["input_features"].squeeze(0),
&lt;/span&gt;&lt;span class="gi"&gt;+            "input_values": inputs["input_values"].squeeze(0),
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It was a one-line bug. The kind of bug that costs you an hour of staring at training logs if you don't know what to look for. Oz pattern-matched from Whisper examples - the most common audio model in Hugging Face tutorials - where &lt;code&gt;input_features&lt;/code&gt; is correct. For &lt;code&gt;ASTFeatureExtractor&lt;/code&gt;, the key is &lt;code&gt;input_values&lt;/code&gt;. This is a &lt;a href="https://github.com/huggingface/transformers/issues/20470" rel="noopener noreferrer"&gt;known, unresolved inconsistency&lt;/a&gt; in the Hugging Face audio API.&lt;/p&gt;
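&lt;p&gt;A cheap guard in the dataset adapter would have surfaced this class of bug at construction time instead of mid-training. A hedged sketch - names are illustrative, not the project's actual adapter:&lt;/p&gt;

```python
def adapter_item(inputs, expected_key="input_values"):
    """Wrap a feature-extractor output, failing loudly if the tensor key is wrong.

    ASTFeatureExtractor returns 'input_values'; Whisper-style extractors
    return 'input_features'. Guessing the wrong key means the model silently
    receives no input.
    """
    keys = list(inputs.keys())
    if expected_key not in keys:
        raise KeyError(
            "feature extractor returned " + str(keys)
            + ", expected '" + expected_key + "'"
        )
    return {expected_key: inputs[expected_key]}
```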

&lt;p&gt;The same commit also added &lt;code&gt;accelerate&amp;gt;=0.26.0&lt;/code&gt; to &lt;code&gt;pyproject.toml&lt;/code&gt; - a dependency the Hugging Face &lt;code&gt;Trainer&lt;/code&gt; requires at runtime but doesn't explicitly import at the top level. Oz didn't catch it during code generation because it never triggered an &lt;code&gt;ImportError&lt;/code&gt; until actual training.&lt;/p&gt;

&lt;p&gt;Here is the model evaluated on a Raspberry Pi 5 - 191 test samples, INT8 quantised, 4 threads, via SSH from Warp:&lt;/p&gt;


&lt;div class="ltag__warp"&gt;
  &lt;iframe src="https://app.warp.dev/block/embed/VrKfC5EyxNPSooEFJeRjr1" title="Warp Terminal Block" width="100%" height="400"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;



&lt;p&gt;This is what the validation loop looks like in practice - Oz hitting a &lt;code&gt;pyright&lt;/code&gt; failure, diagnosing the type issues, fixing them, then running the full &lt;code&gt;ruff check&lt;/code&gt; → &lt;code&gt;ruff format&lt;/code&gt; → &lt;code&gt;pyright&lt;/code&gt; → &lt;code&gt;pytest&lt;/code&gt; chain until all checks pass:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvgkz48xufqhetb1qgd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvgkz48xufqhetb1qgd9.png" alt="Static code checking using Pyright and Ruff" width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Copilot as the Third Actor
&lt;/h2&gt;

&lt;p&gt;Across the 10 PRs in this project, Copilot submitted 28 review batches containing 111 individual comments. Here is how they broke down by PR:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PR #23 (RPi5 ONNX validation):&lt;/strong&gt; 36 comments across 6 review rounds - the most reviewed PR by far.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR #17 (Export, scripts, tests):&lt;/strong&gt; 26 comments across 5 rounds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR #27 (Data prep + mic-2 expansion):&lt;/strong&gt; 16 comments across 3 rounds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR #16 (Train, eval, inference):&lt;/strong&gt; 10 comments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR #28 (Gradio Space):&lt;/strong&gt; 10 comments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern was consistent. Copilot caught:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type safety:&lt;/strong&gt; Missing type hints, incorrect return types, untyped function signatures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unused imports:&lt;/strong&gt; Dead code left behind after refactoring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API misuse:&lt;/strong&gt; Deprecated parameters, missing synchronisation calls, incorrect exception handling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency hygiene:&lt;/strong&gt; Missing explicit dependencies, version pinning issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs and copy:&lt;/strong&gt; Misleading docstrings, inaccurate UI text in the Gradio Space.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, Copilot did not catch the core machine learning logic issues. To be fair, this is largely because my workflow required me to intercept them before they ever reached a PR:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;input_features&lt;/code&gt; vs &lt;code&gt;input_values&lt;/code&gt; key mismatch:&lt;/strong&gt; This was fixed locally during the active dev loop before opening the PR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data leakage from chunk-level splitting:&lt;/strong&gt; This is the biggest ML risk in this project, but this was addressed architecturally during the setup phase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hyperparameter choices:&lt;/strong&gt; Overfitting issues were identified and corrected interactively by reading the local training logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The precision/recall tradeoff:&lt;/strong&gt; The class weighting strategy was a deliberate human decision delivered prior to code review.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a criticism of Copilot. It is doing exactly what it should: catching code-level defects at review time. But if you are relying on AI code review to validate your ML pipeline logic, you will ship broken models with clean code.&lt;/p&gt;
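&lt;p&gt;For completeness, the precision/recall tradeoff mentioned above is plain arithmetic over confusion counts - a quick sketch with illustrative numbers:&lt;/p&gt;

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion counts; zero-safe."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Zero false positives yields 100% precision even with a few missed cracks.
p, r = precision_recall(tp=45, fp=0, fn=3)  # illustrative counts, not the project's
```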

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wall-clock time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Two evenings (Fri → Sat)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stories completed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;18 across 6 phases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pull requests&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10 merged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total commits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~65 (55 non-merge)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Oz co-authored&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;52 commits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;11,087 insertions across 75 files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot reviews&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;28 batches, 111 individual comments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;97.4% test / 100% precision&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edge latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.09s per 10s window (RPi5, INT8, 4 threads)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dataset&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;First public coffee roasting audio dataset - 973 chunks, 15 roasts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model is live at &lt;a href="https://huggingface.co/syamaner/coffee-first-crack-detection" rel="noopener noreferrer"&gt;huggingface.co/syamaner/coffee-first-crack-detection&lt;/a&gt;. The dataset is at &lt;a href="https://huggingface.co/datasets/syamaner/coffee-first-crack-audio" rel="noopener noreferrer"&gt;huggingface.co/datasets/syamaner/coffee-first-crack-audio&lt;/a&gt;. The source is on &lt;a href="https://github.com/syamaner/coffee-first-crack-detection" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The piece of this workflow I wouldn't give up: the state-reading loop in &lt;code&gt;AGENTS.md&lt;/code&gt;. Without it, agent context drifts within two or three tasks and it starts generating against stale assumptions. If you've run a long-form agentic project and solved the context problem differently, I'd be interested in the specifics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next up:&lt;/strong&gt; Post 2 - The Data (coming soon) covers how I built the first public audio dataset for coffee roasting first crack detection, and the data engineering decisions that got us to zero false positives.&lt;/p&gt;




&lt;p&gt;Try it - upload a 10-second roasting clip or use an existing sample:&lt;/p&gt;


&lt;div class="ltag__huggingface"&gt;
  &lt;iframe src="https://syamaner-coffee-first-crack-detection.hf.space" title="Hugging Face Space" width="100%" height="600"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;





&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/coffee-first-crack-detection" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/syamaner/coffee-first-crack-detection" rel="noopener noreferrer"&gt;Hugging Face Model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/datasets/syamaner/coffee-first-crack-audio" rel="noopener noreferrer"&gt;Hugging Face Dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/spaces/syamaner/coffee-first-crack-detection" rel="noopener noreferrer"&gt;Live Gradio Space&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.warp.dev/" rel="noopener noreferrer"&gt;Warp - The Agentic Development Environment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.warp.dev/ai" rel="noopener noreferrer"&gt;Oz - Warp's AI Agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.warp.dev/features/blocks" rel="noopener noreferrer"&gt;Warp Block Sharing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>warp</category>
      <category>audio</category>
    </item>
    <item>
      <title>Part 3: From Neural Networks to Autonomous Coffee Roasting - Orchestrating MCP Servers with .NET Aspire 13 and n8n Agents</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Sun, 16 Nov 2025 18:48:48 +0000</pubDate>
      <link>https://dev.to/syamaner/part-3-from-neural-networks-to-autonomous-coffee-roasting-orchestrating-mcp-servers-with-net-58pd</link>
      <guid>https://dev.to/syamaner/part-3-from-neural-networks-to-autonomous-coffee-roasting-orchestrating-mcp-servers-with-net-58pd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/syamaner/part-1-training-a-neural-network-to-detect-coffee-first-crack-from-audio-an-agentic-development-1jei"&gt;Part 1&lt;/a&gt;, we fine-tuned a neural network to detect coffee first crack from audio using PyTorch and the Audio Spectrogram Transformer. In &lt;a href="https://dev.to/syamaner/part-2-building-mcp-servers-to-control-a-home-coffee-roaster-an-agentic-development-journey-with-58ik"&gt;Part 2&lt;/a&gt;, we built two MCP (Model Context Protocol) servers - one to control my Hottop KN-8828B-2K+ roaster and another to detect first crack in real time using a microphone.&lt;/p&gt;

&lt;p&gt;This is where we put it all together. But first: &lt;strong&gt;can .NET Aspire orchestrate Python MCP servers and n8n workflows to autonomously roast coffee?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Spoiler alert: &lt;strong&gt;Yes, it can.&lt;/strong&gt; And the coffee tastes spot on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Autonomous coffee roasting isn't just about detecting when first crack happens. It's a complex orchestration problem involving:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multiple systems&lt;/strong&gt;: Python MCP servers to interact with hardware, an agent layer for orchestration (n8n workflows to begin with), and containerised services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time decision making&lt;/strong&gt;: Monitoring sensors every few seconds and deciding on actions based on the current state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety-critical control&lt;/strong&gt;: Managing heat and fan speed to avoid burning / wasting green beans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precise timing&lt;/strong&gt;: Detecting the bean charge event (when beans are added during the preheating stage), detecting first crack, and hitting the target development time percentage by adjusting the available controls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: Tracking telemetry across Python, n8n, and .NET components.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The solution? &lt;strong&gt;.NET Aspire 13&lt;/strong&gt; orchestrating everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Aspire 13?
&lt;/h2&gt;

&lt;p&gt;Aspire 13.0 (released with .NET 10) brings significant improvements for Python integration and container orchestration - perfect for this use case:&lt;/p&gt;

&lt;h3&gt;
  
  
  Simplified Python Hosting with &lt;code&gt;AddPythonModule&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Aspire 13 replaces the old &lt;code&gt;AddPythonApp&lt;/code&gt; API with three specialized methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AddPythonModule&lt;/code&gt;&lt;/strong&gt;: Runs Python modules with &lt;code&gt;-m&lt;/code&gt; flag (e.g., &lt;code&gt;python -m src.mcp_servers.roaster_control.sse_server&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AddPythonScript&lt;/code&gt;&lt;/strong&gt;: Runs standalone Python scripts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AddPythonExecutable&lt;/code&gt;&lt;/strong&gt;: Runs executables from virtual environments (e.g., &lt;code&gt;uvicorn&lt;/code&gt;, &lt;code&gt;gunicorn&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For MCP servers running as modules, &lt;code&gt;AddPythonModule&lt;/code&gt; is cleaner and more explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Old way (Aspire 9)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddPythonApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roaster-control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;projectRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;venvPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"src.mcp_servers.roaster_control.sse_server"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// New way (Aspire 13)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddPythonModule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roaster-control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;projectRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"src.mcp_servers.roaster_control.sse_server"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithVirtualEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;venvPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cleaner AppHost Project Structure
&lt;/h3&gt;

&lt;p&gt;The new &lt;code&gt;Aspire.AppHost.Sdk/13.0.0&lt;/code&gt; simplifies project files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No separate &lt;code&gt;&amp;lt;Sdk Name="..." /&amp;gt;&lt;/code&gt; element needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Aspire.Hosting.AppHost&lt;/code&gt; package included automatically.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;IsAspireHost&lt;/code&gt; property is no longer needed (it is now implicit).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enhanced Container Orchestration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Better lifecycle management for containers (n8n in this project).&lt;/li&gt;
&lt;li&gt;Improved health check support.&lt;/li&gt;
&lt;li&gt;More granular control over container runtime arguments.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Built-in OpenTelemetry Integration
&lt;/h3&gt;

&lt;p&gt;Out-of-the-box observability with &lt;code&gt;.WithOtlpExporter()&lt;/code&gt; for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured logging from Python processes.&lt;/li&gt;
&lt;li&gt;Distributed tracing across MCP calls.&lt;/li&gt;
&lt;li&gt;Real-time metrics in the Aspire dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Here's what .NET Aspire orchestrates:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08s6i4d8kto8qwdvftns.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08s6i4d8kto8qwdvftns.png" alt="Architecture Overview" width="578" height="742"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Aspire?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single command startup&lt;/strong&gt;: &lt;code&gt;dotnet run&lt;/code&gt; starts all 3 services with proper dependency ordering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared configuration&lt;/strong&gt;: Environment variables, Auth0 credentials, OpenTelemetry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python support&lt;/strong&gt;: Built-in virtual environment management with &lt;code&gt;AddPythonModule&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container orchestration&lt;/strong&gt;: Manages n8n container&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: Unified dashboard with logs, traces, and metrics from all components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development velocity&lt;/strong&gt;: Changes to Python code auto-reload, no container rebuilds needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The n8n Autonomous Roasting Workflow
&lt;/h2&gt;

&lt;p&gt;As a first step, n8n was selected for the agent layer. Its visual workflow editor and built-in constructs allowed rapid verification of the agentic roasting process. The heart of the system is an n8n workflow that acts as the "roasting brain." Here's what it does:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F827dnzzmh6pon8dpuz1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F827dnzzmh6pon8dpuz1q.png" alt="N8N Workflow" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Initialisation &amp;amp; Preheating (Preheating Agent)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Start → Read Roaster Status → Start Roaster → Monitor Temperature
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connects to both MCP servers via SSE (Server-Sent Events).&lt;/li&gt;
&lt;li&gt;Starts the roaster at 100% heat, 30% fan.&lt;/li&gt;
&lt;li&gt;Monitors bean temperature rising toward ~170°C during preheating.&lt;/li&gt;
&lt;li&gt;Uses an AI Agent node (Preheating Agent) with custom instructions to detect preheating completion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key metrics tracked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bean temperature.&lt;/li&gt;
&lt;li&gt;Rate of Rise (°C/min).&lt;/li&gt;
&lt;li&gt;Fan speed (%).&lt;/li&gt;
&lt;li&gt;Heat level (%).&lt;/li&gt;
&lt;/ul&gt;
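&lt;p&gt;The preheating check can be sketched in Python. This is an illustrative sketch only, not the actual MCP server code: the class name, window size, and thresholds are assumptions.&lt;/p&gt;

```python
from collections import deque

class PreheatMonitor:
    """Tracks bean temperature and Rate of Rise (RoR) during preheating.

    Hypothetical sketch: names, thresholds, and polling interval are
    illustrative, not the project's actual implementation.
    """

    def __init__(self, target_temp_c=170.0, poll_interval_s=2.0, window=15):
        self.target_temp_c = target_temp_c
        self.poll_interval_s = poll_interval_s
        # Keep the last `window` readings to smooth the RoR estimate.
        self.readings = deque(maxlen=window)

    def add_reading(self, temp_c):
        self.readings.append(temp_c)

    def rate_of_rise(self):
        """Approximate RoR in °C/min from the oldest and newest readings."""
        if len(self.readings) < 2:
            return 0.0
        span_s = (len(self.readings) - 1) * self.poll_interval_s
        return (self.readings[-1] - self.readings[0]) / span_s * 60.0

    def preheat_complete(self):
        """Preheating is done once bean temperature reaches the target."""
        return bool(self.readings) and self.readings[-1] >= self.target_temp_c
```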

&lt;h3&gt;
  
  
  Phase 2: Bean Charge Detection (Preheating Agent)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Monitor Temp → Detect Temperature Delta threshold → Mark T0 Timestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When green beans are added to the hot roaster, the measured temperature drops sharply (e.g., from over 170°C to below 90°C). The workflow then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tracks rolling temperature averages.&lt;/li&gt;
&lt;li&gt;Detects sudden drops &amp;gt; 40°C.&lt;/li&gt;
&lt;li&gt;Marks "T0" - the beginning of roast time.&lt;/li&gt;
&lt;li&gt;All subsequent metrics are relative to T0.&lt;/li&gt;
&lt;/ul&gt;
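&lt;p&gt;The T0 detection idea (rolling average plus a drop threshold) can be sketched as follows. Function and parameter names are illustrative, not the workflow's actual code.&lt;/p&gt;

```python
from collections import deque

def detect_bean_charge(temps, drop_threshold_c=40.0, window=5):
    """Return the index at which beans were charged (T0), or None.

    Illustrative sketch: each new reading is compared against the rolling
    average of the previous `window` readings, and a sudden drop larger
    than `drop_threshold_c` marks T0.
    """
    recent = deque(maxlen=window)
    for i, temp in enumerate(temps):
        if len(recent) == recent.maxlen:
            avg = sum(recent) / len(recent)
            if avg - temp > drop_threshold_c:
                return i  # T0: cold beans just hit the hot drum
        recent.append(temp)
    return None
```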

&lt;p&gt;&lt;strong&gt;From the logs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"t0_detected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"beans_added_temp_c"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;96&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"t0_timestamp_utc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-11-15T21:21:56.490259+00:00"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 3: First Crack Detection (Roast Agent)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Loop: Poll First Crack MCP → Check Status → Wait
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The workflow continuously calls the First Crack Detection MCP server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streams microphone audio to the PyTorch model.&lt;/li&gt;
&lt;li&gt;Uses sliding window inference (10-second windows).&lt;/li&gt;
&lt;li&gt;Implements "pop-confirmation" logic (minimum 3 pops within 30 seconds).&lt;/li&gt;
&lt;li&gt;Reports when first crack is confirmed.&lt;/li&gt;
&lt;/ul&gt;
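&lt;p&gt;The pop-confirmation rule (minimum 3 pops within 30 seconds) can be sketched like this. The real detector feeds timestamps in from the sliding-window model inferences; names here are illustrative.&lt;/p&gt;

```python
from collections import deque

def make_pop_confirmer(min_pops=3, window_s=30.0):
    """Confirm first crack once `min_pops` pops land within `window_s` seconds."""
    pops = deque()

    def on_pop(timestamp_s):
        pops.append(timestamp_s)
        # Discard pops that have fallen out of the confirmation window.
        while pops and timestamp_s - pops[0] > window_s:
            pops.popleft()
        return len(pops) >= min_pops

    return on_pop
```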

&lt;p&gt;&lt;strong&gt;Detection event:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"first_crack_temp_c"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;184.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"first_crack_time_display"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"08:42"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"roast_elapsed_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;522&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 4: Development Time Management (Roast Agent)
&lt;/h3&gt;

&lt;p&gt;This phase is critical: mishandling it leads to under-roasted or over-roasted beans. The agent's objective is to adjust fan and heat to extend the development time.&lt;/p&gt;

&lt;p&gt;Development time percentage is the share of the total roast time spent between first crack and the end of the roast. The goal is to keep this period around 15-20%. On my machine, I have found that this needs to be achieved before bean temperatures go above 196°C to get the results I prefer.&lt;br&gt;
&lt;/p&gt;
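&lt;p&gt;The definition above boils down to a one-line calculation (a sketch; the function name is illustrative):&lt;/p&gt;

```python
def development_time_percent(first_crack_s, total_roast_s):
    """Development time as a percentage of the total roast duration.

    Matches the definition above: time from first crack to drop,
    divided by the overall roast time.
    """
    development_s = total_roast_s - first_crack_s
    return round(development_s / total_roast_s * 100, 1)
```

&lt;p&gt;With the numbers from the roast profile logged later in this post (first crack at 522 s, total 584 s), this gives 10.6%.&lt;/p&gt;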

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Loop: Adjust Heat/Fan → Monitor Development % → Check Target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once first crack is detected, the &lt;strong&gt;critical development phase&lt;/strong&gt; begins. The workflow's AI agent:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitors:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current bean temperature&lt;/li&gt;
&lt;li&gt;Rate of Rise (to prevent stalling or rushing)&lt;/li&gt;
&lt;li&gt;Development time percentage (target: 15-20%)&lt;/li&gt;
&lt;li&gt;Time since first crack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Controls:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces heat (100% → 60% → 40%)&lt;/li&gt;
&lt;li&gt;Increases fan speed (30% → 50% → 70%)&lt;/li&gt;
&lt;li&gt;Slows the roast to extend development time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Decision logic&lt;/strong&gt; (via AI agent):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IF development_time_percent &amp;gt;= 15% AND development_time_percent &amp;lt;= 20%:
    IF bean_temp_c &amp;gt;= 190 AND bean_temp_c &amp;lt;= 195:
        → DROP BEANS (optimal light roast)
    ELSE IF bean_temp_c &amp;gt; 195:
        → DROP BEANS (approaching medium roast)
ELSE:
    → CONTINUE MONITORING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
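&lt;p&gt;For clarity, the drop/continue rule above rendered as Python. This is a sketch only: in the actual system the decision is made by the AI agent following its instructions, not by hard-coded thresholds.&lt;/p&gt;

```python
def development_decision(development_time_percent, bean_temp_c):
    """Python rendering of the agent's drop/continue rule (illustrative)."""
    if 15.0 <= development_time_percent <= 20.0:
        if 190.0 <= bean_temp_c <= 195.0:
            return "drop"   # optimal light roast
        if bean_temp_c > 195.0:
            return "drop"   # approaching medium roast
    return "monitor"        # keep adjusting heat/fan and re-check
```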



&lt;p&gt;&lt;strong&gt;Actual output from workflow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"development"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"monitor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bean_temp_c"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;191&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Development: 191°C, 8.9%"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then moments later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cooling"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"drop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bean_temp_c"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;193&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Optimal! Dropping beans."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 5: Completion
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Drop Beans → Set Cooling Fan to 100% → Stop Heat → Cool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Commands the roaster to drop beans into cooling tray.&lt;/li&gt;
&lt;li&gt;Sets the cooling fan to 100% for maximum cooling.&lt;/li&gt;
&lt;li&gt;Cuts heat to 0%.&lt;/li&gt;
&lt;li&gt;Records final metrics for analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Final roast profile:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"roast_elapsed_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;584&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"roast_elapsed_display"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"09:44"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"beans_added_temp_c"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;175.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"first_crack_temp_c"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;184.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"first_crack_time_display"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"08:42"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"development_time_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;62&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"development_time_display"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"01:02"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"development_time_percent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total_roast_duration_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;584&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Aspire Orchestration Code
&lt;/h2&gt;

&lt;p&gt;Here's how .NET Aspire 13 makes this all work (from &lt;code&gt;Program.cs&lt;/code&gt;):&lt;/p&gt;

&lt;h3&gt;
  
  
  Python MCP Servers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Roaster Control MCP Server&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;roasterControl&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddPythonModule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"roaster-control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;projectRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"src.mcp_servers.roaster_control.sse_server"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithVirtualEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sharedVenvPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithHttpEndpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5002&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ROASTER_CONTROL_PORT"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AUTH0_DOMAIN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth0Domain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AUTH0_AUDIENCE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth0Audience&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"USE_MOCK_HARDWARE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;useMockHardware&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OTEL_EXPORTER_OTLP_PROTOCOL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"grpc"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithOtlpExporter&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// First Crack Detection MCP Server&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;firstCrackDetection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddPythonModule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"first-crack-detection"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;projectRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"src.mcp_servers.first_crack_detection.sse_server"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithVirtualEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sharedVenvPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithHttpEndpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"FIRST_CRACK_DETECTION_PORT"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AUTH0_DOMAIN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth0Domain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AUTH0_AUDIENCE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth0Audience&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OTEL_EXPORTER_OTLP_PROTOCOL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"grpc"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithOtlpExporter&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's happening here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AddPythonModule&lt;/code&gt;: New in Aspire 13, replaces the old &lt;code&gt;AddPythonApp&lt;/code&gt; API&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WithVirtualEnvironment&lt;/code&gt;: Points to shared Python 3.11 venv at repo root&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WithHttpEndpoint&lt;/code&gt;: Configures SSE endpoints for n8n to connect&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WithOtlpExporter&lt;/code&gt;: Sends telemetry to Aspire dashboard&lt;/li&gt;
&lt;li&gt;Modules run with &lt;code&gt;-m&lt;/code&gt; flag implicitly (e.g., &lt;code&gt;python -m src.mcp_servers.roaster_control.sse_server&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Container Services
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// n8n Workflow Engine&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;n8n&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"n8n"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"n8nio/n8n"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"latest"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithHttpEndpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5678&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;targetPort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5678&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"n8n-ui"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithBindMount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"./n8n-data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/home/node/.n8n"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"N8N_HOST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"0.0.0.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"N8N_PORT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"5678"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"WEBHOOK_URL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:5678/"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"N8N_METRICS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bind mount for persisting workflows and credentials.&lt;/li&gt;
&lt;li&gt;Exposes port 5678 for web UI.&lt;/li&gt;
&lt;li&gt;Metrics enabled for observability.&lt;/li&gt;
&lt;li&gt;Auto-restarts on failure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The First Autonomous Roast
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total roast time: 9:44 (584 seconds)&lt;/li&gt;
&lt;li&gt;First crack: 8:42 at 184°C
&lt;/li&gt;
&lt;li&gt;Development time: 1:02 (10.6% - slightly under target but acceptable)&lt;/li&gt;
&lt;li&gt;Final temperature: 193°C&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Light roast, consistent colour and smooth taste.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What worked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temperature drop detection caught bean addition instantly.&lt;/li&gt;
&lt;li&gt;First crack detection was accurate (within 20 seconds of my own ears), which is why the 10.6% development percentage is not a concern.&lt;/li&gt;
&lt;li&gt;Heat/fan adjustments prevented burning.&lt;/li&gt;
&lt;li&gt;Development % monitoring kept roast in safe zone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What could improve:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development time was 10.6% instead of target 15-20%.&lt;/li&gt;
&lt;li&gt;Could start reducing heat earlier after first crack.&lt;/li&gt;
&lt;li&gt;Rate of Rise could be smoother in final phase.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Aspire Dashboard Experience
&lt;/h3&gt;

&lt;p&gt;The unified Aspire dashboard shows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Services:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;roaster-control (Python) - Running&lt;/li&gt;
&lt;li&gt;first-crack-detection (Python) - Running
&lt;/li&gt;
&lt;li&gt;n8n (Container) - Running&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8qopy9tbvau2z540l6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8qopy9tbvau2z540l6a.png" alt="Rate of Rise" width="800" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjtaufe1iz7ceikrmap9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjtaufe1iz7ceikrmap9.png" alt="Bean Temperature" width="800" height="639"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. MCP Server Design Matters
&lt;/h3&gt;

&lt;p&gt;The current design has two MCP servers: one for roaster control and one for first crack detection. The original idea was that the roaster control MCP server could run on a low-powered device connected to the roaster, while the First Crack Detector ran on the laptop due to its hardware requirements.&lt;/p&gt;

&lt;p&gt;This design adds coordination overhead to the agent and makes it more complicated than necessary. A unified MCP server that returns all metrics in a single call would simplify the agent logic and likely lead to more predictable behaviour. This is one area to improve before moving on to comparing multiple agent frameworks.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Aspire's Python Support is Production-Ready
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before Aspire:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple terminal windows or docker compose&lt;/li&gt;
&lt;li&gt;Manual venv activation&lt;/li&gt;
&lt;li&gt;Additional effort to set up OpenTelemetry collectors and dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With Aspire:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One command: &lt;code&gt;dotnet run&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Automatic venv management.&lt;/li&gt;
&lt;li&gt;Shared configuration.&lt;/li&gt;
&lt;li&gt;Structured logging and tracing.&lt;/li&gt;
&lt;li&gt;Custom metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. n8n is Powerful for Agent Orchestration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why n8n worked well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual debugging: See workflow execution in real-time.&lt;/li&gt;
&lt;li&gt;Built-in AI Agent node: Uses OpenAI with tool calling.&lt;/li&gt;
&lt;li&gt;MCP client support: Native SSE connections.&lt;/li&gt;
&lt;li&gt;Error handling: Built-in retry logic and error branches.&lt;/li&gt;
&lt;li&gt;State management: Workflow variables persist between runs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. MCP Protocol Makes Tool Integration a Breeze
&lt;/h3&gt;

&lt;p&gt;The MCP servers exposed simple HTTP/SSE endpoints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Roaster Control Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;read_roaster_status&lt;/code&gt; → Returns current sensors + metrics&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;adjust_heat(level: int)&lt;/code&gt; → Sets heat 0-100%&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;adjust_fan(speed: int)&lt;/code&gt; → Sets fan 0-100%&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stop_roaster()&lt;/code&gt; → Emergency stop&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;start_roaster()&lt;/code&gt; → Begin roast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;First Crack Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;start_first_crack_detection()&lt;/code&gt; → Start audio monitoring&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_first_crack_status()&lt;/code&gt; → Check if first crack detected&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stop_first_crack_detection()&lt;/code&gt; → Stop monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The n8n AI Agent called these tools naturally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent: "I need to check the roaster status"
→ Calls read_roaster_status
→ Receives JSON with temp, fan, heat, metrics
→ Makes decision
→ Calls adjust_heat(60) to reduce heat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
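
&lt;p&gt;That decision cycle can be sketched as a minimal control loop. The tool functions are stand-ins for the MCP calls above, and the temperature thresholds are illustrative only:&lt;/p&gt;

```python
# Minimal control-loop sketch. The four callables stand in for the MCP
# tools above; the 230C cutoff and 200C threshold are illustrative only.

def control_step(read_roaster_status, get_first_crack_status, adjust_heat, stop_roaster):
    status = read_roaster_status()          # JSON-like dict of sensors
    crack = get_first_crack_status()
    if status["bean_temp_c"] > 230:
        stop_roaster()                      # emergency stop, illustrative cutoff
    elif crack["detected"]:
        adjust_heat(40)                     # back off heat for the development phase
    elif status["bean_temp_c"] > 200:
        adjust_heat(60)                     # reduce heat approaching first crack
    return status, crack
```

&lt;p&gt;In the real workflow the decision in the middle is made by the AI Agent node rather than fixed rules, but the tool-calling shape is the same.&lt;/p&gt;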



&lt;h3&gt;
  
  
  5. Observability is Critical
&lt;/h3&gt;

&lt;p&gt;When the roast is in progress, you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time monitoring&lt;/strong&gt;: See temperature changing every 2 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error visibility&lt;/strong&gt;: Know immediately if MCP server crashes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance metrics&lt;/strong&gt;: Ensure control commands complete in &amp;lt;500ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical data&lt;/strong&gt;: Review roast profile after completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aspire's OpenTelemetry integration gave us all of this for free.&lt;/p&gt;
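
&lt;p&gt;In the project these numbers come from Aspire's OpenTelemetry integration; as a simplified stand-in, the 500 ms budget check boils down to something like:&lt;/p&gt;

```python
import time
from functools import wraps

# Simplified stand-in for the latency metrics the project gets from
# Aspire/OpenTelemetry: record each control command's duration and flag
# any call that blows the 500 ms budget mentioned above.
LATENCY_BUDGET_MS = 500.0

def timed_command(fn, samples):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000.0
            samples.append(elapsed_ms)
            if elapsed_ms > LATENCY_BUDGET_MS:
                print(f"{fn.__name__} exceeded budget: {elapsed_ms:.0f} ms")
    return wrapper
```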

&lt;h2&gt;
  
  
  The Development Experience with Warp Agent
&lt;/h2&gt;

&lt;p&gt;Throughout this project, I used Warp Agent extensively:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Aspire upgrade (9 → 13):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Warp Agent searched Microsoft Learn docs via MCP&lt;/li&gt;
&lt;li&gt;Found breaking changes in &lt;code&gt;AddPythonApp&lt;/code&gt; → &lt;code&gt;AddPythonModule&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Generated migration plan with test steps&lt;/li&gt;
&lt;li&gt;Verified builds and runtime behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For n8n workflow debugging:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzed MCP server logs to diagnose connection issues&lt;/li&gt;
&lt;li&gt;Suggested retry logic for transient network errors&lt;/li&gt;
&lt;li&gt;Helped structure AI agent prompts for decision-making&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For Python model optimization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Profiled inference latency&lt;/li&gt;
&lt;li&gt;Suggested caching strategies for feature extraction&lt;/li&gt;
&lt;li&gt;Optimized sliding window parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What made Warp Agent effective:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context awareness&lt;/strong&gt;: Understood the full stack (C#, Python, n8n)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP integration&lt;/strong&gt;: Could fetch the latest Microsoft docs, plus Context7 docs for n8n.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterative debugging&lt;/strong&gt;: Quickly test → analyse → fix cycles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code generation&lt;/strong&gt;: Created boilerplate while I focused on logic.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Short Term
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Roast profile tuning&lt;/strong&gt;: Adjust heat/fan curves to hit 15-20% development consistently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data collection&lt;/strong&gt;: Log every roast for analysis (temp curves, timestamps, outcomes).

&lt;ul&gt;
&lt;li&gt;Add support for automatically exporting roast statistics and the ability to rate roasts later.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved first crack detection&lt;/strong&gt;: Capture manual recording sessions in different environmental setups to improve detection. What we have is impressive given we only had 9 roasting sessions for fine-tuning, but we can do better.&lt;/li&gt;
&lt;li&gt;Implement multiple agent frameworks and compare their pros and cons.&lt;/li&gt;
&lt;li&gt;Test the MCP servers running on a Raspberry Pi 5.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Medium Term
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Train an emulation roast model from historical roast logs.

&lt;ul&gt;
&lt;li&gt;This will allow experimentation without the actual hardware, producing realistic responses to heat, fan, and time inputs that emulate the roaster's heating behaviour.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine learning on roast profiles&lt;/strong&gt;: Train a model to predict optimal heat/fan adjustments once there are enough roast samples and ratings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom UI&lt;/strong&gt;: Build a dedicated roasting interface (replacing n8n for end users) to provide a unified experience across agent frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-origin support&lt;/strong&gt;: Adjust profiles based on bean origin (Kenya vs Brazil)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can .NET Aspire roast coffee?&lt;/strong&gt; Absolutely.&lt;/p&gt;

&lt;p&gt;More importantly, it provided:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified orchestration&lt;/strong&gt; for polyglot services (C#, Python, Node.js containers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer productivity&lt;/strong&gt; with single-command startup and hot reload&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production observability&lt;/strong&gt; with unified logs, traces, and metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt; to iterate quickly on both code and workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PyTorch model for first crack detection (Part 1)&lt;/li&gt;
&lt;li&gt;MCP servers for hardware control and detection (Part 2)
&lt;/li&gt;
&lt;li&gt;.NET Aspire orchestration with n8n workflows (Part 3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...resulted in a &lt;strong&gt;fully autonomous coffee roasting system&lt;/strong&gt; that produces genuinely good coffee.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;From 9 raw audio recordings to autonomous coffee roasting—all orchestrated with a single command: &lt;code&gt;dotnet run&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The coffee tastes great. The code is open source. And yes, .NET Aspire can definitely roast coffee.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For reference, today's roast incurred $0.76 in OpenAI API usage costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Code and Articles
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code repository:&lt;/strong&gt; &lt;a href="https://github.com/syamaner/bean-agent" rel="noopener noreferrer"&gt;Bean Agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/syamaner/part-1-training-a-neural-network-to-detect-coffee-first-crack-from-audio-an-agentic-development-1jei"&gt;&lt;strong&gt;Part 1:&lt;/strong&gt; Training the Audio Detection Model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/syamaner/part-2-building-mcp-servers-to-control-a-home-coffee-roaster-an-agentic-development-journey-with-58ik"&gt;&lt;strong&gt;Part 2:&lt;/strong&gt; Building MCP Servers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  .NET Aspire Documentation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Aspire Overview:&lt;/strong&gt; &lt;a href="https://learn.microsoft.com/en-us/dotnet/aspire/" rel="noopener noreferrer"&gt;.NET Aspire documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upgrade to Aspire 13:&lt;/strong&gt; &lt;a href="https://learn.microsoft.com/en-us/dotnet/aspire/get-started/upgrade-to-aspire-13" rel="noopener noreferrer"&gt;Upgrade guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python Hosting in Aspire:&lt;/strong&gt; &lt;a href="https://learn.microsoft.com/en-us/dotnet/aspire/get-started/build-aspire-apps-with-python" rel="noopener noreferrer"&gt;Orchestrate Python apps&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tools and Protocols
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;n8n Workflow Automation:&lt;/strong&gt; &lt;a href="https://n8n.io" rel="noopener noreferrer"&gt;n8n.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Context Protocol:&lt;/strong&gt; &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;modelcontextprotocol.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenTelemetry:&lt;/strong&gt; &lt;a href="https://opentelemetry.io" rel="noopener noreferrer"&gt;opentelemetry.io&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Model &amp;amp; ML:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593" rel="noopener noreferrer"&gt;Audio Spectrogram Transformer (AST)&lt;/a&gt; - Pre-trained model&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/transformers/en/model_doc/audio-spectrogram-transformer" rel="noopener noreferrer"&gt;AST Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/fine-tune-the-audio-spectrogram-transformer-with-transformers-73333c9ef717/" rel="noopener noreferrer"&gt;Fine-Tuning AST Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2104.01778" rel="noopener noreferrer"&gt;Original AST Paper&lt;/a&gt; - Gong et al., 2021&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The first roast:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04p6g4kmxbc8pq39b2dg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04p6g4kmxbc8pq39b2dg.png" alt="First Roast" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>aspire</category>
      <category>agents</category>
    </item>
    <item>
      <title>Part 2: Building MCP Servers to Control a Home Coffee Roaster - An Agentic Development Journey with Warp Agent</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Sun, 02 Nov 2025 14:58:02 +0000</pubDate>
      <link>https://dev.to/syamaner/part-2-building-mcp-servers-to-control-a-home-coffee-roaster-an-agentic-development-journey-with-58ik</link>
      <guid>https://dev.to/syamaner/part-2-building-mcp-servers-to-control-a-home-coffee-roaster-an-agentic-development-journey-with-58ik</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this 3-part series, we are building an autonomous coffee roasting agent with Warp. The first part covered how we fine-tuned a model to detect first crack — a critical phase in the roasting process. This was a nice warm-up implementing a key component for our end goal, but detection alone isn't enough. &lt;strong&gt;Now we need to expose this functionality so the agent we'll build can both detect first crack and control the roasting process&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This post focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The objective&lt;/strong&gt;: Turning ML predictions into real-world roaster control actions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution overview&lt;/strong&gt;: Model Context Protocol (MCP) servers as the bridge between AI agents and hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt;: The two MCP servers we built—First Crack Detector MCP + Hottop Controller MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;📊 TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Connect trained ML model to physical roaster control using an agent to achieve autonomous coffee roasting&lt;/li&gt;
&lt;li&gt; Build two MCP servers — FirstCrackDetector + HottopController

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Stack&lt;/strong&gt;: Python MCP SDK, pyserial, pyhottop, Auth0 authentication&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt; Real-time detection + safe roaster control via AI agents (208+ tests passing)&lt;/li&gt;

&lt;li&gt; Using Warp Agent mode, Context7 MCP Server, Auth0 MCP Server during development&lt;/li&gt;

&lt;li&gt; Next Part: Part 3 orchestrates both servers with Microsoft Agent Framework&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Integrating software, hardware and agents without reinventing the wheel
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Traditional approach: Build custom APIs, handle authentication, manage state
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt; Write integration code using imperative/declarative patterns to manage task lifecycles.&lt;/li&gt;
&lt;li&gt; Homegrown specifications make it harder to leverage emerging ML/AI technologies.&lt;/li&gt;
&lt;li&gt; Each new AI model or agent requires custom integration work.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The MCP option: Standardised protocol for AI &amp;lt;-&amp;gt; tool communication
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Provides deterministic tools for non-deterministic AI systems&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Benefits: Discoverability, type safety, streaming support, composability, and interoperability across AI models and agents.&lt;/li&gt;
&lt;li&gt; Write once, connect to MCP-compatible AI (Claude, ChatGPT, custom agents).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🤖 Warp Agent Contributions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Provided MCP server scaffolding and tool definitions using standard MCP SDK.&lt;/li&gt;
&lt;li&gt; Helped integrate pyhottop library for Hottop serial protocol communication.&lt;/li&gt;
&lt;li&gt; Debugged serial communication timing issues and state synchronization.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Generated Auth0 authentication middleware&lt;/strong&gt; with role-based access control.&lt;/li&gt;
&lt;li&gt; Created comprehensive test suites (&lt;strong&gt;208+ tests&lt;/strong&gt; across both MCP servers).&lt;/li&gt;
&lt;li&gt; Suggested testing strategies including a simulation mode for hardware-free development.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is MCP (Model Context Protocol) and Why Use It?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is MCP?
&lt;/h3&gt;

&lt;p&gt;MCP is a protocol that enables AI assistants such as Claude, ChatGPT, and Warp Agent Mode to connect to external resources through a standard client-server architecture. This allows the same MCP server to be used with various agent technologies without modifying its code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Client-Server Concept
&lt;/h4&gt;

&lt;h5&gt;
  
  
  MCP Server
&lt;/h5&gt;

&lt;p&gt;A program that exposes specific data and tools (functionality) that AI agents can use. For example, a server might provide access to a database, a file system, or, as in our case, specific hardware.&lt;/p&gt;

&lt;h5&gt;
  
  
  MCP Client
&lt;/h5&gt;

&lt;p&gt;The application that connects to MCP servers and makes their capabilities available to the AI. For example, Claude or ChatGPT acts as an MCP client.&lt;/p&gt;

&lt;h4&gt;
  
  
  Transport Types
&lt;/h4&gt;

&lt;p&gt;MCP supports different ways for clients and servers to communicate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stdio (Standard Input/Output)

&lt;ul&gt;
&lt;li&gt;Most common for local integrations&lt;/li&gt;
&lt;li&gt;Server runs as a subprocess, communicating via stdin/stdout&lt;/li&gt;
&lt;li&gt;Simple and works well for local tools&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;SSE (Server-Sent Events)

&lt;ul&gt;
&lt;li&gt;Used for remote servers over HTTP&lt;/li&gt;
&lt;li&gt;Server pushes updates to the client&lt;/li&gt;
&lt;li&gt;Good for web-based integrations&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Custom transports can also be implemented

&lt;ul&gt;
&lt;li&gt;The protocol is designed to be transport-agnostic&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h5&gt;
  
  
  Flow
&lt;/h5&gt;

&lt;p&gt;When a user asks Claude (the MCP client) something that an MCP server exposes, the client can call the server to retrieve data or execute tools, then use that information in its response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F753jtozj6702t5mhezsw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F753jtozj6702t5mhezsw.png" alt="MCP Overview" width="337" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Server 1: First Crack Detector MCP
&lt;/h2&gt;

&lt;p&gt;In this section, we will briefly cover how the detector we have trained in the previous article is exposed as an MCP Server.&lt;/p&gt;

&lt;p&gt;The following diagram illustrates how the components are exposed as an MCP Server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgvfjy6afzrvr0dzlnze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgvfjy6afzrvr0dzlnze.png" alt="First Crack Detection MCP Server" width="441" height="753"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation Details
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; MCP SDK setup: Server initialisation using standard Python MCP SDK with stdio and SSE transports.&lt;/li&gt;
&lt;li&gt; Tool definitions: start_detection, stop_detection, get_status with Auth0 role-based authorisation.&lt;/li&gt;
&lt;li&gt; Session management: Thread-safe singleton pattern with idempotency enforcement.&lt;/li&gt;
&lt;li&gt; Real-time monitoring: Streaming detection events via SSE for live status updates.&lt;/li&gt;
&lt;li&gt; Error handling: Audio device enumeration failures, model loading issues, thread crashes, timeout scenarios.&lt;/li&gt;
&lt;/ul&gt;
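
&lt;p&gt;The thread-safe, idempotent session management mentioned above follows a pattern roughly like this (class and field names are assumptions, not the actual server code):&lt;/p&gt;

```python
import threading

# Illustrative sketch of the "thread-safe singleton with idempotency
# enforcement" pattern; the names here are assumptions, not the server's
# real API.

class DetectionSessionManager:
    def __init__(self):
        self._lock = threading.Lock()
        self._active = False

    def start_session(self, audio_config):
        with self._lock:
            if self._active:                 # idempotent: a second start is a no-op
                return {"status": "already_running"}
            self._active = True
            return {"status": "started", "config": audio_config}

    def stop_session(self):
        with self._lock:
            was_active, self._active = self._active, False
            return {"status": "stopped" if was_active else "not_running"}
```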

&lt;h3&gt;
  
  
  Code Walkthrough
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Key implementation components:
# - MCP SDK decorators for tool registration
# - Auth0 JWT validation middleware
# - Session manager with thread-safe state
# - OpenTelemetry tracing integration
&lt;/span&gt;
&lt;span class="c1"&gt;# First Crack Detection MCP Server setup
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TextContent&lt;/span&gt;

&lt;span class="n"&gt;mcp_server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;first-crack-detection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp_server.list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_first_crack_detection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Start monitoring audio for first crack events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;inputSchema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio_source_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usb_microphone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;builtin_microphone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio_source_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@mcp_server.call_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_first_crack_detection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Thread-safe session management
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;audio_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;AudioConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing Approach
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Custom test scripts: Python scripts using stdio communication (test_mcp_roaster.py)&lt;/li&gt;
&lt;li&gt;  Shell integration tests: Bash scripts for end-to-end workflows (test_roaster_server.sh)&lt;/li&gt;
&lt;li&gt; Unit test coverage: 86 passing tests with pytest, mocking audio devices and model inference&lt;/li&gt;
&lt;li&gt; Manual hardware testing: Real USB microphone validation with live roasting sessions&lt;/li&gt;
&lt;/ul&gt;
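
&lt;p&gt;The mocked-device unit tests have roughly this shape (&lt;code&gt;start_detection&lt;/code&gt; here is a simplified stand-in, not the project's real API):&lt;/p&gt;

```python
from unittest import mock

# Hypothetical shape of the device-mocking tests; start_detection and the
# device enumerator are simplified stand-ins for illustration.

def start_detection(list_devices):
    devices = list_devices()
    if not devices:
        return {"status": "error", "reason": "no audio devices found"}
    return {"status": "started", "device": devices[0]}

def test_start_detection_without_devices():
    fake_enumerate = mock.Mock(return_value=[])   # simulate no microphones present
    result = start_detection(fake_enumerate)
    assert result["status"] == "error"

def test_start_detection_with_usb_microphone():
    fake_enumerate = mock.Mock(return_value=["USB Audio Device"])
    assert start_detection(fake_enumerate)["device"] == "USB Audio Device"
```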

&lt;h2&gt;
  
  
  Server 2: Hottop Roaster Controller MCP
&lt;/h2&gt;

&lt;p&gt;The second MCP server exposes the roaster's status and runs commands to set heat and fan, as well as start/stop commands for the roast.&lt;/p&gt;

&lt;p&gt;This is implemented as a separate MCP server because its hardware requirements differ, which might mean running the two servers on different hosts.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hottop KN-8828B-2K+ Protocol
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Initial approach with pyhottop library&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Initially, we attempted to use the &lt;a href="https://github.com/splitkeycoffee/pyhottop" rel="noopener noreferrer"&gt;&lt;code&gt;pyhottop&lt;/code&gt;&lt;/a&gt; library for serial communication with the Hottop KN-8828B-2K+ roaster. However, we encountered compatibility issues that prevented reliable operation and had to consider an alternative approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adapting to Artisan's protocol&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After &lt;code&gt;pyhottop&lt;/code&gt; proved unreliable, we analysed the source code of the &lt;a href="https://artisan-roasterscope.blogspot.com/" rel="noopener noreferrer"&gt;Artisan roasting software&lt;/a&gt;, a mature, widely used open-source application for coffee roaster control. As a user of Artisan's Hottop integration, I knew it worked well, and since it has been battle-tested by the roasting community for years, it was an obvious next choice.&lt;/p&gt;

&lt;p&gt;Warp Agent Mode successfully analysed and adapted Artisan's serial protocol implementation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecmewzo4la7ovyuf1lwo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecmewzo4la7ovyuf1lwo.png" alt="Roaster MCP Server" width="447" height="994"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation Details
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serial connection management&lt;/strong&gt;: USB serial at 115200 baud, continuous 0.3s command intervals (required by Hottop)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Command encoding&lt;/strong&gt;: Artisan-compatible 36-byte protocol with checksums&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Status parsing&lt;/strong&gt;: Real-time temperature readings (bean + chamber) from serial responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input validation&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;Heat/fan values: 0-100% in 10% increments&lt;/li&gt;
&lt;li&gt;Connection state checks before commands&lt;/li&gt;
&lt;li&gt;Thread-safe state management with locks&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Authentication&lt;/strong&gt;: Auth0 JWT with role-based access control

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;read:roaster&lt;/code&gt; - Status monitoring only&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;write:roaster&lt;/code&gt; - Full hardware control&lt;/li&gt;
&lt;li&gt;Per-user audit logging&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
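
&lt;p&gt;The command encoding boils down to a fixed-length frame with a trailing checksum. The header bytes and field offsets below are assumptions for illustration; only the overall shape (36-byte frame, checksum byte, 10% increments) mirrors the description above:&lt;/p&gt;

```python
# Sketch of the 36-byte command-frame pattern. The header values and the
# heat/fan offsets are assumptions for illustration, not the verified
# Artisan protocol layout.

def encode_command(heat, fan):
    if heat % 10 or fan % 10 or heat not in range(0, 101) or fan not in range(0, 101):
        raise ValueError("heat/fan must be 0-100 in 10% increments")
    frame = bytearray(36)
    frame[0], frame[1] = 0xA5, 0x96        # assumed header bytes
    frame[10] = heat // 10                 # assumed heat field (0-10)
    frame[11] = fan // 10                  # assumed fan field (0-10)
    frame[35] = sum(frame[:35]) % 256      # trailing checksum over bytes 0-34
    return bytes(frame)
```

&lt;p&gt;Validating the 10% increments before encoding keeps malformed agent requests from ever reaching the serial port.&lt;/p&gt;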

&lt;h3&gt;
  
  
  Code Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Key implementation layers:
# 1. MCP Server (sse_server.py) - Auth0 + tool definitions
# 2. SessionManager - Thread-safe orchestration
# 3. HardwareInterface - Artisan serial protocol
# 4. Continuous command loop - 0.3s intervals with temperature polling
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjrybrvcvuwn3pw9itfm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjrybrvcvuwn3pw9itfm.png" alt="Roaster MCP Server Overview" width="712" height="1316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Testing Strategy
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt; MockRoaster: Realistic thermal simulation for development without hardware. Unfortunately, due to time constraints, it provided limited utility.&lt;/li&gt;
&lt;li&gt; Hardware verification: Validated with physical Hottop KN-8828B-2K+ (October 2025).&lt;/li&gt;
&lt;li&gt; Test coverage: 122 passing unit tests.&lt;/li&gt;
&lt;li&gt; Manual test scripts: test_hottop_interactive.py, test_hottop_auto.py.&lt;/li&gt;
&lt;li&gt; Integration tests: SSE transport, Auth0 authentication, command sequences.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Hardware verification
&lt;/h4&gt;

&lt;p&gt;The implementation was verified with physical Hottop KN-8828B-2K+ hardware on October 25, 2025:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Drum motor control&lt;/li&gt;
&lt;li&gt;Heat control (0-100%)&lt;/li&gt;
&lt;li&gt;Fan control (0-100%)&lt;/li&gt;
&lt;li&gt;Bean drop sequence&lt;/li&gt;
&lt;li&gt;Cooling system&lt;/li&gt;
&lt;li&gt;Continuous temperature readings (Bean &amp;amp; Chamber)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security and Safety Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Transport and Security Architecture
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Why SSE (Server-Sent Events)?
&lt;/h4&gt;

&lt;p&gt;Although both MCP servers currently run on the same machine as the agent &lt;em&gt;(hint hint: stdio transport would be simpler)&lt;/em&gt;, we designed for a distributed architecture from the start. The plan is to eventually deploy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Roasting MCP servers close to the hardware - running on the machine physically connected to the roaster and microphone.&lt;/li&gt;
&lt;li&gt;The agent on a separate device - a cloud server or different local machine for orchestration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach means we could potentially use a low-powered computer (such as a Raspberry Pi) for the servers, making the setup easier to plug in and start instead of having to use a laptop next to the roaster every time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SSE benefits: Real-time streaming, works over standard HTTP/HTTPS, firewall-friendly&lt;/li&gt;
&lt;li&gt;MCP compatibility: Follows MCP specification for HTTP+SSE transport&lt;/li&gt;
&lt;li&gt;Future-proof: Easy transition from localhost to remote deployment&lt;/li&gt;
&lt;/ul&gt;
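&lt;p&gt;For context, SSE is just a long-lived HTTP response whose body is a stream of text frames: &lt;code&gt;event:&lt;/code&gt; and &lt;code&gt;data:&lt;/code&gt; lines terminated by a blank line. A simplified parser sketch, which ignores the spec's &lt;code&gt;id:&lt;/code&gt; and &lt;code&gt;retry:&lt;/code&gt; fields and CR handling:&lt;/p&gt;

```python
def parse_sse(stream_text):
    """Parse a Server-Sent Events stream (already decoded to text) into events."""
    events = []
    event = {"event": "message", "data": []}
    for line in stream_text.split("\n"):
        if line == "":
            # blank line dispatches the accumulated event
            if event["data"]:
                events.append({"event": event["event"],
                               "data": "\n".join(event["data"])})
            event = {"event": "message", "data": []}
        elif line.startswith("data:"):
            event["data"].append(line[5:].lstrip(" "))
        elif line.startswith("event:"):
            event["event"] = line[6:].lstrip(" ")
        elif line.startswith(":"):
            pass  # comment / keep-alive line, ignored
    return events
```

&lt;p&gt;Because the frames are plain text over a normal HTTP response, the same stream works through proxies and firewalls that would block custom protocols.&lt;/p&gt;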

&lt;h4&gt;
  
  
  Authentication and Authorisation with Auth0
&lt;/h4&gt;

&lt;p&gt;Once MCP servers are exposed over the Internet / network, security becomes critical—especially for hardware control. This section briefly covers our approach for integrating Auth0 for authentication and authorisation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Auth0?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ease of integration: Well-documented SDKs and middleware&lt;/li&gt;
&lt;li&gt;OAuth 2.0 Client Credentials: Perfect for machine-to-machine authentication&lt;/li&gt;
&lt;li&gt;Role-based access control: Granular permissions via scopes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bonus&lt;/strong&gt;: Auth0 MCP Server for Warp Agent Mode to perform configuration tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Security implementation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JWT validation: Every MCP request validates Auth0 JWT tokens&lt;/li&gt;
&lt;li&gt;Scope-based authorization:

&lt;ul&gt;
&lt;li&gt;read:roaster - Status monitoring only (observer role)&lt;/li&gt;
&lt;li&gt;write:roaster - Full hardware control (operator role)&lt;/li&gt;
&lt;li&gt;admin:roaster - Administrative functions (future)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Token expiration: JWTs expire, requiring regular re-authentication&lt;/li&gt;

&lt;/ul&gt;
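&lt;p&gt;To make the scope check concrete, here is a minimal sketch of extracting scopes from a JWT payload. Note that it deliberately skips signature verification for brevity; per the JWT validation step above, a real server must first validate the RS256 signature against Auth0's JWKS:&lt;/p&gt;

```python
import base64
import json

def decode_claims(jwt_token):
    """Decode the payload segment of a JWT WITHOUT verifying the signature.
    Illustration only: a real server verifies the signature first."""
    payload_b64 = jwt_token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

def require_scope(claims, needed):
    """Scope check in the style of read:roaster / write:roaster permissions."""
    granted = set(claims.get("scope", "").split())
    if needed not in granted:
        raise PermissionError("missing scope: " + needed)
```

&lt;p&gt;A read-only observer token would then pass &lt;code&gt;require_scope(claims, "read:roaster")&lt;/code&gt; but fail the &lt;code&gt;write:roaster&lt;/code&gt; check before any hardware command is issued.&lt;/p&gt;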

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfnkkc84t5n9vlzh3f0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfnkkc84t5n9vlzh3f0u.png" alt="Authorisation overview" width="800" height="899"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This architecture ensures that only authorised clients can control the roaster, with full traceability of who did what and when—essential for safety-critical hardware operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;It has been a fun side project seeing how far Warp Agent Mode (Coding Agent) and emerging MCP servers for developer documentation (Context7) and service access (Auth0 MCP Server) have come in terms of speeding up the development process. &lt;/p&gt;

&lt;p&gt;It is still necessary to have a clear architecture, requirements and a final picture before starting. When these are used in conjunction with agentic development tools, tasks that could take several days can be completed in a day or so.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Worked Well and Lessons Learned
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard MCP SDK&lt;/strong&gt;: Provided solid foundation for both stdio and SSE transports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSE transport&lt;/strong&gt;: Future-proofed the architecture for distributed deployment.

&lt;ul&gt;
&lt;li&gt;Tested working with N8N, LangFlow and Python based local agents.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Auth0 integration&lt;/strong&gt;: Straightforward OAuth 2.0 implementation with role-based access control.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Warp Agent Mode assistance&lt;/strong&gt;: Accelerated MCP protocol understanding, test generation, and Auth0 middleware implementation.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Context7 and Auth0 MCP servers&lt;/strong&gt;: Using MCP servers to build MCP servers (via Warp) streamlined development.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;pyhottop library&lt;/strong&gt;: Initial attempt failed, but pivoting to Artisan's proven protocol worked as expected.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;MockRoaster simulation&lt;/strong&gt;: Due to time restrictions, this was not explored sufficiently, and we ended up testing manually using local agents.

&lt;ul&gt;
&lt;li&gt;This needs to be revisited in the future, especially to be able to test the agent loop.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  MCP-Specific Insights
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transport choice matters&lt;/strong&gt;: SSE enables remote deployment but requires careful auth implementation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool vs resource patterns&lt;/strong&gt;: Tools for actions (hardware control), resources for data streams (status monitoring) so far working well with agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency is critical&lt;/strong&gt;: Start/stop commands must be safe to call multiple times.

&lt;ul&gt;
&lt;li&gt;Initial attempts caused the hardware to go into a start/stop loop and required manual intervention.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Scope-based authorization&lt;/strong&gt;: Fine-grained permissions (read:roaster vs write:roaster) essential for hardware safety.

&lt;ul&gt;
&lt;li&gt;Although only a single client is used in the current integration mode.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Observability&lt;/strong&gt;: Using OpenTelemetry helped troubleshoot issues quickly.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Testing strategies&lt;/strong&gt;: Combination of unit tests (208 total), MockRoaster simulation, and manual hardware validation.&lt;/li&gt;

&lt;/ul&gt;
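&lt;p&gt;The idempotency lesson above can be sketched as a simple state guard: repeated start/stop calls become no-ops rather than duplicate serial commands. The names below are illustrative, not the project's actual classes:&lt;/p&gt;

```python
class RoasterSession:
    """Illustrative guard making start/stop safe to call repeatedly."""
    def __init__(self, hw):
        self.hw = hw
        self.running = False

    def start_roaster(self):
        if self.running:
            return "already running"   # no duplicate serial command sent
        self.hw.start_drum()
        self.running = True
        return "started"

    def stop_roaster(self):
        if not self.running:
            return "already stopped"
        self.hw.stop_drum()
        self.hw.set_heat(0)            # heat off on stop, for safety
        self.running = False
        return "stopped"
```

&lt;p&gt;With this guard in place, an agent that retries a tool call (or two agents racing) cannot push the hardware into the start/stop loop described above.&lt;/p&gt;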

&lt;h2&gt;
  
  
  Real-World Usage Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Warp Agent with Roaster Control (initial testing)
&lt;/h3&gt;

&lt;p&gt;The following are sample prompts intended for testing the MCP server from Warp.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Start the roaster"
Warp Agent Mode: [calls roaster_control MCP: start_roaster()]
         "Roaster started. Ready to add beans."

User: "Check the roaster status"
Warp Agent Mode: [calls roaster_control MCP: read_roaster_status()]
         "Status:
         • Drum: Running
         • Bean temp: 172°C
         • Chamber temp: 185°C
         • Heat: 0%
         • Fan: 0%"

User: "Stop the drum"
Warp Agent Mode: [calls roaster_control MCP: stop_roaster()]
         "Roaster drum stopped. Heat automatically turned off for safety."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Next: Part 3 Preview
&lt;/h2&gt;

&lt;p&gt;Now that we have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A trained first crack detection model (Part 1).&lt;/li&gt;
&lt;li&gt;Two MCP servers with hardware control + ML inference (Part 2).&lt;/li&gt;
&lt;li&gt;Auth0 for security and SSE transport for distributed deployment (Part 2).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In &lt;strong&gt;Part 3: Building an Autonomous Roasting Agent&lt;/strong&gt;, we'll bring it all together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comparing multiple agent frameworks and assessing their suitability for such long-running workflows.

&lt;ul&gt;
&lt;li&gt;N8N&lt;/li&gt;
&lt;li&gt;LangFlow&lt;/li&gt;
&lt;li&gt;Python based&lt;/li&gt;
&lt;li&gt;Using Microsoft Agent Framework&lt;/li&gt;
&lt;li&gt;Using OpenAI Python SDK&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Real-time decision making&lt;/strong&gt;: Agent analyses temperature trends, RoR, and first crack to adjust roast.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Extending OpenTelemetry observability&lt;/strong&gt;: Distributed tracing across agent + MCP servers + hardware.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Safety systems&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;Temperature bounds monitoring.&lt;/li&gt;
&lt;li&gt;Emergency stop on anomalies.&lt;/li&gt;
&lt;li&gt;Human override via UI.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Full end-to-end test&lt;/strong&gt;: Press start -&amp;gt; Preheat -&amp;gt; Add beans -&amp;gt; Hands off -&amp;gt; First crack -&amp;gt; Development -&amp;gt; Drop -&amp;gt; (hope for) perfect roast :)&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The goal&lt;/strong&gt;: An AI that roasts coffee consistently, safely, and (hopefully) better than manual control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MCP &amp;amp; Standards:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol Spec&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/modelcontextprotocol/python-sdk" rel="noopener noreferrer"&gt;Official MCP Python SDK&lt;/a&gt; - Used in this project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project Code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/syamaner/bean-agent" rel="noopener noreferrer"&gt;Coffee Roasting Repository&lt;/a&gt; - Complete source code&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/bean-agent/tree/main/src/mcp_servers/first_crack_detection" rel="noopener noreferrer"&gt;First Crack Detector MCP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/bean-agent/tree/main/src/mcp_servers/roaster_control" rel="noopener noreferrer"&gt;Roaster Control MCP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hardware &amp;amp; Serial Communication:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://artisan-roasterscope.blogspot.com/" rel="noopener noreferrer"&gt;Artisan Roaster Scope&lt;/a&gt; - Source of Hottop protocol implementation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pyserial.readthedocs.io/" rel="noopener noreferrer"&gt;pyserial Documentation&lt;/a&gt; - USB serial communication&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.hottopamericas.com/KN-8828B-2Kplus.html" rel="noopener noreferrer"&gt;Hottop KN-8828B-2K+&lt;/a&gt; - Roaster hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Authentication:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://auth0.com/docs" rel="noopener noreferrer"&gt;Auth0 Documentation&lt;/a&gt; - Identity provider&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://auth0.com/docs/get-started/authentication-and-authorization-flow/client-credentials-flow" rel="noopener noreferrer"&gt;OAuth 2.0 Client Credentials&lt;/a&gt; - Machine-to-machine auth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://warp.dev" rel="noopener noreferrer"&gt;Warp Terminal&lt;/a&gt; - AI-assisted development environment&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://auth0.com/docs/get-started/auth0-mcp-server" rel="noopener noreferrer"&gt;Auth0 MCP Server&lt;/a&gt; - Used via Warp for Auth0 configuration&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.warp.dev/knowledge-and-collaboration/mcp" rel="noopener noreferrer"&gt;Warp MCP Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.warp.dev/university/mcp/using-context7-mcp-server" rel="noopener noreferrer"&gt;Context7 MCP Server&lt;/a&gt; - Used for documentation lookup during development&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>agents</category>
      <category>ai</category>
      <category>warpdev</category>
    </item>
    <item>
      <title>Part 1: Training a Neural Network to Detect Coffee First Crack from Audio - An Agentic Development Journey with Warp Agent</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Mon, 27 Oct 2025 20:59:05 +0000</pubDate>
      <link>https://dev.to/syamaner/part-1-training-a-neural-network-to-detect-coffee-first-crack-from-audio-an-agentic-development-1jei</link>
      <guid>https://dev.to/syamaner/part-1-training-a-neural-network-to-detect-coffee-first-crack-from-audio-an-agentic-development-1jei</guid>
      <description>&lt;p&gt;When it comes to coffee, everyone has their preferences. I usually prefer smooth, naturally sweet coffee with nice fragrance - no bitter or smoky flavours.&lt;/p&gt;

&lt;p&gt;There is a challenge though: achieving that perfect roast at home requires split-second timing. Miss the "first crack" by 30 seconds? You've got bitter, over-roasted beans. Finish the roast early? Enjoy your grassy / earthy tasting coffee.&lt;/p&gt;

&lt;p&gt;This post is about teaching a neural network to detect that critical moment from audio alone.&lt;/p&gt;

&lt;p&gt;While home roasting has been a niche hobby, in recent years more options have become available for roasting coffee at home. These devices usually have a smaller capacity (~250-500g) and are compact and lightweight enough to run on a counter.&lt;/p&gt;

&lt;p&gt;To achieve my desired roast level I generally aim for a light / medium roast, which requires the development phase to be about 10% - 15% of the roast time. The development phase is the duration from the start of first crack until the end of the roast, when the beans are ejected from the roaster.&lt;/p&gt;

&lt;p&gt;First crack is the audible popping sound that occurs when coffee beans rapidly expand and release moisture and CO2 due to the buildup of internal pressure during roasting. Many light roast profiles end just after first crack begins, while medium roasts continue for 1-3 minutes beyond this point. On my setup, first crack typically begins around 170°C-180°C, and I aim to finish the roast at approximately 195°C. This gives me 1-3 minutes of development time after first crack starts. This value is based on my observations on a Hottop KN8828B-2K+ home roaster. &lt;/p&gt;

&lt;p&gt;Detecting the First Crack event is important for the end goal as we need to adjust heat and fan from that point to slow down the roast and stretch the development phase.&lt;/p&gt;
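&lt;p&gt;The development-phase ratio mentioned above is simple arithmetic over two timestamps; a small helper for clarity:&lt;/p&gt;

```python
def development_ratio(first_crack_s, drop_s):
    """Fraction of total roast time spent in development (first crack to drop)."""
    return (drop_s - first_crack_s) / drop_s

# e.g. first crack at 9:00 into an 11:00 roast:
# development_ratio(540, 660) is about 0.18 (18%)
```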

&lt;p&gt;The current series of posts will cover the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: Training a Neural Network to Detect Coffee First Crack from Audio - An Agentic Development Journey&lt;/li&gt;
&lt;li&gt;Part 2: Building an MCP server to control a home coffee roaster&lt;/li&gt;
&lt;li&gt;Part 3: Building a Coffee roasting Agent with Aspire to automate coffee roasting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have been recording coffee roasting audio during the summer and have been looking into fine-tuning an existing model so that I can train and run inference on an ARM-based laptop. The task is binary classification on an audio stream: did first crack happen in the sample or not? A common baseline is random guessing (a coin toss), which yields 50% accuracy for any binary classification problem. Our goal is to beat this with the minimal data available for fine-tuning.&lt;/p&gt;

&lt;p&gt;My initial objective has been utilising a pre-trained AST (Audio Spectrogram Transformer) model from Hugging Face that was originally trained on AudioSet and fine-tuning it for the first crack vs no first crack binary classification task. In this approach, the model architecture remains the same, but we update the weights through training on our coffee roasting audio data.&lt;/p&gt;
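&lt;p&gt;For intuition about what the AST model consumes: the audio is converted to a (log-mel) spectrogram before it reaches the transformer. The numpy sketch below shows only the framing / FFT / log step; the actual pipeline would use Hugging Face's AST feature extractor, which also applies a 128-band mel filterbank:&lt;/p&gt;

```python
import numpy as np

def log_spectrogram(audio, win=400, hop=160):
    """Frame the waveform, apply a Hann window, FFT each frame, take log magnitude.
    Assumes len(audio) is at least `win` samples."""
    n_frames = 1 + max(0, len(audio) - win) // hop
    window = np.hanning(win)
    frames = np.stack([audio[i * hop : i * hop + win] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-10)   # shape: (n_frames, win // 2 + 1)
```

&lt;p&gt;The short pops of first crack show up as broadband vertical streaks in this time-frequency picture, which is exactly the kind of pattern an image-style transformer can learn.&lt;/p&gt;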

&lt;p&gt;To tackle this challenge systematically, I decided to leverage modern development tools and adopted an AI-first development approach. In the next section, the details of the setup will be discussed.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;📊 TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem&lt;/strong&gt;: Detect coffee "first crack" from audio to optimize roast profiles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Fine-tune MIT's AST model on 9 recording sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Results&lt;/strong&gt;: 93.3% accuracy, 0.986 ROC-AUC with minimal data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt;: Warp Agent, Label Studio, PyTorch, Hugging Face&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next&lt;/strong&gt;: Part 2 builds MCP servers, Part 3 creates autonomous roasting agent&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Automated First Crack Detection?
&lt;/h2&gt;

&lt;p&gt;Manual first crack detection requires constant attention during a 10-12 minute roast. Environmental factors (noisy extractors, ambient sounds) can mask the cracks and pops. &lt;/p&gt;

&lt;p&gt;This project aims to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free the roaster to multitask during the roast&lt;/li&gt;
&lt;li&gt;Provide consistent detection regardless of ambient noise&lt;/li&gt;
&lt;li&gt;Enable data-driven roast profile development&lt;/li&gt;
&lt;li&gt;Lay groundwork for fully autonomous roasting (Part 3)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🤖 Warp Agent Contributions
&lt;/h3&gt;

&lt;p&gt;Throughout development, Warp's Agent Mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suggested Label Studio over manual Audacity annotation&lt;/li&gt;
&lt;li&gt;Generated data preprocessing pipeline architecture&lt;/li&gt;
&lt;li&gt;Created train/validation/test split logic&lt;/li&gt;
&lt;li&gt;Debugged overfitting with annotation strategy advice&lt;/li&gt;
&lt;li&gt;Auto-generated evaluation scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting Up the Development Environment with Warp Agent
&lt;/h2&gt;

&lt;p&gt;Having used Warp Agent Mode at work for the past few months and seen how it transformed my development flow, it was a natural choice for this project.&lt;/p&gt;

&lt;p&gt;I started by creating a &lt;a href="https://github.com/syamaner/bean-agent/blob/main/README.md" rel="noopener noreferrer"&gt;readme file&lt;/a&gt; and shared my starting requirements and setup. I included links to the tutorials of interest, the libraries I intended to use and the model I wanted to fine-tune. &lt;/p&gt;

&lt;p&gt;Warp's Agent Mode helped me structure the project, suggest tools, and iterate on the implementation approach from training scripts, evaluation to inference and manual testing scripts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Evolution and Documentation
&lt;/h2&gt;

&lt;p&gt;The readme above was pretty much all I shared with Warp and then asked to focus on Phase 1 and create an &lt;a href="https://github.com/syamaner/bean-agent/blob/main/PHASE1_PLAN.md" rel="noopener noreferrer"&gt;implementation plan for Phase 1&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I had lost the recordings I made over the summer and therefore had to start with minimal data - only 4 recording sessions of ~10 minutes each. This was enough to build the initial workflow with Warp. &lt;/p&gt;

&lt;h2&gt;
  
  
  Data Collection Strategy
&lt;/h2&gt;

&lt;p&gt;For data collection, I used a USB microphone pointed at the roaster and recorded each roasting session. A session takes about 10 - 12 minutes. At the time of starting, I only had 4 recording sessions available. &lt;/p&gt;

&lt;p&gt;Recordings have the following properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sample rate: 44.1kHz (recommended for compatibility)&lt;/li&gt;
&lt;li&gt;Format: WAV (uncompressed)&lt;/li&gt;
&lt;li&gt;Bit depth: 16-bit minimum&lt;/li&gt;
&lt;li&gt;Channels: Mono sufficient&lt;/li&gt;
&lt;li&gt;Recording duration: Full roast cycle (10-15 minutes)&lt;/li&gt;
&lt;/ul&gt;
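&lt;p&gt;These properties can be verified programmatically with Python's standard &lt;code&gt;wave&lt;/code&gt; module; a small check along these lines (the function name is illustrative):&lt;/p&gt;

```python
import wave

def check_recording(path):
    """Verify a session recording matches the expected capture settings."""
    with wave.open(path, "rb") as w:
        props = {
            "sample_rate": w.getframerate(),
            "channels": w.getnchannels(),
            "bit_depth": w.getsampwidth() * 8,
            "duration_s": w.getnframes() / w.getframerate(),
        }
    assert props["sample_rate"] == 44100, "expected 44.1 kHz"
    assert props["channels"] == 1, "mono is sufficient"
    assert props["bit_depth"] in (16, 24, 32), "16-bit minimum"
    return props
```

&lt;p&gt;Running this over every new session catches a mis-configured microphone before any time is spent annotating the file.&lt;/p&gt;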

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9whxv14zk5lmdb0la5pp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9whxv14zk5lmdb0la5pp.jpg" alt="Roasting session recording" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Annotation and Labeling
&lt;/h2&gt;

&lt;p&gt;When I started, I was intending to do manual annotation using Audacity, a free and open-source audio editor and recording application. However, Warp Agent pointed me towards Label Studio, provided the configuration snippets, and described how to use it.&lt;/p&gt;

&lt;p&gt;With the initial 4 recordings, I used sparse labels and proceeded to training and evaluation. This led to overfitting, and the results were not reliable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initial Results with Sparse Labeling
&lt;/h3&gt;

&lt;p&gt;With only 4 recording sessions and sparse annotation (marking only obvious first crack events), the model showed signs of overfitting:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Validation Accuracy&lt;/td&gt;
&lt;td&gt;100% (epochs 2-7)&lt;/td&gt;
&lt;td&gt;Perfect scores = memorisation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training Accuracy&lt;/td&gt;
&lt;td&gt;100% (epochs 3-7)&lt;/td&gt;
&lt;td&gt;No learning after epoch 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test Precision&lt;/td&gt;
&lt;td&gt;75%&lt;/td&gt;
&lt;td&gt;High false positive rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Class Imbalance&lt;/td&gt;
&lt;td&gt;15% / 85%&lt;/td&gt;
&lt;td&gt;Severe imbalance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;: The model memorised the limited training data rather than learning generalisable acoustic features of first crack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution&lt;/strong&gt;: Expanding to 9 sessions with balanced annotation (equal first_crack and no_first_crack samples) dramatically improved precision from 75% → 95.2% while maintaining excellent recall.&lt;/p&gt;

&lt;p&gt;Once I had increased the recordings to 9, I spent more time annotating and aimed at building a balanced dataset with enough samples for first crack and no first crack. Each annotated sample was 3 - 6 seconds. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2495dlcmnjc0snqkrt6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2495dlcmnjc0snqkrt6r.png" alt="annotation example" width="720" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Warp Agent Mode also provided the configuration snippet for Label Studio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;View&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Header&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;"Coffee Roast First Crack Detection"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Text&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"instructions"&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;"Listen to the audio and mark regions where first crack occurs. Mark other regions as no_first_crack."&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Audio&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"audio"&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;"$audio"&lt;/span&gt; &lt;span class="na"&gt;zoom=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt; &lt;span class="na"&gt;speed=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Labels&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"label"&lt;/span&gt; &lt;span class="na"&gt;toName=&lt;/span&gt;&lt;span class="s"&gt;"audio"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Label&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;"no_first_crack"&lt;/span&gt; &lt;span class="na"&gt;background=&lt;/span&gt;&lt;span class="s"&gt;"#3498db"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Label&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;"first_crack"&lt;/span&gt; &lt;span class="na"&gt;background=&lt;/span&gt;&lt;span class="s"&gt;"#e74c3c"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/Labels&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;TextArea&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"notes"&lt;/span&gt; &lt;span class="na"&gt;toName=&lt;/span&gt;&lt;span class="s"&gt;"audio"&lt;/span&gt; 
            &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"Optional notes about this region (e.g., 'very clear pops', 'subtle', etc.)"&lt;/span&gt;
            &lt;span class="na"&gt;editable=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/View&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data Preprocessing Pipeline
&lt;/h2&gt;

&lt;p&gt;Coffee roasting is driven by many variables, ranging from ambient temperature to the bean type, machine and heating type. A basic electric roaster like the one used here is slow to respond to control changes, as the heating element needs to warm up and cool down depending on the command. Accurately identifying the current phase of the roast is crucial, and this can be done by audio analysis, visual inspection, or a combination of time and temperature, with varying degrees of success. In my manual roasts, I have recently been getting better results by adjusting the parameters once first crack is reached, and therefore decided to fine-tune a model to detect it. &lt;/p&gt;

&lt;p&gt;So, given we have a microphone pointing at the roaster during the roasting process and a relatively controlled environment, how do we get the recording and convert it into the format needed to support our fine-tuning process?&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenges
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Raw audio files are captured from multiple roasting sessions of varying length.

&lt;ul&gt;
&lt;li&gt;Additionally, ~10 previously recorded sessions were lost accidentally.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;First crack events are sparse. They occur in roughly 12-25% of the whole duration, and they are not continuous.

&lt;ul&gt;
&lt;li&gt;This leads to an imbalance in samples.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;We need a workflow and a pipeline to process these and end up with balanced datasets for training, evaluation and test.&lt;/li&gt;

&lt;li&gt;At the beginning we also had a limited number of sessions recorded (9 at the time of writing).&lt;/li&gt;

&lt;li&gt;Labelling should be easy and repeatable to avoid user errors.&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Labelling Process
&lt;/h3&gt;

&lt;p&gt;While the fine-tuning approach and the base model were specified to Warp, Label Studio was not in the original requirements. Warp not only recommended using Label Studio but also provided detailed steps for running it, configuring it and getting going. These worked out of the box.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────┐
│                    Label Studio (Web UI)                        │
│          Manually annotate audio files                          │
│          Mark "first crack" / "not first crack" time regions    |
└────────────────────────┬────────────────────────────────────────┘
                         │
                         │ Export JSON
                         ▼
        📄 project-1-at-2025-10-18-20-44-9bc9cd1d.json
                         │
                         │
    ╔════════════════════▼═══════════════════════════════════════╗
    ║  STEP 1: convert_labelstudio_export.py                     ║
    ║  • Strip hash prefixes from filenames                      ║
    ║  • Compute audio durations                                 ║
    ║  • Extract labeled time regions from the raw files         ║
    ║  • Output one JSON per audio file                          ║
    ╚════════════════════╦═══════════════════════════════════════╝
                         │
                         ▼
              📁 data/labels/
              ├── roast-1.json
              ├── roast-2.json
              └── roast-3.json
                         │
                         │
    ╔════════════════════▼═══════════════════════════════════════╗
    ║  STEP 2: audio_processor.py                                ║
    ║  • Read annotation JSONs                                   ║
    ║  • Load raw audio files (44.1kHz mono)                     ║
    ║  • Extract time segments (start→end)                       ║
    ║  • Save chunks as WAV files by label                       ║
    ║  • Generate processing_summary.md                          ║
    ╚════════════════════╦═══════════════════════════════════════╝
                         │
                         ▼
              📁 data/processed/
              ├── first_crack/
              │   ├── roast-1_chunk_000.wav
              │   └── roast-1_chunk_001.wav
              └── no_first_crack/
                  ├── roast-1_chunk_002.wav
                  └── roast-2_chunk_000.wav
                         │
                         │
    ╔════════════════════▼═══════════════════════════════════════╗
    ║  STEP 3: dataset_splitter.py                               ║
    ║  • Collect all chunks by label                             ║
    ║  • Train, validation and test split                        ║
    ║    (70% train, 15% val, 15% test)                          ║ 
    ║  • Copy files to split directories                         ║
    ║  • Generate split_report.md                                ║
    ╚════════════════════╦═══════════════════════════════════════╝
                         │
                         ▼
              📁 data/splits/
              ├── train/     (70%)
              │   ├── first_crack/
              │   └── no_first_crack/
              ├── val/       (15%)
              │   ├── first_crack/
              │   └── no_first_crack/
              └── test/      (15%)
                  ├── first_crack/
                  └── no_first_crack/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the steps above are complete, we are ready for training and evaluation.&lt;/p&gt;
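&lt;p&gt;The stratified split in step 3 boils down to shuffling each class independently and slicing by ratio. A minimal sketch of that logic (the function signature is illustrative, not the actual &lt;code&gt;dataset_splitter.py&lt;/code&gt; API; the real script also copies the files and writes &lt;code&gt;split_report.md&lt;/code&gt;):&lt;/p&gt;

```python
import random

def stratified_split(chunks_by_label, ratios=(0.70, 0.15, 0.15), seed=42):
    """Shuffle each label's chunks and split them 70/15/15, preserving class balance."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for label, files in chunks_by_label.items():
        files = sorted(files)
        rng.shuffle(files)
        n_train = round(len(files) * ratios[0])
        n_val = round(len(files) * ratios[1])
        splits["train"] += [(label, f) for f in files[:n_train]]
        splits["val"] += [(label, f) for f in files[n_train:n_train + n_val]]
        splits["test"] += [(label, f) for f in files[n_train + n_val:]]
    return splits
```

&lt;p&gt;Splitting per label (rather than shuffling the pooled dataset) is what keeps the first_crack / no_first_crack ratio nearly identical across train, val and test.&lt;/p&gt;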

&lt;h2&gt;
  
  
  Dataset Overview
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Total Samples:&lt;/strong&gt; 298 chunks from 9 roasting sessions&lt;/p&gt;

&lt;h3&gt;
  
  
  Overall Class Balance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Class&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Percentage&lt;/th&gt;
&lt;th&gt;Avg Duration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;first_crack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;145&lt;/td&gt;
&lt;td&gt;48.7%&lt;/td&gt;
&lt;td&gt;4.5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;no_first_crack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;153&lt;/td&gt;
&lt;td&gt;51.3%&lt;/td&gt;
&lt;td&gt;4.0s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Split Distribution
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Split&lt;/th&gt;
&lt;th&gt;Total Samples&lt;/th&gt;
&lt;th&gt;first_crack&lt;/th&gt;
&lt;th&gt;no_first_crack&lt;/th&gt;
&lt;th&gt;Split Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Train&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;208&lt;/td&gt;
&lt;td&gt;101 (48.6%)&lt;/td&gt;
&lt;td&gt;107 (51.4%)&lt;/td&gt;
&lt;td&gt;69.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;22 (48.9%)&lt;/td&gt;
&lt;td&gt;23 (51.1%)&lt;/td&gt;
&lt;td&gt;15.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;22 (48.9%)&lt;/td&gt;
&lt;td&gt;23 (51.1%)&lt;/td&gt;
&lt;td&gt;15.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Class Balance Across Splits
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Class&lt;/th&gt;
&lt;th&gt;Train&lt;/th&gt;
&lt;th&gt;Validation&lt;/th&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;first_crack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;145&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;no_first_crack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;107&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;153&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Per-Session Breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Recording Session&lt;/th&gt;
&lt;th&gt;first_crack&lt;/th&gt;
&lt;th&gt;no_first_crack&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;th&gt;Balance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;25-10-19_1103-costarica-hermosa-5&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;48.1% / 51.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25-10-19_1136-brazil-1&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;50.0% / 50.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25-10-19_1204-brazil-2&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;57.1% / 42.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25-10-19_1236-brazil-3&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;51.4% / 48.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25-10-19_1315-brazil4&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;51.7% / 48.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;roast-1-costarica-hermosa-hp-a&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;48.5% / 51.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;roast-2-costarica-hermosa-hp-a&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;45.7% / 54.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;roast-3-costarica-hermosa-hp-a&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;40.6% / 59.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;roast-4-costarica-hermosa-hp-a&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;44.1% / 55.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Observations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nearly balanced dataset (48.7% vs 51.3%)&lt;/li&gt;
&lt;li&gt;Stratified split maintains balance across train/val/test&lt;/li&gt;
&lt;li&gt;9 recording sessions, mix of Costa Rica and Brazil beans&lt;/li&gt;
&lt;li&gt;Average chunk duration: 4.2 seconds&lt;/li&gt;
&lt;li&gt;Total annotated audio: ~21 minutes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Evaluation Metrics
&lt;/h2&gt;

&lt;p&gt;For this binary classification task, we use multiple metrics to evaluate the model performance:&lt;/p&gt;

&lt;h3&gt;
  
  
  Accuracy
&lt;/h3&gt;

&lt;p&gt;The proportion of correct predictions (true positives and true negatives) among all predictions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Accuracy = (TP + TN) / (TP + TN + FP + FN)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This metric provides an overall sense of model correctness. However, accuracy alone can be misleading with imbalanced datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Precision
&lt;/h3&gt;

&lt;p&gt;Of all samples predicted as &lt;code&gt;first_crack&lt;/code&gt;, what proportion actually were first crack events?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Precision = TP / (TP + FP)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;High precision means fewer false alarms, which is critical when we don't want to adjust roaster settings prematurely based on incorrect detections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recall (Sensitivity)
&lt;/h3&gt;

&lt;p&gt;Of all actual &lt;code&gt;first_crack&lt;/code&gt; events, what proportion did the model correctly identify?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recall = TP / (TP + FN)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;High recall means we catch most first crack events. Missing first crack (false negative) is likely to result in over-roasting.&lt;/p&gt;

&lt;h3&gt;
  
  
  F1 Score
&lt;/h3&gt;

&lt;p&gt;The harmonic mean of precision and recall, providing a single balanced metric.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;F1 = 2 × (Precision × Recall) / (Precision + Recall)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Balances precision and recall, which is useful when both false positives and false negatives are costly.&lt;br&gt;
In roasting terms, these errors could mean an under-roasted batch or an unintended dark roast, neither of which is desirable for this project.&lt;/p&gt;
&lt;h3&gt;
  
  
  ROC-AUC (Area Under the Receiver Operating Characteristic Curve)
&lt;/h3&gt;

&lt;p&gt;Measures the model's ability to distinguish between classes across all classification thresholds.&lt;/p&gt;
&lt;h3&gt;
  
  
  Confusion Matrix
&lt;/h3&gt;

&lt;p&gt;The confusion matrix visualises the model's predictions versus actual labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    Predicted
                    first_crack  no_first_crack
Actual  first_crack      TP            FN
        no_first_crack   FP            TN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TP (True Positive):&lt;/strong&gt; Correctly predicted first crack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TN (True Negative):&lt;/strong&gt; Correctly predicted no first crack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FP (False Positive):&lt;/strong&gt; Predicted first crack, but was actually no first crack (false alarm)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FN (False Negative):&lt;/strong&gt; Predicted no first crack, but was actually first crack (missed detection)&lt;/li&gt;
&lt;/ul&gt;
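&lt;p&gt;These four definitions are easy to sanity-check in code; feeding in this project's test-set counts (TP=20, TN=22, FP=1, FN=2) reproduces the headline metrics reported in the results section:&lt;/p&gt;

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the four headline metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = binary_metrics(tp=20, tn=22, fp=1, fn=2)
print(f"{acc:.1%} {prec:.1%} {rec:.1%} {f1:.1%}")  # 93.3% 95.2% 90.9% 93.0%
```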

&lt;h2&gt;
  
  
  Training and Evaluation
&lt;/h2&gt;

&lt;p&gt;With our dataset properly split and balanced, and our metrics defined, we're ready to fine-tune the Audio Spectrogram Transformer (AST) model for first crack detection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Architecture
&lt;/h3&gt;

&lt;p&gt;The project uses MIT's pre-trained AST model (&lt;code&gt;MIT/ast-finetuned-audioset-10-10-0.4593&lt;/code&gt;) from Hugging Face, which was originally trained on AudioSet. The model architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt;: Audio spectrograms (16kHz, 10-second windows)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture&lt;/strong&gt;: Vision Transformer adapted for audio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transfer Learning&lt;/strong&gt;: We keep the pre-trained weights and fine-tune for binary classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Two classes - &lt;code&gt;first_crack&lt;/code&gt; vs &lt;code&gt;no_first_crack&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Training Configuration
&lt;/h3&gt;

&lt;p&gt;The training process uses the following configuration (defined in &lt;code&gt;models/config.py&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TRAINING_CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;batch_size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_epochs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mps&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Apple Silicon GPU
&lt;/span&gt;    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sample_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;target_length_sec&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key training features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Class-weighted loss&lt;/strong&gt;: Addresses the small residual class imbalance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AdamW optimizer&lt;/strong&gt;: With cosine annealing learning rate schedule&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Early stopping&lt;/strong&gt;: Based on validation F1 score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TensorBoard logging&lt;/strong&gt;: Real-time metrics visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Training Process
&lt;/h3&gt;

&lt;p&gt;To start training:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./venv/bin/python src/training/train.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data-dir&lt;/span&gt; data/splits &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--experiment-name&lt;/span&gt; baseline_v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The training script:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Loads train/val data using &lt;code&gt;AudioDataset&lt;/code&gt; (automatic resampling to 16kHz)&lt;/li&gt;
&lt;li&gt;Applies class weights to handle imbalance&lt;/li&gt;
&lt;li&gt;Trains with early stopping (patience: 10 epochs)&lt;/li&gt;
&lt;li&gt;Saves best model based on validation F1 score&lt;/li&gt;
&lt;li&gt;Writes checkpoints to &lt;code&gt;experiments/runs/&amp;lt;experiment_name&amp;gt;/&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
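&lt;p&gt;The early stopping in step 3 is just a patience counter over validation F1. A minimal sketch of the logic (the class name is illustrative, not taken from the actual training script):&lt;/p&gt;

```python
class EarlyStopping:
    """Stop training when validation F1 has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_f1 = 0.0
        self.epochs_without_improvement = 0

    def step(self, val_f1):
        """Call once per epoch; returns True when training should stop."""
        if val_f1 > self.best_f1:
            self.best_f1 = val_f1  # new best: this is where the checkpoint is saved
            self.epochs_without_improvement = 0
            return False
        self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```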

&lt;h3&gt;
  
  
  Results: Exceeding Expectations
&lt;/h3&gt;

&lt;p&gt;With only &lt;strong&gt;9 recording sessions&lt;/strong&gt; (~21 minutes of annotated audio):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Baseline (Random)&lt;/th&gt;
&lt;th&gt;Our Model&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;50.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;93.3%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+86.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Precision&lt;/td&gt;
&lt;td&gt;50.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;95.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+90.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall&lt;/td&gt;
&lt;td&gt;50.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90.9%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+81.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F1 Score&lt;/td&gt;
&lt;td&gt;50.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;93.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+86.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ROC-AUC&lt;/td&gt;
&lt;td&gt;0.50&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.986&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+97.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Translation&lt;/strong&gt;: The model classifies roughly 93 of every 100 windows correctly,&lt;br&gt;
with only 1 false alarm and 2 missed detections across the 45-sample test set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Confusion Matrix
                    Predicted
                    no_first_crack  first_crack
Actual  no_first_crack     22            1
        first_crack         2           20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is excellent performance for a model trained on just 9 recording sessions! The higher overlap (70% vs previous experiments) likely contributed to the improved results. This demonstrates the power of transfer learning with pre-trained audio models.&lt;/p&gt;

&lt;p&gt;Performance breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only 1 false alarm (FP) - down from 2&lt;/li&gt;
&lt;li&gt;Only 2 missed detections (FN) - same as before&lt;/li&gt;
&lt;li&gt;22/23 correct no_first_crack predictions (95.7%)&lt;/li&gt;
&lt;li&gt;20/22 correct first_crack predictions (90.9%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This balanced performance is crucial for real-time roasting control where both missing first crack and triggering false adjustments have consequences.&lt;/p&gt;
&lt;h3&gt;
  
  
  Evaluation on Test Set
&lt;/h3&gt;

&lt;p&gt;To evaluate the final model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./venv/bin/python src/training/evaluate.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--checkpoint&lt;/span&gt; experiments/final_model/model.pt &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--test-dir&lt;/span&gt; data/splits/test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classification report with per-class metrics&lt;/li&gt;
&lt;li&gt;Confusion matrix visualization&lt;/li&gt;
&lt;li&gt;ROC curve analysis&lt;/li&gt;
&lt;li&gt;Detailed results saved to text files&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Learnings
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What Worked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transfer learning from AudioSet significantly reduced data requirements&lt;/li&gt;
&lt;li&gt;Balanced annotation (equal first_crack/no_first_crack samples) improved performance&lt;/li&gt;
&lt;li&gt;10-second windows captured enough context for accurate detection&lt;/li&gt;
&lt;li&gt;Class-weighted loss handled remaining imbalance effectively&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Challenges:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial sparse labelling with only 4 sessions led to overfitting&lt;/li&gt;
&lt;li&gt;Limited training data (9 sessions) required careful annotation strategy&lt;/li&gt;
&lt;li&gt;Environmental noise had to be kept to a minimum by recording in a controlled environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Future Improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collect more diverse roasting sessions (different beans, temperatures, extractor configuration)&lt;/li&gt;
&lt;li&gt;Experiment with data augmentation (time stretching, pitch shifting)&lt;/li&gt;
&lt;li&gt;Test shorter inference windows for faster real-time detection&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-Time Inference
&lt;/h2&gt;

&lt;p&gt;The trained model can now detect first crack in real-time from either audio files or live microphone input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# File-based detection&lt;/span&gt;
./venv/bin/python src/inference/first_crack_detector.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--audio&lt;/span&gt; data/raw/roast-1.wav &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--checkpoint&lt;/span&gt; experiments/final_model/model.pt

&lt;span class="c"&gt;# Live microphone detection&lt;/span&gt;
./venv/bin/python src/inference/first_crack_detector.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--microphone&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--checkpoint&lt;/span&gt; experiments/final_model/model.pt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The detector uses sliding window inference with "pop-confirmation" logic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzes 10-second audio windows with 70% overlap (3-second hop between windows)&lt;/li&gt;
&lt;li&gt;Requires minimum of 3 positive detections (pops) within a 30-second confirmation window&lt;/li&gt;
&lt;li&gt;Maintains detection history to filter false positives&lt;/li&gt;
&lt;li&gt;Returns timestamp in MM:SS format when first crack is confirmed&lt;/li&gt;
&lt;/ul&gt;
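&lt;p&gt;The pop-confirmation logic itself is small. A hedged sketch of the idea in plain Python (names and the seconds-based return value are illustrative; the actual detector maintains history incrementally and formats the timestamp as MM:SS):&lt;/p&gt;

```python
def confirm_first_crack(detections, min_pops=3, confirmation_window=30.0):
    """Return the time (seconds) at which first crack is confirmed, or None.

    `detections` is a time-ordered list of (timestamp_sec, is_positive) pairs,
    one entry per analysed 10-second window.
    """
    positives = [t for t, is_pos in detections if is_pos]
    for i, t in enumerate(positives):
        # Count positive windows ("pops") inside the confirmation window starting here.
        pops = [p for p in positives[i:] if t + confirmation_window >= p]
        if len(pops) >= min_pops:
            return pops[min_pops - 1]  # confirmed at the min_pops-th pop
    return None  # isolated positives never reach the threshold, filtering false alarms
```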

&lt;p&gt;This forms the foundation for Part 2, where we'll wrap this detector in an MCP server for integration with AI agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-Time Performance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Hardware&lt;/strong&gt;: Apple M3 Max (MPS - Metal Performance Shaders)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Real-Time Factor (RTF)&lt;/td&gt;
&lt;td&gt;87.64x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-window Latency&lt;/td&gt;
&lt;td&gt;70-90ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;~18 windows/second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Processing Speed&lt;/td&gt;
&lt;td&gt;1 hour of audio in ~41 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Batch Inference Results&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;Processing Time&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;roast-1&lt;/td&gt;
&lt;td&gt;10:39 (639.7s)&lt;/td&gt;
&lt;td&gt;7.67s&lt;/td&gt;
&lt;td&gt;83.46x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;roast-2&lt;/td&gt;
&lt;td&gt;10:16 (616.6s)&lt;/td&gt;
&lt;td&gt;6.92s&lt;/td&gt;
&lt;td&gt;89.06x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;roast-3&lt;/td&gt;
&lt;td&gt;10:25 (625.6s)&lt;/td&gt;
&lt;td&gt;7.05s&lt;/td&gt;
&lt;td&gt;88.74x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;roast-4&lt;/td&gt;
&lt;td&gt;9:44 (584.8s)&lt;/td&gt;
&lt;td&gt;6.55s&lt;/td&gt;
&lt;td&gt;89.29x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total&lt;/td&gt;
&lt;td&gt;41.1 min&lt;/td&gt;
&lt;td&gt;28.2s&lt;/td&gt;
&lt;td&gt;87.64x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Latency Breakdown (per 10s window)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Audio loading: 1-2ms&lt;/li&gt;
&lt;li&gt; Feature extraction: 20-30ms&lt;/li&gt;
&lt;li&gt; Model inference: 50-60ms&lt;/li&gt;
&lt;li&gt; Total: ~70-90ms per window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Resource Usage&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; CPU Usage: 5-10% during inference&lt;/li&gt;
&lt;li&gt; Memory: ~1.5GB for model + 100MB working&lt;/li&gt;
&lt;li&gt; GPU Memory: ~2GB on MPS&lt;/li&gt;
&lt;li&gt; Latency overhead: 0.9% (90ms used / 10,000ms available)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Insight&lt;/strong&gt;: The model processes audio roughly 87x faster than real-time; each 10-second window needs only ~90ms of compute out of the 10,000ms available, leaving about 111x headroom for streaming detection. A 10-minute roast is fully processed in just ~7 seconds, making real-time monitoring easily achievable even with additional processing overhead.&lt;/p&gt;
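&lt;p&gt;Both figures follow directly from the table: the real-time factor is audio duration divided by processing time, and per-window headroom is window length divided by per-window latency:&lt;/p&gt;

```python
total_audio_sec = 41.1 * 60  # total roast audio from the batch table
processing_sec = 28.2        # total batch processing time
rtf = total_audio_sec / processing_sec
print(round(rtf, 1))         # prints 87.4 (the table's 87.64x reflects unrounded durations)

window_ms, latency_ms = 10_000, 90
print(round(window_ms / latency_ms))  # prints 111 (per-window headroom)
```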

&lt;h2&gt;
  
  
  The Warp Agent Advantage
&lt;/h2&gt;

&lt;p&gt;Throughout this project, Warp's Agent Mode was instrumental in:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rapid Prototyping&lt;/strong&gt; - From idea to working pipeline in hours, not days&lt;br&gt;
&lt;strong&gt;Best Practice Guidance&lt;/strong&gt; - Suggested Label Studio and evaluation workflows&lt;br&gt;
&lt;strong&gt;Code Generation&lt;/strong&gt; - Created complete scripts for data processing, training, and inference&lt;br&gt;
&lt;strong&gt;Iterative Refinement&lt;/strong&gt; - Helped debug overfitting issues and improve annotation strategy&lt;br&gt;
&lt;strong&gt;Documentation&lt;/strong&gt; - Generated summaries, reports, and README documentation automatically&lt;/p&gt;

&lt;p&gt;The development workflow felt more like pair programming with an engineer who knew PyTorch and audio processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;The first crack detector is working well, but it's just the beginning.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Part 2: Building MCP Servers for Coffee Roasting&lt;/strong&gt;, we'll:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrap the detector in an MCP server for real-time streaming&lt;/li&gt;
&lt;li&gt;Build a second MCP server to control the Hottop roaster (heat, fan, cooling)&lt;/li&gt;
&lt;li&gt;Implement authentication and safety controls&lt;/li&gt;
&lt;li&gt;Test end-to-end detection → action loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In &lt;strong&gt;Part 3: Creating an Autonomous Roasting Agent&lt;/strong&gt;, we'll bring it all together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use .NET Aspire to orchestrate multiple MCP servers&lt;/li&gt;
&lt;li&gt;Build AI agents that make real-time roasting decisions&lt;/li&gt;
&lt;li&gt;Implement safety rails and human override&lt;/li&gt;
&lt;li&gt;Roast a batch fully autonomously and compare against manual profiles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The goal&lt;/strong&gt;: Press start, add beans when prompted, then hand off, observe, and enjoy perfectly roasted coffee.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow along on &lt;a href="https://github.com/syamaner/bean-agent" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or subscribe for Part 2!&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project &amp;amp; Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/syamaner/bean-agent" rel="noopener noreferrer"&gt;Project Repository&lt;/a&gt; - Complete code and documentation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://warp.dev" rel="noopener noreferrer"&gt;Warp Terminal&lt;/a&gt; - AI-assisted development environment&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://labelstud.io/" rel="noopener noreferrer"&gt;Label Studio&lt;/a&gt; - Audio annotation tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Model &amp;amp; ML:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593" rel="noopener noreferrer"&gt;Audio Spectrogram Transformer (AST)&lt;/a&gt; - Pre-trained model&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/transformers/en/model_doc/audio-spectrogram-transformer" rel="noopener noreferrer"&gt;AST Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/fine-tune-the-audio-spectrogram-transformer-with-transformers-73333c9ef717/" rel="noopener noreferrer"&gt;Fine-Tuning AST Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2104.01778" rel="noopener noreferrer"&gt;Original AST Paper&lt;/a&gt; - Gong et al., 2021&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Audio Processing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://librosa.org/" rel="noopener noreferrer"&gt;LibROSA&lt;/a&gt; - Audio analysis library&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pytorch.org/audio/stable/index.html" rel="noopener noreferrer"&gt;PyTorch Audio&lt;/a&gt; - Audio I/O and transforms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Coffee Roasting Context:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://library.sweetmarias.com/first-crack-faq-what-is-first-crack-what-is-second-crack/" rel="noopener noreferrer"&gt;First Crack Explained&lt;/a&gt; - For readers unfamiliar with roasting&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>agents</category>
      <category>warpdev</category>
      <category>python</category>
    </item>
    <item>
      <title>Beyond Basic RAG: Measuring Embedding and Generation Performance with RAGAS</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Sat, 12 Apr 2025 14:33:03 +0000</pubDate>
      <link>https://dev.to/syamaner/beyond-basic-rag-measuring-embedding-and-generation-performance-with-ragas-ddk</link>
      <guid>https://dev.to/syamaner/beyond-basic-rag-measuring-embedding-and-generation-performance-with-ragas-ddk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the previous post, we looked at a basic Retrieval Augmented Generation (RAG) example using .NET for both retrieval and generation. It was built using out-of-the-box components offered by Semantic Kernel, including a default chunking approach. &lt;/p&gt;

&lt;p&gt;The barrier to entry here is low, which helps democratise access to Large Language Models (LLMs) across wider ecosystems and drives innovation. For instance, in .NET it is possible to use &lt;code&gt;Microsoft Semantic Kernel&lt;/code&gt; or &lt;code&gt;Aspire.Azure.AI.OpenAI&lt;/code&gt; (OpenAI, Azure OpenAI, as well as compatible local options such as Ollama). There is even an emerging open-source .NET port of LangChain, with JetBrains as an official supporter. For those who would like to run inference in-process (CPU or GPU) without HTTP APIs, there is also LLamaSharp, a .NET wrapper around llama.cpp supporting CPU and GPU inference.&lt;/p&gt;

&lt;p&gt;However, given that there are many parameters / tweaks to ingestion, retrieval and generation, how can we measure the quality and outcome when building such applications?&lt;/p&gt;

&lt;p&gt;The following Google Trends chart compares the search terms LLM (blue), RAG (green), RAG Evaluation (red) and langchain (yellow) between January 2023 and April 2025. &lt;/p&gt;

&lt;p&gt;We observe that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Earlier in 2023, RAG was a more popular search term.&lt;/li&gt;
&lt;li&gt;Around January 2024, LLMs started to take over in popularity.&lt;/li&gt;
&lt;li&gt;langchain held a steady position throughout the time frame.&lt;/li&gt;
&lt;li&gt;RAG Evaluation has negligible existence in the trends.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given that Google is a public, general-purpose search engine, these results do not mean there is no interest in evaluation; rather, the general public may not be thinking about these aspects yet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yqw108szrr23y2ctoaw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yqw108szrr23y2ctoaw.png" alt="Comparison of LLM, RAG Evaluation, langchain, RAG Google trends" width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another look from an academic-papers perspective, comparing "RAG" and "RAG Evaluation", yields different results. Publications have grown from 14 papers on "RAG" and 3 on "RAG Evaluation" in 2022 to 1041 and 454 respectively in 2024 (Source: ArXiv Trends, 2025). As RAG becomes mainstream, evaluation methods are becoming a popular research topic in their own right.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtt5c783qmefb9ygqir1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtt5c783qmefb9ygqir1.png" alt="ArXiv Trends - RAG vs RAG Evaluation" width="736" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This post will cover the following sections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAG Evaluation&lt;/li&gt;
&lt;li&gt;System under Evaluation&lt;/li&gt;
&lt;li&gt;RAGAS&lt;/li&gt;
&lt;li&gt;Evaluation Approach&lt;/li&gt;
&lt;li&gt;Results&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  RAG Evaluation
&lt;/h2&gt;

&lt;p&gt;Evaluating Retrieval-Augmented Generation (RAG) systems is a crucial aspect of solutions that incorporate such technologies.&lt;/p&gt;

&lt;p&gt;Unlike traditional software, where testing is deterministic (given the input, we know the expected outcome), RAG outputs depend on several probabilistic, non-deterministic factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval accuracy&lt;/strong&gt; (finding relevant source data, rewriting user query, and similar approaches)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation quality&lt;/strong&gt; (producing coherent, factual responses)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variation in ingestion and generation&lt;/strong&gt; (chunking strategies, use of metadata, tweaked or versioned prompts and inference parameters)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without systematic evaluation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How can we tell genuine improvements apart from random noise?
&lt;/li&gt;
&lt;li&gt;Hallucinations and irrelevant answers can go undetected.
&lt;/li&gt;
&lt;li&gt;Optimisation becomes guesswork: "let me change this parameter and see".&lt;/li&gt;
&lt;li&gt;How do we detect and deal with regressions?&lt;/li&gt;
&lt;li&gt;Cost / benefit trade-offs are invisible.

&lt;ul&gt;
&lt;li&gt;Runtime costs include input / output tokens, so for production applications these are also crucial metrics. &lt;/li&gt;
&lt;li&gt;If they are not included in evaluation and comparison, the end result could be a well-performing system that is too expensive to run. As this is a hobby project, this step is excluded from the current experiments.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;There are established datasets and benchmarks for Retrieval-Augmented Generation systems. These benchmarks typically provide test records containing the query, the expected answer, and the expected context, which are then run against the RAG system under test and scored using metrics such as those defined below. The &lt;code&gt;Google Frames Benchmark&lt;/code&gt; is one example: it provides a dataset based on Wikipedia articles to evaluate metrics such as factuality, retrieval accuracy, and reasoning.&lt;/p&gt;

&lt;p&gt;These approaches introduce some challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Such datasets are generic and do not necessarily capture domain-specific nuances of the target use case.&lt;/li&gt;
&lt;li&gt;The test data might have been included in the model's training data.&lt;/li&gt;
&lt;li&gt;There can be bias towards specific metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this post, we will focus on how to measure both retrieval performance (e.g., context recall) and generation quality (e.g., faithfulness, semantic similarity) using the RAGAS evaluation framework.&lt;/p&gt;

&lt;p&gt;We will use RAGAS with an LLM-as-a-judge approach to generate evaluation data from our documents, and then run the evaluation from Jupyter Notebooks hosted on Aspire to review the results.&lt;/p&gt;

&lt;h2&gt;
  
  
  System under Evaluation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;We ingest Markdown documentation from the official Microsoft .NET Aspire repository.&lt;/li&gt;
&lt;li&gt;We use Semantic Kernel for ingestion, with a very simple chunking approach.&lt;/li&gt;
&lt;li&gt;We use Semantic Kernel for search.&lt;/li&gt;
&lt;li&gt;We register a dedicated Qdrant vector store for each embedding model under evaluation, and register each of those embedding models with Semantic Kernel.&lt;/li&gt;
&lt;li&gt;Lastly, we register a chat completion model for each LLM under evaluation, using the model name as the key.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach lets the request parameters select the correct vector store at runtime for ingestion and retrieval, and likewise select the LLM used for generation during an evaluation run.&lt;/p&gt;
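&lt;p&gt;The routing idea behind keyed registration can be sketched in Python. This is an illustration only: the actual project uses Semantic Kernel's keyed services via .NET Dependency Injection, and the store names below are made up.&lt;/p&gt;

```python
# Hypothetical sketch of keyed registration: each embedding model name
# maps to its own vector store, resolved per request at runtime.
vector_stores = {
    "mxbai-embed-large": "qdrant-collection-mxbai",
    "text-embedding-3-large": "qdrant-collection-openai",
}

def resolve_store(embedding_model: str) -> str:
    """Return the vector store registered for the requested embedding model."""
    try:
        return vector_stores[embedding_model]
    except KeyError:
        raise ValueError(f"No vector store registered for '{embedding_model}'")

store = resolve_store("mxbai-embed-large")
```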

&lt;h3&gt;
  
  
  System Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c47ynpi271ewa8qlfoe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c47ynpi271ewa8qlfoe.png" alt="Sytem Overview" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Ingestion and Query
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxbtgqp56xim3eqil34j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxbtgqp56xim3eqil34j.png" alt="Ingestion and Query" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  RAGAS
&lt;/h2&gt;

&lt;p&gt;RAGAS is one of the libraries that simplify the evaluation of Large Language Model (LLM) applications. It provides tools both to generate test data and to evaluate results using a variety of approaches and metrics.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAGAS Metrics
&lt;/h3&gt;

&lt;p&gt;This section gives a brief overview of the metrics used in this post. For more details, please refer to the RAGAS metrics documentation.&lt;/p&gt;

&lt;p&gt;The following metrics are summarised from official &lt;a href="https://docs.ragas.io/en/latest/concepts/metrics/" rel="noopener noreferrer"&gt;RAGAS documentation metrics section&lt;/a&gt;. &lt;/p&gt;

&lt;h4&gt;
  
  
  Semantic Similarity
&lt;/h4&gt;

&lt;p&gt;Measures the similarity between the answer from the LLM and the reference answer in the test dataset.&lt;/p&gt;

&lt;p&gt;It starts with the answer embeddings and the reference embeddings, then computes the &lt;a href="https://en.wikipedia.org/wiki/Cosine_similarity" rel="noopener noreferrer"&gt;cosine similarity&lt;/a&gt; between the two vectors.&lt;/p&gt;
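&lt;p&gt;The computation reduces to a cosine similarity between two embedding vectors. A minimal sketch, using toy vectors in place of real embeddings:&lt;/p&gt;

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for the answer / reference embeddings.
answer_emb = np.array([0.2, 0.8, 0.1])
reference_emb = np.array([0.25, 0.75, 0.05])
score = cosine_similarity(answer_emb, reference_emb)
```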

&lt;h4&gt;
  
  
  Answer Relevancy
&lt;/h4&gt;

&lt;p&gt;Answer relevancy measures how relevant the response is to the user input. It is calculated as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using an LLM and the response under evaluation, generate a set of artificial questions (three by default).&lt;/li&gt;
&lt;li&gt;Compute the cosine similarity between the embedding of the user input and the embedding of each generated question.&lt;/li&gt;
&lt;li&gt;The average of these scores is the answer relevancy.&lt;/li&gt;
&lt;/ul&gt;
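&lt;p&gt;Leaving the LLM question-generation step aside, the scoring part can be sketched as follows (toy unit vectors stand in for real embeddings):&lt;/p&gt;

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer_relevancy(user_input_emb, generated_question_embs) -> float:
    """Mean cosine similarity between the user input embedding and the
    embeddings of the questions an LLM generated back from the answer."""
    return float(np.mean([cosine(user_input_emb, q) for q in generated_question_embs]))

user_q = np.array([1.0, 0.0, 0.0])
gen_qs = [np.array([1.0, 0.0, 0.0]),   # rephrases the question exactly
          np.array([0.8, 0.6, 0.0]),   # somewhat related
          np.array([0.6, 0.8, 0.0])]   # drifting off-topic
score = answer_relevancy(user_q, gen_qs)
```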

&lt;h4&gt;
  
  
  Factual Correctness
&lt;/h4&gt;

&lt;p&gt;The Factual Correctness metric evaluates the factual accuracy of the generated response against the reference. It uses an LLM to break the response and the reference down into individual claims, then uses natural language comparison to determine the factual overlap between them. This overlap is quantified using precision, recall, and F1 score.&lt;/p&gt;

&lt;p&gt;Precision: the proportion of claims in the response that are correct. Precision is higher when there are few false positives.&lt;/p&gt;

&lt;p&gt;Recall: the proportion of reference claims that were correctly reproduced. Recall is higher when there are few false negatives.&lt;/p&gt;

&lt;p&gt;F1: the harmonic mean of precision and recall, useful when the two differ significantly. The F1 score sits closer to the lower of the two, so it can only be high if both precision and recall are high.&lt;/p&gt;
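&lt;p&gt;Once the claim comparison has produced counts of matched and unmatched claims, the three scores follow directly (the claim extraction itself is delegated to an LLM by RAGAS):&lt;/p&gt;

```python
def factual_scores(tp: int, fp: int, fn: int) -> dict:
    """Precision/recall/F1 over claim-overlap counts.
    tp: response claims supported by the reference,
    fp: response claims absent from the reference,
    fn: reference claims missing from the response."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = factual_scores(tp=8, fp=2, fn=2)
```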

&lt;h4&gt;
  
  
  Faithfulness
&lt;/h4&gt;

&lt;p&gt;The Faithfulness metric measures how factually consistent a response is with the retrieved context.&lt;/p&gt;

&lt;p&gt;A faithful response is one in which every claim is consistent with the context retrieved from the vector store.&lt;/p&gt;

&lt;p&gt;This metric addresses hallucination detection: if every part of the answer can be backed up by the retrieved context, the generative model has not added anything of its own.&lt;/p&gt;
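&lt;p&gt;Conceptually, faithfulness is the fraction of response claims supported by the context. A toy sketch using exact string matching (RAGAS delegates the actual comparison to an LLM, so the claims and matching below are purely illustrative):&lt;/p&gt;

```python
def faithfulness(response_claims, context_claims) -> float:
    """Toy faithfulness: fraction of response claims found in the context."""
    response_claims = set(response_claims)
    if not response_claims:
        return 0.0
    supported = response_claims.intersection(context_claims)
    return len(supported) / len(response_claims)

# One of the three claims is not backed by the context: an hallucination.
score = faithfulness(
    {"Aspire supports Qdrant", "Aspire runs containers", "Aspire is a game engine"},
    {"Aspire supports Qdrant", "Aspire runs containers", "Aspire has a dashboard"},
)
```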

&lt;h4&gt;
  
  
  Context Recall
&lt;/h4&gt;

&lt;p&gt;Context recall measures how much of the relevant information was actually retrieved from the vector store. If retrieval misses no important information, context recall is high.&lt;/p&gt;

&lt;p&gt;First, the reference answer from the evaluation dataset is broken down into claims. Each claim is then analysed to determine whether it can be attributed to the retrieved context. Ideally, all claims in the reference answer should be attributable to the retrieved context.&lt;/p&gt;
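&lt;p&gt;The same ratio idea can be sketched for context recall, this time over the reference answer's claims (again, the real claim matching is done by an LLM; substring matching here is purely illustrative):&lt;/p&gt;

```python
def context_recall(reference_claims, retrieved_context: str) -> float:
    """Toy context recall: fraction of reference-answer claims that can be
    attributed to (here: found verbatim in) the retrieved context."""
    reference_claims = list(reference_claims)
    if not reference_claims:
        return 0.0
    attributable = [c for c in reference_claims if c in retrieved_context]
    return len(attributable) / len(reference_claims)

context = "Aspire orchestrates containers. The dashboard shows telemetry."
score = context_recall(
    ["Aspire orchestrates containers",
     "The dashboard shows telemetry",
     "Aspire requires Kubernetes"],   # this claim is missing from the context
    context,
)
```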

&lt;h2&gt;
  
  
  Evaluation Approach
&lt;/h2&gt;

&lt;p&gt;Our evaluation approach involves multiple embedding and generation models. The process is as follows:&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/syamaner/moonbeans/blob/bulk-performance_evaluation/src/AspireRagDemo.AppHost/Jupyter/Notebooks/gpt-4o_ReducedAspireDocs_100.csv" rel="noopener noreferrer"&gt;Full evaluation dataset&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Embedding models:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mxbai-embed-large&lt;/code&gt; (335M parameters, Ollama local)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;text-embedding-3-large&lt;/code&gt; OpenAI&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;For each embedding model, we evaluate using the following generative models:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;phi4&lt;/code&gt; (14B parameters)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;chatgpt-4o-latest&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;qwen2.5:32b&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;deepseek-r1&lt;/code&gt; (7B parameters)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llama3.2&lt;/code&gt; (3B parameters)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llama3.3&lt;/code&gt; (70B parameters)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;deepseek-r1:70b&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma3:12b&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mistral-small3.1&lt;/code&gt; (24B parameters)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gemma3&lt;/code&gt; (4B parameters)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma3:27b&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;We then save the evaluation results to a &lt;a href="https://github.com/syamaner/moonbeans/blob/bulk-performance_evaluation/src/AspireRagDemo.AppHost/Jupyter/Notebooks/evaluation_results.csv" rel="noopener noreferrer"&gt;CSV file&lt;/a&gt;.&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test Data Generation (RAGAS, GPT-4o, Jupyter Notebooks)
&lt;/h3&gt;

&lt;p&gt;The selected approach is LLM-as-a-judge, and we use RAGAS to generate the test dataset from our documents (the .NET Aspire documentation).&lt;/p&gt;

&lt;p&gt;We use the &lt;code&gt;TestsetGenerator&lt;/code&gt; class from RAGAS as documented in its basic usage guide. The steps are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load our documents.&lt;/li&gt;
&lt;li&gt;Define the personas to be used for generating queries (technical, novice, expert, ...).&lt;/li&gt;
&lt;li&gt;Declare the distribution of question types (simple, complex, or reasoning).&lt;/li&gt;
&lt;li&gt;Generate the test dataset.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We use &lt;code&gt;GPT-4o&lt;/code&gt; as the generative model and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; (the default) as the embedding model, on the assumption that state-of-the-art models will produce better quality test data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# Initialise personas and generator
&lt;/span&gt;
&lt;span class="c1"&gt;#https://docs.ragas.io/en/stable/howtos/customizations/testgenerator/_persona_generator/#personas-in-testset-generation
&lt;/span&gt;
&lt;span class="n"&gt;personas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;Persona&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Technical Analyst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;role_description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Focuses on detailed system specifications and API documentation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Persona&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Novice User&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;role_description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Asks simple questions using layman terms and basic functionality&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Persona&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Security Auditor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;role_description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Focuses on compliance, data protection, and access control aspects&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Persona&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Docker expert&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;role_description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Has in depth experience with Docker and DSocker compose and expert at cloud native concepts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;generator_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LangchainLLMWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openai_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  
&lt;span class="n"&gt;generator_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LangchainEmbeddingsWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TestsetGenerator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;generator_llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;generator_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;persona_list&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;personas&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Initialise query distribution and generate dataset.
&lt;/span&gt;
&lt;span class="c1"&gt;# https://docs.ragas.io/en/stable/references/synthesizers/
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ragas.testset.synthesizers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;SingleHopSpecificQuerySynthesizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MultiHopAbstractQuerySynthesizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MultiHopSpecificQuerySynthesizer&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;query_distribution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SingleHopSpecificQuerySynthesizer&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# Simple questions
&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MultiHopSpecificQuerySynthesizer&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   &lt;span class="c1"&gt;# Complex questions
&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MultiHopAbstractQuerySynthesizer&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Reasoning questions
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_with_langchain_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;testset_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_distribution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_distribution&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generated test dataset contains the following columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user_input: the question we will pass to our RAG system.&lt;/li&gt;
&lt;li&gt;reference_contexts: the context that would ideally be retrieved for the given question.&lt;/li&gt;
&lt;li&gt;reference: the ideal response to the user input.&lt;/li&gt;
&lt;li&gt;synthesizer_name: the type of synthesiser used to generate the row (single hop specific, multi hop specific, or multi hop abstract).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  SingleHopSpecificQuerySynthesizer
&lt;/h4&gt;

&lt;p&gt;RAGAS builds a knowledge graph from the input documents to create the test dataset. A single hop specific query synthesiser uses only one node from that graph (headlines or key phrases) to generate a query; "single hop" in this context means using a single node of the knowledge graph.&lt;/p&gt;

&lt;p&gt;Example question: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How does .NET Aspire manage launch profiles for ASP.NET Core service projects?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  MultiHopSpecificQuerySynthesizer
&lt;/h4&gt;

&lt;p&gt;Similar to the previous synthesiser, this one also uses specific properties, but it draws on multiple chunks that overlap with each other. The resulting questions require the retrieval process to fetch multiple documents or sections to assemble the expected context.&lt;/p&gt;

&lt;p&gt;Example question: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How does the Azure SDK impact the ability to run Azure services locally in containers and provision infrastructure using .NET Aspire?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  MultiHopAbstractQuerySynthesizer
&lt;/h4&gt;

&lt;p&gt;Intended to produce generalised (abstract) queries using multiple nodes of the knowledge graph.&lt;/p&gt;

&lt;p&gt;Example question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How can a custom command be created and tested in .NET Aspire to clear the cache of a Redis resource?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64okvnr4z591xfb2f20f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64okvnr4z591xfb2f20f.png" alt="RAGAS generated dataset using OpenAI" width="800" height="155"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Generated test data available via &lt;a href="https://github.com/syamaner/moonbeans/blob/bulk-performance_evaluation/src/AspireRagDemo.AppHost/Jupyter/Notebooks/gpt-4o_ReducedAspireDocs_100.csv" rel="noopener noreferrer"&gt;Github Repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The notebook to generate the test data is also accessible via &lt;a href="https://github.com/syamaner/moonbeans/blob/bulk-performance_evaluation/src/AspireRagDemo.AppHost/Jupyter/Notebooks/generate_eval_data.ipynb" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG Pipeline (.NET)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ingestion: for each embedding model,

&lt;ul&gt;
&lt;li&gt;create a vector store for persistence;&lt;/li&gt;
&lt;li&gt;generate embeddings using the current embedding model and add them to the vector store matching the embedding model name.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Retrieval and generation:

&lt;ul&gt;
&lt;li&gt;The request specifies the embedding model and the generation model.&lt;/li&gt;
&lt;li&gt;Retrieve using the vector store named after the requested embedding model.&lt;/li&gt;
&lt;li&gt;Use the retrieved context with the generative model named in the request.&lt;/li&gt;
&lt;li&gt;Semantic Kernel simplifies this: keyed registration supports multiple embedding and chat models via .NET Dependency Injection.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The vector stores for embedding models can be seen below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fasi4hfy7fxjysjukxtn9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fasi4hfy7fxjysjukxtn9.png" alt="Vector stores for given embedding models in Qdrant" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Evaluation Run
&lt;/h3&gt;

&lt;p&gt;This step is implemented in Python, as RAGAS is a Python library. Evaluation is performed as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick n (5) random entries from the evaluation dataset.&lt;/li&gt;
&lt;li&gt;For each embedding model and each generative model:

&lt;ul&gt;
&lt;li&gt;Call our API to retrieve context using vector search.&lt;/li&gt;
&lt;li&gt;Call our API to run the RAG query and return the answer.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;retrieved_contexts&lt;/code&gt; and &lt;code&gt;response&lt;/code&gt; in the evaluation dataset.&lt;/li&gt;
&lt;li&gt;Assemble the data and run the RAGAS evaluation.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
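&lt;p&gt;The loop can be sketched as follows. The &lt;code&gt;get_context&lt;/code&gt; and &lt;code&gt;get_answer&lt;/code&gt; helpers are hypothetical stand-ins for the &lt;code&gt;/vector-search&lt;/code&gt; and &lt;code&gt;/chat-with-context&lt;/code&gt; API calls; the model names and dataset are illustrative only:&lt;/p&gt;

```python
import random

def get_context(query: str, embedding_model: str) -> list:
    # Placeholder for the /vector-search API call.
    return [f"context for '{query}' via {embedding_model}"]

def get_answer(query: str, embedding_model: str, chat_model: str) -> str:
    # Placeholder for the /chat-with-context API call.
    return f"answer to '{query}' from {chat_model}"

embedding_models = ["mxbai-embed-large", "text-embedding-3-large"]
chat_models = ["phi4", "llama3.2"]
eval_dataset = [{"user_input": f"question {i}"} for i in range(20)]

sample = random.sample(eval_dataset, 5)  # n random rows from the eval set
runs = []
for embedding_model in embedding_models:
    for chat_model in chat_models:
        for row in sample:
            runs.append({
                **row,
                "retrieved_contexts": get_context(row["user_input"], embedding_model),
                "response": get_answer(row["user_input"], embedding_model, chat_model),
                "embedding_model": embedding_model,
                "chat_model": chat_model,
            })
# runs now holds 2 embeddings x 2 chat models x 5 rows, ready for RAGAS scoring
```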

&lt;h2&gt;RAG Evaluation Dataset Structure&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Column Name&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;user_input&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Generated question used to query the system&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;reference_contexts&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Reference context documents generated prior to evaluation&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;retrieved_contexts&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Actual context documents returned from the API during runtime&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;reference&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Reference answer generated prior to evaluation (ground truth)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;response&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Actual response generated by the RAG system during evaluation&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;embedding_model&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;The embedding model used for retrieval (e.g., text-embedding-3-large)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;chat_model&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;The generative model used to produce the final response (e.g., ChatGPT-4o)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Code example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Retrieve context for evaluation (vector search) using eval input from the current dataset row.
&lt;/span&gt;
&lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/vector-search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddingModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# RAG query using the eval query, current embedding model and current generative model
&lt;/span&gt;
&lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/chat-with-context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddingModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chat_model&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full code is available in the &lt;a href="https://github.com/syamaner/moonbeans/blob/bulk-performance_evaluation/src/AspireRagDemo.AppHost/Jupyter/Notebooks/evaluation.ipynb" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Embedding and LLM Model Performance Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Edit 14/04/2025 - Results for the full dataset with 101 questions and answers&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As seen below, when using the full evaluation dataset (101 questions and answers), top performers for faithfulness are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text-embedding-3-large + gemma3: 80.42%&lt;/li&gt;
&lt;li&gt;mxbai-embed-large + chatgpt-4o-latest: 79.37%&lt;/li&gt;
&lt;li&gt;text-embedding-3-large + phi4: 79.28%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of the top combinations above includes at least one component that is not open source. It also means we can optimise our model choices where needed by taking a hybrid approach.&lt;/p&gt;

&lt;p&gt;Looking at the top performers for semantic similarity, OpenAI embeddings come out on top. However, open source chat models can be competitive when combined with &lt;code&gt;text-embedding-3-large&lt;/code&gt;, as seen below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text-embedding-3-large + gemma3:12b: 93.68%&lt;/li&gt;
&lt;li&gt;text-embedding-3-large + deepseek-r1:70b: 93.64%&lt;/li&gt;
&lt;li&gt;text-embedding-3-large + phi4: 93.59%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text-embedding-3-large generally achieves higher semantic similarity scores than mxbai-embed-large.&lt;/li&gt;
&lt;li&gt;The combination of embedding model and chat model significantly impacts performance metrics.&lt;/li&gt;
&lt;li&gt;Larger models don't always outperform their smaller counterparts (e.g., gemma3:12b vs gemma3:27b).&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Embedding Model&lt;/th&gt;
&lt;th&gt;Chat Model&lt;/th&gt;
&lt;th&gt;Faithfulness Score&lt;/th&gt;
&lt;th&gt;Semantic Similarity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;gemma3&lt;/td&gt;
&lt;td&gt;80.42%&lt;/td&gt;
&lt;td&gt;93.41%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;chatgpt-4o-latest&lt;/td&gt;
&lt;td&gt;79.37%&lt;/td&gt;
&lt;td&gt;92.71%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;phi4&lt;/td&gt;
&lt;td&gt;79.28%&lt;/td&gt;
&lt;td&gt;93.59%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;deepseek-r1&lt;/td&gt;
&lt;td&gt;78.62%&lt;/td&gt;
&lt;td&gt;92.42%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;qwen2.5:32b&lt;/td&gt;
&lt;td&gt;78.26%&lt;/td&gt;
&lt;td&gt;93.00%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;gemma3:12b&lt;/td&gt;
&lt;td&gt;78.21%&lt;/td&gt;
&lt;td&gt;93.68%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;llama3.2&lt;/td&gt;
&lt;td&gt;77.85%&lt;/td&gt;
&lt;td&gt;93.46%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;gemma3&lt;/td&gt;
&lt;td&gt;77.83%&lt;/td&gt;
&lt;td&gt;92.22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;gemma3:12b&lt;/td&gt;
&lt;td&gt;77.73%&lt;/td&gt;
&lt;td&gt;92.98%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;llama3.3&lt;/td&gt;
&lt;td&gt;77.64%&lt;/td&gt;
&lt;td&gt;93.58%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;gemma3:27b&lt;/td&gt;
&lt;td&gt;76.63%&lt;/td&gt;
&lt;td&gt;92.44%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;llama3.3&lt;/td&gt;
&lt;td&gt;76.59%&lt;/td&gt;
&lt;td&gt;92.91%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;deepseek-r1:70b&lt;/td&gt;
&lt;td&gt;76.46%&lt;/td&gt;
&lt;td&gt;93.64%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;mistral-small3.1&lt;/td&gt;
&lt;td&gt;76.40%&lt;/td&gt;
&lt;td&gt;93.01%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;deepseek-r1&lt;/td&gt;
&lt;td&gt;75.76%&lt;/td&gt;
&lt;td&gt;93.18%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;llama3.2&lt;/td&gt;
&lt;td&gt;75.29%&lt;/td&gt;
&lt;td&gt;92.38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;deepseek-r1:70b&lt;/td&gt;
&lt;td&gt;75.19%&lt;/td&gt;
&lt;td&gt;92.60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;chatgpt-4o-latest&lt;/td&gt;
&lt;td&gt;74.56%&lt;/td&gt;
&lt;td&gt;93.56%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;gemma3:27b&lt;/td&gt;
&lt;td&gt;74.53%&lt;/td&gt;
&lt;td&gt;92.83%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;qwen2.5:32b&lt;/td&gt;
&lt;td&gt;74.25%&lt;/td&gt;
&lt;td&gt;93.40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;mistral-small3.1&lt;/td&gt;
&lt;td&gt;74.12%&lt;/td&gt;
&lt;td&gt;93.10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mxbai-embed-large&lt;/td&gt;
&lt;td&gt;phi4&lt;/td&gt;
&lt;td&gt;73.50%&lt;/td&gt;
&lt;td&gt;92.28%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Open source models can be competitive&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;mxbai-embed-large outperformed OpenAI's premium text-embedding-3-large + chatgpt-4o-latest combination in faithfulness metrics.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Small models can be mighty&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Local 3B-parameter models such as llama3.2 achieved 93.46% semantic similarity, showing that even small, quantised models have become remarkably capable.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Hidden Cost of Accuracy&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;OpenAI's top-performing combination (text-embedding-3-large + chatgpt-4o-latest) scored 74.56% faithfulness versus 80.42% for the best open source alternative - a critical tradeoff between commercial API provider costs and the faithfulness achievable with local models.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Microsoft is investing in AI on the .NET platform&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We can now build a RAG system and achieve decent performance with a nearly out-of-the-box implementation. .NET Aspire takes this even further, providing a flexible local development environment where we can mix and match the hosts for local inference: a container, the host machine, or a machine on the local network.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Future
&lt;/h2&gt;

&lt;p&gt;These results are based on out-of-the-box code for ingestion, generation and test data generation, and represent a first step towards establishing a baseline.&lt;/p&gt;

&lt;p&gt;Given that the evaluation process uses the OpenAI API, evaluating a large dataset is costly for a hobby project. The next steps I would ideally follow are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start experimenting with prompts and versioning.&lt;/li&gt;
&lt;li&gt;Run evaluation again.&lt;/li&gt;
&lt;li&gt;Consider further tests using different chunking, retrieval / reranking strategies and compare the results.&lt;/li&gt;
&lt;li&gt;If using a local LLM as a judge proves effective, continue the experiments using local models at a larger scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/moonbeans/tree/bulk-performance_evaluation" rel="noopener noreferrer"&gt;Sample code repository - bulk-performance_evaluation branch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/datasets/google/frames-benchmark" rel="noopener noreferrer"&gt;Google Frames Benchmark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv-trends.com" rel="noopener noreferrer"&gt;ArXiv Trends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/tryAGI/LangChain" rel="noopener noreferrer"&gt;LangChain C#&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/semantic-kernel" rel="noopener noreferrer"&gt;Semantic Kernel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/aspire/azureai/azureai-openai-integration?tabs=dotnet-cli" rel="noopener noreferrer"&gt;Aspire.Azure.AI.OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/SciSharp/LLamaSharp" rel="noopener noreferrer"&gt;LLamaSharp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/ggml-org/llama.cpp" rel="noopener noreferrer"&gt;llama.cpp&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/" rel="noopener noreferrer"&gt;RAGAS - List of available metrics&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>aspire</category>
      <category>dotnet</category>
    </item>
    <item>
      <title>Jupyter AI &amp; .NET Aspire: Building an LLM-Enabled Jupyter Environment</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Mon, 17 Feb 2025 20:58:44 +0000</pubDate>
      <link>https://dev.to/syamaner/jupyter-ai-net-aspire-building-an-llm-enabled-jupyter-environment-59bo</link>
      <guid>https://dev.to/syamaner/jupyter-ai-net-aspire-building-an-llm-enabled-jupyter-environment-59bo</guid>
      <description>&lt;p&gt;In this post, we will cover installing and configuring Jupyter AI with Jupyter while driving the configuration from .NET Aspire. The approach documented here provides an out-of-the-box Jupyter AI setup without having to configure it manually.&lt;/p&gt;

&lt;p&gt;What we will cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is Jupyter AI?&lt;/li&gt;
&lt;li&gt;Adding Jupyter AI to Jupyter image&lt;/li&gt;
&lt;li&gt;Adding the Microsoft.dotnet-interactive Jupyter kernel for C# support in Jupyter Notebooks&lt;/li&gt;
&lt;li&gt;.NET Aspire Configuration to specify code and embedding models / model providers &lt;/li&gt;
&lt;li&gt;Running the custom image using .NET Aspire&lt;/li&gt;
&lt;li&gt;And a quick Python demo&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Jupyter AI
&lt;/h2&gt;

&lt;p&gt;Jupyter AI is an extension that adds generative AI support to JupyterLab. Its chat interface lets us ask questions about the code in our notebooks and about any accessible source files and documentation. It can also generate code and inject it into a cell.&lt;/p&gt;

&lt;p&gt;To get Jupyter AI working the following steps are necessary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install and activate the extension.&lt;/li&gt;
&lt;li&gt;Configure the embedding and language models (model name, endpoint, API keys if needed)&lt;/li&gt;
&lt;li&gt;Access the extension and interact with it&lt;/li&gt;
&lt;/ul&gt;
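For comparison, here is roughly what those steps look like done by hand. This is a minimal sketch, assuming a pip-managed environment; the model identifiers are illustrative examples, and the --AiExtension flags are the same ones driven from Aspire later in this post. The script only builds and prints the launch command so it can be inspected before running:

```shell
# Model identifiers are illustrative examples (provider:model format).
CODE_MODEL="ollama:qwen2.5-coder:14b"
EMBEDDING_MODEL="ollama:nomic-embed-text"

# Build the "jupyter lab" launch command with Jupyter AI configured up front,
# the same way the entry point script later in this post assembles it.
CMD="jupyter lab --NotebookApp.token=''"
CMD="$CMD --AiExtension.default_language_model=$CODE_MODEL"
CMD="$CMD --AiExtension.default_embeddings_model=$EMBEDDING_MODEL"

echo "$CMD"
# eval "$CMD"   # uncomment to actually start JupyterLab
```

The rest of this post replaces these manual steps with a Dockerfile and an entry point script driven by .NET Aspire.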

&lt;h2&gt;
  
  
  Installing Jupyter AI and .NET Interactive kernel using a custom Dockerfile
&lt;/h2&gt;

&lt;p&gt;In this section we will cover the Dockerfile used, as well as the configuration applied via a custom entry point script.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating the Dockerfile
&lt;/h3&gt;

&lt;p&gt;This part is straightforward. We start with an appropriate Jupyter base image and then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install .NET 9&lt;/li&gt;
&lt;li&gt;Install Python dependencies using &lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/main/src/AspireJupyterAI.AppHost/Jupyter/requirements.txt" rel="noopener noreferrer"&gt;requirements.txt&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install Microsoft.dotnet-interactive (so we can install the .NET Interactive kernel)&lt;/li&gt;
&lt;li&gt;Copy our entry point file (run.sh)&lt;/li&gt;
&lt;li&gt;Call the entry point as a non-root user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this minimal &lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/python/src/AspireJupyterAI.AppHost/Jupyter/Dockerfile" rel="noopener noreferrer"&gt;Dockerfile&lt;/a&gt; we get a JupyterLab server, Jupyter AI and even the .NET Interactive kernel, letting us use C#, F# and even PowerShell in our notebooks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; jupyter/base-notebook:ubuntu-22.04&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PYTHONDONTWRITEBYTECODE=1&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install &lt;/span&gt;software-properties-common cmake build-essential  libc6  &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;add-apt-repository ppa:dotnet/backports &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get update &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; dotnet-sdk-9.0  libgl1-mesa-dev  libglib2.0-0 &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get clean &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/cache/apt/archives /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; ${NB_UID}&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;dotnet tool &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; Microsoft.dotnet-interactive
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PATH="${PATH}:/home/jovyan/.dotnet/tools"&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./requirements.txt /home/jovyan/requirements.txt&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; /home/jovyan/requirements.txt
&lt;span class="k"&gt;RUN &lt;/span&gt;dotnet interactive jupyter &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./run.sh /home/jovyan/run.sh&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /home/jovyan/run.sh

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; ${NB_UID}&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/home/jovyan/run.sh"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuring Jupyter AI via entry point script
&lt;/h3&gt;

&lt;p&gt;This is achieved by our &lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/main/src/AspireJupyterAI.AppHost/Jupyter/run.sh" rel="noopener noreferrer"&gt;run.sh&lt;/a&gt; file as follows:&lt;/p&gt;

&lt;p&gt;We know that .NET Aspire injects the connection strings and additional config as environment variables. So we will utilise this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pass an empty token so the local Jupyter server does not require authentication.&lt;/li&gt;
&lt;li&gt;The Jupyter AI extension is configured at startup by passing --AiExtension.* arguments to the &lt;code&gt;jupyter lab&lt;/code&gt; command.

&lt;ul&gt;
&lt;li&gt;Inject the relevant --AiExtension.* arguments, passed from our Aspire host via environment variables.&lt;/li&gt;
&lt;li&gt;Pass CODE_MODEL and EMBEDDING_MODEL as the language and embedding models using the relevant arguments.&lt;/li&gt;
&lt;li&gt;Optionally set the embedding and language model URLs (if not using OpenAI).&lt;/li&gt;
&lt;li&gt;Inject the API keys (required if using OpenAI).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Execute the built entry command.
&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;CODEMODELURL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ConnectionStrings__codemodel&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'='&lt;/span&gt; &lt;span class="nt"&gt;-f2&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;';'&lt;/span&gt; &lt;span class="nt"&gt;-f1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;EMBEDDINGMODELURL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ConnectionStrings__embeddingmodel&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'='&lt;/span&gt; &lt;span class="nt"&gt;-f2&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;';'&lt;/span&gt; &lt;span class="nt"&gt;-f1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Base command&lt;/span&gt;
&lt;span class="nv"&gt;CMD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"jupyter lab --NotebookApp.token=''"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"code model: &lt;/span&gt;&lt;span class="nv"&gt;$CODEMODELURL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"embedding model: &lt;/span&gt;&lt;span class="nv"&gt;$EMBEDDINGMODELURL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Add embedding model&lt;/span&gt;
&lt;span class="nv"&gt;CMD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CMD&lt;/span&gt;&lt;span class="s2"&gt; --AiExtension.default_embeddings_model=&lt;/span&gt;&lt;span class="nv"&gt;$EMBEDDING_MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="c"&gt;# Add code model&lt;/span&gt;
&lt;span class="nv"&gt;CMD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CMD&lt;/span&gt;&lt;span class="s2"&gt; --AiExtension.default_language_model=&lt;/span&gt;&lt;span class="nv"&gt;$CODE_MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;CMD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CMD&lt;/span&gt;&lt;span class="s2"&gt; --AiExtension.default_api_keys='{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;HUGGINGFACEHUB_API_TOKEN&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$HUGGINGFACEHUB_API_TOKEN&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}'"&lt;/span&gt;
&lt;span class="nv"&gt;CMD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CMD&lt;/span&gt;&lt;span class="s2"&gt; --AiExtension.default_max_chat_history=12"&lt;/span&gt;
&lt;span class="c"&gt;#,&lt;/span&gt;

&lt;span class="c"&gt;# Add embedding model URL if specified&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$EMBEDDINGMODELURL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nv"&gt;CMD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CMD&lt;/span&gt;&lt;span class="s2"&gt; --AiExtension.model_parameters &lt;/span&gt;&lt;span class="nv"&gt;$EMBEDDING_MODEL&lt;/span&gt;&lt;span class="s2"&gt;='{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;base_url&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$EMBEDDINGMODELURL&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}'"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Add code model URL if specified&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CODEMODELURL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nv"&gt;CMD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CMD&lt;/span&gt;&lt;span class="s2"&gt; --AiExtension.model_parameters &lt;/span&gt;&lt;span class="nv"&gt;$CODE_MODEL&lt;/span&gt;&lt;span class="s2"&gt;='{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;base_url&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$CODEMODELURL&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}'"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Execute the command&lt;/span&gt;
&lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CMD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entry command above will provide us the following when we run our Aspire host:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopcz0gd3gm8izsqadrzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopcz0gd3gm8izsqadrzq.png" alt="Jupyter AI configuration" width="413" height="824"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  .NET Aspire configuration and execution
&lt;/h2&gt;

&lt;p&gt;In this section, we will cover the structure of launchSettings.json and the Aspire code putting it all together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;The provided example has three profiles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;http-ollama-host&lt;/code&gt;: uses Ollama running on the host, with &lt;code&gt;ollama:qwen2.5-coder:32b&lt;/code&gt; as the code model and &lt;code&gt;ollama:nomic-embed-text&lt;/code&gt; as the embedding model.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;http-ollama-local&lt;/code&gt;: &lt;code&gt;ollama:qwen2.5-coder:14b&lt;/code&gt; as the code model and &lt;code&gt;ollama:nomic-embed-text&lt;/code&gt; as the embedding model.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;http-openai&lt;/code&gt;: &lt;code&gt;openai-chat:chatgpt-4o-latest&lt;/code&gt; as the code model and &lt;code&gt;openai:text-embedding-3-large&lt;/code&gt; as the embedding model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As an example, the following shows how the settings are configured within &lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/main/src/AspireJupyterAI.AppHost/Properties/launchSettings.json" rel="noopener noreferrer"&gt;launchSettings.json&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;"http-ollama-host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; 
      &lt;/span&gt;&lt;span class="nl"&gt;"environmentVariables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; 
        &lt;/span&gt;&lt;span class="nl"&gt;"CODE_MODEL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama:qwen2.5-coder:32b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"CODE_MODEL_PROVIDER"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OllamaHost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"EMBEDDING_MODEL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama:nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"EMBEDDING_MODEL_PROVIDER"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OllamaHost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"EXTERNAL_OLLAMA_CONNECTION_STRING"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Endpoint=http://host.docker.internal:11434;"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Aspire Code
&lt;/h3&gt;

&lt;p&gt;There is not much new here. We spin up a Jupyter container using the Dockerfile, and optionally spin up Ollama if the configuration requires it. For reference, the source file is available &lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/main/src/AspireJupyterAI.AppHost/Program.cs" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The image built from the Dockerfile is run as a container as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;jupyter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddDockerfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Constants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionStringNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JupyterService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"./Jupyter"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithBuildArg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PORT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;applicationPorts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Constants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionStringNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JupyterService&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithBindMount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"./Jupyter/Notebooks/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/home/jovyan/work"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithHttpEndpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;targetPort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;applicationPorts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Constants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionStringNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JupyterService&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"PORT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithLifetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContainerLifetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithOtlpExporter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OTEL_SERVICE_NAME"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"jupyterdemo"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OTEL_EXPORTER_OTLP_INSECURE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PYTHONUNBUFFERED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CODE_MODEL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chatConfiguration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CodeModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"EMBEDDING_MODEL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chatConfiguration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Python Demo
&lt;/h2&gt;

&lt;p&gt;For the demo the following use case is considered:&lt;/p&gt;

&lt;p&gt;"Using the coding assistant, write code to extract SIFT features from two images and match them using approximate nearest neighbour approach (ANN). Then guide the assistant to implement RANSAC using Homography to improve the matches and eliminate false positives."&lt;/p&gt;

&lt;p&gt;The initial prompt was straightforward and produced working code without much effort. However, the result is also full of false positive matches, as seen below:&lt;/p&gt;

&lt;h3&gt;
  
  
  Initial Matches
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffygqyj1mfkof909dl2p8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffygqyj1mfkof909dl2p8.png" alt="Initial matches" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the second prompt onwards, we ask for the matches to be improved with RANSAC, and things start going wrong. However, after a number of prompts and /fix commands, we eventually get working code without human intervention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Improved matches
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fle6c17ddapwb6229jl63.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fle6c17ddapwb6229jl63.png" alt="Improved matches" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The conversation can be seen by inspecting the notebook snapshot:&lt;br&gt;
&lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/main/src/AspireJupyterAI.AppHost/Jupyter/Notebooks/py-qwen-2-5-coder-32b-ollama-host.ipynb" rel="noopener noreferrer"&gt;src/AspireJupyterAI.AppHost/Jupyter/Notebooks/py-qwen-2-5-coder-32b-ollama-host.ipynb&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  C# Demo
&lt;/h2&gt;

&lt;p&gt;The same use case was tried with a &lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/main/src/AspireJupyterAI.AppHost/Jupyter/Notebooks/csharp-openai-gpt-40-latest.ipynb" rel="noopener noreferrer"&gt;C# notebook&lt;/a&gt;, but it took GPT-4o to come up with a solution. Given that OpenCV support in .NET is somewhat niche, it is not surprising that the models are less effective. In addition, as we are using .NET Interactive in Jupyter, we are in niche territory there as well.&lt;/p&gt;

&lt;p&gt;Here is an example notebook in C# (unfortunately the prompts that led to the final code are missing): &lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/main/src/AspireJupyterAI.AppHost/Jupyter/Notebooks/csharp-openai-gpt-40-latest.ipynb" rel="noopener noreferrer"&gt;csharp-openai-gpt-40-latest.ipynb&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To get this working on an ARM laptop, the Dockerfile is also more complicated: we build OpenCV and OpenCvSharp in a dedicated stage of the Dockerfile, then copy the native libraries and bindings into the final stage. &lt;a href="https://github.com/syamaner/aspire-jupyter-ai/blob/main/src/AspireJupyterAI.AppHost/Jupyter/Dockerfile" rel="noopener noreferrer"&gt;Modified Dockerfile to support OpenCvSharp on ARM&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The changes to this file are adapted from one of my older &lt;a href="https://dev.to/syamaner/docker-multi-architecture-net-60-and-opencvsharp-1okd"&gt;posts - Docker multi-architecture, .NET 6.0 and OpenCVSharp&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/aspire-jupyter-ai" rel="noopener noreferrer"&gt;Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/dotnet/interactive" rel="noopener noreferrer"&gt;.NET Interactive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.jupyter.org/generative-ai-in-jupyter-3f7174824862" rel="noopener noreferrer"&gt;Jupyter AI Post&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jupyter-ai.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;Jupyter AI RTD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/jupyterlab/jupyter-ai" rel="noopener noreferrer"&gt;Jupyter AI GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aspire</category>
      <category>ai</category>
      <category>jupyter</category>
      <category>docker</category>
    </item>
    <item>
      <title>Ingesting documents using .NET to build a simple Retrieval Augmented Generation (RAG) system</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Sun, 16 Feb 2025 18:55:12 +0000</pubDate>
      <link>https://dev.to/syamaner/a-simple-approach-for-ingesting-documents-using-net-for-a-simple-retrieval-augmented-generation-47e1</link>
      <guid>https://dev.to/syamaner/a-simple-approach-for-ingesting-documents-using-net-for-a-simple-retrieval-augmented-generation-47e1</guid>
      <description>&lt;p&gt;Here is a quick post summarising how to use .NET Semantic Kernel, Qdrant and .Net to ingest markdown documents. One of the comments &lt;a href="https://dev.to/syamaner/building-a-simple-retrieval-augmented-generation-system-using-net-aspire-4pdp"&gt;a recent post&lt;/a&gt; related to the topic was about why using Python for ingestion instead of .NET. That was a personal preference at the time but also using .NET with Semantic Kernel to ingest documents for a simple pipeline is not necessarily any more work. &lt;/p&gt;

&lt;p&gt;In this post, we will go through the ingestion process using the high-level libraries available to us in the .NET ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;.NET Semantic Kernel and related connectors for managing vector store&lt;/li&gt;
&lt;li&gt;LangChain .NET for chunking&lt;/li&gt;
&lt;li&gt;.NET Aspire to bring it all together using one of the inference APIs (Ollama on the host, Ollama as a container managed by Aspire, or OpenAI)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use case
&lt;/h2&gt;

&lt;p&gt;In the Python version, we can either pull the documents from a GitHub repository or use a file generated by the &lt;a href="https://gitingest.com/" rel="noopener noreferrer"&gt;GitIngest UI&lt;/a&gt;. GitIngest is an open-source library that lets consumers scrape public GitHub repositories programmatically, or download a file manually using the web UI linked earlier.&lt;/p&gt;

&lt;p&gt;In this case, we have a single &lt;a href="https://github.com/syamaner/moonbeans/blob/performance_evaluation/src/AspireRagDemo.API/dotnet-docs-aspire.txt#:~:text=dotnet-,%2D,-docs%2Daspire.txt" rel="noopener noreferrer"&gt;file&lt;/a&gt; that contains the markdown and .yml files from the &lt;a href="https://github.com/dotnet/docs-aspire" rel="noopener noreferrer"&gt;official .NET Aspire documentation repository&lt;/a&gt;. The file was generated by the GitIngest UI and contains around 180 files concatenated into a single text file. &lt;/p&gt;

&lt;h2&gt;
  
  
  Ingestion Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  File Format
&lt;/h3&gt;

&lt;p&gt;The ingestion process in this example is straightforward; we follow the steps illustrated below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuje92yb6zrr5vvr5qg99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuje92yb6zrr5vvr5qg99.png" alt="Ingestion Process" width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Splitting actual files
&lt;/h3&gt;

&lt;p&gt;As we are using a single file containing multiple .md and .yml files as described above, the first step is to split it into (filename, file content) pairs. &lt;/p&gt;

&lt;p&gt;The files are separated by headers as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... content
================================================
File: README.md
================================================
... content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Given this is a throwaway example, the code below is just enough to demonstrate the process without too many distractions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GitIngestFileSplitter&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;SeparatorLine&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"====================="&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;FilePrefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"File:"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;Dictionary&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;ParseContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// declarations omitted &lt;/span&gt;
        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Trim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SeparatorLine&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;currentFileName&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;isCollectingContent&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="n"&gt;skipNextSeperatorLine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;currentFileName&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;contentBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;TrimEnd&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                    &lt;span class="n"&gt;contentBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Clear&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                    &lt;span class="n"&gt;currentFileName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="n"&gt;isCollectingContent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="n"&gt;skipNextSeperatorLine&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isCollectingContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;StartsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FilePrefix&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;currentFileName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FilePrefix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;Trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                    &lt;span class="n"&gt;isCollectingContent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="n"&gt;skipNextSeperatorLine&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;currentFileName&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;skipNextSeperatorLine&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Trim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SeparatorLine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;IsNullOrWhiteSpace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="n"&gt;contentBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AppendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;

                    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Don't forget to add the last file if there is one&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;currentFileName&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;contentBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;currentFileName&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;contentBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;TrimEnd&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; 
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Chunking
&lt;/h3&gt;

&lt;p&gt;Now that we have a dictionary of file names and file contents, we need to generate chunks from the file contents.&lt;/p&gt;

&lt;p&gt;In this case, I have opted to experiment with the &lt;a href="https://github.com/tryAGI/LangChain" rel="noopener noreferrer"&gt;LangChain .NET project&lt;/a&gt;.&lt;br&gt;
We are using &lt;a href="https://github.com/tryAGI/LangChain/blob/main/src/Splitters/Abstractions/src/Text/MarkdownHeaderTextSplitter.cs" rel="noopener noreferrer"&gt;MarkdownHeaderTextSplitter&lt;/a&gt; and &lt;a href="https://github.com/tryAGI/LangChain/blob/main/src/Splitters/Abstractions/src/Text/CharacterTextSplitter.cs" rel="noopener noreferrer"&gt;CharacterTextSplitter&lt;/a&gt; from LangChain .NET.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GitIngestChunker&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IChunker&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// declarations / constructor omitted.&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;IAsyncEnumerable&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FileChunks&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;gitIngestFilePath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Read the text file (this is the single file containing all markdown files)&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;gitIngestFileContent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ReadAllTextAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gitIngestFilePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// Split the files as discussed earlier&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GitIngestFileSplitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ParseContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gitIngestFileContent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// Start chunking each split file.&lt;/span&gt;
        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chunkingTimer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MetricTimer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MetricNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunking&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;            
            &lt;span class="c1"&gt;// omitted: get TextSplitter for given file type.            &lt;/span&gt;
            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;fileChunks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;FileChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;
            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SplitText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="c1"&gt;// we are using markdown header splitter. So if generated chinks are large, we need to keep chunking them.&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="m"&gt;600&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="m"&gt;600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;subChunks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_characterSplitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SplitText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                        &lt;span class="n"&gt;fileChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddRange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subChunks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="n"&gt;fileChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;fileChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="c1"&gt;// return the chunks representing the current markdown or yml file&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fileChunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="nf"&gt;CanChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DocumentType&lt;/span&gt; &lt;span class="n"&gt;documentType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;documentType&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;DocumentType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GitIngest&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Getting embedding for the chunks
&lt;/h3&gt;

&lt;p&gt;We are using Semantic Kernel, so this part is straightforward and will work with whichever API we choose to use. Having split the file and obtained the chunks for each document, we can use the registered ITextEmbeddingGenerationService (driven by the app and Aspire configuration) to compute the embeddings with the inference approach we have configured.&lt;/p&gt;

&lt;p&gt;We also track some custom metrics that are visible on the Aspire dashboard as we perform ingestion.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IngestionPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Kernel&lt;/span&gt; &lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;ITextEmbeddingGenerationService&lt;/span&gt; &lt;span class="n"&gt;_embeddingGenerator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetRequiredService&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ITextEmbeddingGenerationService&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;IngestDataAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DocumentType&lt;/span&gt; &lt;span class="n"&gt;documentType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;fileChunk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;documentChunker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;IList&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ReadOnlyMemory&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;?&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MetricTimer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;MetricNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;KeyValuePair&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"File"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                       &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;KeyValuePair&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"EmbeddingModel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;configuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_embeddingGenerator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateEmbeddingsAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fileChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="n"&gt;rest&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;    
    &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="n"&gt;rest&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Inserting the vectors
&lt;/h3&gt;

&lt;p&gt;Now that we have the embeddings, we need to insert them. This process involves a few steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mapping a .NET class to a vector store document&lt;/li&gt;
&lt;li&gt;Ensuring the collection exists (optionally recreating it)&lt;/li&gt;
&lt;li&gt;Using the correct dimensions for the collection, which depend on the embedding model we use&lt;/li&gt;
&lt;/ul&gt;
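
&lt;p&gt;Put together, these steps can be sketched as follows. This is a minimal, illustrative sketch based on the Semantic Kernel Qdrant connector's preview vector store API; the DocumentChunk type, the collection name and the recordDefinition variable are placeholders, and the repository code differs in detail:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Illustrative sketch - assumes Microsoft.SemanticKernel.Connectors.Qdrant (preview).
var vectorStore = new QdrantVectorStore(new QdrantClient("localhost"));

// The record definition carries the embedding dimensions chosen at runtime.
var collection = vectorStore.GetCollection&amp;lt;Guid, DocumentChunk&amp;gt;(
    "aspire-docs", recordDefinition);
await collection.CreateCollectionIfNotExistsAsync();

// Upsert one record per chunk, pairing each chunk with its embedding.
for (var i = 0; i &amp;lt; fileChunk.Chunks.Count; i++)
{
    await collection.UpsertAsync(new DocumentChunk
    {
        Id = Guid.NewGuid(),
        Content = fileChunk.Chunks[i],
        Embedding = embeddings![i]
    });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;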

&lt;h4&gt;
  
  
  Mapping
&lt;/h4&gt;

&lt;p&gt;Microsoft has good documentation on &lt;a href="https://learn.microsoft.com/en-us/semantic-kernel/concepts/vector-store-connectors/how-to/vector-store-custom-mapper?pivots=programming-language-csharp" rel="noopener noreferrer"&gt;how to build custom mappers for Vector Store Connectors&lt;/a&gt;, so I will not repeat it here. However, it is worth covering a few aspects at a high level.&lt;/p&gt;

&lt;p&gt;We could use attributes for mapping, but this demo supports multiple embedding models, each producing embedding vectors of different dimensions, so using attributes would mean hardcoding those dimensions. &lt;/p&gt;

&lt;p&gt;We can, however, define our VectorStoreRecordDefinition in code so that we can choose the correct dimensions for the collection at runtime. &lt;/p&gt;

&lt;p&gt;So our mapping can be as simple as the following snippet from &lt;a href="https://github.com/syamaner/moonbeans/blob/performance_evaluation/src/AspireRagDemo.API/Infrastructure/QdrantCollectionFactory.cs" rel="noopener noreferrer"&gt;QdrantCollectionFactory.cs&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;Dictionary&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;EmbeddingModels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"mxbai-embed-large"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;768&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"granite-embedding:30m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;384&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;


    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;VectorStoreRecordDefinition&lt;/span&gt; &lt;span class="n"&gt;_faqRecordDefinition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Properties&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;VectorStoreRecordProperty&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;VectorStoreRecordKeyProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Guid&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;VectorStoreRecordDataProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="k"&gt;typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;IsFilterable&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StoragePropertyName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"page_content"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;VectorStoreRecordDataProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Metadata"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FileMetadata&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;IsFullTextSearchable&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StoragePropertyName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"metadata"&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;VectorStoreRecordVectorProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Vector"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;Dimensions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EmbeddingModels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ContainsKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;EmbeddingModels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;DistanceFunction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DistanceFunction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IndexKind&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;IndexKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hnsw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;StoragePropertyName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"page_content_vector"&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When bootstrapping, we can then register our factory with .NET Semantic Kernel, so that whenever we inject an &lt;code&gt;IVectorStore&lt;/code&gt; our mappers are integrated into the pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        var options = new QdrantVectorStoreOptions
        {
            HasNamedVectors = true,
            VectorStoreCollectionFactory = new QdrantCollectionFactory(embeddingModelName)
        };
        kernelBuilder.AddQdrantVectorStore(options: options);
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Inserting vectors into our collection
&lt;/h4&gt;

&lt;p&gt;Once registration and configuration are handled, we are ready to consume &lt;code&gt;IVectorStore&lt;/code&gt; in our code and make use of it. In our &lt;a href="https://github.com/syamaner/moonbeans/blob/performance_evaluation/src/AspireRagDemo.API/Ingestion/IngestionPipeline.cs" rel="noopener noreferrer"&gt;IngestionPipeline.cs&lt;/a&gt; we need to perform the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure the collection exists:

&lt;ul&gt;
&lt;li&gt;Create it if it does not, or recreate it if required.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Insert the vectors as below:
&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// .NET Semantic Kernel is experimental so we need to opt in to use it.&lt;/span&gt;
&lt;span class="cp"&gt;#pragma warning disable SKEXP0001
&lt;/span&gt;&lt;span class="p"&gt;....&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="n"&gt;omitted&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IngestionPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;IVectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AspireRagDemoIngestionMetrics&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;IVectorStoreRecordCollection&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Guid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FaqRecord&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_faqCollection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetCollection&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Guid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FaqRecord&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;configuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VectorStoreCollectionName&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;IngestDataAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DocumentType&lt;/span&gt; &lt;span class="n"&gt;documentType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;EnsureCollectionExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;documentsProcessed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;....&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="n"&gt;omitted&lt;/span&gt;
        &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ingestionTimer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MetricTimer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;MetricNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DocumentIngestion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;KeyValuePair&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"File"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;KeyValuePair&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"EmbeddingModel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;configuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;fileChunk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;documentChunker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;               &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RecordProcessedChunkCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fileChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;fileChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;++)&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;try&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;faqRecord&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;FaqRecord&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="n"&gt;Id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Guid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewGuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                            &lt;span class="n"&gt;Content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fileChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="n"&gt;Vector&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="n"&gt;Metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;FileMetadata&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                            &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="n"&gt;FileName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StringValue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fileChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FileName&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                            &lt;span class="p"&gt;}&lt;/span&gt;
                       &lt;span class="p"&gt;};&lt;/span&gt;
                        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_faqCollection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UpsertAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;faqRecord&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;documentsProcessed&lt;/span&gt;&lt;span class="p"&gt;++;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RecordProcessedDocumentCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documentsProcessed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;EnsureCollectionExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;forceRecreate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;collectionExists&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_faqCollection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CollectionExistsAsync&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collectionExists&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="n"&gt;forceRecreate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_faqCollection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;DeleteCollectionAsync&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_faqCollection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateCollectionAsync&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this quick post we have covered using TextSplitters from LangChain .NET, Vector Stores and embedding models via .NET Semantic Kernel, and some custom metrics captured during ingestion.&lt;/p&gt;

&lt;p&gt;Without much code, we can get impressive results using what is available to us in the .NET world. If you would like to see the results, here is how:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clone the repository.&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;http-ollama-local&lt;/code&gt; configuration in the AppHost project.&lt;/li&gt;
&lt;li&gt;Run the Aspire project.&lt;/li&gt;
&lt;li&gt;Wait for the models to be downloaded and started.&lt;/li&gt;
&lt;li&gt;Then use &lt;a href="https://github.com/syamaner/moonbeans/blob/performance_evaluation/src/AspireRagDemo.API/AspireRagDemo.API.http" rel="noopener noreferrer"&gt;src/AspireRagDemo.API/AspireRagDemo.API.http&lt;/a&gt; and execute the &lt;code&gt;http://localhost:5026/ingest?fileName=dotnet-docs-aspire.txt&lt;/code&gt; call. Depending on model size and CPU, this can take anywhere between 30 seconds and 15 minutes.&lt;/li&gt;
&lt;li&gt;Once ingestion has completed, access the UI from the Aspire Dashboard and run some Aspire-related queries.&lt;/li&gt;
&lt;/ul&gt;
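&lt;p&gt;Putting the steps above together, a terminal session might look roughly like this (the AppHost project path is an assumption based on the repository layout; the port and file name are taken from the steps above):&lt;/p&gt;

```shell
# clone the repository and move into it
git clone https://github.com/syamaner/moonbeans.git
cd moonbeans

# select the http-ollama-local configuration in the AppHost project first,
# then run the Aspire project (project path assumed; adjust to the repo layout)
dotnet run --project src/AspireRagDemo.AppHost

# once the models have been downloaded and started, trigger ingestion
curl "http://localhost:5026/ingest?fileName=dotnet-docs-aspire.txt"
```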

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F869uelhwvl65f50r58zc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F869uelhwvl65f50r58zc.png" alt="Rag query: Is .Net Aspire a replacement for Kubernetes?" width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition, feel free to explore the metrics shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9nr2mlqbq5isbzp87q2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9nr2mlqbq5isbzp87q2f.png" alt="Custom metrics for the demo" width="450" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskuhn8ay2eiawnbyjoyu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskuhn8ay2eiawnbyjoyu.png" alt="Embedding timings" width="706" height="742"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>aspire</category>
      <category>ai</category>
      <category>rag</category>
    </item>
    <item>
      <title>Building a simple Retrieval Augmented Generation system using .Net Aspire</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Sun, 02 Feb 2025 15:31:43 +0000</pubDate>
      <link>https://dev.to/syamaner/building-a-simple-retrieval-augmented-generation-system-using-net-aspire-4pdp</link>
      <guid>https://dev.to/syamaner/building-a-simple-retrieval-augmented-generation-system-using-net-aspire-4pdp</guid>
      <description>&lt;p&gt;In this post, we will look into building a simple Retrieval Augmented Generation (RAG) system where we use Jupyter Notebooks for ingestion and .NET Web API for retrieval and generation part using .NET Aspire and having telemetry from both Python and C# components of the system.&lt;/p&gt;

&lt;p&gt;We will be looking into the following components to build our system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector store: Qdrant with Aspire.Hosting.Qdrant package.&lt;/li&gt;
&lt;li&gt;Ingestion: Jupyter Notebooks

&lt;ul&gt;
&lt;li&gt;LangChain for ingestion and OpenTelemetry for instrumentation.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Experimental UI: Streamlit.&lt;/li&gt;

&lt;li&gt;Embeddings and Generative models: 

&lt;ul&gt;
&lt;li&gt;Ollama using CommunityToolkit.Aspire.Hosting.Ollama package. &lt;/li&gt;
&lt;li&gt;Ollama hosted on the development machine (without Docker)&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;HuggingFace &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;API: ASP.NET Web API with .NET 9

&lt;ul&gt;
&lt;li&gt;Microsoft Semantic Kernel.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;There are several posts about how to integrate Ollama, OpenAI, Semantic Kernel and emerging open source models. This post will focus on how the Aspire 9 networking enhancements help us build and debug systems that use multiple languages and frameworks, and on how we can switch our models and model providers with a few lines of configuration change. In addition, we will look into how to utilise hardware acceleration when it is not available via Docker (mainly on macOS devices).&lt;/p&gt;

&lt;p&gt;This post will focus on the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How we can use .Net Aspire for polyglot solutions where some components might be better off in different programming languages.&lt;/li&gt;
&lt;li&gt;How the improved Docker network support in Aspire 9 helps us.&lt;/li&gt;
&lt;li&gt;How to utilise the power of configuration in Aspire to run Ollama either as a container or as an application on the host machine without changing any code.

&lt;ul&gt;
&lt;li&gt;Likewise, how to swap Ollama with OpenAI or HuggingFace inference endpoints. &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The use case in this post is ingesting the .Net Aspire documentation repository and using a RAG approach to answer questions about .Net Aspire. Building such a system is easy, but not necessarily helpful if we don't have any metrics to measure success; we are not covering evaluation in this post, as that will be the main subject of the next post on the topic. To achieve our use case, we will be utilising Gitingest, a Python library that scrapes GitHub repositories into a format that is easy to parse and ingest. The Python code can use the library directly or consume a text file produced by Gitingest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval Augmented Generation (RAG)
&lt;/h2&gt;

&lt;p&gt;There is almost universal awareness that the technology and architecture behind Large Language Models (LLMs) are prone to being creative and making things up. The likes of &lt;a href="https://garymarcus.substack.com/" rel="noopener noreferrer"&gt;Gary Marcus&lt;/a&gt; and &lt;a href="https://www.infoworld.com/article/2338107/the-philosopher-a-conversation-with-grady-booch.html" rel="noopener noreferrer"&gt;Grady Booch&lt;/a&gt; have been trying to raise awareness of what the architectures enabling LLMs are and what they are not.&lt;/p&gt;

&lt;p&gt;So if a given technology is good in some areas and has well-known limitations in others, such as being creative with facts, how can we utilise its strengths?&lt;/p&gt;

&lt;p&gt;One of the approaches is Retrieval Augmented Generation. One of the earlier papers using the term "Retrieval Augmented Generation" is &lt;a href="https://www.semanticscholar.org/paper/Retrieval-Augmented-Generation-for-NLP-Tasks-Lewis-Perez/58ed1fbaabe027345f7bb3a6312d41c5aac63e22#cited-papers" rel="noopener noreferrer"&gt;“Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”&lt;/a&gt; (Lewis et al., 2020).&lt;/p&gt;

&lt;p&gt;A simple RAG system relies on in-context learning: a general purpose LLM summarises or extracts the answer to a question from related context retrieved via a vector store. In the next section we'll cover these building blocks.&lt;/p&gt;
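&lt;p&gt;As a rough sketch of what in-context learning means here, the retrieved chunks are simply concatenated into the prompt ahead of the user's question (the prompt wording below is illustrative, not the one used in the demo):&lt;/p&gt;

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a RAG prompt: retrieved context first, then the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "Is .NET Aspire a replacement for Kubernetes?",
    [
        "Aspire is an opinionated stack for building distributed applications.",
        "Aspire complements, rather than replaces, container orchestrators.",
    ],
)
```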

&lt;h3&gt;
  
  
  Embeddings and Vector Storage
&lt;/h3&gt;

&lt;p&gt;The R in RAG stands for Retrieval, and this is where the strength of the approach comes from. Given a repository of data, if we can retrieve relevant context from a vector store, we can then use a generative model to produce the answer we are looking for from a number of matches to our query.&lt;/p&gt;

&lt;h4&gt;
  
  
  The ingestion process and impact of chunking
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4a8ldmyg6jamt6k0knvf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4a8ldmyg6jamt6k0knvf.png" alt="Ingestion for RAG" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ingestion process involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enumerating the documents (in this case we are dealing with text only, where multiple Markdown and YAML files are merged into a single text file).&lt;/li&gt;
&lt;li&gt;Breaking them down into chunks to make them manageable (chunking).

&lt;ul&gt;
&lt;li&gt;Typically, there is a lot to consider here: some documents have a hierarchy that can be utilised when chunking, while others are fine to break down into fixed-size pieces.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Getting the embeddings using an embedding model (the same model needs to be used for the retrieval stage later too).&lt;/li&gt;

&lt;li&gt;Adding them to our Vector Store.&lt;/li&gt;

&lt;/ul&gt;
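&lt;p&gt;As an illustration of the simplest strategy mentioned above, fixed-size chunking with overlap can be sketched as follows (the sizes are arbitrary; real splitters such as LangChain's also try to respect sentence and paragraph boundaries):&lt;/p&gt;

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking: slide a window of chunk_size characters,
    overlapping neighbouring chunks so context is not cut mid-thought."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 1200, chunk_size=500, overlap=50)
# 1200 characters with a 450-character step yields 3 chunks
```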

&lt;p&gt;For more information on chunking, follow the links at the bottom of the post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval and Generation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcmnyvr24a8vei10ufcm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcmnyvr24a8vei10ufcm.png" alt="Retrieval and Generation in RAG" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the data is ingested and the vector store is up to date, we can query our RAG system as illustrated in the diagram above.&lt;/p&gt;

&lt;p&gt;The steps are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get embeddings for the query using the same embedding model utilised for ingestion.&lt;/li&gt;
&lt;li&gt;Query the vector store for the n nearest results matching our input.&lt;/li&gt;
&lt;li&gt;Build the context and run our prompt against our generative model.&lt;/li&gt;
&lt;/ul&gt;
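&lt;p&gt;The retrieval step boils down to a nearest-neighbour search in embedding space. Here is a toy sketch using cosine similarity over a handful of vectors (a real system delegates this to Qdrant's index rather than scanning every vector):&lt;/p&gt;

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_n(query_vec: list[float],
          documents: list[tuple[str, list[float]]],
          n: int = 3) -> list[str]:
    """Return the texts of the n documents closest to the query vector."""
    ranked = sorted(documents,
                    key=lambda doc: cosine_similarity(query_vec, doc[1]),
                    reverse=True)
    return [text for text, _ in ranked[:n]]

# toy 2-dimensional "embeddings"; real models produce 384-1024 dimensions
docs = [
    ("about kubernetes", [0.9, 0.1]),
    ("about aspire", [0.1, 0.9]),
    ("about both", [0.7, 0.7]),
]
context = top_n([0.0, 1.0], docs, n=2)
# -> ["about aspire", "about both"]
```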

&lt;h2&gt;
  
  
  .Net Aspire, .NET and Python
&lt;/h2&gt;

&lt;p&gt;There is experimental support for running Python projects as executables in an Aspire Application Host. However, it is also possible to run containers from a Dockerfile, which can provide more flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Jupyter Notebooks
&lt;/h3&gt;

&lt;p&gt;We could run Jupyter directly using a prebuilt image. However, if we need additional modules or any customisation, using our own Dockerfile and requirements file ensures our notebook is available immediately (once built), so we don't have to install the same packages each time the container is recreated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; quay.io/jupyter/minimal-notebook:python-3.12.8&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; libmagic-dev

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; /app
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt /app&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; /app/requirements.txt

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; ${NB_UID}&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["start-notebook.sh"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can then run this as a container in our AppHost project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;jupyter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddDockerfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Constants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionStringNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JupyterService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"./Jupyter"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithBuildArg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PORT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;applicationPorts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Constants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionStringNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JupyterService&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;    
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"--NotebookApp.token=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;jupyterLocalSecret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithBindMount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"./Jupyter/Notebooks/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"/home/jovyan/work"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Streamlit UI
&lt;/h3&gt;

&lt;p&gt;Given this was intended as a quick experiment to understand how the pieces plug together, using Streamlit made sense.&lt;/p&gt;

&lt;p&gt;However, as Streamlit is started via its own CLI, it seemed easier to run it as a container too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.9-slim&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; PORT=8501&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; APP_PORT=$PORT&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt /app&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; /app/requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; main.py /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; TraceSetup.py /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; entrypoint.sh /app&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /app/entrypoint.sh

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; ${PORT}&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;groupadd &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; 65532 replitui &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; useradd &lt;span class="nt"&gt;--create-home&lt;/span&gt; &lt;span class="nt"&gt;--shell&lt;/span&gt; /bin/bash &lt;span class="nt"&gt;--uid&lt;/span&gt; 65532 &lt;span class="nt"&gt;-g&lt;/span&gt; replitui ui_user
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; 65532:65532&lt;/span&gt;

&lt;span class="k"&gt;HEALTHCHECK&lt;/span&gt;&lt;span class="s"&gt; CMD curl --fail http://localhost:${PORT}/_stcore/health&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; [ "bash", "/app/entrypoint.sh"] &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the framework, it takes only a few lines of code to build the basic components needed for our UI. The Python and bash code are linked from the corresponding branch for this article.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/moonbeans/blob/44f8d354d5cde6d7aeebdab02adada612c631979/src/AspireRagDemo.UI/main.py#L24" rel="noopener noreferrer"&gt;UI set up&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/moonbeans/blob/44f8d354d5cde6d7aeebdab02adada612c631979/src/AspireRagDemo.UI/entrypoint.sh#L3" rel="noopener noreferrer"&gt;Startup code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
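&lt;p&gt;At its core, the UI simply forwards the user's question to the Web API over HTTP. As a rough sketch of how such a request could be composed (the endpoint path and parameter names here are illustrative assumptions, not taken from the repository):&lt;/p&gt;

```python
from urllib.parse import urlencode, urljoin

def build_query_url(base_url: str, question: str, use_rag: bool) -> str:
    """Compose a query URL for the Web API.

    The /api/chat path and the parameter names are hypothetical placeholders.
    """
    params = urlencode({"question": question, "useRag": str(use_rag).lower()})
    return urljoin(base_url, "/api/chat") + "?" + params

# The real UI reads the API base address from injected configuration at runtime.
url = build_query_url("http://localhost:15062", "What is .NET Aspire?", True)
```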

&lt;h3&gt;
  
  
  Web API with Semantic Kernel
&lt;/h3&gt;

&lt;p&gt;To utilise Semantic Kernel, we need to define our prompt as well as its prompt template configuration. For the RAG query, they are defined below.&lt;/p&gt;

&lt;p&gt;In the prompt, we define input placeholders for the context and the question. Then, in the PromptTemplateConfig, we link the prompt and declare the two input variables to be supplied at runtime.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;RagPromptTemplate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
&lt;/span&gt;                                             &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;helpful&lt;/span&gt; &lt;span class="n"&gt;AI&lt;/span&gt; &lt;span class="n"&gt;assistant&lt;/span&gt; &lt;span class="n"&gt;specialised&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;technical&lt;/span&gt; &lt;span class="n"&gt;questions&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;good&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="n"&gt;utilising&lt;/span&gt; &lt;span class="n"&gt;additional&lt;/span&gt; &lt;span class="n"&gt;technical&lt;/span&gt; &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="n"&gt;provided&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;additional&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
                                             &lt;span class="n"&gt;Use&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;following&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt; &lt;span class="n"&gt;bringing&lt;/span&gt; &lt;span class="n"&gt;necessary&lt;/span&gt; &lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
                                             &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;prefer&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;good&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="n"&gt;over&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;explanation&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="n"&gt;also&lt;/span&gt; &lt;span class="n"&gt;provide&lt;/span&gt; &lt;span class="n"&gt;clear&lt;/span&gt; &lt;span class="n"&gt;justification&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
                                             &lt;span class="n"&gt;If&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="n"&gt;has&lt;/span&gt; &lt;span class="n"&gt;absolutely&lt;/span&gt; &lt;span class="n"&gt;no&lt;/span&gt; &lt;span class="n"&gt;relevance&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;please&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="s"&gt;"I don't know the answer."&lt;/span&gt;
                                             &lt;span class="n"&gt;Please&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;sometimes&lt;/span&gt; &lt;span class="n"&gt;make&lt;/span&gt; &lt;span class="n"&gt;educated&lt;/span&gt; &lt;span class="n"&gt;guesses&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;imply&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

                                             &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                                             &lt;span class="p"&gt;{{&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;

                                             &lt;span class="n"&gt;Question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                                             &lt;span class="p"&gt;{{&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;                                             
                                             &lt;span class="s"&gt;""";
&lt;/span&gt;
    &lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
    &lt;span class="c1"&gt;/// To answer the question, the AI assistant will use the provided context.&lt;/span&gt;
    &lt;span class="c1"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;PromptTemplateConfig&lt;/span&gt; &lt;span class="n"&gt;RagPromptConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Template&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RagPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;InputVariables&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;InputVariable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"context"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;InputVariable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"question"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
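&lt;p&gt;Rendering this template at runtime amounts to substituting the {{$context}} and {{$question}} placeholders with the supplied arguments. A minimal Python sketch of that idea (an illustration only, not Semantic Kernel's actual template engine):&lt;/p&gt;

```python
import re

def render_template(template: str, arguments: dict) -> str:
    """Substitute {{$name}} placeholders with values from `arguments`."""
    def substitute(match):
        name = match.group(1)
        if name not in arguments:
            raise KeyError(f"no argument supplied for placeholder ${name}")
        return str(arguments[name])
    # Match {{$name}}, allowing optional whitespace inside the braces.
    return re.sub(r"\{\{\s*\$(\w+)\s*\}\}", substitute, template)

prompt = render_template(
    "Context:\n{{$context}}\n\nQuestion:\n{{$question}}",
    {"context": "Aspire is an orchestration framework.",
     "question": "What is Aspire?"},
)
```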



&lt;p&gt;With the configuration out of the way, we can build a compact C# class that puts it all together for us as below. The notable sections are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GetContextFromVectorStore, where we query our vector store using embeddings generated from the user's question.&lt;/li&gt;
&lt;li&gt;In AnswerWithAdditionalContext, we then create a kernel function and execute it, passing arguments containing the user's question and the additional context retrieved from the vector store.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// omit using&lt;/span&gt;
&lt;span class="cp"&gt;#pragma warning disable SKEXP0001
&lt;/span&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Kernel&lt;/span&gt; &lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IVectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;IOptions&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ModelConfiguration&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;configuration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ILogger&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ChatClient&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IChatClient&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt; &lt;span class="n"&gt;TopSearchResults&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;ITextEmbeddingGenerationService&lt;/span&gt; &lt;span class="n"&gt;_embeddingGenerator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;IVectorStoreRecordCollection&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Guid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FaqRecord&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_faqCollection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;....&lt;/span&gt;

    &lt;span class="c1"&gt;// additional methods omitted for brevity.&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;AnswerWithAdditionalContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;KernelArguments&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;kernelFunction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateFunctionFromPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PromptConstants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RagPromptConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;kernelFunction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;InvokeAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
    &lt;span class="c1"&gt;/// Get context from the vector store based on the question.&lt;/span&gt;
    &lt;span class="c1"&gt;///  This method uses the vector store to search for the most relevant context based on the question:&lt;/span&gt;
    &lt;span class="c1"&gt;///      1. Retrieve the embeddings using the embedding model&lt;/span&gt;
    &lt;span class="c1"&gt;///      2. Search the vector store for the most relevant context based on the embeddings.&lt;/span&gt;
    &lt;span class="c1"&gt;///      3. Return the context as a string.&lt;/span&gt;
    &lt;span class="c1"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span class="c1"&gt;/// &amp;lt;param name="question"&amp;gt;&amp;lt;/param&amp;gt;&lt;/span&gt;
    &lt;span class="c1"&gt;/// &amp;lt;returns&amp;gt;Vector Search Results.&amp;lt;/returns&amp;gt;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetContextFromVectorStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;questionVectors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_embeddingGenerator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateEmbeddingsAsync&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;stbContext&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StringBuilder&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;searchResults&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_faqCollection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;VectorizedSearchAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;questionVectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;VectorSearchOptions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Top&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TopSearchResults&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;searchResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;stbContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AppendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;stbContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
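&lt;p&gt;Conceptually, the flow above is: embed the question, find the nearest stored records, concatenate them into a context string, then generate. Stripped of the Semantic Kernel and Qdrant specifics, the retrieval step is a nearest-neighbour search over embeddings; a self-contained Python sketch, with toy vectors standing in for real embedding model output:&lt;/p&gt;

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two vectors; 0.0 for degenerate inputs."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def get_context(question_vector: list, records: list, top: int = 2) -> str:
    """Rank (vector, content) records by similarity and join the top hits."""
    ranked = sorted(records,
                    key=lambda r: cosine_similarity(question_vector, r[0]),
                    reverse=True)
    return "\n".join(content for _, content in ranked[:top])

# Toy 2-dimensional vectors in place of real embeddings.
records = [([1.0, 0.0], "Aspire orchestrates containers."),
           ([0.0, 1.0], "Coffee roasting has a first crack."),
           ([0.9, 0.1], "Aspire injects connection strings.")]
context = get_context([1.0, 0.05], records, top=2)
```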



&lt;p&gt;With little code, we have a functioning RAG system, complete with a barebones UI for local testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aspire - Docker Networking: Communication Between Components
&lt;/h2&gt;

&lt;p&gt;There are three different ways for application components to communicate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Container to container&lt;/li&gt;
&lt;li&gt;Container to host&lt;/li&gt;
&lt;li&gt;Host to container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aspire 9 creates a Docker network which supports all these communication options.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k1jcv9fvtbj12vudayi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k1jcv9fvtbj12vudayi.png" alt="Demo application networking" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Container to container
&lt;/h3&gt;

&lt;p&gt;As with Docker Compose, we can use service names to connect from one container to another on the same Docker network.&lt;/p&gt;

&lt;p&gt;In the demo application, the Jupyter Notebook container can connect to the Ollama container (if it is running as a container - more on this later) and the Qdrant container using their service names.&lt;/p&gt;

&lt;h3&gt;
  
  
  Container to host
&lt;/h3&gt;

&lt;p&gt;The Aspire Dashboard in our project runs as an executable as opposed to a container. This means that if the containers use OpenTelemetry, the dashboard's OTLP telemetry endpoint, running as an executable on our host machine, needs to be reachable from those containers.&lt;/p&gt;

&lt;p&gt;In this case it is not possible to use localhost as the destination from inside a container, so we use host.docker.internal as the OTLP collector URL instead. This way, containers can reach services running on the host machine too.&lt;/p&gt;
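&lt;p&gt;One way to express this is to resolve the collector hostname based on where the process runs. A small Python sketch (the RUNNING_IN_CONTAINER flag is an illustrative convention of this sketch, not something Docker sets for you):&lt;/p&gt;

```python
import os

def otlp_endpoint(port: int = 4317) -> str:
    """Resolve the OTLP collector endpoint for this process.

    Containers cannot reach the host via localhost, so an assumed
    RUNNING_IN_CONTAINER flag switches the hostname to host.docker.internal.
    """
    in_container = os.environ.get("RUNNING_IN_CONTAINER", "false") == "true"
    host = "host.docker.internal" if in_container else "localhost"
    return f"http://{host}:{port}"

os.environ["RUNNING_IN_CONTAINER"] = "true"
endpoint = otlp_endpoint()
```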

&lt;h3&gt;
  
  
  Host to container
&lt;/h3&gt;

&lt;p&gt;This is the case for our .NET Web API project, which runs as an executable process on our host machine and can access all containerised services using localhost and the corresponding published ports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Ollama as a container or not?
&lt;/h2&gt;

&lt;p&gt;Just because we can run everything as containers does not mean we always should.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware accelerated Docker
&lt;/h3&gt;

&lt;p&gt;Currently, it is possible to utilise the GPU in Docker on hosts with NVIDIA container support. This setup requires a device running Linux (or Windows with WSL 2 configured correctly).&lt;/p&gt;

&lt;p&gt;When this is the case, running Ollama as a container makes sense. &lt;/p&gt;

&lt;p&gt;There are also cases where the host machine supports hardware acceleration for Ollama when it runs natively, but not when it runs in a container.&lt;/p&gt;

&lt;p&gt;For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ARM-based MacBook Pros and other macOS devices.

&lt;ul&gt;
&lt;li&gt;Ollama supports hardware acceleration natively and, depending on the specs, it can make a huge difference.&lt;/li&gt;
&lt;li&gt;However, as GPU acceleration is not available to Docker containers on macOS, running Ollama in Docker (with or without Aspire) will end up being much slower.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Similarly, on Windows devices with a dedicated NVIDIA GPU but no NVIDIA container support, running Ollama on the host OS will provide better performance.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Our example project also allows for the following setup with a configuration change:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjp52vzbjrxnxseje0o7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjp52vzbjrxnxseje0o7.png" alt="Ollama running on host" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Switching Models and Model providers
&lt;/h2&gt;

&lt;p&gt;Given that .NET Aspire shines as a development-time orchestration framework, it is no wonder its configuration system is both powerful and simple.&lt;/p&gt;

&lt;p&gt;In this project, we can conditionally spin up an Ollama container, or inject a connection string for Ollama running on the host, using launchSettings.json in the AppHost project. The models used for embeddings and generation can be swapped just as easily, and both the Python and .NET components will use whichever values are injected via configuration at runtime.&lt;/p&gt;

&lt;p&gt;It is also possible to use OpenAI for both embeddings and generation via configuration. In that case, we need to set up developer secrets containing a valid OpenAI key.&lt;/p&gt;

&lt;p&gt;The main driver for our solution is the launchSettings.json file included with the Aspire AppHost project. By modifying it, all our components will utilise the desired models and providers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://json.schemastore.org/launchsettings.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"profiles"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"commandName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Project"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"dotnetRunMessages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"launchBrowser"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"applicationUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:15062"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"environmentVariables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="w"&gt;        
        &lt;/span&gt;&lt;span class="nl"&gt;"EMBEDDING_MODEL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"EMBEDDING_MODEL_PROVIDER"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;        
        &lt;/span&gt;&lt;span class="nl"&gt;"CHAT_MODEL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mistral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"CHAT_MODEL_PROVIDER"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;        
        &lt;/span&gt;&lt;span class="nl"&gt;"VECTOR_STORE_VECTOR_NAME"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"page_content_vector"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this project, we can use the following values for EMBEDDING_MODEL_PROVIDER:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama: spin up an Ollama container using Aspire and inject its connection string.&lt;/li&gt;
&lt;li&gt;OllamaHost: do not spin up an Ollama container; instead inject host.docker.internal into containers, or localhost into executables running on the host.&lt;/li&gt;
&lt;li&gt;OpenAI: inject the API key from developer secrets and use the default OpenAI URLs.&lt;/li&gt;
&lt;li&gt;HuggingFace: inject the API key from developer secrets and use the default HuggingFace inference URLs.&lt;/li&gt;
&lt;/ul&gt;
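&lt;p&gt;The branching these values imply can be sketched as follows (the Ollama service name and port are assumptions for illustration; the OpenAI and HuggingFace URLs are those services' well-known defaults):&lt;/p&gt;

```python
def resolve_embedding_endpoint(provider: str, in_container: bool = False) -> str:
    """Map an EMBEDDING_MODEL_PROVIDER value to a base endpoint.

    Mirrors the options listed above; the exact URLs the demo injects may differ.
    """
    if provider == "Ollama":
        # Aspire-managed container, reachable via its service name.
        return "http://ollama:11434"
    if provider == "OllamaHost":
        host = "host.docker.internal" if in_container else "localhost"
        return f"http://{host}:11434"
    if provider == "OpenAI":
        return "https://api.openai.com/v1"  # default endpoint, key from secrets
    if provider == "HuggingFace":
        return "https://api-inference.huggingface.co"  # default inference endpoint
    raise ValueError(f"Unknown provider: {provider}")

endpoint = resolve_embedding_endpoint("OllamaHost", in_container=True)
```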

&lt;p&gt;To use OpenAI or HuggingFace, the following user secrets need to be set with valid keys.&lt;br&gt;
Please note that both the Python and .NET components will use the default endpoints for these services, so connection strings are not used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Parameters:OpenAIKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Parameters:HuggingFaceKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This was a warm-up for building a metadata-driven retrieval system to query photographs using multi-modal computer vision models, which I have been posting about.&lt;/p&gt;

&lt;p&gt;Even with smaller, quantised models running on CPU, we can get decent results asking about .NET Aspire based on the markdown and YAML files in the official documentation repository. In this case we did not utilise any metadata, but that will be part of the photo search project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Models used for testing
&lt;/h3&gt;

&lt;p&gt;Embedding model: &lt;a href="https://ollama.com/library/granite-embedding" rel="noopener noreferrer"&gt;granite-embedding&lt;/a&gt;&lt;br&gt;
Generative model: &lt;a href="https://ollama.com/library/qwen2.5" rel="noopener noreferrer"&gt;qwen2.5:1.5b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is a simple question with a relevant answer when using the RAG query: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p3c6mhx9o63x9npm4kt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p3c6mhx9o63x9npm4kt.png" alt="RAG Search" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And a made-up answer when the question is sent directly to the LLM: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jpsuzxpz9j195x9dgnp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jpsuzxpz9j195x9dgnp.png" alt="Search without context" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Runtime performance
&lt;/h3&gt;

&lt;p&gt;In addition, if you are using Ollama on a laptop that has some level of hardware acceleration which is not available inside Docker, then running Ollama installed locally rather than as a container via Aspire gives much better runtime performance. Here is a comparison using a small model:&lt;/p&gt;

&lt;h3&gt;
  
  
  Running as a container
&lt;/h3&gt;

&lt;p&gt;We can see that &lt;strong&gt;ingestion took 1 minute&lt;/strong&gt; and running &lt;strong&gt;two questions took about 33 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4idocfdaw2u2dz06lyzj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4idocfdaw2u2dz06lyzj.png" alt="Runtime performance when running Ollama in Docker" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Running natively on a host machine with acceleration
&lt;/h3&gt;

&lt;p&gt;When running Ollama natively on the laptop and using host.docker.internal to connect to it from containers, we get around &lt;strong&gt;15 seconds for ingestion&lt;/strong&gt; and &lt;strong&gt;4 seconds for two queries&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkx6goev5sbx1rdrlotjx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkx6goev5sbx1rdrlotjx.png" alt="Runtime Performance when running Ollama natively on a laptop with acceleration" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;With the available technologies, we can rapidly build question-and-answer solutions. However, if we don't define our performance metrics and a suitable evaluation approach, there is little value in building such systems.&lt;/p&gt;

&lt;p&gt;For instance, we can use different embedding and generative models. We can change our chunking method, or use additional metadata to query and extract relevant chunks more effectively. We can also change model parameters, and the list goes on.&lt;/p&gt;
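&lt;p&gt;As a sketch of just one of these variables, here is the simplest possible chunking method: fixed-size character chunks with overlap. This is illustrative only; a real pipeline would more likely split on markdown structure or sentence boundaries:&lt;/p&gt;

```python
# Fixed-size chunking with overlap: each chunk repeats the tail of the
# previous one so that sentences cut at a boundary still appear whole
# in at least one chunk.
def chunk_text(text, size=100, overlap=20):
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```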

&lt;p&gt;With so many variables, how do we compare the outcome? The next post on this topic will include the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating evaluation data using LLMs.&lt;/li&gt;
&lt;li&gt;Defining our metrics.&lt;/li&gt;
&lt;li&gt;Performing evaluation using the evaluation data and our target metrics.&lt;/li&gt;
&lt;li&gt;Collecting the results from our experiments.&lt;/li&gt;
&lt;li&gt;Visualising and comparing the performance of the evaluation process.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links and References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/syamaner/moonbeans/tree/aspire-rag-intro" rel="noopener noreferrer"&gt;Sample Repository - moonbeans&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://garymarcus.substack.com/" rel="noopener noreferrer"&gt;Gary Marcus&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.semanticscholar.org/paper/Retrieval-Augmented-Generation-for-NLP-Tasks-Lewis-Perez/58ed1fbaabe027345f7bb3a6312d41c5aac63e22#cited-papers" rel="noopener noreferrer"&gt;“Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.”&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://learning.oreilly.com/library/view/learning-langchain/9781098167271/" rel="noopener noreferrer"&gt;Learning LangChain&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Chunking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/towards-data-science/rag-101-chunking-strategies-fdc6f6c2aaec" rel="noopener noreferrer"&gt;RAG 101: Chunking Strategies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/gkamradt/ChunkViz?tab=readme-ov-file" rel="noopener noreferrer"&gt;ChunkViz - Visualising Chunking methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2024/10/chunking-techniques-to-build-exceptional-rag-systems/" rel="noopener noreferrer"&gt;15 Chunking Techniques  to Build Exceptional RAG Systems&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tools and Frameworks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://gitingest.com/" rel="noopener noreferrer"&gt;Gitingest&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://replit.com/" rel="noopener noreferrer"&gt;Replit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/introduction/" rel="noopener noreferrer"&gt;Langchain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/semantic-kernel/overview/" rel="noopener noreferrer"&gt;Semantic Kernel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/dotnet/docs-aspire" rel="noopener noreferrer"&gt;.NET Aspire Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ollama.com/search" rel="noopener noreferrer"&gt;Ollama - Available Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aspire</category>
      <category>dotnet</category>
      <category>jupyter</category>
      <category>docker</category>
    </item>
    <item>
      <title>Comparing Open-Source Vision Models for Photo Description Tasks Using .NET Aspire</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Mon, 16 Dec 2024 07:00:00 +0000</pubDate>
      <link>https://dev.to/syamaner/comparing-open-source-vision-models-for-photo-description-tasks-using-net-aspire-2ebm</link>
      <guid>https://dev.to/syamaner/comparing-open-source-vision-models-for-photo-description-tasks-using-net-aspire-2ebm</guid>
      <description>&lt;p&gt;In our ongoing series about building a local image summarisation system, we have explored how to combine various open-source technologies to generate meaningful descriptions of photos. Today, we'll tackle a crucial question: How do we choose the best vision model for our needs?&lt;/p&gt;

&lt;p&gt;In this article we focus on a simple approach: using OpenAI's GPT-4o as an automated judge to evaluate the quality of summaries generated by different open-source models. In the next sections, we will explore the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setting up an evaluation pipeline with .NET Aspire&lt;/li&gt;
&lt;li&gt;Using GPT-4o to score model outputs&lt;/li&gt;
&lt;li&gt;Visualising and analysing the results using Jupyter notebooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our evaluation covers six prominent open-source vision models:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/library/llama3.2-vision" rel="noopener noreferrer"&gt;llama3.2-vision&lt;/a&gt;: Latest iteration of Meta's multimodal model&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/library/llava-llama3" rel="noopener noreferrer"&gt;llava-llama3&lt;/a&gt;: Vision-language model built on LLaMA architecture&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/library/llava" rel="noopener noreferrer"&gt;llava:7b&lt;/a&gt;: Compact vision-language model suitable for local deployment&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/library/llava" rel="noopener noreferrer"&gt;llava:13b&lt;/a&gt;: Larger variant offering enhanced capabilities&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/microsoft/Florence-2-large-ft" rel="noopener noreferrer"&gt;Florence-2-large-ft&lt;/a&gt;: Microsoft's vision model known for detailed scene understanding&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/library/llava-phi3" rel="noopener noreferrer"&gt;llava-phi3&lt;/a&gt;: Recent addition combining efficiency with strong performance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These models run locally through our Aspire-based infrastructure, which handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model inference and serving&lt;/li&gt;
&lt;li&gt;Reverse geocoding for location context&lt;/li&gt;
&lt;li&gt;Experiment tracking and result storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we can generate summaries, which model should we use for summarising our photo library? This post covers a simple way to answer that question with the help of a commercial model. &lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation Process
&lt;/h2&gt;

&lt;p&gt;As a weekend project, this post explores the following idea: "How about using a commercial model to judge the output generated by open-source models?" &lt;/p&gt;

&lt;h3&gt;
  
  
  Why GPT-4o?
&lt;/h3&gt;

&lt;p&gt;GPT-4o (released May 2024) offers several advantages as our evaluation model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multimodal capabilities for analysing both images and text.&lt;/li&gt;
&lt;li&gt;Consistent scoring methodology.&lt;/li&gt;
&lt;li&gt;Cost-effective solution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Besides being a good fit for the task, pricing was also an advantage for a fun project with no budget at all. For instance, 300 evaluation requests (50 images x 6 open-source models) cost around $0.80. OpenAI API pricing is available &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach
&lt;/h3&gt;

&lt;p&gt;Our approach can be summarised as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Input parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Original photo (scaled to 256px width).&lt;/li&gt;
&lt;li&gt;Model-generated summary.&lt;/li&gt;
&lt;li&gt;Model used.&lt;/li&gt;
&lt;li&gt;Categorisation predictions.&lt;/li&gt;
&lt;li&gt;Top 10 detected objects.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scoring Criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quality and accuracy of the summary (0-100).&lt;/li&gt;
&lt;li&gt;Accuracy of category predictions.&lt;/li&gt;
&lt;li&gt;Precision of object detection.&lt;/li&gt;
&lt;li&gt;Consistency with image content.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Result Collection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured score and justification storage.&lt;/li&gt;
&lt;li&gt;Integration with existing MongoDB database.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Setting up
&lt;/h2&gt;

&lt;p&gt;For this task, we use the &lt;code&gt;OpenAIClient&lt;/code&gt; from Aspire.OpenAI, as seen in the code sample below. &lt;/p&gt;

&lt;h3&gt;
  
  
  Key Implementation Decisions:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Temperature Setting (0.1f):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chosen for consistent, deterministic evaluations.&lt;/li&gt;
&lt;li&gt;Reduces random variation in scoring.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;JSON Schema Format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensures structured, parseable responses.&lt;/li&gt;
&lt;li&gt;Simplifies result processing and storage.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Image Preprocessing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;256px width limitation balances detail and API costs.&lt;/li&gt;
&lt;li&gt;Consistent sizing ensures fair comparisons.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAiPhotoSummaryEvaluationClient&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nf"&gt;FromKeyedServices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"openaiConnection"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="n"&gt;OpenAIClient&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IPhotoSummaryEvaluator&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;SystemPrompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="s"&gt;"You are a highly accurate and fair image summarisation evaluation model. "&lt;/span&gt;
        &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" Your job is to evaluate the quality of summaries generated from images by different computer vision models. \n\n"&lt;/span&gt;
        &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" When evaluating a summary of the provided image:\n\n"&lt;/span&gt;
        &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" - Provide a single score ranging between 0 and 100 combining the following properties: \n\n"&lt;/span&gt;
        &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"    - Quality and accuracyof the summary.\n\n"&lt;/span&gt;
        &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"    - Quality and accuracy of the categories predicted for the image.\n\n"&lt;/span&gt;
        &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"    - Quality and accuracy of the objects predicted to be in the image.\n\n"&lt;/span&gt;
        &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"  - Be fair and consistent when evaluating. \n\n"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;PromptSummary&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="s"&gt;"Please score the provided image summary based on the quality and accuracy of the summary, categories, and objects predicted in the image."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PhotoSummaryScore&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;EvaluatePhotoSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;base64Image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageSummaryEvaluationRequest&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;..&lt;/span&gt; &lt;span class="n"&gt;omitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resize&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;me&lt;/span&gt; &lt;span class="n"&gt;max&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt; &lt;span class="n"&gt;px&lt;/span&gt; &lt;span class="n"&gt;wide&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatMessageContentPart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateImagePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;BinaryData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToArray&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="s"&gt;"image/jpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ChatImageDetailLevel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Auto&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;UserChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PromptSummary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SystemChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SystemPrompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;

        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ChatCompletionOptions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0.1f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ResponseFormat&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatResponseFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateJsonSchemaFormat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonSchemaFormatName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"image_summary_result"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;jsonSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BinaryData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"""
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="s"&gt;"Score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="s"&gt;"Justification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="s"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Score"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Justification"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="s"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="s"&gt;"""),
&lt;/span&gt;            &lt;span class="n"&gt;jsonSchemaIsStrict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;CompleteChatAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;structuredJson&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JsonDocument&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;structuredJson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RootElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Score"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;GetDouble&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;justification&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;structuredJson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RootElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Justification"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;GetString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;PhotoSummaryScore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;justification&lt;/span&gt;&lt;span class="p"&gt;!,&lt;/span&gt; &lt;span class="s"&gt;"OpenAI"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results from this process are then stored in the database, alongside the summaries, against the original image. The OpenAI API has rate limits, so it is important to manage how often these calls are made.&lt;/p&gt;
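&lt;p&gt;A minimal client-side throttle is one way to stay under those limits. This is an illustrative sketch, not the project's implementation; it simply caps requests per minute by sleeping between calls:&lt;/p&gt;

```python
# Sleep-based throttle: spaces successive calls at least
# 60 / requests_per_minute seconds apart.
import time

class Throttle:
    def __init__(self, requests_per_minute):
        self.interval = 60.0 / requests_per_minute
        self.last = 0.0

    def wait(self):
        # Block until enough time has passed since the previous call.
        now = time.monotonic()
        sleep_for = self.last + self.interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last = time.monotonic()
```

&lt;p&gt;In practice you would also want to honour 429 responses from the API and back off accordingly.&lt;/p&gt;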

&lt;h2&gt;
  
  
  Analysis and visualisation
&lt;/h2&gt;

&lt;p&gt;Our analysis notebook provides:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Data Collection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MongoDB query and result aggregation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Visualisation Components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model comparison table.&lt;/li&gt;
&lt;li&gt;Example evaluation cases&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Validating the outcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter results by best model.&lt;/li&gt;
&lt;li&gt;Visualise evaluation justifications.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using an Aspire command to download and upload the notebook between the development machine and the Jupyter server on the Docker host.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
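&lt;p&gt;The model comparison table boils down to averaging the judge's score per model. Here is a rough sketch of that aggregation, using an in-memory sample in place of the MongoDB documents the notebook actually queries:&lt;/p&gt;

```python
# Average the judge score per model and rank models best-first.
from statistics import mean
from collections import defaultdict

def rank_models(evaluations):
    by_model = defaultdict(list)
    for e in evaluations:
        by_model[e["model"]].append(e["score"])
    return sorted(
        ((model, mean(scores)) for model, scores in by_model.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```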

&lt;p&gt;Below is an example of the evaluation process where GPT-4o correctly identifies the inaccuracies in the generated summaries. The results look fair and accurate, making it easier to introduce more open-source models and then use the notebook to evaluate their performance.&lt;br&gt;
This also allows us to tweak the prompts to get better results from the models. For example, wrong location information likely comes from including the address resolved from the photo's GPS tag, which leads some models to be more creative with their descriptions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F816tfgybkckminl02ltk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F816tfgybkckminl02ltk.png" alt="Evaluation result" width="680" height="618"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/syamaner/photo-search/blob/main/src/PhotoSearch.AppHost/Notebooks/comparison.ipynb" rel="noopener noreferrer"&gt;Further results can be seen on the notebook&lt;/a&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Results and Remarks
&lt;/h2&gt;

&lt;p&gt;Following the process outlined earlier, &lt;code&gt;llava:13b&lt;/code&gt; is on top with an average score of &lt;code&gt;85.6&lt;/code&gt;, with &lt;code&gt;Florence-2-large-ft&lt;/code&gt; second, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhpsgl7hqwgz8zkig8hv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhpsgl7hqwgz8zkig8hv.png" alt="Model Rankings" width="346" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Observations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Providing too much address detail can lead models to make up location information.&lt;/li&gt;
&lt;li&gt;Larger models provide more detailed summaries. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OpenAIClient&lt;/code&gt; from Aspire.OpenAI works well with Ollama Server as well. &lt;/li&gt;
&lt;li&gt;The Aspire command for the Jupyter notebook made it easy for me to pull and push the notebook between my machine and wherever Aspire is running the containers. 

&lt;ul&gt;
&lt;li&gt;As a next step, it makes sense to consider periodic downloading of the notebook. &lt;/li&gt;
&lt;li&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mcpp09csyphrlj87sbb.png" alt="Jupyter Notebook Command in Aspire Dashboard" width="800" height="265"&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion and what's next
&lt;/h2&gt;

&lt;p&gt;It is easy enough to use APIs that allow inference on image inputs. However, deciding which model to use is not so straightforward, given the need to test a large number of images against each model. This is what makes an evaluation process crucial to getting the most out of such technology. &lt;/p&gt;

&lt;p&gt;In this post, we have looked into using OpenAI's GPT-4o model to assess the quality of the image summaries generated by open-source models. &lt;/p&gt;

&lt;p&gt;Our evaluation framework using GPT-4o provides a systematic approach to comparing vision model performance. Key takeaways include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Automated Evaluation Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consistent scoring methodology.&lt;/li&gt;
&lt;li&gt;Scalable to large image sets.&lt;/li&gt;
&lt;li&gt;Cost-effective solution.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Implementation Insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aspire.OpenAI simplifies integration.&lt;/li&gt;
&lt;li&gt;Jupyter notebooks enable flexible analysis.&lt;/li&gt;
&lt;li&gt;.NET Aspire makes local development orchestration a breeze.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Model Expansion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integration of newer vision models&lt;/li&gt;
&lt;li&gt;Prompt engineering optimisation&lt;/li&gt;
&lt;li&gt;Performance benchmarking&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Feature Development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Natural language image search implementation&lt;/li&gt;
&lt;li&gt;Enhanced evaluation metrics&lt;/li&gt;
&lt;li&gt;Automated testing pipeline&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The notebook can be accessed on &lt;a href="https://github.com/syamaner/photo-search/blob/main/src/PhotoSearch.AppHost/Notebooks/comparison.ipynb" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; with the rest of the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/photo-search" rel="noopener noreferrer"&gt;Code Base&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/think/topics/gpt-4o" rel="noopener noreferrer"&gt;What is GPT-4o&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2405.05253v1" rel="noopener noreferrer"&gt;Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/syamaner/photo-search/blob/main/src/PhotoSearch.AppHost/Notebooks/comparison.ipynb" rel="noopener noreferrer"&gt;The notebook with evaluation results&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dotnet</category>
      <category>docker</category>
      <category>aspire</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Welcoming .NET Aspire 9.0 : Photo Summary Project</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Mon, 18 Nov 2024 08:27:22 +0000</pubDate>
      <link>https://dev.to/syamaner/welcoming-net-aspire-90-photo-summary-project-dlj</link>
      <guid>https://dev.to/syamaner/welcoming-net-aspire-90-photo-summary-project-dlj</guid>
<description>&lt;p&gt;The project so far makes it possible to scan photos in a directory, run them through various vision models and store the details in a database. Using Aspire for local development, instead of Docker Compose as I usually would, has been fun so far. &lt;/p&gt;

&lt;p&gt;Aspire 9.0 ships handy new features that are relevant to the next stage of the project. This post summarises them with some examples.&lt;/p&gt;

&lt;p&gt;As the next stage of the project is evaluating the performance of the models I am using, I needed to make Jupyter Notebooks easily accessible inside my codebase and development environment. With .NET Aspire 9.0 this becomes a convenient process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Ready for Next Steps
&lt;/h2&gt;

&lt;p&gt;Given we have several open-source options when it comes to computer vision models to generate photo summaries, we need to be able to evaluate the results from these models so we can choose one that suits our domain.&lt;/p&gt;

&lt;p&gt;One workflow to do this effectively is using Jupyter Notebooks, where we can retrieve our results from the database and compare them with results obtained from commercial models.&lt;/p&gt;

&lt;p&gt;Introducing a Jupyter Notebook server that runs on a remote host to our project makes the following important:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The containers run on a remote host, but we still need to be able to keep the notebooks in version control

&lt;ul&gt;
&lt;li&gt;Docker volumes live on the remote host, so there is no easy way to copy files out of them&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;We also need container-to-container networking: the Jupyter server and MongoDB both run on the remote host, and the Jupyter server must be able to reach MongoDB. &lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;As we will see below, .NET Aspire 9.0 takes care of these.&lt;/p&gt;

&lt;p&gt;Here is the list of features in .NET Aspire 9.0 that are relevant to this project and will be covered in this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tooling

&lt;ul&gt;
&lt;li&gt;No longer relying on workloads: we can now set up .NET Aspire using packages and project templates.&lt;/li&gt;
&lt;li&gt;Templates can also be installed as follows: &lt;code&gt;dotnet new install Aspire.ProjectTemplates::9.0.0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Dashboard and UX

&lt;ul&gt;
&lt;li&gt;Managing Resource Lifecycles: Start, Stop, Restart from the dashboard.&lt;/li&gt;
&lt;li&gt;Browser Telemetry Support.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;App Host (Orchestration)

&lt;ul&gt;
&lt;li&gt;Waiting for dependencies&lt;/li&gt;
&lt;li&gt;Resource health checks&lt;/li&gt;
&lt;li&gt;Persistent containers&lt;/li&gt;
&lt;li&gt;Resource commands&lt;/li&gt;
&lt;li&gt;Container networking&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;One from .NET 9.0

&lt;ul&gt;
&lt;li&gt;Enabling DI registration of metrics using IMeterFactory &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Browser Telemetry Support
&lt;/h2&gt;

&lt;p&gt;Earlier on, I was curious how to integrate traces from the front end and see them in the distributed traces. .NET Aspire 9.0 brings an out-of-the-box way to do this, as below. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;It is important to remember that OpenTelemetry client instrumentation in the browser is experimental.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d6rqnzud0pnid0jt648.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d6rqnzud0pnid0jt648.png" alt="Open Telemetry experimental browser support warning." width="800" height="134"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define the &lt;code&gt;DOTNET_DASHBOARD_OTLP_HTTP_ENDPOINT_URL&lt;/code&gt; environment variable in the AppHost launch settings:
&lt;code&gt;"DOTNET_DASHBOARD_OTLP_HTTP_ENDPOINT_URL": "http://localhost:16175"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;I still needed to inject &lt;code&gt;DOTNET_DASHBOARD_OTLP_ENDPOINT_URL&lt;/code&gt; into the .NET applications.&lt;/li&gt;
&lt;li&gt;The front end required the HTTP endpoint as well as the following environment variable:
&lt;code&gt;"OTEL_EXPORTER_OTLP_PROTOCOL","http/protobuf"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
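&lt;p&gt;To make the first step concrete, the AppHost &lt;code&gt;launchSettings.json&lt;/code&gt; might look like the sketch below. The profile name is an assumption; only the HTTP OTLP endpoint value comes from the step above.&lt;/p&gt;

```json
{
  "profiles": {
    "https": {
      "commandName": "Project",
      "environmentVariables": {
        "DOTNET_DASHBOARD_OTLP_HTTP_ENDPOINT_URL": "http://localhost:16175"
      }
    }
  }
}
```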

&lt;p&gt;With these in place, I was able to follow the Microsoft examples to make it work in a StencilJS application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="c1"&gt;//https://www.honeycomb.io/blog/opentelemetry-browser-instrumentation&lt;/span&gt;
&lt;span class="c1"&gt;//https://github.com/open-telemetry/opentelemetry-js/tree/main/experimental/packages/opentelemetry-instrumentation-xml-http-request&lt;/span&gt;
&lt;span class="c1"&gt;//https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/plugins/web/opentelemetry-instrumentation-user-interaction&lt;/span&gt;
&lt;span class="c1"&gt;//https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/plugins/web/opentelemetry-instrumentation-long-task&lt;/span&gt;
&lt;span class="c1"&gt;//https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/plugins/web/opentelemetry-instrumentation-long-task&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;otlpOptions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;omitted&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;omitted&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SimpleSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OTLPTraceExporter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;otlpOptions&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
  &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;contextManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StackContextManager&lt;/span&gt;&lt;span class="p"&gt;()});&lt;/span&gt;
  &lt;span class="nf"&gt;registerInstrumentations&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;instrumentations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="nf"&gt;getWebAutoInstrumentations&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/instrumentation-xml-http-request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;clearTimingResources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LongTaskInstrumentation&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;observerCallback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;span&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;longtaskEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;location.pathname&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FetchInstrumentation&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;propagateTraceHeaderCorsUrls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RegExp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/api&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
        &lt;span class="na"&gt;ignoreUrls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RegExp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/tile&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
      &lt;span class="p"&gt;})],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the changes above and the existing setup in our backend, we can see the end-to-end traces below. The use case is selecting a model and then requesting summaries for all 50 images in the database. We can see the durations of database calls, calls to inference endpoints, as well as transport and our backend components, all triggered from the UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu1zvtweb7mj2zp7kfm4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu1zvtweb7mj2zp7kfm4.png" alt="Trace view starting from browser click." width="800" height="889"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Resource Health Checks
&lt;/h2&gt;

&lt;p&gt;If we would like to specify resource dependencies to control the startup process, it is important to be able to define what "healthy" means for the various components. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If no health checks are defined, a resource is considered healthy as long as it is in the running state.&lt;/li&gt;
&lt;li&gt;If the resource exposes an HTTP health endpoint, we can register it with a single call: &lt;code&gt;.WithHttpHealthCheck("/health")&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sometimes external resources do not provide health checks that suit us; in that case we can define, register and use our own. This is the method discussed here.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Checking whether the Nominatim resource is healthy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NominatimHealthCheck&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IHealthCheck&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="n"&gt;ignored&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;HealthCheckResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;CheckHealthAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HealthCheckContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;CancellationToken&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; 
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;IsServerReady&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cancellationToken&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt;
            &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;HealthCheckResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Healthy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HealthCheckResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Unhealthy&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;IsServerReady&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;searchUrl&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/search.php?q=avenue%20pasteur"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// check if we have success result or not. Code omitted.&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// we also need to register this as below:&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddHealthChecks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddTypeActivatedCheck&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;NominatimHealthCheck&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"nominatim-healthcheck"&lt;/span&gt;&lt;span class="p"&gt;,..);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;nominatimResourceBuilder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nominatimResource&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithHealthCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"nominatim-healthcheck"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;....);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And once all registered, we can also see the health in the dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5m6i75byp5nt3k39v5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5m6i75byp5nt3k39v5e.png" alt="Resource health view in dashboard." width="404" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining Resource Dependencies
&lt;/h2&gt;

&lt;p&gt;Once we have defined our health checks, we can declare dependencies so that our applications will not start before the services they depend on at startup.&lt;/p&gt;

&lt;p&gt;Prior to .NET Aspire 9.0, it was possible to achieve this by following an example provided by David Fowler in issue #921 of the Aspire repository, as linked below. &lt;/p&gt;

&lt;p&gt;Now, once all that additional code is deleted, all we need is the framework's &lt;code&gt;.WaitFor(resource)&lt;/code&gt; as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;apiService&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddProject&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Projects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PhotoSearch_API&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"apiservice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollamaContainer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongodb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WaitFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollamaContainer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WaitFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongodb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WaitFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures all dependencies spin up and become healthy first, and only then will the application start. It also helps in cases where containers need to download data on first run, which might take several minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Persistent Containers
&lt;/h2&gt;

&lt;p&gt;In this project we have some containers that take time to load and become ready, such as the OSM map tile server and the Nominatim container for reverse geocoding, as well as Ollama on first start, since it needs to download the model. &lt;/p&gt;

&lt;p&gt;So if we make code changes, the containers would stop, and we would need to wait for all the dependencies again. &lt;/p&gt;

&lt;p&gt;This is another area where Aspire 9.0 comes to the rescue with a single method call as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;nominatimResourceBuilder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nominatimResource&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithLifetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContainerLifetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Persistent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt; &lt;span class="n"&gt;omitted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this in place, we can stop and restart debugging lightning fast, without having to wait for the containers. &lt;/p&gt;

&lt;h2&gt;
  
  
  Resource Commands
&lt;/h2&gt;

&lt;p&gt;Resource commands allow developers to register commands against a resource that can be invoked from the Aspire dashboard. &lt;/p&gt;

&lt;p&gt;As I added the Jupyter Notebooks container to the project this weekend, commands helped solve one problem with running the Jupyter server on a remote Docker host: how do we manage the notebook files? Ideally we manage them in the same repository. &lt;/p&gt;

&lt;p&gt;With a download and an upload command, we can upload the notebook from the local drive where our git repository lives, and download it back once it has been modified inside the Jupyter Notebook container. &lt;/p&gt;

&lt;p&gt;Downloading the notebook at regular intervals will be one of the next steps. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6a81g9n5os9nr8pgnua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6a81g9n5os9nr8pgnua.png" alt="Download / upload command menu items in dashboard." width="271" height="373"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;IResourceBuilder&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ContainerResource&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;WithUploadNoteBookCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="n"&gt;IResourceBuilder&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ContainerResource&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;jupyterToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;jupyterUrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"upload-notebook"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;displayName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Upload Notebook"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;executeCommand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;OnUploadNotebookCommandAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jupyterToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jupyterUrl&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;updateState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OnUpdateResourceState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;iconName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ArrowUpload"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;iconVariant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IconVariant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Filled&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ExecuteCommandResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;OnUploadNotebookCommandAsync&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt; &lt;span class="k"&gt;params&lt;/span&gt; &lt;span class="n"&gt;omitted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;notebookData&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;disk&lt;/span&gt;
       &lt;span class="c1"&gt;// setup omitted&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;httpclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SendAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uploadNoteBookHttpRequestMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
       &lt;span class="c1"&gt;// handle the response&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// And command is registered as below:&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;jupyter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithUploadNoteBookCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:8888"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithDownloadNoteBookCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:8888"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt; &lt;span class="n"&gt;omitted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At a high level, the process looks as below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsz8nekc1m7b9l7or14h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsz8nekc1m7b9l7or14h.png" alt="Jupyter Notebook download Command Overview" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although the above illustrates the case of a remote Docker daemon, the experience is the same with a local Docker daemon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Container Networking
&lt;/h2&gt;

&lt;p&gt;This is another feature that made life easy. With workloads running on other machines on the local network, container-to-container access had not been that important so far. &lt;/p&gt;

&lt;p&gt;With the Jupyter Notebooks container, as our data source is a MongoDB container on the same Docker host, it became necessary to be able to access the database from another container. &lt;/p&gt;

&lt;p&gt;This is another area where the improvements to Aspire are transparent to the developer: we gain the benefits without extra work, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sgv3uz9figr1mnv40y8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sgv3uz9figr1mnv40y8.png" alt="Accessing MogoDb container from a notebook in a container." width="800" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is not much to do besides ensuring the connection string is injected as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddJupyter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"jupyter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;IsNullOrWhiteSpace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dockerHost&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"secret"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;portMappings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"JupyterPort"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;PublicPort&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongodb&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is done, we can read the connection string in Python as follows: &lt;code&gt;connection_string = os.environ.get('ConnectionStrings__photo-search')&lt;/code&gt;&lt;/p&gt;
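&lt;p&gt;As a concrete sketch of how a notebook cell might consume this, here is a minimal Python helper. The &lt;code&gt;photo-search&lt;/code&gt; resource name comes from the project, but the credentials and host below are invented placeholders for illustration, not real values:&lt;/p&gt;

```python
import os
from urllib.parse import urlparse

def get_connection_string(resource_name: str = "photo-search") -> str:
    """Aspire flattens ConnectionStrings:<name> into the
    ConnectionStrings__<name> environment variable inside the container."""
    key = f"ConnectionStrings__{resource_name}"
    value = os.environ.get(key)
    if value is None:
        raise KeyError(f"{key} is not set - is this running under Aspire?")
    return value

# Illustrative value only; in practice the real string is injected by Aspire.
os.environ["ConnectionStrings__photo-search"] = "mongodb://admin:secret@mongodb:27017"

conn = get_connection_string()
# The hostname is the MongoDB container name, resolvable on the shared network.
host = urlparse(conn).hostname
```

&lt;p&gt;From here a driver such as pymongo could be handed &lt;code&gt;conn&lt;/code&gt; directly, with no hard-coded host or port anywhere in the notebook.&lt;/p&gt;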

&lt;h2&gt;
  
  
  Custom Metrics via Dependency Injection
&lt;/h2&gt;

&lt;p&gt;This one is not Aspire-specific, but as a colleague pointed out last week, one of the features in .NET 9.0 is the out-of-the-box ability to use Dependency Injection (DI) for registering and consuming metrics. &lt;/p&gt;

&lt;p&gt;We can achieve this by injecting IMeterFactory into a utility class where we manage our meters.&lt;/p&gt;

&lt;p&gt;Here is an example from this project that counts the photos summarised per model and records the duration for each image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;System.Diagnostics.Metrics&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;PhotoSearch.ServiceDefaults&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ConsoleMetrics&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_photosSummariesCounter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_photosSummaryHistogram&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ConsoleMetrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IMeterFactory&lt;/span&gt; &lt;span class="n"&gt;meterFactory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;meter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;meterFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PhotoSummary.Worker"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;_photosSummariesCounter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;meter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateCounter&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"photosummary.summary.generated"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;_photosSummaryHistogram&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;meter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateHistogram&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"photosummary.summary.durationseconds"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;PhotoSummarised&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_photosSummariesCounter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;KeyValuePair&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"photosummary.summary.model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;PhotoSummaryTiming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;durationSeconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_photosSummaryHistogram&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;durationSeconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;KeyValuePair&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"photosummary.summary.model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;KeyValuePair&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"photosummary.summary.photo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Register:&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddSingleton&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ConsoleMetrics&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Ensure our meter is added when configuring OpenTelemetry:&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddOpenTelemetry&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithMetrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddMeter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;InstrumentationOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MeterName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// other calls&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// Inject and use: &lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddSingleton&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ConsoleMetrics&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we can see below, a total of 70 images have been summarised using two models so far. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flfsjtmtkq7wv5l1qkyan.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flfsjtmtkq7wv5l1qkyan.png" alt="Image summary counter." width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, in the following screen capture we can see the timing of each photo summary request. While most take 5 - 10 seconds, there are some outliers taking around 5 minutes. We can dig into the metrics, use the traces to find out which photo / model combination causes the spikes, and then determine whether it is a random GPU issue or a consistent delay under certain circumstances.&lt;/p&gt;
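&lt;p&gt;To make that triage concrete, here is a rough Python sketch of the kind of filtering one could do on exported histogram samples. The model names, file names, and durations are invented for illustration, not real measurements from the project:&lt;/p&gt;

```python
import statistics

# Invented (model, photo, seconds) samples standing in for values exported
# from the photosummary.summary.durationseconds histogram.
samples = [
    ("llava", "IMG_001.jpg", 6.2),
    ("llava", "IMG_002.jpg", 7.8),
    ("moondream", "IMG_001.jpg", 5.4),
    ("moondream", "IMG_003.jpg", 267.0),  # the ~5 minute outlier
    ("llava", "IMG_004.jpg", 9.1),
]

median = statistics.median(d for (_, _, d) in samples)
# Flag anything an order of magnitude above the median; those are the
# photo/model combinations whose traces are worth opening first.
outliers = [(m, p, d) for (m, p, d) in samples if d > 10 * median]
```

&lt;p&gt;The surviving tuples tell us exactly which traces to open in the dashboard.&lt;/p&gt;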

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tclj78idvz4za7ut1lh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tclj78idvz4za7ut1lh.png" alt="Image summary duration metrics." width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we investigate the traces, the results are interesting. &lt;/p&gt;

&lt;p&gt;To summarise a photo, there are currently three calls to the Ollama container:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get the overall summary&lt;/li&gt;
&lt;li&gt;Using the context, get a list of objects&lt;/li&gt;
&lt;li&gt;Again using the context so far, get a list of possible categories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the traces show that we spent 4 minutes 27 seconds waiting for the categories to be generated. &lt;/p&gt;

&lt;p&gt;This is worth investigating, and since we have a notebook, it is also easy to experiment with the same prompt / image combinations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6mfaq3lbt323dzbxw7q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6mfaq3lbt323dzbxw7q.png" alt="Trace view for the slow summary operation." width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The .NET Aspire 9.0 changes make Aspire a great alternative to Docker Compose. Since it builds on standard container technologies, managing a local development environment with Aspire is well worth a try. &lt;/p&gt;

&lt;p&gt;It has also been great to watch Aspire 9.0 features being discussed in GitHub issues and then shipping, ready to consume, in the new release. The transparency and speed of improvement make it a great choice for development. &lt;/p&gt;

&lt;p&gt;Now that the upgrade is out of the way, the next step will be generating the summaries using OpenAI models, then comparing and ranking each locally generated summary against them and evaluating the results. I have also come across a paper that proposes a more systematic approach to evaluation using state-of-the-art models and will be experimenting with that too. &lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/aspire/whats-new/dotnet-aspire-9?tabs=unix#tooling-improvements" rel="noopener noreferrer"&gt;Aspire What's New&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/aspire/get-started/upgrade-to-aspire-9?pivots=visual-studio" rel="noopener noreferrer"&gt;Upgrade to .NET Aspire 9.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opentelemetry.io/docs/languages/js/" rel="noopener noreferrer"&gt;Open Telemetry Javascript&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/aspire/fundamentals/dashboard/enable-browser-telemetry?tabs=bash" rel="noopener noreferrer"&gt;Aspire - Enable Browser Telemetry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/aspnet/core/log-mon/metrics/metrics?view=aspnetcore-9.0#creating-metrics-in-aspnet-core-apps-with-imeterfactory" rel="noopener noreferrer"&gt;Creating metrics in ASP.NET Core apps with IMeterFactory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/aspire/fundamentals/custom-resource-commands" rel="noopener noreferrer"&gt;Custom Resource Commands in Aspire&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/photo-search/tree/main/src/PhotoSearch.AppHost" rel="noopener noreferrer"&gt;Project Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/dotnet/aspire/issues/921#issuecomment-2074272361" rel="noopener noreferrer"&gt;Previous method to define container startup dependencies&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>dotnet</category>
      <category>aspire</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Adding Map Based Photo Viewer to .Net Aspire Project with Stencil and OpenStreetMap Tile Server</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Sun, 01 Sep 2024 13:50:44 +0000</pubDate>
      <link>https://dev.to/syamaner/adding-map-based-photo-viewer-to-net-aspire-project-with-stencil-and-openstreetmap-tile-server-1fni</link>
      <guid>https://dev.to/syamaner/adding-map-based-photo-viewer-to-net-aspire-project-with-stencil-and-openstreetmap-tile-server-1fni</guid>
      <description>&lt;p&gt;Here is the next post on my journey building a personal photo search application using open source technologies as a testbed for .Net Aspire.&lt;/p&gt;

&lt;p&gt;Please note that, while these posts cover some how-tos, they are not necessarily intended as tutorials on specific topics, but rather as an overview of how we can integrate these technologies to build something interesting. &lt;/p&gt;

&lt;p&gt;The following areas have been covered so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;.NET Aspire

&lt;ul&gt;
&lt;li&gt;Declaring resource dependencies so that our applications can wait until all referenced services have started successfully and fully initialised.&lt;/li&gt;
&lt;li&gt;How to use a remote Docker daemon over SSH for the containers we depend on, when we don't want to overload our current development machine.&lt;/li&gt;
&lt;li&gt;How we can use SSH port forwarding in our App Host.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Machine Learning

&lt;ul&gt;
&lt;li&gt;How to use MultiModal ML Models for summarising and extracting information from photos.&lt;/li&gt;
&lt;li&gt;How we could integrate local models using Ollama or a simple Python project using Hugging Face hosted models.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;How we incorporated reverse geocoding into our solution with .NET Aspire and OpenStreetMap (OSM) Nominatim containers. &lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;In today's post the following topics will be explored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building a simple web component using Stencil that will:

&lt;ul&gt;
&lt;li&gt;Provide a map Web Component that displays the photos in the database.&lt;/li&gt;
&lt;li&gt;Provide a summary Web Component that shows the summaries generated for the selected photo by multiple models, as well as its address, location, and predicted categories.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;A new .NET Aspire resource for an OpenStreetMap (OSM) Tile Server, so that our web component can render maps using local resources (or a remote Docker daemon).&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What does it look like?
&lt;/h2&gt;

&lt;p&gt;So far, we have been able to import our photos into MongoDB, including geolocation and metadata. In addition, reverse geocoding is applied so that the geodata is converted to the nearest address based on OSM data, using the Nominatim resource.&lt;/p&gt;

&lt;p&gt;Once the images are imported, a background worker generates a summary, categories, and contents using a number of open-source multimodal models, and stores them in a dictionary against each photo.&lt;/p&gt;

&lt;p&gt;The recent changes to the project mean we can visualise these on the map and see what the models generated. We have not yet started looking into model evaluation, so for now we will see a number of accurate results as well as totally made-up information, which makes evaluation a critical part of this project. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljlipxaxhlfg9np2pow9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljlipxaxhlfg9np2pow9.png" alt="Map Component and Map Summary." width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the image is not clear enough, here is the text content: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The image captures a lively scene of a band performing on stage. The stage is bathed in warm yellow lights, creating an atmosphere of excitement and energy. At the heart of the stage, three musicians are immersed in their performance. On the left, a guitarist strums his instrument with passion, his fingers moving over the strings as he plays a melody that fills the room. In the center, a singer belts out a tune, her voice echoing off the walls of the auditorium. To the right, a drummer beats out a rhythm on his drum set, his hands striking the drums in a steady beat. In the background, a large screen displays an image of the band, amplifying their presence and engaging with the audience. The stage is surrounded by a sea of spectators, some of whom are captured in the foreground of the image, their faces turned towards the performers. The perspective of the photo suggests it was taken from the viewpoint of someone standing close to the stage, immersing themselves in the concert experience. The image is a snapshot of a moment filled with music and energy, encapsulating the spirit of live performance.&lt;/p&gt;

&lt;p&gt;Photo Categories&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Concert, music, performance&lt;/li&gt;
&lt;li&gt;Audience, stage, band&lt;/li&gt;
&lt;li&gt;Instruments, lighting, yellow lights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Photo Contents&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guitarist, singer, drummer&lt;/li&gt;
&lt;li&gt;Guitar, drum set, drums&lt;/li&gt;
&lt;li&gt;Screen, audience, stage lights&lt;/li&gt;
&lt;li&gt;Yellow light bulbs, spectators.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Web Components and Stencil
&lt;/h2&gt;

&lt;p&gt;Web components are a set of standardised technologies that allow developers to create reusable custom elements with encapsulated functionality.&lt;/p&gt;

&lt;p&gt;The key parts of Web Components can be summarised as the following: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom Elements: Define new HTML elements.&lt;/li&gt;
&lt;li&gt;Shadow DOM: Encapsulates styles and markup to prevent them from affecting the rest of the page.&lt;/li&gt;
&lt;li&gt;HTML Templates: Define chunks of markup that can be reused without rendering immediately.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Use Web Components?
&lt;/h3&gt;

&lt;p&gt;Web components offer lightweight reusability, allowing developers to create components that can be used across multiple projects without being tied to specific frameworks. They provide encapsulation by isolating styles and scripts, which helps prevent conflicts and bugs in the global scope. &lt;/p&gt;

&lt;p&gt;Additionally, web components are highly interoperable, which means they can be integrated in applications built with frameworks such as React, Vue.js, Angular as well as vanilla HTML / JavaScript, making them a future-proof solution for developers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Web Components vs. Frameworks
&lt;/h3&gt;

&lt;p&gt;Unlike traditional frameworks, web components are not dependent on any specific framework since they are built on browser-native APIs. This foundation on web standards guarantees their longevity and stability, ensuring long-term support without the need to adapt to changes in framework-specific updates. Web components are also known for their performance benefits, as they can be lightweight and optimised without the runtime overhead that frameworks often introduce. &lt;/p&gt;

&lt;p&gt;For developers already familiar with HTML, CSS, and JavaScript, web components offer a more straightforward learning curve compared to adopting an entire new framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stencil
&lt;/h3&gt;

&lt;p&gt;Stencil is a web component compiler that simplifies the process of building scalable, performant, framework-agnostic web components. It is one of many options for simplifying the Web Component build process.&lt;/p&gt;

&lt;p&gt;The web components in this project are built using Stencil.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding a Mapping UI to our Aspire Project as a Web Component.
&lt;/h2&gt;

&lt;p&gt;.NET Aspire already has support for NPM applications, so adding a Stencil starter application is as simple as the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a hello world application as outlined in &lt;a href="https://stenciljs.com/docs/getting-started" rel="noopener noreferrer"&gt;Stencil documentation&lt;/a&gt; in the same repository as our Aspire application.&lt;/li&gt;
&lt;li&gt;Restore the packages and add your components.&lt;/li&gt;
&lt;li&gt;Then use AddNpmApp to register the application in our Aspire App Host.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddNpmApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"stencil"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"../photosearch-frontend"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;apiService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;osmTileService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithHttpEndpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;portMappings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"FEPort"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;PublicPort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;targetPort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;portMappings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"FEPort"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;PrivatePort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"PORT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;isProxied&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PublishAsDockerFile&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This even supports auto-reloading, so we can keep modifying the UI source code and see the changes reflected quickly. I have been using Rider for the Aspire project and VS Code for the Stencil and Python code.  &lt;/p&gt;

&lt;p&gt;The map component is simple and responsible for the following: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calling our .NET API endpoint to get the photos.&lt;/li&gt;
&lt;li&gt;Rendering the photos on the map. &lt;/li&gt;
&lt;li&gt;Raising an event when the user selects and views a photo on the map, so that the summary component can display the details and summaries of the selected photo. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The component encapsulates an open-source map rendering library called "MapLibre GL JS" and uses our OSM Tile Server container to render the tiles. As we can see below, service discovery is handled by Aspire, so we do not have to worry about manually updating URLs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="nx"&gt;imports&lt;/span&gt; &lt;span class="nx"&gt;omitted&lt;/span&gt; 

&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Component&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;map-component&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;styleUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;map-component.css&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;shadow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MapComponent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="nl"&gt;mapElement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;HTMLElement&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;photoSummaries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PhotoSummary&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nl"&gt;map&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nl"&gt;markers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="nx"&gt;Marker&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

  &lt;span class="nx"&gt;loadPhotos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;API_BASE_URL&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/photos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;photoSummaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;photoSummaries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;marker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Marker&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;draggable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;setLngLat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Longitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Latitude&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imgUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;API_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/image/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/1280/1280`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;marker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setPopup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Popup&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;apple-popup&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHTML&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&amp;lt;img src='&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;imgUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' data-id="&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;" loading="lazy"&amp;gt;&amp;lt;/img&amp;gt;`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="nx"&gt;marker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPopup&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;setMaxWidth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;300px&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;popupElem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;marker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElement&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="nx"&gt;popupElem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;click&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;PubSub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;EventNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PhotoSelected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;markers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;marker&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="nx"&gt;componentWillLoad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadPhotos&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;disconnectCallback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;markers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;componentDidLoad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StyleSpecification&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;osm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;raster&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;tiles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MAP_TILE_SERVER&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/tile/{z}/{x}/{y}.png`&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="na"&gt;tileSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;attribution&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.....&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;osm&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;raster&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;osm&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mapElement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;center&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;photoSummaries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;Longitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;photoSummaries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;Latitude&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;zoom&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;photoSummaries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;markers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;addTo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;map&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mapElement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;HTMLElement&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
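&lt;p&gt;The click handler above decouples the map from the photo list: the marker publishes a PhotoSelected event, and any interested component (such as photo-summary-view) subscribes to it. A minimal sketch of that publish / subscribe pattern (the small event bus below is an illustrative stand-in for the PubSub library used in the component):&lt;/p&gt;

```typescript
// Minimal typed publish / subscribe bus; an illustrative stand-in for the
// PubSub library the map component uses to announce a selected photo.
type Handler<T> = (data: T) => void;

class EventBus {
  private handlers = new Map<string, Handler<unknown>[]>();

  subscribe<T>(topic: string, handler: Handler<T>): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler as Handler<unknown>);
    this.handlers.set(topic, list);
  }

  publish<T>(topic: string, data: T): void {
    for (const handler of this.handlers.get(topic) ?? []) {
      handler(data);
    }
  }
}

// The map component publishes the clicked photo's summary; a sibling
// component such as photo-summary-view subscribes to react to it.
const bus = new EventBus();
const PhotoSelected = "PhotoSelected";

bus.subscribe<{ Id: string }>(PhotoSelected, (photo) => {
  // e.g. load and display the full-size photo here
  void photo.Id;
});
bus.publish(PhotoSelected, { Id: "42" });
```

&lt;p&gt;Because neither component holds a reference to the other, the map and the summary view can be developed and tested independently.&lt;/p&gt;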



&lt;h3&gt;
  
  
  Service discovery for the Stencil Application
&lt;/h3&gt;

&lt;p&gt;Given that Aspire injects the connection strings of the services we depend on, the Stencil configuration can read these variables and inject them into components as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;  &lt;span class="p"&gt;...,&lt;/span&gt;
  &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;API_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;services__apiservice__http__0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;MAP_TILE_SERVER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ConnectionStrings__OSMMapTileServer&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
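&lt;p&gt;Since these variables only exist when the Aspire AppHost orchestrates the build, it can help to fail fast when one is missing. A small sketch of that idea (the requireEnv helper and the sample values are hypothetical, not part of the project):&lt;/p&gt;

```typescript
// Hypothetical helper (not part of the project): read an Aspire-injected
// variable and fail fast when it is missing, so a misconfigured
// orchestration surfaces immediately instead of producing requests
// against an "undefined" base URL.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing Aspire-injected variable: ${name}`);
  }
  return value;
}

// Sample values standing in for process.env; the variable names mirror
// the mapping in the Stencil config above.
const env = {
  services__apiservice__http__0: "http://localhost:5000",
  ConnectionStrings__OSMMapTileServer: "http://localhost:8080",
};

const API_BASE_URL = requireEnv(env, "services__apiservice__http__0");
const MAP_TILE_SERVER = requireEnv(env, "ConnectionStrings__OSMMapTileServer");
```

&lt;p&gt;With such a guard, a missing service reference in the AppHost fails the front-end build with a descriptive message rather than a broken map at runtime.&lt;/p&gt;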



&lt;h3&gt;
  
  
  How to use the component?
&lt;/h3&gt;

&lt;p&gt;Once the components are ready, we can use them just like any other HTML elements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;flex mb-4 map-container&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;w-1/2 h-120&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;component&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/map-component&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;          &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;          &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;w-1/2   h-120&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;view&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/photo-summary-view&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;          &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Open Street Map (OSM) Tile Server .NET Aspire Resource
&lt;/h2&gt;

&lt;p&gt;We have a mapping web component, but without a map tile server we will not be able to render the maps. Although there are free tile servers for demos, it is better not to overload those shared resources and to consume our own computing power instead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hub.docker.com/r/overv/openstreetmap-tile-server/" rel="noopener noreferrer"&gt;OSM Tile Server Container&lt;/a&gt; makes this a simple job. And once we integrate with Aspire, we have a Tile Server running on demand without a complicated setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adjustments to the Tile Server Container
&lt;/h3&gt;

&lt;p&gt;As per the documentation, we would need to run the container once to download the maps into a volume, and then reuse the same volume and run the container again to host the map. The changes made in this post allow both actions to be performed at startup.&lt;/p&gt;

&lt;p&gt;The image built in this project will execute the following startup script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;cd&lt;/span&gt; /
./run.sh import
./run.sh run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Besides the above, there is no change to the original OSM Tile Server image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating an OSM Tile Server Resource
&lt;/h3&gt;

&lt;p&gt;The process for defining the resource is similar to the Nominatim resource covered in a previous post, so it will not be repeated here; as always, the source code is available in the &lt;a href="https://github.com/syamaner/photo-search" rel="noopener noreferrer"&gt;repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The sample images were all taken across London, so we use London maps only; this keeps the download small and lets the map database be set up quickly on the container's first start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where We Are and What's Next
&lt;/h2&gt;

&lt;p&gt;So far we have a means of importing photos and then processing them using various multi-modal machine learning models. Generative models will usually produce something when asked, but the results are often not what we want. Before deciding how to use these models, it is important to have an evaluation approach.&lt;/p&gt;

&lt;p&gt;Now that we have a basic UI, we can focus on evaluating these models to find the best prompt / model combination for our search application.&lt;/p&gt;

&lt;p&gt;This means we will need to version our results to include the prompt and model used, and then work out metrics that help us choose the combinations with the highest success rate.&lt;/p&gt;

&lt;p&gt;The initial method will be to compare results against those generated by models such as GPT-4 and see how our small / local models measure up.&lt;/p&gt;
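&lt;p&gt;To make that comparison concrete, here is a sketch of what such an evaluation could look like (the token-overlap metric and the result shape below are illustrative assumptions, not the final approach):&lt;/p&gt;

```typescript
// Illustrative sketch of a simple evaluation metric: score a local
// model's caption against a reference caption (e.g. one from GPT-4)
// by token overlap (Jaccard index). The metric and result shape are
// assumptions for illustration, not the article's final approach.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

function jaccard(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  const intersection = Array.from(ta).filter((t) => tb.has(t)).length;
  const union = new Set(Array.from(ta).concat(Array.from(tb))).size;
  return union === 0 ? 0 : intersection / union;
}

// Each result is versioned with the model and prompt that produced it,
// so prompt / model combinations can be ranked by mean score.
interface CaptionResult {
  model: string;
  prompt: string;
  caption: string;
}

function meanScore(results: CaptionResult[], reference: string): number {
  if (results.length === 0) return 0;
  const total = results.reduce((sum, r) => sum + jaccard(r.caption, reference), 0);
  return total / results.length;
}
```

&lt;p&gt;Because each result carries its model and prompt, the combinations can be ranked by their mean score against the reference captions.&lt;/p&gt;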

&lt;p&gt;This will be the focus of the next post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;Web Components&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://web.dev/articles/ps-on-the-web" rel="noopener noreferrer"&gt;Photoshop's journey to the web&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://eisenbergeffect.medium.com/2023-state-of-web-components-c8feb21d4f16" rel="noopener noreferrer"&gt;2023 State of Web Components&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arewebcomponentsathingyet.com/" rel="noopener noreferrer"&gt;Are Web Components a Thing Yet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://eisenbergeffect.medium.com/libraries-and-frameworks-and-platforms-oh-my-f77a0ec3d57d" rel="noopener noreferrer"&gt;Libraries and Frameworks and Platforms, Oh My!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://eisenbergeffect.medium.com/debunking-web-component-myths-and-misconceptions-ea9bb13daf61" rel="noopener noreferrer"&gt;Debunking Web Component Myths and Misconceptions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://eisenbergeffect.medium.com/the-many-faces-of-a-web-component-fd974e2b1ee6" rel="noopener noreferrer"&gt;The Many Faces of a Web Component&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Web Component Tooling&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://lit.dev/" rel="noopener noreferrer"&gt;Lit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stenciljs.com/" rel="noopener noreferrer"&gt;Stencil&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fast.design/" rel="noopener noreferrer"&gt;FAST&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.salesforce.com/docs/platform/lwc/guide" rel="noopener noreferrer"&gt;Salesforce - Lightning Web Components (LWC)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open Street Maps&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://switch2osm.org/serving-tiles/using-a-docker-container/" rel="noopener noreferrer"&gt;Serving OSM Tiles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Overv/openstreetmap-tile-server/tree/master" rel="noopener noreferrer"&gt;OSM Tile Server Image&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://switch2osm.org/" rel="noopener noreferrer"&gt;Switch2OSM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stencil&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ionic.io/blog/advanced-stencil-component-styling" rel="noopener noreferrer"&gt;Advanced Stencil Component Styling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dotnet</category>
      <category>webcomponents</category>
      <category>aspire</category>
      <category>openstreetmap</category>
    </item>
    <item>
      <title>Simplifying Remote Docker Container Connections in .NET Aspire with SSH.Net</title>
      <dc:creator>syamaner</dc:creator>
      <pubDate>Sun, 04 Aug 2024 18:44:27 +0000</pubDate>
      <link>https://dev.to/syamaner/simplifying-remote-docker-container-connections-in-net-aspire-with-sshnet-207</link>
      <guid>https://dev.to/syamaner/simplifying-remote-docker-container-connections-in-net-aspire-with-sshnet-207</guid>
      <description>&lt;p&gt;Previously, I have shared my experience so far in my journey towards .NET Aspire using Photo Search use case by the help of multi-modal models. The key part for me was being able to use remote docker host which has worked well out of the box for most part. My focus so far is on the local development environments and therefore exploring different ways to achieve the original goal of running containers remotely on local network using SSH.&lt;/p&gt;

&lt;p&gt;As a recap, it all started with the question of whether the scenario illustrated in the following image could be supported out of the box:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kqexteonfw9rz95ehn9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kqexteonfw9rz95ehn9.png" alt="Components overview" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this brief post, we will start with some of the challenges of the approach used so far, and then introduce the SSH Port forwarding method and how it improves on the shortcomings of the initial approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous Approach: Overriding Connection Strings When Using a Remote Host
&lt;/h2&gt;

&lt;p&gt;The approach so far has been as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the containers:

&lt;ul&gt;
&lt;li&gt;Ensure the container port binding uses 0.0.0.0 to listen on all interfaces so that it can be reached from the local network.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container.WithContainerRuntimeArgs("-p", $"0.0.0.0:{publicPort}:{publicPort}");&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;For Aspire AppHost

&lt;ul&gt;
&lt;li&gt;Ensure the DOCKER_HOST environment variable on the development machine is set.

&lt;ul&gt;
&lt;li&gt;The remote Docker host allows SSH connections from the development machine (e.g. ssh-copy-id has been run to add our public key to the remote server).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;If creating a custom resource, ensure the connection strings and exposed endpoints use the IP of the Docker Host instead of localhost.&lt;/li&gt;

&lt;li&gt;For built-in resources:

&lt;ul&gt;
&lt;li&gt;Either override the injected connection string environment variables.&lt;/li&gt;
&lt;li&gt;Or use connection string redirection.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The challenges with this approach are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Having to modify connection strings at runtime when using a remote Docker host can get messy as we add more resources.&lt;/li&gt;
&lt;li&gt;Connection string redirection succeeded for PostgreSQL but not for RabbitMQ, so we had to fall back to overriding environment variables.&lt;/li&gt;
&lt;li&gt;When we update the connection strings, the exposed endpoints in the Aspire Dashboard can still point to localhost, or be removed because they are no longer valid.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For instance, this has worked for redirecting the PostgreSQL connection string to the remote host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pgConnectionStringRedirection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;CustomPostgresConnectionStringRedirection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dbName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;publicPort&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;pgUsername&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pgPassword&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;postgresContainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithConnectionStringRedirection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pgConnectionStringRedirection&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  SSH.Net to the rescue
&lt;/h2&gt;

&lt;p&gt;As the Docker CLI uses SSH to run containers on the remote Docker daemon, instead of making containers listen on all interfaces we can do SSH port forwarding from our development machine to the Docker host machine. &lt;/p&gt;

&lt;p&gt;Wouldn't it be cool to orchestrate this from the Aspire AppHost project, so that forwarding starts when the development orchestration starts and stops when we spin down our development environment?&lt;/p&gt;

&lt;p&gt;The following diagram is a simplified illustration of this concept. The ports used by the containers are forwarded over SSH from the development machine to the Docker host. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc20cw56bt6veyzrfikd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc20cw56bt6veyzrfikd.png" alt="Using SSH Port Forwarding" width="786" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;Given that we already have a successful SSH connection that allows managing containers on a remote host, forwarding the container ports via SSH not only makes sense but also simplifies the setup. &lt;/p&gt;

&lt;p&gt;When the AppHost starts, we need to forward the ports as illustrated in the code snippet below. Instead of calling &lt;code&gt;WithContainerRuntimeArgs("-p", $"0.0.0.0:{publicPort}:{publicPort}");&lt;/code&gt; on the resources, we need to ensure the endpoints are not proxied, as SSH port forwarding takes care of this instead. &lt;/p&gt;

&lt;p&gt;Some advantages are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connection strings and endpoints can be used as-is, without having to transform or inject connection strings.&lt;/li&gt;
&lt;li&gt;No more calling &lt;code&gt;WithContainerRuntimeArgs("-p", $"0.0.0.0:{publicPort}:{publicPort}");&lt;/code&gt;: the containers listen on localhost, and there is no need to worry about exposing them on other network interfaces.&lt;/li&gt;
&lt;li&gt;Able to use the links on the Aspire Dashboard to access container resources.

&lt;ul&gt;
&lt;li&gt;Localhost on the local machine is forwarded to the right host / port, so this works transparently for us.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;
&lt;span class="c1"&gt;// ... usings&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DistributedApplication&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;dockerHost&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StartupHelper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetDockerHostValue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;enableNvidiaDocker&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StartupHelper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NvidiaDockerEnabled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// ... resources&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;apiService&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddProject&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Projects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PhotoSearch_API&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"apiservice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollamaContainer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postgresDb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WaitFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;backgroundWorker&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddProject&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Projects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PhotoSearch_Worker&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"backgroundservice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollamaContainer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postgresDb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;florence3Api&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nominatimContainer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WaitFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollamaContainer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WaitFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nominatimContainer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WaitFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// add ssh_user and ssh_key_file (path to the key file) for ser secrets.&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;sshUtility&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SShUtility&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dockerHost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Configuration&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"ssh_user"&lt;/span&gt;&lt;span class="p"&gt;]!,&lt;/span&gt;  &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Configuration&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"ssh_key_file"&lt;/span&gt;&lt;span class="p"&gt;]!);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;IsNullOrWhiteSpace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dockerHost&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Forwards the ports to the docker host machine&lt;/span&gt;
    &lt;span class="n"&gt;sshUtility&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="c1"&gt;// PgAdmin&lt;/span&gt;
    &lt;span class="n"&gt;sshUtility&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddForwardedPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;8081&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;8081&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Postgres&lt;/span&gt;
    &lt;span class="n"&gt;sshUtility&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddForwardedPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// RabbitMQ&lt;/span&gt;
    &lt;span class="n"&gt;sshUtility&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddForwardedPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;5672&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;5672&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// RabbitMQ Management&lt;/span&gt;
    &lt;span class="n"&gt;sshUtility&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddForwardedPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;15672&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;15672&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Nominatim&lt;/span&gt;
    &lt;span class="n"&gt;sshUtility&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddForwardedPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;8180&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;8180&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Ollama&lt;/span&gt;
    &lt;span class="n"&gt;sshUtility&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddForwardedPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;11438&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;11438&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we can see below, the dashboard URLs point to localhost and are fully functional even though the containers are running remotely. Previously, the links in the dashboard were either blank (the endpoints had been removed because they were not valid) or incorrectly listed as localhost, even though the container services actually had to be reached via the IP address of the Docker host.&lt;/p&gt;

&lt;p&gt;With SSH port forwarding in place, this issue is resolved.&lt;/p&gt;
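&lt;p&gt;For reference, the kind of forwarding that &lt;code&gt;SShUtility&lt;/code&gt; presumably wraps can be sketched directly with SSH.NET's &lt;code&gt;ForwardedPortLocal&lt;/code&gt;. This is a minimal sketch, not the utility's actual implementation; the host name, user name and key path are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;using Renci.SshNet;

// Authenticate to the remote Docker host with a private key (placeholder values).
using var client = new SshClient("docker-host", "user", new PrivateKeyFile("/path/to/key"));
client.Connect();

// Listen on localhost:8081 and tunnel traffic to port 8081 on the remote host (PgAdmin).
var forwarded = new ForwardedPortLocal("127.0.0.1", 8081, "127.0.0.1", 8081);
client.AddForwardedPort(forwarded);
forwarded.Start();

// The tunnel stays open as long as the client is connected;
// disposing the client tears down all forwarded ports.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One such tunnel per service is enough for the dashboard links to resolve against localhost.&lt;/p&gt;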

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vi3yrl7j2dafdhwqscx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vi3yrl7j2dafdhwqscx.png" alt="aspire Dashboard" width="800" height="134"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;When using a remote Docker host, we can also forward ports via SSH, which simplifies the setup and requires less code than overriding connection strings for each application. When the containers run locally, we skip the port forwarding and let Aspire endpoints do the job as intended.&lt;/p&gt;

&lt;p&gt;This also means the setup supports GPU hosting providers such as &lt;a href="https://lambdalabs.com/service/gpu-cloud" rel="noopener noreferrer"&gt;Lambda Labs GPU Cloud&lt;/a&gt;: we can run containers for a few hours to experiment with large models that require GPUs with large VRAM, and pay only for what we use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/syamaner/photo-search" rel="noopener noreferrer"&gt;Current code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/sshnet/SSH.NET" rel="noopener noreferrer"&gt;SSH .Net&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dotnet</category>
      <category>aspire</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
