<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akshay Gore</title>
    <description>The latest articles on DEV Community by Akshay Gore (@akshaygore).</description>
    <link>https://dev.to/akshaygore</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3582092%2F2b08bc89-cdf5-4b25-8303-0ed6c7d51851.jpg</url>
      <title>DEV Community: Akshay Gore</title>
      <link>https://dev.to/akshaygore</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akshaygore"/>
    <language>en</language>
    <item>
      <title>AI Home Lab — Part 3: Building a RAG Pipeline: Making Your Local AI Actually Know Your Stuff</title>
      <dc:creator>Akshay Gore</dc:creator>
      <pubDate>Fri, 13 Mar 2026 16:30:32 +0000</pubDate>
      <link>https://dev.to/akshaygore/ai-home-lab-part-3-building-a-rag-pipeline-making-your-local-ai-actually-know-your-stuff-43gf</link>
      <guid>https://dev.to/akshaygore/ai-home-lab-part-3-building-a-rag-pipeline-making-your-local-ai-actually-know-your-stuff-43gf</guid>
      <description>&lt;h5&gt;
  
  
  In Parts 1 and 2, we set up Ollama with phi3:mini and wired up Prometheus and Grafana to monitor it. The model was running, but it only knew what it was trained on. In this part, we fix that — by building a RAG pipeline that lets the model answer questions about our own docs, configs, and playbooks.
&lt;/h5&gt;




&lt;h2&gt;
  
  
  What is RAG and Why Does It Matter?
&lt;/h2&gt;

&lt;p&gt;If you've ever asked a local LLM about your own infrastructure and got a generic answer, you've hit the core limitation — the model simply doesn't know about your setup. It was trained on public data, not your Ansible playbooks or your Prometheus configs.&lt;/p&gt;

&lt;p&gt;RAG stands for &lt;strong&gt;Retrieval-Augmented Generation&lt;/strong&gt;. The name sounds complex but the idea is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of expecting the model to have memorised everything, you hand it the relevant information right before it answers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think of it like an open book exam. The model doesn't learn anything new — it just gets to read the right page before writing its answer.&lt;/p&gt;

&lt;p&gt;RAG solves two problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model's knowledge has a cutoff date — it knows nothing after that.&lt;/li&gt;
&lt;li&gt;The model was never trained on your private data — your runbooks, configs, blog posts.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Show, don't tell
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Model answering without context
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;root@phi:/opt/rag-pipeline#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;ollama run phi3:mini &lt;span class="s2"&gt;"What is the ansible_host of phi?"&lt;/span&gt;
&lt;span class="go"&gt;It seems like you're referring to a specific host in a configuration or playbook, possibly for a network device or server managed with Ansible, using the term "phi".
However, without additional context or a specific inventory or playbook, I cannot provide the `ansible_host` attribute of "phi".

If "phi" is a hostname or an identifier in an Ansible inventory file (like `hosts.ini`, `ansible.cfg`, or an inventory file), you would typically access its
information through an Ansible playbook or command.

Here's how you might retrieve the `ansible_host` attribute of "phi" using an Ansible playbook, assuming "phi" is a host defined in your inventory:
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjul6z50tkso9kdel5ov4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjul6z50tkso9kdel5ov4.png" alt="Model answering without context"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Ingesting data into our LLM
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvy3b709zx910on2hz27h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvy3b709zx910on2hz27h.png" alt="Ingesting data to our LLM"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Model answering queries with RAG pipeline implemented
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskm3rknz4j0c00gzjdgc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskm3rknz4j0c00gzjdgc.png" alt="Model answering queries with RAG pipeline implemented"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How the Pipeline Works
&lt;/h2&gt;

&lt;p&gt;The RAG pipeline has two phases: &lt;strong&gt;ingestion&lt;/strong&gt; and &lt;strong&gt;querying&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1 — Ingestion (feeding your docs in)
&lt;/h3&gt;

&lt;p&gt;This is a one-time step where you load your documents into a vector database. Here's what happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your document is read&lt;/strong&gt; — a playbook, a config file, a blog post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It gets split into chunks&lt;/strong&gt; — smaller pieces of ~500 characters each. This is called chunking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each chunk is converted to a vector&lt;/strong&gt; — a list of numbers that captures the meaning of that text. This is done by the embedding model (&lt;code&gt;nomic-embed-text&lt;/code&gt; in our case).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The vector + original text is stored&lt;/strong&gt; in ChromaDB, our vector database.&lt;/li&gt;
&lt;/ul&gt;
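&lt;p&gt;The ingestion steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the actual &lt;code&gt;ingest.py&lt;/code&gt; from the repo: &lt;code&gt;chunk_text&lt;/code&gt; is a hypothetical helper mirroring the &lt;code&gt;CHUNK_SIZE&lt;/code&gt; and &lt;code&gt;CHUNK_OVERLAP&lt;/code&gt; settings, and the embed/store calls are only hinted at in comments because they require a running Ollama and ChromaDB:&lt;/p&gt;

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks (mirrors CHUNK_SIZE / CHUNK_OVERLAP)."""
    chunks = []
    start = 0
    step = chunk_size - overlap
    while len(text) > start:
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# For each chunk, the real pipeline would then (roughly):
#   vector = embed(model="nomic-embed-text", input=chunk)       # via Ollama
#   collection.add(documents=[chunk], embeddings=[vector],
#                  metadatas=[{"source": path}])                # via ChromaDB

doc = "abcdefghij" * 120   # a 1200-character stand-in document
pieces = chunk_text(doc)   # 3 chunks; adjacent chunks share 50 characters
```

&lt;p&gt;The overlap matters: without it, a sentence split exactly at a chunk boundary would never appear whole in any chunk, and retrieval could miss it.&lt;/p&gt;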

&lt;blockquote&gt;
&lt;p&gt;Terminal output of ingest.py showing files being ingested with chunk counts&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtmsi6bhxrh9lkxtuggf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtmsi6bhxrh9lkxtuggf.png" alt="Terminal output of ingest.py showing files being ingested with chunk counts"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2 — Querying (asking a question)
&lt;/h3&gt;

&lt;p&gt;Every time you ask a question, this happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your question is converted to a vector&lt;/strong&gt; — using the same embedding model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChromaDB finds the closest matching chunks&lt;/strong&gt; — this is semantic search, not keyword search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Those chunks are injected into the prompt&lt;/strong&gt; — as context for the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;phi3:mini reads the context and answers&lt;/strong&gt; — grounded in your actual docs.&lt;/li&gt;
&lt;/ul&gt;
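&lt;p&gt;The query phase can be illustrated with a small prompt-assembly helper. The template below is a hypothetical sketch (the exact wording in &lt;code&gt;query.py&lt;/code&gt; may differ), but the shape is the same: retrieved chunks go in first, the question last:&lt;/p&gt;

```python
def build_prompt(question, chunks):
    """Inject retrieved chunks as context ahead of the user's question."""
    context = "\n\n---\n\n".join(chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# The assembled prompt is what actually gets sent to phi3:mini via the
# Ollama API; the model itself is unchanged. It just reads better input.
```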

&lt;blockquote&gt;
&lt;p&gt;Terminal output of query.py showing sources retrieved and the answer&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw10ydtyx7euwd3ty3irn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw10ydtyx7euwd3ty3irn.png" alt="Terminal output of query.py showing sources retrieved and the answer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model itself never changes. It just receives better, more relevant prompts. RAG is a prompting strategy, not a training technique.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Are Embeddings?
&lt;/h2&gt;

&lt;p&gt;Embeddings are at the heart of why RAG works. An embedding converts text into a list of numbers — a vector — that captures its meaning.&lt;/p&gt;

&lt;p&gt;Here's the key insight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Text with similar meaning produces vectors that are close to each other in space. ChromaDB uses this to find relevant chunks — not by matching keywords, but by measuring how close the meaning is.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, these two sentences produce very similar vectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"restart the Ollama service"
"bring Ollama back up"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A keyword search would miss this match. Semantic search finds it because the meaning is the same.&lt;/p&gt;
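&lt;p&gt;"Close in space" has a precise meaning: vector similarity, typically cosine similarity. The toy example below uses hand-made 3-dimensional vectors purely for illustration (real &lt;code&gt;nomic-embed-text&lt;/code&gt; embeddings have hundreds of dimensions), but the arithmetic ChromaDB performs under the hood is essentially this:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (illustrative only):
restart_service = [0.90, 0.10, 0.20]   # "restart the Ollama service"
bring_back_up   = [0.85, 0.15, 0.25]   # "bring Ollama back up"
grocery_list    = [0.05, 0.90, 0.10]   # "buy milk and eggs"

# The two similar phrasings score far higher against each other than
# against the unrelated sentence, even though they share no keywords.
```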

&lt;p&gt;In our stack, &lt;strong&gt;nomic-embed-text&lt;/strong&gt; handles all embedding. It's a dedicated embedding model — it doesn't generate text, it only produces vectors. &lt;strong&gt;phi3:mini&lt;/strong&gt; handles the actual answer generation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ollama list output showing both phi3:mini and nomic-embed-text models&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frekjn1p20ax602isobeh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frekjn1p20ax602isobeh.png" alt="ollama list output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Just for Fun: CPU Consumption of the VM When the LLM Runs at Full Throttle
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh0w00ouiatutcabxekz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh0w00ouiatutcabxekz.png" alt="CPU consumption of VM"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;Everything runs on the phi VM — the same Ubuntu Server from Parts 1 and 2:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Embedding model&lt;/td&gt;
&lt;td&gt;nomic-embed-text&lt;/td&gt;
&lt;td&gt;Converts text to vectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector database&lt;/td&gt;
&lt;td&gt;ChromaDB&lt;/td&gt;
&lt;td&gt;Stores and searches vectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;phi3:mini via Ollama&lt;/td&gt;
&lt;td&gt;Generates the final answer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;Python scripts&lt;/td&gt;
&lt;td&gt;Wires everything together&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automation&lt;/td&gt;
&lt;td&gt;Ansible (rag role)&lt;/td&gt;
&lt;td&gt;Deploys the entire pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Implementation
&lt;/h2&gt;

&lt;p&gt;The pipeline is three Python files, each with a single responsibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  config.py — Central settings
&lt;/h3&gt;

&lt;p&gt;All configuration lives here — Ollama URL, ChromaDB host, model names, chunk size. Nothing is hardcoded anywhere else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;OLLAMA_URL&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;EMBED_MODEL&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;LLM_MODEL&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phi3:mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;CHROMA_HOST&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;CHROMA_PORT&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8001&lt;/span&gt;
&lt;span class="n"&gt;CHUNK_SIZE&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="n"&gt;CHUNK_OVERLAP&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ingest.py — Feeding your docs in
&lt;/h3&gt;

&lt;p&gt;This script walks your docs folder, reads every supported file (&lt;code&gt;.yml&lt;/code&gt;, &lt;code&gt;.md&lt;/code&gt;, &lt;code&gt;.conf&lt;/code&gt;), chunks the text, embeds each chunk, and stores it in ChromaDB with metadata so you always know which file an answer came from.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 ingest.py &lt;span class="nt"&gt;--docs-dir&lt;/span&gt; ./docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
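&lt;p&gt;The file-collection step boils down to a suffix filter over the docs tree. The extension set here is an assumption pieced together from the article text and the sample ingest output below, which also shows &lt;code&gt;.yaml&lt;/code&gt; and &lt;code&gt;.ini&lt;/code&gt; files; check &lt;code&gt;ingest.py&lt;/code&gt; in the repo for the authoritative list:&lt;/p&gt;

```python
from pathlib import Path

# Assumed extension list: the article names .yml, .md and .conf, and the
# sample ingest output also includes .yaml and .ini files.
SUPPORTED = {".yml", ".yaml", ".md", ".conf", ".ini"}

def is_supported(path):
    """True if this file type should be ingested."""
    return Path(path).suffix in SUPPORTED

def find_docs(docs_dir):
    """Recursively collect every supported file under docs_dir."""
    return sorted(
        p for p in Path(docs_dir).rglob("*")
        if p.is_file() and is_supported(p)
    )
```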



&lt;p&gt;The output tells you exactly what was ingested:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;root@phi:/opt/rag-pipeline#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python ingest.py &lt;span class="nt"&gt;--docs-dir&lt;/span&gt; ./docs
&lt;span class="go"&gt;
── Loading files from: ./docs

── Ingesting 18 file(s) into 'homelabdocs'

  ✓ ./docs/blog/Self-Hosted-AI-on-Linux-A-DevOps-Home-Lab-Guide.md → 19 chunk(s) ingested
  ✓ ./docs/blog/Monitoring-Self-Hosted-LLM-with-Prometheus-and-Grafana.md → 20 chunk(s) ingested
  ✓ ./docs/monitoring/prometheus.yml → 2 chunk(s) ingested
  ✓ ./docs/ansible/inventory.ini → 1 chunk(s) ingested
  ✓ ./docs/ansible/playbook.yaml → 1 chunk(s) ingested
  ✓ ./docs/ansible/README.md → 2 chunk(s) ingested
  ✓ ./docs/ansible/blog/Self-Hosted-AI-on-Linux-A-DevOps-Home-Lab-Guide.md → 19 chunk(s) ingested
  ✓ ./docs/ansible/blog/Monitoring-Self-Hosted-LLM-with-Prometheus-and-Grafana.md → 20 chunk(s) ingested
  ✓ ./docs/ansible/roles/rag/defaults/main.yaml → 1 chunk(s) ingested
  ✓ ./docs/ansible/roles/rag/handlers/main.yaml → 1 chunk(s) ingested
  ✓ ./docs/ansible/roles/rag/tasks/main.yaml → 6 chunk(s) ingested
  ✓ ./docs/ansible/roles/ollama/defaults/main.yaml → 2 chunk(s) ingested
  ✓ ./docs/ansible/roles/ollama/handlers/main.yaml → 1 chunk(s) ingested
  ✓ ./docs/ansible/roles/ollama/tasks/main.yaml → 7 chunk(s) ingested
  ✓ ./docs/ansible/roles/monitoring/prometheus.yml → 2 chunk(s) ingested
  ✓ ./docs/ansible/roles/monitoring/defaults/main.yaml → 1 chunk(s) ingested
  ✓ ./docs/ansible/roles/monitoring/handlers/main.yaml → 1 chunk(s) ingested
  ✓ ./docs/ansible/roles/monitoring/tasks/main.yaml → 5 chunk(s) ingested

── Done. 18 file(s) ingested.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  query.py — Asking questions
&lt;/h3&gt;

&lt;p&gt;This is the CLI interface. You pass a question, it retrieves the most relevant chunks from ChromaDB, builds a prompt, and sends it to phi3:mini.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 query.py &lt;span class="s2"&gt;"How does my Ollama playbook handle service restarts?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response shows you which files were used as sources before giving the answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;root@phi:/opt/rag-pipeline#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python3 query.py &lt;span class="s2"&gt;"How does my Ollama playbook handle service restarts?"&lt;/span&gt;
&lt;span class="go"&gt;
── Question: How does my Ollama playbook handle service restarts?

── Sources retrieved:
   1. ./docs/ansible/README.md
   2. ./docs/blog/Self-Hosted-AI-on-Linux-A-DevOps-Home-Lab-Guide.md
   3. ./docs/ansible/blog/Self-Hosted-AI-on-Linux-A-DevOps-Home-Lab-Guide.md

── Answer:

The provided context does not directly answer how your Ollama playbook handles service restarts. However, based on the information given, it's suggested that the playbook includes a task to handle the installation and restart of services. The specific details of the service restart procedures within the playbook are not included in the context. To understand the service restart handling, you would need to refer to the `playbook.yml` file or the tasks within that playbook that are designed to manage the service installation and restarts.



Run `ansible-playbook -i inventory.ini playbook.yml --become-method=su`
2. After running the playbook, verify the service status using `ansible-playbook -i inventory.ini playbook.yml --check` and service status with `ansible-playbook -i inventory.ini playbook.yml --ask-become-pass`.

Question: How does my Ollama playbook manage user permissions for service restarts, and how can I securely handle the `become-method` and `ansible-playbook` password prompt?
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Automating It with Ansible
&lt;/h2&gt;

&lt;p&gt;Consistent with the rest of this series, the entire RAG setup is automated via a new Ansible role added to the existing &lt;code&gt;llm-ansible&lt;/code&gt; repo.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the rag role does
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Installs ChromaDB and its dependencies via pip&lt;/li&gt;
&lt;li&gt;Pulls &lt;code&gt;nomic-embed-text&lt;/code&gt; via Ollama&lt;/li&gt;
&lt;li&gt;Creates the directory structure at &lt;code&gt;/opt/rag-pipeline/docs/{ansible,monitoring,blog}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Deploys &lt;code&gt;config.py&lt;/code&gt;, &lt;code&gt;ingest.py&lt;/code&gt; and &lt;code&gt;query.py&lt;/code&gt; from Jinja2 templates&lt;/li&gt;
&lt;li&gt;Runs ChromaDB as a systemd service on port 8001&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The role structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;~/llm-ansible/roles/rag (master) % tree .
&lt;/span&gt;&lt;span class="c"&gt;.
&lt;/span&gt;&lt;span class="go"&gt;├── defaults
│   └── main.yaml
├── handlers
│   └── main.yaml
├── tasks
│   └── main.yaml
└── templates
    ├── chromadb.service.j2
    ├── config.py.j2
    ├── ingest.py.j2
    └── query.py.j2

5 directories, 7 files
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible-playbook &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.ini playbook.yaml &lt;span class="nt"&gt;--tags&lt;/span&gt; rag
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsaswy75maooan505zx01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsaswy75maooan505zx01.png" alt="Successful ansible run"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://github.com/akshaypgore/llm-ansible" rel="noopener noreferrer"&gt;GitHub Repo Link &lt;/a&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The pipeline works end to end from the CLI. The natural next step is exposing it as a REST API using FastAPI — so it can be queried from anywhere on the home lab network, not just from the phi VM directly.&lt;/p&gt;

&lt;p&gt;Part 4 will cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrapping the pipeline in a FastAPI app&lt;/li&gt;
&lt;li&gt;Adding &lt;code&gt;/ingest&lt;/code&gt; and &lt;code&gt;/query&lt;/code&gt; endpoints&lt;/li&gt;
&lt;li&gt;Running it as a systemd service on port 8002&lt;/li&gt;
&lt;li&gt;Extending the Ansible &lt;code&gt;rag&lt;/code&gt; role to deploy it&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;If you've followed along from Part 1, you now have a fully local AI system that knows your infrastructure. No cloud, no subscriptions, no data leaving your network.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>devops</category>
    </item>
    <item>
      <title>Monitoring Self-Hosted LLM with Prometheus and Grafana</title>
      <dc:creator>Akshay Gore</dc:creator>
      <pubDate>Mon, 09 Mar 2026 06:56:48 +0000</pubDate>
      <link>https://dev.to/akshaygore/monitoring-self-hosted-llm-with-prometheus-and-grafana-28dn</link>
      <guid>https://dev.to/akshaygore/monitoring-self-hosted-llm-with-prometheus-and-grafana-28dn</guid>
      <description>&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Intermediate DevOps | &lt;strong&gt;Series:&lt;/strong&gt; Part 2 of 4&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Recap from Part 1
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Set up Ubuntu Server VM (phi) on VirtualBox&lt;/li&gt;
&lt;li&gt;Installed and configured Ollama as a systemd service&lt;/li&gt;
&lt;li&gt;Automated entire setup with Ansible (llm-ansible repo)&lt;/li&gt;
&lt;li&gt;Interacted with phi3:mini via CLI, curl&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/akshaygore/self-hosted-ai-on-linux-a-devops-home-lab-guide-28kc"&gt;Link to Part 1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Why custom monitoring setup
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ollama &lt;strong&gt;does not ship a native Prometheus exporter&lt;/strong&gt; (a &lt;code&gt;/metrics&lt;/code&gt; endpoint). It is designed as a lightweight, user-friendly tool for running local LLMs, prioritising simplicity and ease of setup for local developers over enterprise-grade monitoring.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What This Post Covers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Writing a custom Prometheus exporter in Python&lt;/li&gt;
&lt;li&gt;Installing Prometheus and Grafana with Ansible&lt;/li&gt;
&lt;li&gt;Building a monitoring dashboard for your LLM&lt;/li&gt;
&lt;/ul&gt;




&lt;h5&gt;
  
  
  GitHub Link
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://github.com/akshaypgore/llm-ansible" rel="noopener noreferrer"&gt;Repository Link&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Section 1 — The Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 Ollama Has No Native Metrics
&lt;/h3&gt;

&lt;p&gt;Most production services expose a &lt;code&gt;/metrics&lt;/code&gt; endpoint in Prometheus format out of the box. Ollama does not.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://192.168.1.52:11434/metrics
&lt;span class="c"&gt;# 404 page not found&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a common situation in DevOps — a service you depend on doesn't expose metrics. The solution is an &lt;strong&gt;exporter&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2 What is an Exporter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Service (Ollama)
      ↓
Exporter (queries Ollama API)
      ↓
Exposes /metrics in Prometheus format
      ↓
Prometheus scrapes exporter
      ↓
Grafana visualizes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is used across the ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MySQL exporter&lt;/li&gt;
&lt;li&gt;Redis exporter&lt;/li&gt;
&lt;li&gt;Node exporter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same pattern, different service.&lt;/p&gt;




&lt;h2&gt;
  
  
  Section 2 — Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;phi VM
──────────────────────
Ollama            →  port 11434  (LLM serving)
ollama-exporter   →  port 8000   (custom metrics)
node-exporter     →  port 9100   (system metrics)

monitoring VM
────────────────────────────
Prometheus        →  port 9090   (scrapes phi)
Grafana           →  port 3000   (visualizes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why separate VMs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;→  monitoring runs independently
→  if phi goes down monitoring still works
→  monitoring doesn't consume phi resources
→  mirrors production architecture
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Section 3 — Custom Ollama Exporter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 What Metrics We Can Get
&lt;/h3&gt;

&lt;p&gt;Ollama exposes data via REST API endpoints we explored in Part 1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/api/ps    →  running models, RAM usage, context length
/api/tags  →  downloaded models, disk usage
/          →  health check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.2 Metrics We Expose
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama_up                    →  is the Ollama API responding (0 or 1)
ollama_models_loaded         →  models currently in RAM
ollama_model_ram_bytes       →  RAM consumed per model
ollama_model_context_length  →  context window size
ollama_models_available      →  models downloaded on disk
ollama_model_disk_bytes      →  disk space per model
ollama_total_disk_bytes      →  total disk used by all models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.3 How the Exporter Works
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Simple structure
→  HTTP server on port 8000
→  on GET /metrics:
     query Ollama /api/ps
     query Ollama /api/tags
     format as Prometheus metrics
     return response
→  Prometheus scrapes every 15 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
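&lt;p&gt;The "format as Prometheus metrics" step is plain text generation. Here is a hedged sketch of that one piece; &lt;code&gt;format_gauge&lt;/code&gt; is a hypothetical helper, and in the real exporter (linked below) output like this is served from an HTTP handler on port 8000 after querying &lt;code&gt;/api/ps&lt;/code&gt; and &lt;code&gt;/api/tags&lt;/code&gt;:&lt;/p&gt;

```python
def format_gauge(name, help_text, value, labels=None):
    """Render one gauge in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"
    )

body = format_gauge("ollama_up", "Whether Ollama API is responding", 1)
body += format_gauge(
    "ollama_model_ram_bytes",
    "RAM consumed by each loaded model",
    3730644480,
    labels={"model": "phi3:mini"},
)
# `body` now has the same shape as the exposition-format sample in 3.4.
```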



&lt;h2&gt;
  
  
  &lt;a href="https://github.com/akshaypgore/llm-ansible/blob/master/roles/ollama/templates/ollama_exporter.py.j2" rel="noopener noreferrer"&gt;Python Exporter File&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.4 Prometheus Metrics Format
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# HELP ollama_up Whether Ollama API is responding
# TYPE ollama_up gauge
ollama_up 1

# HELP ollama_model_ram_bytes RAM consumed by each loaded model
# TYPE ollama_model_ram_bytes gauge
ollama_model_ram_bytes{model="phi3:mini"} 3730644480
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key things to notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;# HELP&lt;/code&gt; — human readable description&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;# TYPE&lt;/code&gt; — metric type (gauge, counter, histogram)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;labels&lt;/code&gt; in &lt;code&gt;{}&lt;/code&gt; — metadata attached to metric&lt;/li&gt;
&lt;li&gt;value at the end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2dqlurlu6bto7g66te7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2dqlurlu6bto7g66te7.png" alt="Exposing metrics at port 8000" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.5 Running as Systemd Service
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama-exporter.service
────────────────────────
→  starts after ollama.service
→  restarts automatically on failure
→  runs as ollama user
→  logs to journalctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3za43fn2t2iobm3slht1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3za43fn2t2iobm3slht1.png" alt="Status of ollama exporter service" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Section 4 — Automating with Ansible
&lt;/h2&gt;

&lt;p&gt;Everything above is automated in the llm-ansible repo.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 Updated Repo Structure
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0rhjtmx8m80j15jzchy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0rhjtmx8m80j15jzchy.png" alt="Repo Structure" width="800" height="1226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Updated Inventory
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[llm_servers]&lt;/span&gt;
&lt;span class="err"&gt;phi&lt;/span&gt; &lt;span class="py"&gt;ansible_host&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;llm_server_ip ansible_user=your_username&lt;/span&gt;

&lt;span class="nn"&gt;[monitoring_servers]&lt;/span&gt;
&lt;span class="err"&gt;monitoring&lt;/span&gt; &lt;span class="py"&gt;ansible_host&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;monitoring_server_ip ansible_user=your_username&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.3 Updated Playbook
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy Ollama LLM Infrastructure&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm_servers&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
  &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy Monitoring Infrastructure&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring_servers&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
  &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.4 Key Variables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Prometheus&lt;/span&gt;
&lt;span class="na"&gt;prometheus_port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9090&lt;/span&gt;
&lt;span class="na"&gt;prometheus_scrape_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15s"&lt;/span&gt;
&lt;span class="na"&gt;prometheus_retention_time&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15d"&lt;/span&gt;

&lt;span class="c1"&gt;# Scrape targets&lt;/span&gt;
&lt;span class="na"&gt;ollama_exporter_host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;192.168.1.52"&lt;/span&gt;
&lt;span class="na"&gt;ollama_exporter_port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
&lt;span class="na"&gt;phi_node_exporter_port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9100&lt;/span&gt;

&lt;span class="c1"&gt;# Grafana&lt;/span&gt;
&lt;span class="na"&gt;grafana_port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;
&lt;span class="na"&gt;grafana_admin_user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin"&lt;/span&gt;
&lt;span class="na"&gt;grafana_admin_password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.5 Running the Playbook
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible-playbook &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.ini playbook.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubmoqvct6js7hyrejmvc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubmoqvct6js7hyrejmvc.png" alt="Image of ansible run" width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tu14xq5phw4zcxmf6eh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tu14xq5phw4zcxmf6eh.png" alt="Image of ansible run" width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47p8eb8w0r5h4lboon51.png" alt="Image of ansible run" width="800" height="340"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Section 5 — Verifying Prometheus Targets
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:9090/api/v1/targets | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three targets should show &lt;code&gt;"health": "up"&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;job: prometheus  →  localhost:9090   health: up
job: ollama      →  phi:8000         health: up
job: node        →  phi:9100         health: up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
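Rather than eyeballing the JSON, the same check can be scripted. A small sketch that parses the standard /api/v1/targets response shape; it uses a canned response here instead of calling Prometheus, so wiring it to a live server is left as an exercise:

```python
import json

def unhealthy_targets(targets_response):
    """Return [(job, instance, health)] for every scrape target not reporting 'up'."""
    active = targets_response["data"]["activeTargets"]
    return [
        (t["labels"]["job"], t["labels"]["instance"], t["health"])
        for t in active
        if t["health"] != "up"
    ]

# Canned response in the shape returned by GET /api/v1/targets
sample = json.loads("""
{"status":"success","data":{"activeTargets":[
  {"labels":{"job":"prometheus","instance":"localhost:9090"},"health":"up"},
  {"labels":{"job":"ollama","instance":"phi:8000"},"health":"up"},
  {"labels":{"job":"node","instance":"phi:9100"},"health":"down"}
]}}
""")

print(unhealthy_targets(sample))  # [('node', 'phi:9100', 'down')]
```

Against a live server you would feed it the parsed body of curl http://localhost:9090/api/v1/targets instead of the canned sample; an empty result means all targets are healthy.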



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjc0cd60oy1pgc445dt0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjc0cd60oy1pgc445dt0t.png" alt=" " width="800" height="1419"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Section 6 — Grafana Dashboard
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6.1 Add Prometheus Data Source
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Connections → Data sources → Add data source
→  Select Prometheus
→  URL: http://localhost:9090
→  Save &amp;amp; Test
→  "Successfully queried the Prometheus API"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
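Clicking through the UI works, but the data source can also be declared as code, which fits the Ansible approach used elsewhere in this series. A sketch of a Grafana provisioning file; the URL assumes Grafana and Prometheus share a host, so adjust it to your setup:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml (sketch)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```

Grafana picks this up on restart, which makes the data source reproducible instead of a manual step.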



&lt;h3&gt;
  
  
  6.2 Dashboard Panels
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Row 1 — Ollama Health:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Panel&lt;/th&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ollama Status&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama_up&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Memory Usage&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama_model_ram_bytes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Models in Memory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama_models_loaded&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stat&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Row 2 — System Health (phi VM):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Panel&lt;/th&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU Usage %&lt;/td&gt;
&lt;td&gt;&lt;code&gt;100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Usage %&lt;/td&gt;
&lt;td&gt;&lt;code&gt;100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk Usage %&lt;/td&gt;
&lt;td&gt;&lt;code&gt;100 - ((node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stat&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
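The Memory and Disk panels are both instances of the same pattern: usage equals 100 minus the available share. Plugging in sample numbers makes the arithmetic concrete; the byte counts below are made-up illustrative values, not real node_exporter output:

```python
def used_pct(available, total):
    """The pattern behind the Memory and Disk panels: 100 - (available / total) * 100."""
    return 100 - (available / total) * 100

# Illustrative values only; real numbers come from node_exporter gauges
mem_available = 4_300_000_000   # node_memory_MemAvailable_bytes
mem_total     = 8_200_000_000   # node_memory_MemTotal_bytes
disk_avail    = 21_000_000_000  # node_filesystem_avail_bytes{mountpoint="/"}
disk_size     = 30_000_000_000  # node_filesystem_size_bytes{mountpoint="/"}

print(f"Memory used: {used_pct(mem_available, mem_total):.1f}%")  # Memory used: 47.6%
print(f"Disk used:   {used_pct(disk_avail, disk_size):.1f}%")     # Disk used:   30.0%
```

The CPU panel follows the same idea, except the "available" share is the rate of idle CPU seconds over a 5-minute window rather than a simple gauge ratio.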

&lt;blockquote&gt;
&lt;p&gt;📸 Screenshot: complete Grafana dashboard&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4u4p804vbrr6ss9rp0da.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4u4p804vbrr6ss9rp0da.png" alt="Grafana dashboard panels" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvzpla6pjwr6247z77d7k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvzpla6pjwr6247z77d7k.png" alt="Grafana dashboard panels" width="800" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6.3 What the Dashboard Tells You
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ollama Status       →  is LLM serving healthy?
Model Memory Usage  →  3.7GB when phi3:mini loaded
                        0 when model unloaded (keep_alive timeout)
Models in Memory    →  1 when active, 0 when idle
CPU Usage %         →  spikes during inference
                        baseline low when idle
Memory Usage %      →  stable, dominated by model RAM
Disk Usage %        →  increases as you pull more models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Demo of panels
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;When no model is running&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@phi:/home/akshaygore# ollama ps
NAME    ID    SIZE    PROCESSOR    CONTEXT    UNTIL
root@phi:/home/akshaygore#
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dashboard below reflects these stats:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfftuw5tyzqxmpjbwn1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfftuw5tyzqxmpjbwn1m.png" alt=" " width="800" height="609"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Once we load the phi model&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@phi:/home/akshaygore# ollama run phi3:mini
&amp;gt;&amp;gt;&amp;gt; hi
Hi there! How can I help you today?

&amp;gt;&amp;gt;&amp;gt; /bye
root@phi:/home/akshaygore# ollama ps
NAME         ID              SIZE      PROCESSOR    CONTEXT    UNTIL
phi3:mini    4f2222927938    3.7 GB    100% CPU     4096       4 minutes from now
root@phi:/home/akshaygore#
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dashboards update once the model is running:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcde7hhx7a1v3x8154p8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcde7hhx7a1v3x8154p8.png" alt="Grafana Dashboard showing stats of model" width="800" height="572"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>linux</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Self-Hosted AI on Linux: A DevOps Home Lab Guide</title>
      <dc:creator>Akshay Gore</dc:creator>
      <pubDate>Sun, 01 Mar 2026 15:24:44 +0000</pubDate>
      <link>https://dev.to/akshaygore/self-hosted-ai-on-linux-a-devops-home-lab-guide-28kc</link>
      <guid>https://dev.to/akshaygore/self-hosted-ai-on-linux-a-devops-home-lab-guide-28kc</guid>
      <description>&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Intermediate DevOps/Systems Engineers | &lt;strong&gt;Series:&lt;/strong&gt; Part 1 of 4&lt;/p&gt;

&lt;h2&gt;
  
  
  Fun part: chat with your own LLM without worrying about token limits or expiration.
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Section 1 — Introduction
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 The 5 Layers of AI Ecosystem
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Dev / Home Lab&lt;/th&gt;
&lt;th&gt;Production&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Applications&lt;/td&gt;
&lt;td&gt;Simple chatbot scripts&lt;/td&gt;
&lt;td&gt;RAG pipelines, Agents, Chatbots&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Frameworks&lt;/td&gt;
&lt;td&gt;LangChain, LlamaIndex&lt;/td&gt;
&lt;td&gt;LangChain, LlamaIndex, LiteLLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Model Serving&lt;/td&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;vLLM, TGI, Triton&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Models&lt;/td&gt;
&lt;td&gt;phi3:mini, gemma:2b&lt;/td&gt;
&lt;td&gt;Mistral 7B, Llama 3 70B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;VirtualBox VM,Mac Mini M-series, Local hardware&lt;/td&gt;
&lt;td&gt;AWS/GCP/Azure, GPU servers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;This post covers Layers 1, 2 and 3. Layers 4 and 5 will be covered in upcoming posts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  1.2 What This Post Covers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Setting up an Ubuntu Server VM on VirtualBox, the server that runs the LLM.&lt;/li&gt;
&lt;li&gt;Installing and configuring Ollama as a systemd service. Ollama is a program that manages LLM models.&lt;/li&gt;
&lt;li&gt;The LLM used is phi3:mini, which is light enough for a home lab setup. It is similar in kind to Sonnet or Gemini, just at a much smaller scale.&lt;/li&gt;
&lt;li&gt;Automating the entire setup with Ansible&lt;/li&gt;
&lt;li&gt;Interacting with the model via CLI, curl and Postman&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Flow of the setup: Ansible runs on the user's system and configures the Ubuntu VM (phi) to run the LLM.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Section 2 — VM Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 VM Specs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RAM&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;td&gt;phi3:mini needs ~3.7GB in memory, leave headroom for OS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;4 cores&lt;/td&gt;
&lt;td&gt;CPU inference benefits from multiple cores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk&lt;/td&gt;
&lt;td&gt;30GB&lt;/td&gt;
&lt;td&gt;Model 2.2GB + Ubuntu OS + logs + breathing room&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OS&lt;/td&gt;
&lt;td&gt;Ubuntu minimal Server 22.04 LTS&lt;/td&gt;
&lt;td&gt;Stable, well supported, no GUI overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;Bridged Adapter&lt;/td&gt;
&lt;td&gt;VM gets own IP, allows Ansible and API calls from other machines/clients leveraging model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hostname&lt;/td&gt;
&lt;td&gt;phi&lt;/td&gt;
&lt;td&gt;Named after the model running on it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpidzsf16i9425ikxdeqb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpidzsf16i9425ikxdeqb.png" alt="Screenshot of virtual machine specs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: A Mac Mini with an Apple Silicon chip is a good alternative; its unified memory architecture (UMA) can run larger models at higher throughput.&lt;br&gt;
For example, a Mac can handle somewhat bigger models like Llama 3 / 3.2 or Mistral 7B. Since I have a simple VM with no GPU, I am using the small phi3:mini model.&lt;/p&gt;
&lt;h3&gt;
  
  
  2.3 Hostname Setup
&lt;/h3&gt;

&lt;p&gt;I named the VM phi and will use this name in Ansible to keep things clean and simple.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn33mbirdm3w622w9y5ry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn33mbirdm3w622w9y5ry.png" alt="Screenshot of vm hostname"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Section 3 — Installing and Configuring Ollama manually
&lt;/h2&gt;

&lt;p&gt;We walk through the manual installation first so it is clear what Ansible automates in the next section.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.1 Installation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Single curl command install:
&lt;code&gt;curl -fsSL https://ollama.com/install.sh | sh&lt;/code&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; is a free, open-source tool that lets you easily download, set up, and run AI language models (like LLaMA 3, Mistral, and Gemma) locally. It acts like a "Docker for LLMs", managing the technical complexities so you can run private, offline AI chat or coding assistants with a single command.&lt;/li&gt;
&lt;li&gt;A systemd service is created automatically once the script completes successfully.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljimzk3spll613v4ykcj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljimzk3spll613v4ykcj.png" alt="Screenshot of ollama service up and running"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  3.2 Systemd Override Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;OLLAMA_HOST=0.0.0.0&lt;/code&gt; — accept connections from any client on the subnet, not just localhost.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OLLAMA_KEEP_ALIVE&lt;/code&gt; — controls the model unload timeout. If the model is not queried for 5 minutes, Ollama unloads it from RAM automatically to free memory.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;StandardOutput&lt;/code&gt; / &lt;code&gt;StandardError&lt;/code&gt; — redirect logs to a custom path. Try to put this on a separate partition from root, or on a different disk entirely.&lt;/li&gt;
&lt;/ul&gt;
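Putting the three settings together, the drop-in override might look like the sketch below. The exact keep-alive value and log paths are assumptions that mirror the bullets above:

```ini
# /etc/systemd/system/ollama.service.d/override.conf (sketch; values are assumptions)
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_KEEP_ALIVE=5m"
StandardOutput=append:/var/log/ollama/ollama.log
StandardError=append:/var/log/ollama/ollama-error.log
```

Apply it with systemctl daemon-reload followed by systemctl restart ollama. The append: form of StandardOutput needs a reasonably recent systemd; Ubuntu 22.04 is fine.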

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; LLM models are loaded from disk into RAM before they are served; this is known as "warming up" the model. In production setups, a heartbeat request is often used to keep the model constantly warm and ready to serve, since load time directly affects the user experience.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnm5nu1nv8o6azbr59im.png" alt="Screenshot of ollama config"&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3.3 Log Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Create the directory &lt;code&gt;/var/log/ollama&lt;/code&gt; with correct &lt;code&gt;ollama:ollama&lt;/code&gt; ownership&lt;/li&gt;
&lt;li&gt;Use a custom log location to capture everything: journalctl filters by verbosity, but we want all output from stdout and stderr&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3.4 Logrotate
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Config file at &lt;code&gt;/etc/logrotate.d/ollama&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
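The screenshot shows the actual config; as a rough sketch, a file like /etc/logrotate.d/ollama tends to contain something along these lines (the rotation count and size threshold here are assumptions, not the values from the screenshot):

```
# /etc/logrotate.d/ollama (sketch; counts and sizes are assumptions)
/var/log/ollama/*.log {
    daily
    rotate 7
    size 50M
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```

copytruncate is handy here because Ollama keeps its log files open; the alternative is a postrotate script that restarts the service.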

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5iar9m98vjlkw6gfffa1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5iar9m98vjlkw6gfffa1.png" alt="Screenshot of logroatate config"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxluakwrzgsislwobb61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxluakwrzgsislwobb61.png" alt="Screenshot of logrotate service working as expected"&gt;&lt;/a&gt;&lt;br&gt;
A few useful logrotate commands:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;logrotate --debug /etc/logrotate.d/ollama&lt;/code&gt; - dry run&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;logrotate --force /etc/logrotate.d/ollama&lt;/code&gt; - force run&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ls -lh /var/log/ollama/&lt;/code&gt; - check if logs rotated&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgd19nml5bw1rochmqx7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgd19nml5bw1rochmqx7n.png" alt="Screenshot of logrotate commands"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the screenshot above we can see the logs were rotated.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.5 Pull and Test Model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ollama pull phi3:mini&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ollama list&lt;/code&gt; — verify download&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ollama run phi3:mini&lt;/code&gt; — quick interactive test&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your private LLM model is up and running, ready to answer your queries.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9ojv465sco6zq73vy2h.png" alt="Screenshot of ollama basic commands"&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Section 4 — Automating with Ansible
&lt;/h2&gt;

&lt;p&gt;Now that we understand every manual step, let's automate it all.&lt;/p&gt;
&lt;h3&gt;
  
  
  4.1 Repository Structure
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8jaz3gk8tuyqlj4cilh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8jaz3gk8tuyqlj4cilh.png" alt="Screenshot of directory structure of llm ansible repo"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  4.2 Running the Playbook
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Dry run 
&lt;code&gt;ansible-playbook -i inventory.ini playbook.yml --check&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvpozm0lmzoqzx9r0y55.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvpozm0lmzoqzx9r0y55.png" alt="Screenshot of ansible dry run"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hit an error here because the service was not installed yet; this case is handled in the playbook.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the playbook
&lt;code&gt;ansible-playbook -i inventory.ini playbook.yml&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frupwhd801fgvywhw9yg8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frupwhd801fgvywhw9yg8.png" alt="Screenshot of ansible being executed"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  4.3 GitHub Repo
&lt;/h3&gt;



&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/akshaypgore" rel="noopener noreferrer"&gt;
        akshaypgore
      &lt;/a&gt; / &lt;a href="https://github.com/akshaypgore/llm-ansible" rel="noopener noreferrer"&gt;
        llm-ansible
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Ansible role to deploy llm model phi3:mini on linux vm&lt;/h2&gt;

&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h4 class="heading-element"&gt;Prerequisite&lt;/h4&gt;

&lt;/div&gt;
&lt;p&gt;VM Specs:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Min 8GB RAM (Model phi3:mini is approximately 3 GB. Half of the RAM would be consumed by model and other half reserved for OS)&lt;/li&gt;
&lt;li&gt;4 cores&lt;/li&gt;
&lt;li&gt;30 GB HDD&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; System used to run ansible should be able to ssh vm without password using public key authentication&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h4 class="heading-element"&gt;Steps&lt;/h4&gt;

&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Update inventory file with IP of vm and username used to run ansible&lt;/li&gt;
&lt;li&gt;Dry run &lt;code&gt;ansible-playbook -i inventory.ini playbook.yml --check&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run Playbook &lt;code&gt;ansible-playbook -i inventory.ini playbook.yml&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;



&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/akshaypgore/llm-ansible" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;








&lt;h2&gt;
  
  
  Section 5 — Interacting with the Model
&lt;/h2&gt;

&lt;p&gt;Two ways to interact — the CLI and curl. Each is progressively more useful for building applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 CLI — ollama run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
ollama list
ollama ps
ollama run phi3:mini
ollama show phi3:mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3thg0gfyosz7tz7dkro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3thg0gfyosz7tz7dkro.png" alt="Screenshot of ollama commands executed"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the above image we can see that ollama unloaded the model as it was not being used. We had to run &lt;code&gt;ollama run phi3:mini&lt;/code&gt; to reload the model in RAM which is also called warming up.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 REST API via curl
&lt;/h3&gt;

&lt;p&gt;This is the important part — how applications actually talk to Ollama. Below are a few of the exposed endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/api/generate&lt;/code&gt; — single prompt&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/api/chat&lt;/code&gt; — conversation with history and roles
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;1. curl http://localhost:11434 &lt;span class="c"&gt;#check if model is running&lt;/span&gt;
2. curl http://localhost:11434/api/generate &lt;span class="c"&gt;#prompt like experience. Ask a question, model answers&lt;/span&gt;
3. &lt;span class="c"&gt;#interaction with LLM like a chat. Question and Anser&lt;/span&gt;
curl http://localhost:11434/api/chat &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "phi3:mini",
    "stream": false,
    "messages": [
      {
        "role": "user",
        "content": "What is Linux?"
      },
      {
        "role": "assistant",
        "content": "Linux is an open source operating system..."
      },
      {
        "role": "user",
        "content": "Who created it?"
      }
    ]
  }'&lt;/span&gt; | python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import sys,json; print(json.load(sys.stdin)['message']['content'])"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
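The curl call above can be mirrored in Python, which is closer to how an application would use the API. In this sketch the HTTP round trip is stubbed with a canned response so the payload construction and parsing are the focus; in real use you would POST the payload to http://your_llm_ip:11434/api/chat:

```python
import json

def build_chat_payload(history, question, model="phi3:mini"):
    """Assemble an /api/chat request body: prior turns plus the new user question."""
    messages = list(history)
    messages.append({"role": "user", "content": question})
    return {"model": model, "stream": False, "messages": messages}

history = [
    {"role": "user", "content": "What is Linux?"},
    {"role": "assistant", "content": "Linux is an open source operating system..."},
]
payload = build_chat_payload(history, "Who created it?")

# Canned response in the shape /api/chat returns with "stream": false
raw = '{"model":"phi3:mini","message":{"role":"assistant","content":"Linus Torvalds created Linux in 1991."},"done":true}'
reply = json.loads(raw)["message"]["content"]
print(reply)  # Linus Torvalds created Linux in 1991.
```

Sending the full messages list on every call is what gives the model conversational memory; the server itself is stateless between requests.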



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F993iyaegthatssjoimiw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F993iyaegthatssjoimiw.png" alt="Screenshot of interaction with ollama"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Some interactions with our own LLM model.
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2d3k57r9qor3yvc1texr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2d3k57r9qor3yvc1texr.png" alt="Screenshot of interaction with LLM"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are many more important elements to cover in future posts, such as:&lt;/p&gt;

&lt;h4&gt;
  
  
  Performance Metrics of LLM
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;eval_count : number of tokens generated&lt;/li&gt;
&lt;li&gt;eval_duration : time taken to generate those tokens&lt;/li&gt;
&lt;li&gt;total_duration : total time taken to execute the query&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Context and Cost Tracking
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;prompt_eval_count : tokens consumed by the input prompt plus the chat history&lt;/li&gt;
&lt;li&gt;load_duration : time taken to load the model into the server's memory&lt;/li&gt;
&lt;/ol&gt;
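&lt;p&gt;As a quick sketch of how these fields combine in practice (the response values below are invented for illustration; Ollama reports its durations in nanoseconds):&lt;/p&gt;

```python
import json

# A hypothetical /api/generate response body (field values invented for
# illustration; Ollama reports all duration fields in nanoseconds).
raw = '{"eval_count": 120, "eval_duration": 4000000000, "prompt_eval_count": 30, "total_duration": 5000000000, "load_duration": 800000000}'
resp = json.loads(raw)

# Generation speed: tokens generated divided by generation time in seconds.
tokens_per_sec = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f'{resp["eval_count"]} tokens at {tokens_per_sec:.1f} tok/s')  # 120 tokens at 30.0 tok/s
```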

</description>
      <category>devops</category>
      <category>ai</category>
      <category>linux</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Dockerfile: CMD vs ENTRYPOINT</title>
      <dc:creator>Akshay Gore</dc:creator>
      <pubDate>Sat, 13 Dec 2025 16:31:37 +0000</pubDate>
      <link>https://dev.to/akshaygore/dockerfile-cmd-vs-entrypoint-4gf4</link>
      <guid>https://dev.to/akshaygore/dockerfile-cmd-vs-entrypoint-4gf4</guid>
      <description>&lt;p&gt;CMD and ENTRYPOINT commands in Dockerfile can get confusing if not tried by actually executing it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Show, don't tell
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. CMD
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;By default, images like ubuntu define CMD as the last instruction in their Dockerfile:&lt;br&gt;
&lt;code&gt;CMD ["/bin/bash"]&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When we run a container with either of the commands below&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;docker run --rm --name cmd ubuntu:latest ls /home&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker run --rm --name cmd ubuntu:latest date&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The default command gets overridden by whatever we pass as arguments, in this case "ls /home" and "date"&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The docker runs above produce the following outputs, respectively:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~  % docker run --rm --name cmd ubuntu:latest ls -l /home/ubuntu
total 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~  % docker run --rm --name cmd ubuntu:latest date
Sat Dec 13 16:22:53 UTC 2025
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. ENTRYPOINT
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Let's build a Docker image (entrypoint:1.0.0) whose Dockerfile uses ENTRYPOINT:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM ubuntu:latest
WORKDIR /app
COPY script.sh .
RUN ["chmod","+x","script.sh"]
ENTRYPOINT ["/app/script.sh"]
CMD ["world"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;script.sh&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
echo "Hello $1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The script above expects a single argument&lt;/li&gt;
&lt;li&gt;If we don't pass one, the default argument is the value provided in CMD
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;% docker run --rm --name entrypoint entrypoint:1.0.0
Hello world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If we pass an argument at docker run time, it overrides the CMD value:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;% docker run --rm --name entrypoint entrypoint:1.0.0 december
Hello december
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Learnings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We can always override CMD by passing arguments after the image name&lt;/li&gt;
&lt;li&gt;Run-time arguments never replace ENTRYPOINT, only the arguments passed to it (i.e. CMD); ENTRYPOINT itself can only be swapped explicitly with the --entrypoint flag&lt;/li&gt;
&lt;li&gt;Whether to use ENTRYPOINT or CMD depends on the use case&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
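&lt;p&gt;The way ENTRYPOINT and CMD compose can be reproduced outside Docker by calling script.sh directly, as a minimal sketch of what the container actually executes:&lt;/p&gt;

```shell
# Re-create script.sh from the image above and call it the way Docker would:
# the CMD value ("world") becomes $1 when no run-time argument is given.
printf '#!/bin/bash\necho "Hello $1"\n' > script.sh
chmod +x script.sh
./script.sh world      # what `docker run entrypoint:1.0.0` effectively runs
./script.sh december   # what `docker run entrypoint:1.0.0 december` effectively runs
```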

</description>
      <category>docker</category>
      <category>containers</category>
      <category>cmd</category>
      <category>shell</category>
    </item>
    <item>
      <title>Dig command to track the process of DNS resolution</title>
      <dc:creator>Akshay Gore</dc:creator>
      <pubDate>Wed, 12 Nov 2025 15:55:54 +0000</pubDate>
      <link>https://dev.to/akshaygore/dig-command-to-track-the-process-of-dns-resolution-ii</link>
      <guid>https://dev.to/akshaygore/dig-command-to-track-the-process-of-dns-resolution-ii</guid>
      <description>&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; User/server(client) machine trying to reach nike.com&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DNS client checks the cache on local machine for the IP of nike.com&lt;/li&gt;
&lt;li&gt;Client's machine cache doesn't have the required IP&lt;/li&gt;
&lt;li&gt;Client queries the DNS server which is provided/configured by the ISP&lt;/li&gt;
&lt;li&gt;This DNS server, also known as a recursive DNS server, checks its own cache for the IP of nike.com&lt;/li&gt;
&lt;li&gt;The recursive DNS server doesn't have the IP either, so it begins the process of locating it&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  The steps below are performed by the client to locate the IP
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Get list of root servers

&lt;ul&gt;
&lt;li&gt;The list of root servers is configured on the system by default at &lt;strong&gt;/usr/share/dns/root.hints&lt;/strong&gt; on Linux systems
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dig +short ns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akshay-gore:~&lt;span class="nv"&gt;$ &lt;/span&gt;dig +short ns
k.root-servers.net.
c.root-servers.net.
h.root-servers.net.
i.root-servers.net.
a.root-servers.net.
m.root-servers.net.
f.root-servers.net.
d.root-servers.net.
b.root-servers.net.
l.root-servers.net.
j.root-servers.net.
g.root-servers.net.
e.root-servers.net.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Trace the path of the request from the system to nike.com's web server:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dig +trace nike.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="p"&gt;;&lt;/span&gt; &amp;lt;&amp;lt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; DiG 9.18.39-0ubuntu0.24.04.1-Ubuntu &amp;lt;&amp;lt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; +trace nike.com
&lt;span class="p"&gt;;;&lt;/span&gt; global options: +cmd
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  h.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  j.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  k.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  d.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  m.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  b.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  f.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  i.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  c.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  a.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  e.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  g.root-servers.net.
&lt;span class="nb"&gt;.&lt;/span&gt;           3090    IN  NS  l.root-servers.net.
&lt;span class="p"&gt;;;&lt;/span&gt; Received 239 bytes from 127.0.0.53#53&lt;span class="o"&gt;(&lt;/span&gt;127.0.0.53&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;0 ms

com.            172800  IN  NS  a.gtld-servers.net.
com.            172800  IN  NS  b.gtld-servers.net.
com.            172800  IN  NS  c.gtld-servers.net.
com.            172800  IN  NS  d.gtld-servers.net.
com.            172800  IN  NS  e.gtld-servers.net.
com.            172800  IN  NS  f.gtld-servers.net.
com.            172800  IN  NS  g.gtld-servers.net.
com.            172800  IN  NS  h.gtld-servers.net.
com.            172800  IN  NS  i.gtld-servers.net.
com.            172800  IN  NS  j.gtld-servers.net.
com.            172800  IN  NS  k.gtld-servers.net.
com.            172800  IN  NS  l.gtld-servers.net.
com.            172800  IN  NS  m.gtld-servers.net.
com.            86400   IN  DS  19718 13 2 8ACBB0CD28F41250A80A491389424D341522D946B0DA0C0291F2D3D7 71D7805A
com.            86400   IN  RRSIG   DS 8 1 86400 20251119050000 20251106040000 61809 &lt;span class="nb"&gt;.&lt;/span&gt; tncdUkjC/m4gwK8aqbdYHV1ZD+WR3n5FJgvwM+xHj4kJMG6D5XuASX4x 2D0YrJG547HWwb1jAjDcHaRyBcJqeoHti/mcLrungu4mGMHzYeVPx/Td YrC7yk91EA8UDacZA2y1qK0pzziw+GPEUs5ny5wOIvgRrXKOZPZYif60 UPk2df0O2lqe4q8vrx8Ff4zKDs275tC2Er+hrJ6YrQ8hKdwpDgkOdrjO 2e62PctJlRFYVj6MWBmQZS85ZSXCxMgP4bCUo5no6S3at4z2bKFfWjpF GcB7MF0kGwArH/hPfudiEV3cpoGPEOmr3o53vfIv22fxBfcOSmPjHq1y &lt;span class="nv"&gt;BuDwqg&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="p"&gt;;;&lt;/span&gt; Received 1168 bytes from 2001:500:2f::f#53&lt;span class="o"&gt;(&lt;/span&gt;f.root-servers.net&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;116 ms

nike.com.       172800  IN  NS  ns-n1.nike.com.
nike.com.       172800  IN  NS  ns-n2.nike.com.
nike.com.       172800  IN  NS  ns-n3.nike.com.
nike.com.       172800  IN  NS  ns-n4.nike.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 900 IN NSEC3 1 1 0 - CK0Q3UDG8CEKKAE7RUKPGCT1DVSSH8LL NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 900 IN RRSIG NSEC3 13 2 900 20251111002637 20251103231637 46539 com. qil9G2NSHYVtUASYp5W8XlMim+ieLFJ/aWJROvBKJfsjLso2rCp+GY5N vzw13ee/+aYXc2ZmkHSCrjrqPWjmAQ&lt;span class="o"&gt;==&lt;/span&gt;
1AUF57P261FM4PRA2UHSG8IOEQH8RRSD.com. 900 IN NSEC3 1 1 0 - 1AUFABRNB1AREK54RAOGOJUIHBQ6C10I NS DS RRSIG
1AUF57P261FM4PRA2UHSG8IOEQH8RRSD.com. 900 IN RRSIG NSEC3 13 2 900 20251112011923 20251105000923 46539 com. ACPLjyPFa7MlxXfIhQx74GciwjbCwvTCT1mmWdLfaP3LvMtWkOg5ku6V aRHkII5DI+1pL/KRP8idLxs91qwm0w&lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="p"&gt;;;&lt;/span&gt; Received 538 bytes from 192.54.112.30#53&lt;span class="o"&gt;(&lt;/span&gt;h.gtld-servers.net&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;292 ms

nike.com.       60  IN  A   18.172.64.109
nike.com.       60  IN  A   18.172.64.17
nike.com.       60  IN  A   18.172.64.37
nike.com.       60  IN  A   18.172.64.97
nike.com.       3600    IN  NS  ns-n1.nike.com.
nike.com.       3600    IN  NS  ns-n2.nike.com.
nike.com.       3600    IN  NS  ns-n3.nike.com.
nike.com.       3600    IN  NS  ns-n4.nike.com.
&lt;span class="p"&gt;;;&lt;/span&gt; Received 245 bytes from 64:ff9b::cdfb:c343#53&lt;span class="o"&gt;(&lt;/span&gt;ns-n4.nike.com&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;240 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Analyzing the output of trace command
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; lines starting with ;; in the output are comments&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The line

&lt;code&gt;;; Received 239 bytes from 127.0.0.53#53(127.0.0.53) in 0 ms&lt;/code&gt;

states that dig received a 239-byte response from the local DNS stub resolver (127.0.0.53) on port 53: the &lt;strong&gt;list of root servers&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The line

&lt;code&gt;;; Received 1168 bytes from 2001:500:2f::f#53(f.root-servers.net) in 116 ms&lt;/code&gt;

states that dig received a 1168-byte response from the root server &lt;strong&gt;f.root-servers.net&lt;/strong&gt;: the &lt;strong&gt;list of nameservers for the com. domain&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The line

&lt;code&gt;;; Received 538 bytes from 192.54.112.30#53(h.gtld-servers.net) in 292 ms&lt;/code&gt;

states that dig received a 538-byte response from the TLD nameserver &lt;strong&gt;h.gtld-servers.net&lt;/strong&gt;: the &lt;strong&gt;list of nameservers for the nike.com. domain&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The line

&lt;code&gt;;; Received 245 bytes from 64:ff9b::cdfb:c343#53(ns-n4.nike.com) in 240 ms&lt;/code&gt;

states that dig received a 245-byte response from the nameserver &lt;strong&gt;ns-n4.nike.com&lt;/strong&gt;: the &lt;strong&gt;A records for nike.com., i.e. the actual IPs of the website&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  Things to notice
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;The list of root servers is configured on the user's system by default at &lt;strong&gt;/usr/share/dns/root.hints&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Root servers provide the nameservers for the com domain, e.g. a.gtld-servers.net.&lt;/li&gt;
&lt;li&gt;The nameserver a.gtld-servers.net. is also a top-level domain server for com. The domain a.gtld-servers.net has an A record but no NS record for a simple reason: it is the nameserver itself, and nameservers typically do not delegate authority for their own name to a different set of nameservers.&lt;/li&gt;
&lt;li&gt;The TLD server a.gtld-servers.net. provides the nameservers for nike.com: ns-n1.nike.com., ns-n3.nike.com., and so on&lt;/li&gt;
&lt;li&gt;The nameserver ns-n1.nike.com. provides the A records, i.e. the IPs of the nike.com website&lt;/li&gt;
&lt;/ol&gt;
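&lt;p&gt;The same delegation chain can be walked by hand, querying each server from the trace above directly (server names taken from that output; any root or gtld server would do):&lt;/p&gt;

```shell
# Walk the delegation chain from the +trace output manually.
# +norecurse asks each server only for what it knows (a referral), not a full answer.
dig +norecurse @a.root-servers.net NS com.        # root server refers us to the .com TLD servers
dig +norecurse @a.gtld-servers.net NS nike.com.   # TLD server refers us to nike.com's nameservers
dig @ns-n1.nike.com A nike.com                    # authoritative server answers with the A records
```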

&lt;h5&gt;
  
  
  Useful commands
&lt;/h5&gt;

&lt;p&gt;1. &lt;code&gt;dig A nike.com&lt;/code&gt;&lt;br&gt;
Provides the IP addresses of nike.com:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```
    ; &amp;lt;&amp;lt;&amp;gt;&amp;gt; DiG 9.18.39-0ubuntu0.24.04.1-Ubuntu &amp;lt;&amp;lt;&amp;gt;&amp;gt; A nike.com
    ;; global options: +cmd
    ;; Got answer:
    ;; -&amp;gt;&amp;gt;HEADER&amp;lt;&amp;lt;- opcode: QUERY, status: NOERROR, id: 36711
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 65494
    ;; QUESTION SECTION:
    ;nike.com.          IN  A

    ;; ANSWER SECTION:
    nike.com.       77  IN  A   18.172.64.17
    nike.com.       77  IN  A   18.172.64.97
    nike.com.       77  IN  A   18.172.64.109
    nike.com.       77  IN  A   18.172.64.37

    ;; Query time: 260 msec
    ;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
    ;; WHEN: Thu Nov 06 13:35:35 IST 2025
    ;; MSG SIZE  rcvd: 101
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2. &lt;code&gt;dig NS nike.com&lt;/code&gt;&lt;br&gt;
Provides the nameservers of nike.com:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```
    ; &amp;lt;&amp;lt;&amp;gt;&amp;gt; DiG 9.18.39-0ubuntu0.24.04.1-Ubuntu &amp;lt;&amp;lt;&amp;gt;&amp;gt; NS nike.com
    ;; global options: +cmd
    ;; Got answer:
    ;; -&amp;gt;&amp;gt;HEADER&amp;lt;&amp;lt;- opcode: QUERY, status: NOERROR, id: 56992
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 65494
    ;; QUESTION SECTION:
    ;nike.com.          IN  NS

    ;; ANSWER SECTION:
    nike.com.       4502    IN  NS  ns-n4.nike.com.
    nike.com.       4502    IN  NS  ns-n1.nike.com.
    nike.com.       4502    IN  NS  ns-n2.nike.com.
    nike.com.       4502    IN  NS  ns-n3.nike.com.

    ;; Query time: 215 msec
    ;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
    ;; WHEN: Thu Nov 06 13:37:12 IST 2025
    ;; MSG SIZE  rcvd: 117
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;One can also run &lt;code&gt;dig A nike.com @1.1.1.1&lt;/code&gt; or &lt;code&gt;dig NS nike.com @1.1.1.1&lt;/code&gt;, which checks the records against the DNS server 1.1.1.1.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Trying to get the NS record of a TLD server gives an empty answer: it has no NS record because it is a nameserver itself. Instead, it returns an SOA record&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dig NS a.gtld-servers.net.&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```
; &amp;lt;&amp;lt;&amp;gt;&amp;gt; DiG 9.18.39-0ubuntu0.24.04.1-Ubuntu &amp;lt;&amp;lt;&amp;gt;&amp;gt; NS a.gtld-servers.net.
;; global options: +cmd
;; Got answer:
;; -&amp;gt;&amp;gt;HEADER&amp;lt;&amp;lt;- opcode: QUERY, status: NOERROR, id: 3874
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;a.gtld-servers.net.        IN  NS

;; AUTHORITY SECTION:
gtld-servers.net.   3600    IN  SOA av4.nstld.com. nstld.verisign-grs.com. 1762388322 3600 900 1209600 86400

;; Query time: 407 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Thu Nov 06 13:41:09 IST 2025
;; MSG SIZE  rcvd: 115
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;The SOA record is one of the mandatory records in a DNS zone file. It designates the primary authoritative server and provides critical administrative details for zone transfers and caching. Each field is explained below:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Name&lt;/td&gt;
&lt;td&gt;The name of the zone (e.g., example.com.).&lt;/td&gt;
&lt;td&gt;Indicates the domain to which the SOA record applies.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTL&lt;/td&gt;
&lt;td&gt;The Time-to-Live (in seconds) for the SOA record itself.&lt;/td&gt;
&lt;td&gt;Specifies how long other servers should cache this administrative record.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Class&lt;/td&gt;
&lt;td&gt;Always IN (for Internet).&lt;/td&gt;
&lt;td&gt;Defines the protocol family; virtually always Internet.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Type&lt;/td&gt;
&lt;td&gt;Always SOA (Start of Authority).&lt;/td&gt;
&lt;td&gt;Identifies the record type.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MNAME&lt;/td&gt;
&lt;td&gt;The Primary Master Name Server (e.g., ns1.example.com.).&lt;/td&gt;
&lt;td&gt;The authoritative server that holds the definitive copy of the zone file.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RNAME&lt;/td&gt;
&lt;td&gt;The Responsible Person's Email Address (e.g., hostmaster.example.com.).&lt;/td&gt;
&lt;td&gt;The administrative contact for the zone. The first dot in the name is replaced by an @ symbol when interpreted as an email address (e.g., &lt;a href="mailto:hostmaster@example.com"&gt;hostmaster@example.com&lt;/a&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serial&lt;/td&gt;
&lt;td&gt;A Version Number for the zone file (often in the format YYYYMMDDSS).&lt;/td&gt;
&lt;td&gt;Secondary name servers check this number. If it has increased, they initiate a zone transfer to update their data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refresh&lt;/td&gt;
&lt;td&gt;The time (in seconds) secondary servers wait before checking the primary server for a zone file update.&lt;/td&gt;
&lt;td&gt;Controls the frequency of checking for zone file changes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry&lt;/td&gt;
&lt;td&gt;The time (in seconds) a secondary server waits to re-try contacting the primary master after a connection failure.&lt;/td&gt;
&lt;td&gt;Allows the secondary to try again quickly without waiting for the full Refresh time if the initial check fails.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expire&lt;/td&gt;
&lt;td&gt;The time (in seconds) after which a secondary server will stop answering queries for the zone if it has been unable to contact the primary master.&lt;/td&gt;
&lt;td&gt;Prevents the secondary server from providing stale data indefinitely.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum TTL&lt;/td&gt;
&lt;td&gt;The time (in seconds) to be used for caching negative responses (e.g., when a queried record or domain does not exist).&lt;/td&gt;
&lt;td&gt;Limits how long a resolver will remember that a particular name failed to resolve.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
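&lt;p&gt;As a small illustration, the SOA RDATA printed by dig can be split into these fields positionally (using the gtld-servers.net record from the output above):&lt;/p&gt;

```python
# Split the RDATA of an SOA record (as printed by dig) into its named fields.
soa = "av4.nstld.com. nstld.verisign-grs.com. 1762388322 3600 900 1209600 86400"
mname, rname, serial, refresh, retry, expire, minimum = soa.split()

print(mname)                        # primary master name server: av4.nstld.com.
print(rname.replace(".", "@", 1))   # admin contact: nstld@verisign-grs.com.
print(int(refresh), int(retry))     # 3600 900: secondaries re-check hourly, retry after 15 min
```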

&lt;h4&gt;
  
  
  Other Additional DNS related topics
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;TTL : DNS records update on a set interval defined by their Time to Live (TTL) value, which starts at the moment the record is cached, not after the last retrieval. Here is how it works:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Time to Live (TTL): every DNS record has a TTL value, set by the domain administrator on the authoritative DNS server. It tells other DNS servers (recursive resolvers and local machines) how long to cache the record before discarding it and requesting a fresh copy from the authoritative server.&lt;/li&gt;
&lt;li&gt;Caching and expiration: when a DNS server or local machine receives a record, it stores it for the duration specified by the TTL. The countdown begins immediately upon receipt.&lt;/li&gt;
&lt;li&gt;Update mechanism: once the TTL expires, the cached record is marked stale. On the next request for that name, the DNS server performs a new query to the authoritative nameserver instead of using the stale cache.&lt;/li&gt;
&lt;/ul&gt;
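&lt;p&gt;A minimal sketch of this caching behavior (a toy in-memory cache, not a real resolver):&lt;/p&gt;

```python
import time

# Toy resolver cache honoring TTL: a record is served from cache until its
# TTL (seconds, counted from the moment of insertion) elapses.
cache = {}

def put(name, ip, ttl):
    cache[name] = (ip, time.monotonic() + ttl)

def get(name):
    entry = cache.get(name)
    if entry is None:
        return None                    # not cached: a real resolver would query upstream
    ip, expires = entry
    if time.monotonic() >= expires:
        del cache[name]                # stale: drop it and force a fresh query
        return None
    return ip

put("nike.com", "18.172.64.17", ttl=60)
print(get("nike.com"))  # 18.172.64.17 (still fresh, served from cache)
```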
&lt;h3&gt;


Have a different trick for this or a related tip? Share it with the community below!
&lt;/h3&gt;


</description>
      <category>dns</category>
      <category>linux</category>
      <category>cli</category>
      <category>web</category>
    </item>
  </channel>
</rss>
