DEV Community: Zepher Ashe

GitOps for Homelab DNS (Dnsmasq)

Zepher Ashe — Tue, 30 Jun 2026 23:32:59 +0000

Why GitOps?
🚀 Installation & Initial Configuration
🖋 Writing Hostnames
🔄 GitOps Workflow
🔄 Bash Sync Script & Cronjob
🔄 (Optional) GitHub Actions CI Validation
Visual DNS Flow
💡 Benefits & Considerations

Why GitOps?

⚡ It started as a simple typo in /etc/hosts. One wrong hostname. One overlooked IP.

In an instant, my homelab descended into chaos:

My Proxmox cluster stopped resolving nodes.
The Ceph distributed storage went offline.
Every LXC container and VM froze or failed to start.

It was a domino effect: a tiny DNS misconfiguration brought down my entire lab environment.

That’s when I realised: manual DNS editing was a ticking time bomb.

Enter GitOps for DNS - a workflow that stores every host entry in a repo under version control, automates deployment, and ensures a single source of truth.

No more human error. No more cascading failures.

Introduction

In this guide, we’ll implement GitOps-style DNS using dnsmasq, GitHub, Bash, and a cronjob - making your homelab DNS predictable, maintainable, and secure.

Deploy keys (SSH keys) are used to secure repository access, ensuring only your DNS server can pull updates.

🛠 Benefits of a GitOps DNS Workflow

🔄 Automated sync – Changes in Git automatically update the DNS server
🗂 Version-controlled hosts – Every edit is tracked with commit history
🛡 Security-conscious – Deploy keys and configuration validation reduce risk
🧩 Modular & maintainable – Split hosts into .cfg files for easier management
✅ Integrate dnsmasq with GitHub Actions for automated validation

🚀 Installation & Initial Configuration

1. Install dnsmasq and Git

sudo yum install -y dnsmasq git       # RHEL/CentOS/AlmaLinux
sudo apt install -y dnsmasq git       # Debian/Ubuntu

2. Configure dnsmasq

Create a config file in /etc/dnsmasq.d/:

/etc/dnsmasq.d/000-base.conf

domain=safesploit.com
expand-hosts

log-queries
no-hosts
addn-hosts=/etc/hosts.d

listen-address=172.16.5.27

📝 Tip: Always bind dnsmasq to a specific NIC (listen-address)

⚠️ using 0.0.0.0 is not supported.

Otherwise, dnsmasq will only respond on 127.0.0.1.(

3. Prepare `/etc/hosts.d/` (Optional, Modular Setup)

sudo mkdir -p /etc/hosts.d
sudo touch /etc/hosts.d/000-network.cfg
# sudo touch /etc/hosts.d/XXX-*.cfg  # placeholder files

4. Enable & start service

sudo systemctl enable dnsmasq
sudo systemctl start dnsmasq

Now dnsmasq is ready to resolve hostnames from /etc/hosts.d/.

5. (Optional) Test after starting dnsmasq

Use dig @<DNS Server IP> <DNS Entry> to query:

dig @172.16.5.27 ldap.safesploit.com +short

This gives proof it works

🖋 Writing Hostnames

Add new .cfg files to /etc/hosts.d/ following a consistent naming scheme like XXX-<subnet_name>.cfg:

005-critical-prod.cfg → critical production hosts
081-mgmt.cfg → management network
100-storage.cfg → storage servers

Example hostnames

Each .cfg file contains simple IP hostname mappings (just like /etc/hosts):

172.16.5.10       ldap

172.16.100.10      nas1

✅ This modular approach makes it easy to version control, review, and expand.

🔄 GitOps Workflow

Assumed Knowledge: that you already know how to create a GitHub repo.

1. Repository Structure

.dns-configs/
└── dnsmasq/
    └── hosts.d/
        ├── 005-critical-prod.cfg
        ├── 081-mgmt.cfg
        ├── 100-storage.cfg
        └── XXX-*.cfg

2. Secure Access with Deploy Keys

Generate an SSH key for the DNS server:

ssh-keygen -t ed25519 -f ~/.ssh/id_github_ed25519

Add the public key to your GitHub repository’s Deploy Keys with read-only access.
GitHub Docs: Managing Deploy Keys

3. Workflow Steps

Detect changes on main
Clone repo
Sync /etc/hosts.d/ via rsync
Restart dnsmasq
Record applied commit hash

⚠️ Warning: Validate configs before restart:
dnsmasq --test
systemctl restart dnsmasq

🔄 Bash Sync Script & Cronjob

DNS GitOps Script:

⚠️ Make sure you adjust these for your setup

REPO="dns-configs"

OWNER="github_org/username"

GITHUB_DEPLOY_KEY="${HOME}/.ssh/id_github_ed25519"

Bash Script

The script handles:

SSH agent setup
Change detection via commit hash
Cloning and syncing /etc/hosts.d/
Restarting dnsmasq
Logging and optional quiet mode

Git Clone Bash Script - git_clone_dns_configs_dnsmasq.sh

Cronjob (every 5 minutes):

*/5 * * * * /root/git_clone_dnsmasq.sh

🔧 Automatic updates reduce errors and maintain consistency.

🔄 (Optional) GitHub Actions CI Validation

At this point, the DNS server can pull host updates from Git.

However, blindly syncing DNS records from Git is risky. A bad hosts entry, duplicate hostname, or broken dnsmasq config could break DNS resolution.

To reduce that risk, I added a GitHub Actions workflow that validates the dnsmasq configuration before changes are accepted.

CI Validation

CI check	Reason
Runs inside AlmaLinux 10	Matches the RHEL-style target environment
Installs real `dnsmasq` service	Validates against the actual service
Copies `dnsmasq.d` and `hosts.d`	Tests realistic file layout
Validates host format	Catches malformed records
Detects duplicate hostnames	Prevents ambiguous DNS
Creates CI-safe config	Avoids binding to homelab-only IPs in CI
Starts dnsmasq on `127.0.0.1:5353`	Tests real resolution safely
Uses `dig` tests	Proves records actually resolve

GitHub Actions workflow

GitHub Actions CI Dnsmasq - dnsmasq-ci.yml

Visual DNS Flow

   [Client Request] 
          |
          v
   [DNS (dnsmasq)]
          |
    ┌─────┴─────┐
    v           v
[Local hosts]  [Upstream DNS]

🖼 Shows decision flow for DNS resolution.

💡 Benefits & Considerations

Benefits:

✅ Version-controlled DNS (rollback capable)
✅ Modular and maintainable
✅ Automatic sync reduces human error
✅ CI/CD guardrails for DNS

Security Notes:

🔑 Protect deploy keys (chmod 600 ~/.ssh/id_github_ed25519)
🔍 Validate dnsmasq configs before restart
⚠️ Be careful with rsync --delete — deletions in Git remove files on server

🔜 Next Steps / Future Improvements

✅ Integrate dnsmasq with GitHub Actions for automated validation
✅ Add linting / syntax checks for host files
Support multiple domains or overlay networks
Consider BIND migration for more advanced DNS features

Conclusion

GitOps makes your homelab DNS:

Predictable – every change tracked
Secure – deploy keys and validation
Maintainable – modular files and automation

You now have a fully automated, version-controlled DNS workflow for your homelab.

AI Can Write Code Faster Than I Can Understand It - Here’s My Rule for Fixing That

Zepher Ashe — Fri, 29 May 2026 13:26:47 +0000

🧠 1. The Uncomfortable Moment After a Productive AI Session
🧠 2. The Problem: AI Can Create Knowledge Debt
🔍 3. Review Is Not the Same as Consolidation
🛠️ 4. My Rule: 2 Hours Generation, 1 Hour Consolidation
🧩 5. What Counts as Consolidation?
- 1. Read the Code Manually
- 2. Run the Tests Yourself
- 3. Modify One Thing Without AI
- 4. Document the Architecture in My Own Words
- 5. Use AI as a Tutor, Not a Generator
✅ 6. The AI Agent Consolidation Checklist
⚠️ 7. When AI Writes Infrastructure, “Mostly Understand It” Is Not Good Enough
🎯 8. The Portfolio Angle: AI-Assisted Is Fine, AI-Owned Is Not
🚩 9. Warning Signs That AI Is Becoming Harmful
🔁 10. A Healthier AI-Agent Workflow
Final Reflection

The TL;DR Version

AI coding agents are useful.

But if the codebase improves faster than your understanding, you create a new problem:

Knowledge debt.

My current rule is simple:

For every 2 hours of AI-agent generation, I spend 1 hour manually consolidating the result.

That means reading, testing, documenting, and modifying the code myself before moving on.

AI can accelerate the work.

But I still need to own the result.

🧠 1. The Uncomfortable Moment After a Productive AI Session

I had a strange moment after spending around 7 hours with Codex modernising an old PHP/MySQL project.

On paper, the session had gone well:

✅ the repo was cleaner
✅ the architecture was better
✅ tests existed
✅ CI/CD existed
✅ Docker support existed

This was exactly what I wanted.

But there was one uncomfortable problem:

I did not fully understand what had just been built.

That is the strange part of AI-assisted development.

You can look at your own repository and see progress everywhere, while also feeling a gap underneath it.

Repository progress
      ↓
Cleaner structure
More automation
More tests
Better tooling
      ↓
But weaker ownership

That was the moment I realised the real risk was not that AI had written bad code.

The risk was that AI had written useful code before I had properly caught up.

This article is not anti-AI.

The problem is not AI.

The problem is unreviewed acceleration.

If the output grows faster than your understanding, you do not just gain productivity.

You also create knowledge debt.

🧠 2. The Problem: AI Can Create Knowledge Debt

Before AI coding agents, progress was slower.

That was frustrating, but it had one advantage:

I usually understood most of what I had written, because I had struggled through it manually.

With AI, the relationship changes.

The codebase can improve faster than your understanding.

That creates a new kind of debt:

AI-generated code can reduce technical debt in the repository while increasing knowledge debt in the engineer.

Knowledge debt looks like this:

Symptom	What it means
🐛 I cannot confidently debug it	I do not understand the failure modes
🧠 I cannot explain the design	I do not understand the trade-offs
🔁 I need AI for every change	I have become dependent
🧪 I do not understand the tests	I do not know what is actually protected
🎤 I cannot explain it in an interview	It is not portfolio-ready

That last point matters.

A portfolio project is only valuable if I can explain it under questioning.

If I cannot describe the architecture, trade-offs, tests, risks, and changes I personally validated, then it is not really my project yet. It is AI-assisted output sitting in my repository.

That does not mean AI should not be used.

It means AI usage needs discipline.

🔍 3. Review Is Not the Same as Consolidation

This is the distinction I had to make:

Review checks the output.

Consolidation protects the engineer.

They are related, but they are not the same thing.

Review asks	Consolidation asks
Does this code look acceptable?	Can I work with this code independently?
Do the tests pass?	Do I understand what the tests prove?
Does the diff look reasonable?	Can I explain why the change works?
Is anything obviously broken?	Can I debug it without AI?
Can this be merged?	Do I actually own this?

A review might catch obvious problems.

Consolidation goes deeper.

It means manually rebuilding ownership after AI has accelerated the output:

reading the code properly
tracing execution paths
running tests yourself
modifying something by hand
documenting the architecture
explaining what changed without relying on AI

That is the difference.

Review is quality control.

Consolidation is skill protection.

🛠️ 4. My Rule: 2 Hours Generation, 1 Hour Consolidation

My current rule is simple:

For every 2 hours of AI-agent generation, I spend 1 hour manually consolidating the result.

That means if I spend 7 hours using Codex, I owe the project roughly 3 hours of consolidation.

This is not punishment.

It is skill protection.

The goal is not to slow everything down for the sake of it.

The goal is to make sure the speed does not come at the cost of ownership.

The 2:1 ratio gives me enough AI usage to benefit from acceleration, but enough manual work to keep my understanding from falling too far behind.

AI can help me move faster.

But I still need to remain the engineer responsible for the result.

🧩 5. What Counts as Consolidation?

Consolidation is not passive reading.

It is active ownership.

Step	What I do	Why it matters
1	Read the code manually	Understand the actual implementation
2	Run the tests myself	Understand what is protected
3	Modify one thing without AI	Prove I can work independently
4	Document the architecture	Turn vague understanding into clear explanation
5	Use AI as a tutor	Reduce knowledge debt instead of creating more

Expand on Consolidation

1. Read the Code Manually

The first step is simple, but easy to skip.

Read the code.

Not just the summary.

Not just the AI explanation.

The actual code.

I want to understand:

what files were added
what files were modified
what the new entrypoints are
what abstractions were introduced
what assumptions the code makes
what could break

The goal is to trace at least one full execution path from start to finish.

For example:

Request comes in
   ↓
Route/controller handles it
   ↓
Service layer processes logic
   ↓
Repository/database layer is called
   ↓
Response is returned

If I cannot trace the flow, I do not understand the change yet.

2. Run the Tests Yourself

It is not enough to see that CI passed.

I want to run the tests locally and understand what they actually prove.

Questions I ask:

What behaviour is protected?
What behaviour is still untested?
Are these tests meaningful?
Are they testing business logic, infrastructure behaviour, or implementation details?
Could the tests pass while the system is still broken?

This matters because AI can generate tests that look convincing but only prove shallow behaviour.

A passing test suite is useful.

But a test suite I understand is much more valuable.

3. Modify One Thing Without AI

This is the most important anti-atrophy step.

I force myself to make at least one small manual change.

Examples:

add one unit test manually
rename a method and fix references manually
improve one validation rule
refactor one function by hand
adjust a Dockerfile without asking AI
update a GitHub Actions workflow manually

This proves that I am not just observing the code.

I can actually work with it.

My rule is:

> If I cannot safely modify the code without AI, I do not fully own it yet.

That does not mean I will never use AI again on that project.

It means I should be capable without it.

AI should be acceleration, not dependency.

4. Document the Architecture in My Own Words

This is where vague understanding becomes clear understanding.

After a large AI-assisted change, I try to write a short explanation of:

what the old architecture looked like
what the new architecture looks like
why the change was made
how data flows through the system
where tests and CI/CD fit
what risks were reduced
what risks still remain

For example:

Before:
- Logic, database access, and rendering were tightly coupled.
- Testing was difficult because most behaviour depended on direct database access.
- The project had limited separation between application concerns.

After:
- Services handle business logic.
- Repositories handle persistence.
- Tests protect critical behaviours.
- CI/CD validates changes before merge.
- Docker provides a more reproducible development environment.

This is especially important for portfolio projects.

If I cannot document it, I probably cannot explain it properly in an interview.

And if I cannot explain it in an interview, the project is not doing its job.

5. Use AI as a Tutor, Not a Generator

During consolidation, I try not to ask:

> Add the next feature.

Instead, I ask:

> Explain this file line by line.

> What assumptions does this code make?

> Where could this fail?

> What tests are missing?

> What would you ask me in a code review?

That changes the role of AI.

It stops being a code factory and becomes a reviewer, tutor, and sparring partner.

This is where AI becomes very useful for learning.

The same tool that can create knowledge debt can also help reduce it, depending on how it is used.

✅ 6. The AI Agent Consolidation Checklist

After a long AI-agent session, I do not want to rely on vibes.

I want a repeatable checklist.

The goal is simple:

Before I generate more code, I need to prove I understand what was already generated.

## AI Agent Consolidation Checklist

### 🔍 Scope Review
- [ ] I reviewed every file changed by the AI.
- [ ] I understand the purpose of each new file.
- [ ] I understand the main execution flow.
- [ ] I identified any code I do not yet understand.

### 🧪 Test Review
- [ ] I ran the test suite locally.
- [ ] I understand what each test validates.
- [ ] I added or modified at least one test manually.
- [ ] I identified the highest-risk untested behaviour.

### 🛠️ Manual Ownership
- [ ] I made at least one small manual code change.
- [ ] I can explain why the change works.
- [ ] I can revert or debug the change without AI.

### 📝 Documentation
- [ ] I documented the architecture in my own words.
- [ ] I wrote a before/after summary.
- [ ] I updated README or project notes if needed.

### 🎯 Portfolio Readiness
- [ ] I can explain the change in an interview.
- [ ] I can explain what AI helped with.
- [ ] I can explain what I personally reviewed or changed.
- [ ] I can describe what I learned.

This checklist is not about slowing AI down for no reason.

It is about converting generated output into owned knowledge.

AI generation creates output.

Human consolidation creates skill.

⚠️ 7. When AI Writes Infrastructure, “Mostly Understand It” Is Not Good Enough

My principle is simple:

AI can draft infrastructure and automation, but humans must own the blast radius.

When infrastructure is codified, code review is not just about correctness.

It is about operational safety.

This rule matters for application code.

But it matters even more when the “code” changes infrastructure, operating systems, identity, networking, or security controls.

Why?

Because the blast radius changes.

With application code, partial understanding can create:

🐛 bugs
🔓 weak validation
🧪 shallow tests
🧱 poor abstractions

That is bad.

But with infrastructure and IT automation, partial understanding can change the environment itself.

AI-generated automation
      ↓
Terraform / Ansible / PowerShell / Bash / Kubernetes / CI/CD
      ↓
Cloud resources, servers, laptops, networks, identities, policies
      ↓
Real infrastructure or OS-level changes
      ↓
Potential production impact

That is a very different risk profile.

A bad helper function might break a feature.

A bad infrastructure script might:

🌐 expose a private subnet
🔥 open the wrong firewall rule
🔑 over-permission an IAM role
💀 break routing
🚀 deploy into the wrong environment
🧨 give a pipeline more access than it should have
🪟 modify dangerous Windows registry paths
💻 remove or corrupt files needed for boot (bricking laptops, desktops, or servers)

For example, imagine asking AI to produce a PowerShell script to harden Windows registry settings.

On the surface, that sounds useful.

But if I do not understand every registry path, service change, permission change, and file operation, that script could move from “hardening” to “breaking the operating system” for thousands of devices very quickly.

AI-generated hardening script
      ↓
Registry / services / permissions / system files
      ↓
Applied across endpoints or servers
      ↓
Boot failure, broken login, disabled services, outage

That is why “mostly understood” is not good enough for infrastructure or systems automation.

The minimum standard should be:

If this code can change infrastructure, identity, networking, security, operating system behaviour, or data access, I need to understand exactly what it does before I run it.

For IaC and IT automation, consolidation is not just a learning exercise.

It is an operational safety control.

Application code:

Can I understand the behaviour?

Infrastructure and systems automation:

Can I understand the blast radius?

🎯 8. The Portfolio Angle: AI-Assisted Is Fine, AI-Owned Is Not

I do not think there is anything wrong with using AI in portfolio projects.

In modern engineering, avoiding AI completely may become unrealistic.

The issue is not whether AI helped.

The issue is whether I still own the result.

There is a big difference between these two positions:

Weak framing	Strong framing
“AI built this for me.”	“I used AI to accelerate parts of the implementation.”
“I accepted what it generated.”	“I reviewed, tested, modified, and documented the result.”
“The project works.”	“I can explain how and why it works.”
“I need AI to change it.”	“I can work on it independently.”

That distinction matters.

Employers may increasingly expect engineers to use AI tools, but they still value:

judgement
debugging ability
architecture understanding
security awareness
ownership

A stronger portfolio description would be:

AI-assisted legacy modernisation project.

Used Codex to accelerate scaffolding and refactoring,
while retaining ownership through 
- manual review
- tests
- CI/CD validation
- documentation
- post-generation consolidation

That is honest.

It does not pretend AI was not involved.

But it also shows engineering discipline.

I do not want to hide AI usage.

I want to prove that I used it properly.

🚩 9. Warning Signs That AI Is Becoming Harmful

AI becomes harmful when it replaces the struggle that creates understanding.

For me, these are the warning signs:

Warning sign	What it suggests
🚫 I accept code I cannot explain	I am trusting output instead of understanding it
🧯 I cannot debug without asking AI	I do not understand the failure modes
➕ I keep adding features instead of reviewing changes	I am stacking knowledge debt
🧪 The tests pass, but I do not know what they prove	I may have false confidence
🧱 I cannot draw the architecture from memory	I do not understand the system shape
😬 I feel anxious when AI is unavailable	AI has become a dependency
📚 I stop reading documentation	I am outsourcing learning
🧠 I stop forming my own design opinions	I am losing engineering judgement

That is when AI stops being an accelerator and starts becoming a dependency.

The danger is not that AI makes you productive.

The danger is that it can make you look productive while quietly weakening your understanding.

That is the trap I am trying to avoid.

🔁 10. A Healthier AI-Agent Workflow

The workflow I am not aiming for:

Prompt → accept → prompt again → accept again

That is how knowledge debt stacks up.

A healthier loop looks like this:

Human defines architecture and constraints
      ↓
AI drafts implementation
      ↓
Human reviews the diff
      ↓
Human runs tests and tools
      ↓
Human manually changes one part
      ↓
Human documents the result
      ↓
AI is used as reviewer / tutor
      ↓
Next generation cycle begins

The shorter version:

Design manually
      ↓
Generate selectively
      ↓
Review deeply
      ↓
Test automatically
      ↓
Document clearly
      ↓
Own the result

This gives AI a useful place in the workflow.

It is not banned.

It is not blindly trusted.

It is used deliberately.

That is the difference.

Final Reflection

AI agents are not going away.

And honestly, I do not want them to.

They are useful.

They can accelerate boring work, suggest patterns, create tests, explain unfamiliar code, and help modernise old projects faster than I could manually.

But speed has a cost if I do not deliberately consolidate afterwards.

My current rule is simple:

For every 2 hours of AI-generated progress, I spend 1 hour rebuilding ownership.

That means:

reading the code
running the tests
documenting the architecture
making at least one manual change
proving I can explain the result

The goal is not to code slower.

The goal is to make sure I am still an engineer, not just the person pressing accept.

AI should accelerate my learning curve, not outsource it.

Running Local AI (Self-hosted) Coding Assistants in VS Code with Ollama and GitHub Copilot

Zepher Ashe — Thu, 14 May 2026 11:23:32 +0000

Introduction

GitHub Copilot now supports Bring Your Own Key (BYOK), allowing developers to connect local or self-hosted AI models directly into VS Code.

This means you can run coding assistants locally using tools like Ollama (offline AI) without sending prompts to external providers.

Architecture
Step 0 - Prerequisites
Step 1 - Install Ollama
Step 2 - Pull a Model
Step 3 - Test the Model
Step 4 - Confirm API Endpoint
Step 5 - Configure VS Code
Step 6 - Select Your Model
Step 7 - Remote Access (Optional)
Step 8 - Configure Nginx Reverse Proxy (Optional)
Step 9 - Testing

Architecture

VS Code
   ↓
GitHub Copilot Chat (BYOK)
   ↓
Ollama API
   ↓
Local LLM

Example:

VS Code → localhost:11434 → Qwen2.5-Coder

Step 0 - Prerequisites

Step 1 - Install Ollama

Linux

Official:

sudo curl -fsSL https://ollama.com/install.sh | sh

Reference: https://docs.ollama.com/integrations/vscode

Verify

ollama --version

Start service:

systemctl enable ollama --now

Check service:

systemctl status ollama

Step 2 - Pull a Model

Best Starting Models

Model	Approx VRAM	Recommended Hardware	Notes
qwen2.5-coder:14b	10–12GB	RTX 3090 / 4090	Best balance
qwen2.5-coder:32b	24GB+	RTX 4090 / A5000	Excellent coding performance
deepseek-coder-v2	24GB+	RTX 4090 / A5000	Strong reasoning
phi4	CPU friendly	Modern x86 CPU	Lightweight
phi4-mini	CPU friendly (~3GB RAM free)	Modern x86 CPU	Lightweight

Note: Hardware requirements vary depending on quantisation level and context size.

LLMs can run on CPU-only systems, but response latency may increase significantly depending on model size and quantisation.

For practical coding assistance, GPU acceleration is strongly recommended for models larger than 7B–14B parameters.

Step 3 - Test the Model

Run:

ollama pull phi4-mini

ollama run phi4-mini

You should get an interactive prompt.

Step 4 - Confirm API Endpoint

Ollama automatically exposes:

http://localhost:11434

Test:

curl http://localhost:11434/api/tags

You should see JSON listing models.

Step 5 - Configure VS Code

Open:

Ctrl+Shift+P 
(macOS: Command+Shift+P)

Run:

Chat: Manage Language Models

Then:

Add Models → Ollama

NOTE: this section must show “tools”

Otherwise GitHub Copilot cannot select

VS Code should auto-detect:

http://localhost:11434

Or, if using a remote Ollama server (not run locally) - see step 8

https://ollama.internal.domain:443

Reference: https://docs.ollama.com/integrations/vscode

Step 6 - Select Your Model

Inside Copilot Chat (set session target) select Local:

Open model picker
Choose (other models Ollama):
- qwen2.5-coder
- etc…

Now your prompts go to your local model.

Important Limitation

This currently applies mainly to:

✅ Chat
✅ Agent mode
✅ AI interactions

But NOT fully to:

❌ inline autocomplete

Microsoft documents this explicitly.

https://code.visualstudio.com/docs/copilot/customization/language-models

Homelab Setup

Small Setup

High-level overview

Mini PC / NUC
Debian Server
Ollama
Qwen2.5-Coder 14B
Tailscale

Step 7 - Remote Access (Optional)

If your inference server is elsewhere (not hosted locally):

Example:

http://ai-node:11434

https://ollama.internal.domain

Then configure Ollama accordingly.

SECURITY (VERY IMPORTANT)

Do NOT expose Ollama publicly.

Bad:

0.0.0.0:11434

without auth/firewall.

Use:

Tailscale
WireGuard
reverse proxy auth
firewall ACLs

There are already reports of exposed Ollama servers online.

Step 8 - Configure Nginx Reverse Proxy (Optional)

You can skip this step if testing

If you plan to access Ollama remotely, it is recommended to place it behind a reverse proxy rather than exposing the API directly.

VS Code
   ↓
HTTPS
   ↓
Nginx Reverse Proxy
   ↓
Ollama API
   ↓
Local Model

Install Nginx

Debian/Ubuntu:

sudo apt update
sudo apt install nginx -y

RHEL/AlmaLinux:

sudo dnf install nginx -y

Enable and start the service:

sudo systemctl enable --now nginx

Verify:

systemctl status nginx

Create Reverse Proxy Configuration

Create a new Nginx site configuration:

sudo vim /etc/nginx/conf.d/ollama.conf

Example configuration:

This step uses HTTP for testing (not recommended for production)

server {
    listen 80;
    server_name ollama.internal.domain;

    location / {
        proxy_pass http://127.0.0.1:11434;

        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Test configuration:

sudo nginx -t

Reload Nginx:

sudo systemctl reload nginx

Verify Reverse Proxy

Test locally:

curl http://localhost/api/tags

Or remotely (DNS must resolve for this to work):

curl http://ollama.internal.domain/api/tags

You should receive JSON output listing available models.

Step 9 - Testing

Inside VS Code Copilot Chat:

Select the Ollama model
Ensure the session target is set to Local
Open a source file
Highlight a small code block
Test prompts such as:

Explain this function

or:

Suggest improvements

If successful:

responses should come from the local model
Ollama logs will show /v1/chat/completions
no external provider API keys are required

Monitor logs:

journalctl -u ollama -f

Example successful request:

POST "/v1/chat/completions"

References

Stop Writing Python Like JavaScript: Common Language-Switching Mistakes

Zepher Ashe — Tue, 14 Apr 2026 22:18:06 +0000

1. Respect Language Idioms
2. Mind the Type System
3. Handle Errors the Right Way
4. Manage Dependencies Correctly
5. Be Conscious of Execution Context
6. Security Practices Don’t Always Translate
7. Testing & Tooling

Overview

When moving between languages, the main risks are forcing habits from one language onto another, misusing idioms, or forgetting environment-specific behaviours. The following practices help minimise friction and avoid subtle bugs.

1. Respect Language Idioms

Don’t force a main() everywhere.
- Bash, Perl, PHP, JavaScript → natural to write procedural top-down code.
- Python, Java, Go, Rust, C/C++ → explicit entry point is expected.
Follow the ecosystem’s coding style:
- Python → snake_case for variables/functions, CamelCase for classes.
- JavaScript → camelCase for variables/functions, PascalCase for classes.
- PowerShell → Verb-Noun (Get-Process).
- Java → CamelCase everywhere, verbose OOP structure.

2. Mind the Type System

Dynamic languages (Python, Bash, Perl, PHP, JavaScript):
- Expect runtime type errors; validate inputs early.
- Use linters/type checkers if available (mypy for Python, tsc for TypeScript).
Static languages (C, C++, Java, Go, Rust):
- Embrace compiler feedback; it saves runtime pain.
- Prefer explicit over inferred types when readability matters.
Hybrid (PowerShell):
- Strongly typed but still allows implicit conversions → always declare expected types in Param() blocks.

3. Handle Errors the Right Way

Bash/Perl: check $? or $! exit codes; use set -euo pipefail in Bash.
Python/Java/JavaScript/PHP: structured try/except or try/catch.
Rust: pattern-match on Result/Option.
Go: check returned err explicitly (if err != nil { ... }).
PowerShell: try/catch/finally with terminating errors.

4. Manage Dependencies Correctly

Python → pip/poetry/venv.
JavaScript → npm/yarn.
PHP → composer.
Go → go mod.
Rust → cargo.
Perl → CPAN/cpanm.
Avoid hardcoding library paths; use ecosystem tools for portability.

5. Be Conscious of Execution Context

CLI scripts: Bash, Perl, Python — make sure you handle arguments ($1, sys.argv, @ARGV).
Web scripts: PHP, JavaScript — expect interaction with HTTP state, superglobals, or events.
Compiled binaries: Go, Rust, C/C++ — distribute with care; consider static builds for portability.
Mixed models: PowerShell, Python, Node.js → scripts can be interactive or part of automation pipelines.

6. Security Practices Don’t Always Translate

String handling:
- Perl, Bash, PHP → watch out for injection risks (shell eval, SQL injection).
- Python, Go, Rust → encourage safer libraries by default.
Secrets handling:
- Never hardcode credentials; environment variables or vaults are standard.
Input validation:
- JavaScript/PHP → sanitize all external input (XSS/SQLi risks).
- Bash/Perl → validate before using in system commands.

7. Testing & Tooling

Always use linters/formatters when switching (helps enforce idioms):
- Python → black, flake8.
- JavaScript → eslint, prettier.
- Go → gofmt.
- Rust → clippy, rustfmt.
- PowerShell → PSScriptAnalyzer.
Write small unit tests in the ecosystem’s framework:
- Python → pytest.
- Java → JUnit.
- PHP → PHPUnit.
- Go → built-in go test.
- Rust → built-in cargo test.

Switching isn’t just syntax — it’s mindset. Learn the idioms, adopt ecosystem tools, and adjust error-handling and security practices to the language’s expectations.

Linux Observability Tools — A Practical Guide

Zepher Ashe — Mon, 09 Mar 2026 09:45:12 +0000

Linux Observability Tools

Observability is the ability to understand what a Linux system is doing internally by examining the signals it emits — metrics, logs, traces, and events.

This guide provides a structured overview of Linux observability tools, grouped by the system layers they inspect. It is designed as a practical reference for troubleshooting, performance engineering, capacity planning, and DevSecOps workflows.

🧱 1. Application & User-Space Observability
🧩 2. System Libraries & Syscall Interface
🧬 3. Kernel Subsystems Observability
🔩 4. Device Drivers & Block Layer Observability
📦 5. Storage & Swap Observability
🌐 6. Network Stack & NIC Observability
🖥️ 7. Hardware Observability (CPU, RAM, Buses)
📊 8. System-Wide Observability Tools

📊 Overview Diagram

Diagram © Brendan Gregg — used here with attribution for educational and informational purposes.

The diagram maps common observability tools to layers of the Linux operating system, from user-space applications down to hardware, providing a mental model for selecting the right tool during analysis or incident response.

🧱 1. Application & User-Space Observability

These tools inspect behaviour at the process and application level, including interactions with system libraries.

🔧 Tools

strace – traces system calls made by an application.
ltrace – traces dynamic library calls.
ss – modern socket statistics (replacement for netstat).
netstat – legacy but still useful for connection state overview.
sysdig – system-wide syscall/event capture and filtering.
lsof – lists open files, sockets, pipes, etc.
pidstat – per-process CPU, memory, I/O, threads.
pcstat – page cache statistics for specific files.

🧠 When to use

Debugging why an application is slow or blocked.
Identifying network usage per process.
Auditing open files and ports.
Understanding syscall patterns for performance tuning.

🧩 2. System Libraries & Syscall Interface

This layer sits between applications and the kernel. Tools here help examine transitions between user-space and kernel-space.

🔧 Tools

strace / ltrace – observe execution flow into syscalls and libraries.
perf – syscall latency, profiling, hotspots.
ftrace – built-in kernel tracer for syscalls and function calls.
SystemTap (stap) – programmable probes for syscalls.
LTTng – high-performance tracing for production systems.
eBPF / bpftrace – modern, safe kernel-level instrumentation.

🧠 When to use

Diagnosing syscall bottlenecks.
Monitoring unexpected kernel interactions.
High-resolution production tracing with low overhead (eBPF).

🧬 3. Kernel Subsystems Observability

The kernel handles filesystems, memory management, scheduling, and networking. Tools here inspect these internal mechanisms.

🔧 Tools

perf – scheduler behaviour, CPU cycles, kernel hotspots.
tcpdump – raw packet capture at the IP/Ethernet layers.
iptraf – lightweight network utilisation monitor.
vmstat – processes, memory, swap, I/O, interrupts.
slabtop – kernel slab allocator usage.
free – memory allocation breakdown.
pidstat – scheduler awareness and per-thread stats.
tiptop – per-thread metrics using hardware counters.

🧠 When to use

Identifying memory pressure, leaks, or slab exhaustion.
Determining network packet loss or congestion.
Analysing scheduler-induced latency.
Understanding kernel-side performance issues.

🔩 4. Device Drivers & Block Layer Observability

These tools examine I/O as it flows through the Linux block subsystem.

🔧 Tools

iostat – block device throughput and latency.
iotop – per-process disk I/O usage.
blktrace – very detailed block layer tracing.
perf / tiptop – device driver profiling.

🧠 When to use

Troubleshooting slow disk I/O.
Detecting I/O starvation or noisy-neighbour workloads.
Analysing LVM/RAID performance issues.

📦 5. Storage & Swap Observability

Tools focusing on physical disks, logical volumes, controllers, and swap usage.

🔧 Tools

iostat – read/write performance.
iotop – which processes are causing I/O.
blktrace – kernel-level I/O event tracing.
swapon -s – view swap devices and utilisation.

🧠 When to use

Swap thrash detection.
Disk queue depth analysis.
Understanding storage behaviour under load.

🌐 6. Network Stack & NIC Observability

These tools examine network interfaces, Ethernet drivers, ports, and NIC statistics.

🔧 Tools

tcpdump – packet-level visibility.
ss / netstat – connections and sockets.
iptraf – per-interface traffic charts.
ethtool – NIC driver settings and link state.
nicstat – interface utilisation.
lldptool – LLDP neighbour discovery.
snmpget – SNMP-based network metrics.

🧠 When to use

Packet drops, retransmits, or MTU mismatches.
NIC offload tuning (TSO, GRO, etc.).
Link speed/duplex mismatch troubleshooting.

🖥️ 7. Hardware Observability (CPU, RAM, Buses)

These tools provide insights into how the hardware itself behaves — including CPU frequency, power states, performance counters, NUMA locality, memory pressure, cache behaviour, and bus throughput.

🔧 CPU Tools

mpstat – Reports CPU usage per core, showing utilisation, steal time, IRQ time, and more.
top – Real-time process monitoring with CPU, load average, and per-thread breakdowns.
ps – Snapshot of process states, CPU usage, memory usage, and scheduling information.
pidstat – Per-thread and per-process CPU utilisation, context switching, and scheduling metrics.
perf – Hardware performance counter profiler (cycles, cache misses, branch mispredictions).
turbostat – Intel-specific tool showing CPU frequencies, C-states, P-states, and turbo boost behaviour.
rdmsr – Reads CPU model-specific registers (MSRs) for extremely low-level introspection.

🔧 Memory Tools

vmstat – Shows paging, swapping, memory pressure, interrupts, and system-wide throughput.
free – Reports total, used, cached, and available system memory.
slabtop – Displays kernel slab allocator statistics (caches, objects, memory used).
numastat – NUMA locality, node memory distribution, and remote memory access counts.
perf (memory events) – Analyses hardware counters related to RAM, cache, and memory bus traffic.

🧠 When to use

NUMA locality and cross-node memory access debugging.
CPU throttling, frequency scaling, or thermal throttling investigations.
Memory pressure analysis, leaking workloads, or kernel slab issues.
High-performance tuning for compute-heavy or latency-sensitive workloads.

📊 8. System-Wide Observability Tools

These tools cover multiple layers at once.

🔧 Tools

sar – historic performance logs across CPU, memory, I/O, network.
dstat – live multi-metric system aggregation.
sysdig – holistic tracing across syscalls, network, containers.
/proc – raw kernel data for metrics, states, drivers, and interfaces.

🧠 When to use

Incident response and baselining.
Long-term trending and anomaly detection.
System-wide correlation across resources.

🛠️ Practical Use Cases

✔ Root Cause Analysis (RCA)

Identify if a slowdown is CPU, memory, network, or storage related.
Trace a misbehaving process through syscalls into the kernel.
Compare observed performance against baseline.

✔ Performance Tuning

Scheduler tracing for latency-sensitive workloads.
NIC tuning via ethtool for high-throughput environments.
Storage insight for LVM/RAID/SSD/HDD tuning.

✔ DevSecOps / Security

eBPF tools for detecting suspicious syscalls.
lsof for auditing unexpected open sockets/files.
sysdig rules for behavioural anomaly detection.

A secure system is one that is understood, not just hardened.

🔐 Observability in DevSecOps

Observability is not just operational — it is security-critical:

Detect unusual syscall patterns (possible intrusion).
Identify crypto miners via CPU and scheduler patterns.
Spot exfiltration via abnormal NIC or TCP behaviour.
Validate hardening changes improve performance rather than degrade it.

📚 References

Brendan Gregg — Linux Performance Tools
Kernel documentation — https://www.kernel.org/doc/
Sysdig, LTTng, SystemTap official docs
eBPF / bpftrace reference guides

Ceph Public Network Migration (No Downtime)

Zepher Ashe — Sun, 08 Mar 2026 00:43:10 +0000

Ceph Public Network Migration (Proxmox)

172.16.0.0/16 → 10.50.0.0/24

No service downtime, no data loss

📌 Context

This procedure documents a live Ceph public network migration performed on a Proxmox-backed Ceph cluster.

The goal was to eliminate management-network congestion while maintaining cluster availability and data integrity.

🎯 Objective
🧱 Key Concepts (Read Once)
🚨 Troubleshooting
⚠️ Risks Considered
✅ Final State

🎯 Objective

Migrate all Ceph traffic (MON, MGR, MDS, OSD front + back) from a congested management network to a dedicated Ceph fabric (e.g. 2.5 GbE switch), while keeping the cluster healthy and online.

🧱 Key Concepts (Read Once)

`public_network`

Client ↔ OSD traffic
MON / MGR control plane
CephFS metadata traffic

`cluster_network`

OSD ↔ OSD replication & recovery (data plane)

Important behaviours

MON & MGR enforce address validation
OSDs bind addresses at restart
/etc/pve/ceph.conf is not authoritative alone — Ceph also uses its internal config database

1️⃣ Prepare the New Ceph Network

Create a dedicated bridge on each node (example: vmbr-ceph):

vim /etc/network/interfaces

auto vmbr-ceph
iface vmbr-ceph inet static
    address 10.50.0.20/24
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0
# Ceph (Fabric)

Assign IPs on the new subnet:

pve2 → 10.50.0.20/24
pve3 → 10.50.0.30/24
pve4 → 10.50.0.40/24

Ensure this network is isolated (no gateway required).

Verify connectivity

ping 10.50.0.30
iperf3 -s / -c <peer>

2️⃣ Add the New Public Network (Dual-Network Phase)

NOTE: Back up the file first

cp /etc/pve/ceph.conf /etc/pve/ceph.conf.bak

Edit /etc/pve/ceph.conf:

public_network = 10.50.0.0/24, 172.16.0.0/16
cluster_network = 10.50.0.0/24, 172.16.0.0/16

⚠️ Do NOT remove the old network yet

Confirm:

Proxmox UI → Ceph → Nodes
ceph config dump

3️⃣ Recreate MONs (One by One)

MONs enforce network validation.

For each node:

pveceph mon destroy <node>
pveceph mon create
ceph -s

✔ Ensure quorum after each step.

4️⃣ Recreate MGRs (One by One)

Recreate standby managers first
Leave the active manager for last

pveceph mgr destroy <node>
pveceph mgr create

Verify:

ceph mgr dump

🔧 Recovery Tip

If a manager fails to start:

systemctl reset-failed ceph-mgr@<node>
systemctl start ceph-mgr@<node>

5️⃣ Recreate CephFS Metadata Servers (MDS)

MDS binds its address at creation time

pveceph mds destroy <node>
pveceph mds create

✔ Verify CephFS health before proceeding.

6️⃣ Remove the Old Public Network

Edit /etc/pve/ceph.conf and remove 172.16.0.0/16:

public_network = 10.50.0.0/24
cluster_network = 10.50.0.0/24

7️⃣ Recreate MONs, MGRs, and MDS (Again)

This ensures all control-plane daemons bind exclusively to the new network.

Order:

MONs (one by one)
MGRs (standbys first, active last)
MDS (one by one)

8️⃣ Protect the Cluster Before Touching OSDs

ceph osd set noout

9️⃣ Restart OSDs (Data Plane Migration)

Restart one OSD at a time:

systemctl restart ceph-osd@<id>
ceph -s

Wait for:

PGs: active+clean

Repeat for all OSDs.

🔟 Remove Protection

ceph osd unset noout

🔎 Verification (Critical)

1️⃣ Verify Ceph daemon addresses

ceph osd metadata <id> | egrep 'front_addr|back_addr'

Expected:

✅ front_addr → 10.50.0.x
✅ back_addr → 10.50.0.x
❌ No 172.16.x.x

2️⃣ Verify traffic is using the Ceph fabric

While Ceph is under load:

ip -s link show vmbr-ceph

RX/TX counters should increase, confirming traffic is not using the management network.

3️⃣ Verify raw network performance (iperf3)

⚠️ Important: iperf3 must be installed on all Ceph nodes to test the fabric correctly.

apt install iperf3

Correct testing method:

Server on one node:

iperf3 -s

Client on a different node:

iperf3 -c <peer_ip> -P 4

Expected for 2.5 GbE Ceph fabric:

~2.1–2.4 Gbit/s
Minimal or zero retransmits
Stable throughput across multiple streams

🚨 Troubleshooting: “OSDs Not Reachable / Wrong Subnet”

Symptom

osd.X's public address is not in '172.16.x.x/16' subnet

Cause

Ceph config DB or MON/MGR cache still references the old network.

Fix (Critical)

Restart ALL MONs (mandatory)

systemctl restart ceph-mon@pve2
systemctl restart ceph-mon@pve3
systemctl restart ceph-mon@pve4

Restart ALL MGRs (mandatory)

systemctl restart ceph-mgr@pve2
systemctl restart ceph-mgr@pve3
systemctl restart ceph-mgr@pve4

(Optional) Clean config DB

ceph config rm global public_network
ceph config rm global cluster_network
ceph config set global public_network 10.50.0.0/24
ceph config set global cluster_network 10.50.0.0/24

Restart OSDs again (one by one).

✔ This should resolve any “OSDs missing / wrong subnet” cases.

⚠️ Risks Considered

Why this change is risky

Changing Ceph cluster networking affects quorum, OSD availability, replication traffic, and client IO. Incorrect sequencing can cause data unavailability or permanent loss.

Failure modes considered

MON quorum loss
OSD flapping
Client IO stalls
Backfill storms
Split-brain conditions

Assumptions

Single Ceph cluster
Dedicated replication network (fabric)
Change executed during low IO window

✅ Final State

Dedicated Ceph fabric (2.5 GbE)
No Ceph traffic on management NIC
MON / MGR / MDS / OSD fully migrated
No data loss
Stable cluster

🙏 Acknowledgements

This migration approach was heavily informed by the following Proxmox forum discussion, which proved critical in resolving address-binding and daemon recreation issues during the Ceph public network transition:

Proxmox Forum – “Ceph: changing public network” https://forum.proxmox.com/threads/ceph-changing-public-network.119116/

In particular, the guidance around:

Temporarily running dual public networks
Recreating MON, MGR, and MDS daemons to force address rebinding
Avoiding full cluster downtime during network migration

was instrumental in achieving a clean, no–data-loss migration.

Many thanks to the contributors in that thread for sharing real-world operational experience.

Why Online Disk Expansion Is Safe on Linux (SAN, LVM2, XFS/Ext4)

Zepher Ashe — Sun, 08 Mar 2026 00:41:02 +0000

Why Online Growth Is Safe for SAN (Fibre Channel), LVM2, and File System (XFS)

Modern enterprise Linux storage stacks are designed for non-disruptive, online capacity expansion, even while filesystems are mounted and under active I/O.

This document explains why each layer—SAN/FC, LVM2, and XFS—fully supports this behaviour.

ME4 LUN → dm-multipath → LVM2 (PV/VG/LV) → XFS/ext4

1. SAN / Fibre Channel (FC) – Non-Disruptive LUN Expansion

Modern enterprise SAN systems (such as Dell EMC ME4) support live, nondisruptive LUN expansion:

Designed for nondisruptive expansion; brief I/O stalls may occur.
No unmount or downtime
Host simply rescans the SCSI bus
Multipath handles updated path geometry automatically

Dell’s ME4 Linux best-practices guide states:

“Resize tasks can be done online without disrupting the applications.”

Source:

https://dl.dell.com/manuals/common/powervault-me4-and-linux-best-practices_en-us.pdf

After expanding the LUN, the host only needs a rescan:

rescan-scsi-bus.sh --resize

multipathd -k
# > multipathd> "resize map mpathX"

Conclusion:

SAN LUN expansion is explicitly engineered to occur online, with no interruption to servers or I/O.

2. LVM2 – Online PV and LV Expansion (Safe While Mounted & Under I/O)

⚠️ Do not proceed unless all paths and the multipath device report the new size.

Ensure the following report ALL paths are resized:
multipath -ll mpathX
blockdev --getsize64 /dev/sdX
blockdev --getsize64 /dev/mapper/mpathX

2.1 PV Resize (`pvresize`) – Safe While Mounted

According to Red Hat’s LVM developers:

“pvresize simply updates the metadata to make LVM aware of the new size.”

“The data area does not change; only the PV extent map is updated.”

— Jonathan Brassow, Red Hat LVM developer

Source:

https://www.redhat.com/archives/linux-lvm/2009-May/msg00040.html

Because it modifies only metadata, pvresize:

Does not touch data blocks
Does not affect the filesystem
Is safe under ongoing I/O
- Provided the block device has been correctly resized on ALL paths
Is routinely run on root filesystems, which cannot be unmounted

Cloud vendor documentation confirming online PV resizing

All three major cloud providers document pvresize as part of their “expand disk without downtime” workflow:

AWS – Expand EBS volumes https://docs.aws.amazon.com/ebs/latest/userguide/recognize-expanded-volume-linux.html
Google Cloud – Resize persistent disks https://cloud.google.com/compute/docs/disks/resize-persistent-disk
Azure – Expand disks *without downtime* https://learn.microsoft.com/en-us/azure/virtual-machines/linux/expand-disks?tabs=ubuntu#expand-without-downtime

These platforms are extremely conservative; documenting pvresize online means it is fully safe and supported.

Conclusion:

pvresize is a safe, online, non-disruptive operation on modern LVM2.

2.2 LV Resize (`lvextend`) – Online and Atomic

lvextend updates:

LVM metadata
device-mapper mappings

These operations are:

Atomic
Metadata is committed atomically via device-mapper table swap
Safe during active I/O
Non-disruptive to mounted filesystems

Cloud vendors (AWS, GCP, Azure) all document lvextend as online, immediately after pvresize.

Conclusion:

LVM2 PV and LV can be grown online, even during live writes, with no unmount required.

3. Online Filesystem Growth

After expanding the block device (SAN → PV → LV), the filesystem must grow to use the new space.

Modern Linux filesystems support online, mounted filesystem expansion.

Filesystem	Online Grow	Notes
XFS v4+	✔	Designed for online growth; cannot shrink
ext4	✔	`resize2fs` supports online grow

3.1 XFS – Designed for Online Filesystem Expansion

Red Hat’s official XFS documentation states:

“The filesystem must be mounted to be grown.”

Source:

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/storage_administration_guide/xfsgrow

This is one of the clearest vendor statements that XFS online growth is not just supported, but required while mounted.

Why XFS online growth is safe:

Journaled metadata operations
Allocation groups expand in place
Grows only while mounted (required)
Designed for SAN arrays, RAID, HPC
Handles heavy concurrent I/O

Conclusion:

XFS is one of the safest and most robust filesystems for online, mounted, high-throughput growth.

4. Network Share Clients Automatically See Grown Filesystems

When an XFS or ext4 filesystem exported over NFSv4 is grown on the server, clients will automatically reflect the new size without requiring remounts.

NFSv4 is stateful and revalidates filesystem attributes, so the updated capacity becomes visible to clients transparently.

Example on the client:

df -h /mnt/nfs_share

🔴 Older Requirements (Why Some Admins Avoid Online Resize)

Before kernel 2.6.31 and early LVM2/LVM1:

pvresize sometimes failed to detect new sizes
Multipath resizing was unreliable
PV resizing sometimes required pvcreate --restorefile
ext2/ext3 could corrupt if device size changed underneath
SAN rescan tools were buggy
UNIX systems generally required unmounting for geometry changes
LVM1 had no reliable online extension

This produced the legacy rule:

“Never resize storage while mounted.”

Modern RHEL9 (2021+) systems:

Online PV expansion = safe
Online LV expansion = safe
Online XFS growth = officially supported
SAN growth = nondisruptive

The entire modern stack is designed for online operation.

Summary Table

Layer	Operation	Safe While Mounted / Under I/O?	Why
SAN / FC	Expand LUN	✔ Yes	Engineered for nondisruptive growth
LVM2 PV	`pvresize`	✔ Yes	Only metadata updated; no data block changes
LVM2 LV	`lvextend`	✔ Yes	Atomic metadata update; safe under I/O
Filesystems	Online growth	✔ Yes	XFS/ext4 support mounted expansion

DEV Community: Zepher Ashe

GitOps for Homelab DNS (Dnsmasq)

Why GitOps?

Introduction

🛠 Benefits of a GitOps DNS Workflow

🚀 Installation & Initial Configuration

1. Install dnsmasq and Git

2. Configure dnsmasq

3. Prepare /etc/hosts.d/ (Optional, Modular Setup)

4. Enable & start service

5. (Optional) Test after starting dnsmasq

🖋 Writing Hostnames

Example hostnames

🔄 GitOps Workflow

1. Repository Structure

2. Secure Access with Deploy Keys

3. Workflow Steps

🔄 Bash Sync Script & Cronjob

Bash Script

Cronjob (every 5 minutes):

🔄 (Optional) GitHub Actions CI Validation

CI Validation

GitHub Actions workflow

Visual DNS Flow

💡 Benefits & Considerations

Benefits:

Security Notes:

🔜 Next Steps / Future Improvements

Conclusion

AI Can Write Code Faster Than I Can Understand It - Here’s My Rule for Fixing That

The TL;DR Version

🧠 1. The Uncomfortable Moment After a Productive AI Session

🧠 2. The Problem: AI Can Create Knowledge Debt

🔍 3. Review Is Not the Same as Consolidation

🛠️ 4. My Rule: 2 Hours Generation, 1 Hour Consolidation

🧩 5. What Counts as Consolidation?

1. Read the Code Manually

2. Run the Tests Yourself

3. Modify One Thing Without AI

4. Document the Architecture in My Own Words

5. Use AI as a Tutor, Not a Generator

✅ 6. The AI Agent Consolidation Checklist

⚠️ 7. When AI Writes Infrastructure, “Mostly Understand It” Is Not Good Enough

🎯 8. The Portfolio Angle: AI-Assisted Is Fine, AI-Owned Is Not

🚩 9. Warning Signs That AI Is Becoming Harmful

🔁 10. A Healthier AI-Agent Workflow

Final Reflection

Running Local AI (Self-hosted) Coding Assistants in VS Code with Ollama and GitHub Copilot

Introduction

Architecture

Step 0 - Prerequisites

Step 1 - Install Ollama

Linux

Verify

Step 2 - Pull a Model

Best Starting Models

Step 3 - Test the Model

Step 4 - Confirm API Endpoint

Step 5 - Configure VS Code

Step 6 - Select Your Model

Important Limitation

Homelab Setup

Small Setup

Step 7 - Remote Access (Optional)

SECURITY (VERY IMPORTANT)

Step 8 - Configure Nginx Reverse Proxy (Optional)

Install Nginx

Create Reverse Proxy Configuration

Verify Reverse Proxy

Step 9 - Testing

References

Official VS Code Docs

Official Ollama VS Code Integration

GitHub Copilot BYOK Docs

VS Code BYOK Announcement

vLLM

Ollama

Stop Writing Python Like JavaScript: Common Language-Switching Mistakes

Overview

1. Respect Language Idioms

3. Prepare `/etc/hosts.d/` (Optional, Modular Setup)

`public_network`

`cluster_network`