DEV Community: Erick Mwangi Muguchia

Lending's Old Faithful: How a 1958 Breakthrough Still Holds Off the AI Rush

Erick Mwangi Muguchia — Sat, 09 May 2026 21:01:30 +0000

Series: Building an Explainable AI Underwriter

This is Part 2.

↜ Part 1: Why Explainable AI Matters in Underwriting

↝ Part 3: Human Intervention Thresholds in AI Decision (coming)

In 1958, David Cox published the logistic regression model. That same year, the first FICO score went into production. Sixty‑six years later, a loan officer in Des Moines and a regulator in Frankfurt still trust that same equation more than any neural network. This is not nostalgia. It is a verdict on what lending actually needs.

This post continues the series on why generalized linear models (GLMs) with a logit link remain the workhorse of consumer credit, and where they finally break.

Why This Post?

Credit scoring demands interpretability and stability over raw accuracy. While neural networks can push AUC past 0.92, GLMs with a logit link still power most production systems at top banks. This post explains why, and when to break the rule. GLM is not "less AI". It is a different flavor of AI, one that prioritizes transparency.

For Everyone

The problem

Your credit decision must be explainable. To regulators (e.g. CFPB, ECB), to applicants (adverse action notices), and to your own risk committee. A black box that says "denied" is a lawsuit waiting to happen.

The tradeoff

Neural net: 92% AUC (better separation)
GLM (logistic regression): 87% AUC (worse but good enough)

So why do top banks still use GLM?

Because in credit, a 5‑point AUC gain is worthless if you cannot:

Explain why a specific applicant was rejected
Prove no disparate impact (ECOA / Fair Lending)
Audit coefficient stability over 5+ years
Run on a single CPU core for $0.0001 per inference

What This Series Is Not

I am not against black boxes. Neural networks and gradient boosting excel at fraud detection, anomaly spotting, and computer vision. This series argues that for probability of default estimation under regulatory oversight, a transparent GLM is often the better engineering choice. Not the only choice. The better choice for that specific problem.

For Data Scientists

Training a logistic regression

From training data (X, y) with y ∈ {0,1} (default / no default):

import statsmodels.api as sm
# y: 1 = default, 0 = non-default
model = sm.GLM(y, X, family=sm.families.Binomial()).fit()

The model estimates:

log(p/(1-p)) = β₀ + β₁·credit_score + β₂·debt_ratio + ...

Link function: why logit, not probit?

Logit gives odds ratios – directly interpretable for business. Probit is fine but less intuitive. On real credit data (10K loans), logit consistently yields better calibration near 0% and 100% default probabilities.

Coefficient interpretation

A coefficient of 0.15 on credit_score means:

One point increase in credit score multiplies the odds of default by exp(0.15) ≈ 1.16.

So a 50‑point jump → odds multiplied by exp(0.15*50) = exp(7.5) ≈ 1808× (huge – realistic for very low scores).

⚠️ Note: In production credit scoring models, the coefficient on credit score is usually negative, because higher scores mean lower risk of default.

This example uses a positive coefficient purely to illustrate how odds ratios work. The sign depends on how the variable is defined:

If credit_score is coded as a risk score (higher = riskier), the coefficient will be positive.
If it’s coded as a traditional credit score (higher = safer), the coefficient will be negative.

Feature engineering for credit

Income: log(income+1) – turns multiplicative effects into additive
Debt ratio: bin into [0‑10%], [10‑30%], [30%+] – captures non‑linearity
Age: sin(2π·age/78) + cos(2π·age/78) – cyclical life stage effects (the 78 approximates a human lifespan in years for the cycle)

For Actuaries

From probability to expected loss

If GLM gives p_default for a loan of exposure E and loss given default LGD:

EL = p_default × E × LGD

Calibration: Brier score + calibration curves

Brier score = mean squared error of probabilities. For a well‑calibrated GLM on credit data: Brier ≈ 0.03–0.08.

from sklearn.calibration import calibration_curve
prob_true, prob_pred = calibration_curve(y_true, y_proba, n_bins=10)

Plot prob_true vs. prob_pred – should lie near the diagonal.

Bootstrap confidence intervals on coefficients

n_boot = 1000
coefs = []
for _ in range(n_boot):
    idx = np.random.choice(len(X), len(X), replace=True)
    boot_model = sm.GLM(y[idx], X[idx], family=sm.families.Binomial()).fit()
    coefs.append(boot_model.params)
ci_lower, ci_upper = np.percentile(coefs, [2.5, 97.5], axis=0)

Adverse selection drift detection

PSI (Population Stability Index) > 0.1 → feature drift
AUC decay > 5% over 3 months → score drift

For Executives

Regulatory alignment

ECOA / Fair Lending: GLM coefficients directly show whether protected class variables (or proxies like zip code) drive decisions.
FCRA adverse action: Must provide "specific reasons" – GLM top coefficients give those reasons (e.g. "debt ratio too high").

Operational cost

GLM prediction: ~100 microseconds per loan on a CPU → 10,000 predictions/sec on a single core.
Neural net (even small): ~5‑10ms on CPU or GPU → 200‑1000/sec. At scale (millions of applications), that is infrastructure cost ×50.

Risk quantification under stress

Stress scenarios: apply shocked coefficients (e.g. double the debt ratio coefficient) and recompute portfolio expected loss.

portfolio_el = np.sum(p_default * exposure * lgd)
stressed_el = np.sum(sigmoid(log_odds_shocked) * exposure * lgd)

Code Example

import numpy as np

# GLM coefficient interpretation
intercept = 2.5
coef_credit = 0.15
coef_debt = -0.002

def odds_ratio(credit_score, debt_ratio):
    log_odds = intercept + coef_credit * credit_score + coef_debt * debt_ratio
    return np.exp(log_odds)

# Approval probability
def prob_approve(credit_score, debt_ratio):
    log_odds = intercept + coef_credit * credit_score + coef_debt * debt_ratio
    return 1 / (1 + np.exp(-log_odds))

# If credit_score increases by 50 points:
# log_odds increase by 0.15 * 50 = 7.5
# odds increase by exp(7.5) = 1808x

Artifacts to Show

Training data summary: feature distributions, target rate (e.g. 2% default)
Coefficient table with bootstrap 95% CIs:

Feature	Coef	2.5% CI	97.5% CI
credit_score	0.15	0.12	0.18
debt_ratio	-0.00	-0.003	0.001

Calibration curve – actual vs. predicted default rate (plot)
Feature importance via |coef| × feature standard deviation (not model‑agnostic, but production‑happy)

Where This Fits in the Series

Human Intervention Thresholds – When should the AI say "I'm not sure, let a human decide"?
Cox Proportion Hazards – Moving from static PD to time‑to‑default modeling.
Auditability and Traceability – Every decision logged, every coefficient justified.
Ethical AI in African Financial Systems – No credit bureau? Alternative data, but carefully.
Simulations and What‑If Scenarios – Stress testing the underwriter.
Limits of Fully Automated Underwriting – Where algorithms must stop.
Designing Human‑Centered Risk Intelligence Systems – The final synthesis.

A Note from the Author

I wrote this on a laptop with portable WiFi, far from my room, without pen or paper. No familiar aura. Only the machine and memory. If I need to capture a thought, I open another tab. The 1958 algorithm does not care about your setup. It only cares that you can explain your decision.

Building a Budget Home Lab for Local LLMs

Erick Mwangi Muguchia — Fri, 24 Apr 2026 04:42:05 +0000

Yesterday—April 23rd I sat down with a blank note and a messy spreadsheet. Not for a client project or a work task. Just for something that’s been gnawing at me for weeks: I want to run large language models on my own hardware, at home, on my terms. Not in a cloud notebook that spins down after an hour. Not behind a metered API key. Something I can experiment with at 2 a.m. without worrying about surprise bills.

It started as a casual “what if,” and by the end of the evening I had a parts list, a stack of tradeoffs, and a very real budget scrawled on a piece of paper. I’m sharing that blueprint here—not as a polished build guide, but as a developer’s logbook. The point isn’t to show off a flawless system. It’s to think through the decisions, admit the unknowns, and maybe get some advice from folks who’ve walked this path before.

The Core Idea

Host some websites, tinker with local LLMs (think Llama, Mistral, Phi), and eventually offer a small hosted inference service to friends or local devs—something modest that might grow over time. The long-term dream: a self-contained AI lab where I control the hardware, the models, and the data.

But first, reality: I have a limited budget and I’m piecing this together part by part in a region where used enterprise gear isn’t as easy to come by. I’m in Kenya, so prices here are in Kenyan Shillings (KSh), but I’ll include rough USD equivalents for global context.

The Hardware Blueprint (v0.1)

Here’s what I’m planning to put in the case. It’s not exotic. It’s not a server rack full of A100s. It’s a quiet desktop that I hope will punch above its weight for quantized models.

Component	Choice	Actual KSh Range	Real-World USD
GPU	RTX 3060 12GB (new/used, local market)	45,000 – 63,000	$350 – $490
CPU	Ryzen 5 3600 (tray/used)	~13,000	$100
Motherboard	B450 / B550 (decent VRMs)	7,700 – 14,200	$60 – $110
RAM	32GB DDR4 (single 32GB stick)	16,500 – 18,950	$128 – $147
PSU	650W – 750W 80+ Bronze/Gold	8,000 – 10,000	$62 – $77
Case + Cooling	Budget airflow case, 3 fans	3,000 – 6,000	$23 – $46
Total		~93,000 – 125,000	~$720 – $970

That’s a wide range, I know. The final number will depend heavily on whether I snag a decent used GPU and how fussy I get about the motherboard and PSU. But even at the upper end, it’s cheaper than a mid-range gaming laptop.

Why Each Part Earned Its Spot

GPU: RTX 3060 12GB over the 3090

I spent a long time staring at RTX 3090 listings. 24 GB of VRAM is seductive—it can run unquantized 13B models, maybe even a quantized 30B without breaking a sweat. But at $700–$1000 used, it would eat up my entire budget. I’d be left with a beast of a GPU and no money for a CPU, RAM, or even a case to put it in.

So I pivoted. The RTX 3060 12GB. Not the 8GB variant—that 4 GB matters a lot for LLMs. 12 GB of VRAM will comfortably hold a 4-bit quantized Llama-2-13B or Mistral, and leaves room for the KV cache during inference. For smaller models like Phi-2 or a q4 Llama-7B, I’ll even have VRAM to spare for batching. Yes, I’ll buy used. Yes, I’m nervous about mining cards. But the local market has some ex-gaming units, and I’ll stress-test before committing.

The 3060’s power draw is also a plus: ~170W TDP means I won’t need a 1000W PSU, keeping the build more efficient and quieter.

CPU: Ryzen 5 3600 (used/refurb)

For pure LLM inference, the CPU isn’t the star—the GPU does the heavy lifting. But I still need enough muscle to handle API serving, web hosting, and any CPU-bound pre/post processing. A Ryzen 5 3600 is a 6-core/12-thread workhorse that’s dirt cheap on the used market. If I ever decide to experiment with CPU offloading for huge models or run a local vector database, it won’t be embarrassingly slow. And the AM4 platform gives me an upgrade path to a 5000-series chip later.

Motherboard: B450 or B550 with spare PCIe

I don’t need bleeding edge. I need one x16 slot for the GPU, maybe a spare x4 slot for a future NVMe adapter or a second NIC. Both B450 and B550 boards handle that fine. I’ll pick whichever has decent VRM cooling and is available at a reasonable price. The ability to drop in a faster Ryzen CPU later is a bonus.

RAM: 32 GB DDR4, no compromises

I'm buying 32GB upfront. For LLM serving, system RAM acts as a safety net—even when the GPU carries the model, the CPU still needs space for the OS, background services, context buffers, and any offloaded layers. A single 32GB DDR4 stick (around KSh 16,500–19,000) gives me all the headroom I need right now and leaves one slot open for a future jump to 64GB if I ever start running heavier multi-model setups or large vector databases. Waiting on RAM is a false economy when I'm this close to a balanced build.

Power Supply: 650W–750W, with tomorrow in mind

I see a lot of online advice shouting “800W minimum!” for high-end GPUs, but they’re talking about RTX 3090s and 4090s. For a 3060 + Ryzen 5 setup, a quality 650W unit is more than enough. I’m leaning toward 750W for a bit of headroom—if I ever upgrade to a hungrier GPU (used 3090 prices might drop), I won’t need to replace the PSU. I’m sticking to a reputable brand, though, because a cheap PSU is a false economy.

Case and Cooling: Airflow over aesthetics

No RGB glass panels here—just a mesh-front case with three fans. Good airflow is important because the GPU will be running flat-out during inference sessions, and I want to keep thermals low enough that the card’s fans aren’t screaming. I’ll probably undervolt the GPU slightly for efficiency.

What This Build Should Actually Do

I’m setting realistic expectations. With 12 GB VRAM, this machine will shine with:

Llama-7B and 13B (4-bit quantized) – via GGUF/MLC formats
Mistral-7B variants – tiny, fast, excellent for code and chat
Phi-3-mini – small but surprisingly capable
Ollama for dead-simple local model serving
LM Studio for a GUI-based playground when I just want to test prompts
vLLM later on, once I’m comfortable with the setup and want higher throughput for an API endpoint.

I actually can fine-tune on this box—QLoRA makes it practical. With 12 GB VRAM, I can run a 4-bit quantized 7B model and apply low-rank adaptation using standard settings without breaking a sweat. It won't be fast, but it'll work. Full fine-tuning is still out of reach, but for my experiments, QLoRA on a 7B model is more than enough.

The Uncomfortable Part: What I’m Still Unsure About

Used GPU reliability. I’ve bought used electronics before, but a GPU that’s been run 24/7 in a mining rig or a dusty gaming tower is a gamble. I’ll run FurMark and a VRAM stress test, but there’s always a chance of early failure. I’d love to hear how others vet used cards for AI workloads.
VRAM vs model ambitions. 12 GB puts a hard ceiling on model size. Right now I’m happy with 7B–13B quantized, but I can feel the itch to run larger open models (like Command R or a future Llama-3-70B-ish thing). I keep asking myself: will I regret not saving up for a 3090? The honest answer is: maybe. But learning on a 3060 is better than dreaming about a 3090 that never arrives.
Power stability at home. Brownouts and voltage swings aren’t unheard of in my area. I’ll need at least a basic UPS with AVR (automatic voltage regulation) to protect the hardware, and I haven’t priced that in yet.
Noise in a living space. This will sit in my apartment, not a dedicated server room. Inference won’t max out the GPU like gaming, but continuous serving might. I need to test fan curves and maybe swap in quieter case fans.
Monetization reality check. I mentioned “selling as a service.” I mean something humble: maybe offer a private API endpoint to local developers who want to experiment without paying for cloud tokens. It’s not a startup; it’s a side experiment that might offset some electricity costs. I genuinely don’t know if anyone will pay, or how to price it fairly.

Where This Is All Heading

The first goal is simple: get the machine booting, install Ollama, and make it work. I’ll spend a week or two just playing with models, measuring token/s speeds, and learning the quirks of managing a local inference stack.

From there, I plan to:

Run a persistent LM Studio server or Ollama for personal use—coding assistants, document summarization, etc.
Set up a Dockerized environment where I can spin up different model backends and test frameworks (LangChain, LlamaIndex, maybe a local RAG pipeline).
Explore vLLM to understand high-throughput serving, including batching and PagedAttention.
Host a few lightweight web apps alongside the inference service—personal projects that benefit from local AI.
Eventually, try to monetize: sell API credits in bulk, or offer a “bring your own model” endpoint for students/hobbyists in my network. If it gains any traction, I’d reinvest every cent into upgrades (more RAM, better GPU, maybe a second identical machine for redundancy).

No grand roadmap. No “disrupt the industry” rhetoric. Just a developer growing a system piece by piece, learning along the way.

A Quick Ask to the Dev Community

I’d genuinely appreciate your wisdom:

Used GPUs for AI: What tests do you run before buying? Any telltale signs of a card that’s been thermally tortured?
RAM vs VRAM: For a local LLM server, would you prioritize 32 GB system RAM first, or save every penny toward a bigger GPU?
PSU headroom: Is 750W a reasonable ceiling for a single high-end future card, or should I just bite the bullet and buy an 850W unit now?
Monetization: Have any of you offered a local LLM API to others? How did you handle billing, rate limiting, or uptime?

If you’ve been down this road, or if you’re staring at a similar blueprint right now, maybe drop a comment. I’d love to hear what you’d do differently, what you’d keep the same, and what you wish someone had told you before you powered on that first home-built AI box.

This is just the beginning. The next post will hopefully have photos of a real, built machine, not just a parts list and a prayer. Until then, I’ll be refreshing listings for used RTX 3060s and trying to figure out if that “too-good-to-be-true” deal is a trap.

What’s your home lab running?

Monte Carlo simulation engine for Carbon and Climate-economic risk. Its is a model for how cities can visualize their climate futures.

Erick Mwangi Muguchia — Sat, 18 Apr 2026 13:21:28 +0000

This is a submission for Weekend Challenge: Earth Day Edition

What I Built

Instead of building a basic personal carbon footprint tracker, I wanted to tackle the macro problem: convincing governments to act. This project directly honors Earth Day by providing a localized, mathematical engine to prove that climate action is an economic necessity.
This project models climate change as a risk problem, not just an environmental issue. By simulating emissions and their economic consequences, it reframes sustainability as something governments and institutions cannot afford to ignore.

For a city like Nairobi, the question becomes clear: reducing emissions isn’t just good for the planet, it stabilizes the future economy. That’s the argument this tool is built to prove.

Demo

The concept applies actuarial risk math (Monte Carlo simulations) to climate science. Instead of just proving that emissions drop, it proves that economic volatility and extreme risk drop. Piping that hard math directly into an AI (Google Gemini) to instantly generate human-readable policy briefs bridges the gap between data scientists and politicians.

The Math **models.py**: Inside this file, the code literally flips a weighted coin every single year for 20 years to add random surprises (stochastic shocks). Sometimes emissions randomly spike, sometimes they crash.

The Engine **simulator.py**: It plays out these 20 years, 10,000 separate times. It runs it once doing nothing ("Baseline") and once where the city cuts emissions ("Intervention").

The Proof **visualization.py**: It takes all 10,000 parallel universes and draws a funnel (the graphs). It proves that climate action doesn't just reduce carbon, it drastically reduces risk and volatility. The resulting graph visually proves that by acting now, we stop the worst-case, most expensive catastrophic futures from happening.

The AI **policy_brief.py**: Raw statistical variance is boring to politicians. So you take the billions of dollars saved in those 10,000 universes and feed them into Google Gemini, asking it to write an inspiring, plain-English speech for the local city council based solely on your math.

This project is intentionally built as a lightweight CLI tool. It runs entirely on a CPU with no external dependencies, making it usable in low-resource environment.
Plus I automated the demo with run_demo.sh file.

This clip shows the simulator running — baseline emissions rise while intervention drives them toward zero.

Limitation & Context

This simulator is designed as a conceptual actuarial engine, not a direct measurement of Nairobi’s emissions. The numbers are stylized to demonstrate how intervention changes risk trajectories, using Nairobi as a case study.

Several external factors are not yet modeled, including population growth, energy mix, transport electrification, and economic shocks. These would all influence real‑world emissions. For now, the model focuses on the risk dynamics: baseline emissions rise with widening uncertainty, while intervention collapses both emissions and variance.

Code

You can find the code on github:

ricsmwangi / carbon-risk-simulator

Carbon Risk Simulator is a lightweight, actuarial-inspired simulation tool that models climate change as a risk problem rather than just an environmental trend. Instead of producing a single forecast, the system uses Monte Carlo simulation to generate thousands of possible future emission paths.

Nairobi Carbon Risk Simulator

A lightweight, pure-Python Monte Carlo simulation engine designed to project carbon emissions pathways and their associated economic damages. Built specifically for Earth Day risk assessments and localized climate action policy.

What it is

Unlike linear carbon calculators, the ** Nairobi Carbon Risk Simulator** leverages actuarial mathematics to model deep uncertainty. It applies stochastic random shocks (normal and lognormal distributions) across 10,000+ simulated futures to measure not just how many emissions might be reduced, but how much volatility and economic risk is eliminated by taking climate action.

Features

Ultra-Lightweight CLI: Stripped of heavy dataframe dependencies dataframes. Relies entirely on numpy for blazing fast, vectorized Monte Carlo matrices and native Python csv for exports.
Stochastic Risk Modeling: Compare "Business as Usual" against intervention scenarios across 20-year horizons.
Data Visualization: Autogenerates high-resolution probabilistic funnels (5th-95th percentiles) and final-outcome histograms using matplotlib.
AI Policy Translation: Integrates Google Gemini to…

View on GitHub

How I Built It

I approached this project from an actuarial mindset — focusing on risk and uncertainty instead of just prediction.

At the core, I used Monte Carlo simulation to model thousands of possible future emission paths over time. Instead of a single outcome, the system generates a range of scenarios, showing not only expected emissions but also variability and extreme cases.

I built the simulation engine in Python using NumPy for fast computations. The model is parameter-based, so values like emission rates, reduction percentages, and economic impact can easily be adjusted.

To make the results more meaningful, I added scenario comparison:

a baseline (no changes)
and a reduced-emission scenario (e.g. 20%)

Our base model targets a 20% reduction, but the Monte Carlo engine can instantly simulate any alternative policy percentage, like 50% or 80%, for sensitivity analysis.

This allows the system to show how policy decisions affect both emissions and economic risk over time.

Random seeds are fixed so results can be replicated exactly.

I also used GitHub Copilot to help expand and structure the idea. It assisted with refining the simulation logic and organizing the code, while I focused on the modeling approach and assumptions.

Finally, I kept everything as a simple CLI tool. This keeps the project lightweight, fast, and able to run on any machine without special requirements.

Prize Categories

I am submitting this project under two categories:

Best Use of GitHub Copilot
Copilot was essential in scaffolding the actuarial engine and Monte Carlo simulation logic. It accelerated development by suggesting reproducible code structures, statistical functions, and console formatting for the Earth Day–themed demo output. Copilot helped keep the workflow modular and clear, ensuring the simulator could scale to multiple scenarios.

Key emphasis:

Guided the modular design of the simulator, from models.py to policy_brief.py, ensuring reproducibility and clarity.
Polished console outputs, turning raw math into a clean demo flow.
Accelerated iteration speed, letting me focus on actuarial modeling instead of boilerplate code.

Best Use of Google Gemini

Gemini transformed raw simulation results into formal policy briefs. By piping CSV summaries and scenario outcomes into Gemini, the simulator automatically generated council‑ready narratives that reframed actuarial math into persuasive, human‑readable language for decision makers. This bridged the gap between technical rigor and policy communication.

Key emphasis:

Reframed statistical variance collapse into persuasive council speeches.
Bridged actuarial math with civic communication, making the simulator resonate beyond the data.
Aligned perfectly with the Earth Day theme by translating technical outputs into accessible calls for urgent climate action.

No GPU? No problem!, running local AI efficiently on my CPU.

Erick Mwangi Muguchia — Tue, 14 Apr 2026 20:21:12 +0000

1. Why I tried this.

2. My setup.

3. The problems I faced.

4. The tweaks I discovered.

5. Results.

6. Lessons.

THE WHY:

I’ve always wanted to explore deep conversations with AI and understand how these systems work. For a long time, that dream was limited by the lack of a GPU and the high cost of apps that allow meaningful interaction with AI models. Now, I’m determined to overcome those barriers and build my own path into this world.

So the idea of running one locally always kept bugging me, and I finally got to do it.
It was a long and educational journey, so let me walk you through it.

My setup

OS: Arch Linux x86_64
Workflow: tmux + i3 (just because I like using key-bindings), starship + wezterm
Hardware :CPU: Intel(R) Core(TM) i5-7200U (4) @ 3.10 GHz
GPU: Intel HD Graphics 620 @ 1.00 GHz [Integrated]
Memory: 2.75 GiB / 7.61 GiB (36%)
Swap: 666.95 MiB / 3.81 GiB (17%)

Storage: Disk (/): 28.64 GiB / 31.20 GiB (92%) - ext4
Disk (/home): 39.15 GiB / 84.33 GiB (46%) - ext4
Disk (/run/media/shinigami/Vault): 349.96 GiB / 457.38 GiB (77%) - ext4

Ollama is the engine I used to run models locally. By default, it stores models in ~/.ollama, which quickly filled my root partition.
install with:
curl -fsSL https://ollama.com/install.sh | sh

To fix this I redirected the models to a secondary storage.

mv ~/.ollama /run/media/shinigami/Vault/ollama
ln -s /run/media/shinigami/Vault/ollama ~/.ollama

This symlink forces Ollama to store everything on the Hard Drive , solving the disk full error.

Now pulling and managing models.

I started by pulling the smallest model, so I went for the Tinyllama, which is about 637 MB.
After testing the model, > fast yes but not that smart, so I had to look for a bit smarter models.
ollama pull tinyllama

Therefore I went ahead and pulled heavy models like llama3.2:3b, and llama3.2:1b, which were 2.0 GB and 1.3 GB respectively.
ollama pull llama3.2:3b
ollama pull llama3.2:1b

Building custom builds for them
So I wanted maximum output/use of the models so I created custom models using modelfiles.
Example;
I wanted one to help me with learning basic networking concepts, the model files looks like this:


FROM llama3.2:3b # Adjust accordingly

# ---------- PARAMETERS ----------
# Lower temperature for accuracy; moderate top_p for variety without drifting.
PARAMETER temperature 0.25
PARAMETER top_p 0.9

# More room for code + explanations.
PARAMETER num_ctx 4096

# Safer repetition control (prevents looping).
PARAMETER repeat_penalty 1.12

# If your setup supports it and you want more deterministic answers, you can also try:
# PARAMETER seed 42

# ---------- SYSTEM BEHAVIOR ----------
SYSTEM """
You are My Mentor: a concise, practical networking fundamentals tutor and coding assistant.

Primary goal:
- Teach networking fundamentals clearly and correctly (OSI/TCP-IP, IP addressing & subnetting, ARP, DNS, DHCP, TCP vs UDP, ports/sockets, routing, NAT, HTTP/TLS basics, troubleshooting with ping/traceroute/nslookup/curl/tcpdump, basic firewalls).

Secondary goal:
- Produce meaningful, runnable code examples when useful prefer (enter preferred language).

Style rules:
- Explain concepts with short definitions + one concrete example.
- When writing code, include: what it does, how to run it, expected output, and common pitfalls.
- If the user’s question is ambiguous, ask up to 2 clarifying questions before answering.

Output format (default):
1) Concept (2–5 sentences)
2) Why it matters (1–2 bullets)
3) Example (diagram, packet flow, or command)
4) Code (only if it adds value; keep it minimal)
5) Quick check (2–3 questions to self-test)
"""

That snippet explains much of it.

Building the models

After creating the Model file, vim Modelfile1, now we bind it to the any models we downloaded.
A Modelfile is the blueprint that shapes an AI model’s personality, rules, and behavior on top of its base intelligence.

Command for creating from the Modelfile,
ollama create My-model -f Modelfile1

you should see this output:

gathering model components 
using existing layer sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff 
using existing layer sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396 
using existing layer sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d 
using existing layer sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd 
creating new layer sha256:7f89bd8bf6ef609a9aefeab288cde09db6c1ef97f649691f25b29e0f85a8c91c 
creating new layer sha256:446b3a23f7599dc79a11cfb03c670091c9fe265aba28fa3316e9e46dc86365db 
writing manifest 
success

"My-model" > you can name it anything.
Plus you can create as many Modelfile as you like giving them different task, and of course you can add more rules and examples in the Modelfile as you like.

After successfully creating 'My-model', run the model,
ollama run My-model

➜ ollama run My-model
>>> Send a message (/? for help)

That was the fun part, now the real problem was now when the models runs and now CPU starts screaming because of 100% CPU consumption, overheating, which ends up making the model work slow.

So I had to optimize my setup to better handle the models.
By using the tools to monitor CPU (htop), and heat (lm_sensors), I was able to better optimize my setup.
To run the models efficiently I had to:

Maximize CPU performance when running the models.
Reduce latency of bottlenecks.
Stabilize thermal behavior.
Prioritize computer-heavy processes.

Running local AI models on CPU introduces:

Lower parallelism.
Thermal Throttling.
OS scheduling inefficiencies.
Power-saving defaults limiting performance.

So instead of forcing the model, You optimize around it.
Unlock CPU performance
In Arch
sudo pacman -s cpupower
then install the tuned for changing CPU frequency state and for system wide optimization.
sudo pacman -S tuned
Then enable it :
sudo systemctl enable --now tuned

Then change the CPU performance;
sudo cpupower frequency-set -g performance

Switches CPU governor: By default, Linux CPUs often run in “ondemand” or “powersave” mode, scaling frequency up and down depending on load.
Performance mode: Locks the CPU to its maximum frequency, ensuring consistent speed.

Impact:
Faster response times for heavy workloads (like tokenization, AI inference, or compiling).
Reduced latency spikes since the CPU doesn’t waste time ramping up.
More predictable benchmarking results.

cons:
Higher power draw, more heat, fans spin up, battery drains faster on laptops.

Then confirm the configuration by,
cpupower frequency-info

➜ cpupower frequency-info
driver: acpi-cpufreq
hardware limits: 400 MHz - 2.60 GHz
available cpufreq governors: conservative ondemand userspace powersave performance schedutil
current policy: governor "performance" within 400 MHz - 2.50 GHz
current CPU frequency: 3.10 GHz (kernel reported)
boost state: Supported, Active

Then apply throughput performance profile, using the 'tuned' we installed earlier.
sudo tuned-adm profile throughput-profile

Why this matters:

Optimizes CPU behaviors.
Improves disk I/O.
Adjusts system scheduling.
Reduces unnecessary power-saving interruptions.

Results: Smoother, sustained compute performance.

Model am using (and why)

llama3.2:3b

Balanced size and capability.
Noticeably smart.
Good for deeper prompts and reasoning.
This felt like middle ground between speed and intelligence.
phi3:mini

Very efficient for its size.
Strong reasoning compared to other small models.
Optimized for lower-resources environments
This one stood out as surprisingly powerful for CPU use.

This concludes my first phase: setting up, tuning performance, and confirming that local AI models run smoothly. In the next phase, I’ll dive into measuring tokenization speed; using verbose logs and custom C scripts to compare how these models perform under different workloads.
"Turns out you don't need powerful hardware to explore AI, just curiosity and a stubborn CPU."

Running local AI.

Erick Mwangi Muguchia — Sun, 12 Apr 2026 15:14:03 +0000

I had this idea to run AI locally on my own laptop. Just to see if I could. Ended up going with Ollama.

At first it was brutal — all CPU, no GPU, super slow. But I messed around, tweaked some stuff, and finally got it to actually run okay. Not fast, but okay.

Then I went down a rabbit hole. I wanted to know what the models were doing. Like, how hot is my CPU getting? How fast is it spitting out tokens? So I started building my own little monitoring setup. Used C for some low-level stuff, Dash for a live dashboard, Python to glue it all together. Oh and lm-sensors to watch the temps because this thing makes my laptop sweat.

Now I can sit there and watch my models run in real time. Token rate, memory, core temps — all on a dashboard.

Feels good having AI running offline. No cloud, no weird latency, just my machine. And a bunch of scripts I broke and fixed along the way.

If you're thinking about trying local AI, just go for it. Just know you'll end up tinkering way more than you expect. Worth it though.

Web apps. I like making web apps ..for celebrations or for fun. So its christmas and I made a small web app. And I didn't use html or js or css... But i used C programming to make it.. It was stressing but its good and am happy about it .

Erick Mwangi Muguchia — Tue, 23 Dec 2025 04:30:38 +0000

I made a promise to myself that am not leaving Meru University without Python skills.

Erick Mwangi Muguchia — Fri, 12 Dec 2025 09:36:29 +0000

When I arrived at Meru University, I made myself a deal:
"I will not leave this place without learning Python."

The first thing I did was relocate. I needed to minimize distractions and move to an environment conducive to focused learning.

Why Python?

I'd heard so much about it—web development, data science, AI, endless possibilities. I was determined to master it and open doors in tech. But I had zero programming knowledge. I knew it would be challenging, but I was willing to put in the effort.

Month 1: Building Foundations

I downloaded tutorials, read documentation, binged YouTube. I learned Python syntax, data types, control structures. Then I practiced—a lot. Small programs. Number games. These games weren't just practice; they made learning fun. That mattered more than I expected.

Month 2–3: Going Deeper

After a month, I decided to add complexity. I wanted to understand how programming actually works, not just write code. So I added C to my learning path.

This wasn't random. Python was my safety net. C forced me to understand memory, pointers, how computers actually think. It made Python click in a new way.

Learning both simultaneously was hard—but it worked.

Month 4: The Full Picture

As weeks turned into months, I got proficient in both. I signed up for GitHub's student pack (more resources, better tools). I learned version control—essential for any real programmer.

Then came R for statistical programming and data visualization. Each language opened new doors.

The Progress

Now, as the semester ends, I can say this honestly: I've made significant progress.

I have:
✅ Multiple projects built and on GitHub
✅ Mastery of Python, C, and R
✅ Understanding of version control and collaborative development
✅ A journaling habit that tracked every step

The Real Win

Learning these languages wasn't just about syntax. It boosted my confidence. It showed me I can learn anything if I commit to it.

And I developed a habit of documenting everything—journaling my process, reflecting on struggles. That's been invaluable. Future me can look back and see exactly how I got here.

What's Next

I'm leaving Meru with a promise kept. But this isn't the end—it's the beginning. I'm excited to explore data science, build real applications, and help others learn like I did.

The promise was simple. The journey changed everything.