DEV Community: Joe Munene

I built a red-team scanner for MCP servers. Then I pointed it at the real ones.

Joe Munene — Fri, 12 Jun 2026 20:52:08 +0000


The Model Context Protocol lets an AI agent connect to external tools: a filesystem, GitHub, Slack, a database. Each server an agent connects to advertises a list of tools, and here is the part most people miss: that tool list is an attack surface.

A tool's description and parameter docs do not just describe the tool to a human. They are injected straight into the agent's context, and the model treats them with the same authority as your own instructions. So a server can hide instructions to the model inside text you skim as a harmless description. That is tool poisoning, and it is the pattern behind CVE-2025-54136.
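To see how much text a single server puts in front of the model, here is a minimal sketch that flattens one tool list into every string the model will read. It assumes the standard MCP `tools/list` result shape (`{"tools": [{"name", "description", "inputSchema"}]}`). The helper `model_visible_text` is hypothetical and is not ghostprobe's API.

```python
from typing import Any, Iterator


def _schema_strings(schema: Any, path: str) -> Iterator[tuple[str, str]]:
    """Yield (location, text) for each title/description nested in a JSON Schema."""
    if not isinstance(schema, dict):
        return
    for key in ("title", "description"):
        value = schema.get(key)
        if isinstance(value, str):
            yield f"{path}.{key}", value
    # Recurse into object properties and array items, where parameter docs live.
    for name, sub in schema.get("properties", {}).items():
        yield from _schema_strings(sub, f"{path}.{name}")
    if "items" in schema:
        yield from _schema_strings(schema["items"], f"{path}[]")


def model_visible_text(tools_list_result: dict) -> list[tuple[str, str]]:
    """Flatten a tools/list result into every (location, text) pair the model will read."""
    found: list[tuple[str, str]] = []
    for tool in tools_list_result.get("tools", []):
        name = tool.get("name", "?")
        if tool.get("description"):
            found.append((f"{name}.description", tool["description"]))
        found.extend(_schema_strings(tool.get("inputSchema", {}), f"{name}.params"))
    return found
```

The point of walking `inputSchema` is that a parameter description such as `path` on a `read_file` tool reaches the model just as directly as the tool's top-level description does, even though a human reviewer rarely expands it.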

I wanted to see how exposed real servers were, so I built ghostprobe: a scanner that connects to an MCP server, pulls its tool list, and reports what an attacker would care about, mapped to the OWASP MCP Top 10. This post is about what happened when I pointed it at real servers, because that is where it got interesting.

What the tool list gives away

ghostprobe looks for a few things:

Tool poisoning: instruction-injection phrasing like "ignore previous instructions" or "do not tell the user", and invisible Unicode used to smuggle instructions past human review while still reaching the model.
The lethal trifecta: a server whose tools together provide private-data access, a way to send data out, and exposure to untrusted content. Any one leg is fine. All three, and a single prompt injection can read your secrets and exfiltrate them. The term is Simon Willison's.
Rug pulls: a server that silently changes a tool's description after you have trusted it.
Dangerous capabilities like shell execution.
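The tool-poisoning check can be sketched as two passes over each description: phrase patterns for instruction injection, and a scan for characters a human will not see. The two regexes are illustrative placeholders, not ghostprobe's rule set. Note that Unicode category `Cf` also includes harmless characters such as the soft hyphen, so a real scanner would need an allowlist.

```python
import re
import unicodedata

# Illustrative phrases only; a real rule set would be larger and tuned on real servers.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions", re.I),
    re.compile(r"do\s+not\s+(?:tell|inform|alert)\s+the\s+user", re.I),
]


def invisible_chars(text: str) -> list[str]:
    """Code points that render as nothing for a human reviewer but still reach the model."""
    hidden = []
    for ch in text:
        # Category Cf covers zero-width spaces/joiners, bidi controls, and the
        # Unicode Tags block (U+E0000 to U+E007F) used to smuggle ASCII text.
        if unicodedata.category(ch) == "Cf":
            hidden.append(f"U+{ord(ch):04X}")
    return hidden


def poisoning_findings(text: str) -> list[str]:
    """Collect injection-phrase and hidden-character findings for one description."""
    findings = []
    for pattern in INJECTION_PATTERNS:
        match = pattern.search(text)
        if match:
            findings.append(f"injection phrase: {match.group(0)!r}")
    hidden = invisible_chars(text)
    if hidden:
        findings.append("invisible unicode: " + ", ".join(hidden))
    return findings
```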

The lethal trifecta is the one that matters most, because it is not about a single malicious tool. It is about a combination that looks reasonable until you see all three legs at once.
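Because the trifecta is a property of the whole server and not of any one tool, the check itself is a set union. Here is a minimal sketch, assuming some earlier step has already tagged each tool with capability labels. The `Tool` class and the tag names `data`, `sink`, and `untrusted` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

LEGS = ("data", "sink", "untrusted")


@dataclass
class Tool:
    name: str
    # Capability tags assigned by some upstream classifier (hypothetical labels).
    caps: set[str] = field(default_factory=set)


def lethal_trifecta(tools: list[Tool]) -> Optional[dict[str, list[str]]]:
    """Return the tools behind each leg if the server covers all three, else None."""
    legs = {leg: sorted(t.name for t in tools if leg in t.caps) for leg in LEGS}
    # No single tool has to be malicious: the finding is about the combination.
    return legs if all(legs.values()) else None
```

With one tool per leg spread across a server, this returns all three lists. Drop any one leg and it returns `None`, which matches the "any one leg is fine" framing above.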

I pointed it at the official servers, and it found bugs in itself

The official reference servers (filesystem, memory, GitHub, and the rest) are well-built, so I expected clean results. Instead, the first things ghostprobe found were false positives in my own analyzer. This was the most useful thing that could have happened.

On the filesystem server it flagged read_text_file as code execution. Wrong. It matched because my exec detector keyed on the bare word "system", and the description says "file system". A read-a-file tool is not remote code execution.

On the sequential-thinking server it flagged a thought history as private-data access. Wrong again. It matched the word "history".

A security scanner lives or dies on its false-positive rate. A tool that cries wolf gets muted, and a muted tool is worse than no tool. So I fixed both: exec detection now requires a real execution verb plus an object, and "history" is too weak a signal to keep. Each fix shipped with a regression test built from the exact server that exposed it.
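The difference between the old and new exec rule fits in two regexes. This is a sketch of the "verb plus object" idea described above, not ghostprobe's actual patterns. The verb and object lists are assumptions chosen for illustration.

```python
import re

# The over-eager rule: any standalone "system" counted as code execution.
OLD_EXEC = re.compile(r"\bsystem\b", re.I)

# Stricter sketch: an execution verb followed, within a few words, by something executable.
EXEC_RULE = re.compile(
    r"\b(?:execute|executes|run|runs|spawn|spawns|invoke|invokes)\b"       # execution verb
    r"\W+(?:\w+\W+){0,3}?"                                                 # up to three filler words
    r"(?:shell|commands?|scripts?|code|process(?:es)?|binar(?:y|ies))\b",  # executable object
    re.I,
)


def looks_like_exec(description: str) -> bool:
    """True only when a description pairs an execution verb with an executable object."""
    return EXEC_RULE.search(description) is not None
```

A description like "Read the complete contents of a file from the file system" trips `OLD_EXEC` but not `looks_like_exec`, while "Execute a shell command on the host" still matches.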

Then GitHub exposed a false negative, which was worse

The GitHub server was more interesting. ghostprobe saw that it reads private repo contents and reads issue text, but it reported no sink, so no trifecta. That was a false negative, and a false negative in a security tool is more dangerous than a false positive.

The GitHub server absolutely has a sink. It can create issues, post pull-request comments, and push to a repo. Writing into a public issue is how you exfiltrate private data. ghostprobe missed it because GitHub's write verbs are "create" and "post" and "push", not "send", and I had only taught it to recognize sending over an obvious external medium like email or HTTP.

The fix was to recognize that writing to a shared, remote, collaborative service is itself an exfiltration channel, distinct from writing a local file. Open an issue, post a comment, push to a repo: those send data out of your trust boundary. Writing a local file does not. After that change, ghostprobe flagged the GitHub server correctly:

[CRIT] MCP04 Lethal Trifecta
  data:      get_file_contents, get_pull_request_files, push_files
  sink:      add_issue_comment, create_issue, create_or_update_file
  untrusted: get_issue, get_pull_request_comments, list_issues

Read a private repo, ingest issue text that anyone on the internet can write, and post to a public issue. An attacker files an issue containing instructions, an agent set up to triage issues reads them, and your private code ends up in a public comment.

I want to be precise about what this is. It is not a vulnerability I discovered. This exact GitHub-MCP exfiltration path was disclosed by Invariant Labs in 2025. The point is that ghostprobe detects the class automatically, from the tool list alone, with no prior knowledge of the server. Point it at a server it has never seen and it tells you whether the trifecta is present.
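The sink rule from the GitHub fix, where writing to a shared remote service counts as exfiltration but writing a local file does not, can be sketched as a keyword heuristic. The verb lists, target nouns, and the `is_sink` name are assumptions for illustration. Restricting verbs to base and third-person forms avoids matching descriptions like "comments posted on an issue".

```python
import re

# Original rule: sending over an obvious external medium.
EXTERNAL_SEND = re.compile(r"\bsends?\b.*\b(?:email|e-mail|http|https|webhook|url)\b", re.I)

# Added rule: writing into a shared, remote, collaborative service.
REMOTE_WRITE = re.compile(
    r"\b(?:create|post|push|add|open|update)(?:es|s)?\b.*"
    r"\b(?:issues?|comments?|pull requests?|repo|repos|repository|repositories)\b",
    re.I,
)
LOCAL_TARGET = re.compile(r"\b(?:local|disk|file system|filesystem)\b", re.I)


def is_sink(description: str) -> bool:
    """Heuristic: does this tool move data outside the user's trust boundary?"""
    if EXTERNAL_SEND.search(description):
        return True
    # A write to a collaborative remote service is a sink; a purely local write is not.
    return bool(REMOTE_WRITE.search(description)) and not LOCAL_TARGET.search(description)
```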

What I actually learned

Real servers are the only real test. Every meaningful improvement to ghostprobe came from running it against an actual server, not from my fixtures. A handful of releases in two days, each one a precision fix driven by a real result: two false positives removed, one false negative closed.

The tool list is underappreciated as an attack surface. People audit MCP server code. Far fewer look at what the server advertises to the agent, which is exactly where poisoning and the trifecta live, and which an attacker can influence without touching your code at all.

And a scanner's credibility is its false-positive rate, not its feature count. The most valuable work was not adding checks. It was making the existing checks stop lying.

Try it

ghostprobe is open source under MIT. The analyzer has no dependencies; you only need the MCP SDK to probe a live server.

pip install "git+https://github.com/joemunene-by/ghostprobe.git" mcp
ghostprobe stdio -- npx -y @modelcontextprotocol/server-github

Or scan a saved tools/list dump completely offline:

ghostprobe scan-file tools.json

GitHub: https://github.com/joemunene-by/ghostprobe

By Joe Munene, a software engineer in Nairobi focused on secure systems and applied machine learning.
Portfolio · GitHub · More writing · joemunene984@gmail.com

I built a local coding agent that learns from its wins, not just its mistakes

Joe Munene — Fri, 12 Jun 2026 09:28:08 +0000


Most agents handle memory in one of two ways. Either they forget everything between sessions, or they "learn" by fine-tuning on a pile of past conversations and hoping the gradient sorts it out. I wanted something narrower and more honest for joe, the local-first agent shell I have been building: learn from the sessions that actually worked, and turn each one into a reusable skill I can read, edit, or delete.

This post is about the feature I just shipped to do that, and why I think the design matters more than the feature itself.

What joe is, quickly

joe is a terminal coding agent, in the spirit of Claude Code, except every model runs on my own GPU through ollama and every byte of state lives in ~/.joe-agent/ on my machine. It has the usual tools (read, write, edit, shell, grep, web), a planner, and a separate coder model it delegates to. Nothing leaves the laptop.

The part I care about is that joe is supposed to get better the longer I use it, without me retraining anything. It already learned from corrections: every time I hit /undo, that is a signal, and a background loop distills recent corrections into short preference rules that get injected into future prompts. Correction in, behavior change out.

The gap was the other half. joe learned from everything I rejected, and nothing from what I accepted.

Learning from wins

The new feature is skill synthesis. After a multi-step session that actually worked, I run one command and joe reads the full transcript of that session and decides whether it contains a generalizable procedure worth keeping. If it does, it writes a skill: a small Markdown file with a name, a description, a set of trigger keywords, and the reusable steps written as instructions to a future agent. If the session was just chatter or a one-off edit, it returns nothing. Not every session deserves to become a skill, and the model is told to say so.

The skill lands in ~/.joe-agent/skills/, and from then on, whenever a future request matches its triggers, joe injects it into the prompt automatically. So a workflow I figured out once (the right sequence of steps to do a tricky migration, say) is available the next time I ask for something similar, without me remembering to mention it.
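A minimal sketch of the trigger-and-inject step, assuming a hypothetical skill layout: a `---`-delimited header with `name`, `description`, and comma-separated single-word `triggers`, followed by the steps. joe's real file format and matcher may well differ. `parse_skill`, `matching_skills`, and `build_prompt` are illustrative names.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    triggers: set[str]
    body: str


def parse_skill(markdown: str) -> Skill:
    """Parse '---' header lines of the form 'key: value', then keep the rest as steps."""
    _, header, body = markdown.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    triggers = {t.strip().lower() for t in meta.get("triggers", "").split(",") if t.strip()}
    return Skill(meta.get("name", ""), meta.get("description", ""), triggers, body.strip())


def matching_skills(request: str, skills: list[Skill]) -> list[Skill]:
    """A skill matches when any of its trigger keywords appears as a word in the request."""
    words = set(re.findall(r"[a-z0-9_-]+", request.lower()))
    return [s for s in skills if s.triggers & words]


def build_prompt(request: str, skills: list[Skill]) -> str:
    """Prepend matching skills as plain-text context; nothing is executed."""
    parts = [f"## Skill: {s.name}\n{s.description}\n{s.body}" for s in matching_skills(request, skills)]
    return "\n\n".join(parts + [request])
```

Because the skill is only text prepended to the prompt, deleting the file is enough to remove it from every future request.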

The important detail: skills are plain text. I can open one, fix a wrong step, or throw it away. There is no opaque weight update to debug. If a skill is bad, I delete a file.

Why this design and not fine-tuning

The idea is not mine. It comes from Voyager, the Minecraft agent that built an ever-growing library of executable skills and used it to compound its abilities without touching the model weights. The Voyager result that stuck with me is that a skill library gives you genuine lifelong learning and sidesteps catastrophic forgetting, because adding a new skill never degrades the old ones. A new text file cannot make the model worse at something else. A fine-tune can.

For a local setup that matters even more. I am running small models on consumer hardware. I cannot afford to retrain every time I learn something, and I cannot afford the regressions that come with it. A skill library is cheap, interpretable, and reversible. It fits the constraints honestly instead of pretending I have a datacenter.

The honest part

This is new and it is not magic. The quality of a synthesized skill depends on the orchestrator model that writes it, and on a small local model the output sometimes needs a human edit before it earns its place. I made synthesis manual on purpose, one command rather than an automatic background step, because I do not want skills appearing without me seeing them, and joe's whole stance is that skills are suggestions injected into context, never code that runs on its own.

The next problem is the interesting one. Right now joe can write a skill, but it cannot yet tell whether that skill actually helped. The signal is already in the system: a turn where a skill was injected and I did not hit /undo is a quiet win, and one I corrected is a quiet loss. The next thing I am building is the ledger that tracks this, so joe can show me which skills earn their place and retire the ones that do not. That closes the loop: write from wins, measure against corrections, prune what does not work.
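A minimal sketch of that ledger, assuming each turn records which skills were injected and whether the turn was undone. `SkillLedger` and the retirement thresholds are hypothetical placeholders. One simplification to keep in mind is that when several skills are injected in the same turn, each one gets full credit or blame.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    wins: int = 0
    losses: int = 0

    @property
    def total(self) -> int:
        return self.wins + self.losses

    @property
    def win_rate(self) -> float:
        return self.wins / self.total if self.total else 0.0


class SkillLedger:
    """Tally per-skill outcomes: an injected turn left alone is a win, an /undo is a loss."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = defaultdict(Record)

    def record_turn(self, injected: list[str], undone: bool) -> None:
        for name in injected:
            if undone:
                self.records[name].losses += 1
            else:
                self.records[name].wins += 1

    def retire_candidates(self, min_turns: int = 5, min_win_rate: float = 0.5) -> list[str]:
        """Skills with enough evidence and a poor record (thresholds are arbitrary)."""
        return sorted(
            name for name, rec in self.records.items()
            if rec.total >= min_turns and rec.win_rate < min_win_rate
        )
```

The `min_turns` floor keeps a brand-new skill from being retired after a single unlucky correction.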

That combination, a skill library that knows its own track record, running entirely on local hardware, is the part I have not seen elsewhere. It is the reason I am still building this instead of just using a hosted agent.

joe is open source: https://github.com/joemunene-by/joe

How I got CarX Street running on a Mac mini M4 and built a launcher to make it repeatable

Joe Munene — Wed, 03 Jun 2026 06:05:48 +0000

I spent the last few weeks building cellar, an open source Mac game launcher for Apple Silicon. Along the way I hit problems nobody had documented cleanly. This is the write-up I wish existed.
The problem
Whisky died. CrossOver costs money. Heroic doesn't handle repacks. Nothing on Mac handles the full loop: inspect the archive, set up the Wine bottle correctly for the game's engine, launch with the right D3D backend, make a clickable app out of it.
So I built it.
What actually runs Windows games on Apple Silicon
The short version: Wine alone doesn't work. You need the full stack:
Wine 11.x
→ Apple GPTK (D3DMetal 3.0)
→ Rosetta 2
→ Metal + M4 GPU
D3DMetal is Apple's D3D9/10/11/12 → Metal translator. Without it, D3D9 games NULL-deref on startup and modern Unity titles deadlock at 100% CPU. This isn't optional.
The vertex glitch that took hours to crack
CarX Street was running at 54fps, but every car looked broken: dark, corrupted metallic surfaces. Not a shader setting issue. Not anti-aliasing. Not Burst.
After ruling everything out, I tried swapping Whisky's bundled D3DMetal 2.0 framework for D3DMetal 3.0 from CrossOver 26.
Vertex glitch: gone.
Root cause confirmed: D3DMetal 2.0's DXBC → Metal AIR shader translator mistranslates min16float (half-precision) types in Unity URP Lit BRDF math. PBR metallic surfaces hit the half-precision path. Matte surfaces don't. D3DMetal 3.0 ships a rewritten translator that handles half types correctly.
Every "fast-math toggle" env var suggested online is hallucinated. There's no runtime knob. The only fix is D3DMetal 3.0.
The Unity 2022 IL2CPP wall — and how to climb it
Modern Unity titles call RoGetActivationFactory for Windows.System.DispatcherQueue. Wine 11.x doesn't implement it. The call is unconditional — you can't trick it with a Windows version flag.
The fix: stage Proton's WinRT DLL family (coremessaging.dll, wintypes.dll, twinapi.appcore.dll, the full windows.* set) into the bottle and register the activation classes pointing at coremessaging.dll. One script handles it:
./scripts/install-proton-winrt.sh ~/.cellar/bottles/my-game/prefix
The FitGirl IPC wall: honest about the limits
FitGirl's compression plugins (cls-lollypop.dll) use Windows shared-memory IPC to talk to a worker process. Wine on Apple Silicon cannot deliver that IPC. I spent days confirming this with full callback tracing before documenting it honestly.
The pure-Rust FreeArc reader I built (freearc-native) can inspect any FitGirl archive natively and extract files using open codecs (lzma, zstd). The lollypop codec chain stays blocked until Wine 12 fixes the shared-memory implementation.
The engine-family profile system
Rather than a script per game, cellar uses a profiles.json that encodes the full recipe per engine family: DLL overrides, winetricks set, launch args, and runtime prereqs. 10 engine families, 60+ games:
# match any game to its profile
./scripts/find-profile.sh "Elden Ring"
# → Best match: unreal-engine-4-5

# proactive bottle setup
./scripts/cellar-install.sh unreal-engine-4-5 "Elden Ring"

# clickable .app
./scripts/make-cellar-app.sh unreal-engine-4-5 "Elden Ring"
Frostbite, RAGE, UE4/5, RE Engine, Ubisoft AnvilNext, Bethesda Creation, ForzaTech, Fox Engine: adding a new game in any of these families is a profile match, not new code.
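The profile-match idea can be sketched in a few lines: index known titles by engine family and fuzzy-match the requested game. The `profiles.json` shape and recipe contents below are hypothetical placeholders, not cellar's schema, and `find-profile.sh` may match quite differently. `difflib.get_close_matches` is standard-library fuzzy matching.

```python
import difflib
import json
from typing import Optional

# Hypothetical profiles.json shape: engine family -> recipe, including known game titles.
PROFILES = json.loads("""
{
  "unreal-engine-4-5": {"games": ["Hogwarts Legacy"], "winetricks": ["vcrun2022"]},
  "frostbite": {"games": ["Battlefield 1", "Need for Speed Heat"], "winetricks": ["vcrun2019"]}
}
""")


def find_profile(game: str, profiles: dict, cutoff: float = 0.6) -> Optional[str]:
    """Return the engine family whose known titles best match `game`, or None."""
    title_to_family = {
        title.lower(): family
        for family, recipe in profiles.items()
        for title in recipe.get("games", [])
    }
    # Fuzzy matching tolerates typos and small title differences.
    best = difflib.get_close_matches(game.lower(), title_to_family, n=1, cutoff=cutoff)
    return title_to_family[best[0]] if best else None
```

Supporting another game in an existing family is then a data change, appending a title to that family's list, with no new code.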
What's blocked: no false promises

Kernel-mode anti-cheat (Vanguard, EAAC, Hyperion): no Wine support, period.
FitGirl lollypop repacks: IPC deadlock, documented above.
GTA Online / RDR Online: Take-Two bans Wine sessions.

cellar ships a cellar-doctor.sh health check and an analyze-log.sh that matches launch logs against 15 known failure signatures and prints a diagnosis. When something breaks you get an actual answer, not a blank terminal.
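Signature matching of this kind is a list of (pattern, diagnosis) pairs run over the log. Here is a minimal sketch with two illustrative signatures. The regexes are assumptions, not copies of cellar's 15 real ones in `analyze-log.sh`, and the diagnoses paraphrase the fixes described earlier in this post.

```python
import re
from typing import NamedTuple


class Signature(NamedTuple):
    name: str
    pattern: re.Pattern
    diagnosis: str


# Two illustrative signatures; the patterns are assumptions, not cellar's real list.
SIGNATURES = [
    Signature(
        "winrt-dispatcherqueue",
        re.compile(r"RoGetActivationFactory.*Windows\.System\.DispatcherQueue"),
        "Missing WinRT activation: stage Proton's WinRT DLLs with install-proton-winrt.sh.",
    ),
    Signature(
        "null-deref",
        re.compile(r"page fault on read access to 0x0+\b"),
        "NULL dereference at startup: check that the bottle uses D3DMetal.",
    ),
]


def diagnose(log_text: str) -> list[str]:
    """Return one diagnosis line per known signature found in the log."""
    return [f"[{s.name}] {s.diagnosis}" for s in SIGNATURES if s.pattern.search(log_text)]
```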
The repo
github.com/joemunene-by/cellar: MIT license, Tauri 2 + Rust + React, macOS 14+ on Apple Silicon.
Two games verified working today. The docs are honest about everything else. Issues and PRs welcome.

I found a bug that made my LLM look 14x better than it was — here's what I learned

Joe Munene — Sat, 25 Apr 2026 14:34:07 +0000

I've been building GhostLM for the past few months — a decoder-only transformer trained entirely from scratch in PyTorch on cybersecurity data. No pretrained weights, no HuggingFace wrappers. Every component hand-written.
Phase 1 looked promising. After 10,000 steps on ghost-tiny (14.7M params), I had a validation loss of 2.74 and a cybersecurity perplexity of 2,183.94 against my benchmark set.
Then I found the bug.

The leaky split
My train/validation split was random. Sounds fine. But random splits on text data have a subtle failure mode — if you're not careful, near-duplicate or semantically similar records end up on both sides of the split. Your model "validates" on text it essentially already saw during training.
The fix was switching to a deterministic hash-based split. Every record gets assigned to train or validation based on a hash of its content, so identical texts always land in the same bucket and an exact duplicate can no longer sit on both sides of the split, no matter how you shuffle.
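A minimal sketch of a content-hash split. The normalization step (NFKC, lowercase, collapsed whitespace) is an assumption added so trivially different copies also hash identically. The 5% default roughly matches the 547 / 10,925 validation share reported for the original corpus.

```python
import hashlib
import unicodedata


def split_bucket(text: str, val_fraction: float = 0.05) -> str:
    """Assign a record to 'train' or 'val' from a hash of its (normalized) content."""
    # Assumed normalization so that case and spacing differences do not change the bucket.
    normalized = " ".join(unicodedata.normalize("NFKC", text).lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    # First 6 bytes (48 bits) give an exactly representable number in [0, 1).
    position = int.from_bytes(digest[:6], "big") / 2**48
    return "val" if position < val_fraction else "train"
```

Since the bucket depends only on content, re-running the pipeline, reordering the corpus, or adding new records never moves an existing record across the split.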
I rebuilt the corpus from scratch with this fix, added a data audit script that checks for leakage explicitly, and re-ran training.
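A minimal sketch of what a leakage audit can check, covering both failure modes mentioned above: exact duplicates via a normalized content hash, and near-duplicates via 5-word shingles and Jaccard similarity. The function names and the 0.8 threshold are assumptions. The near-duplicate pass is quadratic in corpus size, so a MinHash/LSH index would be the usual way to scale it.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash of lowercased, whitespace-collapsed text, for exact-duplicate checks."""
    return hashlib.sha256(" ".join(text.lower().split()).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Overlapping k-word windows, used to spot near-duplicates."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def audit_exact(train: list[str], val: list[str]) -> list[str]:
    """Validation records whose normalized text also appears in train."""
    seen = {fingerprint(t) for t in train}
    return [v for v in val if fingerprint(v) in seen]


def audit_near(train: list[str], val: list[str], threshold: float = 0.8) -> list[str]:
    """Validation records whose shingle Jaccard similarity to any train record >= threshold."""
    train_sets = [shingles(t) for t in train]
    flagged = []
    for v in val:
        sv = shingles(v)
        if any(len(sv & st) / len(sv | st) >= threshold for st in train_sets):
            flagged.append(v)
    return flagged
```

A CVE description that differs from a training record by one word slips past the exact check but is caught by the shingle check. That is the near-duplicate leak a plain content hash cannot see.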

Phase 2 results
The new validation loss was 3.78 — higher than Phase 1's 2.74. On the surface that looks like regression.
It isn't. The 2.74 was a lie. The 3.78 is the first honest number.
The real comparison is perplexity on a hardcoded external benchmark set — the same samples both phases ran against:
Model	Perplexity
ghost-tiny Phase 1	2,183.94
ghost-tiny Phase 2	152.71
GPT-2 124M baseline	26.76
Phase 2 is 14.3x better than Phase 1. Entirely from corpus quality — same architecture, same steps, same hardware.
ghost-tiny is still 5.7x behind GPT-2, which is expected. It's a 14.7M parameter model trained on 2.7M tokens vs a 124M model trained on roughly 40GB of WebText. The gap is compute and data, not architecture.
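The ratios above follow directly from the table. Perplexity is the exponential of the mean per-token cross-entropy, so the reported validation losses can be converted too. This assumes the losses are in nats, which is PyTorch's `cross_entropy` default. Phase 1's leaky validation loss of 2.74 corresponds to a perplexity of about 15.5 on its own split, against 2,183.94 on the external benchmark.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy, measured in nats."""
    return math.exp(mean_nll_nats)


# Benchmark perplexities from the table above.
phase1, phase2, gpt2 = 2183.94, 152.71, 26.76

print(f"Phase 2 vs Phase 1: {phase1 / phase2:.1f}x")  # 14.3x
print(f"GPT-2 vs Phase 2: {phase2 / gpt2:.1f}x")      # 5.7x

# Validation losses from the text, converted to perplexity on their own splits.
print(f"Phase 1 val: {perplexity(2.74):.1f}")  # 15.5 (leaky split)
print(f"Phase 2 val: {perplexity(3.78):.1f}")  # 43.8
```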

What the model actually generates
Here's a real sample at temperature 0.8:
Prompt: A SQL injection attack works by

...the login page. The login page is used to the login page's name of the login page does not properly sanitization of the password, which allows attackers to cause a denial of service via a long GET request...

It has absorbed surface-level cybersecurity vocabulary — CTF terminology, exploit techniques, CVE string format. But it has no semantic grounding. Broken grammar, hallucinated version chains, topic drift. This is exactly what you'd predict for this scale.
The fix isn't more steps. It's scale. ghost-small at 55M params is next.

What I actually learned
Data quality compounds more than training time. Fixing a leaky split gave me a 14.3x perplexity improvement with zero extra compute. If I had just kept training Phase 1, I would have been optimizing a benchmark that was secretly measuring memorization.
Honest evaluation is harder than training. Designing a benchmark that doesn't leak, doesn't overfit, and actually measures what you care about is genuinely difficult. I got it wrong the first time and only caught it by auditing.
From scratch means you own every mistake. There's no from_pretrained to blame. When something breaks, it's yours. That's painful but it's also the fastest way to actually understand what's happening.

What's next
Corpus expansion — targeting 10-100x current size. CTFtime archives, Project Zero blog posts, PortSwigger research, MITRE ATT&CK, tool documentation.
Then ghost-small at 55M params on the Mac mini M4 with MPS acceleration. That's the first rung where domain-coherent generation might start to emerge.
Realistic timeline to something genuinely useful: 2-3 years. No shortcuts for from-scratch at scale.
GitHub: https://github.com/joemunene-by/GhostLM

A new open-source security AI model being built.

Joe Munene — Mon, 06 Apr 2026 23:31:46 +0000

I Built an Open-Source Cybersecurity LLM From Scratch in Python

What if you could build your own AI model — not fine-tune someone else's, not wrap an API — but actually build a transformer from scratch and train it on cybersecurity data?

That's exactly what I did. And I'm releasing it under Apache 2.0 so anyone can use it, improve it, and build on it.

Meet GhostLM — an open-source, cybersecurity-focused language model built entirely from scratch in PyTorch. No pretrained weights. No wrappers. Every single component written by hand.

GitHub: https://github.com/joemunene-by/GhostLM

Why I Built GhostLM

Here's the thing about current AI models: they're incredibly powerful, but they weren't built for security. When you ask GPT-4 about a CVE vulnerability or a CTF challenge, it gives you a reasonable answer — but it's reasoning from general knowledge, not from deep security context.

I wanted a model that actually understands cybersecurity language — the patterns, the terminology, the attack methodologies. And I wanted to build it myself, not because I thought I could out-engineer OpenAI, but because the best way to understand how something works is to build it from the ground up.

My goal was simple: create the first open-source, cybersecurity-focused language model that anyone can run, inspect, and improve.

What GhostLM Is

GhostLM is a decoder-only transformer language model — the same architecture family as GPT-2, GPT-3, and Llama — but built entirely from scratch. No transformers.AutoModel, no from_pretrained(). Just raw PyTorch tensors and matrix multiplications.

It comes in three sizes:

Variant	Layers	Dim	Params	Status
ghost-tiny	2	256	~14.5M	✅ Trained
ghost-small	6	512	~55M	🔄 Planned
ghost-medium	12	768	~160M	🔜 Future

It's trained on:

CVE vulnerability descriptions from the NVD database
Synthetic CTF writeups covering real challenge types
Synthetic cybersecurity research paper abstracts

And it's fully open source under Apache 2.0.

The Architecture

Let me show you what "built from scratch" actually looks like.

Causal Self-Attention

This is the core of every transformer. Here's GhostLM's implementation — no F.scaled_dot_product_attention, no hidden magic:

def forward(self, x):
    B, T, C = x.size()

    # Combined QKV projection and split
    qkv = self.c_qkv(x)
    q, k, v = qkv.split(self.n_heads * self.head_dim, dim=-1)

    # Reshape to (B, n_heads, T, head_dim)
    q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
    k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
    v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)

    # Scaled dot-product attention
    att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(self.head_dim))

    # Apply causal mask (lower triangular)
    att = att.masked_fill(self.causal_mask[:, :, :T, :T] == 0, float("-inf"))

    # Softmax + dropout + weighted sum
    att = F.softmax(att, dim=-1)
    y = self.attn_dropout(att) @ v

    # Reassemble heads and project back
    y = y.transpose(1, 2).contiguous().view(B, T, C)
    return self.resid_dropout(self.proj(y))

Every line is intentional. The causal mask ensures each position can attend only to itself and earlier tokens (autoregressive). The attention weights are computed by hand with the classic softmax(QK^T / sqrt(d)) formula and then applied to the values.
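To check what the mask buys you, here is a dependency-free, single-head sketch of the same computation on plain Python lists. It is not GhostLM's code. Instead of writing -inf into future positions, it scores only positions 0..t, which gives the same result because exp(-inf) is 0.

```python
import math


def causal_attention(q, k, v):
    """Single-head causal attention on lists of vectors: out[t] depends on positions 0..t only."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scoring only s <= t is equivalent to masking future scores with -inf before softmax.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) for s in range(t + 1)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - top) for x in scores]
        total = sum(weights)
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) / total
                    for j in range(len(v[0]))])
    return out
```

Changing the last token's key and value leaves every earlier output identical and changes only the last one. The first position can only attend to itself, so its output is exactly `v[0]`.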

Transformer Block

The block stacks attention and feed-forward layers with a pre-norm architecture:

def forward(self, x):
    # Pre-norm + self-attention with residual
    x = x + self.attn(self.ln_1(x))
    # Pre-norm + feed-forward with residual
    x = x + self.ffn(self.ln_2(x))
    return x

Why pre-norm? I chose pre-normalization (LayerNorm before each sub-layer) over post-norm because it's significantly more stable for training, especially on smaller models. The gradients flow more cleanly through the residual connections, and you don't need as careful a learning rate schedule.

Weight Tying

One optimization that saves ~25 million parameters: the output projection layer shares weights with the token embedding. Instead of learning two separate vocab_size × d_model matrices, we learn one and reuse it:

self.lm_head.weight = self.token_embedding.weight

This is the same trick GPT-2 uses, and it works because the embedding and output projection are fundamentally doing the same thing — mapping between token space and hidden space.
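The saving from tying is exactly one vocab_size × d_model matrix, so it grows with model width. This back-of-the-envelope check assumes GPT-2's 50,257-token BPE vocabulary, which GhostLM may or may not use. Under that assumption, the saving is about 12.9M parameters for ghost-tiny (d_model 256) and about 25.7M for ghost-small (d_model 512).

```python
# Assumption: GPT-2's BPE vocabulary size; GhostLM's actual vocabulary may differ.
VOCAB = 50_257


def tied_savings(vocab_size: int, d_model: int) -> int:
    """Parameters saved by reusing the embedding matrix as the output projection."""
    return vocab_size * d_model


for name, d_model in [("ghost-tiny", 256), ("ghost-small", 512), ("ghost-medium", 768)]:
    print(f"{name}: {tied_savings(VOCAB, d_model) / 1e6:.1f}M parameters saved")
```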

Training Data

The data pipeline is one of the most important parts of any ML project. GhostLM's pipeline collects from three sources:

NVD CVE Descriptions (Real Data)

I hit the National Vulnerability Database REST API directly — no HuggingFace dependency needed. Paginated requests with rate limiting, parsing nested JSON responses, extracting English descriptions:

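import requests

url = "https://services.nvd.nist.gov/rest/json/cves/2.0?resultsPerPage=2000&startIndex=0"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
for item in resp.json()["vulnerabilities"]:
    cve_id = item["cve"]["id"]
    # Select the English description instead of assuming it is listed first
    description = next((d["value"] for d in item["cve"]["descriptions"] if d["lang"] == "en"), "")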

This gave me 9,925 real CVE descriptions — the kind of text that says "A buffer overflow in the XYZ component allows remote attackers to execute arbitrary code via crafted input."
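The single request above returns only the first page. Paging works by advancing `startIndex` until `totalResults` is reached, pausing between calls because NVD rate-limits clients without an API key (its developer docs suggest sleeping about six seconds between requests). This is a minimal standard-library sketch, not GhostLM's `collect.py`. The `fetch` parameter is a hypothetical hook so the paging logic can be tested without the network, and `max_records` is an optional cap.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator, Optional

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def fetch_page(start_index: int, per_page: int = 2000) -> dict:
    """Fetch one page of the NVD CVE API 2.0."""
    url = f"{NVD_URL}?resultsPerPage={per_page}&startIndex={start_index}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_cve_descriptions(
    fetch: Callable[[int], dict] = fetch_page,
    delay_s: float = 6.0,
    max_records: Optional[int] = None,
) -> Iterator[tuple[str, str]]:
    """Yield (cve_id, English description) pairs, paging with startIndex."""
    start = 0
    while True:
        page = fetch(start)
        items = page.get("vulnerabilities", [])
        for item in items:
            cve = item["cve"]
            english = [d["value"] for d in cve.get("descriptions", []) if d.get("lang") == "en"]
            if english:
                yield cve["id"], english[0]
        start += len(items)
        done = not items or start >= page.get("totalResults", 0)
        if done or (max_records is not None and start >= max_records):
            return
        # Unauthenticated NVD clients are rate limited; pause between requests.
        time.sleep(delay_s)
```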

The Full Pipeline

NVD API → 9,925 CVE descriptions (real)
Synthetic papers → 500 security research abstracts
Synthetic CTF writeups → 500 challenge solutions
─────────────────────────────────────────────────
Total: 10,925 records → ~490,532 tokens
Train: 10,378 | Validation: 547

The pipeline handles text cleaning (unicode normalization, whitespace stripping, non-printable character removal), tokenization, chunking, and train/val splitting — all in data/collect.py.
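The cleaning and chunking steps can be sketched with the standard library alone. The specific choices here are assumptions, not necessarily what `data/collect.py` does: NFKC as the normalization form, `str.isprintable` as the filter, and dropping the ragged last chunk.

```python
import unicodedata


def clean_text(text: str) -> str:
    """NFKC-normalize, drop non-printable characters, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    # Keep printable characters plus whitespace (which is collapsed below).
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return " ".join(text.split())


def chunk(token_ids: list[int], block_size: int) -> list[list[int]]:
    """Split a token stream into fixed-length training blocks, dropping the ragged tail."""
    return [token_ids[i:i + block_size]
            for i in range(0, len(token_ids) - block_size + 1, block_size)]
```

NFKC also folds compatibility forms such as the "ﬁ" ligature into plain "fi", which keeps visually identical CVE text from tokenizing differently.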

Training Results

Here's where it gets interesting. I trained ghost-tiny on a ThinkPad Yoga 11e with a Celeron N4100 and 4GB of RAM. Yes, really.

Loss Progression

Steps	Train Loss	Val Loss	Notes
0	10.84	10.04	Random initialization
500	7.12	6.27	First CVE patterns emerge
1,000	5.89	5.41	Starting to form sentences
2,000	4.63	4.58	Grammar improving
3,000	3.91	3.95	Security vocabulary appearing
4,000	3.52	3.58	Coherent attack descriptions
5,000	3.38	3.46	Best checkpoint saved

The loss curve is healthy — train and validation are tracking closely, no signs of overfitting yet.

Generation at 5,000 Steps

Here's what the model generates when prompted with "A SQL injection attack works by":

A SQL injection attack works by using the admin_user sequences in the web server. Web Application Firewall Evasion Techniques present a critical defense layer against commercial and model checking. Our model achieves 94% detection rate with transformer-based sequence modeling to identify common vulnerability patterns including buffer overflows.

Is it perfect? No. It bleeds between topics (SQL injection → WAF → research paper language). But it's producing grammatically correct sentences with real security terminology. At 5,000 steps on a 14.5M parameter model running on a laptop from 2018, I'll take it.

Honest Limitations

Topic coherence — the model jumps between subjects mid-generation. It needs more steps to learn to stay on topic.
Memorization — some outputs are lifted nearly verbatim from training data. More diverse data would help.
Size — 14.5M params is tiny. ghost-small (55M) will be a significant jump.
CPU training — at ~1.8s per step, 10,000 steps takes hours. GPU or TPU is needed for serious training.

What's Next

I've already applied for Google TPU Research Credits to train ghost-small on proper hardware. The plan:

ghost-tiny to 10,000+ steps — finish what I started
ghost-small on TPU/GPU — 55M params with real compute
HuggingFace Hub release — public model weights anyone can download
Live demo on HuggingFace Spaces — try GhostLM in your browser
Benchmark vs GPT-2 — objective comparison on cybersecurity tasks

Try It Yourself

The entire project is open source. Clone it, run it, break it, improve it:

git clone https://github.com/joemunene-by/GhostLM.git
cd GhostLM

# Install everything
make install

# Download training data
make data

# Train ghost-tiny on CPU
make train-tiny

# Chat with the trained model
make chat

# Run the web demo
pip install gradio
python demo/app.py

I'm actively looking for contributors. If you want to help with:

Finding new cybersecurity datasets
Implementing Flash Attention or RoPE
Adding distributed training
Writing documentation

Check out CONTRIBUTING.md and open a PR.

Final Thoughts

I'm a 20-year-old computer science student in Nairobi, Kenya. I don't have access to massive compute clusters or research lab budgets. But I do have curiosity, persistence, and a belief that open-source AI shouldn't only come from well-funded labs.

GhostLM is proof that you can build something meaningful from scratch with limited resources. The architecture is clean, the training pipeline works, and the model is learning. It's not going to replace GPT-4 — but it's a foundation that anyone can build on.

If you found this interesting, star the repo, try it out, and let me know what you think. The best part of open source is that it gets better when more people are involved.

GitHub: https://github.com/joemunene-by/GhostLM

License: Apache 2.0

Built with ❤️ in Nairobi, Kenya 🇰🇪