DEV Community: Lingdas1

I Installed Four AI Agents Before I Realized: They Are Not Competitors at All

Lingdas1 — Sun, 07 Jun 2026 21:00:42 +0000

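You do not need to install four AI agents at once.

But if you search Xianyu, you will find someone selling OpenClaw, Hermes, Codex, and Claude Code bundled as a "four-in-one tutorial." That someone is me.

It is not that I had nothing better to do. After installing the first one, I realized nobody online had clearly explained how they actually differ.

So I installed all four

I installed OpenClaw first. It is the simplest of the four: one line of PowerShell, done in 15 minutes. My first impression: oh, this is just a ChatGPT that can connect to Feishu.

But people online said "OpenClaw is only a gateway; Hermes is the brain." So I went and installed Hermes too.

Hermes is a lot more work to install: Windows users have to install WSL2 first, then Ubuntu, then Hermes. But once it was running, something happened that stopped me cold:

The next day I opened Hermes, and it still remembered that I had told it to use f-strings the day before.

I had not configured anything. It remembered on its own.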

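Then it hit me: they are not competitors

Agent	What it is	When to use it
OpenClaw	Messaging hub	You want AI to answer your QQ/Feishu messages for you
Hermes	Self-learning brain	You want an assistant that remembers you and understands you better the more you use it
Codex	Desktop coding	You want to tell your computer "build me a web page" and have it written
Claude Code	Terminal heavyweight	You have a pile of code that needs refactoring and you do not want to do it by hand

They are not competitors. They are four tools that do four different jobs.

You would not compare a screwdriver and a hammer to decide which is "better."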

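The discoveries that surprised me most

Claude Code has a plugin marketplace. I had always assumed it was just "ChatGPT in a terminal," until I typed /plugin and a full plugin browser popped up, with plugins for a live status bar, automatic memory, and Git assistance. The kind with 8.5K stars on GitHub.

Not a single Chinese-language tutorial online mentions this.

Hermes can connect to QQ. Every tutorial talks about Feishu. But QQ is right there in Hermes's Gateway list, and it is much simpler than Feishu: create a bot → scan the QR code → done. No enterprise verification, no pile of forms to fill in.

No Chinese-language tutorial online mentions this either.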

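So I wrote all of this into a tutorial

Not because I understand the technology. Quite the opposite: because I am the person who fell into every single pit along the way.

The tutorial includes:

Three installation methods for each agent (the Microsoft Store is not the only route for Codex)
Setup steps for QQ and Feishu
Five integration scenarios that actually run end to end (not filler like "they can work together," but commands you can copy)
A 5-minute task for each agent after installation (to confirm it is installed correctly)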

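You might ask

Which of the four should I install?

If you only need one: OpenClaw (the simplest) or Hermes (the smartest).

If you want to build an auto-reply system: OpenClaw + Hermes.

If you want to write your own small tools: Codex or Claude Code.

Where can I buy it?

Search Xianyu for "海天明月夜". There is a preview on GitHub: github.com/Lingdas1

Which one have you installed? What pits did you fall into? Let's talk in the comments. Chances are you hit one I forgot to write down.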

💀 Crash #5: The Great OS Migration

Lingdas1 — Fri, 29 May 2026 17:35:36 +0000


"New Windows, new me. This time I'll do it right." Famous last words.

After the emulator war, I had made a decision.

No more WSL2. No more Hyper-V conflicts. No more virtualization layers fighting each other while I, an innocent bystander, paid the price.

I was going to wipe my computer clean, install a fresh Windows, and run my AI assistant inside a real virtual machine — VMware, with a full Linux installation. Isolated. Clean. Professional.

I backed up everything. Twice. (I had learned something from the 20GB model download disaster.)

Then I nuked the whole system.

The Clean Slate Delusion

Fresh Windows feels amazing. Everything is fast. Nothing is broken. You think: this time I will be organized. This time I will not make the same mistakes.

This is a lie the computer tells you.

I installed VMware. Downloaded Ubuntu. Created a virtual machine with 8GB RAM and 4 CPU cores. Installed Docker inside the VM. Deployed my AI assistant. Configured the gateway. Tested the QQ connection.

It. All. Worked.

I sat back in my chair and felt something I had not felt in weeks: peace.

Peace Lasted One Day

The next morning, I opened my laptop and the VM could not reach the internet.

The host had internet. The browser worked. But inside the VM? Nothing. The AI assistant was cut off from the world. A brain in a jar.

I spent three hours diagnosing this. Three. Hours.

The problem? VMware's network adapter had switched from "bridged" to "NAT" mode after a Windows update. I did not change this setting. I did not even know this setting existed until it broke.

Here is the real kicker: in bridged mode, the VM gets its own IP address from the router. In NAT mode, it sits on a private virtual network behind the host and reaches the outside world through the host's IP. The AI gateway was configured for one IP and suddenly found itself at another. Same machine. Different identity.

The network worked. The AI worked. They just could not find each other.
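One way to spot this kind of silent switch is to check which subnet the VM's address falls in. This is a minimal sketch, assuming a home LAN of 192.168.1.0/24 and made-up example addresses; `looks_bridged` is a hypothetical helper for illustration, not part of VMware.

```python
import ipaddress


def looks_bridged(vm_ip: str, lan_cidr: str) -> bool:
    """Return True if the VM's address sits inside the router's LAN range.

    In bridged mode the router hands the VM an address, so it should fall
    inside the LAN range. In NAT mode the VM lives on a separate private
    virtual network, so its address normally falls outside that range.
    """
    return ipaddress.ip_address(vm_ip) in ipaddress.ip_network(lan_cidr)


# Example values are assumptions, not taken from a real setup.
LAN = "192.168.1.0/24"
print(looks_bridged("192.168.1.57", LAN))     # prints True: same range as the router
print(looks_bridged("192.168.233.128", LAN))  # prints False: a different private network
```

If the check flips from True to False after an update, the adapter mode is the first thing to look at.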

Then Shared Folders Broke

Two days later: I tried to edit a config file from Windows (because typing in a Linux terminal is exhausting) and the shared folder would not mount.

VMware Tools had silently uninstalled itself during an Ubuntu kernel update.

I did not trigger this. I did not approve this. The machine just... decided.

The Thing Nobody Tells You About VMs

Here is what I learned in Crash #2:

If my models had been inside a virtual machine, I could have taken a snapshot the moment everything worked. When Windows crashed, I would have restored the snapshot. Thirty seconds. Done.

Crash #5 is where I finally put that into practice.

And it worked. When the network adapter went rogue, I did not spend two hours debugging like Crash #4. I restored a snapshot from yesterday. Ninety seconds. Back online.

When VMware Tools vanished, I restored a snapshot. Forty-five seconds. Done.

The VM did not solve every problem. It created new ones I had never seen before. But it gave me something I never had on bare metal: undo.
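For anyone who prefers the command line to clicking through menus: VMware ships a tool called `vmrun` that can take and restore snapshots. The sketch below only builds the commands and does not run anything. The `.vmx` path and snapshot name are placeholders, and `-T ws` assumes VMware Workstation.

```python
from typing import List

VMRUN = "vmrun"  # assumes vmrun is on PATH; adjust to its full path otherwise


def snapshot_cmd(vmx_path: str, name: str) -> List[str]:
    """Build the command that saves the VM's current state under `name`."""
    return [VMRUN, "-T", "ws", "snapshot", vmx_path, name]


def revert_cmd(vmx_path: str, name: str) -> List[str]:
    """Build the command that rolls the VM back to the snapshot `name`."""
    return [VMRUN, "-T", "ws", "revertToSnapshot", vmx_path, name]


# Placeholder path and name, for illustration only.
print(snapshot_cmd("ubuntu-ai.vmx", "everything-works"))
print(revert_cmd("ubuntu-ai.vmx", "everything-works"))
```

To actually run one, pass the list to `subprocess.run(cmd, check=True)`.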

🛡️ Golden Rule Reminder

A clean install fixes old problems and creates new ones. The goal is not a perfect setup — it is a setup you can recover from. VMs give you recovery. Snapshots are time machines. Use them.

Bridged mode vs. NAT mode. If your VM needs to be reachable from other devices on your network (like your phone running QQ), use bridged. If it only needs outbound internet, NAT is simpler. Write this down somewhere. Future you will thank you.

← Crash #4: The Emulator War | Crash #6: The Invisible Network Cable →

💬 What Should I Write Next?

I have three more crashes lined up, then the series is done. But after that — I'm open to ideas.

What are you stuck on right now? A specific error, a tool you can't configure, a concept that feels like it was written for people with CS degrees?

Tell me in the comments. No promises, but if it's a problem I've bled over, I'll cover it.

⚔️ Crash #4: The Emulator War

Lingdas1 — Fri, 29 May 2026 17:21:48 +0000


"I just deleted a game emulator. Why is my AI assistant dead?"

Months before I fell down the AI rabbit hole, I installed an Android emulator on my laptop. You know the type — "this phone game would look better on a bigger screen." Harmless decision. Right?

I played for a few weeks. Got bored. Uninstalled.

Forgot about it completely.

The Ghost

Fast forward three months. I am deep in my AI rabbit hole. WSL2 is running. My local models are humming. I have achieved things.

Then one morning I open the terminal, type a command, and Windows spits this at me:

HCS_E_SERVICE_NOT_AVAILABLE

I stare at the screen.

Then I stare harder.

Nothing changes.

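For those of you who, like me, did not major in computer science: this error means the Windows service that runs virtual machines for WSL2 (the Host Compute Service, the "HCS" in the error name) is not available. In other words, virtualization broke. The thing that lets your Linux subsystem exist inside Windows? Gone. Poof. No explanation.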

I did not install anything new. I did not change settings. I literally just woke up, made coffee, sat down, and my computer had decided to betray me while I was sleeping.

Two Hours of Pain

Here is what "debugging" looks like when you are not a developer:

Google the error code
Read three forum posts in languages you barely understand
Try the first solution → nothing
Try the second solution → worse
Restart the computer → still broken
Stare at the ceiling
Try solution three → computer freezes
Restart again
Type the same error into Google with slightly different words
Find a Reddit thread from 2021 with two upvotes
Try that guy's solution → IT WORKS

Two hours. Twelve browser tabs. One Reddit hero from 2021 who will never know he saved my sanity.

What Actually Happened

Remember that Android emulator I uninstalled months ago?

It had hijacked Windows' virtualization layer — the same layer WSL2 needs to run. When I deleted the emulator, it didn't clean up after itself. It left a wound in the system. And that wound festered for months until one day it just... collapsed.

The emulator and WSL2 were never supposed to share a computer. They were fighting over the same resources — and I was the collateral damage.

An app I forgot existed broke my AI assistant three months later. That is not a bug. That is a horror movie premise.

What I Learned

Your computer's virtualization layer is a house of cards. Remove one card — even a card you forgot was there — and the whole thing can collapse.

Also: Windows 11 Home edition is not your friend. It does not include the full Hyper-V feature set or its management tools, so some of the virtualization settings you need when debugging are missing or hard to find. Pro edition has them right there in the Windows Features dialog. Home edition? "Those settings are for advanced users who know what they're doing." Translation: they buried them because they do not trust you.

I spent half my debugging time just finding the settings I needed to check.

🛡️ Golden Rule Reminder

Nothing is unrelated. That game emulator you installed last year? That VPN you tested once and forgot? They can all come back to haunt your AI setup. The only protection is isolation.

Run everything in a VM. If WSL2 had been inside a virtual machine, the emulator's leftovers could not have touched it. The VM has its own virtualization layer: a clean room that your host's chaos cannot reach. I did not know this yet when Crash #4 happened. But Crash #5 is where I finally figure it out.

This is Crash #4 in an ongoing series.

← Crash #2: When the Internet Betrayed Me | Crash #5: The Great OS Migration →

💬 What's Next?

This series isn't finished — and honestly, I don't know what to cover after Crash #8.

What's a problem you've hit with AI that nobody writes about? Could be a tool that broke, a concept that didn't click, a setup that went sideways. Drop it below.

If it's something I've wrestled with too (probably more than once), I'll write it up.

Crash #2: When the Internet Betrayed Me

Lingdas1 — Tue, 26 May 2026 19:29:13 +0000

🌐 Crash #2: When the Internet Betrayed Me

"I'll download local models so my AI works offline." Famous last words.

I live in Russia, where the internet is... let's say adventurous.

Some days the VPN works. Some days the whole building loses connection for four hours. Some days you can reach Google but not GitHub. It is a roulette wheel that spins every time you open a browser.

So I had a smart idea: download local AI models. Then my assistant works offline. No internet needed.

I spent an entire Saturday downloading models. 20 gigabytes. On Russian dorm internet. This took patience I did not know I had.

Sunday morning: everything was working. I tested it. The AI responded. I felt like a genius. I went to sleep happy.

Monday morning: blue screen of death.

Windows rebooted. Every single model file? Corrupted. 20GB. Gone. The weekend? Gone.

I stared at the screen for a solid minute. Then I laughed. Because what else do you do.

What I Learned

Back up your configuration before you think you need it. Not after.

That weekend was not wasted because I downloaded models. It was wasted because I did not save my working setup. One config file. One screenshot. One export. If I had any of those, recovery would have been ten minutes instead of another weekend.
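Saving a working setup can be as small as copying one file with a timestamp in its name. A minimal sketch, assuming your setup lives in a single config file; `backup_config` and the paths in the usage note are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(config_path: str, backup_dir: str) -> Path:
    """Copy a config file into backup_dir with a timestamp in the file name."""
    src = Path(config_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the backup folder if needed
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also keeps the file's modification time
    return dest
```

Calling `backup_config("config.yaml", "backups")` right after a successful test leaves a dated copy you can restore from, ideally on a different drive than the one that might crash.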

But there is a deeper lesson here, and it took me three more crashes to learn it:

Run everything in a VM.

If my models had been inside a virtual machine, I could have taken a snapshot the moment everything worked. When Windows crashed, I would have restored the snapshot. Thirty seconds. Done.

I did not know this yet. Crash #5 is where I finally figure it out.

🛡️ Golden Rule Reminder

If it works, back it up. A screenshot of your config. An export of your settings. A VM snapshot. Any of these would have saved my weekend.

Run everything in a VM. The snapshot is the backup. One click to save your working state. One click to restore when things break.

This is Crash #2 in an ongoing series. ← Crash #1: The Gateway Ghost | Crash #4: The Emulator War →

💬 Your Turn

Have you ever lost work because you did not back it up? Or hit a similar wall I did not mention?

Drop a comment. I read every single one. The more we share our screw-ups, the fewer people have to make them. 🤝

I Found an AI That Remembers. Here's Why That Changed Everything.

Lingdas1 — Tue, 26 May 2026 18:42:14 +0000


This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

I am a dental student. I do not code. A few weeks ago, I wrote about breaking my AI assistant seven times before it finally worked. That was the arrival story.

This is what happened after.

Two Assistants, Two Experiences

Before Hermes, I tried OpenClaw, or "Little Lobster," as the wave of Chinese users who adopted it call it.

OpenClaw is impressive in many ways. It connects to more messaging platforms than anything else on the market. Its setup is smoother. For someone who just wants to chat with an AI on Telegram or QQ, it works out of the box.

But here is what I noticed after a week: every conversation was a fresh start.

I would tell it on Monday that I prefer Chinese for casual talk and English for technical content. By Tuesday, it had forgotten. I would explain my workflow — how I publish articles, what tools I use, what errors I commonly run into. By Wednesday, it was a stranger again. Every session was ground zero.

For someone like me — a medical student, not a developer — this isn't a minor inconvenience. I do not have the vocabulary to explain my setup from scratch every time. I need the assistant to meet me where I am, not where I was a week ago.

What Hermes Remembers

Hermes Agent is an open-source AI agent you run on your own machine. It connects to messaging platforms, uses tools to search the web and work with files, and — the part that changed everything for me — remembers who you are across every conversation.

When I switched to Hermes Agent, the difference was not immediate. It takes a few conversations for the memory to build. But once it does, something shifts.

I once mentioned that I had spent two hours debugging a WSL2 error caused by an Android emulator I uninstalled months ago. Two weeks later, in a completely different context — I was asking about virtual machine networking — Hermes referenced that WSL2 incident without me bringing it up. It knew why I was sensitive about virtualization. It had connected the dots across sessions.

This may sound small. It is not.

When you are a non-technical person trying to navigate a technical world, you spend most of your energy re-explaining yourself. To every tool. To every search query. To every new conversation. Hermes removes that tax.

Now when I ask for help with an article draft, it already knows I publish on Dev.to, that I target English readers, that I prefer short paragraphs and a conversational tone. I do not have to repeat myself. I just say: "review this draft" — and it knows what "review" means for me.

What surprised me even more: I use Hermes on QQ from my phone, and on my laptop inside a virtual machine. Different devices. Same memory. It does not matter where I talk to it — it remembers across every platform.

And when I forget something — a configuration tweak from three weeks ago, a link I mentioned once — I do not have to dig through chat history. Hermes searches its own past conversations. I just ask.

The Tool That Learns to Be Your Tool

There is another feature that took me longer to notice: Hermes creates skills from experience.

The first time I asked Hermes to help me format a Dev.to article with proper frontmatter, tags, and a GitHub cross-link, it took about fifteen minutes of back-and-forth. I did not even know what "frontmatter" meant. Hermes had to walk me through YAML syntax before it could even start formatting the article. We figured it out together. The second time? Thirty seconds. It had saved the workflow.

Not because I configured anything. Not because I wrote a script. Hermes noticed it was solving the same problem twice and codified the solution on its own.

This keeps happening. The longer I use it, the less I have to explain. It is the opposite of every other tool I have tried — instead of decaying, the experience compounds.

I should be honest: it is not perfect. Sometimes it drifts. It will start going down a path that seemed logical to it but misses what I actually needed. But here is the thing: I only have to correct it once. It remembers the correction. Next time, it is less likely to wander off.

That is the difference between a tool and a partner. A tool does what you tell it. A partner learns what you mean.

What We Build Together

Since Hermes stabilized on my machine, I have published 19 articles on Dev.to and built a local-llm-guide repository on GitHub. I am a dental student. I have never written production code. None of this should exist.

Here is what our workflow looks like: I have an idea for an article. I tell Hermes the topic. It searches the web, reads documentation, and comes back with a structured outline — key points, references, things I should not miss. I write the draft. The voice is mine. Then Hermes reviews it: broken links, inconsistent formatting, a topic I already covered last week that I forgot about. I fix. We publish.

That is the real division of labor. I bring the experience. Hermes brings the memory, the research, and the quality control. Neither of us could do this alone.

This article you are reading? Same process. I wrote the first draft, Hermes reviewed it. We went back and forth until it felt right.

If you are trying an AI agent for the first time, here is a simple test: use it for three days — tell it your name, what you do, one preference. Then on day four, ask it what it remembers. If it knows who you are, keep it. If it says "How can I help you?" like you have never met, move on.

For the Person Still on the Fence

Yes, this is a challenge submission. But honestly? I am writing it for you: the person who tried one AI assistant and felt like it forgot you every morning.

Hermes is not the flashiest agent. It does not support 22 messaging platforms. It does not have a mobile app. What it has — and what I have not found anywhere else — is the sense that it is actually accumulating knowledge about you. That every conversation is not a reset, but a continuation.

For a non-technical person like me, that is not a luxury. It is the difference between giving up and keeping going.

If you have been on the fence, start small. Expect it to break. Back up before you change anything. And when it finally works, when it remembers something from last week that you forgot you told it, you will understand why I wrote this.

I am Ling, a dental student abroad. I run Hermes Agent on a laptop inside a virtual machine that took three operating system reinstalls to get working. I publish at dev.to/lingdas1 and maintain the local-llm-guide on GitHub. Come say hi. 🦷

GLM-4: The Chinese-English Bilingual Workhorse You Didn't Know You Needed

Lingdas1 — Mon, 25 May 2026 20:11:26 +0000


If you handle both English and Chinese content, this model deserves a spot on your GPU.

What Makes GLM-4 Different

GLM-4 comes from Tsinghua University / Zhipu AI — one of China's top AI labs. Unlike most open-weight models that are optimized primarily for English, GLM-4 was trained from the ground up as a balanced bilingual model.

What this means in practice:

Chinese and English are both first-class citizens — not "English model with Chinese bolted on"
Agent & tool-use focused — Zhipu explicitly optimized it for function calling and agent workflows
Compact 9B model: small enough for fast inference on a single consumer GPU (as far as I can tell, the open 9B release is a dense model, not a Mixture of Experts)
Long context — up to 128K tokens on the larger variant

💡 The story for Western devs: Most open-source models treat Chinese as an afterthought. GLM-4 was built in Beijing with bilingual parity from day one — if you're building tools for a global audience, this is the model that won't trip over your non-English users.

Quick Start

ollama pull glm4:9b
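Once the model is pulled, you can also talk to it from a script through Ollama's local HTTP API. A minimal sketch using only the standard library, assuming Ollama is running on its default port 11434; `build_payload` and `generate` are hypothetical helper names, and `num_ctx` is the Ollama option that caps the context window.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = "glm4:9b", num_ctx: Optional[int] = None) -> dict:
    """Build the JSON body for a single non-streaming completion request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if num_ctx is not None:
        payload["options"] = {"num_ctx": num_ctx}  # limit context to save VRAM
    return payload


def generate(prompt: str, **kwargs) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `generate("Translate to Chinese: good morning")` returns the model's answer as a string.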

Available sizes:

Variant	Ollama Pull	Min VRAM (Q4)	Best For
9B	`ollama pull glm4:9b`	6 GB	General use, agent workflows, bilingual tasks

⚠️ Verify before pulling: Ollama model names change. Check https://ollama.com/library/glm4 for the latest available tags.

What GLM-4 Excels At

Task	Rating	Notes
Chinese ↔ English translation	⭐⭐⭐⭐⭐	Native bilingual — not a translation layer
Function calling / tool use	⭐⭐⭐⭐⭐	Explicitly trained for agent workflows
Code generation	⭐⭐⭐	Good, but DeepSeek-R1 or Qwen are stronger for pure coding
Creative writing	⭐⭐⭐⭐	Strong in both languages
Long document QA	⭐⭐⭐⭐	128K context window

When to Choose GLM-4

Are you building bilingual (EN+ZH) tools/apps?
├── Yes → GLM-4 is your best choice
├── No, English only →
│   ├── Coding focus → DeepSeek-R1 or Qwen
│   ├── General purpose → Llama 4 or Qwen
│   └── Lightweight → Gemma 4
└── No, Chinese only → GLM-4 or Qwen (both excellent)
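The same decision tree, written as a small function in case you want to drop it into a script. This is only a restatement of the tree above; `pick_model` and its `focus` labels are names made up for this sketch.

```python
def pick_model(needs_english: bool, needs_chinese: bool, focus: str = "general") -> str:
    """Return the recommendation from the decision tree above."""
    if needs_english and needs_chinese:
        return "GLM-4"
    if needs_chinese:
        return "GLM-4 or Qwen"
    if not needs_english:
        raise ValueError("the tree only covers English and Chinese")
    # English-only branch: the answer depends on what you mainly do.
    english_only = {
        "coding": "DeepSeek-R1 or Qwen",
        "general": "Llama 4 or Qwen",
        "lightweight": "Gemma 4",
    }
    if focus not in english_only:
        raise ValueError(f"unknown focus: {focus!r}")
    return english_only[focus]


print(pick_model(needs_english=True, needs_chinese=True))   # prints GLM-4
print(pick_model(needs_english=True, needs_chinese=False, focus="coding"))
```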

Real-World Example: Bilingual Agent

I ran GLM-4 as the backend for a WeChat-to-email bridge. The agent needed to:

Read Chinese WeChat messages
Extract action items
Draft English emails
Use tool calls to send via Gmail API

GLM-4 handled all four without ever mixing up which language belonged where. The same pipeline with a Llama model required an extra "translate this to English" step — adding latency and cost.

Performance Notes

On an RTX 3060 (12GB):

9B Q4_K_M: ~35 tok/s — perfectly usable for real-time chat
VRAM usage: ~5.8 GB with 4K context
128K context will push VRAM significantly — stick to 32K for most use cases

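💡 As far as I can tell, the glm4:9b release is a dense 9B model rather than a Mixture of Experts, so every parameter is used for each token. Its speed comes from the small parameter count and 4-bit quantization, which is why it feels fast for its quality level.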

The Catch

Smaller ecosystem — fewer GGUF quants on HuggingFace compared to Llama/Qwen
Community is mostly Chinese — if you need English-language troubleshooting, resources are thinner
9B is the main size — no tiny (1-3B) or massive (70B+) variants to scale up/down

Related guides: DeepSeek-R1 | Qwen | MoE Models

Building bilingual tools or working across EN/ZH? What model are you using for it? If you've run into walls with multilingual setups, drop your scenario below — let's figure it out.

Gemma 4: Google's Lightweight Powerhouse — Run AI on Hardware You Already Own

Lingdas1 — Mon, 25 May 2026 20:00:28 +0000

Gemma 4: Google's Lightweight Powerhouse

Don't have a $2000 GPU? Gemma 4 runs AI on hardware you already own.

Why Gemma 4 Exists

Google built Gemma 4 for one specific use case: running capable AI on consumer hardware. Unlike Llama (scale up) or DeepSeek (reasoning depth), Gemma's design philosophy is:

Smaller models that punch above their weight
Optimized for edge devices — laptops, phones, Raspberry Pi-class hardware
Research-friendly — Google explicitly designed it for fine-tuning and experimentation
Same tech as Gemini — distilled from Google's flagship models

💡 The story: Google's best AI, distilled into sizes that run on your laptop. If you thought local AI required a $2000 GPU, Gemma 4 is the counterargument.

Available Sizes

Size	Ollama Pull	Min VRAM (Q4)	Runs On
2B	`ollama pull gemma4:2b`	1.5 GB	Raspberry Pi 5, phone, any laptop
4B	`ollama pull gemma4:4b`	2.5 GB	Any laptop with 8GB RAM
12B	`ollama pull gemma4:12b`	7 GB	Gaming laptop, RTX 3060
31B	`ollama pull gemma4:31b`	18 GB	RTX 4090, RTX 3090

⚠️ Verify before pulling: Check https://ollama.com/library/gemma4 for current tags.

Quick Decision: Which Size?

What hardware do you have?
├── 4GB RAM, no GPU → gemma4:2b (yes, it runs)
├── 8GB RAM, integrated GPU → gemma4:4b
├── RTX 3060 / 4060 (8-12GB) → gemma4:12b
├── RTX 4090 / 3090 (24GB) → gemma4:31b (or Llama 4 Scout for more capability)
└── Want to experiment/fine-tune → gemma4:2b or gemma4:4b

The 12B is the sweet spot — it's genuinely capable at most tasks, runs on any gaming GPU, and uses barely 7GB VRAM.
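If you would rather let a script decide, here is a minimal sketch that picks the largest size whose minimum VRAM fits your card. The numbers are copied from the table above; `largest_fit` is a hypothetical helper, and real usage also depends on context length.

```python
from typing import Optional

# Minimum VRAM in GB at Q4, copied from the "Available Sizes" table.
MIN_VRAM_GB = {
    "gemma4:2b": 1.5,
    "gemma4:4b": 2.5,
    "gemma4:12b": 7.0,
    "gemma4:31b": 18.0,
}


def largest_fit(vram_gb: float) -> Optional[str]:
    """Return the biggest tag that fits in vram_gb, or None if nothing fits."""
    fitting = [(need, tag) for tag, need in MIN_VRAM_GB.items() if need <= vram_gb]
    return max(fitting)[1] if fitting else None


print(largest_fit(8))   # prints gemma4:12b
print(largest_fit(24))  # prints gemma4:31b
```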

What Gemma 4 Excels At

Task	Rating	Notes
Lightweight deployment	⭐⭐⭐⭐⭐	2B runs on a phone
Fine-tuning / experimentation	⭐⭐⭐⭐⭐	Google designed it for this
Summarization	⭐⭐⭐⭐	Strong at distilling long text
Creative writing	⭐⭐⭐	Good for size, but Qwen/Llama are better
Coding (complex)	⭐⭐⭐	12B+ can handle basic coding; not for production
Math / reasoning	⭐⭐⭐	Outpaced by DeepSeek-R1 at same size

When Gemma 4 Is Your Best Choice

You have limited hardware (laptop, old GPU, Raspberry Pi)
You're learning AI — small models are fast to download, fast to run, easy to experiment with
You need a model to fine-tune on your own data
You want something that "just works" without complex setup

When to Skip Gemma 4

You have 16GB+ VRAM and need maximum capability → Llama 4 or Qwen
You're doing heavy reasoning/coding → DeepSeek-R1
You need uncensored outputs → Qwen or DeepSeek (Gemma has Google's safety tuning)

Real-World Test: Gemma 4 12B on a Laptop

I ran Gemma 4 12B on a Dell XPS 15 (RTX 4060 laptop GPU, 8GB VRAM):

Task: "Summarize this 3000-word article and extract the 3 main arguments"

Response time: 4.2 seconds
Quality: Accurate, well-structured, caught all 3 arguments
VRAM usage: 6.7 GB with 8K context

Compare to Llama 4 Scout on same hardware:
Response time: 6.8 seconds
Quality: Slightly more nuanced, better transitions
VRAM usage: 9.2 GB — exceeded GPU → had to offload to RAM → slower

Takeaway: On a laptop with limited VRAM, Gemma 4's efficiency advantage is real — it fits where Llama doesn't, and the quality trade-off is smaller than you'd expect.

The "Gemma Is Too Safe" Issue

Google's safety tuning is aggressive. Gemma 4 will refuse prompts that Llama or DeepSeek handle without hesitation — especially around controversial topics, security research, or anything that triggers content filters.

Workaround: The community has produced "abliterated" versions on HuggingFace that remove the refusal mechanism while keeping the model's capability. Search for "gemma-4-abliterated" on HuggingFace.

⚠️ This is a hack, not a supported feature. Use at your own discretion.

Pro Tips

The 2B model is surprisingly useful for simple classification, keyword extraction, and as a "first pass" filter before sending to a larger model
Gemma 4 quantizes well — Q4_K_M loses very little quality compared to Q8
Use GGUF from HuggingFace rather than the default Ollama pull if you need specific quantization levels

Related guides: Llama 4 | Qwen | MoE Models

What small model are you running locally? Gemma, Qwen, or something else? If you've hit any walls with setup — especially on limited hardware — drop a comment describing your setup and what's giving you trouble. Let's figure it out together.

Llama 4: Meta's Latest — Scout, Maverick, and the MoE Revolution

Lingdas1 — Mon, 25 May 2026 16:14:20 +0000


The open-source default just got a massive upgrade. Here's what's new and which variant you should actually use.

Llama 4 at a Glance

Meta released Llama 4 in April 2025 with a fundamental architecture change: Mixture of Experts (MoE). Two variants were launched simultaneously:

Variant	Architecture	Total Params	Active per Token	Min VRAM (Q4)
Llama 4 Scout	17B × 16 experts	109B	~17B	10 GB
Llama 4 Maverick	17B × 128 experts	400B	~17B	10 GB

Both are available on Ollama as llama4:latest (points to Scout) and llama4:maverick.

💡 The story that sells itself: Maverick has roughly 400 billion parameters in total, yet the "MoE" part means it only uses ~17B of them at any given moment. So it feels like a 17B model in speed, but with the knowledge of a much larger one. The catch is memory: every expert still has to be loaded somewhere, so whatever does not fit in VRAM spills into system RAM.

Quick Start

# Scout (balanced — good default)
ollama pull llama4:latest

# Maverick (bigger knowledge, same speed)
ollama pull llama4:maverick

⚠️ Verify before pulling: Model names on Ollama change. Check https://ollama.com/library/llama4 for current tags.

Scout vs Maverick: Which One?

Your use case?
├── General chat, writing, everyday coding → Scout (llama4:latest)
├── Deep knowledge, fact-heavy tasks, research → Maverick (llama4:maverick)
├── Speed-critical, low VRAM → Scout
└── Both run at the same speed per token — the difference is knowledge breadth

The practical difference: Maverick has 128 experts vs Scout's 16. This means Maverick's "collective knowledge" is much broader — it's seen more patterns, more facts, more edge cases. But per-token speed is nearly identical because both only activate ~17B parameters at a time.
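Some rough arithmetic shows why speed follows the active parameters while memory follows the total. This sketch assumes about 4.5 bits per weight for a Q4-style quant (an approximation) and ignores the KV cache and runtime overhead; the 17B and 109B figures are Scout's numbers from the table above.

```python
def weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


active = weight_gb(17)   # parameters touched for each token
total = weight_gb(109)   # every expert, whether used for this token or not

print(f"active per token: {active:.1f} GB")  # prints 9.6 GB
print(f"all experts:      {total:.1f} GB")   # prints 61.3 GB
```

The first number is why generation feels like a 17B model. The second is what has to be stored somewhere, so anything beyond your VRAM ends up in system RAM.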

For most people: start with Scout, upgrade to Maverick if you need more depth.

What Llama 4 Excels At

Task	Rating	Notes
General conversation	⭐⭐⭐⭐⭐	Natural, helpful, rarely hallucinates
Creative writing	⭐⭐⭐⭐	Good, but Claude-level models still edge it out
Coding	⭐⭐⭐⭐	Strong general coding, weaker at math-heavy tasks
Multilingual	⭐⭐⭐⭐	Supports 12 languages natively
Long context	⭐⭐⭐	128K context works but quality degrades past 64K

The "But Meta Says I Can't Use It Commercially" Issue

This comes up constantly. Here's the actual situation as of May 2026:

Llama 4 is NOT the old "Llama 2 Community License" — it's under the Llama 4 Community License, which is significantly more permissive
Commercial use is allowed for companies under 700 million monthly active users
You can fine-tune and distribute your fine-tuned versions
The license restricts using Llama outputs to train competing models

For indie developers, startups, and small businesses: you're free to use it commercially. For FAANG-sized companies: you need a separate agreement with Meta.

If you want truly unrestricted open-source, use DeepSeek-R1 (MIT) or Qwen (Apache 2.0).

Real-World Benchmarks (Community-Tested)

On an RTX 4090 (24GB):

Model (Q4_K_M)	tok/s	MMLU-Pro	HumanEval
Llama 4 Scout	~45	68.2	76.8
Llama 4 Maverick	~42	72.1	79.3
DeepSeek-R1 32B	~22	74.5	84.1
Qwen 3.6 32B	~25	73.0	81.4

Takeaway: Llama 4 Scout/Maverick are the fastest high-quality models you can run locally. If speed matters more than raw benchmark scores, they're the pragmatic choice.

Pro Tips

Use llama4:maverick with a 32K context limit — the full 128K eats VRAM and degrades attention quality
Don't use Q2/Q3 quants — MoE models lose coherence more sharply at extreme quantization than dense models
Scout is the sweet spot for most setups — unless you're doing research or fact-heavy work

Related guides: Gemma 4 | Qwen | MoE Models

👻 Crash #1: The Gateway Ghost — When Your AI Pretends to Work

Lingdas1 — Sun, 24 May 2026 11:51:59 +0000

# 👻 Crash #1: The Gateway Ghost

> *"Did I do something wrong? Let me reinstall everything."*

---

## What Happened

I followed the tutorial step by step. Everything installed perfectly. I was thrilled.

Then the gateway, the bridge connecting my AI assistant to the messaging app, started disconnecting randomly. Sometimes it worked for hours. Sometimes it died after 10 minutes. No pattern, no error message, nothing to Google.

**My response:** I wiped everything and reinstalled. Twice.

**The actual fix:** I just needed to restart the gateway. That's it.

---

## What I Learned

**Before you assume you broke something, try turning it off and on again.**

It's a cliché because it works. I wasted an entire evening reinstalling software that was fine. The problem was a process that needed a kick, not a configuration that needed a rewrite.

---

## 🛡️ Golden Rule Reminder

> **If it works, don't touch it.** I reinstalled a perfectly good setup twice before trying the simplest fix. Always try the 10-second solution before the 2-hour one.

> **Run everything in a VM.** If my gateway was already inside a VM with a snapshot, I could have just rolled back instead of reinstalling from scratch.

---

*← Full story: [I Broke My AI Assistant 7 Times](https://dev.to/lingdas1/i-broke-my-ai-assistant-7-times-heres-what-i-learned-47le)*

💬 Your Turn

Have you run into a similar problem? Or hit a wall I didn't mention?

Drop a comment below — I read every single one. Your experience might help someone else who's stuck on the same thing.

The more we share our screw-ups, the fewer people have to make them. 🤝

🏗️ Day 1: I Almost Bought a Phone for AI (And Other Beginner Mistakes)

Lingdas1 — Sun, 24 May 2026 11:03:15 +0000

# 🏗️ Day 1: I Almost Bought a Phone for AI (And Other Beginner Mistakes)

> The story of how I went from "I want a Jarvis" to actually building one, one crash at a time.

---

## How It Started

I found out about AI the same way most people do: scrolling through videos.

One day, it was the "Doubao Phone": a smartphone with a built-in AI assistant that could order food, compare prices, and even play games for you. "Finally, my own Jarvis!" I thought. I almost bought one.

Then the app stores blocked it. The hype died. On to the next thing.

Next up: farming crayfish with AI. Yes, that was a real trend. A virtual crayfish farm managed by an AI agent. Fun to watch, but the token costs were insane, and the AI kept forgetting what happened five minutes ago.

I kept watching, kept wanting, kept feeling like AI was something other people did.

Then I found Hermes Agent, an open-source AI assistant you can run on your own machine. Free. Private. No subscription.

I searched for tutorials. Downloaded the files. And started the most frustrating, educational tech journey of my life.

---

## The Big Lesson

Looking back, the problem wasn't that I didn't know enough. It was that I kept chasing the next shiny thing instead of picking one path and sticking with it.

The real lesson: Stop waiting for the perfect AI product. The tools are already free and open source. You just need to pick one and start, even if you break it a few times along the way.

---

## 🛡️ The Golden Rule (Read This Before the Next Article)

> If it works, don't touch it.
>
> You never know which piece of your setup is holding everything together. That random config file you're not sure about? Leave it alone. Every time I thought "I'll just fix this one small thing," I spent 3 hours recovering.
>
> Even a stable system can break for no reason. When it does, fix only that one thing; don't "improve" everything else while you're at it.

My #1 recommendation for beginners: Run everything inside a virtual machine (VM) with Linux. Give it 100-200GB of disk space (not C: drive!). This isolates 90% of problems: host OS breaks? VM still works. VM breaks? Just restore a snapshot.

---

← Read the full story first: I Broke My AI Assistant 7 Times

Next: The Gateway Ghost 👻 →

---

I Broke My AI Assistant 7 Times. Here's What I Learned.

Lingdas1 — Sun, 24 May 2026 10:33:01 +0000

I Broke My AI Assistant 7 Times. Here's What I Learned.

One medical student's journey from "I want a Jarvis" to accidentally becoming a self-taught DevOps engineer.

The Beginning: I Almost Bought a Phone for AI

It started with a video.

I was scrolling through Bilibili (think YouTube, but Chinese) and saw something that blew my mind: the "Doubao Phone." A smartphone with a built-in AI assistant that could do everything — order food, compare prices across stores, play games for you, book appointments. "Finally," I thought, "my own Jarvis."

I almost bought it.

Then the app store drama happened. The big companies blocked Doubao's integrations. The phone stopped being magical. And I moved on to the next viral thing.

Farming crayfish with AI.

Yes, that was a real trend. You could deploy an AI agent that managed a virtual crayfish farm. It was hilarious but also... expensive. The token costs were insane, and the AI kept forgetting what happened five minutes ago.

I watched from the sidelines, feeling that familiar itch: "I want to do this too, but I don't know how."

Then I found Hermes Agent — an open-source AI assistant you can run on your own computer. Free. Private. Controllable.

I searched Bilibili for tutorials. Downloaded the files. And thus began the longest, most frustrating, most educational tech journey of my life.

The Setup: 7 Times I Broke Everything

Here's the honest story of what happened when a medical student with no coding background tried to deploy an AI assistant on his own.

💥 Crash #1: The Gateway Ghost

What happened: I followed the tutorial step by step. Everything installed fine. Then the gateway started disconnecting randomly. Sometimes it worked for hours. Sometimes it died after 10 minutes.

My reaction: "Did I do something wrong? Let me reinstall everything."

What actually fixed it: Restarting the gateway. That's it. Just... restarting it. I had already wiped and reinstalled twice before I figured this out.

Lesson learned: Before assuming you broke something, try turning it off and on again. It's a cliché because it works.

💥 Crash #2: Russia's Internet Hates Me

What happened: I'm studying in Russia, and the internet here is... let's say unstable. VPNs get blocked. DNS dies. The whole building loses connection for hours at a time.

I thought: "No problem — I'll download some local AI models so my assistant can work offline."

I spent a weekend downloading models. Got everything set up. It was beautiful.

The next morning, Windows gave me a blue screen of death. When it rebooted, all my downloaded models were gone. Corrupted. Unreadable.

My reaction: Staring at my screen in disbelief. 20GB of models, gone.

What actually fixed it: I switched to a different model loader, redownloaded everything, and took a screenshot of the working config this time.

Lesson learned: Back up your configuration before you think you need it. Not after.
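
A minimal sketch of what "back up before you need it" could look like, assuming Python is available. The file name `config.yaml` and the `backups` folder are made-up examples, not paths taken from any real Hermes install.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(config_path, backup_dir):
    """Copy config_path into backup_dir under a timestamped name and return the new path."""
    source = Path(config_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the backup folder if it is missing

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also keeps the file's modification time
    return target
```

Calling `backup_config("config.yaml", "backups")` before every change leaves a dated copy to fall back on. Unlike a screenshot, the copy can be restored by simply copying it back.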

💥 Crash #3: The C: Drive Betrayal

What happened: Everything was installed to the C: drive by default. Models, tools, environments: all happily eating up space on my system drive.

One morning, Windows greeted me with: "Your C: drive is almost full."

Panic.

I decided to move everything to the D: drive. I consulted another AI, got detailed migration instructions, and followed them carefully.

Everything broke.

My assistant couldn't find its files. WSL refused to start. Models were looking for paths that no longer existed.

My reaction: "But... I followed the instructions!"

What actually fixed it: I restored from a backup I thankfully made before starting, and did the migration one piece at a time — move WSL first, confirm it works, then move the model loader, confirm it works, then move the assistant.

Lesson learned: Never migrate everything at once. One step at a time. And always have a rollback plan.
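
The "one piece at a time" idea can be sketched in a few lines of Python. Everything here is hypothetical: each step is a tuple of a name plus three functions you would write yourself (do the move, check that things still work, undo the move).

```python
def migrate_in_steps(steps):
    """Run (name, move, verify, undo) steps in order; undo and stop at the first failure.

    Returns the names of the steps that completed and passed verification.
    """
    completed = []
    for name, move, verify, undo in steps:
        move()
        if not verify():
            undo()  # roll back only the step that just failed
            print(f"Step '{name}' failed verification; rolled back, stopping here.")
            break
        completed.append(name)
        print(f"Step '{name}' verified.")
    return completed
```

With steps for WSL, the model loader, and the assistant (in that order), a broken model loader move stops the process while WSL is already confirmed working and the assistant has not been touched yet.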

💥 Crash #4: The Emulator War

What happened: Remember that Android emulator I installed months ago to play mobile games? I had uninstalled it. No big deal, right?

Wrong.

After uninstalling the emulator, WSL2 started throwing this error: HCS_E_SERVICE_NOT_AVAILABLE. Virtualization broke. Windows Subsystem for Linux stopped working. My AI couldn't run.

It turned out the emulator and WSL2 were fighting over the same virtualization resources. And when I removed the emulator, it took something with it.

My reaction: "I just deleted a game emulator. How does that break my AI assistant?"

What actually fixed it: Multiple restarts, repairing Windows Hyper-V components, and a lot of swearing at my screen.

Lesson learned: Your computer's virtualization layer is like a house of cards. Remove one component and the whole thing can collapse. Also: Windows 11 Home edition hides virtualization settings, making this 10x harder to debug.

💥 Crash #5: The Great OS Migration

What happened: After the emulator war, I decided enough was enough. I backed up everything, wiped my computer, and installed a fresh Windows. This time, I would run my AI inside a virtual machine with Linux. No more WSL2 headaches.

It worked. For about a day.

My reaction: Relief followed by confusion.

What actually fixed it: Nothing — it worked fine. I just didn't trust it anymore.

💥 Crash #6: The Invisible Network Cable

What happened: My host computer (Windows) had internet. My VM (Linux) didn't. The network adapter was set to NAT, just like every tutorial said. But the VM couldn't reach the outside world.

I spent hours checking settings, reinstalling network drivers, changing adapter types.

My reaction: "The internet works on my laptop. Why doesn't it work INSIDE my laptop?"

What actually fixed it: The VMware NAT Service and VMware DHCP Service weren't running in Windows. They're supposed to start automatically. They didn't. One click to start them, and everything worked.

Lesson learned: When virtualization networking breaks, check the host services first, not the VM settings. And ping and curl are better debugging tools than staring at network icons.
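
For readers who would rather script the check than run ping and curl by hand, here is a rough Python equivalent that uses only the standard library. It is a sketch, not a replacement for those tools: it tests DNS first, then a TCP connection, so you can see which layer fails. The default host `example.com` and port 443 are arbitrary choices.

```python
import socket


def can_resolve(hostname):
    """Return True if DNS can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable networks
        return False


def diagnose(hostname="example.com", port=443):
    """Report which layer fails first: the DNS lookup or the TCP connection."""
    if not can_resolve(hostname):
        return "DNS lookup failed"
    if not can_connect(hostname, port):
        return "DNS works, but the connection failed"
    return "network looks fine"
```

Run `diagnose()` inside the VM and again on the host. If the host reports "network looks fine" and the VM does not, the problem sits between the two, which is where the NAT and DHCP services live.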

💥 Crash #7: The Gateway That Lied to Me

What happened: I had set up the gateway to auto-start on boot. I checked the configuration. It said enabled: true. I was confident.

The next morning, my AI was offline again.

The gateway had "started" but hadn't actually connected. It was running as a process, but doing nothing useful.

My reaction: "But I set it to auto-start! Why is it lying to me?"

What actually fixed it: I wrote a simple script that checks every 5 minutes whether the gateway is actually connected, and restarts it if not. Bulletproof.

Lesson learned: "Running" and "working" are two different things. Always add a health check.
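
The original script isn't reproduced here, so the following is only a sketch of the same idea in Python. `is_connected` and `restart` are hypothetical functions you would supply for your own gateway; the 300-second default matches the five-minute interval described above.

```python
import time


def watch_gateway(is_connected, restart, interval_seconds=300, max_checks=None, sleep=time.sleep):
    """Check the gateway every interval_seconds and restart it whenever the check fails.

    is_connected and restart are caller-supplied functions (hypothetical here).
    max_checks limits the loop, which is handy for testing; None means run forever.
    Returns the number of restarts performed.
    """
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_connected():  # "running" is not enough; ask whether it is actually connected
            restart()
            restarts += 1
        checks += 1
        sleep(interval_seconds)
    return restarts
```

The important design choice is that `is_connected` should test real behavior, for example a successful round trip through the gateway, and not just whether the process exists.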

The Golden Rule: Don't Touch It

After weeks of crashes, debugging, and existential crises, my setup finally stabilized. Everything worked. The gateway stayed connected. The models loaded correctly. Messages flowed.

And I learned the most important lesson of all:

If it works, don't touch it.

You never know which piece of your spaghetti-code setup is holding everything together. That random config file? The one you're not sure does anything? Yeah, it probably does something. Leave it alone.

Every time I thought "I'll just fix this one small thing," I ended up spending 3 hours recovering from the consequences.

What I Want You to Know

I'm telling you all this not because I'm an expert — I'm not. I'm a medical student. I study anatomy, not APIs. I chose this career because I wanted to help people, not because I wanted to debug network services at 2 AM.

But I got it working. And if I can, you can too.

Here's what I learned that actually matters:

| Before I started | After I broke everything 7 times |
| --- | --- |
| "AI is for programmers" | "AI is for anyone stubborn enough to try" |
| "I'll just follow the tutorial" | "I'll follow the tutorial and back up first" |
| "It should work perfectly" | "It will break, and that's normal" |
| "I'm not technical enough" | "Being patient matters more than being technical" |

Your Turn

If you're reading this and thinking "That sounds like me" — good. You're exactly who I wrote this for.

Start with something small. Expect it to break. Back up before you change anything. And when it finally works, leave it alone.

I'm still learning. Every day something new confuses me. But I'm not scared of it anymore — because I've already broken everything that could break.

And the AI is still running.

Hi, I'm Ling. I'm a medical student in China who somehow became a self-taught AI deployer. No CS degree, no big tech job — just a laptop, broken internet, and way too much stubbornness.

This is the first of my "Real People, Real AI" series. ⭐ Star the GitHub repo to get notified when the next one drops.

P.S. — If you've broken your own AI setup in a creative way, leave a comment. Misery loves company. 😄

What Is an LLM? (No, It's Not Magic — Here's What's Actually Happening)

Lingdas1 — Sun, 24 May 2026 09:36:12 +0000

What Is an LLM? (No, It's Not Magic — Here's What's Actually Happening)

The plain-English guide to understanding AI — no jargon, no code, just the stuff that matters.

My grandfather called it "the thinking computer."

I showed him ChatGPT, and he asked: "Does it... think? Like a person?"

It's a good question. And honestly, most explanations of AI are terrible at answering it. Either they're too technical ("a transformer-based neural network with self-attention mechanisms" — whatever that means) or too mystical ("it's like a digital brain!" — no, it's not).

So let me explain what an LLM actually is. No jargon. No magic. Just the truth.

The Analogy: A Chef Who's Tried Every Recipe

Imagine the world's most experienced chef. This chef has read every cookbook ever written. Every recipe from every culture. Every food blog. Every handwritten note from every grandmother.

You ask this chef: "Can you make me something with chicken, lemon, and garlic?"

The chef has never made that exact dish before, but they've read millions of recipes. They know what works. They know chicken + lemon + garlic usually means a Mediterranean-style dish. They know garlic should be minced, not whole. They know lemon juice goes in near the end, not the beginning.

So they create a new, perfectly reasonable recipe that has never existed before.

That's what an LLM does.

It's not "thinking." It's not "conscious." It has read an unimaginable amount of human text — books, articles, conversations, code — and learned the patterns of how we write and reason.

When you ask it a question, it doesn't "look up" an answer. It generates one, word by word, based on everything it has learned.

What LLM Actually Stands For

Large Language Model.

Let's break that down:

Language: It works with words. Text in, text out. That's its native language (pun intended).
Model: A mathematical representation of patterns. Think of it as a super-complex set of probabilities: "After the word 'I', the next word is usually a verb, and after 'I want to', the next word is often 'go' or 'get' or 'make'..." × a billion.
Large: Really, really large. These models have been trained on a large share of the public internet. The biggest ones have learned patterns from trillions of words.
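
To make the "set of probabilities" idea concrete, here is a toy sketch in Python. It only counts which word follows which in a tiny made-up text, which is far simpler than a real LLM, but the output has the same flavor: a probability for each possible next word.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows, word):
    """Turn the raw counts for one word into probabilities that sum to 1."""
    counts = follows[word.lower()]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


# A made-up three-sentence "training set".
corpus = "i want to go . i want to make tea . i want to go home ."
model = train_bigrams(corpus)
print(next_word_probabilities(model, "to"))
```

In this tiny corpus, "to" is followed by "go" twice and "make" once, so "go" gets two thirds of the probability. A real model does something comparable with far more context and far more data.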

What It's NOT

Let me clear up some common confusion:

| Myth | Truth |
| --- | --- |
| 🧠 "It thinks like a human" | ❌ No. It predicts words based on patterns. No consciousness, no feelings, no self-awareness. |
| 📚 "It knows everything" | ❌ It only reflects what it was trained on, up to a cutoff date. It doesn't look facts up; it generates plausible text. |
| 🎯 "It's always right" | ❌ It can be confidently wrong. It's great at sounding correct even when it's making things up. |
| 📝 "It copies from the internet" | ❌ It doesn't keep a library of web pages. It learned patterns and generates new text from them, though it can occasionally repeat passages it saw many times. |

Why "Large" Matters

Imagine two chefs:

Chef A has read 10 recipes. They know how to make exactly 10 dishes.
Chef B has read 10 million recipes. They understand cuisine at a deep level.

LLMs work the same way. The "large" in "Large Language Model" refers to:

The amount of training data: billions of web pages, books, and documents
The number of parameters: think of these as adjustable "connections" in the model, numbers that get tuned during training. A 7-billion-parameter model (small) has 7 billion of them. A 70-billion-parameter model (large) has 70 billion.

More parameters = more pattern recognition = better reasoning (usually).

But here's the good news: you don't need the biggest model. A 7-billion-parameter model, running on a laptop, can handle most everyday tasks just fine. It's like having Chef B-lite — still experienced, still useful, much more practical.
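
Why does a 7-billion-parameter model fit on a laptop at all? A back-of-the-envelope calculation helps. The sketch below assumes each parameter is stored in 16, 8, or 4 bits; those storage formats are an assumption added here for illustration, and a running model needs some extra memory on top of the weights.

```python
def weights_size_gb(parameters, bits_per_parameter):
    """Approximate size of the model weights in gigabytes (1 GB = 10**9 bytes)."""
    total_bytes = parameters * bits_per_parameter / 8
    return total_bytes / 1e9


for bits in (16, 8, 4):
    print(f"7B parameters at {bits}-bit: about {weights_size_gb(7e9, bits):.1f} GB")
```

That works out to about 14 GB at 16 bits, 7 GB at 8 bits, and 3.5 GB at 4 bits. The same arithmetic puts a 70-billion-parameter model at roughly ten times those figures, which is why the big ones usually stay in data centers.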

How It Actually Works (The Simplest Explanation)

When you type a message, here's what happens:

You type: "What is the capital of France?"

Step 1: The model breaks your question into tokens (words and pieces of words).
         ["What", " is", " the", " capital", " of", " France", "?"]

Step 2: The model starts predicting the answer, one token at a time.
         "The" → "capital" → "of" → "France" → "is" → "Paris" → "."

Step 3: Each word is chosen based on probability.
         "The capital of France is..." → P(Paris) = 95%, P(Lyon) = 2%, P(Marseille) = 1%
         → It picks "Paris" (the most probable)

Step 4: Done! "The capital of France is Paris."

It's not magic. It's a very, very sophisticated version of your phone's autocomplete, trained on a huge amount of text from the internet.
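
Step 3 can be sketched in a few lines of Python. The probabilities are the ones from the example above, with a made-up "other" bucket added so they sum to 1; a real model computes them from its parameters instead of reading them from a table. The sketch also shows a second way to choose: sampling at random according to the probabilities, which is one reason the same question can get slightly different answers.

```python
import random

# Hand-written toy probabilities; "other" is a filler bucket added for illustration.
NEXT_TOKEN = {
    "The capital of France is": {"Paris": 0.95, "Lyon": 0.02, "Marseille": 0.01, "other": 0.02},
}


def pick_greedy(probabilities):
    """Always return the most probable token."""
    return max(probabilities, key=probabilities.get)


def pick_sampled(probabilities, rng):
    """Draw a token at random, weighted by its probability."""
    tokens = list(probabilities)
    weights = [probabilities[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


options = NEXT_TOKEN["The capital of France is"]
print(pick_greedy(options))
print(pick_sampled(options, random.Random(0)))
```

`pick_greedy` always answers "Paris". `pick_sampled` answers "Paris" about 95% of the time and something else the rest of the time.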

Why This Matters to You (a Regular Person)

Here's why understanding this matters:

1. You Don't Need to Be a Programmer

If you understand that an LLM predicts words based on patterns, you already understand enough to use it. The tools are designed for everyone now.

2. You Can Run It on Your Laptop

Because LLMs are just math (very complicated math, but still math), the smaller ones can run on an ordinary computer with enough memory. A smaller model on your laptop is slower than ChatGPT, but it's private, free, and always available.

3. You Should Be Skeptical

Knowing that LLMs can be confidently wrong helps you use them better. Always fact-check important information. Use AI as a brainstorming partner, not an encyclopedia.

4. You're Not Left Behind

The people who benefit most from AI aren't programmers — they're writers, students, small business owners, artists, and curious people who ask good questions. That's probably you.

The Different Types of AI (In One Table)

| Type | What It Does | Example |
| --- | --- | --- |
| LLM | Understands and generates text | ChatGPT, Claude, DeepSeek |
| Image generator | Creates pictures from descriptions | Midjourney, DALL-E, Stable Diffusion |
| Voice AI | Understands and generates speech | Siri, Whisper |
| Recommendation | Predicts what you'll like | TikTok, Netflix, YouTube |

This series focuses on LLMs — the text-based AI that can write, explain, analyze, and assist. It's the most useful type for everyday tasks.

What You Can Actually DO with This Knowledge

Now that you know what an LLM is:

You can use one right now, for free — Ollama + a small model on your laptop
You know the limits — It's not magic, it's pattern recognition. Use it as a tool, not an oracle.
You can explain it to others — When your friends say "AI is taking over," you can say "Actually, it's just really good autocomplete, trained on a lot of data."

What's Next

Now that you know what an LLM is, the next guide shows you how to actually run one:

👉 Part 3: "Step-by-Step: Run Your First AI Model in 10 Minutes" — (coming next)

No terminal commands you don't understand. No unexplained jargon. Just a simple walkthrough with screenshots.

Hi, I'm Ling. I'm a medical student who got tired of feeling left behind by AI. I started learning, broke things, fixed them, and now I'm sharing what I've learned — in plain English, for regular people.

Found this useful? ⭐ Star the GitHub repo to get notified when new guides drop. Or leave a comment — I'd love to hear what questions you still have.