<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Starmorph AI</title>
    <description>The latest articles on DEV Community by Starmorph AI (@starmorph).</description>
    <link>https://dev.to/starmorph</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1014851%2F7f9a4f5d-a5b0-47cf-863c-293fee11baa0.png</url>
      <title>DEV Community: Starmorph AI</title>
      <link>https://dev.to/starmorph</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/starmorph"/>
    <language>en</language>
    <item>
      <title>How to Build Karpathy's LLM Wiki: The Complete Guide to AI-Maintained Knowledge Bases</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:55:39 +0000</pubDate>
      <link>https://dev.to/starmorph/how-to-build-karpathys-llm-wiki-the-complete-guide-to-ai-maintained-knowledge-bases-3dk3</link>
      <guid>https://dev.to/starmorph/how-to-build-karpathys-llm-wiki-the-complete-guide-to-ai-maintained-knowledge-bases-3dk3</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Andrej Karpathy's LLM Wiki is a pattern — not a product — where an LLM agent builds and maintains a structured markdown knowledge base from your raw sources. Three-layer architecture: &lt;code&gt;raw/&lt;/code&gt; (immutable sources), &lt;code&gt;wiki/&lt;/code&gt; (LLM-generated pages), and &lt;code&gt;CLAUDE.md&lt;/code&gt; (schema). Three operations: ingest (process new sources), query (ask questions), lint (health checks). It replaces RAG with plain markdown for personal/team-scale knowledge. This guide covers the complete setup with Claude Code and Obsidian.&lt;/p&gt;

&lt;p&gt;In April 2026, Andrej Karpathy &lt;a href="https://x.com/karpathy/status/2039805659525644595" rel="noopener noreferrer"&gt;posted on X&lt;/a&gt; about a workflow shift: instead of using LLMs primarily for code generation, he had been using them to build personal knowledge bases. The post went viral — 16+ million views — and the &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener noreferrer"&gt;follow-up GitHub Gist&lt;/a&gt; hit 5,000+ stars within days. It touched a nerve because it solved a problem every knowledge worker has: knowledge bases that collapse under their own maintenance weight.&lt;/p&gt;

&lt;p&gt;This guide breaks down the pattern, shows you how to build one from scratch with Claude Code and Obsidian, compares it to RAG, and surveys the community implementations that emerged within a week.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Knowledge Bases Collapse
&lt;/h2&gt;

&lt;p&gt;Every developer has a graveyard of abandoned knowledge systems. Notion databases with 200 pages and no updates since month three. Bookmarks folders with 500 links and no summaries. Obsidian vaults with promising graphs that went stale. The problem isn't the tools — it's the maintenance cost.&lt;/p&gt;

&lt;p&gt;Building a knowledge base has three steps: &lt;strong&gt;collect&lt;/strong&gt; (easy), &lt;strong&gt;organize&lt;/strong&gt; (hard), &lt;strong&gt;maintain&lt;/strong&gt; (impossible at scale). The grunt work of filing, cross-referencing, summarizing, and updating is where systems die. Adding a single new article means reading it, creating a summary, linking it to existing concepts, updating related pages, and checking for contradictions with existing knowledge. Nobody does this consistently.&lt;/p&gt;

&lt;p&gt;Karpathy's insight is simple: &lt;strong&gt;LLMs are uniquely good at exactly this kind of bookkeeping.&lt;/strong&gt; They can read a document, identify key concepts, create structured summaries, generate cross-references, update indexes, and flag contradictions — tirelessly, consistently, at near-zero marginal cost. The human curates what goes in; the LLM does everything else.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The LLM writes and maintains all of the data of the wiki. I rarely touch it directly." — Andrej Karpathy&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At the time of the post, Karpathy's wiki on a single research topic had grown to approximately &lt;strong&gt;100 articles and 400,000 words&lt;/strong&gt; — longer than most PhD dissertations — without him writing any of it directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Layer Architecture
&lt;/h2&gt;

&lt;p&gt;The LLM Wiki has a deliberately simple structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-research/
├── raw/                    # Layer 1: Immutable source documents
│   ├── articles/
│   ├── papers/
│   ├── repos/
│   ├── data/
│   ├── images/
│   └── assets/
├── wiki/                   # Layer 2: LLM-generated markdown
│   ├── index.md            # Content catalog (updated on every ingest)
│   ├── log.md              # Append-only chronological record
│   ├── overview.md
│   ├── concepts/           # Concept pages
│   ├── entities/           # Entity pages
│   ├── sources/            # Source summaries
│   └── comparisons/        # Comparison pages
├── outputs/                # Dated reports, presentations
├── CLAUDE.md               # Layer 3: Schema configuration
└── .gitignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 1: Raw Sources (&lt;code&gt;raw/&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Your curated collection of source documents — articles, papers, code repos, datasets, images. The LLM reads these but &lt;strong&gt;never modifies them&lt;/strong&gt;. They serve as the verification baseline: every claim in the wiki traces back to a file in &lt;code&gt;raw/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Think of &lt;code&gt;raw/&lt;/code&gt; as immutable input. You can use the &lt;a href="https://obsidian.md/clipper" rel="noopener noreferrer"&gt;Obsidian Web Clipper&lt;/a&gt; browser extension to convert web articles to markdown and drop them directly into &lt;code&gt;raw/articles/&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: The Wiki (&lt;code&gt;wiki/&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;LLM-generated markdown pages organized by type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;concepts/&lt;/code&gt;&lt;/strong&gt; — Concept pages (e.g., &lt;code&gt;attention-mechanism.md&lt;/code&gt;, &lt;code&gt;rag.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;entities/&lt;/code&gt;&lt;/strong&gt; — Entity pages (e.g., &lt;code&gt;openai.md&lt;/code&gt;, &lt;code&gt;anthropic.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sources/&lt;/code&gt;&lt;/strong&gt; — Source summaries (one per ingested document)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;comparisons/&lt;/code&gt;&lt;/strong&gt; — Comparison pages (e.g., &lt;code&gt;rag-vs-fine-tuning.md&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two structural files are critical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;index.md&lt;/code&gt;&lt;/strong&gt; — Content catalog. Updated on every ingest. The LLM reads this first to navigate the wiki.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;log.md&lt;/code&gt;&lt;/strong&gt; — Append-only operation log. Records every ingest, every page update, every contradiction found.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM maintains everything in this directory. Humans mostly read; the LLM mostly writes.&lt;/p&gt;
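
&lt;p&gt;As a concrete illustration, entries in these two files might look like the following (the filenames and dates are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# index.md
## Concepts
- [[attention-mechanism]]: how transformers weigh token pairs
## Sources
- [[sources/attention-paper]]: summary of raw/papers/attention-paper.md

# log.md
- 2026-04-20: Ingested raw/papers/attention-paper.md. Created
  [[sources/attention-paper]]; updated [[attention-mechanism]] and index.md.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;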

&lt;h3&gt;
  
  
  Layer 3: The Schema (&lt;code&gt;CLAUDE.md&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;The most important file in the system. It defines the wiki's structure, naming conventions, page templates, and operational workflows. It transforms a generic LLM into a disciplined knowledge worker.&lt;/p&gt;

&lt;p&gt;Named &lt;code&gt;CLAUDE.md&lt;/code&gt; because Karpathy uses Claude Code as his primary agent, but the concept applies to any LLM agent with file access.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Operations: Ingest, Query, Lint
&lt;/h2&gt;

&lt;p&gt;The LLM Wiki pattern defines three core operations. Karpathy frames the system using a compiler analogy: &lt;code&gt;raw/&lt;/code&gt; is source code, the LLM is the compiler, &lt;code&gt;wiki/&lt;/code&gt; is the executable output, lint is tests, and queries are runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ingest
&lt;/h3&gt;

&lt;p&gt;You drop a new source into &lt;code&gt;raw/&lt;/code&gt; and tell the LLM to process it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; I added a new article to raw/articles/. Please ingest it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads the document and discusses key takeaways&lt;/li&gt;
&lt;li&gt;Creates a summary page in &lt;code&gt;wiki/sources/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Cascades updates across 10-15 related wiki pages&lt;/li&gt;
&lt;li&gt;Creates new concept or entity pages if needed&lt;/li&gt;
&lt;li&gt;Updates &lt;code&gt;index.md&lt;/code&gt; with new entries&lt;/li&gt;
&lt;li&gt;Appends to &lt;code&gt;log.md&lt;/code&gt; with affected pages and noteworthy findings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A single ingest operation can touch dozens of wiki pages as the LLM traces implications across the knowledge graph.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query
&lt;/h3&gt;

&lt;p&gt;You ask questions against the wiki. The LLM searches &lt;code&gt;index.md&lt;/code&gt;, reads relevant pages, and synthesizes answers with &lt;code&gt;[[wiki-link]]&lt;/code&gt; citations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; What are the key differences between sparse and dense retrieval?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM navigates via the index instead of brute-force loading all documents into context. Valuable answers optionally get filed as permanent wiki pages — &lt;strong&gt;knowledge compounds&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lint
&lt;/h3&gt;

&lt;p&gt;Periodic health checks. The LLM scans for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Contradictions&lt;/strong&gt; — claims that conflict between pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orphan pages&lt;/strong&gt; — wiki pages with no incoming links&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing concepts&lt;/strong&gt; — topics referenced but not yet given their own page&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale claims&lt;/strong&gt; — assertions superseded by newer sources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Investigation gaps&lt;/strong&gt; — areas where more research is needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as &lt;code&gt;eslint&lt;/code&gt; for knowledge. You can schedule lint operations (daily, weekly) or run them ad hoc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Please lint the wiki. Focus on contradictions and stale claims.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting Up Your LLM Wiki with Claude Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Create the directory structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/research/my-topic/&lt;span class="o"&gt;{&lt;/span&gt;raw/&lt;span class="o"&gt;{&lt;/span&gt;articles,papers,repos,data,images&lt;span class="o"&gt;}&lt;/span&gt;,wiki/&lt;span class="o"&gt;{&lt;/span&gt;concepts,entities,sources,comparisons&lt;span class="o"&gt;}&lt;/span&gt;,outputs&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="nb"&gt;touch&lt;/span&gt; ~/research/my-topic/wiki/index.md
&lt;span class="nb"&gt;touch&lt;/span&gt; ~/research/my-topic/wiki/log.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Initialize Git
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/research/my-topic
git init
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"outputs/*.pdf"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .gitignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Version control is essential. Every wiki update becomes a trackable diff. You can revert bad ingests, review how concepts evolved, and use &lt;code&gt;git log&lt;/code&gt; as an audit trail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Create the CLAUDE.md schema
&lt;/h3&gt;

&lt;p&gt;This is the critical step. See the full schema section below for a complete template.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Add your first sources
&lt;/h3&gt;

&lt;p&gt;Drop markdown files, PDFs, or code into &lt;code&gt;raw/&lt;/code&gt;. Use the Obsidian Web Clipper or a tool like &lt;a href="https://github.com/deathau/markdownload" rel="noopener noreferrer"&gt;MarkDownload&lt;/a&gt; to convert web articles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Run Claude Code and ingest
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/research/my-topic
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; I've added 3 articles to raw/articles/. Please ingest them all,
&amp;gt; create wiki pages, and update the index.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code will read each source, create structured wiki pages, establish cross-references, and update the index — all in a single operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Schema: Your Most Important File
&lt;/h2&gt;

&lt;p&gt;The schema file (&lt;code&gt;CLAUDE.md&lt;/code&gt;) is what makes the pattern work. Without it, the LLM produces inconsistent output. With it, the LLM becomes a reliable knowledge worker. Here is a production-ready template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Research Wiki: [Your Topic]&lt;/span&gt;

&lt;span class="gu"&gt;## Project Structure&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`raw/`&lt;/span&gt; — Immutable source documents. Never modify files here.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`wiki/`&lt;/span&gt; — LLM-generated and maintained markdown pages.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`wiki/index.md`&lt;/span&gt; — Master content catalog. Update on every operation.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`wiki/log.md`&lt;/span&gt; — Append-only operation log.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`outputs/`&lt;/span&gt; — Generated reports, presentations, lint results.

&lt;span class="gu"&gt;## Page Types and Conventions&lt;/span&gt;

Every wiki page must have YAML frontmatter:&lt;span class="sb"&gt;

    ---
    title: "Page Title"
    type: concept | entity | source-summary | comparison
    sources:
      - raw/papers/filename.md
    related:
      - "[[related-concept]]"
    created: YYYY-MM-DD
    updated: YYYY-MM-DD
    confidence: high | medium | low
    ---

&lt;/span&gt;&lt;span class="gu"&gt;### Naming&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Filenames: kebab-case matching the concept (e.g., attention-mechanism.md)
&lt;span class="p"&gt;-&lt;/span&gt; Cross-references: use [[wikilinks]] for all internal links
&lt;span class="p"&gt;-&lt;/span&gt; Source references: always link back to raw/ file paths

&lt;span class="gu"&gt;## Workflows&lt;/span&gt;

&lt;span class="gu"&gt;### Ingest&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Read the source document in raw/
&lt;span class="p"&gt;2.&lt;/span&gt; Discuss key takeaways with the user
&lt;span class="p"&gt;3.&lt;/span&gt; Create wiki/sources/[source-name].md summary
&lt;span class="p"&gt;4.&lt;/span&gt; Update or create concept/entity pages as needed
&lt;span class="p"&gt;5.&lt;/span&gt; Update wiki/index.md with new entries
&lt;span class="p"&gt;6.&lt;/span&gt; Append to wiki/log.md

&lt;span class="gu"&gt;### Query&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Read wiki/index.md to identify relevant pages
&lt;span class="p"&gt;2.&lt;/span&gt; Read those pages and synthesize an answer
&lt;span class="p"&gt;3.&lt;/span&gt; Cite sources using [[wikilinks]]
&lt;span class="p"&gt;4.&lt;/span&gt; If the answer is novel and valuable, offer to save it as a new wiki page

&lt;span class="gu"&gt;### Lint&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Scan all wiki pages for contradictions
&lt;span class="p"&gt;2.&lt;/span&gt; Identify orphan pages (no incoming links)
&lt;span class="p"&gt;3.&lt;/span&gt; Flag missing concepts referenced but not created
&lt;span class="p"&gt;4.&lt;/span&gt; Find stale claims superseded by newer sources
&lt;span class="p"&gt;5.&lt;/span&gt; Save results to outputs/lint-YYYY-MM-DD.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Customize this template for your domain. A machine learning wiki might add conventions for tracking paper citations and benchmark results. A competitive intelligence wiki might add conventions for confidence levels and source freshness.&lt;/p&gt;
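
&lt;p&gt;For instance, a machine learning wiki could extend the page frontmatter with domain-specific fields (the extra keys below are illustrative additions, not part of Karpathy's schema):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
title: "FlashAttention"
type: concept
sources:
  - raw/papers/flashattention.md
related:
  - "[[attention-mechanism]]"
paper_venue: "NeurIPS 2022"                  # domain extension
benchmark_notes: "speedups reported on A100" # domain extension
confidence: high
---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;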

&lt;h2&gt;
  
  
  Using Obsidian as the Frontend
&lt;/h2&gt;

&lt;p&gt;Obsidian is the recommended frontend for viewing and navigating the wiki. Open the &lt;code&gt;wiki/&lt;/code&gt; directory as an Obsidian vault and you get:&lt;/p&gt;

&lt;h3&gt;
  
  
  Graph View
&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;[[wikilink]]&lt;/code&gt; the LLM creates becomes a visible connection in Obsidian's graph view. As the wiki grows, the graph reveals natural knowledge clusters — which concepts are central, which are isolated, where the gaps are.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backlinks
&lt;/h3&gt;

&lt;p&gt;Click any wiki page and see every other page that references it. This is enormously valuable for understanding how concepts connect without having to manually maintain relationship lists.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dataview Queries
&lt;/h3&gt;

&lt;p&gt;If you install the &lt;a href="https://github.com/blacksmithgu/obsidian-dataview" rel="noopener noreferrer"&gt;Dataview&lt;/a&gt; plugin, you can query across all wiki pages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```

dataview
TABLE type, confidence, updated
FROM "concepts"
WHERE confidence = "low"
SORT updated ASC


```
```

`

This query surfaces your least-confident knowledge — the areas where more research is needed.

### QMD for Search

Tobi Lutke (Shopify CEO) built [QMD](https://github.com/tobi/qmd), a local search engine for markdown files. It uses hybrid BM25/vector search with LLM re-ranking. Karpathy recommends it as the search layer for LLM Wikis. It's available as both a CLI and an MCP server, so Claude Code can use it to navigate large wikis efficiently.

## LLM Wiki vs RAG: When to Use Which

This is the biggest conceptual distinction in the pattern. Karpathy positions the LLM Wiki as a simpler alternative to RAG for personal and team-scale knowledge.

| Dimension                | RAG                                            | LLM Wiki                                 |
| ------------------------ | ---------------------------------------------- | ---------------------------------------- |
| **State**                | Stateless — each query is independent          | Stateful — knowledge compounds over time |
| **Infrastructure**       | Vector DB, embedding pipeline, retrieval logic | Folder of `.md` files                    |
| **Cross-references**     | Discovered ad-hoc per query                    | Pre-built by the LLM, always available   |
| **Maintenance**          | Embedding updates, index rebuilds              | LLM updates pages on every ingest        |
| **Token cost per query** | High (retrieve + re-rank + generate)           | Low (read index + targeted pages)        |
| **Traceability**         | Chunk-level citations (often lossy)            | Source-level citations back to `raw/`    |
| **Scale sweet spot**     | Enterprise (millions of documents)             | Personal/team (sub-100K tokens of wiki)  |
| **Contradictions**       | Undetected — conflicting chunks coexist        | Flagged during lint operations           |

### When RAG wins

- You have millions of documents and can't pre-compile them all
- Documents change frequently and re-ingesting the entire wiki is impractical
- You need sub-second query latency at scale
- Your knowledge base is shared across many teams with different access levels

### When LLM Wiki wins

- You have fewer than ~100-200 source documents
- You want knowledge to compound — each ingested source improves all future queries
- You care about traceability (every claim links to a raw source)
- You want zero infrastructure beyond a folder and an LLM
- You value consistency checks (lint) over raw retrieval speed

The LLM Wiki is essentially a **file-based, traceable implementation of Graph RAG** — each claim links back to sources, relationships are explicit, and the structure is human-readable. But unlike Graph RAG, it requires no graph database, no entity extraction pipeline, and no ontology engineering.
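
The low per-query token cost in the table comes from index navigation. That loop can be sketched in a few lines; `pick` and `ask_llm` are stand-ins for whatever agent you use, and all names here are hypothetical rather than part of any published implementation:

```python
from pathlib import Path


def answer(question: str, wiki: Path, pick, ask_llm) -> str:
    """Sketch of the LLM Wiki query loop: navigate via the index,
    read only the relevant pages, then synthesize an answer."""
    # The index is a few thousand tokens, not the whole wiki.
    index = (wiki / "index.md").read_text()
    # `pick` selects page paths from the index (in practice, an LLM call).
    relevant = pick(question, index)
    # Load only the selected pages into context.
    context = "\n\n".join(
        (wiki / p).read_text() for p in relevant if (wiki / p).exists()
    )
    return ask_llm(f"Question: {question}\n\nWiki pages:\n{context}")
```

With a real agent, `pick` would itself be an LLM call that reads `index.md` and returns a short list of page paths, which is exactly why the wiki stays cheap to query as it grows.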

## Tooling and Infrastructure

### Minimum Viable Stack

| Tool                           | Purpose                                        | Required?   |
| ------------------------------ | ---------------------------------------------- | ----------- |
| Claude Code (or any LLM agent) | Wiki compiler — reads sources, generates pages | Yes         |
| A folder                       | Storage for `raw/`, `wiki/`, `CLAUDE.md`       | Yes         |
| Git                            | Version control for the entire knowledge base  | Recommended |

That's it. No vector database, no embedding pipeline, no cloud service. The entire system runs on markdown files and an LLM.

### Recommended Stack

| Tool                     | Purpose                                                     | Link                                                                 |
| ------------------------ | ----------------------------------------------------------- | -------------------------------------------------------------------- |
| **Claude Code**          | Primary LLM agent                                           | [claude.ai](https://claude.ai)                                       |
| **Obsidian**             | Wiki frontend — graph view, backlinks, search               | [obsidian.md](https://obsidian.md)                                   |
| **QMD**                  | Semantic search over markdown (BM25 + vector + LLM re-rank) | [github.com/tobi/qmd](https://github.com/tobi/qmd)                   |
| **Obsidian Web Clipper** | Convert web articles to markdown for `raw/`                 | [obsidian.md/clipper](https://obsidian.md/clipper)                   |
| **Dataview**             | Structured queries across wiki frontmatter                  | [Obsidian plugin](https://github.com/blacksmithgu/obsidian-dataview) |
| **Marp**                 | Convert markdown wiki pages to presentation slides          | [marp.app](https://marp.app)                                         |
| **Git**                  | Version control and change tracking                         | Built-in                                                             |

### Claude Code Skills for Wiki Management

You can create Claude Code skills to standardize wiki operations:

```markdown
# /wiki-ingest skill

Read all new files in raw/ that aren't already in wiki/sources/.
For each new file:

1. Create a summary in wiki/sources/
2. Update or create concept and entity pages
3. Update wiki/index.md
4. Append to wiki/log.md
   Report what changed.
```

```markdown
# /wiki-lint skill

Scan the entire wiki/ directory.
Check for:

- Contradictions between pages
- Orphan pages (no incoming [[wikilinks]])
- Missing concepts (referenced but no page exists)
- Low-confidence pages that haven't been updated recently
  Save results to outputs/lint-[today's date].md
```

The community has already built several skill packages. [wiki-skills](https://github.com/kfchou/wiki-skills) and [karpathy-llm-wiki](https://github.com/Astro-Han/karpathy-llm-wiki) both provide drop-in Claude Code skills implementing the pattern.

## Community Implementations

Within a week of Karpathy's post, the community built multiple implementations. Here are the most notable:

| Project               | Description                                                                | Link                                                                                     |
| --------------------- | -------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **llmwiki**           | Upload docs, connect Claude via MCP, have it write your wiki               | [github.com/lucasastorian/llmwiki](https://github.com/lucasastorian/llmwiki)             |
| **obsidian-wiki**     | Framework for AI agents to build Obsidian wikis using the Karpathy pattern | [github.com/Ar9av/obsidian-wiki](https://github.com/Ar9av/obsidian-wiki)                 |
| **second-brain**      | LLM-maintained personal knowledge base for Obsidian                        | [github.com/NicholasSpisak/second-brain](https://github.com/NicholasSpisak/second-brain) |
| **llm-wiki-compiler** | Compiles markdown knowledge files into topic-based wikis                   | [github.com/ussumant/llm-wiki-compiler](https://github.com/ussumant/llm-wiki-compiler)   |
| **CacheZero**         | One `npm install` implementation of the pattern                            | [Hacker News](https://news.ycombinator.com/item?id=47667723)                             |
| **wiki-skills**       | Claude Code skills implementing the Karpathy pattern                       | [github.com/kfchou/wiki-skills](https://github.com/kfchou/wiki-skills)                   |
| **LLM Wiki v2**       | Extended pattern with memory lifecycle and confidence scoring              | [Gist](https://gist.github.com/rohitg00/2067ab416f7bbe447c1977edaaa681e2)                |

### Real-World Results

User `vbarsoum` on Hacker News [shared results](https://news.ycombinator.com/item?id=47640875) from applying the pattern to three business books (~155K words): chapter-level granularity produced **210 concept pages** with approximately **4,600 cross-references** and unprompted synthesis across sources. The system wasn't just summarizing — it was identifying patterns and connections across books that the user hadn't seen.

### LLM Wiki v2: Extended Pattern

Developer `rohitg00` [extended the pattern](https://gist.github.com/rohitg00/2067ab416f7bbe447c1977edaaa681e2) with lessons from building an agent memory system. Key additions:

- **Memory lifecycle:** Confidence scoring, supersession tracking, retention decay (Ebbinghaus forgetting curve)
- **Consolidation tiers:** Working memory → episodic memory → semantic memory → procedural memory
- **Knowledge graph structure:** Typed entities and relationship categories ("uses," "depends on," "contradicts," "supersedes")
- **Multi-agent governance:** Shared vs private knowledge scoping for parallel agents

These extensions become relevant as wikis grow beyond ~100-200 pages, where simple index navigation starts to degrade.
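
The retention-decay idea can be made concrete. A minimal sketch assuming a simple exponential Ebbinghaus-style curve, where a page's stability grows each time a new source confirms it; the function names and the 0.5 review threshold are my own illustration, not taken from the v2 gist:

```python
import math


def retention(days_since_update: float, stability: float) -> float:
    """Ebbinghaus-style forgetting curve: R = exp(-t / S).
    Higher stability S (more confirming sources) means slower decay."""
    return math.exp(-days_since_update / stability)


def needs_review(days_since_update: float, stability: float,
                 threshold: float = 0.5) -> bool:
    """Flag a page for re-verification once modeled retention
    drops below the threshold."""
    return retention(days_since_update, stability) < threshold
```

A lint pass could compute this per page from the `updated` frontmatter field and write the flagged pages into its report.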

## The Intellectual Lineage

Karpathy's Gist explicitly references Vannevar Bush's 1945 essay ["As We May Think"](https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/), which described a hypothetical device called the **Memex** — a mechanical desk that would store and cross-reference all of a person's books, records, and communications with associative trails between related items.

The Memex was never built, and the deeper obstacle was that maintenance was manual: every cross-reference had to be created by hand. Bush imagined operators building "trails" through knowledge, but nobody actually does this at scale.

The LLM Wiki solves the maintenance problem: **"The wiki stays maintained because the cost of maintenance is near zero."** The LLM creates and updates cross-references automatically on every ingest. The human focuses on what matters — deciding what to read and what questions to ask.

### Karpathy's Evolution

The LLM Wiki represents the third phase of Karpathy's thinking about human-AI collaboration:

1. **Vibe Coding** (Feb 2025) — Accept AI-generated code without reviewing it line-by-line. Trust the model, test the output.
2. **Agentic Engineering** (Jan 2026) — Humans orchestrate AI agents rather than writing code directly.
3. **LLM Knowledge Bases** (Apr 2026) — AI manages knowledge, not just code. The human is a curator, not a writer.

Each phase shifts more cognitive labor to the LLM while keeping humans in the loop for judgment and direction.

### Related Efforts

- **Jeremy Howard's `llms.txt`** — A [website-level standard](https://llmstxt.org/) for helping external LLMs understand your site. Outward-facing (help LLMs understand you) vs the LLM Wiki's inward-facing (use LLMs to understand your domain). Both share the philosophy that markdown is the ideal format for LLM consumption.
- **Simon Willison's `docs-for-llms`** — [Build scripts](https://github.com/simonw/docs-for-llms) to create LLM-friendly concatenated documentation. Focused on making existing docs consumable rather than having the LLM generate new knowledge.
- **Tobi Lutke's QMD** — The [local search engine](https://github.com/tobi/qmd) Karpathy recommends. Built by the Shopify CEO, which signals adoption at the highest levels of tech leadership.

## Criticisms and Limitations

The pattern is not without critics. Key concerns from the [Hacker News discussion](https://news.ycombinator.com/item?id=47640875):

### "The grunt work IS the learning"

User `qaadika` argued that the bookkeeping Karpathy outsources — filing, cross-referencing, summarizing — is where genuine understanding forms. By handing this to an LLM, you surrender the cognitive process that creates deep knowledge. You end up with a comprehensive wiki you haven't actually internalized.

**Counter:** The wiki is a reference system, not a replacement for thinking. Karpathy still reads sources, discusses takeaways with the LLM, and makes judgment calls about what to include. The LLM handles logistics, not insight.

### Context window degradation

Multiple users reported that quality degrades when the wiki grows beyond what fits in context. Despite 1M+ token context windows, practical degradation starts around 200K-300K tokens. The LLM starts missing connections or producing inconsistent pages.

**Mitigation:** This is why the index/navigation pattern matters. Instead of loading the entire wiki, the LLM reads `index.md` (a few thousand tokens), identifies relevant pages, and reads only those. Hierarchical navigation sidesteps brute-force context stuffing.

### Model collapse risk

`devnullbrain` raised concerns about information degradation through repeated LLM rewriting — the wiki version of model collapse. Each rewrite potentially introduces subtle errors that compound over time.

**Mitigation:** The immutable `raw/` layer is the safeguard. Every claim in the wiki should trace back to a source in `raw/`. Lint operations check for drift. And Git provides full history to identify when claims changed.
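
That traceability safeguard is mechanically checkable. A sketch of one lint pass that flags frontmatter `sources:` entries pointing at files no longer present under `raw/`; it uses simplified line scanning rather than a real YAML parser, and the frontmatter layout it assumes is the one from the schema template above:

```python
from pathlib import Path


def untraceable(wiki: Path, root: Path) -> dict[str, list[str]]:
    """Map each wiki page to any frontmatter source paths
    that no longer exist under raw/."""
    broken: dict[str, list[str]] = {}
    for page in wiki.rglob("*.md"):
        in_sources = False
        for line in page.read_text().splitlines():
            if line.strip() == "sources:":
                in_sources = True
            elif in_sources and line.strip().startswith("- "):
                src = line.strip()[2:].strip().strip('"')
                if src.startswith("raw/") and not (root / src).exists():
                    broken.setdefault(str(page.relative_to(wiki)), []).append(src)
            elif in_sources:
                in_sources = False  # left the sources: list
    return broken
```

Run during lint, an empty result means every cited source is still verifiable; anything else goes straight into the lint report.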

### Complexity ceiling

`kubb` warned that these systems collapse beyond certain complexity thresholds when neither the agent nor the developer maintains sufficient comprehension of the whole.

**Mitigation:** This is a real constraint. The pattern works best for personal/team knowledge at the 50-200 source scale. Beyond that, you likely need the extensions from LLM Wiki v2 (hybrid search, multi-agent governance) or a proper RAG pipeline.

## Sources

### Research Papers

- [A-MEM: Agentic Memory for LLM Agents (2025)](https://arxiv.org/abs/2502.12110)
- [Agentic Retrieval-Augmented Generation: A Survey (2025)](https://arxiv.org/abs/2501.09136)
- [Survey on Knowledge-Oriented RAG (2025)](https://arxiv.org/abs/2503.10677)
- [PersonalAI: Knowledge Graph Storage for LLM Agents (2025)](https://arxiv.org/abs/2506.17001)
- [LLM-Empowered Knowledge Graph Construction Survey (2025)](https://arxiv.org/abs/2510.20345)
- [Deep Research: A Survey of Autonomous Research Agents (2025)](https://arxiv.org/abs/2508.12752)
- [Integrating LLMs with Knowledge-Based Methods Survey (2025)](https://arxiv.org/abs/2501.13947)

### Primary Sources

- [Karpathy's LLM Wiki Gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)
- [llms.txt Specification](https://llmstxt.org/)
- [QMD — Local Markdown Search](https://github.com/tobi/qmd)
- [docs-for-llms (Simon Willison)](https://github.com/simonw/docs-for-llms)

### Articles and Coverage

- [VentureBeat — Karpathy shares LLM Knowledge Base architecture](https://venturebeat.com/data/karpathy-shares-llm-knowledge-base-architecture-that-bypasses-rag-with-an)
- [Analytics India Magazine — Karpathy Moves Beyond RAG](https://analyticsindiamag.com/ai-news/andrej-karpathy-moves-beyond-rag-builds-llm-powered-personal-knowledge-bases)
- [DAIR.AI Academy — LLM Knowledge Bases](https://academy.dair.ai/blog/llm-knowledge-bases-karpathy)
- [MindStudio — How to Build a Personal Knowledge Base](https://www.mindstudio.ai/blog/andrej-karpathy-llm-wiki-knowledge-base-claude-code)
- [MindStudio — LLM Wiki vs RAG Comparison](https://www.mindstudio.ai/blog/llm-wiki-vs-rag-markdown-knowledge-base-comparison)
- [Analytics Vidhya — LLM Wiki Revolution](https://www.analyticsvidhya.com/blog/2026/04/llm-wiki-by-andrej-karpathy/)

### Community Projects

- [lucasastorian/llmwiki](https://github.com/lucasastorian/llmwiki) — Open-source LLM Wiki implementation
- [Ar9av/obsidian-wiki](https://github.com/Ar9av/obsidian-wiki) — Obsidian + LLM Wiki framework
- [NicholasSpisak/second-brain](https://github.com/NicholasSpisak/second-brain) — LLM-maintained second brain
- [kfchou/wiki-skills](https://github.com/kfchou/wiki-skills) — Claude Code wiki skills
- [Astro-Han/karpathy-llm-wiki](https://github.com/Astro-Han/karpathy-llm-wiki) — One-skill LLM Wiki

### Hacker News Discussions

- [LLM Wiki — example of an "idea file"](https://news.ycombinator.com/item?id=47640875)
- [Show HN: LLM Wiki Open-Source Implementation](https://news.ycombinator.com/item?id=47656181)
- [Show HN: CacheZero — Karpathy's idea as one NPM install](https://news.ycombinator.com/item?id=47667723)

---

*Originally published at [StarBlog](https://blog.starmorph.com/blog/karpathy-llm-wiki-knowledge-base-guide)*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>llmwiki</category>
      <category>knowledgebase</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>LLM Model Names Decoded: A Developer's Guide to Parameters, Quantization &amp; Formats</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Sat, 11 Apr 2026 00:05:46 +0000</pubDate>
      <link>https://dev.to/starmorph/llm-model-names-decoded-a-developers-guide-to-parameters-quantization-formats-48cn</link>
      <guid>https://dev.to/starmorph/llm-model-names-decoded-a-developers-guide-to-parameters-quantization-formats-48cn</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; "B" = billions of parameters. "IT" = instruction tuned. "Q4_K_M" = 4-bit quantization, a common default. "GGUF" = the format for Ollama and local tools. "MoE" = only a fraction of parameters activate per token. This guide decodes every component of LLM model names, explains quantization formats and file types, and points you to the best resources for researching which model fits your hardware and use case.&lt;/p&gt;

&lt;p&gt;If you've ever stared at a Hugging Face model page and seen something like &lt;code&gt;unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF&lt;/code&gt; and wondered what any of that means — this guide is for you.&lt;/p&gt;

&lt;p&gt;The open-weight model ecosystem has exploded. Gemma 4, Qwen 3.5, Llama 4, DeepSeek, Mistral — every family ships dozens of variants across different sizes, architectures, quantization levels, and file formats. Picking the right one for your hardware and use case shouldn't require a PhD.&lt;/p&gt;

&lt;p&gt;I wrote this as a companion to my &lt;a href="https://blog.starmorph.com/blog/local-llm-inference-tools-guide" rel="noopener noreferrer"&gt;local LLM inference tools guide&lt;/a&gt;, which covers &lt;em&gt;how to run&lt;/em&gt; models. This guide explains what all those cryptic suffixes mean and points you toward the best resources for researching which model fits your setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anatomy of a Model Name
&lt;/h2&gt;

&lt;p&gt;Let's decode a real model name, piece by piece.&lt;/p&gt;

&lt;p&gt;Take &lt;code&gt;bartowski/Qwen3.5-32B-Instruct-GGUF-Q4_K_M&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Organization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bartowski&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Who published this variant (community quantizer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Family&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Qwen3.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model family and version (Alibaba's Qwen, generation 3.5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;32B&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;32 billion parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Instruct&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Instruction-tuned (follows prompts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;GGUF&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;File format (for Ollama, LM Studio, llama.cpp)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Quantization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Q4_K_M&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4-bit precision, K-quant method, medium variant (some tensors kept at higher precision)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here's another: &lt;code&gt;google/gemma-4-26B-A4B-it&lt;/code&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Organization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;google&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Official release from Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Family&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemma-4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemma generation 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;26B-A4B&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;26B total params, 4B active (Mixture of Experts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Instruction tuned&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The general pattern: &lt;strong&gt;[Org/] Family-Version-Size [-Active] -Training [-Format] [-Quantization]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not every model follows this exactly — naming is more convention than standard. But once you know the components, you can decode anything.&lt;/p&gt;
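&lt;p&gt;To make the pattern concrete, here's a rough, illustrative decoder in Python. The field names and regexes are this guide's own inventions and only cover the common conventions above:&lt;/p&gt;

```python
import re

# Illustrative decoder for common Hugging Face-style model names.
# Naming is convention, not standard, so this only handles the
# frequent patterns described above.
def decode_model_name(name):
    org, sep, rest = name.partition("/")
    if not sep:
        org, rest = None, name
    parts = {"org": org}
    m = re.search(r"-(Q\d_K_[SML]|Q\d_0)$", rest)    # quantization suffix
    if m:
        parts["quant"] = m.group(1)
        rest = rest[: m.start()]
    if rest.endswith("-GGUF"):                       # file format
        parts["format"] = "GGUF"
        rest = rest[: -len("-GGUF")]
    m = re.search(r"(\d+(?:\.\d+)?)B(?:-A(\d+)B)?", rest)
    if m:                                            # size, plus MoE active params
        parts["size_b"] = float(m.group(1))
        if m.group(2):
            parts["active_b"] = float(m.group(2))
    if re.search(r"-(Instruct|it)$", rest, re.IGNORECASE):
        parts["training"] = "instruct"               # instruction-tuned variant
    return parts
```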

&lt;h2&gt;
  
  
  Parameters: What the Numbers Mean
&lt;/h2&gt;

&lt;p&gt;The "B" in model names stands for &lt;strong&gt;billions of parameters&lt;/strong&gt; — the trainable numerical weights that a neural network learns during training. More parameters generally means more knowledge capacity, but also more memory required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Size Tiers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Parameter Range&lt;/th&gt;
&lt;th&gt;RAM Needed (Q4_K_M)&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tiny&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-3B&lt;/td&gt;
&lt;td&gt;2-3 GB&lt;/td&gt;
&lt;td&gt;Edge devices, quick tasks, mobile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Small&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-9B&lt;/td&gt;
&lt;td&gt;3-6 GB&lt;/td&gt;
&lt;td&gt;General chat, summarization, simple coding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Medium&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;13-14B&lt;/td&gt;
&lt;td&gt;8-10 GB&lt;/td&gt;
&lt;td&gt;Strong coding, reasoning, creative writing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Large&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;27-32B&lt;/td&gt;
&lt;td&gt;18-22 GB&lt;/td&gt;
&lt;td&gt;Complex reasoning, nuanced writing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extra Large&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;70B+&lt;/td&gt;
&lt;td&gt;40+ GB&lt;/td&gt;
&lt;td&gt;Near-frontier quality, research&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The rule of thumb for Q4_K_M GGUF&lt;/strong&gt;: take the parameter count in billions, multiply by roughly 0.6, and that's your approximate file size in GB. A 7B model is ~4GB, a 32B is ~19GB, a 70B is ~40GB.&lt;/p&gt;

&lt;p&gt;You'll also see "M" for millions — &lt;code&gt;278M&lt;/code&gt; means 278 million parameters. These are tiny models for embedding, classification, or on-device use.&lt;/p&gt;
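&lt;p&gt;The rule of thumb above fits in one line of Python — a rough estimator, not a guarantee, since real files vary with architecture and metadata:&lt;/p&gt;

```python
# Rough Q4_K_M GGUF size: ~0.6 GB per billion parameters.
def q4_k_m_size_gb(params_billions):
    return round(params_billions * 0.6, 1)

# q4_k_m_size_gb(7)  -> 4.2   (matches the ~4 GB figure above)
# q4_k_m_size_gb(32) -> 19.2
# q4_k_m_size_gb(70) -> 42.0
```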

&lt;h3&gt;
  
  
  Bigger Isn't Always Better
&lt;/h3&gt;

&lt;p&gt;A well-trained 14B model frequently outperforms a mediocre 70B. Training data quality, architecture choices, and fine-tuning matter as much as raw parameter count. Phi-4-reasoning at 14B beats DeepSeek-R1 (671B total) on some math benchmarks. Qwen2.5-Coder at 14B scores ~85% on HumanEval, competitive with models 5x its size.&lt;/p&gt;

&lt;p&gt;The best way to evaluate this is hands-on experimentation. Browse the &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;Ollama model library&lt;/a&gt;, check &lt;a href="https://huggingface.co/models?sort=trending" rel="noopener noreferrer"&gt;Hugging Face trending models&lt;/a&gt;, or explore what's popular on &lt;a href="https://openrouter.ai/models" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; — then try a few models at your hardware tier and see what works for your workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Further reading:&lt;/strong&gt; &lt;a href="https://travis.media/blog/ai-model-parameters-explained/" rel="noopener noreferrer"&gt;AI Model Parameters Explained&lt;/a&gt; · &lt;a href="https://apxml.com/courses/getting-started-local-llms/chapter-3-finding-selecting-local-llms/model-sizes-parameters" rel="noopener noreferrer"&gt;LLM Model Sizes Guide&lt;/a&gt; · &lt;a href="https://www.microsoft.com/en-us/research/publication/phi-4-reasoning-technical-report/" rel="noopener noreferrer"&gt;Phi-4 Reasoning Technical Report&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Training Variants: Base vs Instruct vs Chat
&lt;/h2&gt;

&lt;p&gt;When you see &lt;code&gt;-base&lt;/code&gt;, &lt;code&gt;-instruct&lt;/code&gt;, &lt;code&gt;-it&lt;/code&gt;, or &lt;code&gt;-chat&lt;/code&gt; in a model name, it tells you how the model was fine-tuned after initial pretraining.&lt;/p&gt;

&lt;h3&gt;
  
  
  Base (Pretrained)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Trained on massive text corpora via next-token prediction&lt;/li&gt;
&lt;li&gt;Completes text patterns but doesn't follow instructions reliably&lt;/li&gt;
&lt;li&gt;Like a student who's read every book but hasn't learned to answer exam questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to use:&lt;/strong&gt; Fine-tuning your own model, research, text completion&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Instruct / IT (Instruction Tuned)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fine-tuned on instruction-response pairs (supervised fine-tuning)&lt;/li&gt;
&lt;li&gt;Follows user prompts reliably: "Summarize this," "Write a function that..."&lt;/li&gt;
&lt;li&gt;The standard variant for most use cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to use:&lt;/strong&gt; Coding, Q&amp;amp;A, summarization, analysis — virtually everything&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Chat
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Further optimized for multi-turn conversations with RLHF or DPO&lt;/li&gt;
&lt;li&gt;Better at maintaining context across a conversation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to use:&lt;/strong&gt; Chatbot applications, interactive assistants&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Other Training Suffixes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Suffix&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;-DPO&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Trained with Direct Preference Optimization (alignment technique)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;-RLHF&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Trained with Reinforcement Learning from Human Feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;-reasoning&lt;/code&gt; / &lt;code&gt;-thinking&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Optimized for chain-of-thought reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;-vision&lt;/code&gt; / &lt;code&gt;-VL&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Supports image input (vision-language)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;-coder&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fine-tuned specifically for code generation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;For general use, always pick the instruct/IT variant.&lt;/strong&gt; Base models are for researchers and fine-tuners. If you're running a model in Ollama or LM Studio, you want instruct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Further reading:&lt;/strong&gt; &lt;a href="https://medium.com/@yashwanths_29644/llm-finetuning-series-05-llm-architectures-base-instruct-and-chat-models-a6219c39c362" rel="noopener noreferrer"&gt;Base vs Instruct vs Chat Models (Medium)&lt;/a&gt; · &lt;a href="https://blog.alexewerlof.com/p/base-models-vs-instruct-models" rel="noopener noreferrer"&gt;Foundation vs Instruct vs Thinking Models&lt;/a&gt; · &lt;a href="https://bentoml.com/llm/getting-started/choosing-the-right-model" rel="noopener noreferrer"&gt;Choosing the Right Model (BentoML)&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantization Demystified
&lt;/h2&gt;

&lt;p&gt;Quantization reduces the numerical precision of model weights — storing each weight in fewer bits. This shrinks file size and speeds up inference at the cost of some accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Precision Formats
&lt;/h3&gt;

&lt;p&gt;Full-precision models store each weight as a 16-bit or 32-bit floating point number. Quantization compresses these down:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Bits per Weight&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Typical Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FP32&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;Full precision, gold standard&lt;/td&gt;
&lt;td&gt;Training reference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BF16&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Brain Float 16 (same range as FP32, lower precision)&lt;/td&gt;
&lt;td&gt;Default for LLM training&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FP16&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Half precision (narrower range than BF16)&lt;/td&gt;
&lt;td&gt;GPU inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FP8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8-bit float&lt;/td&gt;
&lt;td&gt;Cutting-edge training/inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;INT8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8-bit integer, fixed-point&lt;/td&gt;
&lt;td&gt;Post-training quantization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;INT4 / FP4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4-bit, aggressive compression&lt;/td&gt;
&lt;td&gt;Local inference on constrained hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you see &lt;code&gt;BF16&lt;/code&gt; or &lt;code&gt;FP16&lt;/code&gt; in a model name, it means the weights are stored at that precision — no quantization applied. These are the highest-quality downloads but also the largest files.&lt;/p&gt;

&lt;h3&gt;
  
  
  GGUF Quantization Levels
&lt;/h3&gt;

&lt;p&gt;GGUF files use a naming scheme: &lt;strong&gt;Q[bits]_[method]_[variant]&lt;/strong&gt; — for example, Q4_K_M.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Q&lt;/strong&gt; = quantized&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Number&lt;/strong&gt; = bits per weight (2, 3, 4, 5, 6, 8)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;K&lt;/strong&gt; = K-quant method (smarter bit allocation across layers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S / M / L&lt;/strong&gt; = Small / Medium / Large variant — how many tensors keep extra precision&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Bits&lt;/th&gt;
&lt;th&gt;Size (7B model)&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q2_K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;~2.7 GB&lt;/td&gt;
&lt;td&gt;Poor — significant loss&lt;/td&gt;
&lt;td&gt;Emergency only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q3_K_S&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;~2.9 GB&lt;/td&gt;
&lt;td&gt;Fair — noticeable degradation&lt;/td&gt;
&lt;td&gt;Very constrained hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q3_K_M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;~3.1 GB&lt;/td&gt;
&lt;td&gt;Fair&lt;/td&gt;
&lt;td&gt;Tight budgets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q4_K_S&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;~3.6 GB&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Budget hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q4_K_M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;~3.8 GB&lt;/td&gt;
&lt;td&gt;Good — 92% quality retention&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;The mainstream default&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q5_K_S&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;~4.6 GB&lt;/td&gt;
&lt;td&gt;Very good&lt;/td&gt;
&lt;td&gt;Between Q4 and Q6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q5_K_M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;~4.8 GB&lt;/td&gt;
&lt;td&gt;Very good — near-imperceptible loss&lt;/td&gt;
&lt;td&gt;When you have extra RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q6_K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;~5.5 GB&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Quality-sensitive tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q8_0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;~7 GB&lt;/td&gt;
&lt;td&gt;Near-lossless&lt;/td&gt;
&lt;td&gt;When VRAM isn't a concern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;F16&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;~14 GB&lt;/td&gt;
&lt;td&gt;Perfect&lt;/td&gt;
&lt;td&gt;Maximum quality baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The sweet spot for most users is Q4_K_M.&lt;/strong&gt; It's the default quantization in Ollama, retains ~92% of the original model's quality, and cuts file size by roughly 75% compared to FP16.&lt;/p&gt;
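&lt;p&gt;One way to use the table: scale the 7B sizes linearly by parameter count and pick the highest quant level that fits your memory budget. A hedged sketch — the per-7B figures are the approximations from the table above, and linear scaling is itself an approximation:&lt;/p&gt;

```python
# Approximate GGUF file sizes for a 7B model, in GB (from the table above).
QUANT_GB_PER_7B = {
    "Q2_K": 2.7, "Q3_K_M": 3.1, "Q4_K_M": 3.8,
    "Q5_K_M": 4.8, "Q6_K": 5.5, "Q8_0": 7.0,
}

def best_quant(params_b, budget_gb):
    """Highest-quality level whose (linearly scaled) size fits the budget."""
    best = None
    for level, gb_7b in QUANT_GB_PER_7B.items():
        est = gb_7b * params_b / 7
        if budget_gb >= est and (best is None or est > best[1]):
            best = (level, round(est, 1))
    return best
```

Remember to leave headroom beyond the file size itself for the KV cache and runtime overhead.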

&lt;h3&gt;
  
  
  What K-Quant Actually Does
&lt;/h3&gt;

&lt;p&gt;K-quants use a two-level quantization scheme. Weights are grouped into 32-weight blocks, packed into 256-weight "super-blocks." Per-block scale factors are computed, then those scales are quantized &lt;em&gt;again&lt;/em&gt; (double quantization). This preserves more information than naive bit reduction.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;S/M/L&lt;/strong&gt; suffix controls which layers get extra precision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S (Small):&lt;/strong&gt; All tensors at the base bit-width — smallest file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;M (Medium):&lt;/strong&gt; Some attention and feed-forward tensors get higher bit-width — better quality, slightly larger&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L (Large):&lt;/strong&gt; More tensors at higher bit-width — best quality, largest file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, Q4_K_M stores most tensors at 4-bit but promotes half of the attention and feed-forward weights to 6-bit.&lt;/p&gt;

&lt;h3&gt;
  
  
  I-Quants (Importance Matrix)
&lt;/h3&gt;

&lt;p&gt;A newer family of quantization (IQ2_M, IQ3_M, IQ4_XS) uses importance matrices to identify and protect critical weights during quantization. IQ4_XS can compress more aggressively than Q4_K_M with comparable quality. You'll see these from quantizers like unsloth.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU-Native Quantization Methods
&lt;/h3&gt;

&lt;p&gt;GGUF isn't the only game in town. If you have an NVIDIA GPU, these formats run faster:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Creator&lt;/th&gt;
&lt;th&gt;Key Advantage&lt;/th&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MIT / NVIDIA&lt;/td&gt;
&lt;td&gt;Activation-aware, ~95% quality at 4-bit, fastest with Marlin kernel&lt;/td&gt;
&lt;td&gt;NVIDIA GPU only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPTQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Frantar et al.&lt;/td&gt;
&lt;td&gt;First practical LLM quantization, wide tool support&lt;/td&gt;
&lt;td&gt;NVIDIA GPU only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EXL2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;turboderp&lt;/td&gt;
&lt;td&gt;Per-layer mixed bit-widths (2-8 bit), fastest interactive inference&lt;/td&gt;
&lt;td&gt;NVIDIA GPU only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These methods produce files stored as safetensors (not GGUF) and run through tools like vLLM, ExLlamaV2, or HuggingFace Transformers. They're GPU-only — no CPU fallback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use what:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On CPU or mixed CPU/GPU&lt;/strong&gt; → GGUF (Q4_K_M default)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On NVIDIA GPU, maximum throughput&lt;/strong&gt; → AWQ with Marlin kernel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On NVIDIA GPU, maximum quality-per-byte&lt;/strong&gt; → EXL2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further reading:&lt;/strong&gt; &lt;a href="https://willitrunai.com/blog/quantization-guide-gguf-explained" rel="noopener noreferrer"&gt;GGUF Quantization Explained (WillItRunAI)&lt;/a&gt; · &lt;a href="https://kaitchup.substack.com/p/choosing-a-gguf-model-k-quants-i" rel="noopener noreferrer"&gt;K-Quants and I-Quants Guide&lt;/a&gt; · &lt;a href="https://oobabooga.github.io/blog/posts/gptq-awq-exl2-llamacpp/" rel="noopener noreferrer"&gt;GPTQ vs AWQ vs EXL2 vs llama.cpp&lt;/a&gt; · &lt;a href="https://arxiv.org/abs/2306.00978" rel="noopener noreferrer"&gt;AWQ Paper (MLSys 2024)&lt;/a&gt; · &lt;a href="https://ai.rs/ai-developer/quantization-methods-compared" rel="noopener noreferrer"&gt;Quantization Methods Compared&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Formats: GGUF vs Safetensors vs Others
&lt;/h2&gt;

&lt;p&gt;The file format determines which tools can load the model. This is one of the most common sources of confusion.&lt;/p&gt;

&lt;h3&gt;
  
  
  GGUF
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Created by:&lt;/strong&gt; Georgi Gerganov (llama.cpp project)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extension:&lt;/strong&gt; &lt;code&gt;.gguf&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A single-file format packaging weights, tokenizer, and metadata. Designed for local inference with extensive quantization support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs on:&lt;/strong&gt; Ollama, LM Studio, llama.cpp, KoboldCpp&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Single-file portability, CPU-friendly, quantization from 2-bit to 8-bit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Requires conversion from safetensors, slower than GPU-native formats on NVIDIA&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Safetensors
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Created by:&lt;/strong&gt; Hugging Face&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extension:&lt;/strong&gt; &lt;code&gt;.safetensors&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A secure serialization format — pure data, no executable code. Replaced PyTorch's pickle format which had arbitrary code execution vulnerabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs on:&lt;/strong&gt; vLLM, HuggingFace Transformers, TGI, SGLang&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Secure, fast loading (76x faster than pickle on CPU), the standard for training/fine-tuning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Full-precision models require substantial VRAM&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  MLX
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Created by:&lt;/strong&gt; Apple Machine Learning Research&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extension:&lt;/strong&gt; &lt;code&gt;.safetensors&lt;/code&gt; (MLX-converted)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Apple Silicon-native format leveraging unified memory. No data copying between CPU and GPU.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs on:&lt;/strong&gt; MLX framework, LM Studio (Mac), Ollama (Mac, since March 2026)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Optimized for Apple Silicon, leverages all system RAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Apple Silicon only&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Others
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Note&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ONNX&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cross-platform/mobile/browser deployment&lt;/td&gt;
&lt;td&gt;Not commonly used for LLMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TensorRT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Maximum NVIDIA GPU throughput&lt;/td&gt;
&lt;td&gt;GPU-architecture-specific, not portable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PyTorch .bin&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Legacy&lt;/td&gt;
&lt;td&gt;Being replaced by safetensors everywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Key Insight
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;GGUF is for local inference.&lt;/strong&gt; If you're using Ollama, LM Studio, or llama.cpp, you need GGUF (or MLX on Mac).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Safetensors is for everything else&lt;/strong&gt; — GPU inference with vLLM, training, fine-tuning, and as the canonical format on HuggingFace.&lt;/p&gt;

&lt;p&gt;You cannot fine-tune from GGUF. If you want to fine-tune, start with the safetensors version, train with LoRA/QLoRA, then convert the result to GGUF for serving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Further reading:&lt;/strong&gt; &lt;a href="https://huggingface.co/blog/ngxson/common-ai-model-formats" rel="noopener noreferrer"&gt;Common AI Model Formats (HuggingFace Blog)&lt;/a&gt; · &lt;a href="https://ggufloader.github.io/what-is-gguf.html" rel="noopener noreferrer"&gt;What is GGUF? Complete Guide&lt;/a&gt; · &lt;a href="https://huggingface.co/blog/safetensors-security-audit" rel="noopener noreferrer"&gt;Safetensors Security Audit&lt;/a&gt; · &lt;a href="https://github.com/ml-explore/mlx" rel="noopener noreferrer"&gt;MLX GitHub&lt;/a&gt; · &lt;a href="https://docs.ollama.com/import" rel="noopener noreferrer"&gt;Ollama: Importing Models&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Format Compatibility Matrix
&lt;/h2&gt;

&lt;p&gt;Which tools support which formats — at a glance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Ollama&lt;/th&gt;
&lt;th&gt;LM Studio&lt;/th&gt;
&lt;th&gt;vLLM&lt;/th&gt;
&lt;th&gt;llama.cpp&lt;/th&gt;
&lt;th&gt;ExLlamaV2&lt;/th&gt;
&lt;th&gt;HF Transformers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GGUF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Safetensors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ (auto-converts)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPTQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EXL2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MLX&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ (Mac)&lt;/td&gt;
&lt;td&gt;✅ (Mac)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ollama can import safetensors models via a &lt;code&gt;Modelfile&lt;/code&gt; and auto-converts them to GGUF. On Apple Silicon, Ollama now uses MLX as its backend (since March 2026).&lt;/p&gt;
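&lt;p&gt;The import flow, per the Ollama docs, is a one-line &lt;code&gt;Modelfile&lt;/code&gt; plus a &lt;code&gt;create&lt;/code&gt; call — the paths and model name here are placeholders:&lt;/p&gt;

```shell
# Modelfile -- point FROM at a local safetensors model directory:
#   FROM ./my-model-directory

# Build (Ollama converts to GGUF) and run:
ollama create my-model -f Modelfile
ollama run my-model
```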

&lt;h2&gt;
  
  
  Architecture: Dense vs Mixture of Experts
&lt;/h2&gt;

&lt;p&gt;You'll see "MoE" in model descriptions and encoded in names like &lt;code&gt;35B-A3B&lt;/code&gt; or &lt;code&gt;8x7B&lt;/code&gt;. This is an architectural choice that fundamentally changes the size-to-performance equation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dense Models
&lt;/h3&gt;

&lt;p&gt;Every parameter is used for every token. A 32B dense model activates all 32 billion parameters on every input.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Examples:&lt;/strong&gt; Gemma 4 31B, Qwen3.5-27B, Llama 3.1 70B&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naming:&lt;/strong&gt; Just the parameter count — &lt;code&gt;32B&lt;/code&gt;, &lt;code&gt;70B&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM required:&lt;/strong&gt; Proportional to total parameter count&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Mixture of Experts (MoE)
&lt;/h3&gt;

&lt;p&gt;The model contains multiple "expert" sub-networks. A router selects only a few experts per token — the rest stay idle.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Examples:&lt;/strong&gt; Qwen3.5-35B-A3B (35B total, 3B active), Llama 4 Scout (109B total, 17B active)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naming:&lt;/strong&gt; total-then-active format (&lt;code&gt;35B-A3B&lt;/code&gt; = 35B total parameters, 3B active), or described in the model card&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM required:&lt;/strong&gt; Based on &lt;em&gt;total&lt;/em&gt; parameters (all experts must be in memory)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute cost:&lt;/strong&gt; Based on &lt;em&gt;active&lt;/em&gt; parameters (only selected experts run)&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Total Params&lt;/th&gt;
&lt;th&gt;Active Params&lt;/th&gt;
&lt;th&gt;Experts&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-35B-A3B&lt;/td&gt;
&lt;td&gt;35B&lt;/td&gt;
&lt;td&gt;3B&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;Large-model knowledge, small-model speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-122B-A10B&lt;/td&gt;
&lt;td&gt;122B&lt;/td&gt;
&lt;td&gt;10B&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;Near-frontier quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-397B-A17B&lt;/td&gt;
&lt;td&gt;397B&lt;/td&gt;
&lt;td&gt;17B&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;Frontier-class open model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Scout&lt;/td&gt;
&lt;td&gt;109B&lt;/td&gt;
&lt;td&gt;17B&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;10M token context window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Maverick&lt;/td&gt;
&lt;td&gt;400B&lt;/td&gt;
&lt;td&gt;17B&lt;/td&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;Beats GPT-4o on many benchmarks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 26B-A4B&lt;/td&gt;
&lt;td&gt;26B&lt;/td&gt;
&lt;td&gt;4B&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;Near-31B quality at 4B compute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-V3&lt;/td&gt;
&lt;td&gt;671B&lt;/td&gt;
&lt;td&gt;37B&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;Strong coding + general&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLM-5&lt;/td&gt;
&lt;td&gt;744B&lt;/td&gt;
&lt;td&gt;40B&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;MIT licensed, trained on Huawei chips&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The tradeoff:&lt;/strong&gt; An MoE model gives you the knowledge capacity of a much larger model at a fraction of the compute cost per token. But you still need enough RAM to hold all the parameters — the router needs access to every expert, even if it only activates a few at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical example:&lt;/strong&gt; Qwen3.5-35B-A3B has 35B total parameters (needs ~20GB at Q4_K_M) but runs at the speed of a 3B model. Compare that to a 3B dense model that needs ~2GB but has far less knowledge capacity. The MoE trades memory for intelligence.&lt;/p&gt;
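&lt;p&gt;The arithmetic behind those numbers can be sketched in a few lines. This is a rough model, not an exact one: ~4.85 bits per weight is a common community estimate for Q4_K_M, and per-token compute is taken as ~2 FLOPs per active parameter.&lt;/p&gt;

```python
def gguf_size_gb(total_params_b: float, bits_per_weight: float = 4.85) -> float:
    """Approximate in-memory size of a quantized model.

    ~4.85 bits/weight is a rough community figure for Q4_K_M (assumption).
    """
    return total_params_b * bits_per_weight / 8

def flops_per_token(active_params_b: float) -> float:
    """Rule of thumb: ~2 FLOPs per *active* parameter per generated token."""
    return 2 * active_params_b * 1e9

# Qwen3.5-35B-A3B: memory follows the 35B total, speed follows the 3B active
print(f"MoE RAM      ~{gguf_size_gb(35):.0f} GB")   # matches the ~20GB above
print(f"3B dense RAM ~{gguf_size_gb(3):.1f} GB")
print(f"Per-token compute for either: ~{flops_per_token(3) / 1e9:.0f} GFLOPs")
```

&lt;p&gt;The point the numbers make: the MoE pays roughly a tenfold memory premium over a 3B dense model, but its per-token compute bill is the same.&lt;/p&gt;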

&lt;p&gt;&lt;strong&gt;Further reading:&lt;/strong&gt; &lt;a href="https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts" rel="noopener noreferrer"&gt;A Visual Guide to Mixture of Experts&lt;/a&gt; · &lt;a href="https://neptune.ai/blog/mixture-of-experts-llms" rel="noopener noreferrer"&gt;MoE LLMs: Key Concepts (Neptune.ai)&lt;/a&gt; · &lt;a href="https://developer.nvidia.com/blog/applying-mixture-of-experts-in-llm-architectures/" rel="noopener noreferrer"&gt;NVIDIA MoE Blog&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Fine-Tunes and Variants
&lt;/h2&gt;

&lt;p&gt;Beyond official releases, a vibrant community creates derivative models. These suffixes tell you what was done:&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Derivative Suffixes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Suffix&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;-distilled / -Distill&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Smaller model trained to mimic a larger "teacher" model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;DeepSeek-R1-Distill-Qwen-32B&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;-abliterated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Safety refusal behavior surgically removed post-training&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Llama-3.2-abliterated&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;-uncensored&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Trained on unfiltered data to remove guardrails&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Dolphin-Mixtral-8x7B&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;-reasoning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimized for chain-of-thought reasoning&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Phi-4-reasoning&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;-LoRA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fine-tuned with Low-Rank Adaptation (adapter weights only)&lt;/td&gt;
&lt;td&gt;Various community models&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Community Contributors
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Known For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;bartowski&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GGUF quantizer&lt;/td&gt;
&lt;td&gt;Most prolific quantizer on HuggingFace — multiple quant levels for every major release&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;unsloth&lt;/strong&gt; (Daniel Han)&lt;/td&gt;
&lt;td&gt;Fine-tuning framework + quantizer&lt;/td&gt;
&lt;td&gt;Dynamic 2.0 quantization with per-layer optimization, 2-5x faster fine-tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Nous Research&lt;/strong&gt; (Teknium)&lt;/td&gt;
&lt;td&gt;Fine-tuning lab&lt;/td&gt;
&lt;td&gt;Hermes series — premium fine-tunes with minimal content filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Eric Hartford&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fine-tuner&lt;/td&gt;
&lt;td&gt;Dolphin uncensored model family&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TheBloke&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GGUF/GPTQ quantizer&lt;/td&gt;
&lt;td&gt;Pioneer of community quantization (less active since 2024, bartowski inherited the role)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;mlx-community&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MLX converters&lt;/td&gt;
&lt;td&gt;Pre-converted models for Apple Silicon users&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Distillation Explained
&lt;/h3&gt;

&lt;p&gt;Distillation is a technique where a smaller "student" model is trained to replicate a larger "teacher" model's outputs. The most famous example: &lt;code&gt;DeepSeek-R1-Distill-Qwen-32B&lt;/code&gt; — a Qwen 2.5 32B model fine-tuned on 800,000 chain-of-thought reasoning samples generated by DeepSeek-R1 (671B). The result outperforms OpenAI o1-mini on multiple benchmarks despite being ~20x smaller.&lt;/p&gt;

&lt;p&gt;When you see "-Distill" in a name, it means: this model learned its skills from a bigger model, not just from raw data.&lt;/p&gt;
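&lt;p&gt;The training objective behind a "-Distill" model can be illustrated with a toy example. The sketch below (pure Python, illustrative logits) computes the temperature-softened KL divergence between a teacher's and a student's next-token distributions, reduced to a single token position; this is the quantity a distilled student is trained to minimize.&lt;/p&gt;

```python
import math

def softmax(logits, temperature=1.0):
    z = [x / temperature for x in logits]
    m = max(z)                                  # subtract max for stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    the core distillation objective, shown for one token position."""
    p = softmax(teacher_logits, temperature)    # soft targets from the teacher
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]       # teacher strongly prefers token 0
mimic   = [3.8, 1.1, 0.1]       # student that tracks the teacher
wrong   = [0.2, 4.0, 1.0]       # student that prefers a different token

print(f"loss(mimic) = {distill_loss(teacher, mimic):.4f}")
print(f"loss(wrong) = {distill_loss(teacher, wrong):.4f}")
```

&lt;p&gt;Softened targets carry more signal than hard labels: the student learns not just the teacher's top token but how it ranks the alternatives.&lt;/p&gt;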

&lt;p&gt;&lt;strong&gt;Further reading:&lt;/strong&gt; &lt;a href="https://huggingface.co/blog/mlabonne/abliteration" rel="noopener noreferrer"&gt;Abliteration Explained (HuggingFace Blog)&lt;/a&gt; · &lt;a href="https://www.emergentmind.com/topics/deepseek-r1-distilled-models" rel="noopener noreferrer"&gt;DeepSeek-R1 Distilled Models&lt;/a&gt; · &lt;a href="https://modal.com/blog/lora-qlora" rel="noopener noreferrer"&gt;LoRA vs QLoRA (Modal)&lt;/a&gt; · &lt;a href="https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs" rel="noopener noreferrer"&gt;Unsloth Dynamic 2.0 GGUFs&lt;/a&gt; · &lt;a href="https://huggingface.co/bartowski" rel="noopener noreferrer"&gt;bartowski on HuggingFace&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The 2026 Model Landscape
&lt;/h2&gt;

&lt;p&gt;The open-weight ecosystem moves fast. Here's where the major families stand as of April 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemma 4 (Google) — Apache 2.0
&lt;/h3&gt;

&lt;p&gt;Natively multimodal across all sizes. The 26B MoE achieves near-31B quality with only 4B active parameters.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Modalities&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 E2B&lt;/td&gt;
&lt;td&gt;2.3B&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Text, Image, Video, Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 E4B&lt;/td&gt;
&lt;td&gt;4.5B&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Text, Image, Video, Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 26B-A4B&lt;/td&gt;
&lt;td&gt;26B total / 4B active&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Text, Image, Video&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 31B&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Text, Image, Video&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Multimodal tasks at any size. The E4B is remarkable — audio, video, and image understanding at 4.5B parameters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Qwen 3.5 (Alibaba) — Apache 2.0
&lt;/h3&gt;

&lt;p&gt;The widest size range of any model family. Features hybrid thinking/non-thinking mode and a new Gated DeltaNet architecture.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-0.8B&lt;/td&gt;
&lt;td&gt;0.8B&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-4B&lt;/td&gt;
&lt;td&gt;4B&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-9B&lt;/td&gt;
&lt;td&gt;9B&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-27B&lt;/td&gt;
&lt;td&gt;27B&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-35B-A3B&lt;/td&gt;
&lt;td&gt;35B / 3B active&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-122B-A10B&lt;/td&gt;
&lt;td&gt;122B / 10B active&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-397B-A17B&lt;/td&gt;
&lt;td&gt;397B / 17B active&lt;/td&gt;
&lt;td&gt;MoE&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Versatility. 201 languages, strong coding (Qwen2.5-Coder), and the 35B-A3B MoE runs on 8GB+ VRAM with Q4_K_M quantization. The most popular base for community fine-tunes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Llama 4 (Meta) — Llama Community License
&lt;/h3&gt;

&lt;p&gt;Meta's first MoE generation. Scout's 10M token context window is industry-leading.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Scout&lt;/td&gt;
&lt;td&gt;109B / 17B active&lt;/td&gt;
&lt;td&gt;MoE (16 experts)&lt;/td&gt;
&lt;td&gt;10M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Maverick&lt;/td&gt;
&lt;td&gt;400B / 17B active&lt;/td&gt;
&lt;td&gt;MoE (128 experts)&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Behemoth&lt;/td&gt;
&lt;td&gt;~2T / 288B active&lt;/td&gt;
&lt;td&gt;MoE (16 experts)&lt;/td&gt;
&lt;td&gt;TBD (preview)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Long context use cases. Scout fits on a single H100 GPU with a 10-million-token window.&lt;/p&gt;
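&lt;p&gt;Long contexts are expensive because of the KV cache, not the weights. A naive full-attention estimate makes this concrete; the layer/head/dimension values below are illustrative assumptions, not Scout's actual configuration, and production long-context models cut this cost with techniques like grouped-query attention, chunked attention, and cache quantization.&lt;/p&gt;

```python
def kv_cache_gb(tokens: int, layers: int = 48, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Naive full-attention KV cache: one K and one V vector per layer
    per token. Architecture numbers are illustrative, not a real config."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K + V
    return tokens * per_token / 1e9

for n in (8_192, 262_144, 10_000_000):
    print(f"{n:>10,} tokens -> {kv_cache_gb(n):8.1f} GB of KV cache")
```

&lt;p&gt;At 8K tokens the cache is a rounding error next to the weights; at 10M tokens it dwarfs them, which is why long-context architectures never store the naive cache.&lt;/p&gt;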

&lt;h3&gt;
  
  
  Other Notable Families
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Family&lt;/th&gt;
&lt;th&gt;Key Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Standout Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;R1-Distill-Qwen-32B&lt;/td&gt;
&lt;td&gt;32B&lt;/td&gt;
&lt;td&gt;Best local reasoning via distillation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Phi-4&lt;/strong&gt; (Microsoft)&lt;/td&gt;
&lt;td&gt;Phi-4-reasoning&lt;/td&gt;
&lt;td&gt;14B&lt;/td&gt;
&lt;td&gt;Beats 671B models on math benchmarks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;GLM-5&lt;/strong&gt; (Zhipu AI)&lt;/td&gt;
&lt;td&gt;GLM-5&lt;/td&gt;
&lt;td&gt;744B / 40B active&lt;/td&gt;
&lt;td&gt;MIT license, trained without NVIDIA chips&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mistral&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mistral Large 3&lt;/td&gt;
&lt;td&gt;675B / 41B active&lt;/td&gt;
&lt;td&gt;Apache 2.0, strong multilingual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Hermes 4&lt;/strong&gt; (Nous)&lt;/td&gt;
&lt;td&gt;Hermes 4 405B&lt;/td&gt;
&lt;td&gt;405B&lt;/td&gt;
&lt;td&gt;Minimal content filtering, strong reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MiniMax&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;M2&lt;/td&gt;
&lt;td&gt;229B / 10B active&lt;/td&gt;
&lt;td&gt;$0.26/M input — cheapest frontier-class API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Trends Defining 2026
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;MoE everywhere.&lt;/strong&gt; Almost every major release uses Mixture of Experts. The pattern: massive total parameters for knowledge, small active parameters for speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid reasoning.&lt;/strong&gt; Models like Qwen 3.5 can toggle between fast responses and deep chain-of-thought reasoning in a single model. No separate "thinking" variant needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distillation economy.&lt;/strong&gt; DeepSeek-R1 proved you can get 80%+ of frontier reasoning in a 7-32B model. Everyone is distilling now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context windows keep growing.&lt;/strong&gt; Llama 4 Scout: 10M tokens. Qwen 3.5: 262K native. Gemma 4: 256K.&lt;/p&gt;

&lt;p&gt;The landscape changes quickly — check &lt;a href="https://huggingface.co/spaces/lmarena-ai/arena-leaderboard" rel="noopener noreferrer"&gt;LMSYS Chatbot Arena&lt;/a&gt; for current rankings, and browse &lt;a href="https://openrouter.ai/models" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; or the &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;Ollama library&lt;/a&gt; to see what the community is actually using.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Further reading:&lt;/strong&gt; &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/" rel="noopener noreferrer"&gt;Gemma 4 Announcement (Google Blog)&lt;/a&gt; · &lt;a href="https://github.com/QwenLM/Qwen3.5" rel="noopener noreferrer"&gt;Qwen 3.5 on GitHub&lt;/a&gt; · &lt;a href="https://www.llama.com/models/llama-4/" rel="noopener noreferrer"&gt;Llama 4 Models (Meta)&lt;/a&gt; · &lt;a href="https://www.bentoml.com/blog/the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond" rel="noopener noreferrer"&gt;DeepSeek Complete Guide (BentoML)&lt;/a&gt; · &lt;a href="https://www.nxcode.io/resources/news/glm-5-open-source-744b-model-complete-guide-2026" rel="noopener noreferrer"&gt;GLM-5 Guide&lt;/a&gt; · &lt;a href="https://hermes4.nousresearch.com/" rel="noopener noreferrer"&gt;Hermes 4 (Nous Research)&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Read a Hugging Face Model Card
&lt;/h2&gt;

&lt;p&gt;Hugging Face is where most models live. Here's what to look for on a model page.&lt;/p&gt;

&lt;h3&gt;
  
  
  Repository Name
&lt;/h3&gt;

&lt;p&gt;Format: &lt;code&gt;organization/model-name&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;google/gemma-4-4b-it&lt;/code&gt; → Official Google release, Gemma 4, 4B params, instruction-tuned&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bartowski/Qwen3.5-27B-GGUF&lt;/code&gt; → Community GGUF quantization by bartowski&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;unsloth/DeepSeek-R1-Distill-Llama-8B&lt;/code&gt; → Unsloth's optimized version&lt;/li&gt;
&lt;/ul&gt;
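&lt;p&gt;These naming conventions are consistent enough that you can pull useful hints out of a repo id mechanically. A heuristic sketch (uploader conventions vary, so treat the output as a hint, not ground truth):&lt;/p&gt;

```python
import re

def parse_repo(repo_id: str) -> dict:
    """Heuristically split a Hugging Face repo id into useful fields.

    Uploader conventions vary, so treat the result as a hint, not truth.
    """
    org, _, name = repo_id.partition("/")
    fmt = next((f for f in ("GGUF", "GPTQ", "AWQ", "EXL2", "MLX")
                if f.lower() in name.lower()), None)
    size = re.search(r"(\d+(?:\.\d+)?)[bB]\b", name)
    return {
        "org": org,
        "name": name,
        "format": fmt,          # None usually means original safetensors
        "size_b": float(size.group(1)) if size else None,
        "instruct": bool(re.search(r"(?i)\b(it|instruct|chat)\b", name)),
    }

print(parse_repo("bartowski/Qwen3.5-27B-GGUF"))
print(parse_repo("google/gemma-4-4b-it"))
```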

&lt;h3&gt;
  
  
  Key Files
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;What It Is&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;README.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model card — architecture, benchmarks, usage, license&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;config.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Architecture blueprint (layers, vocab size, attention heads)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model.safetensors&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The actual weights (may be sharded: &lt;code&gt;model-00001-of-00003.safetensors&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tokenizer.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokenizer definition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;generation_config.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default generation settings (temperature, top_p)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
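&lt;p&gt;A quick look at &lt;code&gt;config.json&lt;/code&gt; answers the dense-vs-MoE question before you download any weights. The sketch below uses a hand-written sample dict in place of a real downloaded file; field names vary by architecture, and &lt;code&gt;num_local_experts&lt;/code&gt; is just one common MoE key.&lt;/p&gt;

```python
import json

def summarize_config(cfg: dict) -> str:
    """Summarize the architecture facts a config.json exposes.

    Field names vary by architecture; num_local_experts is one common MoE key.
    """
    experts = cfg.get("num_local_experts") or cfg.get("num_experts")
    arch = f"MoE ({experts} experts)" if experts else "Dense"
    return (f"{cfg.get('architectures', ['?'])[0]}: {arch}, "
            f"{cfg['num_hidden_layers']} layers, "
            f"{cfg['max_position_embeddings']:,}-token context")

# A hand-written sample stands in for a downloaded config.json (illustrative)
sample = json.loads("""
{
  "architectures": ["MixtralForCausalLM"],
  "num_hidden_layers": 32,
  "num_local_experts": 8,
  "max_position_embeddings": 32768
}
""")
print(summarize_config(sample))
```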

&lt;h3&gt;
  
  
  What to Check Before Downloading
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;

&lt;strong&gt;License&lt;/strong&gt; — Apache 2.0 is most permissive. The Llama Community License requires a separate commercial license once a product exceeds 700M monthly active users. Some models restrict commercial use entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameter count and architecture&lt;/strong&gt; — Dense or MoE? How many active parameters?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context length&lt;/strong&gt; — How much text can the model process at once?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quantization available&lt;/strong&gt; — Check if bartowski or unsloth have GGUF versions in separate repos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark scores&lt;/strong&gt; — Compare against similar-sized models for your use case (MMLU for general knowledge, HumanEval for coding, GSM8K for math).&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Finding the Right Variant
&lt;/h3&gt;

&lt;p&gt;If the official repo is &lt;code&gt;google/gemma-4-31b-it&lt;/code&gt; (safetensors, full precision), you'll find quantized versions at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bartowski/gemma-4-31B-it-GGUF&lt;/code&gt; — Standard GGUF quantizations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;unsloth/gemma-4-31B-it-GGUF&lt;/code&gt; — Dynamic quantization variants&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/gemma-4-31B-it-MLX&lt;/code&gt; — Apple Silicon format&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Decision Framework: Finding the Right Model
&lt;/h2&gt;

&lt;p&gt;There's no single "best model" for a given hardware setup — it depends on your task, your quality expectations, and how the model was trained, not just parameter count. The landscape changes quickly and new models regularly reshuffle the rankings. Rather than prescribing specific models, here's a framework for how to research and evaluate your options.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Know Your Hardware Limits
&lt;/h3&gt;

&lt;p&gt;Your RAM determines the &lt;em&gt;maximum&lt;/em&gt; model size you can load. This table shows approximate upper bounds at Q4_K_M quantization:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your Setup&lt;/th&gt;
&lt;th&gt;Approximate Max Size (Q4_K_M)&lt;/th&gt;
&lt;th&gt;Where to Explore&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8GB RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~7B dense, or small MoE&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;Ollama library&lt;/a&gt; — filter by size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;16GB RAM / Mac&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~14B dense&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;LM Studio Discover&lt;/a&gt; — browse by hardware compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;32GB Mac&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~32B dense&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://huggingface.co/models?sort=trending" rel="noopener noreferrer"&gt;HuggingFace Models&lt;/a&gt; — check model cards for RAM requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;64GB+ Mac&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;70B+ dense, large MoE&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://openrouter.ai/models" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; — try models via API before downloading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NVIDIA 8-12GB VRAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~9B dense&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;Ollama library&lt;/a&gt; or vLLM with AWQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NVIDIA 24GB VRAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~27B dense&lt;/td&gt;
&lt;td&gt;Community benchmarks at &lt;a href="https://localllm.in/" rel="noopener noreferrer"&gt;LocalLLM.in&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are rough guidelines — actual requirements depend on context length, batch size, and the specific model architecture. MoE models need RAM for their full parameter count even though they only activate a fraction per token.&lt;/p&gt;
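&lt;p&gt;The table's upper bounds fall out of simple bits-per-weight arithmetic. A sketch, assuming ~4.85 bits per weight for Q4_K_M and that roughly 70% of RAM is usable for weights (both figures are rough assumptions, not measurements):&lt;/p&gt;

```python
def fits(params_b: float, ram_gb: float, bits_per_weight: float = 4.85,
         usable_fraction: float = 0.7) -> bool:
    """Rough fit test: quantized weights must fit in the RAM left after
    the OS, the KV cache, and other processes (~30% overhead assumed)."""
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb <= ram_gb * usable_fraction

for params_b, ram_gb in [(7, 8), (14, 16), (32, 32), (70, 64), (70, 32)]:
    verdict = "fits" if fits(params_b, ram_gb) else "too big"
    print(f"{params_b:>3}B on {ram_gb}GB: {verdict}")
```

&lt;p&gt;Raise &lt;code&gt;bits_per_weight&lt;/code&gt; to model a higher quant level, or lower &lt;code&gt;usable_fraction&lt;/code&gt; to leave more room for long contexts.&lt;/p&gt;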

&lt;h3&gt;
  
  
  Step 2: Explore What the Community Is Using
&lt;/h3&gt;

&lt;p&gt;The best way to find the right model is to see what others with similar hardware and use cases are running. Here are the best places to research:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;Ollama Model Library&lt;/a&gt;&lt;/strong&gt; — Browse popular models, see download counts, and try them with one command. The tags show available sizes and quantizations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://huggingface.co/models?sort=trending" rel="noopener noreferrer"&gt;Hugging Face Trending Models&lt;/a&gt;&lt;/strong&gt; — See what's new and popular. Read model cards for benchmarks, hardware requirements, and community feedback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://openrouter.ai/models" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;&lt;/strong&gt; — Try models via API before committing to a local download. Great for comparing quality across families before choosing one to run locally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;LM Studio&lt;/a&gt;&lt;/strong&gt; — Visual model browser that shows hardware compatibility. Good for beginners exploring what fits their system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://huggingface.co/spaces/lmarena-ai/arena-leaderboard" rel="noopener noreferrer"&gt;LMSYS Chatbot Arena&lt;/a&gt;&lt;/strong&gt; — Community-voted rankings across hundreds of models. Useful for comparing quality across model families.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://localllm.in/" rel="noopener noreferrer"&gt;LocalLLM.in&lt;/a&gt;&lt;/strong&gt; — Benchmarks specifically for local inference, organized by VRAM tier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As of April 2026, some of the most popular open-weight model families include Qwen 3.5, Gemma 4, DeepSeek (V3 and R1 distills), GLM-5, MiniMax M2, Kimi K2.5, and Phi-4 — but this list shifts regularly as new models release. Don't take any single recommendation as definitive. Try a few models yourself and evaluate quality for your specific tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Which Quantization?
&lt;/h3&gt;

&lt;p&gt;The ladder, from minimum to maximum quality:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You're very memory-constrained&lt;/strong&gt; → Q3_K_M (noticeable quality loss, but it runs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard recommendation&lt;/strong&gt; → &lt;strong&gt;Q4_K_M&lt;/strong&gt; (92% quality, fits most setups)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have extra RAM&lt;/strong&gt; → Q5_K_M (near-imperceptible loss)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have plenty of RAM&lt;/strong&gt; → Q6_K or Q8_0 (effectively lossless)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;General rule: prefer a larger model at lower quantization over a smaller model at higher quantization.&lt;/strong&gt; A 14B at Q4_K_M almost always beats a 7B at Q8_0.&lt;/p&gt;
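&lt;p&gt;To see the rule in numbers, compare one model's footprint across the ladder. The bits-per-weight figures below are rough community averages and vary per model and per quantizer.&lt;/p&gt;

```python
# Rough average bits-per-weight for common GGUF quant levels (community
# figures; exact values vary per model and per quantizer)
BPW = {"Q3_K_M": 3.9, "Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def size_gb(params_b: float, quant: str) -> float:
    return params_b * BPW[quant] / 8

for quant in BPW:
    print(f"14B @ {quant:>6}: {size_gb(14, quant):5.1f} GB")

# The rule of thumb in numbers: similar memory, very different capacity
print(f"14B @ Q4_K_M = {size_gb(14, 'Q4_K_M'):.1f} GB "
      f"vs 7B @ Q8_0 = {size_gb(7, 'Q8_0'):.1f} GB")
```

&lt;p&gt;A 14B at Q4_K_M and a 7B at Q8_0 occupy memory in the same ballpark, but the 14B carries twice the parameters' worth of knowledge.&lt;/p&gt;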

&lt;h3&gt;
  
  
  Step 4: Which Format?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your Tool&lt;/th&gt;
&lt;th&gt;Format to Download&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;GGUF (or let Ollama auto-convert)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LM Studio&lt;/td&gt;
&lt;td&gt;GGUF or MLX (Mac)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llama.cpp&lt;/td&gt;
&lt;td&gt;GGUF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;Safetensors (or AWQ for GPU quantization)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine-tuning&lt;/td&gt;
&lt;td&gt;Safetensors (always start with full precision)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apple Silicon native&lt;/td&gt;
&lt;td&gt;MLX&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Quick-Start: Trying Models with Ollama
&lt;/h3&gt;

&lt;p&gt;The fastest way to experiment is with Ollama — one command to download and run. Here are some examples to get started, but browse the &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;full Ollama library&lt;/a&gt; to see what's currently popular:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Try a small model (fits 8GB+ RAM)&lt;/span&gt;
ollama run gemma4:4b

&lt;span class="c"&gt;# Try a medium model (fits 16GB+ RAM)&lt;/span&gt;
ollama run qwen3.5:9b

&lt;span class="c"&gt;# Try a larger model (fits 32GB+ RAM)&lt;/span&gt;
ollama run qwen3.5:27b

&lt;span class="c"&gt;# Specify a quantization level&lt;/span&gt;
ollama run qwen3.5:9b-q5_K_M

&lt;span class="c"&gt;# List the models you've downloaded so far&lt;/span&gt;
ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Ollama library, LM Studio's model browser, and OpenRouter's model list are all good starting points for discovering what's available. Try a few models at your hardware tier, compare the output quality for your specific use case, and see what works best for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Glossary
&lt;/h2&gt;

&lt;p&gt;Quick reference for every abbreviation you'll encounter in model names.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Billions of parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Millions of parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IT / Instruct&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instruction-tuned — fine-tuned to follow prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Base&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pretrained only — raw text completion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chat&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimized for multi-turn conversation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GGUF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GPT-Generated Unified Format — single-file format for local inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Safetensors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HuggingFace's secure tensor serialization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q4_K_M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-bit K-quant, medium blocks — the mainstream default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q8_0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8-bit quantization — near-lossless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;F16 / FP16&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16-bit floating point — half precision&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BF16&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Brain Float 16 — default training precision&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Activation-Aware Weight Quantization — GPU-optimized 4-bit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPTQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Post-training quantization for generative pretrained transformers; an early GPU method&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EXL2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ExLlamaV2 format — mixed bit-width GPU quantization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MLX&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apple's ML framework for Apple Silicon&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MoE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mixture of Experts — only a fraction of params active per token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dense&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All parameters active on every token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LoRA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low-Rank Adaptation — efficient fine-tuning method&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;QLoRA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quantized LoRA — fine-tuning with 4-bit base model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DPO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct Preference Optimization — alignment technique&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RLHF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reinforcement Learning from Human Feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Distilled&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Trained to mimic a larger model's outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Abliterated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Safety refusals surgically removed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vision-Language — supports image input&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;A&lt;em&gt;n&lt;/em&gt;B suffix&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Active parameters in MoE (e.g., A4B = 4B active)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;imatrix&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Importance matrix — used during quantization for better quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;K-quant&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mixed-precision quantization with importance-based bit allocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;bpw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bits per weight — average precision across the model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;This guide is part of a series on local AI inference. For tool comparisons and hardware recommendations, see &lt;a href="https://blog.starmorph.com/blog/local-llm-inference-tools-guide" rel="noopener noreferrer"&gt;Local LLM Inference in 2026: The Complete Guide&lt;/a&gt;. For Apple Silicon-specific advice, see &lt;a href="https://blog.starmorph.com/blog/best-mac-mini-for-local-llms" rel="noopener noreferrer"&gt;Best Mac Mini for Local LLMs&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Research Papers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2507.18553" rel="noopener noreferrer"&gt;Post-Training Quantization for LLMs (2025 Survey)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2503.01483" rel="noopener noreferrer"&gt;Efficient Weight Quantization for On-Device LLMs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2510.19640" rel="noopener noreferrer"&gt;Latent Space Factorization in LoRA (2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2511.07842" rel="noopener noreferrer"&gt;Memory-Efficient LLM Finetuning (2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2505.01658" rel="noopener noreferrer"&gt;Survey on LLM Inference Engines and Optimization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2602.12957" rel="noopener noreferrer"&gt;Speculative Decoding: Accelerating LLM Inference (2026)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs" rel="noopener noreferrer"&gt;Hugging Face Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;Ollama Model Library&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openrouter.ai/models" rel="noopener noreferrer"&gt;OpenRouter Model Directory&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.starmorph.com/blog/llm-model-names-decoded" rel="noopener noreferrer"&gt;StarBlog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>localai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>ollama</category>
    </item>
    <item>
      <title>10 CLI Tools Every Developer Should Use with AI Coding Agents</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Sat, 04 Apr 2026 13:46:13 +0000</pubDate>
      <link>https://dev.to/starmorph/10-cli-tools-every-developer-should-use-with-ai-coding-agents-2p17</link>
      <guid>https://dev.to/starmorph/10-cli-tools-every-developer-should-use-with-ai-coding-agents-2p17</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; The 10 CLI tools covered in this guide are LazyGit, Glow, LLM Fit, Models CLI, Taproom, Ranger, Zoxide, Btop, Chafa, and CSV Lens (plus Eza as a bonus). Install the Homebrew ones with: &lt;code&gt;brew install lazygit glow zoxide ranger btop chafa csvlens eza&lt;/code&gt;. These tools help you review AI-generated diffs, render markdown, manage files, monitor your system, and preview images — all from the terminal alongside AI coding agents like Claude Code.&lt;/p&gt;

&lt;p&gt;If you're spending more time in the terminal using AI coding assistants like Claude Code, your standard terminal environment might need an upgrade. When an AI agent is editing files, writing code, and traversing your directories, having the right CLI tools helps you monitor changes, navigate faster, and read outputs more effectively.&lt;/p&gt;

&lt;p&gt;This is the companion guide to my &lt;a href="https://www.youtube.com/watch?v=3NzCBIcIqD0" rel="noopener noreferrer"&gt;YouTube video on 10 CLI tools I'm using alongside Claude Code&lt;/a&gt;. Every tool below includes installation instructions and the essential commands to get started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://starmorph.com/config/da-bootstrap-mac" rel="noopener noreferrer"&gt;Get the free macOS Bootstrap Script — idempotent setup for Homebrew, Zsh, Node.js, Python, and 30+ dev tools in one command.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. LazyGit
&lt;/h2&gt;

&lt;p&gt;When an AI agent is making autonomous changes to your codebase, you need a fast way to review what it just did. &lt;a href="https://github.com/jesseduffield/lazygit" rel="noopener noreferrer"&gt;LazyGit&lt;/a&gt; is a terminal UI for git that lets you visually review diffs, stage files, and commit — all without memorizing git commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;lazygit

&lt;span class="c"&gt;# Launch&lt;/span&gt;
lazygit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key bindings:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Arrow keys&lt;/td&gt;
&lt;td&gt;Navigate between panels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Space&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stage/unstage file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Commit staged changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;p&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Push to remote&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Enter&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;View file diff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Show all keybindings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LazyGit is especially useful after letting Claude Code run — open it up, scan the diff, and commit with confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Glow
&lt;/h2&gt;

&lt;p&gt;Claude and other LLMs constantly generate Markdown files — plans, READMEs, documentation. Instead of opening a separate editor, &lt;a href="https://github.com/charmbracelet/glow" rel="noopener noreferrer"&gt;Glow&lt;/a&gt; renders Markdown beautifully right in your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;glow

&lt;span class="c"&gt;# Read a file&lt;/span&gt;
glow README.md

&lt;span class="c"&gt;# Paginated view (scrollable)&lt;/span&gt;
glow &lt;span class="nt"&gt;-p&lt;/span&gt; README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Glow is perfect for reading Claude Code's plan files, CLAUDE.md configs, or any Markdown output without leaving the terminal. If you want deeper editing capabilities, pair it with Neovim.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. LLM Fit
&lt;/h2&gt;

&lt;p&gt;If you run local models, it's hard to know which ones your machine can actually handle. &lt;a href="https://github.com/AlexsJones/llmfit" rel="noopener noreferrer"&gt;LLM Fit&lt;/a&gt; analyzes your hardware — memory, CPU, GPU — and prints a ranked table of which local AI models you can run, estimating memory usage and performance scores.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install via Homebrew&lt;/span&gt;
brew tap AlexsJones/llmfit
brew &lt;span class="nb"&gt;install &lt;/span&gt;llmfit

&lt;span class="c"&gt;# Or via Cargo&lt;/span&gt;
cargo &lt;span class="nb"&gt;install &lt;/span&gt;llmfit

&lt;span class="c"&gt;# Launch the interactive TUI&lt;/span&gt;
llmfit

&lt;span class="c"&gt;# CLI mode (table output)&lt;/span&gt;
llmfit &lt;span class="nt"&gt;--cli&lt;/span&gt;

&lt;span class="c"&gt;# Show detected hardware&lt;/span&gt;
llmfit system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This saves you from downloading a 70B parameter model only to discover your machine can't load it. Run &lt;code&gt;llmfit&lt;/code&gt; once, know your limits, and pick models accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Models CLI
&lt;/h2&gt;

&lt;p&gt;A terminal dashboard for comparing AI model providers. &lt;a href="https://github.com/arimxyer/models" rel="noopener noreferrer"&gt;Models CLI&lt;/a&gt; lets you check pricing, context window sizes, and benchmark results for 2000+ models across 85+ providers without opening a browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install via Homebrew&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;arimxyer/tap/models

&lt;span class="c"&gt;# Or via Cargo&lt;/span&gt;
cargo &lt;span class="nb"&gt;install &lt;/span&gt;modelsdev

&lt;span class="c"&gt;# Launch the interactive TUI&lt;/span&gt;
models

&lt;span class="c"&gt;# List all providers&lt;/span&gt;
models list providers

&lt;span class="c"&gt;# Search for a model&lt;/span&gt;
models search &lt;span class="s2"&gt;"claude sonnet"&lt;/span&gt;

&lt;span class="c"&gt;# Show model details&lt;/span&gt;
models show &amp;lt;model-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you're deciding between GPT-4o, Claude Sonnet, or Gemini for a specific task, this gives you a quick side-by-side comparison from your terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Taproom
&lt;/h2&gt;

&lt;p&gt;If you use Homebrew, you know that &lt;code&gt;brew search&lt;/code&gt; can be slow and clunky. &lt;a href="https://github.com/hzqtc/taproom" rel="noopener noreferrer"&gt;Taproom&lt;/a&gt; is an interactive TUI for Homebrew that lets you browse available casks, installed packages, and formula details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install via Homebrew&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;gromgit/brewtils/taproom

&lt;span class="c"&gt;# Launch&lt;/span&gt;
taproom
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use it to filter by installed vs. outdated packages, search for new tools, and manage your Homebrew setup without chaining multiple &lt;code&gt;brew&lt;/code&gt; commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Ranger
&lt;/h2&gt;

&lt;p&gt;When working on remote Linux VMs or navigating deep directory trees, &lt;code&gt;cd&lt;/code&gt; and &lt;code&gt;ls&lt;/code&gt; get tedious. &lt;a href="https://github.com/ranger/ranger" rel="noopener noreferrer"&gt;Ranger&lt;/a&gt; is a Vim-inspired file manager that gives you a multi-pane visual view of your directory tree with file previews.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ranger        &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;ranger    &lt;span class="c"&gt;# Ubuntu/Debian&lt;/span&gt;

&lt;span class="c"&gt;# Launch&lt;/span&gt;
ranger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key bindings:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;h/j/k/l&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Navigate (vim-style)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Enter&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open file/directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;q&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;S&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open shell in current directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;yy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copy file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cut file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Paste file&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ranger is especially useful when you need to visually explore a project structure that an AI agent has been modifying.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Zoxide
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/ajeetdsouza/zoxide" rel="noopener noreferrer"&gt;Zoxide&lt;/a&gt; is a smarter &lt;code&gt;cd&lt;/code&gt; command. It learns which directories you visit most frequently and lets you jump to them with fuzzy matching instead of typing full paths.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;zoxide

&lt;span class="c"&gt;# Add to your shell profile (~/.zshrc)&lt;/span&gt;
&lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;zoxide init zsh&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Restart shell, then jump to directories&lt;/span&gt;
z projects        &lt;span class="c"&gt;# Jumps to most-visited directory matching "projects"&lt;/span&gt;
z star            &lt;span class="c"&gt;# Jumps to ~/Desktop/sm-core/StarBlog (if that's your habit)&lt;/span&gt;
zi                &lt;span class="c"&gt;# Interactive fuzzy finder mode&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few days of normal terminal use, Zoxide learns your patterns. Instead of &lt;code&gt;cd ~/Desktop/sm-core/StarBlog&lt;/code&gt;, you just type &lt;code&gt;z star&lt;/code&gt;. It's one of those tools that feels invisible once you're used to it — until you try to work without it.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Btop
&lt;/h2&gt;

&lt;p&gt;When you're running local AI models or letting Claude Code execute heavy tasks, you need to watch system resources. &lt;a href="https://github.com/aristocratos/btop" rel="noopener noreferrer"&gt;Btop&lt;/a&gt; is a gorgeous, highly customizable system monitor that shows CPU, memory, disk, and network usage in real time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;btop          &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;btop      &lt;span class="c"&gt;# Ubuntu/Debian&lt;/span&gt;

&lt;span class="c"&gt;# Launch&lt;/span&gt;
btop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key bindings:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;m&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cycle through view modes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;f&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter processes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;k&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kill selected process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Esc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Back/close menu&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mac users: check out &lt;a href="https://github.com/metaspartan/mactop" rel="noopener noreferrer"&gt;mactop&lt;/a&gt; for Apple Silicon-specific metrics (CPU efficiency/performance cores, GPU usage, Neural Engine).&lt;/p&gt;
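&lt;p&gt;If you go the mactop route, setup follows the same pattern (the Homebrew formula name is an assumption; check the repo if it's missing, and note it needs elevated privileges to read Apple's power metrics):&lt;/p&gt;

```shell
# Install (formula name assumed; see the mactop repo if Homebrew can't find it)
brew install mactop

# mactop reads powermetrics data, so it needs sudo
sudo mactop
```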

&lt;h2&gt;
  
  
  9. Chafa
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/hpjansson/chafa" rel="noopener noreferrer"&gt;Chafa&lt;/a&gt; renders images directly in your terminal. If an AI agent generates a chart, diagram, or screenshot, you can view it without leaving the command line.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;chafa         &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;chafa     &lt;span class="c"&gt;# Ubuntu/Debian&lt;/span&gt;

&lt;span class="c"&gt;# View an image&lt;/span&gt;
chafa image.png

&lt;span class="c"&gt;# Control output size&lt;/span&gt;
chafa &lt;span class="nt"&gt;--size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80x40 screenshot.png

&lt;span class="c"&gt;# Higher quality with symbols&lt;/span&gt;
chafa &lt;span class="nt"&gt;--symbols&lt;/span&gt; all image.jpg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Chafa works best in terminals with good Unicode and color support (iTerm2, Ghostty, Kitty, WezTerm). It's surprisingly useful when you're working over SSH and need a quick visual check.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. CSV Lens
&lt;/h2&gt;

&lt;p&gt;Data analysis tasks often involve CSV files, which look like a mess in standard terminal editors. &lt;a href="https://github.com/YS-L/csvlens" rel="noopener noreferrer"&gt;csvlens&lt;/a&gt; is a TUI built specifically for inspecting CSVs — think &lt;code&gt;less&lt;/code&gt; but formatted perfectly for tabular data with columns, search, and sorting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;csvlens

&lt;span class="c"&gt;# View a CSV&lt;/span&gt;
csvlens data.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key bindings:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;S&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Toggle line wrapping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Tab&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Switch between columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;q&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;H/L&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scroll left/right&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When Claude Code generates a CSV report or you're debugging data pipelines, csvlens makes the data actually readable.&lt;/p&gt;
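&lt;p&gt;csvlens also works in pipelines, since it can read from stdin, which is handy for filtering before you inspect (stdin support and the delimiter flag are worth verifying against your installed version):&lt;/p&gt;

```shell
# Inspect only the error rows of a generated report
grep "ERROR" results.csv | csvlens

# Tab-separated files: pass an explicit delimiter
csvlens data.tsv -d $'\t'
```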

&lt;h2&gt;
  
  
  Bonus: eza
&lt;/h2&gt;

&lt;p&gt;If you're still using &lt;code&gt;ls&lt;/code&gt;, it's time to upgrade. &lt;a href="https://github.com/eza-community/eza" rel="noopener noreferrer"&gt;eza&lt;/a&gt; is a modern replacement with color coding, file type icons, and git integration built in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;eza

&lt;span class="c"&gt;# Basic usage&lt;/span&gt;
eza &lt;span class="nt"&gt;-la&lt;/span&gt; &lt;span class="nt"&gt;--icons&lt;/span&gt;         &lt;span class="c"&gt;# List all files with icons and details&lt;/span&gt;
eza &lt;span class="nt"&gt;--tree&lt;/span&gt;              &lt;span class="c"&gt;# Tree view&lt;/span&gt;
eza &lt;span class="nt"&gt;--tree&lt;/span&gt; &lt;span class="nt"&gt;--level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2    &lt;span class="c"&gt;# Tree view, 2 levels deep&lt;/span&gt;
eza &lt;span class="nt"&gt;-la&lt;/span&gt; &lt;span class="nt"&gt;--git&lt;/span&gt;           &lt;span class="c"&gt;# Show git status for each file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add these aliases to your &lt;code&gt;~/.zshrc&lt;/code&gt; to make it your default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;alias ls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eza --icons"&lt;/span&gt;
&lt;span class="nb"&gt;alias &lt;/span&gt;&lt;span class="nv"&gt;ll&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eza -la --icons --git"&lt;/span&gt;
&lt;span class="nb"&gt;alias &lt;/span&gt;&lt;span class="nv"&gt;lt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eza --tree --level=2 --icons"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;You don't need to install all 10 at once. Start with the three that have the highest immediate impact:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LazyGit&lt;/strong&gt; — review AI-generated code changes visually before committing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zoxide&lt;/strong&gt; — stop typing long directory paths forever&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;eza&lt;/strong&gt; — make every &lt;code&gt;ls&lt;/code&gt; output actually readable&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once those are in your muscle memory, layer in the rest as you need them.&lt;/p&gt;

&lt;p&gt;If you want a one-command setup for all of these tools (plus 20+ more), check out the &lt;a href="https://starmorph.com/config/da-bootstrap-mac" rel="noopener noreferrer"&gt;free macOS Bootstrap Script&lt;/a&gt; — it installs everything idempotently so you can run it on any new machine.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.starmorph.com/blog/10-cli-tools-for-ai-coding" rel="noopener noreferrer"&gt;StarBlog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cli</category>
      <category>terminal</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Local LLM Inference in 2026: The Complete Guide to Tools, Hardware &amp; Open-Weight Models</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Sun, 29 Mar 2026 13:23:31 +0000</pubDate>
      <link>https://dev.to/starmorph/local-llm-inference-in-2026-the-complete-guide-to-tools-hardware-open-weight-models-2iho</link>
      <guid>https://dev.to/starmorph/local-llm-inference-in-2026-the-complete-guide-to-tools-hardware-open-weight-models-2iho</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Ollama is the fastest path to running local LLMs (one command to install, one to run). The Mac Mini M4 Pro 48GB (~$1,999) is the best-value hardware. Q4_K_M is the sweet spot quantization for most users. Open-weight models like GLM-5, MiniMax M2, and Hermes 4 are impressively capable for a wide range of tasks. This guide covers 10 inference tools, every quantization format, hardware at every budget, and the builders making all of this possible.&lt;/p&gt;

&lt;p&gt;I've been setting up local inference on my own hardware recently — an M4 Pro Mac Mini running Ollama — and I wanted to compile everything I've learned into one place. This guide is as much for my own reference as it is for anyone else exploring this space.&lt;/p&gt;

&lt;p&gt;The tooling in 2026 has matured to the point where a $600 Mac Mini can run 14B parameter models and a $1,600 setup handles 70B. Whether you want to reduce API costs for simple tasks, keep sensitive data private, build offline-capable apps, or just understand how these models actually work, there are real options now.&lt;/p&gt;

&lt;p&gt;I still use Claude Code as my primary coding tool — local models aren't a replacement for frontier cloud inference on complex tasks. But they're genuinely useful for a lot of workflows, and the ecosystem is worth understanding. This guide covers the tools, formats, hardware, and people building the open-source ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://starmorph.com/config/local-llm-inference-report" rel="noopener noreferrer"&gt;Get the full 14-page StarMorph Research PDF — detailed comparison tables, hardware buying guide, and thought leader profiles in a premium dark-mode report.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Comparison Matrix
&lt;/h2&gt;

&lt;p&gt;Ten tools, compared across what matters. Stars reflect community adoption as of March 2026.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Stars&lt;/th&gt;
&lt;th&gt;Platforms&lt;/th&gt;
&lt;th&gt;Model Formats&lt;/th&gt;
&lt;th&gt;GPU Required?&lt;/th&gt;
&lt;th&gt;API Compatibility&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;166k&lt;/td&gt;
&lt;td&gt;Mac/Win/Linux&lt;/td&gt;
&lt;td&gt;GGUF&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;OpenAI + Anthropic&lt;/td&gt;
&lt;td&gt;Developer workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;llama.cpp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;98.6k&lt;/td&gt;
&lt;td&gt;All + mobile&lt;/td&gt;
&lt;td&gt;GGUF&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Foundation / power users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Exo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;42.7k&lt;/td&gt;
&lt;td&gt;Mac/Linux/mobile&lt;/td&gt;
&lt;td&gt;MLX / tinygrad&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Distributed inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Jan.ai&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;41.1k&lt;/td&gt;
&lt;td&gt;Mac/Win/Linux&lt;/td&gt;
&lt;td&gt;GGUF, MLX&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Privacy-first desktop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LocalAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;35-42k&lt;/td&gt;
&lt;td&gt;Linux/Mac/Win&lt;/td&gt;
&lt;td&gt;Multi-format&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;OpenAI + Anthropic&lt;/td&gt;
&lt;td&gt;Drop-in API replacement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;vLLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;31k+&lt;/td&gt;
&lt;td&gt;Linux&lt;/td&gt;
&lt;td&gt;safetensors, AWQ, GPTQ&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Production GPU serving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MLX&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24.6k&lt;/td&gt;
&lt;td&gt;macOS only&lt;/td&gt;
&lt;td&gt;safetensors&lt;/td&gt;
&lt;td&gt;No (Apple Silicon)&lt;/td&gt;
&lt;td&gt;Third-party&lt;/td&gt;
&lt;td&gt;Mac-native development&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LM Studio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A (closed)&lt;/td&gt;
&lt;td&gt;Mac/Win/Linux&lt;/td&gt;
&lt;td&gt;GGUF / MLX&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Visual model exploration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KoboldCpp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9.5k&lt;/td&gt;
&lt;td&gt;All + Android&lt;/td&gt;
&lt;td&gt;GGUF&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Triple (OAI + Ollama + Kobold)&lt;/td&gt;
&lt;td&gt;Creative writing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPT4All&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Mac/Win/Linux&lt;/td&gt;
&lt;td&gt;GGUF&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Private document chat&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every tool above except LM Studio is open-source. Most build on top of llama.cpp — the foundational C/C++ inference engine that pioneered running LLMs on consumer hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ollama — The Developer Default
&lt;/h2&gt;

&lt;p&gt;Ollama is the fastest path from zero to running local models. One command to install, one to run, and you get an OpenAI-compatible API on &lt;code&gt;localhost:11434&lt;/code&gt;. It's open-source (MIT), written in Go, and has 166k GitHub stars — the largest open-source AI project on GitHub by a wide margin.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Run a model&lt;/span&gt;
ollama run llama3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No Python environments, no CUDA toolkit, no configuration files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why developers default to Ollama
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI + Anthropic API compatibility&lt;/strong&gt; — Claude Code and OpenAI Codex CLI can use Ollama as a local backend. Your existing API client code works with minimal changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Largest model registry&lt;/strong&gt; — 100+ models available with &lt;code&gt;ollama pull&lt;/code&gt;. One-command downloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt; — M3 Pro generates 40-60 tok/s on 7B models. Benefits from all llama.cpp optimizations (up to 35% faster from CES 2026 NVIDIA improvements).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image generation&lt;/strong&gt; — Added to macOS in January 2026.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web search + structured outputs&lt;/strong&gt; — Both added in 2026.&lt;/li&gt;
&lt;/ul&gt;
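&lt;p&gt;Because the endpoint is OpenAI-compatible, you can smoke-test it with nothing but curl. A minimal sketch (the model name assumes you've already pulled &lt;code&gt;llama3&lt;/code&gt;):&lt;/p&gt;

```shell
# Chat completion against the local Ollama server
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```

&lt;p&gt;Existing OpenAI client code generally only needs its base URL pointed at &lt;code&gt;http://localhost:11434/v1&lt;/code&gt;; the API key can be any non-empty string.&lt;/p&gt;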

&lt;h3&gt;
  
  
  Where Ollama falls short
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GGUF-only for native format — safetensors/PyTorch models require a conversion step via Modelfile&lt;/li&gt;
&lt;li&gt;No GUI — third-party frontends like &lt;a href="https://github.com/open-webui/open-webui" rel="noopener noreferrer"&gt;Open WebUI&lt;/a&gt; fill this gap&lt;/li&gt;
&lt;li&gt;Slightly higher overhead than raw llama.cpp (the abstraction layer costs a few percent)&lt;/li&gt;
&lt;li&gt;Custom model importing requires creating a Modelfile rather than just pointing at a file&lt;/li&gt;
&lt;/ul&gt;
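&lt;p&gt;The Modelfile step is less work than it sounds. A minimal sketch for importing a local GGUF file (the path and model name are placeholders):&lt;/p&gt;

```shell
# Point a Modelfile at the GGUF you downloaded
printf 'FROM ./my-model.gguf\n' > Modelfile

# Register it with Ollama, then run it like any library model
ollama create my-model -f Modelfile
ollama run my-model
```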

&lt;p&gt;For most developers, Ollama is the right first tool. Start here, then graduate to other tools as your needs become more specific.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://starmorph.com/config/da-bootstrap-mac" rel="noopener noreferrer"&gt;Get the free macOS Bootstrap Script — idempotent setup for Homebrew, Zsh, Node.js, Ollama, and 30+ dev tools in one command.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  LM Studio — The Visual Explorer
&lt;/h2&gt;

&lt;p&gt;LM Studio is the most beginner-friendly option — a desktop application where you browse models, click to download, and start chatting. Zero terminal knowledge required. Closed-source but free for personal use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it stand out:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in model browser with one-click downloads from Hugging Face&lt;/li&gt;
&lt;li&gt;MLX backend on Apple Silicon for optimized Mac inference&lt;/li&gt;
&lt;li&gt;Split-view chat for side-by-side model comparison&lt;/li&gt;
&lt;li&gt;v0.4.0 (January 2026) added parallel inference with continuous batching&lt;/li&gt;
&lt;li&gt;New headless &lt;strong&gt;"llmster" daemon&lt;/strong&gt; enables server-only deployment on Linux boxes without the GUI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Formats:&lt;/strong&gt; GGUF (llama.cpp backend), MLX (Apple Silicon only), safetensors. No EXL2 or GPTQ support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API:&lt;/strong&gt; OpenAI-compatible on &lt;code&gt;localhost:1234&lt;/code&gt;. The official Python and TypeScript SDKs have reached v1.0.0.&lt;/p&gt;
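&lt;p&gt;The server speaks the same dialect as Ollama's, just on a different port. A quick sketch, assuming the local server is running with a model loaded (the model id is a placeholder):&lt;/p&gt;

```shell
# List whatever models the LM Studio server exposes
curl http://localhost:1234/v1/models

# Chat completion against the loaded model
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "ping"}]}'
```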

&lt;p&gt;LM Studio is ideal for model evaluation — browse, download, compare side-by-side — before deploying with Ollama or vLLM in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  vLLM — Production GPU Serving
&lt;/h2&gt;

&lt;p&gt;If you're deploying models on GPU infrastructure at scale, vLLM is the industry standard. It's the performance leader with PagedAttention for memory-efficient KV cache management, continuous batching, and speculative decoding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmarks with Marlin kernels:&lt;/strong&gt; AWQ achieves 741 tok/s, GPTQ achieves 712 tok/s. vLLM v0.16.0 (February 2026) expanded multi-GPU and multi-platform support to NVIDIA, AMD ROCm, Intel XPU, and TPU.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formats:&lt;/strong&gt; The widest range — safetensors, GPTQ, AWQ, FP8, NVFP4, bitsandbytes. This matters because GPU-optimized quantization formats like AWQ achieve better throughput than GGUF on NVIDIA hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The catch:&lt;/strong&gt; Linux-only for production, requires a dedicated NVIDIA/AMD GPU, complex setup compared to Ollama. Overkill for single-user local inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use vLLM when:&lt;/strong&gt; You're serving multiple users, need maximum throughput on GPU hardware, or are deploying in production. The common developer workflow is: evaluate models with LM Studio, develop with Ollama, deploy with vLLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  llama.cpp — The Foundation
&lt;/h2&gt;

&lt;p&gt;llama.cpp is the C/C++ inference engine that everything else builds on. Created by Georgi Gerganov, it pioneered running LLMs on consumer hardware via quantization. In February 2026, the ggml/llama.cpp team joined Hugging Face.&lt;/p&gt;

&lt;p&gt;Ollama, LM Studio, GPT4All, and KoboldCpp all use llama.cpp under the hood. It's the engine — they're the interfaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why use it directly?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximum control over inference parameters and model loading&lt;/li&gt;
&lt;li&gt;Widest platform support: macOS, Windows, Linux, Android, iOS, WebAssembly&lt;/li&gt;
&lt;li&gt;Best CPU inference performance — designed from the ground up for consumer hardware&lt;/li&gt;
&lt;li&gt;Defines and maintains the GGUF format standard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stats:&lt;/strong&gt; 98.6k GitHub stars, 1,038 contributors, 28 upstream commits per week. NVIDIA optimizations announced at CES 2026 delivered up to 35% faster token generation.&lt;/p&gt;

&lt;p&gt;Use llama.cpp directly when you need fine-grained control that Ollama or LM Studio don't expose. Otherwise, use the higher-level tools — they give you 95% of the performance with much less configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  ExoLabs — Distributed Inference
&lt;/h2&gt;

&lt;p&gt;Exo takes a fundamentally different approach: instead of running a model on one device, it splits the model across multiple devices connected peer-to-peer. No master-worker architecture — any device can contribute compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's been demonstrated:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V3 (671B parameters)&lt;/strong&gt; across 8 M4 Pro 64GB Mac Minis (512GB total memory) at ~5 tok/s&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek R1 (671B)&lt;/strong&gt; across 7 Mac Minis + 1 M4 Max MacBook Pro (496GB total)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2 NVIDIA DGX Spark + M3 Ultra Mac Studio&lt;/strong&gt; = 2.8x benchmark improvement through disaggregated inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this works with Apple Silicon:&lt;/strong&gt; Unified memory is ideal for Mixture-of-Experts (MoE) models. All 671B parameters load across the cluster, but only 37B are computed per inference step. Apple devices become surprisingly cost-effective for MoE architectures.&lt;/p&gt;
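&lt;p&gt;The memory-versus-compute split is easy to quantify. A rough sketch (the ~4 bits/weight figure is an assumption, and KV-cache memory is ignored):&lt;/p&gt;

```python
def moe_memory_profile(total_params_b: float, active_params_b: float,
                       bits_per_weight: float = 4.0):
    """Return (resident GB, GB read per token) for a quantized MoE model.

    All parameters must stay in memory, but each generated token only
    streams the experts routed for that token.
    """
    def gb(params_b: float) -> float:
        return params_b * 1e9 * bits_per_weight / 8 / 1e9
    return gb(total_params_b), gb(active_params_b)

# DeepSeek V3: 671B total parameters, 37B active per token
resident, per_token = moe_memory_profile(671, 37)
print(f"resident: {resident:.1f} GB, per token: {per_token:.1f} GB")
```

&lt;p&gt;At roughly 4 bits/weight the full model needs ~335 GB resident (hence the 512 GB cluster), yet each token streams only ~18.5 GB. That gap is why high-capacity unified memory beats raw GPU compute for MoE inference.&lt;/p&gt;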

&lt;p&gt;&lt;strong&gt;Current status:&lt;/strong&gt; Alpha (v0.0.15-alpha public, 1.0 not yet released). macOS native app requires Tahoe 26.2+.&lt;/p&gt;

&lt;p&gt;If you have multiple Macs, Exo lets you pool them into a single inference cluster. The constraint is total unified memory across devices — and the network connecting them.&lt;/p&gt;

&lt;p&gt;For a deep dive on which Mac Mini to buy for local inference (with current Amazon pricing and used market analysis), see my complete &lt;a href="https://dev.to/blog/best-mac-mini-for-local-llms"&gt;Mac Mini buying guide for local LLMs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Notable Tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Jan.ai
&lt;/h3&gt;

&lt;p&gt;Open-source (AGPLv3) privacy-first desktop app. 41.1k stars, 5.3M+ downloads. Runs 100% offline via the Cortex engine (wraps llama.cpp). The standout feature is &lt;strong&gt;hybrid local + cloud switching&lt;/strong&gt; — you can connect OpenAI, Anthropic, and local models in one interface, switching between them as needed. MCP integration for agentic workflows. Supports Windows ARM (Snapdragon).&lt;/p&gt;

&lt;h3&gt;
  
  
  LocalAI
&lt;/h3&gt;

&lt;p&gt;The most comprehensive API-compatible local server. Drop-in replacement for OpenAI's API that supports text, images, audio, video, embeddings, and voice cloning — all locally. Multi-backend support (llama.cpp, vLLM, transformers, diffusers, MLX). Anthropic API support added January 2026. Best for: developers with existing OpenAI API code who want to run locally with minimal changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  KoboldCpp
&lt;/h3&gt;

&lt;p&gt;Single-executable fork of llama.cpp with an integrated web UI. "One file, zero install" — download, double-click, select a model. Triple API compatibility (KoboldAI + OpenAI + Ollama endpoints). The best tool for &lt;strong&gt;creative writing and roleplay&lt;/strong&gt; with built-in memory, world info, author's notes, and SillyTavern integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPT4All
&lt;/h3&gt;

&lt;p&gt;Desktop app by Nomic AI with built-in &lt;strong&gt;LocalDocs&lt;/strong&gt; for private document chat (RAG). The 2026 GPT4All Reasoner adds on-device reasoning with tool calling and code sandboxing. Backed by a funded company (Nomic AI). Best for non-technical users who want to chat with their documents privately.&lt;/p&gt;

&lt;h3&gt;
  
  
  MLX
&lt;/h3&gt;

&lt;p&gt;Apple's open-source ML framework purpose-built for Apple Silicon. Not a user-facing app — a framework that other tools use as a backend. Leverages unified memory with zero CPU-GPU data copying. Built-in mixed-precision quantization (4/6/8-bit per layer). M5 Neural Accelerators provide up to 4x speedup for time-to-first-token. Swift API for native macOS/iOS apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantization Formats and Tradeoffs
&lt;/h2&gt;

&lt;p&gt;Quantization compresses model weights from 16 bits per weight (FP16/BF16) down to fewer bits. This is what makes it possible to run a 70B parameter model on consumer hardware.&lt;/p&gt;
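&lt;p&gt;The back-of-envelope math: file size is parameters times bits-per-weight divided by 8, plus a little extra for embeddings and quantization metadata. A rough sketch (the 10% overhead factor is my assumption; real GGUF files vary by a few hundred MB):&lt;/p&gt;

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.1) -> float:
    """Estimate the on-disk size of a quantized model in GB.

    The overhead factor (assumed ~10%) covers embeddings, per-block
    scale factors, and file metadata.
    """
    raw_bytes = params_billion * 1e9 * bits_per_weight / 8
    return raw_bytes * overhead / 1e9

# A 7B model at 4 bits/weight lands near the Q4 figures in the table below
print(f"{quantized_size_gb(7, 4):.1f} GB")
```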

&lt;h3&gt;
  
  
  GGUF: The Universal Format
&lt;/h3&gt;

&lt;p&gt;GGUF was created by llama.cpp and is used by Ollama, LM Studio, KoboldCpp, GPT4All, and Jan.ai. The "K-quant" variants use mixed precision per layer, allocating more bits to important layers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quant&lt;/th&gt;
&lt;th&gt;Bits/Weight&lt;/th&gt;
&lt;th&gt;Size (7B model)&lt;/th&gt;
&lt;th&gt;Quality Retention&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q8_0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;~7.5 GB&lt;/td&gt;
&lt;td&gt;~99% (near-lossless)&lt;/td&gt;
&lt;td&gt;Maximum quality, enough RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q6_K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6-bit&lt;/td&gt;
&lt;td&gt;~5.5 GB&lt;/td&gt;
&lt;td&gt;~97%&lt;/td&gt;
&lt;td&gt;Quality-focused with moderate RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q5_K_M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-bit&lt;/td&gt;
&lt;td&gt;~4.8 GB&lt;/td&gt;
&lt;td&gt;~95%&lt;/td&gt;
&lt;td&gt;Good balance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q4_K_M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-bit&lt;/td&gt;
&lt;td&gt;~4.0 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~92% (sweet spot)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Most users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q3_K_M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3-bit&lt;/td&gt;
&lt;td&gt;~3.2 GB&lt;/td&gt;
&lt;td&gt;~85%&lt;/td&gt;
&lt;td&gt;Tight memory constraints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q2_K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2-bit&lt;/td&gt;
&lt;td&gt;~2.5 GB&lt;/td&gt;
&lt;td&gt;~75%&lt;/td&gt;
&lt;td&gt;Extreme compression&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The practical ladder:&lt;/strong&gt; Q4_K_M → Q5_K_M → Q6_K → Q8_0 as you get more memory. For most users, &lt;strong&gt;Q4_K_M is the sweet spot&lt;/strong&gt; — 92% quality retention with 75% size reduction from FP16.&lt;/p&gt;
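&lt;p&gt;The ladder can be codified. A toy helper using the 7B file sizes from the table above (the 2 GB of headroom for the OS and context window is an assumption, not a rule):&lt;/p&gt;

```python
# 7B GGUF file sizes in GB, ordered from lowest to highest quality
QUANT_LADDER = [("Q4_K_M", 4.0), ("Q5_K_M", 4.8), ("Q6_K", 5.5), ("Q8_0", 7.5)]

def best_quant(available_ram_gb: float, headroom_gb: float = 2.0):
    """Pick the highest-quality quant that fits with some headroom."""
    choice = None
    for name, size_gb in QUANT_LADDER:
        if size_gb + headroom_gb <= available_ram_gb:
            choice = name  # keep climbing the ladder while it still fits
    return choice

print(best_quant(8))   # an 8 GB machine climbs to Q6_K
print(best_quant(16))  # a 16 GB machine can take Q8_0
```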

&lt;h3&gt;
  
  
  GPU-Optimized Formats
&lt;/h3&gt;

&lt;p&gt;These formats are designed for NVIDIA GPUs and used by vLLM, ExLlamaV2, and transformers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Bits&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Speed (Marlin)&lt;/th&gt;
&lt;th&gt;Used By&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-bit&lt;/td&gt;
&lt;td&gt;~95%&lt;/td&gt;
&lt;td&gt;741 tok/s&lt;/td&gt;
&lt;td&gt;vLLM, transformers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPTQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-bit&lt;/td&gt;
&lt;td&gt;~90%&lt;/td&gt;
&lt;td&gt;712 tok/s&lt;/td&gt;
&lt;td&gt;vLLM, ExLlamaV2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EXL2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2-8 mixed&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Fastest (single-user)&lt;/td&gt;
&lt;td&gt;ExLlamaV2 / TabbyAPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FP8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;~99%&lt;/td&gt;
&lt;td&gt;Very fast&lt;/td&gt;
&lt;td&gt;vLLM, llama.cpp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NVFP4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-bit&lt;/td&gt;
&lt;td&gt;~92%&lt;/td&gt;
&lt;td&gt;Fastest (Blackwell)&lt;/td&gt;
&lt;td&gt;llama.cpp, vLLM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;AWQ vs GPTQ:&lt;/strong&gt; AWQ consistently outperforms GPTQ in both quality (~95% vs ~90%) and speed because it uses activation statistics to identify and protect the weights that matter most. For most GPU users, AWQ is the better choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GGUF vs AWQ/GPTQ:&lt;/strong&gt; GGUF is universal — runs on CPU, GPU, and Apple Silicon. AWQ/GPTQ are GPU-only but provide better throughput on NVIDIA hardware. Use GGUF for flexibility, AWQ for maximum GPU throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Tool
&lt;/h2&gt;

&lt;h3&gt;
  
  
  By Use Case
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;First time, just want to try&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;LM Studio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Visual GUI, one-click downloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer, quick local testing&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One command, OpenAI-compatible API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creative writing / roleplay&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;KoboldCpp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built-in storytelling features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private document chat&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GPT4All&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LocalDocs RAG built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy-first desktop app&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Jan.ai&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full offline, hybrid local/cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production GPU serving&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;vLLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highest throughput, multi-GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drop-in OpenAI replacement&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;LocalAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Most complete API compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac-native app development&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MLX&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Swift API, best Apple Silicon perf&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Models too large for one device&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Exo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum control&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;llama.cpp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The foundation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  By Skill Level
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Recommended Tools&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Beginner (no terminal)&lt;/td&gt;
&lt;td&gt;LM Studio, GPT4All, Jan.ai&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intermediate (CLI)&lt;/td&gt;
&lt;td&gt;Ollama, KoboldCpp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advanced (Python/systems)&lt;/td&gt;
&lt;td&gt;llama.cpp, MLX, LocalAI, vLLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expert (distributed)&lt;/td&gt;
&lt;td&gt;Exo, vLLM multi-GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Common Multi-Tool Workflow
&lt;/h3&gt;

&lt;p&gt;Many developers in 2026 use a three-tool pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LM Studio&lt;/strong&gt; for model discovery and evaluation (browse, download, compare side-by-side)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; for development and integration (OpenAI-compatible API for app development)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vLLM&lt;/strong&gt; for production deployment (maximum throughput on GPU infrastructure)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Hardware Buying Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Fundamental Rule
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;For LLM inference, memory bandwidth is the bottleneck, not compute.&lt;/strong&gt; A chip with higher GB/s generates tokens faster, even if it has fewer FLOPS. This is why an M3 Max (400 GB/s) generates tokens faster than an M4 Pro (273 GB/s) despite the M4 Pro being newer.&lt;/p&gt;
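&lt;p&gt;The intuition: every generated token streams the full set of active weights from memory once, so decode speed is capped at bandwidth divided by model size. A rough upper bound (real throughput is lower once KV-cache reads and compute overhead are counted):&lt;/p&gt;

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode tokens/sec when memory-bandwidth-bound:
    each token requires one full pass over the weights."""
    return bandwidth_gb_s / model_size_gb

# A 70B model at Q4 is roughly 40 GB; compare the two chips above
for chip, bw in [("M3 Max", 400), ("M4 Pro", 273)]:
    print(f"{chip}: ceiling {decode_ceiling_tok_s(bw, 40):.1f} tok/s")
```

&lt;p&gt;The older M3 Max tops out near 10 tok/s on a 40 GB model while the newer M4 Pro tops out near 6.8 tok/s, which is the whole point: buy bandwidth, not FLOPS.&lt;/p&gt;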

&lt;h3&gt;
  
  
  Memory Requirements by Model Size
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Size&lt;/th&gt;
&lt;th&gt;Min RAM (Q4)&lt;/th&gt;
&lt;th&gt;Comfortable (Q6-Q8)&lt;/th&gt;
&lt;th&gt;Example Models&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3B&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;6 GB&lt;/td&gt;
&lt;td&gt;Phi-4-mini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7-8B&lt;/td&gt;
&lt;td&gt;6 GB&lt;/td&gt;
&lt;td&gt;10 GB&lt;/td&gt;
&lt;td&gt;Llama 3.1 8B, Mistral 7B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13-14B&lt;/td&gt;
&lt;td&gt;10 GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5 14B, Phi-4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30-34B&lt;/td&gt;
&lt;td&gt;20 GB&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5 32B, Yi 34B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;70B&lt;/td&gt;
&lt;td&gt;40 GB&lt;/td&gt;
&lt;td&gt;64 GB&lt;/td&gt;
&lt;td&gt;Llama 3.1 70B, Qwen 72B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100B+&lt;/td&gt;
&lt;td&gt;64 GB&lt;/td&gt;
&lt;td&gt;128 GB+&lt;/td&gt;
&lt;td&gt;Llama 3.1 405B (quantized)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Apple Silicon
&lt;/h3&gt;

&lt;p&gt;Macs are uniquely suited for local LLMs because of unified memory — the GPU can access all system RAM, unlike discrete GPUs with fixed VRAM. &lt;strong&gt;RAM is not upgradeable on Apple Silicon. Buy the most you can afford.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini M4&lt;/td&gt;
&lt;td&gt;16-24 GB&lt;/td&gt;
&lt;td&gt;120 GB/s&lt;/td&gt;
&lt;td&gt;$599-799&lt;/td&gt;
&lt;td&gt;7-14B, experimentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mac Mini M4 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24-48 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;273 GB/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,399-1,999&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Sweet spot. 70B at Q4 with 48GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro M4 Pro&lt;/td&gt;
&lt;td&gt;24-48 GB&lt;/td&gt;
&lt;td&gt;273 GB/s&lt;/td&gt;
&lt;td&gt;$1,999-2,499&lt;/td&gt;
&lt;td&gt;Portable 70B inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro M4 Max&lt;/td&gt;
&lt;td&gt;48-128 GB&lt;/td&gt;
&lt;td&gt;546 GB/s&lt;/td&gt;
&lt;td&gt;$3,499-4,999&lt;/td&gt;
&lt;td&gt;Fast 70B, moderate 100B+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Studio M3 Ultra&lt;/td&gt;
&lt;td&gt;128-512 GB&lt;/td&gt;
&lt;td&gt;819 GB/s&lt;/td&gt;
&lt;td&gt;$3,999-11,999&lt;/td&gt;
&lt;td&gt;Run anything locally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro M5 Max&lt;/td&gt;
&lt;td&gt;48-128 GB&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;$3,499+&lt;/td&gt;
&lt;td&gt;Neural Accelerators, 4x TFT&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Best value:&lt;/strong&gt; Mac Mini M4 Pro 48GB (~$1,999) — runs 70B parameter models and costs less than a good GPU.&lt;/p&gt;

&lt;p&gt;For a complete pricing breakdown of every Mac Mini configuration (new and used), with model compatibility tables and OpenClaw setup instructions, see my &lt;a href="https://dev.to/blog/best-mac-mini-for-local-llms"&gt;Mac Mini buying guide for local LLMs&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  NVIDIA GPUs
&lt;/h3&gt;

&lt;p&gt;VRAM is the limiting factor — models must fit in GPU VRAM or spill to CPU RAM at a significant speed penalty.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;th&gt;Price (2026)&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3060 12GB&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;td&gt;360 GB/s&lt;/td&gt;
&lt;td&gt;$250-300 (used)&lt;/td&gt;
&lt;td&gt;Budget entry, 7B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RTX 3090 24GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;936 GB/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$800-1,000 (used)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Best budget for 13B&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090 24GB&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;1,008 GB/s&lt;/td&gt;
&lt;td&gt;$1,600-2,200&lt;/td&gt;
&lt;td&gt;Balance. 13B full, 70B quantized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 5090 32GB&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;1,792 GB/s&lt;/td&gt;
&lt;td&gt;$2,500-3,600+&lt;/td&gt;
&lt;td&gt;Flagship. 2.6x faster than A100 on 7B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3090 x2&lt;/td&gt;
&lt;td&gt;48 GB&lt;/td&gt;
&lt;td&gt;2 × 936 GB/s&lt;/td&gt;
&lt;td&gt;$1,600-2,000&lt;/td&gt;
&lt;td&gt;Budget 70B on Linux with vLLM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Budget Tiers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Budget&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;What You Can Run&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Your existing machine + Ollama&lt;/td&gt;
&lt;td&gt;3-7B on most modern hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$375&lt;/td&gt;
&lt;td&gt;Used M1 Mac 16GB&lt;/td&gt;
&lt;td&gt;7B models at decent speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$599&lt;/td&gt;
&lt;td&gt;Mac Mini M4 24GB&lt;/td&gt;
&lt;td&gt;7-14B comfortably&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$900&lt;/td&gt;
&lt;td&gt;Used RTX 3090 (add to PC)&lt;/td&gt;
&lt;td&gt;7-13B at GPU speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;$1,999&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mac Mini M4 Pro 48GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70B models — best value in the market&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$2,000&lt;/td&gt;
&lt;td&gt;Used RTX 4090 (add to PC)&lt;/td&gt;
&lt;td&gt;13B fast, 70B quantized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$3,500+&lt;/td&gt;
&lt;td&gt;RTX 5090 or MBP M4/M5 Max&lt;/td&gt;
&lt;td&gt;70B fast, frontier performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$8,000+&lt;/td&gt;
&lt;td&gt;Mac Studio M3 Ultra 192GB&lt;/td&gt;
&lt;td&gt;Run anything&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For building dedicated GPU inference servers at any budget ($150 to $5,000+), &lt;a href="https://digitalspaceport.com/ai/local-ai-server-builds/" rel="noopener noreferrer"&gt;Digital Spaceport&lt;/a&gt; has the most comprehensive build guides I've found.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://starmorph.com/config/da-bootstrap-linux" rel="noopener noreferrer"&gt;Get the free Ubuntu Bootstrap Script — 340-line idempotent VM setup for GPU inference servers with Zsh, Node.js, Docker, and 35+ tools.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Thought Leaders and Builder Strategies
&lt;/h2&gt;

&lt;p&gt;These are the builders, researchers, and educators I've been learning from as I explore local inference. Whether they're building tools, training models, or documenting hardware builds, they're all making this ecosystem more accessible.&lt;/p&gt;

&lt;p&gt;This list was inspired by &lt;a href="https://x.com/0xSero/status/2035064089345478658" rel="noopener noreferrer"&gt;0xSero's thread on people to follow in the local inference space&lt;/a&gt;. 0xSero is one of the most active voices in the open-source AI community, and his recommendations pointed me to many of the builders profiled below.&lt;/p&gt;

&lt;h3&gt;
  
  
  0xSero (@0xSero)
&lt;/h3&gt;

&lt;p&gt;One of the most active builders in the local inference community. Publishes &lt;a href="https://huggingface.co/0xSero" rel="noopener noreferrer"&gt;quantized models on Hugging Face&lt;/a&gt; using Intel AutoRound, making large models runnable on consumer hardware. Built &lt;a href="https://github.com/0xSero" rel="noopener noreferrer"&gt;vllm Studio&lt;/a&gt; for managing local models with chat template proxies that make Hermes, MiniMax, and GLM models compatible with OpenAI and Anthropic API formats. Also created &lt;a href="https://github.com/0xSero/ai-data-extraction" rel="noopener noreferrer"&gt;ai-data-extraction&lt;/a&gt; for extracting chat and code context data from AI coding assistants for ML training, and fine-tuned models like sero-nouscoder-14b-sft trained on real coding conversations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Andrej Karpathy (@karpathy)
&lt;/h3&gt;

&lt;p&gt;The best teacher in AI. &lt;a href="https://github.com/karpathy/nanochat" rel="noopener noreferrer"&gt;nanochat&lt;/a&gt; is the definitive entry point for understanding LLM training — a full-stack pipeline in ~8,300 lines of clean PyTorch covering tokenization, pretraining, SFT, and reinforcement learning. Trains a 561M ChatGPT clone in ~4 hours for ~$100 (or ~$15 on spot instances).&lt;/p&gt;

&lt;p&gt;What makes nanochat uniquely effective for learning: one dial — transformer depth. This single integer auto-determines all other hyperparameters, so you can understand the full pipeline without needing hyperparameter tuning expertise.&lt;/p&gt;

&lt;p&gt;His latest project, &lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;autoresearch&lt;/a&gt;, uses AI agents to autonomously optimize nanochat training configurations — AI improving AI training.&lt;/p&gt;

&lt;h3&gt;
  
  
  Peter Steinberger (@steipete)
&lt;/h3&gt;

&lt;p&gt;His GitHub is a treasure trove. &lt;a href="https://github.com/steipete/Peekaboo" rel="noopener noreferrer"&gt;Peekaboo&lt;/a&gt; (macOS screenshot automation for AI agents), &lt;a href="https://github.com/steipete/summarize" rel="noopener noreferrer"&gt;Summarize&lt;/a&gt; (CLI that extracts/summarizes any URL, YouTube, PDF, or audio), and &lt;a href="https://steipete.me/posts/2026/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; (the fastest-growing GitHub project at 180k+ stars — an autonomous AI assistant that lives on your computer and self-modifies its own code).&lt;/p&gt;

&lt;p&gt;His design principle: "CLIs are the universal interface that both humans and AI agents can actually use effectively." Build CLI-first — it becomes the universal adapter between human workflows and agent automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mario Zechner (@badlogicgames)
&lt;/h3&gt;

&lt;p&gt;Pi is possibly the best, simplest open-source agentic loop to learn from. The &lt;a href="https://github.com/badlogic/pi-mono" rel="noopener noreferrer"&gt;pi-mono&lt;/a&gt; agent toolkit achieves power through radical minimalism: exactly 4 tools, a system prompt under 1,000 tokens, and a philosophy that "what you leave out matters more than what you put in." Pi became the engine behind OpenClaw.&lt;/p&gt;

&lt;p&gt;His anti-MCP argument is worth considering: popular MCP servers like Playwright MCP (21 tools, 13.7k tokens) consume 7-9% of context window before work begins. Pi's alternative: CLI tools with README files — agents read the README only when needed, paying token cost only when necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Start with 4 tools, not 40. Context engineering matters more than tool count.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ahmad Osman (&lt;a class="mentioned-user" href="https://dev.to/theahmadosman"&gt;@theahmadosman&lt;/a&gt;)
&lt;/h3&gt;

&lt;p&gt;The GPU king. Moderator of r/LocalLLaMA, deep practical knowledge across NVIDIA, Mac, and Tenstorrent hardware. Hosts GPU giveaways with NVIDIA (RTX PRO 6000 Blackwell for GTC 2026) and regularly interviews open-weight labs. His key blog post — &lt;a href="https://www.ahmadosman.com/blog/do-not-use-llama-cpp-or-ollama-on-multi-gpus-setups-use-vllm-or-exllamav2/" rel="noopener noreferrer"&gt;Stop Wasting Your Multi-GPU Setup With llama.cpp: Use vLLM or ExLlamaV2 for Tensor Parallelism&lt;/a&gt; — is essential reading for anyone with multiple GPUs.&lt;/p&gt;

&lt;h3&gt;
  
  
  @sudoingX
&lt;/h3&gt;

&lt;p&gt;Pushing the limits of single-GPU inference. Ran &lt;a href="https://x.com/sudoingX/status/2030237974286192815" rel="noopener noreferrer"&gt;Qwopus&lt;/a&gt; (Claude Opus 4.6 reasoning distilled into Qwen 3.5 27B) on a single RTX 3090 at 29-35 tok/s with thinking mode. Ran &lt;a href="https://x.com/sudoingX/status/2031654719454589431" rel="noopener noreferrer"&gt;Qwen 3.5 9B on a single RTX 3060&lt;/a&gt; — "5.3 GB of model on a card most people bought to play Warzone." Also discovered and published the fix for the Qwen 3.5 jinja template crash that broke OpenCode and Claude Code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; A single RTX 3090 can run 27B coding models at usable speeds — impressive for tasks like code completion and simpler agentic workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alex Cheema (@alexocheema)
&lt;/h3&gt;

&lt;p&gt;Founder of ExoLabs. Oxford physics graduate. Pioneering distributed inference across Apple hardware — demonstrated 671B parameter models running across Mac Mini clusters. The Exo framework (42.7k stars) uses peer-to-peer topology with automatic device discovery and dynamic model partitioning. If you're interested in Mac Mini and Mac Studio clustering, this is the person to follow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Digital Spaceport (@gospaceport)
&lt;/h3&gt;

&lt;p&gt;The homelab hardware teacher. End-to-end AI server builds at &lt;a href="https://digitalspaceport.com/ai/local-ai-server-builds/" rel="noopener noreferrer"&gt;every budget&lt;/a&gt; — from &lt;a href="https://digitalspaceport.com/local-ai-home-server-at-super-low-150-budget-price/" rel="noopener noreferrer"&gt;$150 entry-level&lt;/a&gt; to &lt;a href="https://digitalspaceport.com/local-ai-home-server-build-at-high-end-3500-5000/" rel="noopener noreferrer"&gt;$5,000 quad-3090&lt;/a&gt; builds. His Proxmox guides for &lt;a href="https://digitalspaceport.com/how-to-setup-an-ai-server-homelab-beginners-guides-ollama-and-openwebui-on-proxmox-lxc/" rel="noopener noreferrer"&gt;Ollama + Open WebUI&lt;/a&gt; and &lt;a href="https://digitalspaceport.com/how-to-setup-vllm-local-ai-homelab-ai-server-beginners-guides/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; are the best I've found.&lt;/p&gt;

&lt;h3&gt;
  
  
  Numman Ali (@nummanali)
&lt;/h3&gt;

&lt;p&gt;Prolific CLI tool builder. &lt;a href="https://github.com/numman-ali/cc-mirror" rel="noopener noreferrer"&gt;cc-mirror&lt;/a&gt; creates isolated Claude Code variants with custom providers — your main installation stays untouched. Supports Z.ai, MiniMax, OpenRouter, Ollama, and local LLMs. Quick start: &lt;code&gt;npx cc-mirror quick --provider mirror --name mclaude&lt;/code&gt;. Also building OpenSkills (cross-agent skill sharing) and an agent-native SDLC pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; You don't need an Anthropic subscription to use Claude Code's interface. cc-mirror lets you point it at local or alternative models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://starmorph.com/config/cf-claude-code-config" rel="noopener noreferrer"&gt;Get the Claude Code Config Pack — CLAUDE.md template, settings, hooks, and keybindings for the complete AI coding setup.&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dax Raad (&lt;a class="mentioned-user" href="https://dev.to/thdxr"&gt;@thdxr&lt;/a&gt;)
&lt;/h3&gt;

&lt;p&gt;Creator of &lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt; — an open-source terminal-first AI coding agent with 120k+ stars, 75+ LLM providers, and zero data storage. Also built &lt;a href="https://sst.dev/" rel="noopener noreferrer"&gt;SST&lt;/a&gt; and &lt;a href="https://models.dev/" rel="noopener noreferrer"&gt;models.dev&lt;/a&gt;. His grounded take: "The productivity feeling is real. The productivity isn't." OpenCode is vendor lock-in free — use any model provider.&lt;/p&gt;

&lt;h3&gt;
  
  
  Julia Turc (@juliarturc)
&lt;/h3&gt;

&lt;p&gt;The compression scientist. Her paper &lt;a href="https://arxiv.org/abs/1908.08962" rel="noopener noreferrer"&gt;Well-Read Students Learn Better&lt;/a&gt; (706+ citations) proved that pre-training compact models before distillation yields compound improvements — foundational research for how modern quantized models work. Now building &lt;a href="https://juliaturc.com/" rel="noopener noreferrer"&gt;Storia.ai&lt;/a&gt; (YC S24). Her YouTube channel explains deep AI concepts without the hype.&lt;/p&gt;

&lt;h3&gt;
  
  
  Teknium (@Teknium1)
&lt;/h3&gt;

&lt;p&gt;Head of Post-Training at Nous Research ($1B valuation). Co-creator of the &lt;a href="https://www.marktechpost.com/2025/08/27/nous-research-team-releases-hermes-4/" rel="noopener noreferrer"&gt;Hermes 4&lt;/a&gt; model family (open-weight, hybrid reasoning, up to 405B parameters). Built DataForge for graph-based synthetic data generation. The &lt;a href="https://huggingface.co/datasets/teknium/OpenHermes-2.5" rel="noopener noreferrer"&gt;OpenHermes 2.5&lt;/a&gt; dataset (1M samples) is openly available. Also drove decentralized training via INTELLECT-2 — a 32B model trained across 100+ GPUs on 3 continents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open-Weight Model Labs
&lt;/h3&gt;

&lt;p&gt;Several people are driving the open-weight model ecosystem forward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Victor Mustar (@victormustar)&lt;/strong&gt; — Head of Product at Hugging Face, shaping the UX of the platform hosting the world's largest open model collection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Z.ai Community (@louszbd)&lt;/strong&gt; — &lt;a href="https://huggingface.co/zai-org/GLM-5" rel="noopener noreferrer"&gt;GLM-5&lt;/a&gt; is 744B parameters (40B active), MIT licensed, #1 among open models on Text Arena with day-0 vLLM/SGLang support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skyler Miao (@SkylerMiao7)&lt;/strong&gt; — Head of Engineering at MiniMax. &lt;a href="https://github.com/MiniMax-AI/MiniMax-M2" rel="noopener noreferrer"&gt;M2&lt;/a&gt; is 230B total / 10B active, MoE architecture that scores well on benchmarks while being very cost-efficient to run. API pricing: $0.30/M input tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Also Worth Following
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;@Ex0byt&lt;/strong&gt; — Making local inference on massive models possible on consumer hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@alexinexxx&lt;/strong&gt; — Learning GPU kernel programming in public, with strong drive and educational content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@crystalsssup&lt;/strong&gt; — Building top open-weight models and releasing research openly&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Themes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The barrier to entry keeps dropping.&lt;/strong&gt; Karpathy trains a ChatGPT clone for $15-100. Consumer hardware runs models that were data-center-only a year ago. You can start experimenting for $0 with Ollama on your existing machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Consumer GPUs are more capable than you'd expect.&lt;/strong&gt; @sudoingX runs 27B coding models on a single RTX 3090 at usable speeds. Digital Spaceport documents builds starting at $150.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Apple Silicon clustering is an interesting frontier.&lt;/strong&gt; Exo Labs runs 671B parameter models across Mac Mini clusters. Unified memory + MoE is surprisingly effective for the price.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Agent architecture should be minimal.&lt;/strong&gt; Pi shows that four tools and a 1,000-token system prompt can outperform bloated frameworks. Context engineering matters more than tool count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Open-weight models are genuinely useful.&lt;/strong&gt; GLM-5 (MIT), MiniMax M2, Hermes 4, Qwen — strong performance across many tasks, openly available. They're great for simple workflows, privacy-sensitive tasks, and offline use. For complex reasoning and agentic coding, frontier cloud models still have a clear edge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Local and cloud are complementary.&lt;/strong&gt; cc-mirror and OpenCode let you use familiar interfaces with local or alternative models. The best setup for most developers is probably both — cloud for hard tasks, local for everything else.&lt;/p&gt;




&lt;p&gt;This field evolves fast. I'm still early in my own local inference journey — learning what works, what's overhyped, and where the real value is. If you're curious, the easiest way to start is &lt;code&gt;ollama run llama3&lt;/code&gt; on your existing machine and see what it can do. No commitment, no cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://starmorph.com/config/local-llm-inference-report" rel="noopener noreferrer"&gt;Get the full 14-page StarMorph Research PDF — detailed comparison tables, hardware buying guide, and thought leader profiles.&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Some links in this article are affiliate links. If you purchase through them, I may earn a small commission at no extra cost to you. I only recommend products I actually use.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.starmorph.com/blog/local-llm-inference-tools-guide" rel="noopener noreferrer"&gt;StarBlog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>localai</category>
      <category>llm</category>
      <category>ollama</category>
      <category>hardware</category>
    </item>
    <item>
      <title>Mermaid.js Tutorial: The Complete Guide to Diagrams as Code (2026)</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Sun, 29 Mar 2026 13:22:47 +0000</pubDate>
      <link>https://dev.to/starmorph/mermaidjs-tutorial-the-complete-guide-to-diagrams-as-code-2026-fhc</link>
      <guid>https://dev.to/starmorph/mermaidjs-tutorial-the-complete-guide-to-diagrams-as-code-2026-fhc</guid>
      <description>&lt;p&gt;Liquid syntax error: Variable '{{% raw %}' was not properly terminated with regexp: /\}\}/&lt;/p&gt;
</description>
      <category>mermaid</category>
      <category>diagrams</category>
      <category>devtools</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Yazi: The Blazing-Fast Terminal File Manager for Developers</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Fri, 20 Mar 2026 01:11:21 +0000</pubDate>
      <link>https://dev.to/starmorph/yazi-the-blazing-fast-terminal-file-manager-for-developers-39h1</link>
      <guid>https://dev.to/starmorph/yazi-the-blazing-fast-terminal-file-manager-for-developers-39h1</guid>
      <description>&lt;h1&gt;
  
  
  Yazi: The Blazing-Fast Terminal File Manager for Developers
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Yazi is a blazing-fast, async terminal file manager built in Rust with image previews, vim keybindings, and a Lua plugin system. Install with &lt;code&gt;brew install yazi&lt;/code&gt; (macOS) or &lt;code&gt;cargo install --locked yazi-fm&lt;/code&gt;. Navigate with h/j/k/l, preview files instantly, and manage directories without leaving the terminal. 33k+ GitHub stars and significantly faster than Ranger thanks to non-blocking I/O.&lt;/p&gt;

&lt;p&gt;If you spend most of your day in the terminal — navigating projects, previewing files, managing directories — you've probably used &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;cd&lt;/code&gt;, and &lt;code&gt;tree&lt;/code&gt; thousands of times. Terminal file managers like Ranger have existed for years, but they share a fundamental problem: synchronous I/O. Open a directory with 10,000 files and the UI freezes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sxyazi/yazi" rel="noopener noreferrer"&gt;Yazi&lt;/a&gt; (meaning "duck" in Chinese) solves this with a fully async, Rust-powered architecture. Every I/O operation is non-blocking. Directories load progressively. Image previews render natively. And it ships with a Lua plugin system and built-in package manager so you can extend it however you want.&lt;/p&gt;

&lt;p&gt;With 33k+ GitHub stars and rapid iteration since its 2023 launch, Yazi has become a go-to terminal file manager for developers who care about speed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://starmorph.com/config/cf-zshrc-pro" rel="noopener noreferrer"&gt;Get the Pro Zsh Config — 40+ aliases, custom functions, Claude AI integration, and a tuned developer shell environment.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Yazi
&lt;/h2&gt;

&lt;p&gt;Six things set Yazi apart from other terminal file managers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fully async I/O&lt;/strong&gt; — All file operations (listing, copying, previewing) run on background threads. The UI never freezes, even in massive directories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native image preview&lt;/strong&gt; — Built-in support for Kitty Graphics Protocol, Sixel, iTerm2 Inline Images, and Ghostty. No hacky Ueberzug workarounds needed (though it supports Überzug++ as a fallback).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scrollable previews&lt;/strong&gt; — Preview text files, images, PDFs, videos, archives, JSON, and Jupyter notebooks. Scroll through content without opening the file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lua plugin system&lt;/strong&gt; — Write functional plugins, custom previewers, metadata fetchers, and preloaders in Lua 5.4. There's a built-in package manager (&lt;code&gt;ya pkg&lt;/code&gt;) for installing community plugins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vim-style keybindings&lt;/strong&gt; — If you know vim motions, you already know Yazi. &lt;code&gt;hjkl&lt;/code&gt; navigation, visual mode, yanking, and marks all work as expected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tab and task management&lt;/strong&gt; — Open multiple directory tabs, run file operations in the background with real-time progress, and cancel tasks on the fly.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  macOS (Homebrew)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;yazi ffmpeg sevenzip jq poppler fd ripgrep fzf zoxide imagemagick font-symbols-only-nerd-font
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ubuntu / Debian
&lt;/h3&gt;

&lt;p&gt;There's no official apt package, and distro repositories tend to lag behind releases. Your best options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option 1: Snap&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;snap &lt;span class="nb"&gt;install &lt;/span&gt;yazi

&lt;span class="c"&gt;# Option 2: Download binary from GitHub releases&lt;/span&gt;
&lt;span class="c"&gt;# https://github.com/sxyazi/yazi/releases&lt;/span&gt;

&lt;span class="c"&gt;# Option 3: Build from source (requires Rust toolchain)&lt;/span&gt;
cargo &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--locked&lt;/span&gt; yazi-fm yazi-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Arch Linux
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;pacman &lt;span class="nt"&gt;-S&lt;/span&gt; yazi ffmpeg 7zip jq poppler fd ripgrep fzf zoxide imagemagick
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fedora
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dnf copr &lt;span class="nb"&gt;enable &lt;/span&gt;lihaohong/yazi
dnf &lt;span class="nb"&gt;install &lt;/span&gt;yazi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Other Platforms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nix:&lt;/strong&gt; Available in nixpkgs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows:&lt;/strong&gt; &lt;code&gt;scoop install yazi&lt;/code&gt; or &lt;code&gt;winget install sxyazi.yazi&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cargo (any OS):&lt;/strong&gt; &lt;code&gt;cargo install --locked yazi-fm yazi-cli&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Required and Recommended Dependencies
&lt;/h3&gt;

&lt;p&gt;Yazi needs &lt;code&gt;file&lt;/code&gt; for MIME type detection (pre-installed on most systems). For the full experience, install these optional dependencies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dependency&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.nerdfonts.com/" rel="noopener noreferrer"&gt;Nerd Font&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;File type icons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ffmpeg.org/" rel="noopener noreferrer"&gt;ffmpeg&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Video thumbnails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.7-zip.org/" rel="noopener noreferrer"&gt;7-Zip&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Archive preview and extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://jqlang.github.io/jq/" rel="noopener noreferrer"&gt;jq&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;JSON preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://poppler.freedesktop.org/" rel="noopener noreferrer"&gt;poppler&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;PDF preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/sharkdp/fd" rel="noopener noreferrer"&gt;fd&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Filename search (&lt;code&gt;s&lt;/code&gt; key)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/BurntSushi/ripgrep" rel="noopener noreferrer"&gt;ripgrep&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Content search (&lt;code&gt;S&lt;/code&gt; key)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/junegunn/fzf" rel="noopener noreferrer"&gt;fzf&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Fuzzy file finding (&lt;code&gt;z&lt;/code&gt; key)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/ajeetdsouza/zoxide" rel="noopener noreferrer"&gt;zoxide&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Smart directory jumping (&lt;code&gt;Z&lt;/code&gt; key)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://imagemagick.org/" rel="noopener noreferrer"&gt;ImageMagick&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HEIC, JPEG XL, font preview&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
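&lt;p&gt;Each row of the table above boils down to "is the binary on your &lt;code&gt;PATH&lt;/code&gt;". Here's a minimal POSIX-sh sketch to audit an install; the binary names (&lt;code&gt;7z&lt;/code&gt;, &lt;code&gt;pdftoppm&lt;/code&gt; for poppler, &lt;code&gt;rg&lt;/code&gt; for ripgrep, &lt;code&gt;magick&lt;/code&gt; for ImageMagick 7) are assumptions for typical packages, so adjust them to your system:&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: report which of Yazi's optional dependencies are missing.
# check_deps prints any name that command -v cannot resolve on PATH.
check_deps() {
  missing=""
  for dep in "$@"; do
    command -v "$dep" >/dev/null || missing="$missing $dep"
  done
  if [ -n "$missing" ]; then
    echo "Missing:$missing"
  else
    echo "All dependencies found"
  fi
}

# Typical binary names for the preview/search tools Yazi uses:
check_deps file ffmpeg 7z jq pdftoppm fd rg fzf zoxide magick
```

&lt;p&gt;Anything it reports as missing still leaves Yazi usable; you just lose the corresponding preview or search feature.&lt;/p&gt;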

&lt;h3&gt;
  
  
  Shell Wrapper (cd on exit)
&lt;/h3&gt;

&lt;p&gt;By default, quitting Yazi doesn't change your shell's working directory. Add this wrapper function to your &lt;code&gt;.zshrc&lt;/code&gt; or &lt;code&gt;.bashrc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;function &lt;/span&gt;y&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;tmp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;mktemp&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="s2"&gt;"yazi-cwd.XXXXXX"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; cwd
  yazi &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--cwd-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$tmp&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nv"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;command cat&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$tmp&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cwd&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cwd&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PWD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;builtin cd&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cwd&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;fi
  &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$tmp&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now use &lt;code&gt;y&lt;/code&gt; instead of &lt;code&gt;yazi&lt;/code&gt;. When you quit with &lt;code&gt;q&lt;/code&gt;, your shell &lt;code&gt;cd&lt;/code&gt;s into whatever directory you were browsing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Concepts
&lt;/h2&gt;

&lt;p&gt;Yazi uses a three-pane layout inspired by Ranger:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────┬──────────────┬──────────────┐
│  Parent  │   Current    │   Preview    │
│  dir     │   dir        │   of file    │
│          │              │              │
│          │  &amp;gt; file.ts   │  [contents]  │
│          │    lib/      │              │
│          │    tests/    │              │
└──────────┴──────────────┴──────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Left pane:&lt;/strong&gt; Parent directory (context for where you are)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Center pane:&lt;/strong&gt; Current directory (where your cursor is)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Right pane:&lt;/strong&gt; Preview of the hovered file or directory contents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Navigate with &lt;code&gt;hjkl&lt;/code&gt; — &lt;code&gt;h&lt;/code&gt; goes up a directory, &lt;code&gt;l&lt;/code&gt; enters a directory or opens a file, &lt;code&gt;j&lt;/code&gt;/&lt;code&gt;k&lt;/code&gt; move the cursor down/up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tabs
&lt;/h3&gt;

&lt;p&gt;Yazi supports multiple tabs, numbered 1–9. Press &lt;code&gt;t&lt;/code&gt; to create a new tab, &lt;code&gt;1&lt;/code&gt;–&lt;code&gt;9&lt;/code&gt; to switch instantly. Think of it like browser tabs for your filesystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tasks
&lt;/h3&gt;

&lt;p&gt;File operations (copy, move, delete) run as background tasks with real-time progress. Press &lt;code&gt;w&lt;/code&gt; to open the task manager, &lt;code&gt;x&lt;/code&gt; to cancel a task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual Mode
&lt;/h3&gt;

&lt;p&gt;Press &lt;code&gt;v&lt;/code&gt; to enter visual mode — select ranges of files with &lt;code&gt;j&lt;/code&gt;/&lt;code&gt;k&lt;/code&gt;, then operate on the selection (yank, cut, delete, etc.). Works exactly like vim visual line mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Complete Keybinding Reference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Navigation
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;j&lt;/code&gt; / &lt;code&gt;k&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Move cursor down / up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;l&lt;/code&gt; / &lt;code&gt;h&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Enter directory (or open file) / Go to parent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;H&lt;/code&gt; / &lt;code&gt;L&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Go back / Go forward (history)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gg&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Jump to top of list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;G&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Jump to bottom of list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Ctrl+d&lt;/code&gt; / &lt;code&gt;Ctrl+u&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Half-page down / up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Ctrl+f&lt;/code&gt; / &lt;code&gt;Ctrl+b&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Full page down / up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;J&lt;/code&gt; / &lt;code&gt;K&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Scroll preview pane down / up&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Quick Directory Access
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go to home directory (&lt;code&gt;~&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go to config directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go to downloads directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;g Space&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interactive directory change (type a path)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;z&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fuzzy find via fzf&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Z&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Smart jump via zoxide&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  File Operations
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;o&lt;/code&gt; / &lt;code&gt;Enter&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Open file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;O&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open interactively (choose program)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;y&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yank (copy) selected files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cut selected files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;p&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Paste files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;P&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Paste (overwrite if exists)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Y&lt;/code&gt; / &lt;code&gt;X&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Cancel yank / cut&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;d&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Trash files (soft delete)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Permanently delete files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;a&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create new file or directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;r&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rename file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;-&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create symlink (absolute path)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;_&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create symlink (relative path)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Toggle hidden files&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Selection
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Space&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Toggle selection on current file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter visual mode (select range)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;V&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter visual mode (unset range)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+a&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Select all files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+r&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Inverse selection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Esc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cancel selection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Copy Paths to Clipboard
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copy full file path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copy directory path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copy filename&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copy filename without extension&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Filter, Find, and Search
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;f&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter files (live filtering as you type)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Incremental find (next match)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Incremental find (previous match)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;n&lt;/code&gt; / &lt;code&gt;N&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Next / previous find match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;s&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search filenames with fd&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;S&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search file contents with ripgrep&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+s&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cancel search&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Sorting
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;,m&lt;/code&gt; / &lt;code&gt;,M&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Sort by modified time / reverse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;,b&lt;/code&gt; / &lt;code&gt;,B&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Sort by birth (creation) time / reverse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;,e&lt;/code&gt; / &lt;code&gt;,E&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Sort by extension / reverse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;,a&lt;/code&gt; / &lt;code&gt;,A&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Sort alphabetically / reverse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;,n&lt;/code&gt; / &lt;code&gt;,N&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Sort naturally / reverse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;,s&lt;/code&gt; / &lt;code&gt;,S&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Sort by size / reverse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;,r&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sort randomly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Tab Management
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create new tab&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;1&lt;/code&gt;–&lt;code&gt;9&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Switch to tab N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;[&lt;/code&gt; / &lt;code&gt;]&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Previous / next tab&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;{&lt;/code&gt; / &lt;code&gt;}&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Swap with previous / next tab&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Close current tab&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Shell and Tasks
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run shell command (non-blocking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;:&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run shell command (blocking, waits for exit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;w&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open task manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;~&lt;/code&gt; / &lt;code&gt;F1&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Open help menu&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;q&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quit (writes CWD for shell wrapper)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quit without writing CWD&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;Yazi uses three TOML config files in &lt;code&gt;~/.config/yazi/&lt;/code&gt;:&lt;/p&gt;
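&lt;p&gt;A minimal bootstrap sketch, assuming the XDG layout on Linux/macOS (the third file, &lt;code&gt;theme.toml&lt;/code&gt;, covers colors and icons):&lt;/p&gt;

```shell
#!/bin/sh
# Create Yazi's config directory and the three files it reads.
config_dir="${XDG_CONFIG_HOME:-$HOME/.config}/yazi"
mkdir -p "$config_dir"
touch "$config_dir/yazi.toml"    # core settings: manager, preview, openers
touch "$config_dir/keymap.toml"  # keybinding overrides
touch "$config_dir/theme.toml"   # colors and icons
ls "$config_dir"
```

&lt;p&gt;All three are optional; Yazi falls back to its built-in defaults for anything you don't set.&lt;/p&gt;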

&lt;h3&gt;
  
  
  yazi.toml — Core Settings
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mgr]&lt;/span&gt;
&lt;span class="py"&gt;ratio&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;      &lt;span class="c"&gt;# Pane width ratios [parent, current, preview]&lt;/span&gt;
&lt;span class="py"&gt;sort_by&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"natural"&lt;/span&gt;       &lt;span class="c"&gt;# natural, mtime, extension, alphabetical, size&lt;/span&gt;
&lt;span class="py"&gt;sort_dir_first&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;          &lt;span class="c"&gt;# Directories listed before files&lt;/span&gt;
&lt;span class="py"&gt;show_hidden&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;           &lt;span class="c"&gt;# Show dotfiles&lt;/span&gt;
&lt;span class="py"&gt;scrolloff&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;               &lt;span class="c"&gt;# Cursor padding from edge&lt;/span&gt;
&lt;span class="py"&gt;linemode&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"none"&lt;/span&gt;          &lt;span class="c"&gt;# none, size, mtime, permissions, owner&lt;/span&gt;

&lt;span class="nn"&gt;[preview]&lt;/span&gt;
&lt;span class="py"&gt;wrap&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"no"&lt;/span&gt;              &lt;span class="c"&gt;# Line wrapping in preview&lt;/span&gt;
&lt;span class="py"&gt;tab_size&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;                 &lt;span class="c"&gt;# Tab width in preview&lt;/span&gt;
&lt;span class="py"&gt;max_width&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;               &lt;span class="c"&gt;# Max image preview width&lt;/span&gt;
&lt;span class="py"&gt;max_height&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;900&lt;/span&gt;               &lt;span class="c"&gt;# Max image preview height&lt;/span&gt;

&lt;span class="nn"&gt;[opener]&lt;/span&gt;
&lt;span class="py"&gt;edit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="py"&gt;run&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'${EDITOR:-vi} "$@"'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;block&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;desc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Edit"&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  keymap.toml — Custom Keybindings
&lt;/h3&gt;

&lt;p&gt;Add keybindings without overriding defaults using &lt;code&gt;prepend_keymap&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mgr]&lt;/span&gt;
&lt;span class="py"&gt;prepend_keymap&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="c"&gt;# Quick directory jumps&lt;/span&gt;
  &lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="py"&gt;on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"g"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="py"&gt;run&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"cd ~/repos"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;desc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Go to repos"&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="py"&gt;on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"g"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"p"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="py"&gt;run&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"cd ~/projects"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;desc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Go to projects"&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c"&gt;# Open lazygit&lt;/span&gt;
  &lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="py"&gt;on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;C-g&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="py"&gt;run&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"shell 'lazygit' --block"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;desc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Open lazygit"&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  theme.toml — Colors and Styling
&lt;/h3&gt;

&lt;p&gt;Override any visual element. For pre-made themes, install a flavor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the Catppuccin Mocha flavor&lt;/span&gt;
ya pkg add yazi-rs/flavors:catppuccin-mocha

&lt;span class="c"&gt;# Set it in theme.toml&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;flavor]
dark  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"catppuccin-mocha"&lt;/span&gt;
light &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"catppuccin-latte"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Browse available flavors at &lt;a href="https://github.com/yazi-rs/flavors" rel="noopener noreferrer"&gt;yazi-rs/flavors&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  init.lua — Plugin Initialization
&lt;/h3&gt;

&lt;p&gt;This Lua file runs on startup. Use it to configure plugins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ~/.config/yazi/init.lua&lt;/span&gt;

&lt;span class="c1"&gt;-- Enable zoxide database updates when navigating&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"zoxide"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="n"&gt;setup&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;update_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;-- Enable git status indicators&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"git"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="n"&gt;setup&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Plugin Ecosystem
&lt;/h2&gt;

&lt;p&gt;Yazi has a thriving plugin ecosystem with 150+ community plugins. The built-in &lt;code&gt;ya pkg&lt;/code&gt; package manager handles installation, updates, and version pinning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing Plugins
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install from the official plugins monorepo&lt;/span&gt;
ya pkg add yazi-rs/plugins:git
ya pkg add yazi-rs/plugins:smart-enter

&lt;span class="c"&gt;# Install from a standalone community repo&lt;/span&gt;
ya pkg add Lil-Dank/lazygit

&lt;span class="c"&gt;# List installed packages&lt;/span&gt;
ya pkg list

&lt;span class="c"&gt;# Update all packages&lt;/span&gt;
ya pkg upgrade

&lt;span class="c"&gt;# Remove a package&lt;/span&gt;
ya pkg delete yazi-rs/plugins:git

&lt;span class="c"&gt;# Install all packages from package.toml (fresh machine setup)&lt;/span&gt;
ya pkg &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plugins are tracked in &lt;code&gt;~/.config/yazi/package.toml&lt;/code&gt;, so you can version-control your plugin list and replicate it across machines.&lt;/p&gt;
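&lt;p&gt;One way to use that: keep the config directory in git. A minimal sketch (the remote URL is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Track your Yazi config (run once)&lt;/span&gt;
cd ~/.config/yazi
git init
git add yazi.toml keymap.toml theme.toml init.lua package.toml
git commit -m "Yazi config"

&lt;span class="c"&gt;# On a fresh machine: clone it, then restore plugins&lt;/span&gt;
git clone &amp;lt;your-remote&amp;gt; ~/.config/yazi
ya pkg install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;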

&lt;h3&gt;
  
  
  Essential Plugins
&lt;/h3&gt;

&lt;p&gt;These are the plugins I'd install on any new setup:&lt;/p&gt;

&lt;h4&gt;
  
  
  git.yazi — Git Status in File Listings
&lt;/h4&gt;

&lt;p&gt;Shows modified/staged/untracked/ignored status inline next to every file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ya pkg add yazi-rs/plugins:git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure in &lt;code&gt;init.lua&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"git"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="n"&gt;setup&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the fetchers in &lt;code&gt;yazi.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[[plugin.prepend_fetchers]]&lt;/span&gt;
&lt;span class="py"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"git"&lt;/span&gt;
&lt;span class="py"&gt;url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"*"&lt;/span&gt;
&lt;span class="py"&gt;run&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"git"&lt;/span&gt;

&lt;span class="nn"&gt;[[plugin.prepend_fetchers]]&lt;/span&gt;
&lt;span class="py"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"git"&lt;/span&gt;
&lt;span class="py"&gt;url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"*/"&lt;/span&gt;
&lt;span class="py"&gt;run&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"git"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  lazygit.yazi — Full Git UI
&lt;/h4&gt;

&lt;p&gt;Launch lazygit from within Yazi for staging, committing, rebasing, and more:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ya pkg add Lil-Dank/lazygit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  smart-enter.yazi — Context-Aware Enter
&lt;/h4&gt;

&lt;p&gt;Opens files or enters directories with a single key press:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ya pkg add yazi-rs/plugins:smart-enter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  full-border.yazi — Visual Borders
&lt;/h4&gt;

&lt;p&gt;Adds clean visual borders around all panes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ya pkg add yazi-rs/plugins:full-border
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure in &lt;code&gt;init.lua&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"full-border"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="n"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  chmod.yazi — File Permissions
&lt;/h4&gt;

&lt;p&gt;Change file permissions directly from Yazi:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ya pkg add yazi-rs/plugins:chmod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  diff.yazi — File Comparison
&lt;/h4&gt;

&lt;p&gt;Compare files and create patches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ya pkg add yazi-rs/plugins:diff
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  More Notable Community Plugins
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plugin&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Install&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/Rolv-Apneseth/starship.yazi" rel="noopener noreferrer"&gt;starship.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Starship prompt in Yazi header&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add Rolv-Apneseth/starship&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/imsi32/yatline.yazi" rel="noopener noreferrer"&gt;yatline.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Fully customizable header and status lines&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add imsi32/yatline&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/dedukun/relative-motions.yazi" rel="noopener noreferrer"&gt;relative-motions.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Vim relative line number jumps&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add dedukun/relative-motions&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/h-hg/yamb.yazi" rel="noopener noreferrer"&gt;yamb.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Persistent bookmarks with fzf&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add h-hg/yamb&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/MasouShizuka/projects.yazi" rel="noopener noreferrer"&gt;projects.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Save/restore tab sessions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add MasouShizuka/projects&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/TD-Sky/sudo.yazi" rel="noopener noreferrer"&gt;sudo.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Execute operations with sudo&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add TD-Sky/sudo&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/Rolv-Apneseth/bypass.yazi" rel="noopener noreferrer"&gt;bypass.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Auto-skip single-subdirectory dirs&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add Rolv-Apneseth/bypass&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/KKV9/compress.yazi" rel="noopener noreferrer"&gt;compress.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Create archives from selections&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add KKV9/compress&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/Reledia/glow.yazi" rel="noopener noreferrer"&gt;glow.yazi&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Preview markdown with glow&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ya pkg add Reledia/glow&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the full list, check out &lt;a href="https://github.com/AnirudhG07/awesome-yazi" rel="noopener noreferrer"&gt;awesome-yazi&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Writing Custom Plugins
&lt;/h3&gt;

&lt;p&gt;Yazi plugins are Lua 5.4 scripts. Create a directory in &lt;code&gt;~/.config/yazi/plugins/&lt;/code&gt; with an &lt;code&gt;init.lua&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;yazi&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;plugins&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;my&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;plugin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yazi&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
    &lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lua&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a minimal example that copies the current directory structure to the clipboard (useful for giving context to an LLM). It shells out to the &lt;code&gt;tree&lt;/code&gt; CLI, so make sure that's installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ~/.config/yazi/plugins/tree-to-clipboard.yazi/init.lua&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;M&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;cwd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tostring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tree"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"-L"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"--gitignore"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="n"&gt;ya&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clipboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ya&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Tree copied"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Directory tree copied to clipboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bind it in &lt;code&gt;keymap.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mgr]&lt;/span&gt;
&lt;span class="py"&gt;prepend_keymap&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="py"&gt;on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"g"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"t"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="py"&gt;run&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"plugin tree-to-clipboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;desc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Copy tree to clipboard"&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For type checking and autocomplete in your editor, install the types plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ya pkg add yazi-rs/plugins:types
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tool Integrations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  tmux
&lt;/h3&gt;

&lt;p&gt;For image previews to work inside tmux, add to your &lt;code&gt;.tmux.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; allow-passthrough on
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-ga&lt;/span&gt; update-environment TERM
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-ga&lt;/span&gt; update-environment TERM_PROGRAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Neovim
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/mikavilpas/yazi.nvim" rel="noopener noreferrer"&gt;yazi.nvim&lt;/a&gt; provides deep bidirectional integration. Files hovered in Yazi are highlighted in Neovim, and you can open files as buffers, splits, or tabs directly from Yazi.&lt;/p&gt;

&lt;h3&gt;
  
  
  zoxide
&lt;/h3&gt;

&lt;p&gt;Enable automatic database updates so every directory you visit in Yazi gets added to zoxide's ranking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- init.lua&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"zoxide"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="n"&gt;setup&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;update_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  fzf and ripgrep
&lt;/h3&gt;

&lt;p&gt;Both are built-in integrations — no plugin needed. Just have &lt;code&gt;fzf&lt;/code&gt;, &lt;code&gt;fd&lt;/code&gt;, and &lt;code&gt;ripgrep&lt;/code&gt; in your &lt;code&gt;$PATH&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Z&lt;/code&gt; — Fuzzy find files with fzf&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;s&lt;/code&gt; — Search filenames with fd&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;S&lt;/code&gt; — Search file contents with ripgrep&lt;/li&gt;
&lt;/ul&gt;
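
&lt;p&gt;If any of them are missing, install them first. A Homebrew example (package names differ elsewhere, e.g. &lt;code&gt;fd-find&lt;/code&gt; on Debian/Ubuntu):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew install fzf fd ripgrep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;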

&lt;h2&gt;
  
  
  Practical Workflows
&lt;/h2&gt;

&lt;h3&gt;
  
  
  TypeScript / Web Development
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Navigating a monorepo:&lt;/strong&gt; Open tabs for different packages. Tab 1 for &lt;code&gt;apps/web&lt;/code&gt;, tab 2 for &lt;code&gt;packages/ui&lt;/code&gt;, tab 3 for &lt;code&gt;packages/api&lt;/code&gt;. Press &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt;, &lt;code&gt;3&lt;/code&gt; to switch instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finding components:&lt;/strong&gt; Press &lt;code&gt;/&lt;/code&gt; and start typing a component name. Yazi incrementally narrows the file list as you type. Faster than &lt;code&gt;Ctrl+P&lt;/code&gt; in VS Code for large projects because it doesn't index — it just filters what's on screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Previewing configs:&lt;/strong&gt; Navigate to &lt;code&gt;tsconfig.json&lt;/code&gt;, &lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;.env.local&lt;/code&gt;, or &lt;code&gt;next.config.js&lt;/code&gt; and read the contents in the preview pane without opening your editor. Sort by modified time (&lt;code&gt;,m&lt;/code&gt;) to see what changed recently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reviewing build output:&lt;/strong&gt; Navigate to &lt;code&gt;.next/&lt;/code&gt;, &lt;code&gt;dist/&lt;/code&gt;, or &lt;code&gt;node_modules/.cache&lt;/code&gt; to inspect build artifacts. The preview pane renders JSON, JavaScript, and source maps inline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bulk rename:&lt;/strong&gt; Need to rename a batch of component files from PascalCase to kebab-case? Select files with &lt;code&gt;v&lt;/code&gt; and visual mode, press &lt;code&gt;r&lt;/code&gt; to open the bulk rename buffer in your &lt;code&gt;$EDITOR&lt;/code&gt;, then use vim macros or find-and-replace to transform all names at once.&lt;/p&gt;
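
&lt;p&gt;The same transform works outside the editor too. A sketch with &lt;code&gt;sed&lt;/code&gt; and &lt;code&gt;tr&lt;/code&gt; (handles simple names; acronyms like &lt;code&gt;APIClient&lt;/code&gt; need extra rules):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# PascalCase to kebab-case&lt;/span&gt;
echo "MyButtonGroup.tsx" | sed -E 's/([a-z0-9])([A-Z])/\1-\2/g' | tr 'A-Z' 'a-z'
&lt;span class="c"&gt;# my-button-group.tsx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;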

&lt;h3&gt;
  
  
  Linux Server Administration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Log inspection:&lt;/strong&gt; Navigate to &lt;code&gt;/var/log/&lt;/code&gt; and preview log files inline. Sort by modified time (&lt;code&gt;,m&lt;/code&gt;) to see the most recent logs first. Search within log content with &lt;code&gt;S&lt;/code&gt; to grep across all log files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Config file management:&lt;/strong&gt; Jump between &lt;code&gt;/etc/nginx/&lt;/code&gt;, &lt;code&gt;/etc/systemd/&lt;/code&gt;, and &lt;code&gt;/home/deploy/&lt;/code&gt; using zoxide (&lt;code&gt;z&lt;/code&gt;). Preview config files before editing — catch mistakes before they take down a service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Permission management:&lt;/strong&gt; Use the &lt;code&gt;chmod.yazi&lt;/code&gt; plugin to change permissions visually. Set linemode to &lt;code&gt;permissions&lt;/code&gt; in &lt;code&gt;yazi.toml&lt;/code&gt; to see file permissions inline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mgr]&lt;/span&gt;
&lt;span class="py"&gt;linemode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"permissions"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Remote file management:&lt;/strong&gt; Use &lt;code&gt;sshfs.yazi&lt;/code&gt; to mount remote directories over SSH and browse them like local files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disk management:&lt;/strong&gt; Use &lt;code&gt;mount.yazi&lt;/code&gt; to mount, unmount, and eject disks without dropping to a shell.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI-Assisted Development (Claude Code, Cursor, etc.)
&lt;/h3&gt;

&lt;p&gt;When an AI coding agent is autonomously editing your codebase, Yazi becomes your real-time visibility layer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitor file changes:&lt;/strong&gt; Keep Yazi open alongside your AI agent. Sort by modified time (&lt;code&gt;,m&lt;/code&gt;) and you'll see files bubble to the top as the agent modifies them. The preview pane shows the current contents instantly — no need to &lt;code&gt;cat&lt;/code&gt; or open each file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Review generated files:&lt;/strong&gt; After an agent generates code, navigate to the output directory and scroll through each file's contents in the preview pane. Faster than opening each file individually in an editor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git status awareness:&lt;/strong&gt; With &lt;code&gt;git.yazi&lt;/code&gt; enabled, you see which files are modified, staged, or untracked right in the file listing. After an AI agent makes changes, you can immediately see the blast radius.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copy directory context for prompts:&lt;/strong&gt; Use the shell command (&lt;code&gt;:&lt;/code&gt;) to run &lt;code&gt;tree --gitignore -L 3 | pbcopy&lt;/code&gt; and paste the directory structure into your LLM conversation. Or write a custom plugin (like the &lt;code&gt;tree-to-clipboard&lt;/code&gt; example above) to do it with a keybinding.&lt;/p&gt;
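
&lt;p&gt;&lt;code&gt;pbcopy&lt;/code&gt; is macOS-only. A rough cross-platform sketch, assuming &lt;code&gt;wl-copy&lt;/code&gt; or &lt;code&gt;xclip&lt;/code&gt; is installed on Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pick whichever clipboard tool this machine has&lt;/span&gt;
if command -v pbcopy &amp;gt;/dev/null; then clip=pbcopy
elif command -v wl-copy &amp;gt;/dev/null; then clip=wl-copy
else clip="xclip -selection clipboard"; fi

tree --gitignore -L 3 | $clip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;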

&lt;p&gt;&lt;strong&gt;Bulk review and clean up:&lt;/strong&gt; After an agent creates files you don't want, select them in visual mode (&lt;code&gt;v&lt;/code&gt;), then trash (&lt;code&gt;d&lt;/code&gt;) or permanently delete (&lt;code&gt;D&lt;/code&gt;). Faster than &lt;code&gt;rm&lt;/code&gt;-ing files one by one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick diff:&lt;/strong&gt; Use the &lt;code&gt;diff.yazi&lt;/code&gt; plugin to compare the agent's output against your original files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Yazi vs Ranger vs lf vs nnn
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Yazi&lt;/th&gt;
&lt;th&gt;Ranger&lt;/th&gt;
&lt;th&gt;lf&lt;/th&gt;
&lt;th&gt;nnn&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rust + Lua&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;I/O model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fully async&lt;/td&gt;
&lt;td&gt;Synchronous&lt;/td&gt;
&lt;td&gt;Async dir loading&lt;/td&gt;
&lt;td&gt;Synchronous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Large directory performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Sluggish (10k+ files)&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Image preview&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native (Kitty, Sixel, iTerm2)&lt;/td&gt;
&lt;td&gt;Via w3m/Überzug (setup required)&lt;/td&gt;
&lt;td&gt;External scripts&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plugin system&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lua + built-in pkg manager&lt;/td&gt;
&lt;td&gt;Python scripts&lt;/td&gt;
&lt;td&gt;Shell scripts&lt;/td&gt;
&lt;td&gt;Shell scripts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Out-of-box experience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good (needs config)&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;File preview&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Text, image, PDF, video, archive, JSON&lt;/td&gt;
&lt;td&gt;Text, images (with setup)&lt;/td&gt;
&lt;td&gt;Text (via script)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tabs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built-in (1–9)&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Contexts (4 max)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trash support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;External&lt;/td&gt;
&lt;td&gt;Via plugin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory usage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Higher (Python)&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Lowest (~3.5MB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;33k+&lt;/td&gt;
&lt;td&gt;16k&lt;/td&gt;
&lt;td&gt;8k&lt;/td&gt;
&lt;td&gt;19k&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pick Yazi&lt;/strong&gt; if you want the best async performance, image previews, and a modern plugin ecosystem that works out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick Ranger&lt;/strong&gt; if you're already invested in its Python plugin ecosystem and don't mind the performance trade-off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick lf&lt;/strong&gt; if you want a minimal, Go-based file manager and prefer configuring everything via shell scripts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick nnn&lt;/strong&gt; if you need the absolute lightest footprint — ideal for SSH into constrained servers or Docker containers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/sxyazi/yazi" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt; — Source code, issues, discussions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://yazi-rs.github.io/" rel="noopener noreferrer"&gt;Official Documentation&lt;/a&gt; — Installation, configuration, plugin API&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://yazi-rs.github.io/docs/quick-start/" rel="noopener noreferrer"&gt;Quick Start Guide&lt;/a&gt; — Get running in 5 minutes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://yazi-rs.github.io/docs/configuration/overview/" rel="noopener noreferrer"&gt;Configuration Reference&lt;/a&gt; — yazi.toml, keymap.toml, theme.toml&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://yazi-rs.github.io/docs/plugins/overview/" rel="noopener noreferrer"&gt;Plugin Documentation&lt;/a&gt; — Writing and using plugins&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://yazi-rs.github.io/docs/image-preview/" rel="noopener noreferrer"&gt;Image Preview Setup&lt;/a&gt; — Terminal-specific setup instructions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://yazi-rs.github.io/docs/tips/" rel="noopener noreferrer"&gt;Tips and Tricks&lt;/a&gt; — Advanced usage patterns&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://yazi-rs.github.io/docs/faq/" rel="noopener noreferrer"&gt;FAQ&lt;/a&gt; — Common questions and troubleshooting&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Plugins and Themes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/yazi-rs/plugins" rel="noopener noreferrer"&gt;yazi-rs/plugins&lt;/a&gt; — Official plugin monorepo (18 plugins)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/yazi-rs/flavors" rel="noopener noreferrer"&gt;yazi-rs/flavors&lt;/a&gt; — Official theme/flavor repository&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/AnirudhG07/awesome-yazi" rel="noopener noreferrer"&gt;awesome-yazi&lt;/a&gt; — Curated list of 150+ community plugins and resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Integrations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/mikavilpas/yazi.nvim" rel="noopener noreferrer"&gt;yazi.nvim&lt;/a&gt; — Neovim integration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Lil-Dank/lazygit.yazi" rel="noopener noreferrer"&gt;lazygit.yazi&lt;/a&gt; — Lazygit inside Yazi&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Rolv-Apneseth/starship.yazi" rel="noopener noreferrer"&gt;starship.yazi&lt;/a&gt; — Starship prompt in Yazi&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.starmorph.com/blog/yazi-terminal-file-manager-guide" rel="noopener noreferrer"&gt;StarBlog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>yazi</category>
      <category>terminal</category>
      <category>cli</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Best Mac Mini for Running Local LLMs and OpenClaw: Complete Pricing &amp; Buying Guide (2026)</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Fri, 20 Mar 2026 01:11:18 +0000</pubDate>
      <link>https://dev.to/starmorph/best-mac-mini-for-running-local-llms-and-openclaw-complete-pricing-buying-guide-2026-2226</link>
      <guid>https://dev.to/starmorph/best-mac-mini-for-running-local-llms-and-openclaw-complete-pricing-buying-guide-2026-2226</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; The Mac Mini M4 Pro with 48GB RAM ($1,599 new) is the sweet spot for local LLMs — it runs 70B parameter models like Llama 3.1 70B comfortably. The 24GB M4 base ($599) handles 7B-13B models. For 100B+ models, you need 128GB+ RAM ($3,199+). Used M2 Pro models with 32GB start around $800. Apple Silicon's unified memory architecture eliminates the VRAM bottleneck that limits GPU-based setups.&lt;/p&gt;

&lt;p&gt;Apple's unified memory architecture means the CPU, GPU, and Neural Engine share one memory pool — no PCIe bottleneck, no copying between VRAM and system RAM. This is exactly what LLM inference needs, and it makes the Mac Mini a compelling option for running local models and AI agents like &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But which Mac Mini should you actually buy? And should you buy new or used?&lt;/p&gt;

&lt;p&gt;I researched every Apple Silicon Mac Mini configuration, checked current used market prices, and mapped out exactly which LLM models you can run on each RAM tier — including what you need to run OpenClaw with local models. Here's the complete breakdown.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post contains affiliate links. If you buy through these links, I may earn a small commission at no extra cost to you.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Mac Mini for LLMs
&lt;/h2&gt;

&lt;p&gt;Three reasons the Mac Mini dominates local AI inference:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified memory = usable memory.&lt;/strong&gt; On a PC with a discrete GPU, you're limited by VRAM (typically 8–24GB). On a Mac Mini, nearly all your RAM is available for model loading: a 48GB Mac Mini gives you roughly 44GB of model space once macOS takes its ~4GB share.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory bandwidth.&lt;/strong&gt; The M4 Pro has ~273 GB/s memory bandwidth. For LLM inference, memory bandwidth directly determines tokens per second. More bandwidth = faster responses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Power efficiency.&lt;/strong&gt; A Mac Mini draws ~30W under AI load. A dual-GPU PC rig draws 600W+. If you're running models 24/7, the electricity savings alone can cover the cost of a base Mac Mini within a year.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
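&lt;p&gt;The bandwidth and power claims above are easy to sanity-check with quick arithmetic. The snippet below is a rough sketch; the $0.15/kWh electricity rate and the ~40GB size for a Q4-quantized 70B model are assumptions for illustration:&lt;/p&gt;

```shell
# Upper bound on generation speed: each token requires streaming all model
# weights through memory once, so tok/s is at most bandwidth / model size.
awk 'BEGIN { printf "~%.0f tok/s\n", 273 / 40 }'   # M4 Pro (273 GB/s), 70B Q4 (~40GB)

# Yearly electricity savings vs a 600W dual-GPU rig, at an assumed $0.15/kWh
awk 'BEGIN { printf "~$%.0f/year\n", (600 - 30) / 1000 * 24 * 365 * 0.15 }'
```

&lt;p&gt;Real-world throughput lands below that bound because compute and KV-cache reads add overhead, but the proportionality holds: double the bandwidth, roughly double the tokens per second.&lt;/p&gt;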

&lt;p&gt;The one hard rule: &lt;strong&gt;the model must fit in RAM or it won't run.&lt;/strong&gt; RAM determines &lt;em&gt;whether&lt;/em&gt; a model works. The chip determines &lt;em&gt;how fast&lt;/em&gt; it runs. Buy the most RAM you can afford — you can't upgrade it later.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Mac Mini Pricing (All M4 Configurations)
&lt;/h2&gt;

&lt;p&gt;These are Apple's current MSRPs for the 2024 Mac Mini lineup. Amazon frequently discounts these by $50–$100.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chip&lt;/th&gt;
&lt;th&gt;CPU / GPU&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;th&gt;MSRP&lt;/th&gt;
&lt;th&gt;Amazon&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;M4&lt;/td&gt;
&lt;td&gt;10c CPU / 10c GPU&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;256GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$599&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.amazon.com/Apple-2024-Desktop-Computer-10%E2%80%91core/dp/B0DLBTPDCS?tag=cybercastle-20" rel="noopener noreferrer"&gt;Buy on Amazon&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M4&lt;/td&gt;
&lt;td&gt;10c CPU / 10c GPU&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;512GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$799&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.amazon.com/Apple-2024-Desktop-Computer-10%E2%80%91core/dp/B0DLBX4B1K?tag=cybercastle-20" rel="noopener noreferrer"&gt;Buy on Amazon&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M4&lt;/td&gt;
&lt;td&gt;10c CPU / 10c GPU&lt;/td&gt;
&lt;td&gt;24GB&lt;/td&gt;
&lt;td&gt;512GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$999&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.apple.com/shop/buy-mac/mac-mini" rel="noopener noreferrer"&gt;Apple.com only&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M4&lt;/td&gt;
&lt;td&gt;10c CPU / 10c GPU&lt;/td&gt;
&lt;td&gt;32GB&lt;/td&gt;
&lt;td&gt;1TB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$1,199&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.apple.com/shop/buy-mac/mac-mini" rel="noopener noreferrer"&gt;Apple.com only&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M4 Pro&lt;/td&gt;
&lt;td&gt;12c CPU / 16c GPU&lt;/td&gt;
&lt;td&gt;24GB&lt;/td&gt;
&lt;td&gt;512GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,399&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.amazon.com/Apple-Desktop-Computer-12%E2%80%91core-16%E2%80%91core/dp/B0DLBVHSLD?tag=cybercastle-20" rel="noopener noreferrer"&gt;Buy on Amazon&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M4 Pro&lt;/td&gt;
&lt;td&gt;14c CPU / 20c GPU&lt;/td&gt;
&lt;td&gt;48GB&lt;/td&gt;
&lt;td&gt;1TB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$1,999&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.amazon.com/Apple-Desktop-Computer-14%E2%80%91core-Ethernet/dp/B0DS2XP86K?tag=cybercastle-20" rel="noopener noreferrer"&gt;Buy on Amazon&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M4 Pro&lt;/td&gt;
&lt;td&gt;14c CPU / 20c GPU&lt;/td&gt;
&lt;td&gt;64GB&lt;/td&gt;
&lt;td&gt;1TB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$2,399&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.apple.com/shop/buy-mac/mac-mini" rel="noopener noreferrer"&gt;Apple.com only&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The M4 tops out at 32GB. If you need 48GB or 64GB, you must go M4 Pro — which also roughly doubles memory bandwidth (~273 GB/s vs ~120 GB/s on the base M4) for faster token generation. Some configurations (24GB M4, 32GB M4, 64GB M4 Pro) are build-to-order and only available through &lt;a href="https://www.apple.com/shop/buy-mac/mac-mini" rel="noopener noreferrer"&gt;Apple.com&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Used vs New Price Comparison
&lt;/h2&gt;

&lt;p&gt;Used prices are based on Swappa, eBay, and Back Market listings as of February 2026. Facebook Marketplace prices tend to run ~10% lower but carry more risk (no buyer protection, harder to verify condition).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model (Year)&lt;/th&gt;
&lt;th&gt;Chip&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Original MSRP&lt;/th&gt;
&lt;th&gt;Used Price (Feb 2026)&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2020)&lt;/td&gt;
&lt;td&gt;M1&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;td&gt;$699&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$275–$290&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~60% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2020)&lt;/td&gt;
&lt;td&gt;M1&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;$899&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$350–$400&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~58% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2023)&lt;/td&gt;
&lt;td&gt;M2&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;td&gt;$599&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$300–$350&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~45% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2023)&lt;/td&gt;
&lt;td&gt;M2&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;$799&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$450–$500&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~40% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2023)&lt;/td&gt;
&lt;td&gt;M2 Pro 10c&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;$1,299&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$650–$750&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~45% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2023)&lt;/td&gt;
&lt;td&gt;M2 Pro 12c&lt;/td&gt;
&lt;td&gt;32GB&lt;/td&gt;
&lt;td&gt;$1,599&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$825–$900&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~45% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2024)&lt;/td&gt;
&lt;td&gt;M4&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;$599&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$475–$525&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~16% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2024)&lt;/td&gt;
&lt;td&gt;M4&lt;/td&gt;
&lt;td&gt;24GB&lt;/td&gt;
&lt;td&gt;$999&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$800–$875&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~15% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini (2024)&lt;/td&gt;
&lt;td&gt;M4 Pro&lt;/td&gt;
&lt;td&gt;24GB&lt;/td&gt;
&lt;td&gt;$1,399&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,100–$1,250&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~15% off&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The biggest value drops are on M1 and M2 models — you're getting 45–60% off original price. M4 models haven't depreciated much yet since they're less than two years old.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tips for Buying Used
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Swappa&lt;/strong&gt; and &lt;strong&gt;Back Market&lt;/strong&gt; offer buyer protection and verified listings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Facebook Marketplace&lt;/strong&gt; is cheapest but verify the serial number on &lt;a href="https://checkcoverage.apple.com/" rel="noopener noreferrer"&gt;Apple's Check Coverage page&lt;/a&gt; before buying&lt;/li&gt;
&lt;li&gt;Always test that the Mac boots and check &lt;strong&gt;About This Mac&lt;/strong&gt; to confirm the RAM and storage match the listing&lt;/li&gt;
&lt;li&gt;Avoid any listing that won't let you verify specs in person&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Can You Run? LLM Models by RAM Tier
&lt;/h2&gt;

&lt;p&gt;macOS reserves ~4GB for system processes, so your actual available model space is RAM minus ~4GB. Here's what fits at each tier:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Available for Models&lt;/th&gt;
&lt;th&gt;What You Can Run&lt;/th&gt;
&lt;th&gt;Example Models&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~4GB&lt;/td&gt;
&lt;td&gt;Tiny models only — good for experimenting&lt;/td&gt;
&lt;td&gt;Phi-3 Mini, Gemma 2B, TinyLlama 1.1B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;16GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~12GB&lt;/td&gt;
&lt;td&gt;Small to medium models — solid for coding assistants&lt;/td&gt;
&lt;td&gt;Llama 3.1 8B (Q4), Mistral 7B, Qwen2 7B, CodeLlama 7B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;24GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~20GB&lt;/td&gt;
&lt;td&gt;Medium models comfortably — great all-rounder&lt;/td&gt;
&lt;td&gt;Llama 3.1 8B (FP16), Codestral 22B (Q4), Mixtral 8x7B (Q4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;32GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~28GB&lt;/td&gt;
&lt;td&gt;Large quantized models — serious local AI&lt;/td&gt;
&lt;td&gt;Llama 3.1 70B (Q2), Qwen2 32B (Q4), DeepSeek-V2 Lite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;48GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~44GB&lt;/td&gt;
&lt;td&gt;70B models at good quality — the sweet spot&lt;/td&gt;
&lt;td&gt;Llama 3.1 70B (Q4), DeepSeek-Coder 33B (FP16), Mixtral 8x22B (Q2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;64GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~60GB&lt;/td&gt;
&lt;td&gt;70B+ at high quality — near-cloud performance&lt;/td&gt;
&lt;td&gt;Llama 3.1 70B (Q6/Q8), Qwen2 72B (Q4), DeepSeek-V3 (quantized)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Quick rule of thumb:&lt;/strong&gt; model size in GB ≈ RAM needed. A 14B parameter model at Q4 quantization needs ~8GB. A 70B model at Q4 needs ~40GB.&lt;/p&gt;
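&lt;p&gt;That rule of thumb is easy to turn into a quick calculator. The helpers below are a sketch: they assume ~4.5 effective bits per weight for Q4 quantization and the ~4GB macOS reserve described above:&lt;/p&gt;

```shell
# RAM (GB) a model roughly needs: params_in_billions * bits_per_weight / 8
est_ram_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Model headroom for a machine, given installed RAM in bytes (subtracts the
# ~4GB macOS reserve); on a Mac, pass "$(sysctl -n hw.memsize)"
headroom_gb() {
  awk -v t="$1" 'BEGIN { print int(t / 1073741824) - 4 }'
}

est_ram_gb 70 4.5         # 70B at Q4: 39.4 (the ~40GB from the rule of thumb)
est_ram_gb 14 4.5         # 14B at Q4: 7.9  (the ~8GB from the rule of thumb)
headroom_gb 51539607552   # 48GB machine: 44
```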

&lt;h3&gt;
  
  
  What the Quantization Levels Mean
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Q2/Q3&lt;/strong&gt; — Heavy compression. Noticeable quality loss but fits larger models in less RAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Q4&lt;/strong&gt; — The sweet spot. Minor quality trade-off, significant memory savings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Q6/Q8&lt;/strong&gt; — Near full quality. Needs more RAM but output is close to the original model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FP16&lt;/strong&gt; — Full precision. Best quality, largest memory footprint&lt;/li&gt;
&lt;/ul&gt;
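&lt;p&gt;To make those levels concrete, here is what each one costs in memory for a 70B model. The bits-per-weight figures are rough community approximations, not exact values:&lt;/p&gt;

```shell
# Approximate in-RAM size of a 70B model at each quantization level
for q in "Q2:2.6" "Q4:4.5" "Q6:6.6" "FP16:16"; do
  awk -v q="$q" 'BEGIN {
    split(q, a, ":")
    printf "%-5s ~%.0f GB\n", a[1], 70 * a[2] / 8
  }'
done
```

&lt;p&gt;Note how FP16 puts a 70B model at ~140GB, beyond any Mac Mini configuration, while Q4 brings it within reach of the 48GB tier.&lt;/p&gt;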

&lt;h2&gt;
  
  
  Recommendations by Budget
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Under $400: M1 16GB (Used) — ~$375
&lt;/h3&gt;

&lt;p&gt;The cheapest way to get into local LLMs. Runs 7B models fine for experimentation, coding assistance with smaller models, and RAG pipelines. The M1's memory bandwidth is lower (~68 GB/s) so token generation is slower, but the models load and run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Learning, experimenting, lightweight coding assistants&lt;/p&gt;

&lt;p&gt;Check &lt;a href="https://swappa.com/guide/mac-mini-2020/prices" rel="noopener noreferrer"&gt;Swappa&lt;/a&gt; or &lt;a href="https://www.ebay.com/b/Apple-Mac-mini-Desktops/111418/bn_652185" rel="noopener noreferrer"&gt;eBay&lt;/a&gt; for used M1 Mac Mini listings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Under $900: M2 Pro 32GB (Used) — ~$850
&lt;/h3&gt;

&lt;p&gt;The best value play for serious local LLM use. 32GB lets you run models that a 16GB machine simply cannot load. You can squeeze a 70B model at aggressive quantization, or run 14B–32B models comfortably at Q4.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Running production-grade coding assistants, medium-size open models, multiple smaller models simultaneously&lt;/p&gt;

&lt;h3&gt;
  
  
  $999 New: M4 24GB
&lt;/h3&gt;

&lt;p&gt;If you want new with warranty, this is the entry point. 24GB handles most practical models (7B–22B) with room for the OS. The M4's improved memory bandwidth over M1/M2 means faster token generation at every model size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Daily driver that handles most local AI tasks, future-proofed with latest chip&lt;/p&gt;

&lt;p&gt;The M4 24GB configuration is a build-to-order option — &lt;a href="https://www.apple.com/shop/buy-mac/mac-mini" rel="noopener noreferrer"&gt;configure it on Apple.com&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ~$2,000 New: M4 Pro 48GB — The LLM Sweet Spot
&lt;/h3&gt;

&lt;p&gt;This is the configuration most local LLM enthusiasts recommend. 48GB of unified memory lets you run 70B quantized models comfortably. The M4 Pro's ~273 GB/s memory bandwidth means you're getting fast token generation — not just loading models, but getting usable response speeds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Running Llama 3.1 70B, DeepSeek V3, and other frontier open models locally. Serious AI development, fine-tuning experiments, running multiple models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.amazon.com/Apple-Desktop-Computer-14%E2%80%91core-Ethernet/dp/B0DS2XP86K?tag=cybercastle-20" rel="noopener noreferrer"&gt;Buy M4 Pro 48GB Mac Mini on Amazon&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ~$2,400+ New: M4 Pro 64GB — Maximum Local AI
&lt;/h3&gt;

&lt;p&gt;For running 70B+ models at higher quantization levels (Q6/Q8) where output quality approaches the cloud-hosted version. Also useful if you want to run multiple models simultaneously or keep a large model loaded while doing other memory-intensive work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Maximum model quality, running multiple models, professional AI research&lt;/p&gt;

&lt;p&gt;The 64GB configuration is build-to-order — &lt;a href="https://www.apple.com/shop/buy-mac/mac-mini" rel="noopener noreferrer"&gt;configure it on Apple.com&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running OpenClaw on a Mac Mini
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an open-source AI agent (68k+ GitHub stars) that turns your Mac Mini into a personal AI assistant you can message from WhatsApp, Telegram, Slack, Discord, Signal, or iMessage. Unlike simple chatbot wrappers, OpenClaw can actually &lt;em&gt;do things&lt;/em&gt; on your machine — browse the web, manage files, run shell commands, execute scheduled tasks, and interact with 100+ skill plugins.&lt;/p&gt;

&lt;p&gt;The Mac Mini has become the go-to hardware for self-hosting OpenClaw because it's small, silent, power-efficient, and can run 24/7 in a closet. Combined with local models via Ollama, you get a fully private AI assistant with zero ongoing API costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Important: Model Provider Terms of Service
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Be careful which cloud models you use with OpenClaw.&lt;/strong&gt; As of early 2026, both Anthropic (Claude) and Google (Gemini) prohibit using their APIs with OpenClaw under their terms of service. Users have reported getting their API keys banned for doing so. OpenAI's policies are more permissive, but always check the current terms before connecting any cloud provider.&lt;/p&gt;

&lt;p&gt;This is a major reason why the local model route is so appealing for OpenClaw — you own the hardware, you own the model weights, and there are no terms of service to violate. If you plan to use OpenClaw exclusively with local models, the hardware requirements below are what matter. If you use a cloud provider whose terms allow it, &lt;strong&gt;you don't need powerful hardware at all&lt;/strong&gt; — even the base $599 Mac Mini with 16GB will work fine, since the inference happens on the provider's servers and your Mac Mini just runs the lightweight OpenClaw gateway.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes OpenClaw Different
&lt;/h3&gt;

&lt;p&gt;OpenClaw isn't a coding assistant like Claude Code or Cursor — it's a &lt;strong&gt;general-purpose life agent&lt;/strong&gt;. You message it like a coworker:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Summarize my inbox and draft replies"&lt;/li&gt;
&lt;li&gt;"Monitor this GitHub repo and notify me of new issues"&lt;/li&gt;
&lt;li&gt;"Scrape these 50 URLs and put the data in a spreadsheet"&lt;/li&gt;
&lt;li&gt;"Remind me to review PRs every morning at 9am"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It connects to your messaging apps as the interface and uses local (or cloud) LLMs as the brain. The &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;skills system&lt;/a&gt; lets you control exactly what the agent can and can't do on your machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenClaw Hardware Requirements (Local Models)
&lt;/h3&gt;

&lt;p&gt;The hardware requirements below only apply if you're running local models. If you're using a permitted cloud API, OpenClaw itself is lightweight and runs on anything.&lt;/p&gt;

&lt;p&gt;For local inference, OpenClaw is more demanding than running a single model in Ollama because the agent needs a &lt;strong&gt;large context window&lt;/strong&gt; (minimum 64K tokens) to handle multi-step tasks reliably. That context window eats into your available RAM on top of the model weights.&lt;/p&gt;
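&lt;p&gt;The context window overhead comes from the KV cache, whose size is roughly 2 (keys and values) × layers × KV heads × head dimension × bytes per value × context tokens. The architecture numbers below are hypothetical, chosen only to show the scale of the cost at a 64K-token context:&lt;/p&gt;

```shell
# KV-cache size for a hypothetical GQA model: 48 layers, 4 KV heads,
# head_dim 128, fp16 values (2 bytes), 64K-token context
awk 'BEGIN {
  layers = 48; kv_heads = 4; head_dim = 128; bytes = 2; ctx = 65536
  printf "~%.1f GB\n", 2 * layers * kv_heads * head_dim * bytes * ctx / 1073741824
}'
```

&lt;p&gt;That is several gigabytes on top of the model weights themselves, which is why agent workloads need more RAM than the weight size alone would suggest.&lt;/p&gt;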

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mac Mini Config&lt;/th&gt;
&lt;th&gt;What You Can Run with OpenClaw&lt;/th&gt;
&lt;th&gt;Experience&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;16GB (M4)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GLM-4.7-Flash (9B) with tight context&lt;/td&gt;
&lt;td&gt;Functional but constrained — simple tasks only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;24GB (M4)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Devstral-24B (Q4) or GLM-4.7-Flash with comfortable context&lt;/td&gt;
&lt;td&gt;Good for single-model agent tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;32GB (M2 Pro / M4)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen3-Coder-32B (Q4) or Devstral-24B with full 64K context&lt;/td&gt;
&lt;td&gt;Solid — handles most agent workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;48GB (M4 Pro)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen3-Coder-32B with room for large context + OS overhead&lt;/td&gt;
&lt;td&gt;Great — reliable multi-step tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;64GB (M4 Pro)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dual model setup: Qwen3-Coder-32B primary + GLM-4.7-Flash fallback&lt;/td&gt;
&lt;td&gt;Best — "zero cloud" configuration, full local autonomy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Recommended Models for OpenClaw
&lt;/h3&gt;

&lt;p&gt;OpenClaw requires models with strong &lt;strong&gt;tool-calling&lt;/strong&gt; support and at least &lt;strong&gt;64K context&lt;/strong&gt;. Not every model works well — the agent needs to reliably call functions, not just generate text. The community-tested picks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GLM-4.7-Flash&lt;/strong&gt; (9B active params, 128K context) — Best lightweight option. Excellent tool-calling, runs on 16GB+. Good as a fallback model in dual setups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen3-Coder-32B&lt;/strong&gt; (32B params, 256K context) — Community consensus pick for coding tasks. Extremely stable tool calling. Needs ~20GB at Q4 plus 4–6GB for KV cache. Requires 32GB+ hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Devstral-24B&lt;/strong&gt; (24B params) — Strong coding model that fits in ~14GB at Q4. Good middle ground between GLM-4.7-Flash and Qwen3-Coder.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MiniMax M2.1&lt;/strong&gt; (via LM Studio) — The &lt;a href="https://docs.openclaw.ai/gateway/local-models" rel="noopener noreferrer"&gt;official docs&lt;/a&gt; recommend this as the best current local stack with 196K context.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Setup: OpenClaw + Ollama on Mac Mini
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama (if not already installed)&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Pull a recommended model&lt;/span&gt;
ollama pull qwen3-coder:32b

&lt;span class="c"&gt;# Install OpenClaw&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest

&lt;span class="c"&gt;# Run the onboarding wizard&lt;/span&gt;
openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The onboarding wizard walks you through connecting a messaging channel (Telegram is easiest — create a bot via &lt;a href="https://t.me/BotFather" rel="noopener noreferrer"&gt;@BotFather&lt;/a&gt;), pointing OpenClaw at your Ollama instance (&lt;code&gt;http://localhost:11434/v1&lt;/code&gt;), and configuring skills.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local vs Cloud: The Cost and Capability Trade-Off
&lt;/h3&gt;

&lt;p&gt;Running OpenClaw with cloud API models costs roughly $30–$100/month depending on usage, but requires almost no local hardware — the base Mac Mini works fine. Running fully local has a one-time hardware cost and ~$3/month in electricity, but requires a significant RAM investment for good model quality.&lt;/p&gt;

&lt;p&gt;Local models have gotten dramatically better in 2025–2026, but cloud models still have an edge for complex multi-step reasoning. OpenClaw supports a &lt;strong&gt;hybrid setup&lt;/strong&gt; — local models for routine tasks with a cloud model fallback for harder queries via &lt;code&gt;models.mode: "merge"&lt;/code&gt; in the config. Just make sure any cloud provider you connect is one whose terms of service explicitly allow third-party agent use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Buy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  New
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Retailer&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.amazon.com/mac-mini-m4/s?k=mac+mini+m4&amp;amp;tag=cybercastle-20" rel="noopener noreferrer"&gt;Amazon&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Frequently $50–$100 below MSRP, Prime shipping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.apple.com/shop/buy-mac/mac-mini" rel="noopener noreferrer"&gt;Apple Store&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Full BTO customization (only place for some configs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.bhphotovideo.com" rel="noopener noreferrer"&gt;B&amp;amp;H Photo&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Competitive pricing; Payboo card credits back sales tax&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.microcenter.com" rel="noopener noreferrer"&gt;Micro Center&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;In-store deals, sometimes lowest prices&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Used / Refurbished
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Retailer&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.apple.com/shop/refurbished/mac/mac-mini" rel="noopener noreferrer"&gt;Apple Refurbished&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1-year warranty, tested by Apple, 15% off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://swappa.com/catalog/brand/apple?type=mini-pc" rel="noopener noreferrer"&gt;Swappa&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Verified listings, buyer protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.backmarket.com/en-us/l/mac-minis/92b43796-7bed-418b-b55b-07126ecba5fa" rel="noopener noreferrer"&gt;Back Market&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Graded condition, 1-year warranty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Facebook Marketplace&lt;/td&gt;
&lt;td&gt;Cheapest prices but no buyer protection — inspect in person&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.ebay.com/b/Apple-Mac-mini-Desktops/111418/bn_652185" rel="noopener noreferrer"&gt;eBay&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Wide selection, eBay buyer protection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Software Setup
&lt;/h2&gt;

&lt;p&gt;Once you have your Mac Mini, getting local LLMs running takes about 5 minutes:&lt;/p&gt;

&lt;h3&gt;
  
  
  Ollama (Recommended)
&lt;/h3&gt;

&lt;p&gt;The simplest way to run local models. One binary, no dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Start the server&lt;/span&gt;
ollama serve

&lt;span class="c"&gt;# Pull and run a model&lt;/span&gt;
ollama pull llama3.1:8b
ollama run llama3.1:8b

&lt;span class="c"&gt;# For 70B (needs 48GB+ RAM)&lt;/span&gt;
ollama pull llama3.1:70b
ollama run llama3.1:70b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  LM Studio
&lt;/h3&gt;

&lt;p&gt;GUI application with a model browser, chat interface, and local API server. Great if you prefer a visual interface.&lt;/p&gt;

&lt;p&gt;Download from &lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;lmstudio.ai&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exo
&lt;/h3&gt;

&lt;p&gt;Cluster multiple Macs together for running models that exceed a single machine's RAM. If you have two 32GB Mac Minis, you can run a 70B model across both.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;exo
exo run llama-3.1-70b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;The bottom line:&lt;/strong&gt; For local LLM inference and tools like OpenClaw, buy the most RAM you can afford. The M4 Pro 48GB at ~$2,000 is the sweet spot for running serious models and a reliable AI agent. If budget is tight, a used M2 Pro 32GB at ~$850 gets you surprisingly far. And if you just want to experiment, a used M1 16GB for ~$375 is the cheapest entry point that's actually usable.&lt;/p&gt;

&lt;p&gt;RAM determines what you can run. Everything else determines how fast it runs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.starmorph.com/blog/best-mac-mini-for-local-llms" rel="noopener noreferrer"&gt;StarBlog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>macmini</category>
      <category>llm</category>
      <category>localai</category>
      <category>applesilicon</category>
    </item>
    <item>
      <title>Pixelmuse CLI Guide: AI Image Generation From Your Terminal</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Fri, 20 Mar 2026 01:11:14 +0000</pubDate>
      <link>https://dev.to/starmorph/pixelmuse-cli-guide-ai-image-generation-from-your-terminal-3dn2</link>
      <guid>https://dev.to/starmorph/pixelmuse-cli-guide-ai-image-generation-from-your-terminal-3dn2</guid>
      <description>&lt;p&gt;If you're a developer who lives in the terminal, you've probably hit this problem: you need an image for a blog post, a social card, or a project thumbnail — and suddenly you're context-switching to a browser, logging into some image generator, waiting for a result, downloading it, and dragging it into your project. That entire flow breaks your focus.&lt;/p&gt;

&lt;p&gt;Pixelmuse CLI lets you generate AI images without leaving the terminal. One command, a prompt, and your image is saved to disk — ready to use. It also ships with an interactive TUI, prompt templates, and an MCP server so AI coding agents like Claude Code can generate images autonomously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pixelmuse.studio/sign-up" rel="noopener noreferrer"&gt;Sign up for Pixelmuse — 15 free credits to start generating.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Pixelmuse CLI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; Node.js 20+ and a package manager (pnpm, npm, or yarn).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install globally&lt;/span&gt;
pnpm add &lt;span class="nt"&gt;-g&lt;/span&gt; pixelmuse

&lt;span class="c"&gt;# Or with npm&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; pixelmuse

&lt;span class="c"&gt;# Verify installation&lt;/span&gt;
pixelmuse &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Optional:&lt;/strong&gt; Install &lt;a href="https://github.com/hpjansson/chafa" rel="noopener noreferrer"&gt;chafa&lt;/a&gt; for terminal image previews:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;chafa

&lt;span class="c"&gt;# Ubuntu/Debian&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;chafa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With chafa installed, Pixelmuse automatically renders a preview of your generated image right in the terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create Your Account
&lt;/h2&gt;

&lt;p&gt;You need a Pixelmuse account to generate images. Every new account gets &lt;strong&gt;15 free credits&lt;/strong&gt; — enough for 15 generations with the default model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Sign up in the browser&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to &lt;a href="https://pixelmuse.studio/sign-up" rel="noopener noreferrer"&gt;pixelmuse.studio/sign-up&lt;/a&gt; and create an account with email or GitHub.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Sign up from the CLI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the setup wizard — it opens the signup page automatically if you don't have an account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The setup wizard walks you through account creation, authentication, MCP configuration, and default settings in one flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Authenticate
&lt;/h2&gt;

&lt;p&gt;Pixelmuse CLI supports two authentication methods:&lt;/p&gt;

&lt;h3&gt;
  
  
  Device Code Login (Recommended)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens your browser to verify your device. Enter the code shown in your terminal, approve access, and you're authenticated. The API key is stored securely in your OS keychain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Manual API Key
&lt;/h3&gt;

&lt;p&gt;If you prefer, generate an API key at &lt;a href="https://pixelmuse.studio/settings/api-keys" rel="noopener noreferrer"&gt;pixelmuse.studio/settings/api-keys&lt;/a&gt; and either:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set as environment variable (add to ~/.zshrc for persistence)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PIXELMUSE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"pm_live_your_key_here"&lt;/span&gt;

&lt;span class="c"&gt;# Or enter manually during login&lt;/span&gt;
pixelmuse login
&lt;span class="c"&gt;# Select "Enter API key manually" when prompted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key resolution order:&lt;/strong&gt; Environment variable → OS Keychain → Config file (&lt;code&gt;~/.config/pixelmuse-cli/auth.json&lt;/code&gt;).&lt;/p&gt;
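
&lt;p&gt;That precedence can be sketched as a small shell function. This is illustrative only, not the CLI's actual code — the real lookup also checks the OS keychain between the two steps shown here:&lt;/p&gt;

```shell
# Sketch only: the real CLI also checks the OS keychain between
# these two steps; this just illustrates the documented precedence.
resolve_key() {
  if [ -n "${PIXELMUSE_API_KEY:-}" ]; then
    printf 'env\n'
  elif [ -f "$HOME/.config/pixelmuse-cli/auth.json" ]; then
    printf 'config\n'
  else
    printf 'none\n'
  fi
}

( PIXELMUSE_API_KEY="pm_live_example"; resolve_key )   # prints env
```

The practical upshot: an exported `PIXELMUSE_API_KEY` always wins, which is why it is the right mechanism for CI/CD overrides.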

&lt;h2&gt;
  
  
  Generate Your First Image
&lt;/h2&gt;

&lt;p&gt;The simplest generation — just a prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse &lt;span class="s2"&gt;"a cat floating through space"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Pixelmuse uses the default model (&lt;code&gt;nano-banana-2&lt;/code&gt;, 1 credit), generates the image, saves it to your current directory, and shows a terminal preview.&lt;/p&gt;

&lt;p&gt;The output file is named from your prompt: &lt;code&gt;a-cat-floating-through-space.png&lt;/code&gt;.&lt;/p&gt;
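
&lt;p&gt;If you need to predict that filename in a script, the naming rule inferred from the example above is spaces-to-hyphens (the exact slug logic is an assumption, not documented behavior):&lt;/p&gt;

```shell
# Assumed slug rule, inferred from the example filename above:
# spaces become hyphens (and uppercase would be lowercased).
prompt="a cat floating through space"
slug=$(printf '%s' "$prompt" | tr 'A-Z ' 'a-z-')
printf '%s.png\n' "$slug"   # prints a-cat-floating-through-space.png
```

When the filename matters, pass `-o` instead of relying on auto-naming.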

&lt;h3&gt;
  
  
  With Options
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Widescreen blog thumbnail&lt;/span&gt;
pixelmuse &lt;span class="s2"&gt;"neon cityscape at night"&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; 16:9

&lt;span class="c"&gt;# Specific model and output path&lt;/span&gt;
pixelmuse &lt;span class="s2"&gt;"watercolor mountain landscape"&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; imagen-3 &lt;span class="nt"&gt;-o&lt;/span&gt; hero.png

&lt;span class="c"&gt;# Anime style&lt;/span&gt;
pixelmuse &lt;span class="s2"&gt;"samurai standing in rain"&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; anime &lt;span class="nt"&gt;-a&lt;/span&gt; 2:3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CLI Flags and Options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Flag&lt;/th&gt;
&lt;th&gt;Short&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-m&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;nano-banana-2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model to use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--aspect-ratio&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-a&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1:1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Image dimensions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--style&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-s&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;none&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Style preset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--output&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-o&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Auto-named&lt;/td&gt;
&lt;td&gt;Output file path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Machine-readable JSON output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--no-preview&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Skip terminal preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--open&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open in system image viewer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--clipboard&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copy image to clipboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--watch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Watch a prompt file, regenerate on save&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--no-save&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Don't save to disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--public&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Make image publicly visible&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Available Models
&lt;/h2&gt;

&lt;p&gt;Pixelmuse ships with 6 models at different price points:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Credits&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;nano-banana-2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Speed, text rendering, world knowledge (default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;flux-schnell&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Quick mockups and ideation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;imagen-3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Photorealistic images, complex compositions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;recraft-v4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Typography, graphic design, composition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nano-banana-pro&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Advanced text rendering, multi-image editing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;recraft-v4-pro&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;High-resolution design, art direction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;List models from the CLI anytime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start with &lt;code&gt;nano-banana-2&lt;/code&gt; — it's 1 credit, fast, and handles most use cases. Move to specialized models when you need specific strengths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aspect Ratios
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;1:1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Social media posts, avatars (default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;16:9&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Blog thumbnails, YouTube thumbnails, OG images&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;9:16&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Phone wallpapers, Instagram stories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;4:3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Presentations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;2:3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Portraits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;21:9&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ultrawide banners&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Blog thumbnail&lt;/span&gt;
pixelmuse &lt;span class="s2"&gt;"your prompt"&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; 16:9

&lt;span class="c"&gt;# Instagram story&lt;/span&gt;
pixelmuse &lt;span class="s2"&gt;"your prompt"&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; 9:16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Check Your Account and History
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# View credit balance and plan info&lt;/span&gt;
pixelmuse account

&lt;span class="c"&gt;# See your last 20 generations&lt;/span&gt;
pixelmuse &lt;span class="nb"&gt;history&lt;/span&gt;

&lt;span class="c"&gt;# Open a specific generation in your image viewer&lt;/span&gt;
pixelmuse open &amp;lt;generation-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Prompt Templates
&lt;/h2&gt;

&lt;p&gt;Templates let you save reusable prompt configurations — prompt text, model, aspect ratio, and variables — as YAML files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a Template
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse template init blog-thumbnail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates &lt;code&gt;~/.config/pixelmuse-cli/prompts/blog-thumbnail.yaml&lt;/code&gt;. Edit it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Blog Thumbnail&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dark-themed blog post thumbnail&lt;/span&gt;
&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="s"&gt;A cinematic {{subject}} on a dark gradient background,&lt;/span&gt;
  &lt;span class="s"&gt;dramatic lighting, 8K resolution&lt;/span&gt;
&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nano-banana-2&lt;/span&gt;
  &lt;span class="na"&gt;aspect_ratio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;16:9'&lt;/span&gt;
  &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;none&lt;/span&gt;
&lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;editor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;syntax&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;highlighting'&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;blog&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;thumbnail&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;dark&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use a Template
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate with default variable values&lt;/span&gt;
pixelmuse template use blog-thumbnail

&lt;span class="c"&gt;# Override variables&lt;/span&gt;
pixelmuse template use blog-thumbnail &lt;span class="nt"&gt;--var&lt;/span&gt; &lt;span class="nv"&gt;subject&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"React hooks diagram"&lt;/span&gt;

&lt;span class="c"&gt;# List all templates&lt;/span&gt;
pixelmuse template list

&lt;span class="c"&gt;# View template details&lt;/span&gt;
pixelmuse template show blog-thumbnail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Templates are powerful for batch content workflows — define your brand's image style once, then generate consistent visuals with one command.&lt;/p&gt;
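
&lt;p&gt;For example, a batch run over several subjects can be sketched as a dry-run loop, using only the &lt;code&gt;template use&lt;/code&gt; and &lt;code&gt;--var&lt;/code&gt; flags shown above:&lt;/p&gt;

```shell
#!/bin/sh
# Dry-run sketch: print one invocation per subject using the
# `template use` and `--var` flags documented above.
# Remove the leading `echo` to actually generate.
for subject in "React hooks diagram" "Docker networking" "Git rebase flow"; do
  echo pixelmuse template use blog-thumbnail --var "subject=$subject"
done
```

Each iteration reuses the template's model, aspect ratio, and prompt scaffold, so the only thing that varies across the batch is the subject.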

&lt;h2&gt;
  
  
  Interactive TUI
&lt;/h2&gt;

&lt;p&gt;For a more visual experience, launch the interactive terminal UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse ui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The TUI gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Generation wizard&lt;/strong&gt; — step-by-step image generation with model selection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gallery&lt;/strong&gt; — browse all your past generations with previews&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model browser&lt;/strong&gt; — compare models side by side&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account management&lt;/strong&gt; — check credits, view usage stats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt editor&lt;/strong&gt; — create and manage templates visually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key bindings:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Arrow keys&lt;/td&gt;
&lt;td&gt;Navigate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Enter&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Select&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Esc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go back&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;q&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  MCP Server Setup
&lt;/h2&gt;

&lt;p&gt;The MCP (Model Context Protocol) server lets AI coding agents generate images autonomously. When you configure it, tools like Claude Code, Cursor, and Windsurf can call Pixelmuse directly during a conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get Your API Key
&lt;/h3&gt;

&lt;p&gt;Go to &lt;a href="https://pixelmuse.studio/settings/api-keys" rel="noopener noreferrer"&gt;pixelmuse.studio/settings/api-keys&lt;/a&gt; and copy your key.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;

&lt;p&gt;Add to &lt;code&gt;~/.claude/mcp.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pixelmuse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pixelmuse-mcp"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"PIXELMUSE_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pm_live_your_key_here"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cursor
&lt;/h3&gt;

&lt;p&gt;Add to your Cursor MCP settings (Settings → MCP):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pixelmuse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pixelmuse-mcp"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"PIXELMUSE_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pm_live_your_key_here"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Windsurf
&lt;/h3&gt;

&lt;p&gt;Same configuration as Cursor — add to your Windsurf MCP settings file.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the MCP Server Provides
&lt;/h3&gt;

&lt;p&gt;Three tools become available to your AI agent:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;generate_image&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generate an image with prompt, model, aspect ratio, style&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_models&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List available models and credit costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;check_balance&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check account credit balance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once configured, you can ask Claude Code things like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Generate a 16:9 blog thumbnail showing a developer typing in a dark terminal"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And it will call Pixelmuse directly, save the image, and continue working — no context switch needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Auto-Configure via Setup
&lt;/h3&gt;

&lt;p&gt;The setup wizard can detect and configure MCP for your editors automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It checks for Claude Code, Cursor, and Windsurf and offers to add the MCP configuration for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code Skill
&lt;/h2&gt;

&lt;p&gt;If you use Claude Code, you can add a Pixelmuse skill that lets you generate images mid-conversation with natural language.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;~/.claude/skills/pixelmuse-generate/skill.md&lt;/code&gt; with the trigger phrases and instructions for Claude Code to call the Pixelmuse CLI. The skill enables prompts like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Generate a thumbnail for this blog post"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And Claude Code will run the appropriate &lt;code&gt;pixelmuse&lt;/code&gt; command based on your context.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/starmorph/pixelmuse-cli" rel="noopener noreferrer"&gt;Pixelmuse CLI README&lt;/a&gt; includes a ready-to-use skill template.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Usage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Piping Prompts
&lt;/h3&gt;

&lt;p&gt;Read prompts from stdin — useful for scripting and chaining commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From echo&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"mountain landscape at golden hour"&lt;/span&gt; | pixelmuse &lt;span class="nt"&gt;-o&lt;/span&gt; landscape.png

&lt;span class="c"&gt;# From a file&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;prompt.txt | pixelmuse &lt;span class="nt"&gt;-m&lt;/span&gt; imagen-3

&lt;span class="c"&gt;# From another command&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://api.example.com/prompt | pixelmuse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Watch Mode
&lt;/h3&gt;

&lt;p&gt;Auto-regenerate when a prompt file changes — great for iterating on prompts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse &lt;span class="nt"&gt;--watch&lt;/span&gt; prompt.txt &lt;span class="nt"&gt;-o&lt;/span&gt; output.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit &lt;code&gt;prompt.txt&lt;/code&gt; in your editor, save, and the image regenerates automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Output for Scripting
&lt;/h3&gt;

&lt;p&gt;Get machine-readable output for automation pipelines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pixelmuse &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="s2"&gt;"your prompt"&lt;/span&gt; | jq .output_path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Batch Generation with Shell Scripts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;prompts&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"sunset over ocean"&lt;/span&gt; &lt;span class="s2"&gt;"mountain at dawn"&lt;/span&gt; &lt;span class="s2"&gt;"city at night"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;prompt &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;pixelmuse &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; 16:9 &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;.png"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment Variable Auth
&lt;/h3&gt;

&lt;p&gt;For CI/CD or shared machines, set the API key as an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PIXELMUSE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"pm_live_your_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This takes priority over keychain and config file auth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Install&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pnpm add -g pixelmuse&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup wizard&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse setup&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Login&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse login&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generate image&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse "prompt"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generate 16:9&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse "prompt" -a 16:9&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use specific model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse "prompt" -m imagen-3&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Save to path&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse "prompt" -o output.png&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;List models&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse models&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check credits&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse account&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;View history&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse history&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Launch TUI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse ui&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create template&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse template init name&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use template&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse template use name&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Watch mode&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse --watch file.txt&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON output&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse --json "prompt"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;Get started at &lt;a href="https://pixelmuse.studio/sign-up" rel="noopener noreferrer"&gt;pixelmuse.studio/sign-up&lt;/a&gt; — 15 free credits, no credit card required. Full API documentation is at &lt;a href="https://pixelmuse.studio/developers" rel="noopener noreferrer"&gt;pixelmuse.studio/developers&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.starmorph.com/blog/pixelmuse-cli-guide-ai-image-generation-terminal" rel="noopener noreferrer"&gt;StarBlog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>pixelmuse</category>
      <category>cli</category>
      <category>aiimagegeneration</category>
      <category>mcp</category>
    </item>
    <item>
      <title>10 More CLI Tools for AI Coding: Part 2 Terminal Workflow Guide</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Fri, 20 Mar 2026 01:11:11 +0000</pubDate>
      <link>https://dev.to/starmorph/10-more-cli-tools-for-ai-coding-part-2-terminal-workflow-guide-2a1h</link>
      <guid>https://dev.to/starmorph/10-more-cli-tools-for-ai-coding-part-2-terminal-workflow-guide-2a1h</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Part 2 covers 10 more CLI tools: Tmuxinator, Gh CLI, Jq, Httpie, Dust, Procs, Bandwhich, Tokei, Hyperfine, and Glow — plus the best resources for discovering new packages across Homebrew, NPM, crates.io, and GitHub Trending. Install all with: &lt;code&gt;brew install tmuxinator gh jq httpie dust procs bandwhich tokei hyperfine glow&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After the first &lt;a href="https://dev.to/blog/10-cli-tools-for-ai-coding"&gt;10 CLI tools post&lt;/a&gt; blew up, the most common comment was "you need to check out Yazi." The second most common request was for resources to actually &lt;em&gt;discover&lt;/em&gt; new tools. This part 2 covers both — 10 more CLI tools I've added to my workflow, plus the package explorers and curated lists I use to find them.&lt;/p&gt;

&lt;p&gt;This is the companion guide to my &lt;a href="https://www.youtube.com/watch?v=dTcfWvZkaV8" rel="noopener noreferrer"&gt;YouTube video: 10 More CLI Tools (Part 2)&lt;/a&gt;. Every tool below includes installation instructions and the commands to get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Yazi — Terminal File Manager
&lt;/h2&gt;

&lt;p&gt;The most requested tool from part 1's comments — and for good reason. &lt;a href="https://github.com/sxyazi/yazi" rel="noopener noreferrer"&gt;Yazi&lt;/a&gt; is a blazing-fast, async, Rust-powered terminal file manager with image previews, tabs, and a Lua plugin system. It makes Ranger feel slow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;yazi        &lt;span class="c"&gt;# macOS&lt;/span&gt;
cargo &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--locked&lt;/span&gt; yazi-fm yazi-cli  &lt;span class="c"&gt;# via Cargo&lt;/span&gt;

&lt;span class="c"&gt;# Launch&lt;/span&gt;
yazi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key bindings:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;j/k&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Navigate up/down&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;h/l&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Parent directory / Enter directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;G&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Jump to bottom&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;g g&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Jump to top&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;~&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open the help menu&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Toggle hidden files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Shift+O&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reveal in Finder / Open with editor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;y&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yank (copy) the selected files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create new tab&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;1-9&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Switch between tabs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;,&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sort options (alphabetical, size, time)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Yazi handles large directories without freezing because every I/O operation is non-blocking. You can sort by size to find your biggest folders, reverse the order, open files directly, and even copy file paths to paste into other terminal windows.&lt;/p&gt;

&lt;p&gt;I wrote a full deep-dive on Yazi with plugin setup, configuration, and advanced workflows — check out the &lt;a href="https://dev.to/blog/yazi-terminal-file-manager-guide"&gt;complete Yazi guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zoxide Interactive Mode
&lt;/h2&gt;

&lt;p&gt;I covered &lt;a href="https://github.com/ajeetdsouza/zoxide" rel="noopener noreferrer"&gt;Zoxide&lt;/a&gt; in part 1, but missed the best feature: interactive mode. Instead of &lt;code&gt;z projects&lt;/code&gt; (which jumps to the top match), use &lt;code&gt;zi&lt;/code&gt; to get a fuzzy finder with all matching directories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Standard jump (top match wins)&lt;/span&gt;
z pixelmuse

&lt;span class="c"&gt;# Interactive mode — pick from multiple matches&lt;/span&gt;
zi pixelmuse

&lt;span class="c"&gt;# Browse all tracked directories interactively&lt;/span&gt;
zi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a lifesaver when you have similarly named directories. I have both &lt;code&gt;pixelmuse-studio&lt;/code&gt; and &lt;code&gt;pixelmuse-cli&lt;/code&gt; repos — &lt;code&gt;zi pixelmuse&lt;/code&gt; lets me pick which one instead of guessing. Check the &lt;a href="https://dev.to/blog/10-cli-tools-for-ai-coding"&gt;part 1 post&lt;/a&gt; for installation and initial setup.&lt;/p&gt;
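&lt;p&gt;If you want to see what zoxide will match before jumping, you can also query its database directly. A quick sketch using zoxide's standard subcommands (the seeded path is just an example):&lt;/p&gt;

```shell
# Skip gracefully on machines without zoxide installed
command -v zoxide >/dev/null 2>&1 || { echo "zoxide not installed; skipping"; exit 0; }

# Seed the database manually with the current directory
zoxide add "$PWD"

# Print every tracked directory (this is what `zi` fuzzy-finds over)
zoxide query --list
```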

&lt;h2&gt;
  
  
  Tealdeer (tldr)
&lt;/h2&gt;

&lt;p&gt;Man pages are comprehensive but overwhelming. &lt;a href="https://github.com/tealdeer-rs/tealdeer" rel="noopener noreferrer"&gt;Tealdeer&lt;/a&gt; (&lt;code&gt;tldr&lt;/code&gt;) gives you the top 5-10 practical examples for any command — the stuff you actually need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;tealdeer    &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;tealdeer  &lt;span class="c"&gt;# Ubuntu/Debian (may also be `tldr`)&lt;/span&gt;
cargo &lt;span class="nb"&gt;install &lt;/span&gt;tealdeer   &lt;span class="c"&gt;# via Cargo&lt;/span&gt;

&lt;span class="c"&gt;# Update the local page cache (run once after install)&lt;/span&gt;
tldr &lt;span class="nt"&gt;--update&lt;/span&gt;

&lt;span class="c"&gt;# Get quick examples for any command&lt;/span&gt;
tldr &lt;span class="nb"&gt;tar
&lt;/span&gt;tldr ffmpeg
tldr yazi
tldr docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example output for &lt;code&gt;tldr tar&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tar&lt;/span&gt; - Archiving utility

- Create an archive from files:
    &lt;span class="nb"&gt;tar &lt;/span&gt;cf target.tar file1 file2 file3

- Extract an archive &lt;span class="k"&gt;in &lt;/span&gt;the current directory:
    &lt;span class="nb"&gt;tar &lt;/span&gt;xf source.tar

- Create a gzipped archive:
    &lt;span class="nb"&gt;tar &lt;/span&gt;czf target.tar.gz file1 file2

- Extract a gzipped archive to a directory:
    &lt;span class="nb"&gt;tar &lt;/span&gt;xzf source.tar.gz &lt;span class="nt"&gt;-C&lt;/span&gt; directory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare that to &lt;code&gt;man tar&lt;/code&gt; which is hundreds of lines. When you install a new package and want a quick overview of what it can do, &lt;code&gt;tldr&lt;/code&gt; is the first thing to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  bat
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/sharkdp/bat" rel="noopener noreferrer"&gt;bat&lt;/a&gt; is &lt;code&gt;cat&lt;/code&gt; with syntax highlighting, line numbers, and git integration. It's a drop-in replacement that makes reading files in the terminal actually pleasant.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;bat         &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;bat     &lt;span class="c"&gt;# Ubuntu/Debian&lt;/span&gt;

&lt;span class="c"&gt;# View a file (with syntax highlighting)&lt;/span&gt;
bat script.ts
bat README.md
bat config.yaml

&lt;span class="c"&gt;# Use as a pager (scrollable)&lt;/span&gt;
bat &lt;span class="nt"&gt;--paging&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;always long-file.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I alias &lt;code&gt;cat&lt;/code&gt; to &lt;code&gt;bat&lt;/code&gt; in my shell config so every file I read gets automatic formatting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to ~/.zshrc&lt;/span&gt;
&lt;span class="nb"&gt;alias cat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bat"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you &lt;code&gt;cat&lt;/code&gt; a Markdown file, instead of seeing raw &lt;code&gt;#&lt;/code&gt; and &lt;code&gt;**&lt;/code&gt; symbols, you get properly highlighted headings and bold text. Same for TypeScript, Python, YAML — bat detects the language from the file extension and highlights accordingly.&lt;/p&gt;
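&lt;p&gt;When a file has no extension, or you're piping from stdin, you can name the language yourself with &lt;code&gt;-l&lt;/code&gt;. A quick sketch using bat's documented flags:&lt;/p&gt;

```shell
# Skip gracefully if bat isn't installed
command -v bat >/dev/null 2>&1 || { echo "bat not installed; skipping"; exit 0; }

# Highlight stdin as JSON even though there's no filename to sniff
echo '{"name": "demo", "ok": true}' | bat -l json --paging=never

# See every language bat can highlight
bat --list-languages
```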

&lt;h2&gt;
  
  
  tmux
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tmux/tmux" rel="noopener noreferrer"&gt;tmux&lt;/a&gt; is the terminal multiplexer — it lets you run persistent, multi-pane terminal sessions that survive disconnects. If you close your laptop and come back, your tmux sessions are still running.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;tmux        &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;tmux    &lt;span class="c"&gt;# Ubuntu/Debian&lt;/span&gt;

&lt;span class="c"&gt;# Start a new session&lt;/span&gt;
tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; work

&lt;span class="c"&gt;# Split panes&lt;/span&gt;
&lt;span class="c"&gt;# Ctrl+b %    → vertical split&lt;/span&gt;
&lt;span class="c"&gt;# Ctrl+b "    → horizontal split&lt;/span&gt;

&lt;span class="c"&gt;# Navigate panes&lt;/span&gt;
&lt;span class="c"&gt;# Ctrl+b ←/→/↑/↓&lt;/span&gt;

&lt;span class="c"&gt;# List sessions&lt;/span&gt;
tmux &lt;span class="nb"&gt;ls&lt;/span&gt;

&lt;span class="c"&gt;# Detach from session (keeps running)&lt;/span&gt;
&lt;span class="c"&gt;# Ctrl+b d&lt;/span&gt;

&lt;span class="c"&gt;# Reattach&lt;/span&gt;
tmux attach &lt;span class="nt"&gt;-t&lt;/span&gt; work
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;tmux is especially powerful with Claude Code — you can have one pane running Claude, another watching logs, and a third monitoring system resources. The session persists even if your SSH connection drops.&lt;/p&gt;
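&lt;p&gt;That three-pane setup can be scripted so it comes up with one command. Here's a sketch using standard tmux subcommands; the session name, log path, and pane commands are placeholders for your own:&lt;/p&gt;

```shell
# Skip gracefully if tmux isn't installed
command -v tmux >/dev/null 2>&1 || { echo "tmux not installed; skipping"; exit 0; }

tmux new-session -d -s ai       # start a detached session named "ai"
tmux split-window -h -t ai      # pane 2: right half
tmux split-window -v -t ai      # pane 3: bottom of the right half

# Seed each pane (assumes the default base-index of 0; commands are examples)
tmux send-keys -t ai:0.0 'claude' C-m
tmux send-keys -t ai:0.1 'tail -f app.log' C-m
tmux send-keys -t ai:0.2 'btop' C-m

# tmux attach -t ai             # attach when you're ready
```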

&lt;p&gt;I have a full tmux guide with configuration, tmuxinator automation, and practical monitoring setups — read the &lt;a href="https://dev.to/blog/tmux-terminal-multiplexer-guide"&gt;complete tmux guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pixelmuse CLI — AI Image Generation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.pixelmuse.studio/developers/cli" rel="noopener noreferrer"&gt;Pixelmuse CLI&lt;/a&gt; brings AI image generation into your terminal. It connects to the Pixelmuse API so you can generate images, blog thumbnails, and creative assets without leaving the command line — and it ships with both a direct CLI and an interactive TUI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; pixelmuse-cli

&lt;span class="c"&gt;# Generate an image&lt;/span&gt;
pixelmuse generate &lt;span class="s2"&gt;"a cyberpunk cityscape at sunset"&lt;/span&gt;

&lt;span class="c"&gt;# Launch the interactive TUI (auth, generate, browse)&lt;/span&gt;
pixelmuse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The real power is pairing it with Claude Code. Point Claude at a blog post or repo and tell it to generate a thumbnail based on the content — it reads the full context, crafts a prompt, and generates the image through the Pixelmuse API. No context-switching to a browser, no copy-pasting prompts.&lt;/p&gt;

&lt;p&gt;I built this as an extension of the &lt;a href="https://www.pixelmuse.studio" rel="noopener noreferrer"&gt;Pixelmuse platform&lt;/a&gt; — I'll be making a dedicated video on building CLIs with React Ink and Claude Code soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mole — Mac Deep Clean
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tw93/mole" rel="noopener noreferrer"&gt;Mole&lt;/a&gt; is a CLI tool for deep cleaning and optimizing your Mac. It analyzes disk usage and checks system health with a clean TUI dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;mole

&lt;span class="c"&gt;# Analyze disk usage&lt;/span&gt;
mole analyze

&lt;span class="c"&gt;# Check system health dashboard&lt;/span&gt;
mole status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The status dashboard shows CPU, memory, disk, battery, and network details in one view. Useful when you want a quick health check beyond what &lt;code&gt;btop&lt;/code&gt; shows — especially for disk usage analysis and cleanup recommendations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Jolt — Battery and Hardware Monitor
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/jordond/jolt" rel="noopener noreferrer"&gt;Jolt&lt;/a&gt; is a hardware-focused system monitor, especially useful for tracking battery health and power consumption on laptops.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;jolt

&lt;span class="c"&gt;# Launch the hardware monitor&lt;/span&gt;
jolt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where btop focuses on CPU and memory processes, Jolt gives you deeper insight into battery cycles, hardware temperatures, and power draw. Nice to have alongside btop if you're running intensive local AI models and want to watch your hardware health.&lt;/p&gt;

&lt;h2&gt;
  
  
  ttyper — Terminal Typing Test
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/max-niederman/ttyper" rel="noopener noreferrer"&gt;ttyper&lt;/a&gt; is a minimalist typing test that runs in your terminal. I use it as a warmup when I start working — a few minutes of focused typing in the actual terminal helps me get into the flow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ttyper      &lt;span class="c"&gt;# macOS&lt;/span&gt;
cargo &lt;span class="nb"&gt;install &lt;/span&gt;ttyper     &lt;span class="c"&gt;# via Cargo&lt;/span&gt;

&lt;span class="c"&gt;# Start a typing test&lt;/span&gt;
ttyper

&lt;span class="c"&gt;# Use specific word count&lt;/span&gt;
ttyper &lt;span class="nt"&gt;-w&lt;/span&gt; 50

&lt;span class="c"&gt;# Use a custom word list&lt;/span&gt;
ttyper &lt;span class="nt"&gt;-c&lt;/span&gt; custom-words.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It tracks WPM, accuracy, and shows real-time feedback on mistakes. Low-stakes, fun, and surprisingly effective for warming up your fingers before a coding session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovering New Tools
&lt;/h2&gt;

&lt;p&gt;The most common question from part 1: "How do you find these tools?" Here are the resources I use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Taproom — Explore Homebrew Packages
&lt;/h3&gt;

&lt;p&gt;I covered &lt;a href="https://dev.to/blog/10-cli-tools-for-ai-coding#5-taproom"&gt;Taproom&lt;/a&gt; in part 1, but didn't show its best exploration feature: sorting by total installs. This shows you the most popular Homebrew packages across the entire ecosystem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;taproom
&lt;span class="c"&gt;# Then sort by "Total Installs" to see what's trending&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how I discovered several tools from part 1 — just browsing the top-installed packages and finding things I hadn't tried.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forage CLI — Explore NPM Packages
&lt;/h3&gt;

&lt;p&gt;I built &lt;a href="https://github.com/starmorph/forage-cli" rel="noopener noreferrer"&gt;Forage CLI&lt;/a&gt; specifically because there wasn't a good TUI for browsing NPM packages. It's like the npmjs.com website but in your terminal — browse categories, read package details, and open the NPM page directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; forage-cli

&lt;span class="c"&gt;# Launch&lt;/span&gt;
forage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Browse different categories, drill into individual packages, and open them on NPM for full documentation. I built this with React Ink and Claude Code — it was my first CLI project.&lt;/p&gt;

&lt;h3&gt;
  
  
  crates-tui — Explore Rust Packages
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ratatui/crates-tui" rel="noopener noreferrer"&gt;crates-tui&lt;/a&gt; is the same concept for Rust crates. Browse, search, and explore the Rust package ecosystem from your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
cargo &lt;span class="nb"&gt;install &lt;/span&gt;crates-tui

&lt;span class="c"&gt;# Launch&lt;/span&gt;
crates-tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ratatui — Curated TUI Ecosystem
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ratatui.rs/" rel="noopener noreferrer"&gt;Ratatui&lt;/a&gt; is both a Rust framework for building TUIs and a community hub. Their &lt;a href="https://github.com/ratatui/awesome-ratatui" rel="noopener noreferrer"&gt;awesome-ratatui&lt;/a&gt; list on GitHub is one of the best curated collections of terminal tools — many of the tools from part 1 came from browsing this list.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Package Manager Landscape
&lt;/h3&gt;

&lt;p&gt;There are several distinct ecosystems to explore:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Ecosystem&lt;/th&gt;
&lt;th&gt;Package Manager&lt;/th&gt;
&lt;th&gt;Explorer Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;macOS/Linux system tools&lt;/td&gt;
&lt;td&gt;Homebrew&lt;/td&gt;
&lt;td&gt;Taproom&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JavaScript/Node.js&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;Forage CLI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;Cargo/crates.io&lt;/td&gt;
&lt;td&gt;crates-tui&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;pip/PyPI&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Linux system packages&lt;/td&gt;
&lt;td&gt;apt&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each ecosystem has different strengths. Homebrew and Cargo tend to have the best CLI/TUI tools. NPM is strong for JavaScript developer tooling. Python's PyPI is best for data science and AI utilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Install&lt;/th&gt;
&lt;th&gt;Launch&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Yazi&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brew install yazi&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;yazi&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Async terminal file manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zoxide (interactive)&lt;/td&gt;
&lt;td&gt;&lt;em&gt;(already installed)&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;zi&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fuzzy directory picker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tealdeer&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brew install tealdeer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tldr &amp;lt;cmd&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Simplified man pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;bat&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brew install bat&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bat file.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;cat with syntax highlighting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tmux&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brew install tmux&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tmux new -s work&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Terminal multiplexer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pixelmuse CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm i -g pixelmuse-cli&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pixelmuse&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AI image generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mole&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brew install mole&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mole status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mac system cleanup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jolt&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brew install jolt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;jolt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Battery/hardware monitor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ttyper&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brew install ttyper&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ttyper&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Terminal typing test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Taproom&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brew install taproom&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;taproom&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explore Homebrew packages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forage CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm i -g forage-cli&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;forage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explore NPM packages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;crates-tui&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cargo install crates-tui&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;crates-tui&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explore Rust crates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Between part 1 and part 2, that's 20+ CLI tools to level up your terminal workflow alongside AI coding agents. You don't need all of them — pick 2-3 that solve a pain point you have right now and build from there.&lt;/p&gt;

&lt;p&gt;If you're looking for more, check out the &lt;a href="https://github.com/ratatui/awesome-ratatui" rel="noopener noreferrer"&gt;Ratatui awesome list&lt;/a&gt; and the package explorers above. The terminal tool ecosystem is growing fast, especially as more developers move their workflows into the CLI alongside tools like Claude Code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related guides:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/blog/10-cli-tools-for-ai-coding"&gt;10 CLI Tools for AI Coding (Part 1)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/blog/yazi-terminal-file-manager-guide"&gt;Yazi: Complete Terminal File Manager Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/blog/tmux-terminal-multiplexer-guide"&gt;Tmux Terminal Multiplexer Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.starmorph.com/blog/cli-tools-part-2-terminal-workflow" rel="noopener noreferrer"&gt;StarBlog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cli</category>
      <category>terminal</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Obsidian + Claude Code: The Complete Integration Guide</title>
      <dc:creator>Starmorph AI</dc:creator>
      <pubDate>Fri, 20 Mar 2026 01:10:48 +0000</pubDate>
      <link>https://dev.to/starmorph/obsidian-claude-code-the-complete-integration-guide-8c7</link>
      <guid>https://dev.to/starmorph/obsidian-claude-code-the-complete-integration-guide-8c7</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Integrate Obsidian with Claude Code using 5 strategies: dedicated developer vault with symlinks (&lt;code&gt;ln -s ~/vault/notes ./docs&lt;/code&gt;), vault-as-repo with &lt;code&gt;.obsidianignore&lt;/code&gt; filtering, MCP bridges for direct vault access, Obsidian plugins (Smart Connections, Copilot), and community-tested workflows. Symlinks are the simplest — one command gives Claude Code read access to your knowledge base.&lt;/p&gt;

&lt;p&gt;Obsidian and Claude Code are two of the most powerful tools in a developer's toolkit right now — but using them together isn't obvious. Claude Code generates markdown files constantly (plans, memory, CLAUDE.md configs), and Obsidian is the best markdown editor on the planet. The problem? If you open a code repo as an Obsidian vault, you get PNGs, JavaScript files, JSON configs, and &lt;code&gt;node_modules&lt;/code&gt; cluttering your file explorer.&lt;/p&gt;

&lt;p&gt;I researched blog posts, Twitter threads, YouTube videos, GitHub repos, and Obsidian forum discussions to compile every strategy the community has found. This guide covers five distinct approaches, from simple file filtering to MCP bridges, so you can pick the one that fits your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Claude Code stores its configuration across multiple locations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt;&lt;/strong&gt; — global instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~/.claude/plans/&lt;/code&gt;&lt;/strong&gt; — plan files for implementation tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~/.claude/projects/&lt;/code&gt;&lt;/strong&gt; — per-project memory files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~/.claude/skills/&lt;/code&gt;&lt;/strong&gt; — reusable skill definitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;{repo}/CLAUDE.md&lt;/code&gt;&lt;/strong&gt; — per-project instructions (checked into each repo)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you work across multiple repos, these files are scattered everywhere. Opening a code repo as an Obsidian vault technically surfaces the markdown, but also dumps every PNG, JS file, lock file, and &lt;code&gt;node_modules&lt;/code&gt; directory into your file explorer.&lt;/p&gt;
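&lt;p&gt;One way to see how scattered these get: a &lt;code&gt;find&lt;/code&gt; sketch that lists every &lt;code&gt;CLAUDE.md&lt;/code&gt; under your home directory, skipping &lt;code&gt;node_modules&lt;/code&gt; noise:&lt;/p&gt;

```shell
# List every CLAUDE.md under $HOME, excluding dependency directories.
# 2>/dev/null hides permission errors from protected folders.
find ~ -name 'CLAUDE.md' -not -path '*/node_modules/*' 2>/dev/null
```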

&lt;p&gt;Obsidian's built-in "Excluded Files" setting (Settings &amp;gt; Files &amp;amp; Links) helps, but it only does a &lt;strong&gt;soft exclude&lt;/strong&gt; — files are hidden from some views but still indexed internally. It doesn't fully solve the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strategy 1: Dedicated Developer Vault with Symlinks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for: developers working across multiple repos who want unified search.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a dedicated Obsidian vault that's separate from any code repo. Use directory symlinks to pull in the files you care about.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a dedicated vault (NOT inside any repo)&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; ~/Developer-Vault
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/Developer-Vault

&lt;span class="c"&gt;# Symlink your Claude Code global config&lt;/span&gt;
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; ~/.claude claude-global

&lt;span class="c"&gt;# Symlink each project&lt;/span&gt;
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; ~/projects/my-app my-app
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; ~/projects/my-api my-api
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then configure &lt;code&gt;.obsidian/app.json&lt;/code&gt; to filter out code noise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"userIgnoreFilters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"node_modules/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".next/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dist/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".git/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".vercel/"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install &lt;strong&gt;File Explorer++&lt;/strong&gt; to filter by extension (hide &lt;code&gt;*.js&lt;/code&gt;, &lt;code&gt;*.ts&lt;/code&gt;, &lt;code&gt;*.png&lt;/code&gt;, etc.).&lt;/p&gt;

&lt;h3&gt;
  
  
  What you get
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Unified search across all CLAUDE.md files, plans, memory, and skills&lt;/li&gt;
&lt;li&gt;Dataview queries spanning all projects&lt;/li&gt;
&lt;li&gt;Cross-linking between project notes&lt;/li&gt;
&lt;li&gt;No &lt;code&gt;.obsidian/&lt;/code&gt; clutter in your actual repos&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Gotchas
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Obsidian only supports &lt;strong&gt;directory&lt;/strong&gt; symlinks, not individual file symlinks&lt;/li&gt;
&lt;li&gt;Symlinks can cause issues on Obsidian Mobile — exclude from mobile sync&lt;/li&gt;
&lt;li&gt;The Obsidian Git plugin only tracks one repo (the vault's own), not symlinked repos&lt;/li&gt;
&lt;li&gt;Moving files across symlink boundaries in the Obsidian file explorer doesn't work&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Strategy 2: Vault IS the Claude Code Working Directory
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for: personal knowledge management / "second brain" workflows.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most popular approach on Twitter and blogs. Your Obsidian vault is the directory you run &lt;code&gt;claude&lt;/code&gt; from. &lt;code&gt;CLAUDE.md&lt;/code&gt; at the vault root serves double duty — instructions for Claude and a readable note in Obsidian.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-vault/
├── CLAUDE.md              # Claude reads this + Obsidian displays it
├── .claude/               # Skills, hooks, settings
├── daily-notes/
├── projects/
│   ├── pixelmuse/
│   └── my-api/
├── research/
├── decisions/
└── templates/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key patterns from the community:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/strong&gt; at root = vault operating manual&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;VAULT-INDEX.md&lt;/code&gt;&lt;/strong&gt; = live dashboard Claude reads first&lt;/li&gt;
&lt;li&gt;Per-folder &lt;strong&gt;&lt;code&gt;index.md&lt;/code&gt;&lt;/strong&gt; files that Claude auto-updates when creating or deleting files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://github.com/ballred/obsidian-claude-pkm" rel="noopener noreferrer"&gt;ballred/obsidian-claude-pkm&lt;/a&gt; starter kit adds goal cascading with yearly, monthly, and weekly goals. Noah Vincent's &lt;a href="https://noahvnct.substack.com/p/how-to-build-your-ai-second-brain" rel="noopener noreferrer"&gt;IPARAG structure&lt;/a&gt; organizes by Inbox, Projects, Areas, Resources, Archives, and Galaxy (Zettelkasten).&lt;/p&gt;

&lt;p&gt;This works best when your vault &lt;strong&gt;is&lt;/strong&gt; your project — not when you already have repos with established structures.&lt;/p&gt;
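&lt;p&gt;To make this concrete, here's a minimal sketch of what a vault-root &lt;code&gt;CLAUDE.md&lt;/code&gt; might contain. The folder names match the layout above; the rules are illustrative, not a prescribed schema:&lt;/p&gt;

```markdown
# Vault Operating Manual

## Structure
- daily-notes/  — one note per day, named YYYY-MM-DD.md
- projects/     — one folder per active project, each with an index.md
- research/     — source material; link out rather than pasting whole articles
- decisions/    — one note per decision, linked back to its project

## Rules
- When you create or delete a note, update that folder's index.md
- Use [[wikilinks]] when referencing other notes so Obsidian's graph stays useful
- Never modify anything under templates/
```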

&lt;h2&gt;
  
  
  Strategy 3: MCP Bridge
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for: keeping repos clean while giving Claude access to your knowledge base.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run Claude Code in your repo directory as normal. An MCP server running inside Obsidian lets Claude query your vault without it being the working directory.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/iansinnott/obsidian-claude-code-mcp" rel="noopener noreferrer"&gt;obsidian-claude-code-mcp&lt;/a&gt; plugin auto-discovers vaults via WebSocket on port 22360. Multiple vaults are supported with unique port configurations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# You're working in your app repo&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/projects/my-app
claude

&lt;span class="c"&gt;# Claude Code can simultaneously query your Obsidian vault&lt;/span&gt;
&lt;span class="c"&gt;# for notes, plans, and context — no symlinks needed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://github.com/ProfSynapse/claudesidian-mcp" rel="noopener noreferrer"&gt;Claudesidian MCP&lt;/a&gt; plugin goes further with semantic search via Ollama embeddings and full agent-mode capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Requires Obsidian to be running. Another moving part in your stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strategy 4: One Vault Per Repo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for: simple setups with single-project focus.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open each repo as its own Obsidian vault. Use &lt;code&gt;userIgnoreFilters&lt;/code&gt; to hide non-markdown files (see the file clutter fix below).&lt;/p&gt;

&lt;p&gt;Add &lt;code&gt;.obsidian/&lt;/code&gt; to your &lt;code&gt;.gitignore&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Obsidian
.obsidian/
.trash/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
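
<p>This can be scripted for one-off setup; a sketch (run from the repo root) that appends each entry only if it is not already present:</p>

```shell
# Add Obsidian metadata folders to .gitignore, idempotently
for entry in '.obsidian/' '.trash/'; do
  grep -qxF "$entry" .gitignore 2>/dev/null || printf '%s\n' "$entry" >> .gitignore
done
```

Re-running it is safe: `grep -qxF` matches the exact line, so nothing is duplicated.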



&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; No cross-project search. Must switch vaults constantly. Can't see global &lt;code&gt;~/.claude/&lt;/code&gt; plans alongside project files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strategy 5: QMD + Session Sync
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for: heavy Claude Code users who want persistent memory across sessions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the power-user stack, built around Shopify CEO Tobi Lutke's &lt;a href="https://github.com/tobi/qmd" rel="noopener noreferrer"&gt;QMD&lt;/a&gt; tool:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;QMD&lt;/strong&gt; — semantic search over your markdown vault (60%+ token reduction vs grep/glob)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sync-claude-sessions&lt;/strong&gt; — auto-exports Claude Code sessions to markdown on close&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/recall&lt;/code&gt; skill&lt;/strong&gt; — pulls relevant context before starting a new session&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All local, no cloud. Claude Code sessions become searchable notes in your vault. Developer @ArtemXTech &lt;a href="https://x.com/ArtemXTech/status/2028330693659332615" rel="noopener noreferrer"&gt;documented this stack&lt;/a&gt; and reported dramatically improved context recall.&lt;/p&gt;

&lt;p&gt;Kevin Lee &lt;a href="https://x.com/kevinleeme/status/2018421153795367135" rel="noopener noreferrer"&gt;reported&lt;/a&gt; that updating all Claude Code skills with semantic chunking from QMD reduced token usage and processing time by over 60%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixing the File Clutter Problem
&lt;/h2&gt;

&lt;p&gt;If you're already using a code repo as an Obsidian vault and seeing PNGs and assets everywhere, here's the fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Exclude directories via app.json
&lt;/h3&gt;

&lt;p&gt;Open Settings &amp;gt; Files &amp;amp; Links &amp;gt; Excluded Files, or edit &lt;code&gt;.obsidian/app.json&lt;/code&gt; directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"userIgnoreFilters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"node_modules/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".next/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dist/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".git/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".vercel/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public/"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
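
<p>If you would rather script the edit than click through Settings, here is a sketch that merges filters into <code>.obsidian/app.json</code> without clobbering ones you have already set (the directory list is illustrative; adjust it to your repo):</p>

```shell
# Merge ignore filters into .obsidian/app.json, keeping existing entries
python3 -c '
import json, pathlib
p = pathlib.Path(".obsidian/app.json")
cfg = json.loads(p.read_text()) if p.exists() else {}
filters = cfg.setdefault("userIgnoreFilters", [])
for f in ["node_modules/", ".next/", "dist/", ".git/", ".vercel/", "public/"]:
    if f not in filters:
        filters.append(f)
p.parent.mkdir(exist_ok=True)
p.write_text(json.dumps(cfg, indent=2))
'
```

Restart Obsidian (or reload the vault) afterwards so the new filters take effect.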



&lt;h3&gt;
  
  
  Step 2: Exclude file types with regex patterns
&lt;/h3&gt;

&lt;p&gt;In the same Excluded Files setting, add regex patterns wrapped in forward slashes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;/.*\.&lt;span class="n"&gt;png&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;jpg&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;jpeg&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;svg&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;gif&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;ico&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;webp&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;js&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;ts&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;tsx&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;jsx&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;css&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;json&lt;/span&gt;/
/.*\.&lt;span class="n"&gt;lock&lt;/span&gt;/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; This is a soft exclude. Files are hidden from search and graph view but still indexed internally by Obsidian.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Install File Explorer++ for hard filtering
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/kelszo/obsidian-file-explorer-plus" rel="noopener noreferrer"&gt;File Explorer++&lt;/a&gt; plugin supports wildcard/regex filters on file names and paths. You can toggle filters on and off, which is much more practical than the built-in settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Turn off "Detect all file extensions"
&lt;/h3&gt;

&lt;p&gt;In Settings &amp;gt; Files &amp;amp; Links, turn OFF "Detect all file extensions." This hides file types that Obsidian can't natively handle (JS, TS, JSON, etc.) from the explorer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternative: File Ignore plugin
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://obsidian-file-ignore.kkuk.dev/" rel="noopener noreferrer"&gt;File Ignore&lt;/a&gt; plugin uses &lt;code&gt;.gitignore&lt;/code&gt;-style patterns and physically renames matched files with a dot prefix so Obsidian completely skips them during indexing. This is the most thorough solution but it physically modifies filenames.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Plugins
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Must-Have for Developer Vaults
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plugin&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/kelszo/obsidian-file-explorer-plus" rel="noopener noreferrer"&gt;File Explorer++&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Filter by wildcard/regex. Hide &lt;code&gt;*.js&lt;/code&gt;, &lt;code&gt;*.png&lt;/code&gt;, etc. Toggle filters on/off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/blacksmithgu/obsidian-dataview" rel="noopener noreferrer"&gt;Dataview&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Query across all CLAUDE.md files, list plans by status, aggregate metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.obsidianstats.com/plugins/templater-obsidian" rel="noopener noreferrer"&gt;Templater&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Create CLAUDE.md templates with standard sections&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Claude Code Inside Obsidian (Pick One)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plugin&lt;/th&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/YishenTu/claudian" rel="noopener noreferrer"&gt;Claudian&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Embeds Claude Code as sidebar chat. Permission modes (YOLO/Safe/Plan)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/RAIT-09/obsidian-agent-client" rel="noopener noreferrer"&gt;Agent Client&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Claude Code, Codex, and Gemini CLI in a side panel. Supports @mentions of notes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/derek-larson14/obsidian-claude-sidebar" rel="noopener noreferrer"&gt;Claude Sidebar&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Embedded terminal, auto-launches Claude Code, multiple tabs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  MCP Plugins (Remote Access)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plugin&lt;/th&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/iansinnott/obsidian-claude-code-mcp" rel="noopener noreferrer"&gt;obsidian-claude-code-mcp&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Claude Code discovers vaults via WebSocket. No need to &lt;code&gt;cd&lt;/code&gt; into vault&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/ProfSynapse/claudesidian-mcp" rel="noopener noreferrer"&gt;Claudesidian MCP&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Full agent-mode MCP with semantic search via Ollama embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Other Useful Developer Plugins
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plugin&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://lostpaul.github.io/obsidian-folder-notes/" rel="noopener noreferrer"&gt;Folder Note&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Attach a note to a folder. Click folder to open its note&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/Eldritch-Oliver/file-hider" rel="noopener noreferrer"&gt;File Hider&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Right-click individual files/folders to hide them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/JonasDoesThings/obsidian-hide-folders" rel="noopener noreferrer"&gt;Hide Folders&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Pattern-based folder visibility toggle in file navigator&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Dataview Queries for Claude Code Files
&lt;/h2&gt;

&lt;p&gt;If you add frontmatter to your CLAUDE.md files, Dataview becomes extremely powerful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add frontmatter to each CLAUDE.md
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-config&lt;/span&gt;
&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
&lt;span class="na"&gt;stack&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;nextjs&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;tailwind&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;supabase&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;active&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Query all project configs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="nv"&gt;`&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;

dataview
TABLE project, stack, status
FROM ""
WHERE type = "claude-config"
SORT project ASC


&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;`&lt;/span&gt;
&lt;span class="nv"&gt;`&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;

`&lt;/span&gt;

&lt;span class="o"&gt;###&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="n"&gt;Claude&lt;/span&gt; &lt;span class="n"&gt;plans&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;last&lt;/span&gt; &lt;span class="n"&gt;modified&lt;/span&gt;

&lt;span class="nv"&gt;`

&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;`&lt;/span&gt;&lt;span class="k"&gt;sql&lt;/span&gt;
&lt;span class="nv"&gt;`&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;

dataview
TABLE file.mtime as "Last Modified"
FROM "claude-global/plans"
SORT file.mtime DESC


&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;`&lt;/span&gt;
&lt;span class="nv"&gt;`&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;

`&lt;/span&gt;

&lt;span class="o"&gt;###&lt;/span&gt; &lt;span class="n"&gt;Templater&lt;/span&gt; &lt;span class="k"&gt;template&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;

&lt;span class="nv"&gt;`&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;markdown
---
type: claude-config
project: &amp;lt;% tp.system.prompt("Project name") %&amp;gt;
status: active
date: &amp;lt;% tp.date.now("YYYY-MM-DD") %&amp;gt;
---

# &amp;lt;% tp.system.prompt("Project name") %&amp;gt; — Claude Code Configuration

## Tech Stack

-

## Code Quality

-

## Key Architecture

-

## Env Vars

-
&lt;/span&gt;&lt;span class="se"&gt;``&lt;/span&gt;&lt;span class="nv"&gt;`&lt;/span&gt;

&lt;span class="o"&gt;##&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;Obsidian&lt;/span&gt; &lt;span class="n"&gt;CLI&lt;/span&gt; &lt;span class="n"&gt;Game&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Changer&lt;/span&gt;

&lt;span class="n"&gt;Obsidian&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="n"&gt;introduced&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;CLI&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;dramatically&lt;/span&gt; &lt;span class="n"&gt;changes&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;integration&lt;/span&gt; &lt;span class="n"&gt;story&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Kepano&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Obsidian&lt;/span&gt; &lt;span class="n"&gt;CEO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;announced&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;kepano&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2021251878521073847&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="k"&gt;any&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="n"&gt;Claude&lt;/span&gt; &lt;span class="n"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Codex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Gemini&lt;/span&gt; &lt;span class="n"&gt;CLI&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="n"&gt;use&lt;/span&gt; &lt;span 
class="n"&gt;Obsidian&lt;/span&gt; &lt;span class="n"&gt;natively&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Developer&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;drrobcincotta&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;benchmarked&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;drrobcincotta&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2022210753575760293&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;663&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="n"&gt;GB&lt;/span&gt; &lt;span class="n"&gt;research&lt;/span&gt; &lt;span class="n"&gt;vault&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Finding&lt;/span&gt; &lt;span class="n"&gt;orphan&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;grep&lt;/span&gt; &lt;span class="n"&gt;took&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;vs&lt;/span&gt; &lt;span class="n"&gt;CLI&lt;/span&gt; &lt;span class="k"&gt;at&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;54&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="n"&gt;faster&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Vault&lt;/span&gt; &lt;span class="k"&gt;search&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;grep&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;vs&lt;/span&gt; &lt;span class="n"&gt;CLI&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="n"&gt;faster&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;three&lt;/span&gt; &lt;span class="n"&gt;ways&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="k"&gt;connect&lt;/span&gt; &lt;span class="n"&gt;Claude&lt;/span&gt; &lt;span class="n"&gt;Code&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;Obsidian&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Obsidian&lt;/span&gt; &lt;span class="n"&gt;CLI&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fastest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;most&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;efficient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;REST&lt;/span&gt; &lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;via&lt;/span&gt; &lt;span class="n"&gt;community&lt;/span&gt; &lt;span class="n"&gt;plugins&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Filesystem&lt;/span&gt; &lt;span class="k"&gt;access&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;via&lt;/span&gt; &lt;span class="n"&gt;grep&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;glob&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slowest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;most&lt;/span&gt; &lt;span class="n"&gt;expensive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Kepano&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;also&lt;/span&gt; &lt;span class="n"&gt;building&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;official&lt;/span&gt; &lt;span class="n"&gt;Claude&lt;/span&gt; &lt;span class="n"&gt;Skills&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;Obsidian&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;kepano&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2008578873903206895&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt; &lt;span class="n"&gt;Claude&lt;/span&gt; &lt;span class="n"&gt;Code&lt;/span&gt; &lt;span class="n"&gt;edit&lt;/span&gt; &lt;span class="nv"&gt;`.md`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;`.base`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="nv"&gt;`.canvas`&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;##&lt;/span&gt; &lt;span class="k"&gt;Key&lt;/span&gt; &lt;span class="n"&gt;Community&lt;/span&gt; &lt;span class="n"&gt;Insight&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;rule&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;Greg&lt;/span&gt; &lt;span class="n"&gt;Isenberg&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;InternetVin&lt;/span&gt;&lt;span class="s1"&gt;'s [viral workflow video](https://www.youtube.com/watch?v=6MBq1paspVU) (59 min, 2026):

&amp;gt; **"Agents read, humans write."**

Your vault should contain your authentic thinking. Claude reads it for context but shouldn'&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;pollute&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Keep&lt;/span&gt; &lt;span class="n"&gt;Claude&lt;/span&gt;&lt;span class="s1"&gt;'s outputs (plans, memory) in `~/.claude/` and your knowledge in the vault proper.

Custom slash commands from their workflow:

- **`/my-world`** — loads full vault context
- **`/today`** — morning planning from daily notes
- **`/close`** — evening reflection
- **`/trace`** — track how an idea evolved over months
- **`/ghost`** — answer in your voice using vault context

## Sources

### Blog Posts

- [Chase AI — Claude Code + Obsidian Persistent Memory](https://www.chaseai.io/blog/claude-code-obsidian-persistent-memory)
- [WhyTryAI — Build Your Second Brain](https://www.whytryai.com/p/claude-code-obsidian)
- [Noah Vincent — AI Second Brain Setup](https://noahvnct.substack.com/p/how-to-build-your-ai-second-brain)
- [Niclas Dern — My Obsidian + Claude Code Setup](https://niclasdern.substack.com/p/my-obsidian-claude-code-setup)
- [Kyle Gao — Using Claude Code with Obsidian](https://kyleygao.com/blog/2025/using-claude-code-with-obsidian/)
- [Kenneth Reitz — Obsidian Vaults and Claude Code](https://kennethreitz.org/essays/2026-03-06-obsidian_vaults_and_claude_code)
- [Sebastian Steins — Symlinks for Obsidian](https://www.ssp.sh/brain/add-external-folders-git-blog-book-to-my-obsidian-vault-via-symlink/)
- [XDA — Claude Code Inside Obsidian](https://www.xda-developers.com/claude-code-inside-obsidian-and-it-was-eye-opening/)
- [Awesome Claude — 3 Ways to Use Obsidian with Claude Code](https://awesomeclaude.ai/how-to/use-obsidian-with-claude)

### YouTube

- [Greg Isenberg + InternetVin — Obsidian + Claude Code (59 min)](https://www.youtube.com/watch?v=6MBq1paspVU)
- [Dynamous — Second Brain with Claude Code + Obsidian (41 min)](https://www.youtube.com/watch?v=jYMhDEzNAN0)
- [Connecting Claude and Obsidian: Step-by-Step Guide](https://www.youtube.com/watch?v=VeTnndXyJQI)

### Twitter/X

- [@kepano — Obsidian CEO building official Claude Skills](https://x.com/kepano/status/2008578873903206895)
- [@dwarkesh_sp — Early viral "Claude Code on Obsidian" tweet](https://x.com/dwarkesh_sp/status/1894147173782360221)
- [@drrobcincotta — Obsidian CLI benchmarks (54x faster than grep)](https://x.com/drrobcincotta/status/2022210753575760293)
- [@ArtemXTech — QMD + session sync stack](https://x.com/ArtemXTech/status/2028330693659332615)
- [@gregisenberg — Personal OS with Obsidian + Claude Code](https://x.com/gregisenberg/status/2026036464287412412)

### GitHub Repos &amp;amp; Templates

- [ballred/obsidian-claude-pkm](https://github.com/ballred/obsidian-claude-pkm) — Starter kit with goal cascading
- [huytieu/COG-second-brain](https://github.com/huytieu/COG-second-brain) — Self-evolving second brain template
- [ksanderer/claude-vault](https://github.com/ksanderer/claude-vault) — Git-based sync for cloud Claude Code
- [heyitsnoah/claudesidian](https://github.com/heyitsnoah/claudesidian) — Pre-configured vault structure

---

*Originally published at [StarBlog](https://blog.starmorph.com/blog/obsidian-claude-code-integration-guide)*
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>productivity</category>
      <category>ai</category>
      <category>devtools</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
