DEV Community

Mininglamp

Three Open-Source Projects That Turn Your Mac Into a Private AI Workstation

The idea of running AI agents entirely on your laptop used to be a joke. A fun thought experiment you'd entertain over coffee before switching back to your cloud API dashboard and watching the bills pile up.

In 2026, it's a real workflow.

Not a demo. Not a "technically possible if you squint" proof of concept. An actual, production-grade stack where a vision-language model sees your screen, operates your apps, accelerates inference on Apple Silicon, and builds entire applications from a product spec — all without a single byte leaving your machine.

At Mininglamp Technology, we've been building toward this with three open-source projects. Each solves a distinct piece of the on-device AI puzzle. Together, they form something we think is genuinely new: a complete private AI workstation stack that runs on a Mac.

Let's walk through them.


1. Mano-P: The Agent That Sees Your Screen

Repo: github.com/Mininglamp-AI/Mano-P

Most "AI agents" are glorified API wrappers. They read text, call tools, and hope the tool's interface hasn't changed since the prompt was written. Mano-P takes a fundamentally different approach: it's a GUI-VLA (Vision-Language-Action) model that perceives your screen the way a human does — by looking at it.

Mano-P comes in two sizes:

  • 72B (cloud/server): The full model, currently ranked #1 on OSWorld with a score of 58.2% — a significant lead over the second-place opencua-72b at 45.0%.
  • 4B (local): A distilled model designed to run entirely on-device. On an M5 Pro, it decodes at roughly 80 tokens/second with a peak memory footprint of just 4.3GB. The minimum supported configuration is an M4 chip with 32GB of RAM.

[Figure: Mano-P Architecture]

What makes this interesting isn't just the benchmark numbers — it's the interaction model. Mano-P doesn't need custom integrations or tool definitions. It sees buttons, text fields, menus, and dialogs the same way you do. Tell it "open Safari and find the latest Hacker News post about Rust," and it navigates the GUI visually, clicking and typing as needed.
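
To make that interaction model concrete, here is a minimal sketch of a perceive-then-act loop. It is not Mano-P's actual API: `Action`, `run_agent`, and the three callables (`capture_screen`, `predict_action`, `execute`) are hypothetical names chosen for illustration, and the "model" below is a scripted stub rather than a real VLA.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Action:
    """One GUI step. All fields are illustrative, not Mano-P's real schema."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(goal: str,
              capture_screen: Callable[[], bytes],
              predict_action: Callable[[str, bytes, List[Action]], Action],
              execute: Callable[[Action], None],
              max_steps: int = 20) -> List[Action]:
    """Look at the screen, ask the model for the next action, perform it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                        # perceive
        action = predict_action(goal, screenshot, history)   # decide
        history.append(action)
        if action.kind == "done":                            # model says goal is met
            break
        execute(action)                                      # act on the GUI
    return history


# Stub demo: a scripted "model" standing in for the vision-language-action model.
script: Iterator[Action] = iter([
    Action("click", x=120, y=48),
    Action("type", text="news.ycombinator.com"),
    Action("done"),
])
executed: List[Action] = []
history = run_agent(
    goal="open Safari and find the latest Hacker News post about Rust",
    capture_screen=lambda: b"",                  # a real agent would grab pixels here
    predict_action=lambda goal, shot, hist: next(script),
    execute=executed.append,                     # a real agent would move the mouse
)
```

The point of the shape is that nothing in the loop knows about a specific app: the only inputs are a goal and a screenshot, which is why no per-app tool definition is needed.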

The 72B model also includes WebRetriever, a web navigation component that scores 41.7 on NavEval — ahead of Gemini 2.5 Pro (40.9) and Claude 4.5 (31.3). Web browsing as a first-class agent capability, not an afterthought.

Why This Matters

The traditional approach to computer-use agents is brittle. You build tool adapters, maintain API schemas, and pray that the next macOS update doesn't break your Accessibility API hooks. A vision-first agent sidesteps all of that. If a human can use the app, Mano-P can use the app.


2. Cider: Inference Acceleration for Apple Silicon

Repo: github.com/Mininglamp-AI/cider

Running a 4B model at 80 tok/s on a Mac doesn't happen by accident. It requires an inference engine that actually understands Apple Silicon's hardware characteristics. That's what Cider is.

Cider is an inference acceleration SDK built specifically for Apple's M-series chips. Its key contribution is activation quantization, specifically the W8A8 and W4A8 schemes, which MLX does not currently cover. MLX supports weight-only quantization (W4A16, W8A16), where the weights are compressed but activations stay in 16-bit precision. Cider quantizes both weights and activations to 8 bits or fewer, which unlocks substantially better throughput.
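
The difference between the two schemes is easiest to see on a single dot product. The sketch below is a toy in pure Python, not Cider's kernels: the function names are made up for illustration, and it uses plain symmetric int8 quantization with one scale per vector, which is an assumption about the general technique rather than a description of Cider's implementation.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization with a single scale for the whole vector."""
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dot_w8a16(w_q: List[int], w_scale: float, x: List[float]) -> float:
    """Weight-only: weights are int8 in memory, but the math runs in floating point."""
    return sum((wq * w_scale) * xv for wq, xv in zip(w_q, x))


def dot_w8a8(w_q: List[int], w_scale: float, x: List[float]) -> float:
    """Weights and activations are both int8: integer accumulate, rescale once."""
    x_q, x_scale = quantize_int8(x)
    acc = sum(wq * xq for wq, xq in zip(w_q, x_q))  # pure integer arithmetic
    return acc * w_scale * x_scale


weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, -0.5, 0.1]
exact = sum(w * x for w, x in zip(weights, activations))

w_q, w_scale = quantize_int8(weights)
approx_w8a16 = dot_w8a16(w_q, w_scale, activations)
approx_w8a8 = dot_w8a8(w_q, w_scale, activations)
```

In the W8A8 path the inner loop is integer-only and the floating-point scales are applied once at the end. That is where the throughput headroom comes from, and the extra rounding of the activations is where the accuracy cost comes from.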

[Figure: Benchmark Overview]

The Numbers

On an M5 Pro, Cider delivers 1.4–2.2x faster inference compared to MLX W4A16, depending on the quantization granularity you choose:

Quantization    Granularity           Speedup vs MLX W4A16
W8A8 / W4A8     Per-channel           1.8x (fastest)
W8A8 / W4A8     Per-group (gs=128)    1.5x
W8A8 / W4A8     Per-group (gs=64)     1.3x
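
"Granularity" here means how many weights share one scale factor. The sketch below illustrates the general idea on a single weight row, using a toy group size of 4 instead of 64 or 128. The helper names are hypothetical and the code is an assumption about the standard technique, not Cider's implementation.

```python
from typing import List, Tuple


def quantize_row(row: List[float], group_size: int) -> Tuple[List[int], List[float]]:
    """Quantize one weight row to int8 with one scale per group of `group_size` values.

    group_size == len(row) corresponds to per-channel (one scale for the whole row).
    """
    scales = [
        (max(abs(v) for v in row[i:i + group_size]) or 1.0) / 127.0
        for i in range(0, len(row), group_size)
    ]
    q = [
        max(-127, min(127, round(v / scales[i // group_size])))
        for i, v in enumerate(row)
    ]
    return q, scales


def mean_abs_error(row: List[float], group_size: int) -> float:
    """Average error after quantizing and dequantizing the row."""
    q, scales = quantize_row(row, group_size)
    errors = [abs(v - qv * scales[i // group_size])
              for i, (v, qv) in enumerate(zip(row, q))]
    return sum(errors) / len(errors)


# One outlier (8.0) among small weights: the classic hard case for a shared scale.
row = [0.01, -0.02, 0.03, 0.015, 8.0, -0.02, 0.01, 0.025]

per_channel_error = mean_abs_error(row, group_size=len(row))  # 1 scale
per_group_error = mean_abs_error(row, group_size=4)           # 2 scales
```

With a single scale, the outlier stretches the range and the small weights all round to zero. With smaller groups, only the outlier's own group pays that price. The flip side is more scales to store and apply, which is consistent with per-channel being the fastest row in the table.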

There's a tradeoff between speed and accuracy, as you'd expect. On the CUA Benchmark (M5, 16GB), W8A16 quantization maintains 58.0% accuracy while W8A8 comes in at 54.0%. Depending on your use case, that 4-point delta may or may not matter — for many agentic workflows, the speed gain is worth it.

Why Not Just Use MLX?

This isn't about replacing MLX. MLX is excellent at what it does. But weight-only quantization hits a wall when you need both low memory and high throughput for real-time agent interactions. Activation quantization is the next lever, and right now, Cider is the open-source option that pulls it on Apple Silicon.

Think of it this way: MLX gives you the foundation. Cider fills the gap in activation quantization that lets you push throughput further on the same hardware.


3. Mano-AFK: The Autonomous App Builder

Repo: github.com/Mininglamp-AI/mano-afk

This is where things get wild.

Mano-AFK takes a PRD (Product Requirements Document) and turns it into a working application. Not a skeleton. Not boilerplate. A deployed, tested application — with zero human intervention in the loop.

Here's the pipeline:

  1. Read the PRD — Parse requirements, extract features, identify tech stack
  2. Write the code — Generate the full application
  3. Deploy it — Spin up a local or containerized environment
  4. Test it visually — Using Mano-P's vision model to actually look at the running app
  5. Find bugs — Compare what's on screen to what the PRD specified
  6. Fix them — Modify code, redeploy, retest
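
The six steps above amount to a loop that only exits when the visual check comes back clean. Here is a minimal sketch of that control flow. It is not Mano-AFK's code: `build_until_clean` and its `generate`, `deploy`, and `inspect` callables are hypothetical names, the localhost URL is a placeholder, and the stubs stand in for the real code generator and vision model.

```python
from typing import Callable, List, Tuple


def build_until_clean(prd: str,
                      generate: Callable[[str, List[str]], str],
                      deploy: Callable[[str], str],
                      inspect: Callable[[str, str], List[str]],
                      max_rounds: int = 5) -> Tuple[str, int]:
    """Generate, deploy, inspect, and regenerate until no bugs are reported."""
    bugs: List[str] = []
    for round_no in range(1, max_rounds + 1):
        code = generate(prd, bugs)   # steps 1-2, and step 6 on later rounds
        url = deploy(code)           # step 3
        bugs = inspect(url, prd)     # steps 4-5: look at the running app
        if not bugs:
            return code, round_no
    raise RuntimeError(f"still failing after {max_rounds} rounds: {bugs}")


# Stub demo: the first build has a visual bug, the second build fixes it.
def fake_generate(prd: str, known_bugs: List[str]) -> str:
    return "v2" if known_bugs else "v1"


def fake_deploy(code: str) -> str:
    return f"http://localhost:8000/{code}"   # placeholder address


def fake_inspect(url: str, prd: str) -> List[str]:
    return ["button is white, spec says blue"] if url.endswith("v1") else []


result = build_until_clean("Dashboard with a blue submit button",
                           fake_generate, fake_deploy, fake_inspect)
```

Feeding the bug list back into `generate` is the design choice that matters: the fixer sees what was actually observed on screen, not just a failing assertion.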

[Figure: Mano-Action Training Flow]

The critical piece here is step 4. Most code-generation tools "test" by running unit tests they also generated — which is roughly as useful as grading your own homework. Mano-AFK uses Mano-P's vision capabilities to perform visual testing: it loads the app, looks at the screen, and verifies that the UI actually matches the spec. A button that's supposed to be blue but renders as white? Caught. A form that submits but shows no confirmation? Caught.
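
Conceptually, the visual check is a comparison between what the PRD asks for and what is observed on screen. The sketch below shows only that comparison. It assumes, purely for illustration, that both sides have already been reduced to flat key/value pairs. In a real run the `observed` dictionary would come from the vision model looking at a screenshot, and the key names here are invented.

```python
from typing import Dict, List


def find_mismatches(expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Report every spec item that is missing or different on screen."""
    issues: List[str] = []
    for key, want in expected.items():
        got = observed.get(key)
        if got is None:
            issues.append(f"{key}: missing (expected {want!r})")
        elif got != want:
            issues.append(f"{key}: expected {want!r}, saw {got!r}")
    return issues


# What the PRD specifies (hypothetical keys).
expected = {
    "submit_button.color": "blue",
    "confirmation.visible": "true",
}
# What was seen on the rendered page (hard-coded here; normally model output).
observed = {
    "submit_button.color": "white",
}

issues = find_mismatches(expected, observed)
```

Both failure modes from the paragraph above show up: the white button is a value mismatch, and the absent confirmation is a missing element.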

This closes the loop in a way that pure code generation can't. The vision model acts as an independent quality gate that evaluates the artifact, not just the source.

What It's Good For

Mano-AFK shines for internal tools, prototypes, and MVPs where the cost of human QA exceeds the cost of iteration cycles. It's not going to replace your engineering team on a complex distributed system. But for "I need a dashboard that shows these metrics with these filters by Thursday"? It's remarkably capable.


The Stack: Model → Accelerator → Builder

Here's where the three projects become more than the sum of their parts.

┌─────────────────────────────────────────────┐
│              Your Mac (M4+ / 32GB)          │
│                                             │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐ │
│  │  Mano-P  │  │  Cider   │  │ Mano-AFK  │ │
│  │  (Agent) │──│  (Accel) │──│ (Builder) │ │
│  │  4B VLA  │  │  W8A8    │  │ PRD→App   │ │
│  └──────────┘  └──────────┘  └───────────┘ │
│                                             │
│  Data stays here. Always.                   │
└─────────────────────────────────────────────┘

Mano-P provides the vision-language-action intelligence — the ability to see, understand, and act on screen content. Cider accelerates inference so that intelligence runs at interactive speeds on consumer hardware. Mano-AFK orchestrates multi-step autonomous workflows, using Mano-P as both its brain and its eyes.

The result is a stack where:

  • Your AI agent perceives and operates your entire desktop
  • Inference is fast enough for real-time interaction: the 4B model decodes at roughly 80 tokens/second on an M5 Pro, so an action does not mean a 30-second wait
  • Autonomous workflows can build, deploy, and quality-test applications without human involvement
  • Nothing leaves your machine. No API calls to external servers. No telemetry. No data exfiltration vectors. Your code, your screen content, your documents — they stay on your Mac.

That last point matters more than people think. Enterprise teams working with proprietary code, healthcare organizations handling patient data, legal teams reviewing confidential documents — these groups can't use cloud AI agents, period. An on-device stack isn't a nice-to-have for them. It's the only option.

Hardware Requirements

Let's be clear about what you need: Apple M4 with 32GB of RAM is the minimum for running the 4B model at usable speeds. An M5 Pro will give you the best experience. This isn't a "runs on any Mac" situation — you need the unified memory bandwidth and Neural Engine capabilities of recent Apple Silicon.


The Bigger Picture

We're not claiming this replaces cloud AI. The 72B model exists for a reason — some workloads need that scale, and running it requires serious hardware. What we are saying is that the gap between "cloud-only" and "runs on your laptop" has narrowed dramatically, and for a growing category of workflows, the on-device option is not just viable but preferable.

The three forces driving this:

  1. Model distillation has gotten remarkably good. The 4B Mano-P retains enough capability from its 72B parent to handle real-world GUI tasks.
  2. Apple Silicon's unified memory architecture is uniquely suited to LLM inference. High memory bandwidth + large unified pool = exactly what transformer decoding needs.
  3. Activation quantization (via Cider) closes the remaining throughput gap. Weight-only quantization was the easy win; activation quantization is the hard one that makes real-time interaction possible.
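
The bandwidth point can be made with back-of-the-envelope arithmetic. During decoding, every generated token has to read the model weights from memory, so memory bandwidth divided by the size of the weights gives a rough ceiling on tokens per second. The sketch below computes that ceiling. The 300 GB/s figure is an illustrative assumption, not a measured number for any particular chip, and the estimate ignores the KV cache, activations, and compute time.

```python
def decode_ceiling_tokens_per_s(params_billion: float,
                                bits_per_weight: int,
                                bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if reading the weights were the only cost."""
    bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


ASSUMED_BANDWIDTH_GB_S = 300.0   # illustrative only; check your chip's spec

ceiling_4bit = decode_ceiling_tokens_per_s(4, 4, ASSUMED_BANDWIDTH_GB_S)
ceiling_8bit = decode_ceiling_tokens_per_s(4, 8, ASSUMED_BANDWIDTH_GB_S)
ceiling_16bit = decode_ceiling_tokens_per_s(4, 16, ASSUMED_BANDWIDTH_GB_S)
```

Halving the bits per weight doubles the ceiling, which is why weight quantization was the first lever. Real throughput lands below the ceiling because of everything this estimate leaves out.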

The open-source angle matters here too. These aren't black-box binaries. You can inspect the model weights, audit the inference engine, verify that nothing phones home. For privacy-sensitive deployments, "trust us" isn't good enough. "Read the code" is.


Get Started

All three projects are released under Apache 2.0 — use them commercially, fork them, contribute back, or just kick the tires.

If you build something with them, we'd love to hear about it. File an issue, open a PR, or just star the repos if you think this direction is worth pursuing.

The future of AI workstations isn't in the cloud. It's on your desk.


Mininglamp Technology builds AI infrastructure for enterprises. Our open-source projects focus on on-device AI agents, inference optimization, and autonomous software development. Learn more at github.com/Mininglamp-AI.
