DEV Community

My fully offline AI-assisted Linux development machine

Deepu K Sasidharan on May 11, 2026

Originally published at deepu.tech. One of my most popular posts of all time was when I wrote about my beautiful Linux development machine in 2019...

Peter Vivo • May 12

Looks great! I like to use Linux, or at least a Unix-based terminal. For example, my company laptop runs Windows 11, but installing Ubuntu 22.04 through WSL partially solves my development workflow. I know that is far from this handcrafted solution, but the company requirements are strict; I can't even reach dev.to from my work computer because of some weird company policy. Anyway, I like your work!

uiqtwe6 • May 12

Is it a company laptop?

Peter Vivo • May 12

2020 Dell, i5, 16GB RAM, worn English-layout keyboard, but I always use the US layout, so there is some minor confusion.
The good news is that Copilot CLI runs in the cloud, so its load doesn't affect the computer.

Deepu K Sasidharan • May 12

Neah

Fyodor • May 12

That's a helluva broputer... 😅

Deepu K Sasidharan • May 12

I'm gonna steal broputer 😂 although not sure if I should be offended or not 🤣

Fyodor • May 13

Nah, no offense, that’s a really cool setup made with lots of love and dedication, I’m pretty sure it pays off big time 👍🏼

Rajas Poorna • May 13

Lovely setup!
Have you considered using Qwen3.6 35B A3B?
I use it on my MI50 32GB and basically get a 3x boost in tokens/s (both in and out) for not much intelligence penalty. Also probably worth turning on the feature to remember its thinking, given that you can support its full context window.
Once I saw that kind of tokens/s it was hard to justify the slower dense models.

Deepu K Sasidharan • May 13

I haven't personally tried it, since I saw someone comparing it with dense models on long-context tasks, and the MoE models hallucinated way more when the context was big. I will try it when I have time and see.

Deepu K Sasidharan • May 13 • Edited

What context size are you using?

Vic Chen • May 12

This is the dream setup for anyone who cares about owning their stack. The llama.cpp + ROCm combo on the Flow Z13 is impressive — 128GB unified memory changes the calculus for local AI entirely. I've been thinking about a similar local-first approach for some of my financial data analysis pipelines where I really don't want prompts hitting third-party APIs. The tradeoff you mentioned about context-length slowdown with 27B models matches what I've seen too. Qwen3.6 Q8_0 at 256k context is a solid sweet spot. Thanks for sharing the bench numbers and the archdots repo — exactly the kind of practical detail that's hard to find.

Immanuel Gabriel • May 13

Never even knew one could do something like this. So creative. I appear to have a long way to go.

Vikassh. • May 12

Nice article. I never thought about this approach before

Galileo G • May 12

Try Krusader or a similar two-pane, keyboard-heavy file manager.

Vikassh. • May 13

How has this setup performed under real traffic?

Deepu K Sasidharan • May 13

I have been using it for reviews, quick fixes, repo research, etc., and it has been quite good. Right now I'm building a full-fledged filesystem management TUI in Rust. Will report back my findings. So far I'm very impressed; I'm 3 prompts in and it's fixing issues after the first iteration.

th3cavalry • May 13

Check out my repo. I've got the keyboard and back window RGB working.

github.com/th3cavalry/GZ302-Linux-...

Deepu K Sasidharan • May 13

Super cool. Thanks for sharing. I will use it as a benchmark to test the model.

Mininglamp • May 14

For coding assistance, a well-quantized 4B model at 40+ tok/s beats a 27B model at 8 tok/s in actual productivity. The bottleneck isn't intelligence — it's iteration speed. At 50-100 completions per hour, latency compounds fast. The practical setup: small+fast model for flow state (completions, quick edits), big+slow model for architecture planning and code review invoked 2-3 times per session. Two-tier local beats single-tier every time.
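To put rough numbers on "latency compounds fast", here is a minimal back-of-the-envelope sketch. The 8 and 40 tok/s rates and the 50-100 completions per hour come from the comment above; the 100 tokens per completion is purely an assumption for illustration, and prompt processing time is ignored.

```python
def wait_minutes_per_hour(completions_per_hour, tokens_per_completion, tokens_per_second):
    """Minutes per hour spent waiting on token generation alone.

    Ignores prompt processing and any time spent reading the output.
    """
    total_tokens = completions_per_hour * tokens_per_completion
    return total_tokens / tokens_per_second / 60


# Assumption (not from the thread): an average completion is about 100 tokens.
TOKENS_PER_COMPLETION = 100

for completions in (50, 100):
    slow = wait_minutes_per_hour(completions, TOKENS_PER_COMPLETION, 8)
    fast = wait_minutes_per_hour(completions, TOKENS_PER_COMPLETION, 40)
    print(f"{completions} completions/h: {slow:.1f} min at 8 tok/s, {fast:.1f} min at 40 tok/s")
```

Under that assumption the gap is a constant 5x in waiting time, whatever the completion length; whether the faster model's answers are good enough is a separate question that this arithmetic does not address.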

Deepu K Sasidharan • May 14

Also, 8 tok/s is not that bad when you are generating code. The 35B A3B will get you around 40 tok/s if the task doesn't need the best intelligence. 4B models are not useful, at least for what I do. Again, you do you. Everyone has different needs.

Deepu K Sasidharan • May 14

Have you tried the said models? In my experience, anything other than the dense models isn't that useful for serious coding. Maybe OK for minor stuff, but not for generating entire apps or fixing complex issues.

Andy Nian • May 13

A fully offline AI-assisted setup sounds cool, but wouldn't keeping updates and dependencies for tools like llama.cpp be a hassle without cloud access? How do you manage version control on your offline setup? I can see isolation making versioning tricky. If you're getting your Linux environment ready for job interviews, take a look at prachub.com. They have company-tagged coding banks that could be useful, especially if you're targeting a specific tech role.

GoDavaii - Advanced Health AI • May 15

Fascinating local AI setup. While great for development, true accessibility in health AI, especially for global, voice-first users, needs to move beyond local inference. The next billion users will speak symptoms like 'kaaichal' (Tamil for fever) in their mother tongue, not type them.

This demands robust, scalable voice models that understand...