
My fully offline AI-assisted Linux development machine

Deepu K Sasidharan on May 11, 2026

Originally published at deepu.tech. One of my most popular posts of all time was when I wrote about my beautiful Linux development machine in 2019...
Peter Vivo

Looks great! I like to use Linux, or at least a Unix-based terminal. For example, my company laptop runs Windows 11, but a WSL install of Ubuntu 22.04 partially solves my development workflow. I know that is far from this handcrafted solution, but the company requirements are strict; I can't even reach dev.to from my work computer because of some weird company policy. Anyway, I like your work!

uiqtwe6

Is it a company laptop?

Peter Vivo

A 2020 Dell i5 with 16GB RAM and a worn English-layout keyboard, though I always use the US layout, so there's minor confusion.
The good news is that Copilot CLI runs in the cloud, so that workload doesn't affect the computer.

Deepu K Sasidharan

Nah.

Fyodor

That's a helluva broputer... 😅

only for bros

Deepu K Sasidharan

I'm gonna steal broputer 😂 although not sure if I should be offended or not 🤣

Fyodor

Nah, no offense, that’s a really cool setup made with lots of love and dedication, I’m pretty sure it pays off big time 👍🏼

Rajas Poorna

Lovely setup!
Have you considered using Qwen3.6 35B-A3B?
I use it on my MI50 32GB and basically get a 3x boost in tokens/s (both in and out) for not much intelligence penalty. Also probably worth turning on the feature to remember its thinking, given that you can support its full context window.
Once I saw that kind of tokens/s it was hard to justify the slower dense models.
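For reference, serving it at full context with llama.cpp is a one-liner; a minimal sketch, with the model path, quant, and context size purely illustrative:

```bash
# Minimal llama-server launch (model path is hypothetical).
# -m: GGUF model file, -c: context window in tokens (size to your VRAM),
# -ngl: layers to offload to the GPU, --host/--port: bind to localhost.
llama-server \
  -m ./models/qwen3.6-35b-a3b.Q8_0.gguf \
  -c 131072 \
  -ngl 99 \
  --host 127.0.0.1 \
  --port 8080
```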

Deepu K Sasidharan

I haven't personally tried it, since I saw someone comparing it with dense models on long-context tasks and the MoE models hallucinated way more when the context was big. I'll try it when I have time and see.

Deepu K Sasidharan • Edited

What context are you using?

Vic Chen

This is the dream setup for anyone who cares about owning their stack. The llama.cpp + ROCm combo on the Flow Z13 is impressive — 128GB unified memory changes the calculus for local AI entirely. I've been thinking about a similar local-first approach for some of my financial data analysis pipelines where I really don't want prompts hitting third-party APIs. The tradeoff you mentioned about context-length slowdown with 27B models matches what I've seen too. Qwen3.6 Q8_0 at 256k context is a solid sweet spot. Thanks for sharing the bench numbers and the archdots repo — exactly the kind of practical detail that's hard to find.
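A sketch of what that local-first pipeline angle can look like: llama-server exposes an OpenAI-compatible endpoint, so analysis code can target localhost instead of a third-party API. The port and payload here are assumptions about a default local setup:

```bash
# Nothing in this request leaves the machine.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Summarize the anomalies in this cash-flow series."}
        ],
        "temperature": 0.2
      }'
```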

Immanuel Gabriel

Never even knew one could do something like this. So creative. I appear to have a long way to go.

Vikassh.

Nice article. I never thought about this approach before.

Galileo G

Try Krusader or similar two-pane, keyboard-heavy file managers.

Vikassh.

How has this setup performed under real workloads?

Deepu K Sasidharan

I have been using it for reviews, quick fixes, repo research, etc., and it has been quite good. Right now I'm building a full-fledged filesystem management TUI in Rust. Will report back my findings. So far very impressed: I'm 3 prompts in and it's fixing issues after the first iteration.

th3cavalry

Check out my repo. I've got the keyboard and back window RGB working.

github.com/th3cavalry/GZ302-Linux-...

Deepu K Sasidharan

Super cool, thanks for sharing. I will use it as a benchmark to test the model.

Mininglamp

For coding assistance, a well-quantized 4B model at 40+ tok/s beats a 27B model at 8 tok/s in actual productivity. The bottleneck isn't intelligence — it's iteration speed. At 50-100 completions per hour, latency compounds fast. The practical setup: small+fast model for flow state (completions, quick edits), big+slow model for architecture planning and code review invoked 2-3 times per session. Two-tier local beats single-tier every time.
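A rough sketch of that two-tier layout with llama.cpp; model names, quants, context sizes, and ports are all illustrative:

```bash
# Tier 1: small, fast model for completions and quick edits.
llama-server -m ./models/small-4b.Q8_0.gguf -c 32768 -ngl 99 --port 8081 &
# Tier 2: large, slow model for architecture planning and code review.
llama-server -m ./models/dense-27b.Q8_0.gguf -c 131072 -ngl 99 --port 8082 &
# Point the editor's completion plugin at :8081; call :8082 manually
# a few times per session for the heavyweight questions.
```

On a single GPU you'd likely swap models rather than keep both resident, but the split itself is the point.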

Deepu K Sasidharan

Also, 8 tok/s is not that bad when you're generating code. The 35B A3B will get you around 40 tok/s if the task doesn't need the best intelligence. 4B models are not useful, at least for what I do. Again, you do you; everyone has different needs.

Deepu K Sasidharan

Have you tried said models? In my experience, anything other than the dense models isn't that useful for serious coding. Maybe OK for minor stuff, but not for generating entire apps or fixing complex issues.

Andy Nian

A fully offline AI-assisted setup sounds cool, but wouldn't keeping tools like llama.cpp and their dependencies up to date be a hassle without cloud access? How do you manage version control on your offline setup? I can see isolation making versioning tricky. If you're getting your Linux environment ready for job interviews, take a look at prachub.com; they have company-tagged coding banks that could be useful, especially if you're targeting a specific tech role.
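(One plausible routine for a source-built llama.cpp, assuming an occasional network window for fetching commits; the path and backend flag are assumptions, not something from the post:)

```bash
# Sync while online; the rebuild itself then needs no network.
cd ~/src/llama.cpp
git pull --ff-only
cmake -B build    # add backend flags here, e.g. -DGGML_HIP=ON for ROCm
cmake --build build --config Release -j"$(nproc)"
```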

GoDavaii - Advanced Health AI

Fascinating local AI setup. While great for development, true accessibility in health AI, especially for global, voice-first users, needs to move beyond local inference. The next billion users will speak symptoms like 'kaaichal' (Tamil for fever) in their mother tongue, not type them.

This demands robust, scalable voice models that understand...