DEV Community

My fully offline AI-assisted Linux development machine

Deepu K Sasidharan on May 11, 2026

Originally published at deepu.tech. One of my most popular posts of all time was when I wrote about my beautiful Linux development machine in 2019...
 
Andrea Giammarchi

I have a similar machine, but a desktop one (Minisforum 395+ 128GB). I've never looked into its BIOS, but I thought the whole point of these machines was to have unified memory similar to what the DGX Spark has, for example (and I have one of those too). Is there any reason you had to explicitly split 64GB of memory here and there, as opposed to letting the machine/OS handle that for you? In particular, the DS4 project (which I love and use on DGX Spark) requires 96GB minimum to run, but it doesn't necessarily need to take all that space, although I believe that project should run with a 32GB split for the CPU and 96GB for the GPU. Still, I'm curious to learn why nobody on macOS needs to worry about this, and neither do I on my DGX Spark (or maybe it comes pre-configured to handle that automatically). Thanks!

That being said, nice post. I feel you on the state of AMD ROCm, but it's really getting better day by day. I can't wait for it to be more reliable/robust so it becomes the Mac alternative for developers!

Deepu K Sasidharan

Last I tried, there were some issues loading models larger than RAM. But I think it's not an issue on newer kernels. I'm planning on disabling the split to see how my previous use cases work now.

Harjot Singh

I love that you're focusing on a fully offline setup for AI-assisted development. It's cool to see how you've customized your environment with Arch and niri. If you're ever interested in quickly spinning up a web app, moonshift lets you deploy a Next.js + Postgres + auth build in about 7 minutes, and you keep the code on your GitHub. Let me know if you want to give it a shot for free.

Aditya Mitra

You should also give omp.sh a try.
I found it much better in speed and management than opencode.

Peter Vivo

Looks great! I like to use Linux, or at least a Unix-based terminal. For example, my company laptop runs Windows 11, but installing Ubuntu 22.04 under WSL partially solves my development workflow. I know that is far from this handcrafted solution, but the company requirements are strict; I can't even reach dev.to from my work computer because of some weird company policy. Anyway, I like your work!

uiqtwe6

Is it a company laptop?

Peter Vivo

2020 Dell i5 with 16GB RAM and a worn English-layout keyboard, but I always use the US layout, which causes minor confusion.
The good news is that Copilot CLI runs in the cloud, so its capacity needs don't affect the computer.

Deepu K Sasidharan

Neah

Fyodor

That's a helluva broputer... 😅

only for bros

Deepu K Sasidharan

I'm gonna steal broputer 😂 although not sure if I should be offended or not 🤣

Fyodor

Nah, no offense, that's a really cool setup made with lots of love and dedication, I'm pretty sure it pays off big time 👍🏼

Rajas Poorna

Lovely setup!
Have you considered using Qwen3.6 35B A3B?
I use it on my MI50 32GB and basically get a 3x boost in tokens/s (both in and out) for not much intelligence penalty. Also probably worth turning on the feature to remember its thinking, given that you can support its full context window.
Once I saw that kind of tokens/s it was hard to justify the slower dense models.

Deepu K Sasidharan

I haven't personally tried it, since I saw someone compare it with dense models on long-context tasks, and the MoE models hallucinated way more when the context was big. I will try it when I have time and see.

Deepu K Sasidharan • Edited

What context are you using?

Vic Chen

This is the dream setup for anyone who cares about owning their stack. The llama.cpp + ROCm combo on the Flow Z13 is impressive; 128GB unified memory changes the calculus for local AI entirely. I've been thinking about a similar local-first approach for some of my financial data analysis pipelines where I really don't want prompts hitting third-party APIs. The tradeoff you mentioned about context-length slowdown with 27B models matches what I've seen too. Qwen3.6 Q8_0 at 256k context is a solid sweet spot. Thanks for sharing the bench numbers and the archdots repo, exactly the kind of practical detail that's hard to find.

Vikassh.

Nice article. I never thought about this approach before.

Vikassh.

How has this setup performed under real traffic?

Deepu K Sasidharan

I have been using it for reviews, quick fixes, repo research, etc., and it has been quite good. Right now I'm building a full-fledged filesystem management TUI in Rust. Will report back with my findings. So far I'm very impressed: I'm 3 prompts in and it's fixing issues after the first iteration.

th3cavalry

Check out my repo. I've got the keyboard and back window RGB working.

github.com/th3cavalry/GZ302-Linux-...

Deepu K Sasidharan

Super cool. Thanks for sharing. I will use it as a benchmark to test the model.

Immanuel Gabriel

Never even knew one could do something like this. So creative. I appear to have a long way to go.

Galileo G

Try Krusader or a similar two-pane, keyboard-heavy file manager.

Mininglamp

For coding assistance, a well-quantized 4B model at 40+ tok/s beats a 27B model at 8 tok/s in actual productivity. The bottleneck isn't intelligence; it's iteration speed. At 50-100 completions per hour, latency compounds fast. The practical setup: small+fast model for flow state (completions, quick edits), big+slow model for architecture planning and code review invoked 2-3 times per session. Two-tier local beats single-tier every time.
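
To put rough numbers on "latency compounds fast", here is a back-of-envelope sketch of the arithmetic. The 40 tok/s, 8 tok/s, and 50-100 completions per hour figures come from the comment above; the 200 tokens per completion is purely an assumption for illustration, and the function name is hypothetical. It counts decode time only and ignores prompt processing.

```python
def wait_seconds_per_hour(completions_per_hour, tokens_per_completion, tokens_per_second):
    """Seconds spent waiting on token generation in one hour (decode time only)."""
    return completions_per_hour * tokens_per_completion / tokens_per_second


# Assumption for illustration only: average generated tokens per completion.
TOKENS_PER_COMPLETION = 200

for label, tps in (("4B at 40 tok/s", 40.0), ("27B at 8 tok/s", 8.0)):
    for rate in (50, 100):  # completions per hour, from the comment above
        secs = wait_seconds_per_hour(rate, TOKENS_PER_COMPLETION, tps)
        print(f"{label}, {rate} completions/h: {secs / 60:.1f} min waiting")
```

Under that assumption the gap scales linearly with the speed ratio, so the result says nothing about answer quality, which is the other half of the tradeoff discussed in this thread.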

Deepu K Sasidharan

Have you tried the said models? In my experience, anything other than the dense models isn't that useful for serious coding. Maybe OK for minor stuff, but not for generating entire apps or fixing complex issues.

Deepu K Sasidharan

Also, 8 tok/s is not that bad when you're generating code. The 35B A3B will get you around 40 tok/s if the task doesn't need the best intelligence. 4B models are not useful, at least for what I do. Again, you do you. Everyone has different needs.

Harjot Singh

Fully-offline AI dev is more practical now than most people realize, and posts like this matter because they show the quality floor of local models has crossed "actually useful" for a lot of day-to-day work. Privacy, zero marginal cost, no rate limits, works on a plane - the tradeoffs increasingly favor local for the bulk of tasks.

The honest setup most people land on is hybrid: local model handles the high-volume mechanical work offline/free, and you reach for a frontier API only on the rare genuinely-hard problem where the local model's ceiling shows. Even then, you've moved 80% of your usage off the meter. Your writeup is a good blueprint for that - curious which local model you settled on and where you still felt the need to phone home to a bigger one. Great build.

PracHub

A fully offline AI-assisted setup sounds cool, but wouldn't keeping updates and dependencies for tools like llama.cpp be a hassle without cloud access? How do you manage version control on your offline setup? I can see isolation making versioning tricky. If you're getting your Linux environment ready for job interviews, take a look at prachub.com. They have company-tagged coding banks that could be useful, especially if you're targeting a specific tech role.

GoDavaii - Advanced Health AI

Fascinating local AI setup. While great for development, true accessibility in health AI, especially for global, voice-first users, needs to move beyond local inference. The next billion users will speak symptoms like 'kaaichal' (Tamil for fever) in their mother tongue, not type them.

This demands robust, scalable voice models that understand...