Originally published at deepu.tech.
One of my most popular posts of all time was when I wrote about my beautiful Linux development machine in 2019...
Looks great! I like to use Linux, or at least a Unix-based terminal. For example, my company laptop is Windows 11, but
a WSL install of Ubuntu 22.04 partially solves my development workflow. I know that's far from this handcrafted solution, but the company requirements are strict; I can't even reach dev.to from my work computer because of some weird company policy. Anyway, I like your work! Is it a company laptop?
Mine is a 2020 Dell i5 with 16GB RAM and a worn English-layout keyboard, though I always use the US layout, which causes minor confusion.
The good news is that Copilot CLI runs in the cloud, so capacity doesn't affect the computer.
Nah
That's a helluva broputer... 😅
I'm gonna steal broputer 😂 although not sure if I should be offended or not 🤣
Nah, no offense, that’s a really cool setup made with lots of love and dedication, I’m pretty sure it pays off big time 👍🏼
Lovely setup!
Have you considered using Qwen3.6 35BA3B?
I use it on my MI50 32GB and basically get a 3x boost in tokens/s (both in and out) for not much intelligence penalty. Also probably worth turning on the feature to remember its thinking, given that you can support its full context window.
Once I saw that kind of tokens/s it was hard to justify the slower dense models.
I haven't personally tried it, since I saw someone comparing it with dense models on long-context tasks, and the MoE models hallucinated noticeably more once the context got big. I'll try it when I have time and see.
What context size are you using?
This is the dream setup for anyone who cares about owning their stack. The llama.cpp + ROCm combo on the Flow Z13 is impressive — 128GB unified memory changes the calculus for local AI entirely. I've been thinking about a similar local-first approach for some of my financial data analysis pipelines where I really don't want prompts hitting third-party APIs. The tradeoff you mentioned about context-length slowdown with 27B models matches what I've seen too. Qwen3.6 Q8_0 at 256k context is a solid sweet spot. Thanks for sharing the bench numbers and the archdots repo — exactly the kind of practical detail that's hard to find.
Never even knew one could do something like this. So creative. I appear to have a long way to go.
Nice article. I never thought about this approach before
Try Krusader or similar two-pane, keyboard-heavy file managers.
How has this setup performed under real traffic?
I have been using it for reviews, quick fixes, repo research, etc., and it has been quite good. Right now I'm building a full-fledged filesystem management TUI in Rust. Will report back my findings. So far very impressed; I'm three prompts in and it's fixing issues after the first iteration.
Check out my repo. I've got the keyboard and back window RGB working.
github.com/th3cavalry/GZ302-Linux-...
Super cool, thanks for sharing. I will use it as a benchmark to test the model.
For coding assistance, a well-quantized 4B model at 40+ tok/s beats a 27B model at 8 tok/s in actual productivity. The bottleneck isn't intelligence — it's iteration speed. At 50-100 completions per hour, latency compounds fast. The practical setup: small+fast model for flow state (completions, quick edits), big+slow model for architecture planning and code review invoked 2-3 times per session. Two-tier local beats single-tier every time.
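The two-tier idea above can be sketched as a simple router: high-frequency tasks go to the fast small model, and the few judgment-heavy calls per session go to the big one. This is a minimal sketch under assumptions — the ports, model names, and task categories are all hypothetical; llama.cpp's `llama-server` exposes an OpenAI-compatible API, so each tier would just be a separate local server instance.

```python
# Hypothetical two-tier routing for local coding assistance.
# Each tier maps to a separate llama-server instance (ports/models assumed).
FAST_TIER = {"url": "http://localhost:8081/v1", "model": "small-4b-q5"}   # assumption
SLOW_TIER = {"url": "http://localhost:8082/v1", "model": "large-27b-q8"}  # assumption

# Tasks that fire dozens of times per hour stay on the fast tier;
# the rare architecture/review calls can afford the slow tier.
FAST_TASKS = {"completion", "quick_edit", "rename", "doc_comment"}
SLOW_TASKS = {"architecture", "code_review", "complex_bugfix"}

def pick_tier(task_kind: str) -> dict:
    """Route a request to the appropriate local server by task kind."""
    if task_kind in SLOW_TASKS:
        return SLOW_TIER
    # Default everything else to the fast tier: latency compounds
    # across 50-100 calls per hour, so it wins ties.
    return FAST_TIER
```

A client would then send the chat request to `pick_tier(kind)["url"]` with the matching model name; the point is just that the routing decision is trivial to automate once both servers are running.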
Also, 8 tok/s is not that bad when you're generating code. The 35B A3B will get you around 40 tok/s if the task doesn't need the best intelligence. 4B models are not useful, at least for what I do. Again, you do you; everyone has different needs.
Have you tried the said models? In my experience, anything other than the dense models isn't that useful for serious coding. Maybe OK for minor stuff, but not for generating entire apps or fixing complex issues.
A fully offline AI-assisted setup sounds cool, but wouldn't keeping tools like llama.cpp and their dependencies up to date be a hassle without cloud access? How do you manage version control in your offline setup? I can see isolation making versioning tricky. If you're getting your Linux environment ready for job interviews, take a look at prachub.com; they have company-tagged coding banks that could be useful, especially if you're targeting a specific tech role.
Fascinating local AI setup. While great for development, true accessibility in health AI, especially for global, voice-first users, needs to move beyond local inference. The next billion users will speak symptoms like 'kaaichal' (Tamil for fever) in their mother tongue, not type them.

This demands robust, scalable voice models that understand...