Definitely not just you — this trips up almost everyone starting out with local LLMs!
The short answer: context rot. Once you've been in the same chat for a while, the history either outgrows the context window (so the oldest messages get dropped) or gets so long that the model loses track of the big picture and starts contradicting itself. It's not your prompting; it's just how these models work.
This is actually a key difference from paid cloud models like Claude or GPT. Those generally offer longer context windows, and their apps tend to handle long conversations more gracefully. With local LLMs, context management is entirely your responsibility: the model won't clean up after itself, and local runners often default to a smaller context length than the model supports unless you raise it.
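If you want a rough feel for when a chat is about to outgrow the window, here is a minimal Python sketch. The 4-characters-per-token ratio is only a rule of thumb (real tokenizers vary by model and language), and `context_limit` and `reserve_for_reply` are numbers you would set for your own setup, not values from any particular tool.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token count; the chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def context_usage(messages, context_limit, reserve_for_reply=1024):
    """Return (tokens_used, token_budget, over_budget) for a chat history.

    messages: list of message strings (system prompt, user turns, replies).
    context_limit: the context length your runner is configured with.
    reserve_for_reply: room kept free for the model's next answer.
    """
    used = sum(estimate_tokens(m) for m in messages)
    budget = context_limit - reserve_for_reply
    return used, budget, used > budget
```

When `over_budget` comes back true, that is a reasonable cue to start a new chat instead of pushing on.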
Quickest fix: treat each task as a fresh session. When something breaks and the LLM starts looping, don't keep pushing — open a new chat, paste only the relevant file, and describe exactly what you want changed.
Also worth trying: write a short CONTEXT.md in your project with the app's structure and rules, and paste it at the top of each new session. Forces you to think clearly AND gives the model a clean anchor.
Your setup (qwen3-coder q5_K_M) is genuinely good, so you're not limited by hardware here. Local LLMs just require a bit more discipline from the user side to get the best out of them. 😄
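To make the fresh-session and CONTEXT.md tips concrete, here is a small Python sketch that glues the three pieces together: project context, one relevant file, and the task. The function names and the section labels (`PROJECT CONTEXT:`, `FILE:`, `TASK:`) are just my own illustration, not a format any tool requires.

```python
from pathlib import Path


def build_session_prompt(context_text, file_name, file_text, task):
    """Assemble one self-contained prompt for a brand-new chat session."""
    sections = [
        "PROJECT CONTEXT:\n" + context_text.strip(),
        f"FILE: {file_name}\n" + file_text.rstrip(),
        "TASK:\n" + task.strip(),
    ]
    # Blank lines between sections keep the boundaries obvious to the model.
    return "\n\n".join(sections) + "\n"


def prompt_from_files(context_path, source_path, task):
    """Read CONTEXT.md and a single source file, then build the prompt."""
    context_file = Path(context_path)
    source_file = Path(source_path)
    return build_session_prompt(
        context_file.read_text(encoding="utf-8"),
        source_file.name,
        source_file.read_text(encoding="utf-8"),
        task,
    )
```

Something like `prompt_from_files("CONTEXT.md", "src/app.py", "Rename the main handler")` (the path and the task are made up) gives you text to paste as the first message of the new chat.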
Glad the post helped — good luck with the build!
Thanks for the tips! I have implemented a few variations of each one you mentioned. I'm very glad to know that if I'm able to start generating a bit of passive income, it would be well worth investing in some subscription models, and for more than just these reasons, too.
I've tried creating a new session for every task, and I've tailored .md files and rules so that the context remains as "un-bloated" as possible.
I have a decent setup to definitely do some learning and practicing: 1TB SSD, i7 8600, RTX 2070, 64GB RAM. The 8GB of VRAM is causing my bottleneck, which I'm sure is the case for you too.
I have my 2020 MacBook Pro (M1, 8GB) in front of the PC's two screens. I've considered incorporating it somehow to take a bit of the load off. Glad I'm pushing in the right direction.
Thanks again! 😊
The .md rules file approach is exactly right. You're already thinking like someone who's been doing this a while!
And yes, 8GB VRAM is the real bottleneck on the RTX side. But here's the thing: your M1 MacBook might actually be more useful than you think. Ollama on Apple Silicon uses unified memory, so the 8GB is shared between CPU and GPU with no hard split. macOS needs a chunk of that too, so realistically it suits small quantized models rather than mid-size ones, but for lighter tasks it can genuinely take load off your main machine.
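For a quick sanity check on what fits in 8GB, here is a back-of-the-envelope Python sketch. It only estimates the weights as parameters times bits per weight; the bits-per-weight figures you plug in for a given quantization are approximations, and `overhead_gb` is a placeholder guess for KV cache and runtime overhead, not a measured number.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion, bits_per_weight, memory_gb, overhead_gb=1.5):
    """True if weights plus an assumed overhead fit in the given memory.

    overhead_gb is a rough allowance for KV cache and runtime buffers;
    it grows with context length, so treat the default as a guess.
    """
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb
```

For example, a hypothetical 7B model at about 4.5 bits per weight comes to roughly 3.9 GB of weights and fits in 8GB, while a 30B model at about 5.5 bits per weight needs over 20 GB and does not. On the Mac, keep in mind that the OS and your other apps are drawing from the same 8GB.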
Also worth checking out: Google Antigravity (antigravity.google). It's Google's new agent-first IDE, free during public preview, with generous Gemini usage included. The big advantage here is that it runs on cloud-side LLMs — so you don't need beefy local hardware at all. Works on Linux, Windows, and macOS, so both your machines are covered. One heads-up though — as with most preview-stage tools, there's a chance your inputs are used for model training, so I'd avoid putting sensitive or commercial code through it.
As for paid models — honestly, even occasional use of Claude or GPT for the tricky architectural decisions is worth it. You don't need a full subscription right away; just knowing when to reach for the right tool makes a big difference.
Sounds like you're pushing in exactly the right direction. Good luck with the build! 😄
Awesome!
I was getting a bit intimidated thinking that my approaches, because I tried several, were all wrong. It's very relieving and reassuring to know that I wasn't wasting time, and that I, indeed, have a foundation to build on that was all productive and correct 🤩
Even with the M1 8G, I wasn't underrating it much at all, because I've heard in a couple of places about the MacBook's ability to do what you're explaining here, so that's very convenient.
And I think you just pushed me to dedicate a bit of time one day this week to giving Google's Antigravity a shot. I've heard things here and there about it, and all good so far. One frustration of mine that keeps me from trying new software or apps is checking something out, getting into the groove of things, having a good time learning, and then BOOM, you hit a paywall. I kinda just assumed that I would hit one early on with Antigravity because I figured that dealing with cloud models would blow through tokens and such.
Still learning a lot though!
Thanks for the advice! It's all been phenomenal. 🙃
Cheers!