I develop a network analyzer in C++, and sometimes I'm off to a customer site to see what's going on there. A lot of network traffic, arguing, engineering pitfalls..... all of it happens there.
And then there are LLMs, your best friend whenever you need one. But because of the customers we deal with, most of the time we're expected to work offline, so there's no ChatGPT, no Claude, no Gemini, no whatever.
So, to help myself with tech support, I tried a handful of local LLM models. What follows are limited, unprofessional (I never studied AI seriously), and totally personal opinions and notes from my experience.
Preparing the Test Material
First I needed a prompt. Yes, I had to write a recipe for what I'd like my models to do. I extracted some "rankings" from our analyzers and wrote instructions explaining what each column means and how to analyze the tables.
From Google to "Open Sesame"
First I tried our good old friend, Google's Gemma3. It was versatile, fast, and lightweight (at least for me). Later I moved to the Qwen3 thinking model. Though a LOT slower, the results were far better. After tweaking some prompts I could get similar results from Gemma3, but I eventually abandoned it, since the details were either incorrect or the results changed every time I ran the same prompt.
During this phase I also tried others, including DeepSeek R1 and LG EXAONE (yes, that LG, the Korean company). DeepSeek R1 was, like..... a failure to me. Its reports didn't make sense. As for LG EXAONE, well, the results were decent (at least on par with the Qwen3 instruct model or so), but commercial use was prohibited by its license. Oops. :P
Man, it's waaaay too slow - and here comes a new challenger from France: Mistral AI
The Qwen3 thinking model was great, but its results took too long to generate. I learned that I could turn off its thinking via a switch, but that didn't work for me. Later I learned of the Qwen3 instruct model, but that was FAR later...... :P
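By the way, Qwen3 thinking models wrap their reasoning in `<think>...</think>` tags before the final answer. If, like me, you pipe the output into a report, you probably want to strip that part. A minimal sketch, assuming the standard Qwen3 tag names (the helper name and sample text are mine):

```python
import re

def strip_thinking(text: str) -> str:
    """Remove the <think>...</think> block that Qwen3 thinking models
    emit before the final answer, leaving only the answer text."""
    # DOTALL so the reasoning block can span multiple lines
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    return cleaned.strip()

raw = "<think>\nThe user wants a summary...\n</think>\nTop talker: 10.0.0.5 (42% of traffic)."
print(strip_thinking(raw))  # -> Top talker: 10.0.0.5 (42% of traffic).
```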
Then I learned of Mistral AI. They produce decent models. Frankly speaking, when I first tried their Mistral Small models I was quite disappointed, but that was years ago. Why not try again now?
I downloaded Ministral 3 and Devstral Small 3.2. At first both gave me decent results. Ministral 3 did really well at generating reports, and it didn't miss any details from my instructions (quite unlike Gemma3). Devstral Small 3.2 was handy for generating code snippets, though it didn't support FIM (Fill-in-the-Middle) - which was of no use to me anyway (Qwen3-Coder couldn't suggest anything for my brand-new project, so I concluded it makes no difference).
One interesting thing was that Devstral 3.2 generated more compact and satisfying code than Qwen3Coder-Next. Before Next I had tried Qwen3Coder, and it generated hallucinated code every time - an STL-ish function call that doesn't exist in any C++ standard at all. So guess what? I just dropped it.
Later I tried Qwen3Coder-Next. Frankly speaking, its code met all the criteria I gave it and didn't show Qwen3Coder's hallucinations, but the code read like old Chinese literature: full of rhetorical decoration and rich expressions, but hollow at the core. It doesn't matter as far as the code runs, but I don't want a luxury car stereo or a nitro boost installed on my small bike!
Introduction to Quantization
When I tried the Ministral and Devstral Small models, I started with Q6_K quants, but later switched to Q8_0 since my laptop could endure the load. The difference was not that big, but worth noticing: the Q8_0-generated code was more feature-rich and error-resistant. In my situation complicated error handling wasn't needed, but it was worth having some.
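For context on what those quant names mean: as I understand it, llama.cpp's Q8_0 splits each weight tensor into blocks of 32 and stores one scale plus 32 signed 8-bit values per block (about 8.5 effective bits per weight), while K- and IQ-quants use fancier block layouts with fewer bits. A toy sketch of the Q8_0 idea (not the actual llama.cpp code, which stores the scale as fp16):

```python
def quantize_q8_0_block(block):
    """Quantize one block of 32 floats into (scale, list of int8)."""
    amax = max(abs(x) for x in block)
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = [round(x / scale) for x in block]  # each fits in [-127, 127]
    return scale, quants

def dequantize_q8_0_block(scale, quants):
    """Recover approximate float weights from a quantized block."""
    return [q * scale for q in quants]

weights = [0.12, -0.98, 0.5, 0.03] * 8          # 32 toy weights
scale, quants = quantize_q8_0_block(weights)
restored = dequantize_q8_0_block(scale, quants)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_err:.4f}")  # small, but nonzero
```

The error per weight is bounded by half the scale, which is why higher-bit quants (Q8_0) lose less than lower-bit ones (Q6_K, IQ3_M).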
Official vs. Unsloth vs. bartowski
I also tried Unsloth and bartowski builds, hoping they might generate better results or run faster than the official ones. If they're as popular as the official ones, why not try them?
After trying, I concluded that Unsloth builds are quite opinionated. They provide similar results with fewer bits, but their optimization is so opinionated that sometimes the model doesn't work as expected. For example, even though I specified that the report should be written in Korean, Unsloth builds occasionally emitted Chinese and Japanese words, where bartowski builds didn't.
In another instance, I just said "introduce yourself" and the Unsloth Ministral replied in French. When I asked for an explanation, it claimed that "yourself" is French. OMG. Though I never learned French myself, at least I know there's no word like "yourself" in French.
Anyway, if you're working in a Latin-script language, I think you can stick with Unsloth builds at smaller quantizations and get satisfying results; if not, use bartowski (at least that's what worked for me). To me, bartowski builds felt like pure conversions of the models to GGUF, without any opinions or tuning, so they reflect the original authors' intent more faithfully.
Ah, and one more thing: not only Ministral but also its bigger cousin, Mistral Small 3.2, got confused about which language it was speaking.
Quantization part 2: Qwen3-Next
After running Devstral and Ministral for a while, I learned that, when quantized right, Qwen3Coder-Next and Qwen3-Next become similar in size to the Mistral Small models. So why not compare them? Since the Qwen3-Next models have more parameters, maybe they'd show better results?
So I compared bartowski's IQ3_M quants (since I think his quants better reflect the original authors' intent) against Q8_0 quants of the Mistral models. And well, I concluded that the Qwen3-Next models provide better results when instructed right (i.e., when the prompt is detailed enough), and they were faster at generating results (parsing my question took more time, but answer generation was twice as fast).
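For what it's worth, the "similar size" observation is easy to sanity-check with back-of-the-envelope arithmetic: a GGUF file weighs roughly parameter count × bits per weight / 8. The effective bits-per-weight figures and parameter counts below are my assumptions (approximate values for llama.cpp quant types and the published model sizes), so double-check them against the actual files:

```python
# Rough GGUF size estimate: params * bits_per_weight / 8 bytes.
# Effective bpw values are approximate (quant blocks carry scales,
# and some tensors stay at higher precision).
BPW = {"Q8_0": 8.5, "Q6_K": 6.56, "IQ3_M": 3.66}

def gguf_size_gb(params_billion: float, quant: str) -> float:
    """Approximate GGUF file size in GB for a given quant type."""
    return params_billion * 1e9 * BPW[quant] / 8 / 1e9

# Assumed parameter counts: Mistral Small ~24B, Qwen3-Next ~80B (MoE)
print(f"Mistral Small 24B @ Q8_0 : {gguf_size_gb(24, 'Q8_0'):.1f} GB")
print(f"Qwen3-Next 80B  @ IQ3_M : {gguf_size_gb(80, 'IQ3_M'):.1f} GB")
```

Not identical sizes, but close enough that both fit the same laptop, which is what matters here.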
So, what now?
I'm sticking with Qwen3-Next for report generation and Qwen3Coder-Next for code generation - and, hopefully, FIM (if it can keep up with my speed of thought. :P).
I'd love to show my fellow tech support team the new environment I built to help them with consulting tasks and report writing, but it's the Lunar New Year holiday (설날) in Korea, so I've got to take a rest first.
Well, I just hope they like it.