So to keep it short, here's the setup for quick readers: Bitnet 1.58 bonsai-8B + bitnet.cpp (or llama.cpp if you're lazy) + tools like persistent memory and auto batching (or just use Ollama with community plugins). Ummm, yeah, that's it. If you're in for the juice, here's more:
- llama.cpp is good, honestly, and works well for Bitnet. If you feel maso you can use bitnet.cpp instead (very similar, low learning curve). Try a batch size of 512 (works for me), and if you have a dedicated GPU, please use it, it'll be your bestie.
- If you're feeling risky, try speculative decoding (early speculation) with a small draft model, like 0.5B. But Bitnet is fast enough already, and a second model adds unnecessary RAM overhead. Or idk, maybe LoRA TTT (test-time training) is a good way? Too many things to do.
- Why Bitnet? Speed, and the raw general IQ is dense AF (it gives like 7B-level accuracy at 45 t/s, but don't take my word for it; so I don't feel bad, expect more like 25 t/s).
- Should you, or can you, find something better? Absolutely.
- Maybe ask me questions, I'll answer in a few mins prolly (I hallucinate too).
- Any upgrades you can add? In-place TTT makes you a Mad Max model, but LoRA TTT is going crazy, especially if you use the prototype QLoRA + in-place TTT. Next is tool calling: use TTT or LoRA to teach it permanently (remember to save if you're using TTT). Yeah, there's more, but honestly this should get you going pretty smoothly.
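
If you'd rather script the llama.cpp route than type flags by hand, here's a minimal sketch of the run settings from the list above (batch size 512, optional GPU offload). It assumes the `llama-cli` binary from llama.cpp is on your PATH. The helper name `build_llama_cmd` and the model path `models/bonsai-8b.gguf` are made up for illustration, so swap in your own.

```python
import shlex


def build_llama_cmd(model_path, prompt, batch_size=512, gpu_layers=0, n_predict=128):
    """Build the argument list for a llama-cli run.

    Assumptions: llama-cli is installed, and model_path points at a GGUF
    file you already downloaded (the path used below is hypothetical).
    """
    cmd = [
        "llama-cli",
        "-m", model_path,        # model file
        "-b", str(batch_size),   # batch size; 512 is the value suggested above
        "-n", str(n_predict),    # number of tokens to generate
        "-p", prompt,            # prompt text
    ]
    if gpu_layers > 0:
        # Offload this many layers to a dedicated GPU, if you have one.
        cmd += ["-ngl", str(gpu_layers)]
    return cmd


if __name__ == "__main__":
    # Hypothetical model path, shown only to print the resulting command.
    cmd = build_llama_cmd("models/bonsai-8b.gguf", "Hello", gpu_layers=99)
    print(shlex.join(cmd))
```

Pass the returned list to `subprocess.run(cmd)` to actually launch it. Leaving `gpu_layers=0` keeps everything on the CPU, and timing a run yourself is the easy way to see whether you land closer to 25 or 45 t/s.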