Very interesting, and for the Nous you used from OpenRouter, did you use nousresearch/hermes-4-70b or less ? Which size makes interesting on local GPU ?
For this experiment I used a Nous Hermes endpoint via OpenRouter rather than full local inference, so not specifically Hermes-4-70B local. But from testing, the really interesting threshold for Hermes-style persistent learning feels around the 30B–70B range — that’s where skill evolution, preference inference, autonomous refinement, and “operational intuition” become much more coherent across sessions.
That said, even 7B–14B quantized models on consumer GPUs can become surprisingly strong for recurring workflows because the persistent skill/memory layer compounds over time. A smaller model with 7 days of accumulated skills can genuinely outperform a stronger stateless model for specific tasks.
Very interesting, and for the Nous you used from OpenRouter, did you use
nousresearch/hermes-4-70bor less ? Which size makes interesting on local GPU ?For this experiment I used a Nous Hermes endpoint via OpenRouter rather than full local inference, so not specifically Hermes-4-70B local. But from testing, the really interesting threshold for Hermes-style persistent learning feels around the 30B–70B range — that’s where skill evolution, preference inference, autonomous refinement, and “operational intuition” become much more coherent across sessions.
That said, even 7B–14B quantized models on consumer GPUs can become surprisingly strong for recurring workflows because the persistent skill/memory layer compounds over time. A smaller model with 7 days of accumulated skills can genuinely outperform a stronger stateless model for specific tasks.
Thanks a lot for the benchs and tips, they will be very useful !
You too man!☺️