Maykeye

BakaLLM, step 13, RMT mainlined

Hello, fairy dairy diary.

(Image: Flash)

So, the RMT curse continues. I redid training, and after 3 epochs the loss is 4.01302, not 3.9xx. Heresy! Blasphemy! The world is against us! Oh well, BakaPause is recorded at 4.02, so I'll count it as a win, since RMT takes next to no params (Pause: 190492800 vs RMT: 190505088).
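
(For reference, a minimal sketch of why RMT is so cheap parameter-wise. This is not BakaLLM's actual code, just the idea: the only new weights are the learned memory-token embeddings that get prepended to every segment and carried forward.)

```python
import torch
import torch.nn as nn

class RMTWrapper(nn.Module):
    """Toy RMT-style wrapper: the only extra parameters are the memory embeddings."""
    def __init__(self, backbone: nn.Module, d_model: int, n_mem: int = 8):
        super().__init__()
        self.backbone = backbone                              # existing transformer, unchanged
        self.mem = nn.Parameter(torch.zeros(n_mem, d_model))  # n_mem * d_model new params, that's it

    def forward(self, segments: list[torch.Tensor]) -> torch.Tensor:
        n_mem = self.mem.size(0)
        mem = self.mem.unsqueeze(0)                           # (1, n_mem, d_model)
        outs = []
        for seg in segments:                                  # seg: (batch, seq, d_model)
            mem = mem.expand(seg.size(0), -1, -1)
            x = torch.cat([mem, seg], dim=1)                  # prepend memory tokens to the segment
            y = self.backbone(x)
            mem = y[:, :n_mem]                                # carry updated memory into the next segment
            outs.append(y[:, n_mem:])
        return torch.cat(outs, dim=1)
```

(Real RMT also appends write-memory tokens at the end of the segment; this is just the gist.)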

Oh well, since the drop of several SSM-based models and the FlashFftConv library, I want to test that next.

Step 4 is postponed for another time: it increases the number of weights way too much for what it adds. Also, I found a new glorious activation function: nn.Identity, or more specifically y = identity(linear(x)) * base_layer(x). Oh well, I'll remember it for later.
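
In case that looks cryptic, here's a toy sketch of that gate (names made up, not the actual BakaLLM layer). The "activation" is literally a no-op; the nonlinearity comes from multiplying the two branches, basically a bilinear GLU without the sigmoid:

```python
import torch
import torch.nn as nn

class IdentityGate(nn.Module):
    """Toy sketch of y = identity(linear(x)) * base_layer(x)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)      # gate branch
        self.base_layer = nn.Linear(d_model, d_model)  # stand-in for whatever the base layer is
        self.identity = nn.Identity()                  # the "glorious activation function"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No pointwise nonlinearity anywhere; the elementwise product does the bending.
        return self.identity(self.linear(x)) * self.base_layer(x)
```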

For now I want to check out the tasty tools in HazyResearch/flash-fft-conv.
What if they don't waste memory? So elegant it will be! So wonderful it will be! Will MEGA return? Will Monarch be added? Experiments will be experimented!
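
For context, here is the bare-bones FFT long convolution that flash-fft-conv is built to accelerate, written in plain PyTorch (the library's own API differs; this only shows the math it speeds up):

```python
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal long convolution of u (batch, d, seqlen) with per-channel kernels k (d, seqlen),
    done in O(L log L) via FFT instead of O(L^2) directly."""
    seqlen = u.shape[-1]
    fft_size = 2 * seqlen                          # zero-pad so the convolution isn't circular
    u_f = torch.fft.rfft(u.float(), n=fft_size)
    k_f = torch.fft.rfft(k.float(), n=fft_size)
    y = torch.fft.irfft(u_f * k_f, n=fft_size)     # pointwise multiply in frequency domain
    return y[..., :seqlen].to(u.dtype)

# Usage sketch
u = torch.randn(2, 64, 1024)                       # (batch, channels, seqlen)
k = torch.randn(64, 1024)                          # one long kernel per channel
y = fft_long_conv(u, k)                            # (2, 64, 1024)
```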

(But then again, since the current implementation of RMT turned out so bad compared to the old one, solidifiers can now be brought back. That is also something to check. The roadmap is clear!)

Chill!~ (Oh, we have -30°C here BTW, so chill levels are pretty high! Yay, free cooling for my laptop, yes)

PS. Oh, also: the current code sets the random seed to ⑨. It's still not deterministic without jumping through the hoops of torch.use_deterministic_algorithms(True) and some env vars, but we are going for reproducibility, yay!
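
(The hoops look roughly like this; a sketch rather than the exact BakaLLM code. The CUBLAS_WORKSPACE_CONFIG value is the one PyTorch's determinism docs mention for CUDA.)

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 9, fully_deterministic: bool = False) -> None:
    """Seed the usual RNGs; optionally jump through the extra determinism hoops."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                              # seeds CPU and all CUDA devices
    if fully_deterministic:
        # Some cuBLAS ops need this env var once deterministic algorithms are enforced.
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.benchmark = False

seed_everything(9)  # ⑨: reproducible-ish; bit-exact only with the extra hoops enabled
```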
