Maykeye

BakaLLM 17.7: Simplification!

Hello, fairy dairy diary!

Got Mini Dall-e working!

*Image: Cirno*

But besides that, I changed RMT a little again.
Previously, when a new batch started, RMT memory was only appended at the end.

This made dynamic batches quite complex: in many places I had to split a batch into the parts that had RMT tokens on the left and the parts that didn't. So I decided to simplify the process and prepend something even for new data. I tried two approaches. The first: zeros, as if every position of RMT-READ were 0.0.

Then I tried prepending the RMT tokens themselves, an initial copy of the write-memory. That worked better, and now we have a new E3 record: 3.34 loss. Not bad. Not bad at all.
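A minimal sketch of the idea in PyTorch, not BakaLLM's actual code: the class name `RMTMemory` and the names `mem_write`, `n_mem`, and `init_from_write` are my own for illustration. The point is that every chunk, including the very first one of a new batch, gets the same `[READ-mem | tokens | WRITE-mem]` layout, and READ-memory for a fresh batch starts as either zeros or a copy of the learned write-memory:

```python
import torch
import torch.nn as nn

class RMTMemory(nn.Module):
    def __init__(self, n_mem: int, d_model: int, init_from_write: bool = True):
        super().__init__()
        # Learned WRITE-memory tokens appended at the end of every chunk.
        self.mem_write = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.init_from_write = init_from_write

    def initial_read(self, batch_size: int) -> torch.Tensor:
        # READ-memory for the first chunk of a new batch,
        # when no WRITE-memory from a previous chunk exists yet.
        if self.init_from_write:
            # Variant 2: start from a copy of the initial WRITE-memory.
            mem = self.mem_write
        else:
            # Variant 1: start from zeros (all RMT-READ positions at 0.0).
            mem = torch.zeros_like(self.mem_write)
        return mem.unsqueeze(0).expand(batch_size, -1, -1)

    def wrap(self, x: torch.Tensor, mem_read: torch.Tensor) -> torch.Tensor:
        # Every chunk now has the layout [READ-mem | tokens | WRITE-mem],
        # so batching no longer has to special-case chunks without
        # memory on the left.
        write = self.mem_write.unsqueeze(0).expand(x.size(0), -1, -1)
        return torch.cat([mem_read, x, write], dim=1)
```

With this, the forward loop can always call `wrap`, passing either `initial_read(batch_size)` for a new batch or the previous chunk's write-memory output, which is exactly the simplification that removes the left/right splitting.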

| model_type | loss | ppl | note |
|------------|--------|---------|------|
| baka_mamba | 3.3458 | 28.3842 | after 3 epochs, per 8-batch; RMT has initial copy of write-mem |
| baka_mamba | 3.4515 | 31.5496 | after 2 epochs, per 8-batch; RMT has initial copy of write-mem |
| baka_mamba | 3.7411 | 42.1462 | after 1 epoch, per 8-batch; RMT has initial copy of write-mem |
| baka_mamba | 3.7830 | 43.9508 | after 1 epoch, per 8-batch; RMT has initial zero |
| baka_mamba | 3.3656 | 28.9515 | after 3 epochs, per 8-batch (3h12m) |
| baka_mamba | 3.4757 | 32.3230 | after 2 epochs, per 8-batch (2h56m) |
| baka_mamba | 3.7726 | 43.4954 | after 1 epoch, per 8-batch (3h8m) |
