Hello, fairy dairy diary!
Got Mini Dall-e working!
But besides that, I changed RMT again a little.
Previously, when a new batch started, RMT memory was only appended at the end.
That made dynamic batches quite complex: in many places I had to split batches into parts that had RMT on the left and parts that didn't. So I decided to simplify the process and prepend something even for brand-new data. I tried two approaches. First: zeros, as if every position of RMT-READ were 0.0.
Then I tried prepending the RMT tokens themselves, i.e. starting read-mem as a copy of the learnable write-mem. That worked better, and now we have a new E3 record: 3.34 loss. Not bad. Not bad at all.
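For future me, here is a minimal sketch of those two initialisations in PyTorch (which I'm assuming here; names like `MemInit`, `n_mem`, `d_model`, and `wrap` are made up for the example and are not from the real training code):

```python
import torch
import torch.nn as nn
from typing import Optional

class MemInit(nn.Module):
    """Sketch of RMT read-memory init for a brand-new sequence (hypothetical names)."""

    def __init__(self, n_mem: int, d_model: int, mode: str = "copy"):
        super().__init__()
        # Learnable write-memory embeddings, appended after the input tokens.
        self.write_mem = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.mode = mode  # "zero" or "copy"

    def initial_read_mem(self, batch_size: int) -> torch.Tensor:
        if self.mode == "zero":
            # First approach: as if RMT-READ had 0.0 in every position.
            return self.write_mem.new_zeros(batch_size, *self.write_mem.shape)
        # Second approach (the E3-record one): read-mem starts as a
        # copy of the learnable write-mem embeddings.
        return self.write_mem.unsqueeze(0).expand(batch_size, -1, -1)

    def wrap(self, x: torch.Tensor, read_mem: Optional[torch.Tensor]) -> torch.Tensor:
        # Prepend read memory, append write memory: [READ | tokens | WRITE].
        if read_mem is None:  # new sequence, nothing carried over yet
            read_mem = self.initial_read_mem(x.size(0))
        write = self.write_mem.unsqueeze(0).expand(x.size(0), -1, -1)
        return torch.cat([read_mem, x, write], dim=1)
```

With either mode, every segment now looks the same, `[READ | tokens | WRITE]`, so the dynamic batching code no longer has to special-case sequences without memory on the left.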
model_type | loss | ppl | note |
---|---|---|---|
baka_mamba | 3.3458 | 28.3842 | after 3 epochs per 8-batch; RMT has initial copy of write-mem |
baka_mamba | 3.4515 | 31.5496 | after 2 epochs per 8-batch; RMT has initial copy of write-mem |
baka_mamba | 3.7411 | 42.1462 | after 1 epoch per 8-batch; RMT has initial copy of write-mem |
baka_mamba | 3.7830 | 43.9508 | after 1 epoch per 8-batch; RMT has initial zero |
baka_mamba | 3.3656 | 28.9515 | after 3 epochs per 8-batch (3h12m) |
baka_mamba | 3.4757 | 32.3230 | after 2 epochs per 8-batch (2h56m) |
baka_mamba | 3.7726 | 43.4954 | after 1 epoch per 8-batch (3h8m) |