Hello, fairy dairy diary~.
Too busy for Cirno Pic once again.
So here is the validation table.
For the next Mamba experiment I
(a) moved away from the parallel addition of Mamba; the model now works as MAMBA(MLP+ATTN), with the Mamba block applied on top of the MLP+attention output instead of added alongside it;
(b) inserted Mamba into every second layer, since I had enough memory to do it.
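To make (a) and (b) concrete, here's a minimal PyTorch sketch of the layout. This is not my actual training code: the `Mamba` block is assumed to come from the `mamba_ssm` package, and norms, masking, and dropout are omitted.

```python
import torch.nn as nn
from mamba_ssm import Mamba  # assumed implementation, stands in for the real block

class HybridBlock(nn.Module):
    """(a) Mamba wraps the attention+MLP result instead of being added in parallel."""
    def __init__(self, dim: int, heads: int, use_mamba: bool):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.mamba = Mamba(d_model=dim) if use_mamba else nn.Identity()

    def forward(self, x):
        # old: x + attn(x) + mlp(x) + mamba(x)   -- parallel addition
        # new: mamba(x + attn(x) + mlp(x))       -- MAMBA(MLP+ATTN)
        x = x + self.attn(x, x, x, need_weights=False)[0]
        x = x + self.mlp(x)
        return self.mamba(x)

# (b) Mamba in every second layer, plain blocks in between.
stack = nn.ModuleList(
    HybridBlock(dim=512, heads=8, use_mamba=(i % 2 == 0)) for i in range(12)
)
```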
Good news: it works. After the 1st epoch, my loss is lower than it has ever been.
Bad news: it took 9 hours, and I haven't even activated recurrence in Mamba yet. Since it doesn't support chunkwise processing, I'll have to run the slow sequential O(N) recurrence instead of the faster parallel O(N log N) scan.
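Why that's slow: true recurrence is a strictly sequential loop over time, while the parallel mode computes the same outputs with a scan. Here is a toy sketch of that trade-off on a scalar linear recurrence; this is just the shape of the argument, not the actual Mamba kernel.

```python
import torch

def recurrence_sequential(a, b):
    # h_t = a_t * h_{t-1} + b_t, step by step: O(N) work, but N sequential steps.
    h = torch.zeros_like(b[0])
    out = []
    for t in range(len(b)):
        h = a[t] * h + b[t]
        out.append(h)
    return torch.stack(out)

def recurrence_parallel(a, b):
    # Same recurrence as a Hillis-Steele doubling scan:
    # O(N log N) total work, but only O(log N) sequential steps.
    # Affine maps compose as (a2, b2) after (a1, b1) -> (a2*a1, a2*b1 + b2).
    a, b = a.clone(), b.clone()
    shift = 1
    while shift < len(b):
        b[shift:] = a[shift:] * b[:-shift] + b[shift:]
        a[shift:] = a[shift:] * a[:-shift]
        shift *= 2
    return b

a, b = torch.rand(1024), torch.randn(1024)
assert torch.allclose(recurrence_sequential(a, b), recurrence_parallel(a, b), atol=1e-5)
```

On a GPU, the O(log N) steps of the scan each run fully in parallel, which is exactly what goes away once real recurrence is switched on.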
However, the results are really good. Like, astonishingly good.
Tomorrow I am going to halve the number of Mamba layers and train from the very morning.
Next: activate recurrence and use Mamba to see the whole sequence.
For now, chill and out!