Hello, fairy dairy diary~.
Too busy for Cirno Pic once again.
So here is the validation table.
For the next Mamba experiment I
(a) moved away from the parallel addition of Mamba; the model now works as MAMBA(MLP+ATTN), with the Mamba block applied on top of the MLP+attention output instead of added alongside it;
(b) inserted Mamba into every second layer, since I had enough memory to do it.
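To make (a) and (b) concrete, here's a minimal PyTorch sketch of the layout. This is not my actual training code: the `Mamba` block is assumed to come from the `mamba_ssm` package, and norms, masking, and dropout are omitted.

```python
import torch.nn as nn
from mamba_ssm import Mamba  # assumed implementation, stands in for the real block

class HybridBlock(nn.Module):
    """(a) Mamba wraps the attention+MLP result instead of being added in parallel."""
    def __init__(self, dim: int, heads: int, use_mamba: bool):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.mamba = Mamba(d_model=dim) if use_mamba else nn.Identity()

    def forward(self, x):
        # old: x + attn(x) + mlp(x) + mamba(x)   -- parallel addition
        # new: mamba(x + attn(x) + mlp(x))       -- MAMBA(MLP+ATTN)
        x = x + self.attn(x, x, x, need_weights=False)[0]
        x = x + self.mlp(x)
        return self.mamba(x)

# (b) Mamba in every second layer, plain blocks in between.
stack = nn.ModuleList(
    HybridBlock(dim=512, heads=8, use_mamba=(i % 2 == 0)) for i in range(12)
)
```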
Good news: it works. After the 1st epoch, my loss is lower than it has ever been.
Bad news: it took 9 hours, and I haven't even activated recurrence in Mamba yet. Since it doesn't support chunkwise processing, I'll have to run the slow sequential O(N) recurrence instead of the faster parallel O(N log N) scan.
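Why that's slow: true recurrence is a strictly sequential loop over time, while the parallel mode computes the same outputs with a scan. Here is a toy sketch of that trade-off on a scalar linear recurrence; this is just the shape of the argument, not the actual Mamba kernel.

```python
import torch

def recurrence_sequential(a, b):
    # h_t = a_t * h_{t-1} + b_t, step by step: O(N) work, but N sequential steps.
    h = torch.zeros_like(b[0])
    out = []
    for t in range(len(b)):
        h = a[t] * h + b[t]
        out.append(h)
    return torch.stack(out)

def recurrence_parallel(a, b):
    # Same recurrence as a Hillis-Steele doubling scan:
    # O(N log N) total work, but only O(log N) sequential steps.
    # Affine maps compose as (a2, b2) after (a1, b1) -> (a2*a1, a2*b1 + b2).
    a, b = a.clone(), b.clone()
    shift = 1
    while shift < len(b):
        b[shift:] = a[shift:] * b[:-shift] + b[shift:]
        a[shift:] = a[shift:] * a[:-shift]
        shift *= 2
    return b

a, b = torch.rand(1024), torch.randn(1024)
assert torch.allclose(recurrence_sequential(a, b), recurrence_parallel(a, b), atol=1e-5)
```

On a GPU, the O(log N) steps of the scan each run fully in parallel, which is exactly what goes away once real recurrence is switched on.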
However, the results are really good. Like, astonishingly good.
Tomorrow I am going to halve the number of Mamba layers and train from the very morning.
Next: activate recurrence and use Mamba to see the whole sequence.
For now, chill and out!