Hello, fairy dairy diary!
Got Mini Dall-e working!
But besides that, I changed RMT again a little.
Previously, when a new batch started, RMT memory was only appended at the end.
That made dynamic batches quite complex: in many places I had to split batches into parts that had RMT on the left and parts that didn't. So I decided to simplify the process and prepend something even for brand-new data. I tried two approaches. First: zeros, as if every position of RMT-READ were 0.0.
Then I tried prepending the RMT tokens themselves, i.e. starting read-mem as a copy of the learnable write-mem. That worked better, and now we have a new E3 record: 3.34 loss. Not bad. Not bad at all.
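For future me, here is a minimal sketch of those two initialisations in PyTorch (which I'm assuming here; names like `MemInit`, `n_mem`, `d_model`, and `wrap` are made up for the example and are not from the real training code):

```python
import torch
import torch.nn as nn
from typing import Optional

class MemInit(nn.Module):
    """Sketch of RMT read-memory init for a brand-new sequence (hypothetical names)."""

    def __init__(self, n_mem: int, d_model: int, mode: str = "copy"):
        super().__init__()
        # Learnable write-memory embeddings, appended after the input tokens.
        self.write_mem = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.mode = mode  # "zero" or "copy"

    def initial_read_mem(self, batch_size: int) -> torch.Tensor:
        if self.mode == "zero":
            # First approach: as if RMT-READ had 0.0 in every position.
            return self.write_mem.new_zeros(batch_size, *self.write_mem.shape)
        # Second approach (the E3-record one): read-mem starts as a
        # copy of the learnable write-mem embeddings.
        return self.write_mem.unsqueeze(0).expand(batch_size, -1, -1)

    def wrap(self, x: torch.Tensor, read_mem: Optional[torch.Tensor]) -> torch.Tensor:
        # Prepend read memory, append write memory: [READ | tokens | WRITE].
        if read_mem is None:  # new sequence, nothing carried over yet
            read_mem = self.initial_read_mem(x.size(0))
        write = self.write_mem.unsqueeze(0).expand(x.size(0), -1, -1)
        return torch.cat([read_mem, x, write], dim=1)
```

With either mode, every segment now looks the same, `[READ | tokens | WRITE]`, so the dynamic batching code no longer has to special-case sequences without memory on the left.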
model_type | loss | ppl | note |
---|---|---|---|
baka_mamba | 3.3458 | 28.3842 | after 3 epochs per 8-batch; RMT has initial copy of write-mem |
baka_mamba | 3.4515 | 31.5496 | after 2 epochs per 8-batch; RMT has initial copy of write-mem |
baka_mamba | 3.7411 | 42.1462 | after 1 epoch per 8-batch; RMT has initial copy of write-mem |
baka_mamba | 3.7830 | 43.9508 | after 1 epoch per 8-batch; RMT has initial zero |
baka_mamba | 3.3656 | 28.9515 | after 3 epochs per 8-batch (3h12m) |
baka_mamba | 3.4757 | 32.3230 | after 2 epochs per 8-batch (2h56m) |
baka_mamba | 3.7726 | 43.4954 | after 1 epoch per 8-batch (3h8m) |