Maykeye

BakaLLM 17.6 More epochs! More evaluations! Cerebras slashed and smashed

Hello fairy dairy diary!

For the first time, BakaLLM outperformed a model with more parameters!

Instead of a Cirno pic, here's the graph:

Validation graph

| model_type | n_params | loss | ppl | note |
|---|---|---|---|---|
| gpt2 | 590310912 | 2.97026801109314 | 19.4971443468238 | Cerebras-590M |
| gpt_neox | 405334016 | 2.66211414337158 | 14.3265454754034 | Pythia-410M |
| gpt_neox | 162322944 | 3.06171250343323 | 21.3641119680711 | Pythia-160M |
| baka_mamba | 182929408 | 3.19427084922791 | 24.3923814826904 | after 9 epoch ber 8batch (3h19m) |
| baka_mamba | 182929408 | 3.20950531959534 | 24.7668315275161 | after 8 epoch ber 8batch (3h24m) |
| baka_mamba | 182929408 | 3.21783852577209 | 24.9740809678157 | after 7 epoch ber 8batch (3h21m) |
| baka_mamba | 182929408 | 3.23828125 | 25.4898733561039 | after 6 epoch ber 8batch (3h33m) |
| baka_mamba | 182929408 | 3.26393222808838 | 26.152171522808 | after 5 epoch ber 8batch (3h08m) |
| gpt2 | 255977024 | 3.2076301574707 | 24.7204332188596 | Cerebras-GPT-256M |
| baka_mamba | 182929408 | 3.30104160308838 | 27.1408942419521 | after 4 epoch ber 8batch (3h08m) |
| baka_mamba | 182929408 | 3.36562490463257 | 28.9515836260749 | after 3 epoch ber 8batch (3h12m) |
| baka_mamba | 182929408 | 3.47578120231628 | 32.3230695329532 | after 2 epoch ber 8batch (2h56m) |
| baka_mamba | 182929408 | 3.77265620231628 | 43.4954442322217 | after 1 epoch ber 8batch (3h8m) |
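
For anyone who wants to double-check the table, here is a minimal sketch. It assumes the `ppl` column is simply `exp(loss)` with loss in nats (the numbers in the table appear consistent with that, but the evaluation script itself is not shown here). The names `rows`, `perplexity` and `beaten_bigger` are made up for this illustration.

```python
import math

# (name, n_params, validation loss) copied from the table above
rows = [
    ("Cerebras-590M", 590310912, 2.97026801109314),
    ("Pythia-410M", 405334016, 2.66211414337158),
    ("Pythia-160M", 162322944, 3.06171250343323),
    ("BakaLLM after 9 epochs", 182929408, 3.19427084922791),
    ("Cerebras-GPT-256M", 255977024, 3.2076301574707),
]

def perplexity(loss: float) -> float:
    # Assumption: loss is mean cross-entropy in nats, so ppl = e^loss
    return math.exp(loss)

baka = next(r for r in rows if r[0].startswith("BakaLLM"))

# Models that have more parameters than BakaLLM but a worse (higher) loss
beaten_bigger = [name for name, n, loss in rows if n > baka[1] and loss > baka[2]]

for name, n, loss in rows:
    print(f"{name:24s} {n:>10d} loss={loss:.4f} ppl={perplexity(loss):.4f}")
print("Bigger models beaten:", beaten_bigger)
```

Under that assumption, the only bigger model in the list with a higher loss than the 9-epoch BakaLLM is Cerebras-GPT-256M.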

Victory is in sight! 256M destroyed! Slammed! Torn to pieces! We will not talk about Pythia-160M, or that it took 9 epochs, or that Cerebras is a general-purpose model rather than one trained on wiki103 alone. Insignificant details will not stand in the way of such a grandiose victory!

One by one, the models are falling! The world is doomed! Yes!!

Back to the drawing board.

Chilling out!
