Hello fairy dairy diary!
For the first time, BakaLLM outperformed a model with more parameters!
Instead of a Cirno pic, here's the results table:
| model_type | n_params | loss | ppl | note |
|---|---|---|---|---|
| gpt2 | 590310912 | 2.9703 | 19.4971 | Cerebras-590M |
| gpt_neox | 405334016 | 2.6621 | 14.3265 | Pythia-410M |
| gpt_neox | 162322944 | 3.0617 | 21.3641 | Pythia-160M |
| baka_mamba | 182929408 | 3.1943 | 24.3924 | after epoch 9, batch size 8 (3h19m) |
| baka_mamba | 182929408 | 3.2095 | 24.7668 | after epoch 8, batch size 8 (3h24m) |
| baka_mamba | 182929408 | 3.2178 | 24.9741 | after epoch 7, batch size 8 (3h21m) |
| baka_mamba | 182929408 | 3.2383 | 25.4899 | after epoch 6, batch size 8 (3h33m) |
| baka_mamba | 182929408 | 3.2639 | 26.1522 | after epoch 5, batch size 8 (3h08m) |
| gpt2 | 255977024 | 3.2076 | 24.7204 | Cerebras-GPT-256M |
| baka_mamba | 182929408 | 3.3010 | 27.1409 | after epoch 4, batch size 8 (3h08m) |
| baka_mamba | 182929408 | 3.3656 | 28.9516 | after epoch 3, batch size 8 (3h12m) |
| baka_mamba | 182929408 | 3.4758 | 32.3231 | after epoch 2, batch size 8 (2h56m) |
| baka_mamba | 182929408 | 3.7727 | 43.4954 | after epoch 1, batch size 8 (3h08m) |
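Side note on the columns: `ppl` is just `exp(loss)`, i.e. `loss` is the mean per-token cross-entropy in nats. A minimal sanity-check sketch (plain Python, not BakaLLM code; the values are copied from the table above):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity from mean per-token negative log-likelihood in nats."""
    return math.exp(mean_nll)

# Sanity checks against the table:
print(perplexity(2.9703))  # ~19.50 -> Cerebras-590M
print(perplexity(3.1943))  # ~24.39 -> baka_mamba after epoch 9
```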
The victory is in sight! The 256M model is destroyed! Slammed! Torn to pieces! We will not talk about the 160M Pythia, or that it took 9 epochs to get there, or that Pythia is a general-purpose model rather than one tuned on wiki103 alone. Insignificant details will not stand in the way of such a grandiose victory!
One by one, the models are falling! The world is doomed! Yes!!
Back to the drawing board.
Chilling out!