Maykeye

Baka4LLM. New horizons

Hello fairy dairy diary~!

Shelved the causal LLM for a while and moved to a new horizon...

Cieno wears a mask

...a Masked LLM...

The reasoning is simple: the hypothesis is that it's easier to train an acceptable MLM on 16GB of VRAM, which can then be chained to itself to generate text (see the sketch below).
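To make "chained to itself" concrete, here is a minimal sketch of one way such a loop could work, assuming an off-the-shelf `bert-base-uncased` as a stand-in (my own model isn't trained yet, and the actual decoding scheme is undecided): append a `[MASK]`, let the MLM fill it, feed the result back in, repeat.

```python
# Hypothetical sketch: left-to-right generation by chaining an MLM to itself.
# bert-base-uncased is a placeholder model, not the Baka4LLM one.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

text = "the masked model writes"
ids = tok(text, return_tensors="pt")["input_ids"][0, :-1]  # keep [CLS], drop [SEP]

for _ in range(8):  # generate 8 new tokens, one full pass each
    # append [MASK] + [SEP] so the model sees a well-formed sequence
    batch = torch.cat([ids, torch.tensor([tok.mask_token_id, tok.sep_token_id])])
    with torch.no_grad():
        logits = model(batch.unsqueeze(0)).logits[0]
    mask_pos = len(ids)                  # position of the [MASK] we just added
    next_id = logits[mask_pos].argmax()  # greedy pick; sampling would work too
    ids = torch.cat([ids, next_id.unsqueeze(0)])

print(tok.decode(ids, skip_special_tokens=True))
```

Greedy filling is the naive version; whether one [MASK] per pass or block-wise refilling works better is exactly the kind of thing the pretraining run should answer.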

For now the points of reference are:

https://github.com/samvher/bert-for-laptops/blob/main/BERT_for_laptops.ipynb

https://arxiv.org/abs/2212.14034

There was also a paper that explored whether it's possible to train a decoder from an encoder-only model; IIRC they found it can be done, but I can't find it now and I might be hallucinating it, so I'll look for it later.

For now, some pretraining is in order; then I'll decide what to do with it and how to update it!

Chill!
