Hello fairy dairy diary~!
Shelved the causal LLM for a while and moved on to a new horizon...
...masked language modeling!
The reasoning is simple: the hypothesis is that it's easier to train an acceptable MLM on 16GB of VRAM, and that the model can then be chained to itself to generate text, as sketched below.
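To make "chained to itself" concrete, here's a minimal sketch of one way it could work: append a run of [MASK] tokens, then fill them one at a time, feeding each prediction back in. This is just an illustration, not the plan's final decoding scheme; the checkpoint name is a placeholder, not a model from this post.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "bert-base-uncased"  # placeholder checkpoint for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

prompt = "The reasoning is simple:"
n_new = 8  # how many tokens to generate

ids = tok(prompt, return_tensors="pt").input_ids
# insert n_new [MASK] tokens just before the trailing [SEP]
masks = torch.full((1, n_new), tok.mask_token_id)
ids = torch.cat([ids[:, :-1], masks, ids[:, -1:]], dim=1)

with torch.no_grad():
    for _ in range(n_new):
        # find the first remaining [MASK], predict it, write it back
        pos = (ids[0] == tok.mask_token_id).nonzero()[0].item()
        logits = model(input_ids=ids).logits
        ids[0, pos] = logits[0, pos].argmax()

print(tok.decode(ids[0], skip_special_tokens=True))
```

Greedy left-to-right filling is the simplest variant; mask-predict-style approaches instead refill the lowest-confidence positions over several passes.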
For now, the points of reference are:
https://github.com/samvher/bert-for-laptops/blob/main/BERT_for_laptops.ipynb
https://arxiv.org/abs/2212.14034 (the "Cramming" paper)
And there was a paper that explored whether a decoder can be trained from an encoder-only model; IIRC they found that it can, but I can't find it now and might be hallucinating it. I'll look for it later.
For now, some pretraining is in order; then I'll decide what to do with the model and how to update it!
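For reference, "pretraining" here means the standard masked-token objective. A minimal sketch, assuming BERT-style 15% dynamic masking via the Hugging Face collator (the checkpoint name is again just a placeholder):

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# dynamic masking: each batch gets a fresh random 15% of prediction targets
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15)
texts = ["a small batch of text for the masked language modeling objective",
         "another short line so the collator has something to mask"]
batch = collator([tok(t) for t in texts])

# labels are -100 everywhere except masked positions, so the
# cross-entropy loss only covers the tokens the model must recover
loss = model(**batch).loss
loss.backward()
print(float(loss))
```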
Chill!