


Baka4LLM. New horizons

Hello fairy dairy diary~!

I've shelved the causal LLM for a while and moved on to a new horizon...

Cieno wears a mask

...Masked LLM..

The reasoning is simple: my hypothesis is that it's easier to train an acceptable MLM on 16GB of VRAM, and that MLM can then be chained to itself.
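For anyone who hasn't met the MLM objective before, here is a minimal sketch of how one training example is usually prepared. It uses the common BERT-style recipe (pick about 15% of positions; of those, 80% become the mask token, 10% a random token, 10% stay unchanged), which is an assumption and not necessarily what this project ends up using. `MASK_ID`, `IGNORE`, and `mask_tokens` are hypothetical names made up for the illustration.

```python
import random

MASK_ID = 0     # hypothetical id of the [MASK] token (assumption)
IGNORE = -100   # label meaning "no loss here"; matches PyTorch's default ignore_index


def mask_tokens(token_ids, vocab_size, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for one masked-LM training example."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                  # position not selected: untouched, no loss
        labels[i] = tok               # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID       # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(1, vocab_size)  # 10%: random non-mask token
        # remaining 10%: the input keeps the original token
    return inputs, labels


if __name__ == "__main__":
    tokens = [5, 17, 42, 8, 23, 11, 99, 64]
    print(mask_tokens(tokens, vocab_size=100, rng=random.Random(0)))
```

The loss is computed only where `labels` is not `IGNORE`, so the model sees the whole sequence with bidirectional context but is graded only on the selected positions.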

For now points of references are

There was also a paper that explored whether a decoder can be trained from an encoder-only model. IIRC they found that it can, but I can't find the paper and I might be hallucinating it; I'll look for it later.

For now, some pretraining lies ahead; after that I'll decide what to do with the model and how to update it!

