Hello, fairy dairy diary ~ fairy dairy diary!
While LLama* gave better PPL, it also gave a bigger headache during attempts to copy it: since the layer count increases, you need to do something with the new additional weights. I tried a couple of ideas, and they were all bad. I didn't try copying in an ABC->AABBCC pattern, as I want to do proper ABC->ABCABC growth. So I decided to try copying mainline step 5: RMT.
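For reference, here is a tiny sketch of the two growth orderings mentioned above. It only shows the order in which copied layers end up, not what to do with the new weights (which is the actual hard part). The function names `grow_interleaved` and `grow_stacked` are made up for illustration, and plain strings stand in for real layer objects.

```python
from copy import deepcopy


def grow_interleaved(layers, factor=2):
    """ABC -> AABBCC: every layer is immediately followed by its own copies."""
    return [deepcopy(layer) for layer in layers for _ in range(factor)]


def grow_stacked(layers, factor=2):
    """ABC -> ABCABC: the whole stack is repeated end to end."""
    return [deepcopy(layer) for _ in range(factor) for layer in layers]


# Strings stand in for layers here; real layers would carry weights,
# and deepcopy would give each copy its own independent set of them.
layers = ["A", "B", "C"]
print("".join(grow_interleaved(layers)))  # AABBCC
print("".join(grow_stacked(layers)))      # ABCABC
```

Both variants double the depth; the only difference is whether a copy sits right next to its source layer or a full stack away from it.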
Chilling out continues. There should be a training graph here, but I can't be bothered, as it was bad anyway.