Hello fairy dairy diary!
I finally moved away from relying on wandb alone (because of the jumping graphs), and now I'm storing losses right into an SQLite database as well. This feels so good. No longer do I have to deal with a jumping loss graph. To show why I don't find the wandb graphs reliable, compare these two charts.
This is a chart from WANDB.
rmt-x-1 is the current training session, the one happening right now. It's basically cross attention with a bunch of RMT from the past sequences. According to wandb it's a breakthrough! Loss is so much better! Optimism through the roof, let's dance in the snow!
But, but, but, here's a result from sqlite.
The losses of each run are split into buckets of 100 entries each (the last bucket gets more than 100 if the division is not exact), and then three graphs are shown: the min of each bucket, the max of each bucket, and the average.
As you can see, the runs are pretty much the same; it seems the RMT cross attention learned to zero itself out for the most part.
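For the curious, here is a minimal sketch of the bucketing described above. It is not my exact script: the table name `losses`, its columns, and the helper names `open_db`, `log_loss`, and `bucket_stats` are all made up for illustration. It only assumes each run's losses are stored one row per step.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS losses ("
        "run TEXT NOT NULL, step INTEGER NOT NULL, loss REAL NOT NULL, "
        "PRIMARY KEY (run, step))"
    )
    return conn


def log_loss(conn, run, step, loss):
    """Store one loss value; `with conn` commits the insert."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO losses (run, step, loss) VALUES (?, ?, ?)",
            (run, step, float(loss)),
        )


def bucket_stats(conn, run, bucket_size=100):
    """Return a list of (min, max, mean) tuples, one per bucket.

    Every bucket holds `bucket_size` entries except the last one,
    which also absorbs the remainder when the division is not exact.
    """
    rows = conn.execute(
        "SELECT loss FROM losses WHERE run = ? ORDER BY step", (run,)
    ).fetchall()
    losses = [row[0] for row in rows]
    if not losses:
        return []

    # At least one bucket, even if there are fewer than bucket_size entries.
    n_buckets = max(len(losses) // bucket_size, 1)
    stats = []
    for i in range(n_buckets):
        start = i * bucket_size
        # The last bucket runs to the end of the list.
        end = len(losses) if i == n_buckets - 1 else start + bucket_size
        chunk = losses[start:end]
        stats.append((min(chunk), max(chunk), sum(chunk) / len(chunk)))
    return stats
```

Plotting the three columns of `bucket_stats(conn, "rmt-x-1")` against the bucket index gives the min, max, and average curves. Since every stored point lands in exactly one bucket, nothing is skipped by sampling.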
Well, at least now I can stop training earlier. With wandb I have to zoom in so it doesn't sample too randomly:
It's the same graph from wandb, only zoomed in. As you can see, no breakthrough. Bonkers!
Now I'm back to my think tank, planning to fix the mess. Chill!