Does DINO loss compare the [CLS] tokens from both teacher and student?

#machinelearning #deeplearning #computervision

Yes, exactly.

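In DINO and DINOv2, the DINO loss is computed from the [CLS] tokens of the teacher and student models, after each [CLS] token has passed through its network's projection head. (DINOv2 adds a separate patch-level loss on top of this, but the DINO term itself is image-level and uses only [CLS].)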

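The teacher's [CLS] output is first centered (a running mean is subtracted) and then sharpened by a softmax with a low temperature.
The student's [CLS] output goes through a softmax with its own temperature, and the student is trained to match the teacher's distribution using cross-entropy loss. No gradient flows through the teacher.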
Each view of the same image produces one [CLS] embedding, and the goal is to make the student’s [CLS] output for one view match the teacher’s [CLS] output for a different view.
So, the comparison is always between [CLS] tokens, across different augmentations of the same image.
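
To make the steps above concrete, here is a minimal sketch of the loss in plain Python. It is an illustration under stated assumptions, not the reference implementation: the function names (`log_softmax`, `dino_loss`) are hypothetical, the temperatures are typical values rather than required ones, the inputs are assumed to be the projection-head outputs for each view's [CLS] token, and the center is passed in as a fixed vector (in training it is a running average of teacher outputs).

```python
import math

def log_softmax(logits, temperature):
    """Log of a temperature-scaled softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]

def dino_loss(student_views, teacher_views, center,
              student_temp=0.1, teacher_temp=0.04):
    """Mean cross-entropy between teacher and student [CLS] distributions.

    Assumption of this sketch: teacher_views[i] and student_views[i] come
    from the same crop, and student_views may contain extra crops after those.
    """
    total, pairs = 0.0, 0
    for t_idx, t_logits in enumerate(teacher_views):
        # Teacher: subtract the center, then sharpen with a low temperature.
        centered = [t - c for t, c in zip(t_logits, center)]
        target = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
        for s_idx, s_logits in enumerate(student_views):
            if s_idx == t_idx:
                continue  # only compare different views of the image
            log_probs = log_softmax(s_logits, student_temp)
            total += -sum(p * lp for p, lp in zip(target, log_probs))
            pairs += 1
    return total / pairs

# Toy example: two views, a 3-dimensional head output per [CLS] token.
teacher = [[2.0, 0.0, -1.0], [1.5, 0.2, -0.5]]
student = [[1.8, 0.1, -0.9], [1.6, 0.0, -0.4]]
center = [0.0, 0.0, 0.0]
loss = dino_loss(student, teacher, center)
```

The loss is small when the student's distribution for one view agrees with the teacher's sharpened distribution for the other view, and grows when they disagree. In a real training loop the teacher targets would be treated as constants, so only the student receives gradients.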

