First author: Hongcheng Gao, Shanghai Jiao Tong University
ビデオ理解のhallucinationに関する論文
This paper categorizes the hallucination in the video understanding task into three types.
Conflict with prior knowledge
The video shows a situation that contradicts common prior knowledge.
The paper's example is a cat and a mouse getting along. Because this scene is unusual, it can lead the model to hallucinate.
In-context conflict
The question and its answer options contradict each other.
No valid answer can be derived from the given material.
In other words, these are unanswerable questions.
Capability deficiency
Numerical tasks
Experiments
Supervised reasoning fine-tuning
The proposed method builds a better fine-tuning dataset by generating chain-of-thought (CoT) reasoning along with the question-answer pairs for each video.
In short, it simply uses long CoT responses to build the fine-tuning dataset.
SRFT stands for supervised reasoning fine-tuning.
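As a rough illustration of what an SRFT training example could look like, the sketch below formats one video question-answer item so that the long CoT reasoning comes before the final answer in the target text. Everything here is an assumption, not the paper's actual data format: the `VideoQA` fields, the `<video:...>` placeholder, the `<think>` tags, and the `to_srft_example` helper are all hypothetical, and the sample item is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoQA:
    # Hypothetical record; field names are illustrative, not the paper's schema.
    video_id: str
    question: str
    options: List[str]
    reasoning_steps: List[str]  # long CoT, assumed to come from a teacher model
    answer: str


def to_srft_example(qa: VideoQA) -> dict:
    """Turn one QA item into a prompt/target pair for supervised fine-tuning.

    The target places the step-by-step reasoning before the final answer,
    so the model learns to reason first and answer second.
    """
    letters = "ABCDEFGHIJ"
    option_lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(qa.options)]
    # The video itself is represented by a placeholder token (assumption).
    prompt = f"<video:{qa.video_id}>\n{qa.question}\n" + "\n".join(option_lines)
    reasoning = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(qa.reasoning_steps))
    target = f"<think>\n{reasoning}\n</think>\nAnswer: {qa.answer}"
    return {"prompt": prompt, "target": target}


if __name__ == "__main__":
    # Made-up numerical-task item for demonstration only.
    item = VideoQA(
        video_id="clip_001",
        question="How many cups are on the table at the end of the video?",
        options=["2", "3", "4"],
        reasoning_steps=[
            "At the start there are two cups on the table.",
            "A person places one more cup on the table.",
            "No cups are removed afterwards, so there are three.",
        ],
        answer="B",
    )
    ex = to_srft_example(item)
    print(ex["prompt"])
    print(ex["target"])
```

The point of the layout is only that the supervised target contains the reasoning, not just the answer letter; the exact tags and prompt template would depend on the model being fine-tuned.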
Thinking-based DPO
Compares pairs made of an LMM-SRFT response and the same text after direct correction by humans.
The method assigns greater weight to the reasoning steps that were corrected.
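For reference, standard DPO scores a preferred response y_w against a rejected response y_l with the loss -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). These notes don't record the paper's exact weighting, so the sketch below assumes one simple scheme: each token's log-ratio log π_θ - log π_ref is multiplied by a weight, and tokens inside human-corrected reasoning steps get a larger weight `alpha`. The helper names (`weighted_dpo_loss`, `step_weights`) and the values of `alpha` and `beta` are hypothetical; the per-token log-probabilities would come from the policy and a frozen reference model.

```python
import math
from typing import List, Optional


def weighted_logratio(policy_logps: List[float], ref_logps: List[float],
                      weights: Optional[List[float]] = None) -> float:
    """Sum over tokens of w_t * (log pi_theta(y_t) - log pi_ref(y_t))."""
    if weights is None:
        weights = [1.0] * len(policy_logps)
    if not (len(policy_logps) == len(ref_logps) == len(weights)):
        raise ValueError("per-token lists must have the same length")
    return sum(w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_dpo_loss(chosen_policy: List[float], chosen_ref: List[float],
                      rejected_policy: List[float], rejected_ref: List[float],
                      chosen_weights: Optional[List[float]] = None,
                      rejected_weights: Optional[List[float]] = None,
                      beta: float = 0.1) -> float:
    """DPO loss where per-token log-ratios are scaled by (assumed) weights."""
    margin = (weighted_logratio(chosen_policy, chosen_ref, chosen_weights)
              - weighted_logratio(rejected_policy, rejected_ref, rejected_weights))
    return -log_sigmoid(beta * margin)


def step_weights(step_lengths: List[int], corrected: List[bool],
                 alpha: float = 2.0) -> List[float]:
    """Expand per-step 'was corrected' flags into per-token weights."""
    weights: List[float] = []
    for n_tokens, is_corrected in zip(step_lengths, corrected):
        weights.extend([alpha if is_corrected else 1.0] * n_tokens)
    return weights


if __name__ == "__main__":
    # Toy numbers: two reasoning steps of one token each; step 2 was corrected.
    w = step_weights([1, 1], [False, True], alpha=2.0)
    plain = weighted_dpo_loss([-1.0, -0.5], [-1.0, -1.0], [-1.0], [-1.0])
    weighted = weighted_dpo_loss([-1.0, -0.5], [-1.0, -1.0], [-1.0], [-1.0],
                                 chosen_weights=w)
    print(plain, weighted)
```

With these toy numbers the policy already prefers the corrected step, so upweighting it enlarges the margin and lowers the loss; during training the same weighting makes gradients from corrected steps count more. Whether the paper also reweights the rejected response is not stated here, so rejected tokens default to weight 1.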
Conclusion
So it comes down to CoT here too...