Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

#ai #computervision #discuss

A paper that uses CLIPScore to detect hallucinations.
The first author is Ailin Deng, from Bryan Hooi's group at NUS.

The experiments show that sentences generated later in a response are more likely to contain hallucinations.

CLIP-Guided Decoding
Reliability scoring

F(c) := (1-\alpha) f_\theta(c) + \alpha \frac{1}{t} \sum_{i=1}^{t} f_\phi(x_{img}, s_i)

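The CLIP guidance term is

\frac{1}{t} \sum_{i=1}^{t} f_\phi(x_{img}, s_i)

that is, the average of f_\phi(x_{img}, s_i) over the t sentences s_1, ..., s_t, each scored against the image x_{img}.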

The weight \alpha adjusts the effect of CLIP guidance: \alpha = 0 leaves only f_\theta(c), and \alpha = 1 leaves only the CLIP guidance term.
This serves as the hallucination score.
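
A minimal sketch of the scoring formula above, assuming f_\theta(c) and the per-sentence values f_\phi(x_{img}, s_i) have already been computed elsewhere. The function names, the step of picking the candidate with the highest F(c), and all numbers are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def reliability_score(likelihood: float, clip_scores: Sequence[float], alpha: float) -> float:
    """F(c) = (1 - alpha) * f_theta(c) + alpha * (1/t) * sum_i f_phi(x_img, s_i)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not clip_scores:
        raise ValueError("need at least one per-sentence CLIP score")
    # CLIP guidance: average image-sentence score over the t sentences.
    clip_guidance = sum(clip_scores) / len(clip_scores)
    return (1.0 - alpha) * likelihood + alpha * clip_guidance


def pick_candidate(candidates: List[Tuple[float, Sequence[float]]], alpha: float) -> int:
    """Return the index of the candidate with the highest F(c) (assumed selection rule)."""
    scores = [reliability_score(lik, clips, alpha) for lik, clips in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical candidates: (f_theta(c), [f_phi(x_img, s_1), f_phi(x_img, s_2)])
candidates = [
    (-0.2, [0.30, 0.10]),  # higher likelihood, but the second sentence matches the image poorly
    (-0.5, [0.30, 0.28]),  # lower likelihood, both sentences match the image
]

print(pick_candidate(candidates, alpha=0.0))  # likelihood only
print(pick_candidate(candidates, alpha=1.0))  # CLIP guidance only
```

With these made-up numbers, \alpha = 0 prefers the first candidate and \alpha = 1 prefers the second, which is the trade-off that \alpha controls.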

Conclusion
CLIP score (reliability scoring) detects hallucinations better than token likelihood.
