
Takara Taniguchi


Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

This paper uses CLIPScore to detect hallucinations.
Ailin Deng is the first author; the work comes from Bryan Hooi's group at NUS.

The experiments show that sentences generated later in the output are more likely to contain hallucinations.

CLIP-Guided Decoding
Reliability scoring

$$F(c) := (1-\alpha) f_\theta(c) + \alpha \frac{1}{t} \sum_{i=1}^{t} f_\phi(x_{img}, s_i)$$

The CLIP guidance is the sum of CLIP scores between the image and each of the t sentences:
$$\sum_{i=1}^{t} f_\phi(x_{img}, s_i)$$

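The weight \alpha controls how strongly the CLIP guidance affects the score: with \alpha = 0 only f_\theta(c) remains, and with \alpha = 1 only the CLIP term remains.
This F(c) serves as the hallucination score.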
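
To make the scoring concrete, here is a minimal Python sketch of F(c). It is not the authors' implementation. It assumes that f_\theta(c) and the per-sentence CLIP scores f_\phi(x_{img}, s_i) have already been computed and are passed in as plain numbers. The function names `reliability_score` and `pick_best`, the dictionary layout of a candidate, the check that \alpha lies in [0, 1], and the example numbers are all made up for illustration.

```python
from typing import Dict, List, Sequence


def reliability_score(likelihood: float,
                      clip_scores: Sequence[float],
                      alpha: float) -> float:
    """F(c) = (1 - alpha) * f_theta(c) + alpha * (1/t) * sum_i f_phi(x_img, s_i).

    likelihood  : f_theta(c), the model's own score for candidate c (precomputed)
    clip_scores : f_phi(x_img, s_i) for each of the t sentences in c (precomputed)
    alpha       : weight of the CLIP guidance (assumed here to lie in [0, 1])
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to be in [0, 1]")
    if len(clip_scores) == 0:
        raise ValueError("need at least one sentence-level CLIP score")
    # Average CLIP score over the t sentences: the CLIP guidance term.
    clip_guidance = sum(clip_scores) / len(clip_scores)
    return (1.0 - alpha) * likelihood + alpha * clip_guidance


def pick_best(candidates: List[Dict], alpha: float) -> Dict:
    """Return the candidate with the highest F(c) (hypothetical helper)."""
    return max(
        candidates,
        key=lambda c: reliability_score(c["likelihood"], c["clip_scores"], alpha),
    )


# Made-up numbers: A is preferred by the model, B is better grounded in the image.
candidates = [
    {"name": "A", "likelihood": 0.9, "clip_scores": [0.2, 0.1]},
    {"name": "B", "likelihood": 0.7, "clip_scores": [0.3, 0.3]},
]

print(pick_best(candidates, alpha=0.0)["name"])  # likelihood only
print(pick_best(candidates, alpha=0.8)["name"])  # mostly CLIP guidance
```

With these numbers, \alpha = 0 selects A on likelihood alone, while \alpha = 0.8 selects B because its higher average CLIP score outweighs its lower likelihood.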

Conclusion
CLIP score (reliability scoring) is a better approach to detecting hallucinations than token likelihood.
