Takara Taniguchi

[memo] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Research from DAMO.

Introduction

Visual contrastive decoding

VCD effectively reduces over-reliance on statistical biases.

Contribution

  • In-depth analysis from the perspective of unimodal priors and statistical bias.
  • Visual contrastive decoding (VCD)
  • Demonstrate the efficacy of the proposed VCD in alleviating object hallucinations.

Related works

Method

Decoding of VLMs

Object hallucinations often emerge when visually irrelevant tokens are generated.

Visual uncertainty amplifies hallucinations: when the visual input is uncertain, the model leans more heavily on its language prior.
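To make the decoding idea concrete, here is a minimal sketch of one contrastive decoding step. It is not the authors' implementation. It assumes the model has produced two sets of next-token logits over the same vocabulary, one from the original image and one from a distorted (for example, noised) copy. It also assumes the contrast takes the form (1 + alpha) * original - alpha * distorted, with a plausibility cutoff beta applied to the original distribution. The function name `vcd_step`, the parameter names `alpha` and `beta`, and the toy logits are all hypothetical and do not come from this memo.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vcd_step(logits_original, logits_distorted, alpha=1.0, beta=0.1):
    """Return a contrasted next-token distribution (illustrative sketch).

    logits_original:  logits conditioned on the clean image (assumed input)
    logits_distorted: logits conditioned on a distorted image (assumed input)
    alpha: contrast strength (hypothetical name)
    beta:  plausibility cutoff in [0, 1] (hypothetical name)
    """
    if len(logits_original) != len(logits_distorted):
        raise ValueError("both logit lists must cover the same vocabulary")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")

    p_original = softmax(logits_original)
    cutoff = beta * max(p_original)

    contrasted = []
    for lo, ld, po in zip(logits_original, logits_distorted, p_original):
        if po >= cutoff:
            # Amplify what the clean image supports and subtract what the
            # model predicts anyway when the image is unreliable.
            contrasted.append((1 + alpha) * lo - alpha * ld)
        else:
            # Tokens that are implausible under the clean image are masked
            # so the contrast cannot promote them.
            contrasted.append(-math.inf)
    return softmax(contrasted)


if __name__ == "__main__":
    # Toy vocabulary: ["dog", "frisbee", "grass"]. The second token scores
    # highly even with a distorted image, which mimics a language-prior guess.
    original = [2.0, 1.0, 0.5]
    distorted = [0.5, 1.5, 0.5]
    print(softmax(original))
    print(vcd_step(original, distorted))
```

In this toy run the token that stays likely under the distorted image loses probability mass after the contrast, which is the behaviour the memo attributes to VCD. The cutoff keeps the subtraction from boosting tokens the clean-image distribution already considers implausible.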

Experiment

Benchmarks

MME, LLaVA-Bench, POPE

Discussion

Performance on perception tasks improves with VCD.

Conclusion

Visual uncertainty and the language prior both contribute to hallucination.
