[memo] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

#computervision #ai #deeplearning #pwl

Research from DAMO Academy.

Introduction

Visual contrastive decoding

VCD effectively reduces over-reliance on statistical biases and unimodal (language) priors, two main causes of object hallucination.

Contribution

An in-depth analysis of object hallucination from the perspective of unimodal priors and statistical bias.
A proposed decoding method, visual contrastive decoding (VCD).
Experiments demonstrating the efficacy of VCD in alleviating object hallucinations.

Related works

Method

Decoding of VLMs

Object hallucinations often emerge when visually irrelevant tokens are generated.

Visual uncertainty amplifies hallucination (language priors and visual uncertainty).
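
To make the decoding idea concrete, here is a minimal NumPy sketch of one contrastive decoding step. It assumes we already have next-token logits from the model for the original image and for a distorted copy of the image (for example, with added noise). The function name `contrastive_probs`, the parameters `alpha` (contrast strength) and `beta` (plausibility cutoff), and the toy logits are all illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array (handles -inf entries)."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def contrastive_probs(logits_clean, logits_distorted, alpha=1.0, beta=0.1):
    """One contrastive decoding step (illustrative sketch, not the paper's code).

    logits_clean:     next-token logits given the original image
    logits_distorted: next-token logits given a distorted image
    alpha:            assumed contrast strength
    beta:             assumed plausibility cutoff relative to the top clean token
    """
    logits_clean = np.asarray(logits_clean, dtype=float)
    logits_distorted = np.asarray(logits_distorted, dtype=float)

    # Keep only tokens that are reasonably likely under the clean input,
    # so the contrast cannot promote implausible tokens.
    p_clean = softmax(logits_clean)
    keep = p_clean >= beta * p_clean.max()

    # Amplify what the clean image supports and subtract what the model
    # predicts anyway when the visual evidence is degraded.
    contrast = (1 + alpha) * logits_clean - alpha * logits_distorted
    contrast = np.where(keep, contrast, -np.inf)
    return softmax(contrast)

# Toy vocabulary: the image shows a black banana, but the language prior
# favors "yellow". The logits below are made up for illustration.
tokens = ["yellow", "black", "green"]
clean = [2.0, 1.8, -1.0]      # the prior still slightly wins with the real image
distorted = [2.5, 0.5, -1.0]  # with a distorted image the prior dominates

probs = contrastive_probs(clean, distorted)
print(tokens[int(np.argmax(clean))], "->", tokens[int(np.argmax(probs))])
```

In this toy case the token favored by the language prior loses out after the contrast, because its score rises when the image is distorted, while the visually grounded token's score drops.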

Experiment

Benchmarks

MME, LLaVA-Bench, POPE
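
POPE poses yes/no questions about whether an object is present in the image, so its scores are the usual binary classification metrics. The sketch below shows how such scores could be computed from a list of answers. The function name `pope_metrics` and the sample answers are hypothetical and are not taken from the benchmark's official code or from the paper's results.

```python
def pope_metrics(predictions, labels):
    """Binary metrics for yes/no answers (illustrative sketch).

    predictions, labels: equal-length lists of "yes" / "no" strings,
    where "yes" means the object is claimed to be in the image.
    """
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and l == "yes" for p, l in pairs)
    fp = sum(p == "yes" and l == "no" for p, l in pairs)
    tn = sum(p == "no" and l == "no" for p, l in pairs)
    fn = sum(p == "no" and l == "yes" for p, l in pairs)

    total = len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # A high share of "yes" answers hints at a bias toward claiming
        # that objects are present.
        "yes_ratio": (tp + fp) / total if total else 0.0,
    }

# Made-up answers for five questions, purely for illustration.
sample_predictions = ["yes", "yes", "no", "no", "yes"]
sample_labels = ["yes", "no", "no", "yes", "yes"]
print(pope_metrics(sample_predictions, sample_labels))
```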

Discussion

VCD also improves performance on perception tasks.

Conclusion

Visual uncertainty and language priors both contribute to object hallucination.
