GPT-4 exhibited a range of remarkable abilities
Introduction
The authors collect images that have not been used for training.
They provide comprehensive annotations to facilitate the evaluation of both generative and discriminative tasks.
The evaluation pipeline is LLM-free.
Contributions
- Create the AMBER benchmark
- Build an LLM-free evaluation pipeline
- Analyze the most advanced model, GPT-4V
Related works
Reasons for hallucinations
- Diverse training data that contains some errors
- The model loses attention to the image
- Information is lost after the visual encoder
Dataset construction
Images too challenging for accurate annotation are discarded
Annotated by humans
Generative and discriminative tasks
Metrics
CHAIR is a commonly used metric for evaluating hallucinations.
It measures the frequency of hallucinatory objects.
Hal represents the proportion of responses that contain hallucinations.
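The two metrics above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's actual pipeline: it assumes the objects mentioned in each response have already been extracted, that CHAIR is computed per response as the share of mentioned objects missing from the annotation, and that a response counts as hallucinated whenever its CHAIR is non-zero. The function names `chair` and `hal` and the sample data are hypothetical.

```python
def chair(mentioned_objects, annotated_objects):
    """Share of mentioned objects that are not in the annotation (assumed definition)."""
    mentioned = set(mentioned_objects)
    if not mentioned:
        # No objects mentioned, so nothing can be hallucinated.
        return 0.0
    hallucinated = mentioned - set(annotated_objects)
    return len(hallucinated) / len(mentioned)


def hal(responses):
    """Proportion of responses containing at least one hallucinated object.

    `responses` is a list of (mentioned_objects, annotated_objects) pairs.
    """
    if not responses:
        return 0.0
    flagged = sum(1 for mentioned, annotated in responses
                  if chair(mentioned, annotated) > 0)
    return flagged / len(responses)


# Hypothetical data: each pair is (objects in the response, annotated objects).
samples = [
    (["dog", "cat", "frisbee"], ["dog", "frisbee", "grass"]),  # "cat" is hallucinated
    (["dog", "grass"], ["dog", "frisbee", "grass"]),           # no hallucination
]

for mentioned, annotated in samples:
    print(f"CHAIR = {chair(mentioned, annotated):.2f}")
print(f"Hal = {hal(samples):.2f}")
```

In this sketch CHAIR captures how much of a single response is hallucinated, while Hal only records whether a response contains any hallucination at all, so the two can move independently.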
Thoughts
Since the annotation is done by humans, it seems like a lot of work.
It sounds tiring.