Yuxuan Wang is the first author, affiliated with the Beijing Institute for General Artificial Intelligence (BIGAI).
The problem is as follows
Most current large video-language models (LVLMs) exhibit significant hallucination issues.
Extrinsic hallucinations in particular are difficult to detect.
Existing models are markedly better at confirming facts than at identifying hallucinations.
Related works
Existing methods do not target dynamic content such as actions, events, and stories.
VideoHallucer specifies the hallucination issues of LVLMs using the following taxonomy (a toy data-structure sketch follows the list).
Intrinsic: object-relation, temporal, semantic detail
Extrinsic: factual, non-factual
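To keep the categories straight, here is a minimal Python sketch of the taxonomy as plain data. Only the category names come from the list above; the dictionary layout and the helper function are purely illustrative.

```python
# Toy representation of the VideoHallucer hallucination taxonomy.
# Category names follow the paper; the structure is only for illustration.
HALLUCINATION_TAXONOMY = {
    "intrinsic": ["object-relation", "temporal", "semantic detail"],
    "extrinsic": ["factual", "non-factual"],
}

def top_level_category(subtype: str) -> str:
    """Return 'intrinsic' or 'extrinsic' for a given hallucination subtype."""
    for category, subtypes in HALLUCINATION_TAXONOMY.items():
        if subtype in subtypes:
            return category
    raise ValueError(f"unknown subtype: {subtype}")

print(top_level_category("temporal"))  # -> "intrinsic"
```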
Experiment and benchmark
Self-PEP Framework
This appears to be a form of chain-of-thought (CoT) prompting.
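As a rough illustration of what such a CoT-style self-check could look like, here is a minimal sketch. `query_model` is a hypothetical stand-in for an LVLM API, and the two-stage "describe the evidence, then answer" structure is my assumption rather than the paper's exact self-PEP procedure.

```python
# Hypothetical LVLM call; replace with a real video-language model API.
def query_model(video_path: str, prompt: str) -> str:
    raise NotImplementedError("replace with a real LVLM call")

def answer_with_self_check(video_path: str, question: str) -> str:
    # Stage 1: ask the model to describe the relevant visual evidence first.
    evidence = query_model(
        video_path,
        "Before answering, describe what the video actually shows "
        f"that is relevant to: {question}",
    )
    # Stage 2: answer conditioned on that self-generated evidence.
    answer = query_model(
        video_path,
        f"Evidence: {evidence}\n"
        f"Using only this evidence, answer yes or no: {question}",
    )
    return answer.strip().lower()
```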
Conclusion
The questions are adversarially generated from the videos.
Evaluation covers both VQA-based and caption-based formats.
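For the VQA-based side, the benchmark pairs basic questions with hallucinated ones. Below is a hedged sketch of how such paired yes/no scoring might be computed; the field names (`basic_id`, `hallucinated_id`) are made up, and the assumption that every basic answer is "yes" and every hallucinated answer is "no" is a simplification, not the paper's exact protocol.

```python
# Paired scoring sketch: a model scores a point only if it answers both the
# basic question (expected "yes") and the hallucinated question (expected
# "no") correctly.
from typing import Iterable

def paired_accuracy(items: Iterable[dict], answers: dict) -> float:
    correct = total = 0
    for item in items:
        total += 1
        basic_ok = answers[item["basic_id"]] == "yes"
        halluc_ok = answers[item["hallucinated_id"]] == "no"
        if basic_ok and halluc_ok:
            correct += 1
    return correct / total if total else 0.0
```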