[memo]Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning

Yezhou Yang of Arizona State University

Abstract

The paper studies textual distractor generation for multiple-choice VQA, focusing on generating challenging yet meaningful distractors given the context image.

DG-VQA

Related works

VQA: two typical tasks

Introduction

Contribution

Related works

VQA
- Discusses CLIP, among other things
Distractor generation
- There are few studies in the multimodal domains
- Sakaguchi et al. train a discriminative model to predict distractors
- Gao et al.
Pre-trained models as KB
Reinforcement learning

Problem definition

"Challenging" does not mean that the generated distractors D must be semantically equivalent to the correct answer.

Method

DG-VQA as an RL problem
- RL framework where the agent model is trained to generate distractors based on the feedback from the environment
- Policy gradient framework
- VQA model produces the reward
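
To make the bullets above concrete for myself, here is a minimal sketch of a policy-gradient (REINFORCE) loop in plain Python. Everything in it is an assumption for illustration: the candidate list, the `toy_vqa_reward` table standing in for the VQA model, the softmax-over-logits "generator", and the hyperparameters are all made up and are not the paper's actual architecture or reward design.

```python
import math
import random

# Hypothetical toy setup: these candidates and the reward table below are
# invented for illustration and are not taken from the paper.
CANDIDATES = ["cat", "dog", "banana", "car"]


def toy_vqa_reward(distractor: str) -> float:
    """Stand-in for the VQA model (the environment).

    A higher value means the distractor is assumed to confuse the VQA model more.
    """
    confusion = {"cat": 0.9, "dog": 0.6, "banana": 0.1, "car": 0.05}
    return confusion[distractor]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.5, seed=0):
    """Train a softmax policy over CANDIDATES with REINFORCE and a moving baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)  # the "agent": one logit per candidate
    baseline = 0.0                    # running average of observed rewards
    for _ in range(steps):
        probs = softmax(logits)
        # The agent samples a distractor from its current policy.
        action = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        # The environment (toy VQA model) returns the reward.
        reward = toy_vqa_reward(CANDIDATES[action])
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d logit_i = onehot_i - probs_i.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


probs = train()
for candidate, p in zip(CANDIDATES, probs):
    print(f"{candidate}: {p:.3f}")
```

In this toy version the policy should shift probability toward candidates that the stand-in reward rates as more confusing. The real generator produces text conditioned on the image, so this only shows the agent / environment / reward loop, not the model itself.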
Neural distractor generator

Conclusion

Thoughts

So this is a CVPRW 2022 paper... even though it was first written in 2019...
