Takara Taniguchi
[memo] A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

From the Allen Institute for AI; Dustin Schwenk is the first author.

Introduction

  • Knowledge and commonsense are established topics in NLP. The paper creates a benchmark that requires diverse outside knowledge, i.e. knowledge not contained in the image itself
  • A detailed analysis of the dataset
  • An evaluation of baseline models

BERT

CLIP

GPT-3

Related works

Knowledge-based VQA datasets

  • FVQA does not require reasoning
  • KVQA focuses too much on names of people.
  • OK-VQA
  • S3VQA requires detecting objects in images

Commonsense

  • SQuAD (built from Wikipedia articles)
  • CommonsenseQA

Dataset collection

Images are taken from COCO 2017.

Interesting images are filtered using ResNet.

The images are then clustered using CLIP.

Crowd workers are hired for annotation.
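To make the CLIP clustering step more concrete, here is a minimal sketch of grouping image embeddings by cosine similarity. This memo does not record how the paper actually clustered the images, so everything here is an assumption: the use of k-means, the farthest-point initialization, the `cluster_embeddings` helper, and the toy 3-dimensional vectors that stand in for real CLIP image embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def cluster_embeddings(embeddings, k, iterations=10):
    """Hypothetical helper: k-means on unit vectors using cosine similarity."""
    points = [normalize(v) for v in embeddings]

    # Deterministic farthest-point initialization: start from the first
    # point, then repeatedly add the point least similar to the centroids
    # chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        far = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(far)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its most similar centroid.
        labels = [
            max(range(k), key=lambda j: cosine(p, centroids[j])) for p in points
        ]
        # Move each centroid to the normalized mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels


# Toy stand-ins for CLIP image embeddings (assumption: real ones are much
# higher-dimensional and come from a CLIP image encoder).
embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0],
    [0.0, 0.0, 0.9],
]
labels = cluster_embeddings(embeddings, k=3)
print(labels)
```

Images that land in the same cluster look similar to CLIP, so sampling across clusters is one plausible way to keep the image set diverse before handing it to annotators.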

Experiment

Baselines: BERT, CLIP, ClipCap

Conclusion

It leaves me wondering what "commonsense" really is....
