Takara Taniguchi

Posted on Jun 28

[memo] A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

#nlp #ai #gpt3 #computervision

From the Allen Institute for AI; Dustin Schwenk is the first author.

Introduction

Knowledge and commonsense in NLP. The goal is to create a benchmark that requires diverse knowledge from outside the image.
Detailed analysis
An evaluation

BERT

CLIP

GPT-3

Related works

Knowledge-based VQA datasets

FVQA does not require reasoning
KVQA focuses too much on names of people.
OK-VQA
S3VQA requires detecting objects in images

Commonsense

SQuAD (built on Wikipedia)
CommonsenseQA

Dataset collection

COCO2017

Images are filtered for interestingness using a ResNet

Images are clustered using CLIP

Workers are hired
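
As a rough illustration of the clustering step in the notes above, here is a minimal sketch. The memo only says that images were clustered by CLIP and does not name the algorithm, so this assumes the CLIP image embeddings are already computed and uses spherical k-means as a stand-in. The function names (`normalize`, `cosine`, `cluster_embeddings`) and the toy vectors are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def cluster_embeddings(embeddings: Sequence[Vector], k: int,
                       iterations: int = 20) -> List[int]:
    """Spherical k-means over precomputed embeddings (assumed to come from CLIP).

    Returns one cluster label per input vector.
    """
    points = [normalize(v) for v in embeddings]
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of embeddings")

    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point least similar to the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the normalized mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if not members:
                updated.append(centroids[j])  # keep an empty cluster's centroid
                continue
            mean = [sum(col) / len(members) for col in zip(*members)]
            updated.append(normalize(mean))
        if updated == centroids:
            break
        centroids = updated
    return labels


# Toy 2-D vectors standing in for real CLIP image embeddings.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(cluster_embeddings(toy, k=2))
```

In practice the embeddings would come from a CLIP image encoder, and a library clustering routine would likely replace this hand-written loop; the sketch only shows the idea of grouping similar images.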

Experiment

BERT, CLIP, ClipCap

Conclusion

It leaves me wondering what "commonsense" really is...
