Takara Taniguchi

[memo]mPLUG-Owl : Modularization Empowers Large Language Models with Multimodality

Abstract
They introduce an instruction evaluation set, OwlEval.
Experimental results show that their model outperforms existing multimodal models.

Introduction
They propose mPLUG-Owl, a novel training paradigm
They carefully construct an instruction evaluation set called OwlEval.

Related works
Large language model
BERT
GPT
T5

Multi-modal large language models
Visual ChatGPT
MM-REACT
HuggingGPT

Unified models
CLIP
BLIP,BLIP2

mPLUG-Owl not only aligns the representations between the vision and language foundation models, but also tunes them jointly on instruction data.

Training consists of two main steps.

Step1
Multimodal pretraining

Step2
Joint instruction tuning
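The two steps above can be sketched as a staged freezing schedule: in the paper, stage 1 trains the visual encoder and visual abstractor against a frozen LLM, while stage 2 freezes the vision side and adapts the LLM with lightweight LoRA modules on mixed text-only and multimodal instruction data. A minimal sketch (module names are illustrative, not from the paper's released code):

```python
# Hedged sketch of mPLUG-Owl's two-stage training schedule.
# Component names ("visual_encoder", "visual_abstractor", "lora_adapters")
# are illustrative labels, not identifiers from the authors' codebase.

STAGE1 = "multimodal_pretraining"
STAGE2 = "joint_instruction_tuning"

def trainable_modules(stage: str) -> set[str]:
    """Return which components receive gradient updates in each stage.

    Stage 1: the vision side is trained to align image features with
    the frozen LLM.
    Stage 2: the vision side is frozen; the LLM is adapted via LoRA on
    a mixture of text-only and multimodal instruction data.
    """
    if stage == STAGE1:
        return {"visual_encoder", "visual_abstractor"}
    if stage == STAGE2:
        return {"lora_adapters"}
    raise ValueError(f"unknown stage: {stage}")

def is_frozen(module: str, stage: str) -> bool:
    """A module is frozen whenever it is not in the trainable set."""
    return module not in trainable_modules(stage)
```

This keeps the expensive LLM weights untouched in both stages, which is the main efficiency argument of the modularized design.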

OwlEval consists of about 80 questions based on 50 images.

Thoughts
The dataset seems a bit small for a benchmark.
