Yuanhan Zhang is the first author; the work comes from a Bytedance group.
Intro
This work explores using synthetic data as an alternative to traditional datasets.
The authors created LLaVA-Video-178K, a dataset tailored for video instruction following.
Related work (datasets)
ActivityNet
Charades
Kinetics-700
Something-Something v2
Ego4d
VIDAL
HD-VILA
Method
They used a pipeline that generates detailed video descriptions.
Experiment
128 H100 GPUs? What is that supposed to mean... 😭
It's a bit disappointing that the paper doesn't say much about the training method.