Lu's group at UCSB, FAIR
Intro
A novel framework for generating and searching for evidence chains-of-thought.
Evidence distillation
Related work
Video understanding with LLMs.
Existing methods uniformly sample frames, regardless of the question.
The authors instead focus on generating and localizing evidence relevant to the question.
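For contrast, uniform frame sampling can be sketched as below. This is a minimal illustration assuming center-of-bin sampling; `uniform_frame_indices` is a hypothetical helper, not taken from the paper or from any specific video-LLM. Note that the chosen frames depend only on the video length, never on the question.

```python
def uniform_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples frame indices spread evenly over num_frames frames.

    Assumption: takes the center of each of num_samples equal-length bins.
    The exact rule used by any particular model may differ.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    # Cannot sample more distinct frames than the video has.
    num_samples = min(num_samples, num_frames)
    bin_size = num_frames / num_samples
    # The same indices come out for every question asked about this video.
    return [int(bin_size * i + bin_size / 2) for i in range(num_samples)]


print(uniform_frame_indices(100, 4))
```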
Chain-of-thought reasoning in videos
CoT for video understanding
deliberate search
majority voting
VIP
VSOR-CoT
MotionEpic
Visual evidence
Generating an evidence pool
Standard flow: Q -> A
This method's flow: Q -> Evidence -> A
Method
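Split the video into segments and extract the relevant information from each.
Candidate chains-of-thought are generated, and a search algorithm selects among them.
The search looks for the evidence that most increases the likelihood of the chain.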
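A toy sketch of this kind of likelihood-driven evidence search follows. It is heavily hedged: it assumes each video segment has already been turned into a text evidence candidate (e.g. a caption), uses greedy selection as a swapped-in search strategy (the paper's actual search algorithm may differ), and replaces the model's log-likelihood with a hypothetical keyword-overlap proxy `toy_score`. `greedy_evidence_search` is likewise a hypothetical name.

```python
from typing import Callable, Sequence

# Stand-in for a model log-likelihood of the chain given the question.
Scorer = Callable[[str, Sequence[str]], float]


def toy_score(question: str, chain: Sequence[str]) -> float:
    """Hypothetical proxy: question words covered by the chain, minus a length penalty."""
    q_words = set(question.lower().split())
    covered: set[str] = set()
    for evidence in chain:
        covered |= q_words & set(evidence.lower().split())
    return len(covered) - 0.1 * len(chain)


def greedy_evidence_search(
    question: str, candidates: Sequence[str], score: Scorer, max_steps: int
) -> tuple[list[str], float]:
    """Greedily add the candidate that most increases the score; stop when nothing helps."""
    chain: list[str] = []
    best = score(question, chain)
    remaining = list(candidates)
    for _ in range(max_steps):
        if not remaining:
            break
        # Score every one-step extension of the current chain.
        new_score, pick = max(
            ((score(question, chain + [c]), c) for c in remaining), key=lambda t: t[0]
        )
        if new_score <= best:
            break  # No candidate improves the chain any further.
        chain.append(pick)
        remaining.remove(pick)
        best = new_score
    return chain, best


segments = [
    "segment 1: a dog sleeps on the sofa",
    "segment 2: someone knocks at the door",
    "segment 3: the dog starts to bark",
    "segment 4: a bird flies outside",
]
chain, best = greedy_evidence_search("why did the dog bark at the door", segments, toy_score, 3)
print(chain, best)
```

Greedy search is only the simplest option here; beam search or sampling several chains and keeping the highest-scoring one would fit the same `Scorer` interface.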
Distilling evidence chains into a single model
Stage 1: instruction tuning
Stage 2: predict answers and evidence chains
Trained with next-token prediction using a cross-entropy loss.
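The training objective is ordinary next-token cross-entropy. Below is a minimal pure-Python sketch, assuming the prompt tokens (video and question) are masked out so that only the evidence-chain and answer tokens contribute to the loss; whether the paper masks the prompt this way is an assumption, and `next_token_ce` and `build_loss_mask` are hypothetical names.

```python
import math


def build_loss_mask(prompt_len: int, target_len: int) -> list[int]:
    """0 for prompt positions (ignored), 1 for evidence/answer positions (trained on)."""
    return [0] * prompt_len + [1] * target_len


def next_token_ce(
    logits: list[list[float]], targets: list[int], loss_mask: list[int]
) -> float:
    """Mean cross-entropy over masked-in positions.

    logits[t] holds unnormalized scores for predicting targets[t]
    (i.e. already shifted by one relative to the input tokens).
    """
    total, count = 0.0, 0
    for row, tgt, keep in zip(logits, targets, loss_mask):
        if not keep:
            continue
        # Numerically stable log-sum-exp for the softmax normalizer.
        mx = max(row)
        log_z = mx + math.log(sum(math.exp(x - mx) for x in row))
        total += log_z - row[tgt]  # -log softmax(row)[tgt]
        count += 1
    return total / count if count else 0.0


logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0], [0.1, 3.0, 0.2]]
targets = [0, 2, 1]
mask = build_loss_mask(prompt_len=1, target_len=2)
print(next_token_ce(logits, targets, mask))
```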
Even though this is distillation, is there no table discussing time (or similar costs)?