DEV Community

Takara Taniguchi
[memo] A Vision-Language-Action Flow Model for General Robot Control

Notes are partial so far; the experiments section onward will be covered next time.

Abstract

  • Bringing robot learning to a general-purpose level is difficult
  • Flow matching architecture

Introduction

Versatility is important: a robot should be able to perform diverse tasks in diverse environments

Uses an action chunking architecture
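As a note to self on what action chunking means in practice: the policy predicts a short sequence (chunk) of future actions per query, the robot executes them, and the policy is queried again when the chunk runs out. Below is a minimal sketch of that loop. The names `run_chunked_policy`, `make_toy_env`, and `make_toy_policy` are hypothetical, and the 1-D toy environment is only a stand-in, not the paper's setup.

```python
from typing import Callable, List, Tuple

Action = List[float]
Observation = List[float]


def run_chunked_policy(
    policy: Callable[[Observation], List[Action]],
    step_env: Callable[[Action], Observation],
    first_obs: Observation,
    total_steps: int,
) -> List[Action]:
    """Run a chunk-predicting policy, re-querying it whenever a chunk is used up."""
    executed: List[Action] = []
    obs = first_obs
    while len(executed) < total_steps:
        chunk = policy(obs)  # one query yields several future actions
        if not chunk:
            raise ValueError("policy returned an empty action chunk")
        # Execute the chunk, truncated so we never exceed total_steps.
        for action in chunk[: total_steps - len(executed)]:
            obs = step_env(action)
            executed.append(action)
    return executed


def make_toy_env(start: float) -> Tuple[Callable[[Action], Observation], List[float]]:
    """Hypothetical 1-D environment: the state is a position, actions are offsets."""
    state = [start]

    def step_env(action: Action) -> Observation:
        state[0] += action[0]
        return [state[0]]

    return step_env, state


def make_toy_policy(horizon: int) -> Tuple[Callable[[Observation], List[Action]], List[int]]:
    """Hypothetical policy: move to position 0 in `horizon` equal steps."""
    calls = [0]

    def policy(obs: Observation) -> List[Action]:
        calls[0] += 1
        step = -obs[0] / horizon
        return [[step] for _ in range(horizon)]

    return policy, calls


if __name__ == "__main__":
    policy, calls = make_toy_policy(horizon=4)
    step_env, state = make_toy_env(start=8.0)
    actions = run_chunked_policy(policy, step_env, [8.0], total_steps=6)
    print(len(actions), "actions executed with", calls[0], "policy queries")
```

The point of the design is that the number of policy queries is much smaller than the number of control steps, which matters when each query is a large model forward pass.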

Related works

Applicability to long-horizon tasks

Overview

22 robots

task names and segmentations

PaliGemma vision-language model

To generate continuous action distributions, they use flow matching

Architecture is inspired by Transfusion

π0 uses conditional flow matching
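To make the flow matching idea concrete, here is a minimal sketch in plain Python. It assumes a common linear-path formulation: the noisy sample is x_tau = (1 - tau) * noise + tau * action, the regression target is the velocity action - noise, and sampling integrates the learned velocity field from tau = 0 (noise) to tau = 1 with forward Euler. The paper's exact time and sign conventions may differ, and in the real model the velocity network is also conditioned on the observation. All function names here are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def make_training_pair(action: Vector, noise: Vector, tau: float) -> Tuple[Vector, Vector]:
    """Build one (network input, regression target) pair.

    Assumed linear path: x_tau = (1 - tau) * noise + tau * action.
    The target velocity along that path is constant: action - noise.
    """
    x_tau = [(1.0 - tau) * n + tau * a for n, a in zip(noise, action)]
    target_velocity = [a - n for n, a in zip(noise, action)]
    return x_tau, target_velocity


def flow_matching_loss(predicted: Vector, target: Vector) -> float:
    """Mean squared error between predicted and target velocity."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def sample_action(
    velocity_fn: Callable[[Vector, float], Vector],
    noise: Vector,
    num_steps: int = 10,
) -> Vector:
    """Integrate dx/dtau = velocity_fn(x, tau) from tau=0 to tau=1 (forward Euler)."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        tau = i * dt
        v = velocity_fn(x, tau)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Stand-in for a trained network: the exact velocity field when the
    # data distribution is a single action (illustration only).
    target_action = [0.5, -0.2]

    def oracle_velocity(x: Vector, tau: float) -> Vector:
        return [(a - xi) / (1.0 - tau) for a, xi in zip(target_action, x)]

    start_noise = [random.gauss(0.0, 1.0) for _ in target_action]
    print(sample_action(oracle_velocity, start_noise, num_steps=10))
```

During training, tau and the noise are resampled for every example and the network output replaces `predicted` in the loss; at inference only `sample_action` is needed, with a small fixed number of integration steps.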

Requires the right dataset

Data collection

Multi-phase training procedure

Contribution

  • VLM-pretraining and flow matching
  • Laundry folding, clearing a table…

Conclusion

  • Data: 10,000 hours of dexterous manipulation data
    • OXE, DROID, Bridge
  • Whether transfer is possible remains an open question (future work)
