Here is a simple way to understand Google's Genie 3: it is real-time image-to-video generation with automatic directional prompts.
It will become useful once its retention span reaches 1 hour, at which point human operators will be able to extract meaningful information from the generated result.
A simpler approach, though, would be to let existing image generators perform such spatial manipulations on images directly.
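
To make the "image-to-video with directional prompts" framing and the idea of a retention span concrete, here is a toy sketch. It is not Genie 3's actual design or API: the `ToyWorld` class, the `DIRECTIONS` table, and the `retention_frames` parameter are all hypothetical names invented for illustration, and a "frame" is reduced to a camera position.

```python
from collections import deque

# Hypothetical directional prompts; a real system would derive these from user input.
DIRECTIONS = {"forward": (0, 1), "back": (0, -1), "left": (-1, 0), "right": (1, 0)}


class ToyWorld:
    """Toy stand-in for an interactive video model (an assumption, not Genie 3)."""

    def __init__(self, retention_frames):
        self.position = (0, 0)
        # Bounded context: only the last `retention_frames` frames are kept.
        self.context = deque(maxlen=retention_frames)
        self.context.append(self.position)  # the seed image

    def step(self, direction):
        """Generate the next 'frame' from the current one plus a directional prompt."""
        dx, dy = DIRECTIONS[direction]
        x, y = self.position
        self.position = (x + dx, y + dy)
        self.context.append(self.position)  # the oldest frame drops out when full
        return self.position

    def remembers(self, frame):
        """True if the frame is still inside the retention span."""
        return frame in self.context


world = ToyWorld(retention_frames=3)
for move in ["forward", "forward", "right", "right"]:
    world.step(move)

print(world.remembers((1, 2)))  # recent frame, still in context
print(world.remembers((0, 0)))  # seed image, already forgotten
```

With a retention span of three frames, the seed image is forgotten after four moves. In this toy model, a longer span simply means the generated world stays consistent with more of what was already shown.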