Jimmy Guerrero for Voxel51

ECCV 2024 - Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose a novel method, Diff2Scene, which leverages frozen representations from text-image generative models for open-vocabulary 3D semantic segmentation and visual grounding. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes.

ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

About the Speaker

Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests are computer vision, multimodal learning, and generative models.
