LISA adapted to SamGIS

#llm #machinelearning #python #computervision

Image segmentation is a crucial task in computer vision, where the goal is to extract the instance segmentation mask for a desired object within an image. I have already worked on a project, SamGIS, that focuses on this particular application of computer vision. A logical next step is adding the ability to recognize objects through text prompts. This apparently simple task actually differs from what Segment Anything (the ML backend used by SamGIS) does: "SAM" does not output descriptions or categories for its input images. Starting from a written prompt, by contrast, requires understanding which classes of objects exist in the image under analysis. A visual language model (or VLM) that performs well at this task is LISA. LISA's authors built their work on top of Segment Anything and LLaVA, a large language model with multimodal capabilities (it can process both text prompts and images). By leveraging LISA's "reasoning segmentation" abilities, SamGIS can now conduct "zero-shot" analyses, meaning it can operate without specific or specialized prior training in geological, geomorphological, or photogrammetric fields.

Some input text prompts with their GeoJSON outputs

I can't show this part on dev.to, so please refer to my blog page.

Duration of segmentation tasks

At the moment, a prompt that also asks for an explanation of the segmentation slows the analysis down considerably. The same prompt on the same image without "descriptive" or "explanatory" questions finishes much faster. Tests with explanatory text take more than 60 seconds, while tests without it take between 3 and 8 seconds, using the HuggingFace hardware profile "Nvidia T4 Small" with 4 vCPU, 15 GB RAM and 16 GB VRAM.
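A comparison like this can be reproduced with a small timing helper. The sketch below is only an illustration: `stub_inference` is a placeholder standing in for the real LISA call, and the two example prompts are hypothetical, not the ones used in my tests.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def stub_inference(prompt: str) -> str:
    # Placeholder (assumption): the real demo would run LISA inference here.
    return f"mask for: {prompt}"


if __name__ == "__main__":
    # Hypothetical prompts: one plain request, one that also asks for an explanation.
    prompts = {
        "plain": "segment the houses",
        "explanatory": "segment the houses and explain why you chose them",
    }
    for label, prompt in prompts.items():
        _, elapsed = time_call(stub_inference, prompt)
        print(f"{label}: {elapsed:.3f} s")
```

With the stub both timings are close to zero. The gap described above only appears once `stub_inference` is replaced by the real model call on a GPU.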

Software architecture

Technically and architecturally, the demo consists of a frontend page similar to the SamGIS demo. Instead of the drawing toolbar there is a text prompt for natural language requests, with some selectable examples displayed at the top of the page. The backend uses a FastAPI-based API that calls a custom LISA function wrapper.
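The flow from text prompt to GeoJSON can be sketched as a framework-agnostic handler. This is a minimal sketch under stated assumptions, not the actual SamGIS code: the `PromptRequest` fields, `handle_prompt_request` and `fake_lisa_wrapper` are hypothetical names, and the stub returns the bounding box as a polygon instead of running LISA. In the real backend, logic of this kind would sit behind a FastAPI route.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class PromptRequest:
    # Hypothetical request schema; field names are illustrative only.
    prompt: str                               # natural language request from the frontend
    bbox: Tuple[float, float, float, float]   # (west, south, east, north)
    zoom: int


def fake_lisa_wrapper(request: PromptRequest) -> dict:
    # Stub (assumption): returns the bounding box as a GeoJSON polygon
    # where the real wrapper would return the segmented shapes.
    west, south, east, north = request.bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    feature = {
        "type": "Feature",
        "properties": {"prompt": request.prompt},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }
    return {"type": "FeatureCollection", "features": [feature]}


def handle_prompt_request(payload: dict, segment_fn: Callable[[PromptRequest], dict]) -> dict:
    """Validate the payload, call the segmentation wrapper, return status and body."""
    prompt = str(payload.get("prompt", "")).strip()
    if not prompt:
        return {"status": 422, "body": {"detail": "prompt must not be empty"}}
    bbox = payload.get("bbox")
    if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
        return {"status": 422, "body": {"detail": "bbox must contain four numbers"}}
    try:
        west, south, east, north = (float(value) for value in bbox)
        zoom = int(payload.get("zoom", 12))
    except (TypeError, ValueError):
        return {"status": 422, "body": {"detail": "bbox and zoom must be numeric"}}
    request = PromptRequest(prompt=prompt, bbox=(west, south, east, north), zoom=zoom)
    return {"status": 200, "body": segment_fn(request)}


if __name__ == "__main__":
    payload = {"prompt": "the houses near the river", "bbox": [9.0, 45.0, 9.1, 45.1], "zoom": 12}
    print(handle_prompt_request(payload, fake_lisa_wrapper)["status"])
```

Passing the wrapper in as `segment_fn` keeps the HTTP layer separate from the model call, so the handler can be tested without a GPU.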

Unfortunately I have had to pause my demo due to GPU costs, but I am requesting the use of a free GPU from HuggingFace. Please feel free to reach out to me on LinkedIn for a live demonstration, to ask for more information, or for further clarification.
