
gdemarcq

Posted on • Originally published at ikomia.ai

Easy stable diffusion inpainting with Segment Anything Model (SAM)


With the Ikomia API, creating a workflow using Segment Anything Model (SAM) for segmentation followed by Stable diffusion inpainting becomes effortless, requiring only a few lines of code. To get started, you need to install the API in a virtual environment.

How to install a virtual environment

pip install ikomia

API documentation

API repo

Run SAM and stable diffusion inpainting with a few lines of code

You can also directly load the open-source notebook we have prepared.

Note: The workflow below requires 6.1 GB of GPU RAM. However, by choosing the smallest SAM model, memory usage can be reduced to 4.9 GB of GPU RAM.

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()

# Add the SAM algorithm
sam = wf.add_task(ik.infer_segment_anything(
    model_name='vit_l',
    input_box='[204.8, 221.8, 769.7, 928.5]'
),
    auto_connect=True
)

# Add the stable diffusion inpainting algorithm
sd_inpaint = wf.add_task(ik.infer_hf_stable_diffusion_inpaint(
    model_name='stabilityai/stable-diffusion-2-inpainting',
    prompt='dog, high resolution',
    negative_prompt='low quality',
    num_inference_steps='100',
    guidance_scale='7.5',
    num_images_per_prompt='1'),
    auto_connect=True
)

# Run directly on your image
wf.run_on(url="https://raw.githubusercontent.com/Ikomia-dev/notebooks/main/examples/img/img_cat.jpg")

# Inspect your result
display(sam.get_image_with_mask())
display(sd_inpaint.get_output(0).get_image())

Box drawing as prompt for SAM — segmentation output — stable diffusion inpaint output. [Cat image [source](https://www.pexels.com/photo/selective-focus-photo-of-grey-cat-1521304/)]

Introducing SAM: The Segment Anything Model

Image segmentation is a critical task in Computer Vision, enabling machines to understand and analyze the contents of images at a pixel level. The Segment Anything Model (SAM) is a groundbreaking instance segmentation model developed by Meta Research, which has taken the field by storm since its release in April 2023.

SAM offers unparalleled versatility and efficiency in image analysis tasks, making it a powerful tool for a wide range of applications.

SAM's promptable features

SAM was specifically designed to address the limitations of existing image segmentation models and to introduce new capabilities that revolutionize the field.

One of SAM's standout features is its promptable segmentation task, which allows users to generate valid segmentation masks by providing prompts such as spatial or text clues (feature not yet released at the time of writing) that identify specific objects within an image.

This flexibility empowers users to obtain precise and tailored segmentation results effortlessly:

  1. Generate segmentation masks for all objects SAM can detect.

Automated SAM segmentation (32 masks). [Original image [source](https://www.pexels.com/photo/red-coupe-on-parking-space-590481/)]

  2. Provide boxes to guide SAM in generating a mask for specific objects in an image.

Box-guided SAM segmentation

  3. Provide a box and a point to guide SAM in generating a mask with an area to exclude.

Box- and point-guided SAM segmentation
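
To make the last mode concrete, here is a minimal sketch with the Ikomia API combining a box prompt with a background point. The coordinates are placeholders, and the string format for ‘input_point’ and ‘input_point_label’ is an assumption modelled on the ‘input_box’ example used in this article.

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik

# Init your workflow
wf = Workflow()

# Box prompt plus one background point (label 0) marking an area to exclude from the mask.
# Coordinates are placeholders; adapt them to your own image.
sam = wf.add_task(ik.infer_segment_anything(
    model_name='vit_l',
    input_box='[204.8, 221.8, 769.7, 928.5]',
    input_point='[[450, 600]]',
    input_point_label='[0]'
),
    auto_connect=True
)

wf.run_on(url="https://raw.githubusercontent.com/Ikomia-dev/notebooks/main/examples/img/img_cat.jpg")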

Key features of the Segment Anything Model (SAM)

At the core of SAM lies its advanced architecture, which comprises three key components: an image encoder, a prompt encoder, and a lightweight mask decoder. This design enables SAM to perform real-time mask computation, adapt to new image distributions and tasks without prior knowledge, and exhibit ambiguity awareness in segmentation tasks.

By leveraging these capabilities, SAM offers remarkable flexibility and adaptability, setting new standards in image segmentation models.

Segment Anything Model diagram [[Source](https://github.com/facebookresearch/segment-anything)]
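
To make these components a bit more tangible, here is a minimal sketch using Meta's segment-anything package directly (outside the Ikomia workflow). The checkpoint and image paths are placeholders you need to provide yourself.

import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# The image encoder, prompt encoder and mask decoder are bundled in one model object.
sam = sam_model_registry["vit_l"](checkpoint="path/to/sam_vit_l_checkpoint.pth")
predictor = SamPredictor(sam)

# The heavy image encoder runs once per image...
image = cv2.cvtColor(cv2.imread("path/to/image.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# ...then the lightweight prompt encoder and mask decoder answer prompts in real time.
masks, scores, _ = predictor.predict(
    box=np.array([204, 221, 769, 928]),  # XYXY box prompt
    multimask_output=True  # returns several candidate masks when the prompt is ambiguous
)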

The SA-1B dataset: enabling unmatched training data scale

A fundamental factor contributing to SAM's exceptional performance is the SA-1B dataset, the largest segmentation dataset to date, introduced by the Segment Anything project. With over 1 billion masks spread across 11 million carefully curated images, the SA-1B dataset provides SAM with a diverse and extensive training data source.

This abundance of high-quality training data equips SAM with a comprehensive understanding of various object categories, enhancing its ability to generalize and perform accurately across different segmentation tasks.

Zero-shot transfer: adapting to new tasks without prior knowledge

One of SAM's most impressive attributes is its zero-shot transfer capability. SAM has been trained to achieve outstanding zero-shot performance, surpassing previous fully supervised results in numerous cases.

Zero-shot transfer refers to SAM's ability to adapt to new tasks and object categories without requiring explicit training or prior exposure to specific examples. This feature allows users to leverage SAM for diverse applications with minimal need for prompt engineering, making it a truly versatile and ready-to-use tool.

Diverse applications of SAM in image segmentation

With its numerous applications and innovative features, SAM unlocks new possibilities in the field of image segmentation. As a zero-shot detection model, SAM can be paired with object detection models to assign labels to specific objects accurately. Additionally, SAM serves as an annotation assistant, supporting the annotation process by generating masks for objects that require manual labeling.

Moreover, SAM can be used as a standalone tool for feature extraction. It allows users to extract object features or remove backgrounds from images effortlessly.
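
Background removal, for instance, boils down to keeping only the pixels covered by a SAM mask. Here is a minimal sketch, assuming you already have the image and a binary mask exported as files (the file names are placeholders):

import cv2
import numpy as np

image = cv2.imread("input_image.jpg")
mask = cv2.imread("sam_mask.png", cv2.IMREAD_GRAYSCALE)  # binary mask produced by SAM

# Zero out every pixel that is not covered by the mask.
foreground = np.where(mask[..., None] > 0, image, 0)
cv2.imwrite("foreground.png", foreground)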

Versatility in image analysis tasks

In conclusion, the Segment Anything Model represents a significant leap forward in the field of image segmentation. With its promptable segmentation task, advanced architecture, zero-shot transfer capability, and access to the SA-1B dataset, SAM offers unparalleled versatility and performance.

As the capabilities of Computer Vision continue to expand, SAM paves the way for cutting-edge applications and facilitates breakthroughs in various industries.

Exploring stable diffusion inpainting

Inpainting refers to the process of restoring or repairing an image by filling in missing or damaged parts. It is a valuable technique widely used in image editing and restoration, enabling the removal of flaws and unwanted objects to achieve a seamless and natural-looking final image. Inpainting finds applications in film restoration, photo editing, and digital art, among others.

Understanding stable diffusion inpainting

Stable Diffusion Inpainting is a specific type of inpainting technique that leverages the properties of heat diffusion to fill in missing or damaged areas of an image. It accomplishes this by applying a heat diffusion process to the surrounding pixels.

During this process, values are assigned to these pixels based on their proximity to the affected area. The heat equation is then utilized to redistribute intensity values, resulting in a seamless and natural patch. The repetition of this equation ensures the complete filling of the image patch, ultimately creating a smooth and seamless result that blends harmoniously with the rest of the image.
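
For comparison, classical diffusion-style inpainting of this kind is available out of the box in OpenCV. Here is a minimal sketch (the file names are placeholders):

import cv2

image = cv2.imread("damaged_photo.png")
# Non-zero pixels in the mask mark the missing or damaged area to fill in.
mask = cv2.imread("damage_mask.png", cv2.IMREAD_GRAYSCALE)

# INPAINT_NS propagates surrounding intensities into the masked region,
# in the same spirit as the heat-diffusion filling described above.
restored = cv2.inpaint(image, mask, inpaintRadius=3, flags=cv2.INPAINT_NS)
cv2.imwrite("restored.png", restored)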

Unique advantages of stable diffusion inpainting

Stable Diffusion Inpainting sets itself apart from other inpainting techniques due to its notable stability and smoothness. Unlike slower or less reliable alternatives that can produce visible artifacts, Stable Diffusion Inpainting guarantees a stable and seamless patch. It excels particularly in handling images with complex structures, including textures, edges, and sharp transitions.

Replacing object with stable diffusion v1.5 [[Source](https://github.com/runwayml/stable-diffusion)]

Applications of stable diffusion inpainting

Stable Diffusion Inpainting finds practical applications in various fields.

  • In photography, it proves valuable for removing unwanted objects or blemishes from images.

  • In film restoration, it aids in repairing damaged or missing frames.

  • In medical imaging, it helps remove artifacts and enhance scan quality.

  • In digital art, it can be used to create seamless compositions or eliminate undesired elements.

Stable Diffusion inpainting v2 for object removal

Useful tips for effective inpainting

To achieve optimal inpainting results, consider the following tips:

  1. Experiment with different inpainting techniques to find the most suitable one for your specific use case.

  2. Utilize good-quality source images to achieve accurate and efficient inpainting results.

  3. Adjust the parameters of Stable Diffusion Inpainting to optimize outcomes for your particular needs.

  4. Combine Stable Diffusion Inpainting with other segmentation algorithms, such as YOLOv8-seg, for enhanced results (see the sketch below).
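
Regarding the last tip, here is a minimal sketch of such a combination with the Ikomia API, assuming the infer_yolo_v8_seg algorithm from Ikomia HUB (default parameters are used here; check the algorithm page for the exact options):

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik

wf = Workflow()

# Instance segmentation with YOLOv8-seg instead of SAM.
yolo_seg = wf.add_task(ik.infer_yolo_v8_seg(), auto_connect=True)

# Chain stable diffusion inpainting on the produced masks, as in the SAM workflow above.
sd_inpaint = wf.add_task(ik.infer_hf_stable_diffusion_inpaint(
    model_name='stabilityai/stable-diffusion-2-inpainting',
    prompt='dog, high resolution'),
    auto_connect=True
)

wf.run_on(path="path/to/your/image")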

Stable Diffusion Inpainting stands out as an advanced and effective image processing technique for restoring or repairing missing or damaged parts of an image. Its applications include film restoration, photography, medical imaging, and digital art.

Step by step segmentation and inpainting with the Ikomia API

In this section, we will demonstrate how to utilize the Ikomia API to create a workflow for segmentation and diffusion inpainting as presented above.

Step 1: import

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik
from ikomia.utils.displayIO import display
  • The ‘Workflow’ class is the base object for creating a workflow. It provides methods for setting inputs (image, video, directory), configuring task parameters, obtaining time metrics, and retrieving specific task outputs, such as graphics, segmentation masks, and texts.

  • ‘ik’ is an auto-completion system designed for convenient and easy access to algorithms and settings.

  • The ‘display’ function offers a flexible and customizable way to display images (input/output) and graphics, such as bounding boxes and segmentation masks.

Step 2: create workflow

wf = Workflow()

We initialize a workflow instance. The “wf” object can then be used to add tasks to the workflow instance, configure their parameters, and run them on input data.

Step 3: add and connect SAM

sam = wf.add_task(ik.infer_segment_anything(
    model_name='vit_l',
    input_box='[204.8, 221.8, 769.7, 928.5]',
),
    auto_connect=True
)
  • ‘model_name’: The SAM model can be loaded with three different encoders: ‘vit_b’, ‘vit_l’, ‘vit_h’. The encoders differ in parameter counts, with ViT-B (base) containing 91M, ViT-L (large) containing 308M, and ViT-H (huge) containing 636M parameters.
  • ViT-H offers significant improvements over ViT-B, though the gains over ViT-L are minimal.
  • Based on our tests, ViT-L presents the best balance between performance and accuracy. While ViT-H is the most accurate, it's also the slowest, and ViT-B is the quickest but sacrifices accuracy.

  • 'input_box' (list): An Nx4 array of box prompts given to the model, in [XYXY] or [[XYXY], [XYXY]] format.

Additional SAM parameters

  • 'draw_graphic_input' (Boolean): When set to True, it allows you to draw graphics (box or point) over the object you wish to segment. If set to False, SAM will automatically generate masks for the entire image.

  • 'points_per_side' (int or None): The number of points to be sampled for mask generation when running automatic segmentation.

  • 'input_point' (list): An Nx2 array of point prompts to the model. Each point is given as [X, Y] in pixels.

  • 'input_point_label' (list): A length N array of labels for the point prompts. 1 indicates a foreground point and 0 indicates a background point.
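
To illustrate these parameters, here are two alternative configurations, continuing from the workflow created in step 2. The values are placeholders, and the list-as-string format for the point parameters is an assumption based on the ‘input_box’ example above.

# 1) Fully automatic segmentation of the whole image with the smallest (and fastest) encoder
sam_auto = wf.add_task(ik.infer_segment_anything(
    model_name='vit_b',
    points_per_side='32'
),
    auto_connect=True
)

# 2) Point prompts: one foreground point (label 1) and one background point (label 0)
sam_points = wf.add_task(ik.infer_segment_anything(
    model_name='vit_l',
    input_point='[[350, 400], [500, 150]]',
    input_point_label='[1, 0]'
),
    auto_connect=True
)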

Step 4: add and connect the stable diffusion inpainting algorithm

sd_inpaint = wf.add_task(ik.infer_hf_stable_diffusion_inpaint(
    model_name='stabilityai/stable-diffusion-2-inpainting',
    prompt='tiger, high resolution',
    negative_prompt='low quality',
    num_inference_steps='100',
    guidance_scale='7.5',
    num_images_per_prompt='1'),
    auto_connect=True
)
  • 'prompt' (str): Input prompt.

  • 'negative_prompt' (str): The prompt describing what should not appear in the generated image. Ignored when guidance is disabled (i.e., when ‘guidance_scale’ is less than 1).

  • ‘num_inference_steps’: Number of denoising steps (minimum: 1; maximum: 500).

  • ‘guidance_scale’: Scale for classifier-free guidance (minimum: 1; maximum: 20).

  • ‘num_images_per_prompt’: Number of images to output.
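
If you prefer to adjust these values after the task has been added, you can also update them through the task's ‘set_parameters()’ method with a dictionary of strings. A minimal sketch (the keys mirror the parameter names above; if in doubt, check the Ikomia API documentation):

# Update the inpainting parameters on the existing task before running the workflow.
sd_inpaint.set_parameters({
    "prompt": "fox, high resolution",
    "num_inference_steps": "150",
    "guidance_scale": "9.0"
})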

Step 5: Apply your workflow to your image

You can apply the workflow to your image using the ‘run_on()’ function. In this example, we use the image path:

wf.run_on(path="path/to/your/image")

Step 6: Display your results

Finally, you can display your image results using the display function:

display(sam.get_image_with_mask())
display(sd_inpaint.get_output(0).get_image())

First, we display the segmentation mask output from the Segment Anything Model. Then, we display the stable diffusion inpainting output.
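
If you also want to keep the generated image on disk, ‘get_image()’ returns a NumPy array that you can write with OpenCV, for example. Assuming the array is in RGB order (hence the conversion):

import cv2

result = sd_inpaint.get_output(0).get_image()
# Convert from RGB (assumed) to BGR before writing with OpenCV.
cv2.imwrite("inpainting_result.png", cv2.cvtColor(result, cv2.COLOR_RGB2BGR))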

Here are some more stable diffusion inpainting outputs (prompts: ‘dog’, ‘fox’, ‘lioness’, ‘tiger’, ‘white cat’):

A cute grey cat replaced by a dog, fox, lioness, tiger and white cat using stable diffusion inpainting

Image Segmentation with SAM and the Ikomia ecosystem

In this tutorial, we have explored the process of creating a workflow for image segmentation with SAM, followed by stable diffusion inpainting.

The Ikomia API simplifies the development of Computer Vision workflows and provides easy experimentation with different parameters to achieve optimal results.

To learn more about the API, refer to the documentation. You may also check out the list of state-of-the-art algorithms on Ikomia HUB and try out Ikomia STUDIO, which offers a friendly UI with the same features as the API.
