Wladislav Radchenko

Posted on • Originally published at github.com

Revolutionizing Deepfakes: Unveiling Wunjo AI's Latest Update with Stable Diffusion

Greetings to all enthusiasts of generative neural networks and of image and video generation from prompts! In this article I'm excited to share the latest developments in my open-source project, Wunjo AI, which brings Stable Diffusion into deepfake creation. Let's delve into what version 1.6 brings to the table: editing videos with text queries, creating masks for moving objects in a single click, and a new tool that extracts objects from video onto a transparent background, making them far more versatile for design and other applications.

If you're new to this project, check out this comprehensive video highlighting all of Wunjo AI's functionalities.

Effortless Object Mask Extraction

Let's kick things off with the new features in this update. Version 1.6 introduces a seamless way to extract objects from video onto a transparent background. Here's how it works:

  1. Open the "Removal and Retouching" panel and upload your media content.

  2. Select the object you want to extract. Once the mask is generated, review it to make sure it matches your expectations before proceeding.

  3. Right-click to include an area and left-click to exclude it. For instance, you can create two separate masks for two fast-moving objects with different timing. You can either pick a single frame or specify start and end times for the extraction.

Panel for removing objects and retouching

  4. In the options panel, choose whether to keep the mask, set its color, or make it transparent. You can also fine-tune mask similarity, since a moving object may vary slightly from frame to frame. Note that the first time you select an object, the segmentation model is loaded automatically if it isn't already present; it takes 1-2 GB depending on whether you run on CPU or GPU, so manage your resources accordingly.

Options panel for obtaining masks

Keep in mind that complex objects may show artifacts in some frames, but you can fix these in later processing passes, since mask extraction also works on individual images. Here's the end result:

Result
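Under the hood, this kind of click-driven selection maps naturally onto a promptable segmentation model such as Segment Anything (SAM). Here's a minimal sketch of the idea in Python; the checkpoint file, frame path, and click coordinates are illustrative, not Wunjo AI's actual code:

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (illustrative file name).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Read one frame of the video and hand it to the predictor.
frame = cv2.cvtColor(cv2.imread("frame_0001.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# Each click becomes a point prompt: label 1 includes an area, label 0 excludes it.
point_coords = np.array([[420, 310], [500, 280]])
point_labels = np.array([1, 0])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=False,
)
object_mask = masks[0]  # boolean HxW mask for the selected object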

Enhanced Object Retouching

Version 1.6 introduces an improved method for removing objects from videos, optimized for GPU use. Previously, the neural network removed objects from each frame in isolation, relying only on its weights and a general idea of what should replace the removed object, which often produced noisy, unnatural results. The enhanced method, available in the "Removal and Retouching" panel, instead analyzes a burst of 50 frames and uses that information to fill the area, taking into account what should actually occupy the space left by the removed object.
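In pseudocode terms, the change amounts to windowing the video and inpainting each window jointly rather than frame by frame. A minimal sketch, assuming a hypothetical inpaint_burst function standing in for the video-inpainting network:

from typing import Callable, List
import numpy as np

BURST_SIZE = 50  # the update analyzes groups of 50 frames at a time

def remove_object(
    frames: List[np.ndarray],
    masks: List[np.ndarray],
    inpaint_burst: Callable[[List[np.ndarray], List[np.ndarray]], List[np.ndarray]],
) -> List[np.ndarray]:
    """Split the video into 50-frame bursts and inpaint each burst jointly.

    Because inpaint_burst (hypothetical) sees the whole burst, it can borrow
    pixels revealed in neighboring frames instead of hallucinating the fill
    from a single image.
    """
    result: List[np.ndarray] = []
    for start in range(0, len(frames), BURST_SIZE):
        burst_frames = frames[start:start + BURST_SIZE]
        burst_masks = masks[start:start + BURST_SIZE]
        result.extend(inpaint_burst(burst_frames, burst_masks))
    return result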

Key Options:

  • Mask Thickness: This option adds an outline to the mask, useful when the segmentation fails to highlight the object's outline, yet you still need it included in the mask.

  • Resizing: Because this method is resource-intensive, large frames may need to be downscaled before processing. The "Resize" option then merges the enlarged result back into the original video after object removal. Since this works through scaling, artifacts become more noticeable the bigger the size difference between the original video and the result.

Parameters
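One plausible way to read the merge step: inpaint a downscaled frame, upscale it back, and paste only the masked region into the full-resolution original, so untouched areas keep their sharpness. An illustrative sketch, not the project's actual code:

import cv2
import numpy as np

def merge_resized_result(
    original: np.ndarray,   # full-resolution frame, HxWx3
    inpainted: np.ndarray,  # downscaled frame after object removal
    mask: np.ndarray,       # full-resolution boolean mask of the removed object
) -> np.ndarray:
    # Scale the inpainted frame back to the original size.
    h, w = original.shape[:2]
    upscaled = cv2.resize(inpainted, (w, h), interpolation=cv2.INTER_LANCZOS4)
    # Paste only the masked area, keeping the rest of the frame untouched.
    merged = original.copy()
    merged[mask] = upscaled[mask]
    return merged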

GPU Resources and Video Resolution Limits:

  • 19 GB VRAM: 1280x1280
  • 7 GB VRAM: 720x720
  • 6 GB VRAM: 640x640
  • 2 GB VRAM: 320x320

Deleting objects

These constraints should be considered while working with videos on your device.
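For planning purposes, the limits above boil down to a simple lookup. A small illustrative helper:

# VRAM (GB) -> maximum side length, from the limits listed above.
RESOLUTION_LIMITS = [(19, 1280), (7, 720), (6, 640), (2, 320)]

def max_resolution(vram_gb: float) -> int | None:
    """Return the largest supported side length for the given VRAM,
    or None if even the smallest preset won't fit."""
    for min_vram, side in RESOLUTION_LIMITS:
        if vram_gb >= min_vram:
            return side
    return None

print(max_resolution(8))  # 720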

In the upcoming 1.6.1 update, we plan to improve results further by processing intermediate frames in batches of 50, which should make transitions smoother and nearly imperceptible.

Converting Deepfakes via Text Prompts

What's the innovation here? The idea is to use the object segmentation approach described above to automatically generate masks, and then apply Stable Diffusion to each object with an appropriate text cue. Individual objects, or even the entire background, can be completely redrawn from text directives.

Video conversion panel by text query

Let's illustrate this with an example: suppose two objects are moving at the same time, and you want to apply the text cues "Blonde" and "Brown Jacket" to them, respectively. This example uses the default model, but you can easily swap in your own. Two preprocessors are also available: one smooths out differences between frames, the other enhances brightness. You can fine-tune the number of frames between generations and the mask-merging options, just as in retouching and object removal.

Parameters
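Conceptually, each mask plus its text cue is a Stable Diffusion inpainting call. A minimal per-frame sketch with the diffusers library; the model ID, file names, and prompts are illustrative, and Wunjo AI's real pipeline adds ControlNet and temporal smoothing on top:

import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

frame = Image.open("frame_0001.png").convert("RGB").resize((512, 512))

# One (mask, prompt) pair per tracked object.
edits = [
    ("mask_person.png", "blonde"),
    ("mask_jacket.png", "brown jacket"),
]

# Redraw each masked object according to its text cue.
for mask_path, prompt in edits:
    mask = Image.open(mask_path).convert("L").resize((512, 512))
    frame = pipe(prompt=prompt, image=frame, mask_image=mask).images[0]

frame.save("frame_0001_edited.png")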

Additional Options:

  • ControlNet: This option is crucial, since changing an object doesn't guarantee it will blend with the surrounding context. For example, when generating a person, we want their head to stay fixed as in the original frames rather than turning in different directions. To achieve this, ControlNet is used with one of two annotators: Canny or HED (see the sketch below).
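For intuition, here's how Canny-conditioned ControlNet looks with diffusers: edges extracted from the original frame constrain the structure of whatever the prompt redraws. Model IDs and file names are illustrative:

import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# The Canny annotator turns the original frame into an edge map,
# which pins down pose and outline for the generation.
frame = cv2.imread("frame_0001.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

result = pipe("blonde", image=control_image, num_inference_steps=20).images[0]
result.save("frame_0001_controlnet.png")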

Generating each frame is time-consuming, and variations can arise between frames, particularly in videos with complex content. Therefore, frames are generated at set intervals and then propagated onto the primary video through stylization (style transfer) with Ebsynth, which keeps the changes smooth and nearly imperceptible. The generation process unfolds as follows:

  1. Create masks for each object.

  2. Generate images at intervals with ControlNet.

  3. Assemble the final result at 512x512 resolution.

Notably, this process is resource-intensive, runs exclusively on GPU, and demands significant VRAM. The maximum resolution depends on the available VRAM:

VRAM Resources and Video Resolution Limits:

  • 24 GB VRAM: 1280x1280
  • 18 GB VRAM: 1024x1024
  • 14 GB VRAM: 768x768
  • 10 GB VRAM: 640x640
  • 8 GB VRAM: 576x576
  • 7 GB VRAM: 512x512

If you've experimented with Stable Diffusion, you've probably noticed that image quality isn't determined by the model alone; image size plays a crucial role. For instance, with 8 GB of VRAM the resulting video cannot exceed 576x576 resolution (at any aspect ratio), which limits the quality of the outcome.

You can integrate Stable Diffusion models from Hugging Face or CivitAI by adding them to your Wunjo AI framework. This provides greater flexibility in configuring the generation process.

To add a model to the application, place your model in the .wunjo/deepfake/diffusion directory, open .wunjo/deepfake/custom_diffusion.json, and add the model's name. For example:

{
   "revAnimated_v11.safetensors": "revAnimated_v11.safetensors"
}

Let's put this model into action. Suppose you want to alter the entire frame except for one object: select the appropriate model, set a mask, and enable the "pass" option in that object's prompt field to exclude it from generation. Then select the "Change Background" option and enter a new text query and a negative query. Notably, the "seed" parameter lets you replicate results, which is handy when you only need to modify specific portions of the image while keeping the rest consistent.

Panel for converting video by text queries
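The seed behaves the way it does in any diffusion pipeline: fixing the random generator reproduces the output exactly. An illustrative diffusers snippet; the model ID and prompt are placeholders:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The same seed plus the same prompt reproduces the same image, so you can
# regenerate one region while everything else stays consistent.
generator = torch.Generator(device="cuda").manual_seed(1234)
image = pipe("brown jacket", generator=generator).images[0]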

Applying Your Stable Diffusion Model

It's worth acknowledging that the files needed for video generation can be large, and an unreliable internet connection may lead to unstable downloads from Hugging Face; downloading large models without a VPN can also be a challenge in some regions. To work around this, you can download the models manually; a table of models is provided in the project documentation. After downloading, place the models in .wunjo/deepfake/diffusion. Otherwise, they will be downloaded automatically on first launch.


Notable models include ControlNet Canny, GMFlow, the HED annotator, ControlNet HED, the VAE, and the Stable Diffusion model itself.
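If the automatic download stalls, the huggingface_hub client can fetch individual files, which you then drop into .wunjo/deepfake/diffusion. The repo and file name below are illustrative:

from huggingface_hub import hf_hub_download

# Download one checkpoint manually, then move it into .wunjo/deepfake/diffusion.
path = hf_hub_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    filename="v1-5-pruned-emaonly.safetensors",
)
print(path)  # local cache path of the downloaded file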

Initially, the application is distributed for Windows as a CPU build, for several reasons: the author has limited access to the Windows platform, and there are technical constraints such as a 2 GB size limit on the installation file. However, the documentation includes instructions for running the application on Windows with GPU support.

For user convenience, a portable version of the application can be created using the briefcase build feature. The portable version bundles all the necessary libraries and a Python interpreter, so users can share the application without installing any additional components.

This update also optimizes face replacement in video from a photo and object removal from video, reducing memory requirements and improving overall performance.

This wraps up the exciting new features introduced in Wunjo AI. We're eager to hear your thoughts on the continued development of video generation in Wunjo AI, despite its resource demands. Additionally, we're curious about your interest in seeing the application evolve to generate music and sounds based on text queries. Please share your comments!

Before we part ways, here's a link to the open-source code of the project and a website where you can easily download installers with a single click, along with a video showcasing the functionality of Wunjo AI.

Open Source Code
Download Installers
Video Showcase

Enjoy exploring, and we look forward to your return!
