dlimeng

Posted on Dec 12, 2023

AI Painting Quickstart - StableDiffusionWebui

#python #ai #javascript


Introduction

Stable Diffusion web UI is a browser-based interface for generating images with Stable Diffusion, built on the Gradio library.

Stable Diffusion is a deep-learning text-to-image model released in 2022. It was developed by the CompVis group at LMU Munich together with Runway, with compute support from Stability AI. It is a latent diffusion model: a U-Net iteratively denoises an image in a compressed latent space, guided by a transformer-based CLIP text encoder, and a VAE decodes the result into pixels.

The web UI wraps Stable Diffusion in a visual Gradio application, so users can generate high-quality images from short text prompts.

The main features of this project include:

Text-to-image (txt2img) and image-to-image (img2img) generation modes
One-click install-and-run scripts for easy startup
Outpainting (extending an image), inpainting (repairing or filling regions) and related editing functions
Extensive controls for adjusting generation parameters
Multiple post-processing models (upscalers and face restoration) to improve output quality
Training of custom embeddings (textual inversion) and other training capabilities
Community-provided extensions and scripts
Speed and memory optimizations, including options for GPUs with little VRAM (such as --medvram and --lowvram)

The project was created and is maintained by GitHub user AUTOMATIC1111 and is released under the AGPL-3.0 open-source license. It makes Stable Diffusion much easier to deploy and use on a local machine and offers a broad feature set. It is one of the most popular tools for image generation with this model, and an active community continually adds features and maintains it.

Installation and Deployment of the WebUI

Refer to https://github.com/AUTOMATIC1111/stable-diffusion-webui.git for installation and deployment.

Parameters and Features

Prompt - Text describing the content, style and other attributes of the image you want.
Negative Prompt - Text describing what you do not want in the image.
Steps - The number of denoising (sampling) steps. More steps generally add detail, but with diminishing returns; the WebUI default is 20.
Sampling Method - The sampler used for denoising (for example Euler a or DPM++ 2M Karras). It affects quality, style and how many steps are needed.
Seed - The random seed. Reusing a seed with identical settings reproduces an image; -1 picks a random seed.
Size - Width and height of the generated image in pixels.
Model - The Stable Diffusion checkpoint to use.
Strength - In img2img, the denoising strength: how much the input image is changed (0 leaves it unchanged, 1 largely ignores it).
Scale - In img2img's "Resize by" mode, the factor by which the output is enlarged relative to the input image.
CFG Scale - The classifier-free guidance scale: how strictly the image follows the prompt. Higher values follow it more literally but can look oversaturated; the default is 7.
Batch Size - The number of images generated in parallel in one batch (uses more VRAM).
Batch Count - The number of batches run one after another. Total images = Batch Count × Batch Size.
Text-to-Image (txt2img) - The basic mode: generate images from the text entered in the Prompt box, with control over style, content, etc.
Image-to-Image (img2img) - Generate a revised version of an input image, guided by the prompt. Supports completion, expansion, style changes, etc.
Outpainting - Extend an image beyond its borders to produce a larger image.
Inpainting - Regenerate a masked region of an image, for example to repair damaged or unwanted areas.
Color Sketch - Use a colored sketch as the img2img input and generate a finished image from it.
Stable Diffusion Upscale - Upscale an image, then refine it tile by tile with the model to add detail.
Attention - Emphasis syntax: (word) increases the model's attention to a word by a factor of 1.1, [word] decreases it, and (word:1.5) sets an explicit weight.
Prompt Matrix - Separate prompt parts with | to generate a grid of images covering every combination of those parts.
Loopback - In img2img, feed each output back in as the next input for several iterations.
CLIP Interrogator - Estimate a prompt that describes an input image (the "Interrogate CLIP" button).
Seamless - The Tiling option, which generates images whose edges wrap so they tile seamlessly.
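Most of the parameters above also correspond to fields in the WebUI's built-in HTTP API. The API is available when the WebUI is launched with the --api flag. The sketch below uses only the Python standard library and builds a /sdapi/v1/txt2img request. It assumes the default local address http://127.0.0.1:7860. The helper names are hypothetical, and only send_txt2img actually contacts the server.

```python
import json
import urllib.request

# Assumption: the WebUI runs locally with --api on the default Gradio port.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def build_txt2img_payload(prompt, negative_prompt="", steps=20,
                          sampler_name="Euler a", seed=-1,
                          width=512, height=512, cfg_scale=7.0,
                          batch_size=1, batch_count=1):
    """Map the UI parameters onto txt2img API fields (hypothetical helper)."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "sampler_name": sampler_name,
        "seed": seed,              # -1 asks the WebUI for a random seed
        "width": width,
        "height": height,
        "cfg_scale": cfg_scale,
        "batch_size": batch_size,  # images generated in parallel per batch
        "n_iter": batch_count,     # the API's name for Batch Count
    }


def total_images(payload):
    # Each of the n_iter batches produces batch_size images.
    return payload["batch_size"] * payload["n_iter"]


def send_txt2img(payload, url=API_URL):
    """POST the payload; the JSON response lists base64 images under "images"."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["images"]


payload = build_txt2img_payload(
    "a joyful golden retriever playing with a smiling girl in a sunny park",
    negative_prompt="blurry, lowres",
    batch_size=2,
    batch_count=3,
)
print(total_images(payload))  # 6
```

Keeping payload construction separate from the network call makes the settings easy to inspect or save before sending.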

Prompt Techniques

Beginner Prompt: Refining Direct Descriptions

When describing, be as specific as possible. For example, instead of "a happy dog and a cute girl", use "a joyful golden retriever playing with a smiling girl in a sunny park". Such detailed descriptions help the model more accurately capture your creative intent.

Intermediate Prompt: Expanding Tags

Next, add tags to improve the image further, for example "best quality, masterpiece, a happy dog and a cute girl, watercolor style". Beyond "best quality" and "masterpiece", you can add specific art styles or detail descriptors such as "vibrant colors, intricate details", giving "vibrant colors, intricate details, best quality, masterpiece, a happy dog and a cute girl, watercolor style".
Extensions: Try tags for art movements such as "impressionism", "surrealism" or "baroque style", or for specific artists, such as "in the style of Van Gogh" or "in the style of Picasso".
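Tag-based prompts are just comma-separated lists, so they are easy to assemble programmatically. The hypothetical build_prompt helper below is plain Python. It reproduces the example prompt from this section and drops duplicate tags. The tag order it uses is a stylistic choice, not a WebUI requirement.

```python
def build_prompt(subject, style=None, quality_tags=(), detail_tags=()):
    """Join tags into one comma-separated prompt, skipping duplicates.

    Order used here (an assumption, not a rule): detail tags, quality tags,
    subject, then style. Duplicates are compared case-insensitively.
    """
    parts = [*detail_tags, *quality_tags, subject]
    if style:
        parts.append(style)
    seen = set()
    ordered = []
    for part in parts:
        tag = part.strip()
        if tag and tag.lower() not in seen:
            seen.add(tag.lower())
            ordered.append(tag)
    return ", ".join(ordered)


prompt = build_prompt(
    "a happy dog and a cute girl",
    style="watercolor style",
    quality_tags=("best quality", "masterpiece"),
    detail_tags=("vibrant colors", "intricate details"),
)
print(prompt)
# vibrant colors, intricate details, best quality, masterpiece, a happy dog and a cute girl, watercolor style
```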

Advanced Prompt: Deepening Use of Negative Prompts

In the negative prompt, list unwanted elements precisely, such as "crowds, oversaturated, photorealistic". The whole field already means "avoid this", so there is no need to add "no" or "avoid".
Extensions: Use negative prompts to exclude common AI-generation errors, such as "floating objects, mismatched perspective".
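The negative prompt works through classifier-free guidance. At every sampling step the WebUI makes two predictions with the denoising model. One is conditioned on the prompt $c_{\text{pos}}$. The other is conditioned on the negative prompt $c_{\text{neg}}$, which takes the place of the empty prompt used in standard classifier-free guidance. The two are combined with the CFG Scale $s$, shown schematically below:

```latex
\hat{\epsilon}_t = \epsilon_\theta(x_t, c_{\text{neg}})
  + s \,\bigl( \epsilon_\theta(x_t, c_{\text{pos}}) - \epsilon_\theta(x_t, c_{\text{neg}}) \bigr)
```

Each step moves the image toward the prompt and away from the negative prompt, and $s$ controls how far. This is a simplified view: the WebUI's samplers combine denoised predictions rather than noise estimates, but the combination has the same form.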

Expert Prompt: Refining Text Weight Adjustments

Wrap elements in parentheses to emphasize them, and combine this with adjectives for a stronger effect, as in "a happy (big dog) and a (tiny cute girl), watercolor style". Each pair of parentheses multiplies the element's weight by 1.1.
Extensions: Compare results at different explicit weights, such as "(dog:1.5) and (girl:0.5)", to control the relative importance of elements in the image.
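The following sketch shows how these weights combine. It is a simplified Python version of the emphasis rules the WebUI documents:

(text) multiplies attention by 1.1.
[text] divides it by 1.1.
(text:w) uses an explicit weight w.
Nested brackets multiply.

It is not the WebUI's actual parser, which also handles escapes, unclosed brackets and merging. parse_weights is a hypothetical helper.

```python
import re

# Simplified emphasis rules (a sketch, not the WebUI's own parser):
#   (text) -> x1.1   [text] -> /1.1   (text:1.5) -> x1.5   nesting multiplies
TOKEN = re.compile(r"\(|\[|:\s*([+-]?\d*\.?\d+)\s*\)|\)|\]|[^()\[\]:]+|:")


def parse_weights(prompt):
    """Split a prompt into (text, weight) chunks."""
    result = []  # list of [text, weight]
    stack = []   # (opening bracket, index in result where its group starts)
    for match in TOKEN.finditer(prompt):
        token = match.group(0)
        if token in ("(", "["):
            stack.append((token, len(result)))
        elif match.group(1) is not None and stack and stack[-1][0] == "(":
            _, start = stack.pop()          # "(text:w)" -> multiply by w
            for chunk in result[start:]:
                chunk[1] *= float(match.group(1))
        elif token == ")" and stack and stack[-1][0] == "(":
            _, start = stack.pop()          # "(text)" -> multiply by 1.1
            for chunk in result[start:]:
                chunk[1] *= 1.1
        elif token == "]" and stack and stack[-1][0] == "[":
            _, start = stack.pop()          # "[text]" -> divide by 1.1
            for chunk in result[start:]:
                chunk[1] /= 1.1
        else:
            result.append([token, 1.0])     # plain text (or stray bracket)
    return [(text, round(weight, 4)) for text, weight in result if text.strip()]


print(parse_weights("a happy (big dog) and a (tiny cute girl), watercolor style"))
# [('a happy ', 1.0), ('big dog', 1.1), (' and a ', 1.0), ('tiny cute girl', 1.1), (', watercolor style', 1.0)]
print(parse_weights("(dog:1.5) and (girl:0.5)"))
# [('dog', 1.5), (' and ', 1.0), ('girl', 0.5)]
```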

Introducing LoRA: Innovative Application of Model Effects

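When using LoRA, add a tag of the form <lora:filename:weight> to the prompt, such as "<lora:artistic_model:1.2>". The filename must match a LoRA file in the WebUI's models/Lora folder, and the weight scales its effect, so choose both to suit your creative goal.
Extensions: Experiment with different LoRA models, such as "<lora:cinematic_effect:1.0>" or "<lora:dreamy_landscape:1.5>", to explore visual effects and create unique artwork.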
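The sketch below composes and inspects LoRA tags in prompt strings. It assumes the WebUI's <lora:filename:weight> syntax. add_lora and list_loras are hypothetical helpers, and the LoRA names are the illustrative ones from the text above.

```python
import re

# Matches tags of the form <lora:filename:weight>.
LORA_TAG = re.compile(r"<lora:([^:>]+):([+-]?\d*\.?\d+)>")


def add_lora(prompt, name, weight=1.0):
    """Append a LoRA tag; `name` should match a file in models/Lora (no extension)."""
    return f"{prompt}, <lora:{name}:{weight}>"


def list_loras(prompt):
    """Return (name, weight) pairs for every LoRA tag found in a prompt."""
    return [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]


prompt = add_lora("a girl with a backpack, comic style", "cinematic_effect", 1.0)
prompt = add_lora(prompt, "dreamy_landscape", 0.6)
print(prompt)
# a girl with a backpack, comic style, <lora:cinematic_effect:1.0>, <lora:dreamy_landscape:0.6>
print(list_loras(prompt))
# [('cinematic_effect', 1.0), ('dreamy_landscape', 0.6)]
```

Listing the tags before generating makes it easy to check which LoRAs and weights a saved prompt actually uses.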

Case Study - Generating Comics (LoRA)

LoRA (Low-Rank Adaptation) is a lightweight fine-tuning method. It was originally proposed for large language models and is now widely used with Stable Diffusion. Its basic principles are:

LoRA does not retrain the full model; the base model's weights stay frozen.
For selected weight matrices (in Stable Diffusion, typically the attention layers of the U-Net and optionally the text encoder), it learns two small matrices A and B of rank r, and their product B·A is added to the original weight.
Because r is small, a LoRA file is much smaller than a full checkpoint and is cheap to train.
At inference, the low-rank update is scaled by a multiplier (the weight in the <lora:name:weight> tag) before being added, so several LoRAs can be combined and their strength tuned per prompt.
A LoRA is trained against a particular base model family and works best with compatible checkpoints.

These properties let LoRA teach a base model a new character, style or concept from a small dataset, such as the comic style used below.
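The sketch below makes the low-rank idea concrete in plain Python, with no PyTorch or NumPy assumed. It merges a LoRA update into one frozen weight matrix using W' = W + multiplier · (alpha / r) · B·A. It also computes the parameter saving for a hypothetical 768 × 768 layer at rank 8. The alpha / r scaling follows the convention of the original LoRA paper. Real implementations apply the update inside the network's layers on tensors.

```python
# Pure-Python LoRA sketch: W is frozen (d_out x d_in); only B (d_out x r)
# and A (r x d_in) would be trained. Matrices are nested lists of floats.


def matmul(x, y):
    """Plain matrix product of nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def lora_merge(W, A, B, alpha, multiplier=1.0):
    """Return W + multiplier * (alpha / r) * (B @ A).

    `multiplier` plays the role of the weight in a <lora:name:weight> tag.
    """
    r = len(A)  # rank = number of rows of A
    delta = matmul(B, A)
    scale = multiplier * alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_param_count(d_out, d_in, r):
    """(parameters in the full matrix, trainable parameters in A and B)."""
    return d_out * d_in, r * (d_out + d_in)


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight
A = [[1.0, 2.0]]               # rank-1 factor, 1x2
B = [[1.0], [0.0]]             # rank-1 factor, 2x1
print(lora_merge(W, A, B, alpha=1.0, multiplier=0.5))  # [[1.5, 1.0], [0.0, 1.0]]
print(lora_param_count(768, 768, 8))                   # (589824, 12288)
```

In this example, rank 8 trains about 2% of the parameters a full update of that layer would need.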

Base Model Used: https://civitai.com/models/9409?modelVersionId=30163

LoRA Used: https://civitai.com/models/88201?modelVersionId=93864

To build a comic that tells the story of a girl's travels, a simple narrative flow could be:

Panel 1: Setting Off

Visuals: A little girl with a large backpack stands in the doorway of her home, while her cat sits at her feet, gazing up at her.
Text: "Little Li has prepared for her adventure, though farewells at the doorstep are always bittersweet."
Prompt: [provided]
Negative prompt: [provided]

Panel 2: Train Station

Visuals: The girl sitting pensively on a bench at a small train station, looking expectantly down the railroad tracks.
Text: "The bustling train station fills Little Li's heart with excitement for the journey ahead."
Prompt: [provided]
Negative prompt: [provided]

Panel 3: Ancient City Exploration

Visuals: The girl gazing up in wonder at immense castle gates that hide what lies within.
Text: "The mysteries of the ancient city call to Little Li, with every stone telling a story."
Prompt: [provided]
Negative prompt: [provided]

Panel 4: Among Mountain Valleys

Visuals: The girl dancing and spinning happily in a lush green valley.
Text: "Surrounded by the vibrant valley, Little Li feels the power and beauty of nature."
Prompt: [provided]
Negative prompt: [provided]

Panel 5: Seaside Sunset

Visuals: The girl sitting on the beach gazing at the sunset on the horizon.
Text: "Where golden sun meets sea, Little Li is drawn in by the majestic view."
Prompt: [provided]
Negative prompt: [provided]

Panel 6: Night Market Lights

Visuals: The girl wandering through a lively night market surrounded by stalls and lanterns.
Text: "Beneath the dazzling lights, Little Li samples foods, each bite a new experience."
Prompt: [provided]
Negative prompt: [provided]
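The article's actual panel prompts are not reproduced here. As an illustration, the sketch below shows one way to organize a multi-panel comic in Python. Every panel shares a prefix, a negative prompt, a LoRA tag and a seed, and only the scene text changes. The prefix, negative prompt, LoRA file name (comic_style_lora) and seed are hypothetical placeholders. The scene strings are adapted from the "Visuals" notes. Each payload could be POSTed to the WebUI's /sdapi/v1/txt2img endpoint when it runs with --api.

```python
# Hypothetical placeholders: replace with your own prompts and the actual
# file name of the LoRA you downloaded.
SHARED_PREFIX = "best quality, masterpiece, comic style, little girl with a large backpack"
LORA_TAG = "<lora:comic_style_lora:0.8>"
SHARED_NEGATIVE = "lowres, bad anatomy, extra fingers, text, watermark"

# (panel title, scene description adapted from the "Visuals" notes)
PANELS = [
    ("Setting Off", "standing in the doorway of her home, cat at her feet looking up"),
    ("Train Station", "sitting on a bench at a small train station, looking down the tracks"),
    ("Ancient City Exploration", "gazing up in wonder at immense castle gates"),
    ("Among Mountain Valleys", "dancing and spinning in a lush green valley"),
    ("Seaside Sunset", "sitting on the beach, sunset on the horizon"),
    ("Night Market Lights", "wandering through a night market with stalls and lanterns"),
]


def panel_payloads(seed=1234, steps=25, cfg_scale=7.0, width=512, height=768):
    """Build one (title, txt2img payload) pair per panel."""
    payloads = []
    for title, scene in PANELS:
        payload = {
            "prompt": f"{SHARED_PREFIX}, {scene}, {LORA_TAG}",
            "negative_prompt": SHARED_NEGATIVE,
            # Reusing one seed tends to keep the character more consistent
            # across panels, but it does not guarantee it.
            "seed": seed,
            "steps": steps,
            "cfg_scale": cfg_scale,
            "width": width,
            "height": height,
        }
        payloads.append((title, payload))
    return payloads


for title, payload in panel_payloads():
    print(f"{title}: {payload['prompt']}")
```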

Each panel highlights a specific

DEV Community