Below is the YouTube link for a step-by-step tutorial, along with a 1-click installer for an advanced Gradio app that runs the newest text-to-image SANA model locally on your Windows PC and on cloud services such as Massed Compute, RunPod, and free Kaggle.
The tutorial covers the newest SANA 2K model, and I expect a SANA 4K model to be published as well. The SANA 2K model targets about 4 megapixels, so it handles the following aspect ratios and resolutions (width × height) very well:
-
1:1: 2048 × 2048, 4:3: 2304 × 1792, 3:4: 1792 × 2304
-
3:2: 2432 × 1664, 2:3: 1664 × 2432, 16:9: 2688 × 1536
-
9:16: 1536 × 2688, 21:9: 3072 × 1280, 9:21: 1280 × 3072
-
4:5: 1792 × 2240, 5:4: 2240 × 1792
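To make the "4 megapixel" claim concrete, here is a small sketch that stores the presets listed above as a Python dict and checks their pixel counts. The helper names (`megapixels`, `closest_preset`) are my own for illustration and are not taken from the app or the official pipeline.

```python
# Resolution presets quoted above: aspect ratio -> (width, height).
SANA_2K_RESOLUTIONS = {
    "1:1": (2048, 2048), "4:3": (2304, 1792), "3:4": (1792, 2304),
    "3:2": (2432, 1664), "2:3": (1664, 2432), "16:9": (2688, 1536),
    "9:16": (1536, 2688), "21:9": (3072, 1280), "9:21": (1280, 3072),
    "4:5": (1792, 2240), "5:4": (2240, 1792),
}


def megapixels(width, height):
    """Pixel count in millions."""
    return width * height / 1_000_000


def closest_preset(width, height):
    """Hypothetical helper: name of the preset whose width/height
    ratio is nearest to the requested size."""
    target = width / height

    def ratio_distance(name):
        preset_w, preset_h = SANA_2K_RESOLUTIONS[name]
        return abs(preset_w / preset_h - target)

    return min(SANA_2K_RESOLUTIONS, key=ratio_distance)


if __name__ == "__main__":
    for name, (w, h) in SANA_2K_RESOLUTIONS.items():
        print(f"{name:>5}: {w} x {h} = {megapixels(w, h):.2f} MP")
    print(closest_preset(1920, 1080))
```

Every preset lands between roughly 3.9 and 4.2 megapixels, and every side length is a multiple of 32, which fits the 32× autoencoder compression described in the SANA abstract further down.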
I have developed a Gradio app with many new features:
-
VAE auto-offloading that significantly reduces VRAM usage, which the official pipeline does not have
-
Gradio app built on the official pipeline with improvements, so it works reliably
-
Batch size works correctly
-
Number of images works correctly
-
Multi-line prompting works correctly
-
Aspect ratios work correctly for both the 1K and 2K models
-
Randomized seed works correctly
-
1-click installers for Windows (using Python 3.10 and an isolated venv), RunPod, Massed Compute, and even a free Kaggle account notebook
-
Up-to-date libraries, so it runs at proper speed on Windows too
-
Every generated image is automatically saved into the correct folder
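To illustrate how multi-line prompting, number of images, and randomized seeds can fit together, here is a minimal sketch. It is not the app's actual code: I am assuming each non-empty line is treated as its own prompt, and the function names, the `outputs` folder, and the file naming scheme are all hypothetical.

```python
import random
from pathlib import Path

MAX_SEED = 2**32 - 1  # assumed seed range, for illustration only


def split_prompts(text):
    """Treat every non-empty line as a separate prompt (assumption)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_seed(seed, randomize):
    """Return a fresh random seed, or the given one when randomize is off."""
    return random.randint(0, MAX_SEED) if randomize else seed


def plan_jobs(text, num_images, seed=0, randomize=True, out_dir="outputs"):
    """Build one job per (prompt, image index) with its seed and output path."""
    jobs = []
    for prompt in split_prompts(text):
        for index in range(num_images):
            job_seed = pick_seed(seed + index, randomize)
            # Hypothetical naming scheme: running counter plus the seed used.
            filename = f"{len(jobs):05d}_seed{job_seed}.png"
            jobs.append(
                {"prompt": prompt, "seed": job_seed, "path": Path(out_dir) / filename}
            )
    return jobs


if __name__ == "__main__":
    for job in plan_jobs("a red fox in snow\n\na lighthouse at dusk", num_images=2):
        print(job)
```

Putting the seed in the file name is one simple way to make any image reproducible later: turn randomization off and reuse the same prompt and seed.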
🔗 Full Instructions, Configs, Installers, Information and Links Shared Post (the one used in the tutorial) ⤵️
🔗 SECourses Official Discord 9500+ Members ⤵️
🔗 Stable Diffusion, FLUX, Generative AI Tutorials and Resources GitHub ⤵️
🔗 SECourses Official Reddit — Stay Subscribed To Learn All The News and More ⤵️
🔗 Official Repository of NVIDIA Labs SANA Model ⤵️
Gradio APP and Installers
Example Images 4 MegaPixel Raw SANA 2K Model
Tutorial Video Chapters
-
0:00 Introduction to the step-by-step tutorial for the SANA model published by NVIDIA
-
2:48 How to install SANA model on Windows and start using
-
5:35 How to verify installation and save installation logs in case of an error to report back to us
-
6:03 How to start the APP after installation on Windows and how to use the SANA model properly
-
9:38 Which folder the generated images are saved in
-
12:11 How to edit the styles that the APP has — prompting styles
-
12:59 How to install and use the SANA app and any of the AI apps published by SECourses on Massed Compute
-
14:17 How to select the correct category and template image on the Massed Compute cloud service
-
14:25 How to apply our SECourses coupon to get 50% price discount on Massed Compute — permanently working
-
14:46 How to install and setup ThinLinc client to transfer files and use Massed Compute cloud desktop PC
-
15:51 How to connect to Massed Compute after it has initialized and install any AI scripts that we publish, e.g. the SANA model
-
19:05 How to start the application after it has been installed and use it on your PC (but it will run in Massed Compute server)
-
20:31 How to download individually and as a folder the generated images on Massed Compute to your computer
-
21:30 How to terminate Massed Compute to not spend any credits / money
-
22:03 How to install and use the SANA app and any of the AI apps published by SECourses on the RunPod cloud service
-
24:43 How to start the SANA APP after installation has been completed on RunPod
-
26:34 The speed of RTX 4090 on RunPod for SANA 2K model 4 MegaPixel image generation
-
26:44 How to download individually and as a folder the generated images on RunPod to your computer
-
27:09 How to stop the pod and terminate to not waste any credits / money on RunPod
-
27:24 How to restart the app that was previously installed on RunPod (when the pod was only stopped, not terminated)
-
27:34 How to use SANA APP on a free Kaggle account and any of my developed Kaggle notebooks
-
28:38 Selecting the correct session options on Kaggle, such as the GPU accelerator and Internet On
-
29:06 How to run cells and install SANA APP or any APP on Kaggle
-
29:44 How to get Ngrok token and set it up and use it to connect SANA APP from Kaggle
-
30:57 How to download all generated images as a zip file on Kaggle
-
31:46 How to restart the SANA app on Kaggle (the same logic applies to any AI app)
-
32:11 How to see how much GPU time you have left for free on Kaggle — 30 hours every week
Info About SANA
Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. Core designs include:
(1) DC-AE: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens.
(2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality.
(3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment.
(4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence.
As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024 × 1024 resolution image. Sana enables content creation at low cost.
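To see why the 32× autoencoder in point (1) matters, here is a rough back-of-the-envelope sketch of the latent token count. The patch sizes are my assumption, not something stated above: a patch size of 2 on top of an 8× AE is a common DiT setup, and a patch size of 1 is assumed for the 32× case. `latent_tokens` is a hypothetical helper, not part of any SANA code.

```python
def latent_tokens(width, height, ae_factor, patch_size=1):
    """Number of transformer tokens for an image, given the autoencoder's
    spatial compression factor and the patch size applied to the latent."""
    stride = ae_factor * patch_size
    if width % stride or height % stride:
        raise ValueError(f"{width}x{height} is not divisible by {stride}")
    return (width // stride) * (height // stride)


if __name__ == "__main__":
    for side in (1024, 2048, 4096):
        baseline = latent_tokens(side, side, ae_factor=8, patch_size=2)   # assumed 8x AE setup
        compact = latent_tokens(side, side, ae_factor=32, patch_size=1)   # assumed 32x AE setup
        print(f"{side}x{side}: {baseline} tokens vs {compact} tokens")
```

Under these assumptions a 1024 × 1024 image drops from 4096 tokens to 1024, a 4× reduction. Since vanilla attention scales quadratically with the token count, that alone would be about 16× less pairwise attention work, before the linear attention in point (2) is even considered.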