DEV Community

Furkan Gözükara

Posted on Jan 17

BF16 vs GGUF, FP8 Scaled, NVFP4 Speed & Quality Compared + ComfyUI CUDA 13 Gains + FLUX 2 Klein 9B

#ai #beginners #tutorial

Full tutorial link > https://www.youtube.com/watch?v=XDzspWgnzxI

A comparison of quality and speed (with CUDA 13 & Sage Attention) across BF16, GGUF Q8, FP8 Scaled, and NVFP4 for Z Image Turbo, FLUX Dev, FLUX SRPO, FLUX Kontext, and FLUX 2

Video Info

How much do quality and speed actually differ between BF16, GGUF, FP8 Scaled, and NVFP4? In this tutorial I compare all of these precision and quantization variants for both speed and quality, and the results are pretty surprising. We have also developed and published an NVFP4 model quantizer app and an FP8 Scaled quantizer app; the links are below if you want to use them. In addition, upgrading ComfyUI to CUDA 13 with properly compiled libraries is now strongly recommended: we observed noticeable performance gains with CUDA 13, so the CUDA 13 build of ComfyUI is recommended for both SwarmUI users and standalone ComfyUI users.
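For readers new to these formats, here is a rough sketch of why a 4-bit format such as NVFP4 can lose visible detail compared with BF16. It is not the code of the quantizer apps mentioned above. It assumes the layout described in the NVIDIA blog post linked below (blocks of 16 values sharing one scale, with each value stored as a 4-bit E2M1 number) and simplifies it by keeping the block scale as a plain Python float. All function names are made up for illustration.

```python
# Simplified illustration of block-scaled 4-bit quantization in the spirit of NVFP4.
# Assumptions: blocks of 16 values, the FP4 E2M1 magnitude grid below, and a
# per-block scale kept as a plain Python float (real NVFP4 stores an FP8 scale
# per block plus a per-tensor scale).

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes representable in FP4 E2M1
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes); each code is a signed value from E2M1_GRID."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    scale = max_abs / E2M1_GRID[-1]  # map the largest magnitude onto 6.0
    codes = []
    for v in block:
        target = abs(v) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(nearest if v >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Turn the 4-bit codes back into approximate real values."""
    return [c * scale for c in codes]


def roundtrip(values):
    """Quantize and dequantize a flat list of weights, block by block."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[i:i + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out


def max_abs_error(values):
    """Largest difference between the original and round-tripped values."""
    return max(abs(a - b) for a, b in zip(values, roundtrip(values)))
```

With only eight magnitudes available per block, any weight that falls between two grid points gets rounded, which is one plausible source of the quality differences examined in the video.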

📂 Resources & Links:

📥 Download ComfyUI CUDA 13 Installer: [ https://www.patreon.com/posts/ComfyUI-Installers-105023709 ]
📥 SwarmUI & ComfyUI Unified Model Downloader: [ https://www.patreon.com/posts/SwarmUI-Install-Download-Models-Presets-114517862 ]
🤖 NVFP4 Model Quantizer App: [ https://www.patreon.com/posts/nvfp4-quantizer-app-148217625 ]
🤖 SECourses Musubi Trainer (FP8 Scaled Quantization App): [ https://www.patreon.com/posts/nvfp4-quantizer-app-148217625 ]
🛠️ Image Comparison Slider Tool: [ https://www.patreon.com/posts/image-video-comparison-slider-app-133935178 ]
☁️ SimplePod AI: [ https://simplepod.ai/ref?user=secourses ]
New Model FLUX 2 Klein 9B: [ https://huggingface.co/black-forest-labs/FLUX.2-klein-9B ]
🎥 FLUX 1 Kontext Dev Tutorial (inpaint - outpaint - image fix): [ https://youtu.be/XWzZ2wnzNuQ ]
🎥 Previous ComfyUI Installation Tutorial: [ https://youtu.be/yOj9PYq3XYM ]
How to Use SwarmUI Presets & Workflows in ComfyUI + Custom Model Paths Setup for ComfyUI & SwarmUI Tutorial: [ https://youtu.be/EqFilBM3i7s ]
SECourses Discord Channel for 24/7 Support: [ https://discord.com/invite/software-engineering-courses-secourses-772774097734074388 ]
SECourses Musubi Tuner Tutorial: [ https://youtu.be/DPX3eBTuO_Y ]
NVIDIA NVFP4 Blog Post to Learn More: [ https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/ ]

⏱️ Video Chapters:

00:00:00 Introduction: GGUF Q8 vs NVFP4 vs BF16 vs FP8 Precision Comparison
00:00:38 FP8 Quantization & New NVFP4 Model Quantizer App in Musubi Trainer
00:01:08 The New FLUX SRPO Mixed NVFP4 Model & FLUX 2 Klein 9B Announcement
00:01:56 Speed Comparison Setup: ComfyUI CUDA 13 & Compiled Libraries
00:02:41 Z Image Turbo Speed Test: GGUF Q8 vs NVFP4 (87% Faster)
00:03:09 Z Image Turbo Speed Test: BF16 vs FP8 Scaled vs GGUF Improvements
00:03:32 Installing & Using Image Comparison Slider Tool for Quality Check
00:03:55 Z Image Turbo Quality: BF16 vs GGUF Q8 vs FP8 Scaled
00:04:13 Z Image Turbo Quality: NVFP4 Degradation Analysis
00:04:27 FLUX 2 Dev Speed Test: GGUF Q8 vs NVFP4 (100% Faster)
00:04:43 FLUX 2 Dev Speed Test: FP8 Scaled vs BF16 Performance
00:05:12 FLUX 2 Dev Quality: BF16 vs GGUF Q8 vs Mixed FP8 Scaled
00:05:38 FLUX 2 Dev Quality: NVFP4 Mixed Precision Analysis
00:05:54 Benchmark Settings: 2048px Resolution & Quality 1 Preset Details
00:06:25 FLUX 1 Dev Speed Test: GGUF Q8 vs NVFP4 (118% Faster)
00:07:21 FLUX 1 Dev Speed Test: BF16 & FP8 Scaled Performance Stats
00:07:42 FLUX 1 Dev Quality: BF16 vs GGUF Q8 vs FP8 Scaled
00:07:55 FLUX 1 Dev Quality: NVFP4 Visual Degradation Review
00:08:06 FLUX 1 Kontext Dev: Model Intro & Outpainting Tutorial Reference
00:08:40 FLUX 1 Kontext Dev Speed: GGUF Q8 vs NVFP4 (93% Faster)
00:08:59 FLUX 1 Kontext Dev Speed: BF16 & FP8 Scaled Comparisons
00:09:12 FLUX 1 Kontext Dev Quality: Original vs Edited Image (Hair Change)
00:09:36 FLUX 1 Kontext Dev Quality: BF16 vs GGUF Q8 vs FP8 Scaled
00:09:51 How to Use SwarmUI Unified Model Downloader & Bundles
00:10:36 Downloading Models via URL from CivitAI & Hugging Face to Cloud
00:11:45 SECourses Musubi Trainer: Creating Custom FP8 Quantized Models
00:12:44 The New FLUX SRPO NVFP4 Mixed Precision Model Overview
00:13:15 Live Demo: FLUX SRPO NVFP4 Speed Test on RTX 5090 (5.7s)
00:13:52 VRAM Usage Analysis: NVFP4 on RTX 5090 (14GB Usage)
00:14:16 Live Comparison: BF16 Speed & VRAM Test on RTX 5090
00:15:15 Troubleshooting: Fixing Low RAM/VRAM Issues with Arguments
00:16:25 Why You Should Upgrade to ComfyUI CUDA 13 Version
00:16:51 SimplePod AI: Updated Instructions & Template Setup
00:17:29 RTX 6000 Blackwell Fix & nvitop Utilization Verification
00:18:18 Conclusion, Contact Info & Support Channels

In this video, you will learn:

Speed differences between GGUF Q8, NVFP4, BF16, and FP8.
Visual quality analysis using the Image Comparison Slider.
How to use the new NVFP4 and FP8 Quantizer tools.
How to fix Low VRAM/RAM issues with specific arguments.
Performance benchmarks on RTX 5090 and RTX 6000.
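
The chapter titles quote speed gains such as "87% Faster" or "118% Faster". This description does not say how those percentages were computed, so the small sketch below assumes the common throughput-based reading: "N% faster" means the slower run takes (1 + N/100) times as long as the faster one. The function names and the sample timings in the comments are hypothetical, not measurements from the video.

```python
def percent_faster(baseline_seconds, faster_seconds):
    """Speed gain in percent, assuming 'faster' is measured as a throughput ratio."""
    return (baseline_seconds / faster_seconds - 1.0) * 100.0


def time_ratio(percent):
    """How many times longer the slower run takes for a given 'N% faster' figure."""
    return 1.0 + percent / 100.0


# Hypothetical timings: a 10 s run versus a 5 s run is "100% faster".
example_gain = percent_faster(10.0, 5.0)

# Under this reading, "118% faster" means the slower run takes about 2.18x as long.
example_ratio = time_ratio(118)
```

Under this reading, "100% faster" means the generation time is halved, not reduced to zero.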

Comparison Screenshots
