Furkan Gözükara

BF16 vs GGUF, FP8 Scaled, NVFP4 Speed & Quality Compared + ComfyUI CUDA 13 Gains + FLUX 2 Klein 9B

Full tutorial: https://www.youtube.com/watch?v=XDzspWgnzxI

This tutorial compares the quality and speed (with CUDA 13 and Sage Attention) of BF16, GGUF Q8, FP8 Scaled, and NVFP4 for Z Image Turbo, FLUX Dev, FLUX SRPO, FLUX Kontext, and FLUX 2.

Video Info

Many people have wondered how large the quality and speed differences are between the BF16, GGUF, FP8 Scaled, and NVFP4 precisions. In this tutorial I compare all of these precision and quantization variants for both speed and quality, and the results are pretty surprising. We have also developed and published an NVFP4 quant generator app and an FP8 Scaled quant generator app; the links to the apps are listed under Resources & Links if you want to use them. In addition, upgrading ComfyUI to CUDA 13 with properly compiled libraries is now strongly recommended. We observed noticeable performance gains with CUDA 13, so CUDA 13 ComfyUI is the recommended setup for both SwarmUI users and standalone ComfyUI users.
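As rough context for why these formats differ in VRAM use, here is a minimal sketch that estimates weight storage from nominal bits per weight. The figures are assumptions, not measurements from the video: GGUF Q8 is taken to be the Q8_0 layout (32 int8 values sharing one FP16 scale), NVFP4 is taken as 16 FP4 values sharing one FP8 scale, and FP8 Scaled is assumed to carry only a negligible per-tensor scale. The 9 billion parameter count is taken from the name FLUX 2 Klein 9B. Mixed-precision checkpoints keep some layers at higher precision, and text encoders, the VAE, and activations add to real VRAM use, so treat the output as a lower bound on file size only.

```python
# Nominal storage cost per weight in bits, including block scales.
# These are format-level assumptions, not numbers measured in the video.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "FP8 Scaled": 8.0,             # assumed per-tensor scale, overhead ignored
    "GGUF Q8_0": 8.0 + 16.0 / 32,  # 32 int8 values share one FP16 scale
    "NVFP4": 4.0 + 8.0 / 16,       # 16 FP4 values share one FP8 scale
}


def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def footprint_table(num_params: float) -> dict:
    """Map each format name to its estimated weight size in GB."""
    return {
        name: weight_size_gb(num_params, bits)
        for name, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    # 9e9 parameters: assumed from the "9B" in the model name.
    for name, size in footprint_table(9e9).items():
        print(f"{name:<12} ~{size:.2f} GB")
```

The estimate only covers the diffusion model weights; it says nothing about speed, which also depends on whether the GPU has native kernels for the format.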

📂 Resources & Links:

⏱️ Video Chapters:

  • 00:00:00 Introduction: GGUF Q8 vs NVFP4 vs BF16 vs FP8 Precision Comparison
  • 00:00:38 FP8 Quantization & New NVFP4 Model Quantizer App in Musubi Trainer
  • 00:01:08 The New FLUX SRPO Mixed NVFP4 Model & FLUX 2 Klein 9B Announcement
  • 00:01:56 Speed Comparison Setup: ComfyUI CUDA 13 & Compiled Libraries
  • 00:02:41 Z Image Turbo Speed Test: GGUF Q8 vs NVFP4 (87% Faster)
  • 00:03:09 Z Image Turbo Speed Test: BF16 vs FP8 Scaled vs GGUF Improvements
  • 00:03:32 Installing & Using Image Comparison Slider Tool for Quality Check
  • 00:03:55 Z Image Turbo Quality: BF16 vs GGUF Q8 vs FP8 Scaled
  • 00:04:13 Z Image Turbo Quality: NVFP4 Degradation Analysis
  • 00:04:27 FLUX 2 Dev Speed Test: GGUF Q8 vs NVFP4 (100% Faster)
  • 00:04:43 FLUX 2 Dev Speed Test: FP8 Scaled vs BF16 Performance
  • 00:05:12 FLUX 2 Dev Quality: BF16 vs GGUF Q8 vs Mixed FP8 Scaled
  • 00:05:38 FLUX 2 Dev Quality: NVFP4 Mixed Precision Analysis
  • 00:05:54 Benchmark Settings: 2048px Resolution & Quality 1 Preset Details
  • 00:06:25 FLUX 1 Dev Speed Test: GGUF Q8 vs NVFP4 (118% Faster)
  • 00:07:21 FLUX 1 Dev Speed Test: BF16 & FP8 Scaled Performance Stats
  • 00:07:42 FLUX 1 Dev Quality: BF16 vs GGUF Q8 vs FP8 Scaled
  • 00:07:55 FLUX 1 Dev Quality: NVFP4 Visual Degradation Review
  • 00:08:06 FLUX 1 Kontext Dev: Model Intro & Outpainting Tutorial Reference
  • 00:08:40 FLUX 1 Kontext Dev Speed: GGUF Q8 vs NVFP4 (93% Faster)
  • 00:08:59 FLUX 1 Kontext Dev Speed: BF16 & FP8 Scaled Comparisons
  • 00:09:12 FLUX 1 Kontext Dev Quality: Original vs Edited Image (Hair Change)
  • 00:09:36 FLUX 1 Kontext Dev Quality: BF16 vs GGUF Q8 vs FP8 Scaled
  • 00:09:51 How to Use SwarmUI Unified Model Downloader & Bundles
  • 00:10:36 Downloading Models via URL from CivitAI & Hugging Face to Cloud
  • 00:11:45 SECourses Musubi Trainer: Creating Custom FP8 Quantized Models
  • 00:12:44 The New FLUX SRPO NVFP4 Mixed Precision Model Overview
  • 00:13:15 Live Demo: FLUX SRPO NVFP4 Speed Test on RTX 5090 (5.7s)
  • 00:13:52 VRAM Usage Analysis: NVFP4 on RTX 5090 (14GB Usage)
  • 00:14:16 Live Comparison: BF16 Speed & VRAM Test on RTX 5090
  • 00:15:15 Troubleshooting: Fixing Low RAM/VRAM Issues with Arguments
  • 00:16:25 Why You Should Upgrade to ComfyUI CUDA 13 Version
  • 00:16:51 SimplePod AI: Updated Instructions & Template Setup
  • 00:17:29 RTX 6000 Blackwell Fix & nvitop Utilization Verification
  • 00:18:18 Conclusion, Contact Info & Support Channels

In this video, you will learn:

  • Speed differences between GGUF Q8, NVFP4, BF16, and FP8.

  • Visual quality analysis using the Image Comparison Slider.

  • How to use the new NVFP4 and FP8 Quantizer tools.

  • How to fix Low VRAM/RAM issues with specific arguments.

  • Performance benchmarks on RTX 5090 and RTX 6000.
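The "X% faster" figures in the chapter titles can be read in more than one way, so here is a small sketch of one common convention, in which the percentage is the throughput gain relative to the baseline. This convention is an assumption; the post does not state which formula was used. The timings in the example are made up for illustration and are not results from the video.

```python
def percent_faster(baseline_seconds: float, candidate_seconds: float) -> float:
    """Throughput gain of the candidate over the baseline, in percent."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


def time_saved_percent(baseline_seconds: float, candidate_seconds: float) -> float:
    """Reduction in wall-clock time relative to the baseline, in percent."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return (1.0 - candidate_seconds / baseline_seconds) * 100.0


def relative_time(percent_faster_value: float) -> float:
    """Fraction of the baseline time implied by an 'X% faster' figure."""
    return 1.0 / (1.0 + percent_faster_value / 100.0)


if __name__ == "__main__":
    # Hypothetical timings: baseline takes 20 s, candidate takes 10 s.
    print(percent_faster(20.0, 10.0))      # throughput gain
    print(time_saved_percent(20.0, 10.0))  # wall-clock reduction
    print(relative_time(100.0))            # fraction of baseline time
```

Under this convention, "100% faster" means the generation finishes in half the time, not in zero time, which is why it helps to report the raw seconds per image alongside the percentage.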

Comparison Screenshots

[Screenshots 01 to 19; images not included in this text version]
