DEV Community: Mild King

How to train FLUX.1 for custom emoji generation — dataset size, script, and deployment?

Mild King — Tue, 08 Apr 2025 07:10:55 +0000

I'm working on a personal project where I want to generate custom emoji-style images from text prompts — like turning this:

Flying pig → 🐖 with wings
(see cover image!)

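I'm using black-forest-labs/FLUX.1-dev as the base model. It's a text-to-image model in the same spirit as Stable Diffusion, but much larger: a 12B-parameter rectified-flow transformer, so it needs a lot of VRAM rather than being optimized for low-VRAM generation.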

What I have:

~25k 512x512 emoji-style images
Captions for each (in .txt files)
A train.json mapping image to caption

dataset/
├── images/image_001.png,...
├── captions/caption_001.txt,...
└── train.json  # [{ "image": "images/image_001.png", "caption": "captions/caption_001.txt" }, ...]
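Before training, it is worth checking that train.json and the files it points at actually line up. A minimal stdlib-only sketch, assuming the layout above: train.json is a list of objects whose "image" and "caption" values are paths relative to the dataset root, and each caption file holds the prompt as plain text. `load_pairs` is a hypothetical helper name, not part of any library.

```python
import json
from pathlib import Path


def load_pairs(root):
    """Resolve train.json into (image_path, caption_text) pairs.

    Assumes (hypothetically) that "image" and "caption" in each entry are
    paths relative to the dataset root. Returns (pairs, problems), where
    problems holds entries with a missing file or an empty caption.
    """
    root = Path(root)
    entries = json.loads((root / "train.json").read_text(encoding="utf-8"))
    pairs, problems = [], []
    for entry in entries:
        image_path = root / entry["image"]
        caption_path = root / entry["caption"]
        # A missing file is recorded instead of crashing halfway through training.
        if not image_path.is_file() or not caption_path.is_file():
            problems.append(entry)
            continue
        caption = caption_path.read_text(encoding="utf-8").strip()
        # An empty caption gives the model no text signal for that image.
        if not caption:
            problems.append(entry)
            continue
        pairs.append((image_path, caption))
    return pairs, problems
```

Running it once over all ~25k entries and fixing anything in `problems` rules out data-pairing issues before debugging the training loop itself.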

What I need help with:

How many images is “enough”? Is 25k too many, or about right?
Any working training script for FLUX.1?
- I tried one (PyTorch + diffusers), but outputs look like noise.
Best training config?
- Should I freeze VAE/text encoder?
- Recommended batch size, learning rate, etc.?
How do I export the model to ONNX or TFLite?
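On the noise-only outputs: since the script isn't shown, this is only a guess, but a common cause is reusing a Stable Diffusion training loop. SD 1.x/2.x scripts typically train with a DDPM scheduler and an epsilon (noise) prediction target, whereas FLUX.1 is a rectified-flow model. As far as I can tell from the diffusers FLUX example scripts, the noisy input is a linear interpolation between the clean latents and noise, and the regression target is the velocity `noise - latents`. A dependency-free sketch of that objective, with plain lists standing in for tensors (`flow_matching_pair` is a hypothetical name):

```python
import random


def flow_matching_pair(latents, noise, t):
    """Build one rectified-flow training example.

    latents, noise: equal-length lists of floats (stand-ins for tensors).
    t: noise level in [0, 1]; t = 0 is the clean latent, t = 1 is pure noise.
    Returns (noisy_input, target). The target is the velocity
    noise - latents, not the noise itself as in epsilon-prediction.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    noisy = [(1.0 - t) * x + t * n for x, n in zip(latents, noise)]
    target = [n - x for x, n in zip(latents, noise)]
    return noisy, target


def mse(pred, target):
    """Mean squared error between the model's prediction and the target."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


if __name__ == "__main__":
    rng = random.Random(0)
    latents = [rng.gauss(0, 1) for _ in range(8)]
    noise = [rng.gauss(0, 1) for _ in range(8)]
    t = rng.random()
    noisy, target = flow_matching_pair(latents, noise, t)
    # With a perfect velocity prediction, noisy - t * velocity recovers the
    # clean latent, so the printed error is only floating-point round-off.
    recovered = [z - t * v for z, v in zip(noisy, target)]
    print(max(abs(a - b) for a, b in zip(recovered, latents)))
```

If a training loop computes the loss against `noise` instead of `noise - latents`, or uses a DDPM noise schedule, the model learns the wrong quantity, which would plausibly produce noise at sampling time.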
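On freezing and hyperparameters: as I understand diffusers' `examples/dreambooth/train_dreambooth_lora_flux.py`, the usual setup keeps the VAE and both text encoders frozen and trains only LoRA adapters on the transformer. The sketch below just records those choices and does the step arithmetic for a 25k-image dataset. Every number in it is an assumed starting point to tune, not a tested recommendation, and `LoraTrainConfig` is a hypothetical class.

```python
from dataclasses import dataclass


@dataclass
class LoraTrainConfig:
    # Hypothetical starting values for LoRA fine-tuning at 512 px; tune them.
    resolution: int = 512
    train_batch_size: int = 1            # per-GPU micro-batch
    gradient_accumulation_steps: int = 4
    learning_rate: float = 1e-4          # assumed LoRA-scale LR; full fine-tuning would need far lower
    lora_rank: int = 16
    mixed_precision: str = "bf16"
    # Names follow the components of diffusers' FluxPipeline.
    frozen: tuple = ("vae", "text_encoder", "text_encoder_2")
    trained: str = "transformer (LoRA adapters only)"

    @property
    def effective_batch_size(self) -> int:
        """Images seen per optimizer step."""
        return self.train_batch_size * self.gradient_accumulation_steps

    def steps_per_epoch(self, num_images: int) -> int:
        """Optimizer steps per pass over the data (ceiling division)."""
        return -(-num_images // self.effective_batch_size)
```

For example, `LoraTrainConfig().steps_per_epoch(25_000)` gives 6250 optimizer steps for one epoch, so with 25k images it makes sense to save checkpoints and generate sample emoji every few hundred steps instead of waiting for epoch boundaries.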

Planning to use it in a Flutter app later.
A sample setup or any advice would help beginners like me get started.
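For the ONNX/TFLite and Flutter plans, a back-of-the-envelope size estimate is useful. The FLUX.1-dev transformer alone has about 12 billion parameters (the T5 and CLIP text encoders and the VAE come on top of that), so the weights stay in the multi-GB range even when quantized. The sketch below counts weights only and ignores activations; `weight_size_gb` is a hypothetical helper.

```python
def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GB (1e9 bytes).

    Ignores activations, the text encoders, and the VAE, so real memory
    use during inference is higher than this.
    """
    return num_params * bits_per_param / 8 / 1e9


FLUX_TRANSFORMER_PARAMS = 12e9  # FLUX.1-dev transformer, roughly 12B parameters

for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_size_gb(FLUX_TRANSFORMER_PARAMS, bits):.0f} GB")
```

Those numbers suggest on-device TFLite inference on a phone is unlikely to be practical for this model; a Flutter app calling a server that runs the model is probably the more realistic route.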