Xiaohongshu's FireRed-Image-Edit-1.1 Tops Charts at Launch! 7.94 Score Beats Alibaba's Qwen-Image-Edit-2511
Open-source image editing has a new SOTA champion.
TL;DR: Xiaohongshu released FireRed-Image-Edit-1.1 on March 3rd. It scores 7.943 on GEdit-EN and edges out Alibaba's Qwen-Image-Edit-2511 (released in December) on all five benchmark tracks reported here, setting a new record for open-source image editing models, with SOTA-level performance in identity consistency, multi-element fusion, and portrait makeup.
01 The Battle for Open-Source Image Editing SOTA
The past few months have seen fierce competition in open-source image editing.
On December 23rd, 2025, Alibaba's Qwen team released Qwen-Image-Edit-2511, scoring 7.877 (GEdit-EN) to claim the top spot in open-source rankings.
Just over two months later, Xiaohongshu delivered a surprise.
On March 3rd, Xiaohongshu's foundation model team released FireRed-Image-Edit-1.1, scoring 7.943 to set a new record.
Even more impressive: FireRed-Image-Edit-1.1 leads on all five reported tracks (GEdit and REDEdit in English and Chinese, plus ImgEdit) without a single loss:
| Metric | FireRed-1.1 | Qwen-2511 | Lead |
|---|---|---|---|
| GEdit (EN) | 7.943 | 7.877 | +0.066 |
| GEdit (CN) | 7.887 | 7.819 | +0.068 |
| ImgEdit | 4.56 | 4.51 | +0.05 |
| REDEdit (EN) | 4.26 | 4.23 | +0.03 |
| REDEdit (CN) | 4.33 | 4.18 | +0.15 |
At the SOTA level these margins are narrow, under 0.07 points on GEdit, but they are consistent across every track. The standout is the 0.15-point lead in Chinese REDEdit, which suggests FireRed has an advantage in Chinese scene understanding.
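The Lead column can be recomputed from the two score columns, and expressing each gap relative to the Qwen score makes the tracks easier to compare, since GEdit and the other two benchmarks sit at different score levels. The snippet below is a minimal sketch: the scores are copied from the table, while the `leads` helper and the relative-percentage column are additions for illustration.

```python
# Scores copied from the table above: (FireRed-1.1, Qwen-2511).
scores = {
    "GEdit (EN)": (7.943, 7.877),
    "GEdit (CN)": (7.887, 7.819),
    "ImgEdit": (4.56, 4.51),
    "REDEdit (EN)": (4.26, 4.23),
    "REDEdit (CN)": (4.33, 4.18),
}


def leads(table):
    """Return {metric: (absolute_lead, relative_lead_percent)}."""
    result = {}
    for metric, (firered, qwen) in table.items():
        absolute = round(firered - qwen, 3)
        relative = round(100 * (firered - qwen) / qwen, 2)
        result[metric] = (absolute, relative)
    return result


for metric, (absolute, relative) in leads(scores).items():
    print(f"{metric:13s} +{absolute:.3f}  (+{relative:.2f}%)")
```

In relative terms, four of the five gaps are around 1% or less; only Chinese REDEdit stands out, at roughly 3.6%.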
02 Identity Consistency: Best-in-Class Portrait Editing
What's the biggest headache in image editing? People's faces change when you edit them.
You change the clothes in a photo, and the face shape changes; change the background, and the facial features change too. This "edit-equals-deformation" problem has always been a pain point for image editing models.
FireRed-Image-Edit-1.1's solution is straightforward: SOTA-level identity consistency.
FireRed-1.1 scores 4.33 (Chinese) and 4.26 (English) on the REDEdit-Bench benchmark, claiming the open-source top spot. This comprehensive score includes identity consistency, instruction following, visual quality, and more.
What does this mean?
- Changing clothes: Excellent identity preservation
- Changing backgrounds: Complete retention of facial details
- Adding accessories: Original features not overwritten
Compared to Qwen-Image-Edit-2511's 4.18 (Chinese), FireRed-1.1's 4.33 is a 0.15-point lead on a composite score in which identity preservation is one component.
03 Agent Intelligence: 10+ Elements Auto-Fusion
Consider this complex editing instruction:
"Place the man from image 2, wearing the black 'New York Bears' baseball jacket and camouflage pants and blue-black AJ1 high-top sneakers from image 2, on the spacious football field from image 1. The field is sunny, he's wearing the black cap from image 2 with a red brim... casually carrying the vintage brown leather travel bag from image 3 on his left shoulder... and easily dragging the white skateboard from image 3 with his right hand..."
How do traditional models handle such complex edits with 10+ elements?
The harsh answer: Segmented processing, multiple iterations, manual stitching—inefficient with poor results.
FireRed-Image-Edit-1.1's approach is smarter: Agent auto-processing.
The built-in Agent module automatically completes three steps:
- ROI Detection - Calls a Gemini function-calling model to identify key regions in each image
- Crop & Stitch - Automatically crops and stitches into 2-3 composite images (~1024×1024)
- Instruction Rewriting - Automatically rewrites user instructions to ensure correct image references
The entire process requires no manual intervention—complex edits completed with one click.
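To make the three steps concrete, here is a minimal sketch of the data flow, not FireRed's actual Agent code. Everything in it is hypothetical: the `ROI` class, the hard-coded boxes that stand in for the Gemini call, the single-row stitching layout, and the `image_map` used for rewriting. It computes only where each crop would go on a 1024-pixel canvas; no pixels are touched.

```python
import re
from dataclasses import dataclass

CANVAS = 1024  # composite side length, from the ~1024×1024 figure above


@dataclass
class ROI:
    source_image: int  # 1-based index of the user's input image
    label: str
    box: tuple  # (left, top, right, bottom) in pixels


def detect_rois():
    """Step 1 stand-in: hard-coded regions instead of a real model call."""
    return [
        ROI(2, "man", (100, 50, 500, 950)),
        ROI(3, "travel bag", (40, 60, 440, 460)),
        ROI(3, "skateboard", (500, 300, 900, 500)),
    ]


def layout_row(rois, canvas=CANVAS):
    """Step 2 (layout only): scale crops to one height, place them left to right.

    Returns a list of (label, x_offset, width, height) tuples.
    """
    # Width each crop would have if scaled to the full canvas height.
    scaled = [
        (r.box[2] - r.box[0]) * canvas / (r.box[3] - r.box[1]) for r in rois
    ]
    # Shrink the whole row uniformly if it would overflow the canvas width.
    shrink = min(1.0, canvas / sum(scaled))
    height = round(canvas * shrink)
    placements, x = [], 0
    for roi, width in zip(rois, scaled):
        w = round(width * shrink)
        placements.append((roi.label, x, w, height))
        x += w
    return placements


def rewrite_instruction(text, image_map):
    """Step 3: remap "image N" references in a single regex pass."""
    def swap(match):
        old = int(match.group(1))
        return f"image {image_map.get(old, old)}"
    return re.sub(r"\bimage (\d+)", swap, text)


rois = detect_rois()
for placement in layout_row(rois):
    print(placement)

# Assumed mapping: image 1 (the background) stays as composite 1, while the
# crops from images 2 and 3 are stitched together into composite 2.
image_map = {1: 1, 2: 2, 3: 2}
print(rewrite_instruction(
    "Place the man from image 2 on the field from image 1, "
    "carrying the travel bag from image 3.",
    image_map,
))
```

The rewrite runs as one regex pass on purpose: replacing references one at a time could turn "image 3" into "image 2" and then remap it again, whereas a single pass looks up every reference against the original numbering.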
Qwen-Image-Edit-2511 also supports multiple input images, but FireRed-1.1's Agent automates the preprocessing (region detection, stitching, and instruction rewriting) that users would otherwise do by hand.
04 Professional Makeup: Dozens of Makeup Styles
Makeup editing has always been the "deep end" of image editing.
Why is it difficult?
- Many makeup details (eyebrows, eyeshadow, lipstick, blush, highlights)
- Large style differences (Western makeup vs Japanese/Korean makeup vs Chinese makeup)
- Difficult skin tone adaptation (the same makeup renders differently on different skin tones)
FireRed-Image-Edit-1.1's solution: Professional makeup LoRA models.
The official release includes specialized makeup LoRAs supporting dozens of makeup styles:
- Western Y2K Makeup: Cool-toned matte foundation, deep brown arched brows, silver-gray eyeshadow, mirror-finish glass lip gloss
- Satin Finish Base: Natural satin foundation, light brown brow powder, deep brown eyeshadow, moisturizing bean paste lipstick
- Halloween Witch Makeup, Creative Makeup, etc.
This "professional-grade" makeup editing is the first of its kind in open-source models.
05 Technical Approach Comparison: FireRed vs Qwen
What are the differences in technical approaches?
FireRed-Image-Edit-1.1
Training Data: 1.6B samples (900M T2I + 700M editing pairs)
Training Pipeline:
- Pretrain - Pre-training phase, establishing basic generation capabilities
- SFT - Supervised fine-tuning, injecting editing capabilities
- RL - Reinforcement learning, optimizing identity consistency and instruction following
Key Technologies:
- Multi-Condition Aware Bucket Sampler
- Asymmetric Gradient Optimization for DPO
- DiffusionNFT with layout-aware OCR rewards
- Consistency Loss for identity preservation
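The article names these techniques without describing them. As a rough illustration of what a "Multi-Condition Aware Bucket Sampler" plausibly does, the sketch below applies ordinary aspect-ratio bucketing and additionally keys each bucket on the number of condition (input) images, so every batch has a uniform shape. This is an assumption drawn from the name alone, not FireRed's implementation; the sample fields, the ratio list, and both function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical set of aspect-ratio buckets (width / height).
RATIOS = (0.5, 0.75, 1.0, 1.33, 2.0)


def bucket_key(sample, ratios=RATIOS):
    """Bucket by (nearest aspect ratio, number of condition images)."""
    ratio = sample["width"] / sample["height"]
    nearest = min(ratios, key=lambda r: abs(r - ratio))
    return (nearest, sample["num_conditions"])


def bucketed_batches(samples, batch_size, seed=0):
    """Group samples so each batch comes from exactly one bucket."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for sample in samples:
        buckets[bucket_key(sample)].append(sample)
    batches = []
    for members in buckets.values():
        rng.shuffle(members)  # randomize order within a bucket
        for i in range(0, len(members), batch_size):
            batches.append(members[i:i + batch_size])
    rng.shuffle(batches)  # randomize bucket order across the epoch
    return batches


samples = [
    {"id": i, "width": w, "height": h, "num_conditions": n}
    for i, (w, h, n) in enumerate([
        (1024, 1024, 1), (1000, 1010, 1), (768, 1024, 1), (1024, 1024, 2),
        (1024, 512, 1), (1020, 1000, 2), (760, 1000, 1),
    ])
]
batches = bucketed_batches(samples, batch_size=2)
for batch in batches:
    print(bucket_key(batch[0]), [s["id"] for s in batch])
```

Keeping one bucket per batch means tensors in a batch can be stacked without padding, which matters when samples differ both in resolution and in how many reference images they carry.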
Qwen-Image-Edit-2511
Training Data: Not disclosed
Training Pipeline: Based on Qwen-Image-2512's MMDiT architecture
Key Technologies:
- MMDiT (Multimodal Diffusion Transformer)
- Native Chinese text rendering
- Unified architecture with Qwen-Image-2512
Comparison Conclusion:
FireRed is more transparent in training data scale and technical details, while Qwen has advantages in architecture unification and Chinese text rendering.
06 Engineering Optimization: 4.5s/Image, 30GB VRAM
Accuracy alone isn't enough—engineering deployment is key.
FireRed-Image-Edit-1.1's engineering optimization is quite solid:
- Inference Speed: 4.5s/image after optimization (figure reported for v1.0)
- VRAM Requirement: 30GB after optimization (figure reported for v1.0)
- Acceleration: Full support for distillation, quantization, static compilation
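For capacity planning, the two figures above translate into a back-of-the-envelope throughput and a simple fit check. The sketch below assumes the 4.5s and 30GB numbers carry over unchanged, ignores batching and model loading time, and compares against nominal card capacities only; the helper names and the card list are illustrative.

```python
SECONDS_PER_IMAGE = 4.5  # optimized figure quoted above (v1.0 data)
VRAM_REQUIRED_GB = 30    # optimized figure quoted above (v1.0 data)


def images_per_hour(seconds_per_image=SECONDS_PER_IMAGE, gpus=1):
    """Idealized throughput: one image at a time per GPU, no overhead."""
    return int(3600 / seconds_per_image * gpus)


def fits(card_vram_gb, required_gb=VRAM_REQUIRED_GB):
    """Nominal check only; real deployments need headroom on top."""
    return card_vram_gb >= required_gb


print("images/hour on 1 GPU:", images_per_hour())
print("images/hour on 8 GPUs:", images_per_hour(gpus=8))
for vram in (24, 32, 48, 80):
    print(f"{vram} GB card:", "fits" if fits(vram) else "does not fit")
```

Under these assumptions a single GPU handles about 800 edits per hour, and a 24 GB card falls short of the stated requirement.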
Compared to Qwen-Image-Edit-2511:
- Specific VRAM and speed figures still need verification
- LightX2V provides acceleration support for Qwen, with a reported 42.55x speedup
Conclusion: FireRed publishes concrete speed and VRAM figures and supports acceleration out of the box; Qwen has acceleration solutions but requires additional configuration.
07 Open-Source Ecosystem: Fully Open Apache 2.0
Both models use the Apache 2.0 license, which means:
✅ Commercial use allowed
✅ Code modification allowed
✅ Distribution allowed
✅ No requirement to open-source derivative works
FireRed-Image-Edit-1.1 Ecosystem:
- GitHub Stars: 600+ (as of 2026.03.03)
- HuggingFace: Released
- ModelScope: Released
- ComfyUI: Official node support
- Technical Report: arXiv:2602.13344
Qwen-Image-Edit-2511 Ecosystem:
- GitHub Stars: Needs verification
- HuggingFace: Released
- ModelScope: Released
- ComfyUI: Community support
- Technical Report: Needs verification
Conclusion: FireRed's ecosystem is newer; Qwen's is more mature.
08 Summary: SOTA Changes Hands, But Competition Just Began
The release of FireRed-Image-Edit-1.1 has indeed set a new SOTA for open-source image editing.
It leads on all five benchmark tracks and reaches new highs in identity consistency, multi-element fusion, and portrait makeup.
But this is just the beginning.
Alibaba's Qwen team released version 2511 in December, and Xiaohongshu answered with version 1.1 in March. The "arms race" in open-source image editing is only getting started.
What to expect next:
- Will Qwen release a 2603 version to counter?
- Will FireRed continue iterating with 1.2, 1.3?
- Will other teams (Stability, or an open-source release from Midjourney) join the battle?
The SOTA battle in open-source image editing—the best is yet to come.
What's your take on the FireRed vs Qwen SOTA battle?
Feel free to comment and discuss the future of open-source image editing.





