In this experiment, I walk through using Gemini's image generation model, "Nano Banana 2," to generate sprite sheets for a 2D action game and bring them into Unity, along with tips for stabilizing the output. To jump straight to the conclusion: the results were far more practical than I expected.
【The Prompt Used】
//{Prompt}
//2D side-scrolling pixel art style (dot-picture).
//Only requires the side view; no other directions.
//Create a female sword-wielding protagonist with long red hair.
//The style should be RPG-like, a fusion of medieval and modern fashion, making it cute.
//The character naturally has red hair, which changes to platinum blonde when holding a sword.
//Include a visual sequence for the transition from red to platinum blonde while drawing the sword.
//Must include the following actions:
//1. Idle Action
//2. Attack Actions (Combo 1,2,3)
//3. Walking/Running Action
//4. Death Action
//5. Action of eating a steamed bun
//Pure black background for all of the above actions.
//Each action must have its corresponding animation frames.
【Basic Action Generation and the Strength of the "Infinite Gacha"】
First, basic motions like idling and walking. Gemini handles these perfectly.
The biggest advantage of using Gemini is that you can keep "pulling the gacha" (regenerating), up to 100 times a day, until you are satisfied. The consistency of the character design is also exceptionally high; by continuing the same chat, you can derive one action set after another from the same character.
Furthermore, based on a generated image, you can give instructions like, "Keep this part unchanged, but fix only this specific area." This makes it possible to repeatedly make minor adjustments while maintaining the overall design and vibe of the character. When creating a series of motions for the same character over the long term, this feature is indispensable.
【Challenges and Limitations of Complex Attack Combos】
While simple movements are perfect, once you specify "complex and continuous actions" like attack combos, it suddenly becomes difficult to maintain movement consistency.
Countermeasures and Compromises:
- Instead of trying to generate complex actions in a single shot, break down the flow and give instructions frame by frame.
- Subdivide the prompts into steps like "raise the sword" and "swing down," and keep regenerating, gacha-style, until you pull the output closest to your ideal.
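The frame-by-frame breakdown described above can be sketched as a small prompt builder. This is only a minimal sketch, not the tooling used for this post: the function name `build_frame_prompts`, its parameters, and the exact prompt wording are hypothetical, and the resulting strings would still be pasted into the Gemini chat by hand.

```python
def build_frame_prompts(character, action, steps, background="pure black"):
    """Return one prompt string per animation step (hypothetical helper)."""
    total = len(steps)
    prompts = []
    for index, step in enumerate(steps, start=1):
        prompts.append(
            # Repeat the fixed constraints in every prompt so each
            # regeneration starts from the same character description.
            f"Keep the character design unchanged: {character}. "
            f"2D side-scrolling pixel art, side view only, {background} background. "
            f"Action: {action}, frame {index} of {total}: {step}."
        )
    return prompts


# Example: split one attack into the two steps mentioned above.
combo_prompts = build_frame_prompts(
    character="female sword-wielding protagonist with long red hair",
    action="attack combo 1",
    steps=["raise the sword", "swing down"],
)
for prompt in combo_prompts:
    print(prompt)
```

Repeating the character and background constraints in every prompt is a design choice here: it keeps each regeneration self-contained, at the cost of longer prompts.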
While there is still significant room for improvement in the overall fluidity of the animation, looking at it frame by frame reveals pixel art of more than sufficient quality. Sometimes, it even renders effects that can be used directly as VFX.
【Background Transparency Workflow (Black Background + Photoshop)】
A clear issue at this point is that the AI alone cannot directly output a "PNG image with a transparent background."
Countermeasure:
Explicitly state in the prompt that the background must be completely black (or another solid color, such as green). Then bring the generated image into a tool like Photoshop and manually cut out the background to make it transparent. It takes a bit of extra effort, but this is the most reliable method at present.
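For readers who would rather script this step than do it by hand, the cut-out can be approximated with color keying. The sketch below is an alternative to the Photoshop workflow, not the method used in this post. It assumes the image has already been decoded into a flat, row-major list of `(r, g, b, a)` tuples by whatever image library you use; the function name `key_out_background` and the `tolerance` value are hypothetical. It flood-fills inward from the image border, so black outlines and shadows inside the character are left alone.

```python
from collections import deque


def key_out_background(pixels, width, height, key=(0, 0, 0), tolerance=16):
    """Make border-connected pixels near the `key` color fully transparent.

    `pixels` is a flat, row-major list of (r, g, b, a) tuples.
    Returns a new list; the input is not modified.
    """
    def is_background(px):
        return all(abs(px[c] - key[c]) <= tolerance for c in range(3))

    out = list(pixels)
    seen = [False] * (width * height)
    queue = deque()

    # Seed the fill with every pixel on the image border.
    for x in range(width):
        queue.append((x, 0))
        queue.append((x, height - 1))
    for y in range(height):
        queue.append((0, y))
        queue.append((width - 1, y))

    while queue:
        x, y = queue.popleft()
        i = y * width + x
        if seen[i] or not is_background(out[i]):
            continue
        seen[i] = True
        r, g, b, _ = out[i]
        out[i] = (r, g, b, 0)  # keep the color, drop the alpha
        # Spread to the four direct neighbours.
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and not seen[ny * width + nx]:
                queue.append((nx, ny))
    return out
```

One trade-off to be aware of: background regions fully enclosed by the character (for example, a gap between an arm and the body) are not reachable from the border and stay opaque, so some manual cleanup may still be needed.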
【Summary】
Pros:
- Simple movements (idle, walk, etc.) can be generated perfectly while maintaining consistency.
- By continuing the chat, you can mass-produce derivative motions of the same character.
- Partial modification instructions are possible while preserving the base image.
Cons (Challenges):
- It is difficult to maintain continuity for complex actions (combos, etc.), requiring frame-by-frame instructions and trial-and-error (gacha).
- Since it cannot output with a transparent background, specifying a solid color background and manually cutting it out with an external tool is mandatory.
While still in the trial-and-error stage, it already has sufficient potential as a tool to dramatically speed up prototype production. I plan to continue verifying and testing this going forward!
- X (Twitter): @kenjiDev9662 (I post daily devlogs!)
- Portfolio: KenjiDev Portfolio
- GitHub: Github@Kenji966
- BLUESKY: @kenjidev9662 (I post daily devlogs!)
- 🇯🇵 Japanese Post (Zenn): Zenn@kenji966
