Intro
Day 11! Back to cats 🐱
I took one photo of my cat (a black-and-white tuxedo boy) as a reference and had AI restyle him into anime, ukiyo-e, oil painting, and more.
The goal: change only the style while keeping "my cat" recognizable. But left alone, the AI started drawing humans instead of a cat. Here's what I did, step by step.
What I used: my home AI machine (DGX Spark) + an image-generation tool (ComfyUI) + one photo of my cat.
The reference is this one photo
A tomcat my family looks after for me, with yellow eyes and a slightly grumpy look.
Love that face. I'll turn him into various styles while keeping him recognizable as "my cat."
First, anime from text alone → a human
I started with no photo, just text: "a tuxedo cat, anime key visual." I clearly said cat.
Here's what came out. …A human girl.
Black hair, white collar. My cat's tuxedo pattern (black body, white chest) turned straight into clothing.
Next, I added the reference photo → still human
So I hand over the cat photo as a visual reference. The tool that applies it is IPAdapter.
What's the reference-photo trick (IPAdapter)? A tool that lets you pass a reference image, separate from the text prompt, and say "make it look like this." It's what preserves my cat's colors and face.
Surely this makes it a cat… nope. Still human.
And this habit wasn't limited to anime. Ask the same anime-style model for ukiyo-e or oil painting, and you still get anime-ish humans. It hijacks not just the subject (the cat), but the art style too.
Left: an "ukiyo-e" that's really an anime woman in a kimono. Right: an "oil painting" that's an anime woman in a tuxedo. Both are "humans painted in the cat's colors."
I tuned the settings → finally a cat
On top of the photo, I turned up its strength and added "don't draw humans" to the negatives (details below). That's when it finally became a sitting cat.
Why does it turn into a human?
Two reasons, as far as I can tell.
One: anime-savvy models tend to draw people, girls especially. Even with "cat" in the prompt, they drift toward a human if you let them.
Two: my cat's pose. He sits bolt upright, almost like a person, so the harder you push the reference, the more that upright posture rides along, tipping toward an "anthropomorphized" cat. The pop-art piece in the gallery below shows exactly that leftover.
Cyberpunk flipped to a cat with the photo alone
The interesting part: whether the photo alone was enough depended on the model. Anime was stubborn and needed tuning, but cyberpunk became a cat just by adding the photo.
Left (no reference): a human man in a neon city. Right (with reference): a cat with glowing ears.
I didn't change a single character of the prompt — the photo being there or not is the only difference between human and cat.
The styles that came out
Here's the gallery after the human problem was fixed — all with the reference photo, my cat as the base.
Top row, left to right: anime, ukiyo-e, oil painting (Van Gogh-ish), stained glass. Bottom row: cyberpunk, 3D (Pixar-ish), pop art.
"Likeness" and "style" are a tug-of-war
The oddly real 3D Pixar one shows this little trade-off nicely.
Left (no reference): a cute 3D cat, but "some cat." Right (with reference): it becomes my cat's face, but the 3D look washes out into basically a real photo.
Weaken the reference and the style shows but it's a different cat; strengthen it and it's my cat but the style fades. Finding that grip per style is what the tuning really is.
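Finding that grip is basically a one-parameter sweep: keep the prompt and seed fixed and step only the reference strength. Below is a minimal sketch, assuming a 0.05 step across the 0.7 to 0.85 range mentioned in the details below. The step size, the helper names, and the job-dict layout are hypothetical. Only the seed is taken from the post.

```python
# Hypothetical sweep helpers; the seed is the one the post fixes for every style.
SEED = 110011


def weight_sweep(lo: float = 0.70, hi: float = 0.85, step: float = 0.05) -> list[float]:
    """Evenly spaced reference strengths from lo to hi, inclusive."""
    count = round((hi - lo) / step)  # number of steps, rounded to dodge float error
    return [round(lo + i * step, 4) for i in range(count + 1)]


def sweep_jobs(style: str, weights: list[float]) -> list[dict]:
    """One job per weight; only the weight differs between jobs."""
    return [{"style": style, "weight": w, "seed": SEED} for w in weights]


if __name__ == "__main__":
    for job in sweep_jobs("3d pixar", weight_sweep()):
        print(job)
```

Because the seed never changes, any difference between two images in the sweep comes from the weight alone. That makes the "style vs. likeness" trade-off visible side by side.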
The boss I couldn't beat: storybook watercolor
"Gentle storybook watercolor" was the one style I never got to be a cat. Here's the result of seven retries.
A human, then somehow two cats, then a cat-eared girl holding a cat. "Single + watercolor + cat" wouldn't line up. Lower the reference → human; raise it → two cats. "Storybook" must be soaked in human imagery. I'm carrying this one over to a later day.
The details
Here's the technical setup behind everything above.
The reference-photo mechanism (IPAdapter)
I added a custom node called ComfyUI_IPAdapter_plus to ComfyUI. It lets you hand over a reference image as a "visual guide," separate from the text prompt.
- Model used: ip-adapter_sd15 (44.6MB, from h94/IP-Adapter)
- The part that reads the image features: CLIP-ViT-H (reused an existing one)
- The reference photo is cropped to a 768px square before it's handed over
A number called the "reference strength (weight)" controls how closely it mimics. I moved between roughly 0.7 and 0.85 depending on the style.
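The post says the reference is cropped to a 768px square but doesn't say where the crop is taken. Here is a minimal sketch, assuming a centered crop. The crop position, the helper names, and the example photo size are assumptions. The function returns the box in the (left, upper, right, lower) order that Pillow's Image.crop takes, plus the scale factor for shrinking that square to 768.

```python
TARGET = 768  # square size handed to IPAdapter in the post


def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, upper, right, lower) for the largest centered square.

    The tuple follows the box convention of Pillow's Image.crop; resizing
    the cropped square to TARGET x TARGET is a separate step.
    """
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def scale_to_target(box: tuple[int, int, int, int], target: int = TARGET) -> float:
    """Scale factor that maps the cropped square onto a target x target image."""
    left, _, right, _ = box
    return target / (right - left)


if __name__ == "__main__":
    # A hypothetical 4032x3024 landscape phone photo.
    box = center_square_box(4032, 3024)
    print(box)  # (504, 0, 3528, 3024)
    print(scale_to_target(box))
```

A centered crop keeps the cat in frame only if he sits near the middle of the photo. For an off-center subject, you'd pick the box by hand instead.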
What I did to suppress the "human" problem
I started at weight 0.7, with words like "key visual" and "big eyes" in the prompt, which strongly invited humans. Three fixes:
- Raise the reference strength to 0.85
- Add "human, girl, person, 1girl, humanoid" to the "things I don't want drawn" list
- Strip human-summoning words from the request and emphasize "tuxedo cat, full body, animal"
That corrected anime, ukiyo-e, and oil painting into cats. One catch: the phrase "tuxedo cat" itself tends to put an actual tuxedo (a suit) on the cat, so it cut both ways.
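The two text-side fixes can be written as a small prompt-building step. Below is a minimal sketch, assuming prompts are comma-separated tag strings. The negative tags, the emphasis tags, and the two human-summoning phrases come from the post. The function names and the example style tags ("cel shading", "lowres, blurry") are hypothetical. The 0.85 strength lives on the IPAdapter node rather than in the text, so it isn't handled here.

```python
# From the post: phrases that pulled the anime model toward humans,
# the negative tags added, and the tags emphasized instead.
HUMAN_SUMMONING = ("key visual", "big eyes")
ANTI_HUMAN_NEGATIVES = ["human", "girl", "person", "1girl", "humanoid"]
CAT_EMPHASIS = ["tuxedo cat", "full body", "animal"]


def split_tags(prompt: str) -> list[str]:
    """Split a comma-separated prompt into trimmed, lowercase, non-empty tags."""
    return [t.strip().lower() for t in prompt.split(",") if t.strip()]


def strip_human_phrases(tag: str) -> str:
    """Remove human-summoning phrases from a tag and tidy the spacing."""
    for phrase in HUMAN_SUMMONING:
        tag = tag.replace(phrase, "")
    return " ".join(tag.split())


def build_prompts(style_prompt: str, base_negative: str = "") -> tuple[str, str]:
    """Return (positive, negative) prompt strings with the anti-human fixes applied."""
    kept = [strip_human_phrases(t) for t in split_tags(style_prompt)]
    # Drop tags that became empty and repeats of the cat emphasis.
    kept = [t for t in kept if t and t not in CAT_EMPHASIS]
    positive = CAT_EMPHASIS + kept  # cat emphasis goes first
    negative = split_tags(base_negative)
    negative += [t for t in ANTI_HUMAN_NEGATIVES if t not in negative]
    return ", ".join(positive), ", ".join(negative)


if __name__ == "__main__":
    pos, neg = build_prompts("anime key visual, big eyes, cel shading", "lowres, blurry")
    print(pos)  # tuxedo cat, full body, animal, anime, cel shading
    print(neg)  # lowres, blurry, human, girl, person, 1girl, humanoid
```

Note that "tuxedo cat" stays in the emphasis list even though, as the post says, it can put an actual suit on the cat. Swapping it out is a one-line change to CAT_EMPHASIS.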
The base models I used
I switched the underlying image model by style.
- Anime / illustration: AnythingV5
- Realistic / 3D: Realistic Vision V6
- Plain base: SD 1.5 (base)
When storybook failed, switching to the plain base gave a real cat but a weak watercolor feel, and raising the strength split it into two cats. A real bind. The base model's "habits" matter a lot.
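Switching models per style amounts to a lookup table. Here is a minimal sketch. The three model groups come from the post, but several parts are assumptions:

- The checkpoint filenames are hypothetical placeholders.
- Routing ukiyo-e and oil painting to the anime model follows the earlier section. The 3D and storybook assignments are a guess.
- Only the 0.85 weight for anime, ukiyo-e, and oil painting is from the post. The other weights are placeholders inside the 0.7 to 0.85 range it reports.

Cyberpunk, stained glass, and pop art are left out because the post doesn't say which model they used.

```python
from dataclasses import dataclass

# Hypothetical checkpoint filenames: use the names in your models/checkpoints folder.
ANIME_CKPT = "anythingV5.safetensors"
REALISTIC_CKPT = "realisticVisionV6.safetensors"
BASE_CKPT = "sd15_base.safetensors"

WEIGHT_RANGE = (0.70, 0.85)  # range of reference strengths reported in the post


@dataclass(frozen=True)
class StyleConfig:
    checkpoint: str
    ipadapter_weight: float


# Assumed routing; styles whose model the post doesn't name are omitted.
STYLES = {
    "anime": StyleConfig(ANIME_CKPT, 0.85),                # 0.85 from the post
    "ukiyo-e": StyleConfig(ANIME_CKPT, 0.85),              # 0.85 from the post
    "oil painting": StyleConfig(ANIME_CKPT, 0.85),         # 0.85 from the post
    "3d pixar": StyleConfig(REALISTIC_CKPT, 0.70),         # placeholder weight
    "storybook watercolor": StyleConfig(BASE_CKPT, 0.70),  # placeholder weight
}


def config_for(style: str) -> StyleConfig:
    """Look up a style (case-insensitive) and check its weight is in range."""
    cfg = STYLES[style.lower()]  # raises KeyError for styles not in the table
    lo, hi = WEIGHT_RANGE
    if not lo <= cfg.ipadapter_weight <= hi:
        raise ValueError(f"weight {cfg.ipadapter_weight} outside {WEIGHT_RANGE}")
    return cfg
```

Keeping model and weight together in one record means a style's "grip" travels with it. This matters because, as the storybook case shows, the same weight behaves differently on different base models.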
Common generation settings
Across all styles: 768px, 30 steps, sampler dpmpp_2m karras, cfg 7, seed fixed at 110011. I only varied the text request and the reference strength, keeping everything else equal for a fair comparison. Each generation request is sent to ComfyUI from a small script I wrote.
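The post doesn't show that script. Below is a minimal sketch of the general idea. It assumes ComfyUI's default local server (POST to /prompt on port 8188) and its API-format workflow JSON, where each node id maps to a class_type and inputs. The graph uses only ComfyUI core nodes. The IPAdapter nodes from ComfyUI_IPAdapter_plus would sit between the checkpoint loader and the sampler's model input. They are left out here because their input names depend on the node version. The checkpoint filename and the "cat" filename prefix are hypothetical.

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # ComfyUI's default local address

# Shared settings from the post.
SIZE = 768
STEPS = 30
CFG = 7
SEED = 110011


def build_workflow(ckpt: str, positive: str, negative: str) -> dict:
    """Build an API-format ComfyUI graph using only core nodes.

    Links are written as [source_node_id, output_index].
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": SIZE, "height": SIZE, "batch_size": 1}},
        "5": {"class_type": "KSampler", "inputs": {
            "model": ["1", 0],  # an IPAdapter node's MODEL output would be wired in here
            "positive": ["2", 0], "negative": ["3", 0], "latent_image": ["4", 0],
            "seed": SEED, "steps": STEPS, "cfg": CFG,
            "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0,
        }},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage", "inputs": {"images": ["6", 0], "filename_prefix": "cat"}},
    }


def queue_prompt(workflow: dict, url: str = COMFY_URL) -> dict:
    """POST the workflow to a running ComfyUI server and return its JSON reply."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

To use it with a local ComfyUI server running, call queue_prompt(build_workflow("anythingV5.safetensors", "tuxedo cat, full body, animal", "human, girl, person, 1girl, humanoid")). Since size, steps, sampler, cfg, and seed are module-level constants, only the prompt text and the (omitted) IPAdapter weight change between runs, which matches the fair-comparison setup described above.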
Next up
Next time it's cats again — and this time I'm planning video generation 🐱








