PEPPERCORN

[Day 11] I turned my cat into anime art — and the AI drew a human girl instead. One photo through IPAdapter pulls it back to a cat

Intro

Day 11! Back to cats 🐱

I took one photo of my cat (a black-and-white tuxedo boy) as a reference and had AI restyle him into anime, ukiyo-e, oil painting, and more.

The goal: change only the style while keeping "my cat" recognizable. But left alone, the AI started drawing humans instead of a cat. Here's what I did, step by step.

What I used: my home AI machine (DGX Spark) + an image-generation tool (ComfyUI) + one photo of my cat.


The reference is this one photo

A tomcat my family looks after for me, with yellow eyes and a slightly grumpy look.

The reference photo of my cat

Love that face. I'll turn him into various styles while keeping him recognizable as "my cat."


First, anime from text alone → a human

I started with no photo, just text: "a tuxedo cat, anime key visual." I clearly said cat.

Anime from text alone, no reference photo

Here's what came out. …A human girl.

Black hair, white collar. My cat's tuxedo pattern (black body, white chest) turned straight into clothing.


Next, I added the reference photo → still human

So I handed over the cat photo as a visual reference. The tool that applies it is IPAdapter.

What's the reference-photo trick (IPAdapter)? A tool that lets you pass a reference image, separate from the text prompt, and say "make it look like this." It's what preserves my cat's colors and face.

Surely this makes it a cat… nope. Still human.

Even with the reference photo added, still a human

And this habit wasn't limited to anime. Ask the same anime-style model for ukiyo-e or oil painting, and you still get anime-ish humans. It hijacks not just the subject (the cat), but the art style too.

Ask for ukiyo-e or oil painting, you still get anime-style humans

Left: an "ukiyo-e" that's really an anime woman in a kimono. Right: an "oil painting" that's an anime woman in a tuxedo. Both are "humans painted in the cat's colors."


I tuned the settings → finally a cat

On top of the photo, I turned up its strength and added "don't draw humans" terms to the negative prompt (details below). That's when it finally became a sitting cat.

Photo plus tuned settings finally gives a cat


Why does it turn into a human?

Two reasons, as far as I can tell.

One: anime-savvy models tend to draw people, girls especially. Even with "cat" in the prompt, they drift toward a human if you let them.

Two: my cat's pose. He sits bolt upright, almost like a person, so the harder you push the reference, the more that upright posture rides along — tipping toward an "anthropomorphized" cat. The pop-art piece later is exactly that leftover.


Cyberpunk flipped to a cat with the photo alone

The interesting part: whether the photo alone was enough depended on the model. Anime was stubborn and needed tuning, but cyberpunk became a cat just by adding the photo.

Cyberpunk: same prompt, with vs. without the reference photo

Left (no reference): a human man in a neon city. Right (with reference): a cat with glowing ears.

I didn't change a single character of the prompt — the photo being there or not is the only difference between human and cat.


The styles that came out

Here's the gallery after the human problem was fixed — all with the reference photo, my cat as the base.

Gallery of 7 styles

Top row, left to right: anime, ukiyo-e, oil painting (Van Gogh-ish), stained glass. Bottom row: cyberpunk, 3D (Pixar-ish), pop art.


"Likeness" and "style" are a tug-of-war

The oddly real 3D Pixar one shows this little trade-off nicely.

3D style: without (left) and with (right) the reference

Left (no reference): a cute 3D cat, but "some cat." Right (with reference): it becomes my cat's face, but the 3D look washes out into basically a real photo.

Weaken the reference and the style shows, but it's a different cat; strengthen it and it's my cat, but the style fades. Finding that balance for each style is what the tuning really comes down to.


The boss I couldn't beat: storybook watercolor

"Gentle storybook watercolor" was the one style I never got to be a cat. Here's the result of seven retries.

Storybook failures: a person, two cats, a cat-girl

A human, then somehow two cats, then a cat-eared girl holding a cat. "Single + watercolor + cat" wouldn't line up. Lower the reference → human; raise it → two cats. "Storybook" must be soaked in human imagery. I'm carrying this one over to a later attempt.


The details

Here are the settings and fixes behind the results above.

The reference-photo mechanism (IPAdapter)

I added a custom node called ComfyUI_IPAdapter_plus to ComfyUI. It lets you hand over a reference image as a "visual guide," separate from the text prompt.

  • Model used: ip-adapter_sd15 (44.6MB, from h94/IP-Adapter)
  • The part that reads the image features: CLIP-ViT-H (reused an existing one)
  • The reference photo is cropped to a 768px square before handing it over
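
The square crop in the last bullet can be sketched in a few lines. This is only a sketch under one assumption: that the crop is taken from the center of the photo, which the list above doesn't actually say. The function name center_square_box is made up for illustration.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# Example: a 4032x3024 landscape photo (illustrative size, not from the post).
box = center_square_box(4032, 3024)
print(box)  # (504, 0, 3528, 3024)
```

With Pillow installed, something like Image.open(path).crop(box).resize((768, 768)) would then produce the 768px square.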

A number called the "reference strength (weight)" controls how closely it mimics. I moved between roughly 0.7 and 0.85 depending on the style.
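
One way to look for a good weight per style is to render the same seed at a few evenly spaced weights and compare the results side by side. Here is a small sketch of generating those weights. The 0.05 step and the name weight_sweep are assumptions for illustration; only the 0.7 to 0.85 range comes from the text above.

```python
def weight_sweep(low: float = 0.70, high: float = 0.85, step: float = 0.05) -> list[float]:
    """List reference weights from low to high inclusive, rounded to 2 places."""
    count = int(round((high - low) / step))
    return [round(low + i * step, 2) for i in range(count + 1)]


print(weight_sweep())  # [0.7, 0.75, 0.8, 0.85]
```

Lower values in the list favor the style, higher values favor the likeness, so the useful weight is usually the first one where the cat is still clearly the reference cat.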

What I did to suppress the "human" problem

I started at weight 0.7 with prompt words like "key visual" and "big eyes," and those words strongly invited humans. Three fixes:

  1. Raise the reference strength to 0.85
  2. Add human, girl, person, 1girl, humanoid to the "things I don't want drawn" list
  3. Strip human-summoning words from the request and emphasize tuxedo cat, full body, animal
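
Fixes 2 and 3 are plain text manipulation, so they can be sketched without any image tooling. This is a rough illustration, not the actual script: build_prompts and the constant names are hypothetical, and "anime style" in the example is a stand-in style term. The word lists themselves are the ones named in the steps above.

```python
HUMAN_NEGATIVES = ["human", "girl", "person", "1girl", "humanoid"]
HUMAN_INVITING = {"key visual", "big eyes"}  # words that tended to summon people
CAT_EMPHASIS = ["tuxedo cat", "full body", "animal"]


def build_prompts(style_terms: list[str], base_negative: str = "") -> tuple[str, str]:
    """Return (positive, negative) prompt strings with the human-suppressing fixes."""
    # Fix 3: drop human-inviting terms (case-insensitive) and lead with the cat terms.
    kept = [t for t in style_terms if t.strip().lower() not in HUMAN_INVITING]
    positive = ", ".join(CAT_EMPHASIS + kept)
    # Fix 2: append the human terms to whatever negative prompt already exists.
    negatives = [base_negative] if base_negative else []
    negative = ", ".join(negatives + HUMAN_NEGATIVES)
    return positive, negative


positive, negative = build_prompts(["anime style", "key visual", "big eyes"])
print(positive)  # tuxedo cat, full body, animal, anime style
print(negative)  # human, girl, person, 1girl, humanoid
```

Fix 1, the higher reference strength, lives on the IPAdapter node rather than in the text, so it isn't shown here.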

That corrected anime, ukiyo-e, and oil painting into cats. One catch: the phrase "tuxedo cat" itself tends to put an actual tuxedo (a suit) on the cat, so it cut both ways.

The base models I used

I switched the underlying image model by style.

  • Anime / illustration: AnythingV5
  • Realistic / 3D: Realistic Vision V6
  • Plain base: SD 1.5 (base)

When storybook failed, switching to the plain base gave a real cat but weak watercolor feel, and raising the strength split it into two cats — a real bind. The base model's "habits" matter a lot.

Common generation settings

Across all styles: 768px, 30 steps, sampler dpmpp_2m with the karras scheduler, cfg 7, seed fixed at 110011. Apart from the base model, I only varied the text request and the reference strength, keeping everything else equal for a fair comparison. A small script I wrote sends each generation job to ComfyUI.
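
For a sense of what such a script can look like, here is a minimal sketch that puts the shared settings into a ComfyUI API-format graph and posts it to the server's /prompt endpoint. Several things are assumptions: the server address is ComfyUI's usual local default, the output is taken to be a 768x768 square, the checkpoint filename is a placeholder, and the function names are made up. The IPAdapter nodes from the custom node pack, which would sit between the checkpoint loader and the sampler, are left out because their input names depend on the installed version.

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # assumed default local ComfyUI address


def build_workflow(positive: str, negative: str, ckpt_name: str) -> dict:
    """Build a text-to-image graph in ComfyUI's API format with the shared settings."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        # 768x768 is an assumption; the post only says "768px".
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 768, "height": 768, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": 110011, "steps": 30, "cfg": 7.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "cat_style"}},
    }


def queue_prompt(workflow: dict, url: str = COMFY_URL) -> dict:
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


workflow = build_workflow(
    "tuxedo cat, full body, animal",
    "human, girl, person, 1girl, humanoid",
    "example_checkpoint.safetensors",  # hypothetical filename
)
print(workflow["5"]["inputs"]["sampler_name"])  # dpmpp_2m
```

Calling queue_prompt(workflow) only works with a ComfyUI server running at that address. Keeping the seed and sampler settings inside one builder function is what makes the per-style comparison fair: only the arguments change between runs.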


Next up

Next time it's cats again — and this time I'm planning video generation 🐱

#100ExperimentsWithDGX #LocalLLM