PEPPERCORN

Posted on Jun 4

[Day 11] I turned my cat into anime art — and the AI drew a human girl instead. One photo through IPAdapter pulls it back to a cat

#localllm #ai #dgxspark #stablediffusion

Intro

Day 11! Back to cats 🐱

I took one photo of my cat (a black-and-white tuxedo boy) as a reference and had AI restyle him into anime, ukiyo-e, oil painting, and more.

The goal: change only the style while keeping "my cat" recognizable. But left alone, the AI started drawing humans instead of a cat. Here's what I did, step by step.

What I used: my home AI machine (DGX Spark) + an image-generation tool (ComfyUI) + one photo of my cat.

The reference is this one photo

A tomcat my family looks after for me, with yellow eyes and a slightly grumpy look.

Love that face. I'll turn him into various styles while keeping him recognizable as "my cat."

First, anime from text alone → a human

I started with no photo, just text: "a tuxedo cat, anime key visual." I clearly said cat.

Here's what came out. …A human girl.

Black hair, white collar. My cat's tuxedo pattern (black body, white chest) turned straight into clothing.

Next, I added the reference photo → still human

So I handed over the cat photo as a visual reference. The tool that applies it is IPAdapter.

What's the reference-photo trick (IPAdapter)? A tool that lets you pass a reference image, separate from the text prompt, and say "make it look like this." It's what preserves my cat's colors and face.

Surely this makes it a cat… nope. Still human.

And this habit wasn't limited to anime. Ask the same anime-style model for ukiyo-e or oil painting, and you still get anime-ish humans. It hijacks not just the subject (the cat), but the art style too.

Left: an "ukiyo-e" that's really an anime woman in a kimono. Right: an "oil painting" that's an anime woman in a tuxedo. Both are "humans painted in the cat's colors."

I tuned the settings → finally a cat

With the photo still attached, I turned up the reference strength and added "don't draw humans" words to the negative prompt (details below). That's when it finally became a sitting cat.

Why does it turn into a human?

Two reasons, as far as I can tell.

One: anime-savvy models tend to draw people, girls especially. Even with "cat" in the prompt, they drift toward a human if you let them.

Two: my cat's pose. He sits bolt upright, almost like a person, so the harder you push the reference, the more that upright posture rides along, and the result tips toward an "anthropomorphized" cat. The pop-art piece in the gallery below still shows that leftover.

Cyberpunk flipped to a cat with the photo alone

The interesting part: whether the photo alone was enough depended on the model. Anime was stubborn and needed tuning, but cyberpunk became a cat just by adding the photo.

Left (no reference): a human man in a neon city. Right (with reference): a cat with glowing ears.

I didn't change a single character of the prompt — the photo being there or not is the only difference between human and cat.

The styles that came out

Here's the gallery after the human problem was fixed — all with the reference photo, my cat as the base.

Top row, left to right: anime, ukiyo-e, oil painting (Van Gogh-ish), stained glass. Bottom row: cyberpunk, 3D (Pixar-ish), pop art.

"Likeness" and "style" are a tug-of-war

The 3D (Pixar-ish) piece, which came out oddly realistic, shows this trade-off nicely.

Left (no reference): a cute 3D cat, but "some cat." Right (with reference): it becomes my cat's face, but the 3D look washes out into basically a real photo.

Weaken the reference and the style shows but it's a different cat; strengthen it and it's my cat but the style fades. Finding that balance point for each style is what the tuning really is.

The boss I couldn't beat: storybook watercolor

"Gentle storybook watercolor" was the one style I never got to be a cat. Here's the result of seven retries.

A human, then somehow two cats, then a cat-eared girl holding a cat. "Single + watercolor + cat" wouldn't line up. Lower the reference → human; raise it → two cats. "Storybook" must be soaked in human imagery. I'm carrying this one over to a future post.

The details

Here are the details.

The reference-photo mechanism (IPAdapter)

I added a custom node called ComfyUI_IPAdapter_plus to ComfyUI. It lets you hand over a reference image as a "visual guide," separate from the text prompt.

Model used: ip-adapter_sd15 (44.6MB, from h94/IP-Adapter)
Image encoder (the part that reads the image features): CLIP-ViT-H (reused a copy I already had)
The reference photo is cropped to a 768px square before being handed over
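The square crop is simple arithmetic. Here's a minimal sketch, assuming a center crop (I'm guessing here; the crop position isn't specified above). The function name `center_square_box` is made up for illustration.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# Hypothetical input size: a 4032x3024 phone photo.
box = center_square_box(4032, 3024)
print(box)  # (504, 0, 3528, 3024)
```

If Pillow is installed, that box can be passed to `Image.crop(box)` and the result resized with `Image.resize((768, 768))` to get the 768px square.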

A number called the "reference strength (weight)" controls how closely it mimics. I moved between roughly 0.7 and 0.85 depending on the style.
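To find a workable strength for a style, one option is to step through the range and compare the results side by side. A small sketch: the 0.7 to 0.85 range comes from the text above, but the 0.05 step and the helper name `weight_sweep` are assumptions for illustration.

```python
def weight_sweep(lo_pct: int = 70, hi_pct: int = 85, step_pct: int = 5) -> list[float]:
    """List reference strengths from lo to hi (inclusive).

    Arguments are in hundredths so the range is built from integers,
    which avoids floating-point drift while stepping.
    """
    return [p / 100 for p in range(lo_pct, hi_pct + 1, step_pct)]


for w in weight_sweep():
    print(f"reference weight = {w:.2f}")
```

Each printed value would be one generation run with everything else held fixed.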

What I did to suppress the "human" problem

My first attempts used weight 0.7 and prompt words like "key visual" and "big eyes," and those words strongly invite humans. Three fixes:

Raise the reference strength to 0.85
Add human, girl, person, 1girl, humanoid to the "things I don't want drawn" list (the negative prompt)
Strip human-summoning words from the request and emphasize tuxedo cat, full body, animal

That corrected anime, ukiyo-e, and oil painting into cats. One catch: the phrase "tuxedo cat" itself tends to put an actual tuxedo (a suit) on the cat, so it cut both ways.
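The two text-side fixes can be sketched as a small prompt cleaner. This is an illustration under assumptions, not the script actually used: the function name `fix_prompt` and the comma-separated prompt format are made up, while the word lists are taken from the fixes above. The third fix, raising the reference strength to 0.85, is a separate setting and isn't shown.

```python
import re

HUMAN_INVITING = ["key visual", "big eyes"]        # words that invited humans
EMPHASIS = ["tuxedo cat", "full body", "animal"]   # words to emphasize
NEGATIVE = ["human", "girl", "person", "1girl", "humanoid"]


def fix_prompt(prompt: str) -> tuple[str, str]:
    """Return (positive, negative) prompt strings with the text fixes applied."""
    parts = []
    for chunk in prompt.split(","):
        # Remove human-inviting phrases but keep the rest of the chunk.
        for phrase in HUMAN_INVITING:
            chunk = re.sub(re.escape(phrase), "", chunk, flags=re.IGNORECASE)
        chunk = " ".join(chunk.split())
        if chunk:
            parts.append(chunk)
    # Append emphasis terms that are not already present.
    for term in EMPHASIS:
        if not any(term in p.lower() for p in parts):
            parts.append(term)
    return ", ".join(parts), ", ".join(NEGATIVE)


positive, negative = fix_prompt("a tuxedo cat, anime key visual")
print(positive)  # a tuxedo cat, anime, full body, animal
print(negative)  # human, girl, person, 1girl, humanoid
```

Note that the style word ("anime") survives while "key visual" is stripped, which matches the goal of changing the style without summoning a person.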

The base models I used

I switched the underlying image model by style.

Anime / illustration: AnythingV5
Realistic / 3D: Realistic Vision V6
Plain base: SD 1.5 (base)

When storybook failed, switching to the plain base gave a real cat but weak watercolor feel, and raising the strength split it into two cats — a real bind. The base model's "habits" matter a lot.

Common generation settings

Across all styles: 768px, 30 steps, sampler dpmpp_2m with the karras scheduler, cfg 7, seed fixed at 110011. I only varied the text request and the reference strength, keeping everything else equal for a fair comparison. I send each generation job to ComfyUI from a small script I wrote.
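For anyone scripting this themselves, here is a rough sketch of how those shared settings might look in a ComfyUI API-format workflow. It is not the actual script. ComfyUI's KSampler node takes the sampler and the scheduler as separate fields, so "dpmpp_2m karras" is split in two. The node ids, the `denoise` value of 1.0, and the server address (ComfyUI's default local port) are assumptions, and the loader, text-encode, IPAdapter, and save nodes are left out.

```python
import json
import urllib.request

# Shared settings from the post; denoise=1.0 is an assumption (plain text-to-image).
COMMON = {
    "seed": 110011,
    "steps": 30,
    "cfg": 7.0,
    "sampler_name": "dpmpp_2m",
    "scheduler": "karras",
    "denoise": 1.0,
}
SIZE = 768


def sampler_nodes(model_id="1", positive_id="2", negative_id="3"):
    """Build only the EmptyLatentImage and KSampler nodes of a workflow.

    The node ids are hypothetical; they must match the checkpoint loader
    and text-encode nodes in the rest of the workflow.
    """
    return {
        "4": {
            "class_type": "EmptyLatentImage",
            "inputs": {"width": SIZE, "height": SIZE, "batch_size": 1},
        },
        "5": {
            "class_type": "KSampler",
            "inputs": {
                **COMMON,
                "model": [model_id, 0],
                "positive": [positive_id, 0],
                "negative": [negative_id, 0],
                "latent_image": ["4", 0],
            },
        },
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST a complete workflow to ComfyUI's /prompt endpoint."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{server}/prompt", data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


nodes = sampler_nodes()
print(json.dumps(nodes["5"]["inputs"], indent=2))
```

Keeping the shared values in one dictionary means a style comparison only has to swap the prompt text and the reference strength, which is the "everything else equal" idea described above. `queue_prompt` is only defined here, not called, since it needs a running ComfyUI server and a complete workflow.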

Next up

Next time it's cats again — and this time I'm planning video generation 🐱

#100ExperimentsWithDGX #LocalLLM
