Style transfer sounds simple in theory. You take an image, apply an artistic style, done. Except when your input is a photo of someone's actual pet, "done" is the beginning of the problem.
I've been building Pet Imagination, an AI pet art generator with 9 artistic styles. The pitch is straightforward: upload a photo of your dog, cat, bird, whatever, pick a style like Watercolor or Renaissance, and get back a portrait that actually looks like your pet. Not a generic cute animal wearing a costume.
Turns out that last part is really, really hard.
The Identity Problem
Most style transfer approaches treat the content image as a loose suggestion. You feed in a photo of a French Bulldog with a distinctive underbite and brindle pattern, and out comes... a vaguely dog-shaped blob in the target style. The "artistic interpretation" ate everything that made the dog recognizable.
This is fine for landscapes. Nobody cares if the mountains shift a bit. But pet owners notice everything. The ear shape is wrong? They'll tell you. The markings switched sides? Absolutely not. The expression changed from "skeptical" to "happy"? That's not their dog anymore.
We went through dozens of iterations trying to solve this. Early versions were basically unusable. The AI would latch onto the style so hard that breed-specific features just vanished. A Dalmatian without spots. A Husky with floppy ears. A Siamese cat that came out tabby.
Why Renaissance Works Better Than Anime
Here's something I didn't expect: some art styles are inherently better at preserving identity than others.
Renaissance style turned out to be one of the most reliable. It makes sense when you think about it. Renaissance portraiture was obsessed with capturing the exact likeness of the subject. The style itself demands accurate proportions, realistic fur textures, and faithful color reproduction. The AI just needs to add dramatic lighting, a dark background, and maybe a lace collar. The subject stays intact.
Anime, on the other hand, was a nightmare. The anime aesthetic actively fights against individual features. It wants to simplify. Round the eyes, smooth the fur, standardize the proportions. Every cat wants to become the same cat. We had to significantly constrain the style influence to keep breed characteristics visible, and even then it's the style most likely to drift.
Watercolor sits in a nice middle ground. The loose brushwork actually helps because it doesn't need pixel-perfect accuracy, but the composition stays true to the original. Sketch works similarly. The abstraction is in the rendering technique, not in the subject's features.
The costume styles (Sheriff, Wizard, Astronaut) introduced a completely different challenge. You're adding elements that don't exist in the source photo. A sheriff hat, a wizard robe, a space helmet. The AI has to figure out where the pet ends and the costume begins, and it has to do that without obscuring the face. We found that any costume element overlapping with the pet's face would tank recognition. So the generation pipeline specifically protects the facial region.
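To make "protects the facial region" a bit more concrete, here is a minimal sketch of one way to do it: build a per-pixel style-strength map that stays low inside a face bounding box and ramps back up to the normal strength outside it. This is an illustration under assumptions, not the actual Pet Imagination pipeline. The function name, the `face_box` input (which would come from some face or muzzle detector), the linear ramp, and all the numbers are hypothetical.

```python
def style_strength_map(width, height, face_box, base_strength, face_strength, feather):
    """Return a height x width grid of per-pixel style strengths.

    face_box is (left, top, right, bottom) in inclusive pixel coordinates.
    Pixels inside the box get face_strength. Outside, the strength ramps
    linearly to base_strength over `feather` pixels.
    """
    left, top, right, bottom = face_box
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Per-axis distance from the pixel to the box (0 when inside).
            dx = max(left - x, 0, x - right)
            dy = max(top - y, 0, y - bottom)
            dist = max(dx, dy)
            if feather > 0:
                t = min(dist / feather, 1.0)
            else:
                # No feathering: hard edge at the box boundary.
                t = 0.0 if dist == 0 else 1.0
            row.append(face_strength + t * (base_strength - face_strength))
        grid.append(row)
    return grid


# Hypothetical values: full style strength everywhere except a protected face box.
strengths = style_strength_map(
    width=10, height=10, face_box=(4, 4, 5, 5),
    base_strength=1.0, face_strength=0.0, feather=2,
)
```

The feathered edge matters because a hard boundary tends to leave a visible seam where the low-strength face meets the fully stylized costume or background.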
The 4K Upscale Problem
Generation at base resolution is one thing. Pushing to 4K for print-quality output is another game entirely.
The naive approach is generate-then-upscale. Run the style transfer at a manageable resolution, then use a super-resolution model to blow it up to 4K. Problem: most super-resolution models are trained largely on photos, not art. They try to "fix" the artistic style by sharpening brushstrokes into photo-realistic textures. Your watercolor portrait suddenly has crispy fur detail that breaks the whole aesthetic.
We ended up needing style-aware upscaling that respects the artistic treatment. Watercolor gets soft upscaling that preserves the bleed edges. Sketch keeps the pencil texture without adding phantom detail. Renaissance gets the most aggressive sharpening because it wants that level of detail.
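As a rough illustration of what "style-aware" can mean in code, the sketch below keeps a small table of upscaling settings per style and looks them up before upscaling. Everything here is an assumption for the sake of the example: the `UpscaleProfile` fields, the numeric values, the fallback profile, and the choice of 3840 pixels as the "4K" long edge. Only the ordering (Watercolor softest, Renaissance sharpest) comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpscaleProfile:
    sharpen: float         # 0.0 = none, 1.0 = strongest (hypothetical scale)
    keep_soft_edges: bool  # True = avoid hardening bleed edges


# Hypothetical settings; only the relative ordering reflects the article.
PROFILES = {
    "watercolor": UpscaleProfile(sharpen=0.1, keep_soft_edges=True),
    "sketch": UpscaleProfile(sharpen=0.3, keep_soft_edges=False),
    "renaissance": UpscaleProfile(sharpen=0.8, keep_soft_edges=False),
}

# Conservative default for styles without a tuned profile.
FALLBACK = UpscaleProfile(sharpen=0.3, keep_soft_edges=True)


def profile_for(style):
    """Look up the upscale profile for a style name, case-insensitively."""
    return PROFILES.get(style.lower(), FALLBACK)


def scale_factor(width, height, target_long_edge=3840):
    """How much to enlarge an image so its longer side reaches the target."""
    return target_long_edge / max(width, height)


profile = profile_for("Watercolor")
factor = scale_factor(1024, 768)
```

Keeping the settings in data rather than in branching code makes it easier to tune one style without touching the others.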
Processing time was another constraint. Pet owners aren't patient. They uploaded a photo for fun, probably on their phone, probably with three other tabs open. If it takes more than 60 seconds, they're gone. Getting 4K output within that window required some aggressive optimization on the pipeline side.
What Actually Ships
The live product at petimagination.com does 9 styles: Watercolor, Renaissance, Anime, Sketch, Sheriff, Wizard, Astronaut, Final Boss, and Blocky. Each one was individually tuned. There's no universal "style strength" slider because the right balance is different for every style.
It accepts any pet species. Dogs and cats are the majority of uploads, obviously, but we've had birds, rabbits, reptiles, hamsters. The identity preservation challenge gets harder the more alike individual animals of the same kind look. A Golden Retriever is harder than a parrot because Golden Retrievers look more similar to each other than parrots do.
No account required. No watermarks. Print-ready output with optional 4K upscale. The whole thing runs in the browser, upload to download in under 60 seconds.
Lessons for Anyone Doing Style Transfer
If you're working on something similar, a few things I'd pass along:
Test with breed pairs, not random images. Get two photos of different dogs of the same breed and verify the output looks different. If your style transfer makes two different Golden Retrievers look identical, your identity preservation is broken.
Protect the face region explicitly. Whatever else you do with style strength, dial it back around the eyes and muzzle. That's where recognition lives.
Different styles need different pipelines. A single model with a style parameter will always compromise. Renaissance and Anime are so fundamentally different in what they preserve vs. abstract that treating them the same way guarantees mediocre results for both.
Upscaling is not an afterthought. If you're promising print quality, your upscaling strategy needs to be style-aware from day one. Bolting it on later creates artifacts that are obvious at print size.
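The breed-pair test from the first lesson can be automated along these lines. This is a minimal sketch that assumes you already have some image embedding model producing a feature vector per image. That model is not shown, and the function names, the cosine-similarity metric, and the `margin` threshold are all hypothetical choices. The check passes only if each stylized output is closer to its own source photo than to the other pet's photo.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def breed_pair_check(in_a, in_b, out_a, out_b, margin=0.05):
    """Return True if each output resembles its own input more than the other's.

    in_a, in_b: embeddings of two source photos (same breed, different pets).
    out_a, out_b: embeddings of the corresponding stylized outputs.
    margin: hypothetical minimum gap between own-input and other-input similarity.
    """
    gap_a = cosine_similarity(in_a, out_a) - cosine_similarity(in_b, out_a)
    gap_b = cosine_similarity(in_b, out_b) - cosine_similarity(in_a, out_b)
    return gap_a >= margin and gap_b >= margin


# Toy vectors standing in for real embeddings.
photo_a, photo_b = [1.0, 0.0, 0.2], [0.0, 1.0, 0.2]
preserved = breed_pair_check(photo_a, photo_b, [0.9, 0.1, 0.2], [0.1, 0.9, 0.2])
collapsed = breed_pair_check(photo_a, photo_b, [0.5, 0.5, 0.2], [0.5, 0.5, 0.2])
```

With the toy vectors, `preserved` is `True` and `collapsed` is `False`: when both outputs converge on the same generic dog, neither is closer to its own source, which is exactly the failure the breed-pair test is meant to catch.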
This is one of ~14 products I'm building at Inithouse. We're a small team running lean experiments across different niches. Some of our other projects: Magical Song for AI-generated custom songs, Be Recommended for checking your AI visibility score, and Watching Agents, a prediction platform where AI agents track questions about the future.
If you're building something with generative AI and hitting similar identity-preservation challenges, I'd love to hear how you're solving them.