This is a simplified guide to an AI model called PuLID, maintained by zsxkib.
Model overview
PuLID is a text-to-image model developed by researchers at ByteDance Inc. Like other diffusion-based models such as Stable Diffusion and SDXL-Lightning, PuLID generates high-quality, customized images from textual prompts; it additionally uses contrastive alignment techniques during training. Unlike traditional text-to-image models, PuLID focuses on identity customization, allowing fine-grained control over the appearance of generated faces and portraits.
Model inputs and outputs
PuLID takes in a textual prompt, as well as one or more reference images of a person's face. The model then generates a set of new images that match the provided prompt while retaining the identity and appearance of the reference face(s).
Inputs
- Prompt: A text description of the desired image, such as "portrait, color, cinematic, in garden, soft light, detailed face"
- Seed: An optional integer value to control the randomness of the generated images
- CFG Scale: The classifier-free guidance scale, which controls how strongly the textual prompt influences the generated image
- Num Steps: The number of iterative refinement steps to perform during image generation
- Image Size: The desired width and height of the output images
- Num Samples: The number of unique images to generate
- Identity Scale: A scaling factor that controls the influence of the reference face(s) on the generated images
- Mix Identities: A boolean flag to enable mixing of multiple reference face images
- Main Face Image: The primary reference face image
- Auxiliary Face Image(s): Additional reference face images (up to 3) to be used for identity mixing
Outputs
- Images: A set of generated images that match the provided prompt and retain the identity and appearance of the reference face(s)
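The inputs above can be assembled into a prediction request. The sketch below shows one plausible way to build the payload and invoke the model through the Replicate Python client; the exact field names and the model version string are assumptions based on the list above, so check the model page before relying on them.

```python
# Minimal sketch of preparing inputs for PuLID as hosted on Replicate.
# Field names mirror the inputs listed above but are assumptions;
# default values below are illustrative, not the model's documented defaults.

def build_pulid_input(prompt, main_face_image, seed=None,
                      cfg_scale=1.2, num_steps=4,
                      image_width=768, image_height=1024,
                      num_samples=1, identity_scale=0.8,
                      mix_identities=False, auxiliary_face_images=None):
    """Assemble the input payload for a PuLID prediction."""
    payload = {
        "prompt": prompt,
        "main_face_image": main_face_image,   # primary reference face
        "cfg_scale": cfg_scale,               # prompt-adherence strength
        "num_steps": num_steps,               # iterative refinement steps
        "image_width": image_width,
        "image_height": image_height,
        "num_samples": num_samples,           # number of unique images
        "identity_scale": identity_scale,     # reference-face influence
        "mix_identities": mix_identities,     # blend multiple reference faces
    }
    if seed is not None:
        payload["seed"] = seed                # fixed seed for reproducibility
    for i, aux in enumerate(auxiliary_face_images or [], start=1):
        payload[f"auxiliary_face_image{i}"] = aux  # up to 3 supported
    return payload

# Usage (requires the `replicate` package and a REPLICATE_API_TOKEN):
#   import replicate
#   images = replicate.run(
#       "zsxkib/pulid:<version>",  # hypothetical version tag
#       input=build_pulid_input(
#           "portrait, color, cinematic, in garden, soft light, detailed face",
#           open("face.jpg", "rb"),
#       ),
#   )
```

The helper only constructs the request; the actual call is shown commented out because it needs network access and an API token.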
Capabilities
PuLID excels at generating high-quality...