What is Stable Diffusion and why it matters
What is Stable Diffusion?
Stable Diffusion is a machine learning model for text-to-image generation, released by Stability AI on August 22, 2022. In short, the model can generate highly detailed, photorealistic images from text descriptions.
The model was released publicly and is open source, so you can experiment with it yourself.
What can Stable Diffusion do?
For now the Stable Diffusion model allows users to:
Generate brand-new, realistic 512x512-pixel images from text in a few seconds
Transform an existing image into a new one through image-to-image translation guided by a text prompt, and upscale images
Use inpainting to regenerate selected regions of an existing image; some front-ends also bundle GFP-GAN, a separate face-restoration model, to repair and upscale faces in the output
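Image-to-image translation is typically done by noising the input image only part of the way and then denoising it under the guidance of the text prompt. Tools often expose this as a "strength" setting (the Hugging Face diffusers img2img pipeline uses that name). The sketch below shows one plausible mapping from strength to the number of denoising steps actually run; the function `img2img_steps` is hypothetical and approximates, rather than reproduces, any particular library's code.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (start_index, steps_to_run) for a hypothetical image-to-image run.

    strength = 1.0 noises the input completely and runs every denoising step
    (equivalent to plain text-to-image); strength near 0.0 barely changes the input.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Number of denoising steps actually run, truncated toward zero.
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    # Skip the noisiest part of the schedule and start part-way through it.
    start_index = num_inference_steps - steps_to_run
    return start_index, steps_to_run


print(img2img_steps(50, 0.75))  # (13, 37)
```

With 50 steps and a strength of 0.75, the sketch skips the first 13 (noisiest) steps and runs the remaining 37, so the output keeps the overall layout of the input while the prompt reshapes the details.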
How does Stable Diffusion work?
Stable Diffusion is a type of diffusion model (DM). Diffusion models were introduced in 2015. They are trained with the objective of removing successive applications of Gaussian noise on training images, and can be thought of as a sequence of denoising autoencoders. There are several variants of DMs; Stable Diffusion is powered by one known as Latent Diffusion, or the Latent Diffusion Model (LDM).
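To make "removing successive applications of Gaussian noise" concrete, here is a minimal pure-Python sketch of the forward (noising) process in the DDPM formulation of diffusion models, where a noisy sample at step t can be drawn in closed form as x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps. The linear noise schedule from 1e-4 to 0.02 over 1,000 steps is an assumption taken from the DDPM paper for illustration; Stable Diffusion's own configuration uses different schedule values. All function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Per-step noise variances beta_1..beta_T, increasing linearly (DDPM-style, assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """a_bar_t = product of (1 - beta_s) for s <= t: how much of the original signal survives."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0: list[float], t: int, alpha_bars: list[float],
              rng: random.Random) -> tuple[list[float], list[float]]:
    """Jump straight to step t: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps, eps ~ N(0, 1)."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt: list[float], eps_pred: list[float], t: int, alpha_bars: list[float]) -> list[float]:
    """Undo the noising given a noise prediction; predicting eps is the trained network's job."""
    a_bar = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar) for x, e in zip(xt, eps_pred)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -0.2, 0.9, 0.0]  # a toy four-"pixel" image
    xt, eps = add_noise(x0, 500, alpha_bars, rng)
    print(f"signal left at the last step: {alpha_bars[-1]:.2e}")
    # With a perfect noise prediction the clean image is recovered (up to float error).
    print(estimate_x0(xt, eps, 500, alpha_bars))
```

By the last step almost none of the original signal remains, which is why generation can start from pure noise. A trained denoiser learns to predict `eps` from `x_t` and `t`; `estimate_x0` shows why such a prediction is enough to step back toward a clean image.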
Latent Diffusion is a breakthrough text-to-image synthesis technique. It was described by AI researchers at the Ludwig Maximilian University of Munich in the paper “High-Resolution Image Synthesis with Latent Diffusion Models”. In short, instead of learning to denoise image data directly in “pixel space”, a latent diffusion model uses an autoencoder to compress images into a lower-dimensional latent space, runs the diffusion process there, and then decodes the result back into an image. This lowers training cost and speeds up inference.
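The savings from working in latent space are easy to quantify. Assuming the autoencoder configuration of the Stable Diffusion v1 release (8x spatial downsampling into a 4-channel latent), a 512x512 RGB image becomes a 4x64x64 latent. The helper `latent_shape` below is hypothetical and only does the shape arithmetic.

```python
import math


def latent_shape(height: int, width: int, downsample: int = 8, channels: int = 4) -> tuple[int, int, int]:
    """Shape (channels, h, w) of the latent an autoencoder with these settings would produce."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return channels, height // downsample, width // downsample


pixel_shape = (3, 512, 512)       # an RGB image in pixel space
latent = latent_shape(512, 512)   # (4, 64, 64) with the assumed defaults
ratio = math.prod(pixel_shape) // math.prod(latent)
print(f"pixels: {math.prod(pixel_shape):,}  latent: {math.prod(latent):,}  ratio: {ratio}x")
# pixels: 786,432  latent: 16,384  ratio: 48x
```

Every denoising step therefore operates on 48 times fewer values than it would in pixel space, and the decoder is run only once at the end to turn the final latent into an image.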
How was Stable Diffusion trained?
Stable Diffusion was trained on massive datasets collected by LAION, a non-profit AI network that receives funding from Stability AI, the company behind the Stable Diffusion model. The training data comprised 120 million image-text pairs drawn from LAION's full collection of nearly 6 billion image-text pairs.
A third-party analysis examined a sample of 12 million of the training images. It found that 47% of the sample came from just 100 domains, with Pinterest alone accounting for 8.5%, followed by sites such as WordPress.com, Blogspot, Flickr, DeviantArt, and Wikimedia Commons.
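For a sense of scale, those percentages can be turned into absolute image counts for the 12 million sampled images. This is plain arithmetic on the figures quoted above, and the results are approximate because the published shares are rounded; the helper name `images_for_share` is hypothetical.

```python
from fractions import Fraction

SAMPLE_SIZE = 12_000_000  # images in the analyzed sample


def images_for_share(percent: str, total: int = SAMPLE_SIZE) -> int:
    """Convert a percentage string such as "8.5" into an image count using exact arithmetic."""
    return int(Fraction(percent) / 100 * total)


top_100_domains = images_for_share("47")
pinterest = images_for_share("8.5")
print(f"top 100 domains: {top_100_domains:,}; Pinterest: {pinterest:,}")
# top 100 domains: 5,640,000; Pinterest: 1,020,000
```

Using `Fraction` avoids floating-point rounding, so a share like 8.5% of 12 million comes out as exactly 1,020,000.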
Is Stable Diffusion being regulated?
There are plenty of ethical, moral, and legal concerns about the misuse of AI tools. Compared with earlier models of its kind, Stable Diffusion permits a wider range of images to be generated; for example, users can generate pictures of real people or existing brand logos. Another concern is that widespread use of image-synthesis software may eventually cause human artists, along with photographers, models, and everyone else involved in creating visual art, to gradually lose commercial viability against AI-based competitors.
In response to these concerns, Stability AI emphasized the importance of “ethical and legal” use of the model in its public release announcement. Key points of the model's license, the CreativeML OpenRAIL-M, include:
Users of the Stable Diffusion model are granted a “perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute” the Model, and any of its Complementary Material, such as its source code, and any Derivatives of the Model
Users are also granted a “perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable” patent license to make, use, sell or offer to sell, import, or otherwise transfer the Model and any of its Complementary Material
Users agree to use the Model or its Derivatives “in an ethical, moral and legal manner and contribute both to the community and discourse around it” and to not use the Model in a way which would cause harm to minors, defame anyone, discriminate against an individual or group or exploit the vulnerabilities of a specific group
The Licensor does not assert any rights in the output users generate using the model
In effect, the Stable Diffusion license relies on users regulating their own actions and on their ability to “do the right thing”. However, the license does not specify any penalty for failing to comply with this social agreement between Stable Diffusion and its users.
Commenting on this, Stability AI's CEO, Emad Mostaque, explained that it is “peoples' responsibility as to whether they are ethical, moral, and legal in how they operate this technology”, and that the freedom given to users can provide an overall net benefit despite the potential negative consequences. Mostaque also argued that image-generating AI systems before Stable Diffusion were closed and controlled by large corporations, whereas the open availability of Stable Diffusion ends corporate control and dominance over such technologies.
But is it considered art?
The US Copyright Office says that it is not, at least not as a work eligible for copyright. In February 2022, the Office’s Review Board rejected an application to register “A Recent Entrance to Paradise”, an artwork generated by an AI algorithm that repurposed pictures to create an image “seen” by a dying synthetic brain.
The Review Board stated that “human authorship is a prerequisite to copyright protection <...> but the Work [“A Recent Entrance to Paradise”] was autonomously created by artificial intelligence without any creative contribution from a human actor”.
Yet AI art is relatively popular. For example, in 2018 the auction house Christie's sold an AI-generated picture of a blurred face, titled “Portrait of Edmond Belamy”, for $432,500. Still, many argue that the only appeal of AI-generated art is the novelty of its being made by a non-human.
Arguing that AI-generated art falls short of being art, the Rutgers AI lab concluded:
“Still, there’s something missing in [AI’s] artistic process: The algorithm might create appealing images, but it lives in an isolated creative space that lacks social context. Human artists, on the other hand, are inspired by people, places, and politics. They create art to tell stories and make sense of the world”.