<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: nabata</title>
    <description>The latest articles on DEV Community by nabata (@nabata).</description>
    <link>https://dev.to/nabata</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2138198%2F052fa389-9016-46f5-9850-03b685dc51e2.png</url>
      <title>DEV Community: nabata</title>
      <link>https://dev.to/nabata</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nabata"/>
    <language>en</language>
    <item>
      <title>Replacing Only the Background of an Image with AI Generation Using the Stable Diffusion Web API</title>
      <dc:creator>nabata</dc:creator>
      <pubDate>Fri, 08 Nov 2024 15:09:45 +0000</pubDate>
      <link>https://dev.to/nabata/replacing-only-the-background-of-an-image-with-ai-generation-using-the-stable-diffusion-web-api-2oia</link>
      <guid>https://dev.to/nabata/replacing-only-the-background-of-an-image-with-ai-generation-using-the-stable-diffusion-web-api-2oia</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This guide demonstrates how to replace the background of an image using Python code only, without relying on image editing software like Photoshop. The goal is to keep the subject intact while swapping in an AI-generated background.&lt;/p&gt;

&lt;p&gt;While this approach may not be revolutionary, it addresses a common need, so I hope it will be helpful for those with similar requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Input and Output Images
&lt;/h2&gt;

&lt;p&gt;Let’s start with the results.&lt;/p&gt;

&lt;p&gt;The following output image was generated from the input image shown below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Input Image
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymhmxukn1yl0ujcm7lvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymhmxukn1yl0ujcm7lvb.png" alt="input" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Output Image
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4u8hf6kbqzgbaa92163g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4u8hf6kbqzgbaa92163g.png" alt="output" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Libraries
&lt;/h2&gt;

&lt;p&gt;Install &lt;a href="https://pypi.org/project/requests/" rel="noopener noreferrer"&gt;requests&lt;/a&gt; to handle the API calls.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I verified the version as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip list | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; requests
&lt;span class="go"&gt;requests                  2.31.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  API Key
&lt;/h2&gt;

&lt;p&gt;For background generation, we’ll use &lt;a href="https://ja.stability.ai/" rel="noopener noreferrer"&gt;Stability AI’s&lt;/a&gt; Web API.&lt;/p&gt;

&lt;p&gt;To access this API, you’ll need to obtain an API Key from their &lt;a href="https://platform.stability.ai/" rel="noopener noreferrer"&gt;Developer Platform&lt;/a&gt;. For pricing, refer to the &lt;a href="https://platform.stability.ai/pricing" rel="noopener noreferrer"&gt;Pricing&lt;/a&gt; page.&lt;/p&gt;

&lt;p&gt;To keep your key secure, save it as an environment variable rather than hardcoding it in your code.&lt;/p&gt;

&lt;p&gt;In my environment, I store it in the &lt;code&gt;~/.zshrc&lt;/code&gt; configuration file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;open ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I saved the key under the name &lt;strong&gt;STABILITY_API_KEY&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;export STABILITY_API_KEY=your_api_key_here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;Here, we use the &lt;a href="https://platform.stability.ai/docs/api-reference#tag/Edit/paths/~1v2beta~1stable-image~1edit~1remove-background/post" rel="noopener noreferrer"&gt;Remove Background API&lt;/a&gt; to isolate the subject. We then pass the extracted image to the &lt;a href="https://platform.stability.ai/docs/api-reference#tag/Edit/paths/~1v2beta~1stable-image~1edit~1inpaint/post" rel="noopener noreferrer"&gt;Inpaint API&lt;/a&gt; to create the new background.&lt;/p&gt;

&lt;p&gt;The prompt used is "&lt;strong&gt;Large glass windows with a view of the metropolis behind&lt;/strong&gt;".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# File paths
&lt;/span&gt;&lt;span class="n"&gt;input_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./input.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Original image
&lt;/span&gt;&lt;span class="n"&gt;mask_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./mask.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Mask image (temporarily generated)
&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./output.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Output image
&lt;/span&gt;
&lt;span class="c1"&gt;# Check for API Key
&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STABILITY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing Stability API key.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Call Remove Background API
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.stability.ai/v2beta/stable-image/edit/remove-background&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save mask image
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;

&lt;span class="c1"&gt;# Call Inpaint API
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.stability.ai/v2beta/stable-image/edit/inpaint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Large glass windows with a view of the metropolis behind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grow_mask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Disable blurring around the mask
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Delete mask image
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save output image
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using rembg
&lt;/h2&gt;

&lt;p&gt;Another approach to background removal is &lt;a href="https://github.com/danielgatis/rembg" rel="noopener noreferrer"&gt;rembg&lt;/a&gt;. It requires only one paid API call instead of two, making it more cost-effective, though its extraction accuracy may differ from that of the Remove Background API.&lt;/p&gt;

&lt;p&gt;First, install &lt;code&gt;rembg&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;rembg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I verified the version as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip list | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; rembg
&lt;span class="go"&gt;rembg                     2.0.59
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s the code for this approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rembg&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;remove&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# File paths
&lt;/span&gt;&lt;span class="n"&gt;input_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./input.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Input image path
&lt;/span&gt;&lt;span class="n"&gt;mask_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./mask.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Mask image path (temporarily generated)
&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./output.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Output image path
&lt;/span&gt;
&lt;span class="c1"&gt;# Generate mask image with background removed
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;input_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;mask_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check for API Key
&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STABILITY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing Stability API key.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Call Inpaint API
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.stability.ai/v2beta/stable-image/edit/inpaint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Large glass windows with a view of the metropolis behind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grow_mask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Delete mask image
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save output image
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s the output image. In this case, the accuracy of the extraction seems satisfactory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34cqijqkuqzckivlv25w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34cqijqkuqzckivlv25w.png" alt="output_2" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you set up a local Stable Diffusion environment, you can eliminate API call costs, so feel free to explore that option if it suits your needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Being able to achieve this through code alone is highly convenient.&lt;/p&gt;

&lt;p&gt;It’s exciting to witness the ongoing improvements in workflow efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Original Japanese Article
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qiita.com/nabata/items/6f39d2a56fb4bce02a96" rel="noopener noreferrer"&gt;Stable DiffusionのWeb APIを用いて画像上の人物はそのままに背景だけをAI生成で入れ替えてみた&lt;/a&gt;&lt;/p&gt;

</description>
      <category>stablediffusion</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>Comparing Prompt Accuracy Across Various Image Generation AIs (Stable Diffusion 3.5, FLUX1.1, Imagen 3, DALL·E 3, Adobe Firefly)</title>
      <dc:creator>nabata</dc:creator>
      <pubDate>Sat, 02 Nov 2024 03:21:14 +0000</pubDate>
      <link>https://dev.to/nabata/comparing-prompt-accuracy-across-various-image-generation-ais-stable-diffusion-35-flux11-imagen-3-dalle-3-adobe-firefly-1jlp</link>
      <guid>https://dev.to/nabata/comparing-prompt-accuracy-across-various-image-generation-ais-stable-diffusion-35-flux11-imagen-3-dalle-3-adobe-firefly-1jlp</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recently, &lt;a href="https://stability.ai/" rel="noopener noreferrer"&gt;Stability AI&lt;/a&gt; introduced &lt;strong&gt;Stable Diffusion 3.5&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Today we are introducing Stable Diffusion 3.5. This open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo. Additionally, Stable Diffusion 3.5 Medium will be released on October 29th.  &lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://stability.ai/news/introducing-stable-diffusion-3-5" rel="noopener noreferrer"&gt;Stable Diffusion 3.5 — Stability AI&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The release announcement also states, "&lt;strong&gt;Additionally, our analysis shows that Stable Diffusion 3.5 Large leads the market in prompt adherence and rivals much larger models in image quality&lt;/strong&gt;." This article puts that claim to the test by comparing prompt adherence across several popular image generation models.&lt;/p&gt;

&lt;p&gt;Please note that this evaluation is subjective; treat it as a rough reference for how these models handle straightforward prompts, which may not always yield ideal results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Image Generation AIs Used in This Comparison
&lt;/h2&gt;

&lt;p&gt;The models tested include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ja.stability.ai/blog/introducing-stable-diffusion-3-5" rel="noopener noreferrer"&gt;Stable Diffusion 3.5 Large&lt;/a&gt; by &lt;a href="https://stability.ai/" rel="noopener noreferrer"&gt;Stability AI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blackforestlabs.ai/announcing-flux-1-1-pro-and-the-bfl-api/" rel="noopener noreferrer"&gt;FLUX1.1 [pro]&lt;/a&gt; by &lt;a href="https://blackforestlabs.ai/" rel="noopener noreferrer"&gt;Black Forest Labs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://deepmind.google/technologies/imagen-3/" rel="noopener noreferrer"&gt;Imagen 3&lt;/a&gt; by &lt;a href="https://www.google.com/" rel="noopener noreferrer"&gt;Google&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openai.com/index/dall-e-3/" rel="noopener noreferrer"&gt;DALL·E 3&lt;/a&gt; by &lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.adobe.com/jp/products/firefly.html" rel="noopener noreferrer"&gt;Adobe Firefly&lt;/a&gt; by &lt;a href="https://www.adobe.com/" rel="noopener noreferrer"&gt;Adobe&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each model was tested once per prompt. If multiple images were generated simultaneously, I selected the "top-left" result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stable Diffusion 3.5 Large&lt;/strong&gt; and &lt;strong&gt;FLUX1.1 [pro]&lt;/strong&gt; images were generated through their Web APIs, while the others were created directly in the browser: &lt;strong&gt;Imagen 3&lt;/strong&gt; was accessed through ImageFX and &lt;strong&gt;DALL·E 3&lt;/strong&gt; through ChatGPT.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;Firefly&lt;/strong&gt;, I used &lt;strong&gt;Firefly Image 3&lt;/strong&gt; with &lt;strong&gt;Fast mode&lt;/strong&gt; turned off, then upscaled the images after generation. As a result, &lt;strong&gt;Firefly&lt;/strong&gt; images are 2048x2048, while all other images are 1024x1024.&lt;/p&gt;

&lt;p&gt;The code used to generate &lt;strong&gt;FLUX1.1 [pro]&lt;/strong&gt; images is adapted from the article "&lt;a href="https://dev.to/nabata/using-the-web-api-for-flux-11-pro-the-latest-image-generation-ai-model-by-the-original-team-of-stable-diffusion-29pi"&gt;Using the Web API for FLUX 1.1 [pro]: The Latest Image Generation AI Model by the Original Team of Stable Diffusion&lt;/a&gt;" with the size updated to 1024x1024.&lt;/p&gt;
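&lt;p&gt;For reference, the FLUX generation flow is a submit-then-poll pattern rather than a single request. The sketch below is a minimal version of that flow, assuming the endpoints and response fields described in the linked article; the &lt;strong&gt;BFL_API_KEY&lt;/strong&gt; variable name is my own choice, and the API details may have changed since.&lt;/p&gt;

```python
import os
import time


def build_payload(prompt: str, width: int = 1024, height: int = 1024) -> dict:
    """Request body for a FLUX1.1 [pro] generation (field names assumed)."""
    return {"prompt": prompt, "width": width, "height": height}


def generate_flux(prompt: str) -> bytes:
    """Submit a generation task, poll until ready, and return the image bytes."""
    import requests  # same HTTP library as the Stability example below

    api_key = os.getenv("BFL_API_KEY")  # assumed environment variable name
    if api_key is None:
        raise Exception("Missing Black Forest Labs API key.")

    # Submit the generation task
    task = requests.post(
        "https://api.bfl.ml/v1/flux-pro-1.1",
        headers={"x-key": api_key, "Content-Type": "application/json"},
        json=build_payload(prompt),
    ).json()

    # Poll until the result is ready, then download the image
    while True:
        result = requests.get(
            "https://api.bfl.ml/v1/get_result",
            headers={"x-key": api_key},
            params={"id": task["id"]},
        ).json()
        if result["status"] == "Ready":
            return requests.get(result["result"]["sample"]).content
        time.sleep(1)
```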

&lt;p&gt;Below is the code used for &lt;strong&gt;Stable Diffusion 3.5 Large&lt;/strong&gt;. The &lt;strong&gt;STABILITY_API_KEY&lt;/strong&gt; environment variable stores the API key. For more details, see the &lt;a href="https://platform.stability.ai/docs/api-reference" rel="noopener noreferrer"&gt;API Reference&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;api_host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;API_HOST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://api.stability.ai&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STABILITY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe the prompt here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Ensure API Key is available
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing Stability API key.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# API call
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_host&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v2beta/stable-image/generate/sd3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sd3.5-large&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save image with timestamped filename
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
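&lt;p&gt;One caveat when running several comparison prompts in a loop: the &lt;strong&gt;int(time.time())&lt;/strong&gt; filename above can collide if two images finish within the same second. A minimal sketch that wraps the same call in a reusable function and uses a nanosecond timestamp instead (the helper names here are my own, not part of the API):&lt;/p&gt;

```python
import os
import time


def output_filename() -> str:
    """Timestamped PNG name; nanoseconds avoid same-second collisions."""
    return f"./{time.time_ns()}.png"


def generate_sd35(prompt: str) -> bytes:
    """Wraps the Stability API call shown above and returns the PNG bytes."""
    import requests

    api_key = os.getenv("STABILITY_API_KEY")
    if api_key is None:
        raise Exception("Missing Stability API key.")
    response = requests.post(
        "https://api.stability.ai/v2beta/stable-image/generate/sd3",
        headers={"Accept": "image/*", "Authorization": f"Bearer {api_key}"},
        files={"none": ""},
        data={"prompt": prompt, "output_format": "png", "model": "sd3.5-large"},
    )
    if response.status_code != 200:
        raise Exception(str(response.json()))
    return response.content


def run_comparison(prompts: list) -> None:
    """Generate one image per prompt, each saved under a unique filename."""
    for prompt in prompts:
        with open(output_filename(), "wb") as f:
            f.write(generate_sd35(prompt))
```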



&lt;p&gt;Now, let’s dive into the results.&lt;/p&gt;

&lt;h2&gt;
  
  
  No.1 - A Single Banana
&lt;/h2&gt;

&lt;p&gt;One known issue in AI image generation is &lt;strong&gt;The Lone Banana Problem&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The bias to two bananas in a picture is, I believe, an example of a subtle bias (OK, it’s not that subtle, but it is more subtle than many of the more concerning news-grabbing biases that we regularly read about). A naïve explanation may be that in the training dataset there have been many pictures of bananas added to Midjourney’s database that have been labelled “banana” but not labelled “two bananas”. It may also be that Midjourney has never seen an individual banana, so it doesn’t know that a single banana is possible.&lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://www.digital-science.com/tldr/article/the-lone-banana-problem-or-the-new-programming-speaking-ai/" rel="noopener noreferrer"&gt;The Lone Banana Problem. Or, the new programming: “speaking” AI - TL;DR - Digital Science&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A similar phenomenon, known as &lt;a href="https://community.openai.com/t/incorrect-count-of-r-characters-in-the-word-strawberry/829618/5" rel="noopener noreferrer"&gt;The Strawberry Problem&lt;/a&gt;, has also recently become a topic of interest.&lt;/p&gt;

&lt;p&gt;To see how each model addresses this issue, I started with the following prompt:&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt
&lt;/h3&gt;

&lt;p&gt;There is a single banana on the table. There is a single banana hanging from the ceiling. There is a single banana placed on the chair. There is a man with a single banana on his head. There is a woman washing a single banana.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stable Diffusion 3.5 Large
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh0zhruxo0t283kvwhy3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh0zhruxo0t283kvwhy3.png" alt="1_sd35" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FLUX1.1 [pro]
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcmevp7l093odeibjbor.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcmevp7l093odeibjbor.jpg" alt="1_flux" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Imagen 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz65udtyf6xwebgaorlli.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz65udtyf6xwebgaorlli.jpg" alt="1_fx" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nxsokcv3mo1ib23ifg1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nxsokcv3mo1ib23ifg1.png" alt="1_dalle" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Adobe Firefly
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fab2nih4dj98i0lpb15vm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fab2nih4dj98i0lpb15vm.jpg" alt="1_ff" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comments
&lt;/h3&gt;

&lt;p&gt;Unfortunately, none of the models performed well on this prompt.&lt;/p&gt;

&lt;p&gt;It’s possible the prompt was too complex. My apologies.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SD 3.5 Large&lt;/th&gt;
&lt;th&gt;FLUX1.1 [pro]&lt;/th&gt;
&lt;th&gt;Imagen 3&lt;/th&gt;
&lt;th&gt;DALL·E 3&lt;/th&gt;
&lt;th&gt;Firefly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  No.2 - Retrying the Single Banana
&lt;/h2&gt;

&lt;p&gt;I simplified the initial prompt and tried again to see if reducing complexity would improve results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt
&lt;/h3&gt;

&lt;p&gt;A single banana placed in the center of a white background. The banana should be ripe, with a bright yellow peel and a few brown spots, indicating its ripeness. The shape of the banana should be curved in a natural way, and it should be clearly identifiable as one piece of fruit without any additional objects or bananas in the image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stable Diffusion 3.5 Large
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsgiifk7490ynxxpunjb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsgiifk7490ynxxpunjb.png" alt="2_sd35" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FLUX1.1 [pro]
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2h4eln3pwzjynl1lt164.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2h4eln3pwzjynl1lt164.jpg" alt="2_flux" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Imagen 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnj1hwtq39fln3dhv2nn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnj1hwtq39fln3dhv2nn.jpg" alt="2_fx" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 3
&lt;/h3&gt;

&lt;p&gt;The following error prevented image generation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I couldn't generate the requested image because it didn't align with the content policy. If you have another idea or request, feel free to share, and I'll do my best to create it!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Adobe Firefly
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkvyaztqizi771lgf4fy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkvyaztqizi771lgf4fy.jpg" alt="2_ff" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comments
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stable Diffusion 3.5 Large&lt;/strong&gt; again produced an unusual result, just as in the previous attempt, suggesting it struggles even with this simpler prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Imagen 3&lt;/strong&gt; generated a banana that appears slightly under-ripe, and &lt;strong&gt;Firefly&lt;/strong&gt;’s result has a subtle unnatural quality. However, both images reasonably reflect the prompt’s intent.&lt;/p&gt;

&lt;p&gt;It’s unclear what aspect of the prompt conflicted with &lt;strong&gt;DALL·E 3&lt;/strong&gt;’s content policy.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SD 3.5 Large&lt;/th&gt;
&lt;th&gt;FLUX1.1 [pro]&lt;/th&gt;
&lt;th&gt;Imagen 3&lt;/th&gt;
&lt;th&gt;DALL·E 3&lt;/th&gt;
&lt;th&gt;Firefly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  No.3 - Space Battle
&lt;/h2&gt;

&lt;p&gt;To explore each AI's handling of more fantastical themes, I introduced a space battle scenario.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt
&lt;/h3&gt;

&lt;p&gt;A large-scale space battle between two fleets of futuristic spaceships. Lasers and missiles are being fired, with explosions happening in the background. The scene takes place in deep space, with a distant galaxy visible in the background and some debris floating nearby.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stable Diffusion 3.5 Large
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foaawscik5ctl1muhsjuu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foaawscik5ctl1muhsjuu.png" alt="3_sd35" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FLUX1.1 [pro]
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zpzagpduvmdqrsv3cx6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zpzagpduvmdqrsv3cx6.jpg" alt="3_flux" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Imagen 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojx8insw7f5meya263m1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojx8insw7f5meya263m1.jpg" alt="3_fx" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2o5zwkksv2hbqt6ly6zk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2o5zwkksv2hbqt6ly6zk.png" alt="3_dalle" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Adobe Firefly
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgn9rmxti00qz5bnij4sw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgn9rmxti00qz5bnij4sw.jpg" alt="3_ff" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comments
&lt;/h3&gt;

&lt;p&gt;Direct comparisons were challenging due to varying interpretations by each AI. It’s difficult to identify both lasers and missiles in every image, and some results lack a strong sense of combat.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SD 3.5 Large&lt;/th&gt;
&lt;th&gt;FLUX1.1 [pro]&lt;/th&gt;
&lt;th&gt;Imagen 3&lt;/th&gt;
&lt;th&gt;DALL·E 3&lt;/th&gt;
&lt;th&gt;Firefly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  No.4 - Steampunk Invention
&lt;/h2&gt;

&lt;p&gt;Next, I tried a prompt centered on the steampunk genre to see how well each AI captures this distinct aesthetic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Steampunk is a subgenre of science fiction that incorporates retrofuturistic technology and aesthetics inspired by, but not limited to, 19th-century industrial steam-powered machinery.&lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://en.wikipedia.org/wiki/Steampunk" rel="noopener noreferrer"&gt;Steampunk - Wikipedia&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Prompt
&lt;/h3&gt;

&lt;p&gt;An intricate steampunk device on a workbench, made of brass, gears, and glass tubes. The device is emitting a faint steam cloud, with tiny dials and gauges displaying various readings. Nearby, a pair of leather gloves and a set of old blueprints are scattered on the wooden table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stable Diffusion 3.5 Large
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgkj3qyq0kk7vk01ftr4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgkj3qyq0kk7vk01ftr4.png" alt="4_sd35" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FLUX1.1 [pro]
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu6fr5k3ipi7darkejg1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu6fr5k3ipi7darkejg1.jpg" alt="4_flux" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Imagen 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F416xb7d6sle9e5jhb6rn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F416xb7d6sle9e5jhb6rn.jpg" alt="4_fx" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjwbvoeqn1z3pmchi3qb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjwbvoeqn1z3pmchi3qb.png" alt="4_de" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Adobe Firefly
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fja4hjuls1q8axofqvm7s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fja4hjuls1q8axofqvm7s.jpg" alt="4_ff" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comments
&lt;/h3&gt;

&lt;p&gt;Most images represented the prompt well, though &lt;strong&gt;DALL·E 3&lt;/strong&gt; missed the scattered blueprints, and the gloves did not appear as a pair.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firefly&lt;/strong&gt; also did not include leather gloves.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SD 3.5 Large&lt;/th&gt;
&lt;th&gt;FLUX1.1 [pro]&lt;/th&gt;
&lt;th&gt;Imagen 3&lt;/th&gt;
&lt;th&gt;DALL·E 3&lt;/th&gt;
&lt;th&gt;Firefly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  No.5 - Chibi-Style Character
&lt;/h2&gt;

&lt;p&gt;For this test, I prompted each AI to generate a distinctive character in a chibi style.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt
&lt;/h3&gt;

&lt;p&gt;A chibi-style character of a smiling young girl with big eyes, short pink hair, and a school uniform. She is holding a small cat in her arms, standing on a grassy hill under a bright blue sky with fluffy clouds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stable Diffusion 3.5 Large
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj13vbytxlm70ebzzvcdj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj13vbytxlm70ebzzvcdj.png" alt="5_sd35" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FLUX1.1 [pro]
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv15801nzlv3eq02nu90r.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv15801nzlv3eq02nu90r.jpg" alt="5_flux" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Imagen 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm244oivtlxb8x0350hj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm244oivtlxb8x0350hj.jpg" alt="5_fx" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufagpeqo5pbmhtz04y9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufagpeqo5pbmhtz04y9y.png" alt="5_de" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Adobe Firefly
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlp2j58as7naw03vyh77.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlp2j58as7naw03vyh77.jpg" alt="5_ff" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comments
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Imagen 3&lt;/strong&gt; did not meet the prompt specification for pink hair, and &lt;strong&gt;DALL·E 3&lt;/strong&gt; omitted the cat the girl was supposed to be holding.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Firefly&lt;/strong&gt;, the character was given cat ears, and both the cat and the girl’s hands are somewhat awkwardly rendered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stable Diffusion 3.5 Large&lt;/strong&gt; mostly captures the prompt details, though some aspects, like the cat’s body shape, appear slightly unnatural, so I rated it ~.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SD 3.5 Large&lt;/th&gt;
&lt;th&gt;FLUX1.1 [pro]&lt;/th&gt;
&lt;th&gt;Imagen 3&lt;/th&gt;
&lt;th&gt;DALL·E 3&lt;/th&gt;
&lt;th&gt;Firefly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  No.6 - Colorful Coral Reef
&lt;/h2&gt;

&lt;p&gt;Next, I asked the AIs to generate a serene and vibrant underwater scene.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt
&lt;/h3&gt;

&lt;p&gt;A colorful underwater scene featuring a coral reef filled with vibrant fish, sea turtles, and a few small sharks. Sunlight beams are penetrating through the water's surface, illuminating the sea life and creating a beautiful, serene atmosphere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stable Diffusion 3.5 Large
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ky2o0tjbcahffk9kyod.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ky2o0tjbcahffk9kyod.png" alt="6_sd35" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FLUX1.1 [pro]
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4nec2y79ap0ryqrvrdd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4nec2y79ap0ryqrvrdd.jpg" alt="6_flux" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Imagen 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy25uodlfbfmz7tzaaht5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy25uodlfbfmz7tzaaht5.jpg" alt="6_fx" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e6uaxdxgu20z75lf7az.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e6uaxdxgu20z75lf7az.png" alt="6_de" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Adobe Firefly
&lt;/h3&gt;

&lt;p&gt;This prompt returned a processing error, so &lt;strong&gt;Firefly&lt;/strong&gt; could not generate an image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Comments
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;FLUX1.1 [pro]&lt;/strong&gt; was missing sea turtles, &lt;strong&gt;Imagen 3&lt;/strong&gt; lacked multiple turtles and sharks, and &lt;strong&gt;DALL·E 3&lt;/strong&gt; did not include any sharks.&lt;/p&gt;

&lt;p&gt;It’s unclear what caused &lt;strong&gt;Firefly&lt;/strong&gt;’s processing error.&lt;/p&gt;

&lt;p&gt;Incidentally, I’ve noticed that &lt;strong&gt;Imagen 3&lt;/strong&gt; frequently fails to generate images, even with other prompts.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SD 3.5 Large&lt;/th&gt;
&lt;th&gt;FLUX1.1 [pro]&lt;/th&gt;
&lt;th&gt;Imagen 3&lt;/th&gt;
&lt;th&gt;DALL·E 3&lt;/th&gt;
&lt;th&gt;Firefly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  No.7 - Japanese Tea Ceremony
&lt;/h2&gt;

&lt;p&gt;For the final test, I chose a prompt with a specific cultural theme to see how well each model captures details from a traditional Japanese tea ceremony.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt
&lt;/h3&gt;

&lt;p&gt;A traditional Japanese tea ceremony taking place in a tatami room. A woman in a kimono is gracefully preparing tea, while a guest kneels in front of her, observing respectfully. The room is decorated with traditional Japanese art and sliding shoji doors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stable Diffusion 3.5 Large
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8069strzrbnkh0lwdkp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8069strzrbnkh0lwdkp2.png" alt="7_sd35" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FLUX1.1 [pro]
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyvdb8q6orpnbto3jkal.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyvdb8q6orpnbto3jkal.jpg" alt="7_flux" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Imagen 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx254r4r689521wld767l.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx254r4r689521wld767l.jpg" alt="7_fx" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 3
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4m9cna7g64rvtxnoary.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4m9cna7g64rvtxnoary.png" alt="7_de" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Adobe Firefly
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gmq8z3gm3chmm5lq3vv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gmq8z3gm3chmm5lq3vv.jpg" alt="7_ff" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comments
&lt;/h3&gt;

&lt;p&gt;Strict accuracy was not evaluated here, since judging the finer points of tea ceremony protocol would likely disqualify every image.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stable Diffusion 3.5 Large&lt;/strong&gt; produced a somewhat ambiguous tea preparation scene and used an unconventional shoji door style.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DALL·E 3&lt;/strong&gt; displayed notably distorted tatami and other room elements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firefly&lt;/strong&gt; lacked the observing guest, and its shoji doors and tatami differed from traditional interpretations.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SD 3.5 Large&lt;/th&gt;
&lt;th&gt;FLUX1.1 [pro]&lt;/th&gt;
&lt;th&gt;Imagen 3&lt;/th&gt;
&lt;th&gt;DALL·E 3&lt;/th&gt;
&lt;th&gt;Firefly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  In Conclusion
&lt;/h2&gt;

&lt;p&gt;The following table summarizes the results.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;SD 3.5 Large&lt;/th&gt;
&lt;th&gt;FLUX1.1 [pro]&lt;/th&gt;
&lt;th&gt;Imagen 3&lt;/th&gt;
&lt;th&gt;DALL·E 3&lt;/th&gt;
&lt;th&gt;Firefly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Single Banana&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Single Banana Retry&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Space Battle&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Steampunk Invention&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Chibi Character&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Coral Reef&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Tea Ceremony&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Overall, this was a subjective review, but based on these results it’s clear that &lt;strong&gt;Stable Diffusion 3.5 Large&lt;/strong&gt; does not definitively outperform the other models.&lt;/p&gt;

&lt;p&gt;Here is a rough grouping by prompt adherence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Adherence Level S&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;FLUX1.1 [pro], Imagen 3&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Prompt Adherence Level A&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Stable Diffusion 3.5 Large &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Prompt Adherence Level B&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;DALL·E 3, Firefly&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;One takeaway is that getting an AI to generate an image that perfectly matches your intent remains an extremely challenging task.&lt;/p&gt;

&lt;p&gt;However, the pace of AI’s improvement is undeniably impressive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Original Japanese Article
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qiita.com/nabata/items/75eaa797d5e943931b93" rel="noopener noreferrer"&gt;プロンプトがどれだけ正確に反映されるのかを様々な画像生成AIで比較してみた（Stable Diffusion 3.5、FLUX1.1、Imagen 3、DALL·E 3、Adobe Firefly）&lt;/a&gt;&lt;/p&gt;

</description>
      <category>stablediffusion</category>
      <category>ai</category>
      <category>fluxai</category>
      <category>openai</category>
    </item>
    <item>
      <title>Using Runway's "Gen-3 Alpha Turbo" API to Generate AI Videos</title>
      <dc:creator>nabata</dc:creator>
      <pubDate>Sat, 26 Oct 2024 03:38:09 +0000</pubDate>
      <link>https://dev.to/nabata/using-runways-gen-3-alpha-turbo-api-to-generate-ai-videos-42gb</link>
      <guid>https://dev.to/nabata/using-runways-gen-3-alpha-turbo-api-to-generate-ai-videos-42gb</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://runwayml.com/" rel="noopener noreferrer"&gt;Runway&lt;/a&gt; is a platform offering the video generation AI &lt;strong&gt;Gen-3-Alpha&lt;/strong&gt;, which I previously covered in an article titled "&lt;a href="https://dev.to/nabata/converting-images-to-video-using-the-video-generation-ai-gen-3-alpha-the-results-were-so-natural-it-was-almost-scary-5c9"&gt;Converting Images to Video Using the Video Generation AI "Gen-3 Alpha" : The Results Were So Natural, It Was Almost Scary&lt;/a&gt;."&lt;/p&gt;

&lt;p&gt;Now, Runway has introduced a more cost-effective version called &lt;strong&gt;Gen-3 Alpha Turbo&lt;/strong&gt;, which is accessible through a web API. In this post, I'll explore how to use it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We are excited to announce the launch of our new API, providing developers with access to our Gen-3 Alpha Turbo model for integration into various applications and products. This release represents a significant step forward in making advanced AI capabilities more accessible to a broader range of developers, businesses and creatives.&lt;/p&gt;

&lt;p&gt;Reference：&lt;a href="https://runwayml.com/news/introducing-the-runway-api" rel="noopener noreferrer"&gt;Runway News | Introducing the Runway API for Gen-3 Alpha Turbo&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Waitlist Registration
&lt;/h2&gt;

&lt;p&gt;To access the API, you currently need to register on the waitlist.&lt;/p&gt;

&lt;p&gt;Fill out the &lt;a href="https://forms.gle/eHDQQawT8QGKWHDC7" rel="noopener noreferrer"&gt;Google Form&lt;/a&gt; with your email, name, company, intended use case, and estimated number of videos you'll generate monthly.&lt;/p&gt;

&lt;p&gt;After submitting, wait for authorization—I received access about 10 days after applying. Once approved, you'll be asked to enter your organization name when logging in with your registered email.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7jbcsgvz37giysxc5su.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7jbcsgvz37giysxc5su.png" alt="Gen-3 Alpha Turbo API 2" width="589" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Purchasing Credits
&lt;/h2&gt;

&lt;p&gt;Video generation costs $0.25 for a 5-second video and $0.50 for a 10-second video.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fet7vqcqm9s6urfrdjrsj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fet7vqcqm9s6urfrdjrsj.png" alt="Gen-3 Alpha Turbo API 3" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can purchase credits on the Billing page, with a minimum purchase of $10 (1,000 credits). For $10, you can create either 40 five-second videos or 20 ten-second videos.&lt;/p&gt;

&lt;p&gt;For the latest pricing, check the &lt;a href="https://docs.dev.runwayml.com/#price" rel="noopener noreferrer"&gt;Price&lt;/a&gt; page.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating an API Key
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flouxauxd18g48q5pfujv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flouxauxd18g48q5pfujv.png" alt="Gen-3 Alpha Turbo API 4" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To create an API key, navigate to the API Keys page.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Securely copy the key above and store it in a safe place. Once you close this modal, the key will not be displayed again.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The key is only visible once, so make sure to save it before closing the modal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment
&lt;/h2&gt;

&lt;p&gt;I am using macOS 14 Sonoma for this setup and will use Python for the implementation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;Python 3.12.2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Following the &lt;a href="https://docs.dev.runwayml.com/guides/quickstart/" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt;, I installed the SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;runwayml
&lt;span class="go"&gt;Collecting runwayml
  Downloading runwayml-2.0.0-py3-none-any.whl (71 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.2/71.2 kB 3.5 MB/s eta 0:00:00

&lt;/span&gt;&lt;span class="c"&gt;...
&lt;/span&gt;&lt;span class="go"&gt;
Installing collected packages: runwayml
Successfully installed runwayml-2.0.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, I saved the API key as an environment variable in my &lt;code&gt;zshrc&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;open ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I set the environment variable as &lt;strong&gt;RUNWAYML_API_SECRET&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;export RUNWAYML_API_SECRET=&amp;lt;Your API Key Here&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be cautious with the variable name—incorrect input will result in this error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The api_key client option must be set either by passing api_key to the client or by setting the RUNWAYML_API_SECRET environment variable&lt;/p&gt;
&lt;/blockquote&gt;
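&lt;p&gt;As a quick sanity check (a hypothetical helper of my own, not part of the SDK), you can confirm from Python that the variable is actually visible before calling the API:&lt;/p&gt;

```python
import os

def api_key_is_configured():
    """Return True if the variable the RunwayML SDK reads by default is set."""
    return bool(os.environ.get("RUNWAYML_API_SECRET"))

if api_key_is_configured():
    print("RUNWAYML_API_SECRET is set")
else:
    print("RUNWAYML_API_SECRET is missing; export it in ~/.zshrc and restart the shell")
```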

&lt;h2&gt;
  
  
  Original Image
&lt;/h2&gt;

&lt;p&gt;The video will be based on the following image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57u069td4ixuu191ilnc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57u069td4ixuu191ilnc.jpg" alt="Original Image" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This image was generated using &lt;strong&gt;FLUX 1.1 [pro]&lt;/strong&gt; at 1280x768, matching &lt;strong&gt;Gen-3 Alpha Turbo&lt;/strong&gt;’s default aspect ratio of &lt;strong&gt;16:9&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For more details about &lt;strong&gt;FLUX 1.1 [pro]&lt;/strong&gt;, check out my previous article, "&lt;a href="https://dev.to/nabata/using-the-web-api-for-flux-11-pro-the-latest-image-generation-ai-model-by-the-original-team-of-stable-diffusion-29pi"&gt;Using the Web API for FLUX 1.1 [pro]: The Latest Image Generation AI Model by the Original Team of Stable Diffusion&lt;/a&gt;".&lt;/p&gt;

&lt;p&gt;Ensure the image is uploaded to an accessible URL for the API call.&lt;/p&gt;
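&lt;p&gt;Since the API fetches the image itself, it can be worth verifying that the URL is publicly reachable first. Here is a minimal sketch (my own helper, not part of the Runway SDK):&lt;/p&gt;

```python
import urllib.request

def url_is_reachable(url, timeout=10):
    """Issue a HEAD request and report whether the URL answers with HTTP 200."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False
```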

&lt;h2&gt;
  
  
  Video Generation
&lt;/h2&gt;

&lt;p&gt;Let’s generate the video.&lt;/p&gt;

&lt;p&gt;The code below is based on the &lt;a href="https://docs.dev.runwayml.com/guides/quickstart/" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt;, using the prompt "&lt;strong&gt;A Japanese woman is smiling happily&lt;/strong&gt;."&lt;/p&gt;

&lt;p&gt;I've added comments for extra parameters.&lt;/p&gt;

&lt;p&gt;For more details, refer to the &lt;a href="https://docs.dev.runwayml.com/api/" rel="noopener noreferrer"&gt;API Reference&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;runwayml&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunwayML&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RunwayML&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Create a new image-to-video task using the "gen3a_turbo" model
&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_to_video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gen3a_turbo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt_image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;Image URL Here&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A Japanese woman is smiling happily&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Must be under 512 characters
&lt;/span&gt;    &lt;span class="c1"&gt;# seed=0,  # Default: random
&lt;/span&gt;    &lt;span class="c1"&gt;# watermark=True,  # Default: False
&lt;/span&gt;    &lt;span class="c1"&gt;# duration=5,  # Default: 10
&lt;/span&gt;    &lt;span class="c1"&gt;# ratio="9:16"  # Default: "16:9"
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;task_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;

&lt;span class="c1"&gt;# Check for completion
&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Wait
&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SUCCEEDED&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FAILED&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Wait
&lt;/span&gt;    &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Task complete:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the process completes, it outputs the video URL.&lt;/p&gt;
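&lt;p&gt;If you want to save the file rather than copy the URL by hand, something like the following works. This is my own sketch: on success the task object exposes the result URLs in &lt;code&gt;task.output&lt;/code&gt;, and as far as I can tell those URLs expire, so download promptly.&lt;/p&gt;

```python
import urllib.request

def save_first_output(task, filename="output.mp4"):
    """Download the first result URL of a SUCCEEDED task; return the path, or None."""
    if task.status == "SUCCEEDED" and task.output:
        urllib.request.urlretrieve(task.output[0], filename)
        return filename
    return None
```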

&lt;p&gt;Here is the result:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/KWMH-NT-mGE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The video was generated as expected, although the movement appears slightly uncanny.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The rapid development and competition in the generative AI space are remarkable.&lt;/p&gt;

&lt;p&gt;How long will this trend continue?&lt;/p&gt;

&lt;h2&gt;
  
  
  Original Japanese Article
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qiita.com/nabata/items/20337bbb9e08a65bf890" rel="noopener noreferrer"&gt;Runwayの「Gen-3 Alpha Turbo」のAPIを呼んで動画をAI生成してみた&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aivideo</category>
      <category>gen3</category>
    </item>
    <item>
      <title>Using the Web API for FLUX 1.1 [pro]: The Latest Image Generation AI Model by the Original Team of Stable Diffusion</title>
      <dc:creator>nabata</dc:creator>
      <pubDate>Sun, 20 Oct 2024 03:19:51 +0000</pubDate>
      <link>https://dev.to/nabata/using-the-web-api-for-flux-11-pro-the-latest-image-generation-ai-model-by-the-original-team-of-stable-diffusion-29pi</link>
      <guid>https://dev.to/nabata/using-the-web-api-for-flux-11-pro-the-latest-image-generation-ai-model-by-the-original-team-of-stable-diffusion-29pi</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Previously, I wrote an article titled “&lt;a href="https://dev.to/nabata/running-the-flux1-image-devschnell-generation-ai-model-by-stable-diffusions-original-developers-on-a-macbook-m2-4ld6"&gt;Running the FLUX.1 Image ([dev]/[schnell]) Generation AI Model by Stable Diffusion’s Original Developers on a MacBook (M2)&lt;/a&gt;.” It demonstrated the &lt;strong&gt;FLUX.1&lt;/strong&gt; image generation model from &lt;a href="https://blackforestlabs.ai/" rel="noopener noreferrer"&gt;Black Forest Labs&lt;/a&gt;, founded by the creators of Stable Diffusion.&lt;/p&gt;

&lt;p&gt;Now, two months later, &lt;strong&gt;FLUX 1.1 [pro]&lt;/strong&gt; (codenamed Blueberry) has been released, along with public access to its web API, though it’s still in beta.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Today, we release FLUX1.1 [pro], our most advanced and efficient model yet, alongside the general availability of the beta BFL API. This release marks a significant step forward in our mission to empower creators, developers, and enterprises with scalable, state-of-the-art generative technology.&lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://blackforestlabs.ai/announcing-flux-1-1-pro-and-the-bfl-api/" rel="noopener noreferrer"&gt;Announcing FLUX1.1 [pro] and the BFL API - Black Forest Labs&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this post, I will demonstrate how to use the &lt;strong&gt;FLUX 1.1 [pro]&lt;/strong&gt; web API.&lt;/p&gt;

&lt;p&gt;All code examples are written in &lt;strong&gt;Python&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating an Account and an API Key
&lt;/h2&gt;

&lt;p&gt;Start by registering an account on the &lt;a href="https://api.bfl.ml/" rel="noopener noreferrer"&gt;API page&lt;/a&gt; via the &lt;strong&gt;Register&lt;/strong&gt; option, then log in.&lt;/p&gt;

&lt;p&gt;Credits are priced at $0.01 each, and I received 50 credits upon registration (this may vary).&lt;/p&gt;

&lt;p&gt;Based on the &lt;a href="https://docs.bfl.ml/pricing/" rel="noopener noreferrer"&gt;Pricing page&lt;/a&gt;, the model costs are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FLUX 1.1 [pro]: $0.04 per image&lt;/li&gt;
&lt;li&gt;FLUX.1 [pro]: $0.05 per image&lt;/li&gt;
&lt;li&gt;FLUX.1 [dev]: $0.025 per image&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you’re logged in, generate an API key by selecting &lt;strong&gt;Add Key&lt;/strong&gt; and entering a name of your choice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchj8scdkl8ar0naiuche.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchj8scdkl8ar0naiuche.png" alt="FLB API Image 1" width="699" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your key will appear as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztmq1jmycwsizo6pnecy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztmq1jmycwsizo6pnecy.png" alt="FLB API Image 2" width="688" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment Setup
&lt;/h2&gt;

&lt;p&gt;I'm using macOS 14 Sonoma as my operating system.&lt;/p&gt;

&lt;p&gt;The Python version is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;Python 3.12.2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run the sample code, I installed &lt;a href="https://pypi.org/project/requests/" rel="noopener noreferrer"&gt;requests&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I confirmed the installed version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip list | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; requests 
&lt;span class="go"&gt;requests           2.31.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To avoid hardcoding, I saved the API key as an environment variable by editing the &lt;code&gt;zshrc&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;open ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I named the environment variable &lt;strong&gt;BFL_API_KEY&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;export BFL_API_KEY=&amp;lt;Your API Key Here&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example Code
&lt;/h2&gt;

&lt;p&gt;Below is the sample code from the &lt;a href="https://docs.bfl.ml/quick_start/gen_image/" rel="noopener noreferrer"&gt;Getting Started guide&lt;/a&gt;, with some additional comments. Ideally, it would handle errors by checking the &lt;a href="https://api.bfl.ml/scalar#model/resultresponse" rel="noopener noreferrer"&gt;status&lt;/a&gt; field, but I left it unchanged for simplicity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="c1"&gt;# Request
&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://api.bfl.ml/v1/flux-pro-1.1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accept&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x-key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BFL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A cat on its back legs running like a human is holding a big silver fish with its arms. The cat is running away from the shop owner and has a panicked look on his face. The scene is situated in a crowded market.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;request_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Wait for completion
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://api.bfl.ml/v1/get_result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accept&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x-key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BFL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ready&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sample&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Status: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
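&lt;p&gt;For reference, the omitted error handling could be factored into a small polling helper like the sketch below. It assumes the status values I observed (&lt;code&gt;Ready&lt;/code&gt; and &lt;code&gt;Pending&lt;/code&gt;); check the status reference linked above for the full list and adjust accordingly. The &lt;code&gt;poll_until_ready&lt;/code&gt; name is mine:&lt;/p&gt;

```python
import time


def poll_until_ready(fetch, interval=0.5, max_attempts=120):
    """Poll `fetch` (a zero-argument callable returning the get_result JSON)
    until the status is Ready, treating anything other than Ready/Pending
    as a failure. Returns the 'result' payload on success."""
    for _ in range(max_attempts):
        result = fetch()
        status = result.get("status")
        if status == "Ready":
            return result["result"]
        if status != "Pending":
            raise RuntimeError(f"Generation did not complete: {status}")
        time.sleep(interval)
    raise TimeoutError("Gave up waiting for the generation to finish")
```

&lt;p&gt;The request loop from the sample code would then call this with a lambda wrapping the &lt;code&gt;requests.get&lt;/code&gt; call.&lt;/p&gt;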



&lt;p&gt;In this example, the prompt is: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A cat on its back legs running like a human is holding a big silver fish with its arms. The cat is running away from the shop owner and has a panicked look on his face. The scene is situated in a crowded market.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The final &lt;strong&gt;result&lt;/strong&gt; has the following format. The response time was faster than that of other APIs I’ve tested.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Request ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ready&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sample&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;URL of the generated image&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Specified prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;sample&lt;/strong&gt; contains the URL of the generated image, which was hosted on &lt;strong&gt;bflapistorage.blob.core.windows.net&lt;/strong&gt; when I tested it.&lt;/p&gt;
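&lt;p&gt;Since the delivery URL may not remain available indefinitely, it can be convenient to download the image right away. A small sketch (the helper names &lt;code&gt;sample_path&lt;/code&gt; and &lt;code&gt;save_sample&lt;/code&gt; are mine, and the result shape is the one shown above):&lt;/p&gt;

```python
import os

import requests


def sample_path(result: dict, directory: str = ".") -> str:
    """Build a local filename from the request id in the result payload."""
    return os.path.join(directory, f"{result['id']}.jpg")


def save_sample(result: dict, directory: str = ".") -> str:
    """Download the image at result['result']['sample'] and save it locally."""
    response = requests.get(result["result"]["sample"], timeout=30)
    response.raise_for_status()
    path = sample_path(result, directory)
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```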

&lt;p&gt;Here's the generated image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmktl8bga9vongfhff4bf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmktl8bga9vongfhff4bf.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result closely matches the prompt, capturing the sense of urgency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experimenting with Alternative Prompts
&lt;/h2&gt;

&lt;p&gt;I tried different prompts to generate varied images.&lt;/p&gt;

&lt;h3&gt;
  
  
  Japanese Moe Heroine
&lt;/h3&gt;

&lt;p&gt;Prompt: "&lt;strong&gt;Japanese moe heroine&lt;/strong&gt;," using &lt;strong&gt;anime style&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anime style, Japanese moe heroine&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajjbn2fmawg0fuijcfrv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajjbn2fmawg0fuijcfrv.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sweets from Popular Japanese Anime
&lt;/h3&gt;

&lt;p&gt;Prompt: "&lt;strong&gt;Sweets that appear in popular Japanese anime&lt;/strong&gt;," using &lt;strong&gt;anime style&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anime style, sweets that appear in popular Japanese anime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3dst3621sqkgr4qi490.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3dst3621sqkgr4qi490.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Male High School Student on a School Trip
&lt;/h3&gt;

&lt;p&gt;Prompt: "&lt;strong&gt;Male high school student on a school trip&lt;/strong&gt;," using &lt;strong&gt;anime style&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anime style, male high school student on a school trip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72tsg8cb6izm3t83r02x.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72tsg8cb6izm3t83r02x.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  A Princess Playing Guitar
&lt;/h3&gt;

&lt;p&gt;Prompt: "&lt;strong&gt;A princess playing guitar&lt;/strong&gt;," using &lt;strong&gt;fantasy-art style&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fantasy-art style, a princess playing guitar&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdp2fejdt6lq5xksvuqk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdp2fejdt6lq5xksvuqk.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  A Cute Fairy on Top of a White Laptop
&lt;/h3&gt;

&lt;p&gt;Prompt: "&lt;strong&gt;A cute fairy on top of a white laptop&lt;/strong&gt;," using &lt;strong&gt;photographic style&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;photographic style, a cute fairy on top of a white laptop&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndvpba53rdy4nex02pbb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndvpba53rdy4nex02pbb.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  28-Year-Old Japanese Woman with Black Bobbed Hair
&lt;/h3&gt;

&lt;p&gt;Prompt: "&lt;strong&gt;28-year-old Japanese pretty woman with black bobbed hair&lt;/strong&gt;," using &lt;strong&gt;photographic style&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;photographic style, 28-year-old Japanese pretty woman with black bobbed hair&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu32molmmh7gnbhluqo5f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu32molmmh7gnbhluqo5f.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Hong Kong Downtown in the 1980s
&lt;/h3&gt;

&lt;p&gt;Prompt: "&lt;strong&gt;Hong Kong downtown in the 1980s&lt;/strong&gt;," using &lt;strong&gt;photographic style&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;photographic style, Hong Kong downtown in the 1980s&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5lwc5ad93sw7d95ltai.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5lwc5ad93sw7d95ltai.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Shinjuku Kabukicho in 2020
&lt;/h3&gt;

&lt;p&gt;Prompt: "&lt;strong&gt;Shinjuku Kabukicho in 2020&lt;/strong&gt;," using &lt;strong&gt;photographic style&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;photographic style, Shinjuku Kabukicho in 2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hg22vhwegcaz1m2amq8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hg22vhwegcaz1m2amq8.jpg" alt="Generated Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All of the generated images were of exceptional quality.&lt;/p&gt;

&lt;p&gt;After generating so many high-quality AI images, reality almost feels surreal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Black Forest Labs&lt;/strong&gt; continues to innovate and enhance its AI models.&lt;/p&gt;

&lt;p&gt;I’m looking forward to the future release of video generation capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Original Japanese Article
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qiita.com/nabata/items/1bea5313df7398041b4b" rel="noopener noreferrer"&gt;Stable Diffusionのオリジナル開発陣による画像生成AIモデル最新版FLUX 1.1 [pro]のWeb APIを呼んでいくつかの画像を生成してみた&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>stablediffusion</category>
      <category>ai</category>
      <category>flux1</category>
    </item>
    <item>
      <title>Using the "Dream Machine" Video Generation AI Service via Web API</title>
      <dc:creator>nabata</dc:creator>
      <pubDate>Sun, 06 Oct 2024 13:26:45 +0000</pubDate>
      <link>https://dev.to/nabata/using-the-dream-machine-video-generation-ai-service-via-web-api-ap</link>
      <guid>https://dev.to/nabata/using-the-dream-machine-video-generation-ai-service-via-web-api-ap</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recently, the Web API for &lt;a href="https://lumalabs.ai/dream-machine" rel="noopener noreferrer"&gt;Dream Machine&lt;/a&gt; was released. In this article, I’ll walk you through how to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logging In and Purchasing Credits
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67k5guchdlk9mapwawa5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67k5guchdlk9mapwawa5.png" alt="Dream Machine 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, log in with your Google account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa61hf4jegg8vnp2c4l5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa61hf4jegg8vnp2c4l5d.png" alt="Dream Machine 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can purchase credits from the &lt;strong&gt;Billing &amp;amp; Credits&lt;/strong&gt; page by selecting &lt;strong&gt;Add More Credits&lt;/strong&gt;. You can specify an amount ranging from $5 to $500. For further details, check out the &lt;a href="https://lumalabs.ai/dream-machine/api/pricing" rel="noopener noreferrer"&gt;Dream Machine API Pricing&lt;/a&gt; page.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating an API Key
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dw36504yudw4a035cf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dw36504yudw4a035cf6.png" alt="Dream Machine 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, create an API Key. Note that once it’s generated, you won’t be able to view it again, so make sure to store it securely.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is your responsibility to record the key below as you will not be able to view it again.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Setting Up the Environment
&lt;/h2&gt;

&lt;p&gt;I’m running macOS 14 Sonoma, and the version of Python installed on my machine is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;Python 3.12.2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the &lt;a href="https://docs.lumalabs.ai/docs/python" rel="noopener noreferrer"&gt;Python SDK&lt;/a&gt; as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;lumaai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll also need the &lt;a href="https://pypi.org/project/requests/" rel="noopener noreferrer"&gt;requests&lt;/a&gt; package for making HTTP requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the installed versions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip list | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; lumaai &lt;span class="nt"&gt;-e&lt;/span&gt; requests 
&lt;span class="go"&gt;lumaai             1.0.2
requests           2.31.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To avoid hard-coding the API key, I stored it as an environment variable named &lt;strong&gt;LUMAAI_API_KEY&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;export LUMAAI_API_KEY=your obtained API Key here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Generating a Video from Text
&lt;/h2&gt;

&lt;p&gt;Now, let’s generate a video from text by referring to the &lt;a href="https://docs.lumalabs.ai/docs/python#text-to-video" rel="noopener noreferrer"&gt;Text to Video section of the Python SDK Guide&lt;/a&gt;. For more details, check the &lt;a href="https://docs.lumalabs.ai/reference/ping-1" rel="noopener noreferrer"&gt;API Reference&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lumaai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LumaAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LumaAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;auth_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LUMAAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Get the API Key
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;generation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A teddy bear in sunglasses playing electric guitar and dancing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aspect_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16:9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Enables looping
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The state can be one of queued, dreaming, completed, or failed
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;generation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Re-fetch the status
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;

&lt;span class="c1"&gt;# You can retrieve a list of all previous requests like this:
# print(client.generations.list())
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this example, I used the prompt &lt;strong&gt;"A teddy bear in sunglasses playing electric guitar and dancing"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I set the aspect ratio to 16:9 and enabled &lt;strong&gt;loop&lt;/strong&gt; to make the video loop smoothly. This ensures that the first and last frames match.&lt;/p&gt;
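Those settings map onto parameters of the SDK's `generations.create` call. The sketch below only builds the request arguments as a plain dict so they are easy to inspect; `aspect_ratio` and `loop` are the parameter names from the Python SDK guide, and the commented-out call assumes the `client` set up earlier.

```python
# A sketch of the same settings as SDK parameters. "aspect_ratio" and
# "loop" follow the Luma Python SDK guide; adjust if your SDK version differs.

def text_to_video_params(prompt: str) -> dict:
    """Build request arguments for a looping 16:9 text-to-video generation."""
    return {
        "prompt": prompt,
        "aspect_ratio": "16:9",  # widescreen output
        "loop": True,            # makes the first and last frames match
    }

params = text_to_video_params(
    "A teddy bear in sunglasses playing electric guitar and dancing"
)
# generation = client.generations.create(**params)  # client from the code above
```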

&lt;p&gt;Here’s the generated video:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/T8jdaAy1FvY"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The result perfectly matches the prompt. Pretty impressive!&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating a Video from an Image
&lt;/h2&gt;

&lt;p&gt;Next, I tried generating a video from an image, based on the &lt;a href="https://docs.lumalabs.ai/docs/python#image-to-video" rel="noopener noreferrer"&gt;Image to Video section of the Python SDK Guide&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You should upload and use your own cdn image urls, currently this is the only way to pass an image&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means you’ll need to upload your image to a server that can provide a URL.&lt;/p&gt;
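Any publicly reachable URL works, so the hosting choice (S3, a CDN, your own server) is up to you. As a small sanity check before calling the API, you can at least confirm the URL points at a common image type. The helper below only inspects the file extension, and the JPEG/PNG whitelist is my assumption, not a documented list of supported formats.

```python
import mimetypes

def looks_like_image_url(url: str) -> bool:
    """Guess from the extension whether a URL points at a JPEG or PNG."""
    guessed, _ = mimetypes.guess_type(url)
    return guessed in ("image/jpeg", "image/png")

print(looks_like_image_url("https://example.com/photo.png"))  # True
print(looks_like_image_url("https://example.com/clip.mp4"))   # False
```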

&lt;p&gt;For this test, I uploaded the following image to a server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7z54oatjfta0yd9cx5yw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7z54oatjfta0yd9cx5yw.png" alt="Original Image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s possible to specify different images for the first and last frames, but I only used an image for the first frame.&lt;/p&gt;

&lt;p&gt;Here’s the code, with the prompt &lt;strong&gt;"Japanese woman smiling"&lt;/strong&gt;. I did not set the loop option this time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lumaai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LumaAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LumaAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;auth_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LUMAAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Get the API Key
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;generation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Japanese woman smiling&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;keyframes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frame0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Specify_image_url_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The state can be one of queued, dreaming, completed, or failed
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;generation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Re-fetch the status
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
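For completeness, pinning both ends of the clip could look like the sketch below. `frame1` mirrors the `frame0` shape used above (the naming follows the SDK guide), and both URLs are placeholders. The commented-out polling loop also adds a short pause between status checks.

```python
import time

# Sketch: keyframes for both the first and last frames. "frame1" mirrors
# the "frame0" shape above; both URLs are placeholders.

def keyframes_for(first_url: str, last_url: str) -> dict:
    """Build a keyframes dict pinning the first and last frames."""
    return {
        "frame0": {"type": "image", "url": first_url},
        "frame1": {"type": "image", "url": last_url},
    }

kf = keyframes_for("https://example.com/first.png",
                   "https://example.com/last.png")

# generation = client.generations.create(
#     prompt="Japanese woman smiling", keyframes=kf
# )
# while generation.state not in ("completed", "failed"):
#     time.sleep(3)  # pause between polls instead of hammering the API
#     generation = client.generations.get(generation.id)
```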



&lt;p&gt;Here’s the video that was generated:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/Qf0r_hWTyeQ"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Once again, the prompt was accurately reflected in the output.&lt;/p&gt;

&lt;p&gt;It’s fascinating to see how far AI technology has come every time I test something like this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Being able to generate videos purely through code is incredible.&lt;/p&gt;

&lt;p&gt;I’m excited to experiment further and see what else is possible with this tool!&lt;/p&gt;

&lt;h2&gt;
  
  
  Japanese Version of the Article
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qiita.com/nabata/items/895a6b3b3a2f85d748f5" rel="noopener noreferrer"&gt;動画生成AIサービス「Dream Machine」をWeb APIで呼び出してみた&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aivideo</category>
      <category>gen3</category>
      <category>dreammachine</category>
    </item>
    <item>
      <title>Converting Images to Video Using the Video Generation AI "Gen-3 Alpha": The Results Were So Natural, It Was Almost Scary</title>
      <dc:creator>nabata</dc:creator>
      <pubDate>Sun, 29 Sep 2024 11:16:40 +0000</pubDate>
      <link>https://dev.to/nabata/converting-images-to-video-using-the-video-generation-ai-gen-3-alpha-the-results-were-so-natural-it-was-almost-scary-5c9</link>
      <guid>https://dev.to/nabata/converting-images-to-video-using-the-video-generation-ai-gen-3-alpha-the-results-were-so-natural-it-was-almost-scary-5c9</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this article, I’ll explore the process of converting images into videos using &lt;a href="https://runwayml.com/research/introducing-gen-3-alpha" rel="noopener noreferrer"&gt;Gen-3 Alpha&lt;/a&gt;, a well-known video generation AI similar to &lt;a href="https://lumalabs.ai/dream-machine" rel="noopener noreferrer"&gt;Dream Machine&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Runway
&lt;/h2&gt;

&lt;p&gt;Runway, a platform that shares its name with the company behind it, offers a variety of generative AI tools, one of which is &lt;strong&gt;Gen-3 Alpha&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It is also available as an &lt;a href="https://apps.apple.com/app/apple-store/id1665024375" rel="noopener noreferrer"&gt;iOS app&lt;/a&gt;, but for this article, I’ll be using the browser version.&lt;/p&gt;

&lt;h2&gt;
  
  
  Account Creation
&lt;/h2&gt;

&lt;p&gt;You can create an account via the &lt;a href="https://app.runwayml.com/signup" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt; page.&lt;/p&gt;

&lt;p&gt;You can register using an email address, Google, or Apple account. The Enterprise plan also supports SSO.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;Runway offers a free plan, but &lt;strong&gt;Gen-3 Alpha&lt;/strong&gt; is currently only accessible with paid plans. For this test, I subscribed to the Standard plan at $15/month, which provides 625 credits per month (&lt;strong&gt;Gen-3 Alpha&lt;/strong&gt; consumes 10 credits per second of video). The free plan gives access to &lt;strong&gt;Gen-3 Alpha Turbo&lt;/strong&gt;, a lower-cost version.&lt;/p&gt;
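In concrete terms, those numbers work out as follows (a quick back-of-the-envelope calculation using the credit figures above):

```python
# Back-of-the-envelope: how much Gen-3 Alpha video the Standard plan buys.
monthly_credits = 625
credits_per_second = 10  # Gen-3 Alpha's consumption rate

seconds_per_month = monthly_credits / credits_per_second
full_ten_second_videos = int(seconds_per_month // 10)

print(seconds_per_month)       # 62.5 seconds of video per month
print(full_ten_second_videos)  # 6 full 10-second videos
```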

&lt;p&gt;For more details about the different plans, see the &lt;a href="https://runwayml.com/pricing/" rel="noopener noreferrer"&gt;pricing&lt;/a&gt; page. If you choose the annual payment option, there’s a 20% discount.&lt;/p&gt;

&lt;h2&gt;
  
  
  Video Generation 1
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrxyo2fluu0lfsse2puu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrxyo2fluu0lfsse2puu.png" alt="Gen-3 Alpha 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I selected &lt;strong&gt;Gen-3 Alpha&lt;/strong&gt; and began generating a video.&lt;/p&gt;

&lt;p&gt;Since the required size is &lt;strong&gt;1280x768&lt;/strong&gt;, I used the following image, which fits those dimensions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsm3u9kobl4mpp57h03v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsm3u9kobl4mpp57h03v.png" alt="Original Image 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your image is a different size, you can crop it directly in the browser.&lt;/p&gt;
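If you'd rather prepare the crop yourself before uploading, the box arithmetic is straightforward. The sketch below computes a centered 1280x768 crop box; the resulting `(left, upper, right, lower)` tuple is the format Pillow's `Image.crop()` accepts, though any image library will do. Note that cropping (rather than scaling) discards the edges of the frame.

```python
TARGET_W, TARGET_H = 1280, 768  # required Gen-3 Alpha input size

def center_crop_box(width: int, height: int) -> tuple:
    """Return a centered (left, upper, right, lower) 1280x768 crop box."""
    if width < TARGET_W or height < TARGET_H:
        raise ValueError("image is smaller than the target size")
    left = (width - TARGET_W) // 2
    upper = (height - TARGET_H) // 2
    return (left, upper, left + TARGET_W, upper + TARGET_H)

print(center_crop_box(1920, 1080))  # (320, 156, 1600, 924)
```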

&lt;p&gt;I used the following prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A Japanese woman is smiling happily&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can also generate videos without specifying a prompt. For more details on how to use prompts, refer to the &lt;a href="https://help.runwayml.com/hc/en-us/articles/30586818553107-Gen-3-Alpha-Prompting-Guide" rel="noopener noreferrer"&gt;Gen-3 Alpha Prompting Guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvumxf87jfkj79ayq2nt6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvumxf87jfkj79ayq2nt6.png" alt="Gen-3 Alpha 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s possible to use different images for the first and last frames, but I used the same image for both in this case.&lt;/p&gt;

&lt;p&gt;Here’s the generated video, which is 10 seconds by default.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/q3ZB4sQclyg"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The result looks quite natural. The woman in the original image is genuinely smiling. If someone showed me this video without telling me it was AI-generated, I probably wouldn’t have noticed.&lt;/p&gt;

&lt;p&gt;For comparison, I generated a similar video using &lt;strong&gt;Dream Machine&lt;/strong&gt; with the same image and prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr15qsxv50s14jurb1106.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr15qsxv50s14jurb1106.png" alt="Gen-3 Alpha 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s the 5-second video generated by Dream Machine.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/NKjyD2TQcQU"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Although there’s significant movement, there is noticeable distortion, especially around the face, creating a sense of unease. This wasn’t as evident in the videos I generated in my previous article, so I thought it was worth mentioning as a reference point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Video Generation 2
&lt;/h2&gt;

&lt;p&gt;For further experimentation, I generated another video using a completely different image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz0mphbnyu95fga2olnbe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz0mphbnyu95fga2olnbe.png" alt="Original Image 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used the following prompt for this image:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Japanese man dancing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here’s the generated video.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/wj8I_Jcmzkc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;This one also turned out very well.&lt;/p&gt;

&lt;p&gt;There’s a slight awkwardness in certain areas like the hands, but the video maintains consistency over its 10-second duration.&lt;/p&gt;

&lt;p&gt;For comparison, I also generated a video from the same image using &lt;strong&gt;Dream Machine&lt;/strong&gt;. Here’s the result:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/XpyZN2wHlpc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I’m not sure if this counts as dancing, but there’s definitely movement, which is a nice touch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Although my testing was limited and the prompts were simple, I noticed distinct characteristics in the videos generated by both &lt;strong&gt;Gen-3 Alpha&lt;/strong&gt; and &lt;strong&gt;Dream Machine&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The field of video generation has made incredible advancements, and I’m excited to see where it goes next.&lt;/p&gt;

&lt;p&gt;There have also been some interesting recent developments in the video generation space.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI-generated videos aren't just the future: They're here, and they're scary. AI companies are rolling out tech that can produce realistic videos from simple text prompts. Adobe is just the latest, and their AI-generated videos are impressive—even if the demos are brief.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference&lt;/strong&gt;: &lt;a href="https://lifehacker.com/tech/adobe-is-also-making-an-ai-video-generator" rel="noopener noreferrer"&gt;Adobe's AI Video Generator Might Be as Good as OpenAI's | Lifehacker&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m looking forward to trying it out myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Japanese Version of the Article
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qiita.com/nabata/items/4976abc57c4504edd377" rel="noopener noreferrer"&gt;動画生成AI「Gen-3 Alpha」のImage to Videoで画像を動画に変換してみたらやっぱり自然すぎて恐くなりもした&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aivideo</category>
      <category>gen3</category>
      <category>dreammachine</category>
    </item>
    <item>
      <title>Running the FLUX.1 Image ([dev]/[schnell]) Generation AI Model by Stable Diffusion's Original Developers on a MacBook (M2)</title>
      <dc:creator>nabata</dc:creator>
      <pubDate>Sun, 29 Sep 2024 01:23:36 +0000</pubDate>
      <link>https://dev.to/nabata/running-the-flux1-image-devschnell-generation-ai-model-by-stable-diffusions-original-developers-on-a-macbook-m2-4ld6</link>
      <guid>https://dev.to/nabata/running-the-flux1-image-devschnell-generation-ai-model-by-stable-diffusions-original-developers-on-a-macbook-m2-4ld6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;There has been a lot of buzz recently about &lt;a href="https://blackforestlabs.ai/" rel="noopener noreferrer"&gt;Black Forest Labs&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI researchers involved in the development of image generation AI such as 'Stable Diffusion' have launched a new AI development company 'Black Forest Labs'. In addition, Black Forest Labs has also announced 'Flux', an open source image generation AI model with a parameter size of 12 billion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference&lt;/strong&gt;: &lt;a href="https://gigazine.net/gsc_news/en/20240802-black-forest-labs-image-ai-flux/" rel="noopener noreferrer"&gt;Stable Diffusion's original developers launch AI company Black Forest Labs and release their own image generation AI model, Flux - GIGAZINE&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this post, I'll walk through the process of running the image generation AI model &lt;strong&gt;FLUX.1&lt;/strong&gt; on my MacBook (M2).&lt;/p&gt;

&lt;p&gt;All the code used here is written in &lt;strong&gt;Python&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  About FLUX.1
&lt;/h2&gt;

&lt;p&gt;For a detailed explanation, you can check out the official announcement “&lt;a href="https://blackforestlabs.ai/announcing-black-forest-labs/" rel="noopener noreferrer"&gt;Announcing Black Forest Labs - Black Forest Labs&lt;/a&gt;.” Here’s a quick overview of the three available models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FLUX.1 [pro]&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;A cutting-edge image generation model available via API access.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;FLUX.1 [dev]&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;A model for non-commercial applications, available on Hugging Face. For commercial use, inquiries are required.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;FLUX.1 [schnell]&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;A fast model optimized for local development and personal use, released under the Apache 2.0 license. It’s available on Hugging Face, and the inference code is found on GitHub and Hugging Face’s Diffusers, supporting ComfyUI integration.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I initially wanted to try &lt;strong&gt;FLUX.1 [pro]&lt;/strong&gt;, but the &lt;a href="https://docs.bfl.ml/" rel="noopener noreferrer"&gt;API&lt;/a&gt; is currently invite-only for selected partners. It’s possible to use it via platforms like &lt;a href="https://replicate.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Replicate&lt;/strong&gt;&lt;/a&gt; or &lt;a href="https://fal.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Fal.ai&lt;/strong&gt;&lt;/a&gt;; instead, I’ll be using &lt;strong&gt;FLUX.1 [dev]&lt;/strong&gt;, the highest-quality model that supports local generation. Towards the end of this article, I’ll also include an example using &lt;strong&gt;FLUX.1 [schnell]&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For those interested in quickly testing image generation, you can try the following links. Note that some services require payment.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FLUX.1 [pro]&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://replicate.com/black-forest-labs/flux-pro" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fal.ai/models/fal-ai/flux-pro" rel="noopener noreferrer"&gt;Fal.ai&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;FLUX.1 [dev]&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/spaces/black-forest-labs/FLUX.1-dev" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://replicate.com/black-forest-labs/flux-dev" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fal.ai/models/fal-ai/flux/dev" rel="noopener noreferrer"&gt;Fal.ai&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;FLUX.1 [schnell]&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://replicate.com/black-forest-labs/flux-schnell" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fal.ai/models/fal-ai/flux/schnell" rel="noopener noreferrer"&gt;Fal.ai&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  My Setup
&lt;/h2&gt;

&lt;p&gt;I’m running this on a &lt;a href="https://support.apple.com/en-us/111340" rel="noopener noreferrer"&gt;MacBook Pro (Apple M2 Pro Chip / 16GB RAM)&lt;/a&gt; running &lt;a href="https://www.apple.com/jp/macos/sonoma/" rel="noopener noreferrer"&gt;macOS 14 Sonoma&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The necessary packages include &lt;a href="https://github.com/huggingface/diffusers" rel="noopener noreferrer"&gt;diffusers&lt;/a&gt;, &lt;a href="https://pypi.org/project/sentencepiece/" rel="noopener noreferrer"&gt;sentencepiece&lt;/a&gt;, &lt;a href="https://huggingface.co/docs/transformers/model_doc/t5" rel="noopener noreferrer"&gt;t5&lt;/a&gt;, &lt;a href="https://pytorch.org/" rel="noopener noreferrer"&gt;torch&lt;/a&gt;, and &lt;a href="https://huggingface.co/docs/transformers/index" rel="noopener noreferrer"&gt;transformers&lt;/a&gt;. For &lt;strong&gt;diffusers&lt;/strong&gt;, I followed the &lt;a href="https://huggingface.co/black-forest-labs/FLUX.1-dev#diffusers" rel="noopener noreferrer"&gt;official installation guide&lt;/a&gt; from GitHub.&lt;/p&gt;

&lt;p&gt;Here’s the installation command, with a specific version of &lt;strong&gt;torch&lt;/strong&gt; for reasons I’ll explain later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;sentencepiece &lt;span class="nv"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.3.1 transformers git+https://github.com/huggingface/diffusers.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are the versions running in my environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip list | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; diffusers &lt;span class="nt"&gt;-e&lt;/span&gt; sentencepiece &lt;span class="nt"&gt;-e&lt;/span&gt; t5 &lt;span class="nt"&gt;-e&lt;/span&gt; torch &lt;span class="nt"&gt;-e&lt;/span&gt; transformers
&lt;span class="go"&gt;diffusers                    0.30.0.dev0
sentencepiece                0.2.0
t5                           0.9.4
torch                        2.3.1
transformers                 4.43.3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although &lt;strong&gt;torch 2.4.0&lt;/strong&gt; was the latest version as of August 5, 2024, I downgraded to &lt;strong&gt;2.3.1&lt;/strong&gt;. I initially tried &lt;strong&gt;2.4.0&lt;/strong&gt;, but the generated images were too noisy. After some research, I found that downgrading to &lt;strong&gt;2.3.1&lt;/strong&gt; resolved the issue, though I couldn’t confirm the exact reason.&lt;/p&gt;

&lt;p&gt;Here’s an example of a noisy image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexbdbgiy6zeliofa6nxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexbdbgiy6zeliofa6nxg.png" alt="flux 4"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Obtaining Access
&lt;/h2&gt;

&lt;p&gt;For this setup, I accessed the model via &lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;, so you’ll need a &lt;strong&gt;Hugging Face&lt;/strong&gt; account.&lt;/p&gt;

&lt;p&gt;Once you’ve created an account and logged in, go to the &lt;a href="https://huggingface.co/black-forest-labs/FLUX.1-dev" rel="noopener noreferrer"&gt;FLUX.1-dev page&lt;/a&gt;, where you’ll be asked to agree to the terms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfw06fus6kwdihq1y75r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfw06fus6kwdihq1y75r.png" alt="flux 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After reading and agreeing to the terms, click &lt;strong&gt;Agree and access repository&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You’ll see a confirmation message like this, indicating that access has been granted:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60wup37b2wpko9vik86c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60wup37b2wpko9vik86c.png" alt="flux 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating an Access Token
&lt;/h2&gt;

&lt;p&gt;To authenticate the model download, you’ll need to create an access token on &lt;strong&gt;Hugging Face&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Navigate to &lt;a href="https://huggingface.co/settings/profile" rel="noopener noreferrer"&gt;Settings&lt;/a&gt;, then to &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;Access Tokens&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;Create new Access Token&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Since you only need read permissions, select &lt;strong&gt;Read&lt;/strong&gt;, give it a name, and generate the token.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmxbdbuq66sxeb7faeo8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmxbdbuq66sxeb7faeo8.png" alt="flux 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To avoid hardcoding, store this token as an environment variable.&lt;/p&gt;

&lt;p&gt;Open your configuration file (in my case, it’s &lt;strong&gt;zshrc&lt;/strong&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;open ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add the generated access token to the file. I used &lt;strong&gt;HUGGING_FACE_TOKEN&lt;/strong&gt; as the variable name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;export HUGGING_FACE_TOKEN=your_access_token_here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Specifying PYTORCH_MPS_HIGH_WATERMARK_RATIO
&lt;/h2&gt;

&lt;p&gt;Since my MacBook has only 16GB of RAM, I ran into a memory shortage when trying to execute the model, resulting in this error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RuntimeError: MPS backend out of memory&lt;br&gt;&lt;br&gt;
Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable the upper limit for memory allocations (may cause system failure).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This issue stems from insufficient memory for the &lt;strong&gt;MPS&lt;/strong&gt; GPU on my MacBook.&lt;/p&gt;

&lt;p&gt;As suggested, I set the following environment variable to remove the upper limit for &lt;strong&gt;MPS&lt;/strong&gt; usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be cautious, as removing this limit may cause the system to crash due to memory exhaustion.&lt;/p&gt;

&lt;p&gt;Initially, I tried using a non-zero value, but that led to the following error, so I set it to 0.0:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RuntimeError: invalid low watermark ratio 1.4&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s also possible to clear the &lt;strong&gt;MPS&lt;/strong&gt; cache with &lt;a href="https://pytorch.org/docs/stable/generated/torch.mps.empty_cache.html#torch.mps.empty_cache" rel="noopener noreferrer"&gt;torch.mps.empty_cache()&lt;/a&gt;, which may help avoid memory issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixing &lt;code&gt;transformer_flux.py&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Even after these adjustments, I encountered another error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim&lt;br&gt;&lt;br&gt;
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This error occurs because the &lt;strong&gt;MPS&lt;/strong&gt; backend doesn’t support &lt;strong&gt;float64&lt;/strong&gt;, so &lt;strong&gt;float32&lt;/strong&gt; must be used instead. You can fix it by modifying &lt;a href="https://github.com/huggingface/diffusers/blob/a054c784955023385cf91e792db10970d00ca281/src/diffusers/models/transformers/transformer_flux.py#L41" rel="noopener noreferrer"&gt;transformer_flux.py&lt;/a&gt; like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_default_dtype&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This issue is also mentioned in the GitHub issue “&lt;a href="https://github.com/huggingface/diffusers/issues/9047" rel="noopener noreferrer"&gt;flux does not work on MPS devices&lt;/a&gt;,” and it might be addressed soon. A similar issue was raised in ComfyUI: “&lt;a href="https://github.com/comfyanonymous/ComfyUI/issues/4165" rel="noopener noreferrer"&gt;FLUX Issue | MPS framework doesn’t support float64&lt;/a&gt;.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://huggingface.co/black-forest-labs/FLUX.1-dev#diffusers" rel="noopener noreferrer"&gt;official sample code&lt;/a&gt; doesn’t work directly, so I made the following changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Specifying the access token.&lt;/li&gt;
&lt;li&gt;Setting &lt;strong&gt;mps&lt;/strong&gt; as the &lt;strong&gt;device&lt;/strong&gt; to utilize the &lt;strong&gt;MacBook&lt;/strong&gt; GPU.&lt;/li&gt;
&lt;li&gt;Removing the call to &lt;strong&gt;enable_model_cpu_offload&lt;/strong&gt;.

&lt;ul&gt;
&lt;li&gt;This function assumes &lt;strong&gt;CUDA&lt;/strong&gt;, leading to an &lt;strong&gt;AssertionError: Torch not compiled with CUDA enabled&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
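&lt;p&gt;The device choice in change 2 can be sketched as a simple fallback order. This is my own illustration; in real code, the two flags would come from &lt;code&gt;torch.backends.mps.is_available()&lt;/code&gt; and &lt;code&gt;torch.cuda.is_available()&lt;/code&gt;:&lt;/p&gt;

```python
def pick_device_name(mps_available: bool, cuda_available: bool) -> str:
    # Prefer the MacBook GPU (MPS), then CUDA, then fall back to CPU.
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"
```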

&lt;p&gt;Here’s the modified code, with comments noting the changes from the official sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;diffusers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;  &lt;span class="c1"&gt;# Added for environment variable access
&lt;/span&gt;
&lt;span class="n"&gt;hf_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HUGGING_FACE_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Retrieve the Hugging Face token
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black-forest-labs/FLUX.1-dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                    &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hf_token&lt;/span&gt;  &lt;span class="c1"&gt;# Specify the Hugging Face token
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Specify MPS as the device
# pipe.enable_model_cpu_offload()  # Removed
&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A cat holding a sign that says hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guidance_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pil&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_inference_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_sequence_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;manual_seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flux-dev.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re testing, you can speed things up by reducing &lt;strong&gt;num_inference_steps&lt;/strong&gt;. You can control the random seed with &lt;strong&gt;generator&lt;/strong&gt; to make results reproducible. For details on the parameters, refer to the &lt;a href="https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#diffusers.FluxPipeline.__call__" rel="noopener noreferrer"&gt;API reference&lt;/a&gt;.&lt;/p&gt;
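&lt;p&gt;For quick experiments, the call parameters can be collected in a small helper that trades quality for speed. The helper name and the reduced step count of 10 are my own choices; the remaining values match the code above:&lt;/p&gt;

```python
def flux_dev_params(quick_test: bool = False) -> dict:
    # Fewer inference steps generate faster but at lower quality.
    return {
        "height": 1024,
        "width": 1024,
        "guidance_scale": 3.5,
        "output_type": "pil",
        "num_inference_steps": 10 if quick_test else 50,
        "max_sequence_length": 512,
    }

# Usage: image = pipe(prompt, generator=generator, **flux_dev_params(quick_test=True)).images[0]
```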

&lt;p&gt;The model generates an image based on the prompt “&lt;strong&gt;A cat holding a sign that says hello world&lt;/strong&gt;.”&lt;/p&gt;

&lt;p&gt;The first time you run the code, the model and dependencies (over 30GB) will be downloaded, which may take some time depending on your network speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzd6jol2ouxghdwtal5jv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzd6jol2ouxghdwtal5jv.png" alt="flux 5"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While there are areas for improvement, the result is quite clean, and the text “&lt;strong&gt;Hello World&lt;/strong&gt;” is clearly legible. The prompt is faithfully represented in the image.&lt;/p&gt;

&lt;p&gt;I also tried a different prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anime style, Japanese moe heroine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqdtyvy1duzyrv6lec9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqdtyvy1duzyrv6lec9u.png" alt="flux 6"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While I won’t go into the nuances of &lt;strong&gt;moe&lt;/strong&gt;, the generated image is a high-quality, accurate representation of the prompt.&lt;/p&gt;

&lt;p&gt;For comparison, here’s an image generated with the same prompt using &lt;a href="https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1ultra/post" rel="noopener noreferrer"&gt;Stable Image Ultra&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc97n1fo9f0dntgehrmwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc97n1fo9f0dntgehrmwi.png" alt="Stable Diffusion Ultra"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Comparing them, I’d say &lt;strong&gt;FLUX.1&lt;/strong&gt; is closer to the concept of a &lt;strong&gt;moe heroine&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing FLUX.1 [schnell]
&lt;/h2&gt;

&lt;p&gt;I also tested &lt;strong&gt;FLUX.1 [schnell]&lt;/strong&gt;, the faster model that generates images in fewer steps. Using the &lt;a href="https://huggingface.co/black-forest-labs/FLUX.1-schnell#diffusers" rel="noopener noreferrer"&gt;official sample code&lt;/a&gt;, I generated the following image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;diffusers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;  &lt;span class="c1"&gt;# Added for environment variable access
&lt;/span&gt;
&lt;span class="n"&gt;hf_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HUGGING_FACE_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Retrieve the Hugging Face token
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black-forest-labs/FLUX.1-schnell&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                    &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hf_token&lt;/span&gt;  &lt;span class="c1"&gt;# Specify the Hugging Face token
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Specify MPS as the device
# pipe.enable_model_cpu_offload()  # Removed
&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A cat holding a sign that says hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guidance_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pil&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_inference_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_sequence_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;manual_seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flux-schnell.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As with &lt;strong&gt;FLUX.1 [dev]&lt;/strong&gt;, the first run will involve downloading over 30GB of data, which might take a while.&lt;/p&gt;

&lt;p&gt;Here’s the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5uiwdaduf3aaacju7x0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5uiwdaduf3aaacju7x0.png" alt="flux 7"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also tried using the prompt &lt;strong&gt;“anime style, Japanese moe heroine”&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F883xjjc71saavz61fvgh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F883xjjc71saavz61fvgh.png" alt="flux 8"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While the quality isn’t quite as high as &lt;strong&gt;FLUX.1 [dev]&lt;/strong&gt;, both images faithfully reflect the prompt and are visually appealing. With fewer inference steps, &lt;strong&gt;FLUX.1 [schnell]&lt;/strong&gt; generates images much faster than &lt;strong&gt;FLUX.1 [dev]&lt;/strong&gt;.&lt;/p&gt;
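&lt;p&gt;To summarize, the two official samples differ only in a few call parameters, which accounts for the speed gap. Collected here as a plain dictionary (the variable name is mine; the values are from the two code listings above):&lt;/p&gt;

```python
# Parameter differences between the two official samples shown in this article.
FLUX_SAMPLE_PARAMS = {
    "black-forest-labs/FLUX.1-dev": {
        "num_inference_steps": 50,
        "guidance_scale": 3.5,
        "max_sequence_length": 512,
    },
    "black-forest-labs/FLUX.1-schnell": {
        "num_inference_steps": 4,
        "guidance_scale": 0.0,
        "max_sequence_length": 256,
    },
}
```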

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I’m eager to try the API when it becomes available.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Today we release the FLUX.1 text-to-image model suite. With their strong creative capabilities, these models serve as a powerful foundation for our upcoming suite of competitive generative text-to-video systems. Our video models will unlock precise creation and editing at high definition and unprecedented speed. We are committed to continue pioneering the future of generative media.&lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://blackforestlabs.ai/announcing-black-forest-labs/" rel="noopener noreferrer"&gt;Announcing Black Forest Labs - Black Forest Labs&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m really looking forward to seeing what they develop next in the area of video generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Japanese Version of the Article
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qiita.com/nabata/items/ed97f7238f13bd4450a3" rel="noopener noreferrer"&gt;Stable Diffusionのオリジナル開発陣が発表した画像生成AIモデルFLUX.1([dev]/[schnell])をMacBook(M2)で動かしてみた&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>stablediffusion</category>
      <category>ai</category>
      <category>flux1</category>
    </item>
    <item>
      <title>Using "Hive Moderation AI-GENERATED CONTENT DETECTION" to Identify Images from Tools like DALL·E 3, FLUX.1, and ImageFX</title>
      <dc:creator>nabata</dc:creator>
      <pubDate>Sat, 28 Sep 2024 13:58:00 +0000</pubDate>
      <link>https://dev.to/nabata/using-hive-moderation-ai-generated-content-detection-to-identify-images-from-tools-like-dalle-3-flux1-and-imagefx-3jf7</link>
      <guid>https://dev.to/nabata/using-hive-moderation-ai-generated-content-detection-to-identify-images-from-tools-like-dalle-3-flux1-and-imagefx-3jf7</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this article, we will evaluate and identify AI-generated images using &lt;a href="https://hivemoderation.com/ai-generated-content-detection" rel="noopener noreferrer"&gt;Hive Moderation AI-GENERATED CONTENT DETECTION&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The AI image generation tools we’ll focus on are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/dall-e-3/" rel="noopener noreferrer"&gt;DALL·E 3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/black-forest-labs/FLUX.1-dev" rel="noopener noreferrer"&gt;FLUX.1[dev]&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aitestkitchen.withgoogle.com/tools/image-fx" rel="noopener noreferrer"&gt;ImageFX&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ja.stability.ai/stable-diffusion" rel="noopener noreferrer"&gt;Stable Diffusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.adobe.com/jp/products/firefly.html" rel="noopener noreferrer"&gt;Firefly&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Premise
&lt;/h2&gt;

&lt;p&gt;In the world of generative AI, &lt;a href="https://contentcredentials.org/" rel="noopener noreferrer"&gt;Content Credentials&lt;/a&gt; are becoming increasingly prevalent.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;With the improved quality of content from generative AI models, the need for transparency regarding the origin of AI-generated content has grown. All AI-generated images from Azure OpenAI Service now include Content Credentials—a tamper-evident method to disclose the origin and history of content. Content Credentials are based on an open technical specification from the Coalition for Content Provenance and Authenticity (C2PA), a Joint Development Foundation project.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference&lt;/strong&gt;: &lt;a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-credentials" rel="noopener noreferrer"&gt;Content Credentials in Azure OpenAI - Microsoft Learn&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Some of the images used in this evaluation include Content Credentials, yet there were many cases where the detection results didn’t align perfectly. For instance, images generated on the web using &lt;strong&gt;Firefly&lt;/strong&gt; by &lt;a href="https://www.adobe.com/" rel="noopener noreferrer"&gt;Adobe&lt;/a&gt; do include Content Credentials but were still identified as having a low probability of being AI-generated.&lt;/p&gt;

&lt;p&gt;Because of this, we will proceed without considering Content Credentials in this article.&lt;/p&gt;

&lt;p&gt;Additionally, while this article only includes two to three images per tool, more were tested in practice. Since the trends were consistent, we are presenting only a selection here.&lt;/p&gt;

&lt;h2&gt;
  
  
  DALL·E 3
&lt;/h2&gt;

&lt;p&gt;We first evaluated images generated by &lt;strong&gt;DALL·E 3&lt;/strong&gt; from &lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;, created by providing prompts through &lt;a href="https://openai.com/chatgpt/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzzbczce83n2i5s0c40e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzzbczce83n2i5s0c40e.png" alt="DALL·E 3 Image 1" width="800" height="561"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fer1colztl5ar3zwjpbvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fer1colztl5ar3zwjpbvn.png" alt="DALL·E 3 Image 2" width="800" height="561"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro71kwxac03shqrg4tji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro71kwxac03shqrg4tji.png" alt="DALL·E 3 Image 3" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although some of the labels were inaccurate, all images were correctly identified as AI-generated with a high probability score of over 99%. That judgment seems fair, as they do look like AI-generated images.&lt;/p&gt;

&lt;p&gt;On a related note, I’ve recently heard people say, "&lt;strong&gt;This image looks AI-generated&lt;/strong&gt;." I wonder what criteria people use to make that judgment.&lt;/p&gt;

&lt;h2&gt;
  
  
  FLUX.1[dev]
&lt;/h2&gt;

&lt;p&gt;Next, we evaluated images generated using &lt;strong&gt;FLUX.1[dev]&lt;/strong&gt; by &lt;a href="https://blackforestlabs.ai/" rel="noopener noreferrer"&gt;Black Forest Labs&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2bl64dafpr46j3jvb90k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2bl64dafpr46j3jvb90k.png" alt="FLUX.1 Image 1" width="800" height="561"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftj2kaozw2q6fvo9slque.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftj2kaozw2q6fvo9slque.png" alt="FLUX.1 Image 2" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As expected, these were also identified as AI-generated with a high probability exceeding 99%.&lt;/p&gt;

&lt;h2&gt;
  
  
  ImageFX
&lt;/h2&gt;

&lt;p&gt;Next, we tested images generated by Google’s &lt;strong&gt;ImageFX&lt;/strong&gt;, which has gained attention for its impressive realism.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezhwcxsb1j4a5k4gfzn1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezhwcxsb1j4a5k4gfzn1.png" alt="ImageFX Image 1" width="800" height="561"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag53xz64n6haupwbgwl3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag53xz64n6haupwbgwl3.png" alt="ImageFX Image 2" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first image scored slightly below 90%, but it was still detected as AI-generated with a high probability. &lt;/p&gt;
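&lt;p&gt;For summarizing scores like these, it’s convenient to bucket the reported probability. The thresholds below are my own informal choice for this write-up, not part of the Hive service:&lt;/p&gt;

```python
def label_ai_probability(p: float) -> str:
    # Informal buckets for a "likely AI-generated" probability score.
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0.0, 1.0]")
    if p >= 0.99:
        return "almost certainly AI-generated"
    if p >= 0.5:
        return "likely AI-generated"
    return "unlikely AI-generated"
```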

&lt;p&gt;On a side note, the quality of the photo-like images, especially of people, is truly impressive and lives up to the hype. It’s becoming harder to believe that these images are AI-generated without prior knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stable Diffusion
&lt;/h2&gt;

&lt;p&gt;Next, we evaluated images generated by &lt;strong&gt;Stable Diffusion&lt;/strong&gt; from &lt;a href="https://ja.stability.ai/" rel="noopener noreferrer"&gt;Stability AI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6chdqg7ncqse6gdlzff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6chdqg7ncqse6gdlzff.png" alt="Stable Diffusion Image 1" width="800" height="561"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbajxzastmwfhj75p6pk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbajxzastmwfhj75p6pk.png" alt="Stable Diffusion Image 2" width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both images were detected as AI-generated, as expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing Image to Image
&lt;/h3&gt;

&lt;p&gt;Since almost all images so far were detected as AI-generated with over 99% probability, I decided to try something different.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;Image to Image&lt;/strong&gt; generation, you can specify a base image for the AI to modify. I’ll test how well this detection tool identifies AI modifications.&lt;/p&gt;

&lt;p&gt;Here’s a photo I took at a certain location. I’ve reduced the file size for easier viewing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49nbyr8en04nww22dsot.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49nbyr8en04nww22dsot.jpg" alt="Original Image" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using the same &lt;strong&gt;Stable Diffusion 3 Large&lt;/strong&gt;, I applied the prompt "&lt;strong&gt;Night view of buildings&lt;/strong&gt;" with &lt;strong&gt;Image to Image&lt;/strong&gt; generation. The degree to which the prompt modifies the original image can be adjusted between &lt;strong&gt;0.0 and 1.0&lt;/strong&gt;, with higher values deviating more from the original image. I generated images at various stages of modification and tested them.&lt;/p&gt;
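&lt;p&gt;For reference, here is a minimal Python sketch of an &lt;strong&gt;Image to Image&lt;/strong&gt; call like the one above. The endpoint and form fields follow Stability AI's v2beta REST API as I understand it; treat the exact field names and the &lt;code&gt;sd3-large&lt;/code&gt; model id as assumptions and confirm them against the official documentation.&lt;/p&gt;

```python
# Hypothetical sketch of an image-to-image request to Stable Diffusion 3 Large
# via the Stability AI REST API (v2beta). Endpoint and field names are
# assumptions; verify against the official API reference.
import os

API_URL = "https://api.stability.ai/v2beta/stable-image/generate/sd3"


def build_request(prompt: str, strength: float) -> dict:
    """Form fields for an image-to-image call.

    strength is in [0.0, 1.0]; higher values deviate more
    from the original image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return {
        "prompt": prompt,
        "mode": "image-to-image",
        "model": "sd3-large",          # assumed model id
        "strength": str(strength),
        "output_format": "png",
    }


def generate(src_path: str, prompt: str, strength: float, out_path: str) -> None:
    """POST the source image plus form fields and save the returned PNG."""
    import requests  # imported here so the pure helper above has no dependency

    resp = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
            "Accept": "image/*",
        },
        files={"image": open(src_path, "rb")},
        data=build_request(prompt, strength),
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)


# Example (requires STABILITY_API_KEY to be set):
# generate("night_view.jpg", "Night view of buildings",
#          strength=0.5, out_path="out_0.5.png")
```

&lt;p&gt;Generating one image per strength value (0.3, 0.5, 0.7, 0.9) then produces the series tested below.&lt;/p&gt;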

&lt;h3&gt;
  
  
  Original Image
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg72v74mvc4nees66oux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg72v74mvc4nees66oux.png" alt="Stable Diffusion Original" width="800" height="561"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Detection result: near 0.&lt;/p&gt;

&lt;h3&gt;
  
  
  strength 0.3 (original image 0.7)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zdy28ru5k1cq2hzu90e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zdy28ru5k1cq2hzu90e.png" alt="Stable Diffusion 0.3" width="800" height="561"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Probability increased by only 0.2%, still near 0.&lt;/p&gt;

&lt;h3&gt;
  
  
  strength 0.5 (original image 0.5)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6hmpry0wczkhhgqdbme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6hmpry0wczkhhgqdbme.png" alt="Stable Diffusion 0.5" width="800" height="561"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Now entering double digits.&lt;/p&gt;

&lt;h3&gt;
  
  
  strength 0.7 (original image 0.3)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikxcqtdgxyqpk6m7dis7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikxcqtdgxyqpk6m7dis7.png" alt="Stable Diffusion 0.7" width="800" height="561"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Approaching 50%.&lt;/p&gt;

&lt;h3&gt;
  
  
  strength 0.9 (original image 0.1)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3nv6p1261ij01pi7exo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3nv6p1261ij01pi7exo.png" alt="Stable Diffusion 0.9" width="800" height="561"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Surpassed 50%.&lt;/p&gt;

&lt;p&gt;As the degree of modification increases, the probability of AI detection rises with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Firefly
&lt;/h2&gt;

&lt;p&gt;Lastly, I tested Adobe's &lt;strong&gt;Firefly&lt;/strong&gt;, with a slightly different approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Modifying the Sky
&lt;/h3&gt;

&lt;p&gt;Using &lt;a href="https://www.adobe.com/jp/creativecloud/roc/products/photoshop/beginner.html" rel="noopener noreferrer"&gt;Photoshop&lt;/a&gt;, I applied AI generation to only the sky in the earlier night view photo. Here’s the result: the upper half of the image is AI-generated, while the lower half remains unchanged from the original.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhclz7vub2osg6i4liwbg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhclz7vub2osg6i4liwbg.jpg" alt="Firefly Original Image 1" width="800" height="600"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;The detection result is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlmmjb9gcrnhvictpx8o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlmmjb9gcrnhvictpx8o.png" alt="Firefly 1" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Detection result: 0.1%.&lt;/p&gt;

&lt;h3&gt;
  
  
  Modifying the River
&lt;/h3&gt;

&lt;p&gt;Next, I left the upper half as-is and applied AI generation to the lower half of the image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ehgpctkdg7xw6gkuud9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ehgpctkdg7xw6gkuud9.jpg" alt="Firefly Original Image 2" width="800" height="600"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;The detection result is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6yjzhzrqp84cv691nfc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6yjzhzrqp84cv691nfc.png" alt="Firefly 2" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Detection result: similarly low probability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing 100% AI-Generated Image
&lt;/h3&gt;

&lt;p&gt;Finally, I tested an entirely AI-generated image from &lt;strong&gt;Firefly&lt;/strong&gt; using a prompt. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixqzppgz992m4evk0irj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixqzppgz992m4evk0irj.jpg" alt="Firefly Original Image 3" width="800" height="800"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;The detection result is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frl2y7sm3uxdhijs9myww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frl2y7sm3uxdhijs9myww.png" alt="Firefly 3" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Detection result: slightly above 1%, but still relatively low.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firefly&lt;/strong&gt; is listed as a supported tool in &lt;strong&gt;AI-GENERATED CONTENT DETECTION&lt;/strong&gt;, so why the low scores? Perhaps the detector does not yet account for Firefly's latest generation logic, or perhaps &lt;strong&gt;Firefly&lt;/strong&gt; operates on fundamentally different principles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;While this tool provides a useful benchmark, it remains difficult to accurately detect AI-generated content in every case.&lt;/p&gt;

&lt;p&gt;As I’ve noted previously, with the prevalence of AI in digital tools today, attempting to definitively distinguish between &lt;strong&gt;AI-generated and non-AI&lt;/strong&gt; content may be a futile effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Japanese Version of the Article
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qiita.com/nabata/items/ed3f7a6bd0e17677b000" rel="noopener noreferrer"&gt;AI生成かどうかを判定する「Hive Moderation AI-GENERATED CONTENT DETECTION」にDALL·E 3やFLUX.1、ImageFX等の生成画像を判定させてみた&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>flux1</category>
      <category>openai</category>
      <category>firefly</category>
    </item>
  </channel>
</rss>
