DEV Community

Serhii Korol

The main pitfalls in generating images with DALL-E API.

I highly recommend reading this article to the end before you start using the DALL-E API. I want to share my experience and the problems I faced when using DALL-E.

Entry

What is DALL-E? It's a product from OpenAI, the company that created ChatGPT. If you already use ChatGPT, you might receive a trial period for DALL-E. You can check your status at this link. In this article, we'll look at three APIs for generating images.

Preparations

Create a simple ASP.NET MVC project and do a little setup in your OpenAI profile. Go to https://platform.openai.com/account/api-keys and create a secret key.

After you've done that, save the key somewhere you can find it later.

Models.

Let's create several models for the input and result data. I put them in the Models folder and started with a Dalle record. It will be the root model for exchanging data between the View and the Controller.

public record Dalle
{
    public GenerateInput? GenerateInput { get; set; }
    public List<string>? Links { get; set; }
}

And create a model for input data.

public record GenerateInput
{
    [JsonPropertyName("prompt")] public string? Prompt { get; set; }
    [JsonPropertyName("n")] public short? N { get; set; }
    [JsonPropertyName("size")] public string? Size { get; set; }
}

You should also create a couple of models for the results.

public record ResponseModel
{
    [JsonPropertyName("created")]
    public long Created { get; set; }
    [JsonPropertyName("data")]
    public List<Link>? Data { get; set; }
}

public record Link
{
    [JsonPropertyName("url")]
    public string? Url { get; set; }
}

First API. Generating images from text.

Now that we have all the needed models, we can start creating endpoints and input forms. Go to Views => Home => Index.cshtml and paste this markup:

@model TextToPicture.Models.Dalle
@{
    ViewData["Title"] = "Home Page";
}
<h1 class="display-4">Hi, my name is DALL-E</h1>

<div class="container-fluid">
    <div class="row">
        <div class="col-4">
            @using (Html.BeginForm("GenerateImage", "Home", FormMethod.Post, new { @class = "form-horizontal" }))
            {
                @Html.AntiForgeryToken()
                <div class="mb-3">
                    @Html.LabelFor(m => m.GenerateInput!.Prompt, "Prompt: ", new {@class = "form-label"})
                    @Html.TextAreaFor(m => m.GenerateInput!.Prompt, new {@class = "form-control", required = "required"})
                    @Html.ValidationMessageFor(m => m.GenerateInput!.Prompt)
                </div>
                <div class="mb-3">
                    @Html.LabelFor(m => m.GenerateInput!.N, "Number of Images: ", new {@class = "form-label"})
                    @Html.TextBoxFor(m => m.GenerateInput!.N, new {@class = "form-control", type = "number", min = "1", max = "10", required = "required"})
                    @Html.ValidationMessageFor(m => m.GenerateInput!.N)
                </div>
                <div class="mb-3">
                    @Html.LabelFor(m => m.GenerateInput!.Size, "Image Size: ", new {@class = "form-label"})
                    @Html.DropDownListFor(m => m.GenerateInput!.Size, new SelectList(new List<string> { "256x256", "512x512", "1024x1024" }), new { @class = "form-control" })
                    @Html.ValidationMessageFor(m => m.GenerateInput!.Size)
                    <span>(e.g., 1024x1024)</span>
                </div>
                <div class="btn-group" role="group" aria-label="Generate">
                    <button type="submit" class="btn btn-success">Generate</button>
                </div>
            }
        </div>
    </div>
</div>

We'll be sending the prompt text, the number of images, and the picture size. Go to HomeController.cs and start creating a new action.

    private readonly HttpClient _httpClient;
    private readonly IWebHostEnvironment _hostingEnvironment;

    public HomeController(IWebHostEnvironment hostingEnvironment)
    {
        _hostingEnvironment = hostingEnvironment;
        _httpClient = new HttpClient();
        _httpClient.BaseAddress = new Uri("https://api.openai.com/");
    }

    [HttpPost]
    public async Task<ActionResult> GenerateImage(GenerateInput generateInput)
    {
        try
        {

        }
        catch (Exception ex)
        {

        }
    }

I'll build this action step by step and explain how it works. First, let's create a base request that carries the authorization.

//Request creating
var request = CreateBaseRequest(HttpMethod.Post, "v1/images/generations");
var jsonRequest = JsonSerializer.Serialize(generateInput);
request.Content = new StringContent(jsonRequest);
request.Content!.Headers.ContentType = new MediaTypeHeaderValue("application/json");

Also add this method and set the secret key that you saved earlier.

private HttpRequestMessage CreateBaseRequest(HttpMethod method, string uri)
{
        var httpRequestMessage = new HttpRequestMessage(method, uri);
        var apiKey = "your-secret-key";
        httpRequestMessage.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return httpRequestMessage;
}
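Hard-coding the secret key is fine for a quick test, but it is easy to commit by accident. Here is a minimal sketch of the same helper that takes the key from an environment variable instead. The variable name OPENAI_API_KEY and the dummy fallback value are assumptions made for this example, not something the API requires.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical variable name; any name works as long as the app and the host agree on it.
const string ApiKeyVariable = "OPENAI_API_KEY";

static HttpRequestMessage CreateBaseRequest(HttpMethod method, string uri, string key)
{
    if (string.IsNullOrWhiteSpace(key))
        throw new InvalidOperationException("The API key is not configured.");

    var message = new HttpRequestMessage(method, uri);
    message.Headers.Authorization = new AuthenticationHeaderValue("Bearer", key);
    return message;
}

// Fall back to a dummy value so the sketch runs without a real key.
var apiKey = Environment.GetEnvironmentVariable(ApiKeyVariable) ?? "dummy-key-for-local-run";
using var request = CreateBaseRequest(HttpMethod.Post, "v1/images/generations", apiKey);
Console.WriteLine(request.Headers.Authorization?.Scheme); // prints "Bearer"
```

In the controller, the key could be read once in the constructor and kept in a private field.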

Note that you must set the content type, or you'll get a Bad Request response.

request.Content!.Headers.ContentType = new MediaTypeHeaderValue("application/json");
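As an alternative to setting the header on a separate line, StringContent has a constructor that accepts the media type directly. This is a minimal sketch, assuming a hypothetical helper named ToJsonContent and an anonymous object that stands in for GenerateInput with the same JSON field names.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

static StringContent ToJsonContent(object payload)
{
    // Serialize using the runtime type of the payload
    var json = JsonSerializer.Serialize(payload, payload.GetType());
    // This constructor sets the Content-Type header to application/json with a UTF-8 charset
    return new StringContent(json, Encoding.UTF8, "application/json");
}

// Anonymous object standing in for GenerateInput
using var content = ToJsonContent(new { prompt = "a red fox in the snow", n = 2, size = "512x512" });
Console.WriteLine(content.Headers.ContentType?.MediaType); // prints "application/json"
Console.WriteLine(await content.ReadAsStringAsync());
```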

In the next step, we need to execute the request. It's an ordinary call.

//Result 
var response = await _httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
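EnsureSuccessStatusCode throws without showing the response body, and the body is usually where the API explains why a request was rejected. The sketch below shows one way to surface it. EnsureSuccessWithBodyAsync is a hypothetical helper name, and the 400 response with its JSON body is simulated for the demo rather than copied from a real API reply.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static async Task EnsureSuccessWithBodyAsync(HttpResponseMessage response)
{
    if (response.IsSuccessStatusCode) return;

    // Read the error body so the exception message carries the reason for the failure
    var body = await response.Content.ReadAsStringAsync();
    throw new HttpRequestException($"API call failed with {(int)response.StatusCode}: {body}");
}

// Simulated 400 response; the JSON body is made up for this demo
using var fake = new HttpResponseMessage(HttpStatusCode.BadRequest)
{
    Content = new StringContent("{\"error\":{\"message\":\"example rejection\"}}")
};

try
{
    await EnsureSuccessWithBodyAsync(fake);
}
catch (HttpRequestException ex)
{
    Console.WriteLine(ex.Message);
}
```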

In the last step, you need to handle the response and save the received image.

// Pass the image URL to the view
var model = new Dalle { Links = await GetUrls(response)};
return RedirectToAction("Index", model);

The GetUrls method reads the URLs from the response and returns them. However, there is one detail: the API returns URLs with a limited access time. After a while, the link alone no longer works, and you can access the picture only with the secret key. For this reason, I decided to save every picture. Add this method, which saves each picture to the Uploads folder under a name built from the current timestamp.

public async Task SaveFileByLink(string link)
{
    // Use the current timestamp to build a unique file name
    var fileName = $"file{DateTime.Now.Ticks}.png";
    var uploadsFolder = Path.Combine(_hostingEnvironment.WebRootPath, "Uploads");
    // Create the Uploads folder if it doesn't exist yet
    Directory.CreateDirectory(uploadsFolder);
    var filePath = Path.Combine(uploadsFolder, fileName);
    await using var fileStream = new FileStream(filePath, FileMode.Create);
    await fileStream.WriteAsync(await DownloadImage(link));
}

Add this method for getting image bytes.

public async Task<byte[]> DownloadImage(string url)
{
    return await _httpClient.GetByteArrayAsync(url);
}
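The actions call a GetUrls helper that isn't listed in the article. Here is a minimal sketch of its parsing step, assuming it deserializes the response body into the ResponseModel record and collects the non-empty URLs. ExtractUrls is a hypothetical name, the sample JSON is made up, and the records are repeated only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

static List<string> ExtractUrls(string json)
{
    var parsed = JsonSerializer.Deserialize<ResponseModel>(json);

    // Keep only entries that actually carry a URL
    return parsed?.Data?
        .Select(link => link.Url)
        .Where(value => !string.IsNullOrEmpty(value))
        .Select(value => value!)
        .ToList() ?? new List<string>();
}

// Made-up payload in the shape of the ResponseModel record
var sample = "{\"created\":1,\"data\":[{\"url\":\"https://example.com/a.png\"},{\"url\":\"https://example.com/b.png\"}]}";
foreach (var item in ExtractUrls(sample))
{
    Console.WriteLine(item);
}

public record ResponseModel
{
    [JsonPropertyName("created")] public long Created { get; set; }
    [JsonPropertyName("data")] public List<Link>? Data { get; set; }
}

public record Link
{
    [JsonPropertyName("url")] public string? Url { get; set; }
}
```

In the controller, the JSON would presumably come from `await response.Content.ReadAsStringAsync()`, and each URL could then be passed to SaveFileByLink.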

In the catch block, add a simple error handler.

// Handle any error that occurred during the API request
ViewBag.Error = ex.Message;
return View("Index");

And, of course, pass the data to the Index action.

public IActionResult Index(Dalle model)
{
    return View(model);
}

And also let's add markup for showing images:

<hr style="height:2px;border-width:0;color:gray;background-color:gray">
<div class="container-fluid">
    <div class="row">
        @if (@Model?.Links != null)
        {
            <div class="img-fluid">
                <h4>Generated Images:</h4>
                @foreach (var link in Model.Links)
                {
                    <img src="@link" alt="Generated Image" />
                }
            </div>
        }
    </div>
</div>

And now let's check this out.

Fill in the form.

Don't try to write prompts about famous people: DALL-E is censored and returns a Bad Request if you mention Elon Musk's name or something similar.

You should get results with four pictures:

Second API. Editing pictures.

This API has issues and pitfalls. I'll tell you what I could find about it.

To begin, let's add a new input model.

public record EditInput
{
    [JsonPropertyName("prompt")] public string Prompt { get; set; }
    [JsonPropertyName("n")] public short N { get; set; }
    [JsonPropertyName("size")] public string Size { get; set; }
    [JsonPropertyName("image")] public IFormFile Image { get; set; }
    [JsonPropertyName("mask")] public IFormFile Mask { get; set; }
}

And don't forget to add it to the root model.

public record Dalle
{
    public GenerateInput? GenerateInput { get; set; }
    public EditInput? EditInput { get; set; }
    public List<string>? Links { get; set; }
}

Go to the markup and paste this code:

<div class="col-4">
            @using (Html.BeginForm("EditImage", "Home", FormMethod.Post, new { @class = "form-horizontal", enctype = "multipart/form-data" }))
            {
                @Html.AntiForgeryToken()
                <div>
                    @Html.LabelFor(m => m.EditInput!.Image, "Image: ", new {@class = "form-label"})
                    @Html.TextBoxFor(m => m.EditInput!.Image, new { type = "file", required = "required" })
                    @Html.ValidationMessageFor(m => m.EditInput!.Image)
                </div>
                <div>
                    @Html.LabelFor(m => m.EditInput!.Mask, "Mask: ", new {@class = "form-label"})
                    @Html.TextBoxFor(m => m.EditInput!.Mask, new { type = "file", required = "required" })
                    @Html.ValidationMessageFor(m => m.EditInput!.Mask)
                </div>
                <div class="mb-3">
                    @Html.LabelFor(m => m.EditInput!.Prompt, "Prompt: ", new {@class = "form-label"})
                    @Html.TextAreaFor(m => m.EditInput!.Prompt, new {@class = "form-control", required = "required"})
                    @Html.ValidationMessageFor(m => m.EditInput!.Prompt)
                </div>
                <div class="mb-3">
                    @Html.LabelFor(m => m.EditInput!.N, "Number of Images: ", new {@class = "form-label"})
                    @Html.TextBoxFor(m => m.EditInput!.N, new {@class = "form-control", type = "number", min = "1", max = "10", required = "required"})
                    @Html.ValidationMessageFor(m => m.EditInput!.N)
                </div>
                <div class="mb-3">
                    @Html.LabelFor(m => m.EditInput!.Size, "Image Size: ", new {@class = "form-label"})
                    @Html.DropDownListFor(m => m.EditInput!.Size, new SelectList(new List<string> { "256x256", "512x512", "1024x1024" }), new { @class = "form-control" })
                    @Html.ValidationMessageFor(m => m.EditInput!.Size)
                    <span>(e.g., 1024x1024)</span>
                </div>
                <div class="btn-group" role="group" aria-label="Edit">
                    <button type="submit" class="btn btn-success">Edit</button>
                </div>
            }
        </div>

Return to HomeController and add a new action:

[HttpPost]
    public async Task<IActionResult> EditImage(EditInput editInput)
    {
        try
        {
            // Add the form data
            var formData = new MultipartFormDataContent();
            formData.Add(new StringContent(editInput.Prompt), "prompt");
            formData.Add(new StringContent(editInput.N.ToString()), "n");
            formData.Add(new StringContent(editInput.Size), "size");

            // Add the image file
            await AddFormDataFile(formData, editInput.Image, "image");

            //Add the mask file
            await AddFormDataFile(formData, editInput.Mask, "mask");

            // Prepare the form data
            var request = CreateBaseRequest(HttpMethod.Post, "v1/images/edits");
            request.Content = formData;

            // Make the API request
            var response = await _httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
            response.EnsureSuccessStatusCode();

            // Pass the image URL to the view
            var model = new Dalle { Links = await GetUrls(response)};
            return RedirectToAction("Index", model);
        }
        catch (Exception ex)
        {
            // Handle any error that occurred during the API request
            ViewBag.Error = ex.Message;
            return View("Index");
        }
    }

The request is different here: we build it with MultipartFormDataContent. To attach the files, I decided to create a separate method:

private async Task AddFormDataFile(MultipartFormDataContent formData, IFormFile file, string name)
    {
        using var memoryStream = new MemoryStream();
        await using (var fileStream = file.OpenReadStream())
        {
            await fileStream.CopyToAsync(memoryStream);
        }

        var imageData = ConvertRgb24ToRgba32(memoryStream.ToArray());
        var imageContent = new ByteArrayContent(imageData);
        imageContent.Headers.ContentType = new MediaTypeHeaderValue("image/png");
        formData.Add(imageContent, name, file.FileName);
    }

Here I want to tell you about the pitfalls. You won't find this in the documentation. You should send all files as byte arrays. The documentation says that you can upload pictures in PNG format, but there is one nuance: the file should be in RGBA format, while the first API generates pictures in RGB format. Keep this in mind. For this reason, I created a converter from RGB to RGBA.

public byte[] ConvertRgb24ToRgba32(byte[] inputImage)
    {
        using var inputStream = new MemoryStream(inputImage);
        using var outputStream = new MemoryStream();

        // Load the input image using ImageSharp
        using var image = Image.Load<Rgb24>(inputStream);

        // Create a new image with RGBA32 pixel format
        using var convertedImage = new Image<Rgba32>(image.Width, image.Height);

        // Convert RGB to RGBA
        for (int y = 0; y < image.Height; y++)
        {
            for (int x = 0; x < image.Width; x++)
            {
                Rgb24 inputPixel = image[x, y];
                Rgba32 outputPixel = new Rgba32(inputPixel.R, inputPixel.G, inputPixel.B, byte.MaxValue);
                convertedImage[x, y] = outputPixel;
            }
        }

        // Save the converted image to the output stream
        convertedImage.Save(outputStream, new PngEncoder());

        // Return the converted image as a byte array
        return outputStream.ToArray();
    }
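To see which format a given PNG is in before uploading it, you can read the color type byte from the file's IHDR header: a value of 6 means RGBA and 2 means plain RGB. This is a minimal sketch that relies only on the standard PNG header layout. IsRgbaPng is a hypothetical helper name, and the header bytes are hand-built for the demo instead of being read from a real file.

```csharp
using System;
using System.Linq;

static bool IsRgbaPng(byte[] png)
{
    // Layout: 8-byte signature, 4-byte chunk length, "IHDR", width (4), height (4), bit depth (1), color type (1)
    const int colorTypeOffset = 25;
    const byte truecolorWithAlpha = 6; // 2 would be plain RGB
    byte[] signature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    return png.Length > colorTypeOffset
        && png.Take(signature.Length).SequenceEqual(signature)
        && png[colorTypeOffset] == truecolorWithAlpha;
}

// Hand-built header of a 1024x1024, 8-bit PNG; only the first 26 bytes matter here
byte[] header =
{
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, // signature
    0x00, 0x00, 0x00, 0x0D,                         // IHDR data length (13)
    0x49, 0x48, 0x44, 0x52,                         // "IHDR"
    0x00, 0x00, 0x04, 0x00,                         // width
    0x00, 0x00, 0x04, 0x00,                         // height
    0x08,                                           // bit depth
    0x06                                            // color type: RGBA
};
Console.WriteLine(IsRgbaPng(header)); // prints "True"

header[25] = 0x02; // switch the color type to plain RGB
Console.WriteLine(IsRgbaPng(header)); // prints "False"
```

With a real file, the bytes could come from `File.ReadAllBytes(path)` or from the uploaded IFormFile stream.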

To use this, you need the SixLabors.ImageSharp package.
But that's not all: this API didn't work for me. Let me show you. You should see this form.

You need to upload the source image and mask image.

And fill in the form.

The result comes back without any changes.

I thought that it was an issue with C# or that I had made a mistake. However, I sent the same request with cURL and got the same result.

curl https://api.openai.com/v1/images/edits -H "Authorization: Bearer $OPENAI_API_KEY" -F image="@sunlit_lounge_rgba.png" -F mask="@mask_rgba.png" -F prompt="A sunlit indoor lounge area with a pool containing a flamingo" -F n=1 -F size="1024x1024" > output.json

This snippet was taken from the documentation. In my tests, this API didn't work, which makes it useless.

Third API. Generating different variations of a picture.

Let's add a new model.

public record VariationInput
{
    [JsonPropertyName("n")] public short N { get; set; }
    [JsonPropertyName("size")] public string Size { get; set; }
    [JsonPropertyName("image")] public IFormFile Image { get; set; }
}

Add a VariationInput property to the root Dalle model, as we did for EditInput, and then add the new markup.

<div class="col-4">
            @using (Html.BeginForm("VariationImage", "Home", FormMethod.Post, new { @class = "form-horizontal", enctype = "multipart/form-data" }))
            {
                @Html.AntiForgeryToken()
                <div>
                    @Html.LabelFor(m => m.VariationInput!.Image, "Image: ", new {@class = "form-label"})
                    @Html.TextBoxFor(m => m.VariationInput!.Image, new { type = "file", required = "required" })
                    @Html.ValidationMessageFor(m => m.VariationInput!.Image)
                </div>
                <div class="mb-3">
                    @Html.LabelFor(m => m.VariationInput!.N, "Number of Images: ", new {@class = "form-label"})
                    @Html.TextBoxFor(m => m.VariationInput!.N, new {@class = "form-control", type = "number", min = "1", max = "10", required = "required"})
                    @Html.ValidationMessageFor(m => m.VariationInput!.N)
                </div>
                <div class="mb-3">
                    @Html.LabelFor(m => m.VariationInput!.Size, "Image Size: ", new {@class = "form-label"})
                    @Html.DropDownListFor(m => m.VariationInput!.Size, new SelectList(new List<string> { "256x256", "512x512", "1024x1024" }), new { @class = "form-control" })
                    @Html.ValidationMessageFor(m => m.VariationInput!.Size)
                    <span>(e.g., 1024x1024)</span>
                </div>
                <div class="btn-group" role="group" aria-label="Variation">
                    <button type="submit" class="btn btn-success">Variation</button>
                </div>
            }
        </div>

The action is similar to the previous one, so I won't dwell on it.

[HttpPost]
    public async Task<IActionResult> VariationImage(VariationInput variationInput)
    {
        try
        {
            // Add the form data
            var formData = new MultipartFormDataContent();
            formData.Add(new StringContent(variationInput.N.ToString()), "n");
            formData.Add(new StringContent(variationInput.Size), "size");

            // Add the image file
            await AddFormDataFile(formData, variationInput.Image, "image");


            // Prepare the form data
            var request = CreateBaseRequest(HttpMethod.Post, "v1/images/variations");
            request.Content = formData;

            // Make the API request
            var response = await _httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
            response.EnsureSuccessStatusCode();

            // Pass the image URL to the view
            var model = new Dalle { Links = await GetUrls(response)};
            return RedirectToAction("Index", model);
        }
        catch (Exception ex)
        {
            // Handle any error that occurred during the API request
            ViewBag.Error = ex.Message;
            return View("Index");
        }
    }

The form has no text prompt; you only need to upload a picture.

I uploaded a picture and got four similar pictures.

This API works fine.

Last words.

Generating pictures can be useful, but the technology is still very raw. Midjourney is much better. I wasn't on a trial subscription, so I was billed. I spent 1 dollar, but I don't know how this can be useful.

You can get the code at this link.

That's all. Happy coding!
