When I first started experimenting with GPT image generation for my Garden Visualizer project, I honestly thought the workflow would be simple:
- Upload image
- Write one good prompt
- Receive perfect result
A prompt like:
“Add low-maintenance shrubs to this Auckland backyard.”
And it just works, right?
Not exactly!
After weeks of experimenting in the ChatGPT app with staging photos, gardens, landscaping concepts, and “future visualizations”, I slowly realized something surprising:
The impressive results I was getting were not coming from a single prompt at all. They were coming from hidden context that ChatGPT had been accumulating throughout the conversation.
The Strange Thing I Noticed
At some point, I could type something incredibly vague like:
“Show after 3 years.”
…and GPT somehow knew:
- Which shrubs I meant
- Where the lemon tree was
- That I disliked overcrowded gardens
- That I preferred realistic suburban styling
- That I wanted sparse planting
- That I preferred low-maintenance layouts
That was the moment I realized: The “prompt” was not the whole product. The conversation itself had become part of the generation engine.
My Wrong Assumption About the Image Edit API
Initially, I assumed the Image Edit API worked similarly to ChatGPT conversations.
I thought:
- the model would somehow “remember”
- the AI would learn over time
- previous generations would influence future ones automatically
But API calls are usually stateless, which means every request is effectively isolated unless you manually resend the context.
So this:
{
"users_image": "messy_backyard.jpg",
"prompt": "plz generate a nicely layout idea for this garden"
}
…actually contains almost no useful information.
The model has no idea:
- How dense the planting should be
- What realism level
- What maintenance standard
- What climate
- What style direction
- Budget constraints
The API is not “remembering”. MY app has to carry the memory.
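Here is a minimal sketch of what "carrying the memory" could look like in Python. Everything in it is an assumption for illustration: the `GardenContext` fields, their default values, and the `build_prompt` helper are hypothetical names, and it makes no API call at all. It only shows the idea of folding stored preferences and earlier instructions into every new prompt.

```python
from dataclasses import dataclass, field


@dataclass
class GardenContext:
    """Preferences the app stores between stateless calls (hypothetical fields and defaults)."""
    climate: str = "Auckland"
    density: str = "sparse planting"
    realism: str = "realistic suburban styling"
    maintenance: str = "low-maintenance"
    history: list[str] = field(default_factory=list)  # earlier instructions, oldest first


def build_prompt(ctx: GardenContext, instruction: str) -> str:
    """Combine the stored context and the new instruction into one self-contained prompt."""
    lines = [
        f"Climate: {ctx.climate}",
        f"Planting density: {ctx.density}",
        f"Realism: {ctx.realism}",
        f"Maintenance: {ctx.maintenance}",
    ]
    if ctx.history:
        lines.append("Previous steps: " + "; ".join(ctx.history))
    lines.append(f"Now: {instruction}")
    # Record the instruction so the next request can refer back to it.
    ctx.history.append(instruction)
    return "\n".join(lines)


ctx = GardenContext()
first = build_prompt(ctx, "Add low-maintenance shrubs")
second = build_prompt(ctx, "Show after 3 years.")
print(second)
```

With this in place, a vague follow-up like "Show after 3 years." no longer travels alone: the second prompt also carries the climate, density, realism, and the earlier shrub instruction.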
How About Using Reference Images?
Then I got curious. Instead of sending only the original image and a pure text prompt, what if I also included example images I liked?
Suddenly the AI had a visual target. Not exact copying, but guidance. The reference image could influence:
- density
- lighting mood
- realism
- staging style
- color palette
- suburban vs luxury feel
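One way to picture this is a request that bundles the source photo with reference images and says what each reference should guide. The sketch below is only an assumption about how an app might assemble that bundle: `ReferenceImage`, `build_edit_request`, the file names, and the shape of the returned dictionary are all hypothetical and do not mirror any particular image API's parameters.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ReferenceImage:
    """An example image plus the aspects it should guide (hypothetical structure)."""
    path: Path
    guides: list[str]  # e.g. ["density", "lighting mood"]


def build_edit_request(source: Path, prompt: str, references: list[ReferenceImage]) -> dict:
    """Assemble a hypothetical request: source image first, then references, plus a combined prompt."""
    notes = [
        f"Reference {i}: follow its {', '.join(ref.guides)}; do not copy it exactly."
        for i, ref in enumerate(references, start=1)
    ]
    return {
        "images": [str(source)] + [str(ref.path) for ref in references],
        "prompt": "\n".join([prompt, *notes]),
    }


request = build_edit_request(
    Path("messy_backyard.jpg"),
    "Add sparse, low-maintenance shrubs.",
    [ReferenceImage(Path("liked_garden.jpg"), ["density", "lighting mood"])],
)
print(request["prompt"])
```

Naming what each reference is for keeps the guidance explicit, so the model is nudged toward the style rather than asked to reproduce the example.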
And honestly, this may become one of the most important patterns in AI apps going forward.
Not:
“Here is my magic prompt.”
But:
“Here is the style, taste, realism, and emotional direction I want you to follow.”
Final Thought
I started this project thinking:
“The better the prompt, the better the image.”
Now I think a more accurate statement is:
“The better the accumulated context, the better the direction.”
And that subtle difference completely changed how I think about building AI products.