How I Added Multi-Turn Image Generation Support to LlamaIndex

#ai #opensource #automation #startup

The agent could generate an image once, but when you asked it to modify that image or create variations, it had no idea which image you were talking about. The conversation kept no record of the previous image.

That broke a lot of interesting multi-turn creative workflows.

Context

While contributing to LlamaIndexTS (the TypeScript version of LlamaIndex), I noticed that image generation tools only worked for single-turn interactions. There was no clean way to reference a previously generated image in follow-up messages. This was especially painful when building agents that iterate on visuals, such as creating logos, editing images, or generating multiple versions.

The Investigation

I started by reproducing the issue. The tool was calling OpenAI’s image generation API correctly the first time, but the response didn’t preserve any identifier for the generated image. Later messages had no context about which image to modify.

After digging through the tool calling flow, response parsing logic, and how messages were being stored, I found that the image_id returned by OpenAI wasn’t being extracted or passed forward in the conversation history.
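To make the gap concrete, here is a minimal sketch of the kind of extraction step that was missing. It is not the code from the PR. The `OutputItem` and `ChatMessage` shapes, the `image_generation_call` item type, and the helper names are assumptions made for illustration, and the real LlamaIndexTS and OpenAI types may look different.

```typescript
// Hypothetical shapes for illustration only; these are not the real
// LlamaIndexTS or OpenAI SDK types.
type OutputItem =
  | { type: "message"; text: string }
  | { type: "image_generation_call"; id: string; result: string };

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  // Metadata that travels with the message but is not part of its text.
  options?: { image_id?: string };
}

// Return the id of the most recent image in a response, if there is one.
function extractImageId(output: OutputItem[]): string | undefined {
  let imageId: string | undefined;
  for (const item of output) {
    if (item.type === "image_generation_call") {
      imageId = item.id; // later items overwrite earlier ones
    }
  }
  return imageId;
}

// Build the assistant message that gets stored in the conversation history.
function toAssistantMessage(output: OutputItem[]): ChatMessage {
  const texts: string[] = [];
  for (const item of output) {
    if (item.type === "message") texts.push(item.text);
  }
  const message: ChatMessage = { role: "assistant", content: texts.join("\n") };
  const imageId = extractImageId(output);
  if (imageId !== undefined) {
    message.options = { image_id: imageId };
  }
  return message;
}
```

If only the text were copied into `content`, the id would be lost at this point and no later turn could recover it.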

The Solution

I added support for image_id across the board:

Added an image_id parameter to image generation tools

Enhanced response parsing to properly extract and store the image_id from OpenAI responses

Updated message options and tool configurations so subsequent requests can reference previous images

Created a working example showing the full multi-turn image generation workflow
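The follow-up side of those changes can be sketched roughly like this. Again, this is an illustration under assumptions, not the merged implementation: `ChatMessage`, `ImageToolRequest`, `latestImageId`, and `buildImageRequest` are hypothetical names, and the sketch assumes the image id is stored under `options.image_id` on assistant messages.

```typescript
// Hypothetical types; the actual LlamaIndexTS message and tool shapes may differ.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  options?: { image_id?: string };
}

interface ImageToolRequest {
  prompt: string;
  image_id?: string; // set when the request should build on an earlier image
}

// Walk the conversation backwards and return the newest stored image id.
function latestImageId(messages: ChatMessage[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const id = messages[i]?.options?.image_id;
    if (id !== undefined) return id;
  }
  return undefined;
}

// Attach the previous image id, when one exists, to the next tool request.
function buildImageRequest(messages: ChatMessage[], prompt: string): ImageToolRequest {
  const request: ImageToolRequest = { prompt };
  const previous = latestImageId(messages);
  if (previous !== undefined) request.image_id = previous;
  return request;
}

// Usage: the second user turn says "it", so the request carries the earlier id.
const conversation: ChatMessage[] = [
  { role: "user", content: "Draw a fox logo." },
  { role: "assistant", content: "Done.", options: { image_id: "ig_1" } },
  { role: "user", content: "Make it blue." },
];
const followUp = buildImageRequest(conversation, "Make it blue.");
// followUp is { prompt: "Make it blue.", image_id: "ig_1" }
```

Searching from the end of the conversation means a third or fourth edit always builds on the latest version rather than the original image.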

The PR got merged smoothly: feat: multi-turn image generation support #2106

Lessons Learned

Streaming combined with tool calling gets tricky when the API returns important metadata (like image IDs) outside the main content. Always check which fields the model actually returns, not just the ones you expect. Small details in response parsing can unlock much bigger capabilities.
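As a small illustration of that first point, here is a sketch of a stream collector that keeps metadata-only events instead of discarding them. The event names (`text_delta`, `image_done`, `end`) and the `collectStream` helper are invented for this example; a real provider's streaming events will be named and shaped differently.

```typescript
// Hypothetical stream events; real provider event names and fields will differ.
type StreamEvent =
  | { kind: "text_delta"; delta: string }
  | { kind: "image_done"; image_id: string }
  | { kind: "end" };

interface StreamResult {
  text: string;
  image_id?: string;
}

// Fold a sequence of stream events into one result without dropping
// events that carry no text.
function collectStream(events: StreamEvent[]): StreamResult {
  const result: StreamResult = { text: "" };
  for (const event of events) {
    switch (event.kind) {
      case "text_delta":
        result.text += event.delta;
        break;
      case "image_done":
        // Metadata-only event: easy to miss if only text deltas are handled.
        result.image_id = event.image_id;
        break;
      case "end":
        return result;
    }
  }
  return result;
}
```

A collector that handled only `text_delta` would still produce the right text, which is exactly why a dropped id can go unnoticed until the second turn.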

What about me?

I love diving deep into agent frameworks and fixing core interaction loops. If you're building AI agents (especially ones involving images, tools, or complex workflows) and need help shipping fast, feel free to reach out.
