In today's article, I'll help you build a workflow that can generate ink illustrations with a strong Eastern style. The workflow also supports multiple rounds of prompt adjustments and final-image tweaks, helping you save on token and time costs.
You can find the source code for this project at the end of the article.
Introduction
Recently I wanted to create an agent workflow that could quickly generate images for my blog at low cost.
I wanted my blog images to have strong artistic flair and classical Eastern charm. So I hoped my workflow could precisely control the LLM context and continuously adjust the prompts for drawing as well as the final image effects, while keeping token and time costs to a minimum.
Then I immediately faced a dilemma:
If I chose a low-code platform like dify or n8n, I wouldn't get enough flexibility: these platforms can't adjust prompts mid-conversation or generate blog images that match each article's style.
If I chose a popular agent development framework like LangGraph or CrewAI, the abstractions are too high-level and prevent fine-grained control over how the agent application executes.
If you were faced with this task, how would you choose?
Fortunately, the world isn't black and white. After countless failures and continuous attempts, I finally found a great solution: LlamaIndex Workflow.
It provides an efficient workflow development experience without abstracting away too much of the agent's execution process, so the image generation program runs exactly as I require.
Today, let me use the latest LlamaIndex Workflow 1.0 version to build a workflow for generating ink painting style illustrations for you.
Why should you care?
In today's article, I will:
- Guide you to learn the basic usage of LlamaIndex Workflow through project practice.
- Use Chainlit to build a chatbot interface where you can visually see the generated images.
- Use DeepSeek, instead of relying on DALL-E-3's built-in rewriting, to generate drawing prompts that better fit the project.
- Use multi-turn conversations to make further adjustments to the generated prompts or the final generated images.
- Optimize costs at the token and time level through fine control of the LLM context.
Ultimately, you only need a simple description to draw a beautiful ink painting style image.
More importantly, by practicing this project, you will gain a preliminary understanding of how workflows can be used to build complex, customized enterprise-level agent applications.
If you need some prerequisite knowledge, I wrote an article explaining in detail the event-driven architecture of LlamaIndex Workflow. You can read it by clicking here:
Deep Dive into LlamaIndex Workflow: Event-driven LLM architecture
Business Process Design
For agent workflow applications, I strongly recommend designing the business process before you start coding. This helps you grasp how the entire program will operate.
In this chapter, I will demonstrate my complete design thinking for the business process flow of this project:
Prompt generation process
In today's project, I'm using DALL-E-3 for drawing. DALL-E-3 itself has the ability to rewrite user intentions into detailed prompts suitable for drawing.
But our requirements are higher: we want DALL-E-3 to draw the picture exactly as we imagine. So we will move the step of rewriting user intentions into detailed prompts out of the DALL-E-3 model and into our own workflow node.
Since today's theme is to draw beautiful ink painting style illustrations, I need to select an LLM that can fully understand the Eastern ambiance in user intentions and expand it into a drawing prompt that DALL-E-3 can understand. Here I chose the DeepSeek-Chat model.
Since DeepSeek has been prompted to generate DALL-E-3 drawing prompts, the generated prompts are in pure English.
If you are proficient in English, you can stop here. But if you, like me, are a non-native English speaker and want to accurately understand the generated prompts, you can add a translation node. This step is straightforward.
Finally, the generated prompts and their translations will be returned to the user through LlamaIndex Workflow's StopEvent.
Using context sharing to decouple workflow loops
After generating the prompt, the next step is either to hand the prompt to DALL-E-3 to generate the image, or to go back and let DeepSeek adjust the prompt again.
At this point, you are probably thinking of adding user feedback and workflow iteration: based on the generated image, the user gives modification suggestions and asks the workflow to regenerate the prompt, looping until a satisfactory result is obtained.
Since we need to support user feedback after both prompt generation and image generation nodes, this loop will greatly increase the complexity of the workflow.
But in today's project, I plan to use a small trick to significantly simplify the implementation of the workflow.
Instead of adding a user-feedback step after prompt or image generation to decide whether to regenerate the prompt, we add an if-else node at the very beginning of the workflow that checks whether the user input contains a specific keyword (here APPROVE; you can replace it with your own). If it does, we take the image generation branch; if not, we take the prompt generation branch.
This way, each iteration is simply a fresh run of the workflow, which decouples the workflow from the iteration.
After adding the branch node, we will face a problem: the image generation branch doesn't know what the prompt generated from the previous run was. And I don't want to save all the message history from the last run because the message history also includes translations of the prompt and other information that doesn't need to be sent to the LLM.
At this point, the best choice is to let all runs of the workflow use the same context and save the generated prompt into the context.
Fortunately, LlamaIndex Workflow supports sharing the same context across multiple runs, so we can add logic to save variables into the Workflow Context.
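Here is a minimal sketch of what that sharing looks like, assuming the ImageGeneration workflow class we build later in this article; the query strings are made-up examples:

```python
import asyncio

from llama_index.core.workflow import Context

from workflow import ImageGeneration  # the workflow class we build below


async def demo() -> None:
    # One Context instance is shared across multiple runs of the same workflow,
    # so anything stored in it (e.g. the generated prompt) survives between runs.
    wf = ImageGeneration(timeout=300)
    ctx = Context(wf)

    # First run: generates a drawing prompt and stores it in the shared context.
    await wf.run(query="An old fisherman alone on a snowy river", ctx=ctx)

    # A later run: the APPROVE branch reads the stored prompt back from the same context.
    await wf.run(query="APPROVE", ctx=ctx)


asyncio.run(demo())
```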
Rewrite user's historical drawing requests
Since users will gradually adjust the prompts generated by the LLM through multiple conversations, we need to provide the LLM with the complete conversation history.
The common approach is to store the conversation history between the user and the LLM as user-role and assistant-role messages and provide it to DeepSeek.
But as the adjustments continue, the conversation history grows longer and longer, causing the LLM to overlook the truly important information.
We also can't simply truncate the history and keep only the most recent rounds, because the most detailed drawing intention is usually in the user's initial request.
So here I take the approach of rewriting the user's request history: the user's multiple rounds of adjustments to the prompt and image are rewritten into one complete request.
Concretely: create a list container in the Context, and whenever a prompt is generated, append the user's latest input to this list. Then call DeepSeek to rewrite all of the user's inputs into one complete drawing description and store it in the Context, as illustrated in the sketch below.
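To make the idea concrete, here is a purely illustrative example of what ends up in the Context; all strings are hypothetical:

```python
# Purely illustrative: what the Context might hold after three rounds of user input.
query_hist = [
    "An old fisherman alone on a small boat",
    "Add distant snow-covered mountains",
    "Make the mood more desolate",
]

# After the rewrite step, DeepSeek returns one merged request, for example:
rewritten_hist = (
    "An old fisherman alone on a small boat, with distant snow-covered "
    "mountains in the background, in a desolate mood."
)
```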
Improve system prompt
Finally, we modify the node where DeepSeek generates drawing prompts. Before instructing the LLM to generate prompts, retrieve the rewritten user historical requests and the last generated drawing prompt from the Context, and merge them into the system prompt.
On the very first run of the workflow, the historical requests and the last generated prompt in the Context are empty. That doesn't matter, because the user's latest request is always provided to the LLM as a user-role message.
Thus, the complete business process diagram is designed, as shown in the figure below.
Next, we can start coding according to the design of the business process diagram.
Develop Your Drawing Workflow Using LlamaIndex Workflow
In today's project, in addition to using LlamaIndex Workflow to build the drawing workflow, I will also utilize Chainlit to create an interactive interface, facilitating interaction with the workflow application.
The dialogue interface is shown in the figure below:
Next, let's start writing the actual code logic.
Overall project structure
The code structure of this project is as follows:
```
project_source_code
|-app.py
|-ctx_manager.py
|-events.py
|-prompts.py
|-workflow.py
```
- app.py is the application entry point and holds the Chainlit code.
- workflow.py contains the core LlamaIndex Workflow logic.
- prompts.py stores all the prompts provided to the LLM.
- events.py contains the LlamaIndex Workflow event definitions.
- ctx_manager.py holds all the operations on the Workflow Context.
Environment variables
In this project, since we use two model services, DeepSeek and OpenAI's DALL-E-3, we need to prepare two sets of environment variables in the .env file:
```
OPENAI_API_KEY=<Your DeepSeek API key>
OPENAI_API_BASE=<DeepSeek endpoint>
REAL_OPENAI_API_KEY=<Your OpenAI API key>
REAL_OPENAI_BASE_URL=<OpenAI endpoint>
```
Out of habit, I keep OPENAI_API_KEY and OPENAI_API_BASE pointing at the DeepSeek service, and use REAL_OPENAI_API_KEY and REAL_OPENAI_BASE_URL for OpenAI's service. You can adjust this to suit your own habits.
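Chainlit usually loads a .env file in the project root on startup, but if you run the workflow outside of Chainlit you can load the variables yourself. A minimal sketch, assuming python-dotenv is installed:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # reads the .env file from the current working directory

# The two key sets we rely on later:
deepseek_key = os.getenv("OPENAI_API_KEY")      # points at DeepSeek
openai_key = os.getenv("REAL_OPENAI_API_KEY")   # points at OpenAI for DALL-E-3
```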
Define the events needed by the workflow
LlamaIndex Workflow is based on an event-driven architecture, so the transitions between nodes are defined through events. These event definitions all live in events.py.
Explaining the event definitions before we've written the core Workflow code might feel a bit abstract. Don't worry: since I've already described what each workflow node does in detail, you should be able to see where each event is used.
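All of the events below subclass LlamaIndex's Event base class, so events.py starts with a single import:

```python
from llama_index.core.workflow import Event
```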
GenPromptEvent drives the DeepSeek node to start generating drawing prompts. Its content attribute stores the user's latest input.
```python
class GenPromptEvent(Event):
    content: str
```
PromptGeneratedEvent is emitted once the drawing prompt has been generated; its content holds the prompt produced by DeepSeek. Downstream nodes can subscribe to this event for prompt translation, image generation, and user-request rewriting.
```python
class PromptGeneratedEvent(Event):
    content: str
```
StreamEvent exists because I want to stream the prompt content to the Chainlit interface; the Chainlit code iterates over these events to receive streaming messages.
```python
class StreamEvent(Event):
    target: str
    delta: str
```
GenImageEvent drives the DALL-E-3 node to start generating images. Its content attribute carries the user's input; the actual drawing prompt is read back from the Context in the image generation step.
```python
class GenImageEvent(Event):
    content: str
```
RewriteQueryEvent triggers the LLM node that rewrites the user's historical inputs into one complete drawing intention. Since the history is read from the Context, this event carries no attributes.
```python
class RewriteQueryEvent(Event):
    pass
```
Implement workflow code logic
After defining all the Workflow events, next we can implement the Workflow code according to the previously drawn business process diagram.
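First, here is roughly what the top of workflow.py looks like. The import style for the local modules (ctx_manager, events, prompts) is my assumption based on the project layout:

```python
import os

from openai import AsyncOpenAI
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import (
    Workflow,
    Context,
    StartEvent,
    StopEvent,
    step,
)
from llama_index.llms.openai_like import OpenAILike

# Local modules from this project (assumed names and aliases)
import ctx_manager as ctx_mgr
from events import (
    GenPromptEvent,
    PromptGeneratedEvent,
    StreamEvent,
    GenImageEvent,
    RewriteQueryEvent,
)
from prompts import (
    PROMPT_GENERATE_SYSTEM,
    PROMPT_TRANSLATE_SYSTEM,
    PROMPT_REWRITE_SYSTEM,
)
```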
In workflow.py, we define a class named ImageGeneration as a subclass of LlamaIndex's Workflow.
In the __init__ method, we initialize two LLM clients: one uses LlamaIndex's OpenAILike client with the DeepSeek model to generate drawing prompts, and the other calls the DALL-E-3 model directly through OpenAI's client.
```python
class ImageGeneration(Workflow):
    def __init__(self, *args, **kwargs):
        # DeepSeek through its OpenAI-compatible API
        # (OPENAI_API_KEY / OPENAI_API_BASE point at DeepSeek, as set in .env)
        self.deepseek_client = OpenAILike(
            model="deepseek-chat",
            is_chat_model=True,
            is_function_calling_model=True
        )
        # The real OpenAI client, used only for DALL-E-3
        self.openai_client = AsyncOpenAI(
            api_key=os.getenv("REAL_OPENAI_API_KEY"),
            base_url=os.getenv("REAL_OPENAI_BASE_URL"),
        )
        super().__init__(*args, **kwargs)
```
The on_start method is the workflow's entry point and only performs the branch check: if the user input contains "APPROVE", it emits a GenImageEvent to start drawing; otherwise it emits a GenPromptEvent to generate or adjust the drawing prompt.
```python
class ImageGeneration(Workflow):
    ...
    @step
    async def on_start(self, ctx: Context, ev: StartEvent) -> GenImageEvent | GenPromptEvent:
        query = ev.query
        if len(query) > 0 and ("APPROVE" in query.upper()):
            return GenImageEvent(content=query)
        else:
            return GenPromptEvent(content=ev.query)
```
The prompt_generator method subscribes to GenPromptEvent and is used to generate or adjust drawing prompts.
```python
class ImageGeneration(Workflow):
    ...
    @step
    async def prompt_generator(self, ctx: Context, ev: GenPromptEvent) \
            -> PromptGeneratedEvent | RewriteQueryEvent | None:
        user_query = ev.content
        hist_query = await ctx_mgr.get_rewritten_hist(ctx)
        hist_prompt = await ctx_mgr.get_image_prompt(ctx)
        system_prompt = PROMPT_GENERATE_SYSTEM.format(
            hist_query=hist_query,
            hist_prompt=hist_prompt
        )
        messages = [
            ChatMessage(role="system", content=system_prompt),
            ChatMessage(role="user", content=user_query)
        ]
        image_prompt = ""
        events = await self.deepseek_client.astream_chat(messages)
        async for event in events:
            ctx.write_event_to_stream(StreamEvent(target="prompt", delta=event.delta))
            image_prompt += event.delta
        await ctx_mgr.add_query_hist(ctx, user_query)
        await ctx_mgr.set_image_prompt(ctx, image_prompt)
        ctx.send_event(PromptGeneratedEvent(content=image_prompt))
        ctx.send_event(RewriteQueryEvent())
```
The prompt_generator method first fetches the rewritten user intention history and the prompt generated in the previous workflow run from the Context, and merges them into the predefined system prompt template.
After integration, the system prompt will be submitted to the DeepSeek model together with the user's latest input to generate the latest drawing prompt.
I call the LLM client's streaming API, so I use StreamEvent to push the chunks returned by the model into the Context's event stream. At the same time, I assemble all the chunks into a complete prompt and pass it downstream.
Of course, after the prompt is generated, I will write back the user's latest input and the generated prompt into Context.
The translate_prompt method is optional. Its function is to translate the prompts generated by DeepSeek so that you can accurately understand the content of the prompt. The translated results will only be displayed on the interface, so they won't be written into Context.
```python
class ImageGeneration(Workflow):
    ...
    @step
    async def translate_prompt(self, ctx: Context, ev: PromptGeneratedEvent) -> StopEvent:
        image_prompt = ev.content
        messages = [
            ChatMessage(role="system", content=PROMPT_TRANSLATE_SYSTEM),
            ChatMessage(role="user", content=image_prompt)
        ]
        events = await self.deepseek_client.astream_chat(messages)
        translate_result = ""
        async for event in events:
            ctx.write_event_to_stream(StreamEvent(target="translate", delta=event.delta))
            translate_result += event.delta
        return StopEvent(target="prompt", result=translate_result)
```
The translate_prompt method returns a StopEvent, marking the end of this workflow run. If you don't need to translate the drawing prompt, you can return a StopEvent directly from the prompt_generator method to end the workflow.
If you include the keyword "APPROVE" in your input, the workflow will directly enter the generate_image method. This method will get the latest drawing prompt from Context and call the _image_generate method to start generating images.
```python
class ImageGeneration(Workflow):
    ...
    @step
    async def generate_image(self, ctx: Context, ev: GenImageEvent) -> StopEvent:
        prompt = await ctx_mgr.get_image_prompt(ctx)
        image_url, revised_prompt = await self._image_generate(prompt=prompt)
        return StopEvent(target="image",
                         result={
                             "image_url": image_url,
                             "revised_prompt": revised_prompt
                         })
```
We have already used DeepSeek to generate good drawing prompts in advance, but DALL-E-3 will still rewrite the input prompt out of safety considerations. This may cause the drawn image to deviate too far from our intention. So we can add the following content in front of the drawing prompt to prevent DALL-E-3 from rewriting the prompt:
```python
class ImageGeneration(Workflow):
    ...
    async def _image_generate(self, prompt: str) -> tuple[str, str]:
        # Stop DALL-E-3 from rewriting incoming prompts
        final_prompt = f"""
I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:
{prompt}
"""
        response = await self.openai_client.images.generate(
            model="dall-e-3",
            prompt=final_prompt,
            n=1,
            size="1792x1024",
            quality="hd",
            style="vivid",
        )
        return response.data[0].url, response.data[0].revised_prompt
```
Finally, we get the generated image's URL from the DALL-E-3 model. We can also obtain the revised_prompt that DALL-E-3 actually used to draw the image, which lets us compare it with our own prompt and see the difference.
We also need to implement a rewrite_query method. This method will take the list of user historical inputs from Context, then submit it to DeepSeek to rewrite into a complete drawing intention for use when adjusting the drawing prompt or the generated image next time.
```python
class ImageGeneration(Workflow):
    ...
    @step
    async def rewrite_query(self, ctx: Context, ev: RewriteQueryEvent) -> None:
        query_hist_str = await ctx_mgr.get_query_hist(ctx)
        messages = [
            ChatMessage(role="system", content=PROMPT_REWRITE_SYSTEM),
            ChatMessage(role="user", content=query_hist_str)
        ]
        response = await self.deepseek_client.achat(messages)
        rewritten_prompt = response.message.content
        await ctx_mgr.set_rewritten_hist(ctx, rewritten_prompt)
```
Prompts provided to LLMs
The prompts.py file contains several system prompts provided to large models:
PROMPT_GENERATE_SYSTEM is used to prompt the DeepSeek model to generate ink style drawing prompts. This prompt has {hist_query} and {hist_prompt} placeholders to include user historical requests and the last generated drawing prompt:
```python
PROMPT_GENERATE_SYSTEM = """
## Role
You're a visual art designer who's great at writing prompts perfect for DALL-E-3 image generation.
## Task
Based on the [image content] I give you, and considering [previous requests], rewrite the prompt to be ideal for DALL-E-3 drawing.
## Length
List 4 detailed sentences describing the prompt only - no intros or explanations.
## Context Handling
If the message includes [previous prompts], modify them based on the new info.
## Art Style
The artwork should be ink-wash style illustrations on slightly yellowed rice paper.
## Previous Requests
{hist_query}
## Previous Prompts
{hist_prompt}
"""
```
PROMPT_TRANSLATE_SYSTEM is used to translate the prompts generated in the previous step. The content is relatively simple:
```python
PROMPT_TRANSLATE_SYSTEM = """
## Role
You're a professional translator in the AI field, great at turning English prompts into accurate Chinese.
## Task
Translate the [original prompt] I give you into Chinese.
## Requirements
Only provide the Chinese translation, no intros or explanations.
-----------
Original prompt:
"""
```
PROMPT_REWRITE_SYSTEM is used to rewrite user historical requests.
```python
PROMPT_REWRITE_SYSTEM = """
You're a conversation history rewrite assistant.
I'll give you a list of requests describing a scene, and you'll rewrite them into one complete sentence.
Keep the same description of the scene, and don't add anything not in the original list.
"""
```
Context operation module
Since LlamaIndex Workflow 1.0 adjusted the Context API, we now access Context.store to read and save state data. I therefore wrote a dedicated Context operations module, ctx_manager.py.
The set_image_prompt and get_image_prompt methods are used to store and retrieve the prompts generated by DeepSeek for drawing.
```python
from llama_index.core.workflow import Context


async def set_image_prompt(ctx: Context, image_prompt: str) -> None:
    await ctx.store.set("image_prompt", image_prompt)


async def get_image_prompt(ctx: Context) -> str:
    image_prompt = await ctx.store.get("image_prompt", "")
    return image_prompt
```
The add_query_hist method will add all user historical requests into a list container in Context. The get_query_hist method will take out the historical requests from the container and concatenate them into a string.
```python
async def add_query_hist(ctx: Context, user_query: str) -> None:
    query_hist = await ctx.store.get("query_hist", [])
    query_hist.append(user_query)
    await ctx.store.set("query_hist", query_hist)


async def get_query_hist(ctx: Context) -> str:
    query_hist = await ctx.store.get("query_hist", [])
    query_hist_str = "; ".join(query_hist)
    return query_hist_str
```
The set_rewritten_hist and get_rewritten_hist methods are used to store and retrieve the rewritten user drawing intentions.
```python
async def set_rewritten_hist(ctx: Context, rewritten_hist: str) -> None:
    await ctx.store.set("rewritten_hist", rewritten_hist)


async def get_rewritten_hist(ctx: Context) -> str:
    rewritten_prompt = await ctx.store.get("rewritten_hist", "")
    return rewritten_prompt
```
Use Chainlit to build the user dialogue interface
Chainlit organizes code around lifecycle hooks. The on_chat_start function decorated with @cl.on_chat_start is called when the user starts a conversation, and the main function decorated with @cl.on_message responds to each message the user sends.
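As with workflow.py, app.py needs a few imports first. A sketch; the local module names are assumptions based on the project layout:

```python
from textwrap import dedent

import chainlit as cl
from llama_index.core.workflow import Context, StopEvent

from events import StreamEvent
from workflow import ImageGeneration
```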
Since we need to share Context between the user's multiple rounds of dialogues, we need to initialize Workflow and Context in the on_chat_start method and store them in user_session. This way, they can be reused in the main method.
```python
@cl.on_chat_start
async def on_chat_start():
    workflow = ImageGeneration(timeout=300)
    context = Context(workflow)
    cl.user_session.set("context", context)
    cl.user_session.set("workflow", workflow)
```
In the main method, besides fetching the workflow and context, we also initialize a cl.Message instance. Its content keeps updating as messages arrive from the workflow, and if the workflow returns an image generation result, the image is displayed through this same Message instance.
```python
@cl.on_message
async def main(message: cl.Message):
    workflow = cl.user_session.get("workflow")
    context = cl.user_session.get("context")
    msg = cl.Message(content="Generating...")
    await msg.send()
    ...
```
We need to display streaming messages returned from both prompt_generator and translate_prompt nodes in the same Message instance. Therefore, we can use prompt_result and translate_result to concatenate the current round of messages and update them together with a template.
```python
@cl.on_message
async def main(message: cl.Message):
    ...
    prompt_result = ""
    translate_result = ""
    handler = workflow.run(query=message.content, ctx=context)
    async for event in handler.stream_events():
        if isinstance(event, StreamEvent):
            # await msg.stream_token(event.delta)
            match event.target:
                case "prompt":
                    prompt_result += event.delta
                case "translate":
                    translate_result += event.delta
            msg.content = dedent(f"""
### Prompt\n
{prompt_result}
### Translate
{translate_result}
APPROVE?
""")
            await msg.update()
        ...
    await handler
```
If the message returned by the workflow is an image drawing message, we can use a cl.Image to display the image content.
```python
@cl.on_message
async def main(message: cl.Message):
    ...
    async for event in handler.stream_events():
        ...
        # Still inside the stream_events loop: handle the final image result
        if isinstance(event, StopEvent) and event.target == "image":
            image = cl.Image(url=event.result["image_url"], name="image1", display="inline")
            msg.content = f"Revised prompt: \n{event.result['revised_prompt']}"
            msg.elements = [image]
            await msg.update()
```
Meanwhile, we can also show the actual revised_prompt used by DALL-E-3 to generate the image in the Message, making it convenient for us to compare the accuracy of the image.
Check how the workflow runs
At this point, all of the project code is in place.
We can start the Chainlit app through the command line to begin interacting with the workflow:
```bash
chainlit run app.py
```
Enter a drawing intention to see the prompt returned by the workflow:
If not satisfied, you can supplement details to make adjustments:
You can see that because we adjust the drawing prompt before image generation starts, we save a lot of expensive image generation tokens.
Of course, this workflow still supports you to continue adjusting the prompt after the image is generated:
Conclusion
The debate over whether to build agent workflows with low-code platforms like dify and n8n or to develop agent applications with coding frameworks has been around for a long time.
Fortunately, the emergence of LlamaIndex Workflow gives us a third option. Its low-level abstractions and simple API make it convenient to develop a production-ready agent workflow, while still allowing fine-grained customization for enterprise-level application needs.
In today's article, we demonstrated this capability by creating a customized workflow for generating ink painting style images.
At the same time, this article walks you through a complete hands-on project, and through it I've shared several tips for developing agent workflows.
These tips condense the experience and insights we've gained from enterprise-level workflow development. I hope they help with your own multi-agent application development.
Thank you for reading. I am collecting agent development ideas. If you need help with agent application development, you can leave me a message.
Enjoyed this read? Subscribe now to get more cutting-edge data science tips straight to your inbox! Your feedback and questions are welcome — let’s discuss in the comments below!
This article was originally published on Data Leads Future.