derailed-dash for Google Developer Experts

Originally published at Medium

Introducing Google ADK Artifacts for Multi-Modal File Handling (a Rickbot Blog)

Hello and welcome to another installment of Rickbot! You don’t need to have read the rest of the series to benefit from this latest article.

Today I’m going to show you how to make use of Google Agent Development Kit (ADK) Artifacts — a very specific way of handling files and binary data. The ADK documentation describes an artifact as:

“a piece of binary data (like the content of a file) identified by a unique filename string within a specific scope (session or user).”

Artifacts are intended for use cases such as:

  • Non-textual data, like images and videos
  • Working with reports
  • Storing intermediate data (including between agents in multi-agent applications)
  • Any large data
  • User file management, e.g. uploading and downloading
  • Caching binary content

I’ll show you how to go from traditional file uploading and “context dumping”, to the vastly more efficient use of artifacts. And more importantly: I’ll explain WHY you should do this.

The Rickbot Series — Where We Are

Just for orientation, here’s where we are in the series:

  1. Creating a Rick & Morty Chatbot with Google Cloud and the Gen AI SDK
  2. Adding Authentication and Authorisation to our Rickbot Streamlit Chatbot with OAuth and the Google Auth Platform
  3. Building the Rickbot Multi-Personality Agentic Application using Gemini CLI, Google Agent-Starter-Pack and the Agent Development Kit (ADK)
  4. Updating the Rickbot Multi-Personality Agentic Application — Integrate Agent Development Kit (ADK) using Gemini CLI
  5. Guided Implementation of Agent Development Kit (ADK) with the Rickbot Multi-Personality Application (Series)
  6. Productionising the Rickbot ADK Application and More Gemini CLI Tips
  7. Get Schwifty with the FastAPI: Adding a FastAPI to our Agentic Application
  8. Introducing ADK Artifacts for Multi-Modal File Handling (a Rickbot Blog)  — YOU ARE HERE
  9. Using Gemini File Search Tool for RAG (a Rickbot Blog)

Motivation

Let’s talk about growing up.

When I first dreamt up the Rickbot project — a “multi-persona” chatbot where I can argue with Rick Sanchez about quantum mechanics or ask Yoda for mindfulness tips — I wanted to build and iterate quickly. The project was all about experimenting with the Google agentic ecosystem. I needed a UI that would let me experiment and immediately see my changes visually. And, because I’m okay with Python, I naturally reached for Streamlit.

But that was the start. As this series has progressed, we’ve turned Rickbot into a much more robust and well-architected application. We now have:

  • The Frontend: A slick, reactive Next.js app written in TypeScript. We chose Next.js because we wanted portals, rich animations, and a UI that doesn’t refresh the entire page every time you press a button.
  • The Backend: A scalable FastAPI service written in Python. We chose this for its speed, its async capabilities, and its ability to define a rigorous API contract.
  • Framework: The Google Agent Development Kit (ADK). ADK is the skeleton that holds everything together, and allows us to leverage Google Gemini as the AI brain.

This upgrade brought a challenge. I wanted feature parity with the original Streamlit interface. And one feature I needed to re-introduce into the new UI is the ability to handle Multimodal File Uploads. Because: Rick needs to see your memes; he needs to debug your Python scripts; he needs to analyze your 50-page PDF reports. And this is where we faced a critical architectural decision about state.

Option 1: The Context Dump

Let’s peek under the hood of my original streamlit_fe/chat.py and see how I handled user file uploads in the Streamlit UI:

async def get_agent_response(runner: Runner, prompt: str, uploaded_files: list[Any], rate_limiter: RateLimiter) -> None:
    """
    Handles user input and generates the bot's response using the Rickbot ADK agent.
    """

    # First we do some checks (omitted from this snippet)

    # Create the user message object, including any attachments
    # And attach to the Streamlit session
    user_message: dict[str, Any] = {"role": "user", "content": prompt}
    if uploaded_files:
        user_message["attachments"] = [
            {"data": uploaded_file.getvalue(), "mime_type": uploaded_file.type or ""}
            for uploaded_file in uploaded_files
        ]
    st.session_state.messages.append(user_message)

    # Display the user message and attachment in the chat
    # I.e. we want to see the uploaded file, inline (if we can)
    with st.chat_message("user", avatar=USER_AVATAR):
        if uploaded_files:
            for uploaded_file in uploaded_files:
                mime_type = uploaded_file.type or ""
                if "image" in mime_type:
                    st.image(uploaded_file.getvalue())
                elif "video" in mime_type:
                    st.video(uploaded_file.getvalue())
        st.markdown(prompt)

    # Assemble the prompt that we'll send to the model...
    message_parts = [Part(text=prompt)]
    if uploaded_files:
        for uploaded_file in uploaded_files:
            # Grab raw bytes from memory and attach directly to the message
            message_parts.append(Part(inline_data=Blob(data=uploaded_file.getvalue(), mime_type=uploaded_file.type)))

    new_message = Content(role="user", parts=message_parts)

    # ... Send to model using runner.run_async()

This approach is what Google refers to — in this whitepaper — as Context Dumping. It sounds harmless, right? You have a file, you send the file. Simple. But in a production agent, “Context Dumping” is, well...

Context dumping is a trap

The Token Tax

Imagine you upload a PDF to Rickbot whose extracted text weighs in at around 2MB. In the “Context Dump” model, that 2MB of text lives in your chat history list.

  • Turn 1: You send the PDF. The model reads 100k tokens.
  • Turn 2: You ask, “What is the conclusion?” The agent sends the PDF again (100k tokens) + your question.
  • Turn 3: You ask, “And what about the budget?” The agent sends the PDF again (100k tokens) + history + question.

You are paying a Token Tax on every single interaction. This creates a number of problems:

  1. You are burning tokens, and therefore money.
  2. As the model’s context fills, it gets slower and slower at responding. So you increase your agent’s latency.
  3. You are flooding the brain’s attention mechanism with static noise. This leads to so-called “signal rot”, or the “lost-in-the-middle” problem.
  4. You eventually exhaust the size of the context window. Yes, even with Gemini’s 1 million token window, you still need to be careful!
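To put rough numbers on the Token Tax, here is a back-of-the-envelope sketch. The token counts are illustrative assumptions of mine, not measurements, but the shape of the arithmetic is the point: the dump cost is dominated by re-sending the file every turn, while the handle cost stays tiny.

```python
# Illustrative token counts (assumptions, not measurements)
PDF_TOKENS = 100_000      # the uploaded PDF, once extracted
QUESTION_TOKENS = 50      # a typical user question
HANDLE_TOKENS = 10        # e.g. "[User uploaded: user:budget.pdf]"


def context_dump_cost(turns: int) -> int:
    """Cumulative input tokens when the PDF plus the growing history
    is re-sent on every single turn."""
    total = 0
    history = 0
    for _ in range(turns):
        total += PDF_TOKENS + history + QUESTION_TOKENS
        history += QUESTION_TOKENS
    return total


def artifact_handle_cost(turns: int) -> int:
    """Cumulative input tokens when only the lightweight handle rides
    along; the file itself is loaded on demand by a tool."""
    total = 0
    history = 0
    for _ in range(turns):
        total += HANDLE_TOKENS + history + QUESTION_TOKENS
        history += QUESTION_TOKENS
    return total
```

With these assumed numbers, a 10-turn conversation costs over a million input tokens with context dumping, versus a few thousand with handles — a difference of more than two orders of magnitude.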

The “Ghost” File Problem

Furthermore, where is that file? In Streamlit, it lives in the session memory. If you refresh the browser?

“Annnnd it’s gone” — From South Park

It’s gone. If the model wants to refer to it later? It can’t. It’s just an unnamed blob of bytes in the history. If you want to download it later? You can’t. It has no URL.

We need a better way.

Option 2: The “Managed Artifact”

Okay, so we could have just replicated the context dump in FastAPI — taking the bytes from Next.js and shoving them into the LLM.

But here’s the better way: the ADK Artifact Pattern.

In the ADK world, huge blobs of data don’t belong in the conversation history. They belong in a dedicated store. The flow looks like this:

  1. Frontend uploads file(s) via FormData.
  2. Backend intercepts the file.
  3. Backend saves the file to the ADK Artifact Service.
  4. Backend generates a lightweight Handle (e.g. user:myfile.pdf).
  5. Agent sees the Handle.

If the Agent needs to read the file, it can load it. But it doesn’t have to carry it around like a backpack full of bricks.
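The five steps above can be sketched without any framework at all. Here `ArtifactStore` is a toy stand-in for the ADK Artifact Service (the class and method names are my own, purely to illustrate the pattern):

```python
class ArtifactStore:
    """Toy stand-in for the ADK Artifact Service: bytes keyed by a filename handle."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def save(self, filename: str, data: bytes) -> str:
        self._store[filename] = data
        return filename  # the lightweight handle

    def load(self, filename: str) -> bytes:
        return self._store[filename]


store = ArtifactStore()

# Steps 2-4: the backend intercepts the upload and saves it to the store...
handle = store.save("user:budget.pdf", b"%PDF-1.7 ...imagine 2MB of bytes here...")

# Step 5: ...and only the tiny handle enters the conversation history.
history = [f"[User uploaded: {handle}]", "What is the conclusion?"]

# The bytes are fetched on demand, only when the agent actually needs them.
pdf_bytes = store.load(handle)
```

The history carries a few dozen characters instead of megabytes, which is the whole trick.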

Context dumping is like carrying a backpack full of bricks (made with Nano Banana and Veo)

The Architecture of Statelessness

Before I show you the code, we need to understand the architecture.

In Streamlit, the UI is the backend. They share memory. In our new world, the UI (Next.js) and the API (FastAPI) are completely separate. They might as well be on different continents. They speak only via HTTP.

This means every interaction is Stateless.

  • The UI doesn’t know about the Agent’s memory.
  • The API doesn’t know about the UI’s DOM state.
  • The Agent doesn’t know about the HTTP request.

So where does the State live? It lives in the ADK Services.

  • Session State: Lives in the SessionService.
  • File State: Lives in the ArtifactService.

When a request comes in, we re-hydrate the state, run the logic, and then persist the state back to the service. This is the secret sauce that allows us to scale.

The Implementation

The Frontend (Next.js + Typescript)

We built the frontend using Next.js and React. This gave us the power of the extensive React ecosystem. Specifically, for file uploads, we didn’t use any heavy libraries. We went back to the reliable workhorse of the web: multipart/form-data.

In src/nextjs_fe/components/Chat.tsx (our TypeScript component), we manage the user’s selected files in local React state. When they hit send, we package it up:

const handleSendMessage = async () => {
    // Some preamble I'm skipping here

    // 1. Create a FormData object
    // This is a native Browser API.
    const formData = new FormData();
    formData.append('prompt', newMessage.text);
    formData.append('personality', selectedPersonality.name);
    if (sessionId) {
        formData.append('session_id', sessionId);
    }

    // 2. Append the native File objects
    // React state holds the array of native JS File objects
    if (newMessage.attachments) {
        newMessage.attachments.forEach(f => {
            formData.append('files', f); 
        });
    }

    // 3. Send it to FastAPI using streaming endpoint
    // Note: We do NOT set Content-Type header manually
    // The browser automatically sets it to multipart/form-data with the correct boundary.
    const response = await fetch(`${API_URL}/chat_stream`, {
        method: 'POST',
        headers: { Authorization: `Bearer ${token}` },
        body: formData, 
    });

    // Handle the response - omitted here
}

The Backend (Artifact Handoff)

On the server side (src/main.py), FastAPI makes receiving these files trivial using UploadFile. But the real magic is how we hand them off to the ADK.

We don’t just dump these files into a temporary directory. We use the ADK Artifact Service. In src/rickbot_agent/services.py we initialize our service:

from functools import cache

from google.adk.artifacts import InMemoryArtifactService

@cache
def get_artifact_service():
    # This creates a singleton service
    # Currently In-Memory, but swappable with a persistent implementation
    return InMemoryArtifactService()

Then, in our API logic (in src/main.py) we process the upload:

async def _process_files(files: list[UploadFile],
                        user_id: str,
                        session_id: str,
                        artifact_service) -> list[Part]:
    """Helper function to process uploaded files."""

    parts = []
    if files:
        for f in files:
            file_content = await f.read()
            # Assign a handle to this file using the 'user' scope prefix
            artifact_filename = f"user:{f.filename}"

            # Create the part
            artifact_part = Part.from_bytes(
                data=file_content, 
                mime_type=f.content_type)

            # Ask the Librarian (Service) to store it
            # We use 'f.content_type' as the source of truth for the MIME type
            await artifact_service.save_artifact(
                app_name="rickbot_api",
                user_id=user_id,
                session_id=session_id, 
                filename=artifact_filename,
                artifact=artifact_part,
            )
            parts.append(artifact_part)

    return parts

Note that by saving the artifact with the “user” prefix, the artifact is associated with this particular user. It can be accessed or updated from any session belonging to that user.

But… Why?

You might look at the code above and think: “Dazbo, that is 3 files and 50 lines of code. The Streamlit version was about 10 lines. Why is this better?”

Because of these advantages:

1. No Token Tax

This is a concept from Context Engineering. Because the artifact is stored in a Service, we don’t have to show it to the model every time. We can pass a Reference. The Agent sees: [User uploaded: user:budget.pdf]. If — and only if — the user asks a question about the budget, the Agent uses a tool to load that artifact into its working context. Once the query is answered, the artifact is flushed from the context. We turn a permanent 100k token tax into a temporary, on-demand cost. This is how you make an agent affordable at scale.

2. Addressability — Solving the Ghost Problem

Because the ArtifactService owns the file, we can expose it. We added a simple endpoint to our API:

@app.get("/artifacts/{filename}")
async def get_artifact(filename: str, user: AuthUser = Depends(verify_token)):
    """Retrieves a saved artifact for the user."""

    user_id = user.email
    artifact_filename = f"user:{filename}"

    # The API delegates to the Service
    # Note: We enforce Auth here! So you can't just guess the URL.
    artifact = await artifact_service.load_artifact(
        app_name=APP_NAME,
        user_id=user_id,
        session_id="none", # Ignored for user: scope
        filename=artifact_filename,
    )
    if not artifact or not artifact.inline_data:
        raise HTTPException(status_code=404, detail="Artifact not found")

    return Response(content=artifact.inline_data.data, media_type=artifact.inline_data.mime_type)

Now, the file is a first-class citizen of the web.

  • The Agent can generate HTML that links to the image.
  • The user can download their history.
  • The UI can render the image directly from the server, rather than relying on local blob URLs.
  • Other tools can fetch the file via URL.

3. Persistence Abstraction — Future Proofing

This is a killer feature. Right now, InMemoryArtifactService stores files in memory. If I restart the backend server, Rick forgets your memes. This is fine for Dev, but bad for Prod. But let's say tomorrow I want to deploy this to production on Google Cloud Run. I can't rely on local memory if I want artifacts to persist beyond a single session.

A much better option is to persist those artifacts somewhere durable: enter Google Cloud Storage (GCS). If I wanted to implement GCS-backed storage in the Streamlit implementation (without using ADK Artifacts) I would have to rewrite my entire file handling logic. I’d need google-cloud-storage libraries, I'd need authentication handling, and so on.

But with ADK Artifacts I only need to change one line of code in src/rickbot_agent/services.py to use a different Artifact Service implementation:

from google.adk.artifacts import GcsArtifactService

@cache
def get_artifact_service():
    # ONE LINE CHANGE!
    # The rest of my application (API, Agent, Frontend) has NO IDEA this changed.
    return GcsArtifactService(bucket_name="rickbot-production-uploads")

Now each version of any created artifact is stored as a GCS object which persists across sessions and application restarts.

We can go a step further, and select the appropriate artifact service based on environment configuration. For example:

from functools import cache

from google.adk.artifacts import GcsArtifactService, InMemoryArtifactService

@cache
def get_artifact_service():
    """Initialise and return the artifact service. Use GcsArtifactService if bucket is set."""

    if config.artifact_bucket:
        return GcsArtifactService(config.artifact_bucket)

    return InMemoryArtifactService()

This is Portability.

Important note: how long your artifacts are retained depends entirely on your GCS bucket configuration. By default, GCS buckets have no lifecycle rules, so objects are kept indefinitely. Always apply a lifecycle policy when you create your buckets!
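For example, a lifecycle policy that deletes objects older than 30 days can be expressed as a JSON config (30 days is an arbitrary choice for illustration; apply it with the gcloud/gsutil lifecycle commands or via the Cloud Console):

```json
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}
```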

Let’s Try It

Okay, enough code snippets! Let’s see it working in the new Rickbot UI. Here I attach a couple of photos to the chat, and ask the Dazbot personality what he thinks of them.

Uploading images as ADK Artifacts in Rickbot

Looking good!

Conclusions

For a weekend hackathon, st.file_uploader is still King.

But “context dumping” is technical debt. It works for a 2-turn demo, but it fails for a 100-turn session. It fails when you need to work with big files. It fails when you need to audit your data.

By switching to ADK Artifacts, we acknowledged that files belong in some sort of persistence store, not clanking around in your prompt window like loose change in a washing machine.

With ADK and ADK Artifacts, we built a system that is:

  1. Stateless: The frontend and backend are decoupled.
  2. Addressable: Every file has a handle.
  3. Efficient: We can load/unload data on demand.
  4. Portable: We can switch storage backends in one line.

This is the way!

This is the Way

If You Wish To Support Me As A Creator

  • Please share this with anyone that you think will be interested. It might help them, and it really helps me!
  • Please give me many claps! (Just hold down the clap button.)
  • Feel free to leave a comment 💬.
  • Follow and subscribe on Medium.

Useful Links and References

ADK

Context Engineering

Rickbot-ADK

Other

