DEV Community

Mingming Ma

Image Compression: Part 2

In this blog post, I continue sharing some discoveries about image compression. I was curious about how Google Gemini handles uploaded images, so I compared ChatCraft and Google Gemini using a 20 MB image file. After selecting the image and before sending the prompt, I checked the Network tab in the developer tools, as shown in the following images:

[Image 1]

[Image 2]

The Network tab shows 3.2 MB for ChatCraft and 21.3 MB for Google Gemini. ChatCraft shows 3.2 MB because it compresses the image locally, in the browser, before uploading. Gemini's 21.3 MB suggests that it does not compress locally; I believe it compresses on the server side instead. After refreshing the page, I observed that the image is downloaded through a GET request, and at that point it has been compressed to around 77 KB.
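To put those numbers side by side, here is a small back-of-the-envelope sketch. It treats Gemini's 21.3 MB upload as the uncompressed baseline and uses decimal units (1 MB = 1000 KB); both are assumptions, and the helper name `reduction_percent` is made up for illustration.

```python
# Sizes as reported above. Decimal units are assumed: 1 MB = 1000 KB.
ORIGINAL_MB = 21.3          # upload size seen for Google Gemini (baseline)
CHATCRAFT_MB = 3.2          # upload size seen for ChatCraft after local compression
GEMINI_GET_MB = 77 / 1000   # ~77 KB image fetched after refreshing the page


def reduction_percent(original: float, compressed: float) -> float:
    """Return how much smaller `compressed` is than `original`, in percent."""
    if original <= 0:
        raise ValueError("original size must be positive")
    return (1 - compressed / original) * 100


chatcraft_saving = reduction_percent(ORIGINAL_MB, CHATCRAFT_MB)
gemini_saving = reduction_percent(ORIGINAL_MB, GEMINI_GET_MB)

print(f"ChatCraft upload: {chatcraft_saving:.1f}% smaller")
print(f"Gemini GET image: {gemini_saving:.1f}% smaller")
```

Under these assumptions, the ChatCraft upload is roughly 85% smaller than the baseline and the image Gemini serves back is more than 99% smaller. The two figures are not directly comparable, since they are measured at different points in each application's flow.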

[Image 31]

Next, I tried to inspect the payload of the prompt's POST request to understand how the image is transmitted. I spent a lot of time on this and still couldn't figure it out. I was able to identify the text part of the payload, but the string that follows is encoded in a way I couldn't understand. It might involve an authentication token. The payload also automatically includes the filename of the image, and I wonder whether Gemini uses it to help generate responses.
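One generic first step with an opaque string like this is to check whether it decodes as base64 and then look at the first bytes. The sketch below is only that: it assumes the string might be base64, which I have not confirmed for Gemini, and both `probe_payload` and the sample value are hypothetical.

```python
import base64
import binascii


def probe_payload(text: str) -> dict:
    """Try to base64-decode `text` and summarize what comes out."""
    # Accept both the standard and URL-safe alphabets, and restore padding.
    normalized = text.strip().replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)
    try:
        raw = base64.b64decode(normalized, validate=True)
    except (binascii.Error, ValueError):
        return {"is_base64": False}
    try:
        looks_like_text = raw.decode("utf-8").isprintable()
    except UnicodeDecodeError:
        looks_like_text = False
    return {
        "is_base64": True,
        "byte_length": len(raw),
        "hex_preview": raw[:16].hex(" "),  # first 16 bytes, space separated
        "looks_like_text": looks_like_text,
    }


# Hypothetical sample, not taken from a real Gemini request.
sample = base64.b64encode(b'{"file":"photo.jpg"}').decode("ascii")
print(probe_payload(sample))
print(probe_payload("not base64 !!"))
```

A positive result is only a hint, because many ordinary words also happen to be valid base64. If the decoded bytes are not readable text, the hex preview is a starting point for guessing at a binary format.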

[Image 3]

If I make any new discoveries, I will update this blog post. Of course, if you know the meaning of the payload, please let me know. I would greatly appreciate it. Thank you for reading.

Top comments (2)

Jagadesh Padimala

This comparison is surface-level and draws the wrong conclusion. A few issues:

  1. You're comparing apples to oranges. ChatCraft compresses before upload because it's a lightweight chat wrapper — it has no server-side GPU infrastructure. Gemini uploads raw because Google wants the original pixels for their vision model. Compression before feeding an LLM would degrade the model's ability to analyze fine detail. These aren't two "approaches to image compression" — they're two completely different architectures with different goals.
  2. The 21.3 MB vs 3.2 MB comparison is misleading. You're comparing the upload payload size in the Network tab, but ChatCraft's 3.2 MB includes a lossy JPEG re-encode at reduced quality. That's not "smart compression"; that's just canvas.toBlob(callback, 'image/jpeg', 0.7). Any dev can do that in 3 lines. The real question is: what's the quality loss? You never measured SSIM or PSNR between the original and compressed image. Without that, "it's smaller" means nothing.
  3. "The image has been compressed to 77KB after refreshing" — that's not compression of your upload. That's Google serving a thumbnail for the chat UI via their CDN image pipeline (probably cloud.google.com/cdn/docs/overview). The original image is still stored at full resolution on their servers for model inference. You're looking at a display-optimized variant, not the "compressed" version of your file.
  4. The payload encoding you couldn't decipher is almost certainly base64-encoded protobuf, which is Google's standard wire format for all their APIs. You can decode it with atob() plus a tool such as protobuf-decoder.netlify.app/. The filename metadata is included because Gemini uses it as context for the model prompt; it literally influences the response.
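
For point 2, a PSNR measurement can be sketched in a few lines. This is a minimal illustration, assuming both images have already been decoded to equally sized 8-bit pixel buffers; the `psnr` helper and the sample values are hypothetical, and real images would need an image library to decode them first.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], compressed: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel buffers."""
    if len(original) != len(compressed) or not original:
        raise ValueError("buffers must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical buffers
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-bit grayscale samples, flattened row by row.
reference = [52, 55, 61, 66, 70, 61, 64, 73]
lossy = [50, 56, 60, 68, 69, 62, 66, 72]
print(f"PSNR: {psnr(reference, lossy):.2f} dB")
```

Higher values mean the compressed image is closer to the original. PSNR is a crude measure compared with SSIM, but it is enough to turn "it's smaller" into a size-versus-quality comparison.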

If you actually want to understand browser-based image compression, look at how WebAssembly tools handle it.
mioffice.ai runs MozJPEG and OptiPNG compiled to WASM — real codec-level compression with configurable
quality, entirely client-side, zero server upload. That's actual image compression engineering, not canvas.toBlob().

varshneymehul

Thank you for the insight. Helped.