In this blog post, I continue sharing what I have found about image compression. I was curious about how Google Gemini handles uploaded images, so I compared ChatCraft and Google Gemini using a 20 MB image file. After selecting the image, and before sending the prompt, I watched the Network tab in the browser's developer tools, as shown in the following images:
In the resource sizes reported by the developer tools, ChatCraft shows 3.2 MB, while Google Gemini shows 21.3 MB. ChatCraft shows 3.2 MB because it compresses the image locally, in the browser. Since Gemini shows 21.3 MB, I believe it does not compress the image locally and instead compresses it on the server side. What I saw after refreshing the page points the same way: the image is downloaded again through a GET request, and at that point it has been compressed to around 77 KB.
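To make "compression in the browser" concrete, here is a minimal sketch of one common approach: downscale the image on a canvas and re-encode it as JPEG before upload. This is not ChatCraft's actual code. The function names `fitWithin` and `compressInBrowser`, the 2048-pixel limit, and the 0.8 quality value are all assumptions I picked for illustration.

```typescript
// Hypothetical helper: shrink (width, height) so the longer side is at most
// maxSide, keeping the aspect ratio. Images that already fit are left alone.
function fitWithin(
  width: number,
  height: number,
  maxSide: number
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxSide) {
    return { width, height };
  }
  const scale = maxSide / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// Browser-only sketch: decode the file, draw it onto a smaller canvas,
// and re-encode it as JPEG. maxSide and quality are illustrative defaults.
async function compressInBrowser(
  file: Blob,
  maxSide = 2048,
  quality = 0.8
): Promise<Blob> {
  const bitmap = await createImageBitmap(file);
  const target = fitWithin(bitmap.width, bitmap.height, maxSide);

  const canvas = document.createElement("canvas");
  canvas.width = target.width;
  canvas.height = target.height;

  const ctx = canvas.getContext("2d");
  if (!ctx) {
    throw new Error("2D canvas context unavailable");
  }
  ctx.drawImage(bitmap, 0, 0, target.width, target.height);
  bitmap.close();

  // toBlob is callback-based, so wrap it in a Promise.
  return new Promise<Blob>((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("Encoding failed"))),
      "image/jpeg",
      quality
    );
  });
}
```

With something like this, the Network tab only ever sees the smaller re-encoded file, which would match what I observed with ChatCraft. A server-side approach uploads the original bytes first and shrinks them afterwards.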
Next, I tried to inspect the payload of the POST request that sends the prompt, to understand how the image is transmitted. I spent a lot of time on it and still couldn't figure it out. I was able to identify the text part of the payload, but the string that follows is encoded in a way I couldn't understand. It might involve an authentication token. The payload also automatically includes the filename of the image, and I wonder whether Gemini uses it to help generate responses.
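For anyone who wants to dig into a payload like this, a first step I would try is sketched below. It assumes the request body is form-encoded and that some field values are JSON, possibly with more JSON nested inside strings. I have not confirmed that Gemini's payload looks like this. The helper names and the sample body are invented for illustration.

```typescript
// Hypothetical helper: if a string looks like a JSON array or object, parse
// it and keep unwrapping any JSON strings nested inside arrays.
// Object values are returned as parsed, without further recursion.
function decodeNested(value: unknown): unknown {
  if (typeof value === "string") {
    const trimmed = value.trim();
    const first = trimmed.charAt(0);
    if (first === "[" || first === "{") {
      try {
        return decodeNested(JSON.parse(trimmed));
      } catch {
        return value; // Not valid JSON after all, keep the raw string.
      }
    }
    return value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => decodeNested(item));
  }
  return value;
}

// Hypothetical helper: split a form-encoded body into its fields
// (URLSearchParams handles the percent-decoding) and decode each value.
function inspectFormBody(body: string): Record<string, unknown> {
  const fields: Record<string, unknown> = {};
  new URLSearchParams(body).forEach((value, key) => {
    fields[key] = decodeNested(value);
  });
  return fields;
}

// Invented sample body, NOT a real Gemini request.
const sampleBody =
  "text=describe%20this&data=%5B%22photo.jpg%22%2C%22%5B1%2C2%5D%22%5D&token=abc123";
console.log(JSON.stringify(inspectFormBody(sampleBody), null, 2));
```

Fields that stay as opaque strings after this pass, like the `token` value in the sample, are the ones more likely to be authentication tokens or binary-encoded data and would need a different approach.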
If I make any new discoveries, I will update this blog post. Of course, if you know the meaning of the payload, please let me know. I would greatly appreciate it. Thank you for reading.