title: [Paper Review] Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases
published: false
date: 2023-12-27 00:00:00 UTC
tags:
canonical_url: http://www.evanlin.com/paper-gpt4v-vs-gemini-pro/
---
### Paper Title: Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

[https://arxiv.org/abs/2312.15011](https://arxiv.org/abs/2312.15011)
## Quick Summary
This paper reuses the relevant test cases from the earlier Microsoft paper and adds several new types of cases. TL;DR: GPT-4V is more concise and accurate, while Gemini Pro's descriptions are clearer. The paper is full of pictures and worked cases, and it is worth reading.
## Several Interesting Cases
### Being a Detective
Both models picked up on several relevant clues, which makes this kind of task a good fit for side projects. :p

### Identifying the Brand of Shoes
Pretty impressive: the shoes were identified as Nike Air Force 1.

### Reading the First Page Image of a Paper
The results are good. When a paper's information cannot be extracted from arXiv directly, reading the first-page image this way is a workable fallback.
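
As a rough illustration of that fallback idea, here is a minimal sketch, not anything taken from the paper. It assumes the metadata arrives as an Atom XML response (the format the arXiv API returns) and that `read_with_vlm` is a hypothetical callback wrapping whichever vision-language model you use. The names `parse_arxiv_atom` and `paper_metadata` are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Atom namespace used by arXiv API responses.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_arxiv_atom(xml_text: str) -> Optional[dict]:
    """Return title and authors from an Atom response, or None if unusable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    entry = root.find(f"{ATOM}entry")
    if entry is None:
        return None
    title = entry.findtext(f"{ATOM}title")
    if not title or not title.strip():
        return None
    authors = [
        name.text.strip()
        for name in entry.findall(f"{ATOM}author/{ATOM}name")
        if name.text
    ]
    # Collapse the line breaks that long titles often contain.
    return {"title": " ".join(title.split()), "authors": authors}


def paper_metadata(
    xml_text: str,
    first_page_image: bytes,
    read_with_vlm: Callable[[bytes], dict],  # hypothetical VLM wrapper
) -> dict:
    """Prefer structured metadata; fall back to reading the first-page image."""
    parsed = parse_arxiv_atom(xml_text)
    if parsed is not None:
        return parsed
    return read_with_vlm(first_page_image)
```

Passing the model call in as a function keeps the sketch independent of any particular vendor SDK, so the same fallback works with either GPT-4V or Gemini Pro.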
