Evan Lin

Posted on • Originally published at evanlin.com on

Notes on GPT-4V(ision): The Dawn of LMMs

title: [Paper Notes] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) - Notes
published: false
date: 2023-10-02 00:00:00 UTC
tags: 
canonical_url: http://www.evanlin.com/til-the-dawn-of-lmm/
---

![image-20231004092928239](http://www.evanlin.com/images/2022/image-20231004092928239.png)

(Paper: [https://arxiv.org/abs/2309.17421](https://arxiv.org/abs/2309.17421))

## Background

This paper, published by Microsoft on 09/29, puts many of GPT-4V's features through early testing. It also lists many interesting use cases and takes "The Dawn of LMMs" as its title. The paper is not short (166 pages), but it is packed with application cases that make it an exciting and enjoyable read.

### Case 1: Given a photo and a menu, it can tell you how much a can of beer should cost.

![Image](http://www.evanlin.com/images/2022/F7hPoyJaIAEWyf9.png)

### Case 2: Given an invoice (receipt), it can tell you how much tax to pay and where that amount is listed

![Image](http://www.evanlin.com/images/2022/F7hPzFxbEAABSNy.png)

### Case 3: Given an ID, it can read the fields and return them directly as JSON

![Image](http://www.evanlin.com/images/2022/F7hQNG_aUAEx-x1.png)
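
If you want to consume this kind of JSON reply in your own program, a minimal sketch could look like the following. It assumes the model's reply is plain text that contains one JSON object; the field names (`name`, `id_number`, `date_of_birth`) and the `IdCard` type are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import json
from dataclasses import dataclass

# Hypothetical field names, for illustration only.
REQUIRED_FIELDS = ("name", "id_number", "date_of_birth")


@dataclass
class IdCard:
    name: str
    id_number: str
    date_of_birth: str


def parse_id_reply(reply: str) -> IdCard:
    """Extract one JSON object from a text reply and validate its fields."""
    # The reply may wrap the JSON in extra prose, so take the span between
    # the first "{" and the last "}".
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")

    data = json.loads(reply[start:end + 1])

    # Fail loudly if an expected field is missing instead of guessing a value.
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Keep only the expected fields and normalize their values to strings.
    return IdCard(**{field: str(data[field]) for field in REQUIRED_FIELDS})
```

Validating the reply before using it is a deliberate choice here: fields read from an image can be missing or misread, so it seems safer to reject an incomplete result than to pass it along silently.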

### Case 4: Using a different planning approach (Tree of Thought) for OCR can actually give better results.

![image-20231004093230728](http://www.evanlin.com/images/2022/image-20231004093230728.png)

### Case 5: It can work through basic math problems given as graphs, and seems able to solve problems posted by Twitter users

![Image](http://www.evanlin.com/images/2022/F7hTV5oaEAAm4QP.png)