Llama 3.2-Vision is a multimodal large language model available in 11B and 90B sizes, capable of processing both text and image inputs to generate text outputs.
Llama 3.2-Vision, as mentioned in the article, is indeed very powerful, especially in multimodal tasks like high-precision OCR, which is truly impressive! However, the command-line operations in Ollama can be quite cumbersome; a more intuitive graphical interface would be friendlier. Does anyone have any suggestions?
Is it possible to use the 90B model?
`ollama run llama3.2-vision:90b`
ollama.com/library/llama3.2-vision...
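In case it helps, here is a minimal Python sketch of the same call through the ollama Python client (`pip install ollama`). It assumes the Ollama server is running and `ollama pull llama3.2-vision:90b` has completed; `invoice.png` is a placeholder path, not something from the article:

```python
import ollama

# Ask the 90B vision model to transcribe an image.
# "invoice.png" is a placeholder; point it at your own file.
response = ollama.chat(
    model="llama3.2-vision:90b",
    messages=[{
        "role": "user",
        "content": "Transcribe all text in this image verbatim.",
        "images": ["invoice.png"],
    }],
)
print(response["message"]["content"])
```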
Does it hallucinate with large tables with a lot of numbers?
You can test the recognition of Llama 3.2 90B Vision on the llamaocr website.
How can I add support for other languages, such as Asian languages, in UTF-8? Thanks.
You can look into MiniCPM-V.
Thank you so much, let me try it.
You're welcome. For higher-precision OCR scenarios on macOS, you might be interested in trying macos-vision-ocr.
Can you please tell me what the speed is like for an image-based PDF with over 60 pages?
Recognition speed depends on the performance of your current device.
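If you want to benchmark it on your own hardware, here is a rough sketch: rasterize each page (here with the pdf2image package, which is my assumption, not something from the article) and time one recognition call per page. `scanned.pdf` is a placeholder path:

```python
import io
import time

import ollama
from pdf2image import convert_from_path  # pip install pdf2image (requires poppler)

# Placeholder path; swap in your own image-based PDF.
pages = convert_from_path("scanned.pdf", dpi=200)

start = time.time()
for i, page in enumerate(pages, start=1):
    # Encode the rendered PIL page as PNG bytes; the ollama client accepts raw bytes.
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    response = ollama.chat(
        model="llama3.2-vision",
        messages=[{
            "role": "user",
            "content": "Transcribe all text on this page.",
            "images": [buf.getvalue()],
        }],
    )
    print(f"--- page {i} ---\n{response['message']['content']}")

print(f"Total: {time.time() - start:.1f}s for {len(pages)} pages")
```

Expect the runtime to scale roughly linearly with page count, so a 60-page document means about 60 single-image calls.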
Damn, needed this.
Here are some other application scenarios for Llama 3.2 Vision, if you are interested.
Any update on when PDF OCR will be GA with this?
Can a local Ollama minicpm-v model be used?
Yes, the minicpm-v vision model is also supported. A minimal sketch of the call (the same chat pattern as above, just a different model tag; the image path is a placeholder):
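```python
import ollama

# Assumes `ollama pull minicpm-v` has completed; "photo.jpg" is a placeholder.
response = ollama.chat(
    model="minicpm-v",
    messages=[{
        "role": "user",
        "content": "Transcribe all text in this image verbatim.",
        "images": ["photo.jpg"],
    }],
)
print(response["message"]["content"])
```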