DEV Community

Ollama-OCR for High-Precision OCR with Ollama

Bytefer on November 25, 2024

Llama 3.2-Vision is a multimodal large language model available in 11B and 90B sizes, capable of processing both text and image inputs to generate ...
 
ogodo olutayo

Can you please tell me what the speed is like for an image-based PDF with over 60 pages?

Bytefer

Recognition speed depends on the performance of your current device.
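
Since the answer depends on hardware, one practical way to estimate it is to time a few pages on your own machine and extrapolate. A minimal sketch, assuming you already have one image per PDF page and an async OCR function such as a wrapper around `ollamaOCR`; the helper `estimateOcrTime` and its `ocrPage` parameter are hypothetical:

```javascript
// Hypothetical helper: time OCR on a small sample of pages and extrapolate
// to the whole document. `ocrPage` is any async function that takes a page
// (e.g. an image path) and resolves to its recognized text.
async function estimateOcrTime(pages, ocrPage, sampleSize = 3) {
  const sample = pages.slice(0, Math.min(sampleSize, pages.length));
  if (sample.length === 0) {
    throw new Error("need at least one page to sample");
  }
  const start = Date.now();
  for (const page of sample) {
    await ocrPage(page); // sequential: one OCR request at a time
  }
  const elapsedMs = Date.now() - start;
  const msPerPage = elapsedMs / sample.length;
  return {
    sampledPages: sample.length,
    msPerPage,
    // Linear extrapolation: assumes the remaining pages are similar in density
    estimatedTotalMs: msPerPage * pages.length,
  };
}
```

For a 60-page PDF you would pass 60 page-image paths and something like `(p) => ollamaOCR({ filePath: p, systemPrompt: DEFAULT_OCR_SYSTEM_PROMPT })`. The estimate is only as good as the sample: pages with dense tables or small print usually take longer than mostly blank ones.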

Nov Piseth

How can I add other languages, such as Asian languages, in UTF-8? Thanks.

Bytefer

You could look into MiniCPM-V.

Nov Piseth

Thank you so much, let me try.

Bytefer

You're welcome. For higher-precision OCR on macOS, you might be interested in trying macos-vision-ocr.

Danny

Is it possible to use the 90B model?

Bytefer

ollama run llama3.2-vision:90b
ollama.com/library/llama3.2-vision...
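
If you would rather call the 90B model from your own code instead of through the ollama-ocr wrapper, a minimal sketch against Ollama's local REST API (`POST /api/generate` on the default port 11434) could look like this. The helper names `buildOcrRequest` and `ocrWithOllama` are hypothetical, the sketch assumes Node 18+ (global `fetch`), and the 90B model needs considerably more memory than the 11B one:

```javascript
// Hypothetical helpers around Ollama's /api/generate endpoint.
// Assumes a running Ollama server and that the model is already pulled
// (e.g. with `ollama run llama3.2-vision:90b`).
const OLLAMA_URL = "http://localhost:11434/api/generate";

// Build the JSON body: vision models receive images as base64 strings.
function buildOcrRequest(model, imageBytes, prompt) {
  return {
    model,
    prompt,
    images: [Buffer.from(imageBytes).toString("base64")],
    stream: false, // ask for a single JSON reply instead of a token stream
  };
}

// Send the request and return the model's text output.
async function ocrWithOllama(model, imageBytes, prompt) {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOcrRequest(model, imageBytes, prompt)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.response; // non-streaming replies carry the text in `response`
}
```

For example, `await ocrWithOllama("llama3.2-vision:90b", fs.readFileSync("./receipt.jpg"), "Extract all text from this image.")`. Swapping the model string is all it takes to compare the 11B and 90B tags on the same image.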

Akshay Ballal

Does it hallucinate on large tables with lots of numbers?

Bytefer

You can test Llama-3.2-90B-Vision's recognition accuracy on the llamaocr website.
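
One cheap way to spot-check hallucinated digits in large numeric tables, whichever model you use, is to OCR the same image twice (or with two different models) and compare the numbers each run produced; any mismatch marks a cell worth checking by hand. This is a heuristic sketch, not a feature of ollama-ocr, and `extractNumbers` and `diffNumbers` are hypothetical helpers:

```javascript
// Hypothetical helpers: pull numeric tokens out of OCR text and compare two
// runs position by position. Mismatches flag cells to verify by hand;
// agreement does not prove the numbers are correct.
function extractNumbers(text) {
  // Integers and decimals, with optional thousands separators (1,234.56)
  const matches = text.match(/-?\d[\d,]*(?:\.\d+)?/g) || [];
  return matches.map((m) => m.replace(/,/g, "")); // drop separators
}

function diffNumbers(textA, textB) {
  const a = extractNumbers(textA);
  const b = extractNumbers(textB);
  const mismatches = [];
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) {
      // `undefined` on one side means that run produced fewer numbers
      mismatches.push({ index: i, a: a[i], b: b[i] });
    }
  }
  return mismatches;
}
```

Because the comparison is positional, one dropped or extra number shifts every later index, so a long run of mismatches usually points to a single problem near its first index. Two runs can also agree on the same wrong value, so this narrows down where to look rather than replacing a manual check.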

Ramya Menon

Damn, needed this.

Bytefer

There are other application scenarios for Llama 3.2 Vision too, if you are interested.

Tan KC

Can a local Ollama minicpm-v model be used?

Bytefer

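The minicpm-v vision model is also supported; use the following code:

  import { ollamaOCR, DEFAULT_OCR_SYSTEM_PROMPT } from "ollama-ocr";

  // OCR a local image with the minicpm-v model (top-level await needs an ES module)
  const text = await ollamaOCR({
    model: "minicpm-v",
    filePath: "./trader-joes-receipt.jpg",
    systemPrompt: DEFAULT_OCR_SYSTEM_PROMPT,
  });
  console.log(text);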

[Image: minicpm-v OCR result]