Llama 3.2-Vision is a multimodal large language model available in 11B and 90B sizes, capable of processing both text and image inputs to generate text outputs.
Llama 3.2-Vision, as mentioned in the article, is indeed very powerful, especially in multimodal tasks like high-precision OCR, which is truly impressive! However, the command-line operations in Ollama can be quite cumbersome; a more intuitive graphical interface would be friendlier. Does anyone have any suggestions?
Is it possible to use the 90B model?
`ollama run llama3.2-vision:90b`
ollama.com/library/llama3.2-vision...
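In case it helps, here is a minimal Python sketch of the same call through the ollama Python client (`pip install ollama`). It assumes the Ollama server is running and `ollama pull llama3.2-vision:90b` has completed; `invoice.png` is a placeholder path, not something from the article:

```python
import ollama

# Ask the 90B vision model to transcribe an image.
# "invoice.png" is a placeholder; point it at your own file.
response = ollama.chat(
    model="llama3.2-vision:90b",
    messages=[{
        "role": "user",
        "content": "Transcribe all text in this image verbatim.",
        "images": ["invoice.png"],
    }],
)
print(response["message"]["content"])
```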
Does it hallucinate with large tables with a lot of numbers?
You can test the recognition of Llama 3.2 90B Vision on the llamaocr website.
How can I add support for other languages, such as Asian languages, in UTF-8? Thanks.
You can look into MiniCPM-V.
Thank you so much, let me try it.
You're welcome. For higher-precision OCR scenarios on macOS, you might be interested in trying macos-vision-ocr.
Can you please tell me what the speed is like for an image-based PDF with over 60 pages?
Recognition speed depends on the performance of your current device.
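If you want to benchmark it on your own hardware, here is a rough sketch: rasterize each page (here with the pdf2image package, which is my assumption, not something from the article) and time one recognition call per page. `scanned.pdf` is a placeholder path:

```python
import io
import time

import ollama
from pdf2image import convert_from_path  # pip install pdf2image (requires poppler)

# Placeholder path; swap in your own image-based PDF.
pages = convert_from_path("scanned.pdf", dpi=200)

start = time.time()
for i, page in enumerate(pages, start=1):
    # Encode the rendered PIL page as PNG bytes; the ollama client accepts raw bytes.
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    response = ollama.chat(
        model="llama3.2-vision",
        messages=[{
            "role": "user",
            "content": "Transcribe all text on this page.",
            "images": [buf.getvalue()],
        }],
    )
    print(f"--- page {i} ---\n{response['message']['content']}")

print(f"Total: {time.time() - start:.1f}s for {len(pages)} pages")
```

Expect the runtime to scale roughly linearly with page count, so a 60-page document means about 60 single-image calls.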
Damn, needed this.
Here are some other application scenarios for Llama 3.2 Vision, if you are interested.
Any update on when PDF OCR will be GA with this?
Can a local Ollama minicpm-v model be used?
Yes, the minicpm-v vision model is also supported. A minimal sketch of the call (the same chat pattern as above, just a different model tag; the image path is a placeholder):
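```python
import ollama

# Assumes `ollama pull minicpm-v` has completed; "photo.jpg" is a placeholder.
response = ollama.chat(
    model="minicpm-v",
    messages=[{
        "role": "user",
        "content": "Transcribe all text in this image verbatim.",
        "images": ["photo.jpg"],
    }],
)
print(response["message"]["content"])
```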