Llama 3.2-Vision is a multimodal large language model available in 11B and 90B sizes, capable of processing both text and image inputs to generate ...
Can you please tell me what the speed is like for an image-based PDF with over 60 pages?
Recognition speed depends on the performance of your current device.
How do I add other languages, such as Asian languages, in UTF-8? Thanks.
You can look into MiniCPM-V.
Thank you so much, let me try.
You're welcome. For higher-precision OCR on macOS, you may be interested in trying macos-vision-ocr.
Is it possible to use the 90B model?
ollama run llama3.2-vision:90b
ollama.com/library/llama3.2-vision...
Does it hallucinate on large tables with a lot of numbers?
You can test the recognition quality of Llama-3.2-90B-Vision on the llamaocr website.
Damn, needed this.
Other application scenarios for Llama 3.2 Vision, if you are interested.
Can a local Ollama minicpm-v model be used?
The minicpm-v visual model is also supported with the following code:
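For reference, here is a minimal sketch of sending one image to a local minicpm-v model through Ollama's HTTP API directly, rather than through this tool's own wrapper. It assumes Ollama is running on its default port (11434) and that the model has already been pulled. The function names and the prompt text are made up for illustration.

```python
import base64
import json
import urllib.request

# Default endpoint of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ocr_request(image_bytes: bytes, model: str = "minicpm-v") -> dict:
    """Build the JSON payload for a single-image request (hypothetical helper)."""
    return {
        "model": model,
        # Illustrative prompt only; tune it for your documents.
        "prompt": "Extract all text from this image. Return only the text.",
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON response instead of a token stream.
        "stream": False,
    }


def run_ocr(image_path: str, model: str = "minicpm-v") -> str:
    """Send the image to the local Ollama server and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_ocr_request(f.read(), model)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

After `ollama pull minicpm-v`, calling `run_ocr("page.png")` should return the recognized text. Passing `model="llama3.2-vision:90b"` would target the larger model instead, if your hardware can run it.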