
Discussion on: Rise of Local LLMs ?

Marcos Issler

I have used llava to OCR some image documents, but without success. Sometimes it works, sometimes it says it is not possible, complains about low resolution, or gives other answers. Does anyone know another model that can do image-to-text OCR of the text contained in an image, like a letter? Thanks!

Sarthak Sharma

There is a way (hack) by which you can download any of Hugging Face's open-source image-to-text models and run them locally.

youtu.be/fnvZJU5Fj3Q?si=YiiHdwRw90...

You can also upscale the image with a different AI model and then feed it to llava for better results.

I’m sure the llava model will get better with time.
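
If you go the local route, a rough sketch with the transformers library might look like this (the model name is just one OCR-oriented example, and TrOCR expects single text lines, so a full letter would need line segmentation first):

```python
# Rough sketch of running a Hugging Face image-to-text model locally.
# "microsoft/trocr-base-printed" is just one example of an OCR-oriented
# model; it expects a single line of text, so crop or segment a full
# page into lines before feeding it in.
from transformers import pipeline
from PIL import Image

ocr = pipeline("image-to-text", model="microsoft/trocr-base-printed")
image = Image.open("letter_line.png").convert("RGB")
print(ocr(image)[0]["generated_text"])
```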

Marcos Issler

Thanks Sarthak, I will try it!

Ranjan Dailata

@isslerman Please do not rely on traditional LLMs for OCR purposes; they are not good at it. You should use dedicated OCR providers, for example an OCR API.

Marcos Issler

Thanks Ranjan, I will take a look at it. The project itself is more than OCR; we are using LLMs for other purposes, and the OCR is just one step. But yes, an OCR API could be a good solution too. I will check whether there is an open-source option I can use, or look at the providers, because the documents in this case are too sensitive to be shared.
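
For a fully local open-source option, Tesseract (via pytesseract) might be worth a try, since nothing sensitive would leave our machines. A rough sketch, with the file name and language code as placeholders:

```python
# Rough sketch of local OCR with Tesseract; requires the tesseract
# binary plus the pytesseract and Pillow packages. File name and
# language code are placeholders.
import pytesseract
from PIL import Image

image = Image.open("scanned_letter.png")
text = pytesseract.image_to_string(image, lang="eng")
print(text)
```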
Bests, Marcos Issler

Ranjan Dailata

Sure, it's understandable. There are always pros and cons with the open-source options, and please do keep their accuracy issues in mind. However, since you mentioned sensitive-document OCR, I would highly recommend considering public cloud options such as AWS and Azure. They provide a complete solution and are GDPR and SOC 2 compliant. Generally, you need to weigh several things: cost, accuracy, security, reliability, scalability, etc. If an in-house open-source setup works best for you, go ahead with it. However, it's good to experiment and see which option is best.
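
As a rough illustration of the managed-cloud route, AWS Textract can be called with a few lines of boto3 (credentials, region, and file name below are placeholders):

```python
# Rough sketch of synchronous text detection with AWS Textract.
# Assumes boto3 is installed and AWS credentials are configured;
# region and file name are placeholders.
import boto3

client = boto3.client("textract", region_name="us-east-1")

with open("document.png", "rb") as f:
    response = client.detect_document_text(Document={"Bytes": f.read()})

# LINE blocks carry the recognized text, one entry per detected line.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])
```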