Hi everyone!
I've been working on a document processing API suite that solves a few problems I ran into over my career.
LLMs are great at answering questions about documents, but not so great when you need bounding boxes for those answers.
OCR gives you bounding boxes, but it doesn't understand context.
Mixing the two is a pain.
So we are using vision models to solve this. The result is Ninjadoc, an easy-to-use API that answers your questions and returns the geometry of the evidence it found.
We have a few more products lined up, but for now this is what you get:
Endpoints to ask a single question in natural language, with or without coordinates (see the sketch after this list)
A dashboard to define a collection of questions
AI-enhanced Markdown transforms (our techniques work well for these)
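To give a concrete idea of the question-with-coordinates flow, here is a rough sketch of what a call could look like. The endpoint URL, parameter names, and response fields are illustrative placeholders only, not the actual Ninjadoc API:

```python
import requests

# Hypothetical sketch: the URL, field names, and response shape below are
# placeholders to illustrate "answer + bounding boxes", not the real API.
API_URL = "https://api.example.com/v1/ask"  # placeholder endpoint
API_KEY = "your-api-key"                    # placeholder credential

with open("invoice.pdf", "rb") as f:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"document": f},
        data={
            "question": "What is the invoice total?",
            "include_coordinates": True,  # request geometry along with the answer
        },
        timeout=60,
    )
resp.raise_for_status()
result = resp.json()

# A response like this would pair the answer with the evidence geometry:
# {
#   "answer": "1,234.56 EUR",
#   "evidence": [
#     {"page": 1, "bbox": [412.0, 702.5, 498.3, 716.0], "text": "1,234.56"}
#   ]
# }
print(result["answer"])
for ev in result.get("evidence", []):
    print(f'page {ev["page"]}: bbox {ev["bbox"]}')
```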
More improvements are on the way, but it's looking good, and I wish I had had something like this before.
Let me know what you think; any feedback is appreciated.
Thanks!