I originally built an AI PDF sandbox that works with financial reports. I ran into a bunch of issues handling PDFs as privately as I could, i.e. minimizing the data sent over the network and keeping user data isolated across the multi-step process it takes: extract the PDF, put the results in a sandbox with agents, and iterate over evals.
I moved the stack to Cloudflare Agents and noticed that the more colocated and secure I made the pieces around PDFs, the faster the whole UX became, because the core works over bindings. So I am making an SDK that solves most of the app problems I've experienced myself.
Per document, you can set:
- a different access token and permissions
- a different parsing strategy based on long-tail PDF complexity
- /api/chat/completions per document without deployment
- /ingest from providers like Google OCR, LlamaParse, Unstructured via HTTP
- PDF preview generation on the edge
https://api.okrapdf.com/document/[docId]/...
Completion: /chat/completions
Statuses: /status
Pages: /pages
Entities: /nodes
Export: /export
Assets:
Concatenated md: /full.md
By page: /pg_1.{md,png}
Resized: /w_200,h_300/pg_1.png
Loading state: /d_shimmer/pg_1.png
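To show how these paths compose, here is a minimal Python sketch that builds the URLs from the patterns listed above using plain string formatting. The helper names (`doc_url`, `resize`, `page_asset_url`) are my own for illustration and are not part of the SDK; only the path shapes come from the list. It makes no network calls and assumes nothing about authentication.

```python
from typing import Optional

# Base taken from the post; [docId] is filled in per call.
BASE = "https://api.okrapdf.com/document"


def doc_url(doc_id: str, path: str) -> str:
    """Join a document ID with a per-document path such as 'status' or 'full.md'."""
    return f"{BASE}/{doc_id}/{path.lstrip('/')}"


def resize(width: int, height: int) -> str:
    """Build the resize segment in the 'w_200,h_300' form shown in the post."""
    return f"w_{width},h_{height}"


def page_asset_url(doc_id: str, page: int, ext: str = "md",
                   transform: Optional[str] = None) -> str:
    """URL for one page asset, optionally behind a transform segment.

    `transform` is either a resize segment or 'd_shimmer'. Whether the
    service accepts other combinations is not stated in the post.
    """
    if ext not in ("md", "png"):
        raise ValueError("ext must be 'md' or 'png'")
    filename = f"pg_{page}.{ext}"
    path = f"{transform}/{filename}" if transform else filename
    return doc_url(doc_id, path)
```

For example, `page_asset_url("abc", 1, "png", transform=resize(200, 300))` yields the resized first page for a hypothetical document ID `abc`, and `doc_url("abc", "status")` yields its status path.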
You can create a collection that fans out queries across its documents, like the demo table here: https://github.com/okrapdf/examples/blob/main/collection-grid/demo.gif
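The fan-out idea can be sketched on the client side: send every question to every document concurrently and collect the answers into a grid. This is only an illustration of the pattern, not how collections are implemented. `fan_out` and `fake_ask` are hypothetical names, and `fake_ask` stands in for a real call to a document's /chat/completions endpoint, whose request format is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(doc_ids: List[str], questions: List[str],
            ask: Callable[[str, str], str],
            max_workers: int = 8) -> Dict[str, Dict[str, str]]:
    """Ask every question of every document; return {doc_id: {question: answer}}."""
    pairs = [(d, q) for d in doc_ids for q in questions]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `pairs`.
        answers = list(pool.map(lambda p: ask(*p), pairs))
    grid: Dict[str, Dict[str, str]] = {d: {} for d in doc_ids}
    for (d, q), answer in zip(pairs, answers):
        grid[d][q] = answer
    return grid


def fake_ask(doc_id: str, question: str) -> str:
    """Placeholder for a per-document completion call (assumption, no network)."""
    return f"{doc_id}: answer to {question!r}"
```

Each row of the resulting grid is one document and each column is one question, which is the shape of the demo table.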
Because the PDF and its derived markdown and images are stored in the same worker instance, deleting a document wipes all of them, and optionally its chats along with it.
You can sign up here for early access and try the SDK with the API keys. Can't wait to share more: https://okrapdf.com/
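As a toy model of why colocation makes deletion simple, the sketch below keeps a document's PDF, derived markdown, images, and chats in one object, so a single call clears everything. The `DocumentStore` class and its fields are hypothetical and in-memory only; the actual storage layout inside the worker is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentStore:
    """Everything derived from one PDF lives together (illustrative only)."""
    pdf: bytes = b""
    markdown: Dict[int, str] = field(default_factory=dict)   # page -> markdown
    images: Dict[int, bytes] = field(default_factory=dict)   # page -> PNG bytes
    chats: List[str] = field(default_factory=list)

    def delete(self, include_chats: bool = False) -> None:
        """Wipe the PDF and all derived assets; chats only if requested."""
        self.pdf = b""
        self.markdown.clear()
        self.images.clear()
        if include_chats:
            self.chats.clear()
```

Since nothing derived from the PDF lives outside the object, there is no second system to clean up after a delete.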