Discussion on: Clawshier OpenClaw Skill


Honest question: what happens to the 40% that fail? In data pipelines, 60% reliability means the pipeline is broken, not "mostly working." You end up building a second system just to catch and fix the errors from the first one.

Do you have a manual review step for the receipts it botches, or do you just accept the data loss? Because that's the real design decision here, not which vision model to use.

Fernando • Apr 9 • Edited

Hey, that was more of a joke than an actual success/failure statistic. Did you open the link on that quote? 😆

In reality, I do sometimes run into cases where OCR isn't successful, which is understandable since most receipt pictures I upload are handheld shots with poor lighting, and the receipts are sometimes crumpled up with partially blurred text.

Do you have a manual review step for the receipts it botches, or do you just accept the data loss? Because that's the real design decision here, not which vision model to use.

Yes, the skill itself was built as an orchestration of smaller internal skills, or steps, executed one at a time. Here's a bit more detail on how that works:

OCR
Structuring
Validation
Posting to G Sheet
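As a rough sketch only (this is not the actual skill code), the four steps above might chain together like this in Python. Every name here (`Receipt`, `validate`, `process_receipt`, and the `ocr`, `structure`, and `post` callables) is made up for illustration, and the validation rules are just plausible guesses:

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Receipt:
    # Hypothetical structured form of a receipt.
    merchant: str
    total: float
    category: str


class ValidationError(Exception):
    """Raised when a structured receipt looks wrong (hypothetical)."""


def validate(receipt: Receipt, allowed_categories: Set[str]) -> None:
    # Assumed checks; the real validation step may look different.
    if not receipt.merchant.strip():
        raise ValidationError("missing merchant")
    if receipt.total <= 0:
        raise ValidationError(f"implausible total: {receipt.total}")
    if receipt.category not in allowed_categories:
        raise ValidationError(f"unknown category: {receipt.category}")


def process_receipt(
    image: bytes,
    ocr: Callable[[bytes], str],
    structure: Callable[[str], Receipt],
    post: Callable[[Receipt], None],
    allowed_categories: Set[str],
) -> str:
    text = ocr(image)                      # 1. OCR
    receipt = structure(text)              # 2. Structuring
    validate(receipt, allowed_categories)  # 3. Validation (raises on bad data)
    post(receipt)                          # 4. Posting, only reached if valid
    # Summary returned to the user so they can eyeball the result.
    return f"Recorded {receipt.merchant}: {receipt.total:.2f} ({receipt.category})"
```

Because validation raises before `post` is called, a bad reading never reaches the sheet and the whole run can simply be retried.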

The idea behind this was for the skill to be easy to retry and to fail safely at the validation step before persisting. You also get a summary of what was recorded as a response, so you can easily check whether the receipt was interpreted and persisted correctly (see this success response below).

This means I can quickly tell if the total amount or the category is incorrect (the details I care most about at the moment). Here is an example of the kind of failure I was referring to, during the OpenAI OCR step:

You can see what GPT 5.4 told me when I requested more details about these refusals:

Overall, a retry after a minute or two works... So I guess that's just part of the non-deterministic nature of these models 🤷🏻‍♂️
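If you wanted to automate that "wait a bit and try again" part, a minimal sketch could look like the following. `OcrRefused` and `with_retry` are hypothetical names, and the 60-second delay is only an assumption based on the "minute or two" above:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class OcrRefused(Exception):
    """Hypothetical error for a refused or failed OCR call."""


def with_retry(
    step: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    # Run the step up to `attempts` times, pausing between failed tries.
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except OcrRefused:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the user
            sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is only there so the delay can be swapped out in tests; in normal use you would call something like `with_retry(lambda: run_ocr(image))`, where `run_ocr` stands in for whatever performs the OCR step.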

I do have to note that I'm very happy with the results so far. Despite occasional hiccups, OpenAI OCR gives me very good results, but I would definitely prefer to use a local model or OCR process (like those mentioned in the post) if possible.