Running reconciliation against the Daraja sandbox last week, I got this:
{"checked":3,"matched":0,"skipped":0,"mismatches":[
{"checkoutRequestId":"ws_CO_26032026133641276708729173",
"storedStatus":"PENDING","mpesaStatus":"FAILED"},
{"checkoutRequestId":"ws_CO_26032026111016899708729173",
"storedStatus":"SUCCESS","mpesaStatus":"FAILED"},
{"checkoutRequestId":"ws_CO_26032026113146397708729173",
"storedStatus":"SUCCESS","mpesaStatus":"FAILED"}
]}
The last two entries are the problem. Both have confirmed M-Pesa receipts in the database (UCQ5UAQ403 and UCQ5UAPYRY) and confirmed deductions on the test account. The STK callback delivered ResultCode: 0 for both. Money moved, and Safaricom's own callback said so.
The STK Query API disagrees. It says both payments failed.
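For reference, here is roughly what "the callback said so" means in code. This is a minimal sketch, not mpesa-stk's implementation. It assumes the standard Daraja callback envelope (Body.stkCallback with a ResultCode and, on success, a CallbackMetadata.Item list containing MpesaReceiptNumber). The names parseStkCallback and CallbackOutcome are hypothetical, and ResultCode is accepted as either a number or a string to be safe.

```typescript
// Subset of the STK Push callback body that this sketch reads.
interface CallbackItem {
  Name: string;
  Value?: string | number;
}

interface StkCallbackBody {
  Body: {
    stkCallback: {
      MerchantRequestID: string;
      CheckoutRequestID: string;
      ResultCode: number | string;
      ResultDesc: string;
      CallbackMetadata?: { Item: CallbackItem[] };
    };
  };
}

type PaymentStatus = "PENDING" | "SUCCESS" | "FAILED";

// Hypothetical shape of what gets persisted from a callback.
interface CallbackOutcome {
  checkoutRequestId: string;
  status: PaymentStatus;
  receipt: string | null;
}

// Hypothetical helper: ResultCode 0 means the payment went through and the
// receipt number is pulled from the metadata; anything else is a failure.
function parseStkCallback(payload: StkCallbackBody): CallbackOutcome {
  const cb = payload.Body.stkCallback;
  if (Number(cb.ResultCode) !== 0) {
    return { checkoutRequestId: cb.CheckoutRequestID, status: "FAILED", receipt: null };
  }
  const receiptValue = cb.CallbackMetadata?.Item.find(
    (item) => item.Name === "MpesaReceiptNumber",
  )?.Value;
  return {
    checkoutRequestId: cb.CheckoutRequestID,
    status: "SUCCESS",
    receipt: receiptValue === undefined ? null : String(receiptValue),
  };
}
```

Once this record is written as SUCCESS with a receipt, it is the stored side of every comparison the reconciler makes later.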
I searched Stack Overflow, the Safaricom GitHub repos, and every community integration I could find. I found no prior documentation of this: not a single issue or comment. As far as I can tell, it's unreported.
What's actually happening
Safaricom's sandbox doesn't fully simulate the USSD network layer. This is documented behavior, and it's the reason Pesa Playground exists: the sandbox can't reliably generate failure states. What's less documented is the inverse. The sandbox STK Query endpoint apparently cannot reliably confirm success states either. It appears to default to FAILED whenever it can't definitively resolve a transaction, regardless of what the callback already told you.
The sandbox callback and the sandbox STK Query are not reading from the same source of truth.
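One way to keep the two sources apart in your own code is to give the query its own result type instead of reusing the stored status. The sketch below is hypothetical: interpretStkQuery and QueryVerdict are illustrative names, not mpesa-stk exports. It assumes an STK Query response carries a ResultCode when it has a definitive answer and an errorCode/errorMessage pair otherwise, and it maps anything without a usable ResultCode to UNRESOLVED rather than FAILED. It cannot catch the sandbox behavior described above if the sandbox itself returns a non-zero ResultCode; it only keeps your own mapping from adding a second FAILED-by-default layer.

```typescript
// Subset of an STK Query response that this sketch reads. ResultCode is
// accepted as a string or a number.
interface StkQueryResponse {
  ResponseCode?: string;
  ResultCode?: string | number;
  ResultDesc?: string;
  CheckoutRequestID?: string;
  // Assumption: error-shaped responses (for example while the transaction is
  // still being processed) carry these fields instead of a ResultCode.
  errorCode?: string;
  errorMessage?: string;
}

// The query's opinion, kept separate from the stored (callback) status.
type QueryVerdict = "SUCCESS" | "FAILED" | "UNRESOLVED";

// Hypothetical mapper: only an explicit ResultCode produces SUCCESS or
// FAILED; "could not tell" stays UNRESOLVED instead of becoming FAILED.
function interpretStkQuery(res: StkQueryResponse): QueryVerdict {
  if (res.ResultCode === undefined || res.ResultCode === "") {
    return "UNRESOLVED";
  }
  const code = Number(res.ResultCode);
  if (Number.isNaN(code)) {
    return "UNRESOLVED";
  }
  return code === 0 ? "SUCCESS" : "FAILED";
}
```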
How mpesa-stk@0.1.1 handled it
The library refused to act on the contradiction. The report shows matched: 0. It checked all three payments, found that the STK Query response contradicted an authoritative stored SUCCESS on two of them, and overwrote neither. The PENDING record from the orphaned payment stayed PENDING rather than being incorrectly resolved to FAILED.
That is the correct behavior. A reconciliation system that overwrites SUCCESS with a contradictory query response would be worse than one that does nothing.
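A report of the shape shown at the top can be produced without any writes at all. This is a sketch under assumptions: buildReport, PaymentRecord and the query callback are hypothetical names, and the meanings given to matched (the query agrees with the stored status) and skipped (the query gave no usable answer) are guesses, not mpesa-stk's documented definitions.

```typescript
type StoredStatus = "PENDING" | "SUCCESS" | "FAILED";
type QueryVerdict = "SUCCESS" | "FAILED" | "UNRESOLVED";

interface PaymentRecord {
  checkoutRequestId: string;
  storedStatus: StoredStatus;
}

interface Mismatch {
  checkoutRequestId: string;
  storedStatus: StoredStatus;
  mpesaStatus: QueryVerdict;
}

interface ReconcileReport {
  checked: number;
  matched: number;
  skipped: number;
  mismatches: Mismatch[];
}

// Hypothetical read-only comparison: it counts and records disagreements
// but never changes a stored status.
function buildReport(
  records: PaymentRecord[],
  query: (checkoutRequestId: string) => QueryVerdict,
): ReconcileReport {
  const report: ReconcileReport = { checked: 0, matched: 0, skipped: 0, mismatches: [] };
  for (const rec of records) {
    report.checked++;
    const verdict = query(rec.checkoutRequestId);
    if (verdict === "UNRESOLVED") {
      report.skipped++; // no usable answer, nothing to compare
    } else if (verdict === rec.storedStatus) {
      report.matched++; // query agrees with what the callback stored
    } else {
      report.mismatches.push({
        checkoutRequestId: rec.checkoutRequestId,
        storedStatus: rec.storedStatus,
        mpesaStatus: verdict,
      });
    }
  }
  return report;
}
```

Feeding it the three records from the log, with a query stub that always answers FAILED, reproduces checked: 3, matched: 0, skipped: 0 and three mismatches. Nothing in the database changes either way.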
What this means for your reconciliation implementation
Two things need to be true in how you handle STK Query responses:
Never overwrite a terminal SUCCESS or confirmed FAILED record based on a query response alone. The callback is the authoritative source. The query is a fallback for records that never received a callback — PENDING only.
Don't trust sandbox reconciliation results. The sandbox STK Query is not a reliable test surface for this code path. Test your reconciliation logic against a production environment, or accept that sandbox results for this specific path are noise.
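The two rules can be expressed as a small decision function that runs before any write. This is a hypothetical sketch (decideUpdate, Decision and the Environment flag are illustrative names), not how mpesa-stk is built. Terminal records are never overwritten and contradictions are flagged for review; a PENDING record is resolved only by a definitive query answer; sandbox answers are flagged instead of applied.

```typescript
type StoredStatus = "PENDING" | "SUCCESS" | "FAILED";
type QueryVerdict = "SUCCESS" | "FAILED" | "UNRESOLVED";
type Environment = "sandbox" | "production";

type Decision =
  | { action: "apply"; newStatus: "SUCCESS" | "FAILED" }
  | { action: "keep" }
  | { action: "flag"; reason: string };

// Hypothetical write policy for a single record.
function decideUpdate(stored: StoredStatus, verdict: QueryVerdict, env: Environment): Decision {
  // Rule 1: SUCCESS and FAILED came from the callback and are never
  // overwritten; a disagreeing query is surfaced for review instead.
  if (stored !== "PENDING") {
    return verdict === stored || verdict === "UNRESOLVED"
      ? { action: "keep" }
      : { action: "flag", reason: `stored ${stored} contradicts query ${verdict}` };
  }
  // PENDING with no usable answer: leave it and query again later.
  if (verdict === "UNRESOLVED") {
    return { action: "keep" };
  }
  // Rule 2: sandbox query results are reported, not written.
  if (env === "sandbox") {
    return { action: "flag", reason: `sandbox query says ${verdict}; not applied` };
  }
  // Production PENDING record with a definitive answer: the query is the fallback.
  return { action: "apply", newStatus: verdict };
}
```

Flagging rather than silently keeping makes the contradiction visible, and a visible contradiction in the report is what surfaced this issue in the first place.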
The production question
I haven't run this against a live production environment. Safaricom's documentation implies that the production STK Query returns accurate results, which would make the sandbox, not production, the broken environment. If you've tested reconciliation in production and can confirm the query API behaves correctly there, I'd like to know. Leave a comment or find me on the Daraja Discord.
The finding stands regardless: if you're building reconciliation, your implementation needs to handle contradictory query responses. The sandbox will generate them. Production might too, in edge cases nobody has documented yet.
Tested on 2026-03-26, Daraja sandbox, mpesa-stk@0.1.1. Full test log in the flutter-daraja-raw repo.