The Problem
In Nigeria, SMS delivery is not guaranteed. Carriers drop messages, gateways go down, and your carefully crafted transactional alert silently vanishes into the void.
When you run a food delivery marketplace where:
- A vendor needs to know about a new order within seconds
- A rider needs pickup instructions to arrive on time
- A customer's OTP must reach them before they abandon signup
…SMS reliability isn't a nice-to-have. It's table stakes.
We started with one provider. We learned the hard way.
The Architecture
We now run two independent messaging services, each with different strengths:
Transactional SMS (primary)
→ KudiSMS service with error classification
→ Critical failures → developer alerts
→ Structured validation + phone number parsing
Escalation & multi-channel (secondary)
→ Termii service (SMS + WhatsApp + Voice)
→ Configurable per-channel toggles
→ Used for time-sensitive vendor/admin alerts
How they work together
KudiSMS handles the bulk of transactional traffic. Every order confirmation, OTP, and status update flows through a dedicated notification channel backed by KudiSMS. The service classifies each error explicitly as either retryable or critical and logs the two kinds differently. Critical failures trigger alerts so someone knows immediately.
Termii covers the gaps. It supports SMS, WhatsApp, and Voice calls through a single API. For order escalation, a job dispatches alerts through multiple channels:
- SMS to the vendor's phone (first attempt)
- WhatsApp if the vendor's preferred channel is configured
- Voice call as a last resort for urgent orders
- Admin escalation - if the vendor hasn't responded after N minutes, admins get WhatsApp + Voice alerts too
Each channel can be toggled independently via configuration. This means we can disable SMS for maintenance without affecting WhatsApp or Voice delivery.
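A minimal sketch of what per-channel toggles could look like, in Python. The environment variable names and the `ChannelToggles` class are hypothetical and only illustrate the idea of filtering channels before dispatch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping


@dataclass(frozen=True)
class ChannelToggles:
    """One on/off switch per delivery channel (hypothetical structure)."""
    sms: bool = True
    whatsapp: bool = True
    voice: bool = True

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "ChannelToggles":
        def flag(name: str) -> bool:
            # Missing variables default to enabled; only explicit "off" values disable.
            value = env.get(name, "true").strip().lower()
            return value not in {"0", "false", "off", "no"}

        # Variable names below are assumptions, not real configuration keys.
        return cls(
            sms=flag("ESCALATION_SMS_ENABLED"),
            whatsapp=flag("ESCALATION_WHATSAPP_ENABLED"),
            voice=flag("ESCALATION_VOICE_ENABLED"),
        )


def enabled_channels(toggles: ChannelToggles, requested: Iterable[str]) -> List[str]:
    """Keep only the requested channels that are switched on; unknown names are dropped."""
    flags = {"sms": toggles.sms, "whatsapp": toggles.whatsapp, "voice": toggles.voice}
    return [channel for channel in requested if flags.get(channel, False)]
```

With SMS switched off, `enabled_channels(toggles, ["sms", "whatsapp", "voice"])` still returns WhatsApp and Voice, so the escalation job can carry on with the remaining channels.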
The subtle design decisions that matter
Phone number normalisation is separate per service. Nigerian numbers come in different formats (080…, +23480…, 23480…). Each service normalises independently; if one has a bug, the other still works.
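As a rough illustration, here is one way to normalise those three formats in Python. The function name is hypothetical, and the target format (digits only, starting with 234) is an assumption; each provider may expect something slightly different, which is another reason to keep the logic per service.

```python
import re


def normalise_ng_number(raw: str) -> str:
    """Normalise a Nigerian mobile number to 234XXXXXXXXXX (13 digits, no plus sign)."""
    # Strip common separators people type into phone fields.
    cleaned = re.sub(r"[\s\-().]", "", raw)
    if cleaned.startswith("+"):
        cleaned = cleaned[1:]
    if not re.fullmatch(r"[0-9]+", cleaned):
        raise ValueError(f"not a phone number: {raw!r}")

    if cleaned.startswith("234") and len(cleaned) == 13:
        national = cleaned[3:]   # 23480... or +23480...
    elif cleaned.startswith("0") and len(cleaned) == 11:
        national = cleaned[1:]   # 080...
    else:
        raise ValueError(f"unrecognised Nigerian number format: {raw!r}")

    return "234" + national
```

Anything that doesn't match one of the known shapes raises instead of being passed through, so a malformed number surfaces before it reaches a provider.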
Validation fails fast, not silently. KudiSMS validates configuration on every request and throws immediately if credentials are missing. A misconfigured service fails loudly rather than quietly dropping messages.
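A sketch of the fail-fast pattern in Python. The config fields, the `SmsConfigError` name, and the injected `transport` callable are all assumptions made for illustration; this is not the KudiSMS client API.

```python
from dataclasses import dataclass
from typing import Callable


class SmsConfigError(RuntimeError):
    """Raised when the SMS service is missing required settings."""


@dataclass(frozen=True)
class SmsConfig:
    # Field names are hypothetical; a real provider may need different settings.
    api_key: str
    sender_id: str
    base_url: str


def validate_config(cfg: SmsConfig) -> None:
    """Raise immediately if any required setting is empty or missing."""
    missing = [
        name for name in ("api_key", "sender_id", "base_url")
        if not (getattr(cfg, name) or "").strip()
    ]
    if missing:
        raise SmsConfigError("missing SMS settings: " + ", ".join(missing))


def send_sms(cfg: SmsConfig, to: str, body: str,
             transport: Callable[[SmsConfig, str, str], str]) -> str:
    # Validate on every request, so a bad deploy fails loudly
    # before any HTTP call is attempted.
    validate_config(cfg)
    return transport(cfg, to, body)
```

Because validation runs before the transport is invoked, a misconfigured environment produces an exception on the first send rather than a stream of silently dropped messages.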
Errors are classified by severity. We distinguish between retryable failures (network blips) and critical failures (auth issues, invalid responses). Only critical errors wake someone up. This prevents alert fatigue from transient network hiccups.
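The split can be sketched in Python roughly like this. The exception classes and the `alert_developer` callback are hypothetical, and treating unknown errors as critical is an assumption of this sketch, not something the description above specifies.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("sms")


class Severity(Enum):
    RETRYABLE = "retryable"
    CRITICAL = "critical"


class ProviderAuthError(Exception):
    """Provider rejected our credentials (hypothetical exception)."""


class InvalidProviderResponse(Exception):
    """Provider returned something we could not parse (hypothetical exception)."""


def classify(exc: BaseException) -> Severity:
    # Network blips are worth retrying and not worth waking anyone up for.
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return Severity.RETRYABLE
    # Auth issues, invalid responses, and anything unrecognised count as
    # critical here (the "unknown means critical" default is an assumption).
    return Severity.CRITICAL


def handle_failure(exc: BaseException,
                   alert_developer: Callable[[str], None]) -> Severity:
    severity = classify(exc)
    if severity is Severity.CRITICAL:
        logger.error("critical SMS failure: %s", exc)
        alert_developer(f"SMS failure needs attention: {exc}")
    else:
        logger.warning("retryable SMS failure: %s", exc)
    return severity
```

Only the critical branch calls `alert_developer`; retryable failures are logged at a lower level and left to the retry mechanism.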
Escalation has state awareness. The order escalation job checks whether the order has already been accepted before sending an alert. It won't spam a vendor who already picked up the phone and accepted the order.
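In Python, the guard might look something like the following. The status value `"accepted"` and the two injected callables are assumptions; the point is only that the job re-reads the order state at send time instead of trusting the state from when it was queued.

```python
from typing import Callable


def escalate_order(order_id: str,
                   get_status: Callable[[str], str],
                   send_alert: Callable[[str], None]) -> bool:
    """Send an escalation alert unless the order has already been accepted.

    Returns True if an alert was sent, False if it was skipped.
    """
    # Look up the current status when the job runs: the vendor may have
    # accepted the order while this job was waiting in the queue.
    if get_status(order_id) == "accepted":
        return False
    send_alert(order_id)
    return True
```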
What this looks like in production
A vendor gets a new order alert:
- KudiSMS sends the transactional SMS → delivered, great
- If KudiSMS fails → the error is classified and logged, and a developer gets an alert
- Meanwhile, the escalation timer starts ticking
- After N minutes without acceptance → Termii sends a WhatsApp message
- After N more minutes → Termii makes a voice call
- After N more minutes → admins get notified via WhatsApp and voice
The system doesn't just fail over. It escalates through increasingly intrusive channels.
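The escalation timeline above can be modelled as a small ladder of steps. This Python sketch assumes the same interval N between every step and uses hypothetical names throughout; the caller would still need to track which steps have already been sent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int          # minutes since the order was placed
    target: str                 # "vendor" or "admin"
    channels: Tuple[str, ...]   # channels to use at this step


def build_ladder(n_minutes: int) -> List[EscalationStep]:
    """WhatsApp after N, a voice call after 2N, admin alerts after 3N."""
    return [
        EscalationStep(n_minutes, "vendor", ("whatsapp",)),
        EscalationStep(2 * n_minutes, "vendor", ("voice",)),
        EscalationStep(3 * n_minutes, "admin", ("whatsapp", "voice")),
    ]


def due_steps(ladder: List[EscalationStep], elapsed_minutes: int,
              accepted: bool) -> List[EscalationStep]:
    """Return every step whose delay has passed; nothing once the order is accepted."""
    if accepted:
        return []
    return [step for step in ladder if step.after_minutes <= elapsed_minutes]
```

For example, with an illustrative N of 5 minutes, an order still unaccepted after 10 minutes has reached both vendor steps but not yet the admin step.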
Lessons Learned
- Redundancy means independent failure modes. Two SMS providers using the same carrier or the same API pattern will fail together. Our providers have different infrastructure.
- Multi-channel is better than multi-provider. WhatsApp and Voice don't compete with SMS carriers. Adding channels gives you genuinely orthogonal delivery paths.
- Alert fatigue kills incident response. We spent as much time designing when not to alert as we did building the alerting itself. Classify errors by severity and keep the noisy ones quiet.
- Test the fallback. If you've never seen your secondary provider actually handle traffic, you don't have a fallback. We periodically route test traffic through Termii to validate the path.