My first post on DEV! As I began this project, I wanted to find a place where I could gather feedback and connect with others, which brought me to this platform. I'd be happy if this project helps someone, and I look forward to all the learning opportunities that come from interacting with you all.
I previously worked on developing an SMS Delivery Platform, a project filled with common challenges in backend and distributed systems, including:
- High throughput
- Low latency
- Fault tolerance
- High concurrency
- Tenant fairness
- Strict rate limiting
- Just-in-time processing and more
Initially, I approached this project with questions such as, "How fast could I develop this using current AI Coding Agents?" and "At what scale would 'viable coding' (relying entirely on agents) become impossible?" However, I soon realized that these questions were less meaningful, as the answers depend heavily on the specifications and how much of the business requirements are reproduced.
As I continued developing, I began to reflect on my earlier decisions: "Why did I choose this specific design back then? What were the trade-offs?" and "What would I do differently if I could go back to that time?" Through this process, I discovered many "better ways" to tackle these challenges today.
The real interest lies in writing the code, considering the design and trade-offs, and addressing challenges step-by-step. To start, I created a minimal and naive Go implementation. This version does not utilize goroutines; it's a monolith running as a single process that meets only the bare minimum functional requirements. While it cannot satisfy the non-functional requirements of an early-stage product, I believe it provides a clear understanding of the system's functionality.
Moving forward, I plan to evolve this code by setting specific challenges for each iteration and sharing my reflections on the design decisions and trade-offs involved.
// Start begins the background polling loop and blocks until ctx is cancelled.
func (w *Worker) Start(ctx context.Context) {
	// Poll every 5 seconds to check for pending campaigns.
	// TODO: what if the worker crashes? what if we have many campaigns?
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			log.Println("Worker context cancelled, stopping...")
			return
		case <-ticker.C:
			w.processNextCampaign(ctx)
		}
	}
}
// processNextCampaign picks up one pending campaign and sends all of its
// messages sequentially before marking it as completed.
func (w *Worker) processNextCampaign(ctx context.Context) {
	// Find the next available campaign.
	// TODO: what if company A has 1000 campaigns and company B has 1 campaign? Company A will monopolize the worker.
	campaign, err := w.db.FindPendingCampaign(ctx)
	if err != nil {
		log.Printf("Error checking for pending campaigns: %v", err)
		return
	}
	if campaign == nil {
		// No pending campaign found.
		return
	}
	log.Printf("Found pending campaign %s, starting processing...", campaign.ID)

	// Fetch the CSV file using the storage port.
	csvData, err := w.storage.FetchCSV(ctx, campaign.DestinationsFilePath)
	if err != nil {
		// TODO: what if temporary storage is down? or what if the csv file was accidentally deleted?
		log.Printf("Failed to fetch CSV for campaign %s: %v", campaign.ID, err)
		return
	}

	// Parse the CSV.
	reader := csv.NewReader(bytes.NewReader(csvData))
	// TODO: what if the csv file is huge?
	records, err := reader.ReadAll()
	if err != nil {
		log.Printf("Failed to parse CSV for campaign %s: %v", campaign.ID, err)
		return
	}

	// Loop through records, create SMS messages, and send to the Carrier API.
	targetMessage := campaign.TemplateBody
	for i, record := range records {
		if len(record) == 0 {
			continue
		}
		// Assuming the first column is the phone number.
		phoneNumber := record[0]
		if i == 0 && phoneNumber == "phone_number" {
			// Skip the header row if it exists.
			continue
		}

		// To keep this code readable, template variables in the body are not replaced for now.
		// TODO: yes, the Carrier API has rate limits. We need to respect that.
		err = w.carrier.SendSMS(ctx, phoneNumber, targetMessage)
		dispatch := &domain.SmsDispatch{
			SMSMessageID: uuid.New(),
			CampaignID:   campaign.ID,
			TenantID:     campaign.TenantID,
			PhoneNumber:  phoneNumber,
			DispatchedAt: time.Now(),
			IsSuccessful: err == nil,
		}
		if err != nil {
			log.Printf("Failed to send SMS to %s: %v", phoneNumber, err)
		} else {
			log.Printf("Successfully sent SMS to %s", phoneNumber)
		}

		// Record the result, whether the send succeeded or failed.
		// TODO: what if this worker crashes after sending some SMS of a campaign, but still has pending SMS to send?
		if saveErr := w.db.SaveSmsDispatch(ctx, dispatch); saveErr != nil {
			log.Printf("Failed to save SMS dispatch for %s: %v", phoneNumber, saveErr)
		}
	}

	// Mark the campaign as completed in the DB.
	err = w.db.MarkCampaignCompleted(ctx, campaign.ID)
	if err != nil {
		log.Printf("Failed to mark campaign %s as completed: %v", campaign.ID, err)
	} else {
		log.Printf("Campaign %s processing completed.", campaign.ID)
	}
}
Overview
This repository demonstrates the evolutionary system design of an event-driven SMS delivery pipeline.
This project is inspired by a real-world SMS delivery system I previously worked on. I chose this specific domain because it serves as an excellent crucible for tackling the core challenges of modern backend and distributed systems. It naturally demands robust solutions for:
- high throughput
- low latency
- fault tolerance
- high concurrency
- tenant fairness
- strict rate limiting
- just-in-time processing
- and more
Instead of starting with a complex distributed architecture, this project begins as a minimal Go monolith. It progressively evolves into a decoupled system to address specific backend challenges as the hypothetical scale grows.
🎯 Core Service Specifications
This platform enables client businesses to reliably manage and deliver SMS communications for various use cases, such as marketing campaigns and appointment reminders. The actual physical delivery to end-user devices is handled by mobile carriers (e.g., Verizon, T-Mobile…
