Beyond CRUD: Building a GitHub Activity Tracker to Level Up Backend Engineering

#programming #go #architecture #learning

I am tired of CRUD apps. Spinning up a database for basic operations, writing the same form validation logic, picking another frontend framework and hoping I don't end up in an npm supply chain attack: I do enough of this at work already. As an engineer I wanted something that forced me to think about more than just MVC. I wanted to think in systems. I wanted to build something with concurrency, streaming, caching, and async message queues, all things I had touched before but wanted to actually understand. So I built a GitHub activity tracker. It fetches commits from a list of repos, generates a markdown report, caches it in Redis, and emails it to me over SMTP via an async message queue. It seems simple, but it covers a lot of the concepts needed for a scalable backend, and it keeps me honest about my GitHub activity. Check it out on GitHub.

Why I chose Go

Coming from a web dev world where TS and PHP dominate, I wanted to use something different. I felt like I needed a language that took performance and concurrency more seriously. Go's goroutines and channels aren't syntactic sugar; they're the core abstraction. For a project where I stream files in chunks and fetch data over HTTP concurrently, it seemed like the obvious choice.

The streaming parser and the channel footgun

Remember how I just said Go was the right tool for the job? Well, it was, but like every tool it takes a certain amount of expertise. I learnt this quickly after building my first parser, which reads 8-byte chunks, splits them on newlines, and writes completed lines into a channel:

func ParseFileByLine(f *os.File) chan string {
    ch := make(chan string)
    go func() {
        defer close(ch)
        // ... read chunks, split lines, send to channel
    }()
    return ch
}

The consumer ranges over the channel reading data asynchronously. I used this to parse a list of repos and fetch data for them in one concurrent swoop. Clean and simple, right?
Except I initially passed the channel through all my different layers. The channel was created in the handler, since that is where the result is consumed, then passed into the command handler in domain, which called the parser in infrastructure, which wrote into the channel. The channel was then closed in the command handler via defer. That implementation probably makes a lot of Go experts cringe, and they would be right to. Because the lifecycle was shared across layers, the channel was already closed before the goroutine had even started, and the whole thing ended in a deadlock.
At first I fixed this by spawning multiple goroutines, but it didn't feel right. Then it clicked: returning from a function doesn't kill the goroutines it started. The function that spawns the goroutine can therefore own the channel's whole lifecycle: create it, hand it to the caller, and close it when the goroutine exits. With that realisation I cleaned up the code and made the infrastructure parser solely responsible for the channel. Every other component is merely a consumer.

Next steps: completing the functionality

After that I felt confident with channels and how Go was meant to be used. Filling out the rest was straightforward. Redis and RabbitMQ have good docs. I put the command/query handlers in domain and used infrastructure for I/O, queues, SMTP logic and parsing. I added a /repo package for fetching and setting data, and a message handler to consume the queue. The message handler uses a domain handler to fetch the cached markdown via the repo and send it to me over SMTP.
The message handler taught me another channel lesson.
I wrote:

msg, _ := ch.Consume(...)
command.SendReport(msg) // passing the channel, not a message

msg is a <-chan amqp.Delivery, not a single message. I needed to range over it:

for m := range msg {
    command.SendReport(m.Body)
}

I also had to run the consumer in a goroutine from main and block with select {} so the program stays alive. Another channel footgun, another lesson.
I also grew to like Go's pattern of returning a pointer to a struct and attaching methods that operate on it directly. My RabbitMQ setup started with two separate structs, a Publisher and a Consumer, each declaring its own queue and each calling the same connection helper. That meant three queue parameters (name, durable, quorum) duplicated across two files. Change one, remember the other. Annoying to maintain and easy to drift out of sync.
The fix was one struct that owns the queue definition:

type WorkQueue struct {
    ch    *amqp091.Channel
    queue amqp091.Queue
}

func NewPublisher(conn *amqp091.Connection) *WorkQueue {
    ch, _ := conn.Channel() // NOTE: the error is discarded here; check it in production code
    q := createQueue(ch) // declared once, used everywhere
    return &WorkQueue{ch: ch, queue: q}
}

func (r *WorkQueue) Publish(ctx context.Context, s string) error
func (r *WorkQueue) Consume() <-chan amqp091.Delivery

Producer calls Publish, consumer calls Consume. Same queue, same config, same struct. The connection is created in main.go, the struct is initialized with one function, then passed down without ever exposing the queue or channel directly. The domain interface EventPublisher hides the concrete type. If I switch to SQS tomorrow, I write a new implementation in infrastructure and change the wiring in main.go; nothing in domain changes. No need to pass config around on every call. The pointer receiver keeps the struct's state accessible without copying. It feels clean.

The big refactoring

Once I was happy with the functionality I decided to clean up everything. I used interfaces to invert all dependencies from domain to infra and repo so I could write unit tests and refactor more easily. The main.go then simply wires everything together and takes over dependency injection. The final version follows this principle, inspired by Designing Data-Intensive Applications (DDIA), various articles, and my own trial and error with confusing architectures. DDIA frames good systems around reliability, scalability, and maintainability. Even at this scale those properties shaped the design: if the fetcher crashes mid-run, cached reports and queued messages survive independently. The fetcher and consumer are decoupled, so either can scale on its own. And because domain interfaces define the contracts, swapping Redis for Memcached or RabbitMQ for SQS means adding a new implementation in infrastructure and changing the wiring in main.go, while domain stays untouched. Whether it holds at real scale is a different question, but I wanted to make it a habit to think in those terms even on a side project.

This is what the final architecture looks like.

Handler orchestrates: calls the relevant command or query handlers from domain
Domain holds pure business logic and the interfaces: FileParser, CacheRepo etc.
Infrastructure has concrete implementations: file streaming, HTTP client, Redis, RabbitMQ, SMTP

// domain
type EventPublisher interface {
    Publish(ctx context.Context, body *formatter.QueueBody) error
}

// infrastructure
type WorkQueue struct { ... }
func (r *WorkQueue) Publish(...) error { ... }

// main.go
publisher := rabbitmq.NewPublisher(conn)
rclient := redis.NewCache(redisClient)
fetchHandler := handler.NewFetchHandler(publisher, rclient) // named so it does not shadow the handler package

Docker

Dockerising the app reinforced the architecture decisions made earlier. Because main.go owns all dependency wiring, the container just needs to build the binary and pass an argument. I added a consumer flag: if main.go receives it, it starts the message consumer and blocks with select {}, running forever and processing every delivery. Without it, it runs the fetcher once and exits, which is exactly what a cron job needs. Redis and RabbitMQ run as their own containers. The app image handles both roles depending on the argument. A systemd timer on a Raspberry Pi triggers the fetcher. The consumer runs continuously alongside it. No orchestration overhead, no cloud bill.
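
A small sketch of how one binary can serve both roles. The flag name -consumer, the parseMode helper, and the flag set name are assumptions for illustration; the real consumer and fetcher calls are only hinted at in comments.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseMode reports whether the process should run as the long-lived
// consumer. The flag name "consumer" is an assumption for this sketch.
func parseMode(args []string) (consumer bool, err error) {
	fs := flag.NewFlagSet("tracker", flag.ContinueOnError)
	fs.BoolVar(&consumer, "consumer", false, "run the queue consumer instead of a single fetch")
	err = fs.Parse(args)
	return consumer, err
}

func main() {
	consumer, err := parseMode(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag package has already printed the problem
	}
	if consumer {
		// Real code would start the consumer goroutine here and then
		// block with select {} so the process never exits.
		fmt.Println("starting consumer")
		return
	}
	// Real code would fetch, cache, and publish once, then return,
	// which is what a timer-triggered run needs.
	fmt.Println("running fetcher once")
}
```

Keeping the decision in one small function means the same image can be started with or without the flag, and the branch itself stays trivially testable.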

Testing

Testing was also easier than expected. At first I found it strange that Go ships without a mocking framework. Coming from PHPUnit, where you stub pretty much every dependency and then assert the call parameters and how often each one was triggered, writing tests where only the input and output matter and no mocks were involved felt strange but refreshing. Because I had split the logic-heavy funcs from the orchestration-heavy funcs, I could easily write unit tests decoupled from domain knowledge. I also came to appreciate Go's use of interfaces. My file parser took *os.File. I wanted to test the error branch, but I couldn't manipulate *os.File since it is tied to the OS. The fix was changing the signature to io.ReadCloser. Same logic, same callers (*os.File already satisfies io.ReadCloser), but now I can inject a custom errorReader for the failure case. No mocks needed, just a smaller interface.
Normally I would also write integration tests for all handlers and logic that interact with data persistence, but since this is a fun side project I think I might skip this step and move on to the next.

What's next

Next, I want a GitHub Actions pipeline that builds and tests on every push. And maybe gRPC. But that's for future me.
What I learned: Channels are potential footguns. Ownership rules matter. Testability is architecture, not tooling. And if you can't test it without an exorbitant amount of mocks, the code is too coupled.
The code is on GitHub. It's not perfect, but it's honest. And it's mine. I would love some feedback and am open to further discussion :)