How I Found the Project
I've been reading and writing Go code regularly for a while now, and my Go journey started with A Tour of Go. While building my http-server-ts, I watched a video where Primeagen builds an HTTP server in Go, which showed me how Go handles network programming. Once I felt confident with the language specifics, I found streamplace and an open issue I wanted to work on. I tackled it and got a PR merged. The details are below.
The Issue Description
When the firehose was unavailable, error-prone code would crash the labeler loop instead of recovering. After getting familiar with the codebase (without even setting it up, thanks to my C++ background), I commented on the issue saying I'd like to work on it. I pointed out that the problem was in the StartLabelerFirehose() function in labeler_firehose.go and that I planned to replace the crash with a backoff mechanism. The maintainer gave me nice feedback: my approach sounded reasonable, and it would be better to also add a metric to spmetrics.go. That conversation up front cut down the back-and-forth in my pull request, which is why I recommend talking to maintainers before tackling an issue.
Setup
Even though the code structure was pretty niche, setting up this project was time-consuming because of dependency handling. I spent way too much time on it, something I should have sorted out long ago if I wanted to contribute regularly. In particular, I ran into sqlite3 version issues while following the documentation.
The Solution
I fixed the issue by adding a streamplace_labeler_firehoses_connected metric in spmetrics.go to track labeler connections:
// LabelerFirehosesConnected is a gauge vector labeled by the labeler's DID.
var LabelerFirehosesConnected = promauto.NewGaugeVec(prometheus.GaugeOpts{
    Name: "streamplace_labeler_firehoses_connected",
    Help: "number of currently connected labeler firehoses",
}, []string{"labeler"})
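For context on what this exports: Prometheus creates one time series per label value, so a scrape of the metrics endpoint would show something like the line below (the DID here is a made-up example):

streamplace_labeler_firehoses_connected{labeler="did:plc:example"} 1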
Then I imported the spmetrics package in labeler_firehose.go and changed the log level at the retry threshold from Error to Warn:
log.Warn(ctx, "firehose failed 3 times within a minute, backing off", "err", err)
time.Sleep(5 * time.Second)
// start counting failures from scratch after the back-off
retryCount = 0
retryWindow = time.Now()
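To show where those lines live, here is a minimal sketch of a retry window wrapped around a connection loop. Apart from retryCount, retryWindow, the warning, and the sleep, everything is my own illustration; runWithBackoff and connect are hypothetical names, not streamplace's actual code:

// Sketch: back off after 3 failures within a minute instead of crashing.
func runWithBackoff(ctx context.Context, connect func(context.Context) error) {
    retryCount := 0
    retryWindow := time.Now()
    for {
        err := connect(ctx) // hypothetical: blocks until the firehose drops
        if err == nil {
            continue
        }
        if time.Since(retryWindow) > time.Minute {
            // last failure was over a minute ago; start a fresh window
            retryCount = 0
            retryWindow = time.Now()
        }
        retryCount++
        if retryCount >= 3 {
            log.Warn(ctx, "firehose failed 3 times within a minute, backing off", "err", err)
            time.Sleep(5 * time.Second)
            retryCount = 0
            retryWindow = time.Now()
        }
    }
}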
While fixing this bug, I made a mistake that I caught before even committing: I was unintentionally creating a race condition by wrapping cancel() in a new deferred function:
defer func() {
    spmetrics.LabelerFirehosesConnected.WithLabelValues(did).Set(0)
    cancel()
}()
This was problematic because deferred functions don't execute until the function returns, and they run in reverse order of declaration; since the original defer cancel() was still in place, cancel() would have been called twice, in a confusing order. When it comes to performance-critical places in the code, it's always better to be skeptical. For that reason, I didn't touch cancel(), and it stayed as a plain defer cancel().
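To make the ordering concrete, here is a small standalone example of Go's last-in, first-out defer execution:

package main

import "fmt"

func main() {
    defer fmt.Println("declared first, runs last")
    defer fmt.Println("declared second, runs first")
    fmt.Println("function body runs before any defer")
}

Running this prints the function body line first, then the two deferred lines in reverse order of declaration.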
Testing and a Suggestion from the Maintainer
I tested the change locally using Bluesky's official moderation labeler. I made sure that the metric reports 1 while the labeler firehose is connected. I also tested the retry path and saw that the code backs off for 5 seconds after repeated failures instead of crashing.
But my first version didn't account for more than one labeler firehose, because I had written:
spmetrics.LabelerFirehosesConnected.WithLabelValues(did).Set(1)
defer spmetrics.LabelerFirehosesConnected.WithLabelValues(did).Set(0)
Following the maintainer's suggestion, I switched from Set to Inc and Dec so the metric works with more than one connected labeler firehose; with Set, the first disconnect would zero the gauge even if other firehoses were still connected:
spmetrics.LabelerFirehosesConnected.WithLabelValues(did).Inc()
defer spmetrics.LabelerFirehosesConnected.WithLabelValues(did).Dec()
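A quick illustration of the difference, as a hypothetical sequence against the gauge above:

g := spmetrics.LabelerFirehosesConnected.WithLabelValues(did)
g.Inc() // first firehose connects:  gauge = 1
g.Inc() // second firehose connects: gauge = 2
g.Dec() // first one disconnects:    gauge = 1
// With Set(1)/Set(0), that disconnect would have forced the gauge
// to 0 even though one firehose is still connected.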
Eventually my pull request got merged.
Gained Knowledge
For the first time, I dealt with a well-structured Go codebase. I learned about race conditions and reminded myself how much small details matter. And, as I mentioned before, it pays to keep code changes precise and to stay in touch with maintainers. I plan to keep contributing to this codebase.