Sometimes, best practices, scalability and technical knowledge can backfire!
Recently I started a project with a friend of mine about finding objects within a video. After a first draft of whiteboarding the solution we got stuck on a problem that could easily have doomed our plan. Here, I'd like to share what we learned from that.
As someone with experience on handling large files in tiny servers, I knew some technical features were indispensable if we planned to get even a small amount of concurrent requests. So we went to excalidraw and started drawing some arrows and boxes. We would have a golang public api that would break up videos and store then on GCP's storage while creating a message to pub/sub or other messaging queue. Then a python service would then receive those messages, download the video fragments, break then into images and process then with a open source image recognition model. The response would be stored onto a sql database and another message would be sent, telling the golang server to send a webhook.
The final drawing was pretty reasonable! I'm sure that it would lead to a good implementation that could be able to handle a lot of traffic, and be open to be extended with additional features (that always come up). But... how much time would take to build it? How much money would we have to expend to deploy a minimal version? This kind of approach may sound reasonable, but it certainly would doom the project to the graveyard!
We got back to the drawing board... what did we really needed? What was the project about? Our goal was to get objects from a video file, how about we build that? We would handle a small file and grow from there.
Since we needed a python service to use the open source model we went with a monolith with python's FastApi. And since we still needed to respond asynchronously we built a on memory job processing queue. Instead of using google storage we got a dict to hold the images until they were processed. Quick and dirty, the result would be in a txt file!
The project wouldn't have much resilience or fault-tolerance, little to no observability, and, most importantly, it wouldn't handle big files or a lot of concurrent requests. But those features weren't part of the project's goal, the MVP.
When we talk about delivering value quickly and working within iterations, normally (and sadly), we end up building the first, more complex, project. Replacing building a MVP and iterating from there with building a component of the big design per sprint.
When planning out my next solutions, I will always check if my architecture is trying to handle more problems then we actually have. In this release, do we really need to scale? Can we forego retries for the first iteration? Can we replace our redis / rabbitmq / sqs with an in-memory approach for now? How can we test our hypothesis with client feedback more quickly?
As always, simplicity is the art of maximizing the work not done.
Top comments (0)