How I Ensure My Application Scales

#softwareengineering #scalability

During a job interview, I was explaining my day-to-day responsibilities and how I ensure quality on my projects, and I mentioned: “I check that my application scales.” The interviewer asked: “How do you make sure your application scales?” I froze. I didn’t have a structured answer because I was not prepared for that question. This post is a retrospective on that experience and outlines a framework for thinking about scalability when working on new features.

Why is it difficult to answer this question?

Scalability has many dimensions: traffic, throughput, data size, latency, and more. One critical flow might have only 10 concurrent users who work with terabytes of data, while another flow handles thousands to millions of requests per second. Each use case has to optimize for different things. With that in mind, let’s explore good practices that help us build scalable systems.

1. Define the problem and the scope.

Before talking about QPS, latency, costs, etc., we need to fully understand the problem scope. This tells us not only what is important to implement, it also guides how we design the solution. There is rarely a single right answer, so this is the step where we decide what matters most and set objectives accordingly.

Defining what success looks like largely means defining SLOs, SLAs, and KPIs. This provides clarity on what to optimize for.
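
To make an SLO concrete, it helps to translate it into an error budget. The sketch below is only an illustration: the 99.9% target and the 30-day window are assumed numbers, and error_budget_minutes is a hypothetical helper, not part of any library.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for an availability SLO.

    slo_target is a fraction, for example 0.999 for "three nines".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


# Assumed example: a 99.9% availability SLO over a 30-day window
# leaves roughly 43 minutes of downtime to spend.
budget = error_budget_minutes(0.999)
print(f"Error budget: {budget:.1f} minutes")
```

A number like this turns "the service should be reliable" into something we can measure and trade off against delivery speed.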

2. Identify bottlenecks.

Once we understand what matters most, we start making estimations. This helps us understand the impact of our new feature and we can start verifying our systems can handle it.

Example scenarios:

Will the downstream service be able to absorb an additional 10,000 QPS?
A new Spark job will create thousands of records per second. Can the datastore sustain the expected throughput? As data size grows, does the cost of fetching a record grow with it?
My feature will use an LLM. How can I optimize token usage to maximize ROI?
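
The first scenario can be turned into a quick back-of-the-envelope check. This is a minimal sketch with made-up numbers: the current peak, the measured capacity, and the 70% utilization ceiling are all assumptions you would replace with data from your own load tests and dashboards.

```python
def projected_utilization(current_peak_qps: float,
                          added_qps: float,
                          max_sustainable_qps: float) -> float:
    """Fraction of the downstream capacity used after the new feature ships."""
    return (current_peak_qps + added_qps) / max_sustainable_qps


def fits_within_headroom(current_peak_qps: float,
                         added_qps: float,
                         max_sustainable_qps: float,
                         target_utilization: float = 0.7) -> bool:
    """True if projected load stays under the utilization ceiling we chose."""
    utilization = projected_utilization(
        current_peak_qps, added_qps, max_sustainable_qps
    )
    return utilization <= target_utilization


# Hypothetical numbers: 25k QPS at peak today, 10k QPS added by the feature,
# and a downstream service load-tested up to 40k QPS.
utilization = projected_utilization(25_000, 10_000, 40_000)
ok = fits_within_headroom(25_000, 10_000, 40_000)
print(f"Projected utilization: {utilization:.1%}, within headroom: {ok}")
```

If the check fails, that is the bottleneck to discuss with the owning team before writing any feature code.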

3. Beware of premature optimization.

I know sometimes we are excited about the next unicorn idea and believe in the great potential of the things we are building, and that optimism is fine, but when building things I highly suggest that you optimize for yourself or a small number of users, test your idea and get data.

This will help us validate assumptions, understand growth patterns, and invest in scalability only when the data justifies it.

4. Analyze complexity.

When talking about Big O notation, it is hard not to think about LeetCode or Software Engineering interviews, but one of the reasons it is important to know Big O notation is scalability.

Let me explain with an example:
Imagine that you have a SQL database with a table called appointments. The table has a primary key, start and end datetimes, and other relevant information for each appointment. You would like to fetch all the appointments for next week. What would the time complexity look like?

The appointments table doesn’t have an index on the start datetime: This search requires a full-table scan, so the time complexity is O(N), where N is the number of rows in the appointments table. At the beginning this might not be an issue, but as the data grows, every appointment has to be scanned to evaluate the filter condition, and I/O and memory usage grow with it.
The table has a B+Tree index on the start datetime: This reduces the time complexity to O(log N + K), where N is the number of rows in the table and K is the number of rows returned. This is usually acceptable performance and scales much better than having no index.
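
The two cases above can be observed directly by asking the database for its query plan. The sketch below uses an in-memory SQLite database purely as a convenient stand-in; the column names start_at and end_at and the index name are assumptions for illustration, and other engines expose the same idea through their own EXPLAIN output.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return the human-readable plan SQLite would use for the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan detail text.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appointments ("
    " id INTEGER PRIMARY KEY,"
    " start_at TEXT NOT NULL,"
    " end_at TEXT NOT NULL)"
)

# Range filter for "next week" (the dates are arbitrary example values).
next_week_sql = (
    "SELECT id, start_at, end_at FROM appointments "
    "WHERE start_at >= ? AND start_at < ?"
)
params = ("2030-01-07T00:00:00", "2030-01-14T00:00:00")

# Case 1: no index on start_at, so the planner has to scan the whole table.
plan_without_index = query_plan(conn, next_week_sql, params)

# Case 2: with an index on start_at, the planner can seek to the range start.
conn.execute("CREATE INDEX idx_appointments_start_at ON appointments (start_at)")
plan_with_index = query_plan(conn, next_week_sql, params)

print("Without index:", plan_without_index)
print("With index:   ", plan_with_index)
```

Checking the plan before shipping a new query is a cheap way to catch an accidental O(N) scan while the table is still small.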

5. Think about trade-offs.

Consider an event-driven architecture: Using events can help us optimize user-facing latency by moving expensive work out of the synchronous path (for example, a request to an LLM), but it comes with added complexity: higher end-to-end latency, network issues, consumer lag, dropped events, etc. So I would think critically about when it is the right time to invest in an event-driven architecture and accept all the trade-offs that come with it, making sure the benefits justify the extra cost of maintaining the platform. Remember point #3 (Beware of premature optimization).
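
The idea of moving expensive work out of the synchronous path can be sketched in a single process with a bounded queue and a worker thread. This is a toy model under stated assumptions: a real system would use a message broker, and slow_enrichment is a hypothetical stand-in for something like an LLM call.

```python
import queue
import threading

# A bounded queue gives us backpressure instead of unbounded memory growth.
events = queue.Queue(maxsize=1000)
results = {}


def slow_enrichment(text: str) -> str:
    # Hypothetical placeholder for expensive work (for example, an LLM request).
    return text.upper()


def handle_request(request_id: str, text: str) -> dict:
    """Synchronous path: enqueue the work and answer immediately."""
    try:
        events.put_nowait({"id": request_id, "text": text})
    except queue.Full:
        # One of the trade-offs: we must decide what to do when consumers lag.
        return {"id": request_id, "status": "rejected"}
    return {"id": request_id, "status": "accepted"}


def worker() -> None:
    """Asynchronous path: process events until a None sentinel arrives."""
    while True:
        event = events.get()
        if event is None:
            events.task_done()
            break
        results[event["id"]] = slow_enrichment(event["text"])
        events.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_request("req-1", "summarize this")
events.join()      # wait until the queued event has been processed
events.put(None)   # ask the worker to stop
thread.join()

print(response, results)
```

Even in this small sketch the trade-offs show up: the caller gets "accepted" rather than a result, and we need explicit handling for a full queue.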

6. Measure, Validate, and Iterate.

We understood the problem, defined what is most important, and implemented our solution, but this is not “fire and forget”. We need to set up metrics, alerts, and dashboards. These help us monitor and validate whether we are meeting our SLOs through incremental rollouts of the new feature, so we can compare and act when necessary.

Once everything is set up, scalability becomes an ongoing process of measuring, learning, and adapting. Production data puts us in a better position to do capacity planning and to understand organic growth, costs, and ROI, because we now have a real picture of the service.
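
As a rough illustration of validating an SLO during an incremental rollout, the sketch below checks a latency percentile against a threshold. The sample latencies, the 300 ms threshold, and the helper names are all assumptions; in practice your metrics system would compute the percentiles for you.

```python
import math


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile; pct is an integer in the range 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(samples_ms: list, pct: int, threshold_ms: float) -> bool:
    """True if the chosen percentile is at or under the SLO threshold."""
    return percentile(samples_ms, pct) <= threshold_ms


# Hypothetical latencies (in ms) collected from a small canary rollout.
canary_latencies_ms = [120, 95, 110, 400, 105, 98, 130, 101, 99, 115]

p50 = percentile(canary_latencies_ms, 50)
p99 = percentile(canary_latencies_ms, 99)
print(f"p50={p50} ms, p99={p99} ms")
print("p99 within 300 ms:", meets_latency_slo(canary_latencies_ms, 99, 300))
```

Here the median looks healthy while the tail does not, which is why SLOs are usually defined on high percentiles rather than on averages.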

Conclusion

Scaling requires critical thinking about what we are building, understanding the dimensions involved, and evaluating the ROI of the proposed architecture. Remember that there is not always a single right answer when designing the architecture of our projects, but we need clarity about what we are building and confidence that the benefits outweigh the operational and engineering costs.

Scalability is not a feature you add at the end. It is a continuous process of understanding constraints, making trade-offs, and validating assumptions.