Varun Palaniappan

Posted on Mar 23

Caching to improve performance: Are you optimizing a bit too soon?

In this podcast episode, Krish from Snowpal discusses the crucial topic of caching and its implications for software performance. He emphasizes the importance of regular platform usage and updates to stay abreast of new features and changes. Krish delves into the intricacies of performance issues, highlighting the need to address them at various stages of development, from production to prototyping. He distinguishes between quick fixes and long-term solutions, advocating for a thorough examination of underlying design or architectural flaws. Krish urges developers to question the necessity of queries and APIs before resorting to technical fixes, emphasizing the importance of considering alternative approaches to problem-solving. Overall, the episode underscores the significance of proactive measures in optimizing software performance and user experience.

Summary

Understanding Performance Issues:

Krish explores the reasons for discussing caching, emphasizing the importance of addressing performance issues at various stages of development, including production, pre-production, and prototyping.

Hot Fixes vs. Long-Term Solutions:

Krish discusses the tendency to prioritize quick fixes for production issues and the importance of identifying underlying design or architectural issues for long-term solutions.

Identifying Fundamental Problems:

Krish urges developers to question the necessity of queries and APIs before attempting technical fixes, emphasizing the importance of considering design and architecture implications.

Exploring Alternative Solutions:

Krish suggests exploring alternative approaches to problem-solving, such as redesigning pages or reevaluating the necessity of certain API calls, before resorting to technical fixes.

Podcast

Check it out on Spotify.

Transcript

0:00

Hello. Hey everyone. This is Krish and I hope you're doing well. Welcome to Snowpal's podcast. In the previous podcast, I talked a little bit about caching. I want to continue that discussion, or that monologue, I should say, a tiny bit more in Part 2 of this series.

0:18

But before I do that, let me ask you the customary question. When was the last time you checked out snowpal.com? You want to make sure you use the product, the platform, regularly because we have changes. We deploy several times a week and we have more features show up all the time.

0:38

Look for the new features icon on the sidebar; that'll call out the newest implementations and deployments. Thank you. Now, without further ado, let's get into the podcast. In the previous podcast, I talked about caching at a reasonably high level: different aspects of caching, what you need to cache, how you could possibly go about it. Let me pick this monologue up from where I left off there, but also digress, this time intentionally so, and talk about why we got into the discussion of caching. A lot of times when there's a performance issue, you're trying to find different solutions, right?

1:16

Your platform or the service could be in production, or you could be close to going into production, or you could be in the early stages of building your prototypes. Whatever it is, the challenges are going to be quite different depending on where you're at, because the app that is in production has more people using it, it's going to create more bottlenecks, and it's going to test your system a whole lot more.

1:40

But that doesn't mean you only notice these in production, because a lot of times performance and scalability issues don't necessarily stem from the fact that there's a large volume of users or more folks are using it at any given point of time.

1:57

Sometimes, even if nobody else is using the product, or even on your local machine or dev sandbox, you might be making a request and that could take a good 2-3 seconds to return. So that tells you that there's a fundamental problem. It's not so much a scalability or throughput issue, which actually is a better problem to have because hardware is cheaper, right?
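To make that check concrete, here is a minimal sketch of timing one request in isolation on a dev machine, with no other users involved. The handler `slow_page` and the `BUDGET_SECONDS` value are hypothetical stand-ins for illustration, not anything from Snowpal's codebase.

```python
import time
from statistics import median


def time_call(fn, runs=5):
    """Return the median wall-clock time, in seconds, of calling fn() `runs` times."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def slow_page():
    # Hypothetical stand-in for a request handler; the sleep simulates work.
    time.sleep(0.05)


BUDGET_SECONDS = 0.5  # assumed latency budget, chosen only for this example

elapsed = time_call(slow_page)
if elapsed > BUDGET_SECONDS:
    verdict = "over budget with a single user"
else:
    verdict = "within budget"
print(f"median latency: {elapsed:.3f}s ({verdict})")
```

If a single, uncontended call already blows the budget, adding servers is unlikely to help, which is the distinction being drawn here between a fundamental problem and a throughput problem.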

2:19

Human resources are expensive. So if you can solve a performance issue by throwing more hardware at it, I mean, everyone's going to jump on it because servers are cheap. A whole lot cheaper than humans, obviously, right? But you know, you can't always horizontally scale by throwing in a bunch of servers and adding more nodes to your cluster, because you might have fundamental architecture or design issues in your platform or the product.

2:43

Now identifying those can get pretty challenging. Say a page took 5 seconds to render, which is obviously a ridiculous amount of time. If you want to see a 100% improvement and bring it down to 2 1/2 seconds, that's one kind of a problem.

3:04

But then once you get to 2 1/2 and you want to see another 100% improvement and go down to 1.25, it's not usually twice as challenging. It's exponential. And now as you go below the one-second mark and below the half-second mark and have your pages or screens or APIs return in hundreds of milliseconds, it's going to get even more challenging, right?
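The arithmetic behind those numbers can be sketched as follows. The helper names are made up for this example; it only shows that "100% improvement" here means twice as fast (a 50% cut in latency), and that each successive doubling saves fewer absolute seconds.

```python
def speedup_percent(before_s, after_s):
    """How much faster the page is, as a percentage (twice as fast = 100%)."""
    return (before_s / after_s - 1) * 100


def latency_reduction_percent(before_s, after_s):
    """How much of the original latency was removed, as a percentage."""
    return (before_s - after_s) / before_s * 100


steps = [(5.0, 2.5), (2.5, 1.25)]  # the two steps mentioned in the episode
for before, after in steps:
    print(
        f"{before}s -> {after}s: "
        f"{speedup_percent(before, after):.0f}% faster, "
        f"{latency_reduction_percent(before, after):.0f}% less latency, "
        f"{before - after:.2f}s saved"
    )
```

Both steps count as the same relative improvement, but the second one has only half as many seconds left to remove, which is one way to see why each further step tends to be harder.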

3:28

Because there's only so much you can possibly do given an existing architecture. There might be scenarios where you're like, hey, you know what, maybe this whole approach or design needs to be questioned, right? But those are dramatic questions, and you don't want to be asking them prematurely or ignorantly, right?

3:49

So we all want to do our due diligence. Having given all those precursors, one of the things I notice quite a bit is that when there is a problem, especially if you have a product or an application in production, the first thing you want to do is fix it as quickly as possible. That's understandable.

4:05

You have users, they're using it. You don't want them to have a diminished user experience. So you're going to do everything in your capacity to get that out the door as a hot fix as soon as possible. But a hot fix is only temporary, right? You want to get things moving along, and then you want to go back and look at, OK, what caused this problem to begin with and why was this not caught before it went to production?

4:23

Now again, those are completely different topics. They have nothing to do with caching. But a lot of times I think about things that are related but not exactly super related to the particular podcast and I find myself in that situation and hopefully I'll improve and not get into that. But back to what I was trying to say.

4:41

You fix this in production. Now you come back to your dev machine and you're trying to understand the source of the problem to make sure it doesn't surface again, or you want to reduce the frequency of something like this happening, and you figure that it's not a hardware issue. You've done all of that research and you've cleared those doubts in your mind.

5:00

So now you know that it's your software; it's the code that you and your team created. So you have all the control and the power, to a large extent, to make those changes. Now let's say you've made a query fix or an API change or some other change to get it going in production, and you come back and you look at the overall design.

5:18

And before you look at it, a lot of times what I notice people do is go back and identify a query that's taking more time, or an API that's taking longer, or a UI screen or a page where, for whatever reason, the DOM parsing or the JavaScript on the client side took a long time.

5:36

Those are all still patches, right? Monkey patches, if nothing else. You want to identify the true source of the problem. And what I mean by that is you want to ask this question, which I think doesn't get asked often enough.

5:51

Before I tune a query, before I fix an API, before I do any of those things, ask yourself: should this query even be running? The fact that the query takes too long and should return in quick time?

6:07

Sure, that's a no-brainer. But many times there is a more fundamental problem that, you know, just doesn't get identified. Should this API even be called? Was this called more than once?

6:23

And even if it was not called more than once in the context of this HTTP request, why was it called? Does it need to be called at all? Can I solve this problem differently? Right. So you have to wear sort of a product management hat, not so much a developer's hat, at that instant of time.
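One way to start answering "was this called more than once?" is simply to count calls while a single request is handled. The sketch below is a hypothetical illustration: `fetch_user`, `fetch_unused_recommendations`, and `render_page` are invented names, and a real application would more likely hook into its own logging or tracing instead.

```python
from collections import Counter
from functools import wraps

call_counts = Counter()  # number of calls per function during one request


def counted(fn):
    """Decorator that records how many times fn is called."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@counted
def fetch_user(user_id):
    # Hypothetical API call; returns canned data for the example.
    return {"id": user_id, "name": "example"}


@counted
def fetch_unused_recommendations(user_id):
    # Hypothetical API call whose result the page never uses.
    return []


def render_page(user_id):
    header = fetch_user(user_id)
    sidebar = fetch_user(user_id)          # same call made a second time
    fetch_unused_recommendations(user_id)  # called, but the result is discarded
    return f"{header['name']} / {sidebar['name']}"


call_counts.clear()
render_page(42)
duplicates = {name: n for name, n in call_counts.items() if n > 1}
print("calls per request:", dict(call_counts))
print("called more than once:", duplicates)
```

The counter only surfaces the duplicate call. Noticing that `fetch_unused_recommendations` need not be called at all still takes the product-level question raised above: does this page need that data?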

6:42

And sometimes it's challenging because you want to fix the problem and you're super excited about fixing it. So you try to find a technical solution, but before you go too deep into the technical solution, especially when it's not a production hot fix, you definitely want to check to see why.

6:59

You know, why this was not caught before it went to production is one aspect of it, right? You're saying, hey, can we improve our testing and test suites? But that's not what I'm trying to say. What I'm trying to say, not very succinctly unfortunately, is: did I miss something?

7:15

Did we miss something about the design that let this problem surface in production? If it's not necessarily volume-related, or even if it is, it is likely that we should be addressing the problem differently. Before I fix the query (one more time, sorry, I'm repeating this) and before I fix the API, let me see if these calls are even being made as they're supposed to.

7:40

Maybe this page should be constructed differently, right? Maybe it should be assembled differently. Maybe these APIs have to, you know, process the whole... you know? Maybe the request and response here should be different from how they actually are.
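As one possible reading of "assembled differently", here is a small sketch comparing a page built from one backend call per item with the same page built from a single batched call. Everything in it (`ITEMS`, `get_item`, `get_items`, the page functions) is hypothetical and only illustrates how a different request and response shape changes the number of calls; it is not a description of how Snowpal's pages work.

```python
# Hypothetical in-memory "backend" used only for this illustration.
ITEMS = {1: "alpha", 2: "beta", 3: "gamma"}
backend_calls = 0


def get_item(item_id):
    """One backend call per item."""
    global backend_calls
    backend_calls += 1
    return ITEMS[item_id]


def get_items(item_ids):
    """One backend call that returns every requested item."""
    global backend_calls
    backend_calls += 1
    return {item_id: ITEMS[item_id] for item_id in item_ids}


def page_per_item(item_ids):
    # Assembles the page with a separate call for each item.
    return [get_item(item_id) for item_id in item_ids]


def page_batched(item_ids):
    # Assembles the same page from a single batched response.
    items = get_items(item_ids)
    return [items[item_id] for item_id in item_ids]


backend_calls = 0
per_item_page = page_per_item([1, 2, 3])
per_item_calls = backend_calls

backend_calls = 0
batched_page = page_batched([1, 2, 3])
batched_calls = backend_calls

print(f"per-item calls: {per_item_calls}, batched calls: {batched_calls}")
```

Both versions render the same content; only the shape of the request and response differs, which is the kind of design question to ask before tuning any individual call.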