It was in a midsize application, about 60,000 lines of server code, where I was implementing the API endpoints and database logic. For new features, I initially processed one entity at a time, such as a user, a comment, or a file. Whenever the UI showed a list of something, users could often select multiple items and run an action on all of them together. Instead of calling the provided API endpoint multiple times, I was asked to implement a version that would accept many items at once.
There are basically two different approaches to doing this, which back in 2016 were not as obvious to me, because the backend code used Node-style callbacks. In the first approach, the server side accepts many items and runs the original single-item logic on each of them, today simply using 'Promise.all()'. This is roughly how GraphQL does it in a resolver.
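A minimal sketch of this first approach, assuming a per-item function 'loadUserById' (all names and the in-memory "database" here are invented for illustration):

```javascript
// First approach: accept many items, but reuse the single-item logic.
// Each call to loadUserById still counts as its own (simulated) SQL round trip.

const fakeDb = new Map([
  [1, { id: 1, name: 'Ada' }],
  [2, { id: 2, name: 'Grace' }],
]);

let queryCount = 0;

async function loadUserById(id) {
  queryCount += 1; // each call would be one SQL statement
  return fakeDb.get(id) ?? null;
}

// The "many items" endpoint simply fans out to the original logic.
async function loadUsers(ids) {
  return Promise.all(ids.map((id) => loadUserById(id)));
}

loadUsers([1, 2]).then((users) => {
  console.log(users.map((u) => u.name)); // [ 'Ada', 'Grace' ]
  console.log(queryCount); // 2 -- one query per item
});
```

The calling code stays trivial, but the number of queries grows linearly with the number of items.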
However, this is very inefficient for server performance, as it executes a lot of very small SQL statements. So I implemented a version of that function that would really take many items and run as few database queries as needed.
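The batched variant might look like this sketch, where 'runSql' stands in for a real driver call such as a 'WHERE id IN (...)' query (again, all names are illustrative):

```javascript
// Batched variant: one query for any number of ids.
// runSql stands in for something like
//   db.query('SELECT * FROM users WHERE id IN (?)', [ids])

const rows = [
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Grace' },
];

let queryCount = 0;

async function runSql(ids) {
  queryCount += 1; // a single SQL statement, however many ids it carries
  return rows.filter((r) => ids.includes(r.id));
}

async function loadUsersBatched(ids) {
  const found = await runSql([...new Set(ids)]);
  // Re-index the result so every requested id maps back to its row (or null).
  const byId = new Map(found.map((r) => [r.id, r]));
  return ids.map((id) => byId.get(id) ?? null);
}

loadUsersBatched([1, 2, 2]).then(() => {
  console.log(queryCount); // 1
});
```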
This is also how many people do it today in GraphQL, using the dataloader module developed by Facebook.
The impact of this is that the code you write gets more complex. Handling a list is more complex than handling a single item. That becomes most obvious when you encounter a condition: for this situation, you have to process both cases, and the do-functions also need to accept lists. I was using the underscore library at that time.
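A condition of this kind might look like the following sketch ('prop', 'doA', and 'doB' are invented names, and plain array methods stand in here for the underscore helpers the post mentions):

```javascript
// Hypothetical do-functions that record what they received.
const calls = [];
async function doA(x) { calls.push(['A', x]); }
async function doB(x) { calls.push(['B', x]); }

// Single-item version: one condition, one call.
async function processOne(item) {
  if (item.prop === 'A') {
    await doA(item);
  } else {
    await doB(item);
  }
}

// List version: both branches must be handled on every run, and the
// do-functions now have to accept lists instead of a single item.
async function processMany(items) {
  const aItems = items.filter((item) => item.prop === 'A');
  const bItems = items.filter((item) => item.prop !== 'A');
  if (aItems.length) await doA(aItems);
  if (bItems.length) await doB(bItems);
}
```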
This example has the same number of lines, but the code gets much bigger when there are more than two possible values for 'prop', or when you have more than one condition. You are likely to split the function into multiple ones because it gets too hard to read. Splitting a function into multiple ones is good for handling more complex logic, but maybe the code doesn't need to be that complex in the first place. In some functions I ended up with multiple index objects, or also used 'array.filter()'. This approach can definitely change the coding style for the entire project.
But what was the goal of these complex functions? It was to avoid constant calls to something like 'getItemById' with a single id, which execute too many SQL statements that each contain only one id, are very costly in networking, and together put a huge burden on the DB.
That is when I decided to try another approach. The idea: do caching, but do not cache the results; cache the function calls and the callbacks to the functions that do the database access.
This is what I then wrapped into the module tcacher (today it is refactored for async functions instead of callbacks). By having the request caching not on my API side but on the data layer, I was able to get the gains of running few SQL queries while still keeping code that looks like it processes a single item. In fact, this way even more queries were avoided, because even queries from different APIs that use the same database method are batched together.
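This is not the actual tcacher API, just a minimal sketch of the underlying idea: calls made in the same tick are collected, the many-ids function runs once, and each caller gets its own result back:

```javascript
// Minimal call-batching sketch (not the real tcacher API).
// batchCalls(manyFn) returns a one-id function; all ids requested in the
// same tick are merged into a single call to manyFn.
function batchCalls(manyFn) {
  let pending = null;
  return function (id) {
    if (!pending) {
      pending = { ids: [], promise: null };
      // Flush on the next microtask, after all same-tick callers queued up.
      pending.promise = Promise.resolve().then(() => {
        const batch = pending;
        pending = null;
        return manyFn(batch.ids);
      });
    }
    const index = pending.ids.push(id) - 1;
    return pending.promise.then((results) => results[index]);
  };
}

// Demo: the "database" function counts how often it actually runs.
let queries = 0;
async function getUsersByIds(ids) {
  queries += 1;
  return ids.map((id) => ({ id, name: 'user' + id }));
}
const getUserById = batchCalls(getUsersByIds);

// Code elsewhere keeps looking like single-item access:
Promise.all([getUserById(1), getUserById(2), getUserById(3)]).then((users) => {
  console.log(users.length, queries); // 3 1
});
```

The calling code keeps its single-item shape, while the database sees one query per tick.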
It was much later, at a new company (my current one), that I learned about dataloader and that this functionality was not called request caching, but query batching.
Today, I think it does not matter which package you use, dataloader or tcacher. The first looks more object oriented, the other more functional.
However, what I think is important is to do the query batching on the side where you access other resources, not where other apps call your code.
I, of course, will always use tcacher, because it does one thing and does it right: batching. And I have seen engineers struggle to figure out how to use dataloader correctly together with its second feature, an actual in-memory cache, losing many of the benefits along the way.
You see, I am not only proud of the fact that I had a solution before I learned about the one provided by Facebook, but also that I found a way to keep the code clean.
Top comments (5)
Cool. Thanks for sharing
Shameless self-promotion, but it allows me to embed markdown inside it, and vice versa, which is really helpful for my blogging.
patarapolw / hyperpug
Lightweight Pug for browser/Electron. With Pug filters' support, which can also contain indented language like markdown.
It now has no dependency.
Honestly, the real pain is not the code, but testing. Testing is always a problem when the code either gets a little complex or is subject to change.
Shouldn't be surprising that I like it, huh?
Can't post 32043401 lines here.
If it is that long, then I don't know whether it's that good.
Also, you could post a link to e.g. your git repository.