I was working in a legacy code, where hibernate and ORM dominates almost all interactions with the database, it's a good library, but when we have a code that's being passed by a lot of developers over time, the sql query starts to be a myth no one knows how to do it and leave that job to hibernate to figure out how to create the query.
An innocent story, asking to now add more data into a report. The developer goes to the query and adds the filter. From a query of 1.6sec in the worst case, it went to 50sec.
So looking to the logs of hibernate, hibernate was querying more data then it needs since it was fetching a huge model and only needed a few attributes. We can configure hibernate to do that, but the development team decided to remove and gain knowledge around that functionality.
From a simple query in hibernate to a 4 Join in a sql query, that was the result, but surprisingly the query wasn't that better. Now the problem was that in 2 tables that the query was joining, could be a list of data, instead of joining into the primary key the join was into another attribute causing the query to generate duplicate data.
To explain this join:
If you join by primary key or by a unique key in another table you are only going to get one record for each join, however, if you join within an attribute that can be more than 1 in the table, you will get more than 1 record, and if you join in more than 1 table that can do that, if both have 2 records for the same record the query will return 4, and if there is 3, the result will be 9. Of course in production, this query was against 5k records around and you can always assume the worst-case scenario that was 4 records for each row, that could result in one query returning around 20k records.
The fun part of the joins was that the query was returning 10 columns and if the 2 tables had 2 records for the same parent records, we are going to result in 4 records for the same data, in another word, 40 columns with the same data. heheheh
this query was around the same time as hibernate.
With a frustrating time trying to create one query to rule them all(hahaha), we tried to go with multi-thread approach. So the idea was to go for each table by the primary key and fetching the data that it was needed and assemble in memory.
For the first 500~1000 it was performing ok, but when we start to run a performance test with 50 calls simultaneously with a dataset of 5k it began to behave like the hibernate as well, that was because the IO part was taking at around 10~30ms per
Select by Primary Key, but we had 4 queries for each data, for 5k, so,
(5000 *10)/1000 = 50sec.
And the other problem was that depending on the container size if you are running with less memory you are going to be able to create fewer threads, so, in the end, you will need to queue the requests until a new thread will be available and this will be time-consuming, for each request we were going to create or re-use
5000 * 4 =20.000(obviously a 2Gb application can't create that much of threads, so for each request we were queuing).
In production we have a big database with
16Gb against a
k8 2Gb application, the database was managing the memory better than the application.
With another frustrating idea going to the parking lot, we decided then to understand more about the business and try to avoid or improve a few joins in the first query.
In conclusion, we were able to remove 1 big table of the
join's in the query and filter more precisely in which part of the data that we want to show in the report. By doing that we improved the query from 50sec to 14sec. Still not that good but its better (happy_face).
Since the report was only being generated once a week and used only by the admin user with at most 100 people in that role, we thought that spending more time to improve the performance was going to be a waste of time since we had more important stuff to improve that could create more impact to the business.
In the end, it was a fun journey, but understanding the business was the right solution but the most difficult since we had to discuss with a lot of people and dig into the history of the project how it was done and why compared to just go there and improve the performance only changing the technical part.
The next steps to improve that query are in enabling pagination and try to refactor the database to have a more performance way to query that data.