Discussion on: You Can Do it in SQL, Stop Writing Extra Code for it

Some things to keep in mind.

Depending on the SQL logic, this could put additional CPU pressure on the database server, a service that is hard to scale. Web servers are significantly easier to scale out to handle the processing workload.

At least one of the examples looks like it may defeat the SQL query optimizer in MySQL or MariaDB and cause a full table scan, significantly harming performance. If tables are small, this is no issue, but even for small business workloads, I'm generally dealing with tables on the scale of 1 million to 100 million rows.

Others have noted that this could make parts of the database harder to maintain. Personally, I use some of the methods described above, but they're handled through a SQL query generator: the logic still lives entirely in the application layer, which simply generates the required SQL query string and passes it to the database to perform the operations.

I simply make these notes as someone who has crafted web sites with data processing times in the sub-100ms range, with the fastest now averaging 6ms.
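A minimal sketch of the query-generator approach described above, assuming Python's built-in sqlite3 module as a stand-in for MySQL/MariaDB. The application decides which columns and filters to use, and the database only receives a finished, parameterized query string. The orders table, its columns, the allow-list, and the build_select helper are all hypothetical, not the commenter's actual generator.

```python
import sqlite3

# Hypothetical allow-list of tables and columns. Identifiers cannot be bound
# as query parameters, so only names listed here ever reach the SQL text.
ALLOWED_COLUMNS = {"orders": {"id", "customer_id", "status", "total"}}


def build_select(table, columns, filters):
    """Build a parameterized SELECT with AND-ed equality filters.

    Returns (sql, params); filter values are passed separately as ? placeholders.
    """
    allowed = ALLOWED_COLUMNS.get(table)
    if allowed is None:
        raise ValueError(f"unknown table: {table}")
    for name in [*columns, *filters]:
        if name not in allowed:
            raise ValueError(f"unknown column: {name}")

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = list(filters.values())
    if filters:
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in filters)
    return sql, params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,"
        " status TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
        [(1, "paid", 10.0), (1, "open", 5.0), (2, "paid", 7.5)],
    )
    sql, params = build_select("orders", ["id", "total"], {"customer_id": 1, "status": "paid"})
    print(sql, params)
    print(conn.execute(sql, params).fetchall())
```

Identifiers are checked against an allow-list because placeholders only work for values; everything else is bound with ? so user input never becomes SQL text.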

Connor McDonald 🍸☕️ • Dec 17 '18

If the concern is scalability of the database, then, as counterintuitive as it might sound, doing more work outside the database can make the database work harder than if that work were done inside it.

This is not just an idle rant; here's a full exploration of that precise topic:

youtube.com/watch?v=8jiJDflpw4Y
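One way the effect described above can show up (an illustration, not a summary of the linked talk): aggregating in the application forces the database to read and ship every row, while a GROUP BY returns only the aggregated result. This sketch assumes a hypothetical orders table and uses an in-memory sqlite3 database, so it measures rows returned rather than real network or CPU cost.

```python
import sqlite3
from collections import defaultdict


def make_orders(conn, customers=100, orders_each=50):
    # Hypothetical orders table: each customer gets totals 0.0 .. orders_each - 1.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(c, float(i)) for c in range(customers) for i in range(orders_each)],
    )


def totals_in_app(conn):
    # Every row crosses the database/application boundary; summing happens in Python.
    rows = conn.execute("SELECT customer_id, total FROM orders").fetchall()
    sums = defaultdict(float)
    for customer_id, total in rows:
        sums[customer_id] += total
    return dict(sums), len(rows)


def totals_in_db(conn):
    # The database aggregates; only one row per customer is returned.
    rows = conn.execute(
        "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_orders(conn)
    app_sums, app_rows = totals_in_app(conn)
    db_sums, db_rows = totals_in_db(conn)
    assert app_sums == db_sums
    print(f"rows fetched: app-side {app_rows}, database-side {db_rows}")
```

With the default sizes both functions produce the same per-customer sums, but the application-side version fetches 5,000 rows against 100 for the GROUP BY.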

Geshan Manandhar • Dec 14 '18

I like your point of view. Databases are not easily scalable; that's where managed services with load balancers come into play, I suppose. As for the full table scan, this is just an example, and setting indexes correctly would help tackle the problem. One can always run an EXPLAIN and plan next steps from there.

As for maintenance, I believe none of the code should live in the DB itself; it should be in Git-managed repos. I am also not promoting writing triggers or unnecessary views. 6ms is impressive; at that point I'd expect a good amount of caching to be involved. Thanks!
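A rough sketch of the EXPLAIN-then-index workflow mentioned above, using SQLite's EXPLAIN QUERY PLAN from Python as a stand-in; MySQL and MariaDB use plain EXPLAIN with a different output format (for example, a type column showing ALL for a full table scan). The orders table and the idx_orders_customer index are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN yields 4-column rows; the last column is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def plans_before_and_after_index():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    query = "SELECT id, total FROM orders WHERE customer_id = ?"
    before = query_plan(conn, query, (42,))
    # Hypothetical index name; any index leading with customer_id would do.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, query, (42,))
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("before:", before)
    print("after: ", after)
```

Before the index is created, the plan reports a SCAN of the whole table; afterwards it reports a SEARCH using the index. Checking the plan this way is cheap and catches accidental full scans before they reach a table with millions of rows.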

Vincent Milum Jr • Dec 14 '18

6ms is with zero caching; that's all live database queries and HTML template processing.

The query I meant regarding the SQL optimizer is the one with the HAVING clause. I'd have to check, but I believe limiting the rows in a HAVING clause based on an IF() expression in the column list requires a full table scan. The GROUP BY queries should all be fine, though, especially considering the advances in optimizing them over the past two years.
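One way to check the HAVING concern, sketched with sqlite3 rather than MySQL/MariaDB and with a portable CASE expression in place of MySQL's IF(). The orders table, the index, and both queries are hypothetical, not the article's original query. When the condition lives inside an aggregate and is only filtered by HAVING, every row must be read and grouped first; moving the row-level condition into WHERE gives the same result while letting an index narrow the rows.

```python
import sqlite3

# Conditional aggregate filtered only by HAVING: all rows are read and grouped
# before the filter applies, so an index cannot narrow the rows up front.
HAVING_ONLY = """
    SELECT customer_id,
           SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS paid
    FROM orders
    GROUP BY customer_id
    HAVING SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) >= 2
    ORDER BY customer_id
"""

# Same result with the row-level condition moved into WHERE, which the
# (status, customer_id) index can satisfy before grouping.
WHERE_FIRST = """
    SELECT customer_id, COUNT(*) AS paid
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer_id
    HAVING COUNT(*) >= 2
    ORDER BY customer_id
"""


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)")
    conn.execute("CREATE INDEX idx_orders_status_customer ON orders (status, customer_id)")
    conn.executemany(
        "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
        [(1, "paid"), (1, "paid"), (1, "open"), (2, "paid"), (2, "open"),
         (3, "paid"), (3, "paid"), (3, "paid")],
    )
    return conn


def plan(conn, sql):
    # Keep only the human-readable detail column of EXPLAIN QUERY PLAN.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    conn = setup()
    for name, sql in (("HAVING only", HAVING_ONLY), ("WHERE first", WHERE_FIRST)):
        print(name, conn.execute(sql).fetchall(), plan(conn, sql))
```

Both queries return [(1, 2), (3, 3)] for this data; SQLite reports a SCAN for the first and a SEARCH on the index for the second. MySQL's optimizer may make different choices, so running EXPLAIN against the real tables is still the deciding test.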

Geshan Manandhar • Dec 14 '18

6 ms with no caching sounds interesting.

The HAVING query was just an illustrative example; if it were a real DB query, I would surely have run an EXPLAIN to see if that is the way to go. Thanks!

Scott Tadman • Dec 14 '18

You'll always want to keep in mind how your application is deployed and where you have the most resources.

One application I worked on was deployed on a provider whose pricing model charged for web server instances but bundled the database in as part of the package. This led to a design that offloaded as much as possible onto the database, since it was basically "free" compute. The database instance was far, far faster than any of the individual web servers.

In modern configurations you're going to have a lot of web instances and a few databases that you want to keep lightly loaded so they're responsive. This is especially true if leaning hard on your database means buying more ultra-expensive database server licenses.

Geshan Manandhar • Dec 14 '18

It is a balance of trade-offs, as it usually is in distributed systems. Choosing the right trade-off is one of the keys to success in our field.