DEV Community

Cover image for You Can Do it in SQL, Stop Writing Extra Code for it
Geshan Manandhar
Geshan Manandhar

Posted on • Originally published at geshan.com.np

You Can Do it in SQL, Stop Writing Extra Code for it

“SQL, Lisp, and Haskell are the only programming languages that I’ve seen where one spends more time thinking than typing.“ — Philip Greenspun

Even with thinking more than typing SQL (Structured Query Language) we software engineers use it as a way to pull data only.

We usually don’t leverage SQL’s power of data manipulation and do the needed changes in code.

This is quite prevalent in software engineers who work in web applications. Another thing we miss is, if we do the manipulation in SQL directly the pulled data will be the same format for any programming language. This post aims to enlighten you about the powers of SQL you might know but generally don’t use.

Tea Lights image from [Pixabay](https://pixabay.com/en/tea-lights-candles-light-prayer-2223898/)Tea Lights image from Pixabay

TLDR;

Use SQL to do math like sum, average etc. Utilize it for grouping one to many relational values like getting categories of product. Leverage SQL for string manipulation like using CONCAT_WS for concating first name and last name. Exploit SQL to sort by a custom priority formula. Examples below…

The Example

It will be easier to explain the superpowers of SQL putting it in action on an example. Below is a basic schema with 2 tables in MYSQL for a refunds microservice:

There are 2 refunds and 7 related payments as example data.

Some assumptions

For the refunds microservice example schema and applications following assumptions are made:

  1. Refunds microservice and data structure store the fk_item (the id of the ordered/delivered item), but it is not a hard foreign key.

  2. Item can be refunded in either cash or credit for the amount paid for the same.

  3. Items can be refunded many times as long as remaining balance can cover requested refund amount for each cash and credit. For example, item was paid 50 in cash and 50 in credit. 2 refunds of 20 cash and 20 credit can be done. So after these transactions balance will be 10 cash and 10 credit for that item (50–20–20).

  4. Each refund can have multiple items payment. Each payment can be of type either cash or credit.

  5. All amounts are stored in cents so they are integers.

Now let’s use some SQL powers. You can find the example with related queries running on SQL Fiddle.

Do the math in SQL

As software engineers, let’s say if we need to find the total cash and credit amount refunded for an item what would we do? We would run something like:

SELECT fk_item, fk_refund, amount, is_cash 
FROM payment WHERE fk_item=2001;
Enter fullscreen mode Exit fullscreen mode

With current data, it will give 3 rows like below:

With these 3 rows, we would loop over them. If it is cash accumulate it to cashBalance variable, if not sum it up to creditBalace variable. Rather than that it would be a lot easier (probably faster) to do in SQL like:

SELECT fk_item, SUM(amount) AS total_paid, IF(is_cash = 1, 'cash', 'credit') as type 
FROM payment WHERE fk_item = 2001 GROUP BY fk_item, is_cash;
Enter fullscreen mode Exit fullscreen mode

Resulting in:

The result is easy now if you need the total refund for the item just change the GROUP BY to be on fk_item and it’s done. For 2 and 3 records it won’t feel significant. If there were say 20 refunds for that item, the first solution with a loop is writing more code with no gain. Like sum, other SQL functions can be used too. Simple math operations like sum, multiply, average etc can be easy with SQL. This means no more loops.

Use GROUP_CONCAT to fetch related 1:m relation values

Group concat is a robust operation in SQL databases. It is instrumental when you need to get data from one to many relationship. For instance, you want to get all tags for a blog post or you want to get all categories of a product. Concerning this refunds example, one item can be refunded multiple times. So we will get all the refunds associated with the item id. To get this we will run only 1 query and get it without any loops in the code like below:

SELECT fk_item, GROUP_CONCAT(DISTINCT fk_refund) refund_ids 
FROM payment WHERE fk_item = 2001;
Enter fullscreen mode Exit fullscreen mode

This results in:

Now we know that item 2001 has been refunded twice for 2 refunds. It will be easy to explode the refund Ids with , and proceed with any related operation.

String manipulation

Many string manipulation tasks like substring, concatenation, change case, and string compare can be done in SQL. With this example, I am going to show the usage of CONCAT_WS. It is concat with a separator. It can also be used to select for instance first_name and last_name with space in between.

In case of having an optional middle name COALESCE can be used with CONCAT_WS. That is something for you to explore :).

In this example, I will select refund_nr with its related reason:

SELECT CONCAT_WS("-", refund_nr, reason) AS refund_nr_with_reason 
FROM refund;
Enter fullscreen mode Exit fullscreen mode

Resulting in:

If this needs to be shown on the credit note document, for example, no extra code is needed to join the values again. SQL makes it one step easier again.

Sorting with a custom formula

All software engineers know you can sort based on a column. But if you are given a custom priority formula to sort, what would you do? Probably again resort back to code and loop to sort. So lets set the priority formula rules for above example:

  1. Premium customer refunds get the highest priority (we hack it with a priority of 9999999999)

  2. Other than premium customers cash refunds get a priority of amount * 25 for credit it is amount * 20.

As per above rules it is decided that premium customers and priority above 50000 (in cents) will be processed first. Then other refunds will be processes. Let’s get the priority refunds as below:

SELECT r.refund_nr, r.reason, p.fk_item, p.amount, p.is_cash,
IF(p.premium_customer = 1, 9999999999, p.amount * (IF(is_cash = 1, 25, 20))) AS priority 
FROM refund AS r INNER JOIN payment AS p ON r.id = p.fk_refund 
HAVING priority > 50000 
ORDER BY priority DESC
Enter fullscreen mode Exit fullscreen mode

The results are below:

With proper use of IF in SQL sorting by a custom priority formula is a lot easier than trying to do it with loops in code. Notice that even smaller amounts like 7.5 (750 cents) and 9.0 (900 cents) came to highest priority as these refund payment amounts were associated with premium customers.

Use the superpowers of SQL to make your life easier as a software engineer.

You can play with the example and run your queries on SQL fiddle.

Conclusion

There are other tricks of SQL that can help you as a software engineer. Like, UPDATE with INSERT using ON DUPLICATE KEY UPDATE. Whenever you have an itch of doing some manipulation for data pulled in from database in code with loops, think again. The main takeaway from this story is:

Exploit the power of SQL to write less code because “the best code is the code that was never written”. If it is not written there is no need to maintain it.


Originally published at geshan.com.np.

Latest comments (100)

Collapse
 
anikethsdeshpande profile image
Aniketh Deshpande

Advanced concepts like SQL window functions make it very easy to fetch data, summarise group results, in a quite unique way

Collapse
 
edoardoc profile image
Edoardo Ceccarelli

good post, probably the scenario could also be bigger: I was using everything but the group_concat which looks really useful, here is a so discussion about a postgres equivalent

stackoverflow.com/a/2561297/436085

Collapse
 
djviolin profile image
István Lantos • Edited

So... Because the author of this article "recommended" on facebook to have conversation here, instead of there, I would gladly accept this and copy here my two facebook comments, for enjoyment. Probably I'm not worthy to read, because I'm never was in the top 7 badge here on dev.to, but here we go:


Just because you can, it doesn't mean you have to. Putting way too much logic in SQL means instead of spawning a new API or client-side node as a worker in your cluster with one click, now you have to tamper with the replication of your database cluster (which systems usually tide to master-slave replication, like Postgres). You will also spend sh*tload of money to give enough resources to those DBs, instead of putting those simple math functions into client side React and call it a day. Of course, you will send more data from the API to client, but you have to decide which is more painful for your pocket (spoiler alert: probably a fat and slow DB). Client-side code also can be fasten up with CDNs, which is worth pennies nowadays. Choose which solution you prefer more...


Geshan Manandhar on fb: "Also, with all due respect if you think you are good enough try writing something of this sort, get similar views, reactions and comments. Get a top 7 badge on dev to. Then I would listen to and reply to your comments. Thanks and Happy New Year!"

WTF, maaan... Don't be so triggered. This post picked by the dev.to team the second time on facebook. We comment here, beacuse your article is relevant, right now, right here, not where and whom you want.

The examples in your article (like SUM, CONCAT and DISTINCT keyword) is nothing new or putting more logic into SQL: is just the standard way how a database engineer design their queries and it is part of their everyday toolset. But I would like to add one exception: string manipulation. PLEASE-JUST-DON'T! In fact, if case sensitivity is not important (most of the time, it doesn't) try to store them insensitive, like citext type in Postgres. So basic thing, and yet, in a governmental software used by every city and village in my country, released their product with case sensitive tables, causing some people to appear many times, in every single city and village (around 2000+), because, you know, humans working in offices. Doing any string manipulation or not ignoring case sensitivity can lead to serious problems and usually the DB admin is the guy, who need to fix the colossal f*ck up with hard work later on, never the backend developer. Don't shoot yourself on the leg. Also, using string manipulation to concat strings with "-", is just tossing money out of the window. Your visitors CPU can make it for free, including converting to uppercase the first letter of every words, etc. Putting this logic into the DB, is just...

Collapse
 
geshan profile image
Geshan Manandhar

Some of the things you are pointing out are correct. Like any other tool, SQL should be used wisely and after profiling performance. Thanks!

 
jppage profile image
jppage

For Spring Boot projects you would typically run your unit tests against an in memory database so it's important that the business logic is not in the database and that the code is database agnostic. The Java 8 API, for example, provides excellent data processing features like SUM, grouping etc. Putting business logic in the database is a retrograde step and should be avoided if possible.

Collapse
 
geshan profile image
Geshan Manandhar

I am not telling to put the SQL in the db, it will be in code tracked with git and potentially unit tested. If that does not aid maintainability I am not super sure what will :).

I agree with your 2nd point generally. Thanks!

Collapse
 
itaditya profile image
Aditya Agarwal

I have used SQL database in Node.js with Knex. Knex allows us to write raw SQL queries. So we can have version control over our DB queries.

From the comments I understand testing database is hard but I don't think we need to do that. Why directly test the database. Can't we just test the APIs which are using the database.

Company I work for has exactly this issue. Literally all the logic (sometimes even presentational logic) is done in the database.

This is nightmare to maintain and now really difficult to refactor.

Also it's difficult to test SQL code due to lack of tools.

Collapse
 
itaditya profile image
Aditya Agarwal

I wrote something similar but for MongoDB. codeburst.io/things-i-wish-i-new-b...

Collapse
 
geshan profile image
Geshan Manandhar

Thanks for the comment, because it can be done != it should always be done. As commented earlier always weigh your trade-offs.

Collapse
 
danroc profile image
Daniel da Rocha

I have recently started putting logic in the DB, mostly when turning to Postgraphile which "turns" my DB into a GraphQL server.
Despite the slow start, I am starting to really feel the potential, especially when creating functions for inserts and special table joins.
Also, pgTAP makes testing quite simple.

Collapse
 
redfred7 profile image
Fred Heath • Edited

To everyone here claiming that SQL is hard to test and troubleshoot: All SQL statements return a result-set. By using SQL Views, Functions and Stored Procedures you can test SQL code as you would test any other API call your code makes. Seriously, it's not an issue.

Collapse
 
connor_mc_d profile image
Connor McDonald 🍸☕️

Agreed. A SQL statement is just code. It has an intended purpose and an expected result. That's pretty much what unit testing is designed to work with, no?

Collapse
 
keithchristian profile image
Keith Christian • Edited

Nice article but you should make it clear at the beginning that you are specifically discussing MySQL functions, the title makes it appear as though all these functions are in generic SQL.

Collapse
 
geshan profile image
Geshan Manandhar

Thanks for the comment. It's more about doing stuff in SQL than the flavor of SQL.

Collapse
 
keithchristian profile image
Keith Christian

Geshan, Good points you made, my comment was primarily to help beginners new to SQL. We all can relate.

SQLite, PostgreSQL, etc., don't use the same syntax in some examples.

Thanks for listening.

Collapse
 
diysportscar profile image
Dominic Peterson

I find the unwillingness to properly leverage the power of the database very confusing. It's like having a firetruck with big, powerful hoses that you drive up to the fire and then use those hoses to fill little individual buckets that you carry over to throw on the fire one at a time :(

I have constant conversations with (mostly) younger devs who seem unwilling to learn some database design theory and some SQL that could save them huge amounts of messing about with ORMs.

Collapse
 
geshan profile image
Geshan Manandhar

Amazing analogy!

Collapse
 
samuyi profile image
Samuyi

Are we going to state the obvious elephant in the room? Going through the comments it's obvious that many developers do not know about SQL or databases in general. Its why they are getting in their feelings. The problem isn't the SQL code, it's probably because you don't understand the underlying data structure or know SQL well enough.

Putting application logic in SQL is good practice in my opinion. Maybe not all logic, but at least a good chunk. This makes your application faster and scale well. Adding more application servers would not make it faster or scale well for that matter. Understanding the underlying database and data structure would go a long way in scaling that application.

As for testing it's possible to test SQL code. I haven't seen people do autonomous testing of SQL code, but it's possible.

Collapse
 
connor_mc_d profile image
Connor McDonald 🍸☕️

Agreed. Perhaps the biggest obstacle to any technology (SQL included) is rarely the technology itself, but people's comfort factor with it.

I'm just as guilty of that myself. I know that the best applications use database access (SQL) intelligently, they use middle tier functionality intelligently, and client tier functionality intelligently. But most apps have a strong bias in just one of these areas, simply because of the biases of the developer/development team involved.

Collapse
 
divakarraj profile image
Divakar Rajashekaran

Well articulated and I share the same view.

Collapse
 
geshan profile image
Geshan Manandhar

Thanks for your viewpoint!

Collapse
 
ericsgagnon profile image
ericsgagnon

I've moved from ORM's to SQL guery generators. It gives a lot more freedom than most ORM's, let's you leverage the power of the DBMS, and promotes maintainability. I find it's a reasonable compromise for environments where the data isn't entirely stable and no REST api's are supported.

Collapse
 
geshan profile image
Geshan Manandhar

Good path taken, kudos!

Collapse
 
thiagoa profile image
Thiago Araújo Silva • Edited

GROUP_CONCAT is cool, but I stopped using it in MySQL 5.6. Unfortunately, it has a maximum_length setting which by default limits the return value to 1024 characters. It's the kind of bug that can go unnoticed for months, or only be evident with table growth. I wish it didn't have a limit (or that you could set no limit on it) and did not require fiddling with the DB settings. So beware! :)

I'm not sure about newer versions though.

Collapse
 
geshan profile image
Geshan Manandhar

Good warning sign to take of.