Have you ever come across a pesky bug caused by leading, trailing, or otherwise unwanted spaces? Perhaps a lowercase "l" masquerading as a big "I". How long did you sit staring at the expected output and the corrupted value before you noticed the intruder? Probably too long.
Don't worry, we can sulk together.
The better question is, how did you fix it? You likely wrote tests and merged a fix into your ETL pipeline. But there are always straggler records in need of the almighty and powerful raw SQL query.
I present to you two such cases along with viable solutions.
Removing unwanted characters with TRIM()
The TRIM() function removes the space character OR other specified characters from the start or end of a string.
This function should look familiar to experienced programmers. You will find it built-in to most languages; in JavaScript, for example:
' Am I in the front?'.trim()
// => 'Am I in the front?'
It works mostly the same way in SQL. Assume we have ingested the following data
id | product | price | market | size |
---|---|---|---|---|
1 | 100 | 45.99 | amazon | a |
2 | 100 | 49.00 | walmart | a |
3 | 200 | 15.00 | walmart | a |
4 | 200 | 26.50 | amazon | a |
While it all looks above aboard, some records aren't being returned when we think they should
SELECT * FROM mytable WHERE size = 'a'
only returns
id | product | price | market | size |
---|---|---|---|---|
1 | 100 | 45.99 | amazon | a |
4 | 200 | 26.50 | amazon | a |
Weird. So we try a LIKE
search
SELECT * FROM mytable WHERE size LIKE '%a%'
We get back all four results!
To save time, let's assume you print and/or inspect the suspicious records using your favorite GUI and discover that the actual value of size for those records is
' a'
instead of'a'
. Let's also assume you debugged your ETL pipeline and shipped a fix.
Now, let's update the corrupted data...
UPDATE mytable
SET size = TRIM(size)
WHERE size LIKE '%a%'
Boom! Our corrupted data is so fresh and so clean now.
Replacing unwanted parts of a string with REPLACE()
The REPLACE() function replaces all occurrences of a substring within a string, with a new substring.
Hello, old friend.
Replace is another familiar function. In JavaScript, replace looks like this:
'My name is Bob'.replace('Bob', 'Dave')
// => 'My name is Dave'
We have a few shoes for sale and for some reason, one of our suppliers likes to add decimals to whole numbers:
id | shoe | price | market | size |
---|---|---|---|---|
1 | 100 | 45.99 | amazon | 10 |
2 | 100 | 49.00 | walmart | 10.0 |
3 | 200 | 15.00 | walmart | 11.5 |
4 | 200 | 26.50 | amazon | 11 |
This isn't good, and like the problem we TRIM
ed away, this makes it difficult to query our data correctly. First, we go and fix our ETL pipeline. Next, we come back to SQL for additional cleanup.
We could totally use a feature of TRIM
to fix our latest issue, but we're learning and using REPLACE
feels more familiar.
Our new query goes like this
UPDATE mytable
SET size = REPLACE(size, '.0', '')
WHERE size LIKE '%.0'
Take a look at our data, now, it cleans up so well:
id | shoe | price | market | size |
---|---|---|---|---|
1 | 100 | 45.99 | amazon | 10 |
2 | 100 | 49.00 | walmart | 10 |
3 | 200 | 15.00 | walmart | 11.5 |
4 | 200 | 26.50 | amazon | 11 |
Woo, hopefully you learned something neat about SQL. Like I said in my previous post, using raw SQL gives me a boost of confidence; the more I do it, the more I enjoy it.
Do you have any memorable cases solved by an interesting SQL query? I'd love to hear about it below!
Feel the SQL ❤️ !!
Top comments (1)
For me is the funniest part non-breaking space. Not able to clean it with trim. You need to use replace with char(160).