DEV Community

Cover image for Using SQL's TRIM and REPLACE to tidy up your data
David Hunt
David Hunt

Posted on

Using SQL's TRIM and REPLACE to tidy up your data

Have you ever come across a pesky bug caused by leading, trailing, or otherwise unwanted spaces? Perhaps a lowercase "l" masquerading as a big "I". How long did you sit staring at the expected output and the corrupted value before you noticed the intruder? Probably too long.

Don't worry, we can sulk together.

The better question is, how did you fix it? You likely wrote tests and merged a fix into your ETL pipeline. But there are always straggler records in need of the almighty and powerful raw SQL query.

I present to you two such cases along with viable solutions.

Removing unwanted characters with TRIM()

The TRIM() function removes the space character OR other specified characters from the start or end of a string.

This function should look familiar to experienced programmers. You will find it built-in to most languages; in JavaScript, for example:

'  Am I in the front?'.trim()
// => 'Am I in the front?'

It works mostly the same way in SQL. Assume we have ingested the following data

id product price market size
1 100 45.99 amazon a
2 100 49.00 walmart a
3 200 15.00 walmart a
4 200 26.50 amazon a

While it all looks above aboard, some records aren't being returned when we think they should

SELECT * FROM mytable WHERE size = 'a'

only returns

id product price market size
1 100 45.99 amazon a
4 200 26.50 amazon a

Weird. So we try a LIKE search

SELECT * FROM mytable WHERE size LIKE '%a%'

We get back all four results!

To save time, let's assume you print and/or inspect the suspicious records using your favorite GUI and discover that the actual value of size for those records is ' a' instead of 'a'. Let's also assume you debugged your ETL pipeline and shipped a fix.

Now, let's update the corrupted data...

UPDATE mytable

SET size = TRIM(size)

WHERE size LIKE '%a%'

Boom! Our corrupted data is so fresh and so clean now.

Replacing unwanted parts of a string with REPLACE()

The REPLACE() function replaces all occurrences of a substring within a string, with a new substring.

Hello, old friend.

Replace is another familiar function. In JavaScript, replace looks like this:

'My name is Bob'.replace('Bob', 'Dave')
// => 'My name is Dave'

We have a few shoes for sale and for some reason, one of our suppliers likes to add decimals to whole numbers:

id shoe price market size
1 100 45.99 amazon 10
2 100 49.00 walmart 10.0
3 200 15.00 walmart 11.5
4 200 26.50 amazon 11

This isn't good, and like the problem we TRIMed away, this makes it difficult to query our data correctly. First, we go and fix our ETL pipeline. Next, we come back to SQL for additional cleanup.

We could totally use a feature of TRIM to fix our latest issue, but we're learning and using REPLACE feels more familiar.

Our new query goes like this

UPDATE mytable

SET size = REPLACE(size, '.0', '')

WHERE size LIKE '%.0'

Take a look at our data, now, it cleans up so well:

id shoe price market size
1 100 45.99 amazon 10
2 100 49.00 walmart 10
3 200 15.00 walmart 11.5
4 200 26.50 amazon 11

Woo, hopefully you learned something neat about SQL. Like I said in my previous post, using raw SQL gives me a boost of confidence; the more I do it, the more I enjoy it.

Do you have any memorable cases solved by an interesting SQL query? I'd love to hear about it below!

Feel the SQL ❤️ !!

Top comments (1)

Collapse
 
johntik profile image
Janci

For me is the funniest part non-breaking space. Not able to clean it with trim. You need to use replace with char(160).