DEV Community

Samuel Rouse

Use Slice, not Substring

JavaScript's String.prototype.substring() and its confusingly similar yet deprecated cousin .substr() have been around a long time, but so has the better solution: .slice().

Slice is most compatible with modern JavaScript. It accepts one or two indices, supports negative indices, and operates predictably.
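As a quick sketch of those three properties, here is slice run against an arbitrary sample string (the variable names are only for illustration):

```javascript
const word = 'JavaScript';

// One index: everything from that index to the end.
const fromFour = word.slice(4);      // 'Script'

// Two indices: the start is included, the end is excluded.
const firstFour = word.slice(0, 4);  // 'Java'

// Negative indices count back from the end of the string.
const lastThree = word.slice(-3);    // 'ipt'
const middle = word.slice(1, -1);    // 'avaScrip'
```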

Of Indexes and Indices

Indexes and indices are interchangeable plurals for index. Databases usually use indexes to refer to optimized lookup tables. Array operations usually refer to indices. Math often uses indices, and while much of programming follows from math, usage is mixed. The MDN article on substring uses both, for instance.

Substr: Right Out

Substr is deprecated. It was never part of the core spec, and unlike most other string operations it takes a start index and a length rather than two indices. I've added it to the example comparison because it lives in many older codebases, but it's not recommended.

Honestly, substr is "closer to the metal", meaning it's more closely aligned with the computer hardware than the conceptual logic being executed. The strncpy function in C accepts a size, similar to the length parameter. This is essentially replicating how memory is copied at the hardware level: a starting point and a number of bytes.

When we work in applications, we are often less concerned with how the underlying hardware will handle a task. We might need to find the start and end, and have to perform math to know how many characters to copy. So we let the computer do that math for us, and just provide the indices.
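To make that concrete, here is a small sketch of a made-up task: pulling out the text between a pair of parentheses. The sample string and variable names are assumptions for illustration only.

```javascript
const text = 'Hello (world) again';

// Find where the wanted text starts and ends.
const start = text.indexOf('(') + 1; // 7
const end = text.indexOf(')');       // 12

// substr wants a length, so we do the subtraction ourselves.
const viaSubstr = text.substr(start, end - start); // 'world'

// slice takes the two indices exactly as we found them.
const viaSlice = text.slice(start, end);           // 'world'
```

Both calls return the same text here, but the slice version leaves one less piece of arithmetic to get wrong.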

Substring: Just Trying to Help

Substring has two "helpful" quirks to be aware of: clamped values and index swapping. These aren't bad, but can have unexpected results if you aren't aware of them.

Parameter Swapping

Let's say you have a little program that calculates the start and end of a substring you need to extract, and you get back two indices: start and end. Sadly, there's a simple error in your logic, and the end is smaller than the start value! As explained in the substr section above, the low-level logic is converting the pair of indices to a size for string copies. Subtracting the indices would give us a negative number, which doesn't work as a length.

So substring assumes you made a mistake, and swaps the order for you! No matter how you pass your arguments, it will figure out which is lower, and put it first.

'abcde'.substring(4, 2); // 'cd'

In most places this would either error, or return an empty string. The latter is what .slice() does.

'abcde'.slice(4, 2); // ''

Substring's parameter swapping might seem like a helpful choice, but it can hurt more than it helps. If our logic is wrong, it returns plausible-looking values that can confuse us or delay identifying the defect.
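Here is a sketch of how that can play out. The `between` helper below is hypothetical and deliberately flawed: it never checks that the closing marker comes after the opening one.

```javascript
// Hypothetical helper: return the text between two marker characters.
// `method` is the name of the string method to call, so we can compare.
function between(str, open, close, method) {
  const start = str.indexOf(open) + 1;
  const end = str.indexOf(close);
  return str[method](start, end);
}

// The markers appear in the wrong order in this input.
const input = 'a] stray [b';

// start is 10 and end is 1, so the indices arrive reversed.
const swapped = between(input, '[', ']', 'substring'); // '] stray ['
const empty = between(input, '[', ']', 'slice');       // ''
```

With substring we get back text that was never between the markers at all. With slice the empty string at least hints that something went wrong.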

Clamped Values

Substring doesn't support negative numbers. Many of the more recent JavaScript additions that accept indices allow you to provide negative numbers as a representation of "from the end", where positive numbers are "from the beginning".

'abcde'.at(1);  // 'b'
'abcde'.at(-1); // 'e'

Substring assumes you must have been mistaken, and clamps values to zero if they are negative. You can't start before zero, right? That's why indexOf returns -1 if something is not found. So substring won't go below zero.

Now, all three methods treat an argument that can't be converted to a number as zero, but because substring doesn't support negatives, it behaves differently from the other two.

// Bad values
'abcde'.substring({}, 3); // 'abc'
'abcde'.substr('cat', 3); // 'abc'
'abcde'.slice([], 3);     // 'abc'

// Negative values
'abcde'.substring(-2); // 'abcde'
'abcde'.substr(-2);    // 'de'
'abcde'.slice(-2);     // 'de'

It gets especially complicated if you thought you could use negative indices to get several characters near the end of a string.

'abcde'.slice(-3, -1);     // 'cd'
'abcde'.substring(-3, -1); // ''

And mixing positive and negative numbers is even worse.

'!Hola¡'.slice(1, -1);     // 'Hola'
'!Hola¡'.substring(1, -1); // '!'

Substring clamps this as (1, 0), and then swaps the order for you.
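One way to picture the clamp-then-swap behavior is to rebuild it on top of slice. This is a simplified sketch, not the specification's algorithm, and `substringLike` is a made-up name rather than a built-in.

```javascript
// A sketch of substring's argument handling, built on slice.
function substringLike(str, start, end = str.length) {
  // Values that aren't numbers become 0, then everything is
  // clamped into the range 0..str.length.
  const clamp = (n) =>
    Math.min(Math.max(Math.trunc(Number(n)) || 0, 0), str.length);
  const a = clamp(start);
  const b = clamp(end);
  // Whichever index is lower goes first.
  return str.slice(Math.min(a, b), Math.max(a, b));
}

substringLike('!Hola¡', 1, -1); // '!'
substringLike('abcde', 4, 2);   // 'cd'
substringLike('abcde', -3, -1); // ''
```

Seen this way, substring is roughly slice with two extra steps in front of it, and those steps are exactly what rule out negative indices.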

Examples

Here's a small chart of different inputs and what the functions produce. This is all run against a string that contains the alphabet.

| Args | substring | substr | slice |
| --- | --- | --- | --- |
| fn(20) | 'uvwxyz' | 'uvwxyz' | 'uvwxyz' |
| fn(-5) | 'abcdefghijklmnopqrstuvwxyz' | 'vwxyz' | 'vwxyz' |
| fn(5, 10) | 'fghij' | 'fghijklmno' | 'fghij' |
| fn(5, -5) | 'abcde' | '' | 'fghijklmnopqrstu' |
| fn(-10, -5) | '' | '' | 'qrstu' |
| fn(-5, -10) | '' | '' | '' |
| fn(10, 5) | 'fghij' | 'klmno' | '' |

For the last two entries .slice() returns an empty string, but those are the "incorrect" calls, where the second index falls before the first.
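If you want to check these results in your own environment, a short script along these lines should reproduce the chart. The variable names are mine, and the output goes through console.table for readability.

```javascript
const alphabet = 'abcdefghijklmnopqrstuvwxyz';

// The same argument lists used in the chart.
const argSets = [[20], [-5], [5, 10], [5, -5], [-10, -5], [-5, -10], [10, 5]];

// Run each argument list through all three methods.
const rows = argSets.map((args) => ({
  args: `fn(${args.join(', ')})`,
  substring: alphabet.substring(...args),
  substr: alphabet.substr(...args),
  slice: alphabet.slice(...args),
}));

console.table(rows);
```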

Summary

substring has some unusual conditions that could be helpful but are incompatible with more modern use of negative indices. substr is deprecated. slice is where you want to be.

What do you think?

Top comments (3)

Ben Sinclair

I actually never realised you could use slice on strings!

Samuel Rouse

Same here, even though it's apparently been around forever.

david duymelinck • Edited

I think all functions have inconsistent outcomes.
For me the outcomes should be:

  • 20: uvwxyz (get all characters starting from the twentieth character)
  • -5: vwxyz (get all characters starting from the fifth character at the end)
  • 5, 10: fghijklmno (get no more than ten characters starting from the fifth character)
  • 5, -5: vwxyz (get no more than five characters from the end starting from the fifth character)
  • -10, -5: vwxyz (get no more than five characters from the end starting from the tenth character at the end)
  • -5, -10: vwxyz (get no more than ten characters from the end starting from the fifth character at the end)
  • 10, 5: klmno (get no more than five characters starting with the tenth character)

So for me the first argument is the start, either from the beginning or the end based on the number polarity.
And the second argument is the maximum characters of the substring. And the polarity of the number means the same as for the first argument.

You can use other logic, but the main thing is that the function should give a consistent result.