
String.prototype.search(): a method I wish I knew about a long time ago

Ken Bellows ・7 min read

tl;dr: String.prototype.search() is basically .indexOf() but with regexes. It's been supported in every browser since IE 4, but ES6 made it more powerful with Symbol.search.


I have been writing JavaScript for just about 18 years. I started sometime in 2002, when IE 6 was king, Firefox was just being released, and Chrome did not exist.

I've been writing JavaScript for almost two decades, and I've always been someone who loves digging into the docs, learning every feature available, every method of every object in the browser. But sometimes... sometimes I still, after all this time, find something that's been around for a long time and I just didn't know about.

Today I discovered one such method: String.prototype.search(). And man, I wish I had known about this one a loooong time ago.

What it does

The .search() string method is pretty straightforward: as I mentioned in the tl;dr, it's basically .indexOf(), but with one crucial difference: it uses regular expressions!

Here's the demo from the MDN page. It demonstrates how you would find the first non-whitespace, non-alphanumeric character in a string:

const paragraph = 'The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?';

// any character that is not a word character or whitespace
const regex = /[^\w\s]/g;

console.log(paragraph.search(regex));
// expected output: 43

console.log(paragraph[paragraph.search(regex)]);
// expected output: "."

This blew my mind when I saw it. Not because it's necessarily that crazy, but just because I never knew it was available to me. I have hacked together this method countless times over the years using the clunkier, less readable String.prototype.match(). This method works, and it's my go-to solution when I want capture groups and all that, but for simply finding the index of the first instance of a certain pattern in a string, .search(regex) is just so clean. For one, to me at least, it's immediately obvious what's happening here, whereas the .match() method always took me a minute to understand. For another, .match() requires extra processing, because it has three kinds of return values:

  • if it doesn't find a match, it returns null
  • if it finds a match:
    • if your regex had the global flag (/.../g, like in MDN's example above), it returns an array of all matches, and there's no way to get their indices
    • if your regex did not have the global flag, it returns an object with an index property

So .match() gets complicated.
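To make those three cases concrete, here's a small sketch. The sample string and variable names are made up for illustration; only .match() and .search() themselves come from the language.

```javascript
const text = 'one fish, two fish'

// No match: null
const none = text.match(/bird/)

// Match without the global flag: an array with an extra index property
const first = text.match(/fish/)

// Match with the global flag: a plain array of matched strings, no indices
const all = text.match(/fish/g)

console.log(none)         // null
console.log(first.index)  // 4
console.log(all)          // [ 'fish', 'fish' ]
console.log(all.index)    // undefined

// .search() gives the index directly, with or without the g flag
console.log(text.search(/fish/g))  // 4
console.log(text.search(/bird/))   // -1
```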

Another option that I sometimes use is RegExp.prototype.exec(). This has the advantage that it always returns an object with an index property when it finds a match, regardless of the global flag. The disadvantage is that you still need to be careful about the global flag if you want to run the same regex on multiple strings: a global regex remembers where its previous match ended (in its lastIndex property) and starts the next search from there. This can be useful sometimes, but isn't great in the simple case.
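Here's a rough sketch of that pitfall, using a made-up pattern and strings. It also shows that .search() is not affected, since it always scans from the start of the string and leaves the regex's lastIndex as it found it.

```javascript
const pattern = /\d+/g

// First call: finds "42" at index 4 and moves pattern.lastIndex to 6
const firstResult = pattern.exec('abc 42 def')
console.log(firstResult.index, pattern.lastIndex)  // 4 6

// Second call on a *different* string starts at lastIndex (6),
// so the "7" at index 0 is skipped and nothing is found
const secondResult = pattern.exec('7 apples')
console.log(secondResult)       // null
console.log(pattern.lastIndex)  // 0 (reset after the failed match)

// .search() always scans from the start and restores lastIndex afterwards
pattern.lastIndex = 5
const searchResult = '7 apples'.search(pattern)
console.log(searchResult)       // 0
console.log(pattern.lastIndex)  // 5
```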

Just to drive this point home, here's the side-by-side comparison:

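// old way (the regex must NOT have the g flag here, or match.index is undefined)
const match = paragraph.match(regex)
const oldIndex = match ? match.index : -1

// new way (works with or without the g flag)
const newIndex = paragraph.search(regex)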

I don't know. I get really excited about stuff like this. Maybe you don't. But if that didn't excite you, maybe this will:

How ES6 made it even more powerful

The way I came across String.prototype.search() was kind of funny. I was looking over the README for Paul Miller's fantastic polyfill library, ES6 Shim, and I noticed this in the "Caveats" section at the bottom:

  • Well-known Symbols
    • In order to make them work cross-realm, these are created with the global Symbol registry via Symbol.for. This does not violate the spec, but it does mean that Symbol.for('Symbol.search') === Symbol.search will be true, which it would not by default in a fresh compliant realm.

If that makes no sense to you, let's do a 30-second crash course on Symbols. If it did make sense, skip the next section.

A brief aside about Symbols

This will be a very quick overview, so if Symbols still don't make a ton of sense to you after this, I highly recommend doing some googling, because they're pretty important for leveling up in JS (IMHO).

Symbols are a new primitive type introduced to JavaScript in ECMAScript 2015, a.k.a. ES6. The basic idea behind them is to create a perfectly unique key to use as an object property name, so that it's impossible for someone else to accidentally clobber your property later by using the same name, especially on shared objects and global window properties. Before Symbols, it was common to see keys on shared objects with lots of leading underscores, stuff like ___myThing, or with a randomly generated prefix, like 142857_myThing. This may seem like a rare edge case if you haven't encountered it, but trust me, this has been a source of frustration many times in JS history.

For your standard, garden-variety Symbols, created with Symbol('foo'), no one but you has access to them unless you pass them around. However, there are also Symbols that everyone has access to. You can create your own shared Symbols by registering a name in the global Symbol registry with Symbol.for(), as mentioned in the quote above, and there is also a special set of so-called "well-known Symbols" defined by the language as properties on the Symbol object. These are used as special property names that enable certain functionality for objects.
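A quick sketch of those ideas, in case seeing them helps. The object and the 'app.cache' registry name are placeholders I made up for illustration.

```javascript
// Two Symbols with the same description are still different keys
const mine = Symbol('foo')
const yours = Symbol('foo')
console.log(mine === yours)  // false

// A Symbol-keyed property cannot collide with a string key of the same name
const shared = { foo: 'string key' }
shared[mine] = 'symbol key'
console.log(shared.foo)           // "string key"
console.log(shared[mine])         // "symbol key"
console.log(Object.keys(shared))  // [ 'foo' ]

// Symbol.for() looks a name up in the global registry, creating it if needed
const registered = Symbol.for('app.cache')
console.log(registered === Symbol.for('app.cache'))  // true

// Well-known Symbols are ordinary properties of the Symbol object
console.log(typeof Symbol.search)  // "symbol"
```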

Perhaps the most famous is Symbol.iterator, which lets us define custom iteration behavior for our classes, which is then used by the spread syntax and the for...of loop to iterate over our object. I wrote a whole post about ES6 iterators and how they relate to generators a while back, if you're interested in a deep dive on this topic (it gets pretty wild when you dig deep).
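As a tiny illustration of Symbol.iterator in action, here's a minimal sketch. Countdown is a hypothetical class invented for this example, not something from a library.

```javascript
// Countdown is a hypothetical class used only for illustration
class Countdown {
  constructor(start) {
    this.start = start
  }

  // A generator method under the Symbol.iterator key makes instances iterable
  *[Symbol.iterator]() {
    for (let n = this.start; n > 0; n--) {
      yield n
    }
  }
}

// The spread syntax uses the iterator...
const spread = [...new Countdown(3)]
console.log(spread)  // [ 3, 2, 1 ]

// ...and so does for...of
const seen = []
for (const n of new Countdown(2)) {
  seen.push(n)
}
console.log(seen)  // [ 2, 1 ]
```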

Okay, hopefully we all have at least enough understanding to follow the rest of the story here.

Back to the story

After reading the note in the Caveats section of ES6 Shim, my question was, "What the heck is Symbol.search for?" I had never encountered this particular well-known Symbol before, so I read the MDN page on Symbol.search, which in turn led me to String.prototype.search.

I've already gotten a bit long-winded here, so to wrap up quickly, the bottom line is this: when you call myString.search(x), the engine checks whether the thing you passed in, x, has a method defined under the key [Symbol.search]. If so, it calls that method with the string. If not, it tries to convert x to a RegExp by calling new RegExp(x), which is really only useful for strings, and searches with that.

(Side note: The MDN page is misleading here. It says: "If a non-RegExp object regexp is passed, it is implicitly converted to a RegExp with new RegExp(regexp)." But as we'll see next, this is not strictly true; it will not convert to a RegExp if you pass an object with a [Symbol.search] property.)
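Both branches can be seen in a short sketch. The firstUppercase object is a hypothetical searcher I made up for illustration; note that it doesn't involve a regex at all.

```javascript
// A string argument is compiled as a regex pattern, so "." is a wildcard here
const viaSearch = 'a.b'.search('.')
const viaIndexOf = 'a.b'.indexOf('.')
console.log(viaSearch)   // 0
console.log(viaIndexOf)  // 1

// An object with a [Symbol.search] method is called directly with the string
// (firstUppercase is a hypothetical searcher, not a built-in)
const firstUppercase = {
  [Symbol.search](str) {
    for (let i = 0; i < str.length; i++) {
      const ch = str[i]
      if (ch !== ch.toLowerCase()) return i
    }
    return -1
  }
}

console.log('hello World'.search(firstUppercase))  // 6
console.log('no caps'.search(firstUppercase))      // -1
```

The string case is worth remembering: if you pass user input to .search(), characters like "." and "(" are treated as regex syntax, not literal text.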

So what this means for us is that we can write a custom string search function and wrap it in an object. This may seem niche, since you can always just pass the string to the function, and this is certainly true. But something about the syntax feels nice to me:

// Find the index of the first character following a string like:
//    "Name:\t"
const nameFinder = {
  [Symbol.search](s) {
    const result = /Name:\s*/.exec(s)
    if (result) {
      const {0: label, index} = result
      return index + label.length
    }
    else {
      return -1
    }
  }
}

// imagine this was read in from a file
const doc = `Customer Information
ID: 11223344
Name:   John Smith
Address:    123 Main Street
...`

const customerNameStart = doc.search(nameFinder)
const customerName = doc.slice(customerNameStart, doc.indexOf('\n', customerNameStart))

Imagine looping over a directory of customer info files in a Node script trying to extract their names, reusing this same search object each time, even storing the name finder and similar finders for other fields in a separate module and importing them. I think it could be neat! (Just me?)
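Here's a rough sketch of what that could look like. Everything in it is hypothetical: fieldFinder and readField are helper names I invented, the docs array stands in for files read from disk, and the factory assumes the label contains no regex special characters.

```javascript
// fieldFinder is a hypothetical factory: given a label such as "Name",
// it returns a search object that locates the start of that field's value
function fieldFinder(label) {
  // Spaces and tabs only, so the match never runs past the end of the line.
  // Assumes the label contains no regex special characters.
  const pattern = new RegExp(`^${label}:[ \\t]*`, 'm')
  return {
    [Symbol.search](s) {
      const result = pattern.exec(s)
      return result ? result.index + result[0].length : -1
    }
  }
}

// Read the rest of the line starting at the index the finder reports
function readField(doc, finder) {
  const start = doc.search(finder)
  if (start === -1) return null
  const end = doc.indexOf('\n', start)
  return doc.slice(start, end === -1 ? doc.length : end)
}

const nameField = fieldFinder('Name')
const idField = fieldFinder('ID')

// Stand-ins for files read from disk
const docs = [
  'Customer Information\nID: 11223344\nName:   John Smith\nAddress:    123 Main Street',
  'Customer Information\nID: 55667788\nName:\tJane Doe'
]

const customers = docs.map(d => ({
  id: readField(d, idField),
  name: readField(d, nameField)
}))
console.log(customers)
```

For the two sample documents this logs John Smith with ID 11223344 and Jane Doe with ID 55667788. The pattern deliberately has no g flag, so the same finder can be reused across documents without any lastIndex surprises.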

Conclusion

Honestly, I recognize that this is not super revolutionary or anything, and it probably won't change a lot of workflows. But to me, that isn't the important thing; what's most important to me is to know what tools are available. I honestly don't know when I would use a custom search object like the one above, but I think it's very cool that it's an option. And now that I know about it, if I ever come across a situation where it really is useful, I'll have it in the back of my head. It's another Batarang on my utility belt.

(Also, I just think metaprogramming stuff like this is really cool 😎)


Endnote

Thanks if you read all this! It's niche, I know, and I get more excited than most devs I know about little things like this. But if you got excited about this article, let me know in the comments, or shoot me a DM!

Posted on Jun 23 by:

Ken Bellows

@kenbellows

Full-time web dev; JS lover since 2002; CSS fanatic. #CSSIsAwesome I try to stay up with new web platform features. Web feature you don't understand? Tell me! I'll write an article! He/him

Discussion


Hidden gem 💎 for sure, thanks! 🚀

 

if ye want another exciting fact about browser js:

element.append is a thing and accepts multiple dom elements.

(me being one of the people who started with js when netscape had that nice brushed aluminium theme literally missed the addition of this neat little function)

 
 

Thanks for sharing Ken. It's very useful. Learnt something new today :-)

 

So you can make extra wrapper for exec / match method to simplify code in the userland.

 

Yeah, you definitely could do it that way, utilize myregex.exec() or mystring.match() inside of your method. Could be a nice way to encapsulate some ugly, repetitive logic, especially if you did this sort of thing all the time and wanted to write a factory function to take a regex and generate these search objects. Again, I don't really know if it's any cleaner in the end than just defining a findIndex(str, pattern) function, especially if not everyone reading your code will know about these symbols and understand how you're using them, but when I have the choice, I usually prefer dealing with native methods like str.search(pattern), since they have well-defined behavior and I know what to expect. Personal preference, I guess.

 

Wow, I learnt way more than just the search method. What an article!

 

Symbol.search huh - I do like that syntax.

 

wow, thank you!
I don't know about "data-*" in HTML. When do you use it?

 

Hey! That's a bit off-topic for this article, so I don't want to go over it in this comments section, but I love talking about that sort of thing, so DM me if you want to talk about it!